YaCy is a decentralized, peer-to-peer (P2P) search engine that lets users build a distributed search network without relying on centralized servers. When deployed locally, YaCy uses an embedded Jetty web server, so no additional server software such as Apache or Nginx is needed. However, users may find that the server stops unexpectedly, often because of resource constraints. A critical setting for addressing this is the steady-state minimum, which pauses crawling when free disk space falls below a specified threshold (e.g., 4096 MiB, or 4GB). This article explains how to configure this setting, stabilize Jetty, and prepare YaCy for potential public deployment.
Understanding YaCy's Jetty Server and Stability
YaCy's built-in Jetty server, a lightweight Java-based framework, handles the web interface (accessible at http://localhost:8090), search queries, and P2P communication. Jetty is reliable for local and small-scale public deployments, as seen in projects like Jenkins and Apache Solr. However, stability issues may arise if system resources (disk, memory, or CPU) are insufficient, particularly during intensive crawling tasks. For example, a user might notice YaCy stopping unexpectedly, requiring a restart. Common causes include:
- Disk Space Exhaustion: Crawling and indexing can generate large amounts of data, filling the disk and causing Jetty to halt.
- Memory Overload: Java Virtual Machine (JVM) memory shortages may crash the server.
- Port Conflicts or Network Issues: Misconfigured ports (default: 8090 for HTTP, 8443 for HTTPS) or firewall restrictions can disrupt Jetty.
The steady-state minimum setting mitigates disk-related issues by stopping crawling when free disk space drops below a set value, such as 4096 MiB.
Configuring the Steady-State Minimum
The steady-state minimum ensures YaCy pauses crawling before disk space is depleted, protecting Jetty's stability. Here's how to configure it to 4096 MiB (4GB):
Via the Web Interface
- Access the YaCy admin panel at http://localhost:8090.
- Navigate to System Administration or Performance Settings (the exact menu may vary by version).
- Locate the Steady-State Minimum field, typically under Index Control or Performance.
- Set the value to 4096 MiB (or adjust based on your disk capacity, e.g., 10–20% of total space).
- Save the changes. YaCy will now pause crawling when free disk space falls below 4GB.
- Verify by monitoring the crawler status (http://localhost:8090/Crawler_p.html) or logs (DATA/LOG/yacy00.log).
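The 10–20% guideline can be turned into a concrete number for a given disk. The following is a minimal sketch under stated assumptions: the function name `suggest_steady_state_mib` is hypothetical, and using 4096 MiB as a lower bound is simply this article's example value, not a YaCy rule.

```python
import shutil

MIB = 1024 * 1024  # bytes per mebibyte
GIB = 1024 * MIB   # bytes per gibibyte


def suggest_steady_state_mib(total_bytes: int, percent: int = 10, floor_mib: int = 4096) -> int:
    """Return `percent` of the disk size in MiB, but never less than `floor_mib`."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    share_mib = total_bytes * percent // 100 // MIB
    return max(floor_mib, share_mib)


if __name__ == "__main__":
    total = shutil.disk_usage(".").total
    print(f"Suggested steady-state minimum: {suggest_steady_state_mib(total)} MiB")
```

For a 100 GiB disk this suggests 10240 MiB at 10%, while a 20 GiB disk falls back to the 4096 MiB floor.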
Via Configuration File
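- Locate the YaCy configuration file at DATA/SETTINGS/yacy.conf in the installation directory.
- Open the file and find or add the line:
  disk.steady=4096
  The key name can differ between YaCy versions. If disk.steady is not present in your file, look for a key containing steadystate (the defaults shipped with YaCy use resource.disk.free.min.steadystate) and set that one instead.
- Save the file and restart YaCy:
  ./stopYACY.sh
  ./startYACY.sh
  For Docker deployments:
  docker restart yacy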
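For repeatable setups, the file edit can be scripted. The sketch below assumes yacy.conf consists of plain key=value lines and does not handle Java properties escapes or line continuations. `set_property` is a hypothetical helper, and the key disk.steady is taken from the step above, so substitute whichever key your version actually uses. It is safest to run it while YaCy is stopped, so the running instance does not write the file at the same time.

```python
from pathlib import Path


def set_property(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key` set to `value`, replacing or appending the line."""
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave comments and lines without '=' untouched.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def update_conf_file(path: str, key: str, value: str) -> None:
    """Apply set_property to a file on disk (run while YaCy is stopped)."""
    conf = Path(path)
    conf.write_text(set_property(conf.read_text(encoding="utf-8"), key, value), encoding="utf-8")


# Example call, assuming the default layout described in this section:
# update_conf_file("DATA/SETTINGS/yacy.conf", "disk.steady", "4096")
```

Keeping the text transformation separate from file I/O makes it easy to check the result on a copy of the file before touching the real configuration.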
Via Docker Environment Variable
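For Docker users, set the steady-state minimum when launching the container (this assumes the image you run reads the YACY_DISK_STEADY variable; check the image documentation, and fall back to editing yacy.conf if it does not):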
docker run -d -p 8090:8090 -p 8443:8443 -e YACY_DISK_STEADY=4096 --name yacy yacy/yacy_search_server:latest
Stabilizing Jetty for Local Deployment
To prevent Jetty from stopping unexpectedly, address resource constraints alongside the steady-state minimum:
- Monitor Disk Usage:
  - Ensure at least 20GB of free disk space for crawling and indexing.
  - Regularly clear old logs (DATA/LOG) and limit index size in the web interface (http://localhost:8090/IndexControl_p.html).
  - Check disk usage with:
    df -h
- Optimize JVM Memory:
  - Edit the startup script (startYACY.sh or startYACY.bat) to allocate more memory:
    java -Xms512m -Xmx4096m -jar yacy.jar
  - This sets the initial heap to 512MB and the maximum to 4GB, suitable for most local setups.
- Limit Crawler Load:
  - Reduce crawler concurrency and depth in the web interface (http://localhost:8090/Crawler_p.html).
  - Set a maximum index size to control data growth.
- Check for Port Conflicts:
  - Verify ports 8090 and 8443 are free (no output means neither port is in use):
    netstat -tuln | grep -E ':(8090|8443)'
  - If conflicts occur, change the port in yacy.conf (e.g., port=8080).
- Enable Auto-Restart:
  - For Linux, configure YaCy as a systemd service:
    [Unit]
    Description=YaCy Search Engine
    After=network.target
    [Service]
    ExecStart=/path/to/yacy/startYACY.sh
    WorkingDirectory=/path/to/yacy
    Restart=always
    [Install]
    WantedBy=multi-user.target
  - Enable and start:
    systemctl enable yacy
    systemctl start yacy
- Review Logs:
  - Check DATA/LOG/yacy00.log for errors like OutOfMemoryError or Disk full.
  - For Docker:
    docker logs yacy
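Scanning the log by hand gets tedious on a busy node, so the log review step can be automated. This is a minimal sketch: `find_log_errors` is a hypothetical helper, the two search strings are the examples named above, and the exact wording YaCy writes to its log may differ, so adjust the patterns to what you actually see in DATA/LOG/yacy00.log.

```python
from typing import Iterable, List, Sequence, Tuple

# Example patterns from this article; real log wording may differ.
DEFAULT_PATTERNS = ("OutOfMemoryError", "Disk full")


def find_log_errors(
    lines: Iterable[str], patterns: Sequence[str] = DEFAULT_PATTERNS
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing any pattern, ignoring case."""
    wanted = [p.lower() for p in patterns]
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(p in lowered for p in wanted):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path: str = "DATA/LOG/yacy00.log") -> List[Tuple[int, str]]:
    """Scan a log file on disk, tolerating undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_log_errors(handle)


# Example call, assuming the default log location:
# for number, text in scan_log_file():
#     print(f"{number}: {text}")
```

Matching is case-insensitive so that variants such as "disk full" are also caught.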
Preparing for Public Deployment
For public-facing YaCy instances, Jetty remains suitable and does not require replacement. Its lightweight design supports small to medium-scale traffic, and YaCy's typical use case (search and P2P communication) rarely demands high concurrency. However, additional configurations enhance stability and security:
- Nginx Reverse Proxy:
  - Use Nginx to handle public requests, enabling standard ports (80/443) and SSL:
    server {
        listen 443 ssl;
        server_name yacy.example.com;
        ssl_certificate /etc/letsencrypt/live/yacy.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/yacy.example.com/privkey.pem;
        location / {
            proxy_pass http://localhost:8090;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
  - Install an SSL certificate with Certbot:
    sudo certbot --nginx -d yacy.example.com
- Disk and Resource Planning:
  - Use a server with at least 4GB RAM and 100GB disk space to handle public crawling tasks.
  - Maintain the steady-state minimum (e.g., 4096 MiB) to prevent disk exhaustion.
  - Regularly back up DATA/INDEX and DATA/SETTINGS.
- Security:
  - Set an admin password in the web interface (http://localhost:8090/AccessControl_p.html).
  - Configure the firewall so that only ports 80/443 are reachable from the public internet, and keep 8090/8443 accessible only internally (for example, from the reverse proxy on the same host).
- Monitoring:
  - Use tools like Prometheus or df -h to track disk and resource usage.
  - Ensure systemd or Docker auto-restarts YaCy on failure.
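The backup of DATA/INDEX and DATA/SETTINGS recommended above can be scripted with Python's standard `tarfile` module. This is a sketch under stated assumptions: `backup_yacy_data` is a hypothetical helper, the destination directory lies outside the YaCy data tree, and YaCy is stopped (or at least not crawling) while the archive is written, since copying an index that is being modified may yield an inconsistent backup.

```python
import tarfile
import time
from pathlib import Path
from typing import Sequence


def backup_yacy_data(
    yacy_root: str,
    dest_dir: str,
    subdirs: Sequence[str] = ("DATA/INDEX", "DATA/SETTINGS"),
) -> Path:
    """Write a timestamped .tar.gz of the given subdirectories and return its path."""
    root = Path(yacy_root)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"yacy-backup-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for sub in subdirs:
            source = root / sub
            if source.exists():
                # arcname keeps paths inside the archive relative to the YaCy root.
                tar.add(str(source), arcname=sub)
    return archive


# Example call with placeholder paths:
# backup_yacy_data("/path/to/yacy", "/path/to/backups")
```

Subdirectories that do not exist are skipped rather than treated as errors, so the same script works on a fresh installation that has no index yet.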
Conclusion
YaCy's built-in Jetty server is stable for both local and public deployments when properly configured. Setting the steady-state minimum to 4096 MiB prevents disk-related crashes by pausing crawling at low disk space. Combine this with JVM memory optimization, crawler limits, and auto-restart mechanisms to resolve issues like unexpected stops. For public deployment, Jetty requires no replacement; instead, use Nginx as a reverse proxy for SSL and load management. By monitoring logs and resources, users can maintain a robust YaCy instance for decentralized search, whether for private or public use.