
Local Deployment of YaCy with Steady-State Minimum Configuration

YaCy is a decentralized, peer-to-peer (P2P) search engine that allows users to build a distributed search network without relying on centralized servers. YaCy ships with an embedded Jetty web server, so a local deployment needs no additional server software such as Apache or Nginx. However, users may run into problems such as the server stopping unexpectedly, often because of resource constraints. A key setting for preventing this is the steady-state minimum, which pauses crawling when free disk space falls below a specified threshold (e.g., 4096 MiB, i.e. 4 GiB). This article explains how to configure this setting, stabilize Jetty, and prepare YaCy for a possible public deployment.

Understanding YaCy's Jetty Server and Stability

YaCy's built-in Jetty server, a lightweight Java-based web server and servlet container, handles the web interface (accessible at http://localhost:8090), search queries, and P2P communication. Jetty is mature software that also powers projects such as Jenkins and Apache Solr, and it is reliable for local and small-scale public deployments. However, stability problems may arise if system resources (disk, memory, or CPU) are insufficient, particularly during intensive crawling. A typical symptom is YaCy stopping unexpectedly and requiring a restart. Common causes include:

  • Disk Space Exhaustion: Crawling and indexing can generate large amounts of data, filling the disk and causing Jetty to halt.
  • Memory Overload: Java Virtual Machine (JVM) memory shortages may crash the server.
  • Port Conflicts or Network Issues: Misconfigured ports (default: 8090 for HTTP, 8443 for HTTPS) or firewall restrictions can disrupt Jetty.
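When YaCy stops, its log usually indicates which of these causes applies. The following sketch, a hypothetical POSIX shell helper, counts log lines containing the standard Java and operating-system messages for each case: "No space left on device" for a full disk, OutOfMemoryError for heap exhaustion, and BindException or "Address already in use" for a port conflict. The patterns are assumptions about what the log contains, not an official list of YaCy messages, so extend them as needed.

```shell
# Hypothetical helper: count log lines matching the three common causes of an
# unexpected stop. The patterns are standard JVM/OS messages (an assumption
# about the log contents), not an official list of YaCy log messages.
classify_yacy_log() {
    # Reads log text on stdin; prints "disk N", "memory N" and "port N".
    awk '
        /No space left on device/              { disk++ }
        /OutOfMemoryError/                     { memory++ }
        /BindException|Address already in use/ { port++ }
        END {
            printf "disk %d\n", disk
            printf "memory %d\n", memory
            printf "port %d\n", port
        }
    '
}

# Usage, from the YaCy installation directory:
#   classify_yacy_log < DATA/LOG/yacy00.log
```

A non-zero count points to the matching fix: disk space, JVM memory, or port configuration.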

The steady-state minimum setting mitigates disk-related issues by stopping crawling when free disk space drops below a set value, such as 4096 MiB.
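The rule itself is a simple comparison. This hypothetical sketch mirrors the behaviour described above for illustration only (it is not YaCy's own code): it reads the free space of the partition that holds a directory using df and reports whether crawling should pause. The helper names are made up, and df -m is assumed to be available (GNU and BSD df both accept it).

```shell
# Hypothetical sketch of the steady-state rule (not YaCy's implementation):
# pause crawling while free space is below the threshold, both in MiB.

# Free space in MiB on the filesystem that contains directory $1.
# -P forces the one-line-per-filesystem POSIX format; column 4 is "Available".
free_mib() {
    df -Pm "$1" | awk 'NR == 2 { print $4 }'
}

# Prints "pause" when free space ($1) is strictly below the threshold ($2),
# otherwise "crawl".
crawl_decision() {
    if [ "$1" -lt "$2" ]; then
        echo pause
    else
        echo crawl
    fi
}

# Usage, from the YaCy installation directory:
#   crawl_decision "$(free_mib DATA)" 4096
```

At exactly 4096 MiB free the sketch keeps crawling, matching the "falls below" wording.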

Configuring the Steady-State Minimum

The steady-state minimum makes YaCy pause crawling before the disk fills up, protecting Jetty's stability. Here's how to set it to 4096 MiB (4 GiB):

Via the Web Interface

  1. Access the YaCy admin panel at http://localhost:8090.
  2. Navigate to System Administration or Performance Settings (the exact menu may vary by version).
  3. Locate the Steady-State Minimum field, typically under Index Control or Performance.
  4. Set the value to 4096 MiB (or adjust based on your disk capacity, e.g., 10–20% of total space).
  5. Save the changes. YaCy will now pause crawling when free disk space falls below 4096 MiB.
  6. Verify by monitoring the crawler status (http://localhost:8090/Crawler_p.html) or logs (DATA/LOG/yacy00.log).
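To turn the 10–20% rule of thumb from step 4 into a concrete number, the arithmetic can be scripted. A minimal sketch with hypothetical helper names, using integer MiB values; the percentage is whatever you choose within that range:

```shell
# Hypothetical helpers: derive a steady-state minimum as a percentage of the
# partition size. All values are integers in MiB.

# Total size in MiB of the filesystem that contains directory $1.
total_mib() {
    df -Pm "$1" | awk 'NR == 2 { print $2 }'
}

# suggest_steady_mib TOTAL_MIB PERCENT -> threshold in MiB, rounded down.
suggest_steady_mib() {
    echo $(( $1 * $2 / 100 ))
}

# Usage, from the YaCy installation directory:
#   suggest_steady_mib "$(total_mib DATA)" 10
```

For example, a 500000 MiB partition at 10% gives a threshold of 50000 MiB.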

Via Configuration File

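  1. Stop YaCy before editing, because YaCy writes its settings back to the configuration file and can overwrite changes made while it is running:
    ./stopYACY.sh
    
  2. Open DATA/SETTINGS/yacy.conf in the installation directory and find or add the line below. Key names can differ between releases, so confirm which entry corresponds to the steady-state minimum in the defaults/yacy.init file shipped with your version:
    disk.steady=4096
    
  3. Save the file and start YaCy again:
    ./startYACY.sh
    
    For Docker deployments, stop the container before editing and start it again afterwards:
    docker stop yacy
    docker start yacy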
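If you script this step, for example across several peers, the edit should replace an existing entry rather than append a duplicate. A minimal sketch, assuming the plain key=value format of yacy.conf and the disk.steady key used in this article (verify the key for your version); set_conf_value is a hypothetical helper name:

```shell
# Hypothetical helper: set KEY=VALUE in a key=value file such as
# DATA/SETTINGS/yacy.conf, replacing the first existing entry (and dropping
# duplicates) or appending a new one. Run it only while YaCy is stopped.
set_conf_value() {
    file=$1 key=$2 value=$3
    [ -f "$file" ] || return 1
    tmp="$file.tmp.$$"
    # The key is compared literally (text before the first "=", trimmed),
    # so the dots in names like disk.steady are not regex wildcards.
    awk -v k="$key" -v v="$value" '
        BEGIN { done = 0 }
        {
            name = $0
            sub(/=.*/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) {
                if (!done) { print k "=" v; done = 1 }
                next
            }
            print
        }
        END { if (!done) print k "=" v }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage:
#   ./stopYACY.sh
#   set_conf_value DATA/SETTINGS/yacy.conf disk.steady 4096
#   ./startYACY.sh
```

Commented-out lines such as #disk.steady=... are left untouched because their name does not match exactly.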
    

Via Docker Environment Variable

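For Docker users, the steady-state minimum can be set when launching the container, provided your image version maps environment variables onto configuration keys (check the image documentation; the variable name must correspond to the key your version uses):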

docker run -d -p 8090:8090 -p 8443:8443 -e YACY_DISK_STEADY=4096 --name yacy yacy/yacy_search_server:latest

Stabilizing Jetty for Local Deployment

To prevent Jetty from stopping unexpectedly, address resource constraints alongside the steady-state minimum:

  1. Monitor Disk Usage:

    • Ensure at least 20GB of free disk space for crawling and indexing.
    • Regularly clear old logs (DATA/LOG) and limit index size in the web interface (http://localhost:8090/IndexControl_p.html).
    • Check disk usage with:
      df -h
      
  2. Optimize JVM Memory:

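    • Raise the Java heap limits. The standard startup scripts (startYACY.sh or startYACY.bat) do not hard-code the heap size; they read it from the javastart_Xms and javastart_Xmx entries in DATA/SETTINGS/yacy.conf, so change those entries while YaCy is stopped:
      javastart_Xms=Xms512m
      javastart_Xmx=Xmx4096m
      
    • The scripts pass these to the JVM as -Xms512m and -Xmx4096m, which set the initial heap to 512 MiB and the maximum to 4 GiB, suitable for most local setups.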
  3. Limit Crawler Load:

    • Reduce crawler concurrency and depth in the web interface (http://localhost:8090/Crawler_p.html).
    • Set a maximum index size to control data growth.
  4. Check for Port Conflicts:

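    • Verify that ports 8090 and 8443 are free (on systems without netstat, ss -tuln gives equivalent output):
      netstat -tuln | grep -E ':(8090|8443)[[:space:]]'
      
    • If a conflict occurs, stop YaCy and change the port in yacy.conf (e.g., port=8080).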
  5. Enable Auto-Restart:

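    • For Linux, configure YaCy as a systemd service, for example in /etc/systemd/system/yacy.service:
      [Unit]
      Description=YaCy Search Engine
      After=network.target

      [Service]
      # startYACY.sh normally detaches into the background, which systemd would
      # treat as the service exiting; -f keeps it in the foreground (check the
      # options listed in startYACY.sh for your version).
      ExecStart=/path/to/yacy/startYACY.sh -f
      WorkingDirectory=/path/to/yacy
      # Unprivileged account that owns /path/to/yacy (assumed to exist).
      User=yacy
      Restart=always

      [Install]
      WantedBy=multi-user.target
      
    • Reload systemd, then enable and start the service:
      sudo systemctl daemon-reload
      sudo systemctl enable yacy
      sudo systemctl start yacy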
      
  6. Review Logs:

    • Check DATA/LOG/yacy00.log for errors such as java.lang.OutOfMemoryError or "No space left on device".
    • For Docker:
      docker logs yacy
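A plain grep for 8090 also matches ports such as 18090. The sketch below, a hypothetical helper, checks for an exact port match in the output of ss -tln or netstat -tln, both of which print the local address as host:port (for example 0.0.0.0:8090 or [::]:8090):

```shell
# Hypothetical helper: succeed (exit 0) if the listener list on stdin
# contains a field ending in exactly ":PORT", fail (exit 1) otherwise.
port_in_use() {
    awk -v p="$1" '
        { for (i = 1; i <= NF; i++) if ($i ~ (":" p "$")) found = 1 }
        END { exit !found }
    '
}

# Usage:
#   for port in 8090 8443; do
#       if ss -tln | port_in_use "$port"; then echo "port $port is taken"; fi
#   done
```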
      

Preparing for Public Deployment

For public-facing YaCy instances, Jetty remains suitable and does not require replacement. Its lightweight design supports small to medium-scale traffic, and YaCy's typical use case (search and P2P communication) rarely demands high concurrency. However, additional configurations enhance stability and security:

  1. Nginx Reverse Proxy:

    • Use Nginx to handle public requests on the standard ports (80/443) with SSL. Obtain the certificate with Certbot first, because Nginx refuses to load a configuration that references certificate files that do not exist yet:
      sudo certbot certonly --nginx -d yacy.example.com
      
    • Then add a server block that terminates SSL and forwards requests to YaCy:
      server {
          listen 443 ssl;
          server_name yacy.example.com;
          ssl_certificate /etc/letsencrypt/live/yacy.example.com/fullchain.pem;
          ssl_certificate_key /etc/letsencrypt/live/yacy.example.com/privkey.pem;
          location / {
              proxy_pass http://localhost:8090;
              proxy_set_header Host $host;
              proxy_set_header X-Real-IP $remote_addr;
              proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
              proxy_set_header X-Forwarded-Proto $scheme;
          }
      }
      
    • Check the configuration and reload Nginx:
      sudo nginx -t && sudo systemctl reload nginx
      
  2. Disk and Resource Planning:

    • Use a server with at least 4GB RAM and 100GB disk space to handle public crawling tasks.
    • Maintain the steady-state minimum (e.g., 4096 MiB) to prevent disk exhaustion.
    • Regularly back up DATA/INDEX and DATA/SETTINGS.
  3. Security:

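    • Set an admin password in the web interface (http://localhost:8090/AccessControl_p.html; the page may be named differently in your version). Because Nginx forwards every public request from localhost, also make sure that local connections are not granted admin access without the password.
    • Restrict firewall access to ports 80/443 (public) and 8090/8443 (internal). Note that other YaCy peers reach your node directly on port 8090; if it is blocked, the peer can still search and crawl but does not receive index transfers from the network.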
  4. Monitoring:

    • Use tools like Prometheus or df -h to track disk and resource usage.
    • Ensure systemd or Docker auto-restarts YaCy on failure (for Docker, start the container with a restart policy such as --restart unless-stopped).
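The backup step can be automated. A minimal sketch, assuming the DATA/INDEX and DATA/SETTINGS layout described above and that YaCy is stopped while the archive is written (copying an index that is being modified can produce an inconsistent backup); backup_yacy and the destination path are hypothetical:

```shell
# Hypothetical helper: archive DATA/INDEX and DATA/SETTINGS of a YaCy
# installation into a date-stamped tarball and print the archive path.
# Assumes YaCy is stopped while the archive is written.
backup_yacy() {
    yacy_dir=$1 dest_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/yacy-backup-$stamp.tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C makes the paths inside the archive relative to the installation.
    tar -czf "$archive" -C "$yacy_dir" DATA/INDEX DATA/SETTINGS || return 1
    echo "$archive"
}

# Usage with the systemd service described earlier:
#   sudo systemctl stop yacy
#   backup_yacy /path/to/yacy /var/backups/yacy
#   sudo systemctl start yacy
```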

Conclusion

YaCy's built-in Jetty server is stable for both local and public deployments when properly configured. Setting the steady-state minimum to 4096 MiB prevents disk-related crashes by pausing crawling at low disk space. Combine this with JVM memory optimization, crawler limits, and auto-restart mechanisms to resolve issues like unexpected stops. For public deployment, Jetty requires no replacement; instead, use Nginx as a reverse proxy for SSL and load management. By monitoring logs and resources, users can maintain a robust YaCy instance for decentralized search, whether for private or public use.
