Create your own search engine with YaCy

    By Virginia D. Bannon

    May 25, 2022

    Traditional search engines like Google are pretty good at what they do, but many people choose not to use them because of privacy issues. Then there are those who worry that content will fall through the cracks simply because the creator hasn’t followed best practices for search engine optimization (SEO).

    YaCy, an open-source distributed search engine, works much like its traditional peers but suffers none of their ills. YaCy uses a peer-to-peer (P2P) network, so every user running an instance of the search engine joins in the effort of indexing the Internet. The index is distributed and stored redundantly among all YaCy users.

    To further strengthen its privacy credentials, YaCy ensures that no one can tell who searched for which words, making all searches functionally anonymous.

    YaCy only indexes publicly accessible, non-password-protected pages. You can also use it as a search engine for your website or use it to index pages on your intranet, which it guarantees cannot be accessed by anyone outside your network.

    Installation

    YaCy is written in Java and runs on Windows, macOS and Linux. Search engines are complex beasts, but thanks to the distributed nature of YaCy, you don’t need a fast machine or a lot of disk space to run a YaCy client.

    The installation is quite simple. Before you begin, make sure Java is installed on the machine. Windows and macOS users can get pre-built binaries from Adoptium, while Linux users can install it from their distribution’s official repositories.

    For example, Debian users can use sudo apt install default-jdk, while Fedora users can search for available versions with sudo dnf search openjdk before installing the latest version with sudo dnf install followed by the package name.
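
    Before moving on, it can help to confirm that a Java runtime is actually on your PATH. The following Python sketch is one way to do that. The helper names are made up for illustration, and the parsing assumes the usual version "..." line that java -version prints to standard error; it only reports what it finds and makes no claim about which Java version YaCy needs.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy releases report themselves as 1.x (for example, 1.8 means Java 8).
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Return the major version of the `java` on PATH, or None if it is missing."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its report to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("No usable Java runtime found on PATH")
    else:
        print(f"Found Java {major}")
```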

    Once Java is installed, download the YaCy release for your platform and extract it. For example, the command sudo tar --extract --file yacy_*z --directory /opt -v extracts the archive under the /opt directory on Linux. Now just change to the extracted directory and start YaCy:

    # cd /opt/yacy

    # ./startYACY.sh

    YaCy is now running on port 8090 on your computer. Launch a web browser and head to http://localhost:8090 to access the YaCy instance. You can now search the Internet as you would with a traditional search engine.
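
    Because the instance is just an HTTP server on port 8090, you can also query it from a script instead of a browser. The sketch below is a minimal example under a few assumptions: that your YaCy version exposes its JSON search interface at /yacysearch.json, that it accepts the query and maximumRecords parameters, and that results arrive as a list of channels each holding a list of items. The function names are hypothetical; check the endpoint and field names against your own installation.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(query, host="localhost", port=8090, max_records=10):
    """Build the search URL (endpoint and parameter names are assumptions)."""
    params = urllib.parse.urlencode(
        {"query": query, "maximumRecords": max_records}
    )
    return f"http://{host}:{port}/yacysearch.json?{params}"


def extract_results(payload):
    """Collect (title, link) pairs from the assumed channels/items layout."""
    results = []
    for channel in payload.get("channels", []):
        for item in channel.get("items", []):
            results.append((item.get("title", ""), item.get("link", "")))
    return results


def search(query, **kwargs):
    """Query a running YaCy instance and return a list of (title, link) pairs."""
    url = build_search_url(query, **kwargs)
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_results(json.load(response))


if __name__ == "__main__":
    # Requires a YaCy instance listening on localhost:8090.
    for title, link in search("open source search"):
        print(f"{title}\n  {link}")
```

    Keeping the URL building and the result parsing in separate functions means both can be checked without a running instance.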

    Crawl the Internet

    You can do much more with the YaCy search engine than just search passively. For example, since P2P indexing is user-driven, you can ask YaCy to crawl any website.

    To access advanced administrative controls for your search engine, click the Administration button in the upper right corner. This brings up the administration panel, which among other things allows you to modify how your YaCy instance interacts with other YaCy clients on the network.

    To start a manual web crawl, go to the Load web pages, Crawler option under the First steps menu. Enter the URL in the space provided and press Start a new exploration. Once the crawler starts, it displays all kinds of crawling statistics, and you can scroll down to view a graph of the structure of the crawled website.

    After initiating the crawl, head to Monitoring > Index Browser to see how many pages have been indexed and view other details, such as their name and number of outbound links.

    For now, you can stick with the default options and explore the others, such as the crawler limits, once you get familiar with YaCy. The search engine can run multiple crawls at the same time: you can either launch them one after another from the First steps section, or head to Production > Advanced Crawler to crawl multiple websites at the same time.

    Once the crawl job is started, YaCy indexes the URLs you enter and stores the index on your local computer. To ensure that your index is available to YaCy users all over the world, you will need to join the YaCy P2P network.

    For this, you need to open port 8090 in your router’s firewall. Log into your router’s admin page and look for the section that controls the firewall or port forwarding.

    Once you find your router’s firewall preferences, whitelist port 8090. If your router performs port forwarding, you must redirect incoming traffic to your computer’s IP address, using the same port.
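
    After changing the router settings, a quick way to see whether anything answers on the port is to attempt a TCP connection to it. The Python sketch below does just that; the function name is hypothetical, and a successful connection only shows that something accepts connections on that port, not that it is YaCy. Note that testing your public IP address from inside your own network may fail on some routers even when the forwarding works, so a check from an outside machine is more reliable.

```python
import socket
import sys


def port_is_open(host, port=8090, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Pass the address to test as the first argument; defaults to this machine.
    target = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    state = "reachable" if port_is_open(target) else "not reachable"
    print(f"Port 8090 on {target} is {state}")
```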

    After joining the YaCy network, you can switch on the Do a remote indexing option under the Advanced crawler. This allows your client to share the URLs it wants indexed, so that other clients on the network that have chosen to accept such requests can help you perform the crawl.

    Your own Google

    Instead of searching the web, you can use YaCy to search your own data or implement a search system for local file shares in your corporate intranet.

    For this you will need to run YaCy as an internal indexer. In this mode, only people on your local network can use your custom instance of YaCy to search for shared files, and none of the data is shared with users outside your network.

    Head to Administration > Getting Started > Use Cases & Account. Here you can specify basic details such as YaCy’s interface language.

    You will also be able to modify the behavior of your YaCy instance from here. The default option is to use your client as part of YaCy’s global P2P network to help you crawl and index the web.

    To create a search portal for your own website, select the Search portal for your own web pages option, then scroll down and click the Set configuration button. Next, crawl your domain to generate the content that will be available through your search tool.

    To embed search into your website, scroll down the left column to the Search Portal Integration section. This takes you to the Portal Setup page, where you can customize the appearance of YaCy with your branding to integrate it into your website. When you are finished, press the Edit search page button. You can now use any of the generated iframe snippets to embed custom search powered by YaCy into your website.

    Similarly, to use YaCy to index the local network, you will need to select the third option in the First steps section. You can then use the Advanced crawler to crawl your intranet.

    Conclusion

    There’s so much more you can do with YaCy. The project’s documentation does not cover all the features of the search engine. However, the interface is quite intuitive and detailed enough to guide you to the right option.

    All things considered, YaCy is one of the best options for anyone who wants an anonymous, unbiased, ad-free and privacy-friendly web search engine that can also help users find content on your website or privately on your intranet.
