Distributed Search Engine

This page briefly describes a project that aims to build a decentralized web search engine. Below we summarize the goals of the project and the technology that will be used to achieve them. A fuller description is available in the white paper, which is not yet final and will be updated continuously as the project evolves. We have also prepared a spoken presentation of this material, available via the link on this page. Further information about the structure and operations of our company is available via the "About" link in the menu. We are currently conducting a crowdfunding round; details are available via the "Ico" link.

A search engine is a key element in the organization of publicly available knowledge. It plays a central role in how individuals and groups interact with the world's repository of knowledge, a fraction of which is available via the World Wide Web. Millions of people spend a significant part of their working day searching for information on the web and analyzing what they retrieve. This is especially true of people in technical disciplines, whose work involves analyzing and producing new technological knowledge. Examples include engineers, pharmacologists, biologists, data analysts, and especially workers associated with academic institutions, who enjoy wider access to free information sources.

Currently available search engines do an adequate job of retrieving web information. This is especially true of relatively simple kinds of information, for example the location of a business or the web site of a specific company. It is much less clear how well they perform when it comes to scientific knowledge and deep inference. Unfortunately, the search engines provide no metric by which the quality of their results could be judged. Worse still, they keep their ranking algorithms secret. The usual argument for this secrecy invokes the large community of professional programmers who are employed to design sites in a way that manipulates the rank. We think this argument is misleading. In our opinion, it is possible to design open source algorithms that rank pages according to the amount and quality of the information they contain, and that rank low the pages that do not contain the information being sought. Implementing such algorithms is one of our goals. We will develop software suites that allow researchers and individuals to propose and test their own algorithms, both in restricted domains and globally.
We will also develop software tools to reward the developers of the best algorithms monetarily, by transferring to them funds earned from the advertisement service.

It would not be an exaggeration to say that a person reading a book, article, or other text is interacting with the world knowledge system, a kind of "knowledge graph", as this structure has come to be known. In doing so, the reader permanently alters their own knowledge graph, adding new elements to it and reweighting existing ones. Web search is heavily involved in this process: it is a constant dialogue that a person conducts with the search engine, asking more and more questions. This dialogue is a very significant source of information. In a sense, it is an imprint of the personality of the person who conducted it, a kind of digital personality. Nowadays this data is misused: it is used solely to feed advertisements to the person. This situation can be changed. We will develop software tools that allow people to collect this information, store it, analyze it, and perhaps let it lead its own "digital life" on the web, interacting with existing web pages and with other personality imprints. We will also allow them to trade it. This information is subject to privacy constraints, and therefore differential privacy algorithms must be developed to use it efficiently. We will create a market for this kind of data. Such data is especially important for large technological corporations, data tanks, that conduct detailed technological research using the minds and time of their workers. We will develop tools to collect and organize the data streams that workers produce in their interaction with public information sources. We will also develop tools for people to construct and manage their "web reflection", the data structure mentioned above that results from their information consumption.
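To make the reference to differential privacy concrete, here is a minimal sketch of the standard Laplace mechanism applied to a single count, for example the number of searches a person made on some topic. It is only an illustration of the general technique, not the algorithm the project will use. The function name `noisy_count`, its parameters, and the choice of a count query are assumptions made for this example.

```python
import random
from typing import Optional


def noisy_count(true_count: int,
                epsilon: float,
                sensitivity: float = 1.0,
                rng: Optional[random.Random] = None) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    Hypothetical helper: a smaller epsilon means more noise and more privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two independent exponential samples with mean `scale`
    # is a Laplace(0, scale) sample.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    # Example: publish how many queries matched a topic, with epsilon = 0.5.
    print(noisy_count(42, epsilon=0.5))
```

The sensitivity of 1 assumes that adding or removing one query changes the count by at most one. A real deployment would also need to track the total privacy budget spent across releases.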

A few years ago the so-called decentralization revolution, or blockchain revolution, took place. It was triggered by the creation of Bitcoin and stemmed from the realization that the operations of many commercial companies follow a natural order that is conveniently captured by a ledger or a blockchain. Remarkable software tools were created, for example Hyperledger Fabric. However, progress in decentralizing existing organizations has been slow. One of the first areas where such decentralization can be attempted is a business that is concerned entirely with the web, such as web search. We propose a way to decentralize this business. The core of the web search business is the delivery of advertisements based on the keywords the end user searches for. The ads are delivered for monetary compensation, which goes to the company that maintains the search engine. We observe that this system can be decentralized in the following way. The operations involved in web analysis, information collection, organization, and ranking can be performed by a network of computers rather than by a data center, as is currently done. Upon a query, this network can return the ranked pages to the user. Accordingly, the advertisements can be served by the computers in this network. We propose an architecture and a set of algorithms that maintain the interaction of this network with the advertisers and distribute the revenue collected for delivering the ads to the addresses of the machines directly involved in the information analysis needed to serve the search results for the query. In this way, the advertisement revenue goes to the maintainers of the network rather than to a centralized entity. We show that it is possible to maintain the privacy of both advertisers and searchers in this system. We think that such a system will benefit the development of the web and will make it more resilient, flexible, and rich.
We think that the process we propose for individuals who wish to participate in maintaining this network is closely analogous to cryptocurrency mining. Instead of solving cryptographic puzzles, the nodes in the network participate in the analysis, ranking, and organization of knowledge (web text). We may therefore call this process "knowledge mining".
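The revenue distribution described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each node's contribution to a query is recorded as a non-negative integer number of "work units" and that the ad revenue is split in proportion to that work. The function name `split_ad_revenue`, the work-unit measure, and the proportional rule are assumptions of this example; the actual scheme is the one described in the white paper.

```python
from typing import Dict


def split_ad_revenue(revenue_units: int,
                     work_by_node: Dict[str, int]) -> Dict[str, int]:
    """Split revenue (in smallest currency units) in proportion to work.

    Hypothetical helper: `work_by_node` maps a node address to the work
    units it contributed to serving one query.
    """
    if revenue_units < 0:
        raise ValueError("revenue must be non-negative")
    if any(w < 0 for w in work_by_node.values()):
        raise ValueError("work units must be non-negative")
    total_work = sum(work_by_node.values())
    if total_work == 0:
        raise ValueError("no work recorded for this query")

    # Integer floor share for every node, so no fractional units are created.
    payouts = {node: revenue_units * w // total_work
               for node, w in work_by_node.items()}

    # Hand out the few leftover units by largest remainder; ties are broken
    # by node address so that every participant computes the same result.
    leftover = revenue_units - sum(payouts.values())
    order = sorted(
        work_by_node,
        key=lambda n: (-(revenue_units * work_by_node[n] % total_work), n),
    )
    for node in order[:leftover]:
        payouts[node] += 1
    return payouts


if __name__ == "__main__":
    print(split_ad_revenue(1000, {"node-a": 3, "node-b": 1}))
```

Integer arithmetic is used so that the payouts always sum exactly to the collected revenue, which matters if the split is recorded on a ledger.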

The architecture that we propose is centered around a network of nodes, that is, servers that run a specialized software stack. We will make this software stack free and publicly available. Building it is the core of the work we have to perform to launch this project. The stack will contain components that cover the operations needed for web analysis, namely: crawling the pages, storing and parsing them, computing the rank, and constructing the index, as well as certain additional operations that are necessary when operating on a network rather than in a data center. Remarkably, such a system can be made resilient to attacks. We can be sure that it will be subject to every possible attack by hackers and other organizations, and that every possibility to ruin or manipulate its operation will be explored. Nonetheless, there is a way to ensure that the rank computed by a pre-arranged algorithm is the one actually delivered to the end user. We specify this mechanism in some detail in the white paper. It is a kind of consensus mechanism based on preventing collusion between nodes. We should mention that at the moment we are implementing the simplest version of this construction, which is not completely decentralized. It involves external elements of trust, randomness servers, as we call them, which ensure that there is no collusion between nodes. The system will work if the majority of the nodes in a given locality are honest. This architecture takes its place in the long line of consensus mechanisms that have already been proposed and are currently being explored by the cryptocurrency community.
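The actual mechanism is specified in the white paper. The following is only a minimal sketch of the general idea of combining a randomness server with an honest-majority assumption. It assumes that the randomness server publishes a seed, that the seed determines which nodes of a locality handle a query (so nodes cannot choose their partners in advance), and that a result is accepted only if a strict majority of the selected nodes report the same value. The function names, the SHA-256 based selection, and the strict-majority rule are all assumptions of this example.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional


def select_nodes(seed: bytes, node_ids: List[str], k: int) -> List[str]:
    """Pick k nodes by ordering them on SHA-256(seed || node_id).

    Hypothetical helper: `seed` stands for the value published by a
    randomness server; the same seed always yields the same selection.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    def score(node_id: str) -> bytes:
        return hashlib.sha256(seed + node_id.encode("utf-8")).digest()

    return sorted(node_ids, key=score)[:k]


def majority_result(reports: Dict[str, str]) -> Optional[str]:
    """Return the value reported by a strict majority of nodes, else None.

    `reports` maps a node id to the digest of the ranking it computed.
    """
    if not reports:
        return None
    value, count = Counter(reports.values()).most_common(1)[0]
    return value if 2 * count > len(reports) else None


if __name__ == "__main__":
    chosen = select_nodes(b"seed-from-randomness-server",
                          ["n1", "n2", "n3", "n4", "n5"], k=3)
    # In this toy run two nodes agree and one reports a different digest.
    reports = {chosen[0]: "rank-A", chosen[1]: "rank-A", chosen[2]: "rank-B"}
    print(chosen, majority_result(reports))
```

Under these assumptions, a dishonest minority among the selected nodes cannot change the accepted result; at worst it can prevent a majority from forming, in which case `majority_result` returns `None` and the query would have to be retried.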