Docs

Financial Component

The financial component is described in more detail in our whitepaper. Here we describe specs of nodes. The nodes are organized into a B-tree. This B-tree has growth rules to scale operations

Node types

Node spec may differ by the position of node in the B-tree. At the root the sepcs are minimal.

B-tree structure

In order to scale transactions, the compute nodes are organized in a B-tree. The vertices closer to root may have higher gas price. Each vertex in the B-tree is served by a pool of nodes. The vertices are addressed by the tree address
[a]=[a_0:a_1:...:a_l]
where a_i is a 256 hex string. They maintain a blockchain of transactions issued by the clients of this vertex.
For clarity, we will use the phrase "the vertex B_a issues a TX_X". This phrase means the following:
  1. nodes N_a_i serving the vertex B_a reached a consensus on this TX_X and recorded it on its BC. This record includes signatures of nodes N_a_i
  2. the nodes of the vertex communicate TX_X to nodes N_p_j, that maintain the parent vertex B_p of the vertex B_a
  3. the communication happens according to the protocol specified by parent vertex B_p for the specific child vertex B_a

Vertex manifest

The vertex manifest specifies
  1. Node spec
  2. The minimal number of nodes to serve this vertex. This is security guarantee , the decentralization degree of this particulr vertex
  3. The protocol by which nodes can join the vertex. The vertex can , for example, be "private", by allowing only specified "trusted" nodes.
  4. Load calculation algorithm
  5. Maximal load
  6. Branching rules
  7. Consensus algorithm. The default consensus is Avalanche, with the modification descibed in whitepaper
  8. Gas calculation algortihm and price

Growth rules

The default growth rule, followed e.g. by the root vertex, is
[Average # of TXs per second] > N0
[Average Node Load] > L0
Each vertex maintains its own blockchain from which [Average # of transactions] is calculated. The node load comes from Data Availability Check transactions described below. These transactions are recorded on chain and therefore this load can be unabmiguosly calculated. If this condition is met, the Vertex opens possibility for nodes to create a subvertex. This contract includes
  1. Schedule of financial reports
  2. Audit rules
In the simplest case, [Financial report] is the following structure
[Merkle root of financial transactions in a given time period]
[Merkle root of state of entities within the vertex]
[Transfer matrix of funds between vertices of the same level]
Time period in the above is set by the parent vertex.

Structure of financial transactions

The financial transactions are tied with the B-tree structure. The transfer transaction between two clients with addresses A=[a_0:...:a_k:a_k+1:...a_n] and B=[a_0:...:a_k:b_k+1:...:b_m], a_k+1 != b_k+1 can be settled by the vertex B_[a_:...;a_k] and do not have to be propagated further. The transfer of funds happens in the following stages:
  1. the transaction propagates up to the nearest vertex to which both of the accounts are common children. This "propagation" consists in transferiing funds to the vertex account, as registered with its parent. This chain of transfers may look like [a_0:...a_n] -> [a_0:....:a_n-1]-> ... -> [a_0:...a_k]
  2. once the funds are available for B_[a_0:...a_k], a downward chain of transactions is initiated [a_0:....a_k] -> [a_0:...a_k:b_k+1] -> ... -> [a_0:...:a_k:b_k+1:...:b_n]
At each step, the parent bank makes the transaction available on its blockchain. This serves as a trigger for the childe vertex that it can initiate the transaction. To avoid frequent update at the low height of the tree ( near to the root), the transactions are batched as follows. The transfers between vertices B_a are settled in gross , i.e., the sums of money that were requested to be transferred by clients are summed.
It is up to the vertex to specify the schedule of the transactions. For example, there may be "fast vertex", such that transactions within that vertex are settled very fast. This is necessary to incorporate millisecond trading.
The financial transaction structure is based on fractional reserve model. Each vertex corresponds to a bank. Its subvertices are its clients. Vertex can perform transfer transactions between its clients accounts locally. To perform transfer to accounts of clients of other vertex of the same or higher level, the vertex must submit a transaction to its parent vertex.
[Transfer(amount, acount_id)]
This transfer happens from the account of the vertex itself. If a client wishes to transfer funds to [account_id], then it 1. transferes funds to vertex account 2. vertex reports the transaction to its parent vertex, which then trsfers the funds to the [account_id] , if it is one of its clients, or issues a transaction to its parent , if [account_id] belongs to more distant part of the B-tree.

Clinets

Client joins the network by creating account with a vertex B_a, a= [a_0:...:a_n]. Thus, client has the address client_id = [a_0:...:a_n:c_id].
The vertex B_a does the following calculation for the clinet:
  1. Maitains Merkle root of its execution state ( see blow)
  2. Performs Financial transactions

Execution state

Apart from vertex nodes, there are Execution nodes in this system, see below.

Subvertex

Subvertex of a vertex reports

Execution and Storage nodes

One of the goals of this project is to maintain parallelized execution of heavy asynchronous computations on large data structures. This computations will be performed by nodes distinct from the nodes that maintain the state and financial transactions. There are two types of these nodes:
  1. Code and metadata storage nodes
  2. Compute nodes
The metadata storage nodes store plain code ( perhaps encrypted). The compute nodes (a-nodes in the following) can contain large data structures on which massively parallel jobs may be executed in a prolonged period of time.

Execution contracts

These contracts are specified with the b-nodes and must include
  1. Code call data : address of the plain text code and compiler version
  2. Pricing model.
  3. Timing information
  4. Variable locking and data update model. Specification of concurrency and variable update rule.
  5. Parallelization degree: the number of a-nodes that must argee on transaction outcome before state change is commited
  6. Merkle root update rule for the client state
  7. Randomness source: in the case the computaion involves some randomness, its source must be specified
  8. Payment schedule: in the case the computation is very large, the a-nodes may need continuous payment, even before reaching consensus on the result of the computation
  9. Consensus model

Consensus model for state update

It may be required by the client that the result of the comutation must be checked by b-nodes. This is part of the consensus model for state update. Light version of it is just checking Merlke root of the state. Then the b-nodes cross-check the Merkle roots of the state change of the client. More detailed check may be rquired by the client.

Data Availability

As part of the code that the client submits for execution, it must specify serialization rule to convert its state into a byte array. This is the rule that the a-nodes will be using to store the state. Then b-node may be required by the client to perform periodic random checks of data. This may take the form of the following contact that a-nodes must sign for:
SubmitDataCheckChallenge(client_id,[i:j]) // callable by b-nodes
SubmitResponce(challenge_id) // callable by a-nodes

A-nodes must respond with byte array [i:j] that corresponds to the serialized state, otherwise they will be penalized for failing Data Availability challeneg.

Transaction model

Each client has a corresponding code base that it submitted to the execution nodes ( and the merkle root of which is stored on the vertex). By default, all these functions are public: anybody can call them. It is the responsibility of the programmer to restrict the ids of other entities that can call these functions. Each transaction has the structure
TX=[Code_id,sig,Execution contract, [affected entities]]
where [affected entities]=[c1,...cn] is the list of entities whose functions may be called by Code_id. In the Code_id, there may be function calls to libraries or other eentities that are already deployed on the system. The transaction must specify the complete list of these entities. A-node must reach consensus with all the a-nodes that serve tbe entities [c1,...,cn]. If [A_i,1,...,A_i,k] is the list of a-nodes that registered with the vertex to serve the entity E_i, then if [E_1,...,E_s] is the complete list of the affected entities in the transaction , then the whole pool [A_i,k], i=1...s of a-nodes must reach the consensus on state change , before the transaction can be accepted by the vertex nodes. Note that this structure is potentially recursive: i.e., Code_id calls function form Code_id1, Code_id1 calls function from Code_id2 etc. If the transaction grows very large, the pool of nodes that must confirm the result can also grow very large. To discourage writing such transactions, vertices may implement pricing model in which the price depends exponentially on the number of involved entities.

Search Engine implementation

In this section, detail how a serach engine can be implemented in the DEE described above.

Namespace

We split the URI namespace into segments S_a. Each segment is split split into buckets B_a_i. The size of the segment is chosen such that R-nodes can serve one segment. Size of bucket is chosen such that a bucket fits C-nodes. For example, the total data of a bucket must not exceed ~10GB. Buckets and segments are "Entities" of DEE.

Entity structure

We focus here only on particular implementation geared for fog computing. In this model , there a-nodes of 2 types: C-nodes and R-nodes. C-nodes are ephemeral user machines, like laptops, with little computational power. They can go offline at any time. These nodes do the following tasks:
  1. Crawl pages
  2. Create local index
  3. Compute rank
The schedule of the crawl will be orchestrated by the R-nodes. R-nodes will also be responsible for transferring data necessary for rank computation.
R-nodes maintain the following data structures
  1. URI list in a bucket, for each C-node that crawls the bucket
  2. List of URIs that were confirmed by "enough" C-nodes
  3. Merkle roots of ranks of pages in each bucket, for each Rank_id and each C-node
R-nodes perform the following operatins
  1. allow C-nodes to register with them for crawl tasks
  2. the network of R-nodes decides in assignment of buckets which C-nodes will be responsible for. This is the place where coverage algorithm is implemented: the R-nodes maintain lists of C-nodes that cover particular bcket, and when new C-npde joins, it is assigned to a bucket which has the lowest number of healthy C-nodes that cover it.
  3. Maintains the blockchain of search transactions issued to C-nodes that are registered with it

Return Tree

Return tree is a network of R-nodes that are charged with the task of merging the sorted arrays of URIs that are returned in responce to a serch transaction. Each of the R-nodes in the specified Return Tree can be its root. This root serves as an entry point to the search engine. This is where the users can submit query.

Cluster

Cluster is a basic revenue generating entity of the Distributed search engine. It consists of a set of R-nodes R_s_i, each serving segment S_s. There are multiple R-nodes that serve the same segment. When C-node returns crawl resuslts for bucket [B_a \in S_s], it returns it to the subnetwork [R_s_i]. The R-nodes can be malicious, and a consensus must be reached among them. It is the responsibility of the C-node to make sure its results are propagated among the subnetwork [R_s_i].

Buckets

Buckets are entities that maitain the following info:
  1. URI list
  2. Ranks for each Rank_id
Buckets serve api calls
  1. generate random array of URIs for crawl
  2. submit ordered list of URIs

Crawl operations

Crawl operation is orchestrated by the network of R-nodes. For each bucket they store the list of URIs that were reported to be in this bucket by C-nodes. Then , on request from C-node, they provide lists of URIs to be crawled by the C-node, and record these lists into a Crawl History Ledger for each c-node. The rule by which lists from different c-nodes are merged may vary and is under experimental study at the moment. The crwal is considered complete when %p of c-nodes agree to precision \epsioln on the content of the bucket. ( Details here on the metric on the space of the texts within the bucket).

News crawl

R-nodes must provide update rule for page recrawl and for maintaining fresh page information.

Search Transaction

This transacton can be initiated by a user on each of the R-ndoes participating in an Return Tree. This transaction has the following struncture
  1. User initiates TX: [Query,R-node_id,Rank_id,sig]
  2. For each vertex in the Index Tree [Query_id,list of R-nodes and their index segments I_a] where the query was submitted
  3. C-nodes submit Merkle root of the sorted list ot URIs that it submits in responce [Query_id,Merkle(ranked URIs)]
  4. for each vertex in the RTree, by each R-node that serves this vertex, the following data is submitted to the storage nodes [Query_id,Merkle( C-node or R-node merged sorted list of URIs)]
  5. user finalization. If user is identified, she may choose to consume the advertisement that comes with the transaction, and , if the transaction is valid ( as determined by the network), then she and the participating nodes may become eleigible to receive the advertising reward.
The validation of the transaction is performed by the b-nodes that serve R-nodes. This verification involves :
  1. Check Merkle tree of the merging of the sorted lists in the RTree. The R-nodes are required to store the merge trees for some period of time ( days, after which the data can be retired), so that the data is available for b-nodes for check at random positions.
  2. The verification of the receipt from the user on the receipt of the final merged sort list.
  3. Optionally, verification of advertisemtn consumption, if this is the adverticement transaction
  4. The b-nodes involved in the verification must document their consensus: the list of y/n votes with signatures.