' > index.html
15 | mv build/Paper.pdf paper.pdf
16 | git add -f README.md index.html paper.pdf
17 | git commit -m "Built pdf from ${SHA}."
18 |
19 | ENCRYPTED_KEY_VAR="encrypted_${ENCRYPTION_LABEL}_key"
20 | ENCRYPTED_IV_VAR="encrypted_${ENCRYPTION_LABEL}_iv"
21 | ENCRYPTED_KEY=${!ENCRYPTED_KEY_VAR}
22 | ENCRYPTED_IV=${!ENCRYPTED_IV_VAR}
23 | openssl aes-256-cbc -K $ENCRYPTED_KEY -iv $ENCRYPTED_IV -in deploykey.enc -out deploykey -d
24 | chmod 600 deploykey
25 | eval `ssh-agent -s`
26 | ssh-add deploykey
27 |
28 | git push -f "$PUSH_REPO" gh-pages
29 |
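The `${!VAR}` lines above rely on bash indirect expansion: `${!VAR}` reads the variable whose *name* is stored in `VAR`. A minimal sketch (the label and values here are hypothetical stand-ins, not the real CI secrets):

```shell
# Hypothetical values standing in for what the CI environment would provide.
ENCRYPTION_LABEL="1a2b3c"
encrypted_1a2b3c_key="deadbeef"

KEY_VAR="encrypted_${ENCRYPTION_LABEL}_key"   # build the variable *name*
KEY=${!KEY_VAR}                               # indirect expansion: read that variable's value

echo "$KEY"   # prints deadbeef
```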
--------------------------------------------------------------------------------
/p2p-analysis.md:
--------------------------------------------------------------------------------
1 | The p2p source code is organized into several packages, described below.
2 |
3 | - discover: implements the [Kademlia protocol](./references/Kademlia.pdf), a UDP-based p2p node discovery protocol.
4 | - discv5: a new node discovery protocol, still an experimental proposal; not covered in this analysis.
5 | - nat: code for network address translation.
6 | - netutil: assorted network utilities.
7 | - simulations: simulation of p2p networks; not covered in this analysis.
8 |
9 | Source code analysis of the discover package
10 |
11 | - [Persistent storage of discovered nodes, database.go](p2p-database-analysis.md)
12 | - [The core logic of the Kademlia protocol, table.go](p2p-table-analysis.md)
13 | - [UDP protocol processing logic, udp.go](p2p-udp-analysis.md)
14 | - [Network address translation, nat.go](p2p-nat-analysis.md)
15 |
16 | p2p/ package source analysis
17 |
18 | - [The encrypted link protocol between nodes, rlpx.go](p2p-rlpx-analysis.md)
19 | - [The logic for selecting nodes and dialing out to them, dial.go](p2p-dial-analysis.md)
20 | - [Handling of node-to-node connections and their protocols, peer.go](p2p-peer-analysis.md)
21 | - [The p2p server logic, server.go](p2p-server-analysis.md)
22 |
--------------------------------------------------------------------------------
/references/yellowpaper/BRANCHES.md:
--------------------------------------------------------------------------------
1 | ## Protocol Versions
2 |
3 | Each protocol version is specified in `Paper.tex` found in a branch of this repository.
4 |
5 | | Branch | Version | Applicable Block Numbers |
6 | |-------------------|-----------------------------------------------------------------------------------|---------------------------------|
7 | | master            | [Byzantium](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-609.md)          | From 4,370,000 onwards          |
8 | | spurious-dragon   | [Spurious Dragon](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-607.md)    | From 2,675,000 to 4,369,999     |
9 | | tangerine-whistle | [Tangerine Whistle](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-608.md)  | From 2,463,000 to 2,674,999     |
10 | | dao-fork          | [DAO Fork](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-779.md)           | From 1,920,000 to 2,462,999     |
11 | | homestead         | [Homestead](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-606.md)          | From 1,150,000 to 1,919,999     |
12 | | frontier          | [Frontier](https://github.com/ethereum/yellowpaper/tree/frontier)                  | From 1 to 1,149,999             |
13 |
--------------------------------------------------------------------------------
/references/readme.md:
--------------------------------------------------------------------------------
1 | The dotted lines enclose the subtrees; from top to bottom their prefixes are 0, 01, 000, 0010.
2 |
3 | > Per the above understanding: for any node, the binary tree can be decomposed into a series of successively smaller subtrees, none of which contain the node itself. The highest subtree is the half of the tree that does not contain the node; the next subtree is the half of the remainder that does not contain it; and so on, until the entire tree has been split.
4 |
5 | The dotted lines enclose the subtrees; from top to bottom their prefixes are 1, 01, 000, 0010.
6 |
7 | Each such list is called a k-bucket. Entries within each k-bucket are ordered by the time they were last seen: the least-recently seen node is placed at the head, and the most-recently seen node at the tail. Each bucket holds no more than k entries.
8 |
9 | > Translator's note: the rendered terms for "least-recently" and "most-recently" seen are ambiguous, so the English terms are kept here.
10 |
11 | The least-recently seen node is placed at the head of the queue, and the most-recently seen node at the tail. An analysis of Gnutella user behavior showed that the longer a node has been active, the more likely it is to remain active in the future; recently seen active nodes are therefore also the nodes most likely to be useful later.
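The subtree/prefix idea can be made concrete with a small sketch (a toy using 8-bit IDs, not go-ethereum code; real implementations use much wider node IDs): in Kademlia the distance between two IDs is their XOR, and the k-bucket a peer falls into is determined by the highest bit in which the two IDs differ.

```go
package main

import (
	"fmt"
	"math/bits"
)

// xorDistance is Kademlia's distance metric, shown here for toy 8-bit IDs.
func xorDistance(a, b uint8) uint8 { return a ^ b }

// bucketIndex returns which k-bucket peer b belongs to from a's point of view:
// the position of the highest differing bit, i.e. which subtree b falls into.
func bucketIndex(a, b uint8) int {
	d := xorDistance(a, b)
	if d == 0 {
		return -1 // same ID: no bucket
	}
	return 7 - bits.LeadingZeros8(d) // index of the highest set bit of the distance
}

func main() {
	// 0b0011 and 0b1011 differ in the top bit: the farthest subtree (prefix "1").
	fmt.Println(bucketIndex(0b0011, 0b1011)) // 3
	// 0b0011 and 0b0010 differ only in the lowest bit: the closest subtree.
	fmt.Println(bucketIndex(0b0011, 0b0010)) // 0
}
```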
12 |
--------------------------------------------------------------------------------
/packing-tools.md:
--------------------------------------------------------------------------------
1 | # Some basic tools for packaging
2 |
3 | The [go-ethereum](https://github.com/ethereum/go-ethereum) project wraps a number of excellent small tools from the golang ecosystem. Each is so simple in function that a chapter of its own would be far too thin; yet Ethereum's packaging of these gadgets is elegant, independent, and practical. We analyze some of them here, if only to become familiar with the coding style of the Ethereum source.
4 |
5 | ## metrics
6 |
7 | In [ethdb-analysis](./ethdb-analysis.md), we saw the encapsulation of the [goleveldb](https://github.com/syndtr/goleveldb) project: ethdb adds an abstraction layer on top of goleveldb.
8 |
9 | [type Database interface](https://github.com/ethereum/go-ethereum/blob/master/ethdb/interface.go#L29)
10 |
11 | Besides implementing the same interface as MemDatabase, LDBDatabase also makes heavy use of the probe tools from the go-metrics package, and starts a goroutine to collect measurements:
12 |
13 | [go db.meter(3 \* time.Second)](https://github.com/ethereum/go-ethereum/blob/master/ethdb/database.go#L198)
14 |
15 | This collects latency and I/O volume statistics from the goleveldb process on a 3-second cycle. That seems convenient, but the question remains: how do we make use of the information we collect?
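The metering pattern can be sketched with a minimal, self-contained meter (this is illustrative only, not the real go-metrics API; the type and interval here are made up for the demo):

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// meter is a toy stand-in for a go-metrics meter: a thread-safe counter
// that a background goroutine samples periodically.
type meter struct{ count int64 }

func (m *meter) Mark(n int64) { atomic.AddInt64(&m.count, n) }
func (m *meter) Count() int64 { return atomic.LoadInt64(&m.count) }

func main() {
	writes := &meter{}
	done := make(chan struct{})
	// Background sampler, in the spirit of `go db.meter(3 * time.Second)`
	// (a short interval is used here so the demo finishes quickly).
	go func() {
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				fmt.Println("bytes written so far:", writes.Count())
			case <-done:
				return
			}
		}
	}()
	writes.Mark(1024) // the instrumented code path records its I/O volume
	time.Sleep(25 * time.Millisecond)
	close(done)
}
```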
16 |
17 | ## log
18 |
19 | Golang's built-in log package has long been a target of complaints, and the Ethereum project is no exception. [log15](https://github.com/inconshreveable/log15) was therefore introduced to address the inconvenience of the standard logger.
20 |
--------------------------------------------------------------------------------
/eth-analysis.md:
--------------------------------------------------------------------------------
1 | The source code of eth includes the following packages.
2 |
3 | - downloader: mainly used to synchronize with the network, including both the traditional (full) synchronization method and the fast synchronization method.
4 | - fetcher: mainly used for notification-based block synchronization. When we receive a NewBlockHashesMsg message, we receive only a set of block hashes; the hashes must then be used to fetch and synchronize the blocks.
5 | - filter: provides RPC-based filtering, including real-time data synchronization (PendingTx) and historical log filtering (log filter).
6 | - gasprice: offers gas price advice, deriving the currently recommended price from the gas prices of the last few blocks.
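The idea behind such a gas price oracle can be sketched as follows (a simplified illustration, not go-ethereum's actual gasprice code: it just takes a percentile of the minimum gas prices seen in recent blocks):

```go
package main

import (
	"fmt"
	"sort"
)

// suggestGasPrice is a toy oracle: given the lowest gas price observed in each
// of the most recent blocks, it recommends a chosen percentile of those values.
func suggestGasPrice(recentMinPrices []uint64, percentile int) uint64 {
	if len(recentMinPrices) == 0 {
		return 0
	}
	prices := append([]uint64(nil), recentMinPrices...) // don't mutate the caller's slice
	sort.Slice(prices, func(i, j int) bool { return prices[i] < prices[j] })
	idx := (len(prices) - 1) * percentile / 100
	return prices[idx]
}

func main() {
	// Hypothetical per-block minimum gas prices (in wei) from the last 5 blocks.
	recent := []uint64{1_000_000_000, 3_000_000_000, 2_000_000_000, 5_000_000_000, 2_000_000_000}
	fmt.Println(suggestGasPrice(recent, 60)) // 60th percentile of recent minima
}
```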
7 |
8 | Partial source analysis of eth protocol
9 |
10 | - [Ethereum's network protocol](eth-network-analysis.md)
11 |
12 | Source analysis of the fetcher part
13 |
14 | - [fetcher partial source analysis](eth-fetcher-analysis.md)
15 |
16 | Downloader partial source code analysis
17 |
18 | - [Node fast synchronization algorithm](fast-sync-algorithm.md)
19 | - [Scheduling of download tasks and assembly of results, queue.go](eth-downloader-queue-analysis.md)
20 | - [Represents a peer and provides QoS and other functions, peer.go](eth-downloader-peer-analysis.md)
21 | - [State-root synchronization from the pivot point in the fast sync algorithm, statesync.go](eth-downloader-statesync.md)
22 | - [Analysis of the general process of synchronization](eth-downloader-analysis.md)
23 |
24 | Filter part of the source code analysis
25 |
26 | - [Provide Bloom filter query and RPC filtering](eth-bloombits-and-filter-analysis.md)
27 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # go-ethereum code analysis
2 |
3 | **I hope to analyze the code of Ethereum to learn the use of blockchain technology and GO language.**
4 |
5 | In analyzing [go-ethereum](https://github.com/ethereum/go-ethereum),
6 | I hope to start from the low-level technical components with few dependencies and gradually go deep into the core logic.
7 |
8 | ## Table of contents
9 |
10 | - [go ethereum code analysis (account, smart contract, logs, etc...)](/go-ethereum-code-analysis.md)
11 | - [yellow paper symbol index](symbol-index.md)
12 | - [rlp, rlpx analysis](/rlp-analysis.md)
13 | - [trie source analysis](/trie-analysis.md)
14 | - [ethdb analysis](/ethdb-analysis.md)
15 | - [rpc analysis](/rpc-analysis.md)
16 | - [p2p analysis](/p2p-analysis.md)
17 | - [eth protocol analysis](/eth-analysis.md)
18 | - **core analysis**
19 | - [blockchain index, chain_indexer analysis](/core-chain_indexer-analysis.md)
20 | - [bloom filter index, bloombits-analysis](/core-bloombits-analysis.md)
21 | - [ethereum trie, tree management, rollback, state-analysis](/core-state-analysis.md)
22 | - [transaction processing](/core-state-process-analysis.md)
23 | - **vm analysis**
24 | - [stack & data structure](/core-vm-stack-memory-analysis.md)
25 | - [instruction, jump table, interpreter analysis](/core-vm-jumptable-instruction.md)
26 | - [vm analysis](/core-vm-analysis.md)
27 | - **transaction pool management**
28 | - [transaction execution](/core-txlist-data-structure-analysis.md)
29 | - [transaction pool management](/core-txpool-analysis.md)
30 | - [genesis block](/core-genesis-analysis.md)
31 | - [blockchain-analysis](/core-blockchain-analysis.md)
32 | - [miner analysis & CPU mining](/miner-analysis-CPU-mining.md)
33 | - [pow, poa, pos algorithms](/pow-analysis.md)
34 | - [ethereum test network Clique_PoA introduction](/ethereum-Clique_PoA-introduction.md)
35 | - [swarm, raw & file upload, pss and feed](/ethereum-swarm-introduction.md)
36 |
--------------------------------------------------------------------------------
/p2p-nat-analysis.md:
--------------------------------------------------------------------------------
1 | NAT stands for network address translation. This part of the source code is relatively independent and self-contained, so I will not analyze it in depth here; its basic functions are assumed to be familiar.
2 |
3 | There are two network protocols, **upnp** and **pmp**, under **nat**.
4 |
5 | ### Upnp application scenario (pmp is a protocol similar to upnp)
6 |
7 | If a user accesses the Internet through NAT and needs modules such as P2P, BC, or eMule, the UPnP function brings great convenience: UPnP can automatically map the port numbers used by BC and eMule to the public network, so that users on the public network can also initiate connections to the private-network side of the NAT.
8 |
9 | The main function is to provide an interface that maps an intranet IP + port to the router's public IP + port. This effectively gives the intranet program an address on the external network, so that public network users can reach you directly; otherwise access requires UDP hole punching.
10 |
11 | ### UDP protocol in p2p
12 |
13 | Most users today run in intranet environments. A port listened on inside an intranet cannot be reached directly by programs on the public network; a hole-punching process is needed before the two parties can connect. This is called UDP hole punching.
14 |
15 | The public network cannot directly reach a program on the intranet, because the router does not know how to route data to that program.
16 |
17 | If the intranet program first contacts a program on the external network, the router automatically assigns a port for the intranet program and records a mapping such as 192.168.1.1:3003 -> 111.21.12.12:3003. This mapping eventually expires as time goes by.
18 |
19 | Once the router has established such a mapping, other programs on the Internet can happily access port 111.21.12.12:3003, because all data sent to that port is ultimately routed to 192.168.1.1:3003. This is the so-called hole-punching process.
20 |
21 | 
22 |
23 | **To reach a node inside the LAN, we also need PAT (port address translation).**
24 |
25 | 
26 |
--------------------------------------------------------------------------------
/references/yellowpaper/README.md:
--------------------------------------------------------------------------------
1 | # Ethereum Yellow Paper
2 |
3 | [](https://creativecommons.org/licenses/by-sa/4.0/)
4 | [](https://gitter.im/ethereum/yellowpaper?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
5 |
6 | The Yellow Paper is a formal definition of the Ethereum protocol, originally by Gavin Wood, currently maintained by Nick Savers and with contributions from many people around the world.
7 |
8 | It is a free culture work, licensed under Creative Commons Attribution Share-Alike (CC-BY-SA) version 4.0.
9 |
10 | ## Usage
11 |
12 | The paper comes as a single ``latex`` file ``Paper.tex``. The latest version is generally available as a PDF at https://ethereum.github.io/yellowpaper/paper.pdf. If you find that the borders for links block too much text when viewing the PDF in the browser, you can instead download it and open and view it with a PDF viewer application such as Adobe Acrobat or Evince, where the borders are less likely to display over text.
13 |
14 | ## How to build
15 |
16 | The paper is built from the single ``latex`` file ``Paper.tex`` into a PDF as follows.
17 |
18 | ```
19 | git clone https://github.com/ethereum/yellowpaper.git
20 | cd yellowpaper
21 | ./build.sh
22 | ```
23 | This will create a PDF version of the Yellow Paper. After building, you can also use standard `pdflatex` tools such as http://latex.informatik.uni-halle.de/latex-online/latex.php for compiling and previewing.
24 |
25 | ## Tips on editing
26 |
27 | You can use [TeX Stack Exchange](https://tex.stackexchange.com/); https://en.wikibooks.org/wiki/LaTeX/ (e.g. [Bibliography Management](https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management) and [Hyperlinks](https://en.wikibooks.org/wiki/LaTeX/Hyperlinks)); and [BibTeX editor](http://truben.no/latex/bibtex/).
28 |
29 | ## Versions
30 |
31 | The previous protocol versions are listed in [BRANCHES.md](./BRANCHES.md).
32 |
33 | ### Other language versions
34 | - [Chinese](https://github.com/yuange1024/ethereum_yellowpaper) translated by YuanGe and GaoTianlu.
35 | - [French](https://github.com/asseth/yellowpaper) translated by Asseth (check out the 'french' branch).
36 |
--------------------------------------------------------------------------------
/pos-proofofstake-introduction.md:
--------------------------------------------------------------------------------
1 | **Proof-of-stake (PoS)** is an algorithm for distributed consensus in cryptocurrency blockchain networks. In a PoS-based cryptocurrency, the creator of the next block is selected via some combination of random selection, wealth, and coin age. By contrast, PoW-based cryptocurrencies (such as Bitcoin) determine the creator of a block by solving a hash puzzle.
2 |
3 | ## Multiple block selection mechanisms
4 |
5 | Proof-of-stake must have a way to select the next valid block in the blockchain. Relying on account balance alone would lead to centralization, since the single richest member would gain a permanent advantage. Instead, several different selection schemes have been devised.
6 |
7 | ### Random block selection
8 |
9 | Nxt and BlackCoin use randomization to predict the next block producer: a formula selects the account whose stake yields the lowest hash value, roughly argmin hash(stake). Because the stakes are public, all nodes can compute the same result.
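This selection rule can be sketched as follows (a toy illustration of argmin hash(stake); the exact formulas used by Nxt and BlackCoin differ):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

type stakeholder struct {
	name  string
	stake uint64 // public, so every node can run the same computation
}

// nextProducer is a toy "argmin hash(stake)" rule: each holder's score is a
// hash of public data scaled down by its stake, and the lowest score wins.
// Larger stakes therefore win more often, and the result is deterministic,
// so all honest nodes elect the same producer.
func nextProducer(holders []stakeholder, seed string) string {
	winner, bestScore := "", ^uint64(0)
	for _, h := range holders {
		sum := sha256.Sum256([]byte(seed + h.name))
		score := binary.BigEndian.Uint64(sum[:8]) / h.stake
		if score < bestScore {
			winner, bestScore = h.name, score
		}
	}
	return winner
}

func main() {
	holders := []stakeholder{{"alice", 100}, {"bob", 250}, {"carol", 50}}
	// All nodes agree on the seed (e.g. derived from the previous block),
	// so all nodes compute the same winner.
	fmt.Println(nextProducer(holders, "block-12345"))
}
```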
10 |
11 | ### Based on currency age selection
12 |
13 | Peercoin's proof-of-stake system combines random selection with the concept of coin age. The age of a coin is the number of coins multiplied by the time they have been held. Coins held for more than 30 days become eligible to forge the next block, and users with older coin sets have a greater chance of signing it. However, once a coin set is used to sign a block, its coin age is reset to zero, and it must wait another 30 days before it can sign again. In addition, coin age is capped at a maximum of 90 days, so that users with very old coins cannot dominate the blockchain. This process keeps the network secure and gradually mints new coins without consuming large computing resources. Peercoin's developers claim that this makes the network harder to attack than PoW: since there is no centralized mining, acquiring 51% of the currency is more difficult than acquiring 51% of the computing power.
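The coin-age rule can be sketched numerically (a simplified model of the scheme described above; the 30-day and 90-day thresholds come from the text, but the function itself is illustrative):

```go
package main

import "fmt"

// coinAge returns the stake weight of a batch of coins under a simplified
// Peercoin-style rule: coins multiplied by days held, zero until 30 days,
// capped at 90 days.
func coinAge(coins uint64, daysHeld uint64) uint64 {
	if daysHeld < 30 {
		return 0 // not yet eligible to forge a block
	}
	if daysHeld > 90 {
		daysHeld = 90 // age is capped so very old coins cannot dominate
	}
	return coins * daysHeld
}

func main() {
	fmt.Println(coinAge(100, 10))  // 0: too young to stake
	fmt.Println(coinAge(100, 45))  // 4500
	fmt.Println(coinAge(100, 365)) // 9000: capped at 90 days
}
```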
14 |
15 | ## Advantage
16 |
17 | Proof of work relies on energy consumption. According to a bitcoin mine operator, the energy consumption per bitcoin approached 240 kWh in 2014 (equivalent, in terms of carbon produced, to burning 16 gallons of gasoline), and that energy is paid for in non-cryptocurrency. Proof of stake is thousands of times more efficient than PoW.
18 |
19 | The incentives for block producers also differ. Under PoW, the producer of a block may not hold any of the cryptocurrency at all; miners aim only to maximize their own profit, and it is unclear whether this misalignment lowers the security of the currency or raises the security risks of the system. Under PoS, by contrast, those guarding the security of the system are always the holders of the most money.
20 |
21 | ## Criticism
22 |
23 | Some authors argue that PoS is not an ideal option for a distributed consensus protocol. One issue is the well-known "nothing at stake" problem: a block producer loses nothing by voting on both sides of a fork simultaneously, which can make consensus hard to reach. Because working on multiple chains at once consumes very few resources (unlike PoW), anyone can abuse this property and double-spend across different chains.
24 |
25 | There are also many ways to try to solve this problem:
26 |
27 | - Ethereum proposed the Slasher protocol to let users punish cheaters: anyone who tries to create blocks on multiple fork branches is considered a cheater. The proposal assumes that creating a fork requires double-signing, and that forking without stake at risk incurs a penalty. However, Slasher was never adopted; Ethereum's developers considered the problem worth confronting, and the plan is to replace PoW with a different PoS protocol, Casper.
28 | - Peercoin uses a centralized broadcast checkpoint approach (signed with the developer's private key). The depth of the blockchain reconstruction cannot exceed the latest checkpoint. The trade-off is that the developer is a centralized authority and controls the blockchain.
29 | - The Nxt protocol only allows reorganization of the latest 720 blocks. However, this merely mitigates the problem: a client could follow a 721-block fork regardless of whether that fork is the longest chain, thus breaking consistency.
30 | - A hybrid of proof of burn and proof of stake: proof-of-burn blocks exist as checkpoints, carry the highest reward, contain no transactions, and are the safest...
31 | - A mixture of PoW and PoS, with PoS acting as an extension that depends on PoW. Based on the Proof of Activity proposal, which hopes to solve the nothing-at-stake problem, PoW miners mine blocks while PoS serves as a second authentication mechanism. Some blockchains, such as PIVX, use a PoW/PoS mixture.
32 |
--------------------------------------------------------------------------------
/hashimoto.md:
--------------------------------------------------------------------------------
1 | Hashimoto: I/O bound proof of work
2 |
3 | Abstract: Using a cryptographic hash function not as a proof of work by itself, but
4 | rather as a generator of pointers to a shared data set, allows for an I/O bound
5 | proof of work. This method of proof of work is difficult to optimize via ASIC
6 | design, and difficult to outsource to nodes without the full data set. The name is
7 | based on the three operations which comprise the algorithm: hash, shift, and
8 | modulo.
9 |
10 | The need for proofs which are difficult to outsource and optimize
11 |
12 | A common challenge in cryptocurrency development is maintaining decentralization of the
13 | network. The use of proof of work to achieve decentralized consensus has been most notably
14 | demonstrated by Bitcoin, which uses partial collisions with zero of sha256, similar to hashcash. As
15 | Bitcoin's popularity has grown, dedicated hardware (currently application specific integrated circuits, or
16 | ASICs) has been produced to rapidly iterate the hash based proof of work function. Newer projects
17 | similar to Bitcoin often use different algorithms for proof of work, and often with the goal of ASIC
18 | resistance. For algorithms such as Bitcoin's, the improvement factor of ASICs means that commodity
19 | computer hardware can no longer be effectively used, potentially limiting adoption.
20 |
21 | Proof of work can also be "outsourced", or performed by a dedicated machine (a "miner")
22 | without knowledge of what is being verified. This is often the case in Bitcoin's "mining pools". It is also
23 | beneficial for a proof of work algorithm to be difficult to outsource, in order to promote decentralization
24 | and encourage all nodes participating in the proof of work process to also verify transactions. With these
25 | goals in mind, we present Hashimoto, an I/O bound proof of work algorithm we believe to be resistant to
26 | both ASIC design and outsourcing.
27 |
28 | Initial attempts at "ASIC resistance" involved changing Bitcoin's sha256 algorithm for a different,
29 | more memory intensive algorithm, Percival's "scrypt" password based key derivation function. Many
30 | implementations set the scrypt arguments to low memory requirements, defeating much of the purpose of
31 | the key derivation algorithm. While changing to a new algorithm, coupled with the relative obscurity of the
32 | various scrypt-based cryptocurrencies, allowed for a delay, scrypt optimized ASICs are now available.
33 | Similar attempts at variations or multiple heterogeneous hash functions can at best only delay ASIC
34 | implementations.
35 |
36 | Leveraging shared data sets to create I/O bound proofs
37 |
38 | "A supercomputer is a device for turning compute-bound problems into I/O-bound problems."
39 | -Ken Batcher
40 |
41 | Instead, an algorithm will have little room to be sped up by new hardware if it acts in a way that commodity computer systems are already optimized for.
42 |
43 | Since I/O bounds are what decades of computing research has gone towards solving, it's unlikely that the relatively small motivation of mining a few coins would be able to advance the state of the art in cache hierarchies. In the case that advances are made, they will be likely to impact the entire industry of computer hardware.
44 |
45 | Fortuitously, all nodes participating in current implementations of cryptocurrency have a large set of mutually agreed upon data; indeed this "blockchain" is the foundation of the currency. Using this large data set can both limit the advantage of specialized hardware, and require working nodes to have the entire data set.
46 |
47 | Hashimoto is based off Bitcoin's proof of work. In Bitcoin's case, as in Hashimoto, a successful
48 | proof satisfies the following inequality:
49 |
50 | hash_output < target
51 |
52 | For bitcoin, the hash_output is determined by
53 |
54 | hash_output = sha256(prev_hash, merkle_root, nonce)
55 |
56 | where prev_hash is the previous block's hash and cannot be changed. The merkle_root is based on the transactions included in the block, and will be different for each individual node. The nonce is rapidly incremented as hash_outputs are calculated and do not satisfy the inequality. Thus the bottleneck of the proof is the sha256 function, and increasing the speed of sha256 or parallelizing it is something ASICs can do very effectively.
57 |
58 | Hashimoto uses this hash output as a starting point, which is used to generate inputs for a second hash function. We call the original hash hash_output_A, and the final result of the proof final_output.
59 |
60 | Hash_output_A can be used to select many transactions from the shared blockchain, which are then used as inputs to the second hash. Instead of organizing transactions into blocks, for this purpose it is simpler to organize all transactions sequentially. For example, the 47th transaction of the 815th block might be termed transaction 141,918. We will use 64 transactions, though higher and lower numbers could work, with different access properties. We define the following functions:
61 |
62 | - nonce: 64 bits. A new nonce is created for each attempt.
63 | - get_txid(T): return the txid (a hash of a transaction) of transaction number T from block B.
64 | - block_height: the current height of the blockchain, which increases at each new block
65 |
66 | Hashimoto chooses transactions by doing the following:
67 |
68 | hash_output_A = sha256(prev_hash, merkle_root, nonce)
69 | for i = 0 to 63 do
70 | shifted_A = hash_output_A >> i
71 | transaction = shifted_A mod total_transactions
72 | txid[i] = get_txid(transaction) << i
73 | end for
74 | txid_mix = txid[0] ⊕ txid[1] … ⊕ txid[63]
75 | final_output = txid_mix ⊕ (nonce << 192)
76 |
77 | The target is then compared with final_output, and smaller values are accepted as proofs.
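The loop above can be sketched in Go (a self-contained toy: `getTxid` here just hashes the transaction number instead of reading the blockchain, and big.Int stands in for wide words; this illustrates the algorithm, it is not a faithful implementation):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/big"
)

// getTxid stands in for a lookup into the shared blockchain: here it simply
// hashes the transaction number. In a real node this would be an I/O-bound read.
func getTxid(txNum uint64) *big.Int {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], txNum)
	sum := sha256.Sum256(buf[:])
	return new(big.Int).SetBytes(sum[:])
}

// hashimoto implements the selection loop from the text: 64 shifted slices of
// hash_output_A pick transactions, whose shifted txids are XOR-folded together,
// and the nonce (shifted left 192 bits) is XORed into the final output.
func hashimoto(prevHash, merkleRoot []byte, nonce uint64, totalTransactions uint64) *big.Int {
	var nbuf [8]byte
	binary.BigEndian.PutUint64(nbuf[:], nonce)
	seed := sha256.Sum256(append(append(append([]byte{}, prevHash...), merkleRoot...), nbuf[:]...))
	hashOutputA := new(big.Int).SetBytes(seed[:])

	txidMix := new(big.Int)
	total := new(big.Int).SetUint64(totalTransactions)
	for i := uint(0); i < 64; i++ {
		shiftedA := new(big.Int).Rsh(hashOutputA, i)            // shifted_A = hash_output_A >> i
		txNum := new(big.Int).Mod(shiftedA, total).Uint64()     // transaction = shifted_A mod total
		txid := new(big.Int).Lsh(getTxid(txNum), i)             // txid[i] = get_txid(transaction) << i
		txidMix.Xor(txidMix, txid)                              // fold into txid_mix
	}
	nonceTerm := new(big.Int).Lsh(new(big.Int).SetUint64(nonce), 192)
	return new(big.Int).Xor(txidMix, nonceTerm) // final_output
}

func main() {
	out := hashimoto([]byte("prev"), []byte("root"), 42, 141918)
	target := new(big.Int).Lsh(big.NewInt(1), 255) // toy difficulty target
	fmt.Println("accepted as proof:", out.Cmp(target) < 0)
}
```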
78 |
--------------------------------------------------------------------------------
/core-vm-stack-memory-analysis.md:
--------------------------------------------------------------------------------
1 | The VM uses the Stack object in stack.go as the virtual machine's stack, and Memory represents the memory used in the virtual machine.
2 |
3 | ## stack
4 |
5 | It is quite simple: a slice of *big.Int with a fixed capacity of 1024 serves as the stack's storage.
6 |
7 | structure
8 |
9 | ```go
10 | // stack is an object for basic stack operations. Items popped to the stack are
11 | // expected to be changed and modified. stack does not take care of adding newly
12 | // initialised objects.
13 | type Stack struct {
14 | data []*big.Int
15 | }
16 |
17 | func newstack() *Stack {
18 | return &Stack{data: make([]*big.Int, 0, 1024)}
19 | }
20 | ```
21 |
22 | push operation
23 |
24 | ```go
25 | func (st *Stack) push(d *big.Int) { // Append to the end
26 | // NOTE push limit (1024) is checked in baseCheck
27 | //stackItem := new(big.Int).Set(d)
28 | //st.data = append(st.data, stackItem)
29 | st.data = append(st.data, d)
30 | }
31 | func (st *Stack) pushN(ds ...*big.Int) {
32 | st.data = append(st.data, ds...)
33 | }
34 | ```
35 |
36 | pop operation
37 |
38 | ```go
39 | func (st *Stack) pop() (ret *big.Int) { // Take it out from the end.
40 | ret = st.data[len(st.data)-1]
41 | st.data = st.data[:len(st.data)-1]
42 | return
43 | }
44 | ```
45 |
46 | The swap operation exchanges the top-of-stack element with the element n positions below the top:
47 |
48 | ```go
49 | func (st *Stack) swap(n int) { // Swaps the value of the element at the top of the stack and the element at a distance n from the top of the stack.
50 | st.data[st.len()-n], st.data[st.len()-1] = st.data[st.len()-1], st.data[st.len()-n]
51 | }
52 | ```
53 |
54 | The dup operation copies the value at the specified position to the top of the stack:
55 |
56 | ```go
57 | func (st *Stack) dup(pool *intPool, n int) {
58 | st.push(pool.get().Set(st.data[st.len()-n]))
59 | }
60 | ```
61 |
62 | The peek operation looks at the top of the stack without removing it:
63 |
64 | ```go
65 | func (st *Stack) peek() *big.Int {
66 | return st.data[st.len()-1]
67 | }
68 | ```
69 |
70 | Back peeks at the element at the specified position from the top:
71 |
72 | ```go
73 | // Back returns the n'th item in stack
74 | func (st *Stack) Back(n int) *big.Int {
75 | return st.data[st.len()-n-1]
76 | }
77 | ```
78 |
79 | require checks that the number of stack elements is at least n.
80 |
81 | ```go
82 | func (st *Stack) require(n int) error {
83 | if st.len() < n {
84 | return fmt.Errorf("stack underflow (%d <=> %d)", len(st.data), n)
85 | }
86 | return nil
87 | }
88 | ```
89 |
90 | ## intpool
91 |
92 | Very simple: a pool of up to 256 big.Int values, used to speed up the allocation of big.Int objects.
93 |
94 | ```go
95 | var checkVal = big.NewInt(-42)
96 |
97 | const poolLimit = 256
98 |
99 | // intPool is a pool of big integers that
100 | // can be reused for all big.Int operations.
101 | type intPool struct {
102 | pool *Stack
103 | }
104 |
105 | func newIntPool() *intPool {
106 | return &intPool{pool: newstack()}
107 | }
108 |
109 | func (p *intPool) get() *big.Int {
110 | if p.pool.len() > 0 {
111 | return p.pool.pop()
112 | }
113 | return new(big.Int)
114 | }
115 | func (p *intPool) put(is ...*big.Int) {
116 | if len(p.pool.data) > poolLimit {
117 | return
118 | }
119 |
120 | for _, i := range is {
121 | // verifyPool is a build flag. Pool verification makes sure the integrity
122 | // of the integer pool by comparing values to a default value.
123 | if verifyPool {
124 | i.Set(checkVal)
125 | }
126 |
127 | p.pool.push(i)
128 | }
129 | }
130 | ```
131 |
132 | ## memory
133 |
134 | Memory's storage is a []byte; it also records lastGasCost.
135 |
136 | ```go
137 | type Memory struct {
138 | store []byte
139 | lastGasCost uint64
140 | }
141 |
142 | func NewMemory() *Memory {
143 | return &Memory{}
144 | }
145 | ```
146 |
147 | Resize is used to allocate space:
148 |
149 | ```go
150 | // Resize resizes the memory to size
151 | func (m *Memory) Resize(size uint64) {
152 | if uint64(m.Len()) < size {
153 | m.store = append(m.store, make([]byte, size-uint64(m.Len()))...)
154 | }
155 | }
156 | ```
157 |
158 | Set is then used to write values:
159 |
160 | ```go
161 | // Set sets offset + size to value
162 | func (m *Memory) Set(offset, size uint64, value []byte) {
163 | // length of store may never be less than offset + size.
164 | // The store should be resized PRIOR to setting the memory
165 | if size > uint64(len(m.store)) {
166 | panic("INVALID memory: store empty")
167 | }
168 |
169 | // It's possible the offset is greater than 0 and size equals 0. This is because
170 | // the calcMemSize (common.go) could potentially return 0 when size is zero (NO-OP)
171 | if size > 0 {
172 | copy(m.store[offset:offset+size], value)
173 | }
174 | }
175 | ```
176 |
177 | Values are read in two ways: Get returns a copy, while GetPtr returns a slice pointing into the underlying store.
178 |
179 | ```go
180 | // Get returns offset + size as a new slice
181 | func (self *Memory) Get(offset, size int64) (cpy []byte) {
182 | if size == 0 {
183 | return nil
184 | }
185 |
186 | if len(self.store) > int(offset) {
187 | cpy = make([]byte, size)
188 | copy(cpy, self.store[offset:offset+size])
189 |
190 | return
191 | }
192 |
193 | return
194 | }
195 |
196 | // GetPtr returns the offset + size
197 | func (self *Memory) GetPtr(offset, size int64) []byte {
198 | if size == 0 {
199 | return nil
200 | }
201 |
202 | if len(self.store) > int(offset) {
203 | return self.store[offset : offset+size]
204 | }
205 |
206 | return nil
207 | }
208 | ```
209 |
210 | ## Some extra helper functions in stack_table.go
211 |
212 | ```go
213 | func makeStackFunc(pop, push int) stackValidationFunc {
214 | return func(stack *Stack) error {
215 | if err := stack.require(pop); err != nil {
216 | return err
217 | }
218 |
219 | if stack.len()+push-pop > int(params.StackLimit) {
220 | return fmt.Errorf("stack limit reached %d (%d)", stack.len(), params.StackLimit)
221 | }
222 | return nil
223 | }
224 | }
225 |
226 | func makeDupStackFunc(n int) stackValidationFunc {
227 | return makeStackFunc(n, n+1)
228 | }
229 |
230 | func makeSwapStackFunc(n int) stackValidationFunc {
231 | return makeStackFunc(n, n)
232 | }
233 | ```
234 |
--------------------------------------------------------------------------------
/go-ethereum-code-analysis.md:
--------------------------------------------------------------------------------
1 | ## Go-ethereum source code analysis
2 |
3 | Because go-ethereum is the most widely used Ethereum client, the subsequent source code analysis is based on the code on GitHub.
4 |
5 | ### Build a go ethereum debugging environment
6 |
7 | #### windows 10 64bit
8 |
9 | First download and install the Go installer. Because Go's official website may be inaccessible in some regions, it can be downloaded from the address below.
10 |
11 | https://studygolang.com/dl/golang/go1.9.1.windows-amd64.msi
12 |
13 | After installation, set the environment variables: add the C:\Go\bin directory to your PATH, then add a GOPATH environment variable pointing to the directory where your Go code will live (I set it to C:\GOPATH).
14 |
15 | 
16 |
17 | Install the git tool (refer to any online tutorial); Go needs git in order to download code from GitHub automatically.
18 |
19 | Open the command line tool to download the code for go-ethereum
20 | `go get github.com/ethereum/go-ethereum`
21 |
22 | After the command completes successfully, the code will be downloaded to %GOPATH%\src\github.com\ethereum\go-ethereum. If the following error appears during execution:
23 |
24 | # github.com/ethereum/go-ethereum/crypto/secp256k1
25 | exec: "gcc": executable file not found in %PATH%
26 |
27 | you need to install the gcc toolchain, which can be downloaded from the address below:
28 |
29 | http://tdm-gcc.tdragon.net/download
30 |
31 | Next install an IDE. The IDE I use is Gogland from JetBrains, which can be downloaded at the address below:
32 |
33 | https://download.jetbrains.com/go/gogland-173.2696.28.exe
34 |
35 | Open the IDE after the installation is complete. Select File -> Open -> select GOPATH\src\github.com\ethereum\go-ethereum to open it.
36 |
37 | Then open go-ethereum/rlp/decode_test.go. Right-click on the edit box to select Run. If the run is successful, the environment setup is complete.
38 |
39 | 
40 |
41 | ### Ubuntu 16.04 64bit
42 |
43 | Install the Go package:
44 |
45 | `apt install golang-go git -y`
46 |
47 | Golang environment configuration:
48 |
49 | _Edit the /etc/profile file and add the following to the file:_
50 |
51 | export GOROOT=/usr/bin/go
52 | export GOPATH=/root/home/goproject
53 | export GOBIN=/root/home/goproject/bin
54 | export GOLIB=/root/home/goproject/
55 | export PATH=$PATH:$GOBIN:$GOPATH/bin:$GOROOT/bin
56 |
57 | Execute the following command to make the environment variable take effect:
58 |
59 | # source /etc/profile
60 |
61 | Download source code:
62 | #cd /root/home/goproject; mkdir src; cd src # Enter the go project directory, create the src directory, and enter the src directory
63 | #git clone https://github.com/ethereum/go-ethereum
64 |
65 | Open the code with vim or an IDE; Visual Studio Code works well.
66 |
67 | ### A brief introduction to the go-ethereum directory layout
68 |
69 | The go-ethereum project is basically organized into directories by functional module. Below is a brief introduction to each directory. Each directory is also a package in Go, similar in meaning to a package in Java.
70 |
71 | accounts		Implements high-level Ethereum account management
72 | bmt		Implementation of the binary Merkle tree
73 | build		Scripts and configurations for compiling and building
74 | cmd		Command-line tools, listed one by one below
75 |     /abigen		Source code generator that converts Ethereum contract definitions into easy-to-use, compile-time type-safe Go packages
76 |     /bootnode	Starts a node that only implements network discovery
77 |     /evm		Ethereum virtual machine development tool providing a configurable, isolated code debugging environment
78 |     /faucet
79 |     /geth		The Ethereum command-line client, the most important tool
80 |     /p2psim		Provides a tool to simulate the HTTP API
81 |     /puppeth	Wizard for creating a new Ethereum network, e.g. with Clique POA consensus
82 |     /rlpdump	Provides formatted output of RLP data
83 |     /swarm		Swarm network utilities
84 |     /util		Provides some common tools
85 |     /wnode		A simple Whisper node; it can be used as a standalone boot node and for testing and diagnostic purposes
86 | common		Provides some common utilities
87 | compression	Package rle implements the run-length encoding used for Ethereum data
88 | consensus	Ethereum's consensus algorithms, such as ethash and clique (proof-of-authority)
89 | console		The console package
90 | contracts	Smart contracts deployed in the genesis block, such as checkqueue and the DAO
91 | core		Ethereum's core data structures and algorithms (virtual machine, state, blockchain, Bloom filters)
92 | crypto		Encryption and hash algorithms
93 | eth		Implements the Ethereum protocol
94 | ethclient	Provides an RPC client for Ethereum
95 | ethdb		Ethereum's databases (leveldb for real use and an in-memory database for testing)
96 | ethstats	Provides reports on the status of the network
97 | event		Handling of real-time events
98 | les		Implements the Light Ethereum Subprotocol (LES) server
99 | light		On-demand retrieval for Ethereum light clients
100 | log		Provides log output that is friendly to both humans and machines
101 | metrics		Provides disk counters and metrics (which can be published to Grafana, for example)
102 | miner		Provides block creation and mining in Ethereum
103 | mobile		Wrappers used on mobile platforms
104 | node		Ethereum's various types of nodes
105 | p2p		The Ethereum p2p network protocol
106 | rlp		Ethereum serialization, called Recursive Length Prefix
107 | rpc		Remote procedure calls, used by APIs and services
108 | swarm		Swarm network processing
109 | tests		Test cases
110 | trie		Ethereum's important data structure package: trie implements Merkle Patricia Tries
111 | whisper		Provides the protocol for the Whisper node
112 |
113 | The Ethereum codebase is quite large, but broadly speaking its structure is quite good. I plan to start the analysis from some relatively independent modules and then delve into the internal code, with a focus on modules such as the p2p network that are not covered in the Yellow Paper.
114 |
--------------------------------------------------------------------------------
/references/yellowpaper/JS.tex:
--------------------------------------------------------------------------------
1 | \section{Javascript API}\label{app:jsapi}
2 |
3 | The JavaScript API provides a consistent API across multiple scenarios including each of the clients' web-based in-process \DH{}App frameworks and the out-of-process RPC-based infrastructure. All key access takes place through the special \texttt{eth} object, part of the global namespace.
4 |
5 | \subsection{Values}
6 | There are no special object types in the API; all values are strings. As strings, values may be of several forms, and are interpreted by the API according to a series of rules:
7 |
8 | \begin{enumerate}
9 | \item If the string contains only digits from 0-9, then it is interpreted as a decimal integer;
10 | \item if the string begins with the characters \texttt{0x}, then it is interpreted as a hexadecimal integer;
11 | \item it is interpreted as a binary string otherwise.
12 | \end{enumerate}
13 |
14 | The only exception to this are for parameters that expect a binary string; in this case the string is always interpreted as such.
15 |
16 | Values are implicitly converted between integers and hashes/byte-arrays; when this happens, integers are interpreted as big-endian as is standard for Ethereum. The following forms are allowed; they are all interpreted in the same way:
17 |
18 | \begin{enumerate}
19 | \item \texttt{"4276803"}
20 | \item \texttt{"0x414243"}
21 | \item \texttt{"ABC"}
22 | \end{enumerate}
23 |
24 | In each case, they are interpreted as the number 4276803. The first two values may be alternated between with the additional String methods \texttt{bin()} and \texttt{unbin()}.
25 |
26 | As byte arrays, values may be concatenated with the \texttt{+} operator as is normal for strings.
27 |
28 | Strings also have a number of additional methods to help with conversion and alignment when switching between addresses, 256-bit integers and free-form byte-arrays for transaction data:
29 |
30 | \begin{itemize}
31 | \item \texttt{bin()}: Converts the string to binary format.
32 | \item \texttt{pad(l)}: Converts the string to binary format (ready for data parameters) and pads with zeroes until it is of width \texttt{l}. Will pad to the left if the original string is numeric, or to the right if binary. If \texttt{l} is less than the width of the string, it is resized accordingly.
33 | \item \texttt{pad(a, b)}: Converts the string to binary format (ready for data parameters) and pads with zeroes on the left side until it is of width \texttt{a}. Then pads with zeroes on the right side until it has grown to size \texttt{b}. If \texttt{b} is less than the width of the string, it is resized accordingly.
34 | \item \texttt{unbin()}: Converts the string from binary format to hex format.
35 | \item \texttt{unpad()}: Converts the string from binary format to hex format, first removing any zeroes from the right side.
36 | \item \texttt{dec()}: Converts the string to decimal format (typically from hex).
37 | \end{itemize}
38 |
39 | \subsection{The \texttt{eth} object}
40 |
41 | \subsubsection{Properties}
42 |
43 | For each such item, there is also an asynchronous method, taking a parameter of the callback function, itself taking a single parameter of the property's return value and of the same name but prefixed with get and recapitalised, e.g. \texttt{getCoinbase(fn)}.
44 |
45 | \begin{itemize}
46 | \item \texttt{coinbase} Returns the coinbase address of the client.
47 | \item \texttt{isListening} Returns true if and only if the client is actively listening for network connections.
48 | \item \texttt{isMining} Returns true if and only if the client is actively mining new blocks.
49 | \item \texttt{gasPrice} Returns the client's present price of gas.
50 | \item \texttt{key} Returns the special key-pair object corresponding to the preferred account owned by the client.
51 | \item \texttt{keys} Returns a list of the special key-pair objects corresponding to all accounts owned by the client.
52 | \item \texttt{peerCount} Returns the number of peers currently connected to the client.
53 | \end{itemize}
54 |
55 | \subsubsection{Synchronous Getters}
56 | For each such item, there is also an asynchronous method, taking an additional parameter of the callback function, itself taking a single parameter of the synchronous method's return value and of the same name but prefixed with get and recapitalised, e.g. \texttt{getBalanceAt(a, fn)}.
57 |
58 | \begin{itemize}
59 | \item \texttt{balanceAt(a)} Returns the balance of the account of address given by the address \texttt{a}.
60 | \item \texttt{storageAt(a, x)} Returns the value in storage at position given by the number x of the account of address given by the address \texttt{a}.
61 | \item \texttt{txCountAt(a)} Returns the number of transactions sent from the account of address given by \texttt{a}.
62 | \item \texttt{isContractAt(a)} Returns true if the account of address given by \texttt{a} has associated code.
63 | \end{itemize}
64 |
65 | \subsubsection{Transactions}
66 |
67 | \begin{itemize}
68 | \item \texttt{create(sec, xEndowment, bCode, xGas, xGasPrice, fn)} Creates a new contract-creation transaction, given parameters:
69 | \begin{itemize}
70 | \item \texttt{sec}, the secret-key for the sender;
71 | \item \texttt{xEndowment}, the number equal to the account's endowment;
72 | \item \texttt{bCode}, the binary string (byte array) of EVM-bytecode for the initialisation of the account;
73 | \item \texttt{xGas}, the number equal to the amount of gas to purchase for the transaction (unused gas is refunded);
74 | \item \texttt{xGasPrice}, the number equal to the price of gas for this transaction. Returns the special address object representing the new account; and
75 | \item \texttt{fn}, the callback function, called on completion of the transaction.
76 | \end{itemize}
77 | \item \texttt{transact(sec, xValue, aDest, bData, xGas, xGasPrice, fn)} Creates a new message-call transaction, given parameters:
78 | \begin{itemize}
79 | \item \texttt{sec}, the secret-key for the sender;
80 | \item \texttt{xValue}, the value transferred for the transaction (in Wei);
81 | \item \texttt{aDest}, the address representing the destination address of the message;
82 | \item \texttt{bData}, the binary string (byte array), containing the associated data of the message;
83 | \item \texttt{xGas}, the amount of gas to purchase for the transaction (unused gas is refunded);
84 | \item \texttt{xGasPrice}, the price of gas for this transaction; and
85 | \item \texttt{fn}, the callback function, called on completion of the transaction.
86 | \end{itemize}
87 | \end{itemize}
88 |
89 | \subsubsection{Events}
90 |
91 | \begin{itemize}
92 | \item \texttt{watch(a, fn)}: Registers \texttt{fn} as a callback for whenever anything about the state of the account at address \texttt{a} changes, and also on the initial load.
93 | \item \texttt{watch(a, x, fn)}: Registers \texttt{fn} as a callback for whenever the storage location \texttt{x} of the account at address \texttt{a} changes, and also on the initial load.
94 | \item \texttt{newBlock(fn)}: Registers \texttt{fn} as a callback for whenever the state changes, and also on the initial load.
95 | \end{itemize}
96 |
97 | \subsubsection{Misc}
98 |
99 | \begin{itemize}
100 | \item \texttt{secretToAddress(a)}: Determines the address from the secret key \texttt{a}.
101 | \end{itemize}
102 |
103 |
--------------------------------------------------------------------------------
/symbol-index.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 |  It is the state of t+1 (account trie).
4 |
5 |  It is a state transition function, which can also be understood as an execution engine.
6 |
7 |  is a transaction
8 |
9 | 
10 |
11 |  Is a state transition function at the block level.
12 |
13 |  It is a block and consists of many transactions.
14 |
15 |  Transaction at position 0.
16 |
17 |  Is the block termination state transition function (a function that rewards the miner).
18 |
19 |  Ether logo
20 |
21 |  The conversion relationship between the various units used in Ethereum and Wei (for example: a Finney corresponds to 10^15 Wei).
22 |
23 |  machine-state
24 |
25 | ## Some basic rules
26 |
27 | - For most functions, they are identified by uppercase letters.
28 | - Tuples are generally identified by capital letters
29 | - A scalar or fixed-size array of bytes is identified by a lowercase letter. For example, n represents the nonce of the transaction, and there may be some exceptions, such as δ representing the amount of stack data required for a given instruction.
30 | - Variable-length byte arrays generally use bold lowercase letters. For example, **o** represents the output data of a message call. It is also possible to use bold uppercase letters for some important things.
31 |
32 |  Byte sequence
33 |  positive integer
34 |  byte sequence length of 32 bytes
35 |
36 |  positive integer smaller than 2 ^ 256,
37 | **[ ]** is used to index array elements, which corresponds to:
38 |  the first item of the machine's stack
39 |  the first 32 elements of the machine's memory
40 |  a placeholder; any character may represent any object
41 |
42 |  Value representative of the object to be modified
43 |  intermediate state
44 |  intermediate state 2
46 | if f represents a function, then f\* represents a similar function that applies f sequentially to each inner element.
47 |
48 |  It represents the final element of the list which is
49 |  representative of the last element of the list which is
50 |
51 |  the length of x
52 |
53 |  a represents an address; the account state at address a contains the nonce,
54 |  balance,
55 |  the root hash of the storage trie,
56 |  and the code hash: if the code is b then KEC(b) equals this hash
57 |
58 | 
59 |
60 |  world state collapse function
61 | 
62 |
63 |  any
64 |  or
65 |  and
66 |
67 |  Homestead
68 |
69 | ## Transaction
70 |
71 |  transaction nonce
72 |  gasPrice
73 |  gasLimit
74 |  to
75 |  value
76 |
77 | The sender's address can be recovered from the three signature values.
78 |
79 |  contract initialization code
80 |  data of a message call
81 | 
82 |
83 | ## Block head
84 |
85 | ParentHash
86 | OmmersHash
87 | beneficiary miner address
88 | stateRoot
89 | transactionRoot
90 | receiptRoot
91 | logsBloom
92 | difficulty
93 | number (block height)
94 | gasLimit
95 | gasUsed
96 | timestamp
97 | extraData
98 | mixHash
99 | nonce
100 |
101 | ## Receipt
102 |
103 |  Receipt of the i-th transaction
104 |
105 | 
106 |  World-state after execution of the transaction
107 |
108 | the cumulative gas used in the block after the transaction executes
109 | the Bloom filter composed from the data of all logs generated by the transaction
110 | the set of logs generated by the transaction
111 |
112 |  A log entry: Oa is the address that generated the log, Ot the topics, Od the data
113 |
114 | ## Transaction execution
115 |
116 |  substate
117 |  suicide set
118 |  log series
119 |  refund balance
120 |
121 |  The total amount of gas used during the transaction.
122 |  The log generated by the transaction.
123 |
124 |  the owner of the code being executed
125 |  the originator of the transaction
126 |  gasPrice
127 |  input data
128 |  the address of the account that caused the code to execute (for a simple transaction, the transaction originator)
129 |  value
130 |  the code to be executed
131 |  the header of the current block
132 |  the current call depth
133 |
134 |  Execution model: **s** suicide set; **l** log series; **o** output; **r** refund
135 |
136 |  Execution function
137 |
138 |  Currently available gas
139 |  program counter
140 |  the memory contents
141 |  the number of active words in memory
142 |  stack contents
143 |
144 |  w represents the current instruction that needs to be executed
145 |
146 |  the number of items the instruction removes from the stack
147 |  the number of items the instruction adds to the stack
148 |
--------------------------------------------------------------------------------
/trie-structure.md:
--------------------------------------------------------------------------------
1 | ## Overview
2 |
3 | The trie, also known as the dictionary tree, word search tree or prefix tree, is a multi-way tree structure for fast retrieval. For example, a trie over English letters is a 26-way tree, and a trie over digits is a 10-way tree.
4 |
5 | The word Trie comes from re trie ve, pronounced /tri:/ "tree", and some people read /traɪ/ "try".
6 |
7 | Trie trees can take advantage of the common prefix of strings to save storage space. As shown in the following figure, the trie tree saves 6 strings tea, ten, to, in, inn, int with 10 nodes:
8 |
9 | 
10 |
11 | In the trie tree, the common prefix for the strings in, inn, and int is "in", so you can store only one copy of "in" to save space. Of course, if there are a large number of strings in the system and these strings have no common prefix, the corresponding trie tree will consume a lot of memory, which is also a disadvantage of the trie tree.
12 |
13 | The basic properties of the Trie tree can be summarized as:
14 |
15 | 1. The root node contains no character; every node other than the root contains exactly one character.
16 |
17 | 1. From the root node to a node, the characters passing through the path are connected, which is the string corresponding to the node.
18 |
19 | 1. The children of each node all contain different characters.
20 |
21 | ## Operations
22 |
23 | Insert, Delete, and Find on a trie are very simple: a single loop suffices, where the i-th iteration finds the subtree corresponding to the first i characters of the string, and then performs the corresponding operation. To implement the trie, we can store it in an ordinary array (static memory) or allocate nodes dynamically (dynamic memory). As for how a node points to its children, there are generally three ways:
24 |
25 | 1. Open an array of letter set size for each node, the corresponding subscript is the letter represented by the son, and the content is the position of the son corresponding to the large array, that is, the label;
26 |
27 | 2. Hang a linked list for each node and record who each son is in a certain order;
28 |
29 | 3. Record the tree using the left son and the right brother.
30 |
31 | The three methods each have trade-offs: the first is easy to implement but needs relatively more space; the second is fairly easy to implement and needs less space, but is more time-consuming; the third needs the least space but is the slowest and hardest to write.
32 |
33 | The following shows the implementation of dynamic memory development:
34 |
35 | ```C
36 | #include <stdlib.h> /* for malloc and NULL */
37 | #define MAX_NUM 26
38 |
39 | /* COMPLETED means a string ends at this node. */
40 | enum NODE_TYPE { COMPLETED, UNCOMPLETED };
41 |
42 | struct Node {
43 |     enum NODE_TYPE type;
44 |     char ch;
45 |     struct Node* child[MAX_NUM]; /* 26-way tree -> a, b, c, ... z */
46 | };
47 |
48 | struct Node* ROOT; /* tree root */
49 |
50 | struct Node* createNewNode(char ch) {
51 |     /* create a new node */
52 |     struct Node *new_node = (struct Node*)malloc(sizeof(struct Node));
53 |     new_node->ch = ch;
54 |     new_node->type = UNCOMPLETED; /* was `==`, a no-op comparison */
55 |     int i;
56 |     for (i = 0; i < MAX_NUM; i++)
57 |         new_node->child[i] = NULL;
58 |     return new_node;
59 | }
60 |
61 | /* initialization: create an empty tree with only a ROOT */
62 | void initialization() {
63 |     ROOT = createNewNode(' ');
64 | }
65 |
66 | int charToindex(char ch) { /* a char maps to an index */
67 |     return ch - 'a';
68 | }
69 |
70 | int find(const char chars[], int len) {
71 |     struct Node* ptr = ROOT;
72 |     int i = 0;
73 |     while (i < len) {
74 |         if (ptr->child[charToindex(chars[i])] == NULL) {
75 |             break;
76 |         }
77 |         ptr = ptr->child[charToindex(chars[i])];
78 |         i++;
79 |     }
80 |     return (i == len) && (ptr->type == COMPLETED);
81 | }
82 |
83 | void insert(const char chars[], int len) {
84 |     struct Node* ptr = ROOT;
85 |     int i;
86 |     for (i = 0; i < len; i++) {
87 |         if (ptr->child[charToindex(chars[i])] == NULL) {
88 |             ptr->child[charToindex(chars[i])] = createNewNode(chars[i]);
89 |         }
90 |         ptr = ptr->child[charToindex(chars[i])];
91 |     }
92 |     ptr->type = COMPLETED;
93 | }
93 | ```
94 |
95 | ## Advanced implementation
96 |
97 | A trie can also be implemented with a double array (Double-Array Trie), which greatly reduces memory usage:
98 |
99 | 
100 |
101 | ## Usecases
102 |
103 | The trie is a very simple and efficient data structure with a large number of applications.
104 |
105 | (1) String retrieval
106 |
107 | Store information about known strings (a dictionary) in a trie in advance, then use it to check whether unknown strings have appeared, or how frequently.
108 |
109 | Example:
110 |
111 | @ Give a list of vocabulary words consisting of N words, and an article written in lowercase English. Please write all the new words that are not in the vocabulary list in the earliest order.
112 |
113 | @ Give a dictionary where the words are bad words. Words are all lowercase letters. A piece of text is given, and each line of text is also composed of lowercase letters. Determine if the text contains any bad words. For example, if rob is a bad word, the text problem contains bad words.
114 |
115 | (2) The longest common prefix of the string
116 |
117 | The Trie tree uses the common prefix of multiple strings to save storage space. Conversely, when we store a large number of strings on a trie tree, we can quickly get the common prefix of some strings.
118 |
119 | Example:
120 |
121 | @ Give N lowercase English alphabet strings, and Q queries, which is the length of the longest common prefix for asking two strings?
122 |
123 | Solution: First build a trie over all the strings. The length of the longest common prefix of two strings is then the depth of the lowest common ancestor of their nodes, so the problem reduces to the classic Lowest Common Ancestor (LCA) problem.
124 |
125 | The lowest common ancestor problem is itself a classic problem, which can be solved in the following ways:
126 |
127 | 1. Using the Disjoint Set, you can use the classic Tarjan algorithm;
128 |
129 | 2. After computing the Euler tour of the trie, turn it into the classic Range Minimum Query (RMQ) problem.
130 |
131 | (There is plenty of material online about disjoint sets, Tarjan's algorithm, and the RMQ problem.)
132 |
133 | (3) Sorting
134 |
135 | The trie is a multi-way tree; a pre-order traversal of the whole tree outputs the corresponding strings in lexicographic order.
136 |
137 | Example:
138 |
139 | @ Gives you N different English names consisting of only one word, letting you sort them out lexicographically from small to large.
140 |
141 | (4) As an auxiliary structure of other data structures and algorithms
142 |
143 | Such as suffix tree, AC automaton, etc.
144 |
145 | ## Trie tree complexity analysis
146 |
147 | (1) The time complexity of insertion and lookup is O(N), where N is the length of the string.
148 |
149 | (2) The space complexity is on the order of 26^n, which is very large (it can be improved by the double-array implementation).
150 |
151 | ## Summary
152 |
153 | The trie is a very important data structure with a wide range of applications in information retrieval, string matching, and similar fields. It is also the basis of many algorithms and more complex data structures, such as suffix trees and AC automata. Mastering the trie is therefore fundamental for any software engineer.
154 |
--------------------------------------------------------------------------------
/references/yellowpaper/Paper.reflib:
--------------------------------------------------------------------------------
1 | Biblio.bib
2 |
3 | cryptoeprint:2013:881 (Misc): Sompolinsky, Yonatan and Zohar, Aviv. "Accelerating Bitcoin's Transaction Processing. Fast Money Grows on Trees, Not Chains." Cryptology ePrint Archive, Report 2013/881, 2013. http://eprint.iacr.org/
4 |
5 | gura2004comparing (InCollection): Gura, Nils; Patel, Arun; Wander, Arvinderpal; Eberle, Hans; Shantz, Sheueling Chang. "Comparing elliptic curve cryptography and RSA on 8-bit CPUs." In Cryptographic Hardware and Embedded Systems - CHES 2004, pp. 119-132. Springer, 2004.
6 |
7 | laurie2004proof (InProceedings): Laurie, Ben and Clayton, Richard. "'Proof-of-Work' proves not to work; version 0.2." Workshop on Economics and Information Security, 2004.
8 |
9 | nakamoto2008bitcoin (Article): Nakamoto, Satoshi. "Bitcoin: A peer-to-peer electronic cash system." Consulted 1 (2012), 2008.
10 |
11 | sprankel2013technical (Misc): Sprankel, Simon. "Technical Basis of Digital Currencies." 2013.
12 |
13 | aron2012bitcoin (Article): Aron, Jacob. "BitCoin software finds new life." New Scientist 213(2847), p. 20. Elsevier, 2012.
14 |
15 | mastercoin2013willett (Article): Willett, J. R. "MasterCoin Complete Specification." 2013. https://github.com/mastercoin-MSC/spec
16 |
17 | colouredcoins2012rosenfeld (Article): Rosenfeld, Meni. "Overview of Colored Coins." 2012. https://bitcoil.co.il/BitcoinX.pdf
18 |
19 | boutellier2014pirates (InCollection): Boutellier, Roman and Heinzen, Mareike. "Pirates, Pioneers, Innovators and Imitators." In Growth Through Innovation, pp. 85-96. Springer, 2014.
20 |
21 | szabo1997formalizing (Article): Szabo, Nick. "Formalizing and securing relationships on public networks." First Monday 2(9), 1997.
22 |
23 | miller1997future (InProceedings): Miller, Mark. "The Future of Law." Paper delivered at the Extro 3 Conference (August 9), 1997.
24 |
25 | buterin2013ethereum (Article): Buterin, Vitalik. "Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform." 2013. http://ethereum.org/ethereum.html
--------------------------------------------------------------------------------
/rlp-more.md:
--------------------------------------------------------------------------------
1 | ## Ethereum RLP coding
2 |
3 | > RLP (Recursive Length Prefix), which is the encoding method used in the serialization of Ethereum. RLP is mainly used for network transmission and persistent storage of data in Ethereum.
4 |
5 | ### Why reinvent the wheel?
6 |
7 | There are many methods for object serialization, such as JSON encoding, but JSON has an obvious disadvantage: the encoding result is relatively large. For example, the following structure:
8 |
9 | ```go
10 | type Student struct {
11 | 	Name string `json:"name"`
12 | 	Sex  string `json:"sex"`
13 | }
14 | s := Student{Name: "icattlecoder", Sex: "male"}
15 | bs, _ := json.Marshal(&s)
16 | print(string(bs))
17 | // {"name":"icattlecoder","sex":"male"}
18 | ```
19 |
20 | The serialization result of s is `{"name":"icattlecoder","sex":"male"}`, a string of 36 bytes, while the actual payload, `icattlecoder` plus `male`, totals only 16 bytes: JSON introduces a lot of redundant information during serialization. If Ethereum used JSON for serialization, the current 50GB blockchain might be 100GB, double the size.
21 |
22 | Therefore, Ethereum needs to design a coding method with smaller results.
23 |
24 | ### RLP encoding definition
25 |
26 | RLP actually only encodes the following two types of data:
27 |
28 | 1. Byte array
29 |
30 | 2. An array of byte arrays, called a list
31 |
32 | **Rule 1** : For a single byte whose value is between [0, 127], its encoding is itself.
33 |
34 | Example 1: `a` The encoding is `97`.
35 |
36 | **Rule 2** : If the length of the byte array is `l <= 55`, the encoding is the array itself, prefixed with `128 + l`.
37 |
38 | Example 2: The empty string encoding is `128`, ie `128 = 128 + 0`.
39 |
40 | Example 3: The `abc` result of the encoding is `131 97 98 99`, in which `131=128+len("abc")`, `97 98 99` in order `a b c`.
41 |
42 | **Rule 3** : If the array length is greater than 55, the first byte of the encoding is 183 plus the byte-length of the encoding of the array length, followed by the big-endian encoding of the array length itself, and finally the byte array.
43 |
44 | Example 4: Encode the following string:
45 |
46 | ```text
47 | The length of this sentence is more than 55 bytes, I know it because I pre-designed it
48 | ```
49 |
50 | This string is 86 bytes long, and encoding the length 86 takes only one byte (its own value), so the encoding result is as follows:
51 |
52 | ```byte
53 | 184 86 84 104 101 32 108 101 110 103 116 104 32 111 102 32 116 104 105 115 32 115 101 110 116 101 110 99 101 32 105 115 32 109 111 114 101 32 116 104 97 110 32 53 53 32 98 121 116 101 115 44 32 73 32 107 110 111 119 32 105 116 32 98 101 99 97 117 115 101 32 73 32 112 114 101 45 100 101 115 105 103 110 101 100 32 105 116
54 | ```
55 |
56 | The first three bytes are calculated as follows:
57 |
58 | 1. `184 = 183 + 1`, because the array length `86` encodes into just one byte.
59 | 2. `86` is the array length.
60 | 3. `84` is the first character, `T`.
61 |
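The rule 3 computation above can be checked with a short Python sketch (illustrative only, not part of any RLP library):

```python
# Rule 3 sketch: encode a byte array longer than 55 bytes
s = b"The length of this sentence is more than 55 bytes, I know it because I pre-designed it"
assert len(s) == 86

# big-endian encoding of the length (one byte here, since 86 < 256)
length_enc = len(s).to_bytes((len(s).bit_length() + 7) // 8, "big")
encoded = bytes([183 + len(length_enc)]) + length_enc + s

assert encoded[:3] == bytes([184, 86, 84])  # 184 = 183 + 1, 86 = length, 84 = 'T'
```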
62 | **Rule 4** : If the total length of the concatenated encodings of a list's items is at most 55, the first byte of the encoding is 192 plus that total length, followed by the item encodings in order.
63 |
64 | Note that rule 4 itself is recursively defined.
65 | Example 6: `["abc", "def"]` encodes to `200 131 97 98 99 131 100 101 102`.
66 | Here `abc` encodes to `131 97 98 99` and `def` encodes to `131 100 101 102`. The two encoded items total 8 bytes, so the first byte is `192 + 8 = 200`.
67 |
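Rule 4 can likewise be sketched in a few lines of Python (assuming, for simplicity, that every item is a multi-byte string of at most 55 bytes, so rule 2 applies to each):

```python
def rlp_encode_short_list(items):
    """Rule 4 sketch: a list whose concatenated item encodings total <= 55 bytes."""
    payload = b""
    for s in items:
        payload += bytes([128 + len(s)]) + s  # rule 2 for each item
    assert len(payload) <= 55
    return bytes([192 + len(payload)]) + payload

# ["abc", "def"] -> 200 131 97 98 99 131 100 101 102
assert rlp_encode_short_list([b"abc", b"def"]) == bytes([200, 131, 97, 98, 99, 131, 100, 101, 102])
```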
68 | **Rule 5** : If the total length of the concatenated item encodings exceeds 55, the first byte of the encoding is 247 plus the byte-length of the encoding of that total length, followed by the big-endian encoding of the total length itself, and finally the item encodings in order.
69 |
70 | Rule 5 itself is also recursively defined, similar to rule 3.
71 |
72 | Example 7:
73 |
74 | ```text
75 | ["The length of this sentence is more than 55 bytes, ", "I know it because I pre-designed it"]
76 | ```
77 |
78 | The encoding result is:
79 |
80 | ```byte
81 | 248 88 179 84 104 101 32 108 101 110 103 116 104 32 111 102 32 116 104 105 115 32 115 101 110 116 101 110 99 101 32 105 115 32 109 111 114 101 32 116 104 97 110 32 53 53 32 98 121 116 101 115 44 32 163 73 32 107 110 111 119 32 105 116 32 98 101 99 97 117 115 101 32 73 32 112 114 101 45 100 101 115 105 103 110 101 100 32 105 116
82 | ```
83 |
84 | The first two bytes are calculated as follows:
85 |
86 | 1. `248 = 247 + 1`
87 | 2. `88 = 86 + 2`: in the rule 3 example the payload length was `86`; here there are two substrings, and the length prefix of each occupies 1 byte, adding 2 bytes in total.
88 |
89 | The third byte, `179`, follows **Rule 2** : `179 = 128 + 51`.
90 |
91 | The 55th byte, `163`, also follows **Rule 2** : `163 = 128 + 35`.
92 |
93 | ### RLP decoding
94 |
95 | When decoding, let `f` be the first byte of the encoded data, and apply the following rules according to its value:
96 |
97 | 1. If f ∈ [0, 128), it is a single byte that encodes itself.
98 |
99 | 2. If f ∈ [128, 184), it is a byte array of length at most 55; the array length is `l = f - 128`.
100 |
101 | 3. If f ∈ [184, 192), it is a byte array of length greater than 55. The byte-length of the length encoding is `ll = f - 183`; read `ll` bytes starting at the second byte and interpret them as a big-endian integer `l`, the length of the array.
102 |
103 | 4. If f ∈ [192, 248), it is a list whose total payload length is at most 55; the payload length is `l = f - 192`. Rules 1~5 are applied recursively to decode the items.
104 |
105 | 5. If f ∈ [248, 256), it is a list whose total payload length is greater than 55. The byte-length of the length encoding is `ll = f - 247`; read `ll` bytes starting at the second byte and interpret them as a big-endian integer `l`, the payload length. The items are then decoded recursively by the same rules.
106 |
107 | This is why the scheme is called **recursive length prefix** encoding: the name itself describes the rules well.
108 |
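The five decoding rules above can be turned into a compact Python decoder. This is an illustrative sketch (not the code any Ethereum client ships), returning each decoded item together with the number of bytes consumed:

```python
def rlp_decode(data: bytes):
    """Sketch of an RLP decoder; returns (item, bytes_consumed)."""
    f = data[0]
    if f < 128:                        # rule 1: a single byte encodes itself
        return data[0:1], 1
    if f < 184:                        # rule 2: byte array, length <= 55
        l = f - 128
        return data[1:1 + l], 1 + l
    if f < 192:                        # rule 3: byte array, length > 55
        ll = f - 183
        l = int.from_bytes(data[1:1 + ll], "big")
        return data[1 + ll:1 + ll + l], 1 + ll + l
    if f < 248:                        # rule 4: list, payload <= 55
        l, pos = f - 192, 1
    else:                              # rule 5: list, payload > 55
        ll = f - 247
        l, pos = int.from_bytes(data[1:1 + ll], "big"), 1 + ll
    items, end = [], pos + l
    while pos < end:                   # decode each list item recursively
        item, used = rlp_decode(data[pos:])
        items.append(item)
        pos += used
    return items, pos

# decodes example 6 back into its items
assert rlp_decode(bytes([200, 131, 97, 98, 99, 131, 100, 101, 102]))[0] == [b"abc", b"def"]
```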
109 | Sample python code to debug RLP encoding:
110 |
111 | ```python
112 | import sys
113 | import json
114 | from termcolor import colored
115 |
116 | def rlp_encode(input):
117 | if isinstance(input, str):
118 | if len(input) == 1 and ord(input) < 0x80:
119 | return input
120 | else:
121 | return encode_length(len(input), 0x80) + input
122 | elif isinstance(input, list):
123 | output = ''
124 | for item in input:
125 | output += rlp_encode(item)
126 | return encode_length(len(output), 0xc0) + output
127 |
128 |
129 | def encode_length(L, offset):
130 | if L < 56:
131 | return chr(L + offset)
132 | elif L < 256**8:
133 | BL = to_binary(L)
134 | return chr(len(BL) + offset + 55) + BL
135 | else:
136 | raise Exception("input too long")
137 |
138 |
139 | def to_binary(x):
140 | if x == 0:
141 | return ''
142 | else:
143 | return to_binary(int(x / 256)) + chr(x % 256)
144 |
145 |
146 | def format_rlp_encode(input):
147 | output = []
148 | for c in input:
149 | ordC = ord(c)
150 | if ordC >= 0x80:
151 | output.append("0x{:02x}".format(ordC))
152 | else:
153 | output.append("'{}'".format(c))
154 |
155 | return "[ " + ", ".join(output) + " ]"
156 |
157 |
158 | # run as source file
159 | if __name__ == "__main__":
160 |
161 | represent = "The "
162 | obj_type = "string"
163 | if len(sys.argv) == 2:
164 | input = sys.argv[1]
165 | if input.startswith("json"):
166 | input = json.loads(input[4:])
167 | obj_type = "list"
168 | else:
169 | input = sys.argv[1:]
170 | obj_type = "list"
171 |
172 | if len(input) == 0:
173 | represent += "empty "
174 | represent += obj_type
175 |
176 | represent += " {} = ".format(json.dumps(input))
177 | # finally output
178 | output = rlp_encode(input)
179 | represent += format_rlp_encode(output)
180 | print(colored(represent, 'green'))
181 | ```
182 |
--------------------------------------------------------------------------------
/references/yellowpaper/Wire.tex:
--------------------------------------------------------------------------------
1 | \section{Wire Protocol}\label{app:wire}
2 | The wire-protocol specifies a network-level protocol for how two peers can communicate. It includes handshake procedures and the means for transferring information such as peers, blocks and transactions. Peer-to-peer communications between nodes running Ethereum clients are designed to be governed by a simple wire-protocol making use of existing Ethereum technologies and standards such as RLP wherever practical.
3 |
4 | Ethereum nodes may connect to each other over TCP only. Peers are free to advertise and accept connections on any port(s) they wish; however, the default port on which connections may be listened for and made is 30303.
5 |
6 | Though TCP provides a connection-oriented medium, Ethereum nodes communicate in terms of packets. These packets are formed as a 4-byte synchronisation token (0x22400891), a 4-byte ``payload size'', to be interpreted as a big-endian integer, and finally an N-byte \textbf{RLP-serialised} data structure, where N is the aforementioned ``payload size''. To be clear, the payload size specifies the number of bytes in the packet ``following'' the first 8.
7 |
8 | There are a number of different types of message that may be sent. This ``type'' is always determined by the first entry of the structure, represented as a scalar. The structure of each message type is described below.
9 |
10 | \begin{tabular*}{\columnwidth}[h]{rlll}
11 | \toprule
12 | \multicolumn{4}{c}{\textbf{00s: Session control}} \vspace{5pt} \\
13 | \textbf{Value} & \textbf{Mnemonic} & \textbf{Expected Reply} & \textbf{Packet Format} \vspace{5pt} \\
14 | 0x00 & \textsc{Hello} & & $(\text{0x}00, v \in \mathbb{P}, n \in \mathbb{P}, \mathbf{i} \in \mathbb{B}, c \in \mathbb{P}, p \in \mathbb{P}, u \in \mathbb{B}_{64})$ \\
15 | & \multicolumn{3}{p{0.8\columnwidth}}{
16 | This is the first packet sent over the connection, and sent once by both sides. No other messages may be sent until a \textsc{Hello} is received.
17 | \begin{itemize}
18 | \item $v$ is the Protocol Version. See the latest documentation for which version is current.
19 | \item $n$ is the Network Id and should be 0.
20 | \item $\mathbf{i}$ is the Client Id and specifies the client software identity as a human-readable string (e.g. ``Ethereum(++)/1.0.0'').
21 | \item $c$ is the client's Capabilities and specifies the capabilities of the client as a set of flags; presently three bits are used:
22 | \begin{description}
23 | \item[0x01] Client provides peer discovery service;
24 | \item[0x02] Client provides transaction relaying service;
25 | \item[0x04] Client provides block-chain querying service.
26 | \end{description}
27 | \item $p$ is the Listen Port and specifies the port that the client is listening on (on the interface that the present connection traverses). If 0 it indicates the client is not listening.
28 | \item $u$ is the Unique Identity of the node and specifies a 512-bit hash that identifies this node.
29 | \end{itemize}
30 | }\\
31 | \midrule
32 | 0x01 & \textsc{Disconnect} && $(\text{0x}01, r \in \mathbb{P})$ \\
33 | & \multicolumn{3}{p{0.8\columnwidth}}{
34 | Inform the peer that a disconnection is imminent; if received, a peer should disconnect immediately. When sending, well-behaved hosts give their peers a fighting chance (read: wait 2 seconds) to disconnect before disconnecting themselves.
35 | \begin{itemize}
36 | \item $r$ is an integer specifying one of a number of reasons for disconnect:
37 | \begin{description}
38 | \item[0x00] Disconnect requested;
39 | \item[0x01] TCP sub-system error;
40 | \item[0x02] Bad protocol;
41 | \item[0x03] Useless peer;
42 | \item[0x04] Too many peers;
43 | \item[0x05] Already connected;
44 | \item[0x06] Incompatible network protocols;
45 | \item[0x07] Client quitting.
46 | \end{description}
47 | \end{itemize}
48 | }\\
49 | \midrule
50 | 0x02 & \textsc{Ping} & \textsc{Pong} & $(\text{0x}02)$ \\
51 | & \multicolumn{3}{p{0.8\columnwidth}}{Requests an immediate reply of \textsc{Pong} from the peer.}\\
52 | \midrule
53 | 0x03 & \textsc{Pong} && $(\text{0x}03)$ \\
54 | & \multicolumn{3}{p{0.8\columnwidth}}{Reply to peer's \textsc{Ping} packet.}\\
55 | \bottomrule
56 | \end{tabular*}
57 |
58 |
59 | \begin{tabular*}{\columnwidth}[h]{rlll}
60 | \toprule
61 | \multicolumn{4}{c}{\textbf{10s: Information}} \vspace{5pt} \\
62 | \textbf{Value} & \textbf{Mnemonic} & \textbf{Expected Reply} & \textbf{Packet Format} \vspace{5pt} \\
63 | 0x10 & \textsc{GetPeers} & \textsc{Peers} & $(\text{0x}10)$ \\
64 | & \multicolumn{3}{p{0.8\columnwidth}}{Request the peer to enumerate some known peers for us to connect to. This should include the peer itself.}\\
65 | \midrule
66 | 0x11 & \textsc{Peers} & & $(\text{0x}11, (a_0 \in \mathbb{B}_4, p_0 \in \mathbb{P}, i_0 \in \mathbb{B}_{64}), (a_1 \in \mathbb{B}_4, p_1 \in \mathbb{P}, i_1 \in \mathbb{B}_{64}), ...)$ \\
67 | & \multicolumn{3}{p{0.8\columnwidth}}{
68 | Specifies a number of known peers.
69 | \begin{itemize}
70 | \item $a_0$, $a_1$, ... is the node's IPv4 address, a 4-byte array that should be interpreted as the IP address $a_0[0]$.$a_0[1]$.$a_0[2]$.$a_0[3]$.
71 | \item $p_0$, $p_1$, ... is the node's Port and is an integer.
72 | \item $i_0$, $i_1$, ... is the node's Unique Identifier and is the 512-bit hash that serves to identify the node.
73 | \end{itemize}
74 | }\\
75 | \midrule
76 | 0x12 & \textsc{Transactions} & & $(\text{0x}12, L_T(T_0), L_T(T_1), ...)$ \\
77 | & \multicolumn{3}{p{0.8\columnwidth}}{
78 | where $L_T$ is the transaction preparation function, as specified in section \ref{ch:block}.
79 |
Specify a transaction or transactions that the peer should make sure are included in its transaction queue. The items in the list (following the first item 0x12) are transactions in the format described in the main Ethereum specification.
81 | \begin{itemize}
82 | \item $T_0$, $T_1$, ... are the transactions that should be assimilated.
83 | \end{itemize}
84 | }\\
85 | \midrule
86 | 0x13 & \textsc{Blocks} && $(\text{0x}13, L_B(b_0), L_B(b_1), ...)$ \\
87 | & \multicolumn{3}{p{0.8\columnwidth}}{
88 | Where $L_B$ is the block preparation function, as specified in section \ref{ch:block}.
89 |
Specify a block or blocks that the peer should know about. The items in the list (following the first item, 0x13) are blocks in the format described in the main specification.
91 | \begin{itemize}
92 | \item $b_0$, $b_1$, ... are the blocks that should be assimilated.
93 | \end{itemize}
94 | }\\
95 | \midrule
96 | 0x14 & \textsc{GetChain} & \textsc{Blocks} or \textsc{NotInChain} & $(\text{0x}14, p_0 \in \mathbb{B}_{32}, p_1 \in \mathbb{B}_{32}, ..., c \in \mathbb{P})$ \\
97 | & \multicolumn{3}{p{0.8\columnwidth}}{
Request the peer to send $c$ blocks in the current canonical block chain that are children of one of a number of given blocks, according to a preferential order with $p_0$ being the most preferred. If the designated parent is the present block chain head, an empty reply should be sent. If none of the parents are in the current canonical block chain, then a \textsc{NotInChain} message should be sent along with $p_n$, the least preferential parent. If no parents are passed, then a reply need not be made.
99 | \begin{itemize}
100 | \item $p_0$, $p_1$, ... are the SHA3 hashes of the parents of blocks that we should be informed of with a \textsc{Blocks} reply. Typically, these will be specified in increasing age (or decreasing block number).
101 | \item $c$ is the number of children blocks of the most preferred parent that we should be informed of through the corresponding \textsc{Blocks} reply.
102 | \end{itemize}
103 | }\\
104 | \midrule
105 | 0x15 & \textsc{NotInChain} && $(\text{0x}15, p \in \mathbb{B}_{32})$ \\
106 | & \multicolumn{3}{p{0.8\columnwidth}}{Inform the peer that a particular block was not found in its block chain.
107 | \begin{itemize}
108 | \item $p$ is the SHA3 hash of the block that was not found in the block chain. Typically, this will be the least preferential (oldest) block hash given in a previous \textsc{GetChain} message.
109 | \end{itemize}
110 | }\\
111 | \midrule
112 | 0x16 & \textsc{GetTransactions} & \textsc{Transactions} & $(\text{0x}16)$ \\
113 | & \multicolumn{3}{p{0.8\columnwidth}}{Request the peer to send all transactions currently in the queue. See \textsc{Transactions}.}\\
114 | \bottomrule
115 | \end{tabular*}
116 |
117 |
--------------------------------------------------------------------------------
/references/yellowpaper/cancel.sty:
--------------------------------------------------------------------------------
1 | % cancel.sty version 2.2 12-Apr-2013.
2 | % Donald Arseneau asnd@triumf.ca
3 | % This software is contributed to the public domain by its author,
4 | % who disclaims all copyrights. For people and jurisdictions that
5 | % do not recognize contribution to the public domain, this software
6 | % is licensed by the terms of the unlicense, .
7 | %
8 | % Commands:
9 | % ~~~~~~~~~
10 | % \cancel draws a diagonal line (slash) through its argument.
11 | % \bcancel uses the negative slope (a backslash).
12 | % \xcancel draws an X (actually \cancel plus \bcancel).
13 | % \cancelto{}{} draws a diagonal arrow through the
14 | % expression, pointing to the value.
15 | %
16 | % The first three work in math and text mode, but \cancelto is only
17 | % for math mode.
18 | % The slope of the line or arrow depends on what is being cancelled.
19 | %
20 | % Options:
21 | % ~~~~~~~~
22 | % By default, none of these commands affects the horizontal spacing,
23 | % so they might over-print neighboring parts of the formula (or text).
24 | % They do add their height to the expression, so there should never be
25 | % unintended vertical overlap. There is a package option [makeroom] to
26 | % increase the horizontal spacing to make room for the cancellation value.
27 | %
28 | % If you use the color package, then you can declare
29 | % \renewcommand{\CancelColor}{}
30 | % and the cancellation marks will be printed in that color (e.g., \blue).
31 | % However, if you are using color, I recommend lightly shaded blocks rather
32 | % than diagonal arrows for cancelling.
33 | %
34 | % The option [thicklines] asks for heavier lines and arrows. This may be
35 | % useful when the lines are colored a light shade.
36 | %
37 | % The size (math style) of the \cancelto value depends on package options
38 | % according to this table:
39 | %
40 | % Current style [samesize] [smaller] [Smaller]
41 | % ------------- ---------------- ---------------- ----------------
42 | % \displaystyle \displaystyle \textstyle \scriptstyle
43 | % \textstyle \textstyle \scriptstyle \scriptstyle
44 | % \scriptstyle \scriptstyle \scriptscriptstyle \scriptscriptstyle
45 | % \scriptscriptstyle \scriptscriptstyle \scriptscriptstyle \scriptscriptstyle
46 | %
47 | % ("smaller" is the default behavior. It gives textstyle limits in
48 | % displaystyle, whereas "Smaller" gives scriptstyle limits.)
49 | %
50 | % This package is provided without guarantees or support. Drawing slashes
51 | % through math to indicate "cancellation" is poor design. I don't recommend
52 | % that you use this package at all.
53 |
54 | \ProvidesPackage{cancel}[2013/04/12 v2.2 Cancel math terms]
55 |
56 | \newcommand{\CancelColor}{}
57 | \newcommand{\cancelto}{1}% default option = smaller
58 | \let\canto@fil\hidewidth
59 | \let\canc@thinlines\thinlines
60 |
61 | \DeclareOption{samesize}{\def\cancelto{999}}
62 | \DeclareOption{smaller}{\def\cancelto{1}}
63 | \DeclareOption{Smaller}{\def\cancelto{0}}
64 | \DeclareOption{makeroom}{\def\canto@fil{\hfil}}
65 | \DeclareOption{overlap}{\let\canto@fil\hidewidth}
66 | \DeclareOption{thicklines}{\let\canc@thinlines\thicklines}
67 |
68 | \ProcessOptions
69 |
70 | \DeclareRobustCommand\cancel[1]{\ifmmode
71 | \mathpalette{\@cancel{\@can@slash{}}}{#1}\else
72 | \@cancel{\@can@slash{}}\hbox{#1}\fi}
73 | \DeclareRobustCommand\bcancel[1]{\ifmmode
74 | \mathpalette{\@cancel{\@can@slash{-}}}{#1}\else
75 | \@cancel{\@can@slash{-}}\hbox{#1}\fi}
76 | \DeclareRobustCommand\xcancel[1]{\ifmmode
77 | \mathpalette{\@cancel{\@can@slash{+}\@can@slash{-}}}{#1}\else
78 | \@cancel{\@can@slash{+}\@can@slash{-}}\hbox{#1}\fi}
79 |
80 | \newcommand\@cancel[3]{%
81 | \OriginalPictureCmds\@begin@tempboxa\hbox{\m@th$#2{#3}$}%
82 | \dimen@\height
83 | \setbox\@tempboxa\hbox{$\m@th\vcenter{\box\@tempboxa}$}%
84 | \advance\dimen@-\height % the difference in height
85 | \unitlength\p@ \canc@thinlines
86 | {\/\raise\dimen@\hbox{\ooalign{#1\hfil\box\@tempboxa\hfil \cr}}}%
87 | \@end@tempboxa
88 | }
89 |
90 | \def\@can@slash#1{\canto@fil$\m@th \CancelColor\vcenter{\hbox{%
91 | \dimen@\width \@min@pt\dimen@ 2\@min@pt\totalheight6%
92 | \ifdim\totalheight<\dimen@ % wide
93 | \@min@pt\dimen@ 8%
94 | \@tempcnta\totalheight \multiply\@tempcnta 5 \divide\@tempcnta\dimen@
95 | \advance\dimen@ 2\p@ % "+2"
96 | \edef\@tempa{(\ifcase\@tempcnta 6,#11\or 4,#11\or 2,#11\or 4,#13\else 1,#11\fi
97 | ){\strip@pt\dimen@}}%
98 | \else % tall
99 | \@min@pt\totalheight8%
100 | \advance\totalheight2\p@ % "+2"
101 | \@tempcnta\dimen@ \multiply\@tempcnta 5 \divide\@tempcnta\totalheight
102 | \dimen@ \ifcase\@tempcnta .16\or .25\or .5\or .75\else 1\fi \totalheight
103 | \edef\@tempa{(\ifcase\@tempcnta 1,#16\or 1,#14\or 1,#12\or 3,#14\else 1,#11\fi
104 | ){\strip@pt\dimen@}}%
105 | \fi
106 | \expandafter\line\@tempa}}$\canto@fil \cr}
107 |
108 | \ifcase\cancelto
109 | \def\cancelto#1#2{\mathchoice % Smaller option
110 | {\@cancelto\scriptstyle{#1}\displaystyle{#2}}%
111 | {\@cancelto\scriptstyle{#1}\textstyle{#2}}%
112 | {\@cancelto\scriptscriptstyle{#1}\scriptstyle{#2}}%
113 | {\@cancelto\scriptscriptstyle{#1}\scriptscriptstyle{#2}}%
114 | }
115 | \or
116 | \def\cancelto#1#2{\mathchoice % smaller option (default)
117 | {\@cancelto\textstyle{#1}\displaystyle{#2}}%
118 | {\@cancelto\scriptstyle{#1}\textstyle{#2}}%
119 | {\@cancelto\scriptscriptstyle{#1}\scriptstyle{#2}}%
120 | {\@cancelto\scriptscriptstyle{#1}\scriptscriptstyle{#2}}%
121 | }
122 | \else
123 | \def\cancelto#1#2{\mathchoice % samesize option
124 | {\@cancelto\textstyle{#1}\displaystyle{#2}}%
125 | {\@cancelto\textstyle{#1}\textstyle{#2}}%
126 | {\@cancelto\scriptstyle{#1}\scriptstyle{#2}}%
127 | {\@cancelto\scriptscriptstyle{#1}\scriptscriptstyle{#2}}%
128 | }
129 | \fi
130 |
131 | \newcommand\@cancelto[4]{%
132 | \OriginalPictureCmds\@begin@tempboxa\hbox{\m@th$#3{#4}$}%
133 | \dimen@\width % wide
134 | \@min@pt\dimen@ 2\@min@pt\totalheight4
135 | \ifdim\totalheight<\dimen@
136 | \@tempcnta\totalheight \multiply\@tempcnta 5 \divide\@tempcnta\dimen@
137 | \@tempdimb 3\p@ % extra width for arrowhead ("+2")
138 | \advance\dimen@ \ifcase\@tempcnta 5\or 5\or 4\or 3\else 2\fi \p@
139 | \@min@pt\dimen@9\advance\dimen@\p@
140 | \edef\@tempa{\ifcase\@tempcnta 5441\or 5441\or 5421\or 4443\else 3611\fi
141 | {\strip@pt\dimen@}{\strip@pt\@tempdimb}}%
142 | \def\@tempb{Cancel #4 to #2; case wide }%
143 | \else % tall
144 | \advance\totalheight3\p@ % "+2"
145 | \@tempcnta\dimen@ \multiply\@tempcnta 5 \divide\@tempcnta\totalheight
146 | \advance\totalheight3\p@ % "+2"
147 | \dimen@ \ifcase\@tempcnta .25\or .25\or .5\or .75\else 1\fi \totalheight
148 | \@tempdimb \ifcase\@tempcnta .8\or .8\or 1.2\or 1.5\else 2\fi \p@
149 | \edef\@tempa{\ifcase\@tempcnta 0814\or 0814\or 1812\or 2734\else 3611\fi
150 | {\strip@pt\dimen@}{\strip@pt\@tempdimb}}%
151 | \fi
152 | \dimen@\height
153 | \setbox\@tempboxa\hbox{$\m@th\vcenter{\box\@tempboxa}$}%
154 | \advance\dimen@-\height % the difference in height
155 | \unitlength\p@ \canc@thinlines
156 | {\/\raise\dimen@\hbox{\expandafter\canto@vector\@tempa{#1}{#2}}}%
157 | \@end@tempboxa
158 | }
159 |
160 | % #1, #2 offset of label #6 extra width to clear arrowhead
161 | % #3, #4 vector direction #7 superscript label style
162 | % #5 vector width #8 superscript label
163 | \def\canto@vector#1#2#3#4#5#6#7#8{%
164 | \dimen@.5\p@
165 | \setbox\z@\vbox{\boxmaxdepth.5\p@
166 | \hbox{\kern-1.2\p@\kern#1\dimen@$#7{#8}\m@th$}}%
167 | \ifx\canto@fil\hidewidth \wd\z@\z@ \else \kern-#6\unitlength \fi
168 | \ooalign{%
169 | \canto@fil$\m@th \CancelColor
170 | \vcenter{\hbox{\dimen@#6\unitlength \kern\dimen@
171 | \multiply\dimen@#4\divide\dimen@#3 \vrule\@depth\dimen@\@width\z@
172 | \vector(#3,#4){#5}%
173 | }}^{\raise#2\dimen@\copy\z@\kern-\scriptspace}$%
174 | \canto@fil \cr
175 | \hfil \box\@tempboxa \kern\wd\z@ \hfil \cr}}
176 |
177 | \def\@min@pt#1#2{\ifdim#1<#2\p@ #1#2\p@ \relax\fi}
178 |
179 | % pict2e removes bounding box from line and vector, so use original
180 | % versions by declaring \OriginalPictureCmds; make it a no-op if undefined
181 |
182 | \@ifundefined{OriginalPictureCmds}{\let\OriginalPictureCmds\relax}{}
183 |
184 | % Sometime maybe find a better solution that uses all slopes with pict2e
185 |
--------------------------------------------------------------------------------
/p2p-peer-analysis.md:
--------------------------------------------------------------------------------
1 | Inside the p2p code, a Peer represents an established network connection. Multiple protocols may run over a single connection: for example, the Ethereum protocol (eth), the Swarm protocol, or the Whisper protocol.
2 |
3 | The peer structures:
4 |
5 | ```go
6 | type protoRW struct {
7 | Protocol
8 | in chan Msg // receives read messages
9 | closed <-chan struct{} // receives when peer is shutting down
10 | wstart <-chan struct{} // receives when write may start
11 | werr chan<- error // for write results
12 | offset uint64
13 | w MsgWriter
14 | }
15 |
16 | // Protocol represents a P2P subprotocol implementation.
17 | type Protocol struct {
18 | // Name should contain the official protocol name,
19 | // often a three-letter word.
20 | Name string
21 |
22 | // Version should contain the version number of the protocol.
23 | Version uint
24 |
25 | // Length should contain the number of message codes used
26 | // by the protocol.
27 | Length uint64
28 |
29 | // Run is called in a new goroutine when the protocol has been
30 | // negotiated with a peer. It should read and write messages from
31 | // rw. The Payload for each message must be fully consumed.
32 | //
33 | // The peer connection is closed when Start returns. It should return
34 | // any protocol-level error (such as an I/O error) that is
35 | // encountered.
36 | Run func(peer *Peer, rw MsgReadWriter) error
37 |
38 | // NodeInfo is an optional helper method to retrieve protocol specific metadata
39 | // about the host node.
40 | NodeInfo func() interface{}
41 |
42 | // PeerInfo is an optional helper method to retrieve protocol specific metadata
43 | // about a certain peer in the network. If an info retrieval function is set,
44 | // but returns nil, it is assumed that the protocol handshake is still running.
45 | PeerInfo func(id discover.NodeID) interface{}
46 | }
47 |
48 | // Peer represents a connected remote node.
49 | type Peer struct {
50 | rw *conn
51 | running map[string]*protoRW // the protocols running on this connection
52 | log log.Logger
53 | created mclock.AbsTime
54 |
55 | wg sync.WaitGroup
56 | protoErr chan error
57 | closed chan struct{}
58 | disc chan DiscReason
59 |
60 | // events receives message send / receive events if set
61 | events *event.Feed
62 | }
63 | ```
64 |
65 | Peer creation: matchProtocols finds the protomap of protocols supported by both sides of the connection
66 |
67 | ```go
68 | func newPeer(conn *conn, protocols []Protocol) *Peer {
69 | protomap := matchProtocols(protocols, conn.caps, conn)
70 | p := &Peer{
71 | rw: conn,
72 | running: protomap,
73 | created: mclock.Now(),
74 | disc: make(chan DiscReason),
75 | protoErr: make(chan error, len(protomap)+1), // protocols + pingLoop
76 | closed: make(chan struct{}),
77 | log: log.New("id", conn.id, "conn", conn.flags),
78 | }
79 | return p
80 | }
81 | ```
82 |
83 | Starting the peer launches two goroutines: one for reading, and one for sending periodic pings.
84 |
85 | ```go
86 | func (p *Peer) run() (remoteRequested bool, err error) {
87 | var (
88 | writeStart = make(chan struct{}, 1) // A channel used to control when a write can be made.
89 | writeErr = make(chan error, 1)
90 | readErr = make(chan error, 1)
91 | reason DiscReason // sent to the peer
92 | )
93 | p.wg.Add(2)
94 | go p.readLoop(readErr)
95 | go p.pingLoop()
96 |
97 | // Start all protocol handlers.
98 | writeStart <- struct{}{}
99 | // Start all the protocols.
100 | p.startProtocols(writeStart, writeErr)
101 |
102 | // Wait for an error or disconnect.
103 | loop:
104 | for {
105 | select {
106 | case err = <-writeErr:
107 | // A write finished. Allow the next write to start if
108 | // there was no error.
109 | if err != nil {
110 | reason = DiscNetworkError
111 | break loop
112 | }
113 | writeStart <- struct{}{}
114 | case err = <-readErr:
115 | if r, ok := err.(DiscReason); ok {
116 | remoteRequested = true
117 | reason = r
118 | } else {
119 | reason = DiscNetworkError
120 | }
121 | break loop
122 | case err = <-p.protoErr:
123 | reason = discReasonForError(err)
124 | break loop
125 | case err = <-p.disc:
126 | break loop
127 | }
128 | }
129 |
130 | close(p.closed)
131 | p.rw.close(reason)
132 | p.wg.Wait()
133 | return remoteRequested, err
134 | }
135 | ```
136 |
137 | The startProtocols method iterates over all negotiated protocols.
138 |
139 | ```go
140 | func (p *Peer) startProtocols(writeStart <-chan struct{}, writeErr chan<- error) {
141 | p.wg.Add(len(p.running))
142 | for _, proto := range p.running {
143 | proto := proto
144 | proto.closed = p.closed
145 | proto.wstart = writeStart
146 | proto.werr = writeErr
147 | var rw MsgReadWriter = proto
148 | if p.events != nil {
149 | rw = newMsgEventer(rw, p.events, p.ID(), proto.Name)
150 | }
151 | p.log.Trace(fmt.Sprintf("Starting protocol %s/%d", proto.Name, proto.Version))
152 | // This is equivalent to opening a goroutine for each protocol. Call its Run method.
153 | go func() {
154 | // proto.Run(p, rw) This method should be an infinite loop. If you return, you have encountered an error.
155 | err := proto.Run(p, rw)
156 | if err == nil {
157 | p.log.Trace(fmt.Sprintf("Protocol %s/%d returned", proto.Name, proto.Version))
158 | err = errProtocolReturned
159 | } else if err != io.EOF {
160 | p.log.Trace(fmt.Sprintf("Protocol %s/%d failed", proto.Name, proto.Version), "err", err)
161 | }
162 | p.protoErr <- err
163 | p.wg.Done()
164 | }()
165 | }
166 | }
167 | ```
168 |
169 | Go back and look at the **readLoop** method. It is also an infinite loop: it calls p.rw to read a Msg (this rw is actually the RLPx framing object mentioned earlier, i.e. the object that reads whole frames). The message is then handled according to its type; if the Msg code belongs to a running subprotocol, it is delivered to that protocol's proto.in queue.
170 |
171 | ```go
172 | func (p *Peer) readLoop(errc chan<- error) {
173 | defer p.wg.Done()
174 | for {
175 | msg, err := p.rw.ReadMsg()
176 | if err != nil {
177 | errc <- err
178 | return
179 | }
180 | msg.ReceivedAt = time.Now()
181 | if err = p.handle(msg); err != nil {
182 | errc <- err
183 | return
184 | }
185 | }
186 | }
187 |
188 |
189 | func (p *Peer) handle(msg Msg) error {
190 | switch {
191 | case msg.Code == pingMsg:
192 | msg.Discard()
193 | go SendItems(p.rw, pongMsg)
194 | case msg.Code == discMsg:
195 | var reason [1]DiscReason
196 | // This is the last message. We don't need to discard or
197 | // check errors because the connection will be closed after it.
198 | rlp.Decode(msg.Payload, &reason)
199 | return reason[0]
200 | case msg.Code < baseProtocolLength:
201 | // ignore other base protocol messages
202 | return msg.Discard()
203 | default:
204 | // it's a subprotocol message
205 | proto, err := p.getProto(msg.Code)
206 | if err != nil {
207 | return fmt.Errorf("msg code out of range: %v", msg.Code)
208 | }
209 | select {
210 | case proto.in <- msg:
211 | return nil
212 | case <-p.closed:
213 | return io.EOF
214 | }
215 | }
216 | return nil
217 | }
218 | ```
219 |
220 | Take a look at **pingLoop**. It is very simple: it periodically sends a pingMsg message to the peer.
221 |
222 | ```go
223 | func (p *Peer) pingLoop() {
224 | ping := time.NewTimer(pingInterval)
225 | defer p.wg.Done()
226 | defer ping.Stop()
227 | for {
228 | select {
229 | case <-ping.C:
230 | if err := SendItems(p.rw, pingMsg); err != nil {
231 | p.protoErr <- err
232 | return
233 | }
234 | ping.Reset(pingInterval)
235 | case <-p.closed:
236 | return
237 | }
238 | }
239 | }
240 | ```
241 |
242 | Finally, take a look at the read and write methods of protoRW. You can see that both reads and writes are blocking.
243 |
244 | ```go
245 | func (rw *protoRW) WriteMsg(msg Msg) (err error) {
246 | if msg.Code >= rw.Length {
247 | return newPeerError(errInvalidMsgCode, "not handled")
248 | }
249 | msg.Code += rw.offset
250 | select {
251 | case <-rw.wstart: // Wait until a write is allowed; this serializes writes across all protocols on the connection.
252 | err = rw.w.WriteMsg(msg)
253 | // Report write status back to Peer.run. It will initiate
254 | // shutdown if the error is non-nil and unblock the next write
255 | // otherwise. The calling protocol code should exit for errors
256 | // as well but we don't want to rely on that.
257 | rw.werr <- err
258 | case <-rw.closed:
259 | err = fmt.Errorf("shutting down")
260 | }
261 | return err
262 | }
263 |
264 | func (rw *protoRW) ReadMsg() (Msg, error) {
265 | select {
266 | case msg := <-rw.in:
267 | msg.Code -= rw.offset
268 | return msg, nil
269 | case <-rw.closed:
270 | return Msg{}, io.EOF
271 | }
272 | }
273 | ```
274 |
--------------------------------------------------------------------------------
/ethereum-swarm-introduction.md:
--------------------------------------------------------------------------------
1 | ## Swarm storage introduction
2 |
3 | **Swarm defines 3 crucial notions:**
4 |
5 | 1. chunk
6 | - Chunks are pieces of data of limited size (max 4K)
7 | 2. reference
8 |    - A reference is a unique identifier: the cryptographic hash of the data
9 | 3. manifest
10 | - A manifest is a data structure describing file collections
11 |
12 | **Kademlia topology**
13 | Swarm uses the ethereum devp2p rlpx suite as the transport layer of the underlay network. This allows semi-stable peer connections over TCP with authenticated, encrypted, synchronous data streams.
14 | In a graph with kademlia topology, a path between any two points exists; it can be found using only local decisions at each hop, and it is guaranteed to terminate in no more steps than the depth of the destination plus one.
15 | 
16 |
17 | This distributed hash table stores resource locations throughout the network.
18 |
19 | Kademlia computes the distance between node IDs to organise the routing table by depth.
20 | Exclusive or was chosen because it acts as a distance function between all the node IDs. Specifically:
21 |
22 | - the distance between a node and itself is zero
23 | - it is symmetric: the "distances" calculated from A to B and from B to A are the same
24 | - it follows the triangle inequality: given A, B and C are vertices (points) of a triangle, then the distance from A to B is shorter than (or equal to) the sum of the distance from A to C plus the distance from C to B.
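The three properties above can be checked directly. A minimal sketch, using small integer IDs instead of full 256-bit hashes:

```go
package main

import "fmt"

// distance is the Kademlia XOR metric over node IDs.
func distance(a, b uint64) uint64 { return a ^ b }

func main() {
	a, b, c := uint64(0b1011), uint64(0b1101), uint64(0b0110)

	fmt.Println(distance(a, a) == 0)                             // distance to self is zero
	fmt.Println(distance(a, b) == distance(b, a))                // symmetric
	fmt.Println(distance(a, b) <= distance(a, c)+distance(c, b)) // triangle inequality
}
```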
25 |
26 | Each bit of the node ID refers to a list; every entry in a list holds the data necessary to locate another node (IP, port, node ID).
27 | Every list corresponds to a specific distance from the node. Nodes that can go in the nth list must have a differing nth bit from the node's ID; the first n-1 bits of the candidate ID must match those of the node's ID. This means that it is very easy to populate the first list as 1/2 of the nodes in the network are far away candidates. The next list can use only 1/4 of the nodes in the network (one bit closer than the first), etc.
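The list (bucket) a candidate belongs to is determined by the first bit in which its ID differs from ours, i.e. the length of the shared prefix. A minimal sketch on 8-bit IDs for readability (real node IDs are much longer):

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucketIndex returns which list a candidate node falls into: the number of
// leading bits its ID shares with ours before the first differing bit.
func bucketIndex(self, other uint8) int {
	d := self ^ other
	if d == 0 {
		return -1 // same ID: no bucket
	}
	return bits.LeadingZeros8(d)
}

func main() {
	self := uint8(0b10110100)
	fmt.Println(bucketIndex(self, 0b00110100)) // differs in the first bit: bucket 0 (half the network)
	fmt.Println(bucketIndex(self, 0b11110100)) // shares 1 leading bit: bucket 1 (a quarter)
	fmt.Println(bucketIndex(self, 0b10010100)) // shares 2 leading bits: bucket 2 (an eighth)
}
```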
28 | 
29 |
30 | If a node wants to share a file, it processes the contents of the file, calculating from it a number (hash) that will identify this file within the file-sharing network.
31 | The hashes and the node IDs must be of the same length. It then searches for several nodes whose ID is close to the hash, and has its own IP address stored at those nodes. i.e. it publishes itself as a source for this file. A searching client will use Kademlia to search the network for the node whose ID has the smallest distance to the file hash, then will retrieve the sources list that is stored in that node.
32 |
33 | **Distributed preimage archive**
34 | Distributed hash tables (DHTs) utilise an overlay network to implement a key-value store distributed over the nodes. The basic idea is that the keyspace is mapped onto the overlay address space, and information about an element in the container is to be found with nodes whose address is in the proximity of the key. DHTs for decentralised content addressed storage typically associate content fingerprints with a list of nodes (seeders) who can serve that content. However, the same structure can be used directly: it is not information about the location of content that is stored at the node closest to the address (fingerprint), but the content itself. We call this structure distributed preimage archive (DPA).
35 |
36 | 
37 |
38 | **Replicas held by a set of nearest neighbours**
39 | A chunk is said to be redundantly retrievable of degree n if it is retrievable and would remain so after any n-1 responsible nodes leave the network. In the case of request forwarding failures, one can retry, or start concurrent retrieve requests.
40 | The area of the fully connected neighbourhood defines an area of responsibility. A storer node is responsible for (storing) a chunk if the chunk falls within the node’s area of responsibility. As long as these assumptions hold, each chunk is retrievable even if R−1 storer nodes drop offline simultaneously. Erasure coding is also implemented.
41 |
42 | **Caching and purging Storage**
43 | Since the Swarm has an address-key based retrieval protocol, content will be twice as likely to be requested from a node that is one bit (one proximity bin) closer to the content’s address.
44 | What a node stores is determined by the access count of chunks: when the storage capacity limit is reached, the least recently accessed chunks are removed. This is backed by an incentive system that rewards serving chunks.
45 |
46 | **Synchronisation**
47 | In order to reduce network traffic resulting from receiving chunks from multiple sources, all store requests can go via a confirmation roundtrip. For each peer connection, in both directions, the source peer sends an offeredHashes message containing a batch of hashes it offers to push to the recipient. The recipient responds with a wantedHashes message listing the subset it actually needs.
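The confirmation roundtrip can be sketched as follows. This is an illustrative toy, not the actual Swarm wire types: the recipient of an offeredHashes batch answers with the subset it does not already store, so each chunk is transferred at most once per connection.

```go
package main

import "fmt"

// wantedHashes computes the recipient's reply to an offeredHashes batch:
// only the hashes of chunks not already held locally are requested.
func wantedHashes(offered []string, stored map[string]bool) []string {
	var wanted []string
	for _, h := range offered {
		if !stored[h] {
			wanted = append(wanted, h)
		}
	}
	return wanted
}

func main() {
	offered := []string{"c1", "c2", "c3"} // hypothetical chunk hashes
	stored := map[string]bool{"c2": true} // c2 is already held locally
	fmt.Println(wantedHashes(offered, stored)) // only c1 and c3 are requested
}
```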
48 | 
49 |
50 | **Data layer**
51 | There are 4 different layers of data units relevant to Swarm:
52 |
53 | - message: the p2p RLPx network layer; messages are the data units relevant to the devp2p wire protocols.
54 | - chunk: fixed size data unit of storage in the distributed preimage archive
55 | - file: the smallest unit that is associated with a mime-type and not guaranteed to have integrity unless it is complete. This is the smallest unit semantic to the user, basically a file on a filesystem.
56 | - collection: a mapping of paths to files, represented by the swarm manifest. This layer maps to a file system directory tree. Given trivial routing conventions, a URL can be mapped to files in a standardised way, allowing manifests to mimic site maps/routing tables. As a result, Swarm is able to act as a webserver, a virtual cloud hosting service.
57 |
58 | The actual storage layer of Swarm consists of two main components, the localstore and the netstore. The local store consists of an in-memory fast cache (memory store) and a persistent disk storage (dbstore). The netstore extends the local store into a distributed storage for Swarm and implements the distributed preimage archive (DPA).
59 | 
60 |
61 | **Pss**
62 | pss (Postal Service over Swarm) is a messaging protocol utilizing whisper protocol over Swarm with strong privacy features.
63 | With pss you can send messages to any node in the Swarm network. The messages are routed in the same manner as retrieve requests for chunks. Instead of a chunk hash reference, pss messages specify a destination in the overlay address space independently of the message payload. This destination can describe a specific node if it is a complete overlay address, or a neighbourhood if it is a partially specified one. Up to the destination, the message is relayed through devp2p peer connections using forwarding kademlia (passing messages via semi-permanent peer-to-peer TCP connections between relaying nodes using kademlia routing). Within the destination neighbourhood the message is broadcast using gossip.
64 | By default messages are encrypted, but raw sending can be enabled.
65 |
66 | **Feed**
67 | Since Swarm hashes are content addressed, changes to data will constantly result in changing hashes. Swarm Feeds provide a way to easily overcome this problem and provide a single, persistent, identifier to follow sequential data.
68 | You can think of a Feed as a user’s Twitter account, where he/she posts updates about a particular Topic. In fact, the Feed object is simply defined as:
69 |
70 | ```go
71 | type Feed struct {
72 | Topic Topic
73 | User common.Address
74 | }
75 | ```
76 |
77 | Users can post to any topic. If you know the user’s address and agree on a particular Topic, you can then effectively “follow” that user’s Feed.
78 |
79 | For convenience, feed.NewTopic() provides a way to “merge” a byte array with a string in order to build a Feed Topic out of both. This is used at the API level to create the illusion of subtopics. Building topics this way allows using a random byte array (for example, the hash of a smart contract address) and merging it with a human-readable string such as "photo_id" in order to create a Topic representing the photo on that particular feed. This way, when you have a new photo, you can immediately build a Topic out of it and see if some user posted comments about that photo.
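The merging idea can be sketched as below. This is a simplified illustration modelled on the idea behind feed.NewTopic, not its exact implementation: a related byte array is copied into a fixed-size topic and a human-readable name is XORed over it, so the same (bytes, name) pair always yields the same Topic.

```go
package main

import "fmt"

const topicLength = 32

type Topic [topicLength]byte

// newTopic merges related content bytes with a human-readable name into a
// deterministic fixed-size Topic (illustrative sketch only).
func newTopic(name string, relatedContent []byte) Topic {
	var topic Topic
	copy(topic[:], relatedContent) // truncated or zero-padded to 32 bytes
	for i := 0; i < len(name) && i < topicLength; i++ {
		topic[i] ^= name[i] // fold the name over the content bytes
	}
	return topic
}

func main() {
	contractHash := []byte{0xde, 0xad, 0xbe, 0xef} // hypothetical content hash
	t1 := newTopic("photo_id", contractHash)
	t2 := newTopic("photo_id", contractHash)
	fmt.Println(t1 == t2) // deterministic: same inputs, same topic
}
```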
80 | Feeds are not created, only updated. If a particular Feed (a user, topic combination) has never been posted to, trying to fetch updates will yield nothing.
81 | `POST /bzz-feed:/?topic=&user=&level=&time=