├── README.md ├── blog.txt ├── exercises ├── ex1.py ├── ex2.py ├── ex2b.py ├── ex2c.py ├── ex2d.py ├── ex3.py └── ex4.py └── src ├── db.py ├── rlp.py ├── trie.py └── utils.py /README.md: -------------------------------------------------------------------------------- 1 | understanding_ethereum_trie 2 | =========================== 3 | Repo supporting blog post at http://easythereentropy.wordpress.com/2014/06/04/understanding-the-ethereum-trie/ 4 | 5 | Hopefully, this will help those confused souls who have yet to grasp the trie. 6 | -------------------------------------------------------------------------------- /blog.txt: -------------------------------------------------------------------------------- 1 | The other day I finally got around to reading the entire ethereum yellow paper and to figuring out how the modified Merkle-patricia-tree (trie) works. So let's go through a brief but hopefully complete explanation of the trie, using examples. 2 | 3 | To avoid blockchain bloat, most of the data in the ethereum network is actually not stored on the blockchain itself. Rather, the blockchain stores hashes of RLP encodings of the data, where the hashes can be used as keys to look up the data in an offline key-value store. And since these are cryptographic hashes, we can prove the data is authentic if it hashes to the value stored in the blockchain. Note that RLP (recursive length prefix encoding) is ethereum's home-rolled encoding system, and all values in the database (except hashes) are RLP encoded. 4 | 5 | It is implied that the key-value database used to store ethereum data is a radix tree (trie), with a couple of modifications to boost efficiency. In a normal radix tree, a key is the actual path taken through the tree to get to the corresponding value. 
That is, beginning from the root node of the tree, each character in the key tells you which child node to follow to get to the corresponding value, where the values are stored in the leaf nodes that terminate every path through the tree. Supposing the keys come from an alphabet containing N characters, each node in the tree can have up to N children, and the maximum depth of the tree is the maximum length of a key. 6 | 7 | Radix trees are nice because they allow keys that begin with the same sequence of characters to have values that are closer together in the tree. There are also no key collisions in a trie, like there might be in hash-tables. They can, however, be rather inefficient, like when you have a long key where no other key shares a common prefix. Then you have to travel (and store) a considerable number of nodes in the tree to get to the value, despite there being no other values along the path. 8 | 9 | The ethereum implementation of radix trees introduces a number of improvements. First, to make the tree cryptographically secure, each node is referenced by its hash, which in current implementations is used for look-up in a leveldb database. With this scheme, the hash of the root node becomes a cryptographic fingerprint of the entire data structure (hence, Merkle). Second, a number of node 'types' are introduced to improve efficiency. There is the blank node, which is simply empty, and the standard leaf node, which is a simple list of [key, value]. Then there are extension nodes, which are also simple [key, value] lists, but where value is a hash of some other node. The hash can be used to look-up that node in the database. Finally, there are branch nodes, which are lists of length 17. The first 16 elements correspond to the 16 possible hex characters in a key, and the final element holds a value if there is a [key, value] pair where the key ends at the branch node. If you don't get it yet, don't worry, no one does :D. 
We will work through examples to make it all clear. 10 | 11 | One more important thing is a special hex-prefix (HP) encoding used for keys. As mentioned, the alphabet is hex, so there are 16 possible children for each node. Since there are two kinds of [key, value] nodes (leaf and extension), a special 'terminator' flag is used to denote which type the key refers to. If the terminator flag is on, the key refers to a leaf node, and the corresponding value is the value for that key. If it's off, then the value is a hash to be used to look-up the corresponding node in the db. HP also encodes whether or not the key is of odd or even length. Finally, we note that a single hex character, or 4 bit binary number, is known as a nibble. 12 | 13 | The HP specification is rather simple. A nibble is prepended to the key that encodes both the terminator status and parity. The least significant bit in the nibble encodes parity (on for odd length), while the next lowest encodes terminator status. If the key was in fact even, then we add another nibble, of value 0, right after the flag nibble to maintain overall evenness (so we can properly represent in bytes). 14 | 15 | Ok. So this all sounds fine and dandy, and you probably read about it here or here, or if you're quite brave, here, but let's get down and dirty with some python examples. I've set up a little repo on github that you can clone and follow along with. 16 |


 17 | git clone git@github.com:ebuchman/understanding_ethereum_trie
 18 |
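Before moving on to the exercises, the HP rule described above can be tried in isolation. The following is a minimal sketch, written in Python 3 rather than the Python 2 used by the repo, that follows the same steps as pack_nibbles in src/trie.py. The names bytes_to_nibbles and hp_encode are made up for this illustration and are not part of the repo.

```python
def bytes_to_nibbles(data):
    """Split each byte into its high and low 4-bit halves."""
    nibbles = []
    for byte in data:
        nibbles.extend(divmod(byte, 16))
    return nibbles


def hp_encode(nibbles, is_leaf):
    """Hex-prefix encode a list of nibbles (hypothetical helper)."""
    flags = 2 if is_leaf else 0   # second-lowest bit: terminator flag
    odd = len(nibbles) % 2
    flags |= odd                  # lowest bit: set when the length is odd
    if odd:
        padded = [flags] + nibbles
    else:
        padded = [flags, 0] + nibbles   # extra 0 nibble keeps the total even
    # Pack pairs of nibbles back into bytes.
    return bytes(16 * hi + lo for hi, lo in zip(padded[0::2], padded[1::2]))


key = bytes_to_nibbles(b'\x01\x01\x02')          # [0, 1, 0, 1, 0, 2]
print(hp_encode(key, is_leaf=True).hex())        # 20010102
print(hp_encode(key[:5], is_leaf=False).hex())   # 101010
```

The first result is an even-length leaf key, the second an odd-length extension key. Both byte strings show up again in the exercise output further down.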

19 | Basically I just grabbed the necessary files from the pyethereum repo (trie.py, utils.py, rlp.py, db.py), and wrote a bunch of exercises as short python scripts that you can try out. I also added some print statements to help you see what's going on in trie.py, though due to recursion, this can get messy, so there's a flag at the top of trie.py allowing you to turn printing on/off. Please feel free to improve the print statements and send a pull-request! After cloning, change into the repo's root directory (the scripts add src to the import path relative to it), and run your scripts with python exercises/exA.py, where A is the exercise number. So let's start with ex1.py. 20 | 21 | In ex1.py, we initialize a trie with a blank root, and add a single entry: 22 |


 23 | state = trie.Trie('triedb', trie.BLANK_ROOT)
 24 | state.update('\x01\x01\x02', rlp.encode(['hello']))
 25 | print 'root hash', state.root_hash.encode('hex')
 26 |

27 | Here, we're using '\x01\x01\x02' as the key and 'hello' as the value. The key should be a string (max 32 bytes, typically a big-endian integer or an address), and the value an rlp encoding of arbitrary data. Note we could have used something simpler, like 'dog', as our key, but let's keep it real with raw bytes. We can follow through the code in trie.py to see what happens under the hood. Basically, in this case, since we start with a blank node, trie.py creates a new leaf node (adding the terminator flag to the key), rlp encodes it, takes the hash, and stores [hash, rlp(node)] in the database. The print statement should display the hash, which we can use from now on as the root hash for our trie. Finally, for completeness, we look at the HP encoding of the key: 28 |


 29 | k, v = state.root_node
 30 | print 'root node:', [k, v]
 31 | print 'hp encoded key, in hex:', k.encode('hex')
 32 |

33 | 34 | The output of ex1.py is 35 |


 36 | root hash 15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc
 37 | root node: [' \x01\x01\x02', '\xc6\x85hello']
 38 | hp encoded key, in hex: 20010102
 39 |
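The value half of the root node, '\xc6\x85hello', is the RLP encoding of the list ['hello']. As a rough sketch of where those two prefix bytes come from, here is a Python 3 version of the short-payload branch of encode in src/rlp.py. It deliberately leaves out payloads of 56 bytes or more, and the name rlp_encode is chosen for this illustration only.

```python
def encode_length(length, offset):
    """Single-byte length prefix; only valid for payloads under 56 bytes."""
    if length >= 56:
        raise ValueError("long payloads are not handled in this sketch")
    return bytes([offset + length])


def rlp_encode(item):
    """RLP-encode bytes or (nested) lists of bytes, short payloads only."""
    if isinstance(item, bytes):
        # A single byte below 0x80 is its own encoding.
        if len(item) == 1 and item[0] < 0x80:
            return item
        return encode_length(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b''.join(rlp_encode(x) for x in item)
        return encode_length(len(payload), 0xc0) + payload
    raise TypeError("unsupported type: %s" % type(item))


print(rlp_encode([b'hello']))        # b'\xc6\x85hello'
print(rlp_encode([b'hellothere']))   # b'\xcb\x8ahellothere'
```

Here 0x85 is 0x80 plus the 5 bytes of 'hello', and 0xc6 is 0xc0 plus the 6 bytes of the encoded string inside the list.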

40 | Note the final 6 nibbles are the key we used, 010102, while the first two give us the HP encoding. The first nibble tells us that this is a terminator node (since it would be 10 in binary, so the second least significant bit is on), and since the key was even length (least significant bit is 0), we add a second 0 nibble. 41 | 42 | Moving on to ex2.py, we initialize a trie that starts with the previous hash: 43 |


 44 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
 45 | print state.root_node
 46 |

47 | The print statement should give us the [key, value] pair we previously stored. Great. Let's add some more entries. We're going to try this a few different ways, so we can clearly see the different possibilities. We'll use multiple ex2 python files, initializing the trie from the original hash each time. First, let's make an entry with the same key we already used but a different value. Since the new value will lead to a new hash, we will have two tries, referenced by two different hashes, both starting with the same key (the rest of ex2.py): 48 |


 49 | state.update('\x01\x01\x02', rlp.encode(['hellothere']))
 50 | print state.root_hash.encode('hex')
 51 | print state.root_node
 52 |

53 | The output for ex2.py is: 54 | 05e13d8be09601998499c89846ec5f3101a1ca09373a5f0b74021261af85d396 55 | [' \x01\x01\x02', '\xcb\x8ahellothere'] 56 | 57 | So that's not all that interesting, but it's nice that we didn't overwrite the original entry, and can still access both using their respective hashes. Now, let's add an entry that uses the same key but with a different final nibble (ex2b.py): 58 |


 59 | state.update('\x01\x01\x03', rlp.encode(['hellothere']))
 60 | print 'root hash:', state.root_hash.encode('hex')
 61 | k, v = state.root_node
 62 | print 'root node:', [k, v]
 63 | print 'hp encoded key, in hex:', k.encode('hex')
 64 |

65 | This print 'root node' statement should return something mostly unintelligible. That's because it's giving us a [key, value] node where the key is the common prefix from our two keys ([0,1,0,1,0]), encoded using HP to include a non-terminator flag and an indication that the key is odd-length, and the value is the hash of the rlp encoding of the node we're interested in. That is, it's an extension node. We can use the hash to look up the node in the database: 66 |


 67 | print state._get_node_type(state.root_node) == trie.NODE_TYPE_EXTENSION
 68 | common_prefix_key, node_hash = state.root_node
 69 | print state._decode_to_node(node_hash)
 70 | print state._get_node_type(state._decode_to_node(node_hash)) == trie.NODE_TYPE_BRANCH
 71 |

72 | And the output for ex2b.py: 73 |


 74 | root hash: b5e187f15f1a250e51a78561e29ccfc0a7f48e06d19ce02f98dd61159e81f71d
 75 | root node: ['\x10\x10\x10', '"\x01\xab\x83u\x15o\'\xf7T-h\xde\x94K/\xba\xa3[\x83l\x94\xe7\xb3\x8a\xcf\n\nt\xbb\xef\xd9']
 76 | hp encoded key, in hex: 101010
 77 | True
 78 | ['', '', [' ', '\xc6\x85hello'], [' ', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '', '', '']
 79 | True
 80 |

81 | This result is rather interesting. What we have here is a branch node, a list with 17 entries. Note the difference in our original keys: they both start with [0,1,0,1,0], and one ends in 2 while the other ends in 3. So, when we add the new entry (key ending in 3), the leaf node that previously held the key ending in 2 is replaced by a branch node that sits behind the common prefix of the two keys. Concretely, the root becomes a [key, value] extension node, where key is the HP encoded common prefix and value is the hash of the branch node, which can be used to look up the branch node that it points to. The entry at index 2 of this branch node is the original node with key ending in 2 ('hello'), while the entry at index 3 is the new node ('hellothere'). Since both keys are only one nibble longer than the common prefix, the final nibble is encoded implicitly by the position of the nodes in the branch node. And since that exhausts all the characters in the keys, these nodes are stored with empty keys in the branch node (the ' ' in the output is the HP encoding of an empty key with the terminator flag on). 82 | 83 | You'll note I added a couple of print statements to verify that these nodes are in fact what I say they are - extension and branch nodes, respectively. 84 | 85 | Ok, so that was pretty cool. Let's do it again but with a key equal to the first few nibbles of our original key (ex2c.py): 86 |


 87 | state.update('\x01\x01', rlp.encode(['hellothere']))
 88 |

89 | Again, we see that this results in the creation of a branch node, but something different has happened. The branch node corresponds to the key '\x01\x01', but there is also a value with that key ('hellothere'). Hence, that value is placed in the final (17th) position of the branch node. The other entry, with key '\x01\x01\x02', is placed in the position corresponding to the next nibble in its key, in this case, 0. Since its key hasn't been fully exhausted, we store the leftover nibbles (in this case, just '2') in the key position for the node. Hence the output: 90 |


 91 | [['2', '\xc6\x85hello'], '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '\xcb\x8ahellothere']
 92 |
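The '2' in the key position above is itself an HP encoded key, which is easy to miss. Here is a small Python 3 sketch of the reverse direction, loosely following unpack_to_nibbles in src/trie.py. Unlike the repo, which appends a terminator nibble of 16 to the list, this hypothetical hp_decode returns the leaf flag separately.

```python
def hp_decode(data):
    """Decode a hex-prefix encoded key into (nibbles, is_leaf)."""
    nibbles = []
    for byte in data:
        nibbles.extend(divmod(byte, 16))
    flags = nibbles[0]
    is_leaf = bool(flags & 2)     # terminator flag
    if flags & 1:
        body = nibbles[1:]        # odd length: only the flag nibble is dropped
    else:
        body = nibbles[2:]        # even length: drop flag nibble and padding 0
    return body, is_leaf


print(hp_decode(b'2'))              # ([2], True)
print(hp_decode(b' '))              # ([], True)
print(hp_decode(b'\x10\x10\x10'))   # ([0, 1, 0, 1, 0], False)
```

The character '2' is the byte 0x32: flag nibble 3 (terminator on, odd length) followed by the single leftover nibble 2. The space character ' ' is 0x20, a leaf with no nibbles left, which is what the entries inside the ex2b branch node carry.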

93 | Make sense? Let's do one final component of exercise 2 (ex2d.py). Here, we add a new entry with a key that is identical to the original key, but has an additional two nibbles: 94 |


 95 | state.update('\x01\x01\x02\x57', rlp.encode(['hellothere']))
 96 |

97 | In this case, the opposite of what we just saw happens! The original entry's value is stored at the final position of the branch node, where the key for the branch node is the key for that value ('\x01\x01\x02'). The second entry is stored at the position of its next nibble (5), with a key equal to the remaining nibbles (just 7): 98 |


 99 | ['', '', '', '', '', ['7', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']
100 |
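The three variants of exercise 2 differ only in how the new key relates to the original one. As a quick way to check this by hand, the Python 3 sketch below splits two keys into their shared nibble prefix and the two remainders. It does not touch the trie or the database, and split_keys is a name invented for this illustration.

```python
def bytes_to_nibbles(data):
    """Split each byte into its high and low 4-bit halves."""
    nibbles = []
    for byte in data:
        nibbles.extend(divmod(byte, 16))
    return nibbles


def split_keys(old, new):
    """Return (shared prefix, rest of old key, rest of new key) as nibble lists."""
    a, b = bytes_to_nibbles(old), bytes_to_nibbles(new)
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n], a[n:], b[n:]


original = b'\x01\x01\x02'
print(split_keys(original, b'\x01\x01\x03'))      # ([0, 1, 0, 1, 0], [2], [3])
print(split_keys(original, b'\x01\x01'))          # ([0, 1, 0, 1], [0, 2], [])
print(split_keys(original, b'\x01\x01\x02\x57'))  # ([0, 1, 0, 1, 0, 2], [], [5, 7])
```

In each case the first nibble of a remainder picks the slot in the branch node and whatever follows becomes that entry's own key. An empty remainder means the value goes in the 17th position.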

101 | Tada! Try playing around a bit to make sure you understand what's going on here. Nodes are stored in the database according to the hash of their rlp encoding. Once a node is retrieved, keys are used to travel a path through a further series of nodes (which may involve more hash lookups) to reach the final value. Of course, we've only used two entries in each of these examples to keep things simple, but that has been sufficient to expose the basic mechanic of the trie. We could add more entries to fill up the branch node, but since we already understand how that works, let's move on to something more complicated. In exercise 3, we will add a third entry, which shares a common prefix with the second entry. This one's a little longer, but the result is totally awesome (ex3.py): 102 |


103 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
104 | print state.root_hash.encode('hex')
105 | print state.root_node
106 | print ''
107 | state.update('\x01\x01\x02\x55', rlp.encode(['hellothere']))
108 | print 'root hash:', state.root_hash.encode('hex')
109 | print 'root node:', state.root_node
110 | print 'branch node it points to:', state._decode_to_node(state.root_node[1])
111 | print ''
112 |

113 | Nothing new yet. Initialize from original hash, add a new node with key '\x01\x01\x02\x55'. Creates a branch node and points to it with a hash. We know this. Now the fun stuff: 114 |


115 | state.update('\x01\x01\x02\x57', rlp.encode(['jimbojones']))
116 | print 'root hash:', state.root_hash.encode('hex')
117 | print 'root node:', state.root_node
118 | branch_node = state._decode_to_node(state.root_node[1])
119 | print 'branch node it points to:', branch_node
120 |

121 | We're doing the same thing - add a new node, this time with key '\x01\x01\x02\x57' and value 'jimbojones'. But now, in our branch node, where there used to be a node with value 'hellothere' (ie. at index 5), there is a messy ole hash! What do we do with hashes in tries? We use em to look up more nodes, of course! 122 |


123 | next_hash = branch_node[5]
124 | print 'hash stored in branch node:', next_hash.encode('hex')
125 | print 'branch node it points to:', state._decode_to_node(next_hash)
126 |

127 | And the output: 128 |


129 | root hash: 17fe8af9c6e73de00ed5fd45d07e88b0c852da5dd4ee43870a26c39fc0ec6fb3
130 | root node: ['\x00\x01\x01\x02', '\r\xca6X\xe5T\xd0\xbd\xf6\xd7\x19@\xd1E\t\x8ehW\x03\x8a\xbd\xa3\xb2\x92!\xae{2\x1bp\x06\xbb']
131 | branch node it points to: ['', '', '', '', '', ['5', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']
132 | 
133 | root hash: fcb2e3098029e816b04d99d7e1bba22d7b77336f9fe8604f2adfb04bcf04a727
134 | root node: ['\x00\x01\x01\x02', '\xd5/\xaf\x1f\xdeO!u>&3h_+\xac?\xf1\xf3*\xb7)3\xec\xe9\xd5\x9f2\xcaoc\x95m']
135 | branch node it points to: ['', '', '', '', '', '\x00&\x15\xb7\xc4\x05\xf6\xf3F2\x9a(N\x8f\xb2H\xe75\xcf\xfa\x89C-\xab\xa2\x9eV\xe4\x14\xdfl0', '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']
136 | hash stored in branch node: 002615b7c405f6f346329a284e8fb248e735cffa89432daba29e56e414df6c30
137 | branch node it points to: ['', '', '', '', '', [' ', '\xcb\x8ahellothere'], '', [' ', '\xcb\x8ajimbojones'], '', '', '', '', '', '', '', '', '']
138 |

139 | Tada! So this hash, which corresponds to key [0,1,0,1,0,2,5], points to another branch node which holds our values 'hellothere' and 'jimbojones' at the appropriate positions. I recommend experimenting a little further by adding some new entries, specifically, try filling in the final branch node some more, including the last position. 140 | 141 | Ok! So this has been pretty cool. Hopefully by now you have a pretty solid understanding of how the trie works, the HP encoding, the different node types, and how the nodes are connected and refer to each other. As a final exercise, let's do some look-ups. 142 |


143 | state = trie.Trie('triedb', 'b5e187f15f1a250e51a78561e29ccfc0a7f48e06d19ce02f98dd61159e81f71d'.decode('hex'))
144 | print 'using root hash from ex2b'
145 | print rlp.decode(state.get('\x01\x01\x03'))                          
146 | print ''
147 | state = trie.Trie('triedb', 'fcb2e3098029e816b04d99d7e1bba22d7b77336f9fe8604f2adfb04bcf04a727'.decode('hex'))
148 | print 'using root hash from ex3'                                     
149 | print rlp.decode(state.get('\x01\x01\x02'))
150 | print rlp.decode(state.get('\x01\x01\x02\x55'))
151 | print rlp.decode(state.get('\x01\x01\x02\x57'))        
152 |
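To see what get does on the way to those values, here is a toy Python 3 walk over a hand-built copy of the ex3 trie. It is only a sketch under simplifying assumptions: child nodes are nested directly as Python lists where the real trie would store a hash and fetch the node from the database, values are plain bytes rather than RLP encodings, and the helper names are invented for this illustration.

```python
BLANK = b''


def bytes_to_nibbles(data):
    nibbles = []
    for byte in data:
        nibbles.extend(divmod(byte, 16))
    return nibbles


def hp_decode(data):
    """Decode a hex-prefix encoded key into (nibbles, is_leaf)."""
    nibbles = bytes_to_nibbles(data)
    flags = nibbles[0]
    body = nibbles[1:] if flags & 1 else nibbles[2:]
    return body, bool(flags & 2)


def get(node, key):
    """Follow the nibble list `key` down from `node`; BLANK if nothing is found."""
    if node == BLANK:
        return BLANK
    if len(node) == 17:                      # branch node
        if not key:
            return node[16]                  # value stored at the branch itself
        return get(node[key[0]], key[1:])
    path, is_leaf = hp_decode(node[0])       # [key, value] node
    if is_leaf:
        return node[1] if key == path else BLANK
    if key[:len(path)] == path:              # extension node: consume its path
        return get(node[1], key[len(path):])
    return BLANK


# Hand-built model of the trie after ex3 (hashes replaced by nesting).
inner = [BLANK] * 17
inner[5] = [b' ', b'hellothere']
inner[7] = [b' ', b'jimbojones']
outer = [BLANK] * 17
outer[5] = inner
outer[16] = b'hello'
root = [b'\x00\x01\x01\x02', outer]

for k in (b'\x01\x01\x02', b'\x01\x01\x02\x55', b'\x01\x01\x02\x57'):
    print(get(root, bytes_to_nibbles(k)))
```

The three lookups print b'hello', b'hellothere' and b'jimbojones'. A key that is not in the trie, such as '\x01\x01\x02\x56', ends at a blank slot and returns the empty value.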

153 | You should see the values we stored in previous exercises. 154 | 155 | And that's that! Now, you might wonder, "so, how is all this trie stuff actually used in ethereum?" Great question. And my repository does not have the solutions. But if you clone the official pyethereum repo, and do a quick grep -r 'Trie' . , it should clue you in. What we find is that a trie is used in two key places: to encode transaction lists in a block, and to encode the state of a block. For transactions, the keys are big-endian integers representing the transaction count in the current block. For the state trie, the keys are ethereum addresses. 156 | 157 | There you have it folks. We have achieved an understanding of the ethereum trie. Now go forth, and trie it! 158 | -------------------------------------------------------------------------------- /exercises/ex1.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.append('src') 3 | import trie, utils, rlp 4 | 5 | #initialize trie 6 | state = trie.Trie('triedb', trie.BLANK_ROOT) 7 | state.update('\x01\x01\x02', rlp.encode(['hello'])) 8 | print 'root hash', state.root_hash.encode('hex') 9 | k, v = state.root_node 10 | print 'root node:', [k, v] 11 | print 'hp encoded key, in hex:', k.encode('hex') 12 | -------------------------------------------------------------------------------- /exercises/ex2.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.append('src') 3 | import trie, utils, rlp 4 | 5 | #initialize trie from previous hash; add new entry with same key. 
6 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex')) 7 | print state.root_hash.encode('hex') 8 | print state.root_node 9 | print '' 10 | state.update('\x01\x01\x02', rlp.encode(['hellothere'])) 11 | print state.root_hash.encode('hex') 12 | print state.root_node 13 | # we now have two tries, addressed in the database by their respective hashes, though they each have the same key 14 | -------------------------------------------------------------------------------- /exercises/ex2b.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.append('src') 3 | import trie, utils, rlp 4 | 5 | #initialize trie from previous hash; add new [key, value] where key has common prefix 6 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex')) 7 | print state.root_hash.encode('hex') 8 | print state.root_node 9 | print '' 10 | state.update('\x01\x01\x03', rlp.encode(['hellothere'])) 11 | print 'root hash:', state.root_hash.encode('hex') 12 | k, v = state.root_node 13 | print 'root node:', [k, v] 14 | print 'hp encoded key, in hex:', k.encode('hex') 15 | print state._get_node_type(state.root_node) == trie.NODE_TYPE_EXTENSION 16 | common_prefix_key, node_hash = state.root_node 17 | print state._decode_to_node(node_hash) 18 | print state._get_node_type(state._decode_to_node(node_hash)) == trie.NODE_TYPE_BRANCH 19 | -------------------------------------------------------------------------------- /exercises/ex2c.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.append('src') 3 | import trie, utils, rlp 4 | 5 | #initialize trie from previous hash; add new [key, value] where key has common prefix 6 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex')) 7 | print state.root_hash.encode('hex') 8 | print 
state.root_node 9 | print '' 10 | state.update('\x01\x01', rlp.encode(['hellothere'])) 11 | print 'root hash:', state.root_hash.encode('hex') 12 | print 'root node:', state.root_node 13 | print state._decode_to_node(state.root_node[1]) 14 | -------------------------------------------------------------------------------- /exercises/ex2d.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.append('src') 3 | import trie, utils, rlp 4 | 5 | #initialize trie from previous hash; add new [key, value] where key has common prefix 6 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex')) 7 | print state.root_hash.encode('hex') 8 | print state.root_node 9 | print '' 10 | state.update('\x01\x01\x02\x55', rlp.encode(['hellothere'])) 11 | print 'root hash:', state.root_hash.encode('hex') 12 | print 'root node:', state.root_node 13 | print state._decode_to_node(state.root_node[1]) 14 | -------------------------------------------------------------------------------- /exercises/ex3.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.append('src') 3 | import trie, utils, rlp 4 | 5 | #initialize trie from previous hash; add new [key, value] where key has common prefix 6 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex')) 7 | print state.root_hash.encode('hex') 8 | print state.root_node 9 | print '' 10 | state.update('\x01\x01\x02\x55', rlp.encode(['hellothere'])) 11 | print 'root hash:', state.root_hash.encode('hex') 12 | print 'root node:', state.root_node 13 | print 'branch node it points to:', state._decode_to_node(state.root_node[1]) 14 | print '' 15 | state.update('\x01\x01\x02\x57', rlp.encode(['jimbojones'])) 16 | print 'root hash:', state.root_hash.encode('hex') 17 | print 'root node:', state.root_node 18 | branch_node = 
state._decode_to_node(state.root_node[1]) 19 | print 'branch node it points to:', branch_node 20 | next_hash = branch_node[5] 21 | print 'hash stored in branch node:', next_hash.encode('hex') 22 | print 'branch node it points to:', state._decode_to_node(next_hash) 23 | -------------------------------------------------------------------------------- /exercises/ex4.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.append('src') 3 | import trie, utils, rlp 4 | 5 | #initialize trie from previous hash; add new [key, value] where key has common prefix 6 | state = trie.Trie('triedb', 'b5e187f15f1a250e51a78561e29ccfc0a7f48e06d19ce02f98dd61159e81f71d'.decode('hex')) 7 | print 'using root hash from ex2b' 8 | print rlp.decode(state.get('\x01\x01\x03')) 9 | print '' 10 | state = trie.Trie('triedb', 'fcb2e3098029e816b04d99d7e1bba22d7b77336f9fe8604f2adfb04bcf04a727'.decode('hex')) 11 | print 'using root hash from ex3' 12 | print rlp.decode(state.get('\x01\x01\x02')) 13 | print rlp.decode(state.get('\x01\x01\x02\x55')) 14 | print rlp.decode(state.get('\x01\x01\x02\x57')) 15 | -------------------------------------------------------------------------------- /src/db.py: -------------------------------------------------------------------------------- 1 | import leveldb 2 | import threading 3 | 4 | databases = {} 5 | 6 | 7 | class DB(object): 8 | 9 | def __init__(self, dbfile): 10 | self.dbfile = dbfile 11 | if dbfile not in databases: 12 | databases[dbfile] = ( 13 | leveldb.LevelDB(dbfile), dict(), threading.Lock()) 14 | self.db, self.uncommitted, self.lock = databases[dbfile] 15 | 16 | def get(self, key): 17 | if key in self.uncommitted: 18 | return self.uncommitted[key] 19 | return self.db.Get(key) 20 | 21 | def put(self, key, value): 22 | with self.lock: 23 | self.uncommitted[key] = value 24 | 25 | def commit(self): 26 | with self.lock: 27 | batch = leveldb.WriteBatch() 28 | for k, v in self.uncommitted.iteritems(): 29 
| batch.Put(k, v) 30 | self.db.Write(batch, sync=True) 31 | self.uncommitted.clear() 32 | 33 | def delete(self, key): 34 | with self.lock: 35 | if key in self.uncommitted: 36 | del self.uncommitted[key] 37 | if key not in self: 38 | self.db.Delete(key) 39 | else: 40 | self.db.Delete(key) 41 | 42 | def _has_key(self, key): 43 | try: 44 | self.get(key) 45 | return True 46 | except KeyError: 47 | return False 48 | 49 | def __contains__(self, key): 50 | return self._has_key(key) 51 | 52 | def __eq__(self, other): 53 | return isinstance(other, self.__class__) and self.db == other.db 54 | -------------------------------------------------------------------------------- /src/rlp.py: -------------------------------------------------------------------------------- 1 | ''' 2 | First byte of an encoded item 3 | 4 | x: single byte, itself 5 | | 6 | | 7 | 0x7f == 127 8 | 9 | 0x80 == 128 10 | | 11 | x: [0, 55] byte long string, x-0x80 == length 12 | | 13 | 0xb7 == 183 14 | 15 | 0xb8 == 184 16 | | 17 | x: [55, ] long string, x-0xf8 == length of the length 18 | | 19 | 0xbf == 191 20 | 21 | 0xc0 == 192 22 | | 23 | x: [0, 55] byte long list, x-0xc0 == length 24 | | 25 | 0xf7 == 247 26 | 27 | 0xf8 == 248 28 | | 29 | x: [55, ] long list, x-0xf8 == length of the length 30 | | 31 | 0xff == 255 32 | ''' 33 | 34 | 35 | def int_to_big_endian(integer): 36 | '''convert a integer to big endian binary string''' 37 | # 0 is a special case, treated same as '' 38 | if integer == 0: 39 | return '' 40 | s = '%x' % integer 41 | if len(s) & 1: 42 | s = '0' + s 43 | return s.decode('hex') 44 | 45 | 46 | def big_endian_to_int(string): 47 | '''convert a big endian binary string to integer''' 48 | # '' is a special case, treated same as 0 49 | string = string or '\x00' 50 | s = string.encode('hex') 51 | return long(s, 16) 52 | 53 | 54 | def __decode(s, pos=0): 55 | ''' decode string start at `pos` 56 | :param s: string of rlp encoded data 57 | :param pos: start position of `s` to decode from 58 | :return: 
59 | o: decoded object 60 | pos: end position of the obj in the string of rlp encoded data 61 | ''' 62 | assert pos < len(s), "read beyond end of string in __decode" 63 | 64 | fchar = ord(s[pos]) 65 | if fchar < 128: 66 | return (s[pos], pos + 1) 67 | elif fchar < 184: 68 | b = fchar - 128 69 | return (s[pos + 1:pos + 1 + b], pos + 1 + b) 70 | elif fchar < 192: 71 | b = fchar - 183 72 | b2 = big_endian_to_int(s[pos + 1:pos + 1 + b]) 73 | return (s[pos + 1 + b:pos + 1 + b + b2], pos + 1 + b + b2) 74 | elif fchar < 248: 75 | o = [] 76 | pos += 1 77 | pos_end = pos + fchar - 192 78 | 79 | while pos < pos_end: 80 | obj, pos = __decode(s, pos) 81 | o.append(obj) 82 | assert pos == pos_end, "read beyond list boundary in __decode" 83 | return (o, pos) 84 | else: 85 | b = fchar - 247 86 | b2 = big_endian_to_int(s[pos + 1:pos + 1 + b]) 87 | o = [] 88 | pos += 1 + b 89 | pos_end = pos + b2 90 | while pos < pos_end: 91 | obj, pos = __decode(s, pos) 92 | o.append(obj) 93 | assert pos == pos_end, "read beyond list boundary in __decode" 94 | return (o, pos) 95 | 96 | 97 | def decode(s): 98 | assert isinstance(s, str) 99 | if s: 100 | return __decode(s)[0] 101 | 102 | 103 | def into(data, pos): 104 | fchar = ord(data[pos]) 105 | if fchar < 192: 106 | raise Exception("Cannot descend further") 107 | elif fchar < 248: 108 | return pos + 1 109 | else: 110 | return pos + 1 + (fchar - 247) 111 | 112 | 113 | def next_item_pos(data, pos): 114 | '''get position of next item in the encoded list or string: 115 | 116 | if list, then get next item's start position 117 | if string, then get next charactor's postion 118 | 119 | :param data: rlp encoded from list or string 120 | :pos: current item's position 121 | ''' 122 | fchar = ord(data[pos]) 123 | if fchar < 128: 124 | return pos + 1 125 | elif (fchar % 64) < 56: 126 | return pos + 1 + (fchar % 64) 127 | else: 128 | b = (fchar % 64) - 55 129 | b2 = big_endian_to_int(data[pos + 1:pos + 1 + b]) 130 | return pos + 1 + b + b2 131 | 132 | 133 | 
def descend(data, *indices): 134 | pos = 0 135 | for i in indices: 136 | finish_pos = next_item_pos(data, pos) 137 | pos = into(data, pos) 138 | for j in range(i): 139 | pos = next_item_pos(data, pos) 140 | if pos >= finish_pos: 141 | raise Exception("End of list") 142 | return data[pos: finish_pos] 143 | 144 | 145 | def encode_length(L, offset): 146 | if L < 56: 147 | return chr(L + offset) 148 | elif L < 256 ** 8: 149 | BL = int_to_big_endian(L) 150 | return chr(len(BL) + offset + 55) + BL 151 | else: 152 | raise Exception("input too long") 153 | 154 | 155 | def encode(s): 156 | if isinstance(s, (str, unicode)): 157 | s = str(s) 158 | if len(s) == 1 and ord(s) < 128: 159 | return s 160 | else: 161 | return encode_length(len(s), 128) + s 162 | elif isinstance(s, list): 163 | return concat(map(encode, s)) 164 | 165 | raise TypeError("Encoding of %s not supported" % type(s)) 166 | 167 | 168 | def concat(s): 169 | ''' 170 | :param s: a list, each item is a string of a rlp encoded data 171 | ''' 172 | assert isinstance(s, list) 173 | output = ''.join(s) 174 | return encode_length(len(output), 192) + output 175 | -------------------------------------------------------------------------------- /src/trie.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import os 4 | import rlp 5 | import utils 6 | import db 7 | 8 | DB = db.DB 9 | 10 | PRINT = 0 #change to 1 to turn on printing 11 | 12 | def bin_to_nibbles(s): 13 | """convert string s to nibbles (half-bytes) 14 | 15 | >>> bin_to_nibbles("") 16 | [] 17 | >>> bin_to_nibbles("h") 18 | [6, 8] 19 | >>> bin_to_nibbles("he") 20 | [6, 8, 6, 5] 21 | >>> bin_to_nibbles("hello") 22 | [6, 8, 6, 5, 6, 12, 6, 12, 6, 15] 23 | """ 24 | res = [] 25 | for x in s: 26 | res += divmod(ord(x), 16) 27 | return res 28 | 29 | 30 | def nibbles_to_bin(nibbles): 31 | if any(x > 15 or x < 0 for x in nibbles): 32 | raise Exception("nibbles can only be [0,..15]") 33 | 34 | if 
len(nibbles) % 2: 35 | raise Exception("nibbles must be of even numbers") 36 | 37 | res = '' 38 | for i in range(0, len(nibbles), 2): 39 | res += chr(16 * nibbles[i] + nibbles[i + 1]) 40 | return res 41 | 42 | 43 | NIBBLE_TERMINATOR = 16 44 | 45 | 46 | def with_terminator(nibbles): 47 | nibbles = nibbles[:] 48 | if not nibbles or nibbles[-1] != NIBBLE_TERMINATOR: 49 | nibbles.append(NIBBLE_TERMINATOR) 50 | return nibbles 51 | 52 | 53 | def without_terminator(nibbles): 54 | nibbles = nibbles[:] 55 | if nibbles and nibbles[-1] == NIBBLE_TERMINATOR: 56 | del nibbles[-1] 57 | return nibbles 58 | 59 | 60 | def adapt_terminator(nibbles, has_terminator): 61 | if has_terminator: 62 | return with_terminator(nibbles) 63 | else: 64 | return without_terminator(nibbles) 65 | 66 | 67 | def pack_nibbles(nibbles): 68 | """pack nibbles to binary 69 | 70 | :param nibbles: a nibbles sequence. may have a terminator 71 | """ 72 | 73 | if nibbles[-1:] == [NIBBLE_TERMINATOR]: 74 | flags = 2 75 | nibbles = nibbles[:-1] 76 | else: 77 | flags = 0 78 | 79 | oddlen = len(nibbles) % 2 80 | flags |= oddlen # set lowest bit if odd number of nibbles 81 | if oddlen: 82 | nibbles = [flags] + nibbles 83 | else: 84 | nibbles = [flags, 0] + nibbles 85 | o = '' 86 | for i in range(0, len(nibbles), 2): 87 | o += chr(16 * nibbles[i] + nibbles[i + 1]) 88 | return o 89 | 90 | 91 | def unpack_to_nibbles(bindata): 92 | """unpack packed binary data to nibbles 93 | 94 | :param bindata: binary packed from nibbles 95 | :return: nibbles sequence, may have a terminator 96 | """ 97 | o = bin_to_nibbles(bindata) 98 | flags = o[0] 99 | if flags & 2: 100 | o.append(NIBBLE_TERMINATOR) 101 | if flags & 1 == 1: 102 | o = o[1:] 103 | else: 104 | o = o[2:] 105 | return o 106 | 107 | 108 | def starts_with(full, part): 109 | ''' test whether the items in the part is 110 | the leading items of the full 111 | ''' 112 | if len(full) < len(part): 113 | return False 114 | return full[:len(part)] == part 115 | 116 | 117 | ( 118 | 
NODE_TYPE_BLANK, 119 | NODE_TYPE_LEAF, 120 | NODE_TYPE_EXTENSION, 121 | NODE_TYPE_BRANCH 122 | ) = tuple(range(4)) 123 | 124 | 125 | def is_key_value_type(node_type): 126 | return node_type in [NODE_TYPE_LEAF, 127 | NODE_TYPE_EXTENSION] 128 | 129 | BLANK_NODE = '' 130 | BLANK_ROOT = '' 131 | 132 | 133 | class Trie(object): 134 | 135 | def __init__(self, dbfile, root_hash=BLANK_ROOT): 136 | '''it also present a dictionary like interface 137 | 138 | :param dbfile: key value database 139 | :root: blank or trie node in form of [key, value] or [v0,v1..v15,v] 140 | ''' 141 | dbfile = os.path.abspath(dbfile) 142 | self.db = DB(dbfile) 143 | self.set_root_hash(root_hash) 144 | 145 | @property 146 | def root_hash(self): 147 | '''always empty or a 32 bytes string 148 | ''' 149 | return self.get_root_hash() 150 | 151 | def get_root_hash(self): 152 | if self.root_node == BLANK_NODE: 153 | return BLANK_ROOT 154 | assert isinstance(self.root_node, list) 155 | val = rlp.encode(self.root_node) 156 | key = utils.sha3(val) 157 | self.db.put(key, val) 158 | return key 159 | 160 | @root_hash.setter 161 | def root_hash(self, value): 162 | self.set_root_hash(value) 163 | 164 | def set_root_hash(self, root_hash): 165 | if root_hash == BLANK_ROOT: 166 | self.root_node = BLANK_NODE 167 | return 168 | assert isinstance(root_hash, (str, unicode)) 169 | assert len(root_hash) in [0, 32] 170 | self.root_node = self._decode_to_node(root_hash) 171 | 172 | def clear(self): 173 | ''' clear all tree data 174 | ''' 175 | self._delete_child_stroage(self.root_node) 176 | self._delete_node_storage(self.root_node) 177 | self.db.commit() 178 | self.root_node = BLANK_NODE 179 | 180 | def _delete_child_stroage(self, node): 181 | node_type = self._get_node_type(node) 182 | if node_type == NODE_TYPE_BRANCH: 183 | for item in node[:16]: 184 | self._delete_child_stroage(self._decode_to_node(item)) 185 | elif is_key_value_type(node_type): 186 | node_type = self._get_node_type(node) 187 | if node_type == 
NODE_TYPE_EXTENSION: 188 | self._delete_child_stroage(self._decode_to_node(node[1])) 189 | 190 | def _encode_node(self, node): 191 | if node == BLANK_NODE: 192 | return BLANK_NODE 193 | assert isinstance(node, list) 194 | rlpnode = rlp.encode(node) 195 | if len(rlpnode) < 32: 196 | return node 197 | 198 | hashkey = utils.sha3(rlpnode) 199 | self.db.put(hashkey, rlpnode) 200 | return hashkey 201 | 202 | def _decode_to_node(self, encoded): 203 | if encoded == BLANK_NODE: 204 | return BLANK_NODE 205 | if isinstance(encoded, list): 206 | return encoded 207 | return rlp.decode(self.db.get(encoded)) 208 | 209 | def _get_node_type(self, node): 210 | ''' get node type and content 211 | 212 | :param node: node in form of list, or BLANK_NODE 213 | :return: node type 214 | ''' 215 | if node == BLANK_NODE: 216 | return NODE_TYPE_BLANK 217 | 218 | if len(node) == 2: 219 | nibbles = unpack_to_nibbles(node[0]) 220 | has_terminator = (nibbles and nibbles[-1] == NIBBLE_TERMINATOR) 221 | return NODE_TYPE_LEAF if has_terminator\ 222 | else NODE_TYPE_EXTENSION 223 | if len(node) == 17: 224 | return NODE_TYPE_BRANCH 225 | 226 | def _get(self, node, key): 227 | """ get value inside a node 228 | 229 | :param node: node in form of list, or BLANK_NODE 230 | :param key: nibble list without terminator 231 | :return: 232 | BLANK_NODE if does not exist, otherwise value or hash 233 | """ 234 | node_type = self._get_node_type(node) 235 | if node_type == NODE_TYPE_BLANK: 236 | return BLANK_NODE 237 | 238 | if node_type == NODE_TYPE_BRANCH: 239 | # already reach the expected node 240 | if not key: 241 | return node[-1] 242 | sub_node = self._decode_to_node(node[key[0]]) 243 | return self._get(sub_node, key[1:]) 244 | 245 | # key value node 246 | curr_key = without_terminator(unpack_to_nibbles(node[0])) 247 | if node_type == NODE_TYPE_LEAF: 248 | return node[1] if key == curr_key else BLANK_NODE 249 | 250 | if node_type == NODE_TYPE_EXTENSION: 251 | # traverse child nodes 252 | if starts_with(key, 
curr_key): 253 | sub_node = self._decode_to_node(node[1]) 254 | return self._get(sub_node, key[len(curr_key):]) 255 | else: 256 | return BLANK_NODE 257 | 258 | def _update(self, node, key, value): 259 | """ update item inside a node 260 | 261 | :param node: node in form of list, or BLANK_NODE 262 | :param key: nibble list without terminator 263 | .. note:: key may be [] 264 | :param value: value string 265 | :return: new node 266 | 267 | if this node is changed to a new node, it's parent will take the 268 | responsibility to *store* the new node storage, and delete the old 269 | node storage 270 | """ 271 | assert value != BLANK_NODE 272 | node_type = self._get_node_type(node) 273 | 274 | if node_type == NODE_TYPE_BLANK: 275 | if PRINT: print 'blank' 276 | return [pack_nibbles(with_terminator(key)), value] 277 | 278 | elif node_type == NODE_TYPE_BRANCH: 279 | if PRINT: print 'branch' 280 | if not key: 281 | if PRINT: print '\tdone', node 282 | node[-1] = value 283 | if PRINT: print '\t', node 284 | 285 | else: 286 | if PRINT: print 'recursive branch' 287 | if PRINT: print '\t', node, key, value 288 | new_node = self._update_and_delete_storage( 289 | self._decode_to_node(node[key[0]]), 290 | key[1:], value) 291 | if PRINT: print '\t', new_node 292 | node[key[0]] = self._encode_node(new_node) 293 | if PRINT: print '\t', node 294 | return node 295 | 296 | elif is_key_value_type(node_type): 297 | if PRINT: print 'kv' 298 | return self._update_kv_node(node, key, value) 299 | 300 | def _update_and_delete_storage(self, node, key, value): 301 | old_node = node[:] 302 | new_node = self._update(node, key, value) 303 | if old_node != new_node: 304 | self._delete_node_storage(old_node) 305 | return new_node 306 | 307 | def _update_kv_node(self, node, key, value): 308 | node_type = self._get_node_type(node) 309 | curr_key = without_terminator(unpack_to_nibbles(node[0])) 310 | is_inner = node_type == NODE_TYPE_EXTENSION 311 | if PRINT: print 'this node is an extension node?', 
is_inner 312 | if PRINT: print 'cur key, next key', curr_key, key 313 | 314 | # find longest common prefix 315 | prefix_length = 0 316 | for i in range(min(len(curr_key), len(key))): 317 | if key[i] != curr_key[i]: 318 | break 319 | prefix_length = i + 1 320 | 321 | remain_key = key[prefix_length:] 322 | remain_curr_key = curr_key[prefix_length:] 323 | 324 | if PRINT: print 'remain keys..' 325 | if PRINT: print prefix_length, remain_key, remain_curr_key 326 | 327 | # if the keys were the same, then either this is a terminal node or not. if yes, return [key, value]. if not, its an extension node, so the value of this node points to another node, from which we use remaining key. 328 | 329 | if remain_key == [] == remain_curr_key: 330 | if PRINT: print 'keys were same', node[0], key 331 | if not is_inner: 332 | if PRINT: print 'not an extension node' 333 | return [node[0], value] 334 | if PRINT: print 'yes an extension node!' 335 | new_node = self._update_and_delete_storage( 336 | self._decode_to_node(node[1]), remain_key, value) 337 | 338 | elif remain_curr_key == []: 339 | if PRINT: print 'old key exhausted' 340 | if is_inner: 341 | if PRINT: print '\t is extension', self._decode_to_node(node[1]) 342 | new_node = self._update_and_delete_storage( 343 | self._decode_to_node(node[1]), remain_key, value) 344 | else: 345 | if PRINT: print '\tnew branch' 346 | new_node = [BLANK_NODE] * 17 347 | new_node[-1] = node[1] 348 | new_node[remain_key[0]] = self._encode_node([ 349 | pack_nibbles(with_terminator(remain_key[1:])), 350 | value 351 | ]) 352 | if PRINT: print new_node 353 | else: 354 | if PRINT: print 'making a branch' 355 | new_node = [BLANK_NODE] * 17 356 | if len(remain_curr_key) == 1 and is_inner: 357 | if PRINT: print 'key done and is inner' 358 | new_node[remain_curr_key[0]] = node[1] 359 | else: 360 | if PRINT: print 'key not done or not inner', node, key, value 361 | if PRINT: print remain_curr_key 362 | new_node[remain_curr_key[0]] = self._encode_node([ 363 | 
pack_nibbles( 364 | adapt_terminator(remain_curr_key[1:], not is_inner) 365 | ), 366 | node[1] 367 | ]) 368 | 369 | if remain_key == []: 370 | new_node[-1] = value 371 | else: 372 | new_node[remain_key[0]] = self._encode_node([ 373 | pack_nibbles(with_terminator(remain_key[1:])), value 374 | ]) 375 | if PRINT: print new_node 376 | 377 | if prefix_length: 378 | # create node for key prefix 379 | if PRINT: print 'prefix length', prefix_length 380 | new_node = [pack_nibbles(curr_key[:prefix_length]), 381 | self._encode_node(new_node)] 382 | if PRINT: print 'new node type', self._get_node_type(new_node) 383 | return new_node 384 | else: 385 | return new_node 386 | 387 | def _delete_node_storage(self, node): 388 | '''delete storage 389 | :param node: node in form of list, or BLANK_NODE 390 | ''' 391 | if node == BLANK_NODE: 392 | return 393 | assert isinstance(node, list) 394 | encoded = self._encode_node(node) 395 | if len(encoded) < 32: 396 | return 397 | self.db.delete(encoded) 398 | 399 | def _delete(self, node, key): 400 | """ delete item inside a node 401 | 402 | :param node: node in form of list, or BLANK_NODE 403 | :param key: nibble list without terminator 404 | .. 
note:: key may be [] 405 | :return: new node 406 | 407 | if this node is changed to a new node, its parent is responsible 408 | for *storing* the new node and for deleting the old 409 | node's storage 410 | """ 411 | node_type = self._get_node_type(node) 412 | if node_type == NODE_TYPE_BLANK: 413 | return BLANK_NODE 414 | 415 | if node_type == NODE_TYPE_BRANCH: 416 | return self._delete_branch_node(node, key) 417 | 418 | if is_key_value_type(node_type): 419 | return self._delete_kv_node(node, key) 420 | 421 | def _normalize_branch_node(self, node): 422 | '''node should have only one item changed 423 | ''' 424 | not_blank_items_count = sum(1 for x in range(17) if node[x]) 425 | assert not_blank_items_count >= 1 426 | 427 | if not_blank_items_count > 1: 428 | return node 429 | 430 | # now only one item is not blank 431 | not_blank_index = [i for i, item in enumerate(node) if item][0] 432 | 433 | # the value item is not blank 434 | if not_blank_index == 16: 435 | return [pack_nibbles(with_terminator([])), node[16]] 436 | 437 | # normal item is not blank 438 | sub_node = self._decode_to_node(node[not_blank_index]) 439 | sub_node_type = self._get_node_type(sub_node) 440 | 441 | if is_key_value_type(sub_node_type): 442 | # collapse the subnode into this node; note that this node keeps the 443 | # same terminator as the sub node, and the value does not change 444 | new_key = [not_blank_index] + \ 445 | unpack_to_nibbles(sub_node[0]) 446 | return [pack_nibbles(new_key), sub_node[1]] 447 | if sub_node_type == NODE_TYPE_BRANCH: 448 | return [pack_nibbles([not_blank_index]), 449 | self._encode_node(sub_node)] 450 | assert False 451 | 452 | def _delete_and_delete_storage(self, node, key): 453 | old_node = node[:] 454 | new_node = self._delete(node, key) 455 | if old_node != new_node: 456 | self._delete_node_storage(old_node) 457 | return new_node 458 | 459 | def _delete_branch_node(self, node, key): 460 | # already reached the expected node 461 | if not key: 462 | node[-1] = 
BLANK_NODE 463 | return self._normalize_branch_node(node) 464 | 465 | encoded_new_sub_node = self._encode_node( 466 | self._delete_and_delete_storage( 467 | self._decode_to_node(node[key[0]]), key[1:]) 468 | ) 469 | 470 | if encoded_new_sub_node == node[key[0]]: 471 | return node 472 | 473 | node[key[0]] = encoded_new_sub_node 474 | if encoded_new_sub_node == BLANK_NODE: 475 | return self._normalize_branch_node(node) 476 | 477 | return node 478 | 479 | def _delete_kv_node(self, node, key): 480 | node_type = self._get_node_type(node) 481 | assert is_key_value_type(node_type) 482 | curr_key = without_terminator(unpack_to_nibbles(node[0])) 483 | 484 | if not starts_with(key, curr_key): 485 | # key not found 486 | return node 487 | 488 | if node_type == NODE_TYPE_LEAF: 489 | return BLANK_NODE if key == curr_key else node 490 | 491 | # extension (inner key-value) node: recurse into the child 492 | new_sub_node = self._delete_and_delete_storage( 493 | self._decode_to_node(node[1]), key[len(curr_key):]) 494 | 495 | if self._encode_node(new_sub_node) == node[1]: 496 | return node 497 | 498 | # new sub node is BLANK_NODE 499 | if new_sub_node == BLANK_NODE: 500 | return BLANK_NODE 501 | 502 | assert isinstance(new_sub_node, list) 503 | 504 | # new sub node is not blank, is not a value, and has changed 505 | new_sub_node_type = self._get_node_type(new_sub_node) 506 | 507 | if is_key_value_type(new_sub_node_type): 508 | # collapse the subnode into this node; note that this node keeps the 509 | # same terminator as the new sub node, and the value does not change 510 | new_key = curr_key + unpack_to_nibbles(new_sub_node[0]) 511 | return [pack_nibbles(new_key), new_sub_node[1]] 512 | 513 | if new_sub_node_type == NODE_TYPE_BRANCH: 514 | return [pack_nibbles(curr_key), self._encode_node(new_sub_node)] 515 | 516 | # should be no more cases 517 | assert False 518 | 519 | def delete(self, key): 520 | ''' 521 | :param key: a string of length 0 to 32 522 | ''' 523 | if not isinstance(key, (str, unicode)): 524 | raise Exception("Key 
must be string") 525 | 526 | if len(key) > 32: 527 | raise Exception("Max key length is 32") 528 | 529 | self.root_node = self._delete_and_delete_storage( 530 | self.root_node, 531 | bin_to_nibbles(str(key))) 532 | self.get_root_hash() 533 | self.db.commit() 534 | 535 | def _get_size(self, node): 536 | '''Get counts of (key, value) stored in this and the descendant nodes 537 | 538 | :param node: node in form of list, or BLANK_NODE 539 | ''' 540 | if node == BLANK_NODE: 541 | return 0 542 | 543 | node_type = self._get_node_type(node) 544 | 545 | if is_key_value_type(node_type): 546 | value_is_node = node_type == NODE_TYPE_EXTENSION 547 | if value_is_node: 548 | return self._get_size(self._decode_to_node(node[1])) 549 | else: 550 | return 1 551 | elif node_type == NODE_TYPE_BRANCH: 552 | sizes = [self._get_size(self._decode_to_node(node[x])) 553 | for x in range(16)] 554 | sizes = sizes + [1 if node[-1] else 0] 555 | return sum(sizes) 556 | 557 | def _to_dict(self, node): 558 | '''convert (key, value) stored in this and the descendant nodes 559 | to dict items. 560 | 561 | :param node: node in form of list, or BLANK_NODE 562 | 563 | .. 
note:: 564 | 565 | Here key is in full form, rather than key of the individual node 566 | ''' 567 | if node == BLANK_NODE: 568 | return {} 569 | 570 | node_type = self._get_node_type(node) 571 | 572 | if is_key_value_type(node_type): 573 | nibbles = without_terminator(unpack_to_nibbles(node[0])) 574 | key = '+'.join([str(x) for x in nibbles]) 575 | if node_type == NODE_TYPE_EXTENSION: 576 | sub_dict = self._to_dict(self._decode_to_node(node[1])) 577 | else: 578 | sub_dict = {str(NIBBLE_TERMINATOR): node[1]} 579 | 580 | # prepend key of this node to the keys of children 581 | res = {} 582 | for sub_key, sub_value in sub_dict.iteritems(): 583 | full_key = '{0}+{1}'.format(key, sub_key).strip('+') 584 | res[full_key] = sub_value 585 | return res 586 | 587 | elif node_type == NODE_TYPE_BRANCH: 588 | res = {} 589 | for i in range(16): 590 | sub_dict = self._to_dict(self._decode_to_node(node[i])) 591 | 592 | for sub_key, sub_value in sub_dict.iteritems(): 593 | full_key = '{0}+{1}'.format(i, sub_key).strip('+') 594 | res[full_key] = sub_value 595 | 596 | if node[16]: 597 | res[str(NIBBLE_TERMINATOR)] = node[-1] 598 | return res 599 | 600 | def to_dict(self): 601 | d = self._to_dict(self.root_node) 602 | res = {} 603 | for key_str, value in d.iteritems(): 604 | if key_str: 605 | nibbles = [int(x) for x in key_str.split('+')] 606 | else: 607 | nibbles = [] 608 | key = nibbles_to_bin(without_terminator(nibbles)) 609 | res[key] = value 610 | return res 611 | 612 | def get(self, key): 613 | return self._get(self.root_node, bin_to_nibbles(str(key))) 614 | 615 | def __len__(self): 616 | return self._get_size(self.root_node) 617 | 618 | def __getitem__(self, key): 619 | return self.get(key) 620 | 621 | def __setitem__(self, key, value): 622 | return self.update(key, value) 623 | 624 | def __delitem__(self, key): 625 | return self.delete(key) 626 | 627 | def __iter__(self): 628 | return iter(self.to_dict()) 629 | 630 | def __contains__(self, key): 631 | return self.get(key) != 
BLANK_NODE 632 | 633 | def update(self, key, value): 634 | ''' 635 | :param key: a string with length of [0, 32] 636 | :value: a string 637 | ''' 638 | if not isinstance(key, (str, unicode)): 639 | raise Exception("Key must be string") 640 | 641 | if len(key) > 32: 642 | raise Exception("Max key length is 32") 643 | 644 | if not isinstance(value, (str, unicode)): 645 | raise Exception("Value must be string") 646 | 647 | if value == '': 648 | return self.delete(key) 649 | 650 | self.root_node = self._update_and_delete_storage( 651 | self.root_node, 652 | bin_to_nibbles(str(key)), 653 | value) 654 | if PRINT: print 'root hash before db commit', self.get_root_hash().encode('hex') 655 | self.db.commit() 656 | 657 | def root_hash_valid(self): 658 | if self.root_hash == BLANK_ROOT: 659 | return True 660 | return self.root_hash in self.db 661 | 662 | if __name__ == "__main__": 663 | import sys 664 | 665 | def encode_node(nd): 666 | if isinstance(nd, str): 667 | return nd.encode('hex') 668 | else: 669 | return rlp.encode(nd).encode('hex') 670 | 671 | if len(sys.argv) >= 2: 672 | if sys.argv[1] == 'insert': 673 | t = Trie(sys.argv[2], sys.argv[3].decode('hex')) 674 | t.update(sys.argv[4], sys.argv[5]) 675 | print encode_node(t.root_hash) 676 | elif sys.argv[1] == 'get': 677 | t = Trie(sys.argv[2], sys.argv[3].decode('hex')) 678 | print t.get(sys.argv[4]) 679 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import logging.config 3 | from sha3 import sha3_256 4 | from bitcoin import privtopub 5 | import struct 6 | import os 7 | import sys 8 | import rlp 9 | import db 10 | import random 11 | from rlp import big_endian_to_int, int_to_big_endian 12 | 13 | 14 | logger = logging.getLogger(__name__) 15 | 16 | 17 | # decorator 18 | def debug(label): 19 | def deb(f): 20 | def inner(*args, **kwargs): 21 | i = random.randrange(1000000) 
22 | print label, i, 'start', args 23 | x = f(*args, **kwargs) 24 | print label, i, 'end', x 25 | return x 26 | return inner 27 | return deb 28 | 29 | 30 | def sha3(seed): 31 | return sha3_256(seed).digest() 32 | 33 | 34 | def privtoaddr(x): 35 | if len(x) > 32: 36 | x = x.decode('hex') 37 | return sha3(privtopub(x)[1:])[12:].encode('hex') 38 | 39 | 40 | def zpad(x, l): 41 | return '\x00' * max(0, l - len(x)) + x 42 | 43 | 44 | def coerce_addr_to_bin(x): 45 | if isinstance(x, (int, long)): 46 | return zpad(int_to_big_endian(x), 20).encode('hex') 47 | elif len(x) == 40 or len(x) == 0: 48 | return x.decode('hex') 49 | else: 50 | return zpad(x, 20)[-20:] 51 | 52 | 53 | def coerce_addr_to_hex(x): 54 | if isinstance(x, (int, long)): 55 | return zpad(int_to_big_endian(x), 20).encode('hex') 56 | elif len(x) == 40 or len(x) == 0: 57 | return x 58 | else: 59 | return zpad(x, 20)[-20:].encode('hex') 60 | 61 | 62 | def coerce_to_int(x): 63 | if isinstance(x, (int, long)): 64 | return x 65 | elif len(x) == 40: 66 | return big_endian_to_int(x.decode('hex')) 67 | else: 68 | return big_endian_to_int(x) 69 | 70 | 71 | def coerce_to_bytes(x): 72 | if isinstance(x, (int, long)): 73 | return int_to_big_endian(x) 74 | elif len(x) == 40: 75 | return x.decode('hex') 76 | else: 77 | return x 78 | 79 | 80 | def int_to_big_endian4(integer): 81 | ''' 4 bytes big endian integer''' 82 | return struct.pack('>I', integer) 83 | 84 | 85 | def recursive_int_to_big_endian(item): 86 | ''' convert all int to int_to_big_endian recursively 87 | ''' 88 | if isinstance(item, (int, long)): 89 | return int_to_big_endian(item) 90 | elif isinstance(item, (list, tuple)): 91 | res = [] 92 | for item in item: 93 | res.append(recursive_int_to_big_endian(item)) 94 | return res 95 | return item 96 | 97 | 98 | def rlp_encode(item): 99 | ''' 100 | item can be nested string/integer/list of string/integer 101 | ''' 102 | return rlp.encode(recursive_int_to_big_endian(item)) 103 | 104 | # Format encoders/decoders for 
bin, addr, int 105 | 106 | 107 | def decode_hash(v): 108 | '''decodes a bytearray from hash''' 109 | return db_get(v) 110 | 111 | 112 | def decode_bin(v): 113 | '''decodes a bytearray from serialization''' 114 | if not isinstance(v, (str, unicode)): 115 | raise Exception("Value must be binary, not RLP array") 116 | return v 117 | 118 | 119 | def decode_addr(v): 120 | '''decodes an address from serialization''' 121 | if len(v) not in [0, 20]: 122 | raise Exception("Serialized addresses must be empty or 20 bytes long!") 123 | return v.encode('hex') 124 | 125 | 126 | def decode_int(v): 127 | '''decodes and integer from serialization''' 128 | if len(v) > 0 and v[0] == '\x00': 129 | raise Exception("No leading zero bytes allowed for integers") 130 | return big_endian_to_int(v) 131 | 132 | 133 | def decode_root(root): 134 | if isinstance(root, list): 135 | if len(rlp.encode(root)) >= 32: 136 | raise Exception("Direct RLP roots must have length <32") 137 | elif isinstance(root, (str, unicode)): 138 | if len(root) != 0 and len(root) != 32: 139 | raise Exception("String roots must be empty or length-32") 140 | else: 141 | raise Exception("Invalid root") 142 | return root 143 | 144 | 145 | def encode_hash(v): 146 | '''encodes a bytearray into hash''' 147 | k = sha3(v) 148 | db_put(k, v) 149 | return k 150 | 151 | 152 | def encode_bin(v): 153 | '''encodes a bytearray into serialization''' 154 | return v 155 | 156 | 157 | def encode_root(v): 158 | '''encodes a trie root into serialization''' 159 | return v 160 | 161 | 162 | def encode_addr(v): 163 | '''encodes an address into serialization''' 164 | if not isinstance(v, (str, unicode)) or len(v) not in [0, 40]: 165 | raise Exception("Address must be empty or 40 chars long") 166 | return v.decode('hex') 167 | 168 | 169 | def encode_int(v): 170 | '''encodes an integer into serialization''' 171 | if not isinstance(v, (int, long)) or v < 0 or v >= 2 ** 256: 172 | raise Exception("Integer invalid or out of range") 173 | return 
int_to_big_endian(v) 174 | 175 | decoders = { 176 | "hash": decode_hash, 177 | "bin": decode_bin, 178 | "addr": decode_addr, 179 | "int": decode_int, 180 | "trie_root": decode_root, 181 | } 182 | 183 | encoders = { 184 | "hash": encode_hash, 185 | "bin": encode_bin, 186 | "addr": encode_addr, 187 | "int": encode_int, 188 | "trie_root": encode_root, 189 | } 190 | 191 | 192 | def print_func_call(ignore_first_arg=False, max_call_number=100): 193 | ''' utility decorator to ease debugging: prints the input args before 194 | the function call and the return value after it 195 | 196 | usage: 197 | 198 | @print_func_call() 199 | def some_func_to_be_debugged(): 200 | pass 201 | 202 | :param ignore_first_arg: whether to skip printing the first arg. 203 | useful for hiding the `self` parameter of a method call 204 | ''' 205 | from functools import wraps 206 | 207 | def display(x): 208 | x = str(x) 209 | try: 210 | x.decode('ascii') 211 | except UnicodeError: 212 | return 'NON_PRINTABLE' 213 | return x 214 | 215 | local = {'call_number': 0} 216 | 217 | def inner(f): 218 | 219 | @wraps(f) 220 | def wrapper(*args, **kwargs): 221 | local['call_number'] = local['call_number'] + 1 222 | tmp_args = args[1:] if ignore_first_arg and len(args) else args 223 | this_call_number = local['call_number'] 224 | print('{0}#{1} args: {2}, {3}'.format( 225 | f.__name__, 226 | this_call_number, 227 | ', '.join([display(x) for x in tmp_args]), 228 | ', '.join(display(key) + '=' + str(value) 229 | for key, value in kwargs.iteritems()) 230 | )) 231 | res = f(*args, **kwargs) 232 | print('{0}#{1} return: {2}'.format( 233 | f.__name__, 234 | this_call_number, 235 | display(res))) 236 | 237 | if local['call_number'] > max_call_number: 238 | raise Exception("Reached max call number!") 239 | return res 240 | return wrapper 241 | return inner 242 | 243 | 244 | class DataDir(object): 245 | 246 | ethdirs = { 247 | "linux2": "~/.pyethereum", 248 | "darwin": "~/Library/Application Support/Pyethereum/", 249 | "win32": 
"~/AppData/Roaming/Pyethereum", 250 | "win64": "~/AppData/Roaming/Pyethereum", 251 | } 252 | 253 | def __init__(self): 254 | self._path = None 255 | 256 | def set(self, path): 257 | path = os.path.abspath(path) 258 | if not os.path.exists(path): 259 | os.makedirs(path) 260 | assert os.path.isdir(path) 261 | self._path = path 262 | 263 | def _set_default(self): 264 | p = self.ethdirs.get(sys.platform, self.ethdirs['linux2']) 265 | self.set(os.path.expanduser(os.path.normpath(p))) 266 | 267 | @property 268 | def path(self): 269 | if not self._path: 270 | self._set_default() 271 | return self._path 272 | 273 | data_dir = DataDir() 274 | 275 | 276 | def get_db_path(): 277 | return os.path.join(data_dir.path, 'statedb') 278 | 279 | 280 | def get_index_path(): 281 | return os.path.join(data_dir.path, 'indexdb') 282 | 283 | 284 | def db_put(key, value): 285 | database = db.DB(get_db_path()) 286 | res = database.put(key, value) 287 | database.commit() 288 | return res 289 | 290 | 291 | def db_get(key): 292 | database = db.DB(get_db_path()) 293 | return database.get(key) 294 | 295 | 296 | def configure_logging(loggerlevels=':DEBUG', verbosity=1): 297 | logconfig = dict( 298 | version=1, 299 | disable_existing_loggers=False, 300 | formatters=dict( 301 | debug=dict( 302 | format='[%(asctime)s] %(name)s %(levelname)s %(threadName)s:' 303 | ' %(message)s' 304 | ), 305 | minimal=dict( 306 | format='%(message)s' 307 | ), 308 | ), 309 | handlers=dict( 310 | default={ 311 | 'level': 'INFO', 312 | 'class': 'logging.StreamHandler', 313 | 'formatter': 'minimal' 314 | }, 315 | verbose={ 316 | 'level': 'DEBUG', 317 | 'class': 'logging.StreamHandler', 318 | 'formatter': 'debug' 319 | }, 320 | ), 321 | loggers=dict() 322 | ) 323 | 324 | for loggerlevel in filter(lambda _: ':' in _, loggerlevels.split(',')): 325 | name, level = loggerlevel.split(':') 326 | logconfig['loggers'][name] = dict( 327 | handlers=['verbose'], level=level, propagate=False) 328 | 329 | if len(logconfig['loggers']) 
== 0: 330 | logconfig['loggers'][''] = dict( 331 | handlers=['default'], 332 | level={0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'}.get( 333 | verbosity), 334 | propagate=True) 335 | 336 | logging.config.dictConfig(logconfig) 337 | # logging.debug("logging set up like that: %r", logconfig) 338 | 339 | 340 | class Denoms(): 341 | def __init__(self): 342 | self.wei = 1 343 | self.babbage = 10**3 344 | self.lovelace = 10**6 345 | self.shannon = 10**9 346 | self.szabo = 10**12 347 | self.finney = 10**15 348 | self.ether = 10**18 349 | self.turing = 2**256 350 | 351 | denoms = Denoms() 352 | --------------------------------------------------------------------------------
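As a companion to `pack_nibbles` and `unpack_to_nibbles` in src/trie.py, here is a minimal Python 3 sketch of the hex-prefix packing they perform. It is not part of the repo: the original code targets Python 2 and builds `str` values, whereas this port returns `bytes`, which is an assumption made so the example runs on a current interpreter. The function names mirror the ones in trie.py, but the bodies are rewritten for illustration only.

```python
# Python 3 sketch of hex-prefix packing, modelled on pack_nibbles and
# unpack_to_nibbles in src/trie.py. Returning bytes (not str) is an
# assumption of this port.
NIBBLE_TERMINATOR = 16


def bin_to_nibbles(data):
    """Split each byte into its high and low 4-bit halves."""
    nibbles = []
    for byte in data:
        nibbles.extend(divmod(byte, 16))
    return nibbles


def pack_nibbles(nibbles):
    """Pack a nibble list (optionally ending in the terminator) into bytes."""
    nibbles = list(nibbles)
    flags = 0
    if nibbles[-1:] == [NIBBLE_TERMINATOR]:
        flags = 2                      # bit 1 set: leaf node (terminated key)
        nibbles = nibbles[:-1]
    if len(nibbles) % 2:
        flags |= 1                     # bit 0 set: odd number of nibbles
        nibbles = [flags] + nibbles
    else:
        nibbles = [flags, 0] + nibbles  # pad so the total stays even
    pairs = zip(nibbles[::2], nibbles[1::2])
    return bytes(16 * hi + lo for hi, lo in pairs)


def unpack_to_nibbles(packed):
    """Inverse of pack_nibbles; restores the terminator if it was flagged."""
    nibbles = bin_to_nibbles(packed)
    flags = nibbles[0]
    body = nibbles[1:] if flags & 1 else nibbles[2:]
    if flags & 2:
        body.append(NIBBLE_TERMINATOR)
    return body


if __name__ == "__main__":
    leaf_key = [1, 2, 3, NIBBLE_TERMINATOR]   # odd length, terminated
    ext_key = [1, 2]                          # even length, not terminated
    for key in (leaf_key, ext_key):
        packed = pack_nibbles(key)
        print(key, packed.hex(), unpack_to_nibbles(packed))
```

The first nibble of the packed key carries both flags, which is how `_get_node_type` can tell a leaf from an extension node by looking only at `node[0]`.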