├── README.md
├── blog.txt
├── exercises
│   ├── ex1.py
│   ├── ex2.py
│   ├── ex2b.py
│   ├── ex2c.py
│   ├── ex2d.py
│   ├── ex3.py
│   └── ex4.py
└── src
    ├── db.py
    ├── rlp.py
    ├── trie.py
    └── utils.py
/README.md:
--------------------------------------------------------------------------------
1 | understanding_ethereum_trie
2 | ===========================
3 | Repo supporting blog post at http://easythereentropy.wordpress.com/2014/06/04/understanding-the-ethereum-trie/
4 |
5 | Hopefully, this will help those confused souls who have yet to grasp the trie.
6 |
--------------------------------------------------------------------------------
/blog.txt:
--------------------------------------------------------------------------------
1 | The other day I finally got around to reading the entire ethereum yellow paper and to figuring out how the modified Merkle-patricia-tree (trie) works. So let's go through a brief but hopefully complete explanation of the trie, using examples.
2 |
3 | To avoid blockchain bloat, most of the data in the ethereum network is actually not stored on the blockchain itself. Rather, the blockchain stores hashes of RLP encodings of the data, where the hashes can be used as keys to look up the data in an offline key-value store. And since these are cryptographic hashes, we can prove the data is authentic if it hashes to the value stored in the blockchain. Note that RLP (recursive length prefix encoding) is ethereum's home-rolled encoding system, and all values in the database (except hashes) are RLP encoded.
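To make the RLP step concrete, here is a minimal sketch of an RLP encoder. It is written for Python 3 (the repo itself is Python 2), handles only byte strings and nested lists, and the function names are chosen for illustration. It follows the same prefix rules as encode_length in src/rlp.py, but it is a sketch and not the repo's implementation.

```python
# Minimal RLP encoder sketch (Python 3). Byte strings and nested lists only.

def encode_length(length, offset):
    """Return the RLP prefix for a payload of `length` bytes."""
    if length < 56:
        return bytes([length + offset])
    # Long form: the prefix byte says how many bytes the length itself takes.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([len(length_bytes) + offset + 55]) + length_bytes


def rlp_encode(item):
    """RLP-encode a bytes object or a (possibly nested) list of bytes objects."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single low byte is its own encoding
        return encode_length(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return encode_length(len(payload), 0xC0) + payload
    raise TypeError("cannot RLP-encode %r" % type(item))


print(rlp_encode([b"hello"]))       # b'\xc6\x85hello'
print(rlp_encode([b"hellothere"]))  # b'\xcb\x8ahellothere'
```

The two printed values are the same byte strings that show up as node values in the exercise outputs.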
4 |
5 | It is implied that the key-value database used to store ethereum data is a radix tree (trie), with a couple modifications to boost efficiency. In a normal radix tree, a key is the actual path taken through the tree to get to the corresponding value. That is, beginning from the root node of the tree, each character in the key tells you which child node to follow to get to the corresponding value, where the values are stored in the leaf nodes that terminate every path through the tree. Supposing the keys come from an alphabet containing N characters, each node in the tree can have up to N children, and the maximum depth of the tree is the maximum length of a key.
6 |
7 | Radix trees are nice because they allow keys that begin with the same sequence of characters to have values that are closer together in the tree. There are also no key collisions in a trie, like there might be in hash-tables. They can, however, be rather inefficient, like when you have a long key where no other key shares a common prefix. Then you have to travel (and store) a considerable number of nodes in the tree to get to the value, despite there being no other values along the path.
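The inefficiency described above is easy to see in a toy version. The following is a sketch of a plain (unmodified) radix tree over hex nibbles, in Python 3; the class and function names are made up for illustration and do not appear in the repo.

```python
# Plain radix tree over nibbles: one node per character of every key.

def to_nibbles(key):
    """Split a bytes key into 4-bit values (hex characters)."""
    out = []
    for byte in key:
        out.extend(divmod(byte, 16))
    return out


class PlainRadixNode:
    def __init__(self):
        self.children = {}   # nibble -> PlainRadixNode
        self.value = None


def insert(root, key, value):
    node = root
    for nib in to_nibbles(key):
        node = node.children.setdefault(nib, PlainRadixNode())
    node.value = value


def lookup(root, key):
    node = root
    for nib in to_nibbles(key):
        node = node.children.get(nib)
        if node is None:
            return None
    return node.value


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())


root = PlainRadixNode()
insert(root, b"\x01\x01\x02", "hello")
print(lookup(root, b"\x01\x01\x02"))  # hello
print(count_nodes(root))              # 7: the root plus one node per nibble
```

A single 3-byte key already costs seven nodes here, even though no other key shares the path.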
8 |
9 | The ethereum implementation of radix trees introduces a number of improvements. First, to make the tree cryptographically secure, each node is referenced by its hash, which in current implementations is used for look-up in a leveldb database. With this scheme, the root node becomes a cryptographic fingerprint of the entire data structure (hence, Merkle). Second, a number of node 'types' are introduced to improve efficiency. There is the blank node, which is simply empty, and the standard leaf node, which is a simple list of [key, value]. Then there are extension nodes, which are also simple [key, value] lists, but where value is a hash of some other node. The hash can be used to look up that node in the database. Finally, there are branch nodes, which are lists of length 17. The first 16 elements correspond to the 16 possible hex characters in a key, and the final element holds a value if there is a [key, value] pair where the key ends at the branch node. If you don't get it yet, don't worry, no one does :D. We will work through examples to make it all clear.
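As a rough picture of the node types and of "referenced by its hash", here is a Python 3 sketch that stores nodes in a dict keyed by a hash of their serialization. Two loud assumptions: repr() stands in for RLP and SHA-256 stands in for the hash the real trie uses, so the hashes printed here will not match the ones in the exercises. The names store and db are illustrative only.

```python
import hashlib

db = {}  # stand-in for the leveldb key-value store


def store(node):
    """Serialize a node, file it under its hash, and return that hash."""
    # ASSUMPTION: repr() replaces RLP and SHA-256 replaces the real hash.
    blob = repr(node).encode("utf-8")
    key = hashlib.sha256(blob).digest()
    db[key] = node
    return key


BLANK = b""                                   # blank node

# Branch node: 16 child slots plus one value slot.
branch = [BLANK] * 17
branch[2] = [b" ", b"\xc6\x85hello"]          # leaf node: [key, value]
branch[3] = [b" ", b"\xcb\x8ahellothere"]     # leaf node: [key, value]
branch_hash = store(branch)

# Extension node: [key, hash of the node it points to].
extension = [bytes.fromhex("101010"), branch_hash]
root_hash = store(extension)

print(len(branch), len(root_hash))            # 17 32
print(db[db[root_hash][1]][3][1])             # b'\xcb\x8ahellothere'
```

Changing any value changes the hash of its branch node, which changes the extension node and therefore the root hash. That is the fingerprint property mentioned above.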
10 |
11 | One more important thing is a special hex-prefix (HP) encoding used for keys. As mentioned, the alphabet is hex, so there are 16 possible children for each node. Since there are two kinds of [key, value] nodes (leaf and extension), a special 'terminator' flag is used to denote which type the key refers to. If the terminator flag is on, the key refers to a leaf node, and the corresponding value is the value for that key. If it's off, then the value is a hash to be used to look up the corresponding node in the db. HP also encodes whether or not the key is of odd or even length. Finally, we note that a single hex character, or 4 bit binary number, is known as a nibble.
12 |
13 | The HP specification is rather simple. A flag nibble is prepended to the key that encodes both the terminator status and parity. The least significant bit in the nibble encodes parity (1 for an odd number of nibbles), while the next least significant bit encodes terminator status. If the key was in fact even, then we add another nibble, of value 0, right after the flag nibble to maintain overall evenness (so we can properly represent the result in bytes).
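The rule fits in a few lines. This is a Python 3 sketch of the encoding, modelled on pack_nibbles in src/trie.py but taking the terminator as a boolean argument instead of a trailing marker nibble; the name hp_encode is illustrative.

```python
def hp_encode(nibbles, terminator):
    """Hex-prefix encode a list of nibbles into bytes."""
    odd = len(nibbles) % 2
    flags = (2 if terminator else 0) | odd   # bit 1: terminator, bit 0: odd length
    if odd:
        padded = [flags] + list(nibbles)
    else:
        padded = [flags, 0] + list(nibbles)  # extra 0 nibble keeps the count even
    # Pack pairs of nibbles into bytes.
    return bytes(16 * hi + lo for hi, lo in zip(padded[0::2], padded[1::2]))


print(hp_encode([0, 1, 0, 1, 0, 2], True).hex())  # 20010102
print(hp_encode([0, 1, 0, 1, 0], False).hex())    # 101010
print(hp_encode([], True))                        # b' '
```

The first two results are the encoded keys that appear in the ex1.py and ex2b.py outputs; the third is the "empty key with terminator" that shows up inside branch nodes.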
14 |
15 | Ok. So this all sounds fine and dandy, and you probably read about it here or here, or if you're quite brave, here, but let's get down and dirty with some python examples. I've set up a little repo on github that you can clone and follow along with.
16 |
17 | git clone git@github.com:ebuchman/understanding_ethereum_trie
18 |
19 | Basically I just grabbed the necessary files from the pyethereum repo (trie.py, utils.py, rlp.py, db.py), and wrote a bunch of exercises as short python scripts that you can try out. I also added some print statements to help you see what's going on in trie.py, though due to recursion, this can get messy, so there's a flag at the top of trie.py allowing you to turn printing on/off. Please feel free to improve the print statements and send a pull-request! You should be in the trie directory after cloning, and run your scripts with python exercises/exA.py, where A is the exercise number. So let's start with ex1.py.
20 |
21 | In ex1.py, we initialize a trie with a blank root, and add a single entry:
22 |
23 | state = trie.Trie('triedb', trie.BLANK_ROOT)
24 | state.update('\x01\x01\x02', rlp.encode(['hello']))
25 | print 'root hash', state.root_hash.encode('hex')
26 |
27 | Here, we're using '\x01\x01\x02' as the key and 'hello' as the value. The key should be a string (max 32 bytes, typically a big-endian integer or an address), and the value an rlp encoding of arbitrary data. Note we could have used something simpler, like 'dog', as our key, but let's keep it real with raw bytes. We can follow through the code in trie.py to see what happens under the hood. Basically, in this case, since we start with a blank node, trie.py creates a new leaf node (adding the terminator flag to the key), rlp encodes it, takes the hash, and stores [hash, rlp(node)] in the database. The print statement should display the hash, which we can use from now on as the root hash for our trie. Finally, for completeness, we look at the HP encoding of the key:
28 |
29 | k, v = state.root_node
30 | print 'root node:', [k, v]
31 | print 'hp encoded key, in hex:', k.encode('hex')
32 |
33 |
34 | The output of ex1.py is:
35 |
36 | root hash 15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc
37 | root node: [' \x01\x01\x02', '\xc6\x85hello']
38 | hp encoded key, in hex: 20010102
39 |
40 | Note the final 6 nibbles are the key we used, 010102, while the first two give us the HP encoding. The first nibble (2, or 0010 in binary) tells us that this is a terminator node, since the second least significant bit is on. And since the key was even length (least significant bit is 0), a second nibble of value 0 is added.
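Reading the flags back out is the mirror image. Here is a Python 3 sketch of the decoding step, modelled on unpack_to_nibbles in src/trie.py but returning the terminator as a separate boolean; the name hp_decode is illustrative, and the input is assumed to be non-empty.

```python
def hp_decode(data):
    """Decode hex-prefix bytes into (nibbles, terminator_flag)."""
    nibbles = []
    for byte in data:
        nibbles.extend(divmod(byte, 16))
    flags = nibbles[0]
    terminator = bool(flags & 2)   # second least significant bit
    odd = bool(flags & 1)          # least significant bit
    # Odd keys use only the flag nibble; even keys also carry a padding 0.
    return (nibbles[1:] if odd else nibbles[2:]), terminator


print(hp_decode(bytes.fromhex("20010102")))  # ([0, 1, 0, 1, 0, 2], True)
print(hp_decode(bytes.fromhex("101010")))    # ([0, 1, 0, 1, 0], False)
```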
41 |
42 | Moving on to ex2.py, we initialize a trie that starts with the previous hash:
43 |
44 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
45 | print state.root_node
46 |
47 | The print statement should give us the [key, value] pair we previously stored. Great. Let's add some more entries. We're going to try this a few different ways, so we can clearly see the different possibilities. We'll use multiple ex2 python files, initializing the trie from the original hash each time. First, let's make an entry with the same key we already used but a different value. Since the new value will lead to a new hash, we will have two tries, referenced by two different hashes, both starting with the same key (the rest of ex2.py):
48 |
49 | state.update('\x01\x01\x02', rlp.encode(['hellothere']))
50 | print state.root_hash.encode('hex')
51 | print state.root_node
52 |
53 | The output for ex2.py is:
54 | 05e13d8be09601998499c89846ec5f3101a1ca09373a5f0b74021261af85d396
55 | [' \x01\x01\x02', '\xcb\x8ahellothere']
56 |
57 | So that's not all that interesting, but it's nice that we didn't overwrite the original entry, and can still access both using their respective hashes. Now, let's add an entry that uses the same key but with a different final nibble (ex2b.py):
58 |
59 | state.update('\x01\x01\x03', rlp.encode(['hellothere']))
60 | print 'root hash:', state.root_hash.encode('hex')
61 | k, v = state.root_node
62 | print 'root node:', [k, v]
63 | print 'hp encoded key, in hex:', k.encode('hex')
64 |
65 | The print 'root node' statement should print something mostly unintelligible. That's because it's giving us a [key, value] node where the key is the common prefix from our two keys ([0,1,0,1,0]), encoded using HP to include a non-terminator flag and an indication that the key is odd-length, and the value is the hash of the rlp encoding of the node we're interested in. That is, it's an extension node. We can use the hash to look up the node in the database:
66 |
67 | print state._get_node_type(state.root_node) == trie.NODE_TYPE_EXTENSION
68 | common_prefix_key, node_hash = state.root_node
69 | print state._decode_to_node(node_hash)
70 | print state._get_node_type(state._decode_to_node(node_hash)) == trie.NODE_TYPE_BRANCH
71 |
72 | And the output for ex2b.py:
73 |
74 | root hash: b5e187f15f1a250e51a78561e29ccfc0a7f48e06d19ce02f98dd61159e81f71d
75 | root node: ['\x10\x10\x10', '"\x01\xab\x83u\x15o\'\xf7T-h\xde\x94K/\xba\xa3[\x83l\x94\xe7\xb3\x8a\xcf\n\nt\xbb\xef\xd9']
76 | hp encoded key, in hex: 101010
77 | True
78 | ['', '', [' ', '\xc6\x85hello'], [' ', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '', '', '']
79 | True
80 |
81 | This result is rather interesting. What we have here is a branch node, a list with 17 entries. Note the difference in our original keys: they both start with [0,1,0,1,0], and one ends in 2 while the other ends in 3. So, when we add the new entry (key ending in 3), the leaf node that previously held the key ending in 2 is replaced with a branch node, reached through a [key, value] extension node, where key is the HP encoded common prefix of the two keys and value is the hash of the branch node, which can be used to look up the branch node in the database. The entry at index 2 of this branch node is the original node with key ending in 2 ('hello'), while the entry at index 3 is the new node ('hellothere'). Since both keys are only one nibble longer than the common prefix, the final nibble is encoded implicitly by the position of the nodes in the branch node. And since that exhausts all the characters in the keys, these nodes are stored with empty keys in the branch node (printed as ' ', the byte 0x20, which is the HP encoding of an empty key with the terminator flag on).
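The bookkeeping behind that split can be sketched on its own: find the common prefix of the two keys, then work out which branch slot each key lands in and which nibbles are left over. This is a Python 3 illustration with made-up names (split_keys, slot); the real logic lives in trie.py and is more involved.

```python
def to_nibbles(key):
    """Split a bytes key into 4-bit values."""
    out = []
    for byte in key:
        out.extend(divmod(byte, 16))
    return out


def split_keys(key_a, key_b):
    """Return (common_prefix, (slot_a, rest_a), (slot_b, rest_b)).

    A slot of 16 means the key ends at the branch node itself, so its
    value goes in the 17th position of the branch node.
    """
    a, b = to_nibbles(key_a), to_nibbles(key_b)
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1

    def slot(nibbles):
        rest = nibbles[n:]
        if not rest:
            return (16, [])
        return (rest[0], rest[1:])

    return a[:n], slot(a), slot(b)


print(split_keys(b"\x01\x01\x02", b"\x01\x01\x03"))
# ([0, 1, 0, 1, 0], (2, []), (3, []))
```

The prefix [0,1,0,1,0] becomes the extension node's key, and the two entries land at indices 2 and 3 with nothing left over, matching the ex2b.py output.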
82 |
83 | You'll note I added a couple print statements to verify that these nodes are in fact what I say they are - extension and branch nodes, respectively.
84 |
85 | Ok, so that was pretty cool. Let's do it again but with a key equal to the first few nibbles of our original key (ex2c.py):
86 |
87 | state.update('\x01\x01', rlp.encode(['hellothere']))
88 |
89 | Again, we see that this results in the creation of a branch node, but something different has happened. The branch node corresponds to the key '\x01\x01', but there is also a value with that key ('hellothere'). Hence, that value is placed in the final (17th) position of the branch node. The other entry, with key '\x01\x01\x02', is placed in the position corresponding to the next nibble in its key, in this case, 0. Since its key hasn't been fully exhausted, we store the leftover nibbles (in this case, just 2) in the key position for the node. That key is HP encoded, so it prints as '2': the byte 0x32 is the flag nibble 3 (terminator, odd length) followed by the nibble 2. Hence the output:
90 |
91 | [['2', '\xc6\x85hello'], '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '\xcb\x8ahellothere']
92 |
93 | Make sense? Let's do one final component of exercise 2 (ex2d.py). Here, we add a new entry with a key that is identical to the original key, but has an additional two nibbles:
94 |
95 | state.update('\x01\x01\x02\x57', rlp.encode(['hellothere']))
96 |
97 | In this case, the opposite of what we just saw happens! The original entry's value is stored at the final position of the branch node, where the key for the branch node is the key for that value ('\x01\x01\x02'). The second entry is stored at the position of its next nibble (5), with a key equal to the remaining nibbles (just 7):
98 |
99 | ['', '', '', '', '', ['7', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']
100 |
101 | Tada! Try playing around a bit to make sure you understand what's going on here. Nodes are stored in the database according to the hash of their rlp encoding. Once a node is retrieved, keys are used to travel a path through a further series of nodes (which may involve more hash lookups) to reach the final value. Of course, we've only used two entries in each of these examples to keep things simple, but that has been sufficient to expose the basic mechanic of the trie. We could add more entries to fill up the branch node, but since we already understand how that works, let's move on to something more complicated. In exercise 3, we will add a third entry, which shares a common prefix with the second entry. This one's a little longer, but the result is totally awesome (ex3.py):
102 |
103 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
104 | print state.root_hash.encode('hex')
105 | print state.root_node
106 | print ''
107 | state.update('\x01\x01\x02\x55', rlp.encode(['hellothere']))
108 | print 'root hash:', state.root_hash.encode('hex')
109 | print 'root node:', state.root_node
110 | print 'branch node it points to:', state._decode_to_node(state.root_node[1])
111 | print ''
112 |
113 | Nothing new yet. Initialize from original hash, add a new node with key '\x01\x01\x02\x55'. Creates a branch node and points to it with a hash. We know this. Now the fun stuff:
114 |
115 | state.update('\x01\x01\x02\x57', rlp.encode(['jimbojones']))
116 | print 'root hash:', state.root_hash.encode('hex')
117 | print 'root node:', state.root_node
118 | branch_node = state._decode_to_node(state.root_node[1])
119 | print 'branch node it points to:', branch_node
120 |
121 | We're doing the same thing - add a new node, this time with key '\x01\x01\x02\x57' and value 'jimbojones'. But now, in our branch node, where there used to be a node with value 'hellothere' (i.e. at index 5), there is a messy ole hash! What do we do with hashes in tries? We use em to look up more nodes, of course!
122 |
123 | next_hash = branch_node[5]
124 | print 'hash stored in branch node:', next_hash.encode('hex')
125 | print 'branch node it points to:', state._decode_to_node(next_hash)
126 |
127 | And the output:
128 |
129 | root hash: 17fe8af9c6e73de00ed5fd45d07e88b0c852da5dd4ee43870a26c39fc0ec6fb3
130 | root node: ['\x00\x01\x01\x02', '\r\xca6X\xe5T\xd0\xbd\xf6\xd7\x19@\xd1E\t\x8ehW\x03\x8a\xbd\xa3\xb2\x92!\xae{2\x1bp\x06\xbb']
131 | branch node it points to: ['', '', '', '', '', ['5', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']
132 |
133 | root hash: fcb2e3098029e816b04d99d7e1bba22d7b77336f9fe8604f2adfb04bcf04a727
134 | root node: ['\x00\x01\x01\x02', '\xd5/\xaf\x1f\xdeO!u>&3h_+\xac?\xf1\xf3*\xb7)3\xec\xe9\xd5\x9f2\xcaoc\x95m']
135 | branch node it points to: ['', '', '', '', '', '\x00&\x15\xb7\xc4\x05\xf6\xf3F2\x9a(N\x8f\xb2H\xe75\xcf\xfa\x89C-\xab\xa2\x9eV\xe4\x14\xdfl0', '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']
136 | hash stored in branch node: 002615b7c405f6f346329a284e8fb248e735cffa89432daba29e56e414df6c30
137 | branch node it points to: ['', '', '', '', '', [' ', '\xcb\x8ahellothere'], '', [' ', '\xcb\x8ajimbojones'], '', '', '', '', '', '', '', '', '']
138 |
139 | Tada! So this hash, which corresponds to key [0,1,0,1,0,2,5], points to another branch node which holds our values 'hellothere' and 'jimbojones' at the appropriate positions. I recommend experimenting a little further by adding some new entries, specifically, try filling in the final branch node some more, including the last position.
140 |
141 | Ok! So this has been pretty cool. Hopefully by now you have a pretty solid understanding of how the trie works, the HP encoding, the different node types, and how the nodes are connected and refer to each other. As a final exercise, let's do some look-ups.
142 |
143 | state = trie.Trie('triedb', 'b5e187f15f1a250e51a78561e29ccfc0a7f48e06d19ce02f98dd61159e81f71d'.decode('hex'))
144 | print 'using root hash from ex2b'
145 | print rlp.decode(state.get('\x01\x01\x03'))
146 | print ''
147 | state = trie.Trie('triedb', 'fcb2e3098029e816b04d99d7e1bba22d7b77336f9fe8604f2adfb04bcf04a727'.decode('hex'))
148 | print 'using root hash from ex3'
149 | print rlp.decode(state.get('\x01\x01\x02'))
150 | print rlp.decode(state.get('\x01\x01\x02\x55'))
151 | print rlp.decode(state.get('\x01\x01\x02\x57'))
152 |
153 | You should see the values we stored in previous exercises.
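For a feel of what a look-up does with these nodes, here is a Python 3 sketch of a recursive get over a hand-built copy of the ex3 trie. It is an illustration, not the code in trie.py: the dict db stands in for leveldb, and the two keys H_BRANCH_1 and H_BRANCH_2 are made-up placeholders where the real trie would use 32-byte hashes.

```python
def to_nibbles(data):
    out = []
    for byte in data:
        out.extend(divmod(byte, 16))
    return out


def hp_decode(data):
    """Decode hex-prefix bytes into (nibbles, terminator_flag)."""
    nibbles = to_nibbles(data)
    flags = nibbles[0]
    return (nibbles[1:] if flags & 1 else nibbles[2:]), bool(flags & 2)


def get(db, node, nibbles):
    """Return the raw (RLP encoded) value stored under `nibbles`, or None."""
    if isinstance(node, bytes) and node in db:
        node = db[node]                      # follow a hash reference
    if node == b"":
        return None                          # blank node
    if len(node) == 17:                      # branch node
        if not nibbles:
            return node[16] or None
        return get(db, node[nibbles[0]], nibbles[1:])
    path, is_leaf = hp_decode(node[0])       # leaf or extension node
    if is_leaf:
        return node[1] if path == nibbles else None
    if nibbles[:len(path)] != path:
        return None
    return get(db, node[1], nibbles[len(path):])


# Hand-built mirror of the final ex3 structure (placeholder "hashes").
H_BRANCH_1, H_BRANCH_2 = b"hash-of-branch-1", b"hash-of-branch-2"
branch_2 = [b""] * 17
branch_2[5] = [b" ", b"\xcb\x8ahellothere"]
branch_2[7] = [b" ", b"\xcb\x8ajimbojones"]
branch_1 = [b""] * 17
branch_1[5] = H_BRANCH_2
branch_1[16] = b"\xc6\x85hello"
db = {H_BRANCH_1: branch_1, H_BRANCH_2: branch_2}
root = [bytes.fromhex("00010102"), H_BRANCH_1]   # extension node

print(get(db, root, to_nibbles(b"\x01\x01\x02")))      # b'\xc6\x85hello'
print(get(db, root, to_nibbles(b"\x01\x01\x02\x57")))  # b'\xcb\x8ajimbojones'
```

The returned values are still RLP encoded, which is why ex4.py wraps each state.get call in rlp.decode.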
154 |
155 | And that's that! Now, you might wonder, "so, how is all this trie stuff actually used in ethereum?" Great question. And my repository does not have the answers. But if you clone the official pyethereum repo, and run a quick grep -r 'Trie' . inside it, it should clue you in. What we find is that a trie is used in two key places: to encode transaction lists in a block, and to encode the state of a block. For transactions, the keys are big-endian integers representing the transaction count in the current block. For the state trie, the keys are ethereum addresses.
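As a small illustration of what "big-endian integer" keys look like as bytes, here is a Python 3 sketch that mirrors int_to_big_endian from src/rlp.py (including its special case of mapping 0 to the empty string). How pyethereum wraps these keys before inserting them into the trie is not shown here.

```python
def int_to_big_endian(n):
    """Convert a non-negative integer to its shortest big-endian byte string."""
    if n == 0:
        return b""   # 0 is a special case, treated the same as ''
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


for count in (0, 1, 255, 256):
    print(count, int_to_big_endian(count))
# 0 b''
# 1 b'\x01'
# 255 b'\xff'
# 256 b'\x01\x00'
```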
156 |
157 | There you have it folks. We have achieved an understanding of the ethereum trie. Now go forth, and trie it!
158 |
--------------------------------------------------------------------------------
/exercises/ex1.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append('src')
3 | import trie, utils, rlp
4 |
5 | #initialize trie
6 | state = trie.Trie('triedb', trie.BLANK_ROOT)
7 | state.update('\x01\x01\x02', rlp.encode(['hello']))
8 | print 'root hash', state.root_hash.encode('hex')
9 | k, v = state.root_node
10 | print 'root node:', [k, v]
11 | print 'hp encoded key, in hex:', k.encode('hex')
12 |
--------------------------------------------------------------------------------
/exercises/ex2.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append('src')
3 | import trie, utils, rlp
4 |
5 | #initialize trie from previous hash; add new entry with same key.
6 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
7 | print state.root_hash.encode('hex')
8 | print state.root_node
9 | print ''
10 | state.update('\x01\x01\x02', rlp.encode(['hellothere']))
11 | print state.root_hash.encode('hex')
12 | print state.root_node
13 | # we now have two tries, addressed in the database by their respective hashes, though they each have the same key
14 |
--------------------------------------------------------------------------------
/exercises/ex2b.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append('src')
3 | import trie, utils, rlp
4 |
5 | #initialize trie from previous hash; add new [key, value] where key has common prefix
6 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
7 | print state.root_hash.encode('hex')
8 | print state.root_node
9 | print ''
10 | state.update('\x01\x01\x03', rlp.encode(['hellothere']))
11 | print 'root hash:', state.root_hash.encode('hex')
12 | k, v = state.root_node
13 | print 'root node:', [k, v]
14 | print 'hp encoded key, in hex:', k.encode('hex')
15 | print state._get_node_type(state.root_node) == trie.NODE_TYPE_EXTENSION
16 | common_prefix_key, node_hash = state.root_node
17 | print state._decode_to_node(node_hash)
18 | print state._get_node_type(state._decode_to_node(node_hash)) == trie.NODE_TYPE_BRANCH
19 |
--------------------------------------------------------------------------------
/exercises/ex2c.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append('src')
3 | import trie, utils, rlp
4 |
5 | #initialize trie from previous hash; add new [key, value] where key has common prefix
6 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
7 | print state.root_hash.encode('hex')
8 | print state.root_node
9 | print ''
10 | state.update('\x01\x01', rlp.encode(['hellothere']))
11 | print 'root hash:', state.root_hash.encode('hex')
12 | print 'root node:', state.root_node
13 | print state._decode_to_node(state.root_node[1])
14 |
--------------------------------------------------------------------------------
/exercises/ex2d.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append('src')
3 | import trie, utils, rlp
4 |
5 | #initialize trie from previous hash; add new [key, value] where key has common prefix
6 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
7 | print state.root_hash.encode('hex')
8 | print state.root_node
9 | print ''
10 | state.update('\x01\x01\x02\x55', rlp.encode(['hellothere']))
11 | print 'root hash:', state.root_hash.encode('hex')
12 | print 'root node:', state.root_node
13 | print state._decode_to_node(state.root_node[1])
14 |
--------------------------------------------------------------------------------
/exercises/ex3.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append('src')
3 | import trie, utils, rlp
4 |
5 | #initialize trie from previous hash; add new [key, value] where key has common prefix
6 | state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
7 | print state.root_hash.encode('hex')
8 | print state.root_node
9 | print ''
10 | state.update('\x01\x01\x02\x55', rlp.encode(['hellothere']))
11 | print 'root hash:', state.root_hash.encode('hex')
12 | print 'root node:', state.root_node
13 | print 'branch node it points to:', state._decode_to_node(state.root_node[1])
14 | print ''
15 | state.update('\x01\x01\x02\x57', rlp.encode(['jimbojones']))
16 | print 'root hash:', state.root_hash.encode('hex')
17 | print 'root node:', state.root_node
18 | branch_node = state._decode_to_node(state.root_node[1])
19 | print 'branch node it points to:', branch_node
20 | next_hash = branch_node[5]
21 | print 'hash stored in branch node:', next_hash.encode('hex')
22 | print 'branch node it points to:', state._decode_to_node(next_hash)
23 |
--------------------------------------------------------------------------------
/exercises/ex4.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.append('src')
3 | import trie, utils, rlp
4 |
5 | #initialize tries from the ex2b and ex3 root hashes; look up the values stored earlier
6 | state = trie.Trie('triedb', 'b5e187f15f1a250e51a78561e29ccfc0a7f48e06d19ce02f98dd61159e81f71d'.decode('hex'))
7 | print 'using root hash from ex2b'
8 | print rlp.decode(state.get('\x01\x01\x03'))
9 | print ''
10 | state = trie.Trie('triedb', 'fcb2e3098029e816b04d99d7e1bba22d7b77336f9fe8604f2adfb04bcf04a727'.decode('hex'))
11 | print 'using root hash from ex3'
12 | print rlp.decode(state.get('\x01\x01\x02'))
13 | print rlp.decode(state.get('\x01\x01\x02\x55'))
14 | print rlp.decode(state.get('\x01\x01\x02\x57'))
15 |
--------------------------------------------------------------------------------
/src/db.py:
--------------------------------------------------------------------------------
1 | import leveldb
2 | import threading
3 |
4 | databases = {}
5 |
6 |
7 | class DB(object):
8 |
9 |     def __init__(self, dbfile):
10 |         self.dbfile = dbfile
11 |         if dbfile not in databases:
12 |             databases[dbfile] = (
13 |                 leveldb.LevelDB(dbfile), dict(), threading.Lock())
14 |         self.db, self.uncommitted, self.lock = databases[dbfile]
15 |
16 |     def get(self, key):
17 |         if key in self.uncommitted:
18 |             return self.uncommitted[key]
19 |         return self.db.Get(key)
20 |
21 |     def put(self, key, value):
22 |         with self.lock:
23 |             self.uncommitted[key] = value
24 |
25 |     def commit(self):
26 |         with self.lock:
27 |             batch = leveldb.WriteBatch()
28 |             for k, v in self.uncommitted.iteritems():
29 |                 batch.Put(k, v)
30 |             self.db.Write(batch, sync=True)
31 |             self.uncommitted.clear()
32 |
33 |     def delete(self, key):
34 |         with self.lock:
35 |             if key in self.uncommitted:
36 |                 del self.uncommitted[key]
37 |                 if key not in self:
38 |                     self.db.Delete(key)
39 |             else:
40 |                 self.db.Delete(key)
41 |
42 |     def _has_key(self, key):
43 |         try:
44 |             self.get(key)
45 |             return True
46 |         except KeyError:
47 |             return False
48 |
49 |     def __contains__(self, key):
50 |         return self._has_key(key)
51 |
52 |     def __eq__(self, other):
53 |         return isinstance(other, self.__class__) and self.db == other.db
--------------------------------------------------------------------------------
/src/rlp.py:
--------------------------------------------------------------------------------
1 | '''
2 | First byte of an encoded item
3 |
4 | x: single byte, itself
5 | |
6 | |
7 | 0x7f == 127
8 |
9 | 0x80 == 128
10 | |
11 | x: [0, 55] byte long string, x-0x80 == length
12 | |
13 | 0xb7 == 183
14 |
15 | 0xb8 == 184
16 | |
17 | x: [56, ] byte long string, x-0xb7 == length of the length
18 | |
19 | 0xbf == 191
20 |
21 | 0xc0 == 192
22 | |
23 | x: [0, 55] byte long list, x-0xc0 == length
24 | |
25 | 0xf7 == 247
26 |
27 | 0xf8 == 248
28 | |
29 | x: [56, ] byte long list, x-0xf7 == length of the length
30 | |
31 | 0xff == 255
32 | '''
33 |
34 |
35 | def int_to_big_endian(integer):
36 |     '''convert an integer to big endian binary string'''
37 |     # 0 is a special case, treated same as ''
38 |     if integer == 0:
39 |         return ''
40 |     s = '%x' % integer
41 |     if len(s) & 1:
42 |         s = '0' + s
43 |     return s.decode('hex')
44 |
45 |
46 | def big_endian_to_int(string):
47 |     '''convert a big endian binary string to integer'''
48 |     # '' is a special case, treated same as 0
49 |     string = string or '\x00'
50 |     s = string.encode('hex')
51 |     return long(s, 16)
53 |
54 | def __decode(s, pos=0):
55 |     ''' decode string start at `pos`
56 |     :param s: string of rlp encoded data
57 |     :param pos: start position of `s` to decode from
58 |     :return:
59 |         o: decoded object
60 |         pos: end position of the obj in the string of rlp encoded data
61 |     '''
62 |     assert pos < len(s), "read beyond end of string in __decode"
63 |
64 |     fchar = ord(s[pos])
65 |     if fchar < 128:
66 |         return (s[pos], pos + 1)
67 |     elif fchar < 184:
68 |         b = fchar - 128
69 |         return (s[pos + 1:pos + 1 + b], pos + 1 + b)
70 |     elif fchar < 192:
71 |         b = fchar - 183
72 |         b2 = big_endian_to_int(s[pos + 1:pos + 1 + b])
73 |         return (s[pos + 1 + b:pos + 1 + b + b2], pos + 1 + b + b2)
74 |     elif fchar < 248:
75 |         o = []
76 |         pos += 1
77 |         pos_end = pos + fchar - 192
78 |
79 |         while pos < pos_end:
80 |             obj, pos = __decode(s, pos)
81 |             o.append(obj)
82 |         assert pos == pos_end, "read beyond list boundary in __decode"
83 |         return (o, pos)
84 |     else:
85 |         b = fchar - 247
86 |         b2 = big_endian_to_int(s[pos + 1:pos + 1 + b])
87 |         o = []
88 |         pos += 1 + b
89 |         pos_end = pos + b2
90 |         while pos < pos_end:
91 |             obj, pos = __decode(s, pos)
92 |             o.append(obj)
93 |         assert pos == pos_end, "read beyond list boundary in __decode"
94 |         return (o, pos)
95 |
96 |
97 | def decode(s):
98 |     assert isinstance(s, str)
99 |     if s:
100 |         return __decode(s)[0]
101 |
102 |
103 | def into(data, pos):
104 |     fchar = ord(data[pos])
105 |     if fchar < 192:
106 |         raise Exception("Cannot descend further")
107 |     elif fchar < 248:
108 |         return pos + 1
109 |     else:
110 |         return pos + 1 + (fchar - 247)
111 |
112 |
113 | def next_item_pos(data, pos):
114 |     '''get position of next item in the encoded list or string:
115 |
116 |     if list, then get next item's start position
117 |     if string, then get next character's position
118 |
119 |     :param data: rlp encoded from list or string
120 |     :pos: current item's position
121 |     '''
122 |     fchar = ord(data[pos])
123 |     if fchar < 128:
124 |         return pos + 1
125 |     elif (fchar % 64) < 56:
126 |         return pos + 1 + (fchar % 64)
127 |     else:
128 |         b = (fchar % 64) - 55
129 |         b2 = big_endian_to_int(data[pos + 1:pos + 1 + b])
130 |         return pos + 1 + b + b2
131 |
132 |
133 | def descend(data, *indices):
134 |     pos = 0
135 |     for i in indices:
136 |         finish_pos = next_item_pos(data, pos)
137 |         pos = into(data, pos)
138 |         for j in range(i):
139 |             pos = next_item_pos(data, pos)
140 |             if pos >= finish_pos:
141 |                 raise Exception("End of list")
142 |     return data[pos: finish_pos]
143 |
144 |
145 | def encode_length(L, offset):
146 |     if L < 56:
147 |         return chr(L + offset)
148 |     elif L < 256 ** 8:
149 |         BL = int_to_big_endian(L)
150 |         return chr(len(BL) + offset + 55) + BL
151 |     else:
152 |         raise Exception("input too long")
153 |
154 |
155 | def encode(s):
156 |     if isinstance(s, (str, unicode)):
157 |         s = str(s)
158 |         if len(s) == 1 and ord(s) < 128:
159 |             return s
160 |         else:
161 |             return encode_length(len(s), 128) + s
162 |     elif isinstance(s, list):
163 |         return concat(map(encode, s))
164 |
165 |     raise TypeError("Encoding of %s not supported" % type(s))
166 |
167 |
168 | def concat(s):
169 |     '''
170 |     :param s: a list, each item is a string of a rlp encoded data
171 |     '''
172 |     assert isinstance(s, list)
173 |     output = ''.join(s)
174 |     return encode_length(len(output), 192) + output
175 |
--------------------------------------------------------------------------------
/src/trie.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | import os
4 | import rlp
5 | import utils
6 | import db
7 |
8 | DB = db.DB
9 |
10 | PRINT = 0 #change to 1 to turn on printing
11 |
12 | def bin_to_nibbles(s):
13 |     """convert string s to nibbles (half-bytes)
14 |
15 |     >>> bin_to_nibbles("")
16 |     []
17 |     >>> bin_to_nibbles("h")
18 |     [6, 8]
19 |     >>> bin_to_nibbles("he")
20 |     [6, 8, 6, 5]
21 |     >>> bin_to_nibbles("hello")
22 |     [6, 8, 6, 5, 6, 12, 6, 12, 6, 15]
23 |     """
24 |     res = []
25 |     for x in s:
26 |         res += divmod(ord(x), 16)
27 |     return res
28 |
29 |
30 | def nibbles_to_bin(nibbles):
31 |     if any(x > 15 or x < 0 for x in nibbles):
32 |         raise Exception("nibbles can only be [0,..15]")
33 |
34 |     if len(nibbles) % 2:
35 |         raise Exception("nibbles must be of even numbers")
36 |
37 |     res = ''
38 |     for i in range(0, len(nibbles), 2):
39 |         res += chr(16 * nibbles[i] + nibbles[i + 1])
40 |     return res
41 |
42 |
43 | NIBBLE_TERMINATOR = 16
44 |
45 |
46 | def with_terminator(nibbles):
47 |     nibbles = nibbles[:]
48 |     if not nibbles or nibbles[-1] != NIBBLE_TERMINATOR:
49 |         nibbles.append(NIBBLE_TERMINATOR)
50 |     return nibbles
51 |
52 |
53 | def without_terminator(nibbles):
54 |     nibbles = nibbles[:]
55 |     if nibbles and nibbles[-1] == NIBBLE_TERMINATOR:
56 |         del nibbles[-1]
57 |     return nibbles
58 |
59 |
60 | def adapt_terminator(nibbles, has_terminator):
61 |     if has_terminator:
62 |         return with_terminator(nibbles)
63 |     else:
64 |         return without_terminator(nibbles)
65 |
66 |
67 | def pack_nibbles(nibbles):
68 |     """pack nibbles to binary
69 |
70 |     :param nibbles: a nibbles sequence. may have a terminator
71 |     """
72 |
73 |     if nibbles[-1:] == [NIBBLE_TERMINATOR]:
74 |         flags = 2
75 |         nibbles = nibbles[:-1]
76 |     else:
77 |         flags = 0
78 |
79 |     oddlen = len(nibbles) % 2
80 |     flags |= oddlen # set lowest bit if odd number of nibbles
81 |     if oddlen:
82 |         nibbles = [flags] + nibbles
83 |     else:
84 |         nibbles = [flags, 0] + nibbles
85 |     o = ''
86 |     for i in range(0, len(nibbles), 2):
87 |         o += chr(16 * nibbles[i] + nibbles[i + 1])
88 |     return o
89 |
90 |
91 | def unpack_to_nibbles(bindata):
92 |     """unpack packed binary data to nibbles
93 |
94 |     :param bindata: binary packed from nibbles
95 |     :return: nibbles sequence, may have a terminator
96 |     """
97 |     o = bin_to_nibbles(bindata)
98 |     flags = o[0]
99 |     if flags & 2:
100 |         o.append(NIBBLE_TERMINATOR)
101 |     if flags & 1 == 1:
102 |         o = o[1:]
103 |     else:
104 |         o = o[2:]
105 |     return o
106 |
107 |
108 | def starts_with(full, part):
109 | ''' test whether the items in part are
110 | the leading items of full
111 | '''
112 | if len(full) < len(part):
113 | return False
114 | return full[:len(part)] == part
115 |
116 |
117 | (
118 | NODE_TYPE_BLANK,
119 | NODE_TYPE_LEAF,
120 | NODE_TYPE_EXTENSION,
121 | NODE_TYPE_BRANCH
122 | ) = tuple(range(4))
123 |
124 |
125 | def is_key_value_type(node_type):
126 | return node_type in [NODE_TYPE_LEAF,
127 | NODE_TYPE_EXTENSION]
128 |
129 | BLANK_NODE = ''
130 | BLANK_ROOT = ''
131 |
132 |
133 | class Trie(object):
134 |
135 | def __init__(self, dbfile, root_hash=BLANK_ROOT):
136 | '''it also presents a dictionary-like interface
137 |
138 | :param dbfile: path to the key value database
139 | :param root_hash: blank, or the 32-byte hash of the root node
140 | '''
141 | dbfile = os.path.abspath(dbfile)
142 | self.db = DB(dbfile)
143 | self.set_root_hash(root_hash)
144 |
145 | @property
146 | def root_hash(self):
147 | '''always empty or a 32-byte string
148 | '''
149 | return self.get_root_hash()
150 |
151 | def get_root_hash(self):
152 | if self.root_node == BLANK_NODE:
153 | return BLANK_ROOT
154 | assert isinstance(self.root_node, list)
155 | val = rlp.encode(self.root_node)
156 | key = utils.sha3(val)
157 | self.db.put(key, val)
158 | return key
159 |
160 | @root_hash.setter
161 | def root_hash(self, value):
162 | self.set_root_hash(value)
163 |
164 | def set_root_hash(self, root_hash):
165 | if root_hash == BLANK_ROOT:
166 | self.root_node = BLANK_NODE
167 | return
168 | assert isinstance(root_hash, (str, unicode))
169 | assert len(root_hash) in [0, 32]
170 | self.root_node = self._decode_to_node(root_hash)
171 |
172 | def clear(self):
173 | ''' clear all tree data
174 | '''
175 | self._delete_child_stroage(self.root_node)
176 | self._delete_node_storage(self.root_node)
177 | self.db.commit()
178 | self.root_node = BLANK_NODE
179 |
180 | def _delete_child_stroage(self, node):
181 | node_type = self._get_node_type(node)
182 | if node_type == NODE_TYPE_BRANCH:
183 | for item in node[:16]:
184 | self._delete_child_stroage(self._decode_to_node(item))
185 | elif is_key_value_type(node_type):
186 | node_type = self._get_node_type(node)
187 | if node_type == NODE_TYPE_EXTENSION:
188 | self._delete_child_stroage(self._decode_to_node(node[1]))
189 |
190 | def _encode_node(self, node):
191 | if node == BLANK_NODE:
192 | return BLANK_NODE
193 | assert isinstance(node, list)
194 | rlpnode = rlp.encode(node)
195 | if len(rlpnode) < 32:
196 | return node
197 |
198 | hashkey = utils.sha3(rlpnode)
199 | self.db.put(hashkey, rlpnode)
200 | return hashkey
201 |
202 | def _decode_to_node(self, encoded):
203 | if encoded == BLANK_NODE:
204 | return BLANK_NODE
205 | if isinstance(encoded, list):
206 | return encoded
207 | return rlp.decode(self.db.get(encoded))
208 |
209 | def _get_node_type(self, node):
210 | ''' get node type
211 |
212 | :param node: node in form of list, or BLANK_NODE
213 | :return: node type
214 | '''
215 | if node == BLANK_NODE:
216 | return NODE_TYPE_BLANK
217 |
218 | if len(node) == 2:
219 | nibbles = unpack_to_nibbles(node[0])
220 | has_terminator = (nibbles and nibbles[-1] == NIBBLE_TERMINATOR)
221 | return NODE_TYPE_LEAF if has_terminator\
222 | else NODE_TYPE_EXTENSION
223 | if len(node) == 17:
224 | return NODE_TYPE_BRANCH
225 |
226 | def _get(self, node, key):
227 | """ get value inside a node
228 |
229 | :param node: node in form of list, or BLANK_NODE
230 | :param key: nibble list without terminator
231 | :return:
232 | BLANK_NODE if does not exist, otherwise value or hash
233 | """
234 | node_type = self._get_node_type(node)
235 | if node_type == NODE_TYPE_BLANK:
236 | return BLANK_NODE
237 |
238 | if node_type == NODE_TYPE_BRANCH:
239 | # already reached the expected node
240 | if not key:
241 | return node[-1]
242 | sub_node = self._decode_to_node(node[key[0]])
243 | return self._get(sub_node, key[1:])
244 |
245 | # key value node
246 | curr_key = without_terminator(unpack_to_nibbles(node[0]))
247 | if node_type == NODE_TYPE_LEAF:
248 | return node[1] if key == curr_key else BLANK_NODE
249 |
250 | if node_type == NODE_TYPE_EXTENSION:
251 | # traverse child nodes
252 | if starts_with(key, curr_key):
253 | sub_node = self._decode_to_node(node[1])
254 | return self._get(sub_node, key[len(curr_key):])
255 | else:
256 | return BLANK_NODE
257 |
258 | def _update(self, node, key, value):
259 | """ update item inside a node
260 |
261 | :param node: node in form of list, or BLANK_NODE
262 | :param key: nibble list without terminator
263 | .. note:: key may be []
264 | :param value: value string
265 | :return: new node
266 |
267 | if this node is changed to a new node, its parent takes
268 | responsibility for *storing* the new node and deleting the old
269 | node's storage
270 | """
271 | assert value != BLANK_NODE
272 | node_type = self._get_node_type(node)
273 |
274 | if node_type == NODE_TYPE_BLANK:
275 | if PRINT: print 'blank'
276 | return [pack_nibbles(with_terminator(key)), value]
277 |
278 | elif node_type == NODE_TYPE_BRANCH:
279 | if PRINT: print 'branch'
280 | if not key:
281 | if PRINT: print '\tdone', node
282 | node[-1] = value
283 | if PRINT: print '\t', node
284 |
285 | else:
286 | if PRINT: print 'recursive branch'
287 | if PRINT: print '\t', node, key, value
288 | new_node = self._update_and_delete_storage(
289 | self._decode_to_node(node[key[0]]),
290 | key[1:], value)
291 | if PRINT: print '\t', new_node
292 | node[key[0]] = self._encode_node(new_node)
293 | if PRINT: print '\t', node
294 | return node
295 |
296 | elif is_key_value_type(node_type):
297 | if PRINT: print 'kv'
298 | return self._update_kv_node(node, key, value)
299 |
300 | def _update_and_delete_storage(self, node, key, value):
301 | old_node = node[:]
302 | new_node = self._update(node, key, value)
303 | if old_node != new_node:
304 | self._delete_node_storage(old_node)
305 | return new_node
306 |
307 | def _update_kv_node(self, node, key, value):
308 | node_type = self._get_node_type(node)
309 | curr_key = without_terminator(unpack_to_nibbles(node[0]))
310 | is_inner = node_type == NODE_TYPE_EXTENSION
311 | if PRINT: print 'this node is an extension node?', is_inner
312 | if PRINT: print 'cur key, next key', curr_key, key
313 |
314 | # find longest common prefix
315 | prefix_length = 0
316 | for i in range(min(len(curr_key), len(key))):
317 | if key[i] != curr_key[i]:
318 | break
319 | prefix_length = i + 1
320 |
321 | remain_key = key[prefix_length:]
322 | remain_curr_key = curr_key[prefix_length:]
323 |
324 | if PRINT: print 'remain keys..'
325 | if PRINT: print prefix_length, remain_key, remain_curr_key
326 |
327 | # if the keys are the same, this node is either a leaf or an extension. if it is a leaf, return [key, value] with the new value. if it is an extension, its value points to another node, which we update using the remaining key.
328 |
329 | if remain_key == [] == remain_curr_key:
330 | if PRINT: print 'keys were same', node[0], key
331 | if not is_inner:
332 | if PRINT: print 'not an extension node'
333 | return [node[0], value]
334 | if PRINT: print 'yes an extension node!'
335 | new_node = self._update_and_delete_storage(
336 | self._decode_to_node(node[1]), remain_key, value)
337 |
338 | elif remain_curr_key == []:
339 | if PRINT: print 'old key exhausted'
340 | if is_inner:
341 | if PRINT: print '\t is extension', self._decode_to_node(node[1])
342 | new_node = self._update_and_delete_storage(
343 | self._decode_to_node(node[1]), remain_key, value)
344 | else:
345 | if PRINT: print '\tnew branch'
346 | new_node = [BLANK_NODE] * 17
347 | new_node[-1] = node[1]
348 | new_node[remain_key[0]] = self._encode_node([
349 | pack_nibbles(with_terminator(remain_key[1:])),
350 | value
351 | ])
352 | if PRINT: print new_node
353 | else:
354 | if PRINT: print 'making a branch'
355 | new_node = [BLANK_NODE] * 17
356 | if len(remain_curr_key) == 1 and is_inner:
357 | if PRINT: print 'key done and is inner'
358 | new_node[remain_curr_key[0]] = node[1]
359 | else:
360 | if PRINT: print 'key not done or not inner', node, key, value
361 | if PRINT: print remain_curr_key
362 | new_node[remain_curr_key[0]] = self._encode_node([
363 | pack_nibbles(
364 | adapt_terminator(remain_curr_key[1:], not is_inner)
365 | ),
366 | node[1]
367 | ])
368 |
369 | if remain_key == []:
370 | new_node[-1] = value
371 | else:
372 | new_node[remain_key[0]] = self._encode_node([
373 | pack_nibbles(with_terminator(remain_key[1:])), value
374 | ])
375 | if PRINT: print new_node
376 |
377 | if prefix_length:
378 | # create node for key prefix
379 | if PRINT: print 'prefix length', prefix_length
380 | new_node = [pack_nibbles(curr_key[:prefix_length]),
381 | self._encode_node(new_node)]
382 | if PRINT: print 'new node type', self._get_node_type(new_node)
383 | return new_node
384 | else:
385 | return new_node
386 |
387 | def _delete_node_storage(self, node):
388 | '''delete storage
389 | :param node: node in form of list, or BLANK_NODE
390 | '''
391 | if node == BLANK_NODE:
392 | return
393 | assert isinstance(node, list)
394 | encoded = self._encode_node(node)
395 | if len(encoded) < 32:
396 | return
397 | self.db.delete(encoded)
398 |
399 | def _delete(self, node, key):
400 | """ delete item inside a node
401 |
402 | :param node: node in form of list, or BLANK_NODE
403 | :param key: nibble list without terminator
404 | .. note:: key may be []
405 | :return: new node
406 |
407 | if this node is changed to a new node, its parent takes
408 | responsibility for *storing* the new node and deleting the old
409 | node's storage
410 | """
411 | node_type = self._get_node_type(node)
412 | if node_type == NODE_TYPE_BLANK:
413 | return BLANK_NODE
414 |
415 | if node_type == NODE_TYPE_BRANCH:
416 | return self._delete_branch_node(node, key)
417 |
418 | if is_key_value_type(node_type):
419 | return self._delete_kv_node(node, key)
420 |
421 | def _normalize_branch_node(self, node):
422 | '''node should have only one item changed
423 | '''
424 | not_blank_items_count = sum(1 for x in range(17) if node[x])
425 | assert not_blank_items_count >= 1
426 |
427 | if not_blank_items_count > 1:
428 | return node
429 |
430 | # now only one item is not blank
431 | not_blank_index = [i for i, item in enumerate(node) if item][0]
432 |
433 | # the value item is not blank
434 | if not_blank_index == 16:
435 | return [pack_nibbles(with_terminator([])), node[16]]
436 |
437 | # normal item is not blank
438 | sub_node = self._decode_to_node(node[not_blank_index])
439 | sub_node_type = self._get_node_type(sub_node)
440 |
441 | if is_key_value_type(sub_node_type):
442 | # collapse the sub node into this node; note that this node will have the
443 | # same terminator as the sub node, and the value does not change
444 | new_key = [not_blank_index] + \
445 | unpack_to_nibbles(sub_node[0])
446 | return [pack_nibbles(new_key), sub_node[1]]
447 | if sub_node_type == NODE_TYPE_BRANCH:
448 | return [pack_nibbles([not_blank_index]),
449 | self._encode_node(sub_node)]
450 | assert False
451 |
452 | def _delete_and_delete_storage(self, node, key):
453 | old_node = node[:]
454 | new_node = self._delete(node, key)
455 | if old_node != new_node:
456 | self._delete_node_storage(old_node)
457 | return new_node
458 |
459 | def _delete_branch_node(self, node, key):
460 | # already reached the expected node
461 | if not key:
462 | node[-1] = BLANK_NODE
463 | return self._normalize_branch_node(node)
464 |
465 | encoded_new_sub_node = self._encode_node(
466 | self._delete_and_delete_storage(
467 | self._decode_to_node(node[key[0]]), key[1:])
468 | )
469 |
470 | if encoded_new_sub_node == node[key[0]]:
471 | return node
472 |
473 | node[key[0]] = encoded_new_sub_node
474 | if encoded_new_sub_node == BLANK_NODE:
475 | return self._normalize_branch_node(node)
476 |
477 | return node
478 |
479 | def _delete_kv_node(self, node, key):
480 | node_type = self._get_node_type(node)
481 | assert is_key_value_type(node_type)
482 | curr_key = without_terminator(unpack_to_nibbles(node[0]))
483 |
484 | if not starts_with(key, curr_key):
485 | # key not found
486 | return node
487 |
488 | if node_type == NODE_TYPE_LEAF:
489 | return BLANK_NODE if key == curr_key else node
490 |
491 | # for inner key value type
492 | new_sub_node = self._delete_and_delete_storage(
493 | self._decode_to_node(node[1]), key[len(curr_key):])
494 |
495 | if self._encode_node(new_sub_node) == node[1]:
496 | return node
497 |
498 | # new sub node is BLANK_NODE
499 | if new_sub_node == BLANK_NODE:
500 | return BLANK_NODE
501 |
502 | assert isinstance(new_sub_node, list)
503 |
504 | # new sub node not blank, not value and has changed
505 | new_sub_node_type = self._get_node_type(new_sub_node)
506 |
507 | if is_key_value_type(new_sub_node_type):
508 | # collapse the sub node into this node; note that this node will have the
509 | # same terminator as the new sub node, and the value does not change
510 | new_key = curr_key + unpack_to_nibbles(new_sub_node[0])
511 | return [pack_nibbles(new_key), new_sub_node[1]]
512 |
513 | if new_sub_node_type == NODE_TYPE_BRANCH:
514 | return [pack_nibbles(curr_key), self._encode_node(new_sub_node)]
515 |
516 | # should be no more cases
517 | assert False
518 |
519 | def delete(self, key):
520 | '''
521 | :param key: a string with length of [0, 32]
522 | '''
523 | if not isinstance(key, (str, unicode)):
524 | raise Exception("Key must be string")
525 |
526 | if len(key) > 32:
527 | raise Exception("Max key length is 32")
528 |
529 | self.root_node = self._delete_and_delete_storage(
530 | self.root_node,
531 | bin_to_nibbles(str(key)))
532 | self.get_root_hash()
533 | self.db.commit()
534 |
535 | def _get_size(self, node):
536 | '''Get counts of (key, value) stored in this and the descendant nodes
537 |
538 | :param node: node in form of list, or BLANK_NODE
539 | '''
540 | if node == BLANK_NODE:
541 | return 0
542 |
543 | node_type = self._get_node_type(node)
544 |
545 | if is_key_value_type(node_type):
546 | value_is_node = node_type == NODE_TYPE_EXTENSION
547 | if value_is_node:
548 | return self._get_size(self._decode_to_node(node[1]))
549 | else:
550 | return 1
551 | elif node_type == NODE_TYPE_BRANCH:
552 | sizes = [self._get_size(self._decode_to_node(node[x]))
553 | for x in range(16)]
554 | sizes = sizes + [1 if node[-1] else 0]
555 | return sum(sizes)
556 |
557 | def _to_dict(self, node):
558 | '''convert (key, value) stored in this and the descendant nodes
559 | to dict items.
560 |
561 | :param node: node in form of list, or BLANK_NODE
562 |
563 | .. note::
564 |
565 | Here key is in full form, rather than key of the individual node
566 | '''
567 | if node == BLANK_NODE:
568 | return {}
569 |
570 | node_type = self._get_node_type(node)
571 |
572 | if is_key_value_type(node_type):
573 | nibbles = without_terminator(unpack_to_nibbles(node[0]))
574 | key = '+'.join([str(x) for x in nibbles])
575 | if node_type == NODE_TYPE_EXTENSION:
576 | sub_dict = self._to_dict(self._decode_to_node(node[1]))
577 | else:
578 | sub_dict = {str(NIBBLE_TERMINATOR): node[1]}
579 |
580 | # prepend key of this node to the keys of children
581 | res = {}
582 | for sub_key, sub_value in sub_dict.iteritems():
583 | full_key = '{0}+{1}'.format(key, sub_key).strip('+')
584 | res[full_key] = sub_value
585 | return res
586 |
587 | elif node_type == NODE_TYPE_BRANCH:
588 | res = {}
589 | for i in range(16):
590 | sub_dict = self._to_dict(self._decode_to_node(node[i]))
591 |
592 | for sub_key, sub_value in sub_dict.iteritems():
593 | full_key = '{0}+{1}'.format(i, sub_key).strip('+')
594 | res[full_key] = sub_value
595 |
596 | if node[16]:
597 | res[str(NIBBLE_TERMINATOR)] = node[-1]
598 | return res
599 |
600 | def to_dict(self):
601 | d = self._to_dict(self.root_node)
602 | res = {}
603 | for key_str, value in d.iteritems():
604 | if key_str:
605 | nibbles = [int(x) for x in key_str.split('+')]
606 | else:
607 | nibbles = []
608 | key = nibbles_to_bin(without_terminator(nibbles))
609 | res[key] = value
610 | return res
611 |
612 | def get(self, key):
613 | return self._get(self.root_node, bin_to_nibbles(str(key)))
614 |
615 | def __len__(self):
616 | return self._get_size(self.root_node)
617 |
618 | def __getitem__(self, key):
619 | return self.get(key)
620 |
621 | def __setitem__(self, key, value):
622 | return self.update(key, value)
623 |
624 | def __delitem__(self, key):
625 | return self.delete(key)
626 |
627 | def __iter__(self):
628 | return iter(self.to_dict())
629 |
630 | def __contains__(self, key):
631 | return self.get(key) != BLANK_NODE
632 |
633 | def update(self, key, value):
634 | '''
635 | :param key: a string with length of [0, 32]
636 | :param value: a string
637 | '''
638 | if not isinstance(key, (str, unicode)):
639 | raise Exception("Key must be string")
640 |
641 | if len(key) > 32:
642 | raise Exception("Max key length is 32")
643 |
644 | if not isinstance(value, (str, unicode)):
645 | raise Exception("Value must be string")
646 |
647 | if value == '':
648 | return self.delete(key)
649 |
650 | self.root_node = self._update_and_delete_storage(
651 | self.root_node,
652 | bin_to_nibbles(str(key)),
653 | value)
654 | if PRINT: print 'root hash before db commit', self.get_root_hash().encode('hex')
655 | self.db.commit()
656 |
657 | def root_hash_valid(self):
658 | if self.root_hash == BLANK_ROOT:
659 | return True
660 | return self.root_hash in self.db
661 |
662 | if __name__ == "__main__":
663 | import sys
664 |
665 | def encode_node(nd):
666 | if isinstance(nd, str):
667 | return nd.encode('hex')
668 | else:
669 | return rlp.encode(nd).encode('hex')
670 |
671 | if len(sys.argv) >= 2:
672 | if sys.argv[1] == 'insert':
673 | t = Trie(sys.argv[2], sys.argv[3].decode('hex'))
674 | t.update(sys.argv[4], sys.argv[5])
675 | print encode_node(t.root_hash)
676 | elif sys.argv[1] == 'get':
677 | t = Trie(sys.argv[2], sys.argv[3].decode('hex'))
678 | print t.get(sys.argv[4])
679 |
--------------------------------------------------------------------------------
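
The `pack_nibbles` and `unpack_to_nibbles` functions in /src/trie.py above implement the hex-prefix encoding: the first nibble of the packed key carries two flag bits (bit 1 for a terminated leaf key, bit 0 for an odd nibble count), and an extra zero nibble pads even-length keys. Since the file targets Python 2 strings, here is a small self-contained Python 3 sketch of the same idea operating on `bytes`. The `TERMINATOR` name and the bytes-based signatures are assumptions made for this example, not the repo's API.

```python
TERMINATOR = 16  # same sentinel value as NIBBLE_TERMINATOR in trie.py


def pack_nibbles(nibbles):
    """Hex-prefix encode a list of nibbles (optionally terminated) into bytes."""
    nibbles = list(nibbles)
    flags = 0
    if nibbles and nibbles[-1] == TERMINATOR:
        flags = 2  # bit 1 marks a leaf (terminated) key
        nibbles.pop()
    if len(nibbles) % 2:
        flags |= 1  # bit 0 marks an odd number of nibbles
        nibbles = [flags] + nibbles
    else:
        nibbles = [flags, 0] + nibbles  # pad so the nibble count stays even
    # Combine each pair of nibbles into one byte.
    return bytes(16 * hi + lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))


def unpack_to_nibbles(data):
    """Invert pack_nibbles, restoring the terminator when the flag is set."""
    nibbles = [n for byte in data for n in divmod(byte, 16)]
    flags = nibbles[0]
    body = nibbles[1:] if flags & 1 else nibbles[2:]
    if flags & 2:
        body.append(TERMINATOR)
    return body


if __name__ == "__main__":
    for key in ([1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [15, 1, 12, 11, 8, 16]):
        packed = pack_nibbles(key)
        print(key, packed.hex(), unpack_to_nibbles(packed) == key)
```

The terminator flag is what lets `_get_node_type` distinguish a leaf from an extension node when both are stored as two-item lists.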
/src/utils.py:
--------------------------------------------------------------------------------
1 | import logging
2 | import logging.config
3 | from sha3 import sha3_256
4 | from bitcoin import privtopub
5 | import struct
6 | import os
7 | import sys
8 | import rlp
9 | import db
10 | import random
11 | from rlp import big_endian_to_int, int_to_big_endian
12 |
13 |
14 | logger = logging.getLogger(__name__)
15 |
16 |
17 | # decorator
18 | def debug(label):
19 | def deb(f):
20 | def inner(*args, **kwargs):
21 | i = random.randrange(1000000)
22 | print label, i, 'start', args
23 | x = f(*args, **kwargs)
24 | print label, i, 'end', x
25 | return x
26 | return inner
27 | return deb
28 |
29 |
30 | def sha3(seed):
31 | return sha3_256(seed).digest()
32 |
33 |
34 | def privtoaddr(x):
35 | if len(x) > 32:
36 | x = x.decode('hex')
37 | return sha3(privtopub(x)[1:])[12:].encode('hex')
38 |
39 |
40 | def zpad(x, l):
41 | return '\x00' * max(0, l - len(x)) + x
42 |
43 |
44 | def coerce_addr_to_bin(x):
45 | if isinstance(x, (int, long)):
46 | return zpad(int_to_big_endian(x), 20)
47 | elif len(x) == 40 or len(x) == 0:
48 | return x.decode('hex')
49 | else:
50 | return zpad(x, 20)[-20:]
51 |
52 |
53 | def coerce_addr_to_hex(x):
54 | if isinstance(x, (int, long)):
55 | return zpad(int_to_big_endian(x), 20).encode('hex')
56 | elif len(x) == 40 or len(x) == 0:
57 | return x
58 | else:
59 | return zpad(x, 20)[-20:].encode('hex')
60 |
61 |
62 | def coerce_to_int(x):
63 | if isinstance(x, (int, long)):
64 | return x
65 | elif len(x) == 40:
66 | return big_endian_to_int(x.decode('hex'))
67 | else:
68 | return big_endian_to_int(x)
69 |
70 |
71 | def coerce_to_bytes(x):
72 | if isinstance(x, (int, long)):
73 | return int_to_big_endian(x)
74 | elif len(x) == 40:
75 | return x.decode('hex')
76 | else:
77 | return x
78 |
79 |
80 | def int_to_big_endian4(integer):
81 | ''' 4 bytes big endian integer'''
82 | return struct.pack('>I', integer)
83 |
84 |
85 | def recursive_int_to_big_endian(item):
86 | ''' convert all int to int_to_big_endian recursively
87 | '''
88 | if isinstance(item, (int, long)):
89 | return int_to_big_endian(item)
90 | elif isinstance(item, (list, tuple)):
91 | res = []
92 | for sub_item in item:
93 | res.append(recursive_int_to_big_endian(sub_item))
94 | return res
95 | return item
96 |
97 |
98 | def rlp_encode(item):
99 | '''
100 | item can be nested string/integer/list of string/integer
101 | '''
102 | return rlp.encode(recursive_int_to_big_endian(item))
103 |
104 | # Format encoders/decoders for bin, addr, int
105 |
106 |
107 | def decode_hash(v):
108 | '''decodes a bytearray from hash'''
109 | return db_get(v)
110 |
111 |
112 | def decode_bin(v):
113 | '''decodes a bytearray from serialization'''
114 | if not isinstance(v, (str, unicode)):
115 | raise Exception("Value must be binary, not RLP array")
116 | return v
117 |
118 |
119 | def decode_addr(v):
120 | '''decodes an address from serialization'''
121 | if len(v) not in [0, 20]:
122 | raise Exception("Serialized addresses must be empty or 20 bytes long!")
123 | return v.encode('hex')
124 |
125 |
126 | def decode_int(v):
127 | '''decodes an integer from serialization'''
128 | if len(v) > 0 and v[0] == '\x00':
129 | raise Exception("No leading zero bytes allowed for integers")
130 | return big_endian_to_int(v)
131 |
132 |
133 | def decode_root(root):
134 | if isinstance(root, list):
135 | if len(rlp.encode(root)) >= 32:
136 | raise Exception("Direct RLP roots must have length <32")
137 | elif isinstance(root, (str, unicode)):
138 | if len(root) != 0 and len(root) != 32:
139 | raise Exception("String roots must be empty or length-32")
140 | else:
141 | raise Exception("Invalid root")
142 | return root
143 |
144 |
145 | def encode_hash(v):
146 | '''encodes a bytearray into hash'''
147 | k = sha3(v)
148 | db_put(k, v)
149 | return k
150 |
151 |
152 | def encode_bin(v):
153 | '''encodes a bytearray into serialization'''
154 | return v
155 |
156 |
157 | def encode_root(v):
158 | '''encodes a trie root into serialization'''
159 | return v
160 |
161 |
162 | def encode_addr(v):
163 | '''encodes an address into serialization'''
164 | if not isinstance(v, (str, unicode)) or len(v) not in [0, 40]:
165 | raise Exception("Address must be empty or 40 chars long")
166 | return v.decode('hex')
167 |
168 |
169 | def encode_int(v):
170 | '''encodes an integer into serialization'''
171 | if not isinstance(v, (int, long)) or v < 0 or v >= 2 ** 256:
172 | raise Exception("Integer invalid or out of range")
173 | return int_to_big_endian(v)
174 |
175 | decoders = {
176 | "hash": decode_hash,
177 | "bin": decode_bin,
178 | "addr": decode_addr,
179 | "int": decode_int,
180 | "trie_root": decode_root,
181 | }
182 |
183 | encoders = {
184 | "hash": encode_hash,
185 | "bin": encode_bin,
186 | "addr": encode_addr,
187 | "int": encode_int,
188 | "trie_root": encode_root,
189 | }
190 |
191 |
192 | def print_func_call(ignore_first_arg=False, max_call_number=100):
193 | ''' utility function to facilitate debug, it will print input args before
194 | function call, and print return value after function call
195 |
196 | usage:
197 |
198 | @print_func_call()
199 | def some_func_to_be_debugged():
200 | pass
201 |
202 | :param ignore_first_arg: whether to skip printing the first arg.
203 | useful for ignoring the `self` parameter of an object method call
204 | '''
205 | from functools import wraps
206 |
207 | def display(x):
208 | x = str(x)
209 | try:
210 | x.decode('ascii')
211 | except:
212 | return 'NON_PRINTABLE'
213 | return x
214 |
215 | local = {'call_number': 0}
216 |
217 | def inner(f):
218 |
219 | @wraps(f)
220 | def wrapper(*args, **kwargs):
221 | local['call_number'] = local['call_number'] + 1
222 | tmp_args = args[1:] if ignore_first_arg and len(args) else args
223 | this_call_number = local['call_number']
224 | print('{0}#{1} args: {2}, {3}'.format(
225 | f.__name__,
226 | this_call_number,
227 | ', '.join([display(x) for x in tmp_args]),
228 | ', '.join(display(key) + '=' + str(value)
229 | for key, value in kwargs.iteritems())
230 | ))
231 | res = f(*args, **kwargs)
232 | print('{0}#{1} return: {2}'.format(
233 | f.__name__,
234 | this_call_number,
235 | display(res)))
236 |
237 | if local['call_number'] > max_call_number:
238 | raise Exception("Touch max call number!")
239 | return res
240 | return wrapper
241 | return inner
242 |
243 |
244 | class DataDir(object):
245 |
246 | ethdirs = {
247 | "linux2": "~/.pyethereum",
248 | "darwin": "~/Library/Application Support/Pyethereum/",
249 | "win32": "~/AppData/Roaming/Pyethereum",
250 | "win64": "~/AppData/Roaming/Pyethereum",
251 | }
252 |
253 | def __init__(self):
254 | self._path = None
255 |
256 | def set(self, path):
257 | path = os.path.abspath(path)
258 | if not os.path.exists(path):
259 | os.makedirs(path)
260 | assert os.path.isdir(path)
261 | self._path = path
262 |
263 | def _set_default(self):
264 | p = self.ethdirs.get(sys.platform, self.ethdirs['linux2'])
265 | self.set(os.path.expanduser(os.path.normpath(p)))
266 |
267 | @property
268 | def path(self):
269 | if not self._path:
270 | self._set_default()
271 | return self._path
272 |
273 | data_dir = DataDir()
274 |
275 |
276 | def get_db_path():
277 | return os.path.join(data_dir.path, 'statedb')
278 |
279 |
280 | def get_index_path():
281 | return os.path.join(data_dir.path, 'indexdb')
282 |
283 |
284 | def db_put(key, value):
285 | database = db.DB(get_db_path())
286 | res = database.put(key, value)
287 | database.commit()
288 | return res
289 |
290 |
291 | def db_get(key):
292 | database = db.DB(get_db_path())
293 | return database.get(key)
294 |
295 |
296 | def configure_logging(loggerlevels=':DEBUG', verbosity=1):
297 | logconfig = dict(
298 | version=1,
299 | disable_existing_loggers=False,
300 | formatters=dict(
301 | debug=dict(
302 | format='[%(asctime)s] %(name)s %(levelname)s %(threadName)s:'
303 | ' %(message)s'
304 | ),
305 | minimal=dict(
306 | format='%(message)s'
307 | ),
308 | ),
309 | handlers=dict(
310 | default={
311 | 'level': 'INFO',
312 | 'class': 'logging.StreamHandler',
313 | 'formatter': 'minimal'
314 | },
315 | verbose={
316 | 'level': 'DEBUG',
317 | 'class': 'logging.StreamHandler',
318 | 'formatter': 'debug'
319 | },
320 | ),
321 | loggers=dict()
322 | )
323 |
324 | for loggerlevel in filter(lambda _: ':' in _, loggerlevels.split(',')):
325 | name, level = loggerlevel.split(':')
326 | logconfig['loggers'][name] = dict(
327 | handlers=['verbose'], level=level, propagate=False)
328 |
329 | if len(logconfig['loggers']) == 0:
330 | logconfig['loggers'][''] = dict(
331 | handlers=['default'],
332 | level={0: 'ERROR', 1: 'WARNING', 2: 'INFO', 3: 'DEBUG'}.get(
333 | verbosity),
334 | propagate=True)
335 |
336 | logging.config.dictConfig(logconfig)
337 | # logging.debug("logging set up like that: %r", logconfig)
338 |
339 |
340 | class Denoms():
341 | def __init__(self):
342 | self.wei = 1
343 | self.babbage = 10**3
344 | self.lovelace = 10**6
345 | self.shannon = 10**9
346 | self.szabo = 10**12
347 | self.finney = 10**15
348 | self.ether = 10**18
349 | self.turing = 2**256
350 |
351 | denoms = Denoms()
352 |
--------------------------------------------------------------------------------