├── src
│   ├── Platform
│   │   ├── index.fio
│   │   └── interface daemon.php
│   ├── Classes
│   │   ├── nodeExtract.php
│   │   ├── node_holder.php
│   │   ├── node.php
│   │   └── bencoded.php
│   ├── Tests
│   │   └── dht.php
│   ├── Client
│   │   └── dht.class.php
│   └── Server
│       └── routing_table.php
├── README.md
├── LICENSE
└── docs
    └── bep_0005.html

/src/Platform/index.fio:
--------------------------------------------------------------------------------
| BEP: | 5 |
|---|---|
| Title: | DHT Protocol |
| Version: | 11031 |
| Last-Modified: | 2008-02-28 16:43:58 -0800 (Thu, 28 Feb 2008) |
| Author: | Andrew Loewenstern <drue at bittorrent.com> |
| Status: | Draft |
| Type: | Standards Track |
| Created: | 31-Jan-2008 |
| Post-History: | |
## Overview

BitTorrent uses a "distributed sloppy hash table" (DHT) for storing peer contact information for "trackerless" torrents. In effect, each peer becomes a tracker. The protocol is based on Kademlia [1] and is implemented over UDP.
Please note the terminology used in this document to avoid confusion. A "peer" is a client/server listening on a TCP port that implements the BitTorrent protocol. A "node" is a client/server listening on a UDP port implementing the distributed hash table protocol. The DHT is composed of nodes and stores the location of peers. BitTorrent clients include a DHT node, which is used to contact other nodes in the DHT to get the location of peers to download from using the BitTorrent protocol.
Each node has a globally unique identifier known as the "node ID." Node IDs are chosen at random from the same 160-bit space as BitTorrent infohashes [2]. A "distance metric" is used to compare two node IDs or a node ID and an infohash for "closeness." Nodes must maintain a routing table containing the contact information for a small number of other nodes. The routing table becomes more detailed as IDs get closer to the node's own ID. Nodes know about many other nodes in the DHT that have IDs that are "close" to their own but have only a handful of contacts with IDs that are very far away from their own.
In Kademlia, the distance metric is XOR and the result is interpreted as an unsigned integer: `distance(A,B) = |A xor B|`. Smaller values are closer.
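The metric above can be sketched in a few lines of Python (an illustrative sketch, not part of the specification; `xor_distance` is a hypothetical helper name):

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """distance(A, B) = |A xor B|, interpreted as an unsigned integer.

    Both IDs are 20-byte (160-bit) strings; smaller results are closer.
    """
    assert len(a) == len(b) == 20
    # XOR byte-by-byte, then read the result as a big-endian integer.
    return int.from_bytes(bytes(x ^ y for x, y in zip(a, b)), "big")
```

Because XOR is symmetric and `distance(A, A) = 0`, the same function compares two node IDs or a node ID against a torrent infohash.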
When a node wants to find peers for a torrent, it uses the distance metric to compare the infohash of the torrent with the IDs of the nodes in its own routing table. It then contacts the nodes it knows about with IDs closest to the infohash and asks them for the contact information of peers currently downloading the torrent. If a contacted node knows about peers for the torrent, the peer contact information is returned with the response. Otherwise, the contacted node must respond with the contact information of the nodes in its routing table that are closest to the infohash of the torrent. The original node iteratively queries nodes that are closer to the target infohash until it cannot find any closer nodes. After the search is exhausted, the client then inserts the peer contact information for itself onto the responding nodes with IDs closest to the infohash of the torrent.
The return value for a query for peers includes an opaque value known as the "token." For a node to announce that its controlling peer is downloading a torrent, it must present the token received from the same queried node in a recent query for peers. When a node attempts to "announce" a torrent, the queried node checks the token against the querying node's IP address. This is to prevent malicious hosts from signing up other hosts for torrents. Since the token is merely returned by the querying node to the same node it received the token from, the implementation is not defined. Tokens must be accepted for a reasonable amount of time after they have been distributed. The BitTorrent implementation uses the SHA1 hash of the IP address concatenated onto a secret that changes every five minutes, and tokens up to ten minutes old are accepted.
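As an illustration of the token scheme described above, a rough Python sketch (the class and method names are hypothetical; the spec leaves the implementation undefined beyond the IP check):

```python
import hashlib
import os
import time

SECRET_ROTATE_SECONDS = 300  # the secret changes every five minutes


class TokenStore:
    """SHA1(IP || secret) tokens with a rotating secret.

    Accepting tokens made with the current or the previous secret means
    tokens up to ten minutes old remain valid, as the spec describes.
    """

    def __init__(self) -> None:
        self.current = os.urandom(20)
        self.previous = self.current
        self.rotated_at = time.time()

    def _maybe_rotate(self) -> None:
        if time.time() - self.rotated_at >= SECRET_ROTATE_SECONDS:
            self.previous, self.current = self.current, os.urandom(20)
            self.rotated_at = time.time()

    def make_token(self, ip: str) -> bytes:
        self._maybe_rotate()
        return hashlib.sha1(ip.encode() + self.current).digest()

    def verify_token(self, ip: str, token: bytes) -> bool:
        # A token is valid only for the IP it was issued to.
        self._maybe_rotate()
        return token in (
            hashlib.sha1(ip.encode() + self.current).digest(),
            hashlib.sha1(ip.encode() + self.previous).digest(),
        )
```

Because verification recomputes the hash, the queried node stores only the two secrets, not the tokens it handed out.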
## Routing Table

Every node maintains a routing table of known good nodes. The nodes in the routing table are used as starting points for queries in the DHT. Nodes from the routing table are returned in response to queries from other nodes.
Not all nodes that we learn about are equal. Some are "good" and some are not. Many nodes using the DHT are able to send queries and receive responses, but are not able to respond to queries from other nodes. It is important that each node's routing table contain only known good nodes. A good node is a node that has responded to one of our queries within the last 15 minutes. A node is also good if it has ever responded to one of our queries and has sent us a query within the last 15 minutes. After 15 minutes of inactivity, a node becomes questionable. Nodes become bad when they fail to respond to multiple queries in a row. Nodes that we know are good are given priority over nodes with unknown status.
The routing table covers the entire node ID space from 0 to 2^160. The routing table is subdivided into "buckets" that each cover a portion of the space. An empty table has one bucket with an ID space range of min=0, max=2^160. When a node with ID "N" is inserted into the table, it is placed within the bucket that has min <= N < max. An empty table has only one bucket so any node must fit within it. Each bucket can only hold K nodes, currently eight, before becoming "full." When a bucket is full of known good nodes, no more nodes may be added unless our own node ID falls within the range of the bucket. In that case, the bucket is replaced by two new buckets each with half the range of the old bucket and the nodes from the old bucket are distributed among the two new ones. For a new table with only one bucket, the full bucket is always split into two new buckets covering the ranges 0..2^159 and 2^159..2^160.
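The bucket-splitting rule above might be sketched as follows (a hypothetical, simplified model: real tables track full contact information and liveness, not bare integer IDs):

```python
K = 8  # bucket capacity ("currently eight")


class Bucket:
    """A routing-table bucket covering the half-open ID range [lo, hi)."""

    def __init__(self, lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi
        self.nodes: list[int] = []  # node IDs, simplified to integers

    def covers(self, node_id: int) -> bool:
        return self.lo <= node_id < self.hi

    def split(self) -> tuple["Bucket", "Bucket"]:
        """Replace this bucket with two buckets of half the range,
        redistributing the existing nodes. Only performed when our own
        node ID falls within this bucket's range."""
        mid = (self.lo + self.hi) // 2
        left, right = Bucket(self.lo, mid), Bucket(mid, self.hi)
        for n in self.nodes:
            (left if left.covers(n) else right).nodes.append(n)
        return left, right


# An empty table has a single bucket covering the whole space.
root = Bucket(0, 2 ** 160)
```

Splitting the initial bucket yields exactly the two ranges named above, 0..2^159 and 2^159..2^160.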
When the bucket is full of good nodes, the new node is simply discarded. If any nodes in the bucket are known to have become bad, then one is replaced by the new node. If there are any questionable nodes in the bucket that have not been seen in the last 15 minutes, the least recently seen node is pinged. If the pinged node responds, then the next least recently seen questionable node is pinged, until one fails to respond or all of the nodes in the bucket are known to be good. If a node in the bucket fails to respond to a ping, it is suggested to try once more before discarding the node and replacing it with a new good node. In this way, the table fills with stable long-running nodes.
Each bucket should maintain a "last changed" property to indicate how "fresh" the contents are. When a node in a bucket is pinged and it responds, or a node is added to a bucket, or a node in a bucket is replaced with another node, the bucket's last changed property should be updated. Buckets that have not been changed in 15 minutes should be "refreshed." This is done by picking a random ID in the range of the bucket and performing a find_nodes search on it. Nodes that are able to receive queries from other nodes usually do not need to refresh buckets often. Nodes that are not able to receive queries from other nodes usually will need to refresh all buckets periodically to ensure there are good nodes in their table when the DHT is needed.
Upon inserting the first node into its routing table and when starting up thereafter, the node should attempt to find the closest nodes in the DHT to itself. It does this by issuing find_node messages to closer and closer nodes until it cannot find any closer. The routing table should be saved between invocations of the client software.
## BitTorrent Protocol Extension

The BitTorrent protocol has been extended to exchange node UDP port numbers between peers that are introduced by a tracker. In this way, clients can get their routing tables seeded automatically through the download of regular torrents. Newly installed clients that attempt to download a trackerless torrent as their first download will not have any nodes in their routing table and will need the contacts included in the torrent file.
Peers supporting the DHT set the last bit of the 8-byte reserved flags exchanged in the BitTorrent protocol handshake. A peer receiving a handshake indicating the remote peer supports the DHT should send a PORT message. It begins with byte 0x09 and has a two-byte payload containing the UDP port of the DHT node in network byte order. Peers that receive this message should attempt to ping the node on the received port and IP address of the remote peer. If a response to the ping is received, the node should attempt to insert the new contact information into their routing table according to the usual rules.
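Assuming the standard BitTorrent peer-wire framing (a 4-byte big-endian length prefix before the message ID), the PORT message could be built like this (an illustrative sketch; `encode_port_message` is a hypothetical helper):

```python
import struct


def encode_port_message(dht_port: int) -> bytes:
    """Peer-wire PORT message: length prefix (3 = 1 ID byte + 2 payload
    bytes), message ID 0x09, then the DHT node's UDP port in network
    byte order."""
    return struct.pack(">IBH", 3, 0x09, dht_port)
```

For a DHT node listening on UDP port 6881, this produces the seven bytes `00 00 00 03 09 1a e1`.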
## Torrent File Extensions

A trackerless torrent dictionary does not have an "announce" key. Instead, a trackerless torrent has a "nodes" key. This key should be set to the K closest nodes in the torrent-generating client's routing table. Alternatively, the key could be set to a known good node such as one operated by the person generating the torrent. Please do not automatically add "router.bittorrent.com" to torrent files or automatically add this node to clients' routing tables.
```
nodes = [["<host>", <port>], ["<host>", <port>], ...]
nodes = [["127.0.0.1", 6881], ["your.router.node", 4804]]
```
## KRPC Protocol

The KRPC protocol is a simple RPC mechanism consisting of bencoded dictionaries sent over UDP. A single query packet is sent out and a single packet is sent in response. There is no retry. There are three message types: query, response, and error. For the DHT protocol, there are four queries: ping, find_node, get_peers, and announce_peer.
A KRPC message is a single dictionary with two keys common to every message and additional keys depending on the type of message. Every message has a key "t" with a string value representing a transaction ID. This transaction ID is generated by the querying node and is echoed in the response, so responses may be correlated with multiple queries to the same node. The transaction ID should be encoded as a short string of binary numbers; typically 2 characters are enough as they cover 2^16 outstanding queries. The other key contained in every KRPC message is "y" with a single character value describing the type of message. The value of the "y" key is one of "q" for query, "r" for response, or "e" for error.
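For illustration, a minimal bencoder covering the types KRPC uses could look like this (a hypothetical sketch; production code would also need a decoder and stricter validation):

```python
def bencode(obj) -> bytes:
    """Minimal bencoder for the types KRPC messages use."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, str):
        return bencode(obj.encode())
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Bencoded dictionary keys must be byte strings in sorted order.
        items = sorted(
            (k.encode() if isinstance(k, str) else k, v) for k, v in obj.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj)!r}")
```

Running this over the ping query shown later in this document reproduces the spec's own wire encoding, `d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe`.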
### Contact Encoding

Contact information for peers is encoded as a 6-byte string. Also known as "Compact IP-address/port info", the 4-byte IP address is in network byte order with the 2-byte port in network byte order concatenated onto the end.

Contact information for nodes is encoded as a 26-byte string. Also known as "Compact node info", the 20-byte Node ID in network byte order has the compact IP-address/port info concatenated to the end.
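Both compact formats are straightforward to pack and unpack; a sketch for IPv4 (the helper names are hypothetical):

```python
import socket
import struct


def encode_compact_peer(ip: str, port: int) -> bytes:
    """6-byte compact IP-address/port info: 4-byte IPv4 address then
    2-byte port, both in network byte order."""
    return socket.inet_aton(ip) + struct.pack(">H", port)


def decode_compact_node(blob: bytes) -> tuple[bytes, str, int]:
    """26-byte compact node info: 20-byte node ID followed by the
    6-byte compact IP-address/port info."""
    assert len(blob) == 26
    node_id, addr = blob[:20], blob[20:]
    return node_id, socket.inet_ntoa(addr[:4]), struct.unpack(">H", addr[4:])[0]
```

A "nodes" value in a response is simply these 26-byte strings concatenated, so it is decoded in fixed-size slices.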
### Queries

Queries, or KRPC message dictionaries with a "y" value of "q", contain two additional keys: "q" and "a". Key "q" has a string value containing the method name of the query. Key "a" has a dictionary value containing named arguments to the query.
### Responses

Responses, or KRPC message dictionaries with a "y" value of "r", contain one additional key "r". The value of "r" is a dictionary containing named return values. Response messages are sent upon successful completion of a query.
### Errors

Errors, or KRPC message dictionaries with a "y" value of "e", contain one additional key "e". The value of "e" is a list. The first element is an integer representing the error code. The second element is a string containing the error message. Errors are sent when a query cannot be fulfilled. The following table describes the possible error codes:
| Code | Description |
|---|---|
| 201 | Generic Error |
| 202 | Server Error |
| 203 | Protocol Error, such as a malformed packet, invalid arguments, or bad token |
| 204 | Method Unknown |
Example Error Packets:
```
generic error = {"t":"aa", "y":"e", "e":[201, "A Generic Error Occurred"]}
bencoded = d1:eli201e24:A Generic Error Occurrede1:t2:aa1:y1:ee
```
## DHT Queries

All queries have an "id" key and value containing the node ID of the querying node. All responses have an "id" key and value containing the node ID of the responding node.
### ping

The most basic query is a ping. "q" = "ping" A ping query has a single argument, "id": the value is a 20-byte string containing the sender's node ID in network byte order. The appropriate response to a ping has a single key "id" containing the node ID of the responding node.
```
arguments:  {"id" : "<querying nodes id>"}

response: {"id" : "<queried nodes id>"}
```

Example Packets

```
ping Query = {"t":"aa", "y":"q", "q":"ping", "a":{"id":"abcdefghij0123456789"}}
bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe

Response = {"t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re
```
### find_node

Find node is used to find the contact information for a node given its ID. "q" = "find_node" A find_node query has two arguments: "id" containing the node ID of the querying node, and "target" containing the ID of the node sought by the querier. When a node receives a find_node query, it should respond with a key "nodes" and value of a string containing the compact node info for the target node or the K (8) closest good nodes in its own routing table.
```
arguments:  {"id" : "<querying nodes id>", "target" : "<id of target node>"}

response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"}
```

Example Packets

```
find_node Query = {"t":"aa", "y":"q", "q":"find_node", "a": {"id":"abcdefghij0123456789", "target":"mnopqrstuvwxyz123456"}}
bencoded = d1:ad2:id20:abcdefghij01234567896:target20:mnopqrstuvwxyz123456e1:q9:find_node1:t2:aa1:y1:qe

Response = {"t":"aa", "y":"r", "r": {"id":"0123456789abcdefghij", "nodes": "def456..."}}
bencoded = d1:rd2:id20:0123456789abcdefghij5:nodes9:def456...e1:t2:aa1:y1:re
```
### get_peers

Get peers associated with a torrent infohash. "q" = "get_peers" A get_peers query has two arguments: "id" containing the node ID of the querying node, and "info_hash" containing the infohash of the torrent. If the queried node has peers for the infohash, they are returned in a key "values" as a list of strings, each string containing "compact" format peer information for a single peer. If the queried node has no peers for the infohash, a key "nodes" is returned containing the K nodes in the queried node's routing table closest to the infohash supplied in the query. In either case a "token" key is also included in the return value. The token value is a required argument for a future announce_peer query. The token value should be a short binary string.
```
arguments:  {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>"}

response: {"id" : "<queried nodes id>", "token" :"<opaque write token>", "values" : ["<peer 1 info string>", "<peer 2 info string>"]}

or: {"id" : "<queried nodes id>", "token" :"<opaque write token>", "nodes" : "<compact node info>"}
```

Example Packets:

```
get_peers Query = {"t":"aa", "y":"q", "q":"get_peers", "a": {"id":"abcdefghij0123456789", "info_hash":"mnopqrstuvwxyz123456"}}
bencoded = d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz123456e1:q9:get_peers1:t2:aa1:y1:qe

Response with peers = {"t":"aa", "y":"r", "r": {"id":"abcdefghij0123456789", "token":"aoeusnth", "values": ["axje.u", "idhtnm"]}}
bencoded = d1:rd2:id20:abcdefghij01234567895:token8:aoeusnth6:valuesl6:axje.u6:idhtnmee1:t2:aa1:y1:re

Response with closest nodes = {"t":"aa", "y":"r", "r": {"id":"abcdefghij0123456789", "token":"aoeusnth", "nodes": "def456..."}}
bencoded = d1:rd2:id20:abcdefghij01234567895:nodes9:def456...5:token8:aoeusnthe1:t2:aa1:y1:re
```
### announce_peer

Announce that the peer controlling the querying node is downloading a torrent on a port. announce_peer has four arguments: "id" containing the node ID of the querying node, "info_hash" containing the infohash of the torrent, "port" containing the port as an integer, and the "token" received in response to a previous get_peers query. The queried node must verify that the token was previously sent to the same IP address as the querying node. Then the queried node should store the IP address of the querying node and the supplied port number under the infohash in its store of peer contact information.
```
arguments:  {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>", "port" : <port number>, "token" : "<opaque token>"}

response: {"id" : "<queried nodes id>"}
```

Example Packets:

```
announce_peer Query = {"t":"aa", "y":"q", "q":"announce_peer", "a": {"id":"abcdefghij0123456789", "info_hash":"mnopqrstuvwxyz123456", "port": 6881, "token": "aoeusnth"}}
bencoded = d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz1234564:porti6881e5:token8:aoeusnthe1:q13:announce_peer1:t2:aa1:y1:qe

Response = {"t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re
```
## References

| [1] | Peter Maymounkov, David Mazieres, "Kademlia: A Peer-to-peer Information System Based on the XOR Metric", IPTPS 2002. http://www.cs.rice.edu/Conferences/IPTPS02/109.pdf |
|---|---|
| [2] | Use SHA1 and plenty of entropy to ensure a unique ID. |
This document has been placed in the public domain.