# P2P Handbook

## Status: Living Document

This handbook is very much a work in progress. It's incomplete, and I'm filling
in bits and pieces as I go!

You should file issues and ask questions about aspects of p2p you're interested
in!

## THE HANDBOOK

This handbook is an introduction to the primitives of peer-to-peer systems for
the web, and a guide to the powerful abstractions you can build on top of those
primitives to form distributed systems.

This handbook's goal is to prepare you to use the excellent ecosystem of p2p
modules available on npm today, and to develop your own p2p modules and
applications on top.

## Table of Contents

- [Status: Living Document](#status-living-document)
- [THE HANDBOOK](#the-handbook)
  - [introduction](#introduction)
  - [why you should use p2p](#why-you-should-use-p2p)
  - [why node & javascript](#why-node-javascript)
  - [what superpowers does p2p grant?](#what-superpowers-does-p2p-grant)
  - [CAP theorem](#cap-theorem)
  - [p2p "layers" (roles)](#p2p-layers-roles)
  - [identity](#identity)
  - [discovery](#discovery)
    - [bittorrent dht](#bittorrent-dht)
    - [tracker / signal server](#tracker-signal-server)
    - [multicast dns](#multicast-dns)
    - [static bootstrap list](#static-bootstrap-list)
    - [dns](#dns)
  - [peer routing](#peer-routing)
  - [swarm / topology](#swarm-topology)
  - [content routing](#content-routing)
  - [p2p data structures](#p2p-data-structures)
    - [other handy not-strictly-p2p modules](#other-handy-not-strictly-p2p-modules)
  - [protocol](#protocol)
  - [awesome p2p modules](#awesome-p2p-modules)
  - [glueing p2p modules together to make apps](#glueing-p2p-modules-together-to-make-apps)
- [Inspiration](#inspiration)

---

### introduction

This is the handbook I wish that I had when I started delving down the
peer-to-peer rabbit hole.

what is p2p

### why you should use p2p

### why node & javascript

### what superpowers does p2p grant?

technology grants us superpowers that we would otherwise not have

- offline-first
- abstract away corporate infrastructure
- data can flow more freely

### CAP theorem

-
- dominictarr's article

Consistency. Availability. Partition tolerance. Choose two.

It's possible to "cheat" and get something close to all three: _eventual
consistency_, where peers stay available during a partition and converge on the
same state once they can exchange updates again.

### p2p "layers" (roles)

Whereas the OSI model breaks networks into 7 layers, and the TCP/IP model into
4, it's beneficial to think of P2P in terms of _roles_:

```
+=============================+
|            roles            |
+=============================+
|        applications         |
+-----------------------------+
| protocols & data structures |
+-----------------------------+
|       content routing       |
+-----------------------------+
|       swarm topology        |
+-----------------------------+
|        peer routing         |
+-----------------------------+
|          discovery          |
+-----------------------------+
|          identity           |
+-----------------------------+
```

### identity

In a centralized system, this is easy: ask the user to provide an identifier.
Maybe an email address, maybe a username. The server checks whether that
identifier has already been taken. If it has, the user must choose another
identifier. If the identity server is down, nobody can sign in, and nobody can
create new identities. In fact, the whole system might become unusable if the
software can't verify you are who you say you are.


In a distributed system, you are permanently stuck with these restrictions:

1. there is no single authority to register a new identifier with
2. you cannot know of all other peers in the network, so any identity you choose
   may be in conflict with another peer now or in the future
3. there is no single authority to corroborate your claim that you are who you
   say you are

One easy solution might be to have each peer generate a random number between 0
and some big number. If the number is big enough, you're unlikely to see a
conflict now or in the foreseeable future (though [the odds of a collision grow
quadratically](https://en.wikipedia.org/wiki/Birthday_attack) with the number
of peers). This solution copes with restrictions 1 and 2, but not 3: anyone who
saw your identifier could impersonate you on the network, since there are no
authorities to check who's who. We need a way for users to prove they are who
they claim to be, without an authority to provide confirmation.

What if you could provide some sort of proof that you were the author of a
given message? A digital signature that only the holder of that large random
number could produce?

[Public key cryptography](https://en.wikipedia.org/wiki/Public-key_cryptography)
does just this: it's like the above, except *two* "random" numbers are
generated: one you share widely as your identity, and the other you keep secret.
You can use the secret key to produce a signature on any data you'd like, which
anybody else in possession of the message, the signature, and your public key
can verify, all without any remote authorities.

Public keys are long and unwieldy for humans, like
`7bc0bd9bd557547b81d53196674c869c6d698164c68b0033fd4b48849ce59110`.
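
The sign-and-verify flow described above can be sketched with Node's built-in `crypto` module. This is only an illustration: the choice of Ed25519 and the way the hex identity is derived from the exported key are assumptions of this sketch, not something the p2p modules listed in this handbook necessarily do.

```javascript
const crypto = require('crypto');

// Generate the two linked "random" numbers: a public key to share widely and
// a private key to keep secret. Ed25519 is an assumption of this sketch.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');

// Sign a message with the secret key. Ed25519 takes no separate digest
// algorithm, so the first argument is null.
const message = Buffer.from('hello from peer A');
const signature = crypto.sign(null, message, privateKey);

// Anyone holding the message, the signature, and the public key can check the
// signature locally, with no remote authority involved.
const valid = crypto.verify(null, message, publicKey, signature);

// The same signature does not verify against a different message.
const forged = crypto.verify(null, Buffer.from('hello from peer B'), publicKey, signature);

// The last 32 bytes of the DER-encoded public key are the raw Ed25519 key;
// hex-encoded, they give a 64-character identity string.
const identity = publicKey
  .export({ format: 'der', type: 'spki' })
  .subarray(-32)
  .toString('hex');

console.log({ identity, valid, forged });
```

Each run prints a different 64-character `identity`, since the keypair is generated fresh every time.
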
You could use a human "friendly" encoding like
[proquint](https://github.com/deoxxa/proquint) to turn a key like that into
`bazol-rikur-lijit-rudij-jilam-kikag-satik-sipim-nojuk-hojam-rapis-gubab-rajuz-hafoh-jogun-bihan`.
Pronounceable, but still not particularly useful to a human being.

There is a conjecture called [Zooko's
Triangle](https://en.wikipedia.org/wiki/Zooko%27s_triangle) that claims no
system can achieve all three properties of being human-meaningful,
decentralized, and secure.

### discovery

given a key, find interested peers

#### bittorrent dht

The bittorrent dht, mainline, is huge: there are millions of nodes worldwide.
You can store arbitrary data in it, but be warned that most nodes are
configured to aggressively flush the key/value pairs they receive.

This approach is great when the swarm lacks the resources to maintain a
powerful, reliable central server for discovery: a DHT is an excellent means of
free short-term storage of peer location information. Because of its low
retention duration, peers may need to republish their contact information at a
regular interval.

- dht
- discovery-channel
- webtorrent

#### tracker / signal server

Run one or more centralized servers at some known IP that many other peers also
connect to. This server can act as a connection broker, exchanging ip:ports of
peers to help them connect to each other directly.

This is also a partially effective means of traversing certain types of NATs.

This approach need not be centralized: bittorrent employs a decentralized set of
trackers, and peers are free to choose which of them to use for discovery.
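
The connection-broker role can be sketched as a small in-memory structure. The `Tracker` class, its `announce` and `leave` methods, and the sample addresses are hypothetical names made up for this illustration; a real tracker or signal server such as signalhub would expose this over the network and has its own API.

```javascript
// Hypothetical in-memory tracker: maps a swarm key to the set of "ip:port"
// strings that have announced themselves for that key.
class Tracker {
  constructor() {
    this.swarms = new Map();
  }

  // Register `address` under `swarmKey` and return the other peers already
  // known for that key, so the caller can try connecting to them directly.
  announce(swarmKey, address) {
    if (!this.swarms.has(swarmKey)) this.swarms.set(swarmKey, new Set());
    const peers = this.swarms.get(swarmKey);
    const others = [...peers].filter((peer) => peer !== address);
    peers.add(address);
    return others;
  }

  // Forget a peer, e.g. when it disconnects or stops re-announcing.
  leave(swarmKey, address) {
    const peers = this.swarms.get(swarmKey);
    if (peers) peers.delete(address);
  }
}

const tracker = new Tracker();

// The first peer learns of nobody, because it is alone in the swarm.
const first = tracker.announce('cats', '12.62.93.214:9090');

// The second peer is handed the first peer's address.
const second = tracker.announce('cats', '24.243.12.9:1234');

console.log(first, second);
```

The tracker only brokers introductions: once a peer has the returned addresses, the actual data exchange happens directly between peers.
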

A tracker is useful when your potential swarm size is small, and your potential
peers are located broadly across the open internet and require very accessible
points (maybe served over port 80) to find.

- signalhub

#### multicast dns

Multicast DNS is a very well supported protocol (Bonjour/Zeroconf, Apple
AirPlay, etc.) for finding other machines that are on the same network as you.

It consists of sending a "query" message to the entire network. Based on its
contents, machines that consider themselves a match can respond with their
information to facilitate direct contact between the two.

This is useful when your peers are likely to be on the same network as you.

- mdns
- discovery-channel

#### static bootstrap list

When all else fails, you can still ship your application with a list of
hardcoded peer locations, to help peers join the larger network. This could be
as simple as:

```json
{
  "QmVvUkSZqM2EG1SK9s49uN6pizNhXVFHpuJgh53my4A4pP": "tcp://12.62.93.214:9090",
  "QmawE3T8oyxMgz5KaYvsJ2BgQUZWvnKzebwLhqGPA8sqGb": "udp://24.243.12.9:1234",
  "QmQXV24ZjKPGiXfeQd4etZGgywzd4wfaGKGS48EGVdrS6C": "utp://99.4.78.181:97"
}
```

#### dns

Store peer IDs and IP addresses as low-TTL DNS entries (dns round robin). This
relies on DNS' peer-to-peer-like record dissemination protocol to propagate new
peers over time.

This requires a central service that would function a lot like a tracker: it
needs to both a) maintain a subset of active peers in the swarm, and b) publish
a further subset of them to a DNS server.
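
Step b) can be sketched as a pure function that picks a rotating subset of the known peers and turns them into low-TTL records. Everything here is hypothetical: `pickRecords`, the record shape, the 30-second TTL, and the sample peers are made up for illustration, and actually pushing the records to a DNS server is left out. The sketch also publishes addresses only, not peer IDs.

```javascript
// Hypothetical helper: choose which peers to publish as round-robin A records.
// `activePeers` is the tracker-like service's current view of the swarm.
function pickRecords(activePeers, count, offset, ttlSeconds = 30) {
  const records = [];
  const n = Math.min(count, activePeers.length);
  for (let i = 0; i < n; i++) {
    // Rotate through the list so each publish cycle exposes different peers.
    const peer = activePeers[(offset + i) % activePeers.length];
    records.push({ type: 'A', name: 'example.com', address: peer.address, ttl: ttlSeconds });
  }
  return records;
}

const activePeers = [
  { id: 'peer-a', address: '12.62.93.214' },
  { id: 'peer-b', address: '24.243.12.9' },
  { id: 'peer-c', address: '99.4.78.181' },
];

// Two publish cycles, each exposing two of the three peers.
const cycle0 = pickRecords(activePeers, 2, 0);
const cycle1 = pickRecords(activePeers, 2, 2);

console.log(cycle0.map((r) => r.address), cycle1.map((r) => r.address));
```

On the client side, a peer would only need an ordinary lookup of the published name, for example with Node's `dns.resolve4`.
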

This approach is useful because your peer application doesn't even need to
understand p2p protocols: it can just connect to some known example.com and get
a new peer every time it connects.

- Q: any modules for publishing round robin dns entries?

### peer routing

given a peer-id and peer-info, get a duplex stream

"with this piece: what if we could map a public key to a duplex stream, across the world?"

sub-topics
- NAT hole-punching
- relaying

### swarm / topology

- the topology of peers in the network
-

1. structured
   - fully-connected-topology
   - kademlia
   - chord
2. unstructured
   - signalhub
   - bittorrent swarm

### content routing

given a message, figure out which peers to route it to

^ need to get a more solid understanding of this

1. examples
   1. gossip
      1. secure-gossip
   2. hyperlog / ssb (anti-entropy)

### p2p data structures

data structures well suited to an unreliable network (or offline) with untrusted peers

1. stream
2. crdt
   -
   -
3. append-only log
   1. hyperlog
   2. secure-scuttlebutt
4. content addressable store
5. merkle linking (blockchain, hyperlog, ssb, etc)
   1. merkle dag (ipfs)
6. dht

#### other handy not-strictly-p2p modules

1. leveldb
2. abstract-blob-store

### protocol

what messages the peers of the network agree to exchange

1. streams: the ultimate transport-agnostic abstraction
2. multistream: multiplexing protocols over a stream

### awesome p2p modules

TODO: sort these into their respective sections? or have tiny per-module bits?

- signalhub (peer discovery)
- hyperlog (identity, data structure, protocol)
- discovery-channel (peer discovery)
- discovery-swarm (peer discovery + content routing + swarm)
- bittorrent-dht (peer discovery + swarm)
- ssb-keys (identity)
- pubsub-swarm (identity, peer discovery, swarm, content routing)
- secure-gossip (content routing, protocol)

### glueing p2p modules together to make apps

1. p2p picture sharing service (walk through glueing together modules from the
   roles to do this)

## Further Reading

- http://the-paper-trail.org/blog/distributed-systems-theory-for-the-distributed-systems-engineer/

## Inspiration

-
-
-
-
-
-
-
-
-