# Peer-to-Peer Frequently Asked Questions

**N.B.** This FAQ focuses on the very nebulous term "p2p system". There's not a
single answer that maps exactly to *all* peer-to-peer systems; this FAQ does its
best to provide a general answer when possible, and provide concrete examples
where it makes sense.

This FAQ also provides multiple answers per question, from various authors.
There is no single objective perspective, so more viewpoints are invited: file a
pull request!

## 1. Sounds great, but will it scale?

*From [@staltz][staltz]:*

Yes. It is rare to find a p2p service that does not scale. They are distributed
systems by design, and most distributed systems are meant to scale. You could
say, then, that many distributed systems take cues from p2p systems in order to
scale properly. As a good example, Skype was built by the same engineers who
built Kazaa, and Skype internally used p2p distribution in order to alleviate
the load from any single node, and to save costs. BitTorrent also thrives in
situations where there is a high number of peers.

As in centralized systems, performance will suffer if the load is not
distributed. A torrent with only one seed and thousands of leechers would
struggle to initially share to the first wave of peers. Unlike a centralized
system though, once that first wave of peers downloads a copy, the bandwidth for
that torrent data to be served grows exponentially.

## 2. If websites are hosted on p2p, what happens when no peers are online?

*From [@noffle][noffle]:*

The same result as when a centralized website goes down: it isn't available.

The difference is that peer-to-peer networks distribute *the power to host*.
I could run a peer serving my website on a server. Instantly I have the same
website availability as a traditional centralized website. The difference is
that there may be many peers in the swarm that are also hosting my website, so
if my server goes down, the site will continue to be accessible through those
seeding peers.

*From [@retrohacker][retrohacker]:*

Many p2p systems, e.g. BitTorrent, are optimized for sharing popular content.
The more popular a piece of content, the more available it becomes. The less
popular it is, the less available it becomes. Popularity in this case is the
number of peers actively consuming and sharing a piece of content. The ability
to access any piece of content on a p2p network is limited by the availability
of peers: no peers, no content.

If you share content on a p2p network that you have a vested interest in being
always-available, you must invest in maintaining your own highly available peers
that share this content. This is not dissimilar to a centralized network, in
that you must build highly available infrastructure to share your content.
However, unlike a centralized network, your infrastructure is no longer a
single point of failure since you have the benefit of a p2p network supporting
you.

In the p2p model, you are not solely responsible for your uptime or performance.
If your system falls over, consumers of your content can still fetch from
another peer. If there is a spike in popularity of your content, peers will
share content amongst each other, reducing the burden on your infrastructure. In
many cases this can provide a better overall experience for consumers of your
content.

## 3. What about security? Somebody could share a hacked version of a p2p website?

*From [@noffle][noffle]:*

It depends on the security model of the system hosting the website. There are
two common tools I know of for ensuring that a copy of data you've received
from a potentially untrusted source is authentic:

1. You used the [hash](https://wikipedia.org/Hash_function) of the data to
   request from the p2p network. If so, the data you receive from a peer can be
   hashed, and that hash compared against the one used to make the request for
   the data. [IPFS](https://ipfs.io) and [Secure
   Scuttlebutt](https://scuttlebutt.nz) do this. A caveat is that the data is
   static: the hash never changes and thus neither can the data. A benefit is
   that content-addressable data can be safely cached indefinitely.

2. You used the [public key](https://wikipedia.org/Public_key_cryptography) of
   the author to request the data from the p2p network. The idea is that every
   version of the data is cryptographically signed by the data's author, so that
   any data you download will have a signature that can be checked against the
   public key used to request the data. This guarantees that the data came from
   the author you expected, and also permits changes to that data, unlike the
   content-addressed approach above. [Dat](https://dat-project.org),
   [IPFS](https://ipfs.io) and [SSB](https://scuttlebutt.nz) all use this
   approach for dynamic data.

*From [@matthiasbeyer][matthiasbeyer]:*

If cryptographic signatures come into play, this is not possible.

Consider a content-addressed system. In such systems, content is addressed via
a cryptographic hash which represents the content. For example, a file
containing "Hello World" gets a hash "648a6a6ffff".
If a peer now tries to fetch content from the network, it does so by asking
for the content of "648a6a6ffff".
If it is sent this content, it can then check against that same hash whether
the content it got is the content it actually requested.

An attacker would be able to host malicious nodes in the network, but the
node which _requests_ the content (your node) can verify that it got what it
expected.

## 4. What about privacy? Everybody in the p2p network can see what I am looking at.

*Help contribute an answer to this question!*

## 5. P2P is great, but sometimes you need a single authoritative source of truth

*From [@matthiasbeyer][matthiasbeyer]:*

This is not true. Consider git: Each branch could be considered a source of
truth (or rather a "point of truth").
Branches may depend on each other, and branches may be merged. Branches may also
_not_ depend on each other (git can have multiple "orphan" branches) and may
_not_ be merged. Still, they are points of truth.
With p2p systems in a decentralized environment, this is true as well.
There might never be the _one_ version which is currently the point of truth,
but as long as versions of the system can be merged, this is not a problem.

Events in such a system can even be put in causal order via
[vector clocks](https://en.wikipedia.org/wiki/Vector_clock)
where each key is the unique peer hash.

There are also data types that can exist in a p2p system without ever needing
a single source of truth. These types are named
[CRDT](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type)s.

*From [@noffle][noffle]:*

If you are cryptographically signing the data you create (see #3), users can
request your content by your public key.
In this way you are able to control what data appears in this feed, but rely on
potentially untrusted peers to distribute that data.

By introducing a monotonically increasing sequence number to each new entry in
the signed feed, peers can detect gaps in the feed and so be assured that no
messages were suppressed or censored along the way.

## 6. What if p2p technology is used by "bad actors"?

*From [@staltz][staltz]:*

A very common concern with P2P technologies is that they aid crime, piracy,
pedophilia, and other bad activities. The upside of not having an authority is
also its unfortunate downside. That said, this aspect of information systems is
overestimated when compared to other technologies like cars, weapons, hard
drives, and kitchen cutlery. Terrorist attacks are often carried out with cars
and common knives, yet it seems absurd to common sense that there would be
global realtime surveillance of all cars and kitchen knives in order to prevent
crimes. On the other hand, information systems by themselves cannot directly
cause any physical harm. The absurdity of censoring cars and cutlery should
extend also to information systems, or at least the discourse around security
and crime prevention should get the priorities right and first address the root
causes, the supporting incentives, the real weapons, and the tradeoffs involved.

Another topic to consider is the meaning of "bad", and how *only bad actions*
could be prevented without preventing good ones. How could
technology-for-freedom empower good actors *without* empowering bad actors?
Conversely, how could technology-for-control enable those in power to arrest
bad actors without enabling them to arrest good actors?

It's problematic to have this good/bad debate, because it's about morality.
Morality is culturally bound: it is relative to the beliefs of a group.
Morality in a global tech platform (the Internet) is toxic because it pushes one
worldview and chokes pluralism. Our focus therefore should not be on the
discussion around morality; it should be on freedom versus control, and how
they affect a tech system deployed globally.

More about this:
https://theintercept.com/2015/11/17/u-s-mass-surveillance-has-no-record-of-thwarting-large-terror-attacks-regardless-of-snowden-leaks/

## 7. What areas do modern p2p apps still struggle with?

*From [@noffle][noffle]:*

Apps still seem to have a hard time managing resources, like CPU and network
bandwidth. If an app naively tries to download and replicate ALL of the data it
sees, it's easy for it to overwhelm the machine it's running on. Many apps still
have a ways to go in offering good controls for CPU and bandwidth use.

[noffle]: http://git.scuttlebot.io/@C3iYh/12sO1uvKq1KcZXLFxSySzxOkHxXN8rtNB5MGA=.ed25519
[staltz]: https://github.com/staltz
[retrohacker]: https://github.com/retrohacker
[matthiasbeyer]: https://github.com/matthiasbeyer
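
To make the idea of a bandwidth control from question 7 a little more concrete,
here is a minimal token-bucket sketch in Python. It is not taken from any of the
answers above or from any particular p2p app: the class name `TokenBucket`, its
parameters, and the numbers in the usage lines are all hypothetical and chosen
only for illustration. The caller passes in the current time explicitly so the
behaviour is easy to follow.

```python
class TokenBucket:
    """Hypothetical bandwidth limiter (illustration only).

    Allows `rate` bytes per second on average, with bursts of up to
    `capacity` bytes.
    """

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # bytes added back per second
        self.capacity = capacity  # maximum bytes that can be saved up
        self.tokens = capacity    # start with a full bucket
        self.last = now           # time of the last refill

    def try_consume(self, nbytes, now):
        """Return True if `nbytes` may be sent at time `now`, else False."""
        # Refill in proportion to the elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Assumed example numbers: 1000 bytes/second, bursts of up to 2000 bytes.
bucket = TokenBucket(rate=1000, capacity=2000)
first = bucket.try_consume(1500, now=0.0)       # fits in the initial burst
second = bucket.try_consume(1000, now=0.0)      # only 500 bytes are left
third = bucket.try_consume(1000, now=0.5)       # half a second refills 500
```

An app could call something like `try_consume` before sending each chunk to a
peer, passing a monotonic clock reading as `now`, and wait or skip the peer when
it returns `False`. Whether a limit like this should be per peer, per feed, or
global is a design choice this sketch does not address.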