├── data_sync
│   ├── assets
│   │   └── mvds
│   │       ├── batch.png
│   │       ├── interactive.png
│   │       ├── batch_seqdiagram.png
│   │       ├── interactive_seqdiagram.png
│   │       ├── mscgen-about.txt
│   │       ├── batch.msc
│   │       ├── interactive.msc
│   │       ├── batch-mode.mermaid
│   │       ├── interactive-mode.mermaid
│   │       ├── batch.svg
│   │       └── interactive.svg
│   ├── README.md
│   ├── discovery.md
│   ├── discovery-comparison.md
│   ├── mvds.md
│   ├── p2p-data-sync-comparison.md
│   └── p2p-data-sync-mobile.md
├── _config.yml
├── secure_transport
│   ├── README.md
│   └── whisper_shortcomings.md
├── message-types.md
├── glossary.md
└── README.md
/data_sync/assets/mvds/batch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/status-im/bigbrother-specs/HEAD/data_sync/assets/mvds/batch.png -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: minima 2 | 3 | header_pages: 4 | - data_sync/README.md 5 | - secure_transport/README.md 6 | - glossary.md 7 | -------------------------------------------------------------------------------- /data_sync/assets/mvds/interactive.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/status-im/bigbrother-specs/HEAD/data_sync/assets/mvds/interactive.png -------------------------------------------------------------------------------- /data_sync/assets/mvds/batch_seqdiagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/status-im/bigbrother-specs/HEAD/data_sync/assets/mvds/batch_seqdiagram.png -------------------------------------------------------------------------------- /data_sync/assets/mvds/interactive_seqdiagram.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/status-im/bigbrother-specs/HEAD/data_sync/assets/mvds/interactive_seqdiagram.png -------------------------------------------------------------------------------- /secure_transport/README.md: -------------------------------------------------------------------------------- 1 | # Secure Transport 2 | 3 | ## Assumptions 4 | 5 | ## Further reading 6 | 7 | - [Whisper Shortcomings](whisper_shortcomings.md) 8 | -------------------------------------------------------------------------------- /message-types.md: -------------------------------------------------------------------------------- 1 | # Message Types 2 | 3 | * **Direct** - `1:1` 4 | 5 | * **Group** - `N:M` 6 | * **Public** - Allow anyone to join 7 | * **Private** - Invite only 8 | 9 | * **Channel** - `1:M` 10 | * **Broadcast** - `1:-` A broadcast message contains no recipient and no topic. 11 | -------------------------------------------------------------------------------- /data_sync/assets/mvds/mscgen-about.txt: -------------------------------------------------------------------------------- 1 | Use https://mscgen.js.org/ and export the image there. Make sure to keep the actual 2 | source code in source control and not just add images. 3 | 4 | There's also a CLI tool. 5 | 6 | Why mscgen over mermaid? 7 | It's simpler and less buggy (UI rendering, etc.), and a more specific tool for 8 | sequence diagrams in particular. See the examples on the site above.
9 | -------------------------------------------------------------------------------- /data_sync/assets/mvds/batch.msc: -------------------------------------------------------------------------------- 1 | # Alice and Bob: batch data sync 2 | msc { 3 | hscale="2", wordwraparcs=on; 4 | 5 | alice [label="Alice"], 6 | bob [label="Bob"]; 7 | 8 | --- [label="batch data sync"]; 9 | alice => alice [label="add messages to payload state"]; 10 | alice >> bob [label="send payload with messages"]; 11 | 12 | bob => bob [label="add acks to payload state"]; 13 | bob >> alice [label="send payload with acks"]; 14 | } -------------------------------------------------------------------------------- /data_sync/README.md: -------------------------------------------------------------------------------- 1 | # Data Sync 2 | 3 | The data sync layer defines the protocol with which clients interact. It interacts directly with the secure transport layer. 4 | 5 | Currently there is a spec for [minimally viable data synchronization](./mvds.md), which provides certain required guarantees. 6 | 7 | @todo: Do nodes choose what they want to sync?
8 | 9 | ## Assumptions 10 | 11 | @TODO 12 | 13 | ## Further Reading 14 | - [Introducing a Data Sync Layer](https://discuss.status.im/t/introducing-a-data-sync-layer/864) 15 | - [Different approaches to p2p data sync](./p2p-data-sync-comparison.md) 16 | -------------------------------------------------------------------------------- /data_sync/assets/mvds/interactive.msc: -------------------------------------------------------------------------------- 1 | # Alice and Bob: interactive data sync 2 | msc { 3 | hscale="2", wordwraparcs=on; 4 | 5 | alice [label="Alice"], 6 | bob [label="Bob"]; 7 | 8 | --- [label="interactive data sync"]; 9 | alice => alice [label="add offers to payload state"]; 10 | alice >> bob [label="send payload with offers"]; 11 | 12 | bob => bob [label="add requests to payload state"]; 13 | bob >> alice [label="send payload with requests"]; 14 | 15 | alice => alice [label="add requested messages to state"]; 16 | alice >> bob [label="send payload with messages"]; 17 | 18 | bob => bob [label="add acks to payload state"]; 19 | bob >> alice [label="send payload with acks"]; 20 | } -------------------------------------------------------------------------------- /secure_transport/whisper_shortcomings.md: -------------------------------------------------------------------------------- 1 | # Shortcomings with Whisper 2 | 3 | Briefly, some shortcomings of Whisper. Its main advantage is that 4 | it is the default in Ethereum and already in production. It suffers from a lack of scalability, and its DoS protection mechanism is poor. 5 | 6 | - Anonymity claims not rigorous 7 | - Lack of machine-readable spec 8 | - PoW bad for heterogeneous nodes 9 | - No incentive to run a node 10 | - Scaling difficulties 11 | 12 | ## Whisper at Status in practice 13 | 14 | Briefly, what Whisper looks like in practice at Status.
15 | 16 | - Cluster with bootnodes, nodes, mailservers 17 | - Partial relaying on mobile 18 | - Lower PoW (isolated network) 19 | - Mailservers: HA requirement and direct TCP 20 | - (Topics and chat) 21 | 22 | 23 | ## Further reading 24 | 25 | Whisper vs PSS comparison: https://our.status.im/whisper-pss-comparison/ 26 | -------------------------------------------------------------------------------- /data_sync/discovery.md: -------------------------------------------------------------------------------- 1 | # Discovery 2 | 3 | ## Bootstrap 4 | 5 | In order to facilitate a more decentralized (democratic) implementation of bootstrapping for a p2p network, we introduce an `on-chain` smart contract containing a list of bootstrapping nodes. This smart contract is permissionless and operates with a stake / slashing model. Ideally we use an optimistic smart contract model [1] for the submission process. 6 | A node is added to the set of nodes unless a challenge was voted valid by `2/3` of the current bootstrap nodes. 7 | 8 | *We consider this approach naive; it requires further investigation.* 9 | 10 | ## Further Reading 11 | - [P2P Discovery Comparison](discovery-comparison.md) 12 | - [Bootstrap Nodes Smart Contract](https://discuss.status.im/t/bootstrap-nodes-smart-contract/1135) 13 | 14 | ## Footnotes 15 | [1] https://medium.com/@decanus/optimistic-contracts-fb75efa7ca84 16 | -------------------------------------------------------------------------------- /data_sync/assets/mvds/batch-mode.mermaid: -------------------------------------------------------------------------------- 1 | # Sequence diagram source 2 | 3 | Live version: https://notes.status.im/ZhoVhCCoSrakZs6D5ohRdg?both 4 | 5 | ```mermaid 6 | sequenceDiagram 7 | 8 | Note right of Alice: X1?: Assumes a sharing relationship and secure channel between Alice and Bob has been established.
9 | Alice->>Alice: X2?: Save message A1 locally 10 | Alice->>Alice: 1: Add MESSAGE A1 to payload state for Bob 11 | 12 | loop Alice: Every epoch, for every peer 13 | alt New messages to send to Bob 14 | Alice--x+Bob: X3?: Send payload with [MESSAGE A1] 15 | Alice->>Alice: 4. Update state for Bob A1 MESSAGE 16 | else No new messages 17 | Alice->>Alice: Noop 18 | end 19 | end 20 | 21 | loop Bob listens or polls 22 | alt Bob receives payload from Alice 23 | Bob->Bob: X4?: Bob updates state 24 | Bob->Bob: 2: ACK A1 added to next payload 25 | else Bob doesn't get message 26 | Bob->Bob: Noop 27 | end 28 | end 29 | 30 | loop Bob: Every epoch, for every peer 31 | alt New messages to send to Alice 32 | Bob--x-Alice: X3?: Send payload with [ACK A1] 33 | Bob->>Alice: 4. Update state for Alice A1 ACK 34 | else No new messages 35 | Bob->>Bob: Noop 36 | end 37 | end 38 | 39 | Alice->>Alice: 3: [...Receive ACK logic...] 40 | ``` 41 | -------------------------------------------------------------------------------- /data_sync/assets/mvds/interactive-mode.mermaid: -------------------------------------------------------------------------------- 1 | sequenceDiagram 2 | 3 | Note right of Alice: X1?: Assumes a sharing relationship and secure channel between Alice and Bob has been established. 4 | Alice->>Alice: X2?: Save message A1 locally 5 | Alice->>Alice: 1: Add OFFER A1 to payload state for Bob 6 | 7 | loop Alice: Every epoch, for every peer 8 | alt New records to send to Bob 9 | Alice--x+Bob: X3?: Send payload with [OFFER A1] 10 | Alice->>Alice: 4.
Update state for Bob A1 OFFER 11 | else No new records 12 | Alice->>Alice: Noop 13 | end 14 | end 15 | 16 | loop Bob listens or polls 17 | alt Bob receives payload from Alice 18 | Bob->Bob: X4?: Bob updates state 19 | Bob->Bob: 2: REQUEST A1 added to next payload 20 | else Bob doesn't get message 21 | Bob->Bob: Noop 22 | end 23 | end 24 | 25 | loop Alice listens or polls 26 | alt Alice receives payload from Bob 27 | Alice->Alice: X4?: Alice updates state 28 | Alice->Alice: 2: MESSAGE A1 added to next payload 29 | else Alice doesn't get message 30 | Alice->Alice: Noop 31 | end 32 | end 33 | 34 | 35 | loop Bob: Every epoch, for every peer 36 | alt New records to send to Alice 37 | Bob--x-Alice: X3?: Send payload with [ACK A1] 38 | Bob->>Alice: 4. Update state for Alice A1 ACK 39 | else No new records 40 | Bob->>Bob: Noop 41 | end 42 | end 43 | 44 | Alice->>Alice: 3: [...Receive ACK logic...] 45 | -------------------------------------------------------------------------------- /glossary.md: -------------------------------------------------------------------------------- 1 | # Glossary 2 | 3 | **Node**: Some process that is able to store data, do processing and communicate with other nodes. 4 | 5 | **Peer**: The other nodes that a node is connected to. 6 | 7 | **Peer-to-peer (P2P)**: Protocols where resources are divided among multiple peers, without the need of central coordination. 8 | 9 | **Device**: A node capable of storing some data locally. 10 | 11 | **Identity**: A user's cryptographic identity, usually a single or several public keypairs. 12 | 13 | **User**: A (human) end-user that may have multiple *devices*, and some form of identity. 14 | 15 | **Data replication**: Storing the same piece of data in multiple locations, in order to improve availability and performance. 16 | 17 | **Data sync**: Achieving *consistency* among a set of nodes storing data. 18 | 19 | **Mobile-friendly**: Multiple factors that together make a solution suitable for mobile.
These are things such as dealing with *mostly-offline* scenarios, *network churn*, and *limited resources* (such as bandwidth, battery, storage or compute). 20 | 21 | **Replication object**: Also known as the minimal unit of replication; the thing that we are replicating. 22 | 23 | **Friend-to-friend network (F2F)**: A private P2P network where there are only connections between mutually trusted peers. 24 | 25 | **Content addressable storage (CAS)**: Storing information such that it can be retrieved by its content, not its location. Commonly performed by the use of cryptographic hash functions. 26 | 27 | **Public P2P**: Open network where peers can connect to each other. 28 | 29 | **Structured P2P network**: A public p2p network where data placement is related to the network topology, e.g. a DHT. 30 | 31 | **Unstructured P2P network**: A public p2p network where peers connect in an ad hoc manner, and a peer only knows its neighbors, not what they have. 32 | 33 | **Super-peer P2P network**: A non-pure p2p network with a hybrid client/server architecture, where certain peers do more work. 34 | 35 | **Private P2P**: A private network where peers must mutually trust each other before connecting; it can either be F2F or group-based. 36 | 37 | **Group-based P2P network**: A private p2p network where you need some form of access to join, but within the group you can connect to new peers, e.g. a friend of a friend or a private BT tracker. 38 | -------------------------------------------------------------------------------- /data_sync/discovery-comparison.md: -------------------------------------------------------------------------------- 1 | # Approaches to discovery in P2P networks 2 | 3 | *Written April 8th, 2019 by Dean Eigenmann* 4 | 5 | **WARNING: This is an early draft, and likely contains errors.** 6 | 7 | In this paper, we compare various protocols and methods used for peer discovery in p2p networks, with the intention of making it easier to choose one for any specific use case.
It is part of a larger series on secure messaging, and therefore will likely contain certain biases towards that use case. 8 | 9 | > Request for comments: Please direct comments on this document to @decanus. These comments can be on the structure, content, things that are wrong, things that should be looked into more, things that are unclear, as well as anything else that comes to mind. 10 | 11 | ## Table of Contents 12 | 1. [Introduction](#introduction) 13 | 2. [Methodology](#methodology) 14 | 3. [Compared technologies](#compared-technologies) 15 | 4. [Comparison](#comparison) 16 | 5. [Summary](#summary) 17 | 6. [Acknowledgements](#acknowledgements) 18 | 7. [References](#references) 19 | 20 | ## Introduction 21 | 22 | An important aspect of p2p networks is the actual discovery of peers: it dictates how a node finds other nodes in the network. It can also determine on which nodes information is stored and to which nodes a node must connect in order to retrieve said information. In this paper we analyze various methods for peer discovery, which network types they work for, and what their benefits and drawbacks are under various constraints, such as the challenge of running these methods on mobile devices. 23 | 24 | ## Methodology 25 | 26 | Literature comparing various discovery methods for p2p networks seems rather limited; therefore, most of the methodologies used within this paper are not based on previous literature. **TODO, probably not best said like this** 27 | 28 | The notes presented below are based on the provided documentation and specifications; code has generally not been looked at or fully reviewed. All results are tentative. 29 | 30 | ### Compared Dimensions 31 | 32 | We first create a general overview of the various methods, highlighting their network type, fault tolerance, lookup cost and query efficiency.
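As a hedged illustration of the lookup-cost dimension (this sketch is not part of the paper's methodology), Kademlia, one of the technologies compared below, organizes lookups around an XOR distance between node IDs; each hop can halve the remaining distance, which is what yields its O(log n) lookup cost:

```python
# Illustrative sketch only: XOR distance as used by Kademlia-style
# structured overlays. Node IDs here are short byte strings for clarity.

def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia's metric: interpret IDs as big-endian integers and XOR them."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_peers(target: bytes, peers: list, k: int = 3) -> list:
    """Return the k known peers closest to `target` under the XOR metric."""
    return sorted(peers, key=lambda p: xor_distance(target, p))[:k]

peers = [bytes([0b0001]), bytes([0b0100]), bytes([0b0111]), bytes([0b1000])]
# 0b0111 differs from the target 0b0110 only in the lowest bit, so it is closest.
print(closest_peers(bytes([0b0110]), peers, k=1))  # → [b'\x07']
```

Unstructured networks such as Gnutella have no comparable metric, which is why their lookups degrade to flooding with O(n) cost.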
33 | 34 | ## Compared technologies 35 | 36 | @TODO Write some stuff on each technology, quick abstract 37 | 38 | ### Gnutella 39 | 40 | ### Kademlia 41 | 42 | ### Secure Scuttlebutt (SSB) 43 | 44 | ### Rendezvous 45 | 46 | ## Comparison 47 | 48 | | Method | Type | Fault Tolerance | Lookup Cost | Query Efficiency | 49 | |------------|--------------|-----------------|-------------|------------------| 50 | | Gnutella | Unstructured | Random | O(n) | Poor | 51 | | Kademlia | Structured | Good | O(log n) | Good | 52 | | SSB | | | | | 53 | | Rendezvous | | | O(n) | | 54 | 55 | ## Summary 56 | 57 | ## Acknowledgements 58 | 59 | ## References 60 | 61 | 1. [S. Masood, M. Shahid, M. Sharif and M. Yasmin "Comparative Analysis of Peer to Peer Networks"](http://oaji.net/articles/2017/2698-1520328416.pdf) 62 | 63 | --- 64 | **OLD** 65 | 66 | ## [Kademlia](https://en.wikipedia.org/wiki/Kademlia) 67 | 68 | ### Reading 69 | - http://www.scs.stanford.edu/~dm/home/papers/kpos.pdf 70 | - https://arxiv.org/pdf/1408.3079.pdf 71 | 72 | **Pros** 73 | - @TODO 74 | 75 | **Cons** 76 | - Running a DHT on a mobile client 77 | 78 | ## Secure Scuttlebutt 79 | 80 | ## [Rendezvous](https://github.com/libp2p/specs/tree/e1083c1f9d8f7afc0d65a43a12b05492f3873385/rendezvous) 81 | 82 | **Pros** 83 | - Lightweight 84 | 85 | **Cons** 86 | - Susceptible to spam attacks (PoW has been suggested for anti-spam, but that is not mobile friendly) 87 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # BigBrother Specs 2 | 3 | *Big Brother can't watch you!* 4 | 5 | ## Table of Contents 6 | - [Abstract](#abstract) 7 | - [Motivation](#motivation) 8 | - [Terms](#terms) 9 | - [Design Goals](#design-goals) 10 | - [System Design / Architecture](#system-design--architecture) 11 | - [SOLID](#solid) 12 | - [Phases](#phases) 13 | - [Phase 0](#phase-0---xkeyscore-data-sync) 14 | - [Phase
1](#phase-1---bullrun-transport-privacy-layer) 15 | - [Phase 2](#phase-2---stoneghost-p2p-overlay) 16 | - [Phase 3](#phase-3---credible-secure-transport) 17 | - [Phase 4](#phase-4---cointelpro--jtrig-ops-data-sync-clients) 18 | - [Phase 5](#phase-5---nativeflora--tao-ops-trust-establishment) 19 | - [Footnotes](#footnotes) 20 | 21 | ## Abstract 22 | 23 | BigBrother defines a family of protocols that, when combined, create a decentralized peer-to-peer messaging stack. BigBrother should enable various [messaging types](message-types.md) in order to be called a messaging stack. 24 | 25 | The family of protocols SHOULD be flexible enough to allow various use case implementations not limited to those defined by the entire family. The BigBrother protocol itself mainly defines the way the protocols in the family interact with one another, allowing for customizability on each layer. 26 | 27 | ## Motivation 28 | 29 | **TODO: TALK ABOUT SHORTCOMINGS OF WHISPER** 30 | 31 | ## Terms 32 | 33 | | Term | Definition | 34 | |:---:|--| 35 | | **Stack** | Defines the entire family of protocols, how they interact and what goals are achieved. | 36 | | **Protocol** | Defines a single layer in the *stack* along with its endpoints used for communication with various other protocols in the stack. | 37 | 38 | ## Design Goals 39 | 40 | 1. **Anonymity** - `sender` / `receiver` MUST remain anonymous. 41 | 1. **Unlinkability** - It should not be identifiable which parties are talking to each other. 42 | 1. **Scalable** - @TODO 43 | 1. **Incentivized** - @TODO 44 | 1. **Decentralized** - @TODO 45 | 1. **Resistant** - @TODO 46 | 1. **Inclusive** - Protocols within the stack should be designed to work on resource-restricted devices; this allows for higher participation, making the entire BigBrother protocol more usable and reliable.
47 | 48 | ## System Design / Architecture 49 | 50 | The protocols summarized by BigBrother all follow a common set of architectural design principles, with the goal of keeping each component simple. We architect a stack whose protocols interact cleanly, without unnecessary complexity. Each protocol included in the stack should be simple enough to allow for multiple implementations, creating client diversity and allowing us to ensure that the entire stack is unambiguous [1]. 51 | 52 | ### SOLID 53 | 54 | Throughout the design of the stack we follow the SOLID design principles, redefining them slightly so they make sense in the context of protocols. 55 | 56 | - **S**ingle Responsibility principle: Each protocol in the family should only have a single responsibility. 57 | - **O**pen/closed principle: @TODO 58 | - **L**iskov substitution principle: @TODO 59 | - **I**nterface segregation principle: @TODO 60 | - **D**ependency inversion principle: @TODO 61 | 62 | ## Stack 63 | 64 | The table below shows the intended layers of the network from the *highest* to the *lowest*. 65 | 66 | | Layer / Protocol | Purpose | Example | 67 | |-------------------|---------------------------------|-------------------------------| 68 | | Sync Clients | End user functionality | 1:1, group chat, tribute, ... | 69 | | Data Sync | Syncing data/state | Bramble Sync Protocol (ish) | 70 | | Secure Transport | Confidentiality, PFS, etc | Double Ratchet | 71 | | Transport Privacy | Metadata protection | Mixnet? | 72 | | P2P Overlay | Overlay routing, NAT traversal | libp2p? | 73 | 74 | ## Phases 75 | 76 | Inspired by the ETH 2.0 implementation process, we have decided to roll out BigBrother in multiple phases. The phases can mostly be linked to various layers in the [stack](#stack). Some of these phases are more loosely coupled than ETH 2.0's components, meaning they can be worked on in parallel, as only the communication API between them is relevant.
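The layering discipline behind the stack table can be sketched as follows. This is a hypothetical illustration (the class names mirror the table, but the `Layer` interface is invented here, not defined by this spec): each layer exposes a narrow API and depends only on the layer directly beneath it.

```python
# Hypothetical sketch of the stack's layering constraint: every layer
# delegates to the one below it, and nothing reaches across layers.

class Layer:
    def __init__(self, below=None):
        self.below = below  # the only coupling allowed between layers

    def send(self, payload: bytes) -> bytes:
        # Default behavior: delegate straight down the stack.
        return self.below.send(payload) if self.below else payload

class P2POverlay(Layer):        # overlay routing, NAT traversal (e.g. libp2p)
    pass

class TransportPrivacy(Layer):  # metadata protection (e.g. a mixnet)
    pass

class SecureTransport(Layer):   # confidentiality, PFS (e.g. Double Ratchet)
    pass

class DataSync(Layer):          # syncing data/state between peers
    pass

# Wire the stack from lowest to highest, mirroring the table:
stack = DataSync(SecureTransport(TransportPrivacy(P2POverlay())))
assert stack.send(b"record") == b"record"  # a no-op pipe in this sketch
```

Because each phase only depends on the API of the layer below, implementations of different layers can indeed proceed in parallel, as the Phases section notes.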
77 | 78 | ### Phase 0 - [XKEYSCORE (Data Sync)](/data_sync/README.md) 79 | 80 | ### Phase 1 - BULLRUN (Transport Privacy Layer) 81 | 82 | - Mixnet 83 | - [PSS](https://gist.github.com/zelig/d52dab6a4509125f842bbd0dce1e9440) 84 | - Bluetooth local network 85 | - Sneakernet 86 | 87 | ### Phase 2 - STONEGHOST (P2P Overlay) 88 | 89 | ### Phase 3 - [CREDIBLE (Secure Transport)](/secure_transport/README.md) 90 | 91 | - Messaging Layer Security (MLS) 92 | 93 | ### Phase 4 - COINTELPRO / JTRIG OPS (Data Sync Clients) 94 | 95 | ### Phase 5 - NATIVEFLORA / TAO OPS (Trust Establishment) 96 | 97 | ## Footnotes 98 | 99 | - [1] http://ethdocs.org/en/latest/ethereum-clients/choosing-a-client.html 100 | -------------------------------------------------------------------------------- /data_sync/assets/mvds/batch.svg: -------------------------------------------------------------------------------- 1 | ]> 2 | 3 | # Generated by mscgen_js - https://sverweij.github.io/mscgen_js 4 | # Alice and Bob: batch data sync 5 | msc { 6 | hscale="2", wordwraparcs=on; 7 | 8 | alice [label="Alice"], 9 | bob [label="Bob"]; 10 | 11 | --- [label="batch data sync"]; 12 | alice => alice [label="add messages to payload state"]; 13 | alice >> bob [label="send payload with messages"]; 14 | 15 | bob => bob [label="add acks to payload state"]; 16 | bob >> alice [label="send payload with acks"]; 17 | }AliceBobbatch data syncadd messages to payload statesend payload with messagesadd acks to payload statesend payload with acks -------------------------------------------------------------------------------- /data_sync/mvds.md: -------------------------------------------------------------------------------- 1 | # Minimum Viable Data Synchronization 2 | 3 | **DRAFT VERSION 0.5.1** 4 | 5 | *Written by Oskar Thorén oskar@status.im & Dean Eigenmann dean@status.im* 6 | 7 | **This document follows [RFC-2119](https://tools.ietf.org/html/rfc2119).** 8 | 9 | ## Table of Contents 10 | 11 | 1.
[Abstract](#abstract) 12 | 2. [Definitions](#definitions) 13 | 3. [Wire Protocol](#wire-protocol) 14 | 1. [Secure Transport](#secure-transport) 15 | 2. [Payloads](#payloads) 16 | 4. [Synchronization](#synchronization) 17 | 1. [State](#state) 18 | 2. [Flow](#flow) 19 | 3. [Retransmission](#retransmission) 20 | 5. [Footnotes](#footnotes) 21 | 6. [Acknowledgements](#acknowledgements) 22 | 23 | ## Abstract 24 | 25 | In this specification, we describe a minimum viable protocol for data synchronization inspired by the Bramble Synchronization Protocol [1]. This protocol is designed to ensure reliable messaging between peers across an unreliable peer-to-peer (P2P) network where they may be unreachable or unresponsive. 26 | 27 | We present a functional specification for future implementation [2] as well as reference simulation data which demonstrates its performance. 28 | 29 | ## Definitions 30 | 31 | | Term | Description | 32 | |------------|-------------------------------------------------------------------------------------| 33 | | **Peer** | The other nodes that a node is connected to. | 34 | | **Record** | Defines a payload element of either the type `OFFER`, `REQUEST`, `MESSAGE` or `ACK` | 35 | | **Node** | Some process that is able to store data, do processing and communicate for MVDS. | 36 | 37 | ## Wire Protocol 38 | 39 | ### Secure Transport 40 | 41 | This specification does not define anything related to the transport of packets. It is assumed that this is abstracted in such a way that any secure transport protocol could be easily implemented. Likewise, properties such as confidentiality, integrity, authenticity and forward secrecy are assumed to be provided by a layer below. 42 | 43 | ### Payloads 44 | 45 | Payloads are implemented using [protocol buffers v3](https://developers.google.com/protocol-buffers/).
46 | 47 | ```protobuf 48 | syntax = "proto3"; 49 | 50 | message Payload { 51 | repeated bytes acks = 1; 52 | repeated bytes offers = 2; 53 | repeated bytes requests = 3; 54 | repeated Message messages = 4; 55 | } 56 | 57 | message Message { 58 | bytes group_id = 1; 59 | int64 timestamp = 2; 60 | bytes body = 3; 61 | } 62 | 63 | ``` 64 | 65 | Each payload contains the following fields: 66 | 67 | - **Acks:** This field contains a list (can be empty) of `message identifiers` informing the recipient that the sender holds a specific message. 68 | - **Offers:** This field contains a list (can be empty) of `message identifiers` that the sender would like to give to the recipient. 69 | - **Requests:** This field contains a list (can be empty) of `message identifiers` that the sender would like to receive from the recipient. 70 | - **Messages:** This field contains a list of messages (can be empty). 71 | 72 | **Message Identifiers:** Each `message` has a message identifier calculated by hashing the `group_id`, `timestamp` and `body` fields as follows: 73 | 74 | ``` 75 | HASH("MESSAGE_ID", group_id, timestamp, body); 76 | ``` 77 | 78 | The current `HASH` function used is `sha256`. 79 | 80 | ## Synchronization 81 | 82 | ### State 83 | 84 | We refer to `state` as a collection of data each node SHOULD hold for records of the types `OFFER`, `REQUEST` and `MESSAGE` per peer. We MUST NOT keep state for `ACK` records, as we do not retransmit those periodically. The following information is stored for records: 85 | 86 | - **Type** - Either `OFFER`, `REQUEST` or `MESSAGE` 87 | - **Send Count** - The number of times a record has been sent to a peer. 88 | - **Send Epoch** - The next epoch at which a record can be sent to a peer. 89 | 90 | ### Flow 91 | 92 | A maximum of one payload SHOULD be sent to peers per epoch; this payload contains all `ACK`, `OFFER`, `REQUEST` and `MESSAGE` records for the specific peer.
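For concreteness, the `message identifier` defined under Payloads can be sketched as below. The spec fixes `HASH` to `sha256` but does not pin down how the fields are concatenated or how the `int64` timestamp is encoded, so those details here are assumptions for illustration only:

```python
# Sketch of the message-identifier computation; field concatenation and the
# big-endian 8-byte timestamp encoding are assumptions, not spec'd.
import hashlib

def message_id(group_id: bytes, timestamp: int, body: bytes) -> bytes:
    h = hashlib.sha256()
    h.update(b"MESSAGE_ID")
    h.update(group_id)
    h.update(timestamp.to_bytes(8, "big"))  # int64 field; encoding assumed
    h.update(body)
    return h.digest()

mid = message_id(b"group-1", 1554076800, b"hello")
assert len(mid) == 32  # sha256 always yields a 32-byte identifier
```

An interoperable implementation would need the encoding nailed down; any two nodes must derive identical identifiers for identical messages.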
Payloads are created every epoch, containing reactions to records previously received from peers as well as any new records a node wants to send out. 93 | 94 | Nodes MAY have two modes with which they can send records: `BATCH` and `INTERACTIVE` mode. The following rules dictate how nodes construct payloads every epoch for any given peer in both modes. 95 | 96 | #### Interactive Mode 97 | 98 | - A node initially offers a `MESSAGE` when attempting to send it to a peer. This means an `OFFER` is added to the next payload and state for the given peer. 99 | - When a node receives an `OFFER`, a `REQUEST` is added to the next payload and state for the given peer. 100 | - When a node receives a `REQUEST` for a previously sent `OFFER`, the `OFFER` is removed from the state and the corresponding `MESSAGE` is added to the next payload and state for the given peer. 101 | - When a node receives a `MESSAGE`, the `REQUEST` is removed from the state and an `ACK` is added to the next payload for the given peer. 102 | - When a node receives an `ACK`, the `MESSAGE` is removed from the state for the given peer. 103 | - All records that require retransmission are added to the payload, provided their `Send Epoch` has been reached. 104 | 105 |
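The interactive-mode rules above can be sketched as a small state machine. This is a minimal sketch under assumptions of our own (it is not the reference implementation): `state` maps a message identifier to the record type held for a peer, and `out` collects records for the next payload to that peer.

```python
# Minimal sketch of a node reacting to a received interactive-mode record.

def on_receive(state: dict, out: dict, record_type: str, mid: str) -> None:
    if record_type == "OFFER":
        out.setdefault("requests", []).append(mid)   # answer an OFFER with a REQUEST
        state[mid] = "REQUEST"
    elif record_type == "REQUEST" and state.get(mid) == "OFFER":
        out.setdefault("messages", []).append(mid)   # satisfy the OFFER with the MESSAGE
        state[mid] = "MESSAGE"
    elif record_type == "MESSAGE":
        state.pop(mid, None)                         # clear the pending REQUEST
        out.setdefault("acks", []).append(mid)       # ACKs are never kept in state
    elif record_type == "ACK" and state.get(mid) == "MESSAGE":
        del state[mid]                               # delivery confirmed

# Alice offers A1, Bob requests it, Alice sends it, Bob acks it.
alice, bob = {"A1": "OFFER"}, {}
alice_out, bob_out = {}, {}
on_receive(bob, bob_out, "OFFER", "A1")
on_receive(alice, alice_out, "REQUEST", "A1")
on_receive(bob, bob_out, "MESSAGE", "A1")
on_receive(alice, alice_out, "ACK", "A1")
assert alice == {} and bob == {}  # both sides converge with empty state
```

Note how the happy path mirrors the figure below the rules: every record type except `ACK` leaves something in state until the exchange completes.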

106 | ![Interactive mode sequence diagram](assets/mvds/interactive_seqdiagram.png) 107 | 108 | Figure 1: Delivery without retransmissions in interactive mode. 109 |

110 | 111 | #### Batch Mode 112 | 113 | 1. When a node sends a `MESSAGE`, it is added to the next payload and the state for the given peer. 114 | 2. When a node receives a `MESSAGE`, an `ACK` is added to the next payload for the corresponding peer. 115 | 3. When a node receives an `ACK`, the `MESSAGE` is removed from the state for the given peer. 116 | 4. All records that require retransmission are added to the payload, given `Send Epoch` has been reached. 117 | 118 | 119 | 120 |
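Both modes share the last rule: eligible records are re-added once their `Send Epoch` has been reached. A hedged sketch of that `Send Count` / `Send Epoch` bookkeeping follows; the exponential backoff and its cap are assumptions in line with the Retransmission section's SHOULD, not mandated values.

```python
# Sketch of per-peer retransmission bookkeeping (backoff policy assumed).

def due_records(state: dict, current_epoch: int) -> list:
    """Records whose Send Epoch has been reached and belong in this payload."""
    return [mid for mid, rec in state.items() if rec["send_epoch"] <= current_epoch]

def mark_sent(state: dict, mid: str, current_epoch: int, cap: int = 64) -> None:
    rec = state[mid]
    rec["send_count"] += 1
    # Back off exponentially, capped so Send Epoch cannot grow without bound.
    rec["send_epoch"] = current_epoch + min(2 ** rec["send_count"], cap)

state = {"A1": {"type": "MESSAGE", "send_count": 0, "send_epoch": 0}}
assert due_records(state, current_epoch=0) == ["A1"]
mark_sent(state, "A1", current_epoch=0)
assert due_records(state, current_epoch=1) == []  # not due again until epoch 2
```

Capping the backoff matches the spec's requirement that a record's `Send Epoch` eventually falls back to a lower value rather than growing forever.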

121 | ![Batch mode sequence diagram](assets/mvds/batch_seqdiagram.png) 122 | 123 | Figure 2: Delivery without retransmissions in batch mode. 124 |

125 | 126 | 127 | 128 | 129 | ### Retransmission 130 | 131 | A record of the type `Type` SHOULD be retransmitted every time `Send Epoch` is smaller than or equal to the current epoch. 132 | 133 | `Send Epoch` and `Send Count` MUST be increased every time a record is retransmitted. Although no function is defined on how to increase `Send Epoch`, it SHOULD be increased exponentially until reaching an upper bound, where it then goes back to a lower epoch in order to prevent a record's `Send Epoch` from becoming too large. 134 | 135 | > ***NOTE:** We do not retransmit `ACK`s as we do not know when they have arrived; therefore, we simply resend them every time we receive a `MESSAGE`.* 136 | 137 | ## Footnotes 138 | 139 | 1. https://code.briarproject.org/briar/briar-spec/blob/master/protocols/BSP.md 140 | 2. https://github.com/status-im/mvds 141 | 142 | ## Acknowledgements 143 | - Preston van Loon 144 | - Greg Markou 145 | - Rene Nayman 146 | - Jacek Sieka 147 | -------------------------------------------------------------------------------- /data_sync/assets/mvds/interactive.svg: -------------------------------------------------------------------------------- 1 | ]> 2 | 3 | # Generated by mscgen_js - https://sverweij.github.io/mscgen_js 4 | # Alice and Bob: interactive data sync 5 | msc { 6 | hscale="2", wordwraparcs=on; 7 | 8 | alice [label="Alice"], 9 | bob [label="Bob"]; 10 | 11 | --- [label="interactive data sync"]; 12 | alice => alice [label="add offers to payload state"]; 13 | alice >> bob [label="send payload with offers"]; 14 | 15 | bob => bob [label="add requests to payload state"]; 16 | bob >> alice [label="send payload with requests"]; 17 | 18 | alice => alice [label="add requested messages to state"]; 19 | alice >> bob [label="send payload with messages"]; 20 | 21 | bob => bob [label="add acks to payload state"]; 22 | bob >> alice [label="send payload with acks"]; 23 | }AliceBobinteractive data syncadd offers to payload statesend payload with offersadd
requests to payload statesend payload with requestsadd requested messages to statesend payload with messagesadd acks to payload statesend payload with acks -------------------------------------------------------------------------------- /data_sync/p2p-data-sync-comparison.md: -------------------------------------------------------------------------------- 1 | # Different approaches to p2p data sync 2 | *Written March 29, 2019 by Oskar. Updated April 1, April 2.* 3 | 4 | **WARNING: This is an early draft, and likely contains errors.** 5 | 6 | This document compares various forms of data sync protocols and applications, along various dimensions, with the goal of making it easier for you to choose which one is for you. It is part of a larger series on secure messaging. 7 | 8 | >Request for comments: Please direct comments on this document to Oskar. These comments can be on the structure, content, things that are wrong, things that should be looked into more, things that are unclear, as well as anything else that comes to mind. 9 | 10 | ## Table of contents 11 | 12 | 1. [Introduction](#1-introduction) 13 | 2. [Background and definitions](#2-background-and-definitions) 14 | 3. [Methodology](#3-methodology) 15 | 4. [Comparison](#4-comparison) 16 | 5. [Summary](#5-summary) 17 | 6. [Acknowledgements](#6-acknowledgements) 18 | 7. [References](#7-references) 19 | 20 | ## 1. Introduction 21 | In a p2p network you often want to reliably transmit and replicate data across participants. This can be either large files, or messages that users want to exchange between each other in a private chat. This is especially challenging on mobile devices. Additionally, you might want security properties beyond robustness, such as privacy-preservation, censorship resistance and coercion resistance. 22 | 23 | In this paper we go through various approaches to data sync in a p2p context. 
We then compare them across relevant dimensions to make it easier for you to make an informed decision about what solution you want to use. 24 | 25 | ## 2. Background and definitions 26 | 27 | For broad definitions please refer to the [glossary](../glossary.md); data sync specific definitions are listed below. 28 | 29 | | Term | Definition | 30 | | ---- | ---------- | 31 | | Node | Some process that is able to store data, do processing and communicate with other nodes. | 32 | | Peer | A node that another node is connected to. | 33 | | Peer-to-peer (P2P) | Protocols where resources are divided among multiple peers, without the need of central coordination. | 34 | | Device | A node capable of storing some data locally. | 35 | | Identity | A user's cryptographic identity, usually a single or several public keypairs. | 36 | | User | A (human) end-user that may have multiple *devices*, and some form of identity. | 37 | | Data replication | Storing the same piece of data in multiple locations, in order to improve availability and performance. | 38 | | Data sync | Achieving *consistency* among a set of nodes storing data. | 39 | | Mobile-friendly | Multiple factors that together make a solution suitable for mobile. These are things such as dealing with *mostly-offline* scenarios, *network churn*, *limited resources* (such as bandwidth, battery, storage or compute). | 40 | | Replication object | Also known as the minimal unit of replication, the thing that we are replicating. | 41 | | Friend-to-friend network (F2F) | A private P2P network where there are only connections between mutually trusted peers. | 42 | | Content addressable storage (CAS) | Storing information such that it can be retrieved by its content, not its location. Commonly performed by the use of cryptographic hash functions. | 43 | | Public P2P | Open network where peers can connect to each other.
| 44 | | Structured P2P network | A public p2p network where data placement is related to the network topology. E.g. a DHT. | 45 | | Unstructured P2P network | A public p2p network where peers connect in an ad hoc manner, and a peer only knows its neighbors, not what they have. | 46 | | Super-peer P2P network | A non-pure p2p network, hybrid client/server architecture, where certain peers do more work. | 47 | | Private P2P | Private network where peers must mutually trust each other before connecting, can either be F2F or group-based. | 48 | | Group-based P2P network | A private p2p network where you need some form of access to join, but within the group you can connect to new peers, e.g. friend of a friend or a private BT tracker. | 49 | 50 | **These have yet to be filled out** 51 | *Consistency model*: ... 52 | 53 | *Mostly-offline:* ... 54 | 55 | *Network churn:* ... 56 | 57 | *Light node:* ... 58 | 59 | *Cryptographic hash function*: ... 60 | 61 | *Multi-device:*... 62 | 63 | *Robustness*: ... 64 | 65 | *Privacy-preservation*: ... 66 | 67 | *Censorship-resistance*: ... 68 | 69 | *Coercion-resistance*: ... 70 | 71 | ## 3. Methodology 72 | We look at generally established dimensions in the literature [xx1], and evaluate protocols and applications based on these. Additionally we add some dimensions that aren't necessarily captured in the literature, such as mobile-friendliness and practical implementation. Specifically the focus is on p2p applications that perform some form of data synchronization, with a bias towards secure messaging applications. 73 | 74 | All notes are tentative and are based on the provided documentation and specification. Code has generally not been looked into, nor have any empirical simulations been performed. These results are preliminary as they have yet to be reviewed. 75 | 76 | ### Compared dimensions 77 | 78 | **Request for comments: What's a better way to decompose these properties? Ideally something like 2-5 dimensions that matter most.
Right now it reads a bit like a laundry list, or like an ad hoc classification. Update April 1: Factored other considerations out into guarantees and practical application/protocol considerations. Also tweaked How section to also deal with transport requirements and peer discovery.** 79 | 80 | These dimensions are largely taken from the survey paper by Martins as well, with some small tweaks. To make it easier to survey, we divide up the dimensions into rough sections. 81 | 82 | #### 1. Why and what are we syncing? 83 | - *Problem domain*. Why are we syncing stuff in the first place? 84 | - *Minimal unit of replication*. The minimal entity that we can replicate. The actual unit of replication is the data structure that we are interested in, usually a collection of entities. Related: version history abstraction (linear vs DAG). 85 | - *Read-only or read and write*. Is the (actual unit of replication) data static or not? 86 | 87 | #### 2. Who is participating? 88 | - *Active vs passive replication*. Are participants replicating data that they are not interested in themselves? 89 | 90 | #### 3. When and where are we syncing? 91 | Replication control mechanisms. 92 | 93 | - *Single-master vs multi-master*. Both entity and collection. 94 | - *Synchronous (eager) vs asynchronous (lazy)*. 95 | - *If asynchronous, optimistic or not*. 96 | - *Replica placement*. Full or partial replication. 97 | 98 | #### 4. How are they syncing? 99 | - *P2P Topology*. Public or private P2P? Unstructured/structured/super-peer or friend-to-friend/group-based? 100 | - *Peer Discovery*. How do peers discover and connect with each other? 101 | - *Transport requirements*. Any specific requirements on the underlying transport layer? 102 | 103 | #### 5. What guarantees does it provide? 104 | - *Consistency-model*. What are the guarantees provided? 105 | - *Privacy-preservation*. Assuming underlying layers provide privacy, are these guarantees upheld? 106 | 107 | #### 6. 
Engineering and application, how does it work in practice? 108 | - *Actively used*. Is an instance of the protocol used in the real world? 109 | - *Mobile friendliness*. Any specific considerations for mobile? 110 | - *Well-defined spec*. Is there a well-defined and documented spec to refer to? 111 | - *Protocol upgradability*. Are there any mechanisms for upgrading the protocol? 112 | - *Framing considerations*. Does it support messages that don't need replication? 113 | 114 | ## Notes on single-master vs multi-master 115 | For single-master, there's only a single node that writes to a specific piece of data. The other peers purely replicate it and don't change it. 116 | 117 | For many of the studied systems, *content addressable storage* is used. This means the replication object is usually immutable, and it is only written to once. As a side effect, many systems are naturally single-master from this point of view. 118 | 119 | This is in comparison with the Martins survey paper, where they are more interested in update-in-place programming through replicating SQL DBs and other rich data structures. 120 | 121 | However, if we look at what is semantically interesting for the user, this is usually not an individual message or chunk of a file. Instead it is usually a conversation or a file. Seen from that point of view, we often employ some form of linear or DAG-based version history. In this case, many participants might update the relevant sync scope. Thus, the system is better seen as a multi-master one. To capture this notion we have divided the minimal unit of replication into entity and collection. 122 | 123 | ## Compared technologies 124 | 125 | **RFC: Which way is the most useful to look at it? Worried about confusion between the two, especially since many applications have a tightly coupled set of protocols.** 126 | 127 | This includes both applications and specifications.
For example, p2p messengers such as Briar (Bramble) [xx2], Matrix [xx3] and Secure Scuttlebutt (SSB) [xx4], as well as general p2p storage solutions such as Swarm [xx5]. 128 | 129 | In order to compare apples to apples, we compare Briar and Matrix in terms of how they de facto use sync. An additional consideration would be to compare Bramble and the Matrix Server-to-Server API/spec directly. This also means Swarm doesn't necessarily make sense to compare directly to these applications. 130 | 131 | This application focus is justified by the following observations: 132 | 133 | a) most protocols in this space are young, underspecified and somewhat coupled with the application using them, especially compared to protocols such as IP/TCP/UDP/TLS/Bittorrent, etc 134 | 135 | b) we are ultimately interested in considerations that only manifest themselves as the protocol is used on e.g. mobile devices 136 | 137 | ## 4. Comparison 138 | 139 | ### What and why are we syncing? 140 | 141 | #### Briar 142 | 143 | *Problem domain*. Bramble is the protocol Briar uses for synchronizing application layer data. Briar is a messaging app, and application data can be events such as sending messages, joining a room, etc. 144 | 145 | *Minimal unit of replication*. Immutable messages. The actual unit of replication is a DAG, which is encoded inside the encrypted message payload. This produces a message graph. 146 | 147 | *Read-only or read and write*. Each message is immutable, but since we are dealing with instant messaging, a group context, such as a conversation, is changing. Thus it is read and write. 148 | 149 | #### Matrix 150 | 151 | *Problem domain*. Matrix is a set of open HTTP APIs that allow for real-time synchronization and persistence of arbitrary JSON over a federation of servers. Often this is done in a room context, with participants talking to each other. More generally, for messaging between humans. 152 | 153 | *Minimal unit of replication*. Messages that also form a DAG.
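The idea of immutable messages forming a DAG can be made concrete with a small sketch (illustrative only; this is neither Briar's nor Matrix's actual format, and all names are hypothetical): each message is content-addressed by a hash, lists the ids of its parents, and can only be merged by a replica once those parents are locally known.

```python
import hashlib
import json

def make_message(body, parents=()):
    """An immutable, content-addressed message: its id is the hash of
    its body plus the ids of the messages it causally depends on."""
    record = {"body": body, "parents": sorted(parents)}
    msg_id = hashlib.sha256(json.dumps(record).encode()).hexdigest()
    return {"id": msg_id, **record}

def missing_parents(store, msg):
    """A replica can only merge a message into its DAG once all of the
    message's parents are locally known."""
    return [p for p in msg["parents"] if p not in store]

# Alice and Bob each post a root message; a reply to both records the
# causal dependency in the DAG.
a = make_message("hello from alice")
b = make_message("hello from bob")
c = make_message("hi both!", parents=[a["id"], b["id"]])

store = {a["id"]: a}  # this replica has seen a, but not b yet
```

Note that because parents are sorted before hashing, the same message with the same dependencies gets the same id regardless of the order in which parents are listed.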
154 | 155 | *Read-only or read and write*. Message domain so read and write. Individual messages are generally immutable. 156 | 157 | #### Secure Scuttlebutt 158 | 159 | *Problem domain*. 160 | 161 | *Minimal unit of replication* 162 | 163 | *Actual unit of replication*. 164 | 165 | *Read-only or read and write*. 166 | 167 | #### Swarm 168 | 169 | *Problem domain*. 170 | 171 | *Minimal unit of replication* 172 | 173 | *Actual unit of replication*. 174 | 175 | *Read-only or read and write*. 176 | 177 | #### Comparison 178 | 179 | | Dimension | Briar | Matrix | SSB | Swarm | 180 | |------------|------|------| ----| ---| 181 | | *Problem domain*. | Messaging | Messaging | Social network | Storage | 182 | | *Minimal unit of replication* | Messages | Messages | Messages | Chunks | 183 | | *Actual unit of replication*. | DAG | DAG | Log | | 184 | | *Read-only or read and write*. | Read and write | Read and write | Read and write | Read and write | 185 | 186 | ### Who is participating? 187 | 188 | #### Bramble 189 | *Active vs passive replication*. In Bramble, only the participants in a group chat are syncing messages. It is thus a form of passive replication. There are some proposals to have additional nodes to help with offline inboxing. 190 | 191 | #### Matrix 192 | *Active vs passive replication*. In Matrix, homeservers are used for replication. Homeservers themselves don't care about the messages, so it is a form of active replication. 193 | 194 | #### Secure Scuttlebutt 195 | *Active vs passive replication*. 196 | 197 | #### Swarm 198 | *Active vs passive replication*. In Swarm, nodes replicate data that they are "assigned" to by virtue of being close to it. 199 | 200 | #### Comparison 201 | 202 | | Dimension | Briar | Matrix | SSB | Swarm | 203 | |------------|------|------| ----| ---| 204 | | Active replication? | Passive | Active | | Active | 205 | 206 | ### When and where are we syncing? 207 | 208 | #### Briar 209 | *Single-master vs multi-master*.
Anyone who has access to the DAG can write to it. Each individual message is immutable and thus write-once. But it's multi-master since multiple people can update the principal data structure. 210 | 211 | *Synchronous (eager) vs asynchronous (lazy)*. Asynchronous, you write to your own node first. 212 | 213 | *If asynchronous, optimistic or not*. Optimistic. Since each message update is identified by a hash, conflicts are likely to be rare. 214 | 215 | *Replica placement*. Full or partial replication. 216 | 217 | #### Matrix 218 | *Single-master vs multi-master*. Essentially same as Briar. Multi-master. 219 | 220 | *Synchronous (eager) vs asynchronous (lazy)*. Partial, because there's an HTTP API to the Matrix server that needs to acknowledge before a message is transmitted. 221 | 222 | *If asynchronous, optimistic or not*. Optimistic. Though there are some considerations for conflict resolution when it comes to auth events, which are synced separately from normal message events. This is part of room state, which is a shared dictionary, and they use room state resolution to agree on the current state. This is in order to deal with soft failure / ban evasion prevention. 223 | 224 | More on ban evasion: this is in order to prevent users from evading bans by attaching to an older part of the DAG. These events may be valid, but a federation homeserver checks if such an event passes the current state auth checks. If it does not, the homeserver doesn't propagate it. A similar construct does not appear in Briar, since there's no such federation construct and there's no global notion of the current state. 225 | 226 | *Replica placement*. Partial. DAG and homeserver can choose to get previous state when they join the network. 227 | 228 | #### Secure Scuttlebutt 229 | *Single-master vs multi-master*. The principal data structure in SSB is a log, and only you can write to your own log. Single-master. 230 | 231 | *Synchronous (eager) vs asynchronous (lazy)*. Asynchronous, you add to your own log first.
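The single-master append-only log can be sketched as a hash chain (a minimal illustration, not SSB's actual feed format: real SSB entries are additionally signed with the author's key, which is what enforces single-master):

```python
import hashlib
import json

def append(log, content):
    """Append an entry to a single-master log. Each entry commits to the
    previous entry's hash, forming a chain only the owner extends."""
    prev = log[-1]["key"] if log else None
    entry = {"sequence": len(log) + 1, "previous": prev, "content": content}
    entry["key"] = hashlib.sha256(json.dumps(entry).encode()).hexdigest()
    log.append(entry)
    return entry

def verify(log):
    """A follower checks the hash chain before accepting a replicated log."""
    prev = None
    for i, e in enumerate(log, start=1):
        body = {"sequence": e["sequence"], "previous": e["previous"],
                "content": e["content"]}
        if e["sequence"] != i or e["previous"] != prev:
            return False
        if e["key"] != hashlib.sha256(json.dumps(body).encode()).hexdigest():
            return False
        prev = e["key"]
    return True

my_log = []
append(my_log, {"type": "post", "text": "first!"})
append(my_log, {"type": "post", "text": "second"})
ok_before_tamper = verify(my_log)
```

Tampering with any earlier entry breaks the chain, so replicas can safely sync a feed from untrusted intermediaries.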
232 | 233 | *If asynchronous, optimistic or not*. Optimistic. 234 | 235 | *Replica placement*. Full, because you start syncing your log from scratch. 236 | 237 | #### Swarm 238 | *Single-master vs multi-master*. 239 | 240 | *Synchronous (eager) vs asynchronous (lazy)*. 241 | 242 | *If asynchronous, optimistic or not*. 243 | 244 | *Replica placement*. Full or partial replication. 245 | 246 | #### Comparison 247 | 248 | | Dimension | Briar | Matrix | SSB | Swarm | 249 | |------------|------|------| ----| ---| 250 | | *Multi-master?* | Multi-master | Multi-master | Single-master | | 251 | | *Asynchronous?* | Asynchronous | Partial | Asynchronous | | 252 | | *Optimistic?* | Optimistic | Optimistic | Optimistic | | 253 | | *Replica placement* | Partial | Partial | Full | | 254 | 255 | ### How are they syncing? 256 | 257 | #### Bramble 258 | *P2P Topology*. Multiple options, but basically friend-to-friend. Can work either directly over Bluetooth or Tor (Internet). For BT it works by directly connecting to a known contact, i.e. friend to friend. For Tor it uses onion addresses; it also leverages a DHT (structured). 259 | 260 | *Peer Discovery*. Usually you add a contact locally, e.g. via a local beacon signal, then keep track of and update transport properties for each contact. Tor onion addresses are also leveraged. 261 | 262 | *Transport requirements*. Requires a transport layer security protocol which provides a secure channel between two peers. For Briar, that means the Bramble Transport Protocol. 263 | 264 | #### Matrix 265 | *P2P Topology*. Super-peer based on homeservers. Pure P2P has been discussed but is not live yet. 266 | 267 | *Peer Discovery*. 268 | 269 | *Transport requirements*. Written as a set of HTTP JSON APIs, so it runs on HTTP. Though there are some alternative transports as well. 270 | 271 | #### Secure Scuttlebutt 272 | *P2P Topology*. Private p2p, group-based. 273 | 274 | *Peer Discovery*. Gossip-based, friends and then different levels of awareness: 1-hop, 2-hop and 3-hop.
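The hop-based awareness can be sketched as a breadth-first walk of the follow graph (an illustration of the idea with hypothetical names, not SSB's implementation):

```python
from collections import deque

def hops_from(follows, me, max_hops=3):
    """Breadth-first walk of the follow graph, recording how many
    follow-edges away each feed is, up to max_hops."""
    dist = {me: 0}
    queue = deque([me])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # beyond the awareness horizon
        for peer in follows.get(node, ()):
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist

follows = {
    "alice": ["bob"],    # alice follows bob directly (1 hop)
    "bob": ["carol"],    # carol is a friend-of-a-friend (2 hops)
    "carol": ["dan"],    # dan is at the edge of awareness (3 hops)
    "dan": ["erin"],     # erin is beyond the horizon
}
dist = hops_from(follows, "alice")
```

A client could then treat 1-hop feeds as ones to fetch, 2-hop feeds as ones to cache, and 3-hop feeds as merely known about.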
275 | 276 | *Transport requirements*. 277 | 278 | #### Swarm 279 | *P2P Topology*. Public structured DHT. 280 | 281 | *Peer Discovery*. Kademlia, and caching peers in various proximity bins. 282 | 283 | *Transport requirements*. 284 | 285 | #### Comparison 286 | 287 | | Dimension | Briar | Matrix | SSB | Swarm | 288 | |------------|------|------| ----| ---| 289 | |*P2P Topology* | F2F | Super-peer | Group-based | Structured | 290 | | *Peer Discovery* | | | | | 291 | | *Transport requirements* | | | | | 292 | 293 | ### Guarantees provided 294 | 295 | #### Bramble 296 | *Consistency model*. Causal consistency, since each message encodes its (optional) message dependencies. This DAG is hidden from non-participants. 297 | 298 | *Privacy-preservation*. It is a friend-to-friend network, and preserves whatever privacy the underlying layer provides, so it has Tor's properties, and for short-distance the threat model is justifiably weaker due to the increased cost of surveillance. However, a peer knows when a message has been received via the receiving ACK (though it is possible to delay sending this, if one wishes). 299 | 300 | #### Matrix 301 | *Consistency model*. DAG, so causal consistency, generally speaking. 302 | 303 | *Privacy-preservation*. Relies on homeservers, so sender and receiver anonymity are not provided. Generally, if an adversary controls a homeserver they have a lot of information. 304 | 305 | #### Secure Scuttlebutt 306 | *Consistency model*. Sequential consistency due to the append-only log for each participant. 307 | 308 | *Privacy-preservation*. 309 | 310 | #### Swarm 311 | *Consistency model*. 312 | 313 | *Privacy-preservation*. 314 | 315 | #### Comparison 316 | *Consistency model*. 317 | 318 | *Privacy-preservation*.
319 | 320 | | Dimension | Briar | Matrix | SSB | Swarm | 321 | |------------|------|------| ----| ---| 322 | |*Consistency model* | Causal | Causal | Sequential | | 323 | | *Privacy-preservation* | Yes | | | | 324 | | *Transport requirements* | | | | | 325 | 326 | ### Engineering and application 327 | 328 | #### Bramble 329 | *Actively used*. Released in the Play Store and on F-Droid. Seems like it, but not huge. ~300 reviews on the Google Play Store, if that means anything. 330 | 331 | *Mobile friendliness*. Partial. Yes in that it is built for Android and works well, even without Internet and with high network churn. Note that battery consumption is still an issue, and that it runs in the background. The latter makes an iOS app difficult. 332 | 333 | *Well-defined spec*. Yes/Partial, see the Bramble specifications. Not machine readable though. 334 | 335 | *Protocol upgradability*. Haven't seen any specific work in this area, though they do have multiple versioned specs, which generally seem based on accretion. 336 | 337 | *Framing considerations*. As far as I can tell this is not a consideration. 338 | 339 | #### Matrix 340 | *Actively used*. Matrix has users in the millions and there are multiple clients. So definitely. 341 | 342 | *Mobile friendliness*. Due to its super-peer architecture, mobile is straightforward and it can leverage basic HTTP PUT/long-lived GETs like normal messenger apps. DNS for homeservers can also be leveraged. 343 | 344 | *Well-defined spec*. Partial/Yes. Well-described enough to have multiple clients, but not machine-readable as far as I can tell. 345 | 346 | *Protocol upgradability*. The spec is versioned and rooms can have different versions that can be upgraded. The exact process around that is currently a bit unclear, though it does appear to have been a major consideration. 347 | 348 | *Framing considerations*. Matrix supports multiple forms of message types. Aside from synced Persisted Data Units (PDU) they also have Ephemeral (EDU) and queries.
PDUs record the history of messages and the state of a room. EDUs don't necessarily need to be replied to. Queries are simple req/resp to get a snapshot of state. 349 | 350 | #### Secure Scuttlebutt 351 | *Actively used*. In production and many clients. Unclear exactly how active. 352 | 353 | *Mobile friendliness*. 354 | 355 | *Well-defined spec*. 356 | 357 | *Protocol upgradability*. 358 | 359 | *Framing considerations*. 360 | 361 | #### Swarm 362 | *Actively used*. Currently in POC testing phase with limited real-world usage. 363 | 364 | *Mobile friendliness*. 365 | 366 | *Well-defined spec*. 367 | 368 | *Protocol upgradability*. 369 | 370 | *Framing considerations*. 371 | 372 | #### Comparison 373 | 374 | 375 | | Dimension | Briar | Matrix | SSB | Swarm | 376 | |------------|------|------| ----| ---| 377 | |*Actively used* | Yes | Very | Yes | POC | 378 | | *Mobile friendliness* | Partial | Yes | | | 379 | | *Well-defined spec* | Partial | Partial | | | 380 | | *Protocol upgradability* | | | | | 381 | | *Framing considerations* | No | Yes | | | 382 | 383 | ### Table Summary 384 | 385 | **RFC: Does a full table summary make sense? Should it list all dimensions? What would be the most useful?** 386 | 387 | | Dimension | Bramble | Matrix | SSB | Swarm | 388 | | ---------- | -------- | -------- | --- |--- | 389 | | Replication object | Message (DAG) | Message* (DAG/*) | Messages (?) (Log) | Chunks (File/Manifest) | 390 | | Single-master? | Single-master | Depends* | Single-master | | 391 | | Asynchronous? | Yes | Partial | Yes | Yes | 392 | | Asynchronous optimistic? | Yes | Yes | Yes | Yes | 393 | | Full vs partial replication? | Partial | Partial | Partial? | Partial | 394 | | Consistency model | Causal consistency | Causal | Eventual / Causal | | 395 | | Active replication? | No | Yes | Yes | Yes | 396 | | P2P Topology | F2F Network | Super-peer | Group-based | Structured | 397 | 398 | #### Brief discussion 399 | 400 | What these results mean.
Things that come together or not, independent considerations. 401 | 402 | ## 5. Summary 403 | 404 | TODO. 405 | 406 | Should answer what this means for you, given your constraints. For us specifically: mobile-friendly considerations, etc. Possibly a section on future work. 407 | 408 | ## 6. Acknowledgements 409 | 410 | TODO. 411 | 412 | Should reflect people who have helped shape this document. 413 | 414 | ## 7. References 415 | 416 | xx1 417 | [Martins, Pacitti, and Valduriez, "Survey of Data Replication in P2P Systems."](https://hal.inria.fr/inria-00122282/PDF/Survey_of_data_replication_in_P2P_systems.PDF) 418 | 419 | xx2 420 | [Bramble Synchronisation Protocol, version 1](https://code.briarproject.org/briar/briar-spec/blob/master/protocols/BSP.md) 421 | 422 | xx3 423 | [Matrix Specification](https://matrix.org/docs/spec/) 424 | 425 | xx4 426 | [Scuttlebutt Protocol Guide](https://ssbc.github.io/scuttlebutt-protocol-guide/) 427 | 428 | xx5 429 | [Swarm documentation, release 0.3](https://buildmedia.readthedocs.org/media/pdf/swarm-guide/latest/swarm-guide.pdf) 430 | 431 | 432 | 433 | ## TODO 434 | 435 | - Add graphs of linear vs DAG version history 436 | - Clarify user, identity, multi-device definitions 437 | - Mention alternatives to data sync, i.e. for messaging RTC and framing, but motivate why useful especially for p2p 438 | - Better document structure, e.g. see SoK Secure Messaging for an example of a survey paper 439 | - Apples to apples: break apart e.g. Matrix and Swarm since they aren't directly comparable 440 | - Capture in how are they syncing: API and push/pull. 441 | - Consider collapsing the actual replication unit. 442 | - Matrix should capture auth state semantics. 443 | - Mentions: How to deal with other things that are kind of replicated but differently? E.g. metadata, etc. Auth state, ephemeral, manifests, feeds, etc. 444 | - Remove redundancy. In cases where dimensions are generally the same, consider treating them under the same header.
445 | - Fill in rest of definitions. 446 | 447 | ### Further work and research questions 448 | 449 | **1. Multiple-device management and data sync.** (minor) 450 | How do various solutions deal with multiple devices and identities? I.e. Matrix, Briar etc. What does this look like for SSB if you have a single log but write from mobile and desktop? Related: ghost user problem and mitigations. 451 | 452 | **2. CRDTs and version history.** (minor) 453 | Elaborate on what problem CRDTs solve, their consistency model and problem domain, as well as how this relates to logs and DAGs for version history. 454 | 455 | **3. Breaking apart data sync into better components.** 456 | The dimensions right now read like a laundry list, and I'm not confident these are orthogonal considerations. Especially the "other considerations" section. Ideally there'd be just a few different dimensions that matter the most, with possible sub problems. A la principal component analysis, there are likely only a few really fundamental differences. Related to breaking things apart into similarities and differences. 457 | 458 | **4. Extend comparison with more applications and protocols.** 459 | E.g. Bittorrent, Git, Tribler, IPFS, Whisper, Status. Revisit Tox/TokTok. 460 | 461 | **5. Clarify requirements of the sync protocol in question.** 462 | E.g. if it has transport requirements or certain protocol dependencies. This also relates to how tightly coupled a sync protocol is to other protocols. It is also relevant for engineering reality. 463 | 464 | **6. Ensure knowns in data sync research log are captured** 465 | I.e. for the comparative work that has been done in https://discuss.status.im/t/data-sync-research-log/1100, this should be summarized under the various dimensions. 466 | 467 | **7. Ensure previously brought-up considerations are captured.** 468 | I.e. all of these https://discuss.status.im/t/data-sync-next-steps-and-considerations/1056 should be captured in some way or another.
As well as all the loose ends/questions posted in https://discuss.status.im/t/data-sync-research-log/1100. This might fork into new future work items. 469 | 470 | **8. Peer discovery.** 471 | Capture how peers who want to sync find each other. Consider how encryption plays a role here in terms of visibility. 472 | 473 | **9. User-centric, capture client implementers and requirements.** 474 | The user of a data sync layer is the sync client writer, not the end user. What do they want to think about and not think about? Capture requirements better. 475 | 476 | **10. Separate out Swarm vs IPFS storage and distribution comparison.** 477 | Looking at similarities and differences between these is probably more informative, even though there's overlap. It also keeps the comparison scope tighter. 478 | -------------------------------------------------------------------------------- /data_sync/p2p-data-sync-mobile.md: -------------------------------------------------------------------------------- 1 | # P2P Data Sync for mobile 2 | 3 | ## Introduction 4 | 5 | How do we achieve user-friendly data synchronization in a p2p network for resource restricted devices? In this paper we survey existing work, and propose a solution combining several of the most promising technologies. 6 | 7 | In a p2p network you often want to reliably transmit and replicate data across participants. This can either be large files, or small messages that users want to exchange between each other in a private chat. The participants can either be human, or machines, or a combination thereof. P2P networks are one type of challenging network that has a few special characteristics: nodes often churn a lot (low uptime) and there might be malicious nodes if you haven't established trust beforehand. There are also considerations such as NAT traversal, which we treat as out of scope for this paper.
Doing P2P on mobile and similar resource restricted devices is extra challenging, due to churn being even more of an issue, as well as having to pay attention to limited bandwidth usage, battery, storage and computational requirements. A lot of these considerations overlap with other types of challenging networks, such as VANETs, DTNs and mesh networks. 8 | 9 | In addition to synchronizing data, you might also want other properties, such as: causal consistency for message ordering, privacy-preservation through self-exploding messages, censorship resistance, and so on. 10 | 11 | This paper is aimed at introducing one part of a secure messaging stack. 12 | 13 | ## Related Work 14 | 15 | Data synchronization in p2p networks has existed for a while. The most commonly used modern examples are probably Bittorrent and Git. With p2p cryptocurrencies like Bitcoin and Ethereum, we have seen an explosion in the amount of infrastructure being built to solve various use cases. Around Ethereum you have projects like IPFS and Swarm, which deal with content storage and distribution, as well as the protocols associated with them. 16 | 17 | Outside of the cryptocurrency or Web3 community we have also seen many projects building decentralized networks and apps recently, with open associated protocols. We have projects like Briar with its Bramble Synchronization protocol, Tribler with its Dispersy Bundle Synchronization, Matrix, Secure Scuttlebutt and a few others. 18 | 19 | It's useful to compare these on a few dimensions to understand how they are similar and how they differ. There are many ways to slice a cake, and we'll focus on a few of the more relevant dimensions. Let's first give a brief overview of the main sources of inspiration. 20 | 21 | ### Server based model 22 | 23 | In a server based model things are rather simple. The server is the source of truth, and you do updates synchronously with it. So I have a device, I make a request to some server, and I send an update.
This means it is easy to guarantee otherwise impossible attributes, such as sequential consistency. This comes at the cost of centralization, with drawbacks such as availability issues (whether deliberate, by accident, by service provider or by third party), censorship, lack of ownership and control of your data, and surveillance. We mention it here only to provide contrast with the other models. 24 | 25 | ### Git 26 | 27 | Git is a decentralized version control system. It was built out of the needs of Linux kernel development, which is inherently distributed and has many sources of truth that need to be reconciled. Essentially a git repository is a DAG, and then you create branches and can pull down other remote references and merge them with yours. In case of different views, you either need to rebase (changing your own dependency tree), or create a merge commit. While you could use Git for other things than just code, that isn't what it is designed for. Often commits operate on the same single piece of data (a codebase), which leads to things like merge conflicts. So while the commits themselves are immutable, the thing you are manipulating is not, and this often leads to conflicts and so on. 28 | 29 | ### Bittorrent 30 | 31 | Bittorrent has quite a few parts to it, and it can act as a private group-based network or as an open structured network. A common mode of operation is having a tracker, then getting other nodes from it with which you share chunks of a single static file. 32 | 33 | One interesting aspect of Bittorrent is that it has a form of incentive mechanism that's tit for tat, essentially encouraging pairwise contributions within the game that is sharing a single static torrent file. This game doesn't extend outside of sharing a single file, which has led to private trackers using things like seed ratio as a reputation system in a centralized fashion, leading to exposure to censorship and coercion.
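The tit-for-tat idea can be reduced to a small sketch (illustrative only; BitTorrent's real choking algorithm additionally includes optimistic unchoking, timed rounds, and other details): reciprocate with the peers that have recently uploaded the most to you.

```python
def choose_unchoked(recent_upload_to_us, slots=4):
    """Tit for tat, reduced to its core: rank peers by how much they
    have recently uploaded to us and reciprocate with the top few."""
    ranked = sorted(recent_upload_to_us,
                    key=recent_upload_to_us.get, reverse=True)
    return set(ranked[:slots])

# Hypothetical per-peer upload totals (e.g. KiB in the last rolling window).
recent = {"peer_a": 900, "peer_b": 120, "peer_c": 0,
          "peer_d": 450, "peer_e": 300}
unchoked = choose_unchoked(recent, slots=3)
```

Peers that contribute nothing (like `peer_c` here) end up choked, which is exactly the pairwise incentive the text describes, and also why the mechanism stops mattering once a single file's swarm is left.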
34 | 35 | For its structured network, most Bittorrent implementations leverage a DHT of nodes that allows you to figure out who is sharing a piece of data. It is worth noting that the object being shared is static; a torrent file once uploaded doesn't change, and its integrity/consistency is communicated in the form of a checksum that's usually provided with the initial (centrally distributed) download. 36 | 37 | ### Matrix 38 | 39 | Matrix is a tool for various forms of group chat, including on mobile. Matrix currently operates a hybrid architecture. It is neither fully p2p nor client-server; instead it uses a federated mode, or a super-peer architecture. As a user, you connect to a homeserver, which then syncs data with other known homeservers. While there are plans to move to a p2p architecture where each node is running its own homeserver, there appear to be some unsolved issues, similar to the ones noted in other protocols. 40 | 41 | Matrix distinguishes between syncable messages and ephemeral messages, which allows it to have lightweight typing notifications and so on. They also have two DAGs, where one is used for reconciling auth events to ensure people who are banned from a chat room can't easily just rejoin as unbanned. Since it uses a DAG, it also means a chat context can be offline, such as in a submarine, for an extended period of time, then come online and sync up with the rest of the world. Because you need to connect to a homeserver to sync, it requires the user to be online with an active connection to a homeserver. 42 | 43 | The super-peer architecture means there's only a simple TCP connection and no high availability requirements for end users, which leads to a good user experience for mobile devices. However, it isn't pure peer-to-peer, and it suffers from many of the same centralizing aspects as other server based approaches, including security concerns, censorship resistance, and so on.
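The auth-event reconciliation mentioned above can be illustrated with a toy predicate (hypothetical names; real Matrix state resolution and event auth rules are considerably more involved): an event may be valid against the older part of the DAG it attaches to, yet a server still declines to propagate it if the current room state bans its sender.

```python
def should_propagate(event, currently_banned):
    """Toy version of the ban-evasion check: refuse to pass on an event
    whose sender is banned in the *current* room state, even if the event
    attaches to an older, pre-ban part of the DAG."""
    return event["sender"] not in currently_banned

# Hypothetical current room state: mallory has been banned.
currently_banned = {"@mallory:example.org"}

# An event attached to a pre-ban branch of the DAG, and a normal one.
old_branch_event = {"sender": "@mallory:example.org", "body": "I'm back!"}
normal_event = {"sender": "@alice:example.org", "body": "hi"}
```

As the comparison document notes, nothing analogous exists in Briar, since there is no federation layer and no global notion of current state to check against.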
### Secure Scuttlebutt (SSB)

SSB is a form of group-based private p2p network of append-only feeds, based on trusted relationships. These feeds are single-master and offline/local by default. You can follow a feed to get updates from it, and by following a feed and being followed, you become aware of more feeds. Essentially, you see content 1 hop away, fetch and cache content 2 hops away, and are aware of content 3 hops away. Bootstrapping works by connecting to a known Pub server, which follows you back. By being followed back, you become aware of the connections that Pub has, and so on, in concentric circles, like a social graph owned by you. Feeds are unencrypted by default, and messaging works by posting to your own feed a message encrypted with the other person's public key. It's then a client-side detail how to turn this into a good user interface.

### Bramble Synchronization Protocol (BSP)

BSP is used in Briar, a messaging app for mobile devices. It is based on a friend-to-friend network, so it leverages trusted connections. It uses a secure transport protocol to transfer data from one device to another. While BSP is made for mobile devices, it assumes devices are largely online. This raises battery concerns, and also means some real-world difficulty running on iPhone devices.

BSP works by keeping a log for each group context and replicating data asynchronously. Depending on the specific settings, you then offer or send messages to the other peers you are sharing with. If they receive a message, they acknowledge it; if they don't, the sender resends. Each peer keeps track of the state of other nodes to know when to resend messages.

Because it makes no special network assumptions, it works just as well over sneakernet (a USB stick).
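The log-and-acknowledge loop BSP uses can be sketched as follows. This is a minimal model, not Briar's actual implementation; all names are illustrative, and the exponential backoff schedule is an assumption:

```python
import time

class Peer:
    """Per-peer replication state for one group context."""
    def __init__(self):
        self.unacked = {}  # message_id -> (body, send_count, next_send_time)

class SyncNode:
    def __init__(self):
        self.log = []    # local append-only log for this group context
        self.peers = {}  # peer_id -> Peer

    def append(self, message_id, body):
        self.log.append((message_id, body))
        # Schedule the message for delivery to every sharing peer.
        for peer in self.peers.values():
            peer.unacked[message_id] = (body, 0, time.time())

    def tick(self, send):
        """Resend unacknowledged messages, backing off exponentially."""
        now = time.time()
        for peer_id, peer in self.peers.items():
            for mid, (body, count, due) in list(peer.unacked.items()):
                if now >= due:
                    send(peer_id, mid, body)
                    # Backoff: 1s, 2s, 4s, ... between retries to this peer.
                    peer.unacked[mid] = (body, count + 1, now + 2 ** count)

    def on_ack(self, peer_id, message_ids):
        """An ACK clears the pending entry, stopping further resends."""
        for mid in message_ids:
            self.peers[peer_id].unacked.pop(mid, None)
```

The key point is that all state needed for reliability lives at the endpoints: each node tracks, per peer, what has not yet been acknowledged.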
Two drawbacks of this approach are:
- high churn is problematic, as can be verified in a simulation / back-of-the-envelope calculation
- large group chats over a multicast network lead to a lot of overhead, due to each message being pairwise

### Dispersy Bundle Synchronization (DBS)

Dispersy is a form of anti-entropy sync, which uses Bloom filters to advertise locally available bundles to random nodes in the network. The receiving node then tests this Bloom filter against its own locally available bundles, and requests missing messages. With each random interaction, we get closer and closer to a consistent world view among all nodes.

This approach is very bandwidth efficient, as Bloom filters are a very space-efficient way of conveying probabilistic information, and the effects of false positives can be mitigated. One problem with Bloom filters is that with a lot of elements the false positive rate can shoot up quite quickly. To mitigate this, DBS uses a subset-of-elements approach: it cuts up the locally available bundles in clever ways, which lets the false positive rate and the Bloom filter size stay the same. The latter is important, as for challenging networks you want to keep the payload limited to allow for NAT puncturing with UDP, and too big a payload leads to fragmentation.

DBS is used in Tribler for their megacaches, which essentially contain various forms of social information (similar peers, liked videos, etc.). It's important to note that this information doesn't have to be 100% up to date to be useful. There are two main downsides of this approach, as far as the author can tell:

- it is probabilistic and doesn't aim to provide guaranteed message ordering (i.e.
causal consistency)
- you sync data with all peers, including potentially untrusted peers

### Swarm Feeds and chunks

Swarm is one of the legs of Web3 in the Ethereum world. There are many protocols related to Swarm, such as access control, PSS for routing, incentivization, etc. It's essentially storage for the so-called world computer. Swarm splits data into chunks that are spread across all the nodes in the network. It leverages a form of DHT, a structured network, where data placement is tied to the address of the content. This means individual nodes don't have a say in what they store, which is a form of censorship resistance, as no one is really in charge of data storage.

All data is immutable in Swarm by default, so to deal with mutability Swarm has Feeds. A Feed is a mutable resource, similar to a pointer or DNS, and it essentially acts as a replicated log. This is useful for smartphones, since you can replicate your log to the network and thus get close to 100% availability guarantees. There are a few downsides to this approach:

- Your log is by default accessible by anyone
- The receiver needs to actively go fetch your log to sync
- In a group context, there might be a lot of feeds to sync

These can be mitigated to various degrees by using things like PSS, topic negotiation, and so on. There are trade-offs to all of these though.

## Problem motivation

Why do we want to do p2p sync for mobile phones in the first place? There are three parts to that question: one on the value of decentralization and peer-to-peer, a second on why we'd want to reliably sync data at all, and finally why mobile phones and other resource-restricted devices.

For decentralization and p2p, there are various reasons, both technical and social/philosophical in nature. Technically, having a user-run network means it can scale with the number of users.
Data locality is also improved if you query data that's close to you, similar to distributed CDNs. Throughput is also improved if there are more places to get data from.

Socially and philosophically, there are several ways to think about it. Open and decentralized networks relate to the idea of open standards; compare the longevity of AOL with that of IRC or Bittorrent. One is run by a company and was shut down when it stopped being profitable; the others live on. Additionally, centralized control of data and infrastructure is increasingly becoming a liability [xx]. In a network with no one in control, everyone is. It's ultimately a form of democratization, more similar to the organic social structures that predate the big Internet companies. This leads to properties such as censorship resistance and coercion resistance, where we limit the impact a third party might have on a voluntary interaction between individuals or a group of people. Examples of this are plentiful in the world of Facebook, Youtube and Twitter.

As for reliably syncing data at all, it's a requirement in many problem domains. You don't get it by default in a p2p world, which is extra unreliable, with nodes permissionlessly joining and leaving the network. In some cases you can get away with only ephemeral data, but usually you want some kind of guarantees. This is a must for a reliable group chat experience, for example, where messages are expected to arrive in a timely fashion and in some reasonable order. The same is true for messages that represent financial transactions, and so on.

Why mobile phones? We live in an increasingly mobile world, and most devices people use daily are mobile phones. It's important to provide the same, or at least similar, guarantees to those of more traditional p2p nodes that might run on a desktop or server. The alternative is to rely on gateways, which shares many of the drawbacks of centralized control and is prone to censorship, control and surveillance.
More generally, resource-restricted devices can differ in their capabilities. One example is smartphones, but others are: desktops, routers, Raspberry Pis, POS systems, and so on. The number and diversity of devices is exploding, and it's useful to be able to leverage this for various types of infrastructure. The alternative is to centralize on big cloud providers, which also lends itself to lack of democratization, censorship, etc.

### Minimal Requirements

In terms of minimal requirements or design goals for a solution, we propose the following.

1. MUST sync data reliably between devices.
By reliably we mean having the ability to deal with messages being out of order, dropped, duplicated, or delayed.

2. MUST NOT rely on any centralized services for reliability.
By centralized services we mean any single point of failure that isn’t one of the endpoint devices.

3. MUST allow for mobile-friendly usage.
By mobile-friendly we mean devices that are resource-restricted, mostly offline and often changing network.

4. MAY use helper services in order to be more mobile-friendly.
Examples of helper services are decentralized file storage solutions such as IPFS and Swarm. These help with availability and latency of data for mostly-offline devices.

5. MUST have the ability to provide causal consistency.
By causal consistency we mean the commonly accepted definition in distributed systems literature. This means messages that are causally related can achieve a partial ordering.

6. MUST support ephemeral messages that don’t need replication.
That is, allow for messages that don’t need to be reliably transmitted but still need to be transmitted between devices.

7. MUST allow for privacy-preserving messages and extreme data loss.
By privacy-preserving we mean things such as exploding messages (self-destructing messages).
By extreme data loss we mean the ability for two trusted devices to recover from a deliberate or accidental removal of data.

8. MUST be agnostic to whatever transport it is running on.
It should not rely on specific semantics of the transport it is running on, nor be tightly coupled with it. This means a transport can be swapped out without loss of reliability between devices.

We've already expanded on the rationale for 1-3. Let's briefly expand on the need for the other requirements. For 4, the reality is that mobile devices are largely offline, and you need somewhere to get data from. To get reasonable latency, this requires some additional form of helper service. It also ties into 3, which is to allow for a good UX on mobile, meaning reliability and reasonable latency.

5 is a must to at least be able to provide causal ordering between events, though it is up to clients of the protocol whether to use it. 6 is useful for unimportant messages, such as typing notifications, discovery beacons, and so on. The alternative here is to use a separate protocol, which would make this data sync layer less pluggable and less general across transports. It is similar to TCP and UDP both running on IP. This also ties into 8, where the protocol should be able to use any underlying transport from device to device.

For 7, this is often a desirable feature for secure messaging applications. The idea is that you want to support exploding messages, or maybe wipe your device if you are afraid of local seizure [xx]. This is an explicit requirement, as otherwise you get into hairy solutions when, for example, repairing a DAG.

## Solution / System Model

### System components

Here's a brief overview of the components that together make up data sync.

1. A *node*, or device, synchronizes data with other nodes.
2. *Data*, or a message, is the minimal unit being synchronized, and it is referenced by a globally unique *message ID*.
3. A *group* is an independent sync scope consisting of multiple nodes synchronizing some messages.
4. A *log* is an ordered list of messages by a particular node in a sync scope.
5. A *feed* is a persistent identifier to the head of a log, which allows us to identify missing data.

Nodes belonging to the same group can choose which other nodes they synchronize with. A log is local by default, but can be replicated as part of communication with other nodes (passive replication) and, as an extension, actively replicated through decentralized file storage. A feed can either be implicit, in that it is the last message received by another node, or explicit, in the case of fetching updates from mostly-offline nodes. Assuming messages include message IDs from other nodes, by taking the logs of multiple nodes in a group together, we can optionally form an immutable graph.

### Overview

We propose a protocol that's heavily inspired by BSP, with some tweaks, some minor and some major. Let's first introduce the protocol in more detail before diving into enhancements.

We synchronize messages between devices. Each device has a set of peers, of which it chooses a subset with which it wants to synchronize messages. Each synchronization happens within a data group, and in a data group there's a set of immutable messages. Each message has a message ID that identifies it.

There are the following message types: OFFER, ACK, SEND, REQUEST. ACK acknowledges a set of message IDs, whereas REQUEST requests them. OFFER offers a set of message IDs without sending them, whereas SEND sends the actual messages. Depending on its sharing policy, a client can choose to OFFER messages or SEND messages first, depending on whether latency (roundtrips) or bandwidth (payloads) is at a premium.
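The choice between offering and sending first can be sketched as a simple sharing-policy decision. The mode names below (BATCH vs INTERACTIVE) are illustrative labels for the bandwidth-vs-latency trade-off just described, not normative spec names:

```python
from enum import Enum

class Mode(Enum):
    BATCH = 1        # bandwidth is cheap: push full messages immediately
    INTERACTIVE = 2  # bandwidth at a premium: offer IDs first, send on REQUEST

def outgoing(mode, messages):
    """Decide the first packet for a batch of locally available messages.

    Returns (message_type, payload): either the messages themselves (SEND),
    or just their IDs (OFFER), trading an extra roundtrip for smaller payloads.
    """
    if mode is Mode.BATCH:
        return ("SEND", messages)
    return ("OFFER", [m["id"] for m in messages])
```

A client would pick the mode per peer or per group, e.g. interactive mode for large public groups where duplicate delivery is likely.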
**Tweak 1:** For payloads we propose using Protobuf, to allow for upgradability and implementation in multiple languages. For the wire protocol we use a slightly different one; see the next major section.

**Tweak 2:** One difference from BSP is that we allow multiple message types to occur in one packet, so you can ACK messages at the same time as you OFFER messages. This allows for fewer roundtrips, as well as piggybacking ACKs. The latter is useful if a node might not want to disclose that it received a message straight away.

A device keeps track of the peers it is synchronizing data with. When a device sends a message, it starts by appending it locally to its own log. It then tries to OFFER or SEND messages to each of the peers it is sharing with. This operates on a positive-retransmission-with-ACKs basis, similar to e.g. TCP. It uses exponential backoff and keeps track of the send count to ensure it doesn't send too many messages to an offline node.

A receiving node might receive messages out of order. Inside the payload, a useful thing to have is a set of parent IDs. These are message dependencies that can be requested specifically before delivering the message. This way a node knows what data it is missing and can request it, by default from the sending node but additionally in other ways. Only once all message dependencies have been met does the message actually get delivered, so some ordering is guaranteed. This is useful when causal consistency is needed, but it isn't a strict requirement and it might be domain dependent.

### Enhancement 1: Leveraging multicast

Another difference from BSP is that we allow for leveraging multicast protocols. In these you might have a topic construct, which you know multiple receivers are listening to. If you are sending on a topic that you know a set of participants are listening to, a client counts this as a send to each of those participants.
If you have 100 participants on a topic, this means a 100x improvement in bandwidth usage. A mapping is kept between known participants and a topic to facilitate this.

As a specific example, if you have a group chat with 100 participants, you can ACK a set of messages by publishing the ACK (message IDs) to a topic. As a result, all participants who are able to read this message know you received it. Additionally, they are now aware that this is a message ID that exists, and may choose to request it. However, this still has the drawback that each node sends ACKs to the same topic, which can lead to undesirable bandwidth consumption. This points to using ACKs less aggressively and leveraging more randomized and optimized approaches. Such additional enhancements for large group chats are possible; see below for more radical changes.

This is an optional feature, as it isn't guaranteed that the listeners of a topic are the intended recipients, which is a somewhat weaker assumption than friend-to-friend with mandatory trust establishment.

### Enhancement 2: Swarm for replicated log

One drawback mentioned earlier is that high churn is problematic. As a simple example, imagine a mobile node that is online only 10% of the time. This means 9 out of 10 sends will be failures, and this cascades with failing to deliver ACKs, leading to more resends, and so on. Additionally, with exponential backoff and devices possibly being offline at non-overlapping times, the latency might be unacceptable.

The fundamental issue here is one of low availability. How do we solve this without reverting to centralized solutions? One approach is to replicate the log remotely, similar to what is done in Git, but ideally in a 'no one owns this' way, which points to IPFS and Swarm.
This enables an interesting property, where you can essentially upload and forget, and, assuming incentives around storage policy and so on work out, you can be reasonably sure the data is still there for you even if you lose your local copy.

A replicated log can exist on Swarm and IPFS, assuming the message IDs map to the references used there. What Swarm Feeds provide is a way to show where the last message is, without requiring individual nodes to send or offer a message to that peer individually. This means that, as long as you know which log to pull from, you can query it at some interval. Additionally, Swarm Feeds use an adaptive algorithm that allows you to seamlessly communicate the expected update frequency.

One downside of this is that it requires Internet access and a specific overlay network, currently devp2p. This is an optional enhancement, and there are many ways to replicate remote logs; the important thing is that the other node is aware of it.

The way it'd work is as follows. A node commits a message to its local log, then syncs it to some decentralized file storage, such as Swarm or IPFS. This acts as further confirmation that the data has been replicated, a form of ACK where Swarm is seen as a node. Recipients then know, through previous communication with the sender node, that they can look at this log. This means the two nodes don't both need to be online, and they can leverage Swarm, or similar solutions, with less latency. Another way of looking at it is as chunks providing a linked list. Even if a similar solution lacks a feeds-like mechanism (e.g. ENS or DNS), it can still be used for chunk storage, even though latency might be slightly higher.

Let's decompose the above into orthogonal concerns.

#### Content addressed storage (CAS)

Swarm, with its chunks and references, is a content addressed storage.
This allows us to get and put data there, with various properties. There are also other content addressed storages, such as IPFS and a local node's cache. What's important is that there's a way to refer to messages, and that this is based on a cryptographic hash function. This gives us two properties: collision resistance and preimage resistance. It allows us to uniquely refer to a piece of content, which can be leveraged by clients in terms of ordering of events. It also allows us to check the data integrity of a specific message.

A desirable property here is to be as agnostic as possible to the specific infrastructure used. How can we refer to the same piece of content across various stores? This is an open question, but tentatively multihashes and multicodec can be used for this.

One way of solving this issue is by wrapping. That is, a client might advertise that it has the capability to replicate messages on Swarm. Receiving nodes then know to look at a certain feed, which references chunk IDs; inside the chunks we have the actual messages. So we have both chunk IDs (Swarm references) and, inside the chunks themselves, the actual messages with their message IDs. This ensures messages stay immutable and are not tightly coupled to a specific storage or distribution mechanism.

#### Update pointers

In the above section we suggest using Feeds. However, this is just one alternative. Others are ENS, DNS, etc. We can also have an endpoint communicate the latest update through a push message. It's desirable that this notion is kept abstract.

What it does is (a) allow you to see the last updated data and ideally (b) reference previous state, so you can fetch it via CAS, or at least be aware that it's missing.
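The two CAS properties above can be sketched with a toy in-memory store. SHA-256 here is an illustrative choice (the spec leaves the exact hash/encoding, e.g. multihash, as an open question), and `Store` is a stand-in for Swarm/IPFS, not a real client:

```python
import hashlib

def message_id(message: bytes) -> bytes:
    """Content address: a cryptographic hash uniquely refers to the message."""
    return hashlib.sha256(message).digest()

class Store:
    """A minimal in-memory content addressed store (stand-in for Swarm/IPFS)."""
    def __init__(self):
        self._chunks = {}

    def put(self, data: bytes) -> bytes:
        ref = message_id(data)
        self._chunks[ref] = data
        return ref

    def get(self, ref: bytes) -> bytes:
        data = self._chunks[ref]
        # Integrity check: the reference must match the content's hash.
        assert message_id(data) == ref, "corrupt chunk"
        return data
```

Any backend exposing this `put`/`get` shape can serve as the CAS layer, which is what makes the wrapping approach storage-agnostic.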
### Enhancement 3: Dispersy Bloom Filters

Another drawback mentioned in an earlier section is that large group data sync contexts over a multicast network lead to a lot of overhead, due to each message being pairwise. By leveraging multicast capability where it exists, we mitigate this somewhat. However, a simple back-of-the-envelope calculation shows that this might still be too much. If you have 100 nodes in a group context, a node sending a message over a topic results in 100 naive ACKs over the topic. These can be clumped together and so on, but it still might be an unacceptable bandwidth trade-off. Without relying on centralized intermediaries, this leads to relaxing ACKs somewhat, while still ensuring we keep reliability. One way of doing this is by using some of the ideas in DBS.

By using Bloom filters and repeated random interactions with other nodes in a data sync context to offer and acknowledge messages, we'll eventually reach a form of steady state, as this is a form of anti-entropy sync.

We can extend the message types as follows. In large group contexts, OFFER and ACK (all messages that carry a set of message IDs) can be replaced with BLOOM_AVAILABLE and some additional subset description. This conveys the same information as OFFER and ACK, but probabilistically, in the following way. Say we have a data sync context with 100 nodes; one node A has some messages locally available, and it connects to a random node B. A then sends a Bloom filter along with some additional information. B then tests each of its locally available messages (for some range) against the Bloom filter sent. For each it gets 'maybe in set' or 'not in set', and false positives here are fine. If a message is not in the set, B knows A is missing it, and in return it sends these missing messages, along with its own Bloom filter. A then does the same with B's filter.
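The A/B exchange above can be sketched as follows. This is a toy Bloom filter with arbitrary parameters, not Dispersy's actual filter construction:

```python
import hashlib

class BloomFilter:
    """A tiny Bloom filter (m bits, k hash functions) over message IDs."""
    def __init__(self, m=256, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: bytes):
        # Derive k bit positions by salting a hash with the function index.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def maybe_contains(self, item: bytes) -> bool:
        # True means 'maybe in set' (false positives possible),
        # False means 'definitely not in set'.
        return all(self.bits >> pos & 1 for pos in self._positions(item))

def missing_for_peer(peer_filter, local_messages):
    """Messages the peer is definitely missing; 'maybe' hits are treated
    as probabilistic ACKs and skipped."""
    return [m for m in local_messages if not peer_filter.maybe_contains(m)]
```

Node B would run `missing_for_peer` with A's filter over the agreed message range, send back the results, and attach its own filter for A to do the same.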
Messages that are in the 'maybe' category can be seen as probabilistically ACKed, and no further action is needed from the receiving node. This information can be discarded, or it can be used as probabilistic input to inform things like future send times.

#### Subset description
How is the subset described? What's most important is that there's a way to divide a set of messages into roughly equally sized buckets. This is important to ensure the Bloom filter length and false positive rate stay the same. A good way of doing this is using Lamport timestamps. These are non-unique but monotonic, and they roughly evenly divide up the space of messages. I.e. we can select a range 0..1000 and communicate this in the BLOOM_AVAILABLE message, and the recipient knows which subset of messages to look for. What happens if there's a Byzantine node that sends messages with the same timestamp? In that case, the sending node would select a smaller subset partition range. The universe size for the Bloom filter is determined by the sender, and it simply hints to the receiver where it should look in its own store.

We don't just want to arbitrarily divide the space; we want to make sure we send a request that's most likely to get the information we need. Similar to DBS, different heuristics can be used depending on whether the node has almost caught up or is just starting. For example, if a node just joined, it can use a modulo heuristic. In this case a field in BLOOM_AVAILABLE would also carry a modulo and an offset. As an example, if you expect there to be roughly 1000 messages, you might select modulo 10. This divides the space into 10 pieces, which can be requested in parallel, i.e. the 1st, 11th, 21st message, and so on. This approximates a linear download, without requiring any state. For more recent messages, a pivot heuristic is desirable, one that probabilistically biases towards picking more recent messages.
For further analysis of this, we refer to the Dispersy paper.

#### Sync scope and location of subset description
A brief experimental note on sync scope and Lamport clocks. In some cases we might want untrusted nodes to help sync, and such nodes can't use information inside the (encrypted) payload. One thing worth noticing here is where we want the Lamport clocks that provide the global partial time ordering to be stored. By default, they will be inside a data sync context payload. There may be circumstances where it is desirable to expand this. As a concrete example: imagine you have several one-on-one chats, each its own data sync context, since the chats are encrypted. If these devices are mostly offline, and you don't leverage remote replicated logs, it may be desirable to let other nodes help you sync these messages for you. This is similar to having an encrypted torrent file, but still having a way to refer to roughly ordered chunks, so other nodes can seed it. It may be desirable to have nested sync contexts, where participants can choose to let other nodes help them sync in order to reduce latency. By doing so, another level of Lamport clocks can be added in the header, and the sync scope ID can also be used to divide up the space. This design is experimental and belongs in future work.

Notice that this general approach might still have to be complemented with REQUESTs. If we want causal consistency, it's desirable to have a way to refer to specific message IDs to fetch those messages.

Also note that this would work without a specific structured overlay on top of the Internet, for example over a mesh network.

## Specification and wire protocol

What follows is the actual specification and wire protocol. Note that this is the Minimal Version of Data Sync (MVDS). Both of these are in draft mode and currently being tested. They don't currently implement any of the enhancements listed above.
### Caveats for the minimal version (Status specific)

This section is specific to Status engineering concerns, to outline some aspects of the current state of MVDS (mid-May 2019).

1. On mode of replication: This version uses passive replication, where nodes share data they are already interested in. In future versions, this may be extended to active replication; see enhancements 2 and 3.

2. On the role of mailservers: This does not replace mailservers outright; it amends them and makes them less critical. A simple litmus test is that reliability is not impacted by a mailserver outage. Specifically, it deals with correctness first, not latency and mostly-offline per se. It is compatible with mostly-offline enhancements (see requirement 3), and this will be a focus of future iterations, for example through the enhancements mentioned.

3. On parent IDs: This is a concern for the specific clients that use the data sync layer. Other approaches may also be used and are not actively discouraged. It is a desirable aspect though, as it allows us to leverage CAS and build up a DAG, as outlined above. As future iterations progress and example clients are implemented, this will likely become clearer.

4. Current implementation: Currently there's a reference implementation being integrated into `status-console-client` (Golang). The (developing) spec is the source of truth, though. In the future, it is expected that more clients will be integrated, for example in Javascript, Nim and others. For integration into the app, it is up to Core to leverage the code in `status-console-client`.

#### Mailservers and data sync upgrade path

In a sentence: data sync provides reliability/correctness, whereas mailservers are currently used to provide lower latency/more availability.

What's the contract for mailservers?
A node requests messages from a set of topics (roughly: group contexts). This can be augmented with a time interval hint. Asynchronously, the mailserver will send expired messages. This is a set of messages, some of which the node can read and some it can't. There may be messages missing from it. Without data sync, we currently assume and require mailservers to have high availability to pick up all the envelopes.

The upgrade path looks like this:
1. Ad hoc messaging + mailservers (need to be HA)
2. Data sync + mailservers (don't need to be HA)
3. Data sync + Swarm (e.g.)

That is, we can leave mailservers as they are. For dealing with missing data, data sync will ensure messages that are not ACKed are resent. Alternatively and additionally, the Dispersy enhancement and/or message dependencies can be used to ensure the receiver has received the relevant messages. During a Chaos Unicorn day event, lack of mailservers would merely lead to message delays, not lost messages.

### Types

#### Custom Types

| Type | Equivalent | Description |
|-------------|------------|-----------------------------------------------------------|
| `MessageID` | `bytes32` | 32 bytes of binary data obtained when hashing the message |

#### Message Types

We define `messages` as packets sent and received by MVDS nodes. They have been taken from the BSP spec.
##### `ACK`

Acknowledges receipt of a set of messages, identified by their message IDs.

```python
{
    'ids': ['MessageID']
}
```

##### `OFFER`

Offers a set of messages, identified by their message IDs, without sending the messages themselves.

```python
{
    'id': ['MessageID']
}
```

##### `REQUEST`

Requests a set of messages, identified by their message IDs, e.g. in response to an `OFFER`.

```python
{
    'id': ['MessageID']
}
```

##### `MESSAGE`

Carries an actual message: the group it belongs to, a timestamp, and an opaque body.

```python
{
    'group_id': 'bytes32',
    'timestamp': 'int64',
    'body': 'bytes'
}
```

### Protobuf

```
syntax = "proto3";

package mvds;

message Payload {
  Ack ack = 1;
  Offer offer = 2;
  Request request = 3;
  repeated Message messages = 4;
}

message Ack {
  repeated bytes id = 1;
}

message Message {
  bytes group_id = 1;
  int64 timestamp = 2;
  bytes body = 3;
}

message Offer {
  repeated bytes id = 1;
}

message Request {
  repeated bytes id = 1;
}
```

### Wire protocol

### Example Clients

There are many clients that might use data sync, and each client might have its own policy for sharing messages, etc. This is where the semantics of what is actually in a message live, which might differ in terms of guarantees and so on, depending on the domain.

#### Private chat

A simple example is a private chat between two humans talking over an encrypted channel. Additionally, each human might have multiple devices. Even though it might be just two individuals, we still have a group context.

#### Private group chat

This is similar to private chat, except you might have some semantics around who can join a chat. These can be specified in the form of a finite state machine, and message types such as INVITE can be protobuf messages.
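Such membership semantics can be sketched as a small finite state machine. The states and the JOIN/LEAVE message names here are illustrative (only INVITE is mentioned above), not part of the MVDS spec:

```python
# A minimal sketch of invite-only membership as a finite state machine.
# States and message names (INVITE, JOIN, LEAVE) are hypothetical.
TRANSITIONS = {
    ("none",    "INVITE"): "invited",
    ("invited", "JOIN"):   "member",
    ("member",  "LEAVE"):  "none",
}

def step(state: str, message_type: str) -> str:
    """Apply a membership message; unknown transitions are rejected."""
    next_state = TRANSITIONS.get((state, message_type))
    if next_state is None:
        raise ValueError(f"illegal transition: {message_type} in state {state}")
    return next_state
```

A client would run such a machine per participant on top of the data sync layer, rejecting messages from senders in the wrong state.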

#### Public (group) chat

This is a channel with a lot of participants, and it might not be end-to-end encrypted. Additionally, it might be coordination-less, such that you can take a given string, hash it to a 'topic', and then join the chat.

Membership here might be more probabilistic, and the sharing policy more restricted/random, due to bandwidth constraints.

#### State channels

Enables layer 2 scaling solutions, such as state channels. These might have more stringent requirements than normal 'human' chat.

#### Transport properties

Communicates which transports a specific node supports. Other clients can leverage this to know where to look for updates, etc.

#### Tribute to talk

Sets a specific limit for a stake that has to be paid before messages are delivered to an end user.

#### Multisig coordination

Ensures nodes get updates when a multisig signature is required: a private group chat with native support for multisig interactions.

#### And so on

As you can see, a lot of different types of clients are possible. What they have in common is that they leverage data sync to various extents and don't have to think about reliability. Instead, they can specify the semantics they care about. By specifying these in e.g. protobufs, with clear message types, roles, finite state machines and sharing policies, we enable the implementation of these clients in multiple end-user applications.

## Proof-Evaluation-Simulation

Various types of simulation are needed to verify the design, specifically with respect to parameters such as bandwidth and latency. Here we draft some example simulations.

### Simulation 1: 1-1 chat (basic)

Two nodes talking to each other, with 10% churn each. That is, each node has a 10% probability of being online at any given time. When online, a node stays online for X time.
For simplicity, let's assume X is 5 minutes. This can be a parameter and can be varied down to connection windows as short as 30s.

Each node sends 5 messages.

Answer the following questions:

1. What's the bandwidth overhead?
Expressed as a multiplier of the 10 messages. E.g. if one message is sent 3 times and gets 1 ACK, that's a 4x multiplier.

2. What's the latency?
Expressed as ticks or absolute time until a node has received all messages sent by the other node. Alternatively, expressed as a distribution of average or median latency, along with P90 latency, i.e. the latency of the 90th-percentile most delayed message.

### Simulation 2: 1-1 chat (basic, naive extension)

Answer the same questions as above, but with a mailserver that relays messages with 90% reliability.

### Simulation 3: Large public chat (basic)

100 people sending 5 messages each.

Answer the same questions as above.

### Simulation 4: Large public chat (basic, multicast)

Same as above, but leveraging the multicast extension.

### Simulation 5: Large public chat (basic, multicast, naive extension)

Multicast + naive mailserver.

### Simulation 6: Large public chat (basic, dispersy extension)

Dispersy random pairwise bloom filter.

### Simulation 7: Large public chat (basic, dispersy extension, naive extension)

Dispersy random pairwise bloom filter, plus mailserver.

### Notes

These simulations can probably be compacted differently, and it might make sense to vary or measure other variables.
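
As a starting point, the core of Simulation 1 can be sketched as a tick-based model. The resend-everything-until-ACKed policy and the per-tick online probability are assumptions for illustration, not a prescribed design:

```python
import random

def simulate(p_online=0.1, n_messages=5, max_ticks=10_000, seed=42):
    """Tick-based sketch of one direction of Simulation 1 (1-1 chat).

    Each tick, each node is independently online with probability
    p_online (a stand-in for 10% churn). The sender retransmits all
    unACKed messages whenever it is online; a message is delivered
    (and immediately ACKed) only in a tick where both nodes are online.
    """
    rng = random.Random(seed)
    unacked = set(range(n_messages))
    sends = 0
    delivered_at = {}
    for tick in range(max_ticks):
        sender_up = rng.random() < p_online
        receiver_up = rng.random() < p_online
        if sender_up and unacked:
            sends += len(unacked)  # naive resend-everything policy
            if receiver_up:
                for msg in unacked:
                    delivered_at[msg] = tick
                unacked.clear()
        if not unacked:
            break
    multiplier = sends / n_messages  # bandwidth overhead (question 1)
    # Ticks until all messages delivered (question 2); None if the run
    # ended before every message got through.
    latency = max(delivered_at.values()) if delivered_at else None
    return multiplier, latency
```

Varying `p_online`, the resend policy, and the ACK model would cover the other simulations in this section.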

## Future work

## Conclusion

## References

- [Bramble Synchronization Protocol](https://code.briarproject.org/briar/briar-spec/blob/master/protocols/BSP.md)
- [Minimal Viable Data Sync](https://github.com/status-im/mvds/)