├── README.md
└── ideas.md

/README.md:
--------------------------------------------------------------------------------
Footnote
=======================

Footnote is an experimental Handshake layer two protocol for storing,
synchronizing, and addressing small amounts of data. Its purpose is to create a
global, decentralized, and permissionless data commons that allows everyone to
store and retrieve data from any node on the Footnote network.

Abstract and Motivation
-----------------------

The Internet today favors a client-server model. Clients, who function like
customers, request or upload data to servers, which function like vast
warehouses of data under the control of the companies that own them. For most
applications, this works well. There is neither a need nor an ability for one's
browser to know everything about every other Facebook user, the same way there
is no need for people to stockpile every item at their local Costco.

The client-server model, however, has its tradeoffs. Take Twitter: whenever a
user sends a Tweet, that Tweet becomes the property of Twitter. Similarly, it
is not Twitter's users who decide what appears on their news feeds but rather
Twitter itself. Oftentimes this is done for scalability or content moderation
reasons. Twitter needs to serve Tweets to over three hundred million people, so
unfettered access to its firehose would be technically infeasible. Similarly,
while ads help Twitter pay for hosting, not all content is advertiser-friendly.
In sum, clients sacrifice the sovereignty of their data, as well as their
ability to query other people's data, in return for scale and convenience.

Footnote exists to serve data that may not fit the existing client-server mold.
Rather than building another warehouse, Footnote creates an open plaza that
anyone can observe or contribute to. Every node on the Footnote network serves
everyone else's data. There's no DHT querying, sharding, or partitioning: every
node is equal. This allows developers to build rich, interconnected
applications on top of Footnote that leverage a persistent, global view of all
content on the network. Imagine a version of the Internet built to be "locked
open": every server stores every website's content by default. While this
might sound redundant at first, consider the decentralization-preserving
protective effects (i.e. equal access) if the web we know today were built on
top of it. Users could query data locally by default, and choose which filters
and moderation to apply (or even supplement with centralized infrastructure).
This is what Footnote aims to enable.

Creating such a global view would be impossible without constraints: as the
network grows, resource and administration requirements increase, pricing out
smaller players. Footnote's constraints are (i) bounded storage available to
(ii) a bounded number of names. For the former, Footnote identifies users by
their Handshake name, allowing each user to store a maximum of 1 mebibyte on
the network. For the latter, Handshake itself imposes a maximum global renewal
limit through its on-chain transaction throughput: around 50-100 million names
can exist on Footnote. Taken together, a Footnote node's maximum storage
requirement is on the order of tens of terabytes. As the cost of storage trends
downwards, the resources required to run a Footnote node should trend
downwards as well, even as the network itself grows in usage. In sum, Footnote
leverages Handshake's Sybil-resistance properties and constraints to create
the first durably decentralized public data commons.
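As a rough sanity check of the storage bound described above (the figures are
the ones quoted in this README, not protocol constants), a back-of-envelope
calculation in Go:

```go
package main

import "fmt"

func main() {
	const bytesPerName = 1 << 20 // 1 MiB per Handshake name
	const names = 100_000_000    // upper end of the 50-100 million estimate
	tib := float64(bytesPerName) * names / (1 << 40)
	// ~95 TiB at 100MM names, ~48 TiB at 50MM: tens of terabytes either way.
	fmt.Printf("worst-case node storage: ~%.0f TiB\n", tib)
}
```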
Protocol Specification
-----------------------

At this time, there is no canonical specification. Early in the design process,
a loose protocol specification was defined through a series of Protocol
Improvement Proposals (PIPs), which can be found
[here](https://github.com/kyokan/footnote-PIPs). They are made available for
accuracy and to provide context. Please see the other document in this
repository, [ideas.md](https://github.com/kyokan/footnote/blob/master/ideas.md),
for examples of potential protocol improvements.


Links
-----

#### Protocol Implementations

| Language | Project | Notes |
| ----- | ---- | ----- |
| Go | [fnd](https://github.com/kyokan/fnd) | Reference Implementation, by [@mslipper](https://github.com/mslipper) |

#### Client Libraries

| Language | Project | Description |
| ----- | ---- | ----- |
| TypeScript | [fn-client](https://github.com/kyokan/fn-client) | RPC client for fnd, message format utilities, and [social](https://github.com/kyokan/footnote-PIPs/blob/master/pip-008.rst) records |
--------------------------------------------------------------------------------
/ideas.md:
--------------------------------------------------------------------------------
# Footnote Protocol Ideas

This document outlines several areas of potential improvement. It is not meant to be a recommendation or a roadmap. The community is free to elaborate on the changes proposed here or to explore new designs, and should also feel free to disregard any of these ideas or to explore/expand the [PIPs](https://github.com/kyokan/footnote-PIPs).

# Note on P2P
* It is suboptimal to use protocols such as libp2p (or BitTorrent), as they are focused on DHT/pubsub rather than a broadcast medium. libp2p depends on protobuf (which lacks canonicality for cryptographic proofs) and leans heavily on library quirks, making it exceptionally difficult to keep portable and provable in embedded devices or other languages (in the face of adversaries exploiting library consensus faults). These types of protocols are useful as a parallel layer, e.g. metadata can be stored as a pointer to larger content in DHT-based networks. Additionally, differences in protocol design are necessary for this specific use case, as network protocol design will be the biggest factor in performance. An example would be making the protocol ensure high throughput over a multicast layer 3 network with a high number of messages per second (a satellite broadcasting all traffic to a wide area).

# Subdomains
* A domain only maintains a 64KB record of up to 256 (uint8) subdomains.
* It contains a list of subdomains. Conceptually, the data structure can be: uint32 idCreatedAt, uint8 highestSectorPosition, [64]byte subdomainName, [32]byte pubkey (see the sketch after this list).
* The idCreatedAt is a unique id and timestamp combination. If one wants to create 256 records immediately, just submit 256 records spaced 1 second apart. The records *MUST* have incrementing idCreatedAt values (not out of order).
* There are a maximum of 256 sectors to allocate (64KB each). The highestSectorPosition is the number of sectors allocated plus the previous highest. E.g. if the first record is 3 and the second record is at 5, then the first record has 3 sectors of usable space and the second record has 2 sectors of usable space.
* A fixed allocation per slot (256 bytes) that sets the ID to its position would be more difficult when incremental updates are used. If data is pruned/compacted, then the IDs will implicitly change, making fixed IDs much more challenging. It may, however, still be easy to set a fixed size per slot for easier scanning.
* The wire protocol should reflect that it is updating subdomains instead of domains. When a domain gets updated, the subdomain entries get added as well.
* Subdomain records are treated as WRITE ONLY, and can only be reset with a new on-chain UPDATE message on Handshake (with a new TXT record pubkey) or RENEW messages.
* The highest subdomain id value is the one assigned to duplicate subdomain names. Duplicate IDs should not exist; if they do, both should be ignored.
* Subdomain pubkey changes are presumed to be a new record, so there will be a space cost for key rotation. The old record bytes are still reserved from future use and are effectively consumed until there is an on-chain update. There's no clear way to safely "delete"/"move" records, as an equivocation of multiple incremental updates (see below) with a fraud proof can be produced at any time, removing the ability for an overwrite to be correct. The code for this would be too annoying, so it's better to keep things simple. The result is that if a subdomain wants to use a new pubkey (e.g. the key was lost), it has to create a new record with the same name -- the old space used is still consumed.
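A minimal Go sketch of the conceptual subdomain entry described above (the type and field names are illustrative, not part of any specification):

```go
package sketch

// SubdomainEntry is one entry in a TLD's 64KB subdomain record, following the
// conceptual layout above: a unique id/timestamp, the cumulative sector
// position, the subdomain name, and its pubkey.
type SubdomainEntry struct {
	IDCreatedAt           uint32   // unique id + timestamp combination
	HighestSectorPosition uint8    // sectors allocated plus the previous highest
	SubdomainName         [64]byte // zero-padded subdomain name
	Pubkey                [32]byte // subdomain owner's public key
}

// SubdomainRecord is the write-only list of up to 256 entries kept by a domain.
type SubdomainRecord struct {
	Entries []SubdomainEntry // MUST have strictly incrementing IDCreatedAt values
}
```

At 101 bytes per entry, 256 entries fit comfortably inside the 64KB record; a fixed 256-byte slot per entry, as floated above, would fill it exactly (256 * 256 B = 64KB).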
# Signature Commitments
* The domain owner's (and subdomain owner's) signature should commit to the name. Otherwise, a 3rd party can create a Handshake TXT record which uses the same pubkey. E.g. imagine .alice wants to use her name; a 3rd-party troll who owns .badmeme could then point to the same key.

# UPDATE message fix
* Currently, UPDATE and UPDATEREQ may be suboptimal for bandwidth efficiency. It is better to relay only the current timestamp when one has a newer update available: send a notification of the available timestamp, and the peer can respond with an UPDATEREQ if desired.

# Incremental Updates
* The general idea here is to shift from random access/updates to an append-only log, which can be reset to zero with a new epoch. This may increase performance significantly, as the P2P messages will be much smaller, as well as easier to process (only the updates need to be read).
* All updates are in 256-byte increments. Updates which are not exactly divisible by 256 bytes will result in a little extra space being "wasted" (which can be compacted in the future, in the next epoch). See the sketch after this list.
* A Merkle proof is not needed for incremental updates: 64KB pages, 256-byte sectors. Each sector is hashed into 32 bytes and incrementally hashed; the final hash, once 64KB has been filled, becomes the page hash. Each page hash is incrementally hashed forward. This still allows for submitting the entire dataset as an 8192-byte series of hashes (256 hashes representing 64KB each), as currently designed.
* Entire pages can be downloaded similarly to the current implementation. However, incremental updates are possible within pages. Simply include the current tip hash, and the peer will respond with the remaining data. This is fully verifiable, as any equivocation should be recognized by the peer (and may be answered with a proof that the tip hash does not match the peer's record).
* New epochs mean that the entire data is reset back to nothing and the incremental log starts over. This is used when the blob fills up.
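A minimal sketch, in Go, of cutting an update into the 256-byte increments described above (the chunk size comes from this document; the helper itself is illustrative, not part of any implementation):

```go
package sketch

const chunkSize = 256 // updates are appended in 256-byte increments

// toChunks splits an update into 256-byte chunks for appending to the log,
// zero-padding the final chunk. The padding is the per-update space "wasted"
// until the next epoch compacts it.
func toChunks(data []byte) [][chunkSize]byte {
	n := (len(data) + chunkSize - 1) / chunkSize // number of chunks, rounded up
	chunks := make([][chunkSize]byte, n)
	for i := range chunks {
		copy(chunks[i][:], data[i*chunkSize:]) // copy stops at len(data); the rest stays zero
	}
	return chunks
}
```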
# Banning and recovery of invalid incremental updates
* A problem arises when a subdomain submits an invalid incremental update. One cannot do a temporary ban, as a malicious subdomain could backdate their timestamp and do many frequent invalid updates. One cannot ascertain how long to ban for, nor whether one is receiving an old replay from a peer.
* Therefore, any subdomain which submits an invalid incremental update should have that ID banned via a proof.
* The ban proof includes a domain record, the domain header for that record, subdomain headers, and proof of the invalid incremental update (two subdomain headers).
* Requests for that ID will be answered with the ban information.

# Header formats

Domain update header (see the sketch after the field list):
1. [32]byte domainHash
   * The hash is a hash of the top-level domain name.
   * *MUST NOT* be a hash of a TLD which contains periods.
2. [32]byte currentTxNormalizedHash
   * This value must equal the current Handshake record value of the most recent output of this name on Handshake.
   * If, after the update record is synced (hours later), this value is stale, then it is treated as stale/invalid and rejected/ignored. We want to make it so that a domain update doesn't overlap with the old one, as this is an append-only log. It is the node implementation's responsibility to expire the old name record when there is a RENEW/UPDATE/etc.
   * We do not use the blockheight since it cannot be presigned; the transaction's blockheight cannot be known ahead of time. If the name is owned by a multisig, then the multisig group should agree on and sign the domain record before they update the name record. Otherwise, if blockheight were used as the commitment and they end up disagreeing later (or go offline), they will not have any records, resulting in very unhappy subdomain users. Committing to the transaction (normalized, with the witnesses stripped) resolves this, as one knows the transaction id beforehand. This can be verified on a light client as well. Note that the update messages only need to provide the blockheight, but the SIGNATURE is signing the currentTxNormalizedHash.
   * Syncing is not immediate, as that would introduce consensus faults; it may look like it works when one tries to make record updates fast, but it's a trivial DoS vector. As it works now, the updates happen periodically, every n Handshake block heights.
   * If there is a RENEW/UPDATE, the TLD owner *SHOULD* send a new header/record within the update period to ensure that the data can continue to be available.
   * If the current state is TRANSFER/FINALIZE/REVOKE or any bidding state, then the record is presumed to be valid until expiration or a new RENEW/UPDATE transaction.
3. uint8 size
   * Size of the record in 256-byte chunks.
   * Each name record consumes 256 bytes exactly.
   * This *MUST* be above any prior size unless the hnsRecordBlockheight is incremented/updated.
   * Update records, therefore, should query (domain, hnsRecordBlockheight) and receive a size in return.
4. [32]byte hash
   * 256-byte chunk hash commitment; see below for the hashing format.
5. [64]byte signature
   * The signature commits to: domainHash, timestamp, hash.
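A minimal Go sketch of the domain update header laid out above (the struct and field names simply mirror the field list; they are not taken from any implementation):

```go
package sketch

// DomainUpdateHeader mirrors the domain update header field list above.
type DomainUpdateHeader struct {
	DomainHash              [32]byte // hash of the top-level domain name (no periods)
	CurrentTxNormalizedHash [32]byte // witness-stripped hash of the name's most recent on-chain output
	Size                    uint8    // record size in 256-byte chunks; may only grow until the next on-chain update
	Hash                    [32]byte // 256-byte chunk hash commitment (see the Sector Hash section)
	Signature               [64]byte // commits to domainHash, timestamp, and hash, as listed above
}
```

Serialized naively, these fields sum to 161 bytes per header.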
Domain Update Notification:
1. domainHash
2. uint32 currentTxBlockheight
3. size

If one receives a domain update notification and its (epoch, size):
* Is lower than what you have: respond with a domain update notification of your own.
* Is equal: do nothing.
* Is higher: queue a download request.

Domain request response wire message (different from the header):
1. domainHash
2. uint32 currentTxBlockheight: this differs from the header, as the currentTxHash can be derived from it (taking care to account for duplicates). What this provides is a clean notification/indicator of when the epoch resets.
3. size
4. hash
5. signature
6. list of sector hashes

Subdomain update header (see the sketch after the field list):
1. [32]byte domainHash
   * The hash is a hash of the top-level domain name.
   * See the domain header for more information.
   * Does not commit to the RENEW/UPDATE, since we want the record to persist and validate without a new record.
   * *MUST NOT* be a hash of a TLD which contains periods.
2. uint32 idCreatedAt
   * See the Subdomains section above. This is the unique subdomain ID.
   * If this is an unrecognized idCreatedAt, then the node does not process it.
3. uint32 epochTimestamp
   * This subdomain's epoch timestamp (when the blob was last reset).
   * Can be updated by the subdomain owner at will without contacting the domain owner.
   * Nodes can set policies on the minimum time before accepting a new epoch (e.g. only allow one new epoch every 2 weeks). They should maintain a local copy of when the epoch was last updated locally to keep track of when to accept; that local timestamp should be used, *NOT* 2 weeks after the epoch timestamp.
   * This *MUST* be above any prior timestamp.
   * This *SHOULD* be equal to or above the timestamp in the TLD's subdomain record.
   * This *MUST NOT* be more than 3 hours in the future.
4. uint16 size
   * Size of the record in 256-byte chunks.
   * Since this is an increasing record (and the timestamp is in messages), there is no need for a timestamp in the header.
   * This *MUST* be above any prior size unless the epochTimestamp is incremented.
   * Update records, therefore, should query (domain, idCreatedAt, epochTimestamp) and receive a size in return.
5. [32]byte hash
   * The sequentially hashed contents of all current blob data.
   * Commits to all 256-byte chunk hashes; see below for the hashing format.
6. [32]byte messageRoot
   * Merkle root of all messages, size 2^16. If there are over 65536 messages, then the subsequent messages are ignored.
   * This *MUST NOT* be enforced at the protocol level. If someone submits garbage or all zeroes, that is fine; their messages simply cannot be validated on light clients. The primary party hurt by this is the owner of the subdomain; propagation is only affected by the sectorHash.
   * Even though it is not validated, it must be sent over the wire to build the signature.
7. [64]byte signature
   * The signature commits to: domainHash, idCreatedAt, epochTimestamp, size, sectorHash, messageRoot (less than 128 bytes per chunk, so this is fine).
   * Both the name and ID are committed to since the domain owner could assign the same pubkey to a different name in the future.
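A minimal Go sketch of the subdomain update header above, plus the notification described just below (names follow the field lists and the widths mirror the listed types; the notification's field widths are an assumption, and none of this is taken from an implementation):

```go
package sketch

// SubdomainUpdateHeader mirrors the subdomain update header field list above.
type SubdomainUpdateHeader struct {
	DomainHash     [32]byte // hash of the parent TLD name
	IDCreatedAt    uint32   // unique subdomain ID from the TLD's subdomain record
	EpochTimestamp uint32   // when the blob was last reset; must only increase
	Size           uint16   // record size in 256-byte chunks; must only grow within an epoch
	Hash           [32]byte // sequential hash of all current blob data
	MessageRoot    [32]byte // Merkle root of up to 2^16 messages; relayed but not validated
	Signature      [64]byte // signs domainHash, idCreatedAt, epochTimestamp, size, sectorHash, messageRoot
}

// SubdomainUpdateNotification is gossiped in the same way as domain update
// notifications (see the list that follows).
type SubdomainUpdateNotification struct {
	DomainHash     [32]byte
	IDCreatedAt    uint32
	EpochTimestamp uint32
	Size           uint16
}
```

The signed fields (everything except the signature) sum to 106 bytes, which appears consistent with the "less than 128 bytes" note above.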
Subdomain update notifications occur in a manner similar to domain updates.

Subdomain Update Notification:
1. domainHash
2. idCreatedAt
3. epochTimestamp
4. size

# Fraud Proofs

A fraud proof contains:
* Domain/Subdomain Header 1
* Height and sector hash of the equivocating sector for Header 1
* Subsequent (64KB hashed) sector checkpoints to generate the proofs for Header 1
* Domain/Subdomain Header 2
* Height and sector hash of the equivocating sector for Header 2
* Subsequent (64KB hashed) sector checkpoints to generate the proofs for Header 2

When a fraud proof is validated, the record is permanently banned until:
* if it is a subdomain, a new valid domain record contains the same name;
* if it is a domain, a new committed currentTxNormalizedHash is used (Handshake mined a new transaction for that name).

This record will be relayed when others request this name.


# Sector Hash

Currently, the design uses a merkle tree of 64KB sectors. Instead, it may be easier to use a serial hash within each sector. The sectors themselves are then serially hashed.

Within each 64KB sector, every 256 bytes gets hashed into a 32-byte hash. Those 32-byte hashes are sequentially hashed with the previous hash. Messages below 256 bytes are padded with zeroes to be exactly 256 bytes.

For each sector, there are 256 32-byte hashes. The sectors are then serially hashed to produce a final hash.

There can be minimal changes over the wire, as every 64KB is hashed and the standard UpdateResponse message includes all sector hashes. By downloading 256 * 32B = 8192B, similar to the UpdateResponse message, one can do chunked updates. One simply takes the previous checkpoint hash and hashes the 64KB of data, which should produce the next checkpoint.

This structure is ideal for incremental updates.

#### Example pseudocode

This is an example of appending 256 bytes at a time (of course, it should be optimized to handle an arbitrary amount of data). A runnable Go rendering of the same routine is sketched after this section.

Input: hash of the existing blob, new data being added (256 bytes at a time)
Output: new final hash

```
addHash([32]byte prevSectorHash,
        [32]byte prevHash,
        [256]byte input,
        uint16 pos):
  [32]byte sectorHash = blake2(append(prevHash, input))
  if ((pos + 1) % 256 != 0)
    return sectorHash, nil
  else
    // The second part is what goes into the 8192-byte UpdateResponse.
    // The hashes can be recreated using a series of these.
    return blake2(append(prevSectorHash, sectorHash)), sectorHash
```

When a 64KB sector fills up, the first return value is stored as the tip hash of the blob (all items hashed), and the second return value gets checkpointed to disk. For a maximum size of 16MB, a series of the second return values can produce sufficient data to reproduce the current hash. E.g. if one only wants the 3rd sector, they can get the 64KB for the 3rd sector, take in the sectorHash for the 2nd sector, and running addHash sequentially should produce the hash for the 4th. By having the 8192B UpdateResponse, any arbitrary sector can be verified. The only added expense is storing these 8192 bytes, as well as an additional hashing step every 64KB.

This might need to be tweaked so that the last sector is hashed again as well, for smaller proof sizes, but that increases the lines of code and storage.
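A minimal runnable Go rendering of the pseudocode above. It uses BLAKE2b-256; the pseudocode only says blake2, so the exact variant is an assumption:

```go
package sketch

import "golang.org/x/crypto/blake2b"

// addHash appends one 256-byte chunk at position pos to the running hashes.
// It returns the updated running hash and, when the chunk completes a 64KB
// sector (256 chunks), the sector checkpoint that would go into the
// 8192-byte UpdateResponse (nil otherwise).
func addHash(prevSectorHash, prevHash [32]byte, input [256]byte, pos uint16) ([32]byte, *[32]byte) {
	sectorHash := blake2b.Sum256(append(prevHash[:], input[:]...))
	if (pos+1)%256 != 0 {
		return sectorHash, nil
	}
	// Fold the completed sector into the chain of sector hashes; the second
	// return value is the per-sector checkpoint used to recreate that chain.
	tip := blake2b.Sum256(append(prevSectorHash[:], sectorHash[:]...))
	return tip, &sectorHash
}
```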
# Multisig
* Multisig is desirable for domain ownership, as some may want a quorum of 30 validators to own a name and to update the domain records (the list of subdomain owners), or for the domain owners to also be a quorum of subdomain owners.
* Could be done with straight [ECDSA using GG20](https://eprint.iacr.org/2020/540.pdf). GG20 requires no changes in the protocol to make it multisig.
* Schnorr can be used with fewer cryptographic assumptions, with the penalty of larger proofs and a larger consensus change. The Schnorr pubkey script can be signed by including a subkey as the message, the signature from the previous bullet point's key, and the subkey's signature of the message itself. This enables attributable Schnorr m-of-n in around 160 bytes. It could also include a time range for when a subkey is valid.
* Alternatively, one could consider doing this using plain-jane ECDSA at the application level, e.g. modifying the TXT records on HNS to include the validator set and checking whether X of Y normal signatures validate (see the sketch after this list). This avoids compatibility problems introduced by using newer crypto, since it may be hard to find and integrate well-vetted crypto libraries for Schnorr or GG20 in the common languages Footnote developers may want to use.
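A minimal sketch of the application-level X-of-Y check from the last bullet. It uses ed25519 purely for illustration; the actual curve and signature scheme, and how the validator set would be encoded in the TXT record, are not specified here:

```go
package sketch

import "crypto/ed25519"

// verifyThreshold reports whether at least threshold distinct validators from
// the validator set (e.g. published via the name's TXT records) have signed msg.
// sigs maps a validator's index in the set to that validator's signature, so
// each validator is counted at most once.
func verifyThreshold(validators []ed25519.PublicKey, sigs map[int][]byte, msg []byte, threshold int) bool {
	valid := 0
	for i, sig := range sigs {
		if i < 0 || i >= len(validators) {
			continue // unknown validator index
		}
		if ed25519.Verify(validators[i], msg, sig) {
			valid++
		}
	}
	return valid >= threshold
}
```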
# Light Client Validation
* The highest right branch committed to could be a commitment to all current messages (each leaf is a message hash).
* There is NO guarantee of the validity of this message. If a user decides to commit to something crazy, they can. They're the ones primarily harmed by this; if they find some weird use case for it, who cares. It is assumed that anyone wishing to provide light client proofs must have all the data needed to provide them.
* For subdomains, the TLD owner should also commit to a merkle tree of all subdomains.
* Full light client proofs provide the TLD header, the subdomain message (in the domain's 64KB blob), a merkle proof to the subdomain ownership record, the subdomain header, a merkle proof to the message, and the message itself. Much of this data, of course, can be cached client-side, as multiple messages from the same subdomain would reuse a lot of this data.

# Notes
* 256 subdomains per domain * ~50MM domains = 12,800,000,000 (12.8 billion) total records in the worst case. Each domain can be stored in memory as a 1KB blob of 256 8-byte timestamps. That leaves an index of 50MM records, which should be VERY fast. The total memory cache for P2P performance is around ~100GB for high-performance storage in RAM/SSD. Timestamps could be cached in high-performance implementations, since it's assumed the primary message will be notifications from peers of which names have updated and their timestamps, with a LOT of duplicate messages from many peers.
* There is probably an additional wire message optimization possible for subdomain updates for data retained between epochs, keyed by message timestamp (message timestamps MUST be unique for this to work). A node can always do a full redownload in the event this fails.
* If the UPDATE/RENEW behavior is as described above, someone selling a TLD can first update the record with an UPDATE message for the new owner, before doing a TRANSFER.
* Subdomains on the same chain are silly and poorly thought out, as they are no different from a TLD (the dot is there to define a different scaling/trust-model/hierarchy). This protocol is about resource constraints, therefore having defined limits is necessary. The limits on the number of TLDs exist in HNS likely due to the cryptoeconomic constraints of maximizing the value of names, which maximizes long-term security and value (RENEW transactions function as a lower bound on the lowest-value name when the mempool is always full, as there are a limited number of RENEWs per block).
--------------------------------------------------------------------------------