├── .gitmodules ├── LICENSE ├── README.md ├── TODO ├── c ├── .gitignore ├── Makefile ├── README.md ├── example │ ├── Makefile │ └── test.c ├── negentropy_wrapper.cpp └── negentropy_wrapper.h ├── cpp ├── .gitignore ├── README.md ├── negentropy.h └── negentropy │ ├── encoding.h │ ├── storage │ ├── BTreeLMDB.h │ ├── BTreeMem.h │ ├── SubRange.h │ ├── Vector.h │ ├── base.h │ └── btree │ │ ├── core.h │ │ └── debug.h │ └── types.h ├── docs ├── fq.png ├── logo.svg └── negentropy-protocol-v1.md ├── js ├── Negentropy.js └── README.md └── test ├── .gitignore ├── Utils.pm ├── cpp ├── .gitignore ├── Makefile ├── btreeFuzz.cpp ├── check.sh ├── harness.cpp ├── lmdbTest.cpp ├── measureSpaceUsage.cpp ├── measureSpaceUsage.pl └── subRange.cpp ├── csharp ├── .gitignore ├── Harness.csproj ├── Program.cs └── README.md ├── fuzz.pl ├── go-nostr ├── go.mod ├── go.sum └── main.go ├── go └── harness.go ├── js └── harness.js ├── protoversion.pl └── test.pl /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "test/cpp/hoytech-cpp"] 2 | path = test/cpp/hoytech-cpp 3 | url = https://github.com/hoytech/hoytech-cpp.git 4 | [submodule "test/cpp/lmdbxx"] 5 | path = cpp/vendor/lmdbxx 6 | url = https://github.com/hoytech/lmdbxx.git 7 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright © 2023 Doug Hoyte 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 4 | 5 | The above copyright notice and this permission notice shall be 
included in all copies or substantial portions of the Software. 6 | 7 | THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 8 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![negentropy logo](docs/logo.svg) 2 | 3 | This repo contains the protocol specification, reference implementations, and tests for the negentropy set-reconciliation protocol. See [our article](https://logperiodic.com/rbsr.html) for a detailed description. For the low-level wire protocol, see the [Negentropy Protocol V1](docs/negentropy-protocol-v1.md) specification. 4 | 5 | 6 | 7 | 8 | 9 | * [Introduction](#introduction) 10 | * [Protocol](#protocol) 11 | * [Data Requirements](#data-requirements) 12 | * [Setup](#setup) 13 | * [Bounds](#bounds) 14 | * [Alternating Messages](#alternating-messages) 15 | * [Algorithm](#algorithm) 16 | * [Fingerprints](#fingerprints) 17 | * [Frame Size Limits](#frame-size-limits) 18 | * [Implementations](#implementations) 19 | * [Applications](#applications) 20 | * [Misc](#misc) 21 | * [Protocol Debugging with fq](#protocol-debugging-with-fq) 22 | * [Testing](#testing) 23 | * [Author](#author) 24 | 25 | 26 | 27 | 28 | ## Introduction 29 | 30 | Set-reconciliation supports the replication or syncing of data-sets, either because they were created independently, or because they have drifted out of sync due to downtime, network partitions, misconfigurations, etc. 
In the latter case, detecting and fixing these inconsistencies is sometimes called [anti-entropy repair](https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsRepairNodesManualRepair.html). 31 | 32 | Suppose two participants on a network each have a set of records that they have collected independently. Set-reconciliation efficiently determines which records one side has that the other side doesn't, and vice versa. After the records that are missing have been determined, this information can be used to transfer the missing data items. The actual transfer is external to the negentropy protocol. 33 | 34 | Negentropy is based on Aljoscha Meyer's work on "Range-Based Set Reconciliation" ([overview](https://github.com/AljoschaMeyer/set-reconciliation) / [paper](https://arxiv.org/abs/2212.13567) / [master's thesis](https://github.com/AljoschaMeyer/master_thesis/blob/main/main.pdf)). 35 | 36 | This page is a technical description of the negentropy wire protocol and the various implementations. Read [our article](https://logperiodic.com/rbsr.html) for a comprehensive introduction to range-based set reconciliation, and the [Negentropy Protocol V1](docs/negentropy-protocol-v1.md) specification for the low-level wire protocol. 37 | 38 | 39 | ## Protocol 40 | 41 | ### Data Requirements 42 | 43 | In order to use negentropy, you need to define some mappings from your data records: 44 | 45 | * `record -> ID` 46 | * Typically a cryptographic hash of the entire record 47 | * The ID must be 32 bytes in length 48 | * Different records should not have the same ID (satisfied by using a cryptographic hash) 49 | * Equivalent records should not have different IDs (records should be canonicalised prior to hashing, if necessary) 50 | * `record -> timestamp` 51 | * Although a timestamp is the most obvious choice, any ordering criterion can be used. 
The protocol will be most efficient if records with similar timestamps are often downloaded/stored/generated together 52 | * Units can be anything (seconds, microseconds, etc) as long as they fit in a 64-bit unsigned integer 53 | * The largest 64-bit unsigned integer should be reserved as a special "infinity" value 54 | * Timestamps do **not** need to be unique (different records can have the same timestamp). If necessary, `0` can be used as the timestamp for every record 55 | 56 | Negentropy does not support the concept of updating or changing a record while preserving its ID. This should instead be modelled as deleting the old record and inserting a new one. 57 | 58 | ### Setup 59 | 60 | The two parties engaged in the protocol are called the client and the server. The client is sometimes also called the *initiator*, because it creates and sends the first message in the protocol. 61 | 62 | Each party should begin by sorting their records in ascending order by timestamp. If the timestamps are equivalent, records should be sorted lexically by their IDs. This sorted array and contiguous slices of it are called *ranges*. 63 | 64 | For the purpose of this specification, we will assume that records are always stored in arrays. However, implementations may provide more advanced storage data-structures such as trees. 65 | 66 | ### Bounds 67 | 68 | Because each side potentially has a different set of records, ranges cannot be referred to by their indices in one side's sorted array. Instead, they are specified by lower and upper *bounds*. A bound is a timestamp and a variable-length ID prefix. In order to reduce the sizes of reconciliation messages, ID prefixes are as short as possible while still being able to separate records from their predecessors in the sorted array. If two adjacent records have different timestamps, then the prefix for a bound between them is empty. 
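To illustrate, here is a sketch of how such a minimal bound can be computed between two adjacent records in the sorted array. This is illustrative JavaScript, not the reference implementation's API; records are assumed to be `{ timestamp, id }` objects where `id` is a lowercase hex string:

```javascript
// Compute the smallest bound that sorts strictly after `prev` and not after
// `curr`, assuming records are sorted by (timestamp, id) and IDs are unique.
function minimalBound(prev, curr) {
    if (prev.timestamp !== curr.timestamp) {
        // Differing timestamps already separate the records,
        // so an empty ID prefix suffices.
        return { timestamp: curr.timestamp, idPrefix: '' };
    }

    // Same timestamp: keep just enough leading bytes of curr.id
    // (the shared prefix plus the first differing byte) to make
    // the bound sort strictly after prev.
    let shared = 0; // number of identical leading bytes
    while (prev.id.slice(2 * shared, 2 * shared + 2) ===
           curr.id.slice(2 * shared, 2 * shared + 2)) {
        shared++;
    }
    return { timestamp: curr.timestamp, idPrefix: curr.id.slice(0, 2 * (shared + 1)) };
}
```

When timestamps collide, the prefix includes the first byte that differs between the two IDs, which is exactly enough to place the bound between them.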
69 | 70 | Lower bounds are *inclusive* and upper bounds are *exclusive*, as is [typical in computer science](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html). This means that given two adjacent ranges, the upper bound of the first is equal to the lower bound of the second. In order for a range to have full coverage over the universe of possible timestamps/IDs, the lower bound would have a 0 timestamp and all-0s ID, and the upper bound would be the specially reserved "infinity" timestamp (max u64), and the ID doesn't matter. 71 | 72 | ### Alternating Messages 73 | 74 | After both sides have set up their sorted arrays, the client creates an initial message and sends it to the server. The server will then reply with another message, and the two parties continue exchanging messages until the protocol terminates (see below). After the protocol terminates, the *client* will have determined what IDs it has (and the server needs) and which it needs (and the server has). If desired, it can then respectively upload and/or download the missing records. 75 | 76 | Each message consists of a protocol version byte followed by an ordered sequence of ranges. Each range contains an upper bound, a mode, and a payload. The range's implied lower bound is the same as the previous range's upper bound (or 0, if it is the first range). The mode indicates what type of processing is needed for this range, and therefore how the payload should be parsed. 77 | 78 | The modes supported are: 79 | 80 | * `Skip`: No further processing is needed for this range. Payload is empty. 81 | * `Fingerprint`: Payload contains a [digest](#fingerprints) of all the IDs within this range. 82 | * `IdList`: Payload contains a complete list of IDs for this range. 83 | 84 | If a message does not end in a range with an "infinity" upper bound, an implicit range with upper bound of "infinity" and mode `Skip` is appended. 
This means that an empty message indicates that all ranges have been processed and the sender believes the protocol can now terminate. 85 | 86 | ### Algorithm 87 | 88 | Upon receiving a message, the recipient should loop over the message's ranges in order, while concurrently constructing a new message. `Skip` ranges are answered with `Skip` ranges, and adjacent `Skip` ranges should be coalesced into a single `Skip` range. 89 | 90 | `IdList` ranges represent a complete list of IDs held by the sender. Because the receiver obviously knows the items it has, this information is enough to fully reconcile the range. Therefore, when the client receives an `IdList` range, it should reply with a `Skip` range. However, since the goal of the protocol is to ensure the *client* has this information, when a server receives an `IdList` range it should reply with its own ranges (typically `IdList` and/or `Skip` ranges). 91 | 92 | `Fingerprint` ranges contain a digest which can be used to determine whether or not the set of data items within the range is equal on both sides. However, if they differ, determining the actual differences requires further recursive processing. 93 | * Since `IdList` or `Skip` messages will always cause the client to terminate processing for the given ranges, these messages are considered *base cases*. 94 | * When the fingerprints on each side differ, the receiver should *split* its own range and send the results back in the next message. When splitting, the number of records within each sub-range should be considered. If small, an `IdList` range should be sent. If large, the sub-ranges should themselves be sent as `Fingerprint`s (this is the recursion). 95 | * When a range is split, the sub-ranges should completely cover the original range's lower and upper bounds. 96 | * Unlike in Meyer's designs, "empty" fingerprints are never used to indicate the absence of items within a range. Instead, an `IdList` of length 0 is sent because it is smaller. 
97 | * How to split the range is implementation-defined. The simplest way is to divide the records that fall within the range into N equal-sized buckets, and emit a `Fingerprint` sub-range for each of these buckets. However, an implementation could choose different grouping criteria. For example, events with similar timestamps could be grouped into a single bucket. If the implementation believes recent events are less likely to be reconciled, it could make the most recent bucket an `IdList` instead of a `Fingerprint`. 98 | * Note that if alternate grouping strategies are used, an implementation should never reply to a range with a single `Fingerprint` range, otherwise the protocol may never terminate (if the other side does the same). 99 | 100 | The initial message should cover the full universe, and therefore must have at least one range. The last range's upper bound should have the infinity timestamp (and the `id` doesn't matter, so should be empty also). How many ranges are used in the initial message depends on the implementation. The most obvious implementation is to use the same logic as described above, either using the base case or splitting, depending on set size. However, an implementation may choose to use fewer or more buckets in its initial message, and/or may use different grouping strategies. 101 | 102 | Once the client has looped over all ranges in a server's message and its constructed response message is a full-universe `Skip` range (ie, the empty string `""`), then it needs no more information from the server and therefore it should terminate the protocol. 103 | 104 | ### Fingerprints 105 | 106 | Fingerprints are short digests (hashes) of the IDs contained within a range. A cryptographic hash function could simply be applied over the concatenation of all the IDs; however, this would mean that generating fingerprints of sub-ranges would require re-hashing a potentially large number of IDs. 
Furthermore, adding a new record would invalidate a cached fingerprint, and require re-hashing the full list of IDs. 107 | 108 | To improve efficiency, negentropy fingerprints are specified as an incremental hash. There are [several considerations](https://logperiodic.com/rbsr.html#fingerprints) to take into account, but we believe the [algorithm used by negentropy](https://logperiodic.com/rbsr.html#fingerprint-function) represents a reasonable compromise between security and efficiency. 109 | 110 | ### Frame Size Limits 111 | 112 | If there are too many differences and/or they are too evenly distributed throughout the range, then message sizes may become unmanageably large. This might be undesirable if the network transport has message size limitations, meaning you would have to implement some kind of fragmentation system. Furthermore, large batch sizes inhibit work pipelining, where the synchronised records can be processed in parallel with additional reconciliation. 113 | 114 | Because of this, negentropy implementations may support a *frame size limit* parameter. If configured, all messages created by this instance will be of length equal to or smaller than this number of bytes. After processing each message, any discovered differences will be included in the `have`/`need` arrays on the client. 115 | 116 | To implement this, instead of sending all the ranges it has found that need syncing, the instance will send a smaller number of them to stay under the size limit. Any remaining ranges in the incoming message are replied to with a single coalesced `Fingerprint` range so that they will be processed in subsequent message rounds. Frame size limits can increase the number of messaging round-trips and bandwidth consumed. 117 | 118 | In some circumstances, already reconciled ranges can be coalesced into the final `Fingerprint` range. This means that these ranges will get re-processed in subsequent reconciliation rounds. 
As a result, if either of the two sync parties use frame size limits, then discovered differences may be added to the `have`/`need` multiple times. Applications that cannot handle duplicates should track the reported items to avoid processing items multiple times. 119 | 120 | 121 | ## Implementations 122 | 123 | This section lists all the currently-known negentropy implementations. If you know of a new one, please let us know by [opening an issue](https://github.com/hoytech/negentropy/issues/new). 124 | 125 | 126 | | **Language** | **Author** | **Status** | **Storage** | 127 | | ---- | ---- | ---- | ---- | 128 | | [C++](cpp/README.md) | reference | Stable | Vector, BTreeMem, BTreeLMDB, SubRange | 129 | | [Javascript](js/README.md) | reference | Stable | Vector | 130 | | [Rust](https://github.com/yukibtc/rust-negentropy) | Yuki Kishimoto | Stable | Vector | 131 | | [Go](https://github.com/illuzen/go-negentropy) | Illuzen | Stable | Vector | 132 | | [C bindings](c/README.md) | DarshanBPatel | Experimental | Same as C++ | 133 | | [Go](https://github.com/nbd-wtf/go-nostr/nip77/negentropy) | fiatjaf | Stable, Nostr-specific | Vector | 134 | | [C#](https://github.com/bezysoftware/negentropy.net) | bezysoftware | Stable | Vector | 135 | | [Kotlin](https://github.com/vitorpamplona/negentropy-kmp) | Vitor Pamplona | Stable | Vector | 136 | 137 | 138 | ## Applications 139 | 140 | This section lists the currently-known applications of negentropy. If you know of a new one, please let us know by [opening an issue](https://github.com/hoytech/negentropy/issues/new). 
141 | 142 | * [Bandwidth-efficient Nostr event syncing](https://github.com/hoytech/strfry/blob/next/docs/negentropy.md) 143 | * [Waku Sync](https://github.com/waku-org/research/issues/80) 144 | 145 | 146 | ## Misc 147 | 148 | ### Protocol Debugging with fq 149 | 150 | fiatjaf added support to [fq](https://github.com/wader/fq) to inspect and debug negentropy messages (see [example usage](https://github.com/wader/fq/blob/master/doc/formats.md#negentropy)): 151 | 152 | ![example fq output](docs/fq.png) 153 | 154 | 155 | 156 | ## Testing 157 | 158 | There is a conformance test-suite available in the `test` directory. 159 | 160 | In order to test a new language you should create a "harness", which is a basic stdio line-based adapter for your implementation. See the [test/cpp/harness.cpp](test/cpp/harness.cpp) and [test/js/harness.js](test/js/harness.js) files for examples. Next, edit the file `test/Utils.pm` and configure how your harness should be invoked. 161 | 162 | Harnesses may require some setup before they are usable. For example, to use the C++ harness you must first run: 163 | 164 | git submodule update --init 165 | cd test/cpp/ 166 | make 167 | 168 | In order to run the test-suite, you'll need the Perl module [Session::Token](https://metacpan.org/pod/Session::Token) (`libsession-token-perl` Debian/Ubuntu package). 169 | 170 | Once set up, you should be able to run something like `perl test.pl cpp,js` from the `test/` directory. This will perform the following: 171 | 172 | * For each combination of languages, run the following fuzz tests: 173 | * Client has all records 174 | * Server has all records 175 | * Both have all records 176 | * Client is missing some and server is missing some 177 | 178 | The test is repeated using each language as both the client and the server. 179 | 180 | Afterwards, a different fuzz test is run for each language in isolation, and the exact protocol output is stored for each language. 
These are compared to ensure they are byte-wise identical. 181 | 182 | Finally, a protocol upgrade test is run for each language to ensure that when run as a server it correctly indicates to the client when it cannot handle a specific protocol version. 183 | 184 | * For the Rust implementation, check out its repo in the same directory as the `negentropy` repo, build the `harness` commands for both C++ and Rust, and then, inside the `negentropy/test/` directory, run `perl test.pl cpp,rust` 185 | 186 | * For the Go implementation, check out the repo in the same directory as the `negentropy` repo, then, inside the `negentropy/test/` directory, run `perl test.pl cpp,go` 187 | 188 | * For the Kotlin implementation, check out the repo in the same directory as the `negentropy` repo, then, inside the `negentropy/test/` directory, run `perl test.pl cpp,kotlin` 189 | 190 | ## Author 191 | 192 | (C) 2023-2024 Doug Hoyte and contributors 193 | 194 | Protocol specification, reference implementations, and tests are MIT licensed. 195 | 196 | See [our introductory article](https://logperiodic.com/rbsr.html) or the [low-level protocol spec](docs/negentropy-protocol-v1.md) for more information. 197 | 198 | Negentropy is a [Log Periodic](https://logperiodic.com) project. 199 | -------------------------------------------------------------------------------- /TODO: -------------------------------------------------------------------------------- 1 | get rid of Session::Token dependency for tests 2 | 3 | btree 4 | optimise splitting: should be able to re-use previous range's summary to avoid one traversal 5 | binary search within a node 6 | 7 | range randomisation 8 | 9 | release js package to npm 10 | 11 | ? coalesce Idlist ranges in buildOutput 12 | - at least empty ones! for example when P1=1 P2=0 P3=0 13 | 14 | ? is 16 * 2 optimal in splitRange 15 | 16 | ? 
figure out how to merge deno branch 17 | -------------------------------------------------------------------------------- /c/.gitignore: -------------------------------------------------------------------------------- 1 | # Prerequisites 2 | *.d 3 | 4 | # Object files 5 | *.o 6 | *.ko 7 | *.obj 8 | *.elf 9 | 10 | # Linker output 11 | *.ilk 12 | *.map 13 | *.exp 14 | 15 | # Precompiled Headers 16 | *.gch 17 | *.pch 18 | 19 | # Libraries 20 | *.lib 21 | *.a 22 | *.la 23 | *.lo 24 | 25 | # Shared objects (inc. Windows DLLs) 26 | *.dll 27 | *.so 28 | *.so.* 29 | *.dylib 30 | 31 | # Executables 32 | *.exe 33 | *.out 34 | *.app 35 | *.i*86 36 | *.x86_64 37 | *.hex 38 | 39 | # Debug files 40 | *.dSYM/ 41 | *.su 42 | *.idb 43 | *.pdb 44 | 45 | # Kernel Module Compile Results 46 | *.mod* 47 | *.cmd 48 | .tmp_versions/ 49 | modules.order 50 | Module.symvers 51 | Mkfile.old 52 | dkms.conf 53 | 54 | # ignore vscode files 55 | .vscode/ 56 | 57 | 58 | -------------------------------------------------------------------------------- /c/Makefile: -------------------------------------------------------------------------------- 1 | # Build a shared library of negentropy 2 | 3 | # Define the root directory of the negentropy project; this absolute path mechanism works across all major os 4 | NEGENTROPY_ROOT := $(dir $(abspath $(lastword $(MAKEFILE_LIST)))) 5 | NEGENTROPY_CPP_ROOT := ../cpp/ 6 | INCS = -I$(NEGENTROPY_CPP_ROOT) -I/opt/homebrew/include/ -I$(NEGENTROPY_ROOT)/vendor/lmdbxx/include/ 7 | 8 | ifeq ($(OS),Windows_NT) 9 | TARGET = libnegentropy.dll 10 | else 11 | TARGET = libnegentropy.so 12 | endif 13 | 14 | .PHONY: all clean install-deps precompiled-header shared-lib 15 | 16 | all: precompiled-header shared-lib 17 | 18 | #TODO: Need to add compilation flags based on OS 19 | install-deps: 20 | brew install lmdb openssl 21 | 22 | # Generate 'negentropy.h.gch' 23 | precompiled-header: 24 | g++ -O3 --std=c++20 -Wall -fexceptions -g $(NEGENTROPY_CPP_ROOT)negentropy.h $(INCS) 25 | 
26 | shared-lib: 27 | g++ -O3 -g -std=c++20 $(INCS) -shared -fPIC -o $(TARGET) $(NEGENTROPY_ROOT)negentropy_wrapper.cpp -lcrypto -lssl -L/opt/homebrew/lib/ 28 | 29 | clean: 30 | rm -f $(TARGET) $(NEGENTROPY_CPP_ROOT)/negentropy.h.gch libnegentropy.so 31 | -------------------------------------------------------------------------------- /c/README.md: -------------------------------------------------------------------------------- 1 | # C bindings 2 | 3 | This directory contains a wrapper around the C++ library to simplify integration with languages such as Nim. It builds a shared library, `libnegentropy.so`. 4 | 5 | ## Authors 6 | 7 | Contributed by @darshankabariya and @Ivansete-status and @vpavlin 8 | -------------------------------------------------------------------------------- /c/example/Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: all 2 | 3 | all: 4 | g++ -g -O0 --std=c++20 test.c -I../ -lnegentropy -lcrypto -L../ -L/opt/homebrew/lib/ -Wall 5 | -------------------------------------------------------------------------------- /c/example/test.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <stdlib.h> 3 | #include <string.h> 4 | #include <stdint.h> 5 | #include <stdbool.h> 6 | #include <time.h> 7 | #include <errno.h> 8 | #include <unistd.h> 9 | #include "../negentropy_wrapper.h" 10 | 11 | #define MAX_FRAME_SIZE 153600 12 | 13 | void printHexBuffer(buffer buf){ 14 | for (uint64_t i = 0; i < buf.len; ++i) { 15 | printf("%02hhx", buf.data[i]); 16 | } 17 | printf("\n"); 18 | } 19 | 20 | void rec_callback(buffer* have_ids, uint64_t have_ids_len, buffer* need_ids, uint64_t need_ids_len, buffer* output){ 21 | printf("needIds count:%llu , haveIds count: %llu \n",need_ids_len, have_ids_len); 22 | 23 | for (int i=0; i < need_ids_len ; i++) { 24 | printf("need ID at %d :", i); 25 | printHexBuffer(need_ids[i]); 26 | } 27 | 28 | for (int j=0; j < have_ids_len ; j++) { 29 | printf("have ID at %d :", j); 30 | 
printHexBuffer(have_ids[j]); 31 | } 32 | } 33 | 34 | 35 | int main(){ 36 | void* st1 = storage_new("",""); 37 | if(st1 == NULL){ 38 | perror("failed to create storage"); 39 | } 40 | 41 | 42 | void* st2 = storage_new("",""); 43 | if(st2 == NULL){ 44 | perror("failed to create storage"); 45 | } 46 | 47 | 48 | unsigned char m1[] = {0x6a, 0xdf, 0xaa, 0xe0, 0x31, 0xeb, 0x61, 0xa8, \ 49 | 0x3c, 0xff, 0x9c, 0xfd, 0xd2, 0xae, 0xf6, 0xed, \ 50 | 0x63, 0xda, 0xcf, 0xaa, 0x96, 0xd0, 0x51, 0x26, \ 51 | 0x7e, 0xf1, 0x0c, 0x8b, 0x61, 0xae, 0x35, 0xe9};//"6adfaae031eb61a83cff9cfdd2aef6ed63dacfaa96d051267ef10c8b61ae35e9"; 52 | buffer b1 ; 53 | b1.len = 32; 54 | b1.data = m1; 55 | 56 | unsigned char m2[] = {0x28 ,0x79 ,0x8d ,0x29 ,0x5c ,0x30 ,0xc7 ,0xe6 \ 57 | ,0xd9 ,0xa4 ,0xa9 ,0x6c ,0xdd ,0xa7 ,0xe0 ,0x20 \ 58 | ,0xf7 ,0xaa ,0x71 ,0x68 ,0xcc ,0xe0 ,0x63 ,0x30 \ 59 | ,0x2e ,0xd1 ,0x9b ,0x85 ,0x63 ,0x32 ,0x95 ,0x9e}; //28798d295c30c7e6d9a4a96cdda7e020f7aa7168cce063302ed19b856332959e 60 | buffer b2 ; 61 | b2.len = 32; 62 | b2.data = m2; 63 | 64 | bool ret = storage_insert(st1,time(NULL),&b1); 65 | if (ret){ 66 | printf("inserted hash successfully in st1\n"); 67 | } 68 | 69 | ret = storage_insert(st2,time(NULL),&b2); 70 | if (ret){ 71 | printf("inserted hash successfully in st2\n"); 72 | } 73 | 74 | ret = storage_insert(st2,time(NULL),&b1); 75 | if (ret){ 76 | printf("inserted hash successfully in st2\n"); 77 | } 78 | 79 | buffer b4 ; 80 | b4.len = 0; 81 | b4.data = (unsigned char*)malloc(37*sizeof(unsigned char)); 82 | 83 | printf("storage size of st2 is %d \n",storage_size(st2)); 84 | 85 | void* subrange = subrange_new(st2, 0 , UINT64_MAX); 86 | if (subrange == NULL){ 87 | perror("failed to init subrange"); 88 | return -1; 89 | } 90 | printf("subrange init successful with size %d \n ", subrange_size(subrange) ); 91 | 92 | 93 | void* subrange1 = subrange_new(st1, 0 , UINT64_MAX); 94 | if (subrange1 == NULL){ 95 | perror("failed to init subrange"); 96 | return -1; 97 | } 98 | 
printf("subrange init successful with size %d \n ", subrange_size(subrange1) ); 99 | 100 | void* ngn_inst1 = negentropy_new(subrange1, MAX_FRAME_SIZE); 101 | if(ngn_inst1 == NULL){ 102 | perror("failed to create negentropy instance"); 103 | return -1; 104 | } 105 | 106 | void* ngn_inst2 = negentropy_new(subrange, MAX_FRAME_SIZE); 107 | if(ngn_inst2 == NULL){ 108 | perror("failed to create negentropy instance"); 109 | return -1; 110 | } 111 | 112 | 113 | result res; 114 | int ret1 = negentropy_subrange_initiate(ngn_inst1, &res); 115 | if(ret1 < 0){ 116 | perror("failed to initiate negentropy instance"); 117 | return -1; 118 | } 119 | printf("initiated negentropy successfully with output of len %llu \n", res.output.len); 120 | b4.len = res.output.len; 121 | memcpy(b4.data, res.output.data, res.output.len); 122 | free_result(&res); 123 | 124 | buffer b3 ; 125 | b3.len = 0; 126 | b3.data = (unsigned char*)malloc(69*sizeof(unsigned char)); 127 | 128 | ret1 = reconcile_subrange(ngn_inst2, &b4, &res); 129 | if(ret1 < 0){ 130 | perror("error from reconcile"); 131 | } 132 | if (res.output.len == 0){ 133 | perror("nothing to reconcile"); 134 | } 135 | printf("reconcile returned with output of len %llu \n", res.output.len); 136 | b3.len = res.output.len; 137 | memcpy(b3.data, res.output.data, res.output.len); 138 | free_result(&res); 139 | //outSize = reconcile_with_ids(ngn_inst1, &b3, &rec_callback); 140 | 141 | result res1; 142 | reconcile_with_ids_subrange_no_cbk(ngn_inst1, &b3, &res1); 143 | printf("needIds count:%llu , haveIds count: %llu \n",res1.need_ids_len, res1.have_ids_len); 144 | 145 | for (int i=0; i < res1.need_ids_len ; i++) { 146 | printf("need ID at %d :", i); 147 | printHexBuffer(res1.need_ids[i]); 148 | } 149 | 150 | for (int j=0; j < res1.have_ids_len ; j++) { 151 | printf("have ID at %d :", j); 152 | printHexBuffer(res1.have_ids[j]); 153 | } 154 | 155 | free(b3.data); 156 | free(b4.data); 157 | free_result(&res1); 158 | 159 | ret = storage_insert(st1, 
time(NULL), &b2); 160 | if (ret){ 161 | printf("inserted hash successfully in st1\n"); 162 | } 163 | printf("\n storage size after adding 1 more elem is %d, subrange size is %d \n", storage_size(st1), subrange_size(subrange1)); 164 | 165 | subrange_delete(subrange); 166 | subrange_delete(subrange1); 167 | 168 | printf("storage after subrange deletion, st1 size: %d, st2 size: %d.", storage_size(st1), storage_size(st2)); 169 | 170 | } 171 | -------------------------------------------------------------------------------- /c/negentropy_wrapper.cpp: -------------------------------------------------------------------------------- 1 | #include <string.h> 2 | #include <iostream> 3 | 4 | #include "negentropy.h" 5 | #include "negentropy/storage/BTreeMem.h" 6 | #include "negentropy_wrapper.h" 7 | #include "negentropy/storage/SubRange.h" 8 | 9 | //This is a C wrapper for the C++ library that helps in integrating negentropy with Nim code. 10 | //TODO: Do error handling by catching exceptions 11 | 12 | void printHexString(std::string_view toPrint){ 13 | for (size_t i = 0; i < toPrint.size(); ++i) { 14 | printf("%02hhx", (unsigned char)toPrint[i]); 15 | } 16 | printf("\n"); 17 | } 18 | 19 | void* storage_new(const char* db_path, const char* name){ 20 | negentropy::storage::BTreeMem* storage; 21 | /* 22 | auto env = lmdb::env::create(); 23 | env.set_max_dbs(64); 24 | env.open(db_path, 0); 25 | 26 | lmdb::dbi btreeDbi; 27 | 28 | { 29 | auto txn = lmdb::txn::begin(env); 30 | btreeDbi = negentropy::storage::BTreeMem::setupDB(txn, name); 31 | txn.commit(); 32 | } */ 33 | 34 | storage = new negentropy::storage::BTreeMem(); 35 | return storage; 36 | } 37 | 38 | void storage_delete(void* storage){ 39 | negentropy::storage::BTreeMem* lmdbStorage = reinterpret_cast<negentropy::storage::BTreeMem*>(storage); 40 | delete lmdbStorage; 41 | } 42 | 43 | int storage_size(void* storage){ 44 | negentropy::storage::BTreeMem* lmdbStorage = reinterpret_cast<negentropy::storage::BTreeMem*>(storage); 45 | return lmdbStorage->size(); 46 | } 47 | 48 | void negentropy_delete(void* negentropy){ 49 | 
Negentropy<negentropy::storage::BTreeMem>* ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::BTreeMem>*>(negentropy); 50 | delete ngn_inst; 51 | } 52 | 53 | void* negentropy_new(void* storage, uint64_t frameSizeLimit){ 54 | //TODO: Make these typecasts into macros?? 55 | negentropy::storage::BTreeMem* lmdbStorage; 56 | //TODO: reinterpret cast is risky, need to use more safe type conversion. 57 | lmdbStorage = reinterpret_cast<negentropy::storage::BTreeMem*>(storage); 58 | 59 | Negentropy<negentropy::storage::BTreeMem>* ne; 60 | try{ 61 | ne = new Negentropy<negentropy::storage::BTreeMem>(*lmdbStorage, frameSizeLimit); 62 | }catch(negentropy::err e){ 63 | //TODO: Error handling 64 | return NULL; 65 | } 66 | return ne; 67 | } 68 | 69 | // Returns -1 if already initiated. 70 | int negentropy_initiate(void* negentropy, result* result){ 71 | Negentropy<negentropy::storage::BTreeMem>* ngn_inst; 72 | ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::BTreeMem>*>(negentropy); 73 | 74 | std::string output; 75 | try { 76 | output = ngn_inst->initiate(); 77 | /* std::cout << "output of initiate is, len:" << output->size() << ", output:"; 78 | printHexString(std::string_view(*output)); */ 79 | } catch(negentropy::err e){ 80 | //std::cout << "Exception raised in initiate " << e.what() << std::endl; 81 | //TODO: Error handling 82 | return -1; 83 | } 84 | if (output.size() > 0 ){ 85 | result->output.len = output.size(); 86 | result->output.data = (unsigned char*)calloc(output.size(), sizeof(unsigned char)); 87 | memcpy(result->output.data, (unsigned char*)output.c_str(),result->output.len) ; 88 | }else { 89 | result->output.len = 0; 90 | result->output.data = NULL; 91 | } 92 | return 0; 93 | } 94 | 95 | void negentropy_setinitiator(void* negentropy){ 96 | Negentropy<negentropy::storage::BTreeMem> *ngn_inst; 97 | ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::BTreeMem>*>(negentropy); 98 | 99 | ngn_inst->setInitiator(); 100 | 101 | } 102 | 103 | bool storage_insert(void* storage, uint64_t createdAt, buffer* id){ 104 | negentropy::storage::BTreeMem* lmdbStorage; 105 | lmdbStorage = reinterpret_cast<negentropy::storage::BTreeMem*>(storage); 106 | std::string_view data(reinterpret_cast< char const* >(id->data), id->len); 107 | 108 | /* std::cout << "inserting entry in storage, createdAt:" 
<< createdAt << ",id:"; 109 | printHexString(data); */ 110 | 111 | //TODO: Error handling. Is it required? 112 | //How does out of memory get handled? 113 | return lmdbStorage->insert(createdAt, data); 114 | } 115 | 116 | bool storage_erase(void* storage, uint64_t createdAt, buffer* id){ 117 | negentropy::storage::BTreeMem* lmdbStorage; 118 | lmdbStorage = reinterpret_cast<negentropy::storage::BTreeMem*>(storage); 119 | std::string_view data(reinterpret_cast<char const*>(id->data), id->len); 120 | 121 | /* std::cout << "erasing entry from storage, createdAt:" << createdAt << ",id:"; 122 | printHexString(data); */ 123 | 124 | //TODO: Error handling 125 | return lmdbStorage->erase(createdAt, data); 126 | } 127 | 128 | int reconcile(void* negentropy, buffer* query, result* result){ 129 | Negentropy<negentropy::storage::BTreeMem> *ngn_inst; 130 | ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::BTreeMem>*>(negentropy); 131 | std::string out; 132 | try { 133 | out = ngn_inst->reconcile(std::string_view(reinterpret_cast<char const*>(query->data), query->len)); 134 | /* std::cout << "reconcile output of reconcile is, len:" << out.size() << ", output:"; 135 | printHexString(std::string_view(out)); */ 136 | } catch(negentropy::err e){ 137 | //All errors returned are non-recoverable errors.
138 | //So passing on the error message upwards 139 | //std::cout << "Exception raised in reconcile " << e.what() << std::endl; 140 | result->error = (char*)calloc(strlen(e.what()) + 1, sizeof(char)); 141 | strcpy(result->error, e.what()); 142 | return -1; 143 | } 144 | if (out.size() > 0 ){ 145 | result->output.len = out.size(); 146 | result->output.data = (unsigned char*)calloc(out.size(), sizeof(unsigned char)); 147 | memcpy(result->output.data, (unsigned char*)out.c_str(), result->output.len); 148 | }else { 149 | result->output.len = 0; 150 | result->output.data = NULL; 151 | } 152 | return 0; 153 | } 154 | 155 | void transform(std::vector<std::string> &from_ids, buffer* to_ids) 156 | { 157 | for (size_t i = 0; i < from_ids.size(); i++){ 158 | to_ids[i].len = from_ids[i].size(); 159 | to_ids[i].data = (unsigned char*)from_ids[i].c_str(); 160 | } 161 | } 162 | 163 | int reconcile_with_ids(void* negentropy, buffer* query, reconcile_cbk cbk, char* outptr){ 164 | Negentropy<negentropy::storage::BTreeMem> *ngn_inst; 165 | ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::BTreeMem>*>(negentropy); 166 | 167 | std::optional<std::string> out; 168 | std::vector<std::string> haveIds, needIds; 169 | uint64_t have_ids_len, need_ids_len; 170 | buffer* have_ids; 171 | buffer* need_ids; 172 | 173 | try { 174 | out = ngn_inst->reconcile(std::string_view(reinterpret_cast<char const*>(query->data), query->len), haveIds, needIds); 175 | 176 | have_ids_len = haveIds.size(); 177 | need_ids_len = needIds.size(); 178 | have_ids = (buffer*)malloc(have_ids_len*sizeof(buffer)); 179 | need_ids = (buffer*)malloc(need_ids_len*sizeof(buffer)); 180 | 181 | std::cout << "have_ids_len:" << have_ids_len << " need_ids_len:" << need_ids_len << std::endl; 182 | 183 | transform(haveIds, have_ids); 184 | transform(needIds, need_ids); 185 | } catch(negentropy::err e){ 186 | std::cout << "exception raised in reconcile_with_ids " << e.what() << std::endl; 187 | //TODO: Find a way to return this error and cleanup partially allocated memory if any 188 | return -1; 189 | } 190 | buffer output = {0,NULL}; 191 | if
(out) { 192 | output.len = out.value().size(); 193 | output.data = (unsigned char*)out.value().c_str(); 194 | std::cout << "reconcile_with_ids output of reconcile is, len:" << out.value().size() << ", output:"; 195 | printHexString(std::string_view(out.value())); 196 | } 197 | std::cout << "invoking callback" << std::endl; 198 | std::flush(std::cout); 199 | 200 | cbk(have_ids, have_ids_len, need_ids, need_ids_len, &output, outptr); 201 | std::cout << "invoked callback" << std::endl; 202 | std::flush(std::cout); 203 | 204 | free(have_ids); 205 | free(need_ids); 206 | return 0; 207 | } 208 | 209 | void transform_with_alloc(std::vector<std::string> &from_ids, buffer* to_ids) 210 | { 211 | for (size_t i = 0; i < from_ids.size(); i++){ 212 | to_ids[i].data = (unsigned char*) calloc(from_ids[i].size(), sizeof(unsigned char)); 213 | to_ids[i].len = from_ids[i].size(); 214 | memcpy(to_ids[i].data, from_ids[i].c_str(), to_ids[i].len); 215 | } 216 | } 217 | 218 | int reconcile_with_ids_no_cbk(void* negentropy, buffer* query, result* result){ 219 | Negentropy<negentropy::storage::BTreeMem> *ngn_inst; 220 | ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::BTreeMem>*>(negentropy); 221 | 222 | std::optional<std::string> out; 223 | std::vector<std::string> haveIds, needIds; 224 | try { 225 | out = ngn_inst->reconcile(std::string_view(reinterpret_cast<char const*>(query->data), query->len), haveIds, needIds); 226 | result->have_ids_len = haveIds.size(); 227 | result->need_ids_len = needIds.size(); 228 | if (haveIds.size() > 0){ 229 | result->have_ids = (buffer*)calloc(result->have_ids_len, sizeof(buffer)); 230 | transform_with_alloc(haveIds, result->have_ids); 231 | } 232 | 233 | if (needIds.size() > 0) { 234 | result->need_ids = (buffer*)calloc(result->need_ids_len, sizeof(buffer)); 235 | transform_with_alloc(needIds, result->need_ids); 236 | } 237 | 238 | // std::cout << "have_ids_len:" << result->have_ids_len << " need_ids_len:" << result->need_ids_len << std::endl; 239 | 240 | 241 | } catch(negentropy::err e){ 242 | std::cout << "caught error " << e.what() << std::endl; 243 |
result->error = (char*)calloc(strlen(e.what()) + 1, sizeof(char)); 244 | strcpy(result->error, e.what()); 245 | return -1; 246 | } 247 | buffer output = {0,NULL}; 248 | if (out) { 249 | result->output.len = out.value().size(); 250 | result->output.data = (unsigned char*)calloc(out.value().size(), sizeof(unsigned char)); 251 | memcpy(result->output.data, (unsigned char*)out.value().c_str(), result->output.len); 252 | /* std::cout << "reconcile_with_ids output of reconcile is, len:" << out.value().size() << ", output:"; 253 | printHexString(std::string_view(out.value())); */ 254 | }else { 255 | //std::cout << "reconcile_with_ids_no_cbk output is empty " << std::endl; 256 | result->output.len = 0; 257 | result->output.data = NULL; 258 | } 259 | return 0; 260 | } 261 | 262 | //Note: This function assumes that all relevant heap memory is alloced and just tries to free 263 | void free_result(result* r){ 264 | if (r->output.len > 0) { 265 | free((void *) r->output.data); 266 | } 267 | 268 | if (r->have_ids_len > 0){ 269 | for (uint64_t i = 0; i < r->have_ids_len; i++) { 270 | free((void *) r->have_ids[i].data); 271 | } 272 | free((void *)r->have_ids); 273 | } 274 | 275 | if (r->need_ids_len > 0) { 276 | for (uint64_t i = 0; i < r->need_ids_len; i++) { 277 | free((void *) r->need_ids[i].data); 278 | } 279 | free((void *)r->need_ids); 280 | } 281 | 282 | if (r->error != NULL && strlen(r->error) > 0){ 283 | free((void *)r->error); 284 | } 285 | } 286 | 287 | /*SubRange specific functions 288 | TODO: These and above methods need to be optimized to reduce code duplication*/ 289 | 290 | void* subrange_new(void* storage, uint64_t startTimeStamp, uint64_t endTimeStamp){ 291 | negentropy::storage::BTreeMem* st = reinterpret_cast<negentropy::storage::BTreeMem*>(storage); 292 | negentropy::storage::SubRange* subRange = NULL; 293 | try { 294 | subRange = new negentropy::storage::SubRange(*st, negentropy::Bound(startTimeStamp), negentropy::Bound(endTimeStamp)); 295 | } catch (negentropy::err e){ 296 | //TODO: Error handling 297 |
return NULL; 298 | } 299 | return subRange; 300 | } 301 | 302 | void subrange_delete(void* range){ 303 | negentropy::storage::SubRange* subRange = reinterpret_cast<negentropy::storage::SubRange*>(range); 304 | delete subRange; 305 | } 306 | 307 | int subrange_size(void* range){ 308 | negentropy::storage::SubRange* subrange = reinterpret_cast<negentropy::storage::SubRange*>(range); 309 | return subrange->size(); 310 | } 311 | 312 | void negentropy_subrange_delete(void* negentropy){ 313 | Negentropy<negentropy::storage::SubRange>* ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::SubRange>*>(negentropy); 314 | delete ngn_inst; 315 | } 316 | 317 | void* negentropy_subrange_new(void* subrange, uint64_t frameSizeLimit){ 318 | //TODO: Make these typecasts into macros?? 319 | negentropy::storage::SubRange* sub_range; 320 | //TODO: reinterpret cast is risky, need to use more safe type conversion. 321 | sub_range = reinterpret_cast<negentropy::storage::SubRange*>(subrange); 322 | 323 | Negentropy<negentropy::storage::SubRange>* ne; 324 | try{ 325 | ne = new Negentropy<negentropy::storage::SubRange>(*sub_range, frameSizeLimit); 326 | }catch(negentropy::err e){ 327 | //TODO: Error handling 328 | return NULL; 329 | } 330 | return ne; 331 | } 332 | 333 | // Returns -1 if already initiated.
334 | int negentropy_subrange_initiate(void* negentropy, result* result){ 335 | Negentropy<negentropy::storage::SubRange>* ngn_inst; 336 | ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::SubRange>*>(negentropy); 337 | 338 | std::string output; 339 | try { 340 | output = ngn_inst->initiate(); 341 | /* std::cout << "output of initiate is, len:" << output.size() << ", output:"; 342 | printHexString(std::string_view(output)); */ 343 | } catch(negentropy::err e){ 344 | //std::cout << "Exception raised in initiate " << e.what() << std::endl; 345 | return -1; 346 | } 347 | if (output.size() > 0 ){ 348 | result->output.len = output.size(); 349 | result->output.data = (unsigned char*)calloc(output.size(), sizeof(unsigned char)); 350 | memcpy(result->output.data, (unsigned char*)output.c_str(), result->output.len); 351 | }else { 352 | result->output.len = 0; 353 | result->output.data = NULL; 354 | } 355 | return 0; 356 | } 357 | 358 | void negentropy_subrange_setinitiator(void* negentropy){ 359 | Negentropy<negentropy::storage::SubRange> *ngn_inst; 360 | ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::SubRange>*>(negentropy); 361 | 362 | ngn_inst->setInitiator(); 363 | 364 | } 365 | 366 | int reconcile_subrange(void* negentropy, buffer* query, result* result){ 367 | Negentropy<negentropy::storage::SubRange> *ngn_inst; 368 | ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::SubRange>*>(negentropy); 369 | std::string out; 370 | try { 371 | out = ngn_inst->reconcile(std::string_view(reinterpret_cast<char const*>(query->data), query->len)); 372 | /* std::cout << "reconcile output of reconcile is, len:" << out.size() << ", output:"; 373 | printHexString(std::string_view(out)); */ 374 | } catch(negentropy::err e){ 375 | //All errors returned are non-recoverable errors.
376 | //So passing on the error message upwards 377 | //std::cout << "Exception raised in reconcile " << e.what() << std::endl; 378 | result->error = (char*)calloc(strlen(e.what()) + 1, sizeof(char)); 379 | strcpy(result->error, e.what()); 380 | return -1; 381 | } 382 | if (out.size() > 0 ){ 383 | result->output.len = out.size(); 384 | result->output.data = (unsigned char*)calloc(out.size(), sizeof(unsigned char)); 385 | memcpy(result->output.data, (unsigned char*)out.c_str(), result->output.len); 386 | }else { 387 | result->output.len = 0; 388 | result->output.data = NULL; 389 | } 390 | return 0; 391 | } 392 | 393 | int reconcile_with_ids_subrange_no_cbk(void* negentropy, buffer* query, result* result){ 394 | Negentropy<negentropy::storage::SubRange> *ngn_inst; 395 | ngn_inst = reinterpret_cast<Negentropy<negentropy::storage::SubRange>*>(negentropy); 396 | 397 | std::optional<std::string> out; 398 | std::vector<std::string> haveIds, needIds; 399 | try { 400 | out = ngn_inst->reconcile(std::string_view(reinterpret_cast<char const*>(query->data), query->len), haveIds, needIds); 401 | result->have_ids_len = haveIds.size(); 402 | result->need_ids_len = needIds.size(); 403 | if (haveIds.size() > 0){ 404 | result->have_ids = (buffer*)calloc(result->have_ids_len, sizeof(buffer)); 405 | transform_with_alloc(haveIds, result->have_ids); 406 | } 407 | 408 | if (needIds.size() > 0) { 409 | result->need_ids = (buffer*)calloc(result->need_ids_len, sizeof(buffer)); 410 | transform_with_alloc(needIds, result->need_ids); 411 | } 412 | 413 | // std::cout << "have_ids_len:" << result->have_ids_len << " need_ids_len:" << result->need_ids_len << std::endl; 414 | 415 | 416 | } catch(negentropy::err e){ 417 | std::cout << "caught error " << e.what() << std::endl; 418 | result->error = (char*)calloc(strlen(e.what()) + 1, sizeof(char)); 419 | strcpy(result->error, e.what()); 420 | return -1; 421 | } 422 | buffer output = {0,NULL}; 423 | if (out) { 424 | result->output.len = out.value().size(); 425 | result->output.data = (unsigned char*)calloc(out.value().size(), sizeof(unsigned char)); 426 |
memcpy(result->output.data, (unsigned char*)out.value().c_str(),result->output.len) ; 427 | /* std::cout << "reconcile_with_ids output of reconcile is, len:" << out.value().size() << ", output:"; 428 | printHexString(std::string_view(out.value())); */ 429 | }else { 430 | //std::cout << "reconcile_with_ids_no_cbk output is empty " << std::endl; 431 | result->output.len = 0; 432 | result->output.data = NULL; 433 | } 434 | return 0; 435 | } 436 | 437 | -------------------------------------------------------------------------------- /c/negentropy_wrapper.h: -------------------------------------------------------------------------------- 1 | 2 | #ifndef _NEGENTROPY_WRAPPER_H 3 | #define _NEGENTROPY_WRAPPER_H 4 | 5 | #ifdef __cplusplus 6 | #define EXTERNC extern "C" 7 | #else 8 | #define EXTERNC 9 | #endif 10 | 11 | typedef struct _buffer_{ 12 | uint64_t len ; 13 | unsigned char* data; 14 | }buffer; 15 | 16 | typedef struct _result_ { 17 | buffer output; 18 | uint64_t have_ids_len; 19 | uint64_t need_ids_len; 20 | buffer* have_ids; 21 | buffer* need_ids; 22 | char* error; 23 | } result; 24 | 25 | //This is a C-wrapper for the C++ library that helps in integrating negentropy with nim code. 
26 | //TODO: Do error handling by catching exceptions 27 | 28 | EXTERNC void* storage_new(const char* db_path, const char* name); 29 | 30 | EXTERNC void storage_delete(void* storage); 31 | 32 | EXTERNC int storage_size(void* storage); 33 | 34 | EXTERNC void* negentropy_new(void* storage, uint64_t frameSizeLimit); 35 | 36 | EXTERNC void negentropy_delete(void* negentropy); 37 | 38 | EXTERNC int negentropy_initiate(void* negentropy, result* result); 39 | 40 | EXTERNC void negentropy_setinitiator(void* negentropy); 41 | 42 | EXTERNC bool storage_insert(void* storage, uint64_t createdAt, buffer* id); 43 | 44 | EXTERNC bool storage_erase(void* storage, uint64_t createdAt, buffer* id); 45 | 46 | EXTERNC int reconcile(void* negentropy, buffer* query, result* result); 47 | 48 | EXTERNC typedef void (*reconcile_cbk)(buffer* have_ids, uint64_t have_ids_len, buffer* need_ids, uint64_t need_ids_len, buffer* output, char* outptr ); 49 | 50 | EXTERNC int reconcile_with_ids(void* negentropy, buffer* query, reconcile_cbk cbk, char* outptr); 51 | 52 | EXTERNC int reconcile_with_ids_no_cbk(void* negentropy, buffer* query, result* result); 53 | 54 | EXTERNC void free_result(result* result); 55 | 56 | //SubRange methods 57 | EXTERNC void* subrange_new(void* storage, uint64_t startTimeStamp, uint64_t endTimeStamp); 58 | 59 | EXTERNC void subrange_delete(void* range); 60 | 61 | EXTERNC void* negentropy_subrange_new(void* subrange, uint64_t frameSizeLimit); 62 | 63 | EXTERNC void negentropy_subrange_delete(void* negentropy); 64 | 65 | EXTERNC int negentropy_subrange_initiate(void* negentropy, result* result); 66 | 67 | EXTERNC int reconcile_subrange(void* negentropy, buffer* query, result* result); 68 | 69 | EXTERNC int reconcile_with_ids_subrange_no_cbk(void* negentropy, buffer* query, result* result); 70 | 71 | EXTERNC int subrange_size(void* storage); 72 | 73 | //End of SubRange methods 74 | 75 | #endif 76 | 77 | 
-------------------------------------------------------------------------------- /cpp/.gitignore: -------------------------------------------------------------------------------- 1 | *.gch 2 | -------------------------------------------------------------------------------- /cpp/README.md: -------------------------------------------------------------------------------- 1 | # Negentropy C++ Implementation 2 | 3 | The C++ implementation is header-only and the only required dependency is OpenSSL (for SHA-256). The main `Negentropy` class can be imported with the following: 4 | 5 | #include "negentropy.h" 6 | 7 | ## Storage 8 | 9 | First, you need to create a storage instance. Currently the following are available: 10 | 11 | ### negentropy::storage::Vector 12 | 13 | All the elements are put into a contiguous vector in memory, and are then sorted. This can be useful for syncing the results of a dynamic query, since it can be constructed rapidly and consumes a minimal amount of memory. However, modifying it by adding or removing elements is expensive (linear in the size of the data-set). 14 | 15 | #include "negentropy/storage/Vector.h" 16 | 17 | To use `Vector`, add all your items with `insert` and then call `seal`: 18 | 19 | negentropy::storage::Vector storage; 20 | 21 | for (const auto &item : myItems) { 22 | storage.insert(timestamp, id); 23 | } 24 | 25 | storage.seal(); 26 | 27 | After sealing, no more items can be added. 28 | 29 | ### negentropy::storage::BTreeMem 30 | 31 | Keeps the elements in an in-memory B+Tree. Computing fingerprints, adding, and removing elements are all logarithmic in data-set size. However, the elements will not be persisted to disk, and the data-structure is not thread-safe. 
32 | 33 | #include "negentropy/storage/BTreeMem.h" 34 | 35 | To use `BTreeMem`, items can be added in the same way as with `Vector`, however sealing is not necessary (although it is supported -- it is a no-op): 36 | 37 | negentropy::storage::BTreeMem storage; 38 | 39 | for (const auto &item : myItems) { 40 | storage.insert(timestamp, id); 41 | } 42 | 43 | More items can be added at any time, and items can be removed with `erase`: 44 | 45 | storage.insert(timestamp, id); 46 | storage.erase(timestamp, id); 47 | 48 | 49 | ### negentropy::storage::BTreeLMDB 50 | 51 | Uses the same implementation as BTreeMem, except that it uses [LMDB](http://lmdb.tech/) to save the data-set to persistent storage. Because the database is memory mapped, its read-performance is identical to the "in-memory" version (it is also in-memory, the memory just happens to reside in the page cache). Additionally, the tree can be concurrently accessed by multiple threads/processes using ACID transactions. 52 | 53 | #include "negentropy/storage/BTreeLMDB.h" 54 | 55 | First create an LMDB environment. Next, allocate a DBI to contain your tree(s) by calling `setupDB` inside a write transaction (don't forget to commit it). The `"test-data"` argument is the LMDB DBI table name you want to use: 56 | 57 | 58 | 59 | auto env = lmdb::env::create(); 60 | env.set_max_dbs(64); 61 | env.open("testdb/", 0); 62 | 63 | lmdb::dbi btreeDbi; 64 | 65 | { 66 | auto txn = lmdb::txn::begin(env); 67 | btreeDbi = negentropy::storage::BTreeLMDB::setupDB(txn, "test-data"); 68 | txn.commit(); 69 | } 70 | 71 | To add/remove items, create a `BTreeLMDB` object inside a write transaction.
This is the storage instance: 72 | 73 | { 74 | auto txn = lmdb::txn::begin(env); 75 | negentropy::storage::BTreeLMDB storage(txn, btreeDbi, 300); 76 | 77 | storage.insert(timestamp, id); 78 | 79 | storage.flush(); 80 | txn.commit(); 81 | } 82 | 83 | * The third parameter (`300` in the above example) is the `treeId`. This allows many different trees to co-exist in the same DBI. 84 | * Storage must be flushed before committing the transaction. `BTreeLMDB` will try to flush in its destructor. If you commit before this happens, you may see "mdb_put: Invalid argument" errors. 85 | 86 | 87 | ### negentropy::storage::SubRange 88 | 89 | This storage is a proxy to a sub-range of another storage. It is useful for performing partial syncs of the DB. 90 | 91 | The constructor arguments are the large storage you want to proxy to (of type `Vector`, `BTreeLMDB`, etc), and the lower and upper bounds of the desired sub-range. As usual, lower bounds are inclusive and upper bounds are exclusive: 92 | 93 | negentropy::storage::SubRange subStorage(storage, negentropy::Bound(fromTimestamp), negentropy::Bound(toTimestamp)); 94 | 95 | 96 | ## Reconciliation 97 | 98 | Reconciliation works mostly the same for all storage types. First create a `Negentropy` object: 99 | 100 | auto ne = Negentropy(storage, 50'000); 101 | 102 | * The object is templated on the storage type, but can often be auto-deduced (as above). 103 | * The second parameter (`50'000` above) is the `frameSizeLimit`. This can be omitted (or `0`) to permit unlimited-sized frames.
104 | 105 | On the client-side, create an initial message, and then transmit it to the server, receive the response, and `reconcile` until complete: 106 | 107 | std::string msg = ne.initiate(); 108 | 109 | while (true) { 110 | std::string response = queryServer(msg); 111 | 112 | std::vector<std::string> have, need; 113 | std::optional<std::string> newMsg = ne.reconcile(response, have, need); 114 | 115 | // handle have/need (there may be duplicates from previous calls to reconcile()) 116 | 117 | if (!newMsg) break; // done 118 | else std::swap(msg, *newMsg); 119 | } 120 | 121 | In each loop iteration, `have` contains IDs that the client has that the server doesn't, and `need` contains IDs that the server has that the client doesn't. 122 | 123 | The server-side is similar, except it doesn't create an initial message, there are no `have`/`need` arrays, and it doesn't return an optional (servers must always reply to a request): 124 | 125 | while (true) { 126 | std::string msg = receiveMsgFromClient(); 127 | std::string response = ne.reconcile(msg); 128 | respondToClient(response); 129 | } 130 | 131 | 132 | 133 | ## BTree Implementation 134 | 135 | The BTree implementation is technically a B+Tree since all records are stored in the leaves. Every node has `next` and `prev` pointers that point to the neighbour nodes on the same level, which allows efficient iteration. 136 | 137 | Each node has an accumulator that contains the sum of the IDs of all nodes below it, allowing fingerprints to be computed in logarithmic time relative to the number of tree leaves. 138 | 139 | Nodes will split and rebalance themselves as necessary to keep the tree balanced. This is a major advantage over rigid data-structures like merkle-search trees and prolly trees, which are only probabilistically balanced. 140 | 141 | If records are always inserted to the "right" of the tree, nodes will be fully packed. Otherwise, the tree attempts to keep them 50% full.
There are more details on the tree invariants in the `negentropy/storage/btree/core.h` implementation file. 142 | -------------------------------------------------------------------------------- /cpp/negentropy.h: -------------------------------------------------------------------------------- 1 | // (C) 2023 Doug Hoyte. MIT license 2 | 3 | #ifndef _NEGENTROPY_H_ 4 | #define _NEGENTROPY_H_ 5 | 6 | #include <stdint.h> 7 | #include <string.h> 8 | 9 | #include <string> 10 | #include <string_view> 11 | #include <vector> 12 | #include <deque> 13 | #include <unordered_set> 14 | #include <map> 15 | #include <limits> 16 | #include <optional> 17 | #include <algorithm> 18 | #include <stdexcept> 19 | 20 | #include "negentropy/encoding.h" 21 | #include "negentropy/types.h" 22 | #include "negentropy/storage/base.h" 23 | 24 | 25 | namespace negentropy { 26 | 27 | const uint64_t PROTOCOL_VERSION = 0x61; // Version 1 28 | 29 | const uint64_t MAX_U64 = std::numeric_limits<uint64_t>::max(); 30 | using err = std::runtime_error; 31 | 32 | 33 | 34 | template<typename StorageImpl> 35 | struct Negentropy { 36 | StorageImpl &storage; 37 | uint64_t frameSizeLimit; 38 | 39 | bool isInitiator = false; 40 | 41 | uint64_t lastTimestampIn = 0; 42 | uint64_t lastTimestampOut = 0; 43 | 44 | Negentropy(StorageImpl &storage, uint64_t frameSizeLimit = 0) : storage(storage), frameSizeLimit(frameSizeLimit) { 45 | if (frameSizeLimit != 0 && frameSizeLimit < 4096) throw negentropy::err("frameSizeLimit too small"); 46 | } 47 | 48 | std::string initiate() { 49 | if (isInitiator) throw negentropy::err("already initiated"); 50 | isInitiator = true; 51 | 52 | std::string output; 53 | output.push_back(PROTOCOL_VERSION); 54 | 55 | output += splitRange(0, storage.size(), Bound(MAX_U64)); 56 | 57 | return output; 58 | } 59 | 60 | void setInitiator() { 61 | isInitiator = true; 62 | } 63 | 64 | std::string reconcile(std::string_view query) { 65 | if (isInitiator) throw negentropy::err("initiator not asking for have/need IDs"); 66 | 67 | std::vector<std::string> haveIds, needIds; 68 | return reconcileAux(query, haveIds, needIds); 69 | } 70 | 71 | std::optional<std::string>
reconcile(std::string_view query, std::vector<std::string> &haveIds, std::vector<std::string> &needIds) { 72 | if (!isInitiator) throw negentropy::err("non-initiator asking for have/need IDs"); 73 | 74 | auto output = reconcileAux(query, haveIds, needIds); 75 | if (output.size() == 1) return std::nullopt; 76 | return output; 77 | } 78 | 79 | private: 80 | std::string reconcileAux(std::string_view query, std::vector<std::string> &haveIds, std::vector<std::string> &needIds) { 81 | lastTimestampIn = lastTimestampOut = 0; // reset for each message 82 | 83 | std::string fullOutput; 84 | fullOutput.push_back(PROTOCOL_VERSION); 85 | 86 | auto protocolVersion = getByte(query); 87 | if (protocolVersion < 0x60 || protocolVersion > 0x6F) throw negentropy::err("invalid negentropy protocol version byte"); 88 | if (protocolVersion != PROTOCOL_VERSION) { 89 | if (isInitiator) throw negentropy::err(std::string("unsupported negentropy protocol version requested: ") + std::to_string(protocolVersion - 0x60)); 90 | else return fullOutput; 91 | } 92 | 93 | uint64_t storageSize = storage.size(); 94 | Bound prevBound; 95 | size_t prevIndex = 0; 96 | bool skip = false; 97 | 98 | while (query.size()) { 99 | std::string o; 100 | 101 | auto doSkip = [&]{ 102 | if (skip) { 103 | skip = false; 104 | o += encodeBound(prevBound); 105 | o += encodeVarInt(uint64_t(Mode::Skip)); 106 | } 107 | }; 108 | 109 | auto currBound = decodeBound(query); 110 | auto mode = Mode(decodeVarInt(query)); 111 | 112 | auto lower = prevIndex; 113 | auto upper = storage.findLowerBound(prevIndex, storageSize, currBound); 114 | 115 | if (mode == Mode::Skip) { 116 | skip = true; 117 | } else if (mode == Mode::Fingerprint) { 118 | auto theirFingerprint = getBytes(query, FINGERPRINT_SIZE); 119 | auto ourFingerprint = storage.fingerprint(lower, upper); 120 | 121 | if (theirFingerprint != ourFingerprint.sv()) { 122 | doSkip(); 123 | o += splitRange(lower, upper, currBound); 124 | } else { 125 | skip = true; 126 | } 127 | } else if (mode == Mode::IdList) { 128 | auto numIds =
decodeVarInt(query); 129 | 130 | std::unordered_set<std::string> theirElems; 131 | for (uint64_t i = 0; i < numIds; i++) { 132 | auto e = getBytes(query, ID_SIZE); 133 | if (isInitiator) theirElems.insert(e); 134 | } 135 | 136 | if (isInitiator) { 137 | skip = true; 138 | 139 | storage.iterate(lower, upper, [&](const Item &item, size_t){ 140 | auto k = std::string(item.getId()); 141 | 142 | if (theirElems.find(k) == theirElems.end()) { 143 | // ID exists on our side, but not their side 144 | haveIds.emplace_back(k); 145 | } else { 146 | // ID exists on both sides 147 | theirElems.erase(k); 148 | } 149 | 150 | return true; 151 | }); 152 | 153 | for (const auto &k : theirElems) { 154 | // ID exists on their side, but not our side 155 | needIds.emplace_back(k); 156 | } 157 | } else { 158 | doSkip(); 159 | 160 | std::string responseIds; 161 | uint64_t numResponseIds = 0; 162 | Bound endBound = currBound; 163 | 164 | storage.iterate(lower, upper, [&](const Item &item, size_t index){ 165 | if (exceededFrameSizeLimit(fullOutput.size() + responseIds.size())) { 166 | endBound = Bound(item); 167 | upper = index; // shrink upper so that remaining range gets correct fingerprint 168 | return false; 169 | } 170 | 171 | responseIds += item.getId(); 172 | numResponseIds++; 173 | return true; 174 | }); 175 | 176 | o += encodeBound(endBound); 177 | o += encodeVarInt(uint64_t(Mode::IdList)); 178 | o += encodeVarInt(numResponseIds); 179 | o += responseIds; 180 | 181 | fullOutput += o; 182 | o.clear(); 183 | } 184 | } else { 185 | throw negentropy::err("unexpected mode"); 186 | } 187 | 188 | if (exceededFrameSizeLimit(fullOutput.size() + o.size())) { 189 | // frameSizeLimit exceeded: Stop range processing and return a fingerprint for the remaining range 190 | auto remainingFingerprint = storage.fingerprint(upper, storageSize); 191 | 192 | fullOutput += encodeBound(Bound(MAX_U64)); 193 | fullOutput += encodeVarInt(uint64_t(Mode::Fingerprint)); 194 | fullOutput += remainingFingerprint.sv(); 195 |
break; 196 | } else { 197 | fullOutput += o; 198 | } 199 | 200 | prevIndex = upper; 201 | prevBound = currBound; 202 | } 203 | 204 | return fullOutput; 205 | } 206 | 207 | std::string splitRange(size_t lower, size_t upper, const Bound &upperBound) { 208 | std::string o; 209 | 210 | uint64_t numElems = upper - lower; 211 | const uint64_t buckets = 16; 212 | 213 | if (numElems < buckets * 2) { 214 | o += encodeBound(upperBound); 215 | o += encodeVarInt(uint64_t(Mode::IdList)); 216 | 217 | o += encodeVarInt(numElems); 218 | storage.iterate(lower, upper, [&](const Item &item, size_t){ 219 | o += item.getId(); 220 | return true; 221 | }); 222 | } else { 223 | uint64_t itemsPerBucket = numElems / buckets; 224 | uint64_t bucketsWithExtra = numElems % buckets; 225 | auto curr = lower; 226 | 227 | for (uint64_t i = 0; i < buckets; i++) { 228 | auto bucketSize = itemsPerBucket + (i < bucketsWithExtra ? 1 : 0); 229 | auto ourFingerprint = storage.fingerprint(curr, curr + bucketSize); 230 | curr += bucketSize; 231 | 232 | Bound nextBound; 233 | 234 | if (curr == upper) { 235 | nextBound = upperBound; 236 | } else { 237 | Item prevItem, currItem; 238 | 239 | storage.iterate(curr - 1, curr + 1, [&](const Item &item, size_t index){ 240 | if (index == curr - 1) prevItem = item; 241 | else currItem = item; 242 | return true; 243 | }); 244 | 245 | nextBound = getMinimalBound(prevItem, currItem); 246 | } 247 | 248 | o += encodeBound(nextBound); 249 | o += encodeVarInt(uint64_t(Mode::Fingerprint)); 250 | o += ourFingerprint.sv(); 251 | } 252 | } 253 | 254 | return o; 255 | } 256 | 257 | bool exceededFrameSizeLimit(size_t n) { 258 | return frameSizeLimit && n > frameSizeLimit - 200; 259 | } 260 | 261 | // Decoding 262 | 263 | uint64_t decodeTimestampIn(std::string_view &encoded) { 264 | uint64_t timestamp = decodeVarInt(encoded); 265 | timestamp = timestamp == 0 ? 
MAX_U64 : timestamp - 1; 266 | timestamp += lastTimestampIn; 267 | if (timestamp < lastTimestampIn) timestamp = MAX_U64; // saturate 268 | lastTimestampIn = timestamp; 269 | return timestamp; 270 | } 271 | 272 | Bound decodeBound(std::string_view &encoded) { 273 | auto timestamp = decodeTimestampIn(encoded); 274 | auto len = decodeVarInt(encoded); 275 | return Bound(timestamp, getBytes(encoded, len)); 276 | } 277 | 278 | // Encoding 279 | 280 | std::string encodeTimestampOut(uint64_t timestamp) { 281 | if (timestamp == MAX_U64) { 282 | lastTimestampOut = MAX_U64; 283 | return encodeVarInt(0); 284 | } 285 | 286 | uint64_t temp = timestamp; 287 | timestamp -= lastTimestampOut; 288 | lastTimestampOut = temp; 289 | return encodeVarInt(timestamp + 1); 290 | }; 291 | 292 | std::string encodeBound(const Bound &bound) { 293 | std::string output; 294 | 295 | output += encodeTimestampOut(bound.item.timestamp); 296 | output += encodeVarInt(bound.idLen); 297 | output += bound.item.getId().substr(0, bound.idLen); 298 | 299 | return output; 300 | }; 301 | 302 | Bound getMinimalBound(const Item &prev, const Item &curr) { 303 | if (curr.timestamp != prev.timestamp) { 304 | return Bound(curr.timestamp); 305 | } else { 306 | uint64_t sharedPrefixBytes = 0; 307 | auto currKey = curr.getId(); 308 | auto prevKey = prev.getId(); 309 | 310 | for (uint64_t i = 0; i < ID_SIZE; i++) { 311 | if (currKey[i] != prevKey[i]) break; 312 | sharedPrefixBytes++; 313 | } 314 | 315 | return Bound(curr.timestamp, currKey.substr(0, sharedPrefixBytes + 1)); 316 | } 317 | } 318 | }; 319 | 320 | 321 | } 322 | 323 | 324 | template<typename StorageImpl> 325 | using Negentropy = negentropy::Negentropy<StorageImpl>; 326 | 327 | #endif 328 | -------------------------------------------------------------------------------- /cpp/negentropy/encoding.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include <algorithm> 4 | 5 | #include <stdexcept> 6 | 7 | 8 | namespace negentropy { 9 | 10 | using err =
std::runtime_error; 11 | 12 | 13 | 14 | inline uint8_t getByte(std::string_view &encoded) { 15 | if (encoded.size() < 1) throw negentropy::err("parse ends prematurely"); 16 | uint8_t output = encoded[0]; 17 | encoded = encoded.substr(1); 18 | return output; 19 | } 20 | 21 | inline std::string getBytes(std::string_view &encoded, size_t n) { 22 | if (encoded.size() < n) throw negentropy::err("parse ends prematurely"); 23 | auto res = encoded.substr(0, n); 24 | encoded = encoded.substr(n); 25 | return std::string(res); 26 | }; 27 | 28 | inline uint64_t decodeVarInt(std::string_view &encoded) { 29 | uint64_t res = 0; 30 | 31 | while (1) { 32 | if (encoded.size() == 0) throw negentropy::err("premature end of varint"); 33 | uint64_t byte = encoded[0]; 34 | encoded = encoded.substr(1); 35 | res = (res << 7) | (byte & 0b0111'1111); 36 | if ((byte & 0b1000'0000) == 0) break; 37 | } 38 | 39 | return res; 40 | } 41 | 42 | inline std::string encodeVarInt(uint64_t n) { 43 | if (n == 0) return std::string(1, '\0'); 44 | 45 | std::string o; 46 | 47 | while (n) { 48 | o.push_back(static_cast<char>(n & 0x7F)); 49 | n >>= 7; 50 | } 51 | 52 | std::reverse(o.begin(), o.end()); 53 | 54 | for (size_t i = 0; i < o.size() - 1; i++) { 55 | o[i] |= 0x80; 56 | } 57 | 58 | return o; 59 | } 60 | 61 | 62 | } 63 | -------------------------------------------------------------------------------- /cpp/negentropy/storage/BTreeLMDB.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include <bit> 4 | 5 | #include "lmdbxx/lmdb++.h" 6 | 7 | #include "negentropy.h" 8 | #include "negentropy/storage/btree/core.h" 9 | 10 | 11 | namespace negentropy { namespace storage { 12 | 13 | using err = std::runtime_error; 14 | using Node = negentropy::storage::btree::Node; 15 | using NodePtr = negentropy::storage::btree::NodePtr; 16 | 17 | 18 | struct BTreeLMDB : btree::BTreeCore { 19 | lmdb::txn &txn; 20 | lmdb::dbi dbi; 21 | uint64_t treeId; 22 | 23 | struct MetaData {
24 | uint64_t rootNodeId; 25 | uint64_t nextNodeId; 26 | 27 | bool operator==(const MetaData &other) const { 28 | return rootNodeId == other.rootNodeId && nextNodeId == other.nextNodeId; 29 | } 30 | }; 31 | 32 | MetaData metaDataCache; 33 | MetaData origMetaData; 34 | std::map<uint64_t, Node> dirtyNodeCache; 35 | 36 | 37 | static lmdb::dbi setupDB(lmdb::txn &txn, std::string_view tableName) { 38 | return lmdb::dbi::open(txn, tableName, MDB_CREATE | MDB_REVERSEKEY); 39 | } 40 | 41 | BTreeLMDB(lmdb::txn &txn, lmdb::dbi dbi, uint64_t treeId) : txn(txn), dbi(dbi), treeId(treeId) { 42 | static_assert(sizeof(MetaData) == 16); 43 | std::string_view v; 44 | bool found = dbi.get(txn, getKey(0), v); 45 | metaDataCache = found ? lmdb::from_sv<MetaData>(v) : MetaData{ 0, 1, }; 46 | origMetaData = metaDataCache; 47 | } 48 | 49 | ~BTreeLMDB() { 50 | flush(); 51 | } 52 | 53 | void flush() { 54 | for (auto &[nodeId, node] : dirtyNodeCache) { 55 | dbi.put(txn, getKey(nodeId), node.sv()); 56 | } 57 | dirtyNodeCache.clear(); 58 | 59 | if (metaDataCache != origMetaData) { 60 | dbi.put(txn, getKey(0), lmdb::to_sv(metaDataCache)); 61 | origMetaData = metaDataCache; 62 | } 63 | } 64 | 65 | 66 | // Interface 67 | 68 | const btree::NodePtr getNodeRead(uint64_t nodeId) { 69 | if (nodeId == 0) return {nullptr, 0}; 70 | 71 | auto res = dirtyNodeCache.find(nodeId); 72 | if (res != dirtyNodeCache.end()) return NodePtr{&res->second, nodeId}; 73 | 74 | std::string_view sv; 75 | bool found = dbi.get(txn, getKey(nodeId), sv); 76 | if (!found) throw err("couldn't find node"); 77 | return NodePtr{(Node*)sv.data(), nodeId}; 78 | } 79 | 80 | btree::NodePtr getNodeWrite(uint64_t nodeId) { 81 | if (nodeId == 0) return {nullptr, 0}; 82 | 83 | { 84 | auto res = dirtyNodeCache.find(nodeId); 85 | if (res != dirtyNodeCache.end()) return NodePtr{&res->second, nodeId}; 86 | } 87 | 88 | std::string_view sv; 89 | bool found = dbi.get(txn, getKey(nodeId), sv); 90 | if (!found) throw err("couldn't find node"); 91 | 92 | auto res =
dirtyNodeCache.try_emplace(nodeId); 93 | Node *newNode = &res.first->second; 94 | memcpy(newNode, sv.data(), sizeof(Node)); 95 | 96 | return NodePtr{newNode, nodeId}; 97 | } 98 | 99 | btree::NodePtr makeNode() { 100 | uint64_t nodeId = metaDataCache.nextNodeId++; 101 | auto res = dirtyNodeCache.try_emplace(nodeId); 102 | return NodePtr{&res.first->second, nodeId}; 103 | } 104 | 105 | void deleteNode(uint64_t nodeId) { 106 | if (nodeId == 0) throw err("can't delete metadata"); 107 | dirtyNodeCache.erase(nodeId); 108 | dbi.del(txn, getKey(nodeId)); 109 | } 110 | 111 | uint64_t getRootNodeId() { 112 | return metaDataCache.rootNodeId; 113 | } 114 | 115 | void setRootNodeId(uint64_t newRootNodeId) { 116 | metaDataCache.rootNodeId = newRootNodeId; 117 | } 118 | 119 | // Internal utils 120 | 121 | private: 122 | std::string getKey(uint64_t n) { 123 | uint64_t treeIdCopy = treeId; 124 | 125 | if constexpr (std::endian::native == std::endian::big) { 126 | auto byteswap = [](uint64_t &n) { 127 | uint8_t *first = reinterpret_cast<uint8_t*>(&n); 128 | uint8_t *last = first + 8; 129 | std::reverse(first, last); 130 | }; 131 | 132 | byteswap(n); 133 | byteswap(treeIdCopy); 134 | } else { 135 | static_assert(std::endian::native == std::endian::little); 136 | } 137 | 138 | std::string k; 139 | k += lmdb::to_sv(treeIdCopy); 140 | k += lmdb::to_sv(n); 141 | return k; 142 | } 143 | }; 144 | 145 | 146 | }} 147 | -------------------------------------------------------------------------------- /cpp/negentropy/storage/BTreeMem.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include "negentropy.h" 4 | #include "negentropy/storage/btree/core.h" 5 | 6 | 7 | namespace negentropy { namespace storage { 8 | 9 | 10 | struct BTreeMem : btree::BTreeCore { 11 | std::unordered_map<uint64_t, btree::Node> _nodeStorageMap; 12 | uint64_t _rootNodeId = 0; // 0 means no root 13 | uint64_t _nextNodeId = 1; 14 | 15 | // Interface 16 | 17 | const btree::NodePtr
getNodeRead(uint64_t nodeId) { 18 | if (nodeId == 0) return {nullptr, 0}; 19 | auto res = _nodeStorageMap.find(nodeId); 20 | if (res == _nodeStorageMap.end()) return btree::NodePtr{nullptr, 0}; 21 | return btree::NodePtr{&res->second, nodeId}; 22 | } 23 | 24 | btree::NodePtr getNodeWrite(uint64_t nodeId) { 25 | return getNodeRead(nodeId); 26 | } 27 | 28 | btree::NodePtr makeNode() { 29 | uint64_t nodeId = _nextNodeId++; 30 | _nodeStorageMap.try_emplace(nodeId); 31 | return getNodeRead(nodeId); 32 | } 33 | 34 | void deleteNode(uint64_t nodeId) { 35 | _nodeStorageMap.erase(nodeId); 36 | } 37 | 38 | uint64_t getRootNodeId() { 39 | return _rootNodeId; 40 | } 41 | 42 | void setRootNodeId(uint64_t newRootNodeId) { 43 | _rootNodeId = newRootNodeId; 44 | } 45 | }; 46 | 47 | 48 | }} 49 | -------------------------------------------------------------------------------- /cpp/negentropy/storage/SubRange.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | 5 | #include "negentropy.h" 6 | 7 | 8 | 9 | namespace negentropy { namespace storage { 10 | 11 | 12 | struct SubRange : StorageBase { 13 | StorageBase &base; 14 | size_t baseSize; 15 | size_t subBegin; 16 | size_t subEnd; 17 | size_t subSize; 18 | 19 | SubRange(StorageBase &base, const Bound &lowerBound, const Bound &upperBound) : base(base) { 20 | baseSize = base.size(); 21 | subBegin = lowerBound == Bound(0) ? 0 : base.findLowerBound(0, baseSize, lowerBound); 22 | subEnd = upperBound == Bound(MAX_U64) ? 
baseSize : base.findLowerBound(subBegin, baseSize, upperBound); 23 | if (subEnd != baseSize && Bound(base.getItem(subEnd)) == upperBound) subEnd++; // instead of upper_bound: OK because items are unique 24 | subSize = subEnd - subBegin; 25 | } 26 | 27 | uint64_t size() { 28 | return subSize; 29 | } 30 | 31 | const Item &getItem(size_t i) { 32 | if (i >= subSize) throw negentropy::err("bad index"); 33 | return base.getItem(subBegin + i); 34 | } 35 | 36 | void iterate(size_t begin, size_t end, std::function<bool(const Item &, size_t)> cb) { 37 | checkBounds(begin, end); 38 | 39 | base.iterate(subBegin + begin, subBegin + end, [&](const Item &item, size_t index){ 40 | return cb(item, index - subBegin); 41 | }); 42 | } 43 | 44 | size_t findLowerBound(size_t begin, size_t end, const Bound &bound) { 45 | checkBounds(begin, end); 46 | 47 | return std::min(base.findLowerBound(subBegin + begin, subBegin + end, bound) - subBegin, subSize); 48 | } 49 | 50 | Fingerprint fingerprint(size_t begin, size_t end) { 51 | checkBounds(begin, end); 52 | 53 | return base.fingerprint(subBegin + begin, subBegin + end); 54 | } 55 | 56 | private: 57 | void checkBounds(size_t begin, size_t end) { 58 | if (begin > end || end > subSize) throw negentropy::err("bad range"); 59 | } 60 | }; 61 | 62 | 63 | }} 64 | -------------------------------------------------------------------------------- /cpp/negentropy/storage/Vector.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include "negentropy.h" 4 | 5 | 6 | 7 | namespace negentropy { namespace storage { 8 | 9 | 10 | struct Vector : StorageBase { 11 | std::vector<Item> items; 12 | bool sealed = false; 13 | 14 | void insert(uint64_t createdAt, std::string_view id) { 15 | if (sealed) throw negentropy::err("already sealed"); 16 | if (id.size() != ID_SIZE) throw negentropy::err("bad id size for added item"); 17 | items.emplace_back(createdAt, id); 18 | } 19 | 20 | void insertItem(const Item &item) { 21 | insert(item.timestamp,
item.getId()); 22 | } 23 | 24 | void seal() { 25 | if (sealed) throw negentropy::err("already sealed"); 26 | sealed = true; 27 | 28 | std::sort(items.begin(), items.end()); 29 | 30 | for (size_t i = 1; i < items.size(); i++) { 31 | if (items[i - 1] == items[i]) throw negentropy::err("duplicate item inserted"); 32 | } 33 | } 34 | 35 | void unseal() { 36 | sealed = false; 37 | } 38 | 39 | uint64_t size() { 40 | checkSealed(); 41 | return items.size(); 42 | } 43 | 44 | const Item &getItem(size_t i) { 45 | checkSealed(); 46 | return items.at(i); 47 | } 48 | 49 | void iterate(size_t begin, size_t end, std::function<bool(const Item &, size_t)> cb) { 50 | checkSealed(); 51 | checkBounds(begin, end); 52 | 53 | for (auto i = begin; i < end; ++i) { 54 | if (!cb(items[i], i)) break; 55 | } 56 | } 57 | 58 | size_t findLowerBound(size_t begin, size_t end, const Bound &bound) { 59 | checkSealed(); 60 | checkBounds(begin, end); 61 | 62 | return std::lower_bound(items.begin() + begin, items.begin() + end, bound.item) - items.begin(); 63 | } 64 | 65 | Fingerprint fingerprint(size_t begin, size_t end) { 66 | Accumulator out; 67 | out.setToZero(); 68 | 69 | iterate(begin, end, [&](const Item &item, size_t){ 70 | out.add(item); 71 | return true; 72 | }); 73 | 74 | return out.getFingerprint(end - begin); 75 | } 76 | 77 | private: 78 | void checkSealed() { 79 | if (!sealed) throw negentropy::err("not sealed"); 80 | } 81 | 82 | void checkBounds(size_t begin, size_t end) { 83 | if (begin > end || end > items.size()) throw negentropy::err("bad range"); 84 | } 85 | }; 86 | 87 | 88 | }} 89 | -------------------------------------------------------------------------------- /cpp/negentropy/storage/base.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | 5 | #include "negentropy/types.h" 6 | 7 | 8 | namespace negentropy { 9 | 10 | struct StorageBase { 11 | virtual uint64_t size() = 0; 12 | 13 | virtual const Item &getItem(size_t i) = 0; 14 | 15 | virtual
void iterate(size_t begin, size_t end, std::function<bool(const Item &, size_t)> cb) = 0; 16 | 17 | virtual size_t findLowerBound(size_t begin, size_t end, const Bound &value) = 0; 18 | 19 | virtual Fingerprint fingerprint(size_t begin, size_t end) = 0; 20 | }; 21 | 22 | } 23 | -------------------------------------------------------------------------------- /cpp/negentropy/storage/btree/core.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | 5 | #include "negentropy.h" 6 | 7 | 8 | 9 | namespace negentropy { namespace storage { namespace btree { 10 | 11 | using err = std::runtime_error; 12 | 13 | /* 14 | 15 | Each node contains an array of keys. For leaf nodes, the keys' nodeIds are 0. For non-leaf nodes, these will 16 | be the nodeIds of the child nodes. The items in the keys of non-leaf nodes are the first items 17 | in the corresponding child nodes. 18 | 19 | Except for the right-most nodes in the tree at each level (which includes the root node), all nodes 20 | contain at least MIN_ITEMS and at most MAX_ITEMS. 21 | 22 | If a node falls below MIN_ITEMS, a neighbour node (which always has the same parent) is selected. 23 | * If between the two nodes there are REBALANCE_THRESHOLD or fewer total items, all items are 24 | moved into one node and the other is deleted. 25 | * If there are more than REBALANCE_THRESHOLD total items, then the items are divided into two 26 | approximately equal-sized halves. 27 | 28 | If a node goes above MAX_ITEMS then a new neighbour node is created. 29 | * If the node is the right-most in its level, pack the old node to MAX_ITEMS, and move the rest 30 | into the new neighbour. This optimises space-usage in the case of append workloads. 31 | * Otherwise, split the node into two approximately equal-sized halves.
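The split rule described above can be sketched in isolation. This is a standalone illustration, not part of the headers; `splitSizes` and `MAX_ITEMS_PROD` are names invented here, mirroring the production constant `MAX_ITEMS = 80` defined below:

```cpp
#include <cstddef>
#include <utility>

// Production MAX_ITEMS from core.h (assumption: non-fuzz build).
constexpr std::size_t MAX_ITEMS_PROD = 80;

// Hypothetical helper: returns {left.numItems, right.numItems} after an
// overfull node holding MAX_ITEMS + 1 items is split into two siblings.
constexpr std::pair<std::size_t, std::size_t> splitSizes(bool rightMostNode) {
    if (rightMostNode) {
        // Append-optimised: pack the old node completely full, spill one item.
        return {MAX_ITEMS_PROD, 1};
    }
    // Otherwise split the MAX_ITEMS + 1 items into roughly equal halves.
    return {(MAX_ITEMS_PROD / 2) + 1, MAX_ITEMS_PROD / 2};
}
```

Either way the two halves hold `MAX_ITEMS + 1` items in total; the right-most case is why sequential (timestamp-ordered) inserts produce near-full leaves instead of half-full ones.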
32 | 33 | */ 34 | 35 | 36 | #ifdef NE_FUZZ_TEST 37 | 38 | // Fuzz test mode: Causes a large amount of tree structure changes like splitting, moving, and rebalancing 39 | 40 | const size_t MIN_ITEMS = 2; 41 | const size_t REBALANCE_THRESHOLD = 4; 42 | const size_t MAX_ITEMS = 6; 43 | 44 | #else 45 | 46 | // Production mode: Nodes fit into 4k pages, and oscillating insert/erase will not cause tree structure changes 47 | 48 | const size_t MIN_ITEMS = 30; 49 | const size_t REBALANCE_THRESHOLD = 60; 50 | const size_t MAX_ITEMS = 80; 51 | 52 | #endif 53 | 54 | static_assert(MIN_ITEMS < REBALANCE_THRESHOLD); 55 | static_assert(REBALANCE_THRESHOLD < MAX_ITEMS); 56 | static_assert(MAX_ITEMS / 2 > MIN_ITEMS); 57 | static_assert(MIN_ITEMS % 2 == 0 && REBALANCE_THRESHOLD % 2 == 0 && MAX_ITEMS % 2 == 0); 58 | 59 | 60 | struct Key { 61 | Item item; 62 | uint64_t nodeId; 63 | 64 | void setToZero() { 65 | item = Item(); 66 | nodeId = 0; 67 | } 68 | }; 69 | 70 | inline bool operator<(const Key &a, const Key &b) { 71 | return a.item < b.item; 72 | }; 73 | 74 | struct Node { 75 | uint64_t numItems; // Number of items in this Node 76 | uint64_t accumCount; // Total number of items in or under this Node 77 | uint64_t nextSibling; // Pointer to next node in this level 78 | uint64_t prevSibling; // Pointer to previous node in this level 79 | 80 | Accumulator accum; 81 | 82 | Key items[MAX_ITEMS + 1]; 83 | 84 | 85 | Node() { 86 | memset((void*)this, '\0', sizeof(*this)); 87 | } 88 | 89 | std::string_view sv() { 90 | return std::string_view(reinterpret_cast<const char*>(this), sizeof(*this)); 91 | } 92 | }; 93 | 94 | struct NodePtr { 95 | Node *p; 96 | uint64_t nodeId; 97 | 98 | 99 | bool exists() { 100 | return p != nullptr; 101 | } 102 | 103 | Node &get() const { 104 | return *p; 105 | } 106 | }; 107 | 108 | struct Breadcrumb { 109 | size_t index; 110 | NodePtr nodePtr; 111 | }; 112 | 113 | 114 | struct BTreeCore : StorageBase { 115 | //// Node Storage 116 | 117 | virtual const NodePtr
getNodeRead(uint64_t nodeId) = 0; 118 | 119 | virtual NodePtr getNodeWrite(uint64_t nodeId) = 0; 120 | 121 | virtual NodePtr makeNode() = 0; 122 | 123 | virtual void deleteNode(uint64_t nodeId) = 0; 124 | 125 | virtual uint64_t getRootNodeId() = 0; 126 | 127 | virtual void setRootNodeId(uint64_t newRootNodeId) = 0; 128 | 129 | 130 | //// Search 131 | 132 | std::vector<Breadcrumb> searchItem(uint64_t rootNodeId, const Item &newItem, bool &found) { 133 | found = false; 134 | std::vector<Breadcrumb> breadcrumbs; 135 | 136 | auto foundNode = getNodeRead(rootNodeId); 137 | 138 | while (foundNode.nodeId) { 139 | const auto &node = foundNode.get(); 140 | size_t index = node.numItems - 1; 141 | 142 | if (node.numItems > 1) { 143 | for (size_t i = 1; i < node.numItems + 1; i++) { 144 | if (i == node.numItems || newItem < node.items[i].item) { 145 | index = i - 1; 146 | break; 147 | } 148 | } 149 | } 150 | 151 | if (!found && (newItem == node.items[index].item)) found = true; 152 | 153 | breadcrumbs.push_back({index, foundNode}); 154 | foundNode = getNodeRead(node.items[index].nodeId); 155 | } 156 | 157 | return breadcrumbs; 158 | } 159 | 160 | 161 | //// Insert 162 | 163 | bool insert(uint64_t createdAt, std::string_view id) { 164 | return insertItem(Item(createdAt, id)); 165 | } 166 | 167 | bool insertItem(const Item &newItem) { 168 | // Make root leaf in case it doesn't exist 169 | 170 | auto rootNodeId = getRootNodeId(); 171 | 172 | if (!rootNodeId) { 173 | auto newNodePtr = makeNode(); 174 | auto &newNode = newNodePtr.get(); 175 | 176 | newNode.items[0].item = newItem; 177 | newNode.numItems++; 178 | newNode.accum.add(newItem); 179 | newNode.accumCount = 1; 180 | 181 | setRootNodeId(newNodePtr.nodeId); 182 | return true; 183 | } 184 | 185 | 186 | // Traverse interior nodes, leaving breadcrumbs along the way 187 | 188 | 189 | bool found; 190 | auto breadcrumbs = searchItem(rootNodeId, newItem, found); 191 | 192 | if (found) return false; // already inserted 193 | 194 | 195 | // Follow
breadcrumbs back to root 196 | 197 | Key newKey = { newItem, 0 }; 198 | bool needsMerge = true; 199 | 200 | while (breadcrumbs.size()) { 201 | auto crumb = breadcrumbs.back(); 202 | breadcrumbs.pop_back(); 203 | 204 | auto &node = getNodeWrite(crumb.nodePtr.nodeId).get(); 205 | 206 | if (!needsMerge) { 207 | node.accum.add(newItem); 208 | node.accumCount++; 209 | } else if (crumb.nodePtr.get().numItems < MAX_ITEMS) { 210 | // Happy path: Node has room for new item 211 | 212 | node.items[node.numItems] = newKey; 213 | std::inplace_merge(node.items, node.items + node.numItems, node.items + node.numItems + 1); 214 | node.numItems++; 215 | 216 | node.accum.add(newItem); 217 | node.accumCount++; 218 | 219 | needsMerge = false; 220 | } else { 221 | // Node is full: Split it into 2 222 | 223 | auto &left = node; 224 | auto rightPtr = makeNode(); 225 | auto &right = rightPtr.get(); 226 | 227 | left.items[MAX_ITEMS] = newKey; 228 | std::inplace_merge(left.items, left.items + MAX_ITEMS, left.items + MAX_ITEMS + 1); 229 | 230 | left.accum.setToZero(); 231 | left.accumCount = 0; 232 | 233 | if (!left.nextSibling) { 234 | // If right-most node, pack as tightly as possible to optimise for append workloads 235 | left.numItems = MAX_ITEMS; 236 | right.numItems = 1; 237 | } else { 238 | // Otherwise, split the node equally 239 | left.numItems = (MAX_ITEMS / 2) + 1; 240 | right.numItems = MAX_ITEMS / 2; 241 | } 242 | 243 | for (size_t i = 0; i < left.numItems; i++) { 244 | addToAccum(left.items[i], left); 245 | } 246 | 247 | for (size_t i = 0; i < right.numItems; i++) { 248 | right.items[i] = left.items[left.numItems + i]; 249 | addToAccum(right.items[i], right); 250 | } 251 | 252 | for (size_t i = left.numItems; i < MAX_ITEMS + 1; i++) left.items[i].setToZero(); 253 | 254 | right.nextSibling = left.nextSibling; 255 | left.nextSibling = rightPtr.nodeId; 256 | right.prevSibling = crumb.nodePtr.nodeId; 257 | 258 | if (right.nextSibling) { 259 | auto &rightRight = 
getNodeWrite(right.nextSibling).get(); 260 | rightRight.prevSibling = rightPtr.nodeId; 261 | } 262 | 263 | newKey = { right.items[0].item, rightPtr.nodeId }; 264 | } 265 | 266 | // Update left-most key, in case item was inserted at the beginning 267 | 268 | refreshIndex(node, 0); 269 | } 270 | 271 | // Out of breadcrumbs but still need to merge: New level required 272 | 273 | if (needsMerge) { 274 | auto &left = getNodeRead(rootNodeId).get(); 275 | auto &right = getNodeRead(newKey.nodeId).get(); 276 | 277 | auto newRootPtr = makeNode(); 278 | auto &newRoot = newRootPtr.get(); 279 | newRoot.numItems = 2; 280 | 281 | newRoot.accum.add(left.accum); 282 | newRoot.accum.add(right.accum); 283 | newRoot.accumCount = left.accumCount + right.accumCount; 284 | 285 | newRoot.items[0] = left.items[0]; 286 | newRoot.items[0].nodeId = rootNodeId; 287 | newRoot.items[1] = right.items[0]; 288 | newRoot.items[1].nodeId = newKey.nodeId; 289 | 290 | setRootNodeId(newRootPtr.nodeId); 291 | } 292 | 293 | return true; 294 | } 295 | 296 | 297 | 298 | /// Erase 299 | 300 | bool erase(uint64_t createdAt, std::string_view id) { 301 | return eraseItem(Item(createdAt, id)); 302 | } 303 | 304 | bool eraseItem(const Item &oldItem) { 305 | auto rootNodeId = getRootNodeId(); 306 | if (!rootNodeId) return false; 307 | 308 | 309 | // Traverse interior nodes, leaving breadcrumbs along the way 310 | 311 | bool found; 312 | auto breadcrumbs = searchItem(rootNodeId, oldItem, found); 313 | if (!found) return false; 314 | 315 | 316 | // Remove from node 317 | 318 | bool needsRemove = true; 319 | bool neighbourRefreshNeeded = false; 320 | 321 | while (breadcrumbs.size()) { 322 | auto crumb = breadcrumbs.back(); 323 | breadcrumbs.pop_back(); 324 | 325 | auto &node = getNodeWrite(crumb.nodePtr.nodeId).get(); 326 | 327 | if (!needsRemove) { 328 | node.accum.sub(oldItem); 329 | node.accumCount--; 330 | } else { 331 | for (size_t i = crumb.index + 1; i < node.numItems; i++) node.items[i - 1] = node.items[i]; 
332 | node.numItems--; 333 | node.items[node.numItems].setToZero(); 334 | 335 | node.accum.sub(oldItem); 336 | node.accumCount--; 337 | 338 | needsRemove = false; 339 | } 340 | 341 | 342 | if (crumb.index < node.numItems) refreshIndex(node, crumb.index); 343 | 344 | if (neighbourRefreshNeeded) { 345 | refreshIndex(node, crumb.index + 1); 346 | neighbourRefreshNeeded = false; 347 | } 348 | 349 | 350 | if (node.numItems < MIN_ITEMS && breadcrumbs.size() && breadcrumbs.back().nodePtr.get().numItems > 1) { 351 | auto rebalance = [&](Node &leftNode, Node &rightNode) { 352 | size_t totalItems = leftNode.numItems + rightNode.numItems; 353 | size_t numLeft = (totalItems + 1) / 2; 354 | size_t numRight = totalItems - numLeft; 355 | 356 | Accumulator accum; 357 | accum.setToZero(); 358 | uint64_t accumCount = 0; 359 | 360 | if (rightNode.numItems >= numRight) { 361 | // Move extra from right to left 362 | 363 | size_t numMove = rightNode.numItems - numRight; 364 | 365 | for (size_t i = 0; i < numMove; i++) { 366 | auto &item = rightNode.items[i]; 367 | if (item.nodeId == 0) { 368 | accum.add(item.item); 369 | accumCount++; 370 | } else { 371 | auto &movingNode = getNodeRead(item.nodeId).get(); 372 | accum.add(movingNode.accum); 373 | accumCount += movingNode.accumCount; 374 | } 375 | leftNode.items[leftNode.numItems + i] = item; 376 | } 377 | 378 | ::memmove(rightNode.items, rightNode.items + numMove, (rightNode.numItems - numMove) * sizeof(rightNode.items[0])); 379 | 380 | for (size_t i = numRight; i < rightNode.numItems; i++) rightNode.items[i].setToZero(); 381 | 382 | leftNode.accum.add(accum); 383 | rightNode.accum.sub(accum); 384 | 385 | leftNode.accumCount += accumCount; 386 | rightNode.accumCount -= accumCount; 387 | 388 | neighbourRefreshNeeded = true; 389 | } else { 390 | // Move extra from left to right 391 | 392 | size_t numMove = leftNode.numItems - numLeft; 393 | 394 | ::memmove(rightNode.items + numMove, rightNode.items, rightNode.numItems * 
sizeof(rightNode.items[0])); 395 | 396 | for (size_t i = 0; i < numMove; i++) { 397 | auto &item = leftNode.items[numLeft + i]; 398 | if (item.nodeId == 0) { 399 | accum.add(item.item); 400 | accumCount++; 401 | } else { 402 | auto &movingNode = getNodeRead(item.nodeId).get(); 403 | accum.add(movingNode.accum); 404 | accumCount += movingNode.accumCount; 405 | } 406 | rightNode.items[i] = item; 407 | } 408 | 409 | for (size_t i = numLeft; i < leftNode.numItems; i++) leftNode.items[i].setToZero(); 410 | 411 | leftNode.accum.sub(accum); 412 | rightNode.accum.add(accum); 413 | 414 | leftNode.accumCount -= accumCount; 415 | rightNode.accumCount += accumCount; 416 | } 417 | 418 | leftNode.numItems = numLeft; 419 | rightNode.numItems = numRight; 420 | }; 421 | 422 | if (breadcrumbs.back().index == 0) { 423 | // Use neighbour to the right 424 | 425 | auto &leftNode = node; 426 | auto &rightNode = getNodeWrite(node.nextSibling).get(); 427 | size_t totalItems = leftNode.numItems + rightNode.numItems; 428 | 429 | if (totalItems <= REBALANCE_THRESHOLD) { 430 | // Move all items into right 431 | 432 | ::memmove(rightNode.items + leftNode.numItems, rightNode.items, sizeof(rightNode.items[0]) * rightNode.numItems); 433 | ::memcpy(rightNode.items, leftNode.items, sizeof(leftNode.items[0]) * leftNode.numItems); 434 | 435 | rightNode.numItems += leftNode.numItems; 436 | rightNode.accumCount += leftNode.accumCount; 437 | rightNode.accum.add(leftNode.accum); 438 | 439 | if (leftNode.prevSibling) getNodeWrite(leftNode.prevSibling).get().nextSibling = leftNode.nextSibling; 440 | rightNode.prevSibling = leftNode.prevSibling; 441 | 442 | leftNode.numItems = 0; 443 | } else { 444 | // Rebalance from left to right 445 | 446 | rebalance(leftNode, rightNode); 447 | } 448 | } else { 449 | // Use neighbour to the left 450 | 451 | auto &leftNode = getNodeWrite(node.prevSibling).get(); 452 | auto &rightNode = node; 453 | size_t totalItems = leftNode.numItems + rightNode.numItems; 454 | 455 | if 
(totalItems <= REBALANCE_THRESHOLD) { 456 | // Move all items into left 457 | 458 | ::memcpy(leftNode.items + leftNode.numItems, rightNode.items, sizeof(rightNode.items[0]) * rightNode.numItems); 459 | 460 | leftNode.numItems += rightNode.numItems; 461 | leftNode.accumCount += rightNode.accumCount; 462 | leftNode.accum.add(rightNode.accum); 463 | 464 | if (rightNode.nextSibling) getNodeWrite(rightNode.nextSibling).get().prevSibling = rightNode.prevSibling; 465 | leftNode.nextSibling = rightNode.nextSibling; 466 | 467 | rightNode.numItems = 0; 468 | } else { 469 | // Rebalance from right to left 470 | 471 | rebalance(leftNode, rightNode); 472 | } 473 | } 474 | } 475 | 476 | if (node.numItems == 0) { 477 | if (node.prevSibling) getNodeWrite(node.prevSibling).get().nextSibling = node.nextSibling; 478 | if (node.nextSibling) getNodeWrite(node.nextSibling).get().prevSibling = node.prevSibling; 479 | 480 | needsRemove = true; 481 | 482 | deleteNode(crumb.nodePtr.nodeId); 483 | } 484 | } 485 | 486 | if (needsRemove) { 487 | setRootNodeId(0); 488 | } else { 489 | auto &node = getNodeRead(rootNodeId).get(); 490 | 491 | if (node.numItems == 1 && node.items[0].nodeId) { 492 | setRootNodeId(node.items[0].nodeId); 493 | deleteNode(rootNodeId); 494 | } 495 | } 496 | 497 | return true; 498 | } 499 | 500 | 501 | //// Compat with the vector interface 502 | 503 | void seal() { 504 | } 505 | 506 | void unseal() { 507 | } 508 | 509 | 510 | //// Utils 511 | 512 | void refreshIndex(Node &node, size_t index) { 513 | auto childNodePtr = getNodeRead(node.items[index].nodeId); 514 | if (childNodePtr.exists()) { 515 | auto &childNode = childNodePtr.get(); 516 | node.items[index].item = childNode.items[0].item; 517 | } 518 | } 519 | 520 | void addToAccum(const Key &k, Node &node) { 521 | if (k.nodeId == 0) { 522 | node.accum.add(k.item); 523 | node.accumCount++; 524 | } else { 525 | auto nodePtr = getNodeRead(k.nodeId); 526 | node.accum.add(nodePtr.get().accum); 527 | node.accumCount += 
nodePtr.get().accumCount; 528 | } 529 | } 530 | 531 | void traverseToOffset(size_t index, const std::function<void(Node &node, size_t index)> &cb, std::function<void(Node &node)> customAccum = nullptr) { 532 | auto rootNodePtr = getNodeRead(getRootNodeId()); 533 | if (!rootNodePtr.exists()) return; 534 | auto &rootNode = rootNodePtr.get(); 535 | 536 | if (index > rootNode.accumCount) throw err("out of range"); 537 | return traverseToOffsetAux(index, rootNode, cb, customAccum); 538 | } 539 | 540 | void traverseToOffsetAux(size_t index, Node &node, const std::function<void(Node &node, size_t index)> &cb, std::function<void(Node &node)> customAccum) { 541 | if (node.numItems == node.accumCount) { 542 | cb(node, index); 543 | return; 544 | } 545 | 546 | for (size_t i = 0; i < node.numItems; i++) { 547 | auto &child = getNodeRead(node.items[i].nodeId).get(); 548 | if (index < child.accumCount) return traverseToOffsetAux(index, child, cb, customAccum); 549 | index -= child.accumCount; 550 | if (customAccum) customAccum(child); 551 | } 552 | } 553 | 554 | 555 | 556 | //// Interface 557 | 558 | uint64_t size() { 559 | auto rootNodePtr = getNodeRead(getRootNodeId()); 560 | if (!rootNodePtr.exists()) return 0; 561 | auto &rootNode = rootNodePtr.get(); 562 | return rootNode.accumCount; 563 | } 564 | 565 | const Item &getItem(size_t index) { 566 | if (index >= size()) throw err("out of range"); 567 | 568 | Item *out; 569 | traverseToOffset(index, [&](Node &node, size_t index){ 570 | out = &node.items[index].item; 571 | }); 572 | return *out; 573 | } 574 | 575 | void iterate(size_t begin, size_t end, std::function<bool(const Item &, size_t)> cb) { 576 | checkBounds(begin, end); 577 | 578 | size_t num = end - begin; 579 | 580 | traverseToOffset(begin, [&](Node &node, size_t index){ 581 | Node *currNode = &node; 582 | for (size_t i = 0; i < num; i++) { 583 | if (!cb(currNode->items[index].item, begin + i)) return; 584 | index++; 585 | if (index >= currNode->numItems) { 586 | currNode = getNodeRead(currNode->nextSibling).p; 587 | index = 0; 588 | } 589 | } 590 | }); 591 | } 592 | 593 | size_t
findLowerBound(size_t begin, size_t end, const Bound &value) { 594 | checkBounds(begin, end); 595 | 596 | auto rootNodePtr = getNodeRead(getRootNodeId()); 597 | if (!rootNodePtr.exists()) return end; 598 | auto &rootNode = rootNodePtr.get(); 599 | if (value.item <= rootNode.items[0].item) return begin; 600 | return std::min(findLowerBoundAux(value, rootNodePtr, 0), end); 601 | } 602 | 603 | size_t findLowerBoundAux(const Bound &value, NodePtr nodePtr, uint64_t numToLeft) { 604 | if (!nodePtr.exists()) return numToLeft + 1; 605 | 606 | Node &node = nodePtr.get(); 607 | 608 | for (size_t i = 1; i < node.numItems; i++) { 609 | if (value.item <= node.items[i].item) { 610 | return findLowerBoundAux(value, getNodeRead(node.items[i - 1].nodeId), numToLeft); 611 | } else { 612 | if (node.items[i - 1].nodeId) numToLeft += getNodeRead(node.items[i - 1].nodeId).get().accumCount; 613 | else numToLeft++; 614 | } 615 | } 616 | 617 | return findLowerBoundAux(value, getNodeRead(node.items[node.numItems - 1].nodeId), numToLeft); 618 | } 619 | 620 | Fingerprint fingerprint(size_t begin, size_t end) { 621 | checkBounds(begin, end); 622 | 623 | auto getAccumLeftOf = [&](size_t index) { 624 | Accumulator accum; 625 | accum.setToZero(); 626 | 627 | traverseToOffset(index, [&](Node &node, size_t index){ 628 | for (size_t i = 0; i < index; i++) accum.add(node.items[i].item); 629 | }, [&](Node &node){ 630 | accum.add(node.accum); 631 | }); 632 | 633 | return accum; 634 | }; 635 | 636 | auto accum1 = getAccumLeftOf(begin); 637 | auto accum2 = getAccumLeftOf(end); 638 | 639 | accum1.negate(); 640 | accum2.add(accum1); 641 | 642 | return accum2.getFingerprint(end - begin); 643 | } 644 | 645 | private: 646 | void checkBounds(size_t begin, size_t end) { 647 | if (begin > end || end > size()) throw negentropy::err("bad range"); 648 | } 649 | }; 650 | 651 | 652 | }}} 653 | -------------------------------------------------------------------------------- /cpp/negentropy/storage/btree/debug.h: 
-------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | #include 5 | 6 | #include 7 | 8 | #include "negentropy/storage/btree/core.h" 9 | #include "negentropy/storage/BTreeMem.h" 10 | #include "negentropy/storage/BTreeLMDB.h" 11 | 12 | 13 | namespace negentropy { namespace storage { namespace btree { 14 | 15 | 16 | using err = std::runtime_error; 17 | 18 | 19 | inline void dump(BTreeCore &btree, uint64_t nodeId, int depth) { 20 | if (nodeId == 0) { 21 | if (depth == 0) std::cout << "EMPTY TREE" << std::endl; 22 | return; 23 | } 24 | 25 | auto nodePtr = btree.getNodeRead(nodeId); 26 | auto &node = nodePtr.get(); 27 | std::string indent(depth * 4, ' '); 28 | 29 | std::cout << indent << "NODE id=" << nodeId << " numItems=" << node.numItems << " accum=" << hoytech::to_hex(node.accum.sv()) << " accumCount=" << node.accumCount << std::endl; 30 | 31 | for (size_t i = 0; i < node.numItems; i++) { 32 | std::cout << indent << " item: " << node.items[i].item.timestamp << "," << hoytech::to_hex(node.items[i].item.getId()) << std::endl; 33 | dump(btree, node.items[i].nodeId, depth + 1); 34 | } 35 | } 36 | 37 | inline void dump(BTreeCore &btree) { 38 | dump(btree, btree.getRootNodeId(), 0); 39 | } 40 | 41 | 42 | struct VerifyContext { 43 | std::optional<uint64_t> leafDepth; 44 | std::set<uint64_t> allNodeIds; 45 | std::vector<uint64_t> leafNodeIds; 46 | }; 47 | 48 | inline void verify(BTreeCore &btree, uint64_t nodeId, uint64_t depth, VerifyContext &ctx, Accumulator *accumOut = nullptr, uint64_t *accumCountOut = nullptr) { 49 | if (nodeId == 0) return; 50 | 51 | if (ctx.allNodeIds.contains(nodeId)) throw err("verify: saw node id again"); 52 | ctx.allNodeIds.insert(nodeId); 53 | 54 | auto nodePtr = btree.getNodeRead(nodeId); 55 | auto &node = nodePtr.get(); 56 | 57 | if (node.numItems == 0) throw err("verify: empty node"); 58 | if (node.nextSibling && node.numItems < MIN_ITEMS) throw err("verify: too few items in node"); 59 | if (node.numItems >
MAX_ITEMS) throw err("verify: too many items"); 60 | 61 | if (node.items[0].nodeId == 0) { 62 | if (ctx.leafDepth) { 63 | if (*ctx.leafDepth != depth) throw err("verify: mismatch of leaf depth"); 64 | } else { 65 | ctx.leafDepth = depth; 66 | } 67 | 68 | ctx.leafNodeIds.push_back(nodeId); 69 | } 70 | 71 | // FIXME: verify unused items are zeroed 72 | 73 | Accumulator accum; 74 | accum.setToZero(); 75 | uint64_t accumCount = 0; 76 | 77 | for (size_t i = 0; i < node.numItems; i++) { 78 | uint64_t childNodeId = node.items[i].nodeId; 79 | if (childNodeId == 0) { 80 | accum.add(node.items[i].item); 81 | accumCount++; 82 | } else { 83 | { 84 | auto firstChildPtr = btree.getNodeRead(childNodeId); 85 | auto &firstChild = firstChildPtr.get(); 86 | if (firstChild.numItems == 0 || firstChild.items[0].item != node.items[i].item) throw err("verify: key does not match child's first key"); 87 | } 88 | verify(btree, childNodeId, depth + 1, ctx, &accum, &accumCount); 89 | } 90 | 91 | if (i < node.numItems - 1) { 92 | if (!(node.items[i].item < node.items[i + 1].item)) throw err("verify: items out of order"); 93 | } 94 | } 95 | 96 | for (size_t i = node.numItems; i < MAX_ITEMS + 1; i++) { 97 | for (size_t j = 0; j < sizeof(Key); j++) if (((char*)&node.items[i])[j] != '\0') throw err("verify: memory not zeroed out"); 98 | } 99 | 100 | if (accumCount != node.accumCount) throw err("verify: accumCount mismatch"); 101 | if (accum.sv() != node.accum.sv()) throw err("verify: accum mismatch"); 102 | 103 | if (accumOut) accumOut->add(accum); 104 | if (accumCountOut) *accumCountOut += accumCount; 105 | } 106 | 107 | inline void verify(BTreeCore &btree, bool isLMDB) { 108 | VerifyContext ctx; 109 | Accumulator accum; 110 | accum.setToZero(); 111 | uint64_t accumCount = 0; 112 | 113 | verify(btree, btree.getRootNodeId(), 0, ctx, &accum, &accumCount); 114 | 115 | if (ctx.leafNodeIds.size()) { 116 | uint64_t i = 0, totalItems = 0; 117 | auto nodePtr = btree.getNodeRead(ctx.leafNodeIds[0]); 118 | 
std::optional<Item> prevItem; 119 | uint64_t prevSibling = 0; 120 | 121 | while (nodePtr.exists()) { 122 | auto &node = nodePtr.get(); 123 | if (nodePtr.nodeId != ctx.leafNodeIds[i]) throw err("verify: leaf id mismatch"); 124 | 125 | if (prevSibling != node.prevSibling) throw err("verify: prevSibling mismatch"); 126 | prevSibling = nodePtr.nodeId; 127 | 128 | nodePtr = btree.getNodeRead(node.nextSibling); 129 | i++; 130 | 131 | for (size_t j = 0; j < node.numItems; j++) { 132 | if (prevItem && !(*prevItem < node.items[j].item)) throw err("verify: leaf item out of order"); 133 | prevItem = node.items[j].item; 134 | totalItems++; 135 | } 136 | } 137 | 138 | if (totalItems != accumCount) throw err("verify: leaf count mismatch"); 139 | } 140 | 141 | // Check for leaks 142 | 143 | if (isLMDB) { 144 | static_assert(std::endian::native == std::endian::little); // FIXME 145 | 146 | auto &btreeLMDB = dynamic_cast<BTreeLMDB&>(btree); 147 | btreeLMDB.flush(); 148 | 149 | std::string_view key, val; 150 | 151 | // Leaks 152 | 153 | auto cursor = lmdb::cursor::open(btreeLMDB.txn, btreeLMDB.dbi); 154 | 155 | if (cursor.get(key, val, MDB_FIRST)) { 156 | do { 157 | uint64_t nodeId = lmdb::from_sv<uint64_t>(key.substr(8)); 158 | if (nodeId != 0 && !ctx.allNodeIds.contains(nodeId)) throw err("verify: memory leak"); 159 | } while (cursor.get(key, val, MDB_NEXT)); 160 | } 161 | 162 | // Dangling 163 | 164 | for (const auto &k : ctx.allNodeIds) { 165 | std::string tpKey; 166 | tpKey += lmdb::to_sv<uint64_t>(btreeLMDB.treeId); 167 | tpKey += lmdb::to_sv<uint64_t>(k); 168 | if (!btreeLMDB.dbi.get(btreeLMDB.txn, tpKey, val)) throw err("verify: dangling node"); 169 | } 170 | } else { 171 | auto &btreeMem = dynamic_cast<BTreeMem&>(btree); 172 | 173 | // Leaks 174 | 175 | for (const auto &[k, v] : btreeMem._nodeStorageMap) { 176 | if (!ctx.allNodeIds.contains(k)) throw err("verify: memory leak"); 177 | } 178 | 179 | // Dangling 180 | 181 | for (const auto &k : ctx.allNodeIds) { 182 | if (!btreeMem._nodeStorageMap.contains(k)) throw err("verify: 
dangling node"); 183 | } 184 | } 185 | } 186 | 187 | 188 | 189 | }}} 190 | -------------------------------------------------------------------------------- /cpp/negentropy/types.h: -------------------------------------------------------------------------------- 1 | // (C) 2023 Doug Hoyte. MIT license 2 | 3 | #pragma once 4 | 5 | #include <string.h> 6 | 7 | 8 | namespace negentropy { 9 | 10 | using err = std::runtime_error; 11 | 12 | const size_t ID_SIZE = 32; 13 | const size_t FINGERPRINT_SIZE = 16; 14 | 15 | 16 | enum class Mode { 17 | Skip = 0, 18 | Fingerprint = 1, 19 | IdList = 2, 20 | }; 21 | 22 | 23 | struct Item { 24 | uint64_t timestamp; 25 | uint8_t id[ID_SIZE]; 26 | 27 | explicit Item(uint64_t timestamp = 0) : timestamp(timestamp) { 28 | memset(id, '\0', sizeof(id)); 29 | } 30 | 31 | explicit Item(uint64_t timestamp, std::string_view id_) : timestamp(timestamp) { 32 | if (id_.size() != sizeof(id)) throw negentropy::err("bad id size for Item"); 33 | memcpy(id, id_.data(), sizeof(id)); 34 | } 35 | 36 | std::string_view getId() const { 37 | return std::string_view(reinterpret_cast<const char*>(id), sizeof(id)); 38 | } 39 | 40 | bool operator==(const Item &other) const { 41 | return timestamp == other.timestamp && getId() == other.getId(); 42 | } 43 | }; 44 | 45 | inline bool operator<(const Item &a, const Item &b) { 46 | return a.timestamp != b.timestamp ? a.timestamp < b.timestamp : a.getId() < b.getId(); 47 | }; 48 | 49 | inline bool operator<=(const Item &a, const Item &b) { 50 | return a.timestamp != b.timestamp ? 
a.timestamp <= b.timestamp : a.getId() <= b.getId(); 51 | }; 52 | 53 | 54 | struct Bound { 55 | Item item; 56 | size_t idLen; 57 | 58 | explicit Bound(uint64_t timestamp = 0, std::string_view id = "") : item(timestamp), idLen(id.size()) { 59 | if (idLen > ID_SIZE) throw negentropy::err("bad id size for Bound"); 60 | memcpy(item.id, id.data(), idLen); 61 | } 62 | 63 | explicit Bound(const Item &item_) : item(item_), idLen(ID_SIZE) {} 64 | 65 | bool operator==(const Bound &other) const { 66 | return item == other.item; 67 | } 68 | }; 69 | 70 | inline bool operator<(const Bound &a, const Bound &b) { 71 | return a.item < b.item; 72 | }; 73 | 74 | 75 | struct Fingerprint { 76 | uint8_t buf[FINGERPRINT_SIZE]; 77 | 78 | std::string_view sv() const { 79 | return std::string_view(reinterpret_cast<const char*>(buf), sizeof(buf)); 80 | } 81 | }; 82 | 83 | struct Accumulator { 84 | uint8_t buf[ID_SIZE]; 85 | 86 | void setToZero() { 87 | memset(buf, '\0', sizeof(buf)); 88 | } 89 | 90 | void add(const Item &item) { 91 | add(item.id); 92 | } 93 | 94 | void add(const Accumulator &acc) { 95 | add(acc.buf); 96 | } 97 | 98 | void add(const uint8_t *otherBuf) { 99 | uint64_t currCarry = 0, nextCarry = 0; 100 | uint64_t *p = reinterpret_cast<uint64_t*>(buf); 101 | const uint64_t *po = reinterpret_cast<const uint64_t*>(otherBuf); 102 | 103 | auto byteswap = [](uint64_t &n) { 104 | uint8_t *first = reinterpret_cast<uint8_t*>(&n); 105 | uint8_t *last = first + 8; 106 | std::reverse(first, last); 107 | }; 108 | 109 | for (size_t i = 0; i < 4; i++) { 110 | uint64_t orig = p[i]; 111 | uint64_t otherV = po[i]; 112 | 113 | if constexpr (std::endian::native == std::endian::big) { 114 | byteswap(orig); 115 | byteswap(otherV); 116 | } else { 117 | static_assert(std::endian::native == std::endian::little); 118 | } 119 | 120 | uint64_t next = orig; 121 | 122 | next += currCarry; 123 | if (next < orig) nextCarry = 1; 124 | 125 | next += otherV; 126 | if (next < otherV) nextCarry = 1; 127 | 128 | if constexpr (std::endian::native == std::endian::big) 
{ 129 | byteswap(next); 130 | } 131 | 132 | p[i] = next; 133 | currCarry = nextCarry; 134 | nextCarry = 0; 135 | } 136 | } 137 | 138 | void negate() { 139 | for (size_t i = 0; i < sizeof(buf); i++) { 140 | buf[i] = ~buf[i]; 141 | } 142 | 143 | Accumulator one; 144 | one.setToZero(); 145 | one.buf[0] = 1; 146 | add(one.buf); 147 | } 148 | 149 | void sub(const Item &item) { 150 | sub(item.id); 151 | } 152 | 153 | void sub(const Accumulator &acc) { 154 | sub(acc.buf); 155 | } 156 | 157 | void sub(const uint8_t *otherBuf) { 158 | Accumulator neg; 159 | memcpy(neg.buf, otherBuf, sizeof(buf)); 160 | neg.negate(); 161 | add(neg); 162 | } 163 | 164 | std::string_view sv() const { 165 | return std::string_view(reinterpret_cast<const char*>(buf), sizeof(buf)); 166 | } 167 | 168 | Fingerprint getFingerprint(uint64_t n) { 169 | std::string input; 170 | input += sv(); 171 | input += encodeVarInt(n); 172 | 173 | unsigned char hash[SHA256_DIGEST_LENGTH]; 174 | SHA256(reinterpret_cast<const unsigned char*>(input.data()), input.size(), hash); 175 | 176 | Fingerprint out; 177 | memcpy(out.buf, hash, FINGERPRINT_SIZE); 178 | 179 | return out; 180 | } 181 | }; 182 | 183 | 184 | } 185 | -------------------------------------------------------------------------------- /docs/fq.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hoytech/negentropy/ebbeaf33ba122e8e416f95063a15ce5787ca6672/docs/fq.png -------------------------------------------------------------------------------- /docs/negentropy-protocol-v1.md: -------------------------------------------------------------------------------- 1 | ## Negentropy Protocol V1 2 | 3 | This document specifies the message semantics and low-level wire format for version 1 of the Negentropy set reconciliation protocol. For a high-level introduction, see the [Range-Based Set Reconciliation](https://logperiodic.com/rbsr.html) article. 
For reference implementations and conformance tests, see the [negentropy project page](https://github.com/hoytech/negentropy). 4 | 5 | ### Preparation 6 | 7 | There are two protocol participants: Client and server. The client creates an initial message and transmits it to the server, which replies with its own message in response. The client continues querying the server until it is satisfied, and then terminates the protocol. Messages in either direction have the same format. 8 | 9 | Each participant has a collection of records. A record consists of a 64-bit numeric timestamp and a 256-bit ID. Each participant starts by sorting their items according to timestamp, ascending. If two timestamps are equal then items are sorted lexically by ID, ascending by first differing byte. Items may not use the max uint64 value (`2**64 - 1`) as a timestamp since this is reserved as a special "infinity" value. 10 | 11 | The goal of the protocol is for the client to learn the set of IDs that it has and the server does not, and the set of items that the server has and it does not. 12 | 13 | ### `Varint` 14 | 15 | Varints (variable-sized unsigned integers) are represented as base-128 digits, most significant digit first, with as few digits as possible. Bit eight (the high bit) is set on each byte except the last. 16 | 17 | Varint := <Digit+128>* <Digit> 18 | 19 | ### `Id` 20 | 21 | IDs are represented as byte-strings of length `32`: 22 | 23 | Id := Byte{32} 24 | 25 | ### `Message` 26 | 27 | A reconciliation message is a protocol version byte followed by an ordered list of ranges: 28 | 29 | Message := <protocolVersion Byte> <Range>* 30 | 31 | The current protocol version is 1, represented by the byte `0x61`. Protocol version 2 will be `0x62`, and so forth. If a server receives a message with a protocol version that it cannot handle, it should reply with a single byte containing the highest protocol version it supports, allowing the client to downgrade and retry its message. 
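The `Varint` encoding above is compact enough to sketch directly. The helper names below are illustrative, not taken from any of the reference implementations; they show base-128, most-significant-digit-first encoding with the high bit as a continuation flag:

```javascript
// Encode a non-negative integer as base-128 digits, most significant first.
// The high bit (0x80) is set on every byte except the last.
function encodeVarInt(n) {
    if (n === 0) return [0];

    let digits = [];
    while (n > 0) {
        digits.push(n % 128);
        n = Math.floor(n / 128);
    }
    digits.reverse();

    for (let i = 0; i < digits.length - 1; i++) digits[i] |= 128;
    return digits;
}

// Decode bytes produced by encodeVarInt. Uses arithmetic rather than bit
// shifts so values above 2^31 survive (JS bitwise ops truncate to 32 bits).
function decodeVarInt(bytes) {
    let res = 0;
    for (let byte of bytes) {
        res = res * 128 + (byte & 127);
        if ((byte & 128) === 0) break;
    }
    return res;
}
```

For example, `encodeVarInt(300)` yields `[0x82, 0x2c]`, since `300 = 2*128 + 44`.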
32 | 33 | Each Range corresponds to a contiguous section of the timestamp/ID space. The first Range starts at timestamp 0 and an ID of 0 bytes. Ranges are always adjacent (no gaps). If the last Range doesn't end at the special infinity value, an implicit `Skip` to infinity Range is appended. This means that the list of Ranges always covers the full timestamp/ID space. 34 | 35 | ### `Range` 36 | 37 | A Range consists of an upper bound, a mode, and a payload: 38 | 39 | Range := <UpperBound Bound> <Mode Varint> <Payload Skip | Fingerprint | IdList> 40 | 41 | The contents of the payload are determined by the mode: 42 | 43 | * If `mode = 0`, then payload is `Skip`, meaning the sender does not wish to process this Range further. This payload is empty: 44 | 45 | Skip := 46 | 47 | * If `mode = 1`, then payload is a `Fingerprint`, which is a [digest](#fingerprint-algorithm) of all the IDs the sender has within the Range: 48 | 49 | Fingerprint := Byte{16} 50 | 51 | * If `mode = 2`, the payload is `IdList`, a variable-length list of all IDs the sender has within the Range: 52 | 53 | IdList := <length Varint> <ids Id>* 54 | 55 | 56 | ### `Bound` 57 | 58 | Each Range is specified by an *inclusive* lower bound and an *exclusive* upper bound. As defined above, each Range only includes an upper bound: the lower bound of a Range is the upper bound of the previous Range, or 0 timestamp/0 ID for the first Range. 59 | 60 | A Bound consists of an encoded timestamp and a variable-length disambiguating prefix of an ID (in case multiple items have the same timestamp): 61 | 62 | Bound := <encodedTimestamp Varint> <length Varint> <idPrefix Byte>* 63 | 64 | * The timestamp is encoded specially. The infinity timestamp is encoded as `0`. All other values are encoded as `1 + offset`, where offset is the difference between this timestamp and the previously encoded timestamp. The initial offset starts at `0` and resets at the beginning of each message. 
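To make the timestamp delta encoding concrete, here is a small worked sketch (illustrative only; `INFINITY` stands in for the reserved max-uint64 value, and real implementations varint-encode each resulting number on the wire):

```javascript
const INFINITY = Number.MAX_SAFE_INTEGER; // stand-in for the reserved max-uint64 timestamp

// Encode a message's sequence of bound timestamps: infinity becomes 0,
// every other value becomes 1 + (delta from the previously encoded timestamp).
function encodeTimestamps(timestamps) {
    let last = 0; // resets at the start of each message
    let out = [];

    for (let t of timestamps) {
        if (t === INFINITY) {
            out.push(0);
            last = INFINITY; // an infinity bound is normally the final range
        } else {
            out.push(t - last + 1); // offsets are never negative: bounds ascend
            last = t;
        }
    }

    return out;
}
```

So bounds at timestamps `[100, 100, 105, INFINITY]` encode as `[101, 1, 6, 0]`.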
65 | 66 | Offsets are always non-negative since the upper bound's timestamp is greater than or equal to the lower bound's timestamp, ranges in a message are always encoded in ascending order, and ranges never overlap. 67 | 68 | * The size of `idPrefix` is encoded in `length`, and can be between `0` and `32` bytes, inclusive. This allows implementations to use the shortest possible prefix to separate the first record of this Range from the last record of the previous Range. If these records' timestamps differ, then the length should be 0, otherwise it should be the byte-length of their common ID-prefix plus 1. 69 | 70 | If the `idPrefix` length is less than `32` then the omitted trailing bytes are implicitly 0 bytes. 71 | 72 | 73 | ### Fingerprint Algorithm 74 | 75 | The fingerprint of a Range is computed with the following algorithm: 76 | 77 | * Compute the addition mod 2^256 of the element IDs (interpreted as 32-byte little-endian unsigned integers) 78 | * Concatenate with the number of elements in the Range, encoded as a [Varint](#varint) 79 | * Hash with SHA-256 80 | * Take the first 16 bytes 81 | -------------------------------------------------------------------------------- /js/Negentropy.js: -------------------------------------------------------------------------------- 1 | // (C) 2023 Doug Hoyte. MIT license 2 | 3 | const PROTOCOL_VERSION = 0x61; // Version 1 4 | const ID_SIZE = 32; 5 | const FINGERPRINT_SIZE = 16; 6 | 7 | const Mode = { 8 | Skip: 0, 9 | Fingerprint: 1, 10 | IdList: 2, 11 | }; 12 | 13 | class WrappedBuffer { 14 | constructor(buffer) { 15 | this._raw = new Uint8Array(buffer || 512); 16 | this.length = buffer ? 
buffer.length : 0; 17 | } 18 | 19 | unwrap() { 20 | return this._raw.subarray(0, this.length); 21 | } 22 | 23 | get capacity() { 24 | return this._raw.byteLength; 25 | } 26 | 27 | extend(buf) { 28 | if (buf._raw) buf = buf.unwrap(); 29 | if (typeof(buf.length) !== 'number') throw Error("bad length"); 30 | const targetSize = buf.length + this.length; 31 | if (this.capacity < targetSize) { 32 | const oldRaw = this._raw; 33 | const newCapacity = Math.max(this.capacity * 2, targetSize); 34 | this._raw = new Uint8Array(newCapacity); 35 | this._raw.set(oldRaw); 36 | } 37 | 38 | this._raw.set(buf, this.length); 39 | this.length += buf.length; 40 | } 41 | 42 | shift() { 43 | const first = this._raw[0]; 44 | this._raw = this._raw.subarray(1); 45 | this.length--; 46 | return first; 47 | } 48 | 49 | shiftN(n = 1) { 50 | const firstSubarray = this._raw.subarray(0, n); 51 | this._raw = this._raw.subarray(n); 52 | this.length -= n; 53 | return firstSubarray; 54 | } 55 | } 56 | 57 | function decodeVarInt(buf) { 58 | let res = 0; 59 | 60 | while (1) { 61 | if (buf.length === 0) throw Error("parse ends prematurely"); 62 | let byte = buf.shift(); 63 | res = (res << 7) | (byte & 127); 64 | if ((byte & 128) === 0) break; 65 | } 66 | 67 | return res; 68 | } 69 | 70 | function encodeVarInt(n) { 71 | if (n === 0) return new WrappedBuffer([0]); 72 | 73 | let o = []; 74 | 75 | while (n !== 0) { 76 | o.push(n & 127); 77 | n >>>= 7; 78 | } 79 | 80 | o.reverse(); 81 | 82 | for (let i = 0; i < o.length - 1; i++) o[i] |= 128; 83 | 84 | return new WrappedBuffer(o); 85 | } 86 | 87 | function getByte(buf) { 88 | return getBytes(buf, 1)[0]; 89 | } 90 | 91 | function getBytes(buf, n) { 92 | if (buf.length < n) throw Error("parse ends prematurely"); 93 | return buf.shiftN(n); 94 | } 95 | 96 | 97 | class Accumulator { 98 | constructor() { 99 | this.setToZero(); 100 | 101 | if (typeof window === 'undefined') { // node.js 102 | const crypto = require('crypto'); 103 | this.sha256 = async (slice) => new 
Uint8Array(crypto.createHash('sha256').update(slice).digest()); 104 | } else { // browser 105 | this.sha256 = async (slice) => new Uint8Array(await crypto.subtle.digest("SHA-256", slice)); 106 | } 107 | } 108 | 109 | setToZero() { 110 | this.buf = new Uint8Array(ID_SIZE); 111 | } 112 | 113 | add(otherBuf) { 114 | let currCarry = 0, nextCarry = 0; 115 | let p = new DataView(this.buf.buffer); 116 | let po = new DataView(otherBuf.buffer); 117 | 118 | for (let i = 0; i < 8; i++) { 119 | let offset = i * 4; 120 | let orig = p.getUint32(offset, true); 121 | let otherV = po.getUint32(offset, true); 122 | 123 | let next = orig; 124 | 125 | next += currCarry; 126 | next += otherV; 127 | if (next > 0xFFFFFFFF) nextCarry = 1; 128 | 129 | p.setUint32(offset, next & 0xFFFFFFFF, true); 130 | currCarry = nextCarry; 131 | nextCarry = 0; 132 | } 133 | } 134 | 135 | negate() { 136 | let p = new DataView(this.buf.buffer); 137 | 138 | for (let i = 0; i < 8; i++) { 139 | let offset = i * 4; 140 | p.setUint32(offset, ~p.getUint32(offset, true), true); 141 | } 142 | 143 | let one = new Uint8Array(ID_SIZE); 144 | one[0] = 1; 145 | this.add(one); 146 | } 147 | 148 | async getFingerprint(n) { 149 | let input = new WrappedBuffer(); 150 | input.extend(this.buf); 151 | input.extend(encodeVarInt(n)); 152 | 153 | let hash = await this.sha256(input.unwrap()); 154 | 155 | return hash.subarray(0, FINGERPRINT_SIZE); 156 | } 157 | }; 158 | 159 | 160 | class NegentropyStorageVector { 161 | constructor() { 162 | this.items = []; 163 | this.sealed = false; 164 | } 165 | 166 | insert(timestamp, id) { 167 | if (this.sealed) throw Error("already sealed"); 168 | id = loadInputBuffer(id); 169 | if (id.byteLength !== ID_SIZE) throw Error("bad id size for added item"); 170 | this.items.push({ timestamp, id }); 171 | } 172 | 173 | seal() { 174 | if (this.sealed) throw Error("already sealed"); 175 | this.sealed = true; 176 | 177 | this.items.sort(itemCompare); 178 | 179 | for (let i = 1; i < this.items.length; i++) { 
180 | if (itemCompare(this.items[i - 1], this.items[i]) === 0) throw Error("duplicate item inserted"); 181 | } 182 | } 183 | 184 | unseal() { 185 | this.sealed = false; 186 | } 187 | 188 | size() { 189 | this._checkSealed(); 190 | return this.items.length; 191 | } 192 | 193 | getItem(i) { 194 | this._checkSealed(); 195 | if (i >= this.items.length) throw Error("out of range"); 196 | return this.items[i]; 197 | } 198 | 199 | iterate(begin, end, cb) { 200 | this._checkSealed(); 201 | this._checkBounds(begin, end); 202 | 203 | for (let i = begin; i < end; ++i) { 204 | if (!cb(this.items[i], i)) break; 205 | } 206 | } 207 | 208 | findLowerBound(begin, end, bound) { 209 | this._checkSealed(); 210 | this._checkBounds(begin, end); 211 | 212 | return this._binarySearch(this.items, begin, end, (a) => itemCompare(a, bound) < 0); 213 | } 214 | 215 | async fingerprint(begin, end) { 216 | let out = new Accumulator(); 217 | out.setToZero(); 218 | 219 | this.iterate(begin, end, (item, i) => { 220 | out.add(item.id); 221 | return true; 222 | }); 223 | 224 | return await out.getFingerprint(end - begin); 225 | } 226 | 227 | _checkSealed() { 228 | if (!this.sealed) throw Error("not sealed"); 229 | } 230 | 231 | _checkBounds(begin, end) { 232 | if (begin > end || end > this.items.length) throw Error("bad range"); 233 | } 234 | 235 | _binarySearch(arr, first, last, cmp) { 236 | let count = last - first; 237 | 238 | while (count > 0) { 239 | let it = first; 240 | let step = Math.floor(count / 2); 241 | it += step; 242 | 243 | if (cmp(arr[it])) { 244 | first = ++it; 245 | count -= step + 1; 246 | } else { 247 | count = step; 248 | } 249 | } 250 | 251 | return first; 252 | } 253 | } 254 | 255 | 256 | class Negentropy { 257 | constructor(storage, frameSizeLimit = 0) { 258 | if (frameSizeLimit !== 0 && frameSizeLimit < 4096) throw Error("frameSizeLimit too small"); 259 | 260 | this.storage = storage; 261 | this.frameSizeLimit = frameSizeLimit; 262 | 263 | this.lastTimestampIn = 0; 264 | 
this.lastTimestampOut = 0; 265 | } 266 | 267 | _bound(timestamp, id) { 268 | return { timestamp, id: id ? id : new Uint8Array(0) }; 269 | } 270 | 271 | async initiate() { 272 | if (this.isInitiator) throw Error("already initiated"); 273 | this.isInitiator = true; 274 | 275 | let output = new WrappedBuffer(); 276 | output.extend([ PROTOCOL_VERSION ]); 277 | 278 | await this.splitRange(0, this.storage.size(), this._bound(Number.MAX_VALUE), output); 279 | 280 | return this._renderOutput(output); 281 | } 282 | 283 | setInitiator() { 284 | this.isInitiator = true; 285 | } 286 | 287 | async reconcile(query) { 288 | let haveIds = [], needIds = []; 289 | query = new WrappedBuffer(loadInputBuffer(query)); 290 | 291 | this.lastTimestampIn = this.lastTimestampOut = 0; // reset for each message 292 | 293 | let fullOutput = new WrappedBuffer(); 294 | fullOutput.extend([ PROTOCOL_VERSION ]); 295 | 296 | let protocolVersion = getByte(query); 297 | if (protocolVersion < 0x60 || protocolVersion > 0x6F) throw Error("invalid negentropy protocol version byte"); 298 | if (protocolVersion !== PROTOCOL_VERSION) { 299 | if (this.isInitiator) throw Error("unsupported negentropy protocol version requested: " + (protocolVersion - 0x60)); 300 | else return [this._renderOutput(fullOutput), haveIds, needIds]; 301 | } 302 | 303 | let storageSize = this.storage.size(); 304 | let prevBound = this._bound(0); 305 | let prevIndex = 0; 306 | let skip = false; 307 | 308 | while (query.length !== 0) { 309 | let o = new WrappedBuffer(); 310 | 311 | let doSkip = () => { 312 | if (skip) { 313 | skip = false; 314 | o.extend(this.encodeBound(prevBound)); 315 | o.extend(encodeVarInt(Mode.Skip)); 316 | } 317 | }; 318 | 319 | let currBound = this.decodeBound(query); 320 | let mode = decodeVarInt(query); 321 | 322 | let lower = prevIndex; 323 | let upper = this.storage.findLowerBound(prevIndex, storageSize, currBound); 324 | 325 | if (mode === Mode.Skip) { 326 | skip = true; 327 | } else if (mode === 
Mode.Fingerprint) { 328 | let theirFingerprint = getBytes(query, FINGERPRINT_SIZE); 329 | let ourFingerprint = await this.storage.fingerprint(lower, upper); 330 | 331 | if (compareUint8Array(theirFingerprint, ourFingerprint) !== 0) { 332 | doSkip(); 333 | await this.splitRange(lower, upper, currBound, o); 334 | } else { 335 | skip = true; 336 | } 337 | } else if (mode === Mode.IdList) { 338 | let numIds = decodeVarInt(query); 339 | 340 | let theirElems = {}; // stringified Uint8Array -> original Uint8Array (or hex) 341 | for (let i = 0; i < numIds; i++) { 342 | let e = getBytes(query, ID_SIZE); 343 | if (this.isInitiator) theirElems[e] = e; 344 | } 345 | 346 | if (this.isInitiator) { 347 | skip = true; 348 | 349 | this.storage.iterate(lower, upper, (item) => { 350 | let k = item.id; 351 | 352 | if (!theirElems[k]) { 353 | // ID exists on our side, but not their side 354 | if (this.isInitiator) haveIds.push(this.wantUint8ArrayOutput ? k : uint8ArrayToHex(k)); 355 | } else { 356 | // ID exists on both sides 357 | delete theirElems[k]; 358 | } 359 | 360 | return true; 361 | }); 362 | 363 | for (let v of Object.values(theirElems)) { 364 | // ID exists on their side, but not our side 365 | needIds.push(this.wantUint8ArrayOutput ? 
v : uint8ArrayToHex(v)); 366 | } 367 | } else { 368 | doSkip(); 369 | 370 | let responseIds = new WrappedBuffer(); 371 | let numResponseIds = 0; 372 | let endBound = currBound; 373 | 374 | this.storage.iterate(lower, upper, (item, index) => { 375 | if (this.exceededFrameSizeLimit(fullOutput.length + responseIds.length)) { 376 | endBound = item; 377 | upper = index; // shrink upper so that remaining range gets correct fingerprint 378 | return false; 379 | } 380 | 381 | responseIds.extend(item.id); 382 | numResponseIds++; 383 | return true; 384 | }); 385 | 386 | o.extend(this.encodeBound(endBound)); 387 | o.extend(encodeVarInt(Mode.IdList)); 388 | o.extend(encodeVarInt(numResponseIds)); 389 | o.extend(responseIds); 390 | 391 | fullOutput.extend(o); 392 | o = new WrappedBuffer(); 393 | } 394 | } else { 395 | throw Error("unexpected mode"); 396 | } 397 | 398 | if (this.exceededFrameSizeLimit(fullOutput.length + o.length)) { 399 | // frameSizeLimit exceeded: Stop range processing and return a fingerprint for the remaining range 400 | let remainingFingerprint = await this.storage.fingerprint(upper, storageSize); 401 | 402 | fullOutput.extend(this.encodeBound(this._bound(Number.MAX_VALUE))); 403 | fullOutput.extend(encodeVarInt(Mode.Fingerprint)); 404 | fullOutput.extend(remainingFingerprint); 405 | break; 406 | } else { 407 | fullOutput.extend(o); 408 | } 409 | 410 | prevIndex = upper; 411 | prevBound = currBound; 412 | } 413 | 414 | return [fullOutput.length === 1 && this.isInitiator ? 
null : this._renderOutput(fullOutput), haveIds, needIds]; 415 | } 416 | 417 | async splitRange(lower, upper, upperBound, o) { 418 | let numElems = upper - lower; 419 | let buckets = 16; 420 | 421 | if (numElems < buckets * 2) { 422 | o.extend(this.encodeBound(upperBound)); 423 | o.extend(encodeVarInt(Mode.IdList)); 424 | 425 | o.extend(encodeVarInt(numElems)); 426 | this.storage.iterate(lower, upper, (item) => { 427 | o.extend(item.id); 428 | return true; 429 | }); 430 | } else { 431 | let itemsPerBucket = Math.floor(numElems / buckets); 432 | let bucketsWithExtra = numElems % buckets; 433 | let curr = lower; 434 | 435 | for (let i = 0; i < buckets; i++) { 436 | let bucketSize = itemsPerBucket + (i < bucketsWithExtra ? 1 : 0); 437 | let ourFingerprint = await this.storage.fingerprint(curr, curr + bucketSize); 438 | curr += bucketSize; 439 | 440 | let nextBound; 441 | 442 | if (curr === upper) { 443 | nextBound = upperBound; 444 | } else { 445 | let prevItem, currItem; 446 | 447 | this.storage.iterate(curr - 1, curr + 1, (item, index) => { 448 | if (index === curr - 1) prevItem = item; 449 | else currItem = item; 450 | return true; 451 | }); 452 | 453 | nextBound = this.getMinimalBound(prevItem, currItem); 454 | } 455 | 456 | o.extend(this.encodeBound(nextBound)); 457 | o.extend(encodeVarInt(Mode.Fingerprint)); 458 | o.extend(ourFingerprint); 459 | } 460 | } 461 | } 462 | 463 | _renderOutput(o) { 464 | o = o.unwrap(); 465 | if (!this.wantUint8ArrayOutput) o = uint8ArrayToHex(o); 466 | return o; 467 | } 468 | 469 | exceededFrameSizeLimit(n) { 470 | return this.frameSizeLimit && n > this.frameSizeLimit - 200; 471 | } 472 | 473 | // Decoding 474 | 475 | decodeTimestampIn(encoded) { 476 | let timestamp = decodeVarInt(encoded); 477 | timestamp = timestamp === 0 ? 
Number.MAX_VALUE : timestamp - 1; 478 | if (this.lastTimestampIn === Number.MAX_VALUE || timestamp === Number.MAX_VALUE) { 479 | this.lastTimestampIn = Number.MAX_VALUE; 480 | return Number.MAX_VALUE; 481 | } 482 | timestamp += this.lastTimestampIn; 483 | this.lastTimestampIn = timestamp; 484 | return timestamp; 485 | } 486 | 487 | decodeBound(encoded) { 488 | let timestamp = this.decodeTimestampIn(encoded); 489 | let len = decodeVarInt(encoded); 490 | if (len > ID_SIZE) throw Error("bound key too long"); 491 | let id = getBytes(encoded, len); 492 | return { timestamp, id }; 493 | } 494 | 495 | // Encoding 496 | 497 | encodeTimestampOut(timestamp) { 498 | if (timestamp === Number.MAX_VALUE) { 499 | this.lastTimestampOut = Number.MAX_VALUE; 500 | return encodeVarInt(0); 501 | } 502 | 503 | let temp = timestamp; 504 | timestamp -= this.lastTimestampOut; 505 | this.lastTimestampOut = temp; 506 | return encodeVarInt(timestamp + 1); 507 | } 508 | 509 | encodeBound(key) { 510 | let output = new WrappedBuffer(); 511 | 512 | output.extend(this.encodeTimestampOut(key.timestamp)); 513 | output.extend(encodeVarInt(key.id.length)); 514 | output.extend(key.id); 515 | 516 | return output; 517 | } 518 | 519 | getMinimalBound(prev, curr) { 520 | if (curr.timestamp !== prev.timestamp) { 521 | return this._bound(curr.timestamp); 522 | } else { 523 | let sharedPrefixBytes = 0; 524 | let currKey = curr.id; 525 | let prevKey = prev.id; 526 | 527 | for (let i = 0; i < ID_SIZE; i++) { 528 | if (currKey[i] !== prevKey[i]) break; 529 | sharedPrefixBytes++; 530 | } 531 | 532 | return this._bound(curr.timestamp, curr.id.subarray(0, sharedPrefixBytes + 1)); 533 | } 534 | }; 535 | } 536 | 537 | function loadInputBuffer(inp) { 538 | if (typeof(inp) === 'string') inp = hexToUint8Array(inp); 539 | else if (inp.__proto__ !== Uint8Array.prototype) inp = new Uint8Array(inp); // node Buffer? 
540 | return inp; 541 | } 542 | 543 | function hexToUint8Array(h) { 544 | if (h.startsWith('0x')) h = h.substr(2); 545 | if (h.length % 2 === 1) throw Error("odd length of hex string"); 546 | let arr = new Uint8Array(h.length / 2); 547 | for (let i = 0; i < arr.length; i++) arr[i] = parseInt(h.substr(i * 2, 2), 16); 548 | return arr; 549 | } 550 | 551 | const uint8ArrayToHexLookupTable = new Array(256); 552 | { 553 | const hexAlphabet = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f']; 554 | for (let i = 0; i < 256; i++) { 555 | uint8ArrayToHexLookupTable[i] = hexAlphabet[(i >>> 4) & 0xF] + hexAlphabet[i & 0xF]; 556 | } 557 | } 558 | 559 | function uint8ArrayToHex(arr) { 560 | let out = ''; 561 | for (let i = 0, edx = arr.length; i < edx; i++) { 562 | out += uint8ArrayToHexLookupTable[arr[i]]; 563 | } 564 | return out; 565 | } 566 | 567 | 568 | function compareUint8Array(a, b) { 569 | for (let i = 0; i < a.byteLength; i++) { 570 | if (a[i] < b[i]) return -1; 571 | if (a[i] > b[i]) return 1; 572 | } 573 | 574 | if (a.byteLength > b.byteLength) return 1; 575 | if (a.byteLength < b.byteLength) return -1; 576 | 577 | return 0; 578 | } 579 | 580 | function itemCompare(a, b) { 581 | if (a.timestamp === b.timestamp) { 582 | return compareUint8Array(a.id, b.id); 583 | } 584 | 585 | return a.timestamp - b.timestamp; 586 | } 587 | 588 | 589 | module.exports = { Negentropy, NegentropyStorageVector, }; 590 | -------------------------------------------------------------------------------- /js/README.md: -------------------------------------------------------------------------------- 1 | # Negentropy Javascript Implementation 2 | 3 | The library is contained in a single javascript file. It shouldn't need any dependencies, in either a browser or node.js: 4 | 5 | const Negentropy = require('Negentropy.js'); 6 | 7 | ## Storage 8 | 9 | First, you need to create a storage instance. 
Currently only `Vector` is implemented: 10 | 11 | let storage = new NegentropyStorageVector(); 12 | 13 | Next, add all the items in your collection, and `seal()`: 14 | 15 | for (let item of myItems) { 16 | storage.insert(item.timestamp, item.id); 17 | } 18 | 19 | storage.seal(); 20 | 21 | * `timestamp` should be a JS `Number` 22 | * `id` should be a hex string, `Uint8Array`, or node.js `Buffer` 23 | 24 | ## Reconciliation 25 | 26 | Create a Negentropy object: 27 | 28 | let ne = new Negentropy(storage, 50_000); 29 | 30 | * The second parameter (`50_000` above) is the `frameSizeLimit`. This can be omitted (or `0`) to permit unlimited-sized frames. 31 | 32 | On the client-side, create an initial message, and then transmit it to the server, receive the response, and `reconcile` until complete (signified by returning `null` for `newMsg`): 33 | 34 | let msg = await ne.initiate(); 35 | 36 | while (msg !== null) { 37 | let response = queryServer(msg); 38 | let [newMsg, have, need] = await ne.reconcile(response); 39 | msg = newMsg; 40 | // handle have/need (there may be duplicates from previous calls to reconcile()) 41 | } 42 | 43 | * The output `msg`s and the IDs in the `have`/`need` arrays are hex strings, but you can set `ne.wantUint8ArrayOutput = true` if you want `Uint8Array`s instead. 44 | 45 | The server-side is similar, except it doesn't create an initial message, there are no `have`/`need` arrays, and `newMsg` will never be `null`: 46 | 47 | while (1) { 48 | let msg = receiveMsgFromClient(); 49 | let [newMsg] = await ne.reconcile(msg); 50 | respondToClient(newMsg); 51 | } 52 | 53 | * The `initiate()` and `reconcile()` methods are async because the `crypto.subtle.digest()` browser API is async. 54 | * Timestamp values greater than `Number.MAX_VALUE` will currently cause failures. 
55 | -------------------------------------------------------------------------------- /test/.gitignore: -------------------------------------------------------------------------------- 1 | /negent-test.log 2 | -------------------------------------------------------------------------------- /test/Utils.pm: -------------------------------------------------------------------------------- 1 | package Utils; 2 | 3 | use strict; 4 | 5 | 6 | sub harnessTypeToCmd { 7 | my $harnessType = shift; 8 | 9 | if ($harnessType eq 'cpp') { 10 | return './cpp/harness'; 11 | } elsif ($harnessType eq 'js') { 12 | return 'node js/harness.js'; 13 | } elsif ($harnessType eq 'rust') { 14 | return '../../rust-negentropy/target/debug/harness'; 15 | } elsif ($harnessType eq 'go') { 16 | return 'go run go/harness.go'; 17 | } elsif ($harnessType eq 'go-nostr') { 18 | return "bash -c 'cd go-nostr && go run .'"; 19 | } elsif ($harnessType eq 'csharp') { 20 | return "./csharp/bin/Debug/net8.0/Harness"; 21 | } elsif ($harnessType eq 'kotlin') { 22 | return "kotlin -classpath ../negentropy-kmp/negentropy/build/libs/negentropy-jvm-1.0.0.jar com.vitorpamplona.negentropy.MainKt"; 23 | } 24 | 25 | die "unknown harness type: $harnessType"; 26 | } 27 | 28 | 29 | 1; 30 | -------------------------------------------------------------------------------- /test/cpp/.gitignore: -------------------------------------------------------------------------------- 1 | /harness 2 | /btreeFuzz 3 | /measureSpaceUsage 4 | /lmdbTest 5 | /subRange 6 | 7 | /testdb/ 8 | -------------------------------------------------------------------------------- /test/cpp/Makefile: -------------------------------------------------------------------------------- 1 | W = -Wall 2 | OPT = -g -O2 3 | STD = -std=c++20 4 | CXXFLAGS = $(STD) $(OPT) $(W) -fPIC $(XCXXFLAGS) 5 | INCS = -I../../cpp/ -I./hoytech-cpp/ -I../cpp/vendor/lmdbxx/include/ 6 | 7 | DEPS = ../../cpp/negentropy.h ../../cpp/negentropy/* ../../cpp/negentropy/storage/* 
../../cpp/negentropy/storage/btree/* 8 | 9 | harness: harness.cpp 10 | $(CXX) $(W) $(OPT) $(STD) $(INCS) $< -lcrypto -o $@ 11 | 12 | btreeFuzz: btreeFuzz.cpp 13 | $(CXX) $(W) $(OPT) $(STD) $(INCS) $< -lcrypto -llmdb -o $@ 14 | 15 | lmdbTest: lmdbTest.cpp 16 | $(CXX) $(W) $(OPT) $(STD) $(INCS) $< -lcrypto -llmdb -o $@ 17 | 18 | measureSpaceUsage: measureSpaceUsage.cpp 19 | $(CXX) -DNE_FUZZ_TEST $(W) $(OPT) $(STD) $(INCS) $< -lcrypto -llmdb -o $@ 20 | 21 | subRange: subRange.cpp 22 | $(CXX) -DNE_FUZZ_TEST $(W) $(OPT) $(STD) $(INCS) $< -lcrypto -o $@ 23 | 24 | 25 | .PHONY: all clean 26 | 27 | all: harness btreeFuzz lmdbTest measureSpaceUsage subRange 28 | 29 | clean: 30 | rm -f harness btreeFuzz lmdbTest measureSpaceUsage 31 | -------------------------------------------------------------------------------- /test/cpp/btreeFuzz.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include 5 | #include 6 | #include 7 | 8 | #include 9 | #include 10 | 11 | #include "negentropy.h" 12 | #include "negentropy/storage/BTreeLMDB.h" 13 | #include "negentropy/storage/BTreeMem.h" 14 | #include "negentropy/storage/btree/debug.h" 15 | 16 | 17 | 18 | 19 | struct Verifier { 20 | bool isLMDB; 21 | 22 | std::set addedTimestamps; 23 | 24 | Verifier(bool isLMDB) : isLMDB(isLMDB) {} 25 | 26 | void insert(negentropy::storage::btree::BTreeCore &btree, uint64_t timestamp){ 27 | negentropy::Item item(timestamp, std::string(32, (unsigned char)(timestamp % 256))); 28 | btree.insertItem(item); 29 | addedTimestamps.insert(timestamp); 30 | doVerify(btree); 31 | } 32 | 33 | void erase(negentropy::storage::btree::BTreeCore &btree, uint64_t timestamp){ 34 | negentropy::Item item(timestamp, std::string(32, (unsigned char)(timestamp % 256))); 35 | btree.eraseItem(item); 36 | addedTimestamps.erase(timestamp); 37 | doVerify(btree); 38 | } 39 | 40 | void doVerify(negentropy::storage::btree::BTreeCore &btree) { 41 | try { 42 | 
negentropy::storage::btree::verify(btree, isLMDB); 43 | } catch (...) { 44 | std::cout << "TREE FAILED INVARIANTS:" << std::endl; 45 | negentropy::storage::btree::dump(btree); 46 | throw; 47 | } 48 | 49 | if (btree.size() != addedTimestamps.size()) throw negentropy::err("verify size mismatch"); 50 | auto iter = addedTimestamps.begin(); 51 | 52 | btree.iterate(0, btree.size(), [&](const auto &item, size_t i) { 53 | if (item.timestamp != *iter) throw negentropy::err("verify element mismatch"); 54 | iter = std::next(iter); 55 | return true; 56 | }); 57 | } 58 | }; 59 | 60 | 61 | 62 | 63 | 64 | void doFuzz(negentropy::storage::btree::BTreeCore &btree, Verifier &v) { 65 | if (btree.size() != 0) throw negentropy::err("expected empty tree"); 66 | 67 | 68 | // Verify return values 69 | 70 | if (!btree.insert(100, std::string(32, '\x01'))) throw negentropy::err("didn't insert element?"); 71 | if (btree.insert(100, std::string(32, '\x01'))) throw negentropy::err("double inserted element?"); 72 | if (!btree.erase(100, std::string(32, '\x01'))) throw negentropy::err("didn't erase element?"); 73 | if (btree.erase(100, std::string(32, '\x01'))) throw negentropy::err("erased non-existing element?"); 74 | 75 | 76 | // Fuzz test: Insertion phase 77 | 78 | while (btree.size() < 5000) { 79 | if (rand() % 3 <= 1) { 80 | int timestamp; 81 | 82 | do { 83 | timestamp = rand(); 84 | } while (v.addedTimestamps.contains(timestamp)); 85 | 86 | std::cout << "INSERT " << timestamp << " size = " << btree.size() << std::endl; 87 | v.insert(btree, timestamp); 88 | } else if (v.addedTimestamps.size()) { 89 | auto it = v.addedTimestamps.begin(); 90 | std::advance(it, rand() % v.addedTimestamps.size()); 91 | 92 | std::cout << "DEL " << (*it) << std::endl; 93 | v.erase(btree, *it); 94 | } 95 | } 96 | 97 | // Fuzz test: Removal phase 98 | 99 | std::cout << "REMOVING ALL" << std::endl; 100 | 101 | while (btree.size()) { 102 | auto it = v.addedTimestamps.begin(); 103 | std::advance(it, rand() % 
v.addedTimestamps.size()); 104 | auto timestamp = *it; 105 | 106 | std::cout << "DEL " << timestamp << " size = " << btree.size() << std::endl; 107 | v.erase(btree, *it); 108 | } 109 | } 110 | 111 | 112 | 113 | int main() { 114 | std::cout << "SIZEOF NODE: " << sizeof(negentropy::storage::Node) << std::endl; 115 | 116 | 117 | srand(0); 118 | 119 | 120 | if (::getenv("NE_FUZZ_LMDB")) { 121 | system("mkdir -p testdb/"); 122 | system("rm -f testdb/*"); 123 | 124 | auto env = lmdb::env::create(); 125 | env.set_max_dbs(64); 126 | env.set_mapsize(1'000'000'000ULL); 127 | env.open("testdb/", 0); 128 | 129 | auto txn = lmdb::txn::begin(env); 130 | auto btreeDbi = negentropy::storage::BTreeLMDB::setupDB(txn, "test-data"); 131 | 132 | negentropy::storage::BTreeLMDB btree(txn, btreeDbi, 0); 133 | 134 | Verifier v(true); 135 | doFuzz(btree, v); 136 | 137 | btree.flush(); 138 | txn.commit(); 139 | } else { 140 | Verifier v(false); 141 | negentropy::storage::BTreeMem btree; 142 | doFuzz(btree, v); 143 | } 144 | 145 | 146 | std::cout << "OK" << std::endl; 147 | 148 | return 0; 149 | } 150 | -------------------------------------------------------------------------------- /test/cpp/check.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | make clean 4 | make -j all 5 | 6 | ./btreeFuzz 7 | NE_FUZZ_LMDB=1 ./btreeFuzz 8 | ./lmdbTest 9 | ./subRange 10 | -------------------------------------------------------------------------------- /test/cpp/harness.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | #include 6 | #include 7 | 8 | #include "negentropy.h" 9 | #include "negentropy/storage/BTreeMem.h" 10 | #include "negentropy/storage/Vector.h" 11 | 12 | 13 | 14 | std::vector split(const std::string &s, char delim) { 15 | std::vector result; 16 | std::stringstream ss (s); 17 | std::string item; 18 | 19 | while (getline (ss, item, delim)) { 20 | 
result.push_back (item); 21 | } 22 | 23 | return result; 24 | } 25 | 26 | 27 | 28 | int main() { 29 | uint64_t frameSizeLimit = 0; 30 | if (::getenv("FRAMESIZELIMIT")) frameSizeLimit = std::stoull(::getenv("FRAMESIZELIMIT")); 31 | 32 | negentropy::storage::Vector storage; 33 | std::unique_ptr> ne; 34 | 35 | std::string line; 36 | while (std::cin) { 37 | std::getline(std::cin, line); 38 | if (!line.size()) continue; 39 | 40 | auto items = split(line, ','); 41 | 42 | if (items[0] == "item") { 43 | if (items.size() != 3) throw hoytech::error("wrong num of fields"); 44 | uint64_t created = std::stoull(items[1]); 45 | auto id = hoytech::from_hex(items[2]); 46 | storage.insert(created, id); 47 | } else if (items[0] == "seal") { 48 | storage.seal(); 49 | ne = std::make_unique>(storage, frameSizeLimit); 50 | } else if (items[0] == "initiate") { 51 | auto q = ne->initiate(); 52 | if (frameSizeLimit && q.size() > frameSizeLimit) throw hoytech::error("initiate frameSizeLimit exceeded: ", q.size(), " > ", frameSizeLimit); 53 | std::cout << "msg," << hoytech::to_hex(q) << std::endl; 54 | } else if (items[0] == "msg") { 55 | std::string q; 56 | if (items.size() >= 2) q = hoytech::from_hex(items[1]); 57 | 58 | if (ne->isInitiator) { 59 | std::vector have, need; 60 | auto resp = ne->reconcile(q, have, need); 61 | 62 | for (auto &id : have) std::cout << "have," << hoytech::to_hex(id) << "\n"; 63 | for (auto &id : need) std::cout << "need," << hoytech::to_hex(id) << "\n"; 64 | 65 | if (!resp) { 66 | std::cout << "done" << std::endl; 67 | continue; 68 | } 69 | 70 | q = *resp; 71 | } else { 72 | q = ne->reconcile(q); 73 | } 74 | 75 | if (frameSizeLimit && q.size() > frameSizeLimit) throw hoytech::error("frameSizeLimit exceeded: ", q.size(), " > ", frameSizeLimit, ": from ", (ne->isInitiator ? 
"initiator" : "non-initiator")); 76 | std::cout << "msg," << hoytech::to_hex(q) << std::endl; 77 | } else { 78 | throw hoytech::error("unknown cmd: ", items[0]); 79 | } 80 | } 81 | 82 | return 0; 83 | } 84 | -------------------------------------------------------------------------------- /test/cpp/lmdbTest.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include 5 | #include 6 | #include 7 | 8 | #include 9 | #include 10 | 11 | #include "negentropy.h" 12 | #include "negentropy/storage/BTreeLMDB.h" 13 | #include "negentropy/storage/BTreeMem.h" 14 | #include "negentropy/storage/btree/debug.h" 15 | #include "negentropy/storage/Vector.h" 16 | 17 | 18 | 19 | 20 | 21 | int main() { 22 | system("mkdir -p testdb/"); 23 | system("rm -f testdb/*"); 24 | 25 | auto env = lmdb::env::create(); 26 | env.set_max_dbs(64); 27 | env.open("testdb/", 0); 28 | 29 | 30 | lmdb::dbi btreeDbi; 31 | 32 | { 33 | auto txn = lmdb::txn::begin(env); 34 | btreeDbi = negentropy::storage::BTreeLMDB::setupDB(txn, "test-data"); 35 | txn.commit(); 36 | } 37 | 38 | negentropy::storage::Vector vec; 39 | 40 | 41 | auto packId = [](uint64_t n){ 42 | auto o = std::string(32, '\0'); 43 | memcpy((char*)o.data(), (char*)&n, sizeof(n)); 44 | return o; 45 | }; 46 | 47 | auto unpackId = [](std::string_view n){ 48 | if (n.size() != 32) throw hoytech::error("too short to unpack"); 49 | return *(uint64_t*)n.data(); 50 | }; 51 | 52 | 53 | { 54 | auto txn = lmdb::txn::begin(env); 55 | negentropy::storage::BTreeLMDB btree(txn, btreeDbi, 300); 56 | 57 | auto add = [&](uint64_t timestamp){ 58 | negentropy::Item item(timestamp, packId(timestamp)); 59 | btree.insertItem(item); 60 | vec.insertItem(item); 61 | }; 62 | 63 | for (size_t i = 1000; i < 2000; i += 2) add(i); 64 | 65 | btree.flush(); 66 | 67 | txn.commit(); 68 | } 69 | 70 | vec.seal(); 71 | 72 | 73 | 74 | 75 | { 76 | auto txn = lmdb::txn::begin(env, 0, MDB_RDONLY); 77 | 
negentropy::storage::BTreeLMDB btree(txn, btreeDbi, 300); 78 | //negentropy::storage::btree::dump(btree); 79 | negentropy::storage::btree::verify(btree, true); 80 | } 81 | 82 | 83 | 84 | // Identical 85 | 86 | { 87 | auto txn = lmdb::txn::begin(env, 0, MDB_RDONLY); 88 | negentropy::storage::BTreeLMDB btree(txn, btreeDbi, 300); 89 | 90 | auto ne1 = Negentropy(vec); 91 | auto ne2 = Negentropy(btree); 92 | 93 | auto q = ne1.initiate(); 94 | 95 | std::string q2 = ne2.reconcile(q); 96 | 97 | std::vector have, need; 98 | auto q3 = ne1.reconcile(q2, have, need); 99 | if (q3 || have.size() || need.size()) throw hoytech::error("bad reconcile 1"); 100 | } 101 | 102 | 103 | // Make some modifications 104 | 105 | { 106 | auto txn = lmdb::txn::begin(env); 107 | negentropy::storage::BTreeLMDB btree(txn, btreeDbi, 300); 108 | 109 | btree.erase(1044, packId(1044)); 110 | btree.erase(1838, packId(1838)); 111 | 112 | btree.insert(1555, packId(1555)); 113 | btree.insert(99999, packId(99999)); 114 | 115 | btree.flush(); 116 | txn.commit(); 117 | } 118 | 119 | 120 | // Reconcile again 121 | 122 | { 123 | auto txn = lmdb::txn::begin(env, 0, MDB_RDONLY); 124 | negentropy::storage::BTreeLMDB btree(txn, btreeDbi, 300); 125 | 126 | auto ne1 = Negentropy(vec); 127 | auto ne2 = Negentropy(btree); 128 | 129 | std::vector allHave, allNeed; 130 | 131 | std::string msg = ne1.initiate(); 132 | 133 | while (true) { 134 | std::string response = ne2.reconcile(msg); 135 | 136 | std::vector have, need; 137 | auto newMsg = ne1.reconcile(response, have, need); 138 | 139 | for (const auto &id : have) allHave.push_back(unpackId(id)); 140 | for (const auto &id : need) allNeed.push_back(unpackId(id)); 141 | 142 | if (!newMsg) break; // done 143 | msg = *newMsg; 144 | } 145 | 146 | std::sort(allHave.begin(), allHave.end()); 147 | std::sort(allNeed.begin(), allNeed.end()); 148 | 149 | if (allHave != std::vector({ 1044, 1838 })) throw hoytech::error("bad allHave"); 150 | if (allNeed != std::vector({ 1555, 99999 
})) throw hoytech::error("bad allNeed"); 151 | } 152 | 153 | 154 | std::cout << "OK" << std::endl; 155 | 156 | return 0; 157 | } 158 | -------------------------------------------------------------------------------- /test/cpp/measureSpaceUsage.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include 5 | #include 6 | 7 | #include 8 | #include 9 | 10 | #include "negentropy.h" 11 | #include "negentropy/storage/BTreeLMDB.h" 12 | #include "negentropy/storage/BTreeMem.h" 13 | #include "negentropy/storage/btree/debug.h" 14 | #include "negentropy/storage/Vector.h" 15 | 16 | 17 | 18 | 19 | 20 | 21 | int main() { 22 | system("mkdir -p testdb/"); 23 | system("rm -f testdb/*"); 24 | 25 | auto env = lmdb::env::create(); 26 | env.set_max_dbs(64); 27 | env.set_mapsize(1'000'000'000ULL); 28 | env.open("testdb/", 0); 29 | 30 | 31 | lmdb::dbi btreeDbi; 32 | 33 | { 34 | auto txn = lmdb::txn::begin(env); 35 | btreeDbi = negentropy::storage::BTreeLMDB::setupDB(txn, "test-data"); 36 | txn.commit(); 37 | } 38 | 39 | 40 | { 41 | auto txn = lmdb::txn::begin(env); 42 | negentropy::storage::BTreeLMDB btree(txn, btreeDbi, 300); 43 | 44 | auto add = [&](uint64_t timestamp){ 45 | negentropy::Item item(timestamp, std::string(32, '\x01')); 46 | btree.insertItem(item); 47 | }; 48 | 49 | for (size_t i = 1; i < 100'000; i++) add(i); 50 | 51 | btree.flush(); 52 | txn.commit(); 53 | } 54 | 55 | { 56 | auto txn = lmdb::txn::begin(env, 0, MDB_RDONLY); 57 | negentropy::storage::BTreeLMDB btree(txn, btreeDbi, 300); 58 | 59 | auto cursor = lmdb::cursor::open(txn, btreeDbi); 60 | 61 | std::string_view key, val; 62 | size_t minStart = negentropy::MAX_U64; 63 | size_t maxEnd = 0; 64 | 65 | if (cursor.get(key, val, MDB_FIRST)) { 66 | do { 67 | size_t ptrStart = (size_t)val.data(); 68 | size_t ptrEnd = ptrStart + sizeof(negentropy::storage::btree::Node); 69 | if (ptrStart < minStart) minStart = ptrStart; 70 | if (ptrEnd > maxEnd) maxEnd = 
ptrEnd; 71 | } while (cursor.get(key, val, MDB_NEXT)); 72 | } 73 | 74 | std::cout << "data," << negentropy::storage::btree::MAX_ITEMS << "," << sizeof(negentropy::storage::btree::Node) << "," << (maxEnd - minStart) << std::endl; 75 | } 76 | 77 | return 0; 78 | } 79 | -------------------------------------------------------------------------------- /test/cpp/measureSpaceUsage.pl: -------------------------------------------------------------------------------- 1 | system(qq{ perl -pi -e 's/MIN_ITEMS = \\d+/MIN_ITEMS = 2/' ../../cpp/negentropy/storage/btree/core.h }); 2 | system(qq{ perl -pi -e 's/REBALANCE_THRESHOLD = \\d+/REBALANCE_THRESHOLD = 4/' ../../cpp/negentropy/storage/btree/core.h }); 3 | system(qq{ perl -pi -e 's/MAX_ITEMS = \\d+/MAX_ITEMS = 6/' ../../cpp/negentropy/storage/btree/core.h }); 4 | 5 | for (my $i = 6; $i < 128; $i += 2) { 6 | print "DOING ITER $i\n"; 7 | system(qq{ perl -pi -e 's/MAX_ITEMS = \\d+/MAX_ITEMS = $i/' ../../cpp/negentropy/storage/btree/core.h }); 8 | system("rm -f measureSpaceUsage && make measureSpaceUsage && rm -f testdb/data.mdb && ./measureSpaceUsage >> measureSpaceUsage.log"); 9 | } 10 | -------------------------------------------------------------------------------- /test/cpp/subRange.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include 5 | 6 | #include 7 | #include 8 | 9 | #include "negentropy.h" 10 | #include "negentropy/storage/Vector.h" 11 | #include "negentropy/storage/BTreeMem.h" 12 | #include "negentropy/storage/SubRange.h" 13 | 14 | 15 | 16 | std::string sha256(std::string_view input) { 17 | unsigned char hash[SHA256_DIGEST_LENGTH]; 18 | SHA256(reinterpret_cast(input.data()), input.size(), hash); 19 | return std::string((const char*)&hash[0], SHA256_DIGEST_LENGTH); 20 | } 21 | 22 | std::string uintToId(uint64_t id) { 23 | return sha256(std::string((char*)&id, 8)); 24 | } 25 | 26 | 27 | template 28 | void testSubRange() { 29 | T vecBig; 30 | T 
vecSmall; 31 | 32 | for (size_t i = 0; i < 1000; i++) { 33 | vecBig.insert(100 + i, uintToId(i)); 34 | } 35 | 36 | for (size_t i = 400; i < 600; i++) { 37 | vecSmall.insert(100 + i, uintToId(i)); 38 | } 39 | 40 | vecBig.seal(); 41 | vecSmall.seal(); 42 | 43 | negentropy::storage::SubRange subRange(vecBig, negentropy::Bound(100 + 400), negentropy::Bound(100 + 600)); 44 | 45 | if (vecSmall.size() != subRange.size()) throw hoytech::error("size mismatch"); 46 | 47 | if (vecSmall.fingerprint(0, vecSmall.size()).sv() != subRange.fingerprint(0, subRange.size()).sv()) throw hoytech::error("fingerprint mismatch"); 48 | 49 | if (vecSmall.getItem(10) != subRange.getItem(10)) throw hoytech::error("getItem mismatch"); 50 | if (vecBig.getItem(400 + 10) != subRange.getItem(10)) throw hoytech::error("getItem mismatch"); 51 | 52 | { 53 | auto lb = subRange.findLowerBound(0, subRange.size(), negentropy::Bound(550)); 54 | auto lb2 = vecSmall.findLowerBound(0, vecSmall.size(), negentropy::Bound(550)); 55 | if (lb != lb2) throw hoytech::error("findLowerBound mismatch"); 56 | } 57 | 58 | { 59 | auto lb = subRange.findLowerBound(0, subRange.size(), negentropy::Bound(20)); 60 | auto lb2 = vecSmall.findLowerBound(0, vecSmall.size(), negentropy::Bound(20)); 61 | if (lb != lb2) throw hoytech::error("findLowerBound mismatch"); 62 | } 63 | 64 | { 65 | auto lb = subRange.findLowerBound(0, subRange.size(), negentropy::Bound(5000)); 66 | auto lb2 = vecSmall.findLowerBound(0, vecSmall.size(), negentropy::Bound(5000)); 67 | if (lb != lb2) throw hoytech::error("findLowerBound mismatch"); 68 | } 69 | } 70 | 71 | 72 | 73 | template 74 | void testSync(bool emptySide1, bool emptySide2) { 75 | T vecBig; 76 | T vecSmall; 77 | 78 | std::set expectedHave; 79 | std::set expectedNeed; 80 | 81 | size_t const lowerLimit = 20'000; 82 | size_t const upperLimit = 90'000; 83 | 84 | for (size_t i = lowerLimit; i < upperLimit; i++) { 85 | auto id = uintToId(i); 86 | if (emptySide1 || i % 15'000 == 0) { 87 | if (i >= 
lowerLimit && i < upperLimit) expectedNeed.insert(id); 88 | continue; 89 | } 90 | vecSmall.insert(100 + i, id); 91 | } 92 | 93 | for (size_t i = 0; i < 100'000; i++) { 94 | auto id = uintToId(i); 95 | if (emptySide2 || i % 22'000 == 0) { 96 | if (i >= lowerLimit && i < upperLimit) expectedHave.insert(id); 97 | continue; 98 | } 99 | vecBig.insert(100 + i, id); 100 | } 101 | 102 | // Get rid of common 103 | 104 | std::set commonItems; 105 | 106 | for (const auto &item : expectedHave) { 107 | if (expectedNeed.contains(item)) commonItems.insert(item); 108 | } 109 | 110 | for (const auto &item : commonItems) { 111 | expectedHave.erase(item); 112 | expectedNeed.erase(item); 113 | } 114 | 115 | 116 | vecBig.seal(); 117 | vecSmall.seal(); 118 | 119 | negentropy::storage::SubRange subRange(vecBig, negentropy::Bound(100 + lowerLimit), negentropy::Bound(100 + upperLimit)); 120 | 121 | 122 | auto ne1 = Negentropy(vecSmall, 20'000); 123 | auto ne2 = Negentropy(subRange, 20'000); 124 | 125 | std::string msg = ne1.initiate(); 126 | 127 | while (true) { 128 | msg = ne2.reconcile(msg); 129 | 130 | std::vector have, need; 131 | auto newMsg = ne1.reconcile(msg, have, need); 132 | 133 | for (const auto &item : have) { 134 | if (!expectedHave.contains(item)) throw hoytech::error("unexpected have: ", hoytech::to_hex(item)); 135 | expectedHave.erase(item); 136 | } 137 | 138 | for (const auto &item : need) { 139 | if (!expectedNeed.contains(item)) throw hoytech::error("unexpected need: ", hoytech::to_hex(item)); 140 | expectedNeed.erase(item); 141 | } 142 | 143 | if (!newMsg) break; 144 | else std::swap(msg, *newMsg); 145 | } 146 | 147 | if (expectedHave.size()) throw hoytech::error("missed have"); 148 | if (expectedNeed.size()) throw hoytech::error("missed need"); 149 | } 150 | 151 | 152 | 153 | 154 | int main() { 155 | testSubRange(); 156 | testSubRange(); 157 | 158 | testSync(false, false); 159 | testSync(true, false); 160 | testSync(false, true); 161 | 162 | std::cout << "OK" << 
std::endl; 163 | 164 | return 0; 165 | } 166 | -------------------------------------------------------------------------------- /test/csharp/.gitignore: -------------------------------------------------------------------------------- 1 | bin/ 2 | obj/ -------------------------------------------------------------------------------- /test/csharp/Harness.csproj: -------------------------------------------------------------------------------- 1 |  2 | 3 | 4 | Exe 5 | net8.0 6 | enable 7 | enable 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | -------------------------------------------------------------------------------- /test/csharp/Program.cs: -------------------------------------------------------------------------------- 1 |  2 | using Negentropy; 3 | using System; 4 | using System.Diagnostics; 5 | 6 | var limit = Environment.GetEnvironmentVariable("FRAMESIZELIMIT"); 7 | 8 | var options = new NegentropyOptions 9 | { 10 | FrameSizeLimit = string.IsNullOrEmpty(limit) ? 0 : uint.Parse(limit) 11 | }; 12 | 13 | var builder = new NegentropyBuilder(options); 14 | Negentropy.Negentropy? ne = null; 15 | 16 | while (true) 17 | { 18 | var line = Console.ReadLine(); 19 | 20 | if (line == null) 21 | { 22 | continue; 23 | } 24 | 25 | var items = line.Split(","); 26 | 27 | switch (items[0]) 28 | { 29 | case "item": 30 | if (items.Length != 3) throw new ArgumentException("Too few items for 'item'"); 31 | 32 | var created = long.Parse(items[1]); 33 | var id = items[2].Trim(); 34 | var item = new StorageItem(id, created); 35 | 36 | builder.Add(item); 37 | 38 | break; 39 | case "seal": 40 | ne = builder.Build(); 41 | break; 42 | case "initiate": 43 | var q = ne?.Initiate() ?? 
""; 44 | if (options.FrameSizeLimit > 0 && q.Length / 2 > options.FrameSizeLimit) throw new InvalidOperationException("FrameSizeLimit exceeded"); 45 | Console.WriteLine($"msg,{q}"); 46 | break; 47 | case "msg": 48 | var newQ = items[1]; 49 | 50 | if (ne == null) throw new InvalidOperationException("Negentropy not initialized"); 51 | 52 | var result = ne.Reconcile(newQ); 53 | 54 | newQ = result.Query; 55 | 56 | foreach (var x in result.HaveIds) Console.WriteLine($"have,{x}"); 57 | foreach (var x in result.NeedIds) Console.WriteLine($"need,{x}"); 58 | 59 | if (options.FrameSizeLimit > 0 && newQ.Length / 2 > options.FrameSizeLimit) throw new InvalidOperationException("FrameSizeLimit exceeded"); 60 | 61 | if (newQ == string.Empty) 62 | { 63 | Console.WriteLine("done"); 64 | } 65 | else 66 | { 67 | Console.WriteLine($"msg,{newQ}"); 68 | } 69 | break; 70 | default: 71 | return; 72 | } 73 | } 74 | 75 | record StorageItem(string Id, long Timestamp) : INegentropyItem { } -------------------------------------------------------------------------------- /test/csharp/README.md: -------------------------------------------------------------------------------- 1 | # Negentropy C# Implementation 2 | 3 | Available as a nuget package, repo available [here](https://github.com/bezysoftware/negentropy.net). 4 | 5 | ## Running tests 6 | 7 | * Install .NET 8, instructions [here](https://dotnet.microsoft.com/en-us/download/dotnet/8.0). 8 | * Run `dotnet build` 9 | * Run tests, e.g. 
`perl test.pl csharp,js` from the `test` directory -------------------------------------------------------------------------------- /test/fuzz.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env perl 2 | 3 | use strict; 4 | $|++; 5 | 6 | use IPC::Open2; 7 | use Session::Token; 8 | use FindBin; 9 | use lib "$FindBin::Bin"; 10 | use Utils; 11 | 12 | die "usage: $0 " if @ARGV < 2; 13 | my $harnessCmd1 = Utils::harnessTypeToCmd(shift) || die "please provide harness type (cpp, js, etc)"; 14 | my $harnessCmd2 = Utils::harnessTypeToCmd(shift) || die "please provide harness type (cpp, js, etc)"; 15 | 16 | my $idSize = 32; 17 | srand($ENV{SEED} || 0); 18 | my $stgen = Session::Token->new(seed => "\x00" x 1024, alphabet => '0123456789abcdef', length => $idSize * 2); 19 | 20 | 21 | my $minRecs = $ENV{MIN_RECS} // 1; 22 | my $maxRecs = $ENV{MAX_RECS} // 10_000; 23 | die "MIN_RECS > MAX_RECS" if $minRecs > $maxRecs; 24 | $minRecs = $maxRecs = $ENV{RECS} if $ENV{RECS}; 25 | 26 | my $prob1 = $ENV{P1} // 1; 27 | my $prob2 = $ENV{P2} // 1; 28 | my $prob3 = $ENV{P3} // 98; 29 | 30 | { 31 | my $total = $prob1 + $prob2 + $prob3; 32 | die "zero prob" if $total == 0; 33 | $prob1 = $prob1 / $total; 34 | $prob2 = $prob2 / $total; 35 | $prob3 = $prob3 / $total; 36 | } 37 | 38 | 39 | my $numSegs = $ENV{NUM_SEGS} // 50000; 40 | my $recsPerSeg = $ENV{RECS_PER_SEG} // 50; 41 | 42 | 43 | my $ids1 = {}; 44 | my $ids2 = {}; 45 | 46 | my ($pid1, $pid2); 47 | my ($infile1, $infile2); 48 | my ($outfile1, $outfile2); 49 | 50 | { 51 | local $ENV{FRAMESIZELIMIT}; 52 | $ENV{FRAMESIZELIMIT} = $ENV{FRAMESIZELIMIT1} if defined $ENV{FRAMESIZELIMIT1}; 53 | $pid1 = open2($outfile1, $infile1, $harnessCmd1); 54 | } 55 | 56 | { 57 | local $ENV{FRAMESIZELIMIT}; 58 | $ENV{FRAMESIZELIMIT} = $ENV{FRAMESIZELIMIT2} if defined $ENV{FRAMESIZELIMIT2}; 59 | $pid2 = open2($outfile2, $infile2, $harnessCmd2); 60 | } 61 | 62 | 63 | if ($ENV{SET1} && $ENV{SET2}) { 64 | my $cb 
= sub { 65 | my ($filename, $ids, $infile) = @_; 66 | 67 | open(my $fh, '<', $filename) || die "unable to open $filename: $!"; 68 | while (<$fh>) { 69 | die "unparseable line: $_" unless /^(?:item,|)(\d+),(\w{64})$/; 70 | my ($created, $id) = ($1, $2); 71 | die "duplicate line: $_" if $ids->{$id}; 72 | print $infile "item,$created,$id\n"; 73 | $ids->{$id} = 1; 74 | } 75 | }; 76 | 77 | $cb->($ENV{SET1}, $ids1, $infile1); 78 | $cb->($ENV{SET2}, $ids2, $infile2); 79 | } elsif ($ENV{CLUSTERED}) { 80 | my $segments = rnd($numSegs); 81 | my $curr = 0; 82 | 83 | for (1..$segments) { 84 | my $num = rnd($recsPerSeg) + 1; 85 | 86 | my $modeRnd = rand(); 87 | 88 | for (1..$num) { 89 | my $created = 1677970534 + $curr++; 90 | my $id = $stgen->get; 91 | 92 | if ($modeRnd < $prob1) { 93 | print $infile1 "item,$created,$id\n"; 94 | $ids1->{$id} = 1; 95 | } elsif ($modeRnd < $prob1 + $prob2) { 96 | print $infile2 "item,$created,$id\n"; 97 | $ids2->{$id} = 1; 98 | } else { 99 | print $infile1 "item,$created,$id\n"; 100 | print $infile2 "item,$created,$id\n"; 101 | $ids1->{$id} = 1; 102 | $ids2->{$id} = 1; 103 | } 104 | } 105 | } 106 | } else { 107 | my $num = $minRecs + rnd($maxRecs - $minRecs); 108 | 109 | for (1..$num) { 110 | my $created = 1677970534 + rnd($num); 111 | my $id = $stgen->get; 112 | 113 | my $modeRnd = rand(); 114 | 115 | if ($modeRnd < $prob1) { 116 | print $infile1 "item,$created,$id\n"; 117 | $ids1->{$id} = 1; 118 | } elsif ($modeRnd < $prob1 + $prob2) { 119 | print $infile2 "item,$created,$id\n"; 120 | $ids2->{$id} = 1; 121 | } else { 122 | print $infile1 "item,$created,$id\n"; 123 | print $infile2 "item,$created,$id\n"; 124 | $ids1->{$id} = 1; 125 | $ids2->{$id} = 1; 126 | } 127 | } 128 | } 129 | 130 | print $infile1 "seal\n"; 131 | print $infile2 "seal\n"; 132 | 133 | print $infile1 "initiate\n"; 134 | 135 | 136 | my $round = 0; 137 | my $totalUp = 0; 138 | my $totalDown = 0; 139 | my $optimalUp = 0; 140 | my $optimalDown = 0; 141 | 142 | while (1) { 143 | my 
$msg = <$outfile1>; 144 | chomp $msg; 145 | print "[1]: $msg\n" if $ENV{DEBUG}; 146 | 147 | if ($msg =~ /^(have|need),(\w+)/) { 148 | my ($action, $id) = ($1, $2); 149 | 150 | if ($action eq 'need') { 151 | die "duplicate insert of $action,$id" if $ids1->{$id} && $ENV{NODUPS}; 152 | $optimalDown += 32 if !$ids1->{$id}; 153 | $ids1->{$id} = 1; 154 | } elsif ($action eq 'have') { 155 | die "duplicate insert of $action,$id" if $ids2->{$id} && $ENV{NODUPS}; 156 | $optimalUp += 32 if !$ids2->{$id}; 157 | $ids2->{$id} = 1; 158 | } 159 | 160 | next; 161 | } elsif ($msg =~ /^msg,(\w*)/) { 162 | my $data = $1; 163 | print "DELTATRACE1 $data\n" if $ENV{DELTATRACE}; 164 | print $infile2 "msg,$data\n"; 165 | 166 | my $bytes = length($data) / 2; 167 | $totalUp += $bytes; 168 | print "[$round] CLIENT -> SERVER: $bytes bytes\n"; 169 | } elsif ($msg =~ /^done/) { 170 | last; 171 | } else { 172 | die "unexpected line from 1: '$msg'"; 173 | } 174 | 175 | $msg = <$outfile2>; 176 | print "[2]: $msg\n" if $ENV{DEBUG}; 177 | 178 | if ($msg =~ /^msg,(\w*)/) { 179 | my $data = $1; 180 | print "DELTATRACE2 $data\n" if $ENV{DELTATRACE}; 181 | print $infile1 "msg,$data\n"; 182 | 183 | my $bytes = length($data) / 2; 184 | $totalDown += $bytes; 185 | print "[$round] SERVER -> CLIENT: $bytes bytes\n"; 186 | } else { 187 | die "unexpected line from 2: $msg"; 188 | } 189 | 190 | $round++; 191 | } 192 | 193 | kill 'KILL', $pid1, $pid2; 194 | 195 | for my $id (keys %$ids1) { 196 | die "$id not in ids2" if !$ids2->{$id}; 197 | } 198 | 199 | for my $id (keys %$ids2) { 200 | die "$id not in ids1" if !$ids1->{$id}; 201 | } 202 | 203 | 204 | sub renderOverhead { 205 | my $total = shift; 206 | my $optimal = shift; 207 | 208 | return '∞' if $optimal == 0; 209 | return sprintf("%.2f%%", ($total / $optimal - 1) * 100); 210 | } 211 | 212 | my $upOverhead = renderOverhead($totalUp, $optimalUp); 213 | my $downOverhead = renderOverhead($totalDown, $optimalDown); 214 | 215 | 216 | print "UP: $totalUp bytes 
($upOverhead overhead), DOWN: $totalDown bytes ($downOverhead overhead)\n"; 217 | 218 | print "\n-----------OK-----------\n"; 219 | 220 | 221 | sub rnd { 222 | my $n = shift; 223 | return int(rand() * $n); 224 | } 225 | -------------------------------------------------------------------------------- /test/go-nostr/go.mod: -------------------------------------------------------------------------------- 1 | module go-nostr-negentropy-harness 2 | 3 | go 1.23.0 4 | 5 | require github.com/nbd-wtf/go-nostr v0.37.2 6 | 7 | require ( 8 | github.com/btcsuite/btcd/btcec/v2 v2.3.4 // indirect 9 | github.com/btcsuite/btcd/chaincfg/chainhash v1.1.0 // indirect 10 | github.com/decred/dcrd/crypto/blake256 v1.1.0 // indirect 11 | github.com/decred/dcrd/dcrec/secp256k1/v4 v4.3.0 // indirect 12 | github.com/gobwas/httphead v0.1.0 // indirect 13 | github.com/gobwas/pool v0.2.1 // indirect 14 | github.com/gobwas/ws v1.4.0 // indirect 15 | github.com/josharian/intern v1.0.0 // indirect 16 | github.com/mailru/easyjson v0.7.7 // indirect 17 | github.com/puzpuzpuz/xsync/v3 v3.4.0 // indirect 18 | github.com/tidwall/gjson v1.17.3 // indirect 19 | github.com/tidwall/match v1.1.1 // indirect 20 | github.com/tidwall/pretty v1.2.1 // indirect 21 | golang.org/x/exp v0.0.0-20240909161429-701f63a606c0 // indirect 22 | golang.org/x/sys v0.25.0 // indirect 23 | ) 24 | -------------------------------------------------------------------------------- /test/go-nostr/go.sum: -------------------------------------------------------------------------------- 1 | github.com/btcsuite/btcd/btcec/v2 v2.3.4 h1:3EJjcN70HCu/mwqlUsGK8GcNVyLVxFDlWurTXGPFfiQ= 2 | github.com/btcsuite/btcd/btcec/v2 v2.3.4/go.mod h1:zYzJ8etWJQIv1Ogk7OzpWjowwOdXY1W/17j2MW85J04= 3 | github.com/btcsuite/btcd/chaincfg/chainhash v1.1.0 h1:59Kx4K6lzOW5w6nFlA0v5+lk/6sjybR934QNHSJZPTQ= 4 | github.com/btcsuite/btcd/chaincfg/chainhash v1.1.0/go.mod h1:7SFka0XMvUgj3hfZtydOrQY2mwhPclbT2snogU7SQQc= 5 | github.com/davecgh/go-spew v1.1.1 
h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= 6 | github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 7 | github.com/decred/dcrd/crypto/blake256 v1.1.0 h1:zPMNGQCm0g4QTY27fOCorQW7EryeQ/U0x++OzVrdms8= 8 | github.com/decred/dcrd/crypto/blake256 v1.1.0/go.mod h1:2OfgNZ5wDpcsFmHmCK5gZTPcCXqlm2ArzUIkw9czNJo= 9 | github.com/decred/dcrd/dcrec/secp256k1/v4 v4.3.0 h1:rpfIENRNNilwHwZeG5+P150SMrnNEcHYvcCuK6dPZSg= 10 | github.com/decred/dcrd/dcrec/secp256k1/v4 v4.3.0/go.mod h1:v57UDF4pDQJcEfFUCRop3lJL149eHGSe9Jvczhzjo/0= 11 | github.com/gobwas/httphead v0.1.0 h1:exrUm0f4YX0L7EBwZHuCF4GDp8aJfVeBrlLQrs6NqWU= 12 | github.com/gobwas/httphead v0.1.0/go.mod h1:O/RXo79gxV8G+RqlR/otEwx4Q36zl9rqC5u12GKvMCM= 13 | github.com/gobwas/pool v0.2.1 h1:xfeeEhW7pwmX8nuLVlqbzVc7udMDrwetjEv+TZIz1og= 14 | github.com/gobwas/pool v0.2.1/go.mod h1:q8bcK0KcYlCgd9e7WYLm9LpyS+YeLd8JVDW6WezmKEw= 15 | github.com/gobwas/ws v1.4.0 h1:CTaoG1tojrh4ucGPcoJFiAQUAsEWekEWvLy7GsVNqGs= 16 | github.com/gobwas/ws v1.4.0/go.mod h1:G3gNqMNtPppf5XUz7O4shetPpcZ1VJ7zt18dlUeakrc= 17 | github.com/josharian/intern v1.0.0 h1:vlS4z54oSdjm0bgjRigI+G1HpF+tI+9rE5LLzOg8HmY= 18 | github.com/josharian/intern v1.0.0/go.mod h1:5DoeVV0s6jJacbCEi61lwdGj/aVlrQvzHFFd8Hwg//Y= 19 | github.com/mailru/easyjson v0.7.7 h1:UGYAvKxe3sBsEDzO8ZeWOSlIQfWFlxbzLZe7hwFURr0= 20 | github.com/mailru/easyjson v0.7.7/go.mod h1:xzfreul335JAWq5oZzymOObrkdz5UnU4kGfJJLY9Nlc= 21 | github.com/nbd-wtf/go-nostr v0.37.2 h1:42rriFqqz07EdydERwYeQnewl+Rah1Gq46I+Wh0KYYg= 22 | github.com/nbd-wtf/go-nostr v0.37.2/go.mod h1:TGKGj00BmJRXvRe0LlpDN3KKbELhhPXgBwUEhzu3Oq0= 23 | github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= 24 | github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= 25 | github.com/puzpuzpuz/xsync/v3 v3.4.0 h1:DuVBAdXuGFHv8adVXjWWZ63pJq+NRXOWVXlKDBZ+mJ4= 26 | github.com/puzpuzpuz/xsync/v3 v3.4.0/go.mod h1:VjzYrABPabuM4KyBh1Ftq6u8nhwY5tBPKP9jpmh0nnA= 
27 | github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg= 28 | github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= 29 | github.com/tidwall/gjson v1.17.3 h1:bwWLZU7icoKRG+C+0PNwIKC6FCJO/Q3p2pZvuP0jN94= 30 | github.com/tidwall/gjson v1.17.3/go.mod h1:/wbyibRr2FHMks5tjHJ5F8dMZh3AcwJEMf5vlfC0lxk= 31 | github.com/tidwall/match v1.1.1 h1:+Ho715JplO36QYgwN9PGYNhgZvoUSc9X2c80KVTi+GA= 32 | github.com/tidwall/match v1.1.1/go.mod h1:eRSPERbgtNPcGhD8UCthc6PmLEQXEWd3PRB5JTxsfmM= 33 | github.com/tidwall/pretty v1.2.0/go.mod h1:ITEVvHYasfjBbM0u2Pg8T2nJnzm8xPwvNhhsoaGGjNU= 34 | github.com/tidwall/pretty v1.2.1 h1:qjsOFOWWQl+N3RsoF5/ssm1pHmJJwhjlSbZ51I6wMl4= 35 | github.com/tidwall/pretty v1.2.1/go.mod h1:ITEVvHYasfjBbM0u2Pg8T2nJnzm8xPwvNhhsoaGGjNU= 36 | golang.org/x/exp v0.0.0-20240909161429-701f63a606c0 h1:e66Fs6Z+fZTbFBAxKfP3PALWBtpfqks2bwGcexMxgtk= 37 | golang.org/x/exp v0.0.0-20240909161429-701f63a606c0/go.mod h1:2TbTHSBQa924w8M6Xs1QcRcFwyucIwBGpK1p2f1YFFY= 38 | golang.org/x/net v0.21.0 h1:AQyQV4dYCvJ7vGmJyKki9+PBdyvhkSd8EIx/qb0AYv4= 39 | golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44= 40 | golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= 41 | golang.org/x/sys v0.25.0 h1:r+8e+loiHxRqhXVl6ML1nO3l1+oFoWbnlu2Ehimmi34= 42 | golang.org/x/sys v0.25.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= 43 | gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= 44 | gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= 45 | -------------------------------------------------------------------------------- /test/go-nostr/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "bufio" 5 | "fmt" 6 | "os" 7 | "strconv" 8 | "strings" 9 | "sync" 10 | 11 | "github.com/nbd-wtf/go-nostr" 12 | "github.com/nbd-wtf/go-nostr/nip77/negentropy" 13 | 
"github.com/nbd-wtf/go-nostr/nip77/negentropy/storage/vector" 14 | ) 15 | 16 | func main() { 17 | frameSizeLimit, _ := strconv.Atoi(os.Getenv("FRAMESIZELIMIT")) 18 | 19 | vec := vector.New() 20 | neg := negentropy.New(vec, frameSizeLimit) 21 | 22 | have := make([]string, 0, 500) 23 | need := make([]string, 0, 500) 24 | 25 | wg := sync.WaitGroup{} 26 | wg.Add(2) 27 | go func() { 28 | for item := range neg.Haves { 29 | have = append(have, item) 30 | } 31 | wg.Done() 32 | }() 33 | go func() { 34 | for item := range neg.HaveNots { 35 | need = append(need, item) 36 | } 37 | wg.Done() 38 | }() 39 | 40 | scanner := bufio.NewScanner(os.Stdin) 41 | const maxCapacity = 1024 * 1024 * 16 // 16MB 42 | buf := make([]byte, maxCapacity) 43 | scanner.Buffer(buf, maxCapacity) 44 | 45 | for scanner.Scan() { 46 | line := scanner.Text() 47 | if len(line) == 0 { 48 | continue 49 | } 50 | 51 | items := strings.Split(line, ",") 52 | 53 | switch items[0] { 54 | case "item": 55 | created, err := strconv.ParseUint(items[1], 10, 64) 56 | if err != nil { 57 | panic(err) 58 | } 59 | vec.Insert(nostr.Timestamp(created), items[2]) 60 | 61 | case "seal": 62 | vec.Seal() 63 | 64 | case "initiate": 65 | q := neg.Start() 66 | if frameSizeLimit != 0 && len(q)/2 > frameSizeLimit { 67 | panic("frameSizeLimit exceeded") 68 | } 69 | fmt.Printf("msg,%s\n", q) 70 | 71 | case "msg": 72 | q, err := neg.Reconcile(items[1]) 73 | if err != nil { 74 | panic(fmt.Sprintf("reconciliation failed: %v", err)) 75 | } 76 | if q == "" { 77 | wg.Wait() 78 | 79 | for _, id := range have { 80 | fmt.Printf("have,%s\n", id) 81 | } 82 | for _, id := range need { 83 | fmt.Printf("need,%s\n", id) 84 | } 85 | 86 | fmt.Println("done") 87 | 88 | continue 89 | } 90 | 91 | if frameSizeLimit > 0 && len(q)/2 > frameSizeLimit { 92 | panic("frameSizeLimit exceeded") 93 | } 94 | 95 | fmt.Printf("msg,%s\n", q) 96 | 97 | default: 98 | panic("unknown cmd: " + items[0]) 99 | } 100 | } 101 | 102 | if err := scanner.Err(); err != nil { 103 | 
panic(err) 104 | } 105 | } 106 | -------------------------------------------------------------------------------- /test/go/harness.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "strings" 5 | "bufio" 6 | "fmt" 7 | "os" 8 | "strconv" 9 | "encoding/hex" 10 | negentropy "github.com/illuzen/go-negentropy" 11 | ) 12 | 13 | func split(s string, delim rune) []string { 14 | return strings.FieldsFunc(s, func(r rune) bool { 15 | return r == delim 16 | }) 17 | } 18 | 19 | func main() { 20 | frameSizeLimit := uint64(0) 21 | if env, exists := os.LookupEnv("FRAMESIZELIMIT"); exists { 22 | var err error 23 | frameSizeLimit, err = strconv.ParseUint(env, 10, 64) 24 | if err != nil { 25 | panic(fmt.Errorf("invalid FRAMESIZELIMIT: %w", err)) 26 | } 27 | } 28 | 29 | storage := negentropy.NewVector() 30 | var ne *negentropy.Negentropy 31 | 32 | scanner := bufio.NewScanner(os.Stdin) 33 | const maxCapacity = 1024 * 1024 // 1MB 34 | buf := make([]byte, maxCapacity) 35 | scanner.Buffer(buf, maxCapacity) 36 | 37 | for scanner.Scan() { 38 | line := scanner.Text() 39 | if len(line) == 0 { 40 | continue 41 | } 42 | 43 | items := split(line, ',') 44 | 45 | switch items[0] { 46 | case "item": 47 | if len(items) != 3 { 48 | panic("wrong num of fields") 49 | } 50 | created, err := strconv.ParseUint(items[1], 10, 64) 51 | if err != nil { 52 | panic(err) 53 | } 54 | id, err := hex.DecodeString(items[2]) // decode the hex-encoded id into raw bytes 55 | if err != nil { 56 | panic(err) 57 | } 58 | storage.Insert(created, id) 59 | 60 | case "seal": 61 | storage.Seal() 62 | neg, err := negentropy.NewNegentropy(storage, frameSizeLimit) 63 | if err != nil { 64 | panic(err) 65 | } 66 | ne = neg 67 | 68 | case "initiate": 69 | q, err := ne.Initiate() 70 | if err != nil { 71 | panic(err) 72 | } 73 | if frameSizeLimit != 0 && uint64(len(q)) > frameSizeLimit { 74 | panic("initiate frameSizeLimit exceeded") 75 | } 76 |
fmt.Printf("msg,%s\n", hex.EncodeToString(q)) 77 | 78 | case "msg": 79 | var q []byte 80 | if len(items) >= 2 { 81 | s, err := hex.DecodeString(items[1]) 82 | if err != nil { 83 | panic(err) 84 | } 85 | q = s 86 | } 87 | 88 | if (*ne).IsInitiator { 89 | var have, need []string 90 | resp, err := ne.ReconcileWithIDs(q, &have, &need) 91 | if err != nil { 92 | panic(fmt.Sprintf("Reconciliation failed: %v", err)) 93 | } 94 | 95 | for _, id := range have { 96 | fmt.Printf("have,%s\n", hex.EncodeToString([]byte(id))) 97 | } 98 | for _, id := range need { 99 | fmt.Printf("need,%s\n", hex.EncodeToString([]byte(id))) 100 | } 101 | 102 | if resp == nil { 103 | fmt.Println("done") 104 | continue 105 | } 106 | 107 | q = resp 108 | } else { 109 | s, err := ne.Reconcile(q) 110 | if err != nil { 111 | panic(fmt.Sprintf("Reconciliation failed: %v", err)) 112 | } 113 | q = s 114 | } 115 | 116 | if frameSizeLimit > 0 && uint64(len(q)) > frameSizeLimit { 117 | panic("frameSizeLimit exceeded") 118 | } 119 | fmt.Printf("msg,%s\n", hex.EncodeToString(q)) 120 | 121 | default: 122 | panic("unknown cmd: " + items[0]) 123 | } 124 | } 125 | 126 | if err := scanner.Err(); err != nil { 127 | panic(err) 128 | } 129 | } -------------------------------------------------------------------------------- /test/js/harness.js: -------------------------------------------------------------------------------- 1 | const readline = require('readline'); 2 | const { Negentropy, NegentropyStorageVector } = require('../../js/Negentropy.js'); 3 | 4 | let frameSizeLimit = 0; 5 | if (process.env.FRAMESIZELIMIT) frameSizeLimit = parseInt(process.env.FRAMESIZELIMIT); 6 | 7 | const rl = readline.createInterface({ 8 | input: process.stdin, 9 | output: process.stdout, 10 | terminal: false 11 | }); 12 | 13 | let ne; 14 | let storage = new NegentropyStorageVector(); 15 | 16 | rl.on('line', async (line) => { 17 | let items = line.split(','); 18 | 19 | if (items[0] == "item") { 20 | if (items.length !== 3) throw Error("too 
few items"); 21 | let created = parseInt(items[1]); 22 | let id = items[2].trim(); 23 | storage.insert(created, id); 24 | } else if (items[0] == "seal") { 25 | storage.seal(); 26 | ne = new Negentropy(storage, frameSizeLimit); 27 | } else if (items[0] == "initiate") { 28 | let q = await ne.initiate(); 29 | if (frameSizeLimit && q.length/2 > frameSizeLimit) throw Error("frameSizeLimit exceeded"); 30 | console.log(`msg,${q}`); 31 | } else if (items[0] == "msg") { 32 | let q = items[1]; 33 | let [newQ, haveIds, needIds] = await ne.reconcile(q); 34 | q = newQ; 35 | 36 | for (let id of haveIds) console.log(`have,${id}`); 37 | for (let id of needIds) console.log(`need,${id}`); 38 | 39 | if (frameSizeLimit && q !== null && q.length/2 > frameSizeLimit) throw Error("frameSizeLimit exceeded"); 40 | 41 | if (q === null) { 42 | console.log(`done`); 43 | } else { 44 | console.log(`msg,${q}`); 45 | } 46 | } else { 47 | throw Error(`unknown cmd: ${items[0]}`); 48 | } 49 | }); 50 | 51 | rl.on('close', () => { 52 | process.exit(0); 53 | }); 54 | -------------------------------------------------------------------------------- /test/protoversion.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env perl 2 | 3 | use strict; 4 | $|++; 5 | 6 | use IPC::Open2; 7 | use Session::Token; 8 | use FindBin; 9 | use lib "$FindBin::Bin"; 10 | use Utils; 11 | 12 | die "usage: $0 <harness type>" if @ARGV < 1; 13 | my $harnessCmd = Utils::harnessTypeToCmd(shift) || die "please provide harness type (cpp, js, etc)"; 14 | 15 | 16 | my $expectedResp; 17 | 18 | ## Get expected response using protocol version 1 19 | 20 | { 21 | my ($infile, $outfile); 22 | my $pid = open2($outfile, $infile, $harnessCmd); 23 | 24 | print $infile "item,12345,eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee\n"; 25 | print $infile "seal\n"; 26 | print $infile "msg,6100000200\n"; ## full range bound, empty IdList 27 | 28 | my $resp = <$outfile>; 29 | chomp $resp; 30 | 31 |
$expectedResp = $resp; 32 | } 33 | 34 | ## Client tries to use some hypothetical newer version, but falls back to version 1 35 | 36 | { 37 | my ($infile, $outfile); 38 | my $pid = open2($outfile, $infile, $harnessCmd); 39 | 40 | print $infile "item,12345,eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee\n"; 41 | print $infile "seal\n"; 42 | 43 | print $infile "msg,62aabbccddeeff\n"; ## some new protocol message 44 | 45 | my $resp = <$outfile>; 46 | chomp $resp; 47 | 48 | ## 61: Preferred protocol version 49 | die "bad upgrade response: $resp" unless $resp eq "msg,61"; 50 | 51 | ## Try again with protocol version 1 52 | print $infile "msg,6100000200\n"; ## full range bound, empty IdList 53 | 54 | $resp = <$outfile>; 55 | chomp $resp; 56 | die "didn't fall back to protocol version 1: $resp" unless $resp eq $expectedResp; 57 | } 58 | 59 | print "OK\n"; 60 | -------------------------------------------------------------------------------- /test/test.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env perl 2 | 3 | use strict; 4 | 5 | unlink("negent-test.log"); 6 | 7 | 8 | my $langs = shift // 'cpp,js'; 9 | my @langs = split /,/, $langs; 10 | 11 | 12 | ## Compat tests (verify langs are compatible with each other) 13 | 14 | { 15 | my $allOpts = [ 16 | 'RECS=50000 P1=1 P2=0 P3=0 NODUPS=1', 17 | 'RECS=50000 P1=0 P2=1 P3=0 NODUPS=1', 18 | 'RECS=50000 P1=0 P2=0 P3=1 NODUPS=1', 19 | 'RECS=50000 P1=2 P2=2 P3=7 NODUPS=1', 20 | 21 | 'RECS=100000 FRAMESIZELIMIT1=60000 FRAMESIZELIMIT2=500000 P1=1 P2=0 P3=0', 22 | 'RECS=100000 FRAMESIZELIMIT1=60000 FRAMESIZELIMIT2=500000 P1=0 P2=1 P3=0', 23 | 'RECS=100000 FRAMESIZELIMIT1=60000 FRAMESIZELIMIT2=500000 P1=0 P2=0 P3=1', 24 | 'RECS=100000 FRAMESIZELIMIT1=60000 FRAMESIZELIMIT2=500000 P1=1 P2=1 P3=5', 25 | 26 | 'CLUSTERED=1 FRAMESIZELIMIT1=50000 FRAMESIZELIMIT2=50000', 27 | 'CLUSTERED=1 FRAMESIZELIMIT1=50000 FRAMESIZELIMIT2=50000 P1=1 P2=1 P3=1', 28 | 'CLUSTERED=1 NODUPS=1',
29 | ]; 30 | 31 | foreach my $lang1 (@langs) { 32 | foreach my $lang2 (@langs) { 33 | foreach my $opts (@$allOpts) { 34 | note("------ INTEROP $lang1 / $lang2 : $opts ------"); 35 | run("$opts perl fuzz.pl $lang1 $lang2"); 36 | } 37 | } 38 | } 39 | } 40 | 41 | 42 | ## Delta tests (ensure output from all langs is byte-for-byte identical) 43 | 44 | if (@langs >= 2) { 45 | my @otherLangs = @langs; 46 | my $firstLang = shift @otherLangs; 47 | 48 | my $allOpts = [ 49 | 'RECS=10000 P1=1 P2=0 P3=0', 50 | 'RECS=10000 P1=0 P2=1 P3=0', 51 | 'RECS=10000 P1=0 P2=0 P3=1', 52 | 53 | 'RECS=10000 P1=1 P2=1 P3=10', 54 | 'RECS=100000 FRAMESIZELIMIT1=60000 FRAMESIZELIMIT2=500000 P1=1 P2=1 P3=10', 55 | 'RECS=200000 FRAMESIZELIMIT1=30000 FRAMESIZELIMIT2=200000 P1=1 P2=1 P3=10', 56 | ]; 57 | 58 | foreach my $opts (@$allOpts) { 59 | foreach my $lang (@otherLangs) { 60 | note("------- DELTA $firstLang / $lang : $opts ------"); 61 | 62 | my $res = system( 63 | "/bin/bash", 64 | "-c", 65 | qq{diff -u <(DELTATRACE=1 $opts perl fuzz.pl $firstLang $firstLang | grep DELTATRACE) <(DELTATRACE=1 $opts perl fuzz.pl $lang $lang | grep DELTATRACE)}, 66 | ); 67 | 68 | die "Difference detected between $firstLang / $lang" if $res; 69 | } 70 | } 71 | } 72 | 73 | 74 | 75 | ## Protocol upgrade tests 76 | 77 | foreach my $lang (@langs) { 78 | note("------- PROTO UPGRADE $lang -------"); 79 | run("perl protoversion.pl $lang"); 80 | } 81 | 82 | 83 | 84 | ######## 85 | 86 | sub run { 87 | my $cmd = shift; 88 | 89 | print "RUN: $cmd\n"; 90 | 91 | system("echo 'RUN: $cmd' >>negent-test.log"); 92 | system("$cmd >>negent-test.log 2>&1") && die "test failure (see negent-test.log file)"; 93 | system("echo '----------' >>negent-test.log"); 94 | } 95 | 96 | sub note { 97 | my $note = shift; 98 | 99 | print "NOTE: $note\n"; 100 | 101 | system("echo 'NOTE: $note' >>negent-test.log"); 102 | } 103 | --------------------------------------------------------------------------------
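All of the harnesses above (go-nostr, go, js, and the others driven by fuzz.pl and protoversion.pl) speak the same comma-separated line protocol on stdin: `item,<created>,<id-hex>` inserts a record, `seal` finalizes storage, `initiate` starts a reconciliation, and `msg,<hex>` delivers a frame from the peer. A minimal sketch of that framing in JavaScript follows; the `parseLine` helper and the returned object shape are illustrative only and not part of any harness in this repo:

```javascript
// Sketch of the stdin command framing shared by the test harnesses.
// Command names ("item", "seal", "initiate", "msg") come from the harness
// sources above; parseLine itself is a hypothetical helper for illustration.
function parseLine(line) {
    let items = line.split(',');

    switch (items[0]) {
    case 'item': {
        // item,<created-timestamp>,<hex-encoded id>
        if (items.length !== 3) throw Error('wrong num of fields');
        let created = parseInt(items[1], 10);
        if (!Number.isInteger(created)) throw Error('bad created timestamp');
        return { cmd: 'item', created, id: items[2] };
    }
    case 'seal':
    case 'initiate':
        // no arguments: seal storage / begin reconciliation as initiator
        return { cmd: items[0] };
    case 'msg':
        // msg,<hex-encoded negentropy frame> (payload may be absent)
        return { cmd: 'msg', payload: items[1] ?? '' };
    default:
        throw Error(`unknown cmd: ${items[0]}`);
    }
}

// Example: the protocol-version-1 probe frame sent by protoversion.pl
console.log(parseLine('msg,6100000200'));
```

Because every implementation parses exactly this framing, test.pl can pair any two languages (the interop matrix above) and relay `msg,...` lines between them until one side prints `have`/`need` lines followed by `done`.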