├── README.md
├── demo
    ├── delete.py
    ├── insert.py
    ├── update.py
    └── verify.py
├── ledger.py
├── main.py
└── requirements.txt


/README.md:
--------------------------------------------------------------------------------
# Implementing Cryptographically Verifiable Change History using MongoDB (aka Ledger)

## 1. Problem Statement

Organizations in industries such as finance, insurance, and supply chain have to (1) record every change made to a piece of information over time and (2) ensure that these changes have not been tampered with. The former requires storing all changes to a record over time. The latter is often implemented with a combination of database auditing and security controls, but this does not lead to easy verification.


This is a proof-of-concept implementation of a library for implementing cryptographically verifiable change history of documents stored in MongoDB. The goals are two-fold:

1) Demonstrate creation of a document store with a cryptographically verifiable change history, while using MongoDB MQL and API.
2) Gather feedback on how such functionality can address and simplify the data integrity needs of an application.


## 2. Common Questions

### 2.1 Does cryptographically verifiable change history constitute a Ledger?
Ledgers are potentially useful for applications in areas such as drug development, accounting, audit, HR, crypto, and compliance.

A common underlying theme for applications in the aforementioned areas is that information is stored in append-only mode. The assumption is that once the information has been stored in a ledger, it will not later be changed by an unauthorized entity. If a change is made to the information stored in a ledger, the change can be detected (e.g., using offline or online techniques). Cryptographic verification of change history helps detect information tampering in an online manner (e.g., upon document read).
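To make the idea in 2.1 concrete, here is a minimal sketch of a hash-chained change history in plain Python. It is not the library's code: `entry_hash`, `append_version`, and `verify_history` are hypothetical names, and the history is an in-memory list rather than a MongoDB collection. Each entry's SHA-256 hash covers the document, a sequence number, and the previous entry's hash, so changing any earlier version invalidates the chain.

```python
import hashlib
import json


def entry_hash(entry: dict) -> str:
    """SHA-256 over the entry serialized as JSON with sorted keys."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_version(history: list, doc: dict) -> None:
    """Append a version whose hash covers the document and the previous hash."""
    prev_hash = history[-1]["hash"] if history else ""
    entry = {"doc": doc, "prev_hash": prev_hash, "seqno": len(history) + 1}
    entry["hash"] = entry_hash(entry)  # computed before the "hash" key exists
    history.append(entry)


def verify_history(history: list) -> bool:
    """Recompute every hash and check that each entry links to its predecessor."""
    prev_hash = ""
    for entry in history:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or entry_hash(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


history = []
append_version(history, {"City": "New York"})
append_version(history, {"City": "Chicago"})
print(verify_history(history))  # True

history[0]["doc"]["City"] = "Boston"  # simulate tampering with an old version
print(verify_history(history))  # False
```

Serializing with `sort_keys=True` keeps the hash independent of key order, which is the same reason the library sorts keys before hashing.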
### 2.2 Why not use MongoDB Auditing for document change history?

In MongoDB, auditing can be enabled to record any changes made to documents. However, the audit records are generated in a separate file. To obtain the list of changes made to a single document, (1) MongoDB must be started with appropriate audit filters, and (2) the developer has to sift through all the audit log entries to find the relevant changes for the document. This approach places an additional burden on the developer.


## 3. Implementation Details

This proof-of-concept library is implemented in Python. It provides functions to store
documents in a collection along with cryptographic tamper-evidence (presently, a SHA-256 hash
over the document and its history (if any) along with document metadata created
by the library). Any update to an existing document automatically results in a new copy of
the document being inserted. The current and previous versions of a document are
stored in a separate collection. The name of this separate collection is
`collectionname_history`. Any insertions or updates in this `history` collection
are managed by the library.


Specifically, the library defines the following five functions:

1) `insert_one_ledger`: inserts a document in a collection along with cryptographic tamper-evidence and metadata.
2) `update_one_ledger`: updates an existing document while preserving the previous document, and links the two versions through cryptographic tamper-evidence.
3) `delete_one_ledger`: deletes a document from the collection; the document is still retained within the `collectionname_history` collection.
4) `verify_one_ledger`: verifies that document history is unchanged.
5) `init_ledger`: binds the above four functions to the Python collection object, so that a developer can invoke them on the collection object like any other collection method.


## 4. Environment setup (macOS)

You need to install the following dependencies:

* `python3` (brew install python@3.9)
* `pip3`


Follow the instructions on this [page](https://www.mongodb.com/docs/v6.0/tutorial/install-mongodb-on-os-x/) to install MongoDB 6.0 Community Edition.

Install [mlaunch](https://rueckstiess.github.io/mtools/mlaunch.html), the tool needed to launch a three-node MongoDB replica set locally on your machine.

Set up a MongoDB replica set locally or use Atlas to create an M10+ cluster. Since the library
uses transactions, a replica set must be launched. It can be created on your local machine
through the following command:
```
mlaunch --replicaset --name ledger-rs --port 27017
```

## 5. Developing an Application

Clone the repository:
```
git clone git@github.com:mongodb-labs/ledger.git
cd ledger
```

Set up a Python virtual environment, activate it, and install dependencies:
```
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
```

Import the ledger library, create a connection to your MongoDB instance, and call the `init_ledger` function on the collection that will act as a ledger. This function dynamically adds the functions mentioned above to the collection object.
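The dynamic binding mentioned above relies on Python's `types.MethodType`. The following is a minimal sketch of that mechanism only; `FakeCollection`, `history_name`, and `init_demo` are hypothetical names used for illustration, and the class is a stand-in rather than a real pymongo collection.

```python
import types


class FakeCollection:
    """Hypothetical stand-in for a collection object; it only carries a name."""

    def __init__(self, name):
        self.name = name


def history_name(self):
    # `self` is whichever object the function has been bound to
    return self.name + "_history"


def init_demo(col):
    # Attach the function to this one instance as a bound method
    col.history_name = types.MethodType(history_name, col)


col = FakeCollection("customers")
init_demo(col)
print(col.history_name())  # customers_history
```

The method is attached to the single instance passed in, not to the class, so every collection object that should act as a ledger needs its own initialization call. With a real collection, the setup looks like the following: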
```
import ledger
import pymongo

uriString = 'mongodb://localhost:27017,localhost:27018,localhost:27019'
client = pymongo.MongoClient(uriString, replicaSet='ledger-rs')

mydb = client['mydatabase']
mycol = mydb['customers']

# initialize the ledger functions using function bindings
ledger.init_ledger(mycol)
```

### 5.1 Insert a document
Inserts a document into the specified collection. Another copy of the document is inserted into the history collection, whose name is the collection name followed by the `_history` suffix.

```
mydict = { "key": "v1" }
x = mycol.insert_one_ledger(mydict, client)
inserted_id = x.inserted_id  # reused by the examples below
print(inserted_id)
```

### 5.2 Update a document
Applies the update operation, retrieves the updated document, updates its hash,
and stores it in the history collection.
```
params = {"$set": {"name": "John 2"}}
mycol.update_one_ledger({"_id": inserted_id}, params, client)
```

### 5.3 Delete a document
Deletes a document from the ledger collection. Its versions are retained in the history collection.
```
deleted = mycol.delete_one_ledger({"_id": inserted_id}, client)
```

### 5.4 Verify change history of a document
Verifies the change history of a document by recomputing the hash of each version stored in the history collection.
```
ret = mycol.verify_one_ledger({"_ledgermeta.orig_id": inserted_id}, client)
print(ret)
```

## 6. Can I use this code in production?

This code is a proof-of-concept implementation of cryptographically verifiable
change history using MongoDB. It is not meant for production usage.
--------------------------------------------------------------------------------
/demo/delete.py:
--------------------------------------------------------------------------------
import sys

import pymongo

sys.path.append("..")  # makes ledger.py importable when run from the demo directory
import ledger


uriString = 'mongodb://localhost:27017,localhost:27018,localhost:27019'
client = pymongo.MongoClient(uriString, replicaSet='ledger-rs')

mydb = client['mydatabase']
mycol = mydb['customers']

# 0. Initialize the ledger functions using function bindings
ledger.init_ledger(mycol)

# 4. Delete the document whose _id is passed as the first command-line argument
inserted_id = sys.argv[1]
deleted = mycol.delete_one_ledger({"_id": inserted_id}, client)

--------------------------------------------------------------------------------
/demo/insert.py:
--------------------------------------------------------------------------------
import sys

import pymongo

sys.path.append("..")  # makes ledger.py importable when run from the demo directory
import ledger


uriString = 'mongodb://localhost:27017,localhost:27018,localhost:27019'
client = pymongo.MongoClient(uriString, replicaSet='ledger-rs')

mydb = client['mydatabase']
mycol = mydb['customers']

# 0. Initialize the ledger functions using function bindings
ledger.init_ledger(mycol)

# 1. Insert first document
mydict = {
    "First name": "Jane",
    "Last name": "Hill",
    "Address": "1 Broadway",
    "City": "New York",
    "State": "NY"
}


# 1. 
# insert the document in the ledger
x = mycol.insert_one_ledger(mydict, client)
print(x.inserted_id)

--------------------------------------------------------------------------------
/demo/update.py:
--------------------------------------------------------------------------------
import sys

import pymongo

sys.path.append("..")  # makes ledger.py importable when run from the demo directory
import ledger


uriString = 'mongodb://localhost:27017,localhost:27018,localhost:27019'
client = pymongo.MongoClient(uriString, replicaSet='ledger-rs')

mydb = client['mydatabase']
mycol = mydb['customers']

# 0. Initialize the ledger functions using function bindings
ledger.init_ledger(mycol)

# 2. Update the document whose _id is passed as the first command-line argument
inserted_id = sys.argv[1]
params = {"$set": {"Address": "1 Michigan Avenue", "City": "Chicago", "State": "IL"}}
mycol.update_one_ledger({"_id": inserted_id}, params, client)

--------------------------------------------------------------------------------
/demo/verify.py:
--------------------------------------------------------------------------------
import sys

import pymongo

sys.path.append("..")  # makes ledger.py importable when run from the demo directory
import ledger


uriString = 'mongodb://localhost:27017,localhost:27018,localhost:27019'
client = pymongo.MongoClient(uriString, replicaSet='ledger-rs')

mydb = client['mydatabase']
mycol = mydb['customers']

# 0. Initialize the ledger functions using function bindings
ledger.init_ledger(mycol)

# 3. 
# Verify the change history of the document
inserted_id = sys.argv[1]
ret = mycol.verify_one_ledger({"_ledgermeta.orig_id": inserted_id}, client)
print(ret)

--------------------------------------------------------------------------------
/ledger.py:
--------------------------------------------------------------------------------
import hashlib
import json
import types
from base64 import b64encode
from os import urandom


# inserts single document
# creates an id for the doc if not present.
# W + W
def insert_one_ledger(self, params, client):

    db = self.database

    history_col_name = self.name + "_history"
    history_col = db[history_col_name]

    random_bytes = urandom(32)
    doc_id = b64encode(random_bytes).decode("utf-8")

    random_bytes = urandom(32)
    nonce = b64encode(random_bytes).decode("utf-8")

    if "_id" not in params.keys():
        params["_id"] = doc_id

    params["_ledgermeta"] = {}
    params["_ledgermeta"]["prev_hash"] = ""
    params["_ledgermeta"]["orig_id"] = params["_id"]
    params["_ledgermeta"]["nonce"] = nonce
    params["_ledgermeta"]["op"] = "INSERTONE"
    params["_ledgermeta"]["seqno"] = 1
    del params["_id"]  # to make it easy to verify updates to a document

    # get json string with keys sorted
    json_str = json.dumps(params, sort_keys=True).encode("utf-8")
    digestsha256 = hashlib.sha256(json_str)

    params["_ledgermeta"]["hash"] = digestsha256.hexdigest()
    params["_id"] = params["_ledgermeta"]["orig_id"]  # re-add id

    # Start transaction
    col_ret = None
    with client.start_session() as s:
        s.start_transaction()
        # insert in main collection
        # (the session must be passed for the write to join the transaction)
        col_ret = self.insert_one(params, session=s)

        # insert in ledger
        del params["_id"]
        history_col.insert_one(params, session=s)
        s.commit_transaction()

    ## TODO: what 
    # happens if either there is an error in col_ret or history_col_ret
    return col_ret

# updates single document
# not thread safe
# requires that _id must be present in query. all other query parameters ignored
# W + R + W
def update_one_ledger(self, query, params, client):

    db = self.database
    history_col_name = self.name + "_history"
    history_col = db[history_col_name]

    random_bytes = urandom(32)
    nonce = b64encode(random_bytes).decode("utf-8")

    query = {"_id": query["_id"]}

    # setdefault avoids a KeyError when the caller's update has no "$set",
    # and keeps any "$inc" fields the caller supplied
    params.setdefault("$set", {})
    params["$set"]["_ledgermeta.nonce"] = nonce
    params["$set"]["_ledgermeta.op"] = "UPDATEONE"
    params.setdefault("$inc", {})["_ledgermeta.seqno"] = 1


    ret = None
    with client.start_session() as s:
        s.start_transaction()
        # update in main collection
        # we update it first in order to let mongo take care of all the query syntax
        ret = self.update_one(query, params, session=s)
        if ret.matched_count == 0:
            # no document with this _id, so there is nothing to record
            s.abort_transaction()
            return None

        # now update in collection ledger
        item = self.find_one(query, session=s)

        item["_ledgermeta"]["prev_hash"] = item["_ledgermeta"]["hash"]

        del item["_id"]
        del item["_ledgermeta"]["hash"]

        # get json string with keys sorted
        json_str = json.dumps(item, sort_keys=True).encode("utf-8")
        digestsha256 = hashlib.sha256(json_str)
        item["_ledgermeta"]["hash"] = digestsha256.hexdigest()

        # add this document to history collection
        history_col.insert_one(item, session=s)

        # update this item again with new hash in main collection
        newparams = {"$set": {}}
        newparams["$set"]["_ledgermeta.prev_hash"] = item["_ledgermeta"]["prev_hash"]
        newparams["$set"]["_ledgermeta.hash"] = item["_ledgermeta"]["hash"]
        self.update_one(query, newparams, session=s)

        s.commit_transaction()

    return ret

# delete document based on id
# not thread safe
# requires 
# that _id must be present in query. all other query parameters ignored
# R + D + W
def delete_one_ledger(self, query, client):

    db = self.database
    history_col_name = self.name + "_history"
    history_col = db[history_col_name]

    random_bytes = urandom(32)
    nonce = b64encode(random_bytes).decode("utf-8")

    query = {"_id": query["_id"]}

    ret = None
    with client.start_session() as s:
        s.start_transaction()
        # read the current version from the main collection
        item = self.find_one(query, session=s)
        if item is None:
            # no document with this _id, so there is nothing to delete
            s.abort_transaction()
            return None

        item["_ledgermeta"]["nonce"] = nonce
        item["_ledgermeta"]["prev_hash"] = item["_ledgermeta"]["hash"]
        item["_ledgermeta"]["op"] = "DELETEONE"
        item["_ledgermeta"]["seqno"] = item["_ledgermeta"]["seqno"] + 1

        del item["_id"]
        del item["_ledgermeta"]["hash"]

        # get json string with keys sorted
        json_str = json.dumps(item, sort_keys=True).encode("utf-8")
        digestsha256 = hashlib.sha256(json_str)
        item["_ledgermeta"]["hash"] = digestsha256.hexdigest()

        # delete from the main collection and record the deletion in the history
        ret = self.delete_one(query, session=s)
        history_col.insert_one(item, session=s)
        s.commit_transaction()

    return ret

# returns a boolean: True or False
# not thread safe
# if no items found, False is returned
def verify_one_ledger(self, query, client):

    items_found = False  # flag checks if query returned any items

    db = self.database
    history_col_name = self.name + "_history"
    history_col = db[history_col_name]

    # find() always returns a cursor (possibly empty), never None
    items = history_col.find(query).sort("_ledgermeta.seqno", 1)

    hashes_match = True
    for item in items:
        items_found = True
        orig_hash = item["_ledgermeta"]["hash"]
        del item["_id"]
        del item["_ledgermeta"]["hash"]

        # recompute the hash exactly as it was computed on write
        json_str = json.dumps(item, sort_keys=True).encode("utf-8")
        computed_hash = hashlib.sha256(json_str).hexdigest()

        if orig_hash != computed_hash:
            hashes_match = False
            return hashes_match

    return hashes_match and items_found



def init_ledger(mycol):
    # bind the ledger functions to this collection object as methods
    mycol.insert_one_ledger = types.MethodType(insert_one_ledger, mycol)
    mycol.update_one_ledger = types.MethodType(update_one_ledger, mycol)
    mycol.delete_one_ledger = types.MethodType(delete_one_ledger, mycol)
    mycol.verify_one_ledger = types.MethodType(verify_one_ledger, mycol)

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import ledger
import pymongo
import sys

uriString = 'mongodb://localhost:27017,localhost:27018,localhost:27019'
client = pymongo.MongoClient(uriString, replicaSet='ledger-rs')

mydb = client['mydatabase']
mycol = mydb['customers']

# 0. Initialize the ledger functions using function bindings
ledger.init_ledger(mycol)

# 1. Insert first document
mydict = {
    "First name": "Jane",
    "Last name": "Hill",
    "Address": "1 Broadway",
    "City": "New York",
    "State": "NY"
}

# 1. insert the document in the ledger
x = mycol.insert_one_ledger(mydict, client)
print(x.inserted_id)

# # 2. Update the inserted document
# inserted_id = x.inserted_id
# params = {"$set": {"Address": "1 Michigan Avenue", "City": "Chicago", "State": "IL"}}
# mycol.update_one_ledger({"_id": inserted_id}, params, client)

# # 3. Update the inserted document again
# inserted_id = x.inserted_id
# params = {"$set": {"name": "John 3"}}
# mycol.update_one_ledger({"_id": inserted_id}, params, client)

# # 4. 
# Delete the document
# deleted = mycol.delete_one_ledger({"_id": inserted_id}, client)

# inserted_id = "64vP2xz+rrntaLbaSFhitsnG7Dng6y2+Qq08JrIPYzo="
# ret = mycol.verify_one_ledger({"_ledgermeta.orig_id": inserted_id}, client)
# print (ret)

# sys.exit()

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
pymongo==4.3.3

--------------------------------------------------------------------------------