├── output.pdf ├── readme.pdf ├── report.pdf ├── README.md ├── Federated Learning-Blockchain-Anomalies.py └── Federated Learning-Blockchain-Anomalies.ipynb /output.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MALLI7622/Federated-Blockchain-Anamoly-Detection/HEAD/output.pdf -------------------------------------------------------------------------------- /readme.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MALLI7622/Federated-Blockchain-Anamoly-Detection/HEAD/readme.pdf -------------------------------------------------------------------------------- /report.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MALLI7622/Federated-Blockchain-Anamoly-Detection/HEAD/report.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Supervised Learning approach to Detect Anomalies in Blockchain using Federated Learning 2 | 3 | ### Problem Statement 4 | 5 | As the interest in and use of machine learning for security applications 6 | grows, so does the attention of cyber-criminals. When an ML model is frequently 7 | updated to account for new threats, malicious adversaries can launch 8 | causative/data-poisoning attacks, intentionally injecting misleading training data 9 | so that the model becomes ineffective. Detecting anomalies in a blockchain 10 | conventionally requires moving every block's data to a central server, which makes 11 | both data collection and per-block training complex. Likewise, in the testing phase 12 | the model needs fresh block data, which is just as hard to obtain. On top of this, 13 | attackers can deliberately feed poisoned data to the model so that it can no 14 | longer predict anomalies.
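To make the poisoning threat above concrete, here is a minimal, self-contained sketch (all data and function names are hypothetical and unrelated to the model in this repository) of how an attacker who flips training labels can cripple a naive threshold-based anomaly detector:

```python
# Hypothetical toy data: an anomaly score per transaction and its true label (1 = anomaly).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
clean = [0, 0, 0, 0, 1, 1, 1, 1]

# Attacker flips half of the training labels (a causative / data-poisoning attack).
poisoned = [1, 1, 0, 0, 1, 1, 0, 0]

def fit_threshold(labels):
    """Learn the score threshold that best matches the given training labels."""
    best_t, best_acc = scores[0], 0.0
    for t in scores:
        acc = sum((s >= t) == bool(l) for s, l in zip(scores, labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(t, labels):
    """Accuracy of the rule 'score >= t means anomaly' against the given labels."""
    return sum((s >= t) == bool(l) for s, l in zip(scores, labels)) / len(scores)

t_clean = fit_threshold(clean)
t_pois = fit_threshold(poisoned)
print(accuracy(t_clean, clean))   # 1.0 -- separates perfectly when trained on clean labels
print(accuracy(t_pois, clean))    # 0.5 -- no better than chance after poisoning
```

The detector itself is unchanged in both runs; only the training labels differ, which is exactly why poisoning is hard to detect from the model's side.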
15 | 16 | ### Proposed Solution: 17 | So, I came up with a solution using a relatively new concept called Federated Learning 18 | (a technique for training machine-learning models on data you do not have direct access to). 19 | Federated Learning is one of the most widely deployed techniques in the context 20 | of Private Deep Learning. An interesting blog post about Federated Learning by [Prof. 21 | Mi Zhang](https://www.egr.msu.edu/~mizhang/) can be found 22 | [here](https://medium.com/syncedreview/federated-learning-the-future-of-distributed-machine-learning-eec95242d897). 23 | I created 50 VirtualWorkers for training the model using [PySyft](https://github.com/OpenMined/PySyft) 24 | (PySyft is a Python library for secure, private Deep Learning). I attached data to each 25 | VirtualWorker and then created a Blockchain using [Awesome Blockchains](https://github.com/openblockchains/awesome-blockchains). 26 | After creating the model, it is sent to each VirtualWorker holding data, trained 27 | there, and then passed on to the next worker. This process iterates until the model 28 | has been trained on the data of every block in the Blockchain. 29 | 30 | Once trained, the model is ready for testing: it can be sent to any new 31 | block in a new Blockchain for evaluation. 32 | 33 | ### Results : 34 | I trained my model for 20 iterations; the loss decreased from 0.6602 to 35 | 0.0001. 36 | 37 | ### References : 38 | 1. I received a Secure and Private AI Scholarship from Facebook, which offers 39 | a course at [Udacity](https://www.udacity.com/). One of the main topics in this course was Federated 40 | Learning using PySyft, and it helped me solve this problem with Federated Learning. 41 | The tutorials are on [GitHub](https://github.com/OpenMined/PySyft/tree/dev/examples/tutorials/). 42 | 43 | 2. 
[Chained Anomaly Detection Models for Federated Learning: An Intrusion 44 | Detection Case Study](https://res.mdpi.com/d_attachment/applsci/applsci-08-02663/article_deploy/applsci-08-02663.pdf) 45 | This paper applies anomaly detection through Federated Learning to the CICIDS2017 dataset; it gave me a high-level idea of how to approach this 46 | problem. 47 | 48 | 3. [BAD: a Blockchain Anomaly Detection solution](https://arxiv.org/abs/1807.03833) 49 | 50 | 4. I also tried these articles and courses: 51 | ● [Introduction to Anomaly Detection in Python](https://www.datacamp.com/community/news/introduction-to-anomaly-detection-in-python-65h6e32k2ve) 52 | ● [Anomaly Detection | Python - Course Outline - DataCamp](https://campus.datacamp.com/courses/designing-machine-learning-workflows-in-python/unsupervised-workflows?ex=1) 53 | 54 | 5. [Awesome Blockchains](https://github.com/openblockchains/awesome-blockchains) 55 | 56 | 57 | 58 | Presentation for this problem: (https://docs.google.com/presentation/d/173I4XCBvmzhgVQ2YAjBrLvzRQ7kcYaCi6aGZ348YG1A/edit?usp=sharing) 59 | -------------------------------------------------------------------------------- /Federated Learning-Blockchain-Anomalies.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # In[1]: 5 | 6 | 7 | import random 8 | import namegenerator 9 | import hashlib as hasher 10 | import torch 11 | import syft 12 | 13 | 14 | # In[2]: 15 | 16 | 17 | hook = syft.TorchHook(torch) 18 | names = list() 19 | transaction = list() 20 | labels = list() 21 | 22 | 23 | # In[3]: 24 | 25 | 26 | i = 0 27 | while i < 50: 28 | name = namegenerator.gen() 29 | names.append(name) 30 | transaction_id = random.randint(100000000000000,999999999999999) 31 | transaction.append(transaction_id) 32 | label = random.randint(0,1) 33 | labels.append(label) 34 | i = i + 1 35 | 36 | 37 | # In[4]: 38 | 39 | print("*************** First 20 names, transaction_ids, labels *********************") 40 | 41 | for i in range(20): 42 | print( "Name 
-->",names[i] , "Transaction id -->:",transaction[i], "Label -->",labels[i]) 43 | 44 | 45 | # In[5]: 46 | 47 | 48 | class Block: 49 | def __init__(self, name, transaction_id,label): 50 | self.name = name 51 | self.transaction_id = transaction_id 52 | self.label = label 53 | self.hash = self.hash_block() 54 | 55 | def hash_block(self): 56 | sha = hasher.sha256() 57 | sha.update(str(self.name).encode('utf-8') + 58 | str(self.transaction_id).encode('utf-8') + 59 | str(self.label).encode('utf-8')) 60 | return sha.hexdigest() 61 | 62 | 63 | # In[6]: 64 | 65 | 66 | name_0 = names[0] 67 | transaction_0 = transaction[0] 68 | labels_0 = labels[0] 69 | print("********* First Block details *****************") 70 | print("name",name_0,"transaction",transaction_0,"labels",labels_0) 71 | 72 | 73 | # In[7]: 74 | 75 | 76 | def create_genesis_block(): 77 | 78 | return Block(name_0, transaction_0,labels_0) 79 | 80 | 81 | # In[8]: 82 | 83 | 84 | def next_block(last_block,j): 85 | this_name = names[j] 86 | this_transaction_id = transaction[j] 87 | this_label = labels[j] 88 | this_hash = last_block.hash 89 | return Block(this_name, this_transaction_id,label) 90 | 91 | 92 | # In[9]: 93 | 94 | 95 | blockchain = [create_genesis_block()] 96 | previous_block = blockchain[0] 97 | num_of_blocks_to_add = len(names) 98 | 99 | 100 | # In[20]: 101 | 102 | 103 | print("************************** All block details ****************************") 104 | 105 | for i in range(0, num_of_blocks_to_add): 106 | block_to_add = next_block(previous_block,i) 107 | blockchain.append(block_to_add) 108 | previous_block = block_to_add 109 | print("Name: {}\n".format(block_to_add.name)) 110 | print("Hash: {}\n".format(block_to_add.transaction_id)) 111 | print("Hash: {}\n".format(block_to_add.label)) 112 | print("Hash: {}\n".format(block_to_add.hash)) 113 | 114 | 115 | # In[11]: 116 | 117 | 118 | a = 
["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","aa","bb","cc", 119 | "dd","ee","ff","gg","hh","ii","jj","kk","ll","mm","nn","oo","pp","qq","rr","ss","tt","uu","vv","ww","xx"] 120 | 121 | 122 | # In[12]: 123 | 124 | 125 | b = a 126 | 127 | 128 | # In[13]: 129 | 130 | 131 | for i in range(len(names)): 132 | names[i] = syft.VirtualWorker(hook, id = names[i]) 133 | a[i] = torch.tensor([transaction[i]]).send(names[i]) 134 | b[i] = torch.tensor([labels[i]]).send(names[i]) 135 | 136 | 137 | # In[22]: 138 | 139 | print("************ Pointers of transaction_id and labels ***************************") 140 | 141 | for i in range(len(a)): 142 | print("Transaction_id address -->", a[i],"\n Label address -->",b[i]) 143 | 144 | 145 | # In[23]: 146 | 147 | 148 | datasets = [] 149 | 150 | 151 | # In[24]: 152 | 153 | 154 | for i in range(len(names)): 155 | datasets.append((a[i],b[i])) 156 | 157 | 158 | # In[25]: 159 | 160 | print("******************************** datasets values ***********************************") 161 | 162 | for i in range(10): 163 | print(datasets[i]) 164 | 165 | 166 | # In[17]: 167 | 168 | 169 | from torch import nn 170 | from torch import optim 171 | 172 | 173 | # In[18]: 174 | 175 | 176 | def train(iterations = 20): 177 | model = nn.Linear(50,22) 178 | optimizer_fed = optim.SGD(params = model.parameters(), lr = 0.1) 179 | for iter in range(iterations): 180 | for data, target in datasets: 181 | model = model.send(data.location) 182 | optimizer_fed.zero_grad() 183 | pred = model(data) 184 | loss = (( pred - target) ** 2).sum() 185 | loss.backward() 186 | optimizer_fed.step() 187 | model = model.get() 188 | print(loss.get()) 189 | 190 | 191 | # In[19]: 192 | 193 | 194 | train() 195 | 196 | 197 | # In[ ]: 198 | -------------------------------------------------------------------------------- /Federated Learning-Blockchain-Anomalies.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Supervised Learning approach to Detect Anomalies in Blockchain using Federated Learning\n", 8 | "## Libraries used in this implementation\n", 9 | "#### 1. PyTorch (https://pytorch.org/)\n", 10 | "#### 2. PySyft ( https://github.com/OpenMined/PySyft )" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "import random\n", 20 | "import namegenerator\n", 21 | "import hashlib as hasher ## For the block hash function\n", 22 | "import datetime as date \n", 23 | "import torch\n", 24 | "import syft" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "## The hook extends PyTorch so that tensors can be sent to and used on remote workers\n", 34 | "hook = syft.TorchHook(torch)\n", 35 | "\n", 36 | "## List of transaction names\n", 37 | "names = list()\n", 38 | "\n", 39 | "## List of transaction ids\n", 40 | "transaction = list()\n", 41 | "\n", 42 | "## Label for each transaction: anomalous or normal\n", 43 | "labels = list()" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "\n", 53 | "## Generating 50 transaction names, ids, and labels\n", 54 | "\n", 55 | "i = 0\n", 56 | "while i < 50:\n", 57 | " name = namegenerator.gen()\n", 58 | " names.append(name)\n", 59 | " transaction_id = random.randint(100000000000000,999999999999999)\n", 60 | " transaction.append(transaction_id)\n", 61 | " label = random.randint(0,1)\n", 62 | " labels.append(label)\n", 63 | " i = i + 1" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": null, 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [ 72 | "\n", 73 | "## Now let's 
see the first twenty transactions: name, id, and label\n", 74 | "\n", 75 | "for i in range(20):\n", 76 | "    print( \"Name -->\",names[i] , \"Transaction id -->\",transaction[i], \"Label -->\",labels[i])" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "\n", 86 | "## Now I'm going to create a Blockchain of 50 transactions\n", 87 | "\n", 88 | "class Block: \n", 89 | "    def __init__(self, name, transaction_id,label):\n", 90 | "        self.name = name\n", 91 | "        self.transaction_id = transaction_id\n", 92 | "        self.label = label\n", 93 | "        self.hash = self.hash_block()\n", 94 | "\n", 95 | "    def hash_block(self):\n", 96 | "        sha = hasher.sha256()\n", 97 | "        sha.update(str(self.name).encode('utf-8') + \n", 98 | "                   str(self.transaction_id).encode('utf-8') + \n", 99 | "                   str(self.label).encode('utf-8'))\n", 100 | "        return sha.hexdigest()" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": null, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "name_0 = names[0]\n", 110 | "transaction_0 = transaction[0]\n", 111 | "labels_0 = labels[0]\n", 112 | "name_0,transaction_0,labels_0" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [ 121 | "\n", 122 | "## create_genesis_block() creates the initial block of the chain\n", 123 | "\n", 124 | "def create_genesis_block():\n", 125 | "    \n", 126 | "    return Block(name_0, transaction_0,labels_0)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [ 135 | "\n", 136 | "## After the genesis block is created, next_block() attaches each remaining\n", 137 | "## transaction and its label to the blockchain\n", 138 | "\n", 139 | "def next_block(last_block,j):\n", 140 | "    this_name = names[j]\n", 141 | "    this_transaction_id = 
transaction[j]\n", 142 | " this_label = labels[j]\n", 143 | " this_hash = last_block.hash\n", 144 | " return Block(this_name, this_transaction_id,label)" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": null, 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [ 153 | "blockchain = [create_genesis_block()]\n", 154 | "previous_block = blockchain[0]\n", 155 | "num_of_blocks_to_add = len(names)\n" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": { 162 | "scrolled": false 163 | }, 164 | "outputs": [], 165 | "source": [ 166 | "\n", 167 | "## Now, Let's see the each transaction name,id,label and Hash Function\n", 168 | "\n", 169 | "for i in range(0, num_of_blocks_to_add):\n", 170 | " block_to_add = next_block(previous_block,i)\n", 171 | " blockchain.append(block_to_add)\n", 172 | " previous_block = block_to_add\n", 173 | " print(\"Name: {}\\n\".format(block_to_add.name))\n", 174 | " print(\"transaction_id: {}\\n\".format(block_to_add.transaction_id))\n", 175 | " print(\"Label: {}\\n\".format(block_to_add.label)) \n", 176 | " print(\"Hash: {}\\n\".format(block_to_add.hash))" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [ 185 | "\n", 186 | "## Creating 50 new names for VirtualWorker creation\n", 187 | "\n", 188 | "a = [\"a\",\"b\",\"c\",\"d\",\"e\",\"f\",\"g\",\"h\",\"i\",\"j\",\"k\",\"l\",\"m\",\"n\",\"o\",\"p\",\"q\",\"r\",\"s\",\"t\",\"u\",\"v\",\"w\",\"x\",\"y\",\"z\",\"aa\",\"bb\",\"cc\",\n", 189 | " \"dd\",\"ee\",\"ff\",\"gg\",\"hh\",\"ii\",\"jj\",\"kk\",\"ll\",\"mm\",\"nn\",\"oo\",\"pp\",\"qq\",\"rr\",\"ss\",\"tt\",\"uu\",\"vv\",\"ww\",\"xx\"]" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": {}, 196 | "outputs": [], 197 | "source": [ 198 | "b = a" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 
| "metadata": {}, 205 | "outputs": [], 206 | "source": [ 207 | "\n", 208 | "## So, Inorder to set each transaction to remote we've to use VirtualWorker method in PySyft package. It will creates addresses\n", 209 | "## for each transaction.\n", 210 | "\n", 211 | "\n", 212 | "for i in range(len(names)):\n", 213 | " names[i] = syft.VirtualWorker(hook, id = names[i])\n", 214 | " a[i] = torch.tensor([transaction[i]]).send(names[i])\n", 215 | " b[i] = torch.tensor([labels[i]]).send(names[i])" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [ 224 | "\n", 225 | "## Let's see the each transaction_id and their label address\n", 226 | "\n", 227 | "for i in range(len(a)):\n", 228 | " print(\"Transaction_id address -->\", a[i],\"\\n Label address -->\",b[i])" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "datasets = []" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "for i in range(len(names)):\n", 247 | " datasets.append((a[i],b[i]))" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": null, 253 | "metadata": {}, 254 | "outputs": [], 255 | "source": [ 256 | "for i in range(10):\n", 257 | " print(datasets[i])" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": {}, 264 | "outputs": [], 265 | "source": [ 266 | "\n", 267 | "## Importing nn,optim classes from PyTorch to train my model\n", 268 | "\n", 269 | "from torch import nn\n", 270 | "from torch import optim" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "\n", 280 | "## Creating model\n", 281 | "\n", 282 | "\n", 283 | "def train(iterations = 20):\n", 284 | 
" model = nn.Linear(50,2)\n", 285 | " optimizer_fed = optim.SGD(params = model.parameters(), lr = 0.1)\n", 286 | " for iter in range(iterations):\n", 287 | " for data, target in datasets:\n", 288 | " \n", 289 | " ## Here model.send() will goes each transaction present in remotely and trained their and move to the next \n", 290 | " ## trasaction\n", 291 | " \n", 292 | " model = model.send(data.location)\n", 293 | " optimizer_fed.zero_grad()\n", 294 | " pred = model(data)\n", 295 | " loss = (( pred - target) ** 2).sum()\n", 296 | " loss.backward()\n", 297 | " optimizer_fed.step()\n", 298 | " model = model.get()\n", 299 | " print(loss.get())\n" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": null, 305 | "metadata": { 306 | "scrolled": true 307 | }, 308 | "outputs": [], 309 | "source": [ 310 | "\n", 311 | "## Finally training\n", 312 | "\n", 313 | "train()" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": null, 319 | "metadata": {}, 320 | "outputs": [], 321 | "source": [] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": {}, 334 | "outputs": [], 335 | "source": [] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": null, 340 | "metadata": {}, 341 | "outputs": [], 342 | "source": [] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": null, 347 | "metadata": {}, 348 | "outputs": [], 349 | "source": [] 350 | } 351 | ], 352 | "metadata": { 353 | "kernelspec": { 354 | "display_name": "Python 3", 355 | "language": "python", 356 | "name": "python3" 357 | }, 358 | "language_info": { 359 | "codemirror_mode": { 360 | "name": "ipython", 361 | "version": 3 362 | }, 363 | "file_extension": ".py", 364 | "mimetype": "text/x-python", 365 | "name": "python", 366 | "nbconvert_exporter": "python", 367 | "pygments_lexer": 
"ipython3", 368 | "version": "3.7.4" 369 | } 370 | }, 371 | "nbformat": 4, 372 | "nbformat_minor": 2 373 | } 374 | --------------------------------------------------------------------------------