├── README.md ├── fraudml.JPG ├── LICENSE ├── Blockchain and XGBoosted K-means for Fraud Detection.ipynb └── Fraud Detection Using KMeans.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # Credit-card-fraud-detection-using-blockchain-and-ml 2 | -------------------------------------------------------------------------------- /fraudml.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PrudhviGNV/Credit-card-fraud-detection-using-blockchain-and-ml/main/fraudml.JPG -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU LESSER GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | 9 | This version of the GNU Lesser General Public License incorporates 10 | the terms and conditions of version 3 of the GNU General Public 11 | License, supplemented by the additional permissions listed below. 12 | 13 | 0. Additional Definitions. 14 | 15 | As used herein, "this License" refers to version 3 of the GNU Lesser 16 | General Public License, and the "GNU GPL" refers to version 3 of the GNU 17 | General Public License. 18 | 19 | "The Library" refers to a covered work governed by this License, 20 | other than an Application or a Combined Work as defined below. 21 | 22 | An "Application" is any work that makes use of an interface provided 23 | by the Library, but which is not otherwise based on the Library. 24 | Defining a subclass of a class defined by the Library is deemed a mode 25 | of using an interface provided by the Library. 26 | 27 | A "Combined Work" is a work produced by combining or linking an 28 | Application with the Library. 
The particular version of the Library 29 | with which the Combined Work was made is also called the "Linked 30 | Version". 31 | 32 | The "Minimal Corresponding Source" for a Combined Work means the 33 | Corresponding Source for the Combined Work, excluding any source code 34 | for portions of the Combined Work that, considered in isolation, are 35 | based on the Application, and not on the Linked Version. 36 | 37 | The "Corresponding Application Code" for a Combined Work means the 38 | object code and/or source code for the Application, including any data 39 | and utility programs needed for reproducing the Combined Work from the 40 | Application, but excluding the System Libraries of the Combined Work. 41 | 42 | 1. Exception to Section 3 of the GNU GPL. 43 | 44 | You may convey a covered work under sections 3 and 4 of this License 45 | without being bound by section 3 of the GNU GPL. 46 | 47 | 2. Conveying Modified Versions. 48 | 49 | If you modify a copy of the Library, and, in your modifications, a 50 | facility refers to a function or data to be supplied by an Application 51 | that uses the facility (other than as an argument passed when the 52 | facility is invoked), then you may convey a copy of the modified 53 | version: 54 | 55 | a) under this License, provided that you make a good faith effort to 56 | ensure that, in the event an Application does not supply the 57 | function or data, the facility still operates, and performs 58 | whatever part of its purpose remains meaningful, or 59 | 60 | b) under the GNU GPL, with none of the additional permissions of 61 | this License applicable to that copy. 62 | 63 | 3. Object Code Incorporating Material from Library Header Files. 64 | 65 | The object code form of an Application may incorporate material from 66 | a header file that is part of the Library. 
You may convey such object 67 | code under terms of your choice, provided that, if the incorporated 68 | material is not limited to numerical parameters, data structure 69 | layouts and accessors, or small macros, inline functions and templates 70 | (ten or fewer lines in length), you do both of the following: 71 | 72 | a) Give prominent notice with each copy of the object code that the 73 | Library is used in it and that the Library and its use are 74 | covered by this License. 75 | 76 | b) Accompany the object code with a copy of the GNU GPL and this license 77 | document. 78 | 79 | 4. Combined Works. 80 | 81 | You may convey a Combined Work under terms of your choice that, 82 | taken together, effectively do not restrict modification of the 83 | portions of the Library contained in the Combined Work and reverse 84 | engineering for debugging such modifications, if you also do each of 85 | the following: 86 | 87 | a) Give prominent notice with each copy of the Combined Work that 88 | the Library is used in it and that the Library and its use are 89 | covered by this License. 90 | 91 | b) Accompany the Combined Work with a copy of the GNU GPL and this license 92 | document. 93 | 94 | c) For a Combined Work that displays copyright notices during 95 | execution, include the copyright notice for the Library among 96 | these notices, as well as a reference directing the user to the 97 | copies of the GNU GPL and this license document. 98 | 99 | d) Do one of the following: 100 | 101 | 0) Convey the Minimal Corresponding Source under the terms of this 102 | License, and the Corresponding Application Code in a form 103 | suitable for, and under terms that permit, the user to 104 | recombine or relink the Application with a modified version of 105 | the Linked Version to produce a modified Combined Work, in the 106 | manner specified by section 6 of the GNU GPL for conveying 107 | Corresponding Source. 
108 | 109 | 1) Use a suitable shared library mechanism for linking with the 110 | Library. A suitable mechanism is one that (a) uses at run time 111 | a copy of the Library already present on the user's computer 112 | system, and (b) will operate properly with a modified version 113 | of the Library that is interface-compatible with the Linked 114 | Version. 115 | 116 | e) Provide Installation Information, but only if you would otherwise 117 | be required to provide such information under section 6 of the 118 | GNU GPL, and only to the extent that such information is 119 | necessary to install and execute a modified version of the 120 | Combined Work produced by recombining or relinking the 121 | Application with a modified version of the Linked Version. (If 122 | you use option 4d0, the Installation Information must accompany 123 | the Minimal Corresponding Source and Corresponding Application 124 | Code. If you use option 4d1, you must provide the Installation 125 | Information in the manner specified by section 6 of the GNU GPL 126 | for conveying Corresponding Source.) 127 | 128 | 5. Combined Libraries. 129 | 130 | You may place library facilities that are a work based on the 131 | Library side by side in a single library together with other library 132 | facilities that are not Applications and are not covered by this 133 | License, and convey such a combined library under terms of your 134 | choice, if you do both of the following: 135 | 136 | a) Accompany the combined library with a copy of the same work based 137 | on the Library, uncombined with any other library facilities, 138 | conveyed under the terms of this License. 139 | 140 | b) Give prominent notice with the combined library that part of it 141 | is a work based on the Library, and explaining where to find the 142 | accompanying uncombined form of the same work. 143 | 144 | 6. Revised Versions of the GNU Lesser General Public License. 
145 | 146 | The Free Software Foundation may publish revised and/or new versions 147 | of the GNU Lesser General Public License from time to time. Such new 148 | versions will be similar in spirit to the present version, but may 149 | differ in detail to address new problems or concerns. 150 | 151 | Each version is given a distinguishing version number. If the 152 | Library as you received it specifies that a certain numbered version 153 | of the GNU Lesser General Public License "or any later version" 154 | applies to it, you have the option of following the terms and 155 | conditions either of that published version or of any later version 156 | published by the Free Software Foundation. If the Library as you 157 | received it does not specify a version number of the GNU Lesser 158 | General Public License, you may choose any version of the GNU Lesser 159 | General Public License ever published by the Free Software Foundation. 160 | 161 | If the Library as you received it specifies that a proxy can decide 162 | whether future versions of the GNU Lesser General Public License shall 163 | apply, that proxy's public statement of acceptance of any version is 164 | permanent authorization for you to choose that version for the 165 | Library. 166 | -------------------------------------------------------------------------------- /Blockchain and XGBoosted K-means for Fraud Detection.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "#
Blockchain and Machine Learning for Fraud Detection: Employing Artificial Intelligence in the Banking Sector" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "###
Blockchain Integrated with an XGBoosted K-means Model" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "#### Import Libraries & Packages" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "import pandas as pd\n", 31 | "import pandas.testing as tm\n", 32 | "import numpy as np\n", 33 | "from numpy import loadtxt\n", 34 | "from sklearn.cluster import KMeans\n", 35 | "from sklearn.preprocessing import LabelEncoder\n", 36 | "from sklearn.preprocessing import MinMaxScaler\n", 37 | "import xgboost\n", 38 | "from xgboost import XGBClassifier\n", 39 | "import hashlib\n", 40 | "import json\n", 41 | "from time import time\n", 42 | "from urllib.parse import urlparse\n", 43 | "from uuid import uuid4\n", 44 | "import requests\n", 45 | "from flask import Flask, jsonify, request" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "### Data Wrangling" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 7, 58 | "metadata": {}, 59 | "outputs": [], 60 | "source": [ 61 | "#concatenate data into a single data frame; ignore_index=True gives each row a unique index label so that drop() below removes only the sampled rows\n", 62 | "\n", 63 | "account= pd.read_csv(\"account.csv\")\n", 64 | "order= pd.read_csv(\"order.csv\")\n", 65 | "transaction= pd.read_csv(\"transaction.csv\")\n", 66 | "\n", 67 | "X= pd.concat([account,order,transaction], axis=0, ignore_index=True)\n", 68 | "\n", 69 | "#dividing the data into train and test sets for the k-means model\n", 70 | "\n", 71 | "X_new= X.copy() #create a copy of your data \n", 72 | "\n", 73 | "x_train = X_new.sample(frac=0.40, random_state=0)\n", 74 | "x_test = X_new.drop(x_train.index)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "### Blockchain" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "#Create a class to store the block
chain\n", 91 | "\n", 92 | "class Blockchain:\n", 93 | "    def __init__(self):\n", 94 | "        self.current_trans = []\n", 95 | "        self.chain = []\n", 96 | "        self.nodes = set()\n", 97 | "\n", 98 | "        #Create the genesis block\n", 99 | "        self.new_block(prev_hash='1', proof=100)\n", 100 | "\n", 101 | "    def new_node(self, address):\n", 102 | "        \"\"\"\n", 103 | "        Add a new node to the network, e.g. 'http://192.168.0.5:5000'\n", 104 | "        \"\"\"\n", 105 | "\n", 106 | "        parsed_url = urlparse(address)\n", 107 | "        if parsed_url.netloc:\n", 108 | "            self.nodes.add(parsed_url.netloc)\n", 109 | "        elif parsed_url.path:\n", 110 | "            self.nodes.add(parsed_url.path)\n", 111 | "        else:\n", 112 | "            raise ValueError('Invalid URL. Please try again.')\n", 113 | "\n", 114 | "\n", 115 | "    def valid_chain(self, chain):\n", 116 | "        \"\"\"\n", 117 | "        Determine if blockchain is valid.\n", 118 | "        \"\"\"\n", 119 | "\n", 120 | "        prev_block = chain[0]\n", 121 | "        current_index = 1\n", 122 | "\n", 123 | "        while current_index < len(chain):\n", 124 | "            block = chain[current_index]\n", 125 | "            print(f'{prev_block}')\n", 126 | "            print(f'{block}')\n", 127 | "            print(\"\\n-----------\\n\")\n", 128 | "            #Check that the hash of the block is correct\n", 129 | "            prev_block_hash = self.hash(prev_block)\n", 130 | "            if block['prev_hash'] != prev_block_hash:\n", 131 | "                return False\n", 132 | "\n", 133 | "            #Check that the Proof of Work is correct\n", 134 | "            if not self.valid_proof(prev_block['proof'], block['proof'], prev_block_hash):\n", 135 | "                return False\n", 136 | "\n", 137 | "            prev_block = block\n", 138 | "            current_index += 1\n", 139 | "\n", 140 | "        return True\n", 141 | "\n", 142 | "    def conflict_resolution(self):\n", 143 | "        \"\"\"\n", 144 | "        Resolves conflicts by replacing current chain with the longest one in the network.\n", 145 | "        \"\"\"\n", 146 | "\n", 147 | "        neighbours = self.nodes\n", 148 | "        new_chain = None\n", 149 | "\n", 150 | "        #Identifying long chains\n", 151 | "        max_length = len(self.chain)\n", 152
| "\n", 153 | "        #Grab and verify the chains from all the nodes in the network\n", 154 | "        for node in neighbours:\n", 155 | "            response = requests.get(f'http://{node}/chain')\n", 156 | "\n", 157 | "            if response.status_code == 200:\n", 158 | "                length = response.json()['length']\n", 159 | "                chain = response.json()['chain']\n", 160 | "\n", 161 | "                #Check if the length is longer and the chain is valid\n", 162 | "                if length > max_length and self.valid_chain(chain):\n", 163 | "                    max_length = length\n", 164 | "                    new_chain = chain\n", 165 | "\n", 166 | "        #Replace chain if a valid longer chain is discovered\n", 167 | "        if new_chain:\n", 168 | "            self.chain = new_chain\n", 169 | "            return True\n", 170 | "\n", 171 | "        return False\n", 172 | "\n", 173 | "    def new_block(self, proof, prev_hash):\n", 174 | "\n", 175 | "        block = {\n", 176 | "            'index': len(self.chain) + 1,\n", 177 | "            'timestamp': time(),\n", 178 | "            'transactions': self.current_trans,\n", 179 | "            'proof': proof,\n", 180 | "            'prev_hash': prev_hash or self.hash(self.chain[-1]),\n", 181 | "        }\n", 182 | "\n", 183 | "        #Reset the current list of transactions\n", 184 | "        self.current_trans = []\n", 185 | "\n", 186 | "        self.chain.append(block)\n", 187 | "        return block\n", 188 | "\n", 189 | "    def new_trans(self, sender, recipient, amount):\n", 190 | "        \"\"\"\n", 191 | "        Creates a new transaction to go into the next mined Block.\n", 192 | "        \"\"\"\n", 193 | "        self.current_trans.append({\n", 194 | "            'sender': sender,\n", 195 | "            'recipient': recipient,\n", 196 | "            'amount': amount,\n", 197 | "        })\n", 198 | "\n", 199 | "        return self.prev_block['index'] + 1\n", 200 | "\n", 201 | "    @property\n", 202 | "    def prev_block(self):\n", 203 | "        return self.chain[-1]\n", 204 | "\n", 205 | "    @staticmethod\n", 206 | "    def hash(block):\n", 207 | "        \"\"\"\n", 208 | "        Compute the SHA-256 hash of a block\n", 209 | "        \"\"\"\n", 210 | "\n", 211 | "        #Ensure that dictionary is ordered, to avoid inconsistent hashes.\n", 212 | "        block_str = json.dumps(block, 
sort_keys=True).encode()\n", 213 | "        return hashlib.sha256(block_str).hexdigest()\n", 214 | "\n", 215 | "    def proof_of_work(self, prev_block):\n", 216 | " \n", 217 | "        #Proof of Work Algorithm:\n", 218 | "        #- Find a number p' such that hash(p, p', previous block hash) has 4 leading zeroes\n", 219 | "        #- Where p is the previous proof, and p' is the new proof\n", 220 | "\n", 221 | "        prev_proof = prev_block['proof']\n", 222 | "        prev_hash = self.hash(prev_block)\n", 223 | "\n", 224 | "        proof = 0\n", 225 | "        while self.valid_proof(prev_proof, proof, prev_hash) is False:\n", 226 | "            proof += 1\n", 227 | "\n", 228 | "        return proof\n", 229 | "\n", 230 | "    @staticmethod\n", 231 | "    def valid_proof(prev_proof, proof, prev_hash):\n", 232 | "\n", 233 | "        #Validates Proof\n", 234 | "\n", 235 | "        guess = f'{prev_proof}{proof}{prev_hash}'.encode()\n", 236 | "        guess_hash = hashlib.sha256(guess).hexdigest()\n", 237 | "        return guess_hash[:4] == \"0000\"" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "### Integration of XGBoosted KMeans with Blockchain" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": null, 250 | "metadata": {}, 251 | "outputs": [], 252 | "source": [ 253 | "#Instantiate the Node\n", 254 | "app = Flask(__name__)\n", 255 | "\n", 256 | "#Generate a globally unique address for this node\n", 257 | "node_id = str(uuid4()).replace('-', '')\n", 258 | "\n", 259 | "#Instantiate the Blockchain\n", 260 | "blockchain = Blockchain()\n", 261 | "\n", 262 | "\n", 263 | "@app.route('/mine', methods=['GET'])\n", 264 | "def mine():\n", 265 | "    #Run the proof of work algorithm to get the next proof...\n", 266 | "    prev_block = blockchain.prev_block\n", 267 | "    proof = blockchain.proof_of_work(prev_block)\n", 268 | "\n", 269 | "    #Receive a reward for finding the proof.\n", 270 | "    #The sender is \"0\" to signify a newly mined reward.\n", 271 | "    blockchain.new_trans(\n", 272 | "        sender=\"0\",\n", 273 | "        recipient=node_id,\n", 
274 | "        amount=1,\n", 275 | "    )\n", 276 | "\n", 277 | "    #Forge the new Block by adding it to the chain\n", 278 | "    prev_hash = blockchain.hash(prev_block)\n", 279 | "    block = blockchain.new_block(proof, prev_hash)\n", 280 | "\n", 281 | "    response = {\n", 282 | "        'message': \"New Block Forged\",\n", 283 | "        'index': block['index'],\n", 284 | "        'transactions': block['transactions'],\n", 285 | "        'proof': block['proof'],\n", 286 | "        'prev_hash': block['prev_hash'],\n", 287 | "    }\n", 288 | "    return jsonify(response), 200\n", 289 | "\n", 290 | "\n", 291 | "@app.route('/transactions/new', methods=['POST'])\n", 292 | "def new_trans():\n", 293 | "    values = request.get_json()\n", 294 | "\n", 295 | "    #Check that the required fields are in the POST'ed data\n", 296 | "    required = ['sender', 'recipient', 'amount']\n", 297 | "    if not all(k in values for k in required):\n", 298 | "        return 'Missing values', 400\n", 299 | "\n", 300 | "    #Create a new Transaction\n", 301 | "    index = blockchain.new_trans(values['sender'], values['recipient'], values['amount'])\n", 302 | "\n", 303 | "    response = {'message': f'Transaction will be added to Block {index}'}\n", 304 | " \n", 305 | "    #K-means clustering runs on every submitted transaction; it is fitted on x_train, not on the chain\n", 306 | "\n", 307 | "\n", 308 | "    #Building the k-means model\n", 309 | "\n", 310 | "    kmeans = KMeans(n_clusters=2)\n", 311 | "    kmeans.fit(x_train)\n", 312 | "    #Cluster ids (0 or 1) are arbitrary and are not fraud labels, so the comparison\n", 313 | "    #with y below measures accuracy only if y uses the same 0/1 encoding as the\n", 314 | "    #clusters.\n", 315 | "    correct = 0\n", 316 | "    for i in range(len(x_test)):\n", 317 | "        predict_me = np.array(x_test.iloc[i].astype(float))\n", 318 | "        predict_me = predict_me.reshape(-1, len(predict_me))\n", 319 | "        prediction = kmeans.predict(predict_me)\n", 320 | "        if prediction[0] == y[i]:  #assumes y holds 0/1 ground-truth labels for x_test; it is not defined in this notebook\n", 321 | "            correct += 1\n", 322 | "\n", 323 | "    print(correct/len(x_test))\n", 324 | "    return jsonify(response), 201\n", 
325 | "\n", 326 | "    #XGBoost classifier: unreachable after the return above, and instantiated but never fitted\n", 327 | "    model = XGBClassifier()\n", 328 | "\n", 329 | "@app.route('/chain', methods=['GET'])\n", 330 | "def full_chain():\n", 331 | "    response = {\n", 332 | "        'chain': blockchain.chain,\n", 333 | "        'length': len(blockchain.chain),\n", 334 | "    }\n", 335 | "    return jsonify(response), 200\n", 336 | " \n", 337 | "@app.route('/nodes/register', methods=['POST'])\n", 338 | "def new_nodes():\n", 339 | "    values = request.get_json()\n", 340 | "\n", 341 | "    nodes = values.get('nodes')\n", 342 | "    if nodes is None:\n", 343 | "        return \"Error: Please supply a valid list of nodes\", 400\n", 344 | "\n", 345 | "    for node in nodes:\n", 346 | "        blockchain.new_node(node)\n", 347 | "\n", 348 | "    response = {\n", 349 | "        'message': 'New nodes have been added',\n", 350 | "        'total_nodes': list(blockchain.nodes),\n", 351 | "    }\n", 352 | "    return jsonify(response), 201\n", 353 | "\n", 354 | "\n", 355 | "@app.route('/nodes/resolve', methods=['GET'])\n", 356 | "def consensus():\n", 357 | "    replaced = blockchain.conflict_resolution()\n", 358 | "\n", 359 | "    if replaced:\n", 360 | "        response = {\n", 361 | "            'message': 'Our chain was replaced',\n", 362 | "            'new_chain': blockchain.chain\n", 363 | "        }\n", 364 | "    else:\n", 365 | "        response = {\n", 366 | "            'message': 'Our chain is authoritative',\n", 367 | "            'chain': blockchain.chain\n", 368 | "        }\n", 369 | "\n", 370 | "    return jsonify(response), 200\n", 371 | "\n", 372 | "\n", 373 | "if __name__ == '__main__':\n", 374 | "    from argparse import ArgumentParser\n", 375 | "\n", 376 | "    parser = ArgumentParser()\n", 377 | "    parser.add_argument('-p', '--port', default=5000, type=int, help='port to listen on')\n", 378 | "    args = parser.parse_args()\n", 379 | "    port = args.port\n", 380 | "\n", 381 | "    app.run(host='0.0.0.0', port=port)" 382 | ] 383 | } 384 | ], 385 | "metadata": { 386 | "kernelspec": { 387 | "display_name": "Python 3", 388 | "language": "python", 389 | "name": "python3" 390 | }, 391 | 
"language_info": { 392 | "codemirror_mode": { 393 | "name": "ipython", 394 | "version": 3 395 | }, 396 | "file_extension": ".py", 397 | "mimetype": "text/x-python", 398 | "name": "python", 399 | "nbconvert_exporter": "python", 400 | "pygments_lexer": "ipython3", 401 | "version": "3.7.4" 402 | }, 403 | "latex_envs": { 404 | "LaTeX_envs_menu_present": true, 405 | "autoclose": false, 406 | "autocomplete": true, 407 | "bibliofile": "biblio.bib", 408 | "cite_by": "apalike", 409 | "current_citInitial": 1, 410 | "eqLabelWithNumbers": true, 411 | "eqNumInitial": 1, 412 | "hotkeys": { 413 | "equation": "Ctrl-E", 414 | "itemize": "Ctrl-I" 415 | }, 416 | "labels_anchors": false, 417 | "latex_user_defs": false, 418 | "report_style_numbering": false, 419 | "user_envs_cfg": false 420 | } 421 | }, 422 | "nbformat": 4, 423 | "nbformat_minor": 2 424 | } 425 | -------------------------------------------------------------------------------- /Fraud Detection Using KMeans.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Blockchain and Machine Learning for Fraud Detection: Employing Artificial Intelligence in the Banking Sector" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "##
By Vinita Silaparasetty" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "#### Credits\n", 22 | "\n", 23 | "##### Source: http://lisp.vse.cz/pkdd99/berka.html\n", 24 | "\n", 25 | "##### Prepared by: Petr Berka and Marta Sochorova." 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## Fraud Detection Using KMeans" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "#### Import Libraries" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 1, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "import numpy as np\n", 49 | "import scipy\n", 50 | "import pandas as pd\n", 51 | "import matplotlib\n", 52 | "%matplotlib inline\n", 53 | "import matplotlib.pyplot as plt\n", 54 | "import seaborn as sns\n", 55 | "import sklearn\n", 56 | "from sklearn.model_selection import train_test_split\n", 57 | "from sklearn.cluster import KMeans\n", 58 | "from sklearn.preprocessing import StandardScaler\n", 59 | "from sklearn.datasets import make_moons\n", 60 | "from sklearn.cluster import SpectralClustering" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "#### Import Data" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 2, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "order= pd.read_csv(\"order.csv\")\n", 77 | "account= pd.read_csv(\"account.csv\")\n", 78 | "transaction= pd.read_csv(\"transaction.csv\")" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "## Process Demo\n", 86 | "\n", 87 | "Demonstration of the working of KMeans clustering with visualization of the output." 
88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "### Order & Account Dataframes" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 8, 100 | "metadata": {}, 101 | "outputs": [], 102 | "source": [ 103 | "#working with 'order' dataframe and 'account' dataframe\n", 104 | "x = (order['account_id'],order['account_to'],order['amount'])\n", 105 | "y = (account['account_id'],account['district_id'],account['frequency'])" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 9, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "# Splitting the dataset into the Training set and Test set\n", 115 | "X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 10, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "# Feature Scaling\n", 125 | "from sklearn.preprocessing import StandardScaler\n", 126 | "sc_X = StandardScaler()\n", 127 | "X_train = sc_X.fit_transform(X_train)\n", 128 | "X_test = sc_X.transform(X_test)\n", 129 | "sc_y = StandardScaler()\n", 130 | "y_train = sc_y.fit_transform(y_train)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "#### Applying KMeans" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 11, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "kmeans = KMeans(n_clusters = 3, init = 'k-means++', random_state = 42)\n", 147 | "y_kmeans = kmeans.fit_predict(x)" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "#### Visualize the Clusters" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 12, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "data": { 164 | "image/png": "\n", 165 | "text/plain": [ 166 | "
" 167 | ] 168 | }, 169 | "metadata": { 170 | "needs_background": "light" 171 | }, 172 | "output_type": "display_data" 173 | } 174 | ], 175 | "source": [ 176 | "x, y = make_moons(200, noise=.05, random_state=0)\n", 177 | "labels = KMeans(2, random_state=0).fit_predict(x)\n", 178 | "plt.scatter(x[:, 0], x[:, 1], c=labels,\n", 179 | " s=50, cmap='viridis');" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "###
-------------------- End of Demo --------------------" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "###### I skipped the visualizations for the rest of the analysis procedure as it is not needed for the main program." 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "### Order & Transaction Dataframes" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 13, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "# working with 'order' dataframe and 'transaction' dataframe\n", 210 | "a = (order['account_id'],order['account_to'],order['amount'])\n", 211 | "b = (transaction['account_id'],transaction['balance'],transaction['amount'])" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 14, 217 | "metadata": {}, 218 | "outputs": [], 219 | "source": [ 220 | "# Splitting the dataset into the Training set and Test set\n", 221 | "X_train, X_test, y_train, y_test = train_test_split(a, b, test_size = 0.2, random_state = 0)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 15, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "# Feature Scaling\n", 231 | "from sklearn.preprocessing import StandardScaler\n", 232 | "sc_X = StandardScaler()\n", 233 | "X_train = sc_X.fit_transform(X_train)\n", 234 | "X_test = sc_X.transform(X_test)\n", 235 | "sc_y = StandardScaler()\n", 236 | "y_train = sc_y.fit_transform(y_train)" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 16, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "kmeans = KMeans(n_clusters = 3, init = 'k-means++', random_state = 42)\n", 246 | "y_kmeans = kmeans.fit_predict(a)" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "### Account & Transaction Dataframes" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 
258 | "execution_count": 19, 259 | "metadata": {}, 260 | "outputs": [], 261 | "source": [ 262 | "# working with 'account' dataframe and 'transaction' dataframe\n", 263 | "e = (account['account_id'],account['district_id'])\n", 264 | "f = (transaction['account_id'],transaction['amount'])" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": 20, 270 | "metadata": {}, 271 | "outputs": [], 272 | "source": [ 273 | "# Splitting the dataset into the Training set and Test set\n", 274 | "X_train, X_test, y_train, y_test = train_test_split(e,f, test_size = 0.2, random_state = 0)" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": 21, 280 | "metadata": {}, 281 | "outputs": [], 282 | "source": [ 283 | "# Feature Scaling\n", 284 | "from sklearn.preprocessing import StandardScaler\n", 285 | "sc_X = StandardScaler()\n", 286 | "X_train = sc_X.fit_transform(X_train)\n", 287 | "X_test = sc_X.transform(X_test)\n", 288 | "sc_y = StandardScaler()\n", 289 | "y_train = sc_y.fit_transform(y_train)" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": 23, 295 | "metadata": {}, 296 | "outputs": [], 297 | "source": [ 298 | "kmeans = KMeans(n_clusters = 2, init = 'k-means++', random_state = 42)\n", 299 | "y_kmeans = kmeans.fit_predict(e)" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "##### Note: \n", 307 | "The dataset consists of several CSV files, but I have used only three of them because they serve the purpose of my research. 
The files I used are:\n", 308 | "\n", 309 | "1) order.csv\n", 310 | "\n", 311 | "2) account.csv\n", 312 | "\n", 313 | "3) transaction.csv\n", 314 | " " 315 | ] 316 | } 317 | ], 318 | "metadata": { 319 | "kernelspec": { 320 | "display_name": "Python 3", 321 | "language": "python", 322 | "name": "python3" 323 | }, 324 | "language_info": { 325 | "codemirror_mode": { 326 | "name": "ipython", 327 | "version": 3 328 | }, 329 | "file_extension": ".py", 330 | "mimetype": "text/x-python", 331 | "name": "python", 332 | "nbconvert_exporter": "python", 333 | "pygments_lexer": "ipython3", 334 | "version": "3.7.3" 335 | } 336 | }, 337 | "nbformat": 4, 338 | "nbformat_minor": 2 339 | } 340 | --------------------------------------------------------------------------------
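The hashing and proof-of-work rules used by the `Blockchain` class in the first notebook can be tried without Flask or a network. The following is a minimal sketch under stated assumptions, not the notebook's own code: `block_hash` is a hypothetical stand-alone name for the class's `hash` method, and the `genesis` block uses a fixed timestamp (the notebook uses `time()`) so that repeated runs give the same result.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 digest of a block; keys are sorted so the digest is stable."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def valid_proof(prev_proof: int, proof: int, prev_hash: str) -> bool:
    """True when sha256(prev_proof + proof + prev_hash) starts with four zeros."""
    guess = f"{prev_proof}{proof}{prev_hash}".encode()
    return hashlib.sha256(guess).hexdigest()[:4] == "0000"


def proof_of_work(prev_block: dict) -> int:
    """Try proofs 0, 1, 2, ... until one satisfies valid_proof."""
    prev_hash = block_hash(prev_block)
    proof = 0
    while not valid_proof(prev_block["proof"], proof, prev_hash):
        proof += 1
    return proof


# Hypothetical genesis block (fixed timestamp is an assumption for repeatability).
genesis = {"index": 1, "timestamp": 0, "transactions": [], "proof": 100, "prev_hash": "1"}

# Mine the next block and link it to the genesis block by hash.
proof = proof_of_work(genesis)
next_block = {
    "index": 2,
    "timestamp": 1,
    "transactions": [{"sender": "0", "recipient": "node", "amount": 1}],
    "proof": proof,
    "prev_hash": block_hash(genesis),
}

# The same two checks valid_chain performs for each pair of blocks.
chain_ok = (
    next_block["prev_hash"] == block_hash(genesis)
    and valid_proof(genesis["proof"], next_block["proof"], block_hash(genesis))
)

# Changing any field of the earlier block changes its hash and breaks the link.
tampered = dict(genesis, proof=101)
link_broken = next_block["prev_hash"] != block_hash(tampered)

print(chain_ok, link_broken)
```

Because each block stores the hash of the one before it, editing an earlier block invalidates every later link, which is the property `valid_chain` relies on when it compares chains received from other nodes.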