├── README.md
├── fraudml.JPG
├── LICENSE
├── Blockchain and XGBoosted K-means for Fraud Detection.ipynb
└── Fraud Detection Using KMeans.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # Credit-card-fraud-detection-using-blockchain-and-ml
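The blockchain notebook links each block to its predecessor through a `prev_hash` field, computed with SHA-256 over the block's sorted JSON form. The sketch below shows that idea in isolation. `block_hash`, `links_are_valid`, and the sample blocks are names and values assumed for illustration, and the proof-of-work check that the notebook also performs is left out.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block deterministically by serializing it with sorted keys."""
    block_str = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(block_str).hexdigest()


def links_are_valid(chain: list) -> bool:
    """Check that every block stores the hash of the block before it."""
    return all(
        block["prev_hash"] == block_hash(prev)
        for prev, block in zip(chain, chain[1:])
    )


# Hypothetical two-block chain, shaped like the blocks built in the notebook.
genesis = {"index": 1, "transactions": [], "proof": 100, "prev_hash": "1"}
second = {
    "index": 2,
    "transactions": [{"sender": "a", "recipient": "b", "amount": 5}],
    "proof": 0,
    "prev_hash": block_hash(genesis),
}
chain = [genesis, second]
```

Changing any field of an earlier block changes its hash, so the `prev_hash` stored in the following block no longer matches and `links_are_valid` returns `False`.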
2 |
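The notebook's proof of work searches for an integer whose SHA-256 hash, taken together with the previous proof and the previous block's hash, starts with four zeros. A minimal sketch of that search follows. The `difficulty` parameter is an assumption added here so the loop can be tried quickly; the notebook hard-codes four zeros.

```python
import hashlib


def valid_proof(prev_proof: int, proof: int, prev_hash: str, difficulty: int = 4) -> bool:
    """Return True if the hash of the concatenated inputs has `difficulty` leading zeros."""
    guess = f"{prev_proof}{proof}{prev_hash}".encode()
    return hashlib.sha256(guess).hexdigest().startswith("0" * difficulty)


def proof_of_work(prev_proof: int, prev_hash: str, difficulty: int = 4) -> int:
    """Try successive integers until one satisfies valid_proof."""
    proof = 0
    while not valid_proof(prev_proof, proof, prev_hash, difficulty):
        proof += 1
    return proof


# Low difficulty keeps the search short for a quick trial run.
proof = proof_of_work(100, "1", difficulty=2)
```

Each extra leading zero multiplies the expected number of attempts by 16, which is why finding a proof is costly while checking one takes a single hash.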
--------------------------------------------------------------------------------
/fraudml.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PrudhviGNV/Credit-card-fraud-detection-using-blockchain-and-ml/main/fraudml.JPG
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU LESSER GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 |
9 | This version of the GNU Lesser General Public License incorporates
10 | the terms and conditions of version 3 of the GNU General Public
11 | License, supplemented by the additional permissions listed below.
12 |
13 | 0. Additional Definitions.
14 |
15 | As used herein, "this License" refers to version 3 of the GNU Lesser
16 | General Public License, and the "GNU GPL" refers to version 3 of the GNU
17 | General Public License.
18 |
19 | "The Library" refers to a covered work governed by this License,
20 | other than an Application or a Combined Work as defined below.
21 |
22 | An "Application" is any work that makes use of an interface provided
23 | by the Library, but which is not otherwise based on the Library.
24 | Defining a subclass of a class defined by the Library is deemed a mode
25 | of using an interface provided by the Library.
26 |
27 | A "Combined Work" is a work produced by combining or linking an
28 | Application with the Library. The particular version of the Library
29 | with which the Combined Work was made is also called the "Linked
30 | Version".
31 |
32 | The "Minimal Corresponding Source" for a Combined Work means the
33 | Corresponding Source for the Combined Work, excluding any source code
34 | for portions of the Combined Work that, considered in isolation, are
35 | based on the Application, and not on the Linked Version.
36 |
37 | The "Corresponding Application Code" for a Combined Work means the
38 | object code and/or source code for the Application, including any data
39 | and utility programs needed for reproducing the Combined Work from the
40 | Application, but excluding the System Libraries of the Combined Work.
41 |
42 | 1. Exception to Section 3 of the GNU GPL.
43 |
44 | You may convey a covered work under sections 3 and 4 of this License
45 | without being bound by section 3 of the GNU GPL.
46 |
47 | 2. Conveying Modified Versions.
48 |
49 | If you modify a copy of the Library, and, in your modifications, a
50 | facility refers to a function or data to be supplied by an Application
51 | that uses the facility (other than as an argument passed when the
52 | facility is invoked), then you may convey a copy of the modified
53 | version:
54 |
55 | a) under this License, provided that you make a good faith effort to
56 | ensure that, in the event an Application does not supply the
57 | function or data, the facility still operates, and performs
58 | whatever part of its purpose remains meaningful, or
59 |
60 | b) under the GNU GPL, with none of the additional permissions of
61 | this License applicable to that copy.
62 |
63 | 3. Object Code Incorporating Material from Library Header Files.
64 |
65 | The object code form of an Application may incorporate material from
66 | a header file that is part of the Library. You may convey such object
67 | code under terms of your choice, provided that, if the incorporated
68 | material is not limited to numerical parameters, data structure
69 | layouts and accessors, or small macros, inline functions and templates
70 | (ten or fewer lines in length), you do both of the following:
71 |
72 | a) Give prominent notice with each copy of the object code that the
73 | Library is used in it and that the Library and its use are
74 | covered by this License.
75 |
76 | b) Accompany the object code with a copy of the GNU GPL and this license
77 | document.
78 |
79 | 4. Combined Works.
80 |
81 | You may convey a Combined Work under terms of your choice that,
82 | taken together, effectively do not restrict modification of the
83 | portions of the Library contained in the Combined Work and reverse
84 | engineering for debugging such modifications, if you also do each of
85 | the following:
86 |
87 | a) Give prominent notice with each copy of the Combined Work that
88 | the Library is used in it and that the Library and its use are
89 | covered by this License.
90 |
91 | b) Accompany the Combined Work with a copy of the GNU GPL and this license
92 | document.
93 |
94 | c) For a Combined Work that displays copyright notices during
95 | execution, include the copyright notice for the Library among
96 | these notices, as well as a reference directing the user to the
97 | copies of the GNU GPL and this license document.
98 |
99 | d) Do one of the following:
100 |
101 | 0) Convey the Minimal Corresponding Source under the terms of this
102 | License, and the Corresponding Application Code in a form
103 | suitable for, and under terms that permit, the user to
104 | recombine or relink the Application with a modified version of
105 | the Linked Version to produce a modified Combined Work, in the
106 | manner specified by section 6 of the GNU GPL for conveying
107 | Corresponding Source.
108 |
109 | 1) Use a suitable shared library mechanism for linking with the
110 | Library. A suitable mechanism is one that (a) uses at run time
111 | a copy of the Library already present on the user's computer
112 | system, and (b) will operate properly with a modified version
113 | of the Library that is interface-compatible with the Linked
114 | Version.
115 |
116 | e) Provide Installation Information, but only if you would otherwise
117 | be required to provide such information under section 6 of the
118 | GNU GPL, and only to the extent that such information is
119 | necessary to install and execute a modified version of the
120 | Combined Work produced by recombining or relinking the
121 | Application with a modified version of the Linked Version. (If
122 | you use option 4d0, the Installation Information must accompany
123 | the Minimal Corresponding Source and Corresponding Application
124 | Code. If you use option 4d1, you must provide the Installation
125 | Information in the manner specified by section 6 of the GNU GPL
126 | for conveying Corresponding Source.)
127 |
128 | 5. Combined Libraries.
129 |
130 | You may place library facilities that are a work based on the
131 | Library side by side in a single library together with other library
132 | facilities that are not Applications and are not covered by this
133 | License, and convey such a combined library under terms of your
134 | choice, if you do both of the following:
135 |
136 | a) Accompany the combined library with a copy of the same work based
137 | on the Library, uncombined with any other library facilities,
138 | conveyed under the terms of this License.
139 |
140 | b) Give prominent notice with the combined library that part of it
141 | is a work based on the Library, and explaining where to find the
142 | accompanying uncombined form of the same work.
143 |
144 | 6. Revised Versions of the GNU Lesser General Public License.
145 |
146 | The Free Software Foundation may publish revised and/or new versions
147 | of the GNU Lesser General Public License from time to time. Such new
148 | versions will be similar in spirit to the present version, but may
149 | differ in detail to address new problems or concerns.
150 |
151 | Each version is given a distinguishing version number. If the
152 | Library as you received it specifies that a certain numbered version
153 | of the GNU Lesser General Public License "or any later version"
154 | applies to it, you have the option of following the terms and
155 | conditions either of that published version or of any later version
156 | published by the Free Software Foundation. If the Library as you
157 | received it does not specify a version number of the GNU Lesser
158 | General Public License, you may choose any version of the GNU Lesser
159 | General Public License ever published by the Free Software Foundation.
160 |
161 | If the Library as you received it specifies that a proxy can decide
162 | whether future versions of the GNU Lesser General Public License shall
163 | apply, that proxy's public statement of acceptance of any version is
164 | permanent authorization for you to choose that version for the
165 | Library.
166 |
--------------------------------------------------------------------------------
/Blockchain and XGBoosted K-means for Fraud Detection.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Blockchain and Machine Learning for Fraud Detection: Employing Artificial Intelligence in the Banking Sector"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "### Blockchain Integrated with an XGBoosted K-means Model"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "#### Import Libraries & Packages"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": null,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": [
30 | "import pandas as pd\n",
31 | "import pandas.testing as tm\n",
32 | "import numpy as np\n",
33 | "from numpy import loadtxt\n",
34 | "from sklearn.cluster import KMeans\n",
35 | "from sklearn.preprocessing import LabelEncoder\n",
36 | "from sklearn.preprocessing import MinMaxScaler\n",
37 | "import xgboost\n",
38 | "from xgboost import XGBClassifier\n",
39 | "import hashlib\n",
40 | "import json\n",
41 | "from time import time\n",
42 | "from urllib.parse import urlparse\n",
43 | "from uuid import uuid4\n",
44 | "import requests\n",
45 | "from flask import Flask, jsonify, request"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "### Data Wrangling"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 7,
58 | "metadata": {},
59 | "outputs": [],
60 | "source": [
61 | "#concatenate data into a single data frame\n",
62 | "\n",
63 | "account= pd.read_csv(\"account.csv\")\n",
64 | "order= pd.read_csv(\"order.csv\")\n",
65 | "transaction= pd.read_csv(\"transaction.csv\")\n",
66 | "\n",
67 | "X= pd.concat([account,order,transaction], axis=0, ignore_index=True) #unique row labels, so drop() below removes only the sampled rows\n",
68 | "\n",
69 | "#dividing the data into train and test sets for the k-means model\n",
70 | "\n",
71 | "X_new= X.copy() #create a copy of your data \n",
72 | "\n",
73 | "x_train = X_new.sample(frac=0.40, random_state=0)\n",
74 | "x_test = X_new.drop(x_train.index)"
75 | ]
76 | },
77 | {
78 | "cell_type": "markdown",
79 | "metadata": {},
80 | "source": [
81 | "### Blockchain"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": null,
87 | "metadata": {},
88 | "outputs": [],
89 | "source": [
90 | "#Create a class to store the block chain\n",
91 | "\n",
92 | "class Blockchain:\n",
93 | " def __init__(self):\n",
94 | " self.current_trans = []\n",
95 | " self.chain = []\n",
96 | " self.nodes = set()\n",
97 | "\n",
98 | " #Create the genesis block\n",
99 | " self.new_block(prev_hash='1', proof=100)\n",
100 | "\n",
101 | " def new_node(self, address):\n",
102 | " \"\"\"\n",
103 | " Add a new node to the set of nodes, e.g. 'http://192.168.0.5:5000'\n",
104 | " \"\"\"\n",
105 | "\n",
106 | " parsed_url = urlparse(address)\n",
107 | " if parsed_url.netloc:\n",
108 | " self.nodes.add(parsed_url.netloc)\n",
109 | " elif parsed_url.path:\n",
110 | " self.nodes.add(parsed_url.path)\n",
111 | " else:\n",
112 | " raise ValueError('Invalid URL. Please try again.')\n",
113 | "\n",
114 | "\n",
115 | " def valid_chain(self, chain):\n",
116 | " \"\"\"\n",
117 | " Determine if blockchain is valid.\n",
118 | " \"\"\"\n",
119 | "\n",
120 | " prev_block = chain[0]\n",
121 | " current_index = 1\n",
122 | "\n",
123 | " while current_index < len(chain):\n",
124 | " block = chain[current_index]\n",
125 | " print(f'{prev_block}')\n",
126 | " print(f'{block}')\n",
127 | " print(\"\\n-----------\\n\")\n",
128 | " #Check that the hash of the block is correct\n",
129 | " prev_block_hash = self.hash(prev_block)\n",
130 | " if block['prev_hash'] != prev_block_hash:\n",
131 | " return False\n",
132 | "\n",
133 | " #Check that the Proof of Work is correct\n",
134 | " if not self.valid_proof(prev_block['proof'], block['proof'], prev_block_hash):\n",
135 | " return False\n",
136 | "\n",
137 | " prev_block = block\n",
138 | " current_index += 1\n",
139 | "\n",
140 | " return True\n",
141 | "\n",
142 | " def conflict_resolution(self):\n",
143 | " \"\"\"\n",
144 | " Resolves conflicts by replacing current chain with the longest one in the network.\n",
145 | " \"\"\"\n",
146 | "\n",
147 | " neighbours = self.nodes\n",
148 | " new_chain = None\n",
149 | "\n",
150 | " #Identifying long chains\n",
151 | " max_length = len(self.chain)\n",
152 | "\n",
153 | " #Grab and verify the chains from all the nodes in the network\n",
154 | " for node in neighbours:\n",
155 | " response = requests.get(f'http://{node}/chain')\n",
156 | "\n",
157 | " if response.status_code == 200:\n",
158 | " length = response.json()['length']\n",
159 | " chain = response.json()['chain']\n",
160 | "\n",
161 | " #Check if the length is longer and the chain is valid\n",
162 | " if length > max_length and self.valid_chain(chain):\n",
163 | " max_length = length\n",
164 | " new_chain = chain\n",
165 | "\n",
166 | " #Replace chain if a valid longer chain is discovered\n",
167 | " if new_chain:\n",
168 | " self.chain = new_chain\n",
169 | " return True\n",
170 | "\n",
171 | " return False\n",
172 | "\n",
173 | " def new_block(self, proof, prev_hash):\n",
174 | "\n",
175 | " block = {\n",
176 | " 'index': len(self.chain) + 1,\n",
177 | " 'timestamp': time(),\n",
178 | " 'transactions': self.current_trans,\n",
179 | " 'proof': proof,\n",
180 | " 'prev_hash': prev_hash or self.hash(self.chain[-1]),\n",
181 | " }\n",
182 | "\n",
183 | " #Reset the current list of transactions\n",
184 | " self.current_trans = []\n",
185 | "\n",
186 | " self.chain.append(block)\n",
187 | " return block\n",
188 | "\n",
189 | " def new_trans(self, sender, recipient, amount):\n",
190 | " \"\"\"\n",
191 | " Creates a new transaction to go into the next mined Block.\n",
192 | " \"\"\"\n",
193 | " self.current_trans.append({\n",
194 | " 'sender': sender,\n",
195 | " 'recipient': recipient,\n",
196 | " 'amount': amount,\n",
197 | " })\n",
198 | "\n",
199 | " return self.prev_block['index'] + 1\n",
200 | "\n",
201 | " @property\n",
202 | " def prev_block(self):\n",
203 | " return self.chain[-1]\n",
204 | "\n",
205 | " @staticmethod\n",
206 | " def hash(block):\n",
207 | " \"\"\"\n",
208 | " Create a SHA-256 hash of a block (hashing, not encryption).\n",
209 | " \"\"\"\n",
210 | "\n",
211 | " #Ensure that dictionary is ordered, to avoid inconsistent hashes.\n",
212 | " block_str = json.dumps(block, sort_keys=True).encode()\n",
213 | " return hashlib.sha256(block_str).hexdigest()\n",
214 | "\n",
215 | " def proof_of_work(self, prev_block):\n",
216 | " \n",
217 | " #Proof of Work Algorithm:\n",
218 | " #- Find a number p' such that hash(p, p', h) has 4 leading zeroes\n",
219 | " #- Where p is the previous proof, p' is the new proof, and h is the hash of the previous block\n",
220 | "\n",
221 | " prev_proof = prev_block['proof']\n",
222 | " prev_hash = self.hash(prev_block)\n",
223 | "\n",
224 | " proof = 0\n",
225 | " while self.valid_proof(prev_proof, proof, prev_hash) is False:\n",
226 | " proof += 1\n",
227 | "\n",
228 | " return proof\n",
229 | "\n",
230 | " @staticmethod\n",
231 | " def valid_proof(prev_proof, proof, prev_hash):\n",
232 | "\n",
233 | " #Validates Proof\n",
234 | "\n",
235 | " guess = f'{prev_proof}{proof}{prev_hash}'.encode()\n",
236 | " guess_hash = hashlib.sha256(guess).hexdigest()\n",
237 | " return guess_hash[:4] == \"0000\""
238 | ]
239 | },
240 | {
241 | "cell_type": "markdown",
242 | "metadata": {},
243 | "source": [
244 | "### Integration of XGBoosted KMeans with Blockchain"
245 | ]
246 | },
247 | {
248 | "cell_type": "code",
249 | "execution_count": null,
250 | "metadata": {},
251 | "outputs": [],
252 | "source": [
253 | "#Instantiate the Node\n",
254 | "app = Flask(__name__)\n",
255 | "\n",
256 | "#Generate a globally unique address for this node\n",
257 | "node_id = str(uuid4()).replace('-', '')\n",
258 | "\n",
259 | "#Instantiate the Blockchain\n",
260 | "blockchain = Blockchain()\n",
261 | "\n",
262 | "\n",
263 | "@app.route('/mine', methods=['GET'])\n",
264 | "def mine():\n",
265 | " #Run the proof of work algorithm to get the next proof...\n",
266 | " prev_block = blockchain.prev_block\n",
267 | " proof = blockchain.proof_of_work(prev_block)\n",
268 | "\n",
269 | " #Receive a reward for finding the proof.\n",
270 | " #The sender is \"0\" to signify that this transaction is a mining reward.\n",
271 | " blockchain.new_trans(\n",
272 | " sender=\"0\",\n",
273 | " recipient=node_id,\n",
274 | " amount=1,\n",
275 | " )\n",
276 | "\n",
277 | " #Forge the new Block by adding it to the chain\n",
278 | " prev_hash = blockchain.hash(prev_block)\n",
279 | " block = blockchain.new_block(proof, prev_hash)\n",
280 | "\n",
281 | " response = {\n",
282 | " 'message': \"New Block Forged\",\n",
283 | " 'index': block['index'],\n",
284 | " 'transactions': block['transactions'],\n",
285 | " 'proof': block['proof'],\n",
286 | " 'prev_hash': block['prev_hash'],\n",
287 | " }\n",
288 | " return jsonify(response), 200\n",
289 | "\n",
290 | "\n",
291 | "@app.route('/transactions/new', methods=['POST'])\n",
292 | "def new_trans():\n",
293 | "    values = request.get_json()\n",
294 | "\n",
295 | "    #Check that the required fields are in the POST'ed data\n",
296 | "    required = ['sender', 'recipient', 'amount']\n",
297 | "    if not values or not all(k in values for k in required):\n",
298 | "        return 'Missing values', 400\n",
299 | "\n",
300 | "    #Create a new Transaction\n",
301 | "    index = blockchain.new_trans(values['sender'], values['recipient'], values['amount'])\n",
302 | "\n",
303 | "    response = {'message': f'Transaction will be added to Block {index}'}\n",
304 | "\n",
305 | "    #K-means clustering is run on the data prepared in the Data Wrangling cell.\n",
306 | "    #Assumption: only numeric columns are clustered, with missing values set to 0.\n",
307 | "    train = x_train.select_dtypes(include='number').fillna(0)\n",
308 | "    test = x_test.select_dtypes(include='number').fillna(0)\n",
309 | "\n",
310 | "    #Building the k-means model\n",
311 | "    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)\n",
312 | "    kmeans.fit(train)\n",
313 | "\n",
314 | "    #The data carries no fraud labels, so there is no ground truth to score against.\n",
315 | "    #Assumption: XGBoost is fitted on the k-means cluster labels of the training rows,\n",
316 | "    #and the score below is its agreement with k-means on the test rows.\n",
317 | "    model = XGBClassifier()\n",
318 | "    model.fit(train, kmeans.labels_)\n",
319 | "\n",
320 | "    #Count the test rows where XGBoost and k-means assign the same cluster\n",
321 | "    predictions = model.predict(test)\n",
322 | "    clusters = kmeans.predict(test)\n",
323 | "    correct = int((predictions == clusters).sum())\n",
324 | "\n",
325 | "    print(correct/len(test))\n",
326 | "    return jsonify(response), 201\n",
327 | "\n",
328 | "\n",
329 | "@app.route('/chain', methods=['GET'])\n",
330 | "def full_chain():\n",
331 | " response = {\n",
332 | " 'chain': blockchain.chain,\n",
333 | " 'length': len(blockchain.chain),\n",
334 | " }\n",
335 | " return jsonify(response), 200\n",
336 | " \n",
337 | "@app.route('/nodes/register', methods=['POST'])\n",
338 | "def new_nodes():\n",
339 | " values = request.get_json()\n",
340 | "\n",
341 | " nodes = values.get('nodes')\n",
342 | " if nodes is None:\n",
343 | " return \"Error: Please supply a valid list of nodes\", 400\n",
344 | "\n",
345 | " for node in nodes:\n",
346 | " blockchain.new_node(node)\n",
347 | "\n",
348 | " response = {\n",
349 | " 'message': 'New nodes have been added',\n",
350 | " 'total_nodes': list(blockchain.nodes),\n",
351 | " }\n",
352 | " return jsonify(response), 201\n",
353 | "\n",
354 | "\n",
355 | "@app.route('/nodes/resolve', methods=['GET'])\n",
356 | "def consensus():\n",
357 | " replaced = blockchain.conflict_resolution()\n",
358 | "\n",
359 | " if replaced:\n",
360 | " response = {\n",
361 | " 'message': 'Our chain was replaced',\n",
362 | " 'new_chain': blockchain.chain\n",
363 | " }\n",
364 | " else:\n",
365 | " response = {\n",
366 | " 'message': 'Our chain is authoritative',\n",
367 | " 'chain': blockchain.chain\n",
368 | " }\n",
369 | "\n",
370 | " return jsonify(response), 200\n",
371 | "\n",
372 | "\n",
373 | "if __name__ == '__main__':\n",
374 | " from argparse import ArgumentParser\n",
375 | "\n",
376 | " parser = ArgumentParser()\n",
377 | " parser.add_argument('-p', '--port', default=5000, type=int, help='port to listen on')\n",
378 | " args = parser.parse_args()\n",
379 | " port = args.port\n",
380 | "\n",
381 | " app.run(host='0.0.0.0', port=port)"
382 | ]
383 | }
384 | ],
385 | "metadata": {
386 | "kernelspec": {
387 | "display_name": "Python 3",
388 | "language": "python",
389 | "name": "python3"
390 | },
391 | "language_info": {
392 | "codemirror_mode": {
393 | "name": "ipython",
394 | "version": 3
395 | },
396 | "file_extension": ".py",
397 | "mimetype": "text/x-python",
398 | "name": "python",
399 | "nbconvert_exporter": "python",
400 | "pygments_lexer": "ipython3",
401 | "version": "3.7.4"
402 | },
403 | "latex_envs": {
404 | "LaTeX_envs_menu_present": true,
405 | "autoclose": false,
406 | "autocomplete": true,
407 | "bibliofile": "biblio.bib",
408 | "cite_by": "apalike",
409 | "current_citInitial": 1,
410 | "eqLabelWithNumbers": true,
411 | "eqNumInitial": 1,
412 | "hotkeys": {
413 | "equation": "Ctrl-E",
414 | "itemize": "Ctrl-I"
415 | },
416 | "labels_anchors": false,
417 | "latex_user_defs": false,
418 | "report_style_numbering": false,
419 | "user_envs_cfg": false
420 | }
421 | },
422 | "nbformat": 4,
423 | "nbformat_minor": 2
424 | }
425 |
--------------------------------------------------------------------------------
/Fraud Detection Using KMeans.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Blockchain and Machine Learning for Fraud Detection: Employing Artificial Intelligence in the Banking Sector"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## By Vinita Silaparasetty"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "#### Credits\n",
22 | "\n",
23 | "##### Source: http://lisp.vse.cz/pkdd99/berka.html\n",
24 | "\n",
25 | "##### Prepared by: Petr Berka and Marta Sochorova."
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "## Fraud Detection Using KMeans"
33 | ]
34 | },
35 | {
36 | "cell_type": "markdown",
37 | "metadata": {},
38 | "source": [
39 | "#### Import Libraries"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": 1,
45 | "metadata": {},
46 | "outputs": [],
47 | "source": [
48 | "import numpy as np\n",
49 | "import scipy\n",
50 | "import pandas as pd\n",
51 | "import matplotlib\n",
52 | "%matplotlib inline\n",
53 | "import matplotlib.pyplot as plt\n",
54 | "import seaborn as sns\n",
55 | "import sklearn\n",
56 | "from sklearn.model_selection import train_test_split\n",
57 | "from sklearn.cluster import KMeans\n",
58 | "from sklearn.preprocessing import StandardScaler\n",
59 | "from sklearn.datasets import make_moons\n",
60 | "from sklearn.cluster import SpectralClustering"
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {},
66 | "source": [
67 | "#### Import Data"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": 2,
73 | "metadata": {},
74 | "outputs": [],
75 | "source": [
76 | "order= pd.read_csv(\"order.csv\")\n",
77 | "account= pd.read_csv(\"account.csv\")\n",
78 | "transaction= pd.read_csv(\"transaction.csv\")"
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {},
84 | "source": [
85 | "## Process Demo\n",
86 | "\n",
87 | "A demonstration of how KMeans clustering works, with a visualization of the output."
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "### Order & Account Dataframes"
95 | ]
96 | },
97 | {
98 | "cell_type": "code",
99 | "execution_count": 8,
100 | "metadata": {},
101 | "outputs": [],
102 | "source": [
103 | "#working with 'order' dataframe and 'account' dataframe\n",
104 | "x = (order['account_id'],order['account_to'],order['amount'])\n",
105 | "y = (account['account_id'],account['district_id'],account['frequency'])"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": 9,
111 | "metadata": {},
112 | "outputs": [],
113 | "source": [
114 | "# Splitting the dataset into the Training set and Test set\n",
115 | "X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)"
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": 10,
121 | "metadata": {},
122 | "outputs": [],
123 | "source": [
124 | "# Feature Scaling\n",
125 | "from sklearn.preprocessing import StandardScaler\n",
126 | "sc_X = StandardScaler()\n",
127 | "X_train = sc_X.fit_transform(X_train)\n",
128 | "X_test = sc_X.transform(X_test)\n",
129 | "sc_y = StandardScaler()\n",
130 | "y_train = sc_y.fit_transform(y_train)"
131 | ]
132 | },
133 | {
134 | "cell_type": "markdown",
135 | "metadata": {},
136 | "source": [
137 | "#### Applying KMeans"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": 11,
143 | "metadata": {},
144 | "outputs": [],
145 | "source": [
146 | "kmeans = KMeans(n_clusters = 3, init = 'k-means++', random_state = 42)\n",
147 | "y_kmeans = kmeans.fit_predict(np.column_stack(x)) # one row per order, one column per feature"
148 | ]
149 | },
150 | {
151 | "cell_type": "markdown",
152 | "metadata": {},
153 | "source": [
154 | "#### Visualize the Clusters"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": 12,
160 | "metadata": {},
161 | "outputs": [
162 | {
163 | "data": {
164 | "image/png": "\n",
165 | "text/plain": [
166 | ""
167 | ]
168 | },
169 | "metadata": {
170 | "needs_background": "light"
171 | },
172 | "output_type": "display_data"
173 | }
174 | ],
175 | "source": [
176 | "x, y = make_moons(200, noise=.05, random_state=0)\n",
177 | "labels = KMeans(2, random_state=0).fit_predict(x)\n",
178 | "plt.scatter(x[:, 0], x[:, 1], c=labels,\n",
179 | " s=50, cmap='viridis');"
180 | ]
181 | },
182 | {
183 | "cell_type": "markdown",
184 | "metadata": {},
185 | "source": [
186 | "### -------------------- End of Demo --------------------"
187 | ]
188 | },
189 | {
190 | "cell_type": "markdown",
191 | "metadata": {},
192 | "source": [
193 | "###### I skipped the visualizations for the rest of the analysis procedure as they are not needed for the main program."
194 | ]
195 | },
196 | {
197 | "cell_type": "markdown",
198 | "metadata": {},
199 | "source": [
200 | "### Order & Transaction Dataframes"
201 | ]
202 | },
203 | {
204 | "cell_type": "code",
205 | "execution_count": 13,
206 | "metadata": {},
207 | "outputs": [],
208 | "source": [
209 | "# working with 'order' dataframe and 'transaction' dataframe\n",
210 | "a = (order['account_id'],order['account_to'],order['amount'])\n",
211 | "b = (transaction['account_id'],transaction['balance'],transaction['amount'])"
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": 14,
217 | "metadata": {},
218 | "outputs": [],
219 | "source": [
220 | "# Splitting the dataset into the Training set and Test set\n",
221 | "X_train, X_test, y_train, y_test = train_test_split(a, b, test_size = 0.2, random_state = 0)"
222 | ]
223 | },
224 | {
225 | "cell_type": "code",
226 | "execution_count": 15,
227 | "metadata": {},
228 | "outputs": [],
229 | "source": [
230 | "# Feature Scaling\n",
231 | "from sklearn.preprocessing import StandardScaler\n",
232 | "sc_X = StandardScaler()\n",
233 | "X_train = sc_X.fit_transform(X_train)\n",
234 | "X_test = sc_X.transform(X_test)\n",
235 | "sc_y = StandardScaler()\n",
236 | "y_train = sc_y.fit_transform(y_train)"
237 | ]
238 | },
239 | {
240 | "cell_type": "code",
241 | "execution_count": 16,
242 | "metadata": {},
243 | "outputs": [],
244 | "source": [
245 | "kmeans = KMeans(n_clusters = 3, init = 'k-means++', random_state = 42)\n",
246 | "y_kmeans = kmeans.fit_predict(np.column_stack(a)) # one row per order, one column per feature"
247 | ]
248 | },
249 | {
250 | "cell_type": "markdown",
251 | "metadata": {},
252 | "source": [
253 | "### Account & Transaction Dataframes"
254 | ]
255 | },
256 | {
257 | "cell_type": "code",
258 | "execution_count": 19,
259 | "metadata": {},
260 | "outputs": [],
261 | "source": [
262 | "# working with 'account' dataframe and 'transaction' dataframe\n",
263 | "e = (account['account_id'],account['district_id'])\n",
264 | "f = (transaction['account_id'],transaction['amount'])"
265 | ]
266 | },
267 | {
268 | "cell_type": "code",
269 | "execution_count": 20,
270 | "metadata": {},
271 | "outputs": [],
272 | "source": [
273 | "# Splitting the dataset into the Training set and Test set\n",
274 | "X_train, X_test, y_train, y_test = train_test_split(e,f, test_size = 0.2, random_state = 0)"
275 | ]
276 | },
277 | {
278 | "cell_type": "code",
279 | "execution_count": 21,
280 | "metadata": {},
281 | "outputs": [],
282 | "source": [
283 | "# Feature Scaling\n",
284 | "from sklearn.preprocessing import StandardScaler\n",
285 | "sc_X = StandardScaler()\n",
286 | "X_train = sc_X.fit_transform(X_train)\n",
287 | "X_test = sc_X.transform(X_test)\n",
288 | "sc_y = StandardScaler()\n",
289 | "y_train = sc_y.fit_transform(y_train)"
290 | ]
291 | },
292 | {
293 | "cell_type": "code",
294 | "execution_count": 23,
295 | "metadata": {},
296 | "outputs": [],
297 | "source": [
298 | "kmeans = KMeans(n_clusters = 2, init = 'k-means++', random_state = 42)\n",
299 | "y_kmeans = kmeans.fit_predict(np.column_stack(e)) # one row per account, one column per feature"
300 | ]
301 | },
302 | {
303 | "cell_type": "markdown",
304 | "metadata": {},
305 | "source": [
306 | "##### Note: \n",
307 | "The dataset consists of several CSV files, but I have used only three of them, as they serve the purpose of my research. The files I used are:\n",
308 | "\n",
309 | "1) order.csv\n",
310 | "\n",
311 | "2) account.csv\n",
312 | "\n",
313 | "3) transaction.csv\n",
314 | " "
315 | ]
316 | }
317 | ],
318 | "metadata": {
319 | "kernelspec": {
320 | "display_name": "Python 3",
321 | "language": "python",
322 | "name": "python3"
323 | },
324 | "language_info": {
325 | "codemirror_mode": {
326 | "name": "ipython",
327 | "version": 3
328 | },
329 | "file_extension": ".py",
330 | "mimetype": "text/x-python",
331 | "name": "python",
332 | "nbconvert_exporter": "python",
333 | "pygments_lexer": "ipython3",
334 | "version": "3.7.3"
335 | }
336 | },
337 | "nbformat": 4,
338 | "nbformat_minor": 2
339 | }
340 |
--------------------------------------------------------------------------------