├── LICENSE
├── README.md
├── api
│   ├── api.py
│   └── trained_model.sav
├── minisom.py
├── model.py
├── som_fraud.py
└── trained_model.sav
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <https://unlicense.org>
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

[![GitHub pull-requests](https://img.shields.io/github/issues-pr/Naereen/StrapDown.js.svg)](https://GitHub.com/Naereen/StrapDown.js/pull/)
[![Open Source Love svg1](https://badges.frapsoft.com/os/v1/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badges/)

# REST_API_for_fraud-detection
A logistic regression model that detects fraud in online transactions. It can be accessed through a REST API.
The files also include an implementation with self-organizing maps to better classify fraudulent transactions.

## The Trained model

Logistic regression is one of the most popular machine learning algorithms for binary classification, because it is a simple algorithm that performs very well on a wide range of problems.
The logistic regression model takes real-valued inputs and predicts the probability that the input belongs to the default class (class 0, i.e. a genuine transaction).
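As a quick illustration, the sketch below (not part of the repository) queries the saved model directly for both the hard label and the class probabilities. It assumes pandas, scikit-learn and joblib are installed, that `model.py` has already produced `trained_model.sav`, and that the feature values are placeholders:
```
import joblib  # on older scikit-learn: from sklearn.externals import joblib
import pandas as pd

# Hypothetical transaction: the 28 PCA components V1..V28 plus the raw Amount.
sample = {"V%d" % i: 0.0 for i in range(1, 29)}
sample["Amount"] = 149.62

model = joblib.load("trained_model.sav")
frame = pd.DataFrame([sample])
print(model.predict(frame)[0])        # hard label: 0 = genuine, 1 = fraud
print(model.predict_proba(frame)[0])  # [P(class 0), P(class 1)]
```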

## The Data
The models are trained to label anonymized credit card transactions as fraudulent or genuine. The dataset comes from [kaggle](https://www.kaggle.com/sarathchandra/credit-card-fraud-detection-99-accuracy/comments#144915).

## The API access points
To be able to use the API, first install the requirements (besides Flask, `api.py` also imports pandas, scikit-learn and joblib):
```
pip install flask pandas scikit-learn joblib
```
Then start the application by running:
```
python api.py
```
The app will be running on localhost, port 5000, with the following endpoints.

## /api/v0/verify
This endpoint receives, via a POST request, a JSON object that represents the new transaction:
```
{
    "V1" : -1.3598071336738,
    "V2" : -0.0727811733098497,
    "V3" : 2.53634673796914,
    ...
    "V25" : 0.128539358273528,
    "V26" : -0.189114843888824,
    "V27" : 0.133558376740387,
    "V28" : -0.0210530534538215,
    "Amount" : 149.62
}
```
The returned result is a one-element list and looks like this:
```
[
    {
        "id": 0,
        "prediction": "0"
    }
]
```
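For example, from Python (a sketch, assuming the `requests` package is installed, the server is running locally, and the feature values are placeholders):
```
import requests

transaction = {"V%d" % i: 0.0 for i in range(1, 29)}  # placeholder values
transaction["Amount"] = 149.62

resp = requests.post("http://127.0.0.1:5000/api/v0/verify", json=transaction)
print(resp.json())  # e.g. [{"id": 0, "prediction": "0"}]
```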

## /api/v0/info
This endpoint gives you the information you need to know about the API.
The result looks like this; you can add as many details as you like:
```
[
    {
        "Author": "IOO",
        "description": "A fraud detection model using a kaggle dataset"
    }
]
```
--------------------------------------------------------------------------------
/api/api.py:
--------------------------------------------------------------------------------
import flask
from flask import request, jsonify
import json
import pandas as pd
import joblib  # on older scikit-learn: from sklearn.externals import joblib

app = flask.Flask(__name__)
app.config["DEBUG"] = True


def pred_vect(v):
    """Turns a raw 29-value list into a one-row DataFrame with the
    columns V1..V28 and Amount expected by the trained model."""
    vect = {}
    for i in range(0, 28):
        vect["V" + str(i + 1)] = v[i]
    vect["Amount"] = v[28]
    return pd.DataFrame([vect])


# Sample transaction used by the /api/v0/test endpoint.
v = [-1.3598071336738, -0.0727811733098497, 2.53634673796914, 1.37815522427443,
     -0.338320769942518, 0.462387777762292, 0.239598554061257, 0.0986979012610507,
     0.363786969611213, 0.0907941719789316, -0.551599533260813, -0.617800855762348,
     -0.991389847235408, -0.311169353699879, 1.46817697209427, -0.470400525259478,
     0.207971241929242, 0.0257905801985591, 0.403992960255733, 0.251412098239705,
     -0.018306777944153, 0.277837575558899, -0.110473910188767, 0.0669280749146731,
     0.128539358273528, -0.189114843888824, 0.133558376740387, -0.0210530534538215,
     149.62]

filename = 'trained_model.sav'


# Post a JSON object to see it echoed back:
@app.route('/foo', methods=['POST'])
def foo():
    print(request)
    return json.dumps(request.json)


@app.route('/postjson', methods=['POST'])
def postJsonHandler():
    print(request.is_json)
    content = request.get_json()
    print(content)
    return jsonify(content)


@app.route('/', methods=['GET'])
def home():
    return '''<h1>Fraud Detection API : IOO</h1>
<p>A POC for the use of Machine learning to detect Credit Cards Frauds.</p>'''


@app.route('/api/v0/verify', methods=['GET', 'POST'])
def predict_new_transaction():
    vec = pd.DataFrame([request.json])
    loaded_model = joblib.load(filename)
    res = str(loaded_model.predict(vec)[0])
    result = [{'id': 0,
               'prediction': res}]
    return jsonify(result)


@app.route('/api/v0/test', methods=['GET', 'POST'])
def predict_test():
    loaded_model = joblib.load(filename)
    res = str(loaded_model.predict(pred_vect(v))[0])
    result = [{'id': 0,
               'prediction': res}]
    return jsonify(result)


@app.route('/api/v0/info', methods=['GET'])
def info():
    result = [
        {
            'Author': 'IOO',
            'description': 'A fraud detection model using a kaggle dataset',
        }
    ]
    return jsonify(result)


if __name__ == '__main__':
    app.run()
--------------------------------------------------------------------------------
/api/trained_model.sav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Z4ck404/REST_API_for_fraud-detection/8a4c3dfd34206c2be88c846450df0986b609c0ad/api/trained_model.sav
--------------------------------------------------------------------------------
/minisom.py:
--------------------------------------------------------------------------------
from math import sqrt

from numpy import (array, unravel_index, nditer, linalg, random, subtract,
                   power, exp, pi, zeros, arange, outer, meshgrid, dot,
                   logical_and, mean, std, cov, argsort, linspace, transpose)
from collections import defaultdict, Counter
from warnings import warn
from sys import stdout
from time import time

# for unit tests
from numpy.testing import assert_almost_equal, assert_array_almost_equal
from numpy.testing import assert_array_equal
import unittest

"""
    Minimalistic implementation of the Self Organizing Maps (SOM).
"""


def _incremental_index_verbose(m):
    """Yields numbers from 0 to m-1 printing the status on the stdout."""
    progress = f'\r [ {0:{len(str(m))}} / {m} ] {0:3.0f}% ? it/s'
    stdout.write(progress)
    beginning = time()
    for i in range(m):
        yield i
        # iterations per second (the original computed seconds per iteration
        # while labelling it "it/s")
        it_per_sec = (i + 1) / (time() - beginning)
        progress = f'\r [ {i+1:{len(str(m))}} / {m} ]'
        progress += f' {100*(i+1)/m:3.0f}%'
        progress += f' {it_per_sec:4.5f} it/s'
        stdout.write(progress)


def fast_norm(x):
    """Returns norm-2 of a 1-D numpy array.

    * faster than linalg.norm in case of 1-D arrays (numpy 1.9.2rc1).
    """
    return sqrt(dot(x, x.T))


def asymptotic_decay(learning_rate, t, max_iter):
    """Decay function of the learning process.

    Parameters
    ----------
    learning_rate : float
        current learning rate.

    t : int
        current iteration.

    max_iter : int
        maximum number of iterations for the training.
    """
    return learning_rate / (1+t/(max_iter/2))


class MiniSom(object):
    def __init__(self, x, y, input_len, sigma=1.0, learning_rate=0.5,
                 decay_function=asymptotic_decay,
                 neighborhood_function='gaussian', random_seed=None):
        """Initializes a Self Organizing Map.

        A rule of thumb to set the size of the grid for a dimensionality
        reduction task is that it should contain 5*sqrt(N) neurons
        where N is the number of samples in the dataset to analyze.

        E.g. if your dataset has 150 samples, 5*sqrt(150) = 61.23
        hence a map 8-by-8 should perform well.

        Parameters
        ----------
        x : int
            x dimension of the SOM.

        y : int
            y dimension of the SOM.

        input_len : int
            Number of the elements of the vectors in input.

        sigma : float, optional (default=1.0)
            Spread of the neighborhood function, needs to be adequate
            to the dimensions of the map.
            (at the iteration t we have sigma(t) = sigma / (1 + t/T)
            where T is #num_iteration/2)

        learning_rate : float, optional (default=0.5)
            Initial learning rate.
            (at the iteration t we have
            learning_rate(t) = learning_rate / (1 + t/T)
            where T is #num_iteration/2)

        decay_function : function (default=asymptotic_decay)
            Function that reduces learning_rate and sigma at each iteration;
            the default function is:
                        learning_rate / (1+t/(max_iterations/2))

            A custom decay function will need to take in input
            three parameters in the following order:

            1. learning rate
            2. current iteration
            3. maximum number of iterations allowed

            Note that if a lambda function is used to define the decay
            MiniSom will not be picklable anymore.

        neighborhood_function : string, optional (default='gaussian')
            Function that weights the neighborhood of a position in the map;
            possible values: 'gaussian', 'mexican_hat', 'bubble', 'triangle'

        random_seed : int, optional (default=None)
            Random seed to use.
        """
        if sigma >= x or sigma >= y:
            warn('Warning: sigma is too high for the dimension of the map.')

        self._random_generator = random.RandomState(random_seed)

        self._learning_rate = learning_rate
        self._sigma = sigma
        self._input_len = input_len
        # random initialization
        self._weights = self._random_generator.rand(x, y, input_len)*2-1

        for i in range(x):
            for j in range(y):
                # normalization
                norm = fast_norm(self._weights[i, j])
                self._weights[i, j] = self._weights[i, j] / norm

        self._activation_map = zeros((x, y))
        self._neigx = arange(x)
        self._neigy = arange(y)  # used to evaluate the neighborhood function
        self._decay_function = decay_function

        neig_functions = {'gaussian': self._gaussian,
                          'mexican_hat': self._mexican_hat,
                          'bubble': self._bubble,
                          'triangle': self._triangle}

        if neighborhood_function not in neig_functions:
            msg = '%s not supported. Functions available: %s'
            raise ValueError(msg % (neighborhood_function,
                                    ', '.join(neig_functions.keys())))

        if neighborhood_function in ['triangle',
                                     'bubble'] and divmod(sigma, 1)[1] != 0:
            warn('sigma should be an integer when triangle or bubble '
                 'are used as neighborhood function')

        self.neighborhood = neig_functions[neighborhood_function]

    def get_weights(self):
        """Returns the weights of the neural network."""
        return self._weights

    def _activate(self, x):
        """Updates matrix activation_map, in this matrix
        the element i,j is the response of the neuron i,j to x."""
        s = subtract(x, self._weights)  # x - w
        it = nditer(self._activation_map, flags=['multi_index'])
        while not it.finished:
            # || x - w ||
            self._activation_map[it.multi_index] = fast_norm(s[it.multi_index])
            it.iternext()

    def activate(self, x):
        """Returns the activation map to x."""
        self._activate(x)
        return self._activation_map

    def _gaussian(self, c, sigma):
        """Returns a Gaussian centered in c."""
        d = 2*pi*sigma*sigma
        ax = exp(-power(self._neigx-c[0], 2)/d)
        ay = exp(-power(self._neigy-c[1], 2)/d)
        return outer(ax, ay)  # the external product gives a matrix

    def _mexican_hat(self, c, sigma):
        """Mexican hat centered in c."""
        xx, yy = meshgrid(self._neigx, self._neigy)
        p = power(xx-c[0], 2) + power(yy-c[1], 2)
        d = 2*pi*sigma*sigma
        return exp(-p/d)*(1-2/d*p)

    def _bubble(self, c, sigma):
        """Constant function centered in c with spread sigma.
        sigma should be an odd value.
        """
        ax = logical_and(self._neigx > c[0]-sigma,
                         self._neigx < c[0]+sigma)
        ay = logical_and(self._neigy > c[1]-sigma,
                         self._neigy < c[1]+sigma)
        return outer(ax, ay)*1.

    def _triangle(self, c, sigma):
        """Triangular function centered in c with spread sigma."""
        triangle_x = (-abs(c[0] - self._neigx)) + sigma
        triangle_y = (-abs(c[1] - self._neigy)) + sigma
        triangle_x[triangle_x < 0] = 0.
        triangle_y[triangle_y < 0] = 0.
        return outer(triangle_x, triangle_y)

    def _check_iteration_number(self, num_iteration):
        if num_iteration < 1:
            raise ValueError('num_iteration must be at least 1')

    def _check_input_len(self, data):
        """Checks that the data in input is of the correct shape."""
        data_len = len(data[0])
        if self._input_len != data_len:
            msg = 'Received %d features, expected %d.' % (data_len,
                                                          self._input_len)
            raise ValueError(msg)

    def winner(self, x):
        """Computes the coordinates of the winning neuron for the sample x."""
        self._activate(x)
        return unravel_index(self._activation_map.argmin(),
                             self._activation_map.shape)

    def update(self, x, win, t, max_iteration):
        """Updates the weights of the neurons.

        Parameters
        ----------
        x : np.array
            Current pattern to learn.
        win : tuple
            Position of the winning neuron for x (array or tuple).
        t : int
            Iteration index.
        max_iteration : int
            Maximum number of training iterations.
        """
        eta = self._decay_function(self._learning_rate, t, max_iteration)
        # sigma and learning rate decrease with the same rule
        sig = self._decay_function(self._sigma, t, max_iteration)
        # improves the performances
        g = self.neighborhood(win, sig)*eta
        it = nditer(g, flags=['multi_index'])

        while not it.finished:
            # eta * neighborhood_function * (x-w)
            x_w = (x - self._weights[it.multi_index])
            self._weights[it.multi_index] += g[it.multi_index] * x_w
            it.iternext()

    def quantization(self, data):
        """Assigns a code book (weights vector of the winning neuron)
        to each sample in data."""
        self._check_input_len(data)
        q = zeros(data.shape)
        for i, x in enumerate(data):
            q[i] = self._weights[self.winner(x)]
        return q

    def random_weights_init(self, data):
        """Initializes the weights of the SOM
        picking random samples from data."""
        self._check_input_len(data)
        it = nditer(self._activation_map, flags=['multi_index'])
        while not it.finished:
            rand_i = self._random_generator.randint(len(data))
            # assign the picked sample as-is (the original code computed its
            # norm but never applied it, so the normalization was a no-op)
            self._weights[it.multi_index] = data[rand_i]
            it.iternext()

    def pca_weights_init(self, data):
        """Initializes the weights to span the first two principal components.

        This initialization doesn't depend on random processes and
        makes the training process converge faster.

        It is strongly recommended to normalize the data before initializing
        the weights and use the same normalization for the training data.
        """
        if self._input_len == 1:
            msg = 'The data needs at least 2 features for pca initialization'
            raise ValueError(msg)
        self._check_input_len(data)
        if len(self._neigx) == 1 or len(self._neigy) == 1:
            msg = 'PCA initialization inappropriate: ' + \
                  'one of the dimensions of the map is 1.'
            warn(msg)
        pc_length, pc = linalg.eig(cov(transpose(data)))
        pc_order = argsort(pc_length)
        for i, c1 in enumerate(linspace(-1, 1, len(self._neigx))):
            for j, c2 in enumerate(linspace(-1, 1, len(self._neigy))):
                self._weights[i, j] = c1*pc[pc_order[0]] + c2*pc[pc_order[1]]

    def train_random(self, data, num_iteration, verbose=False):
        """Trains the SOM picking samples at random from data.

        Parameters
        ----------
        data : np.array or list
            Data matrix.

        num_iteration : int
            Maximum number of iterations (one iteration per sample).

        verbose : bool (default=False)
            If True the status of the training
            will be printed at each iteration.
        """
        self._check_iteration_number(num_iteration)
        self._check_input_len(data)
        iterations = range(num_iteration)
        if verbose:
            iterations = _incremental_index_verbose(num_iteration)

        for iteration in iterations:
            # pick a random sample
            rand_i = self._random_generator.randint(len(data))
            self.update(data[rand_i], self.winner(data[rand_i]),
                        iteration, num_iteration)

    def train_batch(self, data, num_iteration, verbose=False):
        """Trains using all the vectors in data sequentially.

        Parameters
        ----------
        data : np.array or list
            Data matrix.

        num_iteration : int
            Maximum number of iterations (one iteration per sample).

        verbose : bool (default=False)
            If True the status of the training
            will be printed at each iteration.
        """
        self._check_iteration_number(num_iteration)
        self._check_input_len(data)
        iterations = range(num_iteration)
        if verbose:
            iterations = _incremental_index_verbose(num_iteration)

        for iteration in iterations:
            # cycle through all the samples (the original modulo used
            # len(data)-1, which always skipped the last sample)
            idx = iteration % len(data)
            self.update(data[idx], self.winner(data[idx]),
                        iteration, num_iteration)

    def distance_map(self):
        """Returns the distance map of the weights.
        Each cell is the normalised sum of the distances between
        a neuron and its neighbours."""
        um = zeros((self._weights.shape[0], self._weights.shape[1]))
        it = nditer(um, flags=['multi_index'])
        while not it.finished:
            for ii in range(it.multi_index[0]-1, it.multi_index[0]+2):
                for jj in range(it.multi_index[1]-1, it.multi_index[1]+2):
                    if (ii >= 0 and ii < self._weights.shape[0] and
                            jj >= 0 and jj < self._weights.shape[1]):
                        w_1 = self._weights[ii, jj, :]
                        w_2 = self._weights[it.multi_index]
                        um[it.multi_index] += fast_norm(w_1-w_2)
            it.iternext()
        um = um/um.max()
        return um

    def activation_response(self, data):
        """
        Returns a matrix where the element i,j is the number of times
        that the neuron i,j has been the winner.
        """
        self._check_input_len(data)
        a = zeros((self._weights.shape[0], self._weights.shape[1]))
        for x in data:
            a[self.winner(x)] += 1
        return a

    def quantization_error(self, data):
        """Returns the quantization error computed as the average
        distance between each input sample and its best matching unit."""
        self._check_input_len(data)
        error = 0
        for x in data:
            error += fast_norm(x-self._weights[self.winner(x)])
        return error/len(data)

    def win_map(self, data):
        """Returns a dictionary wm where wm[(i,j)] is a list
        with all the patterns that have been mapped in the position i,j."""
        self._check_input_len(data)
        winmap = defaultdict(list)
        for x in data:
            winmap[self.winner(x)].append(x)
        return winmap

    def labels_map(self, data, labels):
        """Returns a dictionary wm where wm[(i,j)] is a dictionary
        that contains the number of samples from a given label
        that have been mapped in position i,j.

        Parameters
        ----------
        data : np.array or list
            Data matrix.

        labels : np.array or list
            Labels for each sample in data.
        """
        self._check_input_len(data)
        winmap = defaultdict(list)
        for x, l in zip(data, labels):
            winmap[self.winner(x)].append(l)
        for position in winmap:
            winmap[position] = Counter(winmap[position])
        return winmap


class TestMinisom(unittest.TestCase):
    def setUp(self):
        self.som = MiniSom(5, 5, 1)
        for i in range(5):
            for j in range(5):
                # checking weights normalization
                assert_almost_equal(1.0, linalg.norm(self.som._weights[i, j]))
        self.som._weights = zeros((5, 5))  # fake weights
        self.som._weights[2, 3] = 5.0
        self.som._weights[1, 1] = 2.0

    def test_decay_function(self):
        assert self.som._decay_function(1., 2., 3.) == 1./(1.+2./(3./2))

    def test_fast_norm(self):
        assert fast_norm(array([1, 3])) == sqrt(1+9)

    def test_check_input_len(self):
        with self.assertRaises(ValueError):
            self.som.train_batch([[1, 2]], 1)

        with self.assertRaises(ValueError):
            self.som.random_weights_init(array([[1, 2]]))

        with self.assertRaises(ValueError):
            self.som._check_input_len(array([[1, 2]]))

        self.som._check_input_len(array([[1]]))
        self.som._check_input_len([[1]])

    def test_unavailable_neigh_function(self):
        with self.assertRaises(ValueError):
            MiniSom(5, 5, 1, neighborhood_function='boooom')

    def test_gaussian(self):
        bell = self.som._gaussian((2, 2), 1)
        assert bell.max() == 1.0
        assert bell.argmax() == 12  # unravel(12) = (2,2)

    def test_mexican_hat(self):
        bell = self.som._mexican_hat((2, 2), 1)
        assert bell.max() == 1.0
        assert bell.argmax() == 12  # unravel(12) = (2,2)

    def test_bubble(self):
        bubble = self.som._bubble((2, 2), 1)
        assert bubble[2, 2] == 1
        assert sum(sum(bubble)) == 1

    def test_triangle(self):
        bubble = self.som._triangle((2, 2), 1)
        assert bubble[2, 2] == 1
        assert sum(sum(bubble)) == 1

    def test_win_map(self):
        winners = self.som.win_map([[5.0], [2.0]])
        assert winners[(2, 3)][0] == [5.0]
        assert winners[(1, 1)][0] == [2.0]

    def test_labels_map(self):
        labels_map = self.som.labels_map([[5.0], [2.0]], ['a', 'b'])
        assert labels_map[(2, 3)]['a'] == 1
        assert labels_map[(1, 1)]['b'] == 1

    def test_activation_response(self):
        response = self.som.activation_response([[5.0], [2.0]])
        assert response[2, 3] == 1
        assert response[1, 1] == 1

    def test_activate(self):
        assert self.som.activate(5.0).argmin() == 13.0  # unravel(13) = (2,3)

    def test_quantization_error(self):
        assert self.som.quantization_error([[5], [2]]) == 0.0
        assert self.som.quantization_error([[4], [1]]) == 1.0

    def test_quantization(self):
        q = self.som.quantization(array([[4], [2]]))
        assert q[0] == 5.0
        assert q[1] == 2.0

    def test_random_seed(self):
        som1 = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        som2 = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        # same initialization
        assert_array_almost_equal(som1._weights, som2._weights)
        data = random.rand(100, 2)
        som1 = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        som1.train_random(data, 10)
        som2 = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        som2.train_random(data, 10)
        # same state after training
        assert_array_almost_equal(som1._weights, som2._weights)

    def test_train_batch(self):
        som = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        data = array([[4, 2], [3, 1]])
        q1 = som.quantization_error(data)
        som.train_batch(data, 10)
        assert q1 > som.quantization_error(data)

        data = array([[1, 5], [6, 7]])
        q1 = som.quantization_error(data)
        som.train_batch(data, 10, verbose=True)
        assert q1 > som.quantization_error(data)

    def test_train_random(self):
        som = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        data = array([[4, 2], [3, 1]])
        q1 = som.quantization_error(data)
        som.train_random(data, 10)
        assert q1 > som.quantization_error(data)

        data = array([[1, 5], [6, 7]])
        q1 = som.quantization_error(data)
        som.train_random(data, 10, verbose=True)
        assert q1 > som.quantization_error(data)

    def test_random_weights_init(self):
        som = MiniSom(2, 2, 2, random_seed=1)
        som.random_weights_init(array([[1.0, .0]]))
        for w in som._weights:
            assert_array_equal(w[0], array([1.0, .0]))

    def test_pca_weights_init(self):
        som = MiniSom(2, 2, 2)
        som.pca_weights_init(array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]]))
        expected = array([[[0., -1.41421356], [1.41421356, 0.]],
                          [[-1.41421356, 0.], [0., 1.41421356]]])
        assert_array_almost_equal(som._weights, expected)

    def test_distance_map(self):
        som = MiniSom(2, 2, 2, random_seed=1)
        som._weights = array([[[1., 0.], [0., 1.]], [[1., 0.], [0., 1.]]])
        assert_array_equal(som.distance_map(), array([[1., 1.], [1., 1.]]))
--------------------------------------------------------------------------------
/model.py:
--------------------------------------------------------------------------------
import pandas as pd
import joblib  # on older scikit-learn: from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression


# Flattens the (n, 1) label array returned by pandas into a plain 1-D list.
def transform(y):
    tab = []
    for i in y:
        tab = tab + [i[0]]
    return tab


filename = 'trained_model.sav'
data = pd.read_csv("creditcard.csv")


def train_the_model(filename, data):
    logistic_reg_model = LogisticRegression(solver='lbfgs')
    # columns 1..29 are V1..V28 and Amount (Time and Class are excluded)
    features = list(data.columns.values)[1:30]
    x = data.loc[:, features].values
    y = transform(data.loc[:, ['Class']].values)
    logistic_reg_model.fit(x, y)
    joblib.dump(logistic_reg_model, filename)


# Predicting a new transaction (assumes train_the_model has already
# produced trained_model.sav)
def predict_new_transaction(vector_transaction):
    loaded_model = joblib.load(filename)
    return loaded_model.predict(vector_transaction)


def pred_vect(v):
    """Turns a raw 29-value list into a one-row DataFrame with the
    columns V1..V28 and Amount expected by the model."""
    vect = {}
    for i in range(0, 28):
        vect["V" + str(i + 1)] = v[i]
    vect["Amount"] = v[28]
    return pd.DataFrame([vect])


v = [-1.3598071336738, -0.0727811733098497, 2.53634673796914, 1.37815522427443,
     -0.338320769942518, 0.462387777762292, 0.239598554061257, 0.0986979012610507,
     0.363786969611213, 0.0907941719789316, -0.551599533260813, -0.617800855762348,
     -0.991389847235408, -0.311169353699879, 1.46817697209427, -0.470400525259478,
     0.207971241929242, 0.0257905801985591, 0.403992960255733, 0.251412098239705,
     -0.018306777944153, 0.277837575558899, -0.110473910188767, 0.0669280749146731,
     0.128539358273528, -0.189114843888824, 0.133558376740387, -0.0210530534538215,
     149.62]
print(predict_new_transaction(pred_vect(v)))
--------------------------------------------------------------------------------
/som_fraud.py:
--------------------------------------------------------------------------------
# Self Organizing Map

# Importing the libraries
import numpy as np
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('creditcard.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0, 1))
X = sc.fit_transform(X)

# Training the SOM
from minisom import MiniSom
som = MiniSom(x=10, y=10, input_len=30, sigma=1.0, learning_rate=0.5)
som.random_weights_init(X)
som.train_random(data=X, num_iteration=100)

# Visualizing the results
from pylab import bone, pcolor, colorbar, plot, show
bone()
pcolor(som.distance_map().T)
colorbar()
markers = ['o', 's']
colors = ['r', 'g']
for i, x in enumerate(X):
    w = som.winner(x)
    plot(w[0] + 0.5,
         w[1] + 0.5,
         markers[y[i]],
         markeredgecolor=colors[y[i]],
         markerfacecolor='None',
         markersize=10,
         markeredgewidth=2)
show()

# Finding the frauds
'''
After visualizing the map, we noticed that the fraudulent transactions
are mapped to the neuron (9, 1).
'''
mappings = som.win_map(X)
frauds = mappings[(9, 1)]
frauds = sc.inverse_transform(frauds)
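
# A small follow-up sketch (added for illustration, not in the original
# script): wrap the flagged vectors in a DataFrame so they can be inspected
# with the original column names (every creditcard.csv column except Class).
flagged = pd.DataFrame(frauds, columns=dataset.columns[:-1])
print('transactions mapped to the fraud neuron:', len(flagged))
print(flagged.head())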
--------------------------------------------------------------------------------
/trained_model.sav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Z4ck404/REST_API_for_fraud-detection/8a4c3dfd34206c2be88c846450df0986b609c0ad/trained_model.sav
--------------------------------------------------------------------------------