├── LICENSE
├── README.md
├── api
│   ├── api.py
│   └── trained_model.sav
├── minisom.py
├── model.py
├── som_fraud.py
└── trained_model.sav
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <https://unlicense.org>
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

[![GitHub pull-requests](https://img.shields.io/github/issues-pr/Naereen/StrapDown.js.svg)](https://GitHub.com/Naereen/StrapDown.js/pull/)
[![Open Source Love svg1](https://badges.frapsoft.com/os/v1/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badges/)

# REST_API_for_fraud-detection
A logistic regression model that detects fraud in online transactions. It can be accessed through a REST API.
The files also include an implementation with self-organizing maps to better classify fraudulent transactions.

## The Trained model

Logistic regression is one of the most popular machine learning algorithms for binary classification, because it is a simple algorithm that performs very well on a wide range of problems.
The logistic regression model takes real-valued inputs and predicts the probability that the input belongs to the default class (class 0, i.e. a genuine transaction).
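As a quick illustration, the sketch below (not part of the repository) queries the saved model directly for both the hard label and the class probabilities. It assumes pandas, scikit-learn and joblib are installed, that `model.py` has already produced `trained_model.sav`, and that the feature values are placeholders:
```
import joblib  # on older scikit-learn: from sklearn.externals import joblib
import pandas as pd

# Hypothetical transaction: the 28 PCA components V1..V28 plus the raw Amount.
sample = {"V%d" % i: 0.0 for i in range(1, 29)}
sample["Amount"] = 149.62

model = joblib.load("trained_model.sav")
frame = pd.DataFrame([sample])
print(model.predict(frame)[0])        # hard label: 0 = genuine, 1 = fraud
print(model.predict_proba(frame)[0])  # [P(class 0), P(class 1)]
```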

## The Data
The models are trained to label anonymized credit card transactions as fraudulent or genuine. The dataset comes from [kaggle](https://www.kaggle.com/sarathchandra/credit-card-fraud-detection-99-accuracy/comments#144915).

## The API access points
To be able to use the API, first install the requirements (besides Flask, `api.py` also imports pandas, scikit-learn and joblib):
```
pip install flask pandas scikit-learn joblib
```
Then start the application by running:
```
python api.py
```
The app will be running on localhost, port 5000, with the following endpoints.

## /api/v0/verify
This endpoint receives, via a POST request, a JSON object that represents the new transaction:
```
{
    "V1" : -1.3598071336738,
    "V2" : -0.0727811733098497,
    "V3" : 2.53634673796914,
    ...
    "V25" : 0.128539358273528,
    "V26" : -0.189114843888824,
    "V27" : 0.133558376740387,
    "V28" : -0.0210530534538215,
    "Amount" : 149.62
}
```
The returned result is a one-element list and looks like this:
```
[
    {
        "id": 0,
        "prediction": "0"
    }
]
```
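For example, from Python (a sketch, assuming the `requests` package is installed, the server is running locally, and the feature values are placeholders):
```
import requests

transaction = {"V%d" % i: 0.0 for i in range(1, 29)}  # placeholder values
transaction["Amount"] = 149.62

resp = requests.post("http://127.0.0.1:5000/api/v0/verify", json=transaction)
print(resp.json())  # e.g. [{"id": 0, "prediction": "0"}]
```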

## /api/v0/info
This endpoint gives you the information you need to know about the API.
The result looks like this; you can add as many details as you like:
```
[
    {
        "Author": "IOO",
        "description": "A fraud detection model using a kaggle dataset"
    }
]
```
--------------------------------------------------------------------------------
/api/api.py:
--------------------------------------------------------------------------------
import flask
from flask import request, jsonify
import json
import pandas as pd
import joblib  # on older scikit-learn: from sklearn.externals import joblib

app = flask.Flask(__name__)
app.config["DEBUG"] = True


def pred_vect(v):
    """Turns a raw 29-value list into a one-row DataFrame with the
    columns V1..V28 and Amount expected by the trained model."""
    vect = {}
    for i in range(0, 28):
        vect["V" + str(i + 1)] = v[i]
    vect["Amount"] = v[28]
    return pd.DataFrame([vect])


# Sample transaction used by the /api/v0/test endpoint.
v = [-1.3598071336738, -0.0727811733098497, 2.53634673796914, 1.37815522427443,
     -0.338320769942518, 0.462387777762292, 0.239598554061257, 0.0986979012610507,
     0.363786969611213, 0.0907941719789316, -0.551599533260813, -0.617800855762348,
     -0.991389847235408, -0.311169353699879, 1.46817697209427, -0.470400525259478,
     0.207971241929242, 0.0257905801985591, 0.403992960255733, 0.251412098239705,
     -0.018306777944153, 0.277837575558899, -0.110473910188767, 0.0669280749146731,
     0.128539358273528, -0.189114843888824, 0.133558376740387, -0.0210530534538215,
     149.62]

filename = 'trained_model.sav'


# Post a JSON object to see it echoed back:
@app.route('/foo', methods=['POST'])
def foo():
    print(request)
    return json.dumps(request.json)


@app.route('/postjson', methods=['POST'])
def postJsonHandler():
    print(request.is_json)
    content = request.get_json()
    print(content)
    return jsonify(content)


@app.route('/', methods=['GET'])
def home():
    return '''<h1>Fraud Detection API : IOO</h1>
<p>A POC for the use of Machine learning to detect Credit Cards Frauds.</p>'''


@app.route('/api/v0/verify', methods=['GET', 'POST'])
def predict_new_transaction():
    vec = pd.DataFrame([request.json])
    loaded_model = joblib.load(filename)
    res = str(loaded_model.predict(vec)[0])
    result = [{'id': 0,
               'prediction': res}]
    return jsonify(result)


@app.route('/api/v0/test', methods=['GET', 'POST'])
def predict_test():
    loaded_model = joblib.load(filename)
    res = str(loaded_model.predict(pred_vect(v))[0])
    result = [{'id': 0,
               'prediction': res}]
    return jsonify(result)


@app.route('/api/v0/info', methods=['GET'])
def info():
    result = [
        {
            'Author': 'IOO',
            'description': 'A fraud detection model using a kaggle dataset',
        }
    ]
    return jsonify(result)


if __name__ == '__main__':
    app.run()
--------------------------------------------------------------------------------
/api/trained_model.sav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Z4ck404/REST_API_for_fraud-detection/8a4c3dfd34206c2be88c846450df0986b609c0ad/api/trained_model.sav
--------------------------------------------------------------------------------
/minisom.py:
--------------------------------------------------------------------------------
from math import sqrt

from numpy import (array, unravel_index, nditer, linalg, random, subtract,
                   power, exp, pi, zeros, arange, outer, meshgrid, dot,
                   logical_and, mean, std, cov, argsort, linspace, transpose)
from collections import defaultdict, Counter
from warnings import warn
from sys import stdout
from time import time

# for unit tests
from numpy.testing import assert_almost_equal, assert_array_almost_equal
from numpy.testing import assert_array_equal
import unittest

"""
    Minimalistic implementation of the Self Organizing Maps (SOM).
"""


def _incremental_index_verbose(m):
    """Yields numbers from 0 to m-1 printing the status on the stdout."""
    progress = f'\r [ {0:{len(str(m))}} / {m} ] {0:3.0f}% ? it/s'
    stdout.write(progress)
    beginning = time()
    for i in range(m):
        yield i
        # iterations per second (the original computed seconds per iteration
        # while labelling it "it/s")
        it_per_sec = (i + 1) / (time() - beginning)
        progress = f'\r [ {i+1:{len(str(m))}} / {m} ]'
        progress += f' {100*(i+1)/m:3.0f}%'
        progress += f' {it_per_sec:4.5f} it/s'
        stdout.write(progress)


def fast_norm(x):
    """Returns norm-2 of a 1-D numpy array.

    * faster than linalg.norm in case of 1-D arrays (numpy 1.9.2rc1).
    """
    return sqrt(dot(x, x.T))


def asymptotic_decay(learning_rate, t, max_iter):
    """Decay function of the learning process.

    Parameters
    ----------
    learning_rate : float
        current learning rate.

    t : int
        current iteration.

    max_iter : int
        maximum number of iterations for the training.
    """
    return learning_rate / (1+t/(max_iter/2))


class MiniSom(object):
    def __init__(self, x, y, input_len, sigma=1.0, learning_rate=0.5,
                 decay_function=asymptotic_decay,
                 neighborhood_function='gaussian', random_seed=None):
        """Initializes a Self Organizing Map.

        A rule of thumb to set the size of the grid for a dimensionality
        reduction task is that it should contain 5*sqrt(N) neurons
        where N is the number of samples in the dataset to analyze.

        E.g. if your dataset has 150 samples, 5*sqrt(150) = 61.23
        hence a map 8-by-8 should perform well.

        Parameters
        ----------
        x : int
            x dimension of the SOM.

        y : int
            y dimension of the SOM.

        input_len : int
            Number of the elements of the vectors in input.

        sigma : float, optional (default=1.0)
            Spread of the neighborhood function, needs to be adequate
            to the dimensions of the map.
            (at the iteration t we have sigma(t) = sigma / (1 + t/T)
            where T is #num_iteration/2)

        learning_rate : float, optional (default=0.5)
            Initial learning rate.
            (at the iteration t we have
            learning_rate(t) = learning_rate / (1 + t/T)
            where T is #num_iteration/2)

        decay_function : function (default=asymptotic_decay)
            Function that reduces learning_rate and sigma at each iteration;
            the default function is:
                        learning_rate / (1+t/(max_iterations/2))

            A custom decay function will need to take in input
            three parameters in the following order:

            1. learning rate
            2. current iteration
            3. maximum number of iterations allowed

            Note that if a lambda function is used to define the decay
            MiniSom will not be picklable anymore.

        neighborhood_function : string, optional (default='gaussian')
            Function that weights the neighborhood of a position in the map;
            possible values: 'gaussian', 'mexican_hat', 'bubble', 'triangle'

        random_seed : int, optional (default=None)
            Random seed to use.
        """
        if sigma >= x or sigma >= y:
            warn('Warning: sigma is too high for the dimension of the map.')

        self._random_generator = random.RandomState(random_seed)

        self._learning_rate = learning_rate
        self._sigma = sigma
        self._input_len = input_len
        # random initialization
        self._weights = self._random_generator.rand(x, y, input_len)*2-1

        for i in range(x):
            for j in range(y):
                # normalization
                norm = fast_norm(self._weights[i, j])
                self._weights[i, j] = self._weights[i, j] / norm

        self._activation_map = zeros((x, y))
        self._neigx = arange(x)
        self._neigy = arange(y)  # used to evaluate the neighborhood function
        self._decay_function = decay_function

        neig_functions = {'gaussian': self._gaussian,
                          'mexican_hat': self._mexican_hat,
                          'bubble': self._bubble,
                          'triangle': self._triangle}

        if neighborhood_function not in neig_functions:
            msg = '%s not supported. Functions available: %s'
            raise ValueError(msg % (neighborhood_function,
                                    ', '.join(neig_functions.keys())))

        if neighborhood_function in ['triangle',
                                     'bubble'] and divmod(sigma, 1)[1] != 0:
            warn('sigma should be an integer when triangle or bubble '
                 'are used as neighborhood function')

        self.neighborhood = neig_functions[neighborhood_function]

    def get_weights(self):
        """Returns the weights of the neural network."""
        return self._weights

    def _activate(self, x):
        """Updates matrix activation_map, in this matrix
        the element i,j is the response of the neuron i,j to x."""
        s = subtract(x, self._weights)  # x - w
        it = nditer(self._activation_map, flags=['multi_index'])
        while not it.finished:
            # || x - w ||
            self._activation_map[it.multi_index] = fast_norm(s[it.multi_index])
            it.iternext()

    def activate(self, x):
        """Returns the activation map to x."""
        self._activate(x)
        return self._activation_map

    def _gaussian(self, c, sigma):
        """Returns a Gaussian centered in c."""
        d = 2*pi*sigma*sigma
        ax = exp(-power(self._neigx-c[0], 2)/d)
        ay = exp(-power(self._neigy-c[1], 2)/d)
        return outer(ax, ay)  # the external product gives a matrix

    def _mexican_hat(self, c, sigma):
        """Mexican hat centered in c."""
        xx, yy = meshgrid(self._neigx, self._neigy)
        p = power(xx-c[0], 2) + power(yy-c[1], 2)
        d = 2*pi*sigma*sigma
        return exp(-p/d)*(1-2/d*p)

    def _bubble(self, c, sigma):
        """Constant function centered in c with spread sigma.
        sigma should be an odd value.
        """
        ax = logical_and(self._neigx > c[0]-sigma,
                         self._neigx < c[0]+sigma)
        ay = logical_and(self._neigy > c[1]-sigma,
                         self._neigy < c[1]+sigma)
        return outer(ax, ay)*1.

    def _triangle(self, c, sigma):
        """Triangular function centered in c with spread sigma."""
        triangle_x = (-abs(c[0] - self._neigx)) + sigma
        triangle_y = (-abs(c[1] - self._neigy)) + sigma
        triangle_x[triangle_x < 0] = 0.
        triangle_y[triangle_y < 0] = 0.
        return outer(triangle_x, triangle_y)

    def _check_iteration_number(self, num_iteration):
        if num_iteration < 1:
            raise ValueError('num_iteration must be at least 1')

    def _check_input_len(self, data):
        """Checks that the data in input is of the correct shape."""
        data_len = len(data[0])
        if self._input_len != data_len:
            msg = 'Received %d features, expected %d.' % (data_len,
                                                          self._input_len)
            raise ValueError(msg)

    def winner(self, x):
        """Computes the coordinates of the winning neuron for the sample x."""
        self._activate(x)
        return unravel_index(self._activation_map.argmin(),
                             self._activation_map.shape)

    def update(self, x, win, t, max_iteration):
        """Updates the weights of the neurons.

        Parameters
        ----------
        x : np.array
            Current pattern to learn.
        win : tuple
            Position of the winning neuron for x (array or tuple).
        t : int
            Iteration index.
        max_iteration : int
            Maximum number of training iterations.
        """
        eta = self._decay_function(self._learning_rate, t, max_iteration)
        # sigma and learning rate decrease with the same rule
        sig = self._decay_function(self._sigma, t, max_iteration)
        # improves the performances
        g = self.neighborhood(win, sig)*eta
        it = nditer(g, flags=['multi_index'])

        while not it.finished:
            # eta * neighborhood_function * (x-w)
            x_w = (x - self._weights[it.multi_index])
            self._weights[it.multi_index] += g[it.multi_index] * x_w
            it.iternext()

    def quantization(self, data):
        """Assigns a code book (weights vector of the winning neuron)
        to each sample in data."""
        self._check_input_len(data)
        q = zeros(data.shape)
        for i, x in enumerate(data):
            q[i] = self._weights[self.winner(x)]
        return q

    def random_weights_init(self, data):
        """Initializes the weights of the SOM
        picking random samples from data."""
        self._check_input_len(data)
        it = nditer(self._activation_map, flags=['multi_index'])
        while not it.finished:
            rand_i = self._random_generator.randint(len(data))
            # assign the picked sample as-is (the original code computed its
            # norm but never applied it, so the normalization was a no-op)
            self._weights[it.multi_index] = data[rand_i]
            it.iternext()

    def pca_weights_init(self, data):
        """Initializes the weights to span the first two principal components.

        This initialization doesn't depend on random processes and
        makes the training process converge faster.

        It is strongly recommended to normalize the data before initializing
        the weights and use the same normalization for the training data.
        """
        if self._input_len == 1:
            msg = 'The data needs at least 2 features for pca initialization'
            raise ValueError(msg)
        self._check_input_len(data)
        if len(self._neigx) == 1 or len(self._neigy) == 1:
            msg = 'PCA initialization inappropriate: ' + \
                  'one of the dimensions of the map is 1.'
            warn(msg)
        pc_length, pc = linalg.eig(cov(transpose(data)))
        pc_order = argsort(pc_length)
        for i, c1 in enumerate(linspace(-1, 1, len(self._neigx))):
            for j, c2 in enumerate(linspace(-1, 1, len(self._neigy))):
                self._weights[i, j] = c1*pc[pc_order[0]] + c2*pc[pc_order[1]]

    def train_random(self, data, num_iteration, verbose=False):
        """Trains the SOM picking samples at random from data.

        Parameters
        ----------
        data : np.array or list
            Data matrix.

        num_iteration : int
            Maximum number of iterations (one iteration per sample).

        verbose : bool (default=False)
            If True the status of the training
            will be printed at each iteration.
        """
        self._check_iteration_number(num_iteration)
        self._check_input_len(data)
        iterations = range(num_iteration)
        if verbose:
            iterations = _incremental_index_verbose(num_iteration)

        for iteration in iterations:
            # pick a random sample
            rand_i = self._random_generator.randint(len(data))
            self.update(data[rand_i], self.winner(data[rand_i]),
                        iteration, num_iteration)

    def train_batch(self, data, num_iteration, verbose=False):
        """Trains using all the vectors in data sequentially.

        Parameters
        ----------
        data : np.array or list
            Data matrix.

        num_iteration : int
            Maximum number of iterations (one iteration per sample).

        verbose : bool (default=False)
            If True the status of the training
            will be printed at each iteration.
        """
        self._check_iteration_number(num_iteration)
        self._check_input_len(data)
        iterations = range(num_iteration)
        if verbose:
            iterations = _incremental_index_verbose(num_iteration)

        for iteration in iterations:
            # cycle through all the samples (the original modulo used
            # len(data)-1, which always skipped the last sample)
            idx = iteration % len(data)
            self.update(data[idx], self.winner(data[idx]),
                        iteration, num_iteration)

    def distance_map(self):
        """Returns the distance map of the weights.
        Each cell is the normalised sum of the distances between
        a neuron and its neighbours."""
        um = zeros((self._weights.shape[0], self._weights.shape[1]))
        it = nditer(um, flags=['multi_index'])
        while not it.finished:
            for ii in range(it.multi_index[0]-1, it.multi_index[0]+2):
                for jj in range(it.multi_index[1]-1, it.multi_index[1]+2):
                    if (ii >= 0 and ii < self._weights.shape[0] and
                            jj >= 0 and jj < self._weights.shape[1]):
                        w_1 = self._weights[ii, jj, :]
                        w_2 = self._weights[it.multi_index]
                        um[it.multi_index] += fast_norm(w_1-w_2)
            it.iternext()
        um = um/um.max()
        return um

    def activation_response(self, data):
        """
        Returns a matrix where the element i,j is the number of times
        that the neuron i,j has been the winner.
        """
        self._check_input_len(data)
        a = zeros((self._weights.shape[0], self._weights.shape[1]))
        for x in data:
            a[self.winner(x)] += 1
        return a

    def quantization_error(self, data):
        """Returns the quantization error computed as the average
        distance between each input sample and its best matching unit."""
        self._check_input_len(data)
        error = 0
        for x in data:
            error += fast_norm(x-self._weights[self.winner(x)])
        return error/len(data)

    def win_map(self, data):
        """Returns a dictionary wm where wm[(i,j)] is a list
        with all the patterns that have been mapped in the position i,j."""
        self._check_input_len(data)
        winmap = defaultdict(list)
        for x in data:
            winmap[self.winner(x)].append(x)
        return winmap

    def labels_map(self, data, labels):
        """Returns a dictionary wm where wm[(i,j)] is a dictionary
        that contains the number of samples from a given label
        that have been mapped in position i,j.

        Parameters
        ----------
        data : np.array or list
            Data matrix.

        labels : np.array or list
            Labels for each sample in data.
        """
        self._check_input_len(data)
        winmap = defaultdict(list)
        for x, l in zip(data, labels):
            winmap[self.winner(x)].append(l)
        for position in winmap:
            winmap[position] = Counter(winmap[position])
        return winmap


class TestMinisom(unittest.TestCase):
    def setUp(self):
        self.som = MiniSom(5, 5, 1)
        for i in range(5):
            for j in range(5):
                # checking weights normalization
                assert_almost_equal(1.0, linalg.norm(self.som._weights[i, j]))
        self.som._weights = zeros((5, 5))  # fake weights
        self.som._weights[2, 3] = 5.0
        self.som._weights[1, 1] = 2.0

    def test_decay_function(self):
        assert self.som._decay_function(1., 2., 3.) == 1./(1.+2./(3./2))

    def test_fast_norm(self):
        assert fast_norm(array([1, 3])) == sqrt(1+9)

    def test_check_input_len(self):
        with self.assertRaises(ValueError):
            self.som.train_batch([[1, 2]], 1)

        with self.assertRaises(ValueError):
            self.som.random_weights_init(array([[1, 2]]))

        with self.assertRaises(ValueError):
            self.som._check_input_len(array([[1, 2]]))

        self.som._check_input_len(array([[1]]))
        self.som._check_input_len([[1]])

    def test_unavailable_neigh_function(self):
        with self.assertRaises(ValueError):
            MiniSom(5, 5, 1, neighborhood_function='boooom')

    def test_gaussian(self):
        bell = self.som._gaussian((2, 2), 1)
        assert bell.max() == 1.0
        assert bell.argmax() == 12  # unravel(12) = (2,2)

    def test_mexican_hat(self):
        bell = self.som._mexican_hat((2, 2), 1)
        assert bell.max() == 1.0
        assert bell.argmax() == 12  # unravel(12) = (2,2)

    def test_bubble(self):
        bubble = self.som._bubble((2, 2), 1)
        assert bubble[2, 2] == 1
        assert sum(sum(bubble)) == 1

    def test_triangle(self):
        bubble = self.som._triangle((2, 2), 1)
        assert bubble[2, 2] == 1
        assert sum(sum(bubble)) == 1

    def test_win_map(self):
        winners = self.som.win_map([[5.0], [2.0]])
        assert winners[(2, 3)][0] == [5.0]
        assert winners[(1, 1)][0] == [2.0]

    def test_labels_map(self):
        labels_map = self.som.labels_map([[5.0], [2.0]], ['a', 'b'])
        assert labels_map[(2, 3)]['a'] == 1
        assert labels_map[(1, 1)]['b'] == 1

    def test_activation_response(self):
        response = self.som.activation_response([[5.0], [2.0]])
        assert response[2, 3] == 1
        assert response[1, 1] == 1

    def test_activate(self):
        assert self.som.activate(5.0).argmin() == 13.0  # unravel(13) = (2,3)

    def test_quantization_error(self):
        assert self.som.quantization_error([[5], [2]]) == 0.0
        assert self.som.quantization_error([[4], [1]]) == 1.0

    def test_quantization(self):
        q = self.som.quantization(array([[4], [2]]))
        assert q[0] == 5.0
        assert q[1] == 2.0

    def test_random_seed(self):
        som1 = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        som2 = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        # same initialization
        assert_array_almost_equal(som1._weights, som2._weights)
        data = random.rand(100, 2)
        som1 = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        som1.train_random(data, 10)
        som2 = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        som2.train_random(data, 10)
        # same state after training
        assert_array_almost_equal(som1._weights, som2._weights)

    def test_train_batch(self):
        som = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        data = array([[4, 2], [3, 1]])
        q1 = som.quantization_error(data)
        som.train_batch(data, 10)
        assert q1 > som.quantization_error(data)

        data = array([[1, 5], [6, 7]])
        q1 = som.quantization_error(data)
        som.train_batch(data, 10, verbose=True)
        assert q1 > som.quantization_error(data)

    def test_train_random(self):
        som = MiniSom(5, 5, 2, sigma=1.0, learning_rate=0.5, random_seed=1)
        data = array([[4, 2], [3, 1]])
        q1 = som.quantization_error(data)
        som.train_random(data, 10)
        assert q1 > som.quantization_error(data)

        data = array([[1, 5], [6, 7]])
        q1 = som.quantization_error(data)
        som.train_random(data, 10, verbose=True)
        assert q1 > som.quantization_error(data)

    def test_random_weights_init(self):
        som = MiniSom(2, 2, 2, random_seed=1)
        som.random_weights_init(array([[1.0, .0]]))
        for w in som._weights:
            assert_array_equal(w[0], array([1.0, .0]))

    def test_pca_weights_init(self):
        som = MiniSom(2, 2, 2)
        som.pca_weights_init(array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]]))
        expected = array([[[0., -1.41421356], [1.41421356, 0.]],
                          [[-1.41421356, 0.], [0., 1.41421356]]])
        assert_array_almost_equal(som._weights, expected)

    def test_distance_map(self):
        som = MiniSom(2, 2, 2, random_seed=1)
        som._weights = array([[[1., 0.], [0., 1.]], [[1., 0.], [0., 1.]]])
        assert_array_equal(som.distance_map(), array([[1., 1.], [1., 1.]]))
--------------------------------------------------------------------------------
/model.py:
--------------------------------------------------------------------------------
import pandas as pd
import joblib  # on older scikit-learn: from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression


# Flattens the (n, 1) label array returned by pandas into a plain 1-D list.
def transform(y):
    tab = []
    for i in y:
        tab = tab + [i[0]]
    return tab


filename = 'trained_model.sav'
data = pd.read_csv("creditcard.csv")


def train_the_model(filename, data):
    logistic_reg_model = LogisticRegression(solver='lbfgs')
    # columns 1..29 are V1..V28 and Amount (Time and Class are excluded)
    features = list(data.columns.values)[1:30]
    x = data.loc[:, features].values
    y = transform(data.loc[:, ['Class']].values)
    logistic_reg_model.fit(x, y)
    joblib.dump(logistic_reg_model, filename)


# Predicting a new transaction (assumes train_the_model has already
# produced trained_model.sav)
def predict_new_transaction(vector_transaction):
    loaded_model = joblib.load(filename)
    return loaded_model.predict(vector_transaction)


def pred_vect(v):
    """Turns a raw 29-value list into a one-row DataFrame with the
    columns V1..V28 and Amount expected by the model."""
    vect = {}
    for i in range(0, 28):
        vect["V" + str(i + 1)] = v[i]
    vect["Amount"] = v[28]
    return pd.DataFrame([vect])


v = [-1.3598071336738, -0.0727811733098497, 2.53634673796914, 1.37815522427443,
     -0.338320769942518, 0.462387777762292, 0.239598554061257, 0.0986979012610507,
     0.363786969611213, 0.0907941719789316, -0.551599533260813, -0.617800855762348,
     -0.991389847235408, -0.311169353699879, 1.46817697209427, -0.470400525259478,
     0.207971241929242, 0.0257905801985591, 0.403992960255733, 0.251412098239705,
     -0.018306777944153, 0.277837575558899, -0.110473910188767, 0.0669280749146731,
     0.128539358273528, -0.189114843888824, 0.133558376740387, -0.0210530534538215,
     149.62]
print(predict_new_transaction(pred_vect(v)))
--------------------------------------------------------------------------------
/som_fraud.py:
--------------------------------------------------------------------------------
# Self Organizing Map

# Importing the libraries
import numpy as np
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('creditcard.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0, 1))
X = sc.fit_transform(X)

# Training the SOM
from minisom import MiniSom
som = MiniSom(x=10, y=10, input_len=30, sigma=1.0, learning_rate=0.5)
som.random_weights_init(X)
som.train_random(data=X, num_iteration=100)

# Visualizing the results
from pylab import bone, pcolor, colorbar, plot, show
bone()
pcolor(som.distance_map().T)
colorbar()
markers = ['o', 's']
colors = ['r', 'g']
for i, x in enumerate(X):
    w = som.winner(x)
    plot(w[0] + 0.5,
         w[1] + 0.5,
         markers[y[i]],
         markeredgecolor=colors[y[i]],
         markerfacecolor='None',
         markersize=10,
         markeredgewidth=2)
show()

# Finding the frauds
'''
After visualizing the map, we noticed that the fraudulent transactions
are mapped to the neuron (9, 1).
'''
mappings = som.win_map(X)
frauds = mappings[(9, 1)]
frauds = sc.inverse_transform(frauds)
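
# A small follow-up sketch (added for illustration, not in the original
# script): wrap the flagged vectors in a DataFrame so they can be inspected
# with the original column names (every creditcard.csv column except Class).
flagged = pd.DataFrame(frauds, columns=dataset.columns[:-1])
print('transactions mapped to the fraud neuron:', len(flagged))
print(flagged.head())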
--------------------------------------------------------------------------------
/trained_model.sav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Z4ck404/REST_API_for_fraud-detection/8a4c3dfd34206c2be88c846450df0986b609c0ad/trained_model.sav
--------------------------------------------------------------------------------