├── FFT
    ├── FreqTime.py
    ├── readme.md
    └── utils.py
├── Images
    ├── FFT
    │   ├── testset1
    │   │   ├── max1.jpg
    │   │   ├── max2.jpg
    │   │   ├── max3.png
    │   │   ├── max4.png
    │   │   └── max5.png
    │   └── testset2
    │   │   ├── max1.png
    │   │   ├── max2.jpg
    │   │   ├── max3.png
    │   │   ├── max4.png
    │   │   └── max5.png
    └── LogisticRegression
    │   ├── testset1_figure.PNG
    │   ├── testset1_result.png
    │   ├── testset2_figure.jpg
    │   └── testset2_result.jpg
├── Logistic Regression
    ├── LogisticRegressionTraining.py
    ├── logisticregressiontesting.py
    ├── readme.md
    └── utils.py
└── README.md


/FFT/FreqTime.py:
--------------------------------------------------------------------------------
 1 | # -*- coding: utf-8 -*-
 2 | """
 3 | Created on Mon Jun  4 11:46:36 2018
 4 | 
 5 | @author: SHEFALI JAIN
 6 | """
 7 | """
 8 |      this script is used to plot the frequency vs time graph.
 9 | """
10 | # import the libraries
11 | import pandas as pd
12 | import numpy as np
13 | import matplotlib.pyplot as plt
14 | 
15 | from utils import cal_max_freq,plotlabels
16 | 
17 | import os
18 | 
19 | def reshapelist(lst):
20 |     return (list(map(list, zip(*lst))))
21 | 
22 | try:
23 |     # user input for the path of the dataset
24 |     filedir = input("enter the complete directory path ")
25 |     filepath = input("enter the folder name with a trailing slash, e.g. 1st_test/ ")
26 | 
27 |     #load the files
28 |     all_files = os.listdir(filedir)
29 |     freq_max1,freq_max2,freq_max3,freq_max4,freq_max5 = cal_max_freq(all_files,filepath)
30 | except IOError:
31 |     raise SystemExit("the data directory path or the folder name is wrong")
32 | 
33 | freq = [freq_max1,freq_max2,freq_max3,freq_max4,freq_max5]
34 | 
35 | max_type = input("enter the max type in range 1-5")
36 | max_type=int(max_type)
37 | if(max_type >= 1 and max_type <= 5):
38 |     freqmax = reshapelist(freq[max_type-1])
39 | else:
40 |     raise SystemExit("the max type must be in the range 1-5")
41 | 
42 | # plot the figure
43 | fig  =  plt.figure()
44 | plotlabels(freqmax)
45 | fig.suptitle(("freq_max"+str(max_type)+' vs time'), fontsize = 20)
46 | plt.ylabel('frequency', fontsize = 15)
47 | plt.xlabel('time', fontsize = 15)
48 | fig.savefig(("freq_"+str(max_type)+'.jpg'))


--------------------------------------------------------------------------------
/FFT/readme.md:
--------------------------------------------------------------------------------
 1 | # Real-time time-series anomaly detection
 2 | 
 3 | | Programming Language |  Python |
 4 | | --- | --- |
 5 | | Skills (beg, intermediate, advanced) |  Intermediate |
 6 | | Time to complete project (in increments of 15 min) |  25 min |
 7 | | Hardware needed (hardware used) | Up Squared* board  |
 8 | | Target Operating System |   |
 9 | 
10 | Monitoring manufacturing equipment is vital to any industrial process. In some cases equipment must be monitored in real time for faults and anomalies, both to prevent damage and to correlate equipment faults with production line issues. Fault detection is the precursor to predictive maintenance.
11 | 
12 | Several methods can detect failures without training a neural network, ranging from the most basic (FFT) to the most complex (Gaussian Mixture Model). They can be reused on different data streams with minor modifications, and unlike neural nets they don't require a large amount of previously classified data. In fact, some of these methods can be used to classify data in order to train DNNs.
13 | 
14 | ## What you'll learn
15 | - Basic implementation of FFT.
16 | - How FFT helps with feature engineering of machine vibration data.
17 | 
18 | ## Setup
19 | - Download the Bearing Data Set (also on [https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/](https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/))
20 | - Extract the zip archives into folders named 1st\_test and 2nd\_test.
21 | - Make sure you have the following libraries:
22 | 
23 |   * Numpy
24 |   * Pandas
25 |   * Matplotlib.pyplot
26 |   * Sklearn
27 |   * Scipy
28 | 
29 | - Download the code
30 | - Make sure all the code and data are in one folder
31 | 
32 | ## Gather your materials
33 | 
34 | - [UP Squared* board](http://www.up-board.org/upsquared/)
35 | 
36 | ## Get the code
37 | 
38 | Open the example in a console or any Python IDE (for example, Spyder). Set the working directory to the folder where your code and dataset are stored.
39 | 
40 | ## Run the application
41 | 
42 | |  SAMPLE FILE NAME | EXPECTED OUTPUT | Note |
43 | | --- | --- | --- |
44 | | FreqTime.py | Frequency vs. time plot |   |
45 | | utils.py | Contains the helper functions used by all the modules |   |
46 | 
47 | ## How it works
48 | 
49 | 1. **FFT**: A fast Fourier transform (FFT) is an algorithm that takes a signal sampled over a period of time (or space) and decomposes it into its frequency components. These components are single sinusoidal oscillations at distinct frequencies, each with its own amplitude and phase.
50 | 
51 |      [Y](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-Y) = fft([X](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-X)) computes the discrete Fourier transform (DFT) of X using a fast Fourier transform (FFT) algorithm. If X is a vector, then fft(X) returns the Fourier transform of the vector [**More Details**](https://en.wikipedia.org/wiki/Fast_Fourier_transform)
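
As a minimal illustration of the idea above, the sketch below builds a synthetic signal from two sine waves and recovers the stronger one from the FFT magnitudes. It uses `numpy.fft` rather than the `scipy.fftpack.fft` call used in `utils.py`; the function name `dominant_frequency`, the 1000 Hz sample rate, and the test tones are assumptions made for this example only.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest non-DC component of a real signal."""
    spectrum = np.abs(np.fft.rfft(signal))                       # magnitude per frequency bin
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)    # bin centres in Hz
    spectrum[0] = 0.0                                            # ignore the DC offset
    return freqs[np.argmax(spectrum)]

# Hypothetical test signal: one second sampled at 1000 Hz,
# a strong 50 Hz tone plus a weaker 120 Hz tone.
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

peak_hz = dominant_frequency(signal, sample_rate)
print(peak_hz)
```

Because the signal is real-valued, `rfft` returns only the non-negative frequencies, which is the same half of the spectrum that `cal_amplitude` keeps by slicing.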
52 | 
53 | ## Code Explanation
54 | 
55 | All the samples follow the same basic approach:
56 | 
57 | - Take the FFT of each bearing channel in each file.
58 | - Calculate the frequency and amplitude of each component.
59 | - Find the top 5 amplitudes and their corresponding frequencies.
60 | - Repeat for each bearing and each data file, and store the results in the result data frame.
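
The steps above can be sketched for a single channel as follows. This is an illustrative version, not the repository's `cal_amplitude`: the helper name `top_n_frequencies` is hypothetical, the 20 kHz sample rate is an assumption (inferred from the 0 to 10000 Hz axis used in `utils.py`), and the snapshot is synthetic rather than read from the bearing dataset.

```python
import numpy as np
import pandas as pd

def top_n_frequencies(samples, sample_rate=20000, n=5):
    """Return the n frequencies (Hz) with the highest FFT amplitude, strongest first."""
    amplitude = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    order = np.argsort(amplitude)[-n:][::-1]   # indices of the n largest amplitudes
    return freqs[order]

# Synthetic snapshot: five tones with decreasing amplitude (assumed values).
sample_rate = 20000
t = np.arange(2000) / sample_rate
tones = [(1000, 5.0), (3000, 4.0), (5000, 3.0), (7000, 2.0), (9000, 1.0)]
snapshot = sum(a * np.sin(2 * np.pi * f * t) for f, a in tones)

# One row per data file, one column per maximum, as in the result data frame.
columns = ["fmax1", "fmax2", "fmax3", "fmax4", "fmax5"]
features = pd.DataFrame([top_n_frequencies(snapshot, sample_rate)], columns=columns)
print(features)
```

Repeating this for every file gives one feature row per point in time, which is what the frequency vs. time plots display.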
61 | 
62 | ### For FFT: gives the frequency vs. time plot of each maximum frequency for each dataset.
63 | 
64 |  ![Figure 1](../Images/FFT/testset2/max2.jpg)
65 |  
66 |  *Figure 1. Plot of the max2 frequency for all the bearings in testset2.*
67 | 
68 | **NOTE:**
69 | 
70 | Testset 3 of the NASA bearing dataset is excluded from the observations for the following reasons:
71 | 
72 | 1: It actually contains 6324 data files, although the documentation states 4448. This makes the data very noisy.
73 | 
74 | 2: None of the bearings shows symptoms of failure before failing suddenly. This makes the data inconsistent.
75 | 
76 | For these reasons, testset3 exhibits unpredictable behavior.
77 | 


--------------------------------------------------------------------------------
/FFT/utils.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | """
  3 | Created on Mon May 14 12:20:53 2018
  4 | 
  5 | @author:SHEFALI JAIN
  6 | """
  7 | """
  8 |     This scripts contain the functions
  9 | """
 10 | 
 11 | #importing the libraries
 12 | import numpy as np
 13 | import pandas as pd
 14 | import matplotlib.pyplot as plt
 15 | 
 16 | from scipy.fftpack import fft
 17 | from scipy.spatial.distance import cdist
 18 | from sklearn import cluster
 19 | 
 20 | # cal_Labels takes the number of files as input and generates the labels based on a 70-30 split.
 21 | # number of files: testset1 = 2148, testset2 = 984, testset3 = 6324
 22 | def cal_Labels(files):
 23 |     range_low = files*0.7
 24 |     range_high = files*1.0
 25 |     label = []
 26 |     for i in range(0,files):
 27 |         if(i<range_low):
 28 |             label.append(0)
 29 |         elif(i >= range_low and i <= range_high):
 30 |             label.append(1)
 31 |         else:
 32 |             label.append(2)
 33 |     return label
 34 | 
 35 | # cal_amplitude takes the FFT data and n (the number of maxima) as input and returns the n frequencies with the highest amplitudes, plus those amplitudes
 36 | def cal_amplitude(fftData,n):
 37 |     ifa = []
 38 |     ia = []
 39 |     amp = abs(fftData[0:int(len(fftData)/2)])
 40 |     freq = np.linspace(0,10000,num = int(len(fftData)/2))
 41 |     ida = np.array(amp).argsort()[-n:][::-1]
 42 |     ia.append([amp[i] for i in ida])
 43 |     ifa.append([freq[i] for i in ida])
 44 |     return(ifa,ia)
 45 | 
 46 | # calculates the top 5 frequencies with the highest amplitude for each channel of each file and returns one list per maximum
 47 | def cal_max_freq(files,path):
 48 |     freq_max1, freq_max2, freq_max3, freq_max4, freq_max5 = ([] for _ in range(5))
 49 |     for f in files:
 50 |         temp = pd.read_csv(path+f,  sep = "\t",header = None)
 51 |         temp_freq_max1,temp_freq_max2,temp_freq_max3,temp_freq_max4,temp_freq_max5 = ([] for _ in range(5))
 52 |         if(path ==  "1st_test/"):
 53 |             rhigh = 8
 54 |         else:
 55 |             rhigh = 4
 56 |         for i in range(0,rhigh):
 57 |             t = fft(temp[i])
 58 |             ff,aa = cal_amplitude(t,5)
 59 |             temp_freq_max1.append(np.array(ff)[:,0])
 60 |             temp_freq_max2.append(np.array(ff)[:,1])
 61 |             temp_freq_max3.append(np.array(ff)[:,2])
 62 |             temp_freq_max4.append(np.array(ff)[:,3])
 63 |             temp_freq_max5.append(np.array(ff)[:,4])
 64 |         freq_max1.append(temp_freq_max1)
 65 |         freq_max2.append(temp_freq_max2)
 66 |         freq_max3.append(temp_freq_max3)
 67 |         freq_max4.append(temp_freq_max4)
 68 |         freq_max5.append(temp_freq_max5)
 69 |     return(freq_max1,freq_max2,freq_max3,freq_max4,freq_max5)
 70 | 
 71 | 
 72 | # takes one series per bearing channel and draws a scatter plot for each in its own subplot
 73 | 
 74 | def plotlabels(labels):
 75 |     length = len(labels)
 76 |     leng = len(labels[0])
 77 |     if(length == 4):
 78 |         ax1  =  plt.subplot2grid((8,1), (0,0), rowspan = 2, colspan = 1)
 79 |         ax2 = plt.subplot2grid((8,1), (2,0), rowspan = 2, colspan = 1)
 80 |         ax3 = plt.subplot2grid((8,1), (4,0), rowspan = 2, colspan = 1)
 81 |         ax4 = plt.subplot2grid((8,1), (6,0), rowspan = 2, colspan = 1)
 82 |         y1 = ax1.scatter(np.array(range(1,leng+1)),np.array(labels)[0],label = "bearing1")
 83 |         y2 = ax2.scatter(np.array(range(1,leng+1)),np.array(labels)[1],label = "bearing2")
 84 |         y3 = ax3.scatter(np.array(range(1,leng+1)),np.array(labels)[2],label = "bearing3")
 85 |         y4 = ax4.scatter(np.array(range(1,leng+1)),np.array(labels)[3],label = "bearing4")
 86 |         plt.legend(handles = [y1,y2,y3,y4])
 87 |     elif(length == 8):
 88 |         ax1 = plt.subplot2grid((16,1), (0,0), rowspan = 2, colspan = 1)
 89 |         ax2 = plt.subplot2grid((16,1), (2,0), rowspan = 2, colspan = 1)
 90 |         ax3 = plt.subplot2grid((16,1), (4,0), rowspan = 2, colspan = 1)
 91 |         ax4 = plt.subplot2grid((16,1), (6,0), rowspan = 2, colspan = 1)
 92 |         ax5 = plt.subplot2grid((16,1), (8,0), rowspan = 2, colspan = 1)
 93 |         ax6 = plt.subplot2grid((16,1), (10,0), rowspan = 2, colspan = 1)
 94 |         ax7 = plt.subplot2grid((16,1), (12,0), rowspan = 2, colspan = 1)
 95 |         ax8 = plt.subplot2grid((16,1), (14,0), rowspan = 2, colspan = 1)
 96 |         y1 = ax1.scatter(np.array(range(1,leng+1)),np.array(labels)[0],label = "bearing1_x")
 97 |         y2 = ax2.scatter(np.array(range(1,leng+1)),np.array(labels)[1],label = "bearing1_y")
 98 |         y3 = ax3.scatter(np.array(range(1,leng+1)),np.array(labels)[2],label = "bearing2_x")
 99 |         y4 = ax4.scatter(np.array(range(1,leng+1)),np.array(labels)[3],label = "bearing2_y")
100 |         y5 = ax5.scatter(np.array(range(1,leng+1)),np.array(labels)[4],label = "bearing3_x")
101 |         y6 = ax6.scatter(np.array(range(1,leng+1)),np.array(labels)[5],label = "bearing3_y")
102 |         y7 = ax7.scatter(np.array(range(1,leng+1)),np.array(labels)[6],label = "bearing4_x")
103 |         y8 = ax8.scatter(np.array(range(1,leng+1)),np.array(labels)[7],label = "bearing4_y")
104 |         plt.legend(handles = [y1,y2,y3,y4,y5,y6,y7,y8])
105 |         plt.show()
106 | 
107 | def create_dataframe(freq_max1,freq_max2,freq_max3,freq_max4,freq_max5,bearing):
108 |     result = pd.DataFrame()
109 |     result['fmax1'] = list((np.array(freq_max1))[:,bearing])
110 |     result['fmax2'] = list((np.array(freq_max2))[:,bearing])
111 |     result['fmax3'] = list((np.array(freq_max3))[:,bearing])
112 |     result['fmax4'] = list((np.array(freq_max4))[:,bearing])
113 |     result['fmax5'] = list((np.array(freq_max5))[:,bearing])
114 |     x = result[["fmax1","fmax2","fmax3","fmax4","fmax5"]]
115 |     return x
116 | 
117 | def elbow_method(X):
118 |     distortions = []
119 |     K = range(1,10)
120 |     for k in K:
121 |         kmeanModel = cluster.KMeans(n_clusters = k)
122 |         kmeanModel.fit(X)
123 |         distortions.append(sum(np.min(cdist(X, kmeanModel.cluster_centers_, 'euclidean'), axis = 1)) / X.shape[0])
124 |     #  Plot the elbow
125 |     plt.plot(K, distortions, 'bx-')
126 |     plt.xlabel('k')
127 |     plt.ylabel('Distortion')
128 |     plt.title('The Elbow Method showing the optimal k')
129 |     plt.show()


--------------------------------------------------------------------------------
/Images/FFT/testset1/max1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset1/max1.jpg


--------------------------------------------------------------------------------
/Images/FFT/testset1/max2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset1/max2.jpg


--------------------------------------------------------------------------------
/Images/FFT/testset1/max3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset1/max3.png


--------------------------------------------------------------------------------
/Images/FFT/testset1/max4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset1/max4.png


--------------------------------------------------------------------------------
/Images/FFT/testset1/max5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset1/max5.png


--------------------------------------------------------------------------------
/Images/FFT/testset2/max1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset2/max1.png


--------------------------------------------------------------------------------
/Images/FFT/testset2/max2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset2/max2.jpg


--------------------------------------------------------------------------------
/Images/FFT/testset2/max3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset2/max3.png


--------------------------------------------------------------------------------
/Images/FFT/testset2/max4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset2/max4.png


--------------------------------------------------------------------------------
/Images/FFT/testset2/max5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset2/max5.png


--------------------------------------------------------------------------------
/Images/LogisticRegression/testset1_figure.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/LogisticRegression/testset1_figure.PNG


--------------------------------------------------------------------------------
/Images/LogisticRegression/testset1_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/LogisticRegression/testset1_result.png


--------------------------------------------------------------------------------
/Images/LogisticRegression/testset2_figure.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/LogisticRegression/testset2_figure.jpg


--------------------------------------------------------------------------------
/Images/LogisticRegression/testset2_result.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/LogisticRegression/testset2_result.jpg


--------------------------------------------------------------------------------
/Logistic Regression/LogisticRegressionTraining.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | """
  3 | Created on Tue May 22 15:20:01 2018
  4 | 
  5 | @author: SHEFALI JAIN
  6 | """
  7 | 
  8 | """
  9 |     this script is used to train the Logistic Regression model with 2 labels. The model is trained on the dataset, which consists of
 10 |     Testset1:
 11 |         bearing 7(bearing 4, y axis), Fail case
 12 |         bearing 2(bearing 2, x axis),pass case
 13 |     Testset2:
 14 |         bearing 0(bearing 1 ) , Fail case
 15 |         bearing 1(bearing 2) , Pass case
 16 | """
 17 | 
 18 | # importing libraries
 19 | 
 20 | import pandas as pd
 21 | import numpy as np
 22 | 
 23 | from utils import cal_Labels,cal_max_freq,create_dataframe
 24 | from sklearn.model_selection import train_test_split
 25 | from sklearn.linear_model import LogisticRegression
 26 | from sklearn import metrics
 27 | 
 28 | import os
 29 | 
 30 | # reading all the files from the testset1 and testset2
 31 | # change the path to your dataset path
 32 | 
 33 | try:
 34 |     # reading all  the files from the testset1, and testset2
 35 |     filedir_testset1 = input("enter the complete directory path for the testset1 ")
 36 |     filedir_testset2 = input("enter the complete directory path for the testset2 ")
 37 |     all_files_testset1 = os.listdir(filedir_testset1)
 38 |     all_files_testset2 = os.listdir(filedir_testset2)
 39 | 
 40 |     # relative path of the dataset, after the current working directory
 41 | 
 42 |     path_testset2 = "2nd_test/"
 43 |     path_testset1 = "1st_test/"
 44 | 
 45 |     testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5 = cal_max_freq(all_files_testset1,path_testset1)
 46 |     testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5 = cal_max_freq(all_files_testset2,path_testset2)
 47 | 
 48 | except IOError:
 49 |     raise SystemExit("the data directory path for testset1 or testset2 is wrong")
 50 | 
 51 | # calculating the labels for the bearing which is failed
 52 | testset1_labelF = cal_Labels(len(all_files_testset1))
 53 | testset2_labelF = cal_Labels(len(all_files_testset2))
 54 | 
 55 | # creating a dataframe for the bearing which has failed
 56 | result1 = create_dataframe(testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5,7)
 57 | result1['labels'] = testset1_labelF
 58 | 
 59 | result2 = create_dataframe(testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5,0)
 60 | result2['labels'] = testset2_labelF
 61 | 
 62 | # calculating the labels for the bearings in testset1 and testset2 which have not failed
 63 | testset1_labelP = np.array([0]*1800)
 64 | testset2_labelP = np.array([0]*800)
 65 | 
 66 | # creating a dataframe for the bearing which has passed
 67 | result3 = create_dataframe(testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5,2)
 68 | result3 = result3[:1800]
 69 | result3['labels'] = testset1_labelP
 70 | 
 71 | result4 = create_dataframe(testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5,1)
 72 | result4 = result4[:800]
 73 | result4['labels'] = testset2_labelP
 74 | 
 75 | # creating the final Result
 76 | frames = [result1,result2,result3,result4]
 77 | result = pd.concat(frames)
 78 | 
 79 | # creating the X, Y variable
 80 | x = result[["fmax1","fmax2","fmax3","fmax4","fmax5"]]
 81 | y = result['labels']
 82 | 
 83 | # spliting the x, y into train and test set
 84 | x_train, x_test, y_train, y_test  =  train_test_split(x, y, test_size = 0.3, random_state = 42,stratify = y)
 85 | 
 86 | # training the model
 87 | logisticRegr  =  LogisticRegression(class_weight = 'balanced',random_state = 42,max_iter = 300)
 88 | logisticRegr.fit(x_train, y_train)
 89 | predictions  =  logisticRegr.predict(x_test)
 90 | 
 91 | # Use score method to get accuracy of the model on the held-out test split
 92 | score  =  logisticRegr.score(x_test, y_test)
 93 | print("test accuracy",score)
 94 | cm  =  metrics.confusion_matrix(y_test, predictions)
 95 | print("confusion matrix for the test split",cm)
 96 | 
 97 | # save the model
 98 | filename = "logisticRegressionModel.npy"
 99 | np.save(filename,logisticRegr)
100 | 
101 | 
102 | 
103 | 
104 | 
105 | 


--------------------------------------------------------------------------------
/Logistic Regression/logisticregressiontesting.py:
--------------------------------------------------------------------------------
 1 | # -*- coding: utf-8 -*-
 2 | """
 3 | Created on Mon May 14 14:09:55 2018
 4 | 
 5 | @author:SHEFALI JAIN
 6 | """
 7 | """
 8 |     script for testing the bearing set whether they have any issue or not
 9 | """
10 | 
11 | # import the libraries
12 | import numpy as np
13 | 
14 | from utils import cal_max_freq,plotlabels,create_dataframe
15 | 
16 | import os
17 | 
18 | def main():
19 |     try:
20 |         # user input for the path of the dataset
21 |         filedir = input("enter the complete directory path ")
22 |         filepath = input("enter the folder name with a trailing slash, e.g. 1st_test/ ")
23 | 
24 |         # load the files
25 |         all_files = os.listdir(filedir)
26 |         freq_max1,freq_max2,freq_max3,freq_max4,freq_max5 = cal_max_freq(all_files,filepath)
27 |     except IOError:
28 |         raise SystemExit("the data directory path or the folder name is wrong")
29 | 
30 |     # load the model (np.save pickled it, so NumPy 1.16.3 and later need allow_pickle=True)
31 |     filename = "logisticRegressionModel.npy"
32 |     logisticRegr = np.load(filename, allow_pickle=True).item()
33 | 
34 |     #checking the iteration
35 |     if (filepath=="1st_test/"):
36 |         rhigh = 8
37 |     else:
38 |         rhigh = 4
39 | 
40 |     print("for the testset",filepath)
41 |     prediction_last_100 = []
42 |     for i in range(0,rhigh):
43 |         print("checking for the bearing",i+1)
44 |         # creating  the dataframe
45 |         x = create_dataframe(freq_max1,freq_max2,freq_max3,freq_max4,freq_max5,i)
46 |         predictions  =  logisticRegr.predict(x)
47 |         prediction_last_100.append(predictions[-100:])
48 |         # count no of zeros
49 |         zero = list(predictions).count(0)
50 |         ones = list(predictions).count(1)
51 |         print("the no of passed files",zero)
52 |         print("the no of failed files", ones)
53 |         check_one = list(predictions[-100:]).count(1)
54 |         check_zero = list(predictions[-100:]).count(0)
55 | 
56 |         if(check_one > check_zero):
57 |              print("bearing is suspected, it is likely to fail")
58 |         else:
59 |              print("bearing has no issue")
60 | 
61 |     # plotting the last 100 prediction for each bearing
62 |     plotlabels(prediction_last_100)
63 | 
64 | 
65 | if __name__  == "__main__":
66 |   main()


--------------------------------------------------------------------------------
/Logistic Regression/readme.md:
--------------------------------------------------------------------------------
 1 | # Real-time time-series anomaly detection
 2 | 
 3 | | Programming Language |  Python |
 4 | | --- | --- |
 5 | | Skills (beg, intermediate, advanced) |  Intermediate |
 6 | | Time to complete project (in increments of 15 min) |  25 min |
 7 | | Hardware needed (hardware used) | Up Squared* board  |
 8 | | Target Operating System |   |
 9 | 
10 | Monitoring manufacturing equipment is vital to any industrial process. In some cases equipment must be monitored in real time for faults and anomalies, both to prevent damage and to correlate equipment faults with production line issues. Fault detection is the precursor to predictive maintenance.
11 | 
12 | Several methods can detect failures without training a neural network, ranging from the most basic (FFT) to the most complex (Gaussian Mixture Model). They can be reused on different data streams with minor modifications, and unlike neural nets they don't require a large amount of previously classified data. In fact, some of these methods can be used to classify data in order to train DNNs.
13 | 
14 | ## What you'll learn
15 | 
16 | - Basic implementation of FFT and Logistic Regression.
17 | - How FFT helps with feature engineering of machine vibration data.
18 | 
19 | ## Setup
20 | 
21 | - Download the Bearing Data Set (also on [https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/](https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/))
22 | - Extract the zip archives into folders named 1st\_test and 2nd\_test.
23 | - Make sure you have the following libraries:
24 | 
25 |      * Numpy
26 |      * Pandas
27 |      * Matplotlib.pyplot
28 |      * Sklearn
29 |      * Scipy
30 | 
31 | - Download the code
32 | - Make sure all the code and data are in one folder
33 | 
34 | ## Gather your materials
35 | 
36 | - [UP Squared* board](http://www.up-board.org/upsquared/)
37 | 
38 | ## Get the code
39 | 
40 | Open the example in a console or any Python IDE (for example, Spyder). Set the working directory to the folder where your code and dataset are stored.
41 | 
42 | ## Run the application
43 | For Logistic Regression there are 3 files:
44 | 
45 | | SAMPLE FILE NAME | EXPECTED OUTPUT | Note |
46 | | --- | --- | --- |
47 | | utils.py | Contains the helper functions used by all the modules |   |
48 | | LogisticRegressionTraining.py | Saves the trained model in the file "logisticRegressionModel.npy" | Training is done on testset1, bearing4\_y axis (failed) and bearing2\_x axis (passed), and on testset2, bearing1 (failed) and bearing2 (passed). \*The training dataset is increased for a better result. |
49 | | logisticregressiontesting.py | Reports which bearings of the given test set are suspected to fail, and plots the last 100 predicted labels for all the bearings. | Takes the input path from the user to check which bearing is suspected to fail and which is in normal condition. |
50 | 
51 | ## How it works:
52 | 
53 | 1. **FFT**: A fast Fourier transform (FFT) is an algorithm that takes a signal sampled over a period of time (or space) and decomposes it into its frequency components. These components are single sinusoidal oscillations at distinct frequencies, each with its own amplitude and phase.
54 | 
55 |      [Y](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-Y) = fft([X](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-X)) computes the discrete Fourier transform (DFT) of X using a fast Fourier transform (FFT) algorithm. If X is a vector, then fft(X) returns the Fourier transform of the vector [**More Details**](https://en.wikipedia.org/wiki/Fast_Fourier_transform)
56 | 
57 | 2. **Logistic Regression**: Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. The outcome is measured with a dichotomous variable (one with only two possible outcomes).
58 | 
59 |      In logistic regression, the dependent variable is binary, i.e. it only contains data coded as 1 (TRUE, here a bearing suspected to fail) or 0 (FALSE, here a bearing in normal condition). The goal is to find the best-fitting model that describes the relationship between this binary outcome (the dependent or response variable) and a set of independent (predictor or explanatory) variables. [**More Details**](https://en.wikipedia.org/wiki/Logistic_regression)
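
To make the binary outcome concrete, here is a small sketch that fits scikit-learn's `LogisticRegression` on synthetic one-dimensional data. The feature values (a dominant frequency in kHz), the cluster centres, and the variable names are all assumptions for illustration; the training script in this folder uses five frequency features per file instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumed values): dominant frequency in kHz for
# files from a healthy bearing (label 0) and a failing one (label 1).
rng = np.random.default_rng(0)
healthy = rng.normal(loc=1.0, scale=0.05, size=(100, 1))
faulty = rng.normal(loc=3.0, scale=0.05, size=(100, 1))

X = np.vstack([healthy, faulty])
y = np.array([0] * 100 + [1] * 100)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Classify two unseen values, one near each cluster.
pred = model.predict([[0.9], [3.1]])
prob_fail = model.predict_proba([[0.9], [3.1]])[:, 1]   # estimated probability of label 1
print(pred, prob_fail)
```

`predict` returns the hard 0/1 label, while `predict_proba` exposes the underlying probability that the model thresholds at 0.5.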
60 | 
61 | ## Code Explanation
62 | 
63 | All the samples follow the same basic approach:
64 | 
65 | - Take the FFT of each bearing channel in each file.
66 | - Calculate the frequency and amplitude of each component.
67 | - Find the top 5 amplitudes and their corresponding frequencies.
68 | - Repeat for each bearing and each data file, and store the results in the result data frame.
69 | 
70 | ### For FFT: gives the frequency vs. time plot of each maximum frequency for each dataset.
71 | 
72 | ![Figure 1](.././Images/FFT/testset2/max2.jpg)
73 | *Figure 1. Plot of the max2 frequency for all the bearings in testset2.*
74 | 
75 | 
76 | ### For Logistic regression
77 | 
78 | Calculate the labels for the data frame (the dependent variable). Assume that the first 70% of the files represent normal condition and the remaining 30% are suspected to fail. Label the normal files '0' and the suspected files '1'. Train the logistic regression on the result data frame and store the model using NumPy.
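
The labelling rule can be sketched in a few lines. This is an illustrative rewrite rather than the `cal_Labels` function from `utils.py`; the name `label_run` and the `healthy_fraction` parameter are hypothetical, and the file count of 984 is the testset2 size noted in `utils.py`.

```python
import numpy as np

def label_run(n_files, healthy_fraction=0.7):
    """Label files in time order: 0 for the first 70%, 1 for the rest."""
    positions = np.arange(n_files)
    # A file is labelled 1 once its position reaches the 70% mark.
    return (positions >= n_files * healthy_fraction).astype(int)

labels = label_run(984)          # testset2 has 984 files
n_normal = int((labels == 0).sum())
n_suspect = int((labels == 1).sum())
print(n_normal, n_suspect)
```

The rule relies on the files being ordered in time, since it treats the position of a file in the run as a proxy for bearing wear.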
79 | 
80 | For testing: take the input path from the user, who wants to check which bearing is suspected to fail and which is in normal condition. Count the predictions labeled one and the predictions labeled zero in the last 100 predictions. If there are more ones than zeros, the bearing is suspected to fail; otherwise it is in normal condition.
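
The decision rule on the last 100 predictions amounts to a majority vote, which can be sketched as below. The function name `is_suspected` and the two example prediction sequences are assumptions for illustration; in the testing script the predictions come from the trained model.

```python
import numpy as np

def is_suspected(predictions, window=100):
    """Return True if label 1 outnumbers label 0 in the last `window` predictions."""
    recent = np.asarray(predictions)[-window:]
    ones = int(np.count_nonzero(recent == 1))
    zeros = int(np.count_nonzero(recent == 0))
    return ones > zeros

# Hypothetical prediction sequences for two bearings over 984 files.
failing_bearing = [0] * 900 + [1] * 84    # faults appear near the end of the run
healthy_bearing = [0] * 984

failing_flag = is_suspected(failing_bearing)
healthy_flag = is_suspected(healthy_bearing)
print(failing_flag, healthy_flag)
```

Looking only at the most recent window keeps a long healthy history from masking faults that appear late in the run.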
81 | 
82 |  ![Figure 2](.././Images/LogisticRegression/testset2_figure.jpg)
83 |  *Figure 2. Last 100 predicted labels for all four bearings of testset2.*
84 | 
85 |  ![Figure 3](.././Images/LogisticRegression/testset2_result.jpg)
86 |  *Figure 3. Result for testset2, showing which bearing failed.*
87 | 
88 | 
89 | **NOTE:**
90 | 
91 | Testset 3 of the NASA bearing dataset is excluded from the observations for the following reasons:
92 | 
93 | 1: It actually contains 6324 data files, although the documentation states 4448. This makes the data very noisy.
94 | 
95 | 2: None of the bearings shows symptoms of failure before failing suddenly. This makes the data inconsistent.
96 | 
97 | For these reasons, testset3 exhibits unpredictable behavior.
98 | 


--------------------------------------------------------------------------------
/Logistic Regression/utils.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | """
  3 | Created on Mon May 14 12:20:53 2018
  4 | 
  5 | @author:SHEFALI JAIN
  6 | """
  7 | """
  8 |     This script contains the helper functions.
  9 | """
 10 | 
 11 | #importing the libraries
 12 | import numpy as np
 13 | import pandas as pd
 14 | import matplotlib.pyplot as plt
 15 | 
 16 | from scipy.fftpack import fft
 17 | from scipy.spatial.distance import cdist
 18 | from sklearn import cluster
 19 | 
 20 | # cal_Labels takes the number of files as input and generates the labels based on a 70-30 split.
 21 | # number of files: testset1 = 2148, testset2 = 984, testset3 = 6324
 22 | def cal_Labels(files):
 23 |     range_low = files*0.7
 24 |     range_high = files*1.0
 25 |     label = []
 26 |     for i in range(0,files):
 27 |         if(i<range_low):
 28 |             label.append(0)
 29 |         elif(i >= range_low and i <= range_high):
 30 |             label.append(1)
 31 |         else:
 32 |             label.append(2)
 33 |     return label
 34 | 
 35 | # cal_amplitude takes the FFT data and n (the number of maximum amplitudes) as input and returns the n frequencies with the highest amplitudes, along with those amplitudes
 36 | def cal_amplitude(fftData,n):
 37 |     ifa = []
 38 |     ia = []
 39 |     amp = abs(fftData[0:int(len(fftData)/2)])
 40 |     freq = np.linspace(0,10000,num = int(len(fftData)/2))
 41 |     ida = np.array(amp).argsort()[-n:][::-1]
 42 |     ia.append([amp[i] for i in ida])
 43 |     ifa.append([freq[i] for i in ida])
 44 |     return(ifa,ia)
 45 | 
 46 | # this function calculates the top 5 frequencies with the highest amplitudes for each file and returns one list per maximum
 47 | def cal_max_freq(files,path):
 48 |     freq_max1, freq_max2, freq_max3, freq_max4, freq_max5 = ([] for _ in range(5))
 49 |     for f in files:
 50 |         temp = pd.read_csv(path+f,  sep = "\t",header = None)
 51 |         temp_freq_max1,temp_freq_max2,temp_freq_max3,temp_freq_max4,temp_freq_max5 = ([] for _ in range(5))
 52 |         if(path ==  "1st_test/"):
 53 |             rhigh = 8
 54 |         else:
 55 |             rhigh = 4
 56 |         for i in range(0,rhigh):
 57 |             t = fft(temp[i])
 58 |             ff,aa = cal_amplitude(t,5)
 59 |             temp_freq_max1.append(np.array(ff)[:,0])
 60 |             temp_freq_max2.append(np.array(ff)[:,1])
 61 |             temp_freq_max3.append(np.array(ff)[:,2])
 62 |             temp_freq_max4.append(np.array(ff)[:,3])
 63 |             temp_freq_max5.append(np.array(ff)[:,4])
 64 |         freq_max1.append(temp_freq_max1)
 65 |         freq_max2.append(temp_freq_max2)
 66 |         freq_max3.append(temp_freq_max3)
 67 |         freq_max4.append(temp_freq_max4)
 68 |         freq_max5.append(temp_freq_max5)
 69 |     return(freq_max1,freq_max2,freq_max3,freq_max4,freq_max5)
 70 | 
 71 | 
 72 | # take the labels for each bearing and plot the corresponding graph for each bearing.
 73 | 
 74 | def plotlabels(labels):
 75 |     length = len(labels)
 76 |     leng = len(labels[0])
 77 |     if(length == 4):
 78 |         ax1  =  plt.subplot2grid((8,1), (0,0), rowspan = 2, colspan = 1)
 79 |         ax2 = plt.subplot2grid((8,1), (2,0), rowspan = 2, colspan = 1)
 80 |         ax3 = plt.subplot2grid((8,1), (4,0), rowspan = 2, colspan = 1)
 81 |         ax4 = plt.subplot2grid((8,1), (6,0), rowspan = 2, colspan = 1)
 82 |         y1 = ax1.scatter(np.array(range(1,leng+1)),np.array(labels)[0],label = "bearing1")
 83 |         y2 = ax2.scatter(np.array(range(1,leng+1)),np.array(labels)[1],label = "bearing2")
 84 |         y3 = ax3.scatter(np.array(range(1,leng+1)),np.array(labels)[2],label = "bearing3")
 85 |         y4 = ax4.scatter(np.array(range(1,leng+1)),np.array(labels)[3],label = "bearing4")
 86 |         plt.legend(handles = [y1,y2,y3,y4])
 87 |     elif(length == 8):
 88 |         ax1 = plt.subplot2grid((16,1), (0,0), rowspan = 2, colspan = 1)
 89 |         ax2 = plt.subplot2grid((16,1), (2,0), rowspan = 2, colspan = 1)
 90 |         ax3 = plt.subplot2grid((16,1), (4,0), rowspan = 2, colspan = 1)
 91 |         ax4 = plt.subplot2grid((16,1), (6,0), rowspan = 2, colspan = 1)
 92 |         ax5 = plt.subplot2grid((16,1), (8,0), rowspan = 2, colspan = 1)
 93 |         ax6 = plt.subplot2grid((16,1), (10,0), rowspan = 2, colspan = 1)
 94 |         ax7 = plt.subplot2grid((16,1), (12,0), rowspan = 2, colspan = 1)
 95 |         ax8 = plt.subplot2grid((16,1), (14,0), rowspan = 2, colspan = 1)
 96 |         y1 = ax1.scatter(np.array(range(1,leng+1)),np.array(labels)[0],label = "bearing1_x")
 97 |         y2 = ax2.scatter(np.array(range(1,leng+1)),np.array(labels)[1],label = "bearing1_y")
 98 |         y3 = ax3.scatter(np.array(range(1,leng+1)),np.array(labels)[2],label = "bearing2_x")
 99 |         y4 = ax4.scatter(np.array(range(1,leng+1)),np.array(labels)[3],label = "bearing2_y")
100 |         y5 = ax5.scatter(np.array(range(1,leng+1)),np.array(labels)[4],label = "bearing3_x")
101 |         y6 = ax6.scatter(np.array(range(1,leng+1)),np.array(labels)[5],label = "bearing3_y")
102 |         y7 = ax7.scatter(np.array(range(1,leng+1)),np.array(labels)[6],label = "bearing4_x")
103 |         y8 = ax8.scatter(np.array(range(1,leng+1)),np.array(labels)[7],label = "bearing4_y")
104 |         plt.legend(handles = [y1,y2,y3,y4,y5,y6,y7,y8])  # add the legend before the figure is shown
105 |         plt.show()
106 | 
107 | def create_dataframe(freq_max1,freq_max2,freq_max3,freq_max4,freq_max5,bearing):
108 |     result = pd.DataFrame()
109 |     result['fmax1'] = list((np.array(freq_max1))[:,bearing])
110 |     result['fmax2'] = list((np.array(freq_max2))[:,bearing])
111 |     result['fmax3'] = list((np.array(freq_max3))[:,bearing])
112 |     result['fmax4'] = list((np.array(freq_max4))[:,bearing])
113 |     result['fmax5'] = list((np.array(freq_max5))[:,bearing])
114 |     x = result[["fmax1","fmax2","fmax3","fmax4","fmax5"]]
115 |     return x
116 | 
117 | def elbow_method(X):
118 |     distortions = []
119 |     K = range(1,10)
120 |     for k in K:
121 |         kmeanModel = cluster.KMeans(n_clusters = k)
122 |         kmeanModel.fit(X)  # fit the model for this value of k
123 |         distortions.append(sum(np.min(cdist(X, kmeanModel.cluster_centers_, 'euclidean'), axis = 1)) / X.shape[0])
124 |     #  Plot the elbow
125 |     plt.plot(K, distortions, 'bx-')
126 |     plt.xlabel('k')
127 |     plt.ylabel('Distortion')
128 |     plt.title('The Elbow Method showing the optimal k')
129 |     plt.show()


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # DISCONTINUATION OF PROJECT #
  2 | This project will no longer be maintained by Intel.
  3 | Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
  4 | Intel no longer accepts patches to this project.
  5 | # Real-time time-series anomaly detection
  6 | 
  7 | | Programming Language |  Python |
  8 | | --- | --- |
  9 | | Skills (beg, intermediate, advanced) |  Intermediate |
 10 | | Time to complete project (in increments of 15 min) |  25 min |
 11 | | Hardware needed (hardware used) | Up Squared* board |
 12 | | Target Operating System |   |
 13 | 
 14 | The monitoring of manufacturing equipment is vital to any industrial process. Sometimes it is critical that equipment be monitored in real-time for faults and anomalies to prevent damage and correlate equipment behavior faults to production line issues. Fault detection is the precursor to predictive maintenance.
 15 | 
 16 | There are several methods which don't require training of a neural network to be able to detect failures, ranging from the most basic (FFT) to the most complex (Gaussian Mixture Model). These have the advantage of being reusable with minor modifications on different data streams, and don't require a lot of previously classified data (unlike neural nets). In fact, some of these methods can be used to classify data in order to train DNNs.
 17 | 
 18 | ## What you'll learn
 19 | 
 20 | - Basic implementation of FFT, Logistic Regression, K-Means clustering, GMM.
 21 | - How FFT is helpful in Feature engineering of vibrational data of a machine.
 22 | 
 23 | ## Setup
 24 | 
 25 | - Download the Bearing Data Set (also on [https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/](https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/))
 26 | - Extract the zipped data into the respective folders named 1st\_test and 2nd\_test.
 27 | - Make sure you have the following libraries:
 28 | 
 29 |   * Numpy
 30 |   * Pandas
 31 |   * Matplotlib.pyplot
 32 |   * Sklearn
 33 |   * Scipy
 34 | 
 35 | - Download the code
 36 | - Make sure all the code and data are in one directory.
 37 | 
 38 | ## Gather your materials
 39 | 
 40 | - [UP Squared board](http://www.up-board.org/upsquared/)
 41 | 
 42 | ## Get the code
 43 | 
 44 | Open the example in a console or any Python-supported IDE (e.g. Spyder). Set the working directory to the location where your code and dataset are stored.
 45 | 
 46 | ## Run the application
 47 | 
 48 |  There are 4 samples: FFT, Logistic Regression, Kmeans, and GMM. Each of them has one or more files to execute.
 49 | 
 50 | | SAMPLE FILE NAME | EXPECTED OUTPUT | Note |
 51 | | --- | --- | --- |
 52 | | FreqTime.py | Frequency v/s Time plot |   |
 53 | | Utils.py | Contains the functions used by all the modules |   |
 54 | | Train\_logistic\_regression.py | Saved weights of the trained model in the file named "LogisticRegression" | Training for the logistic regression is done on testset1, bearing 4\_y axis (failed) and bearing 2\_x axis (passed), and on testset2, bearing 1 (failed) and bearing 2 (passed). \*The training dataset is increased for better results. |
 55 | | Test\_logistic\_regression.py | Result showing which bearing has failed for the given test set, and the plot of the last 100 predicted labels for all the bearings. | Takes the input path from the user, who wants to check which bearing is suspected to fail and which is in normal condition. |
 56 | | Train\_kmeans.py | Saved weights of the trained model in the file named "Kmeans" | Training for the Kmeans is done on testset1, bearing 4\_y axis (failed) and bearing 2\_x axis (passed), and on testset2, bearing 1 (failed) and bearing 2 (passed). \*The training dataset is increased for better results. |
 57 | | Test\_kmeans.py | Result showing which bearing has failed for the given test set, and the plot of the last 100 predicted labels for all the bearings. | Takes the input path from the user, who wants to check which bearing is suspected to fail and which is in normal condition. Labels **0-7 cover the range from early to most critical for failure**. It is also able to detect the bearing 1\_y axis, which is very close to failure, and therefore shows it as well. |
 58 | | Train\_GMM.py | Saved weights of the trained model in the file named "GMM" | Training for the GMM is done on testset1, bearing 4\_y axis (failed) and bearing 2\_x axis (passed), and on testset2, bearing 1 (failed) and bearing 2 (passed). \*The training dataset is increased for better results. |
 59 | | Test\_GMM.py | Result showing which bearing has failed for the given test set, and the plot of the last 100 predicted labels for all the bearings. | Takes the input path from the user, who wants to check which bearing is suspected to fail and which is in normal condition. Labels **0-2 cover the range from early to most critical for failure**. |
 60 | 
 61 | ## How it works
 62 | 
 63 | 1. **FFT**: A fast Fourier transform (FFT) is an algorithm that takes a signal sampled over a period of time (or space) and divides it into its frequency components. These components are single sinusoidal oscillations at distinct frequencies, each with its own amplitude and phase.
 64 |      [Y](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-Y) = fft( [X](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-X)) computes the discrete Fourier transform (DFT) of X using a fast Fourier transform (FFT) algorithm. If X is a vector, then fft(X) returns the Fourier transform of the vector. [**More details**](https://en.wikipedia.org/wiki/Fast_Fourier_transform)
 65 | 
 66 | 2. **Logistic Regression**: Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (in which there are only two possible outcomes).
 67 | 
 68 |      In logistic regression, the dependent variable is binary or dichotomous, i.e. it only contains data coded as 1 (TRUE, success, pregnant, etc.) or 0 (FALSE, failure, non-pregnant, etc.). The goal of logistic regression is to find the best fitting (yet biologically reasonable) model to describe the relationship between the dichotomous characteristic of interest (dependent variable = response or outcome variable) and a set of independent (predictor or explanatory) variables. [**More details**](https://en.wikipedia.org/wiki/Logistic_regression)
 69 | 
 70 | 3. **Kmeans Clustering**: K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity.
 71 | 
 72 |       The results of the K-means clustering algorithm are:
 73 | 
 74 |       - The centroids of the K clusters, which can be used to label new data
 75 |       - Labels for the training data (each data point is assigned to a single cluster)
 76 | 
 77 |      **Each centroid of a cluster is a collection of feature values which define the resulting groups. Examining the centroid feature weights can be used to qualitatively interpret what kind of group each cluster represents.** [**More details**](https://en.wikipedia.org/wiki/K-means_clustering)
 78 | 
 79 | 4. **GMM**: A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. One can think of mixture models as generalizing k-means clustering to incorporate information about the covariance structure of the data as well as the centers of the latent Gaussians.
 80 | 
 81 |      The GaussianMixture object implements the expectation-maximization (EM) algorithm for fitting mixture-of-Gaussian models. It can also draw confidence ellipsoids for multivariate models, and compute the Bayesian Information Criterion to assess the number of clusters in the data. A GaussianMixture.fit method is provided that learns a Gaussian Mixture Model from training data. Given test data, it can assign to each sample the Gaussian it most probably belongs to using the GaussianMixture.predict method.
 82 | 
 83 |      The GaussianMixture comes with different options to constrain the covariance of the different classes estimated: spherical, diagonal, tied or full covariance. [more details](https://en.wikipedia.org/wiki/Mixture_model)
 84 | 
 85 | ## Code Explanation:
 86 | 
 87 | The basic approach is the same for all the samples. The basic steps are:
 88 | 
 89 | - Take the FFT of each bearing channel in each file.
 90 | - Calculate the frequency and amplitude of each component.
 91 | - Find the top 5 amplitudes and their corresponding frequencies.
 92 | - Repeat the same for each bearing and each data file, and store the result in the result data frame.
 93 | 
 94 | ### For FFT: Gives the Frequency vs Time Plot of each maximum frequency for each dataset.
 95 | 
 96 | ![Figure 1](./Images/FFT/testset2/max2.jpg)
 97 | *Figure 1. Plot of the max2 frequency for all the bearings in testset2.*
 98 | 
 99 | ### For Logistic regression:
100 | 
101 | Calculate the label for the data frame (the dependent variable). Assume that the first 70% of the files represent normal condition and the remaining 30% (71-100%) are suspected to fail. Give the normal samples the label '0' and the suspected-to-fail samples the label '1'. Train the logistic regression on the result data frame and store the model using numpy.
102 | 
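The 70-30 labelling can be sketched in a few lines. This is a simplified illustration in the spirit of `cal_Labels` in `utils.py`, not a copy of it; `make_labels` and `normal_fraction` are hypothetical names.

```python
def make_labels(n_files, normal_fraction=0.7):
    """Label the first part of a run 0 (normal) and the rest 1 (suspected to fail)."""
    # Files are assumed to be ordered in time, oldest first
    cutoff = n_files * normal_fraction
    return [0 if i < cutoff else 1 for i in range(n_files)]

# Ten files: the first seven are labelled normal, the last three suspected to fail
labels = make_labels(10)
```

The resulting list has one label per file and can be attached to the result data frame as the dependent variable before training.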
103 | **For testing**: Take the input path from the user, who wants to check which bearing is suspected to fail and which is in normal condition. Count the number of one labels and zero labels in the last 100 predictions. If there are more one labels than zero labels, the bearing is suspected to fail; otherwise it is in normal condition.
104 | 
105 |  ![Figure 2](./Images/LogisticRegression/testset2_figure.jpg)
106 |  *Figure 2. Predicted last 100 labels for testset2, for all four bearings.*
107 | 
108 |  ![Figure 3](./Images/LogisticRegression/testset2_result.jpg)
109 |  *Figure 3. Result for testset2, showing which bearing has failed.*
110 | 
111 | 
112 | ### For Kmeans Clustering:
113 | 
114 | Fit the result data frame to a Kmeans model, grouping it into 8 clusters (the number is obtained from the elbow method). Save the trained model as Kmeans.
115 | 
116 | For testing, take the input path from the user, who wants to check which bearing is suspected to fail and which is in normal condition. Count the number of labels 5, 6 and 7 in the last 100 predictions; if the count is greater than 25, the bearing is suspected to fail, otherwise it is in normal condition.
117 | 
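The threshold test above can be sketched as follows. This is a minimal illustration, assuming the predictions are a sequence of integer cluster labels in time order; `exceeds_threshold` and its parameters are hypothetical names, not functions from this repository.

```python
def exceeds_threshold(predictions, critical_labels, threshold, window=100):
    """Check whether critical cluster labels dominate the recent predictions."""
    # Keep only the most recent predictions
    recent = list(predictions)[-window:]
    # Count how many of them fall into one of the critical clusters
    hits = sum(1 for p in recent if p in critical_labels)
    return hits > threshold

# Example: 26 of the last 100 predictions fall into clusters 5, 6 or 7
example = [0] * 74 + [5] * 10 + [6] * 10 + [7] * 6
suspected = exceeds_threshold(example, {5, 6, 7}, 25)
```

Keeping the critical labels and the threshold as arguments lets the same check be reused with other cluster counts and cut-offs.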
118 |  ![Figure 4](./Images/Kmeans/testset2_figure.jpg)
119 |  *Figure 4. Predicted last 100 labels for testset2, for all four bearings, using the Kmeans clustering.*
120 | 
121 |  ![Figure 5](./Images/Kmeans/testset2_result.jpg)
122 |  *Figure 5. Result for testset2, showing which bearing has failed, using the Kmeans clustering.*
123 |  
124 | ### For GMM:
125 | 
126 | Fit the result data frame to a GMM model, grouping it into 3 components (the components represent the number of clusters). 
127 | Save the trained model as GMM using numpy.
128 | 
129 | For testing, take the input path from the user, who wants to check which bearing is suspected to fail and which is in normal condition. Count the number of label 2 predictions in the last 100 predictions; if the count is greater than 50, the bearing is suspected to fail, otherwise it is in normal condition.
130 | 
131 | ![Figure 6](./Images/GMM/testset2_figure.jpg)
132 |  *Figure 6. Predicted last 100 labels for testset2, for all four bearings, using the GMM clustering.*
133 | 
134 | 
135 |  ![Figure 7](./Images/GMM/testset2_result.jpg)
136 |  *Figure 7. Result for testset2, showing which bearing has failed, using the GMM clustering.*
137 | 
138 | **NOTE:**
139 | Testset 3 of the NASA bearing dataset is excluded from the observations for the following reasons:
140 | 
141 |   1: It actually contains 6324 files, whereas the documentation states that it contains 4448 data files. This makes the data very noisy.
142 | 
143 |   2: None of the bearings shows symptoms of failure, yet a failure occurs suddenly. This makes the data inconsistent.
144 | 
145 | For these reasons, testset3 exhibits unpredictable behavior.
146 | 


--------------------------------------------------------------------------------