├── FFT ├── FreqTime.py ├── readme.md └── utils.py ├── Images ├── FFT │ ├── testset1 │ │ ├── max1.jpg │ │ ├── max2.jpg │ │ ├── max3.png │ │ ├── max4.png │ │ └── max5.png │ └── testset2 │ │ ├── max1.png │ │ ├── max2.jpg │ │ ├── max3.png │ │ ├── max4.png │ │ └── max5.png └── LogisticRegression │ ├── testset1_figure.PNG │ ├── testset1_result.png │ ├── testset2_figure.jpg │ └── testset2_result.jpg ├── Logistic Regression ├── LogisticRegressionTraining.py ├── logisticregressiontesting.py ├── readme.md └── utils.py └── README.md /FFT/FreqTime.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Mon Jun 4 11:46:36 2018 4 | 5 | @author: SHEFALI JAIN 6 | """ 7 | """ 8 | this script is used to plot the frequency vs time graph. 9 | """ 10 | # import the libraries 11 | import pandas as pd 12 | import numpy as np 13 | import matplotlib.pyplot as plt 14 | 15 | from utils import cal_max_freq,plotlabels 16 | 17 | import os 18 | 19 | def reshapelist(lst): 20 | return (list(map(list, zip(*lst)))) 21 | 22 | try: 23 | # user input for the path of the dataset 24 | filedir = input("enter the complete directory path ") 25 | filepath = input("enter the folder name") 26 | 27 | #load the files 28 | all_files = os.listdir(filedir) 29 | freq_max1,freq_max2,freq_max3,freq_max4,freq_max5 = cal_max_freq(all_files,filepath) 30 | except IOError: 31 | print("you have entered either the wrong data directory path or filepath") 32 | 33 | freq = [freq_max1,freq_max2,freq_max3,freq_max4,freq_max5] 34 | 35 | max_type = input("enter the max type in range 1-5") 36 | max_type=int(max_type) 37 | if(max_type >= 1 and max_type <= 5): 38 | freqmax = reshapelist(freq[max_type-1]) 39 | else: 40 | print("you have enter the max type range except 1-5.") 41 | 42 | # plot the figure 43 | fig = plt.figure() 44 | plotlabels(freqmax) 45 | fig.suptitle(("freq_max"+str(max_type )+'vs time '), fontsize = 20) 46 | plt.ylabel('frequency', 
fontsize = 15) 47 | plt.xlabel('time', fontsize = 15) 48 | fig.savefig(("freq_"+str(max_type)+'.jpg')) -------------------------------------------------------------------------------- /FFT/readme.md: -------------------------------------------------------------------------------- 1 | # Real-time time-series anomaly detection 2 | 3 | | Programming Language | Python | 4 | | --- | --- | 5 | | Skills (beg, intermediate, advanced) | Intermediate | 6 | | Time to complete project (in increments of 15 min) | 25 min | 7 | | Hardware needed (hardware used) | Up Squared* board | 8 | | Target Operating System | | 9 | 10 | The monitoring of manufacturing equipment is vital to any industrial process. Sometimes it is critical that equipment be monitored in real-time for faults and anomalies to prevent damage and correlate equipment behavior faults to production line issues. Fault detection is the pre-cursor to predictive maintenance. 11 | 12 | There are several methods which don't require training of a neural network to be able to detect failures, starting with the most basic (FFT), to the most complex (Gaussian Mixture Model). These have the advantage of being able to be re-used with minor modifications on different data streams, and don't require a lot of known previously classified data (unlike neural nets). In fact, some of these methods can be used to classify data in order to train DNNs. 13 | 14 | ## What you'll learn 15 | - Basic implementation of FFT. 16 | - How FFT is helpful in Feature engineering of vibrational data of a machine. 17 | 18 | ## Setup 19 | - Download the Bearing Data Set (also on [https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/](https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/)) 20 | - Extract the zip format of data into respective folder named 1st\_test/2nd\_test. 
21 | - Make sure you have the following libraries: 22 | 23 | * Numpy 24 | * Pandas 25 | * Matplotlib.pyplot 26 | * Sklearn 27 | * Scipy 28 | 29 | - Download the code 30 | - Make sure all the code and data are in one folder 31 | 32 | ## Gather your materials 33 | 34 | - [UP Squared* board](http://www.up-board.org/upsquared/) 35 | 36 | ## Get the code 37 | 38 | Open the example in a console or any Python-supported IDE (such as Spyder). Set the working directory to the folder where your code and dataset are stored. 39 | 40 | ## Run the application 41 | 42 | | SAMPLE FILE NAME | EXPECTED OUTPUT | Note | 43 | | --- | --- | --- | 44 | | FreqTime.py | Frequency vs. time plot | | 45 | | utils.py | Contains the helper functions used by the other modules | | 46 | 47 | ## How it works 48 | 49 | 1. **FFT**: A fast Fourier transform (FFT) is an algorithm that takes a signal sampled over a period of time (or space) and decomposes it into its frequency components. These components are single sinusoidal oscillations at distinct frequencies, each with its own amplitude and phase. 50 | 51 | [Y](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-Y) = fft([X](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-X)) computes the discrete Fourier transform (DFT) of X using a fast Fourier transform (FFT) algorithm. If X is a vector, then fft(X) returns the Fourier transform of the vector. [**More Details**](https://en.wikipedia.org/wiki/Fast_Fourier_transform) 52 | 53 | ## Code Explanation: 54 | 55 | The basic approach is the same for all the samples. The steps are: 56 | 57 | - Take the FFT of each bearing channel in each file. 58 | - Calculate the frequency and amplitude of each component. 59 | - Find the five largest amplitudes and their corresponding frequencies. 60 | - Repeat for each bearing and each data file, and store the results in the result data frame. 61 | 62 | ### For FFT: Gives the frequency vs. time plot of each maximum frequency for each dataset. 
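A minimal sketch of the peak-picking steps listed under Code Explanation, run on a synthetic signal rather than the bearing files. The function name `top_n_frequencies`, the test tones, and the 20 kHz sampling rate (inferred from the 0 to 10000 Hz axis used in `utils.py`) are assumptions for illustration. The sketch uses `numpy.fft.rfft` and `numpy.fft.rfftfreq` in place of the repository's `scipy.fftpack.fft` and `np.linspace` calls.

```python
import numpy as np


def top_n_frequencies(signal, sample_rate_hz, n=5):
    """Return the n frequencies (Hz) with the largest FFT amplitude, strongest first."""
    amplitude = np.abs(np.fft.rfft(signal))        # one-sided spectrum of a real signal
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    strongest = np.argsort(amplitude)[-n:][::-1]   # indices of the n largest amplitudes
    return freqs[strongest], amplitude[strongest]


# Synthetic input (assumption): 1 s sampled at 20 kHz with tones at 1000 Hz and 3000 Hz.
sample_rate_hz = 20000
t = np.arange(sample_rate_hz) / sample_rate_hz
signal = 2.0 * np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

peak_freqs, peak_amps = top_n_frequencies(signal, sample_rate_hz, n=2)
print(peak_freqs)
```

Plotting the first entry of `peak_freqs` for every file, in file order, would give a frequency vs. time curve of the kind shown in the figure for this section.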
63 | 64 | ![Figure 1](../Images/FFT/testset2/max2.jpg) 65 | 66 | *Figure 1. Plot of the max2 frequency for all the bearings in testset2.* 67 | 68 | **NOTE:** 69 | 70 | TESTSET 3 of the NASA bearing dataset is discarded for the observations for the following reasons: 71 | 72 | 1: It actually has 6324 data files, but according to the documentation it contains 4448 data files. This makes the data very noisy. 73 | 74 | 2: None of the bearings shows symptoms of failure before one suddenly fails. This makes the data inconsistent. 75 | 76 | The reasons listed above describe how testset3 exhibits unpredictable behavior. 77 | -------------------------------------------------------------------------------- /FFT/utils.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Mon May 14 12:20:53 2018 4 | 5 | @author:SHEFALI JAIN 6 | """ 7 | """ 8 | This script contains the helper functions 9 | """ 10 | 11 | #importing the libraries 12 | import numpy as np 13 | import pandas as pd 14 | import matplotlib.pyplot as plt 15 | 16 | from scipy.fftpack import fft 17 | from scipy.spatial.distance import cdist 18 | from sklearn import cluster 19 | 20 | # cal_Labels takes the number of files as input and generates the labels based on a 70-30 split. 
21 | # number of files: testset1 = 2148, testset2 = 984, testset3 = 6324 22 | def cal_Labels(files): 23 | range_low = files*0.7 24 | range_high = files*1.0 25 | label = [] 26 | for i in range(0,files): 27 | if(i < range_low): 28 | label.append(0) 29 | elif(i >= range_low and i <= range_high): 30 | label.append(1) 31 | else: 32 | label.append(2) 33 | return label 34 | 35 | # cal_amplitude takes the FFT data and n (the number of peaks to keep) and returns the n frequencies with the highest amplitudes, together with those amplitudes 36 | def cal_amplitude(fftData,n): 37 | ifa = [] 38 | ia = [] 39 | amp = abs(fftData[0:int(len(fftData)/2)]) 40 | freq = np.linspace(0,10000,num = int(len(fftData)/2)) 41 | ida = np.array(amp).argsort()[-n:][::-1] 42 | ia.append([amp[i] for i in ida]) 43 | ifa.append([freq[i] for i in ida]) 44 | return(ifa,ia) 45 | 46 | # this function calculates the top 5 frequencies with the highest amplitude for every file and channel and returns one list per rank 47 | def cal_max_freq(files,path): 48 | freq_max1, freq_max2, freq_max3, freq_max4, freq_max5 = ([] for _ in range(5)) 49 | for f in files: 50 | temp = pd.read_csv(path+f, sep = "\t",header = None) 51 | temp_freq_max1,temp_freq_max2,temp_freq_max3,temp_freq_max4,temp_freq_max5 = ([] for _ in range(5)) 52 | if(path == "1st_test/"): 53 | rhigh = 8 54 | else: 55 | rhigh = 4 56 | for i in range(0,rhigh): 57 | t = fft(temp[i]) 58 | ff,aa = cal_amplitude(t,5) 59 | temp_freq_max1.append(np.array(ff)[:,0]) 60 | temp_freq_max2.append(np.array(ff)[:,1]) 61 | temp_freq_max3.append(np.array(ff)[:,2]) 62 | temp_freq_max4.append(np.array(ff)[:,3]) 63 | temp_freq_max5.append(np.array(ff)[:,4]) 64 | freq_max1.append(temp_freq_max1) 65 | freq_max2.append(temp_freq_max2) 66 | freq_max3.append(temp_freq_max3) 67 | freq_max4.append(temp_freq_max4) 68 | freq_max5.append(temp_freq_max5) 69 | return(freq_max1,freq_max2,freq_max3,freq_max4,freq_max5) 70 | 71 | 72 | # take the labels for each bearing and plot the corresponding graph for each bearing. 
73 | 74 | def plotlabels(labels): 75 | length = len(labels) 76 | leng = len(labels[0]) 77 | if(length == 4): 78 | ax1 = plt.subplot2grid((8,1), (0,0), rowspan = 2, colspan = 1) 79 | ax2 = plt.subplot2grid((8,1), (2,0), rowspan = 2, colspan = 1) 80 | ax3 = plt.subplot2grid((8,1), (4,0), rowspan = 2, colspan = 1) 81 | ax4 = plt.subplot2grid((8,1), (6,0), rowspan = 2, colspan = 1) 82 | y1 = ax1.scatter(np.array(range(1,leng+1)),np.array(labels)[0],label = "bearing1") 83 | y2 = ax2.scatter(np.array(range(1,leng+1)),np.array(labels)[1],label = "bearing2") 84 | y3 = ax3.scatter(np.array(range(1,leng+1)),np.array(labels)[2],label = "bearing3") 85 | y4 = ax4.scatter(np.array(range(1,leng+1)),np.array(labels)[3],label = "bearing4") 86 | plt.legend(handles = [y1,y2,y3,y4]) 87 | elif(length == 8): 88 | ax1 = plt.subplot2grid((16,1), (0,0), rowspan = 2, colspan = 1) 89 | ax2 = plt.subplot2grid((16,1), (2,0), rowspan = 2, colspan = 1) 90 | ax3 = plt.subplot2grid((16,1), (4,0), rowspan = 2, colspan = 1) 91 | ax4 = plt.subplot2grid((16,1), (6,0), rowspan = 2, colspan = 1) 92 | ax5 = plt.subplot2grid((16,1), (8,0), rowspan = 2, colspan = 1) 93 | ax6 = plt.subplot2grid((16,1), (10,0), rowspan = 2, colspan = 1) 94 | ax7 = plt.subplot2grid((16,1), (12,0), rowspan = 2, colspan = 1) 95 | ax8 = plt.subplot2grid((16,1), (14,0), rowspan = 2, colspan = 1) 96 | y1 = ax1.scatter(np.array(range(1,leng+1)),np.array(labels)[0],label = "bearing1_x") 97 | y2 = ax2.scatter(np.array(range(1,leng+1)),np.array(labels)[1],label = "bearing1_y") 98 | y3 = ax3.scatter(np.array(range(1,leng+1)),np.array(labels)[2],label = "bearing2_x") 99 | y4 = ax4.scatter(np.array(range(1,leng+1)),np.array(labels)[3],label = "bearing2_y") 100 | y5 = ax5.scatter(np.array(range(1,leng+1)),np.array(labels)[4],label = "bearing3_x") 101 | y6 = ax6.scatter(np.array(range(1,leng+1)),np.array(labels)[5],label = "bearing3_y") 102 | y7 = ax7.scatter(np.array(range(1,leng+1)),np.array(labels)[6],label = "bearing4_x") 103 | y8 = 
ax8.scatter(np.array(range(1,leng+1)),np.array(labels)[7],label = "bearing4_y") 104 | plt.show() 105 | plt.legend(handles = [y1,y2,y3,y4,y5,y6,y7,y8]) 106 | 107 | def create_dataframe(freq_max1,freq_max2,freq_max3,freq_max4,freq_max5,bearing): 108 | result = pd.DataFrame() 109 | result['fmax1'] = list((np.array(freq_max1))[:,bearing]) 110 | result['fmax2'] = list((np.array(freq_max2))[:,bearing]) 111 | result['fmax3'] = list((np.array(freq_max3))[:,bearing]) 112 | result['fmax4'] = list((np.array(freq_max4))[:,bearing]) 113 | result['fmax5'] = list((np.array(freq_max5))[:,bearing]) 114 | x = result[["fmax1","fmax2","fmax3","fmax4","fmax5"]] 115 | return x 116 | 117 | def elbow_method(X): 118 | distortions = [] 119 | K = range(1,10) 120 | for k in K: 121 | kmeanModel = cluster.KMeans(n_clusters = k).fit(X) 122 | kmeanModel.fit(X) 123 | distortions.append(sum(np.min(cdist(X, kmeanModel.cluster_centers_, 'euclidean'), axis = 1)) / X.shape[0]) 124 | # Plot the elbow 125 | plt.plot(K, distortions, 'bx-') 126 | plt.xlabel('k') 127 | plt.ylabel('Distortion') 128 | plt.title('The Elbow Method showing the optimal k') 129 | plt.show() -------------------------------------------------------------------------------- /Images/FFT/testset1/max1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset1/max1.jpg -------------------------------------------------------------------------------- /Images/FFT/testset1/max2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset1/max2.jpg -------------------------------------------------------------------------------- /Images/FFT/testset1/max3.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset1/max3.png -------------------------------------------------------------------------------- /Images/FFT/testset1/max4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset1/max4.png -------------------------------------------------------------------------------- /Images/FFT/testset1/max5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset1/max5.png -------------------------------------------------------------------------------- /Images/FFT/testset2/max1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset2/max1.png -------------------------------------------------------------------------------- /Images/FFT/testset2/max2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset2/max2.jpg -------------------------------------------------------------------------------- /Images/FFT/testset2/max3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset2/max3.png 
-------------------------------------------------------------------------------- /Images/FFT/testset2/max4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset2/max4.png -------------------------------------------------------------------------------- /Images/FFT/testset2/max5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/FFT/testset2/max5.png -------------------------------------------------------------------------------- /Images/LogisticRegression/testset1_figure.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/LogisticRegression/testset1_figure.PNG -------------------------------------------------------------------------------- /Images/LogisticRegression/testset1_result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/LogisticRegression/testset1_result.png -------------------------------------------------------------------------------- /Images/LogisticRegression/testset2_figure.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/LogisticRegression/testset2_figure.jpg -------------------------------------------------------------------------------- /Images/LogisticRegression/testset2_result.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/intel-iot-devkit/predictive-maintenance-python/6457423cf60dea45792b2db0e7e8009fcee0f075/Images/LogisticRegression/testset2_result.jpg -------------------------------------------------------------------------------- /Logistic Regression/LogisticRegressionTraining.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Tue May 22 15:20:01 2018 4 | 5 | @author: SHEFALI JAIN 6 | """ 7 | 8 | """ 9 | this script is used the train the Logistic Regression model, having 2 label.Labels are trained on the dataset, which consist of 10 | Testset1: 11 | bearing 7(bearing 4, y axis), Fail case 12 | bearing 2(bearing 2, x axis),pass case 13 | Testset2: 14 | bearing 0(bearing 1 ) , Fail case 15 | bearing 1(bearing 2) , Pass case 16 | """ 17 | 18 | # importing libraries 19 | 20 | import pandas as pd 21 | import numpy as np 22 | 23 | from utils import cal_Labels,cal_max_freq,create_dataframe 24 | from sklearn.model_selection import train_test_split 25 | from sklearn.linear_model import LogisticRegression 26 | from sklearn import metrics 27 | 28 | import os 29 | 30 | # reading all the files from the testset1 and testset2 31 | #chnaage the full path, to your dataset path 32 | 33 | try: 34 | # reading all the files from the testset1, and testset2 35 | filedir_testset1 = input("enter the complete directory path for the testset1 ") 36 | filedir_testset2 = input("enter the complete directory path for the testset2 ") 37 | all_files_testset1 = os.listdir(filedir_testset1) 38 | all_files_testset2 = os.listdir(filedir_testset2) 39 | 40 | # relative path of the dataset, after the current working directory 41 | 42 | path_testset2 = "2nd_test/" 43 | path_testset1 = "1st_test/" 44 | 45 | testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5 = 
cal_max_freq(all_files_testset1,path_testset1) 46 | testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5 = cal_max_freq(all_files_testset2,path_testset2) 47 | 48 | except IOError: 49 | print("you have entered either the wrong data directory path for either testset1 or testset2") 50 | 51 | # calculating the labels for the bearing which is failed 52 | testset1_labelF = cal_Labels(len(all_files_testset1)) 53 | testset2_labelF = cal_Labels(len(all_files_testset2)) 54 | 55 | # creatine a datafRame for which bearing has failed 56 | result1 = create_dataframe(testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5,7) 57 | result1['labels'] = testset1_labelF 58 | 59 | result2 = create_dataframe(testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5,0) 60 | result2['labels'] = testset2_labelF 61 | 62 | # calculating the labels for the testset1,testet2, which is not failed 63 | testset1_labelP = np.array([0]*1800) 64 | testset2_labelP = np.array([0]*800) 65 | 66 | # creating a dataframe for which bearing is passed 67 | result3 = create_dataframe(testset1_freq_max1,testset1_freq_max2,testset1_freq_max3,testset1_freq_max4,testset1_freq_max5,2) 68 | result3 = result3[:1800] 69 | result3['labels'] = testset1_labelP 70 | 71 | result4 = create_dataframe(testset2_freq_max1,testset2_freq_max2,testset2_freq_max3,testset2_freq_max4,testset2_freq_max5,1) 72 | result4 = result4[:800] 73 | result4['labels'] = testset2_labelP 74 | 75 | # creating the final Result 76 | frames = [result1,result2,result3,result4] 77 | result = pd.concat(frames) 78 | 79 | # creating the X, Y variable 80 | x = result[["fmax1","fmax2","fmax3","fmax4","fmax5"]] 81 | y = result['labels'] 82 | 83 | # spliting the x, y into train and test set 84 | x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 42,stratify = y) 85 | 86 | # training the model 87 | 
logisticRegr = LogisticRegression(class_weight = 'balanced',random_state = 42,max_iter = 300) 88 | logisticRegr.fit(x_train, y_train) 89 | predictions = logisticRegr.predict(x_test) 90 | 91 | # Use score method to get accuracy of model 92 | score = logisticRegr.score(x_test, y_test) 93 | print("training accuracy",score) 94 | cm = metrics.confusion_matrix(y_test, predictions) 95 | print("confusion matrix for training",cm) 96 | 97 | # save the model 98 | filename = "logisticRegressionModel.npy" 99 | np.save(filename,logisticRegr) 100 | 101 | 102 | 103 | 104 | 105 | -------------------------------------------------------------------------------- /Logistic Regression/logisticregressiontesting.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Mon May 14 14:09:55 2018 4 | 5 | @author:SHEFALI JAIN 6 | """ 7 | """ 8 | script for testing the bearing set whether they have any issue or not 9 | """ 10 | 11 | # import the libraries 12 | import numpy as np 13 | 14 | from utils import cal_max_freq,plotlabels,create_dataframe 15 | 16 | import os 17 | 18 | def main(): 19 | try: 20 | # user input for the path of the dataset 21 | filedir = input("enter the complete directory path ") 22 | filepath = input("enter the folder name") 23 | 24 | # load the files 25 | all_files = os.listdir(filedir) 26 | freq_max1,freq_max2,freq_max3,freq_max4,freq_max5 = cal_max_freq(all_files,filepath) 27 | except IOError: 28 | print("you have entered either the wrong data directory path or filepath") 29 | 30 | # load the model 31 | filename = "logisticRegressionModel.npy" 32 | logisticRegr = np.load(filename).item() 33 | 34 | #checking the iteration 35 | if (filepath=="1st_test/"): 36 | rhigh = 8 37 | else: 38 | rhigh = 4 39 | 40 | print("for the testset",filepath) 41 | prediction_last_100 = [] 42 | for i in range(0,rhigh): 43 | print("checking for the bearing",i+1) 44 | # creating the dataframe 45 | x = 
create_dataframe(freq_max1,freq_max2,freq_max3,freq_max4,freq_max5,i) 46 | predictions = logisticRegr.predict(x) 47 | prediction_last_100.append(predictions[-100:]) 48 | # count no of zeros 49 | zero = list(predictions).count(0) 50 | ones = list(predictions).count(1) 51 | print("the no of paased files",zero) 52 | print("the no of failed files", ones) 53 | check_one = list(predictions[-100:]).count(1) 54 | check_zero = list(predictions[-100:]).count(0) 55 | 56 | if(check_one > check_zero): 57 | print("bearing is suspected, there are chances to fail") 58 | else: 59 | print("bearing has no issue") 60 | 61 | # plotting the last 100 prediction for each bearing 62 | plotlabels(prediction_last_100) 63 | 64 | 65 | if __name__ == "__main__": 66 | main() -------------------------------------------------------------------------------- /Logistic Regression/readme.md: -------------------------------------------------------------------------------- 1 | # Real-time time-series anomaly detection 2 | 3 | | Programming Language | Python | 4 | | --- | --- | 5 | | Skills (beg, intermediate, advanced) | Intermediate | 6 | | Time to complete project (in increments of 15 min) | 25 min | 7 | | Hardware needed (hardware used) | Up Squared* board | 8 | | Target Operating System | | 9 | 10 | The monitoring of manufacturing equipment is vital to any industrial process. Sometimes it is critical that equipment be monitored in real-time for faults and anomalies to prevent damage and correlate equipment behavior faults to production line issues. Fault detection is the pre-cursor to predictive maintenance. 11 | 12 | There are several methods which don't require training of a neural network to be able to detect failures, starting with the most basic (FFT), to the most complex (Gaussian Mixture Model). These have the advantage of being able to be re-used with minor modifications on different data streams, and don't require a lot of known previously classified data (unlike neural nets). 
In fact, some of these methods can be used to classify data in order to train DNNs. 13 | 14 | ## What you'll learn 15 | 16 | - Basic implementation of FFT and logistic regression. 17 | - How FFT helps with feature engineering on the vibration data of a machine. 18 | 19 | ## Setup 20 | 21 | - Download the Bearing Data Set (also on [https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/](https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/)) 22 | - Extract the zip archives into folders named 1st\_test and 2nd\_test. 23 | - Make sure you have the following libraries: 24 | 25 | * Numpy 26 | * Pandas 27 | * Matplotlib.pyplot 28 | * Sklearn 29 | * Scipy 30 | 31 | - Download the code 32 | - Make sure all the code and data are in one folder 33 | 34 | ## Gather your materials 35 | 36 | - [UP Squared* board](http://www.up-board.org/upsquared/) 37 | 38 | ## Get the code 39 | 40 | Open the example in a console or any Python-supported IDE (such as Spyder). Set the working directory to the folder where your code and dataset are stored. 41 | 42 | ## Run the application 43 | Logistic regression uses three files. 44 | 45 | | SAMPLE FILE NAME | EXPECTED OUTPUT | Note | 46 | | --- | --- | --- | 47 | | utils.py | Contains the helper functions used by the other modules | | 48 | | LogisticRegressionTraining.py | Saves the trained model to the file "logisticRegressionModel.npy" | Training is done on testset1, bearing 4 y axis (failed) and bearing 2 x axis (passed), and on testset2, bearing 1 (failed) and bearing 2 (passed). The training dataset is enlarged this way for better results. | 49 | | logisticregressiontesting.py | Reports which bearings of the chosen test set are suspected to fail and plots the last 100 predicted labels for all the bearings. | Takes the input path from the user and checks which bearings are suspected to fail and which are in normal condition. | 50 | 51 | ## How it works: 52 | 53 | 1. 
**FFT**: A fast Fourier transform (FFT) is an algorithm that takes a signal sampled over a period of time (or space) and decomposes it into its frequency components. These components are single sinusoidal oscillations at distinct frequencies, each with its own amplitude and phase. 54 | 55 | [Y](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-Y) = fft([X](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-X)) computes the discrete Fourier transform (DFT) of X using a fast Fourier transform (FFT) algorithm. If X is a vector, then fft(X) returns the Fourier transform of the vector. [**More Details**](https://en.wikipedia.org/wiki/Fast_Fourier_transform) 56 | 57 | 2. **Logistic Regression**: Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (in which there are only two possible outcomes). 58 | 59 | In logistic regression, the dependent variable is binary or dichotomous, i.e. it only contains data coded as 1 (TRUE, success, pregnant, etc.) or 0 (FALSE, failure, non-pregnant, etc.). The goal of logistic regression is to find the best fitting (yet biologically reasonable) model to describe the relationship between the dichotomous characteristic of interest (dependent variable = response or outcome variable) and a set of independent (predictor or explanatory) variables. [**More Details**](https://en.wikipedia.org/wiki/Logistic_regression) 60 | 61 | ## Code Explanation 62 | 63 | The basic approach is the same for all the samples. The steps are: 64 | 65 | - Take the FFT of each bearing channel in each file. 66 | - Calculate the frequency and amplitude of each component. 67 | - Find the five largest amplitudes and their corresponding frequencies. 68 | - Repeat for each bearing and each data file, and store the results in the result data frame. 
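The last step above, repeating the peak search for every file and every bearing channel, can be sketched in vectorized form. This is an illustration under stated assumptions, not the repository's implementation: `peak_frequency_table`, the `(files, samples, channels)` array layout, the 20 kHz sampling rate, and the synthetic tones are all hypothetical, and the result is kept as a NumPy array where `create_dataframe` builds a pandas data frame.

```python
import numpy as np

SAMPLE_RATE_HZ = 20000  # assumption, consistent with the 0-10000 Hz axis in utils.py
N_PEAKS = 5


def peak_frequency_table(snapshots, sample_rate_hz=SAMPLE_RATE_HZ, n=N_PEAKS):
    """snapshots has shape (files, samples, channels).

    Returns an array of shape (files, channels, n): for every file and channel,
    the n frequencies with the largest FFT amplitude, strongest first.
    """
    amplitude = np.abs(np.fft.rfft(snapshots, axis=1))            # (files, bins, channels)
    freqs = np.fft.rfftfreq(snapshots.shape[1], d=1.0 / sample_rate_hz)
    order = np.argsort(amplitude, axis=1)[:, ::-1, :][:, :n, :]   # top-n bin indices
    return np.transpose(freqs[order], (0, 2, 1))                  # (files, channels, n)


# Synthetic stand-in for three data files with two channels each.
rng = np.random.default_rng(0)
n_samples = 2000
t = np.arange(n_samples) / SAMPLE_RATE_HZ
drifting_tones_hz = [1000, 2000, 3000]   # channel 1 changes from file to file

snapshots = np.empty((len(drifting_tones_hz), n_samples, 2))
for i, tone_hz in enumerate(drifting_tones_hz):
    snapshots[i, :, 0] = np.sin(2 * np.pi * 500 * t)       # channel 0 stays at 500 Hz
    snapshots[i, :, 1] = np.sin(2 * np.pi * tone_hz * t)
snapshots += 0.05 * rng.standard_normal(snapshots.shape)   # small measurement noise

table = peak_frequency_table(snapshots)
features_channel_1 = table[:, 1, :]   # one row per file; columns play the role of fmax1..fmax5
print(features_channel_1[:, 0])
```

Each row of `features_channel_1` corresponds to one file, so the rows can be paired with per-file labels when training a classifier.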
69 | 70 | ### For FFT: it gives the frequency vs. time plot of each maximum frequency for each dataset. 71 | 72 | ![Figure 1](.././Images/FFT/testset2/max2.jpg) 73 | *Figure 1. Plot of the max2 frequency for all the bearings in testset2.* 74 | 75 | 76 | ### For Logistic regression 77 | 78 | Calculate the label for the data frame (the dependent variable). Assume that the first 70% of the files represent normal condition and the remaining 30% are suspected to fail. Give the label '0' to the normal files and the label '1' to the files suspected to fail. Train the logistic regression on the result data frame. Store the model using numpy. 79 | 80 | For testing: Take the input path from the user to check which bearings are suspected to fail and which are in normal condition. Count the labels equal to one and the labels equal to zero in the last 100 predictions. If there are more one labels than zero labels, the bearing is suspected to fail; otherwise it is in normal condition. 81 | 82 | ![Figure 2](.././Images/LogisticRegression/testset2_figure.jpg) 83 | *Figure 2. Last 100 predicted labels for all four bearings in testset2.* 84 | 85 | ![Figure 3](.././Images/LogisticRegression/testset2_result.jpg) 86 | *Figure 3. Result showing which bearing failed in testset2.* 87 | 88 | 89 | **NOTE:** 90 | 91 | TESTSET 3 of the NASA bearing dataset is discarded for the observations for the following reasons: 92 | 93 | 1: It actually has 6324 data files, but according to the documentation it contains 4448 data files. This makes the data very noisy. 94 | 95 | 2: None of the bearings shows symptoms of failure before one suddenly fails. This makes the data inconsistent. 96 | 97 | The reasons listed above describe how testset3 exhibits unpredictable behavior. 
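The labeling rule and the decision rule described under "For Logistic regression" can be sketched in a few lines of plain Python. The function names `label_by_position` and `is_suspect` and the example inputs are hypothetical, chosen only for illustration. The 70% cutoff and the window of 100 predictions come from the description above.

```python
def label_by_position(n_files, healthy_fraction=0.7):
    """Label the first healthy_fraction of the files 0 (normal) and the rest 1 (suspect)."""
    cutoff = n_files * healthy_fraction
    return [0 if i < cutoff else 1 for i in range(n_files)]


def is_suspect(predictions, window=100):
    """Majority vote over the most recent `window` predictions."""
    recent = list(predictions)[-window:]
    return recent.count(1) > recent.count(0)


labels = label_by_position(10)
print(labels)  # [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

# Example prediction sequences (made up): mostly failing at the end vs. healthy at the end.
degrading = [0] * 150 + [1] * 60
healthy = [1] * 50 + [0] * 100
print(is_suspect(degrading), is_suspect(healthy))  # True False
```

The position-based labels are only a heuristic: they assume degradation becomes visible in the last 30% of a run, which is why a bearing that never fails is labeled separately with all zeros in the training script.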
98 | -------------------------------------------------------------------------------- /Logistic Regression/utils.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Mon May 14 12:20:53 2018 4 | 5 | @author:SHEFALI JAIN 6 | """ 7 | """ 8 | This script contains the helper functions 9 | """ 10 | 11 | #importing the libraries 12 | import numpy as np 13 | import pandas as pd 14 | import matplotlib.pyplot as plt 15 | 16 | from scipy.fftpack import fft 17 | from scipy.spatial.distance import cdist 18 | from sklearn import cluster 19 | 20 | # cal_Labels takes the number of files as input and generates the labels based on a 70-30 split. 21 | # number of files: testset1 = 2148, testset2 = 984, testset3 = 6324 22 | def cal_Labels(files): 23 | range_low = files*0.7 24 | range_high = files*1.0 25 | label = [] 26 | for i in range(0,files): 27 | if(i < range_low): 28 | label.append(0) 29 | elif(i >= range_low and i <= range_high): 30 | label.append(1) 31 | else: 32 | label.append(2) 33 | return label 34 | 35 | # cal_amplitude takes the FFT data and n (the number of peaks to keep) and returns the n frequencies with the highest amplitudes, together with those amplitudes 36 | def cal_amplitude(fftData,n): 37 | ifa = [] 38 | ia = [] 39 | amp = abs(fftData[0:int(len(fftData)/2)]) 40 | freq = np.linspace(0,10000,num = int(len(fftData)/2)) 41 | ida = np.array(amp).argsort()[-n:][::-1] 42 | ia.append([amp[i] for i in ida]) 43 | ifa.append([freq[i] for i in ida]) 44 | return(ifa,ia) 45 | 46 | # this function calculates the top 5 frequencies with the highest amplitude for every file and channel and returns one list per rank 47 | def cal_max_freq(files,path): 48 | freq_max1, freq_max2, freq_max3, freq_max4, freq_max5 = ([] for _ in range(5)) 49 | for f in files: 50 | temp = pd.read_csv(path+f, sep = "\t",header = None) 51 | temp_freq_max1,temp_freq_max2,temp_freq_max3,temp_freq_max4,temp_freq_max5 = ([] for _ in range(5)) 52 | if(path == "1st_test/"): 53 | rhigh = 8 54 | else: 55 | rhigh = 4 56 | for i in range(0,rhigh): 57 
| t = fft(temp[i]) 58 | ff,aa = cal_amplitude(t,5) 59 | temp_freq_max1.append(np.array(ff)[:,0]) 60 | temp_freq_max2.append(np.array(ff)[:,1]) 61 | temp_freq_max3.append(np.array(ff)[:,2]) 62 | temp_freq_max4.append(np.array(ff)[:,3]) 63 | temp_freq_max5.append(np.array(ff)[:,4]) 64 | freq_max1.append(temp_freq_max1) 65 | freq_max2.append(temp_freq_max2) 66 | freq_max3.append(temp_freq_max3) 67 | freq_max4.append(temp_freq_max4) 68 | freq_max5.append(temp_freq_max5) 69 | return(freq_max1,freq_max2,freq_max3,freq_max4,freq_max5) 70 | 71 | 72 | # takes the labels for each bearing and plots the corresponding graph for each bearing. 73 | 74 | def plotlabels(labels): 75 | length = len(labels) 76 | leng = len(labels[0]) 77 | if(length == 4): 78 | ax1 = plt.subplot2grid((8,1), (0,0), rowspan = 2, colspan = 1) 79 | ax2 = plt.subplot2grid((8,1), (2,0), rowspan = 2, colspan = 1) 80 | ax3 = plt.subplot2grid((8,1), (4,0), rowspan = 2, colspan = 1) 81 | ax4 = plt.subplot2grid((8,1), (6,0), rowspan = 2, colspan = 1) 82 | y1 = ax1.scatter(np.array(range(1,leng+1)),np.array(labels)[0],label = "bearing1") 83 | y2 = ax2.scatter(np.array(range(1,leng+1)),np.array(labels)[1],label = "bearing2") 84 | y3 = ax3.scatter(np.array(range(1,leng+1)),np.array(labels)[2],label = "bearing3") 85 | y4 = ax4.scatter(np.array(range(1,leng+1)),np.array(labels)[3],label = "bearing4") 86 | plt.legend(handles = [y1,y2,y3,y4]) 87 | elif(length == 8): 88 | ax1 = plt.subplot2grid((16,1), (0,0), rowspan = 2, colspan = 1) 89 | ax2 = plt.subplot2grid((16,1), (2,0), rowspan = 2, colspan = 1) 90 | ax3 = plt.subplot2grid((16,1), (4,0), rowspan = 2, colspan = 1) 91 | ax4 = plt.subplot2grid((16,1), (6,0), rowspan = 2, colspan = 1) 92 | ax5 = plt.subplot2grid((16,1), (8,0), rowspan = 2, colspan = 1) 93 | ax6 = plt.subplot2grid((16,1), (10,0), rowspan = 2, colspan = 1) 94 | ax7 = plt.subplot2grid((16,1), (12,0), rowspan = 2, colspan = 1) 95 | ax8 = plt.subplot2grid((16,1), (14,0), rowspan = 2, colspan = 1) 96 | y1 =
ax1.scatter(np.array(range(1,leng+1)),np.array(labels)[0],label = "bearing1_x") 97 | y2 = ax2.scatter(np.array(range(1,leng+1)),np.array(labels)[1],label = "bearing1_y") 98 | y3 = ax3.scatter(np.array(range(1,leng+1)),np.array(labels)[2],label = "bearing2_x") 99 | y4 = ax4.scatter(np.array(range(1,leng+1)),np.array(labels)[3],label = "bearing2_y") 100 | y5 = ax5.scatter(np.array(range(1,leng+1)),np.array(labels)[4],label = "bearing3_x") 101 | y6 = ax6.scatter(np.array(range(1,leng+1)),np.array(labels)[5],label = "bearing3_y") 102 | y7 = ax7.scatter(np.array(range(1,leng+1)),np.array(labels)[6],label = "bearing4_x") 103 | y8 = ax8.scatter(np.array(range(1,leng+1)),np.array(labels)[7],label = "bearing4_y") 104 | plt.legend(handles = [y1,y2,y3,y4,y5,y6,y7,y8]) 105 | plt.show() 106 | 107 | def create_dataframe(freq_max1,freq_max2,freq_max3,freq_max4,freq_max5,bearing): 108 | result = pd.DataFrame() 109 | result['fmax1'] = list((np.array(freq_max1))[:,bearing]) 110 | result['fmax2'] = list((np.array(freq_max2))[:,bearing]) 111 | result['fmax3'] = list((np.array(freq_max3))[:,bearing]) 112 | result['fmax4'] = list((np.array(freq_max4))[:,bearing]) 113 | result['fmax5'] = list((np.array(freq_max5))[:,bearing]) 114 | x = result[["fmax1","fmax2","fmax3","fmax4","fmax5"]] 115 | return x 116 | 117 | def elbow_method(X): 118 | distortions = [] 119 | K = range(1,10) 120 | for k in K: 121 | kmeanModel = cluster.KMeans(n_clusters = k) 122 | kmeanModel.fit(X) 123 | distortions.append(sum(np.min(cdist(X, kmeanModel.cluster_centers_, 'euclidean'), axis = 1)) / X.shape[0]) 124 | # Plot the elbow 125 | plt.plot(K, distortions, 'bx-') 126 | plt.xlabel('k') 127 | plt.ylabel('Distortion') 128 | plt.title('The Elbow Method showing the optimal k') 129 | plt.show() -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DISCONTINUATION OF PROJECT # 2 | This
project will no longer be maintained by Intel. 3 | Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project. 4 | Intel no longer accepts patches to this project. 5 | # Real-time time-series anomaly detection 6 | 7 | | Programming Language | Python | 8 | | --- | --- | 9 | | Skills (beg, intermediate, advanced) | Intermediate | 10 | | Time to complete project (in increments of 15 min) | 25 min | 11 | | Hardware needed (hardware used) | Up Squared* board | 12 | | Target Operating System | | 13 | 14 | The monitoring of manufacturing equipment is vital to any industrial process. Sometimes it is critical that equipment be monitored in real-time for faults and anomalies to prevent damage and correlate equipment behavior faults to production line issues. Fault detection is the pre-cursor to predictive maintenance. 15 | 16 | There are several methods which don't require training of a neural network to be able to detect failures, starting with the most basic (FFT), to the most complex (Gaussian Mixture Model). These have the advantage of being able to be re-used with minor modifications on different data streams, and don't require a lot of known previously classified data (unlike neural nets). In fact, some of these methods can be used to classify data in order to train DNNs. 17 | 18 | ## What you'll learn 19 | 20 | - Basic implementation of FFT, Logistic Regression, K-Means clustering, GMM. 21 | - How FFT is helpful in Feature engineering of vibrational data of a machine. 22 | 23 | ## Setup 24 | 25 | - Download the Bearing Data Set (also on [https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/](https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/)) 26 | - Extract the zip format of data into respective folder named 1st\_test/2nd\_test. 
27 | - Make sure you have the following libraries: 28 | 29 | * Numpy 30 | * Pandas 31 | * Matplotlib.pyplot 32 | * Sklearn 33 | * Scipy 34 | 35 | - Download the code 36 | - Make sure all the code and data are in one directory. 37 | 38 | ## Gather your materials 39 | 40 | - [UP Squared board](http://www.up-board.org/upsquared/) 41 | 42 | ## Get the code 43 | 44 | Open the example in a console or any Python-capable IDE (for example, Spyder). Set the working directory to the location where your code and dataset are stored. 45 | 46 | ## Run the application 47 | 48 | There are 4 samples: FFT, Logistic Regression, Kmeans, and GMM. Each of these has one or more files to execute. 49 | 50 | | SAMPLE FILE NAME | EXPECTED OUTPUT | Note | 51 | | --- | --- | --- | 52 | | FreqTime.py | Frequency v/s Time plot | | 53 | | Utils.py | Contains the helper functions used by all the modules | | 54 | | Train\_logistic\_regression.py | Saved weights of the trained model in the file named "LogisticRegression" | Training for logistic regression is done on testset1, bearing 4\_y axis (failed) and bearing 2\_x axis (passed), and on testset2, bearing1 (failed) and bearing2 (passed). \*The training dataset was enlarged for better results. | 55 | | Test\_logistic\_regression.py | Result showing which bearing has failed for the given test set, and the plot of the last 100 predicted labels for all the bearings. | Takes the input path from the user to check which bearings are suspected to fail and which are in normal condition | 56 | | Train\_kmeans.py | Saved weights of the trained model in the file named "Kmeans" | Training for Kmeans is done on testset1, bearing 4\_y axis (failed) and bearing 2\_x axis (passed), and on testset2, bearing1 (failed) and bearing2 (passed). \*The training dataset was enlarged for better results. | 57 | | Test\_kmeans.py | Result showing which bearing has failed for the given test set, and the plot of the last 100 predicted labels for all the bearings.
| Takes the input path from the user to check which bearings are suspected to fail and which are in normal condition. The labels **0-7 span the range from early to most critical for failure**. **\*\*It is able to detect the bearing 1\_y axis, which is very close to failure, hence showing** | 58 | | Train\_GMM.py | Saved weights of the trained model in the file named "GMM" | Training for the GMM is done on testset1, bearing 4\_y axis (failed) and bearing 2\_x axis (passed), and on testset2, bearing1 (failed) and bearing2 (passed). \*The training dataset was enlarged for better results. | 59 | | Test\_GMM.py | Result showing which bearing has failed for the given test set, and the plot of the last 100 predicted labels for all the bearings. | Takes the input path from the user to check which bearings are suspected to fail and which are in normal condition. The labels **0-2 span the range from early to most critical for failure** | 60 | 61 | ## How it works 62 | 63 | 1. **FFT**: A fast Fourier transform (FFT) is an algorithm that takes a signal sampled over a period of time (or space) and decomposes it into its frequency components. These components are single sinusoidal oscillations at distinct frequencies, each with its own amplitude and phase. 64 | [Y](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-Y) = fft( [X](https://in.mathworks.com/help/matlab/ref/fft.html#f83-998360-X)) computes the discrete Fourier transform (DFT) of X using a fast Fourier transform (FFT) algorithm. If X is a vector, then fft(X) returns the Fourier transform of the vector. [**More details**](https://en.wikipedia.org/wiki/Fast_Fourier_transform) 65 | 66 | 2. **Logistic Regression**: Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (in which there are only two possible outcomes). 67 | 68 | In logistic regression, the dependent variable is binary or dichotomous, i.e.
it only contains data coded as 1 (TRUE, success, pregnant, etc.) or 0 (FALSE, failure, non-pregnant, etc.). The goal of logistic regression is to find the best fitting (yet biologically reasonable) model to describe the relationship between the dichotomous characteristic of interest (dependent variable = response or outcome variable) and a set of independent (predictor or explanatory) variables. [**More details**](https://en.wikipedia.org/wiki/Logistic_regression) 69 | 70 | 3. **Kmeans Clustering**: K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity. 71 | 72 | The results of the K-means clustering algorithm are: 73 | 74 | - The centroids of the K clusters, which can be used to label new data 75 | - Labels for the training data (each data point is assigned to a single cluster) 76 | 77 | **Each centroid of a cluster is a collection of feature values which define the resulting groups. Examining the centroid feature weights can be used to qualitatively interpret what kind of group each cluster represents.** [**More details**](https://en.wikipedia.org/wiki/K-means_clustering) 78 | 79 | 4. **GMM**: A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. One can think of mixture models as generalizing k-means clustering to incorporate information about the covariance structure of the data as well as the centers of the latent Gaussians. 80 | 81 | The GaussianMixture object implements the expectation-maximization (EM) algorithm for fitting mixture-of-Gaussian models.
It can also draw confidence ellipsoids for multivariate models, and compute the Bayesian Information Criterion to assess the number of clusters in the data. A GaussianMixture.fit method is provided that learns a Gaussian Mixture Model from training data. Given test data, it can assign to each sample the Gaussian it most probably belongs to using the GaussianMixture.predict method. 82 | 83 | The GaussianMixture comes with different options to constrain the covariance of the different classes estimated: spherical, diagonal, tied or full covariance. [more details](https://en.wikipedia.org/wiki/Mixture_model) 84 | 85 | ## Code Explanation: 86 | 87 | The basic approach is the same for all the samples. The basic steps are: 88 | 89 | - Take the FFT of each bearing channel in each file. 90 | - Calculate its frequencies and amplitudes. 91 | - Find the top 5 amplitudes and their corresponding frequencies. 92 | - Repeat the same for each bearing and each data file. Store the results in the result data frame. 93 | 94 | ### For FFT: Gives the Frequency vs Time Plot of each maximum frequency for each dataset. 95 | 96 | ![Figure 1](./Images/FFT/testset2/max2.jpg) 97 | *Figure 1. Plot of the max2 frequency for all the bearings in testset2.* 98 | 99 | ### For Logistic regression: 100 | 101 | Calculate the label for the data frame (the dependent variable). Assume that the first 70% of the files represent normal condition and the remaining 30% are suspected to fail. Give label '0' to the normal samples and label '1' to those suspected to fail. Train the logistic regression on the result data frame. Store the model using numpy. 102 | 103 | **For testing**: Take the input path from the user to check which bearings are suspected to fail and which are in normal condition. Count the one labels and the zero labels in the last 100 predictions. If there are more one labels than zero labels, the bearing is suspected to fail; otherwise it is in normal condition.
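The testing rule for logistic regression (compare the counts of label 1 and label 0 over the last 100 predictions) can be sketched as follows. This is only an illustrative sketch, not the repository's test script: the function name `is_suspected_to_fail`, its `window` parameter, and the example label sequence are all hypothetical.

```python
import numpy as np

def is_suspected_to_fail(predicted_labels, window=100):
    """Hypothetical helper: True if label 1 outnumbers label 0
    in the last `window` predictions for one bearing."""
    recent = np.asarray(predicted_labels)[-window:]
    ones = int(np.sum(recent == 1))
    zeros = int(np.sum(recent == 0))
    # A tie is treated as normal condition, since the rule requires "more" ones.
    return ones > zeros

# Made-up predictions for one bearing: mostly normal early, mostly failing late.
example = [0] * 150 + [1] * 60 + [0] * 10
print(is_suspected_to_fail(example))
```

Only the tail of the prediction sequence is inspected, so a bearing that looked healthy for most of the run can still be flagged once its most recent predictions are dominated by label 1.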
104 | 105 | ![Figure 2](./Images/LogisticRegression/testset2_figure.jpg) 106 | *Figure 2. Last 100 predicted labels for testset2, for all four bearings* 107 | 108 | ![Figure 3](./Images/LogisticRegression/testset2_result.jpg) 109 | *Figure 3. Result showing which bearing has failed for testset2* 110 | 111 | 112 | ### For Kmeans Clustering: 113 | 114 | Fit Kmeans clustering on the result data frame, grouping into 8 clusters (the number is obtained from the elbow method). Save the trained model as Kmeans. 115 | 116 | For testing, take the input path from the user to check which bearings are suspected to fail and which are in normal condition. Count the labels 5, 6, and 7 in the last 100 predictions; if the count is greater than 25, the bearing is suspected to fail, otherwise it is in normal condition. 117 | 118 | ![Figure 4](./Images/Kmeans/testset2_figure.jpg) 119 | *Figure 4. Last 100 predicted labels for testset2, for all four bearings, using Kmeans clustering* 120 | 121 | ![Figure 5](./Images/Kmeans/testset2_result.jpg) 122 | *Figure 5. Result for testset2 showing which bearing has failed, using Kmeans clustering* 123 | 124 | ### For GMM: 125 | 126 | Fit a GMM on the result data frame, grouping into 3 components (the number of components is the number of clusters). 127 | Save the trained model as GMM using numpy. 128 | 129 | For testing, take the input path from the user to check which bearings are suspected to fail and which are in normal condition. Count the occurrences of label 2 in the last 100 predictions; if the count is greater than 50, the bearing is suspected to fail, otherwise it is in normal condition. 130 | 131 | ![Figure 6](./Images/GMM/testset2_figure.jpg) 132 | *Figure 6. Last 100 predicted labels for testset2, for all four bearings, using GMM clustering* 133 | 134 | 135 | ![Figure 7](./Images/GMM/testset2_result.jpg) 136 | *Figure 7.
Result for testset2 showing which bearing has failed, using GMM clustering* 137 | 138 | **NOTE:** 139 | TESTSET 3 of the NASA bearing dataset is excluded from the observations for the following reasons: 140 | 141 | 1: It actually contains 6324 files, but according to the documentation it contains 4448 data files. This makes the data very noisy. 142 | 143 | 2: None of the bearings show symptoms of failure, yet one fails suddenly. This makes the data inconsistent. 144 | 145 | The reasons listed above describe how testset3 exhibits unpredictable behavior. 146 | --------------------------------------------------------------------------------