├── INDRNN_(V)AE
│   ├── dataset
│   │   ├── readme.md
│   │   ├── test.npy
│   │   └── train.npy
│   ├── graph.png
│   ├── ind_rnn_cell.py
│   ├── indrnn_ae_vae.py
│   └── readme.md
├── LSTM_VAE
│   ├── LSTM_VAE.png
│   ├── LSTM_VAE.py
│   ├── dataset
│   │   ├── data0.csv
│   │   ├── lstm_test.npy
│   │   ├── lstm_test_label.npy
│   │   └── lstm_train.npy
│   ├── readme.md
│   └── utils.py
├── MLP_VAE
│   ├── MLP_VAE.py
│   ├── data
│   │   ├── a.py
│   │   ├── test.npy
│   │   ├── test_label.npy
│   │   └── train.npy
│   ├── img
│   │   ├── MLP_VAE.png
│   │   ├── a.txt
│   │   ├── iforest.png
│   │   └── lof.png
│   └── readme.md
└── README.md

--------------------------------------------------------------------------------
/INDRNN_(V)AE/dataset/readme.md:
--------------------------------------------------------------------------------
1 | 
2 | 
--------------------------------------------------------------------------------
/INDRNN_(V)AE/dataset/test.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/INDRNN_(V)AE/dataset/test.npy
--------------------------------------------------------------------------------
/INDRNN_(V)AE/dataset/train.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/INDRNN_(V)AE/dataset/train.npy
--------------------------------------------------------------------------------
/INDRNN_(V)AE/graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/INDRNN_(V)AE/graph.png
--------------------------------------------------------------------------------
/INDRNN_(V)AE/ind_rnn_cell.py:
--------------------------------------------------------------------------------
1 | """Module implementing the IndRNN cell"""
2 | 
3 | from tensorflow.python.ops import math_ops
4 | from tensorflow.python.ops import init_ops
5 | from tensorflow.python.ops import nn_ops
6 | from tensorflow.python.ops import clip_ops
7 | from tensorflow.python.layers import base as base_layer
8 | 
9 | try:
10 |   # TF 1.7+
11 |   from tensorflow.python.ops.rnn_cell_impl import LayerRNNCell
12 | except ImportError:
13 |   from tensorflow.python.ops.rnn_cell_impl import _LayerRNNCell as LayerRNNCell
14 | 
15 | 
16 | class IndRNNCell(LayerRNNCell):
17 |   """Independently RNN Cell. Adapted from `rnn_cell_impl.BasicRNNCell`.
18 | 
19 |   Each unit has a single recurrent weight connected to its last hidden state.
20 | 
21 |   The implementation is based on:
22 | 
23 |     https://arxiv.org/abs/1803.04831
24 | 
25 |   Shuai Li, Wanqing Li, Chris Cook, Ce Zhu, Yanbo Gao
26 |   "Independently Recurrent Neural Network (IndRNN): Building A Longer and
27 |   Deeper RNN"
28 | 
29 |   The default initialization values for recurrent weights, input weights and
30 |   biases are taken from:
31 | 
32 |     https://arxiv.org/abs/1504.00941
33 | 
34 |   Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton
35 |   "A Simple Way to Initialize Recurrent Networks of Rectified Linear Units"
36 | 
37 |   Args:
38 |     num_units: int, The number of units in the RNN cell.
39 |     recurrent_min_abs: float, minimum absolute value of each recurrent weight.
40 |     recurrent_max_abs: (optional) float, maximum absolute value of each
41 |       recurrent weight. For `relu` activation, `pow(2, 1/timesteps)` is
42 |       recommended. If None, recurrent weights will not be clipped.
43 |       Default: None.
44 |     recurrent_kernel_initializer: (optional) The initializer to use for the
45 |       recurrent weights. If None, every recurrent weight is initially set to 1.
46 |       Default: None.
47 |     input_kernel_initializer: (optional) The initializer to use for the input
48 |       weights. If None, the input weights are initialized from a random normal
49 |       distribution with `mean=0` and `stddev=0.001`. Default: None.
50 |     activation: Nonlinearity to use. Default: `relu`.
51 |     reuse: (optional) Python boolean describing whether to reuse variables
52 |       in an existing scope. If not `True`, and the existing scope already has
53 |       the given variables, an error is raised.
54 |     name: String, the name of the layer. Layers with the same name will
55 |       share weights, but to avoid mistakes we require reuse=True in such
56 |       cases.
57 |   """
58 | 
59 |   def __init__(self,
60 |                num_units,
61 |                recurrent_min_abs=0,
62 |                recurrent_max_abs=None,
63 |                recurrent_kernel_initializer=None,
64 |                input_kernel_initializer=None,
65 |                activation=None,
66 |                reuse=None,
67 |                name=None):
68 |     super(IndRNNCell, self).__init__(_reuse=reuse, name=name)
69 | 
70 |     # Inputs must be 2-dimensional.
71 |     self.input_spec = base_layer.InputSpec(ndim=2)
72 | 
73 |     self._num_units = num_units
74 |     self._recurrent_min_abs = recurrent_min_abs
75 |     self._recurrent_max_abs = recurrent_max_abs
76 |     self._recurrent_initializer = recurrent_kernel_initializer
77 |     self._input_initializer = input_kernel_initializer
78 |     self._activation = activation or nn_ops.relu
79 | 
80 |   @property
81 |   def state_size(self):
82 |     return self._num_units
83 | 
84 |   @property
85 |   def output_size(self):
86 |     return self._num_units
87 | 
88 |   def build(self, inputs_shape):
89 |     if inputs_shape[1].value is None:
90 |       raise ValueError("Expected inputs.shape[-1] to be known, saw shape: %s"
91 |                        % inputs_shape)
92 | 
93 |     input_depth = inputs_shape[1].value
94 |     if self._input_initializer is None:
95 |       self._input_initializer = init_ops.random_normal_initializer(mean=0.0,
96 |                                                                    stddev=0.001)
97 |     self._input_kernel = self.add_variable(
98 |         "input_kernel",
99 |         shape=[input_depth, self._num_units],
100 |         initializer=self._input_initializer)
101 | 
102 |     if self._recurrent_initializer is None:
103 |       self._recurrent_initializer = init_ops.constant_initializer(1.)
104 |     self._recurrent_kernel = self.add_variable(
105 |         "recurrent_kernel",
106 |         shape=[self._num_units],
107 |         initializer=self._recurrent_initializer)
108 | 
109 |     # Clip the absolute values of the recurrent weights to the specified minimum
110 |     if self._recurrent_min_abs:
111 |       abs_kernel = math_ops.abs(self._recurrent_kernel)
112 |       min_abs_kernel = math_ops.maximum(abs_kernel, self._recurrent_min_abs)
113 |       self._recurrent_kernel = math_ops.multiply(
114 |           math_ops.sign(self._recurrent_kernel),
115 |           min_abs_kernel
116 |       )
117 | 
118 |     # Clip the absolute values of the recurrent weights to the specified maximum
119 |     if self._recurrent_max_abs:
120 |       self._recurrent_kernel = clip_ops.clip_by_value(self._recurrent_kernel,
121 |                                                       -self._recurrent_max_abs,
122 |                                                       self._recurrent_max_abs)
123 | 
124 |     self._bias = self.add_variable(
125 |         "bias",
126 |         shape=[self._num_units],
127 |         initializer=init_ops.zeros_initializer(dtype=self.dtype))
128 | 
129 |     self.built = True
130 | 
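# --- Added note (not part of the original file): the two clips in build()
# keep each recurrent weight u_k sign-preserving within
# [recurrent_min_abs, recurrent_max_abs]; in NumPy terms, roughly:
#
#     u = np.sign(u) * np.maximum(np.abs(u), min_abs)  # raise |u| to at least min_abs
#     u = np.clip(u, -max_abs, max_abs)                # cap |u| at max_abs
#
# With `relu`, choosing max_abs = pow(2, 1/time_steps) bounds the T-step
# recurrent gain |u|**T by 2, which is why that value is recommended in the
# docstring above.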
131 |   def call(self, inputs, state):
132 |     """Run one time step of the IndRNN.
133 | 
134 |     Calculates the output and new hidden state using the IndRNN equation
135 | 
136 |       `output = new_state = act(W * input + u (*) state + b)`
137 | 
138 |     where `*` is the matrix multiplication and `(*)` is the Hadamard product.
139 | 
140 |     Args:
141 |       inputs: Tensor, 2-D tensor of shape `[batch, num_units]`.
142 |       state: Tensor, 2-D tensor of shape `[batch, num_units]` containing the
143 |         previous hidden state.
144 | 
145 |     Returns:
146 |       A tuple containing the output and new hidden state. Both are the same
147 |       2-D tensor of shape `[batch, num_units]`.
148 |     """
149 |     gate_inputs = math_ops.matmul(inputs, self._input_kernel)
150 |     recurrent_update = math_ops.multiply(state, self._recurrent_kernel)
151 |     gate_inputs = math_ops.add(gate_inputs, recurrent_update)
152 |     gate_inputs = nn_ops.bias_add(gate_inputs, self._bias)
153 |     output = self._activation(gate_inputs)
154 |     return output, output
155 | 
--------------------------------------------------------------------------------
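A minimal usage sketch of the cell above (not from the repository; shapes and hyper-parameters are illustrative assumptions): it wires a single `IndRNNCell` into `tf.nn.dynamic_rnn` with the `pow(2, 1/timesteps)` cap recommended for `relu` in the docstring.

```python
import numpy as np
import tensorflow as tf
from ind_rnn_cell import IndRNNCell

TIME_STEPS, INPUT_DIM, NUM_UNITS = 16, 2, 8
cell = IndRNNCell(NUM_UNITS, recurrent_max_abs=pow(2, 1 / TIME_STEPS))

x = tf.placeholder(tf.float32, [None, TIME_STEPS, INPUT_DIM])
outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)  # [batch, TIME_STEPS, NUM_UNITS]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(outputs, {x: np.random.rand(4, TIME_STEPS, INPUT_DIM)}).shape)  # (4, 16, 8)
```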
/INDRNN_(V)AE/indrnn_ae_vae.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | A simple implementation of an INDRNN_(V)AE-based algorithm
4 | for Anomaly (Novelty) Detection in Multivariate Time Series;
5 | We also present a health-judge mechanism for assessing the state of
6 | the input Multivariate Time Series, which might be useful in machine maintenance;
7 | 
8 | 
9 | A special note on LSTM_VAE versus INDRNN_(V)AE: INDRNN_(V)AE
10 | can be adopted in high-frequency scenarios (industrial sensors, for example).
11 | 
12 | Author: Schindler Liang
13 | 
14 | Reference:
15 |     https://github.com/twairball/keras_lstm_vae
16 |     https://github.com/batzner/indrnn
17 | """
18 | 
19 | import numpy as np
20 | import tensorflow as tf
21 | from tensorflow.nn.rnn_cell import MultiRNNCell
22 | from ind_rnn_cell import IndRNNCell
23 | 
24 | xavier_init = tf.contrib.layers.xavier_initializer(seed=2019)
25 | zero_init = tf.zeros_initializer()
26 | 
27 | def _INDRNNCells(unit_list,time_steps):
28 |     recurrent_max = pow(2, 1 / time_steps)
29 |     return MultiRNNCell([IndRNNCell(unit,recurrent_max_abs=recurrent_max)
30 |                          for unit in unit_list],state_is_tuple=True)
31 | 
32 | class Data_Hanlder:
33 |     def __init__(self,train_file):
34 |         self.train_data = np.load(train_file)
35 | 
36 |     def fetch_data(self,batch_size):
37 |         indices = np.random.choice(self.train_data.shape[0],batch_size)
38 |         return self.train_data[indices]
39 | 
40 | class INDRNN_VAE(object):
41 |     def __init__(self,train_file,
42 |                  z_dim=10,
43 |                  encoder_layers=2,
44 |                  decode_layers=2,
45 |                  outlier_fraction=0.01
46 |                  ):
47 | 
48 |         self.outlier_fraction = outlier_fraction
49 |         self.data_source = Data_Hanlder(train_file)
50 |         self.n_hidden = 16
51 |         self.batch_size = 128
52 |         self.learning_rate = 0.0005
53 |         self.train_iters = 7000
54 |         self.encoder_layers = encoder_layers
55 |         self.decode_layers = decode_layers
56 |         self.time_steps = self.data_source.train_data.shape[1]
57 |         self.input_dim = self.data_source.train_data.shape[2]
58 |         self.z_dim = z_dim
59 |         self.anomaly_score = 0
60 |         self.sess = tf.Session()
61 |         self._build_network()
62 |         self.sess.run(tf.global_variables_initializer())
63 | 
64 |     def _build_network(self):
65 |         with tf.variable_scope('ph'):
66 |             self.X = tf.placeholder(tf.float32,shape=[None,self.time_steps,self.input_dim],name='input_X')
67 | 
68 |         with tf.variable_scope('encoder',initializer=xavier_init):
69 |             with tf.variable_scope('AE'):
70 |                 ae_fw_lstm_cells = _INDRNNCells([self.n_hidden]*self.encoder_layers,self.time_steps)
71 |                 ae_bw_lstm_cells = _INDRNNCells([self.n_hidden]*self.encoder_layers,self.time_steps)
72 |                 (ae_fw_outputs,ae_bw_outputs),_ = tf.nn.bidirectional_dynamic_rnn(
73 |                     ae_fw_lstm_cells,
74 |                     ae_bw_lstm_cells,
75 |                     self.X, dtype=tf.float32)
76 |                 ae_outputs = tf.add(ae_fw_outputs,ae_bw_outputs)
77 | 
78 |             with tf.variable_scope('lat_Z'):
79 |                 z_fw_lstm_cells = _INDRNNCells([self.n_hidden]*self.encoder_layers,
80 |                                                self.time_steps)
81 |                 z_bw_lstm_cells = _INDRNNCells([self.n_hidden]*self.encoder_layers,
82 |                                                self.time_steps)
83 |                 (z_fw_outputs,z_bw_outputs),_ = tf.nn.bidirectional_dynamic_rnn(
84 |                     z_fw_lstm_cells,
85 |                     z_bw_lstm_cells,
86 |                     self.X, dtype=tf.float32)
87 |                 z_outputs = tf.reduce_mean( (z_fw_outputs+z_bw_outputs),axis=1 )
88 | 
89 |                 mu_outputs = tf.layers.dense(z_outputs,self.z_dim,activation=tf.nn.tanh)
90 |                 log_sigma_outputs = tf.layers.dense(z_outputs,self.z_dim)
91 | 
92 |                 sample_Z = mu_outputs + tf.exp(log_sigma_outputs/2) * tf.random_normal(
93 |                     tf.shape(mu_outputs),
94 |                     0,1,dtype=tf.float32)
95 | 
96 | 
97 |         with tf.variable_scope('decoder'):
98 |             sample_Z = tf.expand_dims(sample_Z,axis=1)
99 |             sample_Z = tf.tile(sample_Z,[1,self.time_steps,1])
100 |             decoder_input = tf.concat([ae_outputs,sample_Z],axis=-1)
101 | 
102 |             recons_fw_lstm_cells = _INDRNNCells([self.n_hidden]*self.decode_layers + [self.input_dim],
103 |                                                 self.time_steps)
104 |             recons_bw_lstm_cells = _INDRNNCells([self.n_hidden]*self.decode_layers + [self.input_dim],
105 |                                                 self.time_steps)
106 |             (recons_fw_outputs,recons_bw_outputs),_ = tf.nn.bidirectional_dynamic_rnn(
107 |                 recons_fw_lstm_cells,
108 |                 recons_bw_lstm_cells,
109 |                 decoder_input, dtype=tf.float32)
110 |             self.recons_X = tf.add(recons_fw_outputs,recons_bw_outputs)
111 | 
112 |         with tf.variable_scope('loss'):
113 |             reduce_dims = np.arange(1,tf.keras.backend.ndim(self.X))
114 |             recons_loss = tf.losses.mean_squared_error(self.X, self.recons_X)
115 |             kl_loss = - 0.5 * tf.reduce_mean(1 + log_sigma_outputs - tf.square(mu_outputs) - tf.exp(log_sigma_outputs))
116 |             self.opt_loss = recons_loss + kl_loss
117 |             self.all_losses = tf.reduce_sum(tf.square(self.X - self.recons_X),axis=reduce_dims)
118 | 
119 |         with tf.variable_scope('train'):
120 |             self.uion_train_op = tf.train.AdamOptimizer(self.learning_rate).minimize(self.opt_loss)
121 | 
122 | 
123 |     def train(self):
124 |         for i in range(self.train_iters):
125 |             this_X = self.data_source.fetch_data(self.batch_size)
126 |             self.sess.run([self.uion_train_op],feed_dict={
127 |                 self.X: this_X
128 |             })
129 |             if i % 200 ==0:
130 |                 mse_loss = self.sess.run([self.opt_loss],feed_dict={
131 |                     self.X: self.data_source.train_data
132 |                 })
133 |                 print('round {}: with loss: {}'.format(i,mse_loss))
134 |         self._arange_score(self.data_source.train_data)
135 | 
136 |     def _arange_score(self,input_data):
137 |         all_losses = self.sess.run(self.all_losses,feed_dict={
138 |             self.X: input_data
139 |         })
140 |         self.sorted_loss = np.sort(all_losses).ravel()
141 |         self.anomaly_score = np.percentile(self.sorted_loss,(1-self.outlier_fraction)*100)
142 | 
143 | 
144 |     def judge_health(self,test):
145 |         all_losses = self.sess.run(self.all_losses,feed_dict={
146 |             self.X: test
147 |         }).ravel()
148 |         percentile_95 = self.sorted_loss[int(self.sorted_loss.shape[0]*0.95)]
149 |         value_gap = self.sorted_loss[-1] - percentile_95
150 |         def _get_health(loss):
151 |             min_index = np.argmin(np.abs(self.sorted_loss-loss))
152 |             if min_index < self.sorted_loss.shape[0] - 1:
153 |                 minus_ratio = min_index / self.sorted_loss.shape[0]
154 |             else:
155 |                 exceed_loss = loss - self.sorted_loss[-1]
156 |                 minus_ratio = exceed_loss / value_gap * 0.05 + 1
157 |             return 100.0 - 40 * minus_ratio
158 |         all_health = list(map(lambda x:_get_health(x),all_losses))
159 |         return all_health
160 | 
161 |     def judge_anomaly(self,test):
162 |         all_losses = self.sess.run(self.all_losses,feed_dict={
163 |             self.X: test
164 |         }).ravel()
165 |         judge_label = list( map(lambda x: -1 if x>self.anomaly_score else 1,all_losses) )
166 |         return judge_label
167 | 
168 | 
169 | indrnn_ae = INDRNN_VAE(train_file='dataset/train.npy',z_dim=10,outlier_fraction=0.04)
170 | indrnn_ae.train()
171 | 
172 | test = np.load('dataset/test.npy')
173 | z1 = indrnn_ae.judge_health(test)
174 | z2 = indrnn_ae.judge_anomaly(test)
175 | 
176 | import matplotlib.pyplot as plt
177 | plt.plot(z1)
178 | plt.show()
--------------------------------------------------------------------------------
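To make the mapping inside `judge_health()` concrete, here is a self-contained re-statement of the same formula on synthetic losses (illustrative only; the random array stands in for the model's sorted training losses):

```python
import numpy as np

sorted_loss = np.sort(np.random.rand(1000))  # stand-in for self.sorted_loss
p95 = sorted_loss[int(sorted_loss.shape[0] * 0.95)]
gap = sorted_loss[-1] - p95

def health(loss):
    idx = np.argmin(np.abs(sorted_loss - loss))
    if idx < sorted_loss.shape[0] - 1:
        ratio = idx / sorted_loss.shape[0]  # rank of the loss among training losses
    else:
        ratio = (loss - sorted_loss[-1]) / gap * 0.05 + 1  # penalty grows past the training maximum
    return 100.0 - 40 * ratio

print(health(np.median(sorted_loss)))  # ~80: a median-level loss scores mid-range health
print(health(sorted_loss[-1] + gap))   # ~58: one (max - p95) gap beyond the maximum drops below 60
```

Scores therefore land roughly in [60, 100] for in-distribution inputs and sink below 60 as the reconstruction error leaves the training range.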
/INDRNN_(V)AE/readme.md:
--------------------------------------------------------------------------------
1 | ## Reference
2 | High-frequency Multivariate Time Series Anomaly Detection based on IndRNN with AutoEncoder (both AE and VAE);
3 | [reference1](https://github.com/twairball/keras_lstm_vae). The IndRNN implementation is from
4 | [reference2](https://github.com/batzner/indrnn);
5 | 
6 | 
7 | ## Prerequisites
8 | * Python 3.3+
9 | * Tensorflow 1.12.0
10 | * Sklearn 0.20.1
11 | * Numpy 1.15.4
12 | * Pandas 0.23.4
13 | * Matplotlib 3.0.2
14 | 
15 | ## Dataset and Preprocessing
16 | The dataset used is the [MTSAD](https://github.com/jsonbruce/MTSAnomalyDetection), which has 2 feature dimensions.
17 | We then reshape the dataset into 3-dimensional samples with time_steps of 16. The detailed preprocessing can be found in
18 | the LSTM_VAE chapter ([reference3](https://github.com/SchindlerLiang/VAE-for-Anomaly-Detection/blob/master/LSTM_VAE/utils.py)).
19 | 
20 | The IndRNN_(V)AE algorithm should be trained on Normal samples only. We provide two score functions for assessing the test data: judge_anomaly() for anomaly detection and judge_health() for health assessment, which may be of use for high-frequency industrial sensors.
21 | 
22 | 
23 | ## Network Structure
24 | The structure of the network is presented here:
25 | 
26 | ![Network Structure for IndRNN_(V)AE](https://github.com/SchindlerLiang/VAE-for-Anomaly-Detection/blob/master/INDRNN_(V)AE/graph.png)
27 | 
28 | Note that we use both the AE and VAE structures, with the idea of keeping time-dependent information through the AE branch and maintaining variability through the VAE branch.
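## Usage

A minimal sketch of the intended entry points, mirroring the script at the bottom of `indrnn_ae_vae.py` (note that the module currently runs that script at import time; `train.npy`/`test.npy` must be 3-dimensional arrays shaped `[samples, time_steps, features]`):

```python
import numpy as np
from indrnn_ae_vae import INDRNN_VAE

model = INDRNN_VAE(train_file='dataset/train.npy', z_dim=10, outlier_fraction=0.04)
model.train()

test = np.load('dataset/test.npy')
health = model.judge_health(test)   # per-sample health scores, roughly in [60, 100]
labels = model.judge_anomaly(test)  # -1 = anomaly, 1 = normal
```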
29 | 
--------------------------------------------------------------------------------
/LSTM_VAE/LSTM_VAE.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/LSTM_VAE/LSTM_VAE.png
--------------------------------------------------------------------------------
/LSTM_VAE/LSTM_VAE.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | A simple implementation of an LSTM_VAE-based algorithm for Anomaly Detection in Multivariate Time Series;
4 | 
5 | Author: Schindler Liang
6 | 
7 | Reference:
8 |     https://www.researchgate.net/publication/304758073_LSTM-based_Encoder-Decoder_for_Multi-sensor_Anomaly_Detection
9 |     https://github.com/twairball/keras_lstm_vae
10 |     https://arxiv.org/pdf/1711.00614.pdf
11 | """
12 | import numpy as np
13 | import tensorflow as tf
14 | from tensorflow.nn.rnn_cell import MultiRNNCell, LSTMCell
15 | from utils import Data_Hanlder
16 | 
17 | 
18 | def lrelu(x, leak=0.2, name='lrelu'):
19 |     return tf.maximum(x, leak*x)
20 | 
21 | 
22 | def _LSTMCells(unit_list,act_fn_list):
23 |     return MultiRNNCell([LSTMCell(unit,
24 |                                   activation=act_fn)
25 |                          for unit,act_fn in zip(unit_list,act_fn_list)])
26 | 
27 | class LSTM_VAE(object):
28 |     def __init__(self,dataset_name,columns,z_dim,time_steps,outlier_fraction):
29 |         self.outlier_fraction = outlier_fraction
30 |         self.data_source = Data_Hanlder(dataset_name,columns,time_steps)
31 |         self.n_hidden = 16
32 |         self.batch_size = 128
33 |         self.learning_rate = 0.0005
34 |         self.train_iters = 4000
35 | 
36 |         self.input_dim = len(columns)
37 |         self.z_dim = z_dim
38 |         self.time_steps = time_steps
39 | 
40 |         self.pointer = 0
41 |         self.anomaly_score = 0
42 |         self.sess = tf.Session()
43 |         self._build_network()
44 |         self.sess.run(tf.global_variables_initializer())
45 | 
46 |     def _build_network(self):
47 |         with tf.variable_scope('ph'):
48 |             self.X = tf.placeholder(tf.float32,shape=[None,self.time_steps,self.input_dim],name='input_X')
49 | 
50 |         with tf.variable_scope('encoder'):
51 |             with tf.variable_scope('lat_mu'):
52 |                 mu_fw_lstm_cells = _LSTMCells([self.z_dim],[lrelu])
53 |                 mu_bw_lstm_cells = _LSTMCells([self.z_dim],[lrelu])
54 | 
55 |                 (mu_fw_outputs,mu_bw_outputs),_ = tf.nn.bidirectional_dynamic_rnn(  # unpack fw and bw outputs separately
56 |                     mu_fw_lstm_cells,
57 |                     mu_bw_lstm_cells,
58 |                     self.X, dtype=tf.float32)
59 |                 mu_outputs = tf.add(mu_fw_outputs,mu_bw_outputs)
60 | 
61 |             with tf.variable_scope('lat_sigma'):
62 |                 sigma_fw_lstm_cells = _LSTMCells([self.z_dim],[tf.nn.softplus])
63 |                 sigma_bw_lstm_cells = _LSTMCells([self.z_dim],[tf.nn.softplus])
64 |                 (sigma_fw_outputs,sigma_bw_outputs),_ = tf.nn.bidirectional_dynamic_rnn(
65 |                     sigma_fw_lstm_cells,
66 |                     sigma_bw_lstm_cells,
67 |                     self.X, dtype=tf.float32)
68 |                 sigma_outputs = tf.add(sigma_fw_outputs,sigma_bw_outputs)
69 |                 sample_Z = mu_outputs + sigma_outputs * tf.random_normal(
70 |                     tf.shape(mu_outputs),
71 |                     0,1,dtype=tf.float32)
72 | 
73 |         with tf.variable_scope('decoder'):
74 |             recons_lstm_cells = _LSTMCells([self.n_hidden,self.input_dim],[lrelu,lrelu])
75 |             self.recons_X,_ = tf.nn.dynamic_rnn(recons_lstm_cells, sample_Z, dtype=tf.float32)
76 | 
77 |         with tf.variable_scope('loss'):
78 |             reduce_dims = np.arange(1,tf.keras.backend.ndim(self.X))
79 |             recons_loss = tf.losses.mean_squared_error(self.X, self.recons_X)
80 |             kl_loss = - 0.5 * tf.reduce_mean(1 + tf.log(1e-8 + tf.square(sigma_outputs)) - tf.square(mu_outputs) - tf.square(sigma_outputs))  # sigma_outputs is a std (softplus), so the closed-form KL uses log(sigma^2) and sigma^2, as in MLP_VAE.py
81 |             self.opt_loss = recons_loss + kl_loss
82 |             self.all_losses = tf.reduce_sum(tf.square(self.X - self.recons_X),axis=reduce_dims)
83 | 
84 |         with tf.variable_scope('train'):
85 |             self.uion_train_op = tf.train.AdamOptimizer(self.learning_rate).minimize(self.opt_loss)
86 | 
87 | 
88 |     def train(self):
89 |         for i in range(self.train_iters):
90 |             this_X = self.data_source.fetch_data(self.batch_size)
91 |             self.sess.run([self.uion_train_op],feed_dict={
92 |                 self.X: this_X
93 |             })
94 |             if i % 200 ==0:
95 |                 mse_loss = self.sess.run([self.opt_loss],feed_dict={
96 |                     self.X: self.data_source.train
97 |                 })
98 |                 print('round {}: with loss: {}'.format(i,mse_loss))
99 |         self._arange_score(self.data_source.train)
100 | 
101 | 
102 |     def _arange_score(self,input_data):
103 |         input_all_losses = self.sess.run(self.all_losses,feed_dict={
104 |             self.X: input_data
105 |         })
106 |         self.anomaly_score = np.percentile(input_all_losses,(1-self.outlier_fraction)*100)
107 | 
108 |     def judge(self,test):
109 |         all_test_loss = self.sess.run(self.all_losses,feed_dict={
110 |             self.X: test
111 |         })
112 |         result = map(lambda x: 1 if x < self.anomaly_score else -1,all_test_loss)
113 | 
114 |         return list(result)
115 | 
116 | 
117 |     def plot_confusion_matrix(self):
118 |         predict_label = self.judge(self.data_source.test)
119 |         self.data_source.plot_confusion_matrix(self.data_source.test_label,predict_label,['Abnormal','Normal'],'LSTM_VAE Confusion-Matrix')
120 | 
121 | 
122 | def main():
123 | 
124 |     lstm_vae = LSTM_VAE('dataset/data0.csv',['v0','v1'],z_dim=8,time_steps=16,outlier_fraction=0.01)
125 |     lstm_vae.train()
126 |     lstm_vae.plot_confusion_matrix()
127 | 
128 | if __name__ == '__main__':
129 |     main()
130 | 
131 | 
--------------------------------------------------------------------------------
/LSTM_VAE/dataset/lstm_test.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/LSTM_VAE/dataset/lstm_test.npy
--------------------------------------------------------------------------------
/LSTM_VAE/dataset/lstm_test_label.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/LSTM_VAE/dataset/lstm_test_label.npy
--------------------------------------------------------------------------------
/LSTM_VAE/dataset/lstm_train.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/LSTM_VAE/dataset/lstm_train.npy
--------------------------------------------------------------------------------
/LSTM_VAE/readme.md:
--------------------------------------------------------------------------------
1 | ## Reference
2 | LSTM_VAE used for Multivariate Time Series Anomaly Detection;
3 | [reference1](https://www.researchgate.net/publication/304758073_LSTM-based_Encoder-Decoder_for_Multi-sensor_Anomaly_Detection);
4 | [reference2](https://github.com/twairball/keras_lstm_vae);
5 | [reference3](https://arxiv.org/pdf/1711.00614.pdf);
6 | 
7 | ## Prerequisites
8 | * Python 3.3+
9 | * Tensorflow 1.12.0
10 | * Sklearn 0.20.1
11 | * Numpy 1.15.4
12 | * Pandas 0.23.4
13 | * Matplotlib 3.0.2
14 | 
15 | ## Dataset and Preprocessing
16 | The dataset used is the [MTSAD](https://github.com/jsonbruce/MTSAnomalyDetection), which has 2 feature dimensions.
17 | We use StandardScaler and MinMaxScaler to preprocess the initial data. We then reshape the dataset into 3-dimensional samples with time_steps of 10.
18 | For each sample, if ANY ONE of the 10 timesteps is labeled as abnormal, the corresponding 3-dimensional sample is labeled ABNORMAL;
19 | 
20 | In total, there are 55 abnormal samples and 8661 normal samples. We randomly select 8000 normal samples as the train dataset, and 661 normal samples plus the 55 abnormal samples as the test dataset. As a result, the abnormal samples constitute only 7.7% of the test dataset.
21 | 
22 | `LSTM_VAE should be trained on a NORMAL dataset. However, a dataset with only a few ABNORMAL samples is also acceptable, since we can adjust the hyper-parameter outliers_fraction, which may slightly influence the detection score.`
23 | 
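The window-labeling rule above can be stated compactly as follows (an illustrative sketch, matching `_data_arrage()` in `utils.py` further down):

```python
import numpy as np

def window_label(step_labels):
    # a window is ABNORMAL (-1) if ANY of its timesteps is abnormal
    return -1 if np.any(step_labels == -1) else 1

steps = np.array([1, 1, 1, -1, 1, 1, 1, 1, 1, 1])  # one abnormal timestep out of 10
print(window_label(steps))  # -1
```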
24 | ## Result
25 | The confusion_matrix of the test dataset is presented below:
26 | 
27 | ![Confusion_Matrix for LSTM_VAE](https://github.com/SchindlerLiang/VAE-for-Anomaly-Detection/blob/master/LSTM_VAE/LSTM_VAE.png)
28 | 
29 | It can be concluded from the above that LSTM_VAE is capable of capturing most of the outliers (anomalies) in the test dataset.
30 | 
31 | 
--------------------------------------------------------------------------------
/LSTM_VAE/utils.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import numpy as np
3 | from sklearn.preprocessing import StandardScaler,MinMaxScaler
4 | import os
5 | from sklearn.metrics import confusion_matrix
6 | import matplotlib.pyplot as plt
7 | 
8 | '''
9 | time_steps = 10
10 | '''
11 | class Data_Hanlder(object):
12 | 
13 |     def __init__(self,dataset_name,columns,time_steps):
14 |         self.time_steps = time_steps
15 |         self.data = pd.read_csv(dataset_name,index_col=0)
16 |         self.columns = columns
17 | 
18 |         self.data['Class'] = 0
19 |         self.data['Class'] = self.data['result'].apply(lambda x: 1 if x=='normal' else -1)
20 |         self.data[self.columns] = self.data[self.columns].shift(-1) - self.data[self.columns]
21 |         self.data = self.data.dropna(how='any')
22 |         self.pointer = 0
23 |         self.train = np.array([])
24 |         self.test = np.array([])
25 |         self.test_label = np.array([])
26 | 
27 | 
28 |         self.split_fraction = 0.2
29 | 
30 | 
31 |     def _process_source_data(self):
32 | 
33 |         self._data_scale()
34 |         self._data_arrage()
35 |         self._split_save_data()
36 | 
37 |     def _data_scale(self):
38 | 
39 |         standscaler = StandardScaler()
40 |         mscaler = MinMaxScaler(feature_range=(0,1))
41 |         self.data[self.columns] = standscaler.fit_transform(self.data[self.columns])
42 |         self.data[self.columns] = mscaler.fit_transform(self.data[self.columns])
43 | 
44 | 
45 |     def _data_arrage(self):
46 | 
47 |         self.all_data = np.array([])
48 |         self.labels = np.array([])
49 |         d_array = self.data[self.columns].values
50 |         class_array = self.data['Class'].values
51 |         for index in range(self.data.shape[0]-self.time_steps+1):
52 |             this_array = d_array[index:index+self.time_steps].reshape((-1,self.time_steps,len(self.columns)))
53 |             time_steps_label = class_array[index:index+self.time_steps]
54 |             if np.any(time_steps_label==-1):
55 |                 this_label = -1
56 |             else:
57 |                 this_label = 1
58 |             if self.all_data.shape[0] == 0:
59 |                 self.all_data = this_array
60 |                 self.labels = this_label
61 |             else:
62 |                 self.all_data = np.concatenate([self.all_data,this_array],axis=0)
63 |                 self.labels = np.append(self.labels,this_label)
64 | 
65 |     def _split_save_data(self):
66 |         normal = self.all_data[self.labels==1]
67 |         abnormal = self.all_data[self.labels==-1]
68 | 
69 |         split_no = normal.shape[0] - abnormal.shape[0]
70 | 
71 |         self.train = normal[:split_no,:]
72 |         self.test = np.concatenate([normal[split_no:,:],abnormal],axis=0)
73 |         self.test_label = np.concatenate([np.ones(normal[split_no:,:].shape[0]),-np.ones(abnormal.shape[0])])
74 |         np.save('dataset/lstm_train.npy',self.train)  # file names match the arrays shipped in dataset/
75 |         np.save('dataset/lstm_test.npy',self.test)
76 |         np.save('dataset/lstm_test_label.npy',self.test_label)
77 | 
78 |     def _get_data(self):
79 |         if os.path.exists('dataset/lstm_train.npy'):
80 |             self.train = np.load('dataset/lstm_train.npy')
81 |             self.test = np.load('dataset/lstm_test.npy')
82 |             self.test_label = np.load('dataset/lstm_test_label.npy')
83 |             if self.train.ndim ==3:
84 |                 if self.train.shape[1] == self.time_steps and self.train.shape[2] == len(self.columns):  # cached arrays match the current config
85 |                     return 0
86 |         self._process_source_data()
87 | 
88 | 
89 |     def fetch_data(self,batch_size):
90 |         if self.train.shape[0] == 0:
91 |             self._get_data()
92 | 
93 |         if self.train.shape[0] < batch_size:
94 |             return_train = self.train
95 |         else:
96 |             if (self.pointer + 1) * batch_size >= self.train.shape[0]-1:
97 |                 return_train = self.train[self.pointer * batch_size:,]  # return the tail batch before resetting the pointer
98 |                 self.pointer = 0
99 |             else:
100 |                 return_train = self.train[self.pointer * batch_size:(self.pointer + 1) * batch_size,]  # slice the current batch, then advance
101 |                 self.pointer = self.pointer + 1
102 |             if return_train.ndim < self.train.ndim:
103 |                 return_train = np.expand_dims(return_train,0)
104 |         return return_train
105 | 
106 |     def plot_confusion_matrix(self,y_true, y_pred, labels,title):
107 |         cmap = plt.cm.binary
108 |         cm = confusion_matrix(y_true, y_pred)
109 |         tick_marks = np.array(range(len(labels))) + 0.5
110 |         np.set_printoptions(precision=2)
111 |         cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
112 |         plt.figure(figsize=(8, 4), dpi=120)
113 |         ind_array = np.arange(len(labels))
114 |         x, y = np.meshgrid(ind_array, ind_array)
115 |         intFlag = 0
116 |         for x_val, y_val in zip(x.flatten(), y.flatten()):
117 | 
118 |             if (intFlag):
119 |                 c = cm[y_val][x_val]
120 |                 plt.text(x_val, y_val, "%d" % (c,), color='red', fontsize=10, va='center', ha='center')
121 | 
122 |             else:
123 |                 c = cm_normalized[y_val][x_val]
124 |                 if (c > 0.01):
125 |                     plt.text(x_val, y_val, "%0.2f" % (c,), color='red', fontsize=10, va='center', ha='center')
126 |                 else:
127 |                     plt.text(x_val, y_val, "%d" % (0,), color='red', fontsize=10, va='center', ha='center')
128 |         if(intFlag):
129 |             plt.imshow(cm, interpolation='nearest', cmap=cmap)
130 |         else:
131 |             plt.imshow(cm_normalized, interpolation='nearest', cmap=cmap)
132 |         plt.gca().set_xticks(tick_marks, minor=True)
133 |         plt.gca().set_yticks(tick_marks, minor=True)
134 |         plt.gca().xaxis.set_ticks_position('none')
135 |         plt.gca().yaxis.set_ticks_position('none')
136 |         plt.grid(True, which='minor', linestyle='-')
137 |         plt.gcf().subplots_adjust(bottom=0.15)
138 |         plt.title(title)
139 |         plt.colorbar()
140 |         xlocations = np.array(range(len(labels)))
141 |         plt.xticks(xlocations, labels)
142 |         plt.yticks(xlocations, labels)
143 |         plt.ylabel('Index of True Classes')
144 |         plt.xlabel('Index of Predict Classes')
145 |         plt.show()
--------------------------------------------------------------------------------
/MLP_VAE/MLP_VAE.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | Schindler Liang
4 | 
5 | MLP Variational AutoEncoder for Anomaly Detection
6 | reference: https://pdfs.semanticscholar.org/0611/46b1d7938d7a8dae70e3531a00fceb3c78e8.pdf
7 | """
8 | import random
9 | import tensorflow as tf
10 | import numpy as np
11 | import pandas as pd
12 | import matplotlib.pyplot as plt
13 | from sklearn.preprocessing import StandardScaler,MinMaxScaler
14 | from sklearn.metrics import confusion_matrix
15 | 
16 | 
17 | def lrelu(x, leak=0.2, name='lrelu'):
18 |     return tf.maximum(x, leak*x)
19 | 
20 | 
21 | def build_dense(input_vector,unit_no,activation):
22 |     return tf.layers.dense(input_vector,unit_no,activation=activation,
23 |                            kernel_initializer=tf.contrib.layers.xavier_initializer(),
24 |                            bias_initializer=tf.zeros_initializer())
25 | 
26 | class MLP_VAE:
27 |     def __init__(self,input_dim,lat_dim, outliers_fraction):
28 |         # input_paras:
29 |         #   input_dim: input dimension for X
30 |         #   lat_dim: latent dimension for Z
31 |         #   outliers_fraction: pre-estimated fraction of outliers in the training dataset
32 | 
33 |         self.outliers_fraction = outliers_fraction # for computing the threshold of the anomaly score
34 |         self.input_dim = input_dim
35 |         self.lat_dim = lat_dim # the lat_dim can exceed input_dim
36 | 
37 |         self.input_X = tf.placeholder(tf.float32,shape=[None,self.input_dim],name='source_x')
38 | 
39 |         self.learning_rate = 0.0005
40 |         self.batch_size = 32
41 |         # batch_size is kept smaller than the usual setting to obtain
42 |         # a relatively lower anomaly-score threshold
43 |         self.train_iter = 3000
44 |         self.hidden_units = 128
45 |         self._build_VAE()
46 |         self.sess = tf.Session()
47 |         self.sess.run(tf.global_variables_initializer())
48 |         self.pointer = 0
49 | 
50 |     def _encoder(self):
51 |         with tf.variable_scope('encoder',reuse=tf.AUTO_REUSE):
52 |             l1 = build_dense(self.input_X,self.hidden_units,activation=lrelu)
53 |             # l1 = tf.nn.dropout(l1,0.8)
54 |             l2 = build_dense(l1,self.hidden_units,activation=lrelu)
55 |             # l2 = tf.nn.dropout(l2,0.8)
56 |             mu = tf.layers.dense(l2,self.lat_dim)
57 |             sigma = tf.layers.dense(l2,self.lat_dim,activation=tf.nn.softplus)
58 |             sole_z = mu + sigma * tf.random_normal(tf.shape(mu),0,1,dtype=tf.float32)
59 |         return mu,sigma,sole_z
60 | 
61 |     def _decoder(self,z):
62 |         with tf.variable_scope('decoder',reuse=tf.AUTO_REUSE):
63 |             l1 = build_dense(z,self.hidden_units,activation=lrelu)
64 |             # l1 = tf.nn.dropout(l1,0.8)
65 |             l2 = build_dense(l1,self.hidden_units,activation=lrelu)
66 |             # l2 = tf.nn.dropout(l2,0.8)
67 |             recons_X = tf.layers.dense(l2,self.input_dim)
68 |         return recons_X
69 | 
70 | 
71 |     def _build_VAE(self):
72 |         self.mu_z,self.sigma_z,sole_z = self._encoder()
73 |         self.recons_X = self._decoder(sole_z)
74 | 
75 |         with tf.variable_scope('loss'):
76 |             KL_divergence = 0.5 * tf.reduce_sum(tf.square(self.mu_z) + tf.square(self.sigma_z) - tf.log(1e-8 + tf.square(self.sigma_z)) - 1, 1)
77 |             mse_loss = tf.reduce_sum(tf.square(self.input_X-self.recons_X), 1)
78 |             self.all_loss = mse_loss
79 |             self.loss = tf.reduce_mean(mse_loss + KL_divergence)
80 | 
81 |         with tf.variable_scope('train'):
82 |             self.train_op = tf.train.AdamOptimizer(self.learning_rate).minimize(self.loss)
83 | 
84 | 
85 |     def _fecth_data(self,input_data):
86 |         if (self.pointer+1) * self.batch_size >= input_data.shape[0]:
87 |             return_data = input_data[self.pointer*self.batch_size:,:]
88 |             self.pointer = 0
89 |         else:
90 |             return_data = input_data[ self.pointer*self.batch_size:(self.pointer+1)*self.batch_size,:]
91 |             self.pointer = self.pointer + 1
92 |         return return_data
93 | 
94 | 
95 | 
96 |     def train(self,train_X):
97 |         for index in range(self.train_iter):
98 |             this_X = self._fecth_data(train_X)
99 |             self.sess.run([self.train_op],feed_dict={
100 |                 self.input_X: this_X
101 |             })
102 |         self.arrage_recons_loss(train_X)
103 | 
104 | 
105 |     def arrage_recons_loss(self,input_data):
106 |         all_losses = self.sess.run(self.all_loss,feed_dict={
107 |             self.input_X: input_data
108 |         })
109 |         self.judge_loss = np.percentile(all_losses,(1-self.outliers_fraction)*100)  # e.g. outliers_fraction=0.07 -> 93rd percentile of training errors
110 | 
111 | 
112 |     def judge(self,input_data):
113 |         return_label = []
114 |         for index in range(input_data.shape[0]):
115 |             single_X = input_data[index].reshape(1,-1)
116 |             this_loss = self.sess.run(self.all_loss,feed_dict={  # score with the same per-sample reconstruction error used for the threshold
117 |                 self.input_X: single_X
118 |             })[0]
119 | 
120 |             if this_loss < self.judge_loss:
121 |                 return_label.append(1)
122 |             else:
123 |                 return_label.append(-1)
124 |         return return_label
125 | 
126 | def plot_confusion_matrix(y_true, y_pred, labels,title):
127 |     cmap = plt.cm.binary
128 |     cm = confusion_matrix(y_true, y_pred)
129 |     tick_marks = np.array(range(len(labels))) + 0.5
130 |     np.set_printoptions(precision=2)
131 |     cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
132 |     plt.figure(figsize=(4, 2), dpi=120)
133 |     ind_array = np.arange(len(labels))
134 |     x, y = np.meshgrid(ind_array, ind_array)
135 |     intFlag = 0
136 |     for x_val, y_val in zip(x.flatten(), y.flatten()):
137 |         #
138 | 
139 |         if (intFlag):
140 |             c = cm[y_val][x_val]
141 |             plt.text(x_val, y_val, "%d" % (c,), color='red', fontsize=8, va='center', ha='center')
142 | 
143 |         else:
144 |             c = cm_normalized[y_val][x_val]
145 |             if (c > 0.01):
146 |                 plt.text(x_val, y_val, "%0.2f" % (c,), color='red', fontsize=7, va='center', ha='center')
147 |             else:
148 |                 plt.text(x_val, y_val, "%d" % (0,), color='red', fontsize=7, va='center', ha='center')
149 |     if(intFlag):
150 |         plt.imshow(cm, interpolation='nearest', cmap=cmap)
151 |     else:
152 |         plt.imshow(cm_normalized, interpolation='nearest', cmap=cmap)
153 |     plt.gca().set_xticks(tick_marks, minor=True)
154 |     plt.gca().set_yticks(tick_marks, minor=True)
155 |     plt.gca().xaxis.set_ticks_position('none')
156 |     plt.gca().yaxis.set_ticks_position('none')
157 |     plt.grid(True, which='minor', linestyle='-')
158 |     plt.gcf().subplots_adjust(bottom=0.15)
159 |     plt.title(title)
160 |     plt.colorbar()
161 |     xlocations = np.array(range(len(labels)))
162 |     plt.xticks(xlocations, labels)
163 |     plt.yticks(xlocations, labels)
164 |     plt.ylabel('Index of True Classes')
165 |     plt.xlabel('Index of Predict Classes')
166 |     plt.show()
167 | 
168 | def mlp_vae_predict(train,test,test_label):
169 |     mlp_vae = MLP_VAE(8,20,0.07)
170 |     mlp_vae.train(train)
171 |     mlp_vae_predict_label = mlp_vae.judge(test)
172 |     plot_confusion_matrix(test_label, mlp_vae_predict_label, ['anomaly','normal'],'MLP_VAE Confusion-Matrix')
173 | 
174 | def iforest_predict(train,test,test_label):
175 |     from sklearn.ensemble import IsolationForest
176 |     iforest = IsolationForest(max_samples = 'auto',
177 |                               behaviour="new",contamination=0.01)
178 | 
179 |     iforest.fit(train)
180 |     iforest_predict_label = iforest.predict(test)
181 |     plot_confusion_matrix(test_label, iforest_predict_label, ['anomaly','normal'],'iforest Confusion-Matrix')
182 | 
183 | def lof_predict(train,test,test_label):
184 |     from sklearn.neighbors import LocalOutlierFactor
185 |     lof = LocalOutlierFactor(novelty=True,contamination=0.01)
186 |     lof.fit(train)
187 |     lof_predict_label = lof.predict(test)
188 |     plot_confusion_matrix(test_label, lof_predict_label, ['anomaly','normal'],'LOF Confusion-Matrix')
189 | 
190 | if __name__ == '__main__':
191 |     train = np.load('data/train.npy')
192 |     test = np.load('data/test.npy')
193 |     test_label = np.load('data/test_label.npy')
194 |     mlp_vae_predict(train,test,test_label)
195 |     iforest_predict(train,test,test_label)
196 |     lof_predict(train,test,test_label)
197 | 
198 | 
199 | 
200 | 
201 | 
202 | 
203 | 
--------------------------------------------------------------------------------
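For reference, the `KL_divergence` term in `_build_VAE()` above is the closed form of KL(N(mu, sigma^2) || N(0, 1)) = 0.5 * (mu^2 + sigma^2 - log(sigma^2) - 1), summed over latent dimensions. A tiny self-contained sanity check (illustrative, not part of the repository):

```python
import numpy as np

mu, sigma = 0.5, 1.2
kl_closed = 0.5 * (mu**2 + sigma**2 - np.log(sigma**2) - 1)  # same expression as the TF code above

# Monte-Carlo estimate of E_q[log q(z) - log p(z)] for comparison
z = np.random.normal(mu, sigma, size=1000000)
log_q = -0.5 * np.log(2 * np.pi * sigma**2) - (z - mu)**2 / (2 * sigma**2)
log_p = -0.5 * np.log(2 * np.pi) - z**2 / 2
print(kl_closed, np.mean(log_q - log_p))  # both ~0.163
```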
/MLP_VAE/data/a.py:
--------------------------------------------------------------------------------
1 | 
2 | 
--------------------------------------------------------------------------------
/MLP_VAE/data/test.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/MLP_VAE/data/test.npy
--------------------------------------------------------------------------------
/MLP_VAE/data/test_label.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/MLP_VAE/data/test_label.npy
--------------------------------------------------------------------------------
/MLP_VAE/data/train.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/MLP_VAE/data/train.npy
--------------------------------------------------------------------------------
/MLP_VAE/img/MLP_VAE.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/MLP_VAE/img/MLP_VAE.png
--------------------------------------------------------------------------------
/MLP_VAE/img/a.txt:
--------------------------------------------------------------------------------
1 | 
2 | 
--------------------------------------------------------------------------------
/MLP_VAE/img/iforest.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/MLP_VAE/img/iforest.png
--------------------------------------------------------------------------------
/MLP_VAE/img/lof.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SchindlerLiang/VAE-for-Anomaly-Detection/061b6a68d8e4918c23ac154dcd1948d9941e5802/MLP_VAE/img/lof.png
--------------------------------------------------------------------------------
/MLP_VAE/readme.md:
--------------------------------------------------------------------------------
1 | 
2 | MLP_VAE used for anomaly detection;
3 | [reference](https://pdfs.semanticscholar.org/0611/46b1d7938d7a8dae70e3531a00fceb3c78e8.pdf);
4 | 
5 | The dataset used is the [HTRU2 Data Set](http://archive.ics.uci.edu/ml/datasets/HTRU2). This is an unbalanced dataset: samples with Class 1, treated as the anomaly class, constitute less than 10% of the entire dataset;
6 | 
7 | All dimensions are preprocessed with sklearn's StandardScaler and MinMaxScaler to better fit MLP_VAE;
8 | 
9 | The test results of MLP_VAE, IForest and LOF are presented as follows:
10 | 
11 | ![Confusion_Matrix for MLP_VAE](https://github.com/SchindlerLiang/VAE-for-Anomaly-Detection/blob/master/MLP_VAE/img/MLP_VAE.png)
12 | 
13 | ![Confusion_Matrix for Iforest](https://github.com/SchindlerLiang/VAE-for-Anomaly-Detection/blob/master/MLP_VAE/img/iforest.png)
14 | 
15 | ![Confusion_Matrix for LOF](https://github.com/SchindlerLiang/VAE-for-Anomaly-Detection/blob/master/MLP_VAE/img/lof.png)
16 | 
17 | The outliers_fraction for MLP_VAE is deliberately set to a different value to better compute the anomaly-score threshold. It can be seen from the above that MLP_VAE obtains results on par with IForest and LOF;
18 | 
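A minimal usage sketch (mirroring the `__main__` block of `MLP_VAE.py`; the preprocessed HTRU2 arrays shipped in `data/` have 8 features per sample):

```python
import numpy as np
from MLP_VAE import MLP_VAE, plot_confusion_matrix

train = np.load('data/train.npy')
test = np.load('data/test.npy')
test_label = np.load('data/test_label.npy')

model = MLP_VAE(input_dim=8, lat_dim=20, outliers_fraction=0.07)
model.train(train)
pred = model.judge(test)  # 1 = normal, -1 = anomaly
plot_confusion_matrix(test_label, pred, ['anomaly', 'normal'], 'MLP_VAE Confusion-Matrix')
```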
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # VAE-for-Anomaly-Detection
2 | MLP_VAE, Anomaly Detection, LSTM_VAE, Multivariate Time-Series Anomaly Detection, IndRNN_VAE, High-Frequency Sensor Anomaly Detection, Tensorflow
--------------------------------------------------------------------------------