├── Basic-idea-of-APReLU.png
├── ResNet-APReLU-for-Fault-Diagnosis.pdf
├── README.md
├── ResNet_APReLU_TFLearn.py
└── ResNet_APReLU_Keras_Cifar10.py

/Basic-idea-of-APReLU.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhao62/Adaptively-Parametric-ReLU/HEAD/Basic-idea-of-APReLU.png
--------------------------------------------------------------------------------
/ResNet-APReLU-for-Fault-Diagnosis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhao62/Adaptively-Parametric-ReLU/HEAD/ResNet-APReLU-for-Fault-Diagnosis.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Adaptively-Parametric-ReLU

The adaptively parametric ReLU (APReLU) is a dynamic ReLU activation function that performs differently for different input samples.

Although the APReLU was originally developed for vibration-based fault diagnosis, it can also be used in other applications, such as image classification. In this repository, the APReLU is implemented twice: with TensorFlow 1.0.1 and TFLearn 0.3.2 (ResNet_APReLU_TFLearn.py, MNIST) and with TensorFlow 1.10.0 and Keras 2.2.1 (ResNet_APReLU_Keras_Cifar10.py, CIFAR-10), and applied to image classification.

![The basic idea of APReLU](https://github.com/zhao62/Adaptively-Parametric-ReLU/blob/master/Basic-idea-of-APReLU.png)

Abstract:
Vibration signals under the same health state often have large differences due to changes in operating conditions. Likewise, the differences among vibration signals under different health states can be small under some operating conditions. Traditional deep learning methods apply fixed nonlinear transformations to all input signals, which weakens the discriminative feature learning ability, i.e., the ability to project intra-class signals into the same region and inter-class signals into distant regions. To address this issue, this paper develops a new activation function, the adaptively parametric rectifier linear unit, and inserts it into deep residual networks to improve the feature learning ability, so that each input signal is trained to have its own set of nonlinear transformations. Specifically, a sub-network is inserted as an embedded module to learn the slopes used in the nonlinear transformation. The slopes are dependent on the input signal, and thereby the developed method has more flexible nonlinear transformations than traditional deep learning methods. Finally, the improved performance of the developed method in learning discriminative features has been validated through fault diagnosis applications.
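
For a feature map x, the APReLU computes max(x, 0) + α·min(x, 0), where the coefficients α ∈ (0, 1) are produced by a small fully connected sub-network fed with the global average pools of the positive and negative parts of x. The NumPy sketch below only illustrates this forward pass; the weights `w1`, `b1`, `w2`, `b2` are hypothetical placeholders for the trained sub-network, and batch normalization is omitted. The actual trainable implementations are in ResNet_APReLU_TFLearn.py and ResNet_APReLU_Keras_Cifar10.py.

```python
import numpy as np

def aprelu_forward(x, w1, b1, w2, b2):
    """Illustrative APReLU forward pass for one sample x of shape (H, W, C).

    w1/b1/w2/b2 are placeholder weights of the slope sub-network; in the
    repository scripts these are trained layers with batch normalization
    between them.
    """
    pos = np.maximum(x, 0.0)  # positive part of the input
    neg = np.minimum(x, 0.0)  # negative part of the input
    # Global average pooling of both parts gives a 2C-dimensional statistic
    stats = np.concatenate([neg.mean(axis=(0, 1)), pos.mean(axis=(0, 1))])
    hidden = np.maximum(stats @ w1 + b1, 0.0)           # FC + ReLU
    alpha = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # FC + sigmoid -> (0, 1)
    # The input-dependent slopes scale only the negative part
    return pos + alpha * neg
```

Note that the TFLearn implementation below learns one slope per channel, whereas the Keras implementation learns a single scalar slope per sample.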
Reference:
Minghang Zhao, Shisheng Zhong, Xuyun Fu, Baoping Tang, Shaojiang Dong, Michael Pecht, Deep Residual Networks with Adaptively Parametric Rectifier Linear Units for Fault Diagnosis, IEEE Transactions on Industrial Electronics, 2020, DOI: 10.1109/TIE.2020.2972458, Date of Publication: 13 February 2020.

https://ieeexplore.ieee.org/document/8998530
--------------------------------------------------------------------------------
/ResNet_APReLU_TFLearn.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Apr 10 09:52:39 2020
Implemented using TensorFlow 1.0 and TFLearn 0.3.2

Minghang Zhao, Shisheng Zhong, Xuyun Fu, Baoping Tang, Shaojiang Dong, Michael Pecht,
Deep Residual Networks with Adaptively Parametric Rectifier Linear Units for Fault Diagnosis,
IEEE Transactions on Industrial Electronics, 2020, DOI: 10.1109/TIE.2020.2972458

@author: Minghang Zhao
"""

from __future__ import division, print_function, absolute_import

import tflearn
import tensorflow as tf
from tflearn.activations import relu
from tflearn.layers.normalization import batch_normalization as bn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d, global_avg_pool
import tflearn.data_utils as du

# Data loading and preprocessing
import tflearn.datasets.mnist as mnist
X, Y, testX, testY = mnist.load_data(one_hot=True)
X = X.reshape([-1, 28, 28, 1])
testX = testX.reshape([-1, 28, 28, 1])
X, mean = du.featurewise_zero_center(X)
testX = du.featurewise_zero_center(testX, mean)

# An adaptively parametric rectifier linear unit (APReLU)
def aprelu(incoming):
    in_channels = incoming.get_shape().as_list()[-1]
    # Channel-wise spatial means of the negative and positive parts of the input
    scales_n = tf.reduce_mean(tf.reduce_mean(tf.minimum(incoming, 0), axis=2, keep_dims=True), axis=1, keep_dims=True)
    scales_p = tf.reduce_mean(tf.reduce_mean(tf.maximum(incoming, 0), axis=2, keep_dims=True), axis=1, keep_dims=True)
    scales = tf.concat([scales_n, scales_p], axis=3)
    # FC-BN-ReLU-FC-BN-sigmoid sub-network learns the per-channel slopes
    scales = fully_connected(scales, in_channels, activation='linear', regularizer='L2',
                             weight_decay=0.0001, weights_init='variance_scaling')
    scales = relu(bn(scales))
    scales = fully_connected(scales, in_channels, activation='linear', regularizer='L2',
                             weight_decay=0.0001, weights_init='variance_scaling')
    scales = tflearn.activations.sigmoid(bn(scales))
    scales = tf.expand_dims(tf.expand_dims(scales, axis=1), axis=1)
    # max(x, 0) + scales * min(x, 0), using (x - |x|) / 2 = min(x, 0)
    return tf.maximum(incoming, 0) + tf.multiply(scales, (incoming - tf.abs(incoming))) * 0.5

# A residual building block with APReLUs
def res_block_aprelu(incoming, nb_blocks, out_channels, downsample=False,
                     downsample_strides=2, batch_norm=True,
                     bias=True, weights_init='variance_scaling',
                     bias_init='zeros', regularizer='L2', weight_decay=0.0001,
                     trainable=True, restore=True, reuse=False, scope=None,
                     name="ResidualBlock"):

    resnet = incoming
    in_channels = incoming.get_shape().as_list()[-1]

    # Variable Scope fix for older TF
    try:
        vscope = tf.variable_scope(scope, default_name=name, values=[incoming],
                                   reuse=reuse)
    except Exception:
        vscope = tf.variable_op_scope([incoming], scope, name, reuse=reuse)

    with vscope as scope:
        name = scope.name  # TODO

        for i in range(nb_blocks):

            identity = resnet

            if not downsample:
                downsample_strides = 1

            if batch_norm:
                resnet = bn(resnet)
            resnet = aprelu(resnet)
            resnet = conv_2d(resnet, out_channels, 3,
                             downsample_strides, 'same', 'linear',
                             bias, weights_init, bias_init,
                             regularizer, weight_decay,
                             trainable, restore)

            if batch_norm:
                resnet = bn(resnet)
            resnet = aprelu(resnet)
            resnet = conv_2d(resnet, out_channels, 3, 1, 'same',
                             'linear', bias, weights_init,
                             bias_init, regularizer, weight_decay,
                             trainable, restore)

            # Downsampling
            if downsample_strides > 1:
                identity = tflearn.avg_pool_2d(identity, 1, downsample_strides)

            # Projection to new dimension
            if in_channels != out_channels:
                if (out_channels - in_channels) % 2 == 0:
                    ch = (out_channels - in_channels) // 2
                    identity = tf.pad(identity,
                                      [[0, 0], [0, 0], [0, 0], [ch, ch]])
                else:
                    ch = (out_channels - in_channels) // 2
                    identity = tf.pad(identity,
                                      [[0, 0], [0, 0], [0, 0], [ch, ch + 1]])
                in_channels = out_channels

            resnet = resnet + identity

    return resnet

# Real-time data preprocessing
img_prep = tflearn.ImagePreprocessing()
img_prep.add_featurewise_zero_center(per_channel=True)

# Building Residual Network
net = input_data(shape=[None, 28, 28, 1],
                 data_preprocessing=img_prep)
net = conv_2d(net, 16, 3, regularizer='L2', weight_decay=0.0001)
net = res_block_aprelu(net, 1, 16)
net = res_block_aprelu(net, 1, 32, downsample=True)
net = res_block_aprelu(net, 1, 32)
net = aprelu(bn(net))
net = global_avg_pool(net)
# Regression
net = fully_connected(net, 10, activation='softmax')
mom = tflearn.Momentum(0.1, lr_decay=0.1, decay_step=2000, staircase=True)
net = tflearn.regression(net, optimizer=mom, loss='categorical_crossentropy')
# Training
model = tflearn.DNN(net, checkpoint_path='model_resnet_mnist',
                    max_checkpoints=10, tensorboard_verbose=0,
                    clip_gradients=0.)

model.fit(X, Y, n_epoch=10, snapshot_epoch=False,
          snapshot_step=500, show_metric=True, batch_size=100,
          shuffle=True, run_id='resnet_mnist')

training_acc = model.evaluate(X, Y)[0]
validation_acc = model.evaluate(testX, testY)[0]
--------------------------------------------------------------------------------
/ResNet_APReLU_Keras_Cifar10.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Apr 14 04:17:45 2020
Implemented using TensorFlow 1.10.0 and Keras 2.2.1

Minghang Zhao, Shisheng Zhong, Xuyun Fu, Baoping Tang, Shaojiang Dong, Michael Pecht,
Deep Residual Networks with Adaptively Parametric Rectifier Linear Units for Fault Diagnosis,
IEEE Transactions on Industrial Electronics, DOI: 10.1109/TIE.2020.2972458,
Date of Publication: 13 February 2020

@author: Minghang Zhao
"""

from __future__ import print_function
import keras
import numpy as np
from keras.datasets import cifar10
from keras.layers import Dense, Conv2D, BatchNormalization, Activation, Minimum, Lambda
from keras.layers import AveragePooling2D, Input, GlobalAveragePooling2D, Concatenate, Reshape
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import LearningRateScheduler
K.set_learning_phase(1)

def cal_mean(inputs):
    outputs = K.mean(inputs, axis=1, keepdims=True)
    return outputs

# The data, split between train and test sets
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
# Zero-center with the training-set mean (subtract from the test set first)
x_test = x_test - np.mean(x_train)
x_train = x_train - np.mean(x_train)
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Schedule the learning rate: multiply by 0.1 every 300 epochs
def scheduler(epoch):
    if epoch % 300 == 0 and epoch != 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr * 0.1)
        print("lr changed to {}".format(lr * 0.1))
    return K.get_value(model.optimizer.lr)

# An adaptively parametric rectifier linear unit (APReLU)
def aprelu(inputs):
    # get the number of channels
    channels = inputs.get_shape().as_list()[-1]
    # get a zero feature map
    zeros_input = keras.layers.subtract([inputs, inputs])
    # get a feature map with only positive features
    pos_input = Activation('relu')(inputs)
    # get a feature map with only negative features
    neg_input = Minimum()([inputs, zeros_input])
    # define a network to obtain the scaling coefficients
    scales_p = Lambda(cal_mean)(GlobalAveragePooling2D()(pos_input))
    scales_n = Lambda(cal_mean)(GlobalAveragePooling2D()(neg_input))
    scales = Concatenate()([scales_n, scales_p])
    scales = Dense(2, activation='linear', kernel_initializer='he_normal', kernel_regularizer=l2(1e-4))(scales)
    scales = BatchNormalization(momentum=0.9, gamma_regularizer=l2(1e-4))(scales)
    scales = Activation('relu')(scales)
    scales = Dense(1, activation='linear', kernel_initializer='he_normal', kernel_regularizer=l2(1e-4))(scales)
    scales = BatchNormalization(momentum=0.9, gamma_regularizer=l2(1e-4))(scales)
    scales = Activation('sigmoid')(scales)
    scales = Reshape((1, 1, 1))(scales)
    # apply a parametric ReLU: the learned scale weights only the negative part
    neg_part = keras.layers.multiply([scales, neg_input])
    return keras.layers.add([pos_input, neg_part])

# Residual Block
def residual_block(incoming, nb_blocks, out_channels, downsample=False,
                   downsample_strides=2):

    residual = incoming
    in_channels = incoming.get_shape().as_list()[-1]

    for i in range(nb_blocks):

        identity = residual

        if not downsample:
            downsample_strides = 1

        residual = BatchNormalization(momentum=0.9, gamma_regularizer=l2(1e-4))(residual)
        residual = aprelu(residual)
        residual = Conv2D(out_channels, 3, strides=(downsample_strides, downsample_strides),
                          padding='same', kernel_initializer='he_normal',
                          kernel_regularizer=l2(1e-4))(residual)

        residual = BatchNormalization(momentum=0.9, gamma_regularizer=l2(1e-4))(residual)
        residual = aprelu(residual)
        residual = Conv2D(out_channels, 3, padding='same', kernel_initializer='he_normal',
                          kernel_regularizer=l2(1e-4))(residual)

        # Downsampling
        if downsample_strides > 1:
            identity = AveragePooling2D(pool_size=(1, 1), strides=(2, 2))(identity)

        # Zero-padding to match channels
        if in_channels != out_channels:
            zeros_identity = keras.layers.subtract([identity, identity])
            identity = keras.layers.concatenate([identity, zeros_identity])
            in_channels = out_channels

        residual = keras.layers.add([residual, identity])

    return residual


# Define and train a model
inputs = Input(shape=(32, 32, 3))
net = Conv2D(64, 3, padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(1e-4))(inputs)
net = residual_block(net, 20, 64, downsample=False)
net = residual_block(net, 1, 128, downsample=True)
net = residual_block(net, 19, 128, downsample=False)
net = residual_block(net, 1, 256, downsample=True)
net = residual_block(net, 19, 256, downsample=False)
net = BatchNormalization(momentum=0.9, gamma_regularizer=l2(1e-4))(net)
net = aprelu(net)
net = GlobalAveragePooling2D()(net)
outputs = Dense(10, activation='softmax', kernel_initializer='he_normal', kernel_regularizer=l2(1e-4))(net)
model = Model(inputs=inputs, outputs=outputs)
sgd = optimizers.SGD(lr=0.1, decay=0., momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

# Data augmentation
datagen = ImageDataGenerator(
    # randomly rotate images in the range (degrees, 0 to 30)
    rotation_range=30,
    # range for random zoom
    zoom_range=0.2,
    # shear angle in counter-clockwise direction in degrees
    shear_range=30,
    # randomly flip images
    horizontal_flip=True,
    # randomly shift images horizontally
    width_shift_range=0.125,
    # randomly shift images vertically
    height_shift_range=0.125)

reduce_lr = LearningRateScheduler(scheduler)
# Fit the model on the batches generated by datagen.flow()
model.fit_generator(datagen.flow(x_train, y_train, batch_size=100),
                    validation_data=(x_test, y_test), epochs=1000,
                    verbose=1, callbacks=[reduce_lr], workers=4)

# Get results
K.set_learning_phase(0)
DRSN_train_score = model.evaluate(x_train, y_train, batch_size=100, verbose=0)
print('Train loss:', DRSN_train_score[0])
print('Train accuracy:', DRSN_train_score[1])
DRSN_test_score = model.evaluate(x_test, y_test, batch_size=100, verbose=0)
print('Test loss:', DRSN_test_score[0])
print('Test accuracy:', DRSN_test_score[1])

# Results:
# Train loss: 0.08789637273550034
# Train accuracy: 0.999960000038147
# Test loss: 0.25383884519338606
# Test accuracy: 0.9577000015974044
--------------------------------------------------------------------------------