├── Basic-idea-of-APReLU.png
├── ResNet-APReLU-for-Fault-Diagnosis.pdf
├── README.md
├── ResNet_APReLU_TFLearn.py
└── ResNet_APReLU_Keras_Cifar10.py

/Basic-idea-of-APReLU.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhao62/Adaptively-Parametric-ReLU/HEAD/Basic-idea-of-APReLU.png
--------------------------------------------------------------------------------
/ResNet-APReLU-for-Fault-Diagnosis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhao62/Adaptively-Parametric-ReLU/HEAD/ResNet-APReLU-for-Fault-Diagnosis.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Adaptively-Parametric-ReLU

The adaptively parametric ReLU (APReLU) is a dynamic ReLU activation function that performs differently for different input samples.

Although the APReLU was originally developed for vibration-based fault diagnosis, it can also be used in other applications, such as image classification. In this repository, the APReLU is implemented twice: with TensorFlow 1.0.1 and TFLearn 0.3.2 (ResNet_APReLU_TFLearn.py, MNIST) and with TensorFlow 1.10.0 and Keras 2.2.1 (ResNet_APReLU_Keras_Cifar10.py, CIFAR-10), and applied to image classification.

![The basic idea of APReLU](https://github.com/zhao62/Adaptively-Parametric-ReLU/blob/master/Basic-idea-of-APReLU.png)

Abstract:
Vibration signals under the same health state often have large differences due to changes in operating conditions. Likewise, the differences among vibration signals under different health states can be small under some operating conditions. Traditional deep learning methods apply fixed nonlinear transformations to all input signals, which weakens the discriminative feature learning ability, i.e., the ability to project intra-class signals into the same region and inter-class signals into distant regions. To address this issue, this paper develops a new activation function, the adaptively parametric rectifier linear unit, and inserts it into deep residual networks to improve the feature learning ability, so that each input signal is trained to have its own set of nonlinear transformations. Specifically, a sub-network is inserted as an embedded module to learn the slopes used in the nonlinear transformation. The slopes are dependent on the input signal, and thereby the developed method has more flexible nonlinear transformations than traditional deep learning methods. Finally, the improved performance of the developed method in learning discriminative features has been validated through fault diagnosis applications.
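
For a feature map x, the APReLU computes max(x, 0) + α·min(x, 0), where the coefficients α ∈ (0, 1) are produced by a small fully connected sub-network fed with the global average pools of the positive and negative parts of x. The NumPy sketch below only illustrates this forward pass; the weights `w1`, `b1`, `w2`, `b2` are hypothetical placeholders for the trained sub-network, and batch normalization is omitted. The actual trainable implementations are in ResNet_APReLU_TFLearn.py and ResNet_APReLU_Keras_Cifar10.py.

```python
import numpy as np

def aprelu_forward(x, w1, b1, w2, b2):
    """Illustrative APReLU forward pass for one sample x of shape (H, W, C).

    w1/b1/w2/b2 are placeholder weights of the slope sub-network; in the
    repository scripts these are trained layers with batch normalization
    between them.
    """
    pos = np.maximum(x, 0.0)  # positive part of the input
    neg = np.minimum(x, 0.0)  # negative part of the input
    # Global average pooling of both parts gives a 2C-dimensional statistic
    stats = np.concatenate([neg.mean(axis=(0, 1)), pos.mean(axis=(0, 1))])
    hidden = np.maximum(stats @ w1 + b1, 0.0)           # FC + ReLU
    alpha = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # FC + sigmoid -> (0, 1)
    # The input-dependent slopes scale only the negative part
    return pos + alpha * neg
```

Note that the TFLearn implementation below learns one slope per channel, whereas the Keras implementation learns a single scalar slope per sample.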
Reference:
Minghang Zhao, Shisheng Zhong, Xuyun Fu, Baoping Tang, Shaojiang Dong, Michael Pecht, Deep Residual Networks with Adaptively Parametric Rectifier Linear Units for Fault Diagnosis, IEEE Transactions on Industrial Electronics, 2020, DOI: 10.1109/TIE.2020.2972458, Date of Publication: 13 February 2020.

https://ieeexplore.ieee.org/document/8998530
--------------------------------------------------------------------------------
/ResNet_APReLU_TFLearn.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Apr 10 09:52:39 2020
Implemented using TensorFlow 1.0 and TFLearn 0.3.2

Minghang Zhao, Shisheng Zhong, Xuyun Fu, Baoping Tang, Shaojiang Dong, Michael Pecht,
Deep Residual Networks with Adaptively Parametric Rectifier Linear Units for Fault Diagnosis,
IEEE Transactions on Industrial Electronics, 2020, DOI: 10.1109/TIE.2020.2972458

@author: Minghang Zhao
"""

from __future__ import division, print_function, absolute_import

import tflearn
import tensorflow as tf
from tflearn.activations import relu
from tflearn.layers.normalization import batch_normalization as bn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d, global_avg_pool
import tflearn.data_utils as du

# Data loading and preprocessing
import tflearn.datasets.mnist as mnist
X, Y, testX, testY = mnist.load_data(one_hot=True)
X = X.reshape([-1, 28, 28, 1])
testX = testX.reshape([-1, 28, 28, 1])
X, mean = du.featurewise_zero_center(X)
testX = du.featurewise_zero_center(testX, mean)

# An adaptively parametric rectifier linear unit (APReLU)
def aprelu(incoming):
    in_channels = incoming.get_shape().as_list()[-1]
    # Channel-wise spatial means of the negative and positive parts of the input
    scales_n = tf.reduce_mean(tf.reduce_mean(tf.minimum(incoming, 0), axis=2, keep_dims=True), axis=1, keep_dims=True)
    scales_p = tf.reduce_mean(tf.reduce_mean(tf.maximum(incoming, 0), axis=2, keep_dims=True), axis=1, keep_dims=True)
    scales = tf.concat([scales_n, scales_p], axis=3)
    # FC-BN-ReLU-FC-BN-sigmoid sub-network learns the per-channel slopes
    scales = fully_connected(scales, in_channels, activation='linear', regularizer='L2',
                             weight_decay=0.0001, weights_init='variance_scaling')
    scales = relu(bn(scales))
    scales = fully_connected(scales, in_channels, activation='linear', regularizer='L2',
                             weight_decay=0.0001, weights_init='variance_scaling')
    scales = tflearn.activations.sigmoid(bn(scales))
    scales = tf.expand_dims(tf.expand_dims(scales, axis=1), axis=1)
    # max(x, 0) + scales * min(x, 0), using (x - |x|) / 2 = min(x, 0)
    return tf.maximum(incoming, 0) + tf.multiply(scales, (incoming - tf.abs(incoming))) * 0.5

# A residual building block with APReLUs
def res_block_aprelu(incoming, nb_blocks, out_channels, downsample=False,
                     downsample_strides=2, batch_norm=True,
                     bias=True, weights_init='variance_scaling',
                     bias_init='zeros', regularizer='L2', weight_decay=0.0001,
                     trainable=True, restore=True, reuse=False, scope=None,
                     name="ResidualBlock"):

    resnet = incoming
    in_channels = incoming.get_shape().as_list()[-1]

    # Variable Scope fix for older TF
    try:
        vscope = tf.variable_scope(scope, default_name=name, values=[incoming],
                                   reuse=reuse)
    except Exception:
        vscope = tf.variable_op_scope([incoming], scope, name, reuse=reuse)

    with vscope as scope:
        name = scope.name  # TODO

        for i in range(nb_blocks):

            identity = resnet

            if not downsample:
                downsample_strides = 1

            if batch_norm:
                resnet = bn(resnet)
            resnet = aprelu(resnet)
            resnet = conv_2d(resnet, out_channels, 3,
                             downsample_strides, 'same', 'linear',
                             bias, weights_init, bias_init,
                             regularizer, weight_decay,
                             trainable, restore)

            if batch_norm:
                resnet = bn(resnet)
            resnet = aprelu(resnet)
            resnet = conv_2d(resnet, out_channels, 3, 1, 'same',
                             'linear', bias, weights_init,
                             bias_init, regularizer, weight_decay,
                             trainable, restore)

            # Downsampling
            if downsample_strides > 1:
                identity = tflearn.avg_pool_2d(identity, 1, downsample_strides)

            # Projection to new dimension
            if in_channels != out_channels:
                if (out_channels - in_channels) % 2 == 0:
                    ch = (out_channels - in_channels) // 2
                    identity = tf.pad(identity,
                                      [[0, 0], [0, 0], [0, 0], [ch, ch]])
                else:
                    ch = (out_channels - in_channels) // 2
                    identity = tf.pad(identity,
                                      [[0, 0], [0, 0], [0, 0], [ch, ch + 1]])
                in_channels = out_channels

            resnet = resnet + identity

    return resnet

# Real-time data preprocessing
img_prep = tflearn.ImagePreprocessing()
img_prep.add_featurewise_zero_center(per_channel=True)

# Building Residual Network
net = input_data(shape=[None, 28, 28, 1],
                 data_preprocessing=img_prep)
net = conv_2d(net, 16, 3, regularizer='L2', weight_decay=0.0001)
net = res_block_aprelu(net, 1, 16)
net = res_block_aprelu(net, 1, 32, downsample=True)
net = res_block_aprelu(net, 1, 32)
net = aprelu(bn(net))
net = global_avg_pool(net)
# Regression
net = fully_connected(net, 10, activation='softmax')
mom = tflearn.Momentum(0.1, lr_decay=0.1, decay_step=2000, staircase=True)
net = tflearn.regression(net, optimizer=mom, loss='categorical_crossentropy')
# Training
model = tflearn.DNN(net, checkpoint_path='model_resnet_mnist',
                    max_checkpoints=10, tensorboard_verbose=0,
                    clip_gradients=0.)

model.fit(X, Y, n_epoch=10, snapshot_epoch=False,
          snapshot_step=500, show_metric=True, batch_size=100,
          shuffle=True, run_id='resnet_mnist')

training_acc = model.evaluate(X, Y)[0]
validation_acc = model.evaluate(testX, testY)[0]
--------------------------------------------------------------------------------
/ResNet_APReLU_Keras_Cifar10.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Apr 14 04:17:45 2020
Implemented using TensorFlow 1.10.0 and Keras 2.2.1

Minghang Zhao, Shisheng Zhong, Xuyun Fu, Baoping Tang, Shaojiang Dong, Michael Pecht,
Deep Residual Networks with Adaptively Parametric Rectifier Linear Units for Fault Diagnosis,
IEEE Transactions on Industrial Electronics, DOI: 10.1109/TIE.2020.2972458,
Date of Publication: 13 February 2020

@author: Minghang Zhao
"""

from __future__ import print_function
import keras
import numpy as np
from keras.datasets import cifar10
from keras.layers import Dense, Conv2D, BatchNormalization, Activation, Minimum, Lambda
from keras.layers import AveragePooling2D, Input, GlobalAveragePooling2D, Concatenate, Reshape
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import LearningRateScheduler
K.set_learning_phase(1)

def cal_mean(inputs):
    outputs = K.mean(inputs, axis=1, keepdims=True)
    return outputs

# The data, split between train and test sets
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
# Zero-center with the training-set mean (subtract from the test set first)
x_test = x_test - np.mean(x_train)
x_train = x_train - np.mean(x_train)
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Schedule the learning rate: multiply by 0.1 every 300 epochs
def scheduler(epoch):
    if epoch % 300 == 0 and epoch != 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr * 0.1)
        print("lr changed to {}".format(lr * 0.1))
    return K.get_value(model.optimizer.lr)

# An adaptively parametric rectifier linear unit (APReLU)
def aprelu(inputs):
    # get the number of channels
    channels = inputs.get_shape().as_list()[-1]
    # get a zero feature map
    zeros_input = keras.layers.subtract([inputs, inputs])
    # get a feature map with only positive features
    pos_input = Activation('relu')(inputs)
    # get a feature map with only negative features
    neg_input = Minimum()([inputs, zeros_input])
    # define a network to obtain the scaling coefficients
    scales_p = Lambda(cal_mean)(GlobalAveragePooling2D()(pos_input))
    scales_n = Lambda(cal_mean)(GlobalAveragePooling2D()(neg_input))
    scales = Concatenate()([scales_n, scales_p])
    scales = Dense(2, activation='linear', kernel_initializer='he_normal', kernel_regularizer=l2(1e-4))(scales)
    scales = BatchNormalization(momentum=0.9, gamma_regularizer=l2(1e-4))(scales)
    scales = Activation('relu')(scales)
    scales = Dense(1, activation='linear', kernel_initializer='he_normal', kernel_regularizer=l2(1e-4))(scales)
    scales = BatchNormalization(momentum=0.9, gamma_regularizer=l2(1e-4))(scales)
    scales = Activation('sigmoid')(scales)
    scales = Reshape((1, 1, 1))(scales)
    # apply a parametric ReLU: the learned scale weights only the negative part
    neg_part = keras.layers.multiply([scales, neg_input])
    return keras.layers.add([pos_input, neg_part])

# Residual Block
def residual_block(incoming, nb_blocks, out_channels, downsample=False,
                   downsample_strides=2):

    residual = incoming
    in_channels = incoming.get_shape().as_list()[-1]

    for i in range(nb_blocks):

        identity = residual

        if not downsample:
            downsample_strides = 1

        residual = BatchNormalization(momentum=0.9, gamma_regularizer=l2(1e-4))(residual)
        residual = aprelu(residual)
        residual = Conv2D(out_channels, 3, strides=(downsample_strides, downsample_strides),
                          padding='same', kernel_initializer='he_normal',
                          kernel_regularizer=l2(1e-4))(residual)

        residual = BatchNormalization(momentum=0.9, gamma_regularizer=l2(1e-4))(residual)
        residual = aprelu(residual)
        residual = Conv2D(out_channels, 3, padding='same', kernel_initializer='he_normal',
                          kernel_regularizer=l2(1e-4))(residual)

        # Downsampling
        if downsample_strides > 1:
            identity = AveragePooling2D(pool_size=(1, 1), strides=(2, 2))(identity)

        # Zero-padding to match channels
        if in_channels != out_channels:
            zeros_identity = keras.layers.subtract([identity, identity])
            identity = keras.layers.concatenate([identity, zeros_identity])
            in_channels = out_channels

        residual = keras.layers.add([residual, identity])

    return residual


# Define and train a model
inputs = Input(shape=(32, 32, 3))
net = Conv2D(64, 3, padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(1e-4))(inputs)
net = residual_block(net, 20, 64, downsample=False)
net = residual_block(net, 1, 128, downsample=True)
net = residual_block(net, 19, 128, downsample=False)
net = residual_block(net, 1, 256, downsample=True)
net = residual_block(net, 19, 256, downsample=False)
net = BatchNormalization(momentum=0.9, gamma_regularizer=l2(1e-4))(net)
net = aprelu(net)
net = GlobalAveragePooling2D()(net)
outputs = Dense(10, activation='softmax', kernel_initializer='he_normal', kernel_regularizer=l2(1e-4))(net)
model = Model(inputs=inputs, outputs=outputs)
sgd = optimizers.SGD(lr=0.1, decay=0., momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

# Data augmentation
datagen = ImageDataGenerator(
    # randomly rotate images in the range (degrees, 0 to 30)
    rotation_range=30,
    # range for random zoom
    zoom_range=0.2,
    # shear angle in counter-clockwise direction in degrees
    shear_range=30,
    # randomly flip images
    horizontal_flip=True,
    # randomly shift images horizontally
    width_shift_range=0.125,
    # randomly shift images vertically
    height_shift_range=0.125)

reduce_lr = LearningRateScheduler(scheduler)
# Fit the model on the batches generated by datagen.flow()
model.fit_generator(datagen.flow(x_train, y_train, batch_size=100),
                    validation_data=(x_test, y_test), epochs=1000,
                    verbose=1, callbacks=[reduce_lr], workers=4)

# Get results
K.set_learning_phase(0)
DRSN_train_score = model.evaluate(x_train, y_train, batch_size=100, verbose=0)
print('Train loss:', DRSN_train_score[0])
print('Train accuracy:', DRSN_train_score[1])
DRSN_test_score = model.evaluate(x_test, y_test, batch_size=100, verbose=0)
print('Test loss:', DRSN_test_score[0])
print('Test accuracy:', DRSN_test_score[1])

# Results:
# Train loss: 0.08789637273550034
# Train accuracy: 0.999960000038147
# Test loss: 0.25383884519338606
# Test accuracy: 0.9577000015974044
--------------------------------------------------------------------------------