├── README.md
├── imdb_lstm_test.py
└── lazyoptimizer.py

/README.md:
--------------------------------------------------------------------------------
## Keras Implementation of Lazy Optimizer

By subclassing Keras's `Optimizer` class and wrapping an existing optimizer, this produces a corresponding lazy optimizer.

A word is treated as sampled in the current batch if and only if the gradient of its embedding row is not all zeros.

### Usage

Just replace your original momentum-based optimizer, e.g. `Adam(1e-3)`, with `LazyOptimizer(Adam(1e-3), embedding_layers)`.

See `imdb_lstm_test.py` for a complete example.
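
In short, keep a reference to each `Embedding` layer and pass those layers to the wrapper at compile time. A minimal sketch, condensed from `imdb_lstm_test.py` below (the hyperparameters are illustrative):

```python
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from keras.optimizers import Adam

from lazyoptimizer import LazyOptimizer

model = Sequential()
embedding_layer = Embedding(20000, 128)  # keep a handle on the layer
model.add(embedding_layer)
model.add(LSTM(128))
model.add(Dense(1, activation='sigmoid'))

model.compile(
    loss='binary_crossentropy',
    # wrap the base optimizer and list the layers to update lazily
    optimizer=LazyOptimizer(Adam(1e-3), [embedding_layer]),
    metrics=['accuracy'],
)
```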

## Contact

QQ group: 67729435. For the WeChat group, add the bot account spaces_ac_cn on WeChat.

--------------------------------------------------------------------------------
/imdb_lstm_test.py:
--------------------------------------------------------------------------------
'''Trains an LSTM model on the IMDB sentiment classification task.
The dataset is actually too small for an LSTM to be of any advantage
compared to simpler, much faster methods such as TF-IDF + LogReg.
**Notes**
- RNNs are tricky. Choice of batch size is important,
choice of loss and optimizer is critical, etc.
Some configurations won't converge.
- LSTM loss decrease patterns during training can be quite different
from what you see with CNNs/MLPs/etc.
- With Adam(1e-3) the best accuracy is about 83.7%; with
LazyOptimizer(Adam(1e-3), [embedding_layer]) it is about 84.9%.
'''
from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
from keras.optimizers import Adam

from lazyoptimizer import LazyOptimizer

max_features = 20000
# cut texts after this number of words (among the top max_features most common words)
maxlen = 80
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()
embedding_layer = Embedding(max_features, 128)
model.add(embedding_layer)
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer=LazyOptimizer(Adam(1e-3), [embedding_layer]),
              # optimizer=Adam(1e-3),
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))

--------------------------------------------------------------------------------
/lazyoptimizer.py:
--------------------------------------------------------------------------------
#! -*- coding: utf-8 -*-

from keras.optimizers import Optimizer
import keras.backend as K


class LazyOptimizer(Optimizer):
    """Wraps an existing Keras optimizer (by subclassing Optimizer)
    to produce a corresponding lazy optimizer.
    (Not only LazyAdam: any momentum-based optimizer, e.g. SGD with
    momentum, can be given a lazy counterpart this way.)

    # Arguments
        optimizer: a Keras optimizer instance (all Keras optimizers
            currently available are supported);
        embedding_layers: all Embedding layers you want to update sparsely.

    # Returns
        A new Keras optimizer.
    """
    def __init__(self, optimizer, embedding_layers=None, **kwargs):
        super(LazyOptimizer, self).__init__(**kwargs)
        self.optimizer = optimizer
        self.embeddings = []
        if embedding_layers is not None:
            for l in embedding_layers:
                self.embeddings.append(
                    l.trainable_weights[0]
                )
        with K.name_scope(self.__class__.__name__):
            # copy the wrapped optimizer's hyperparameters onto the wrapper
            for attr in self.optimizer.get_config():
                if not hasattr(self, attr):
                    value = getattr(self.optimizer, attr)
                    setattr(self, attr, value)
        self.optimizer.get_gradients = self.get_gradients
        self._cache_grads = {}

    def get_gradients(self, loss, params):
        """Cache the gradients to avoid recomputing them
        (get_updates requests the same gradients several times).
        """
        _params = []
        for p in params:
            if (loss, p) not in self._cache_grads:
                _params.append(p)
        _grads = super(LazyOptimizer, self).get_gradients(loss, _params)
        for p, g in zip(_params, _grads):
            self._cache_grads[(loss, p)] = g
        return [self._cache_grads[(loss, p)] for p in params]

    def get_updates(self, loss, params):
        # Only for initialization (the returned updates are discarded here).
        self.optimizer.get_updates(loss, params)
        # Common (dense) updates.
        dense_params = [p for p in params if p not in self.embeddings]
        self.updates = self.optimizer.get_updates(loss, dense_params)
        # Sparse updates for the embedding matrices.
        sparse_params = self.embeddings
        sparse_grads = self.get_gradients(loss, sparse_params)
        sparse_flags = [
            K.any(K.not_equal(g, 0), axis=-1, keepdims=True)
            for g in sparse_grads
        ]
        original_lr = self.optimizer.lr
        for f, p in zip(sparse_flags, sparse_params):
            self.optimizer.lr = original_lr * K.cast(f, 'float32')
            # Update a row only when its gradient is not all zeros
            # (an all-zero gradient very likely means the word was not sampled).
            self.updates.extend(
                self.optimizer.get_updates(loss, [p])
            )
        self.optimizer.lr = original_lr
        return self.updates

    def get_config(self):
        # Delegate the config to the wrapped optimizer.
        config = self.optimizer.get_config()
        return config

--------------------------------------------------------------------------------
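
The core trick in `lazyoptimizer.py` can be checked in isolation: `K.any(K.not_equal(g, 0), axis=-1, keepdims=True)` yields a per-row flag that zeroes the learning rate for embedding rows whose gradient is all zeros. A minimal NumPy sketch of the same arithmetic (toy values, not part of the repository):

```python
import numpy as np

# Toy embedding gradient: an all-zero row means that word was
# (very likely) not sampled in the current batch.
grad = np.array([[0.0, 0.0],
                 [0.3, -0.1],
                 [0.0, 0.0]])                      # shape (vocab_size, dim)

# NumPy equivalent of K.any(K.not_equal(g, 0), axis=-1, keepdims=True)
flag = np.any(grad != 0, axis=-1, keepdims=True)   # shape (vocab_size, 1)

# Per-row learning rate: unsampled rows get 0, so accumulated momentum
# cannot move them this step; sampled rows keep the normal rate.
effective_lr = 1e-3 * flag.astype('float32')
print(effective_lr.ravel())                        # [0.    0.001 0.   ]
```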