├── README.md
├── imdb_lstm_test.py
└── lazyoptimizer.py

/README.md:
--------------------------------------------------------------------------------
## Keras Implementation of Lazy Optimizer

By subclassing Keras's `Optimizer` class and wrapping an existing optimizer, this produces a corresponding lazy optimizer.

A word is treated as sampled in the current batch if and only if the gradient of its embedding row is not all zeros.

### Usage

Just replace your original momentum-based optimizer, e.g. `Adam(1e-3)`, with `LazyOptimizer(Adam(1e-3), embedding_layers)`.

See `imdb_lstm_test.py` for a complete example.
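
In short, keep a reference to each `Embedding` layer and pass those layers to the wrapper at compile time. A minimal sketch, condensed from `imdb_lstm_test.py` below (the hyperparameters are illustrative):

```python
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from keras.optimizers import Adam

from lazyoptimizer import LazyOptimizer

model = Sequential()
embedding_layer = Embedding(20000, 128)  # keep a handle on the layer
model.add(embedding_layer)
model.add(LSTM(128))
model.add(Dense(1, activation='sigmoid'))

model.compile(
    loss='binary_crossentropy',
    # wrap the base optimizer and list the layers to update lazily
    optimizer=LazyOptimizer(Adam(1e-3), [embedding_layer]),
    metrics=['accuracy'],
)
```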

## Contact

QQ group: 67729435. For the WeChat group, add the bot account spaces_ac_cn on WeChat.

--------------------------------------------------------------------------------
/imdb_lstm_test.py:
--------------------------------------------------------------------------------
'''Trains an LSTM model on the IMDB sentiment classification task.
The dataset is actually too small for an LSTM to be of any advantage
compared to simpler, much faster methods such as TF-IDF + LogReg.
**Notes**
- RNNs are tricky. Choice of batch size is important,
choice of loss and optimizer is critical, etc.
Some configurations won't converge.
- LSTM loss decrease patterns during training can be quite different
from what you see with CNNs/MLPs/etc.
- With Adam(1e-3) the best accuracy is about 83.7%; with
LazyOptimizer(Adam(1e-3), [embedding_layer]) it is about 84.9%.
'''
from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
from keras.optimizers import Adam

from lazyoptimizer import LazyOptimizer

max_features = 20000
# cut texts after this number of words (among the top max_features most common words)
maxlen = 80
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()
embedding_layer = Embedding(max_features, 128)
model.add(embedding_layer)
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer=LazyOptimizer(Adam(1e-3), [embedding_layer]),
              # optimizer=Adam(1e-3),
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))

--------------------------------------------------------------------------------
/lazyoptimizer.py:
--------------------------------------------------------------------------------
#! -*- coding: utf-8 -*-

from keras.optimizers import Optimizer
import keras.backend as K


class LazyOptimizer(Optimizer):
    """Wraps an existing Keras optimizer (by subclassing Optimizer)
    to produce a corresponding lazy optimizer.
    (Not only LazyAdam: any momentum-based optimizer, e.g. SGD with
    momentum, can be given a lazy counterpart this way.)

    # Arguments
        optimizer: a Keras optimizer instance (all Keras optimizers
            currently available are supported);
        embedding_layers: all Embedding layers you want to update sparsely.

    # Returns
        A new Keras optimizer.
    """
    def __init__(self, optimizer, embedding_layers=None, **kwargs):
        super(LazyOptimizer, self).__init__(**kwargs)
        self.optimizer = optimizer
        self.embeddings = []
        if embedding_layers is not None:
            for l in embedding_layers:
                self.embeddings.append(
                    l.trainable_weights[0]
                )
        with K.name_scope(self.__class__.__name__):
            # copy the wrapped optimizer's hyperparameters onto the wrapper
            for attr in self.optimizer.get_config():
                if not hasattr(self, attr):
                    value = getattr(self.optimizer, attr)
                    setattr(self, attr, value)
        self.optimizer.get_gradients = self.get_gradients
        self._cache_grads = {}

    def get_gradients(self, loss, params):
        """Cache the gradients to avoid recomputing them
        (get_updates requests the same gradients several times).
        """
        _params = []
        for p in params:
            if (loss, p) not in self._cache_grads:
                _params.append(p)
        _grads = super(LazyOptimizer, self).get_gradients(loss, _params)
        for p, g in zip(_params, _grads):
            self._cache_grads[(loss, p)] = g
        return [self._cache_grads[(loss, p)] for p in params]

    def get_updates(self, loss, params):
        # Only for initialization (the returned updates are discarded here).
        self.optimizer.get_updates(loss, params)
        # Common (dense) updates.
        dense_params = [p for p in params if p not in self.embeddings]
        self.updates = self.optimizer.get_updates(loss, dense_params)
        # Sparse updates for the embedding matrices.
        sparse_params = self.embeddings
        sparse_grads = self.get_gradients(loss, sparse_params)
        sparse_flags = [
            K.any(K.not_equal(g, 0), axis=-1, keepdims=True)
            for g in sparse_grads
        ]
        original_lr = self.optimizer.lr
        for f, p in zip(sparse_flags, sparse_params):
            self.optimizer.lr = original_lr * K.cast(f, 'float32')
            # Update a row only when its gradient is not all zeros
            # (an all-zero gradient very likely means the word was not sampled).
            self.updates.extend(
                self.optimizer.get_updates(loss, [p])
            )
        self.optimizer.lr = original_lr
        return self.updates

    def get_config(self):
        # Delegate the config to the wrapped optimizer.
        config = self.optimizer.get_config()
        return config

--------------------------------------------------------------------------------
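
The core trick in `lazyoptimizer.py` can be checked in isolation: `K.any(K.not_equal(g, 0), axis=-1, keepdims=True)` yields a per-row flag that zeroes the learning rate for embedding rows whose gradient is all zeros. A minimal NumPy sketch of the same arithmetic (toy values, not part of the repository):

```python
import numpy as np

# Toy embedding gradient: an all-zero row means that word was
# (very likely) not sampled in the current batch.
grad = np.array([[0.0, 0.0],
                 [0.3, -0.1],
                 [0.0, 0.0]])                      # shape (vocab_size, dim)

# NumPy equivalent of K.any(K.not_equal(g, 0), axis=-1, keepdims=True)
flag = np.any(grad != 0, axis=-1, keepdims=True)   # shape (vocab_size, 1)

# Per-row learning rate: unsampled rows get 0, so accumulated momentum
# cannot move them this step; sampled rows keep the normal rate.
effective_lr = 1e-3 * flag.astype('float32')
print(effective_lr.ravel())                        # [0.    0.001 0.   ]
```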