├── README.md
├── imdb_lstm_test.py
└── lazyoptimizer.py
/README.md:
--------------------------------------------------------------------------------
1 | ## Keras Implementation of Lazy Optimizers
2 |
3 | The Optimizer class is subclassed to wrap an existing optimizer and produce the corresponding lazy optimizer.
4 |
5 | Whether a word was sampled in the current batch is determined by checking whether its embedding gradient is all zeros or not (see the second sketch below).
6 |
7 | ### Usage
8 | Just replace your original with-momentum optimizer, e.g. `Adam(1e-3)`, with `LazyOptimizer(Adam(1e-3), embedding_layers)`.
9 |
10 | See imdb_lstm_test.py for a complete example; a minimal sketch follows below.
11 |
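12 | For illustration, here is a minimal sketch of that replacement. The toy model and hyperparameters below are only placeholders (they mirror imdb_lstm_test.py but are not code from this repo):
13 | 
14 | ```python
15 | from keras.models import Sequential
16 | from keras.layers import Embedding, LSTM, Dense
17 | from keras.optimizers import Adam
18 | from lazyoptimizer import LazyOptimizer
19 | 
20 | embedding_layer = Embedding(20000, 128)   # the layer to be updated lazily
21 | model = Sequential([embedding_layer,
22 |                     LSTM(128),
23 |                     Dense(1, activation='sigmoid')])
24 | 
25 | # wrap the with-momentum optimizer and list the Embedding layers whose
26 | # rows should only be updated when they appear in the current batch
27 | model.compile(loss='binary_crossentropy',
28 |               optimizer=LazyOptimizer(Adam(1e-3), [embedding_layer]),
29 |               metrics=['accuracy'])
30 | ```
31 | 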
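32 | Internally, `lazyoptimizer.py` turns the all-zeros check into a 0/1 mask on the wrapped optimizer's learning rate. The following self-contained sketch, with made-up gradient values, illustrates just that masking step:
33 | 
34 | ```python
35 | import numpy as np
36 | import keras.backend as K
37 | 
38 | # toy gradient of a 5-word, 3-dimensional embedding matrix:
39 | # only rows 1 and 3 belong to words that appeared in the batch
40 | g = K.constant(np.array([[0., 0., 0.],
41 |                          [.1, -.2, .3],
42 |                          [0., 0., 0.],
43 |                          [.5, 0., 0.],
44 |                          [0., 0., 0.]]))
45 | 
46 | # a word counts as sampled iff any entry of its gradient row is non-zero
47 | sampled = K.any(K.not_equal(g, 0), axis=-1, keepdims=True)   # shape (5, 1)
48 | 
49 | # LazyOptimizer multiplies the wrapped optimizer's learning rate by this
50 | # 0/1 mask, so the update applied to the unsampled rows is exactly zero
51 | masked_lr = 1e-3 * K.cast(sampled, 'float32')
52 | print(K.eval(masked_lr))   # rows 1 and 3 keep lr=1e-3, the others get 0
53 | ```
54 | 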
55 | ## Contact
56 | QQ discussion group: 67729435; for the WeChat group, add the bot account spaces_ac_cn
57 | 
--------------------------------------------------------------------------------
/imdb_lstm_test.py:
--------------------------------------------------------------------------------
1 | '''
2 | Trains an LSTM model on the IMDB sentiment classification task.
3 | The dataset is actually too small for LSTM to be of any advantage
4 | compared to simpler, much faster methods such as TF-IDF + LogReg.
5 | **Notes**
6 | - RNNs are tricky. Choice of batch size is important,
7 | choice of loss and optimizer is critical, etc.
8 | Some configurations won't converge.
9 | - LSTM loss decrease patterns during training can be quite different
10 | from what you see with CNNs/MLPs/etc.
11 | - With Adam(1e-3) the best test accuracy is about 83.7%; with
12 | LazyOptimizer(Adam(1e-3), [embedding_layer]) it is about 84.9%.
13 | '''
14 | from __future__ import print_function
15 |
16 | from keras.preprocessing import sequence
17 | from keras.models import Sequential
18 | from keras.layers import Dense, Embedding
19 | from keras.layers import LSTM
20 | from keras.datasets import imdb
21 | from lazyoptimizer import LazyOptimizer
22 | from keras.optimizers import Adam
23 |
24 | max_features = 20000
25 | # cut texts after this number of words (among top max_features most common words)
26 | maxlen = 80
27 | batch_size = 32
28 |
29 | print('Loading data...')
30 | (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
31 | print(len(x_train), 'train sequences')
32 | print(len(x_test), 'test sequences')
33 |
34 | print('Pad sequences (samples x time)')
35 | x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
36 | x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
37 | print('x_train shape:', x_train.shape)
38 | print('x_test shape:', x_test.shape)
39 |
40 | print('Build model...')
41 | model = Sequential()
42 | embedding_layer = Embedding(max_features, 128)
43 | model.add(embedding_layer)
44 | model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
45 | model.add(Dense(1, activation='sigmoid'))
46 |
47 | # try using different optimizers and different optimizer configs
48 | model.compile(loss='binary_crossentropy',
49 | optimizer=LazyOptimizer(Adam(1e-3), [embedding_layer]),
50 | # optimizer=Adam(1e-3),
51 | metrics=['accuracy'])
52 |
53 | print('Train...')
54 | model.fit(x_train, y_train,
55 | batch_size=batch_size,
56 | epochs=15,
57 | validation_data=(x_test, y_test))
58 |
--------------------------------------------------------------------------------
/lazyoptimizer.py:
--------------------------------------------------------------------------------
1 | #! -*- coding: utf-8 -*-
2 |
3 | from keras.optimizers import Optimizer
4 | import keras.backend as K
5 |
6 |
7 | class LazyOptimizer(Optimizer):
8 | """Inheriting Optimizer class, wrapping the original optimizer
9 | to achieve a new corresponding lazy optimizer.
10 | (Not only LazyAdam, but also LazySGD with momentum if you like.)
11 | # Arguments
12 | optimizer: a Keras optimizer instance (all optimizers
13 | currently available in Keras are supported);
14 | embedding_layers: all Embedding layers you want to update sparsely.
15 | # Returns
16 | a new keras optimizer.
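17 | # Example
18 | (illustrative; `model` and `embedding_layer` stand for your own model and one of its Embedding layers)
19 | lazy_adam = LazyOptimizer(Adam(1e-3), [embedding_layer])
20 | model.compile(loss='binary_crossentropy', optimizer=lazy_adam)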
24 | """
25 | def __init__(self, optimizer, embedding_layers=None, **kwargs):
26 | super(LazyOptimizer, self).__init__(**kwargs)
27 | self.optimizer = optimizer
28 | self.embeddings = []
29 | if embedding_layers is not None:
30 | for l in embedding_layers:
31 | self.embeddings.append(
32 | l.trainable_weights[0]
33 | )
34 | with K.name_scope(self.__class__.__name__):
35 | for attr in self.optimizer.get_config():
36 | if not hasattr(self, attr):
37 | value = getattr(self.optimizer, attr)
38 | setattr(self, attr, value)
39 | self.optimizer.get_gradients = self.get_gradients
40 | self._cache_grads = {}
41 | def get_gradients(self, loss, params):
42 | """Cache the gradients to avoiding recalculating.
43 | 把梯度缓存起来,避免重复计算,提高效率。
44 | """
45 | _params = []
46 | for p in params:
47 | if (loss, p) not in self._cache_grads:
48 | _params.append(p)
49 | _grads = super(LazyOptimizer, self).get_gradients(loss, _params)
50 | for p, g in zip(_params, _grads):
51 | self._cache_grads[(loss, p)] = g
52 | return [self._cache_grads[(loss, p)] for p in params]
53 | def get_updates(self, loss, params):
54 | # Only for initialization of the wrapped optimizer's state
55 | self.optimizer.get_updates(loss, params)
56 | # Dense updates for all non-embedding parameters
57 | dense_params = [p for p in params if p not in self.embeddings]
58 | self.updates = self.optimizer.get_updates(loss, dense_params)
59 | # Sparse updates for the embedding parameters
60 | sparse_params = self.embeddings
61 | sparse_grads = self.get_gradients(loss, sparse_params)
62 | sparse_flags = [
63 | K.any(K.not_equal(g, 0), axis=-1, keepdims=True)
64 | for g in sparse_grads
65 | ]
66 | original_lr = self.optimizer.lr
67 | for f, p in zip(sparse_flags, sparse_params):
68 | self.optimizer.lr = original_lr * K.cast(f, 'float32')
69 | # update an embedding row only when its gradient is not all zeros;
70 | # an all-zero gradient most likely means the corresponding word was
71 | # not sampled in the current batch, so its update is skipped
72 | self.updates.extend(
73 | self.optimizer.get_updates(loss, [p])
74 | )
75 | self.optimizer.lr = original_lr
76 | return self.updates
77 | def get_config(self):
78 | config = self.optimizer.get_config()
79 | return config
80 |
--------------------------------------------------------------------------------