├── .gitignore ├── A_learning_notes ├── .ipynb_checkpoints │ ├── Q&A-checkpoint.ipynb │ └── generate_process-checkpoint.ipynb ├── Q&A.ipynb └── generate_process.ipynb ├── LICENSE ├── README.md ├── backbone ├── __init__.py ├── basic_backbone.py ├── mixnet18.py ├── mobilenet_v2.py ├── resnet18.py ├── resnet18_v2.py └── resnext.py ├── configs.py ├── dataset ├── __init__.py ├── dataset_util.py ├── file_util.py ├── test_sample │ ├── label.txt │ └── 鲁BC6T76.jpg └── tfrecord_util.py ├── images ├── GHM-insight.jpg ├── focal-loss.jpg ├── mixnet-18.svg ├── mobilenet-v2.svg ├── resnet-18-v2.svg ├── resnet-18.svg └── resnext-18.svg ├── multi_label ├── __init__.py ├── multi_label_loss.py ├── multi_label_model.py └── trainer.py ├── requirements.txt ├── run.py └── utils ├── __init__.py ├── check_label_file.py ├── draw_tools.py ├── logger_callback.py └── radam.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/ 2 | **/*.py[cod] 3 | logs/ 4 | models/ -------------------------------------------------------------------------------- /A_learning_notes/.ipynb_checkpoints/Q&A-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true, 7 | "pycharm": { 8 | "name": "#%% md\n" 9 | } 10 | }, 11 | "source": [ 12 | "### Q. 在编译模型阶段,定义了多输出loss函数和权重,但在训练阶段,打印的loss却不等于各个loss的加权和\n", 13 | "```\n", 14 | "model.compile(loss=my_loss, optimizer='adam', loss_weights=[0.5, 0.5])\n", 15 | "model.fit(dataset, epochs=2, steps_per_epoch=2, verbose=1)\n", 16 | "```\n", 17 | "输出:loss(21.9610) != 0.5 * 1.3583 - 0.5 * 1.5867\n", 18 | "```\n", 19 | "Epoch 1/2\n", 20 | "1/2 [==============>...............] - ETA: 0s - loss: 21.9610 - dense_4_loss: 1.3583 - dense_5_loss: 1.5867\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n", 21 | "2/2 [==============================] - 0s 140ms/step - loss: 22.1960 - dense_4_loss: 1.9502 - dense_5_loss: 1.5183\n", 22 | "Epoch 2/2\n", 23 | "1/2 [==============>...............] - ETA: 0s - loss: 21.8526 - dense_4_loss: 1.3555 - dense_5_loss: 1.5861\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n", 24 | "2/2 [==============================] - 0s 1ms/step - loss: 22.0872 - dense_4_loss: 1.9470 - dense_5_loss: 1.5171\n", 25 | "```\n", 26 | "\n", 27 | "\n", 28 | "**Answer**: 因为总的loss中包含了权重正则化损失部分:\n", 29 | "```\n", 30 | "def build_net(input_tensor):\n", 31 | " out1 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 32 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 33 | " out2 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 34 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 35 | " return [out1, out2]\n", 36 | "```\n" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "---" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### Q. 
在将`.ckpt.index` + `.ckpt.data` 模型转为`pb`的时候,为什么还要先保存为`h5`,然后再加载模型,再保存为`pb`?\n", 51 | "\n", 52 | "\n", 53 | "**Answer**: 因为原来保存为`.ckpt.index` + `.ckpt.data` 的时候没有保存图信息,加载也只加载权重信息:\n", 54 | "```\n", 55 | "model.load_weights(latest)\n", 56 | "...\n", 57 | "cp_callback = ModelCheckpoint(path, save_weights_only=True, period=ckpt_period)\n", 58 | "```\n", 59 | "导致`keras.backend.get_session().graph.as_graph_def()`没有图结构信息。\n", 60 | "(理论上我是构建了网络图模型,然后再加载权重的,所以应该也得有图结构信息,但实际上没有)\n", 61 | "所以需要将模型完全保存为`h5`(包含图信息),然后重新加载进来,再保存为`pb`:\n", 62 | "```\n", 63 | "model.save(h5_path, overwrite=True, include_optimizer=False)\n", 64 | "model = keras.models.load_model(h5_path)\n", 65 | "...\n", 66 | "graph = tf.graph_util.remove_training_nodes(sess.graph.as_graph_def())\n", 67 | "graph_frozen = tf.graph_util.convert_variables_to_constants(sess, graph, output_names)\n", 68 | "tf.train.write_graph(graph_frozen, pb_model_dir, pb_model_name, as_text=False)\n", 69 | "```\n" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "---" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "\n", 84 | "### Q. 一直没办法用多GPU的模式运行?\n", 85 | "\n", 86 | "\n", 87 | "**Answer**: `tf.enable_eager_execution()`模型跑不了多GPU,要注释掉这句。\n" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "---" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "\n", 102 | "### Q. Jupyter Notebook运行tf.keras.Model对象训练、预测,在model.predict()时报错`CancelledError: [Op:StatefulPartitionedCall]`?\n", 103 | "\n", 104 | "**Answer**:不清楚为什么,但是如果选择 `Kernel -> Restart & Run All` 则能得到正确的结果。\n" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "---" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "### Q. 
TensorFlow自定义层有bug太难调了!!!\n", 119 | "\n", 120 | "\n", 121 | "**Answer**:目前我调试就只有几个操作:\n", 122 | "- tf.print() + with tf.control_dependencies(),打印信息;\n", 123 | "- with tf.name_scope('name'),给操作加名称,定位到出错的局部操作;\n", 124 | "- tf_debug.has_inf_or_nan,或自定义只有inf/nan,查看出现inf和nan的位置;\n", 125 | "- with tf.GradientTape(persistent=persistent) as tape,详细查看梯度。" 126 | ] 127 | } 128 | ], 129 | "metadata": { 130 | "kernelspec": { 131 | "display_name": "Python 3", 132 | "language": "python", 133 | "name": "python3" 134 | }, 135 | "language_info": { 136 | "codemirror_mode": { 137 | "name": "ipython", 138 | "version": 3 139 | }, 140 | "file_extension": ".py", 141 | "mimetype": "text/x-python", 142 | "name": "python", 143 | "nbconvert_exporter": "python", 144 | "pygments_lexer": "ipython3", 145 | "version": "3.6.5" 146 | }, 147 | "pycharm": { 148 | "stem_cell": { 149 | "cell_type": "raw", 150 | "metadata": { 151 | "collapsed": false 152 | }, 153 | "source": [] 154 | } 155 | }, 156 | "toc": { 157 | "base_numbering": 1, 158 | "nav_menu": {}, 159 | "number_sections": true, 160 | "sideBar": true, 161 | "skip_h1_title": false, 162 | "title_cell": "Table of Contents", 163 | "title_sidebar": "Contents", 164 | "toc_cell": false, 165 | "toc_position": {}, 166 | "toc_section_display": true, 167 | "toc_window_display": false 168 | } 169 | }, 170 | "nbformat": 4, 171 | "nbformat_minor": 1 172 | } 173 | -------------------------------------------------------------------------------- /A_learning_notes/.ipynb_checkpoints/generate_process-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "pycharm": { 7 | "name": "#%% md\n" 8 | } 9 | }, 10 | "source": [ 11 | "## 本文主要记录多标签多分类模型的实现过程\n", 12 | "\n", 13 | "### 整体流程\n", 14 | "1. 依据数据格式,实现“数据读取”功能;(单元测试)\n", 15 | "2. 基础主干网络ResNet-18实现;\n", 16 | "3. 实现多标签多分类head,形成整体模型;(与2联合测试,绘制网络)\n", 17 | "4. 多标签多分类模型损失函数实现;\n", 18 | "5. 边边角角:配置与训练脚本、测试脚本、预测脚本,等等;(整体测试)\n", 19 | "6. 
进阶修改:损失函数修改,主干网络修改,等等。(整体测试)" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### 构造训练数据集" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [ 34 | { 35 | "name": "stdout", 36 | "output_type": "stream", 37 | "text": [ 38 | "WARNING:tensorflow:From D:\\Software\\Anaconda\\install\\Anaconda3\\envs\\tf13\\lib\\site-packages\\tensorflow\\python\\data\\ops\\iterator_ops.py:532: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", 39 | "Instructions for updating:\n", 40 | "Colocations handled automatically by placer.\n", 41 | "================================= 0 =======================================\n", 42 | "input is: \n", 43 | " tf.Tensor(\n", 44 | "[[-0.54730872 0.26720298]\n", 45 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 46 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 47 | "output is: \n", 48 | " tf.Tensor(\n", 49 | "[[1.89145406]\n", 50 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 51 | " tf.Tensor(\n", 52 | "[[-0.21285691]\n", 53 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 54 | "================================= 1 =======================================\n", 55 | "input is: \n", 56 | " tf.Tensor(\n", 57 | "[[ 1.00501827 -0.83485065]\n", 58 | " [ 1.67905237 1.30604547]], shape=(2, 2), dtype=float64)\n", 59 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 60 | "output is: \n", 61 | " tf.Tensor(\n", 62 | "[[-1.10457522]\n", 63 | " [ 0.64685953]], shape=(2, 1), dtype=float64) \n", 64 | " tf.Tensor(\n", 65 | "[[-0.47960561]\n", 66 | " [-0.93504079]], shape=(2, 1), dtype=float64)\n", 67 | "================================= 2 =======================================\n", 68 | "input is: \n", 69 | " tf.Tensor(\n", 70 | "[[-0.54730872 0.26720298]\n", 71 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 72 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 73 | "output is: \n", 74 | " tf.Tensor(\n", 75 | "[[1.89145406]\n", 76 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 77 | " tf.Tensor(\n", 78 | "[[-0.21285691]\n", 79 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "import tensorflow as tf\n", 85 | "import numpy as np\n", 86 | "tf.enable_eager_execution()\n", 87 | "input = np.random.normal(0, 1, [4, 2])\n", 88 | "out_1 = np.random.normal(0, 1, [4, 1])\n", 89 | "out_2 = np.random.normal(0, 1, [4, 1])\n", 90 | "dataset = tf.data.Dataset.from_tensor_slices((input, (out_1, out_2)))\n", 91 | "dataset = dataset.repeat().batch(2).prefetch(buffer_size=4)\n", 92 | "\n", 93 | "# test\n", 94 | "for i, data in enumerate(dataset):\n", 95 | " # (input, (out_1, out_2))\n", 96 | " print('================================= {} ======================================='.format(i))\n", 97 | " print('input is: \\n', data[0])\n", 98 | " print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')\n", 99 | " print('output is: \\n', data[1][0], '\\n', data[1][1])\n", 100 | " if i >= 2:\n", 101 | " break" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": { 107 | "pycharm": { 108 | "name": "#%% md\n" 109 | } 110 | }, 111 | "source": [ 112 | "### 建立keras模型\n", 113 | "1. 定义骨干网络;\n", 114 | "1. 
实现多标签多分类head,形成整体模型;" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 2, 120 | "metadata": { 121 | "pycharm": { 122 | "is_executing": false, 123 | "name": "#%%\n" 124 | } 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "from tensorflow import keras\n", 129 | "\n", 130 | "\n", 131 | "def build_net(input_tensor):\n", 132 | " out1 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 133 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 134 | " out2 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 135 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 136 | " return [out1, out2]\n", 137 | "\n", 138 | "\n", 139 | "feature_input = keras.layers.Input(shape=(2,), name='feature_input')\n", 140 | "outputs = build_net(feature_input)\n", 141 | "model = keras.models.Model(feature_input, outputs)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": { 147 | "pycharm": { 148 | "name": "#%% md\n" 149 | } 150 | }, 151 | "source": [ 152 | "### 定义loss函数" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 3, 158 | "metadata": { 159 | "pycharm": { 160 | "is_executing": false, 161 | "name": "#%%\n" 162 | } 163 | }, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | "__________________________________________________________________________________________________\n", 170 | "Layer (type) Output Shape Param # Connected to \n", 171 | "==================================================================================================\n", 172 | "feature_input (InputLayer) (None, 2) 0 \n", 173 | "__________________________________________________________________________________________________\n", 174 | "dense (Dense) (None, 1) 3 feature_input[0][0] \n", 175 | "__________________________________________________________________________________________________\n", 176 | "dense_1 (Dense) (None, 1) 3 feature_input[0][0] \n", 177 | "==================================================================================================\n", 178 | "Total params: 6\n", 179 | "Trainable params: 6\n", 180 | "Non-trainable params: 0\n", 181 | "__________________________________________________________________________________________________\n" 182 | ] 183 | } 184 | ], 185 | "source": [ 186 | "import tensorflow as tf\n", 187 | "\n", 188 | "\n", 189 | "def my_loss(y_dummy, pred):\n", 190 | " loss = tf.keras.losses.mean_absolute_error(y_dummy, pred)\n", 191 | " return loss\n", 192 | "\n", 193 | "\n", 194 | "model.compile(loss=my_loss, optimizer='adam', loss_weights=[0.5, 0.5])\n", 195 | "model.summary()" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### 训练与测试" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 4, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "Epoch 1/5\n", 215 | "WARNING:tensorflow:From D:\\Software\\Anaconda\\install\\Anaconda3\\envs\\tf13\\lib\\site-packages\\tensorflow\\python\\ops\\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", 216 | "Instructions for updating:\n", 217 | "Use tf.cast instead.\n", 218 | "2/2 [==============================] - 0s 213ms/step - loss: 9.9042 - dense_loss: 1.0547 - dense_1_loss: 0.8552\n", 219 | "Epoch 2/5\n", 220 | "2/2 
[==============================] - 0s 2ms/step - loss: 9.8423 - dense_loss: 1.0522 - dense_1_loss: 0.8527\n", 221 | "Epoch 3/5\n", 222 | "2/2 [==============================] - 0s 2ms/step - loss: 9.7809 - dense_loss: 1.0497 - dense_1_loss: 0.8506\n", 223 | "Epoch 4/5\n", 224 | "2/2 [==============================] - 0s 2ms/step - loss: 9.7199 - dense_loss: 1.0473 - dense_1_loss: 0.8484\n", 225 | "Epoch 5/5\n", 226 | "2/2 [==============================] - 0s 2ms/step - loss: 9.6592 - dense_loss: 1.0448 - dense_1_loss: 0.8463\n", 227 | "================================= 0 =======================================\n", 228 | "input is: \n", 229 | " tf.Tensor(\n", 230 | "[[-0.54730872 0.26720298]\n", 231 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 232 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 233 | "output is: \n", 234 | " tf.Tensor(\n", 235 | "[[1.89145406]\n", 236 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 237 | " tf.Tensor(\n", 238 | "[[-0.21285691]\n", 239 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 240 | "predictions is: \n", 241 | " [[-0.1156919]\n", 242 | " [-0.1786184]] \n", 243 | " [[-0.41703436]\n", 244 | " [-0.56524646]]\n", 245 | "================================= 1 =======================================\n", 246 | "input is: \n", 247 | " tf.Tensor(\n", 248 | "[[ 1.00501827 -0.83485065]\n", 249 | " [ 1.67905237 1.30604547]], shape=(2, 2), dtype=float64)\n", 250 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 251 | "output is: \n", 252 | " tf.Tensor(\n", 253 | "[[-1.10457522]\n", 254 | " [ 0.64685953]], shape=(2, 1), dtype=float64) \n", 255 | " tf.Tensor(\n", 256 | "[[-0.47960561]\n", 257 | " [-0.93504079]], shape=(2, 1), dtype=float64)\n", 258 | "predictions is: \n", 259 | " [[0.2573548 ]\n", 260 | " [0.23849788]] \n", 261 | " [[ 1.0578033 ]\n", 262 | " [-0.49074632]]\n", 263 | "================================= 2 =======================================\n", 264 | "input is: \n", 265 | " tf.Tensor(\n", 266 | "[[-0.54730872 0.26720298]\n", 267 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 268 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 269 | "output is: \n", 270 | " tf.Tensor(\n", 271 | "[[1.89145406]\n", 272 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 273 | " tf.Tensor(\n", 274 | "[[-0.21285691]\n", 275 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 276 | "predictions is: \n", 277 | " [[-0.1156919]\n", 278 | " [-0.1786184]] \n", 279 | " [[-0.41703436]\n", 280 | " [-0.56524646]]\n" 281 | ] 282 | } 283 | ], 284 | "source": [ 285 | "# 训练\n", 286 | "model.fit(dataset, epochs=5, steps_per_epoch=2, verbose=1)\n", 287 | "\n", 288 | "# 测试\n", 289 | "for i, data in enumerate(dataset):\n", 290 | " print('================================= {} ======================================='.format(i))\n", 291 | " print('input is: \\n', data[0])\n", 292 | " print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')\n", 293 | " print('output is: \\n', data[1][0], '\\n', data[1][1])\n", 294 | " predictions = model.predict(np.array(data[0]))\n", 295 | " print('predictions is: \\n', predictions[0], '\\n', predictions[1])\n", 296 | " if i >= 2:\n", 297 | " break" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "更细致的debug(查看梯度、打印操作等),可看详细查看本工程。\n" 305 | ] 306 | } 307 | ], 308 | "metadata": { 309 | "kernelspec": { 310 | 
"display_name": "tf13", 311 | "language": "python", 312 | "name": "tf13" 313 | }, 314 | "language_info": { 315 | "codemirror_mode": { 316 | "name": "ipython", 317 | "version": 3 318 | }, 319 | "file_extension": ".py", 320 | "mimetype": "text/x-python", 321 | "name": "python", 322 | "nbconvert_exporter": "python", 323 | "pygments_lexer": "ipython3", 324 | "version": "3.6.9" 325 | }, 326 | "pycharm": { 327 | "stem_cell": { 328 | "cell_type": "raw", 329 | "metadata": { 330 | "collapsed": false 331 | }, 332 | "source": [] 333 | } 334 | }, 335 | "toc": { 336 | "base_numbering": 1, 337 | "nav_menu": {}, 338 | "number_sections": true, 339 | "sideBar": true, 340 | "skip_h1_title": false, 341 | "title_cell": "Table of Contents", 342 | "title_sidebar": "Contents", 343 | "toc_cell": false, 344 | "toc_position": {}, 345 | "toc_section_display": true, 346 | "toc_window_display": false 347 | } 348 | }, 349 | "nbformat": 4, 350 | "nbformat_minor": 1 351 | } 352 | -------------------------------------------------------------------------------- /A_learning_notes/Q&A.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true, 7 | "pycharm": { 8 | "name": "#%% md\n" 9 | } 10 | }, 11 | "source": [ 12 | "### Q. 在编译模型阶段,定义了多输出loss函数和权重,但在训练阶段,打印的loss却不等于各个loss的加权和\n", 13 | "```\n", 14 | "model.compile(loss=my_loss, optimizer='adam', loss_weights=[0.5, 0.5])\n", 15 | "model.fit(dataset, epochs=2, steps_per_epoch=2, verbose=1)\n", 16 | "```\n", 17 | "输出:loss(21.9610) != 0.5 * 1.3583 - 0.5 * 1.5867\n", 18 | "```\n", 19 | "Epoch 1/2\n", 20 | "1/2 [==============>...............] - ETA: 0s - loss: 21.9610 - dense_4_loss: 1.3583 - dense_5_loss: 1.5867\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n", 21 | "2/2 [==============================] - 0s 140ms/step - loss: 22.1960 - dense_4_loss: 1.9502 - dense_5_loss: 1.5183\n", 22 | "Epoch 2/2\n", 23 | "1/2 [==============>...............] - ETA: 0s - loss: 21.8526 - dense_4_loss: 1.3555 - dense_5_loss: 1.5861\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n", 24 | "2/2 [==============================] - 0s 1ms/step - loss: 22.0872 - dense_4_loss: 1.9470 - dense_5_loss: 1.5171\n", 25 | "```\n", 26 | "\n", 27 | "\n", 28 | "**Answer**: 因为总的loss中包含了权重正则化损失部分:\n", 29 | "```\n", 30 | "def build_net(input_tensor):\n", 31 | " out1 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 32 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 33 | " out2 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 34 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 35 | " return [out1, out2]\n", 36 | "```\n" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "---" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### Q. 
在将`.ckpt.index` + `.ckpt.data` 模型转为`pb`的时候,为什么还要先保存为`h5`,然后再加载模型,再保存为`pb`?\n", 51 | "\n", 52 | "\n", 53 | "**Answer**: 因为原来保存为`.ckpt.index` + `.ckpt.data` 的时候没有保存图信息,加载也只加载权重信息:\n", 54 | "```\n", 55 | "model.load_weights(latest)\n", 56 | "...\n", 57 | "cp_callback = ModelCheckpoint(path, save_weights_only=True, period=ckpt_period)\n", 58 | "```\n", 59 | "导致`keras.backend.get_session().graph.as_graph_def()`没有图结构信息。\n", 60 | "(理论上我是构建了网络图模型,然后再加载权重的,所以应该也得有图结构信息,但实际上没有)\n", 61 | "所以需要将模型完全保存为`h5`(包含图信息),然后重新加载进来,再保存为`pb`:\n", 62 | "```\n", 63 | "model.save(h5_path, overwrite=True, include_optimizer=False)\n", 64 | "model = keras.models.load_model(h5_path)\n", 65 | "...\n", 66 | "graph = tf.graph_util.remove_training_nodes(sess.graph.as_graph_def())\n", 67 | "graph_frozen = tf.graph_util.convert_variables_to_constants(sess, graph, output_names)\n", 68 | "tf.train.write_graph(graph_frozen, pb_model_dir, pb_model_name, as_text=False)\n", 69 | "```\n" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "---" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "\n", 84 | "### Q. 一直没办法用多GPU的模式运行?\n", 85 | "\n", 86 | "\n", 87 | "**Answer**: `tf.enable_eager_execution()`模型跑不了多GPU,要注释掉这句。\n" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "---" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "\n", 102 | "### Q. Jupyter Notebook运行tf.keras.Model对象训练、预测,在model.predict()时报错`CancelledError: [Op:StatefulPartitionedCall]`?\n", 103 | "\n", 104 | "**Answer**:不清楚为什么,但是如果选择 `Kernel -> Restart & Run All` 则能得到正确的结果。\n" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "---" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "### Q. 
TensorFlow自定义层有bug太难调了!!!\n", 119 | "\n", 120 | "\n", 121 | "**Answer**:目前我调试就只有几个操作:\n", 122 | "- tf.print() + with tf.control_dependencies(),打印信息;\n", 123 | "- with tf.name_scope('name'),给操作加名称,定位到出错的局部操作;\n", 124 | "- tf_debug.has_inf_or_nan,或自定义只有inf/nan,查看出现inf和nan的位置;\n", 125 | "- with tf.GradientTape(persistent=persistent) as tape,详细查看梯度。" 126 | ] 127 | } 128 | ], 129 | "metadata": { 130 | "kernelspec": { 131 | "display_name": "Python 3", 132 | "language": "python", 133 | "name": "python3" 134 | }, 135 | "language_info": { 136 | "codemirror_mode": { 137 | "name": "ipython", 138 | "version": 3 139 | }, 140 | "file_extension": ".py", 141 | "mimetype": "text/x-python", 142 | "name": "python", 143 | "nbconvert_exporter": "python", 144 | "pygments_lexer": "ipython3", 145 | "version": "3.6.5" 146 | }, 147 | "pycharm": { 148 | "stem_cell": { 149 | "cell_type": "raw", 150 | "metadata": { 151 | "collapsed": false 152 | }, 153 | "source": [] 154 | } 155 | }, 156 | "toc": { 157 | "base_numbering": 1, 158 | "nav_menu": {}, 159 | "number_sections": true, 160 | "sideBar": true, 161 | "skip_h1_title": false, 162 | "title_cell": "Table of Contents", 163 | "title_sidebar": "Contents", 164 | "toc_cell": false, 165 | "toc_position": {}, 166 | "toc_section_display": true, 167 | "toc_window_display": false 168 | } 169 | }, 170 | "nbformat": 4, 171 | "nbformat_minor": 1 172 | } 173 | -------------------------------------------------------------------------------- /A_learning_notes/generate_process.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "pycharm": { 7 | "name": "#%% md\n" 8 | } 9 | }, 10 | "source": [ 11 | "## 本文主要记录多标签多分类模型的实现过程\n", 12 | "\n", 13 | "### 整体流程\n", 14 | "1. 依据数据格式,实现“数据读取”功能;(单元测试)\n", 15 | "2. 基础主干网络ResNet-18实现;\n", 16 | "3. 实现多标签多分类head,形成整体模型;(与2联合测试,绘制网络)\n", 17 | "4. 多标签多分类模型损失函数实现;\n", 18 | "5. 边边角角:配置与训练脚本、测试脚本、预测脚本,等等;(整体测试)\n", 19 | "6. 
进阶修改:损失函数修改,主干网络修改,等等。(整体测试)" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### 构造训练数据集" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [ 34 | { 35 | "name": "stdout", 36 | "output_type": "stream", 37 | "text": [ 38 | "WARNING:tensorflow:From D:\\Software\\Anaconda\\install\\Anaconda3\\envs\\tf13\\lib\\site-packages\\tensorflow\\python\\data\\ops\\iterator_ops.py:532: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", 39 | "Instructions for updating:\n", 40 | "Colocations handled automatically by placer.\n", 41 | "================================= 0 =======================================\n", 42 | "input is: \n", 43 | " tf.Tensor(\n", 44 | "[[-0.54730872 0.26720298]\n", 45 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 46 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 47 | "output is: \n", 48 | " tf.Tensor(\n", 49 | "[[1.89145406]\n", 50 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 51 | " tf.Tensor(\n", 52 | "[[-0.21285691]\n", 53 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 54 | "================================= 1 =======================================\n", 55 | "input is: \n", 56 | " tf.Tensor(\n", 57 | "[[ 1.00501827 -0.83485065]\n", 58 | " [ 1.67905237 1.30604547]], shape=(2, 2), dtype=float64)\n", 59 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 60 | "output is: \n", 61 | " tf.Tensor(\n", 62 | "[[-1.10457522]\n", 63 | " [ 0.64685953]], shape=(2, 1), dtype=float64) \n", 64 | " tf.Tensor(\n", 65 | "[[-0.47960561]\n", 66 | " [-0.93504079]], shape=(2, 1), dtype=float64)\n", 67 | "================================= 2 =======================================\n", 68 | "input is: \n", 69 | " tf.Tensor(\n", 70 | "[[-0.54730872 0.26720298]\n", 71 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 72 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 73 | "output is: \n", 74 | " tf.Tensor(\n", 75 | "[[1.89145406]\n", 76 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 77 | " tf.Tensor(\n", 78 | "[[-0.21285691]\n", 79 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "import tensorflow as tf\n", 85 | "import numpy as np\n", 86 | "tf.enable_eager_execution()\n", 87 | "input = np.random.normal(0, 1, [4, 2])\n", 88 | "out_1 = np.random.normal(0, 1, [4, 1])\n", 89 | "out_2 = np.random.normal(0, 1, [4, 1])\n", 90 | "dataset = tf.data.Dataset.from_tensor_slices((input, (out_1, out_2)))\n", 91 | "dataset = dataset.repeat().batch(2).prefetch(buffer_size=4)\n", 92 | "\n", 93 | "# test\n", 94 | "for i, data in enumerate(dataset):\n", 95 | " # (input, (out_1, out_2))\n", 96 | " print('================================= {} ======================================='.format(i))\n", 97 | " print('input is: \\n', data[0])\n", 98 | " print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')\n", 99 | " print('output is: \\n', data[1][0], '\\n', data[1][1])\n", 100 | " if i >= 2:\n", 101 | " break" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": { 107 | "pycharm": { 108 | "name": "#%% md\n" 109 | } 110 | }, 111 | "source": [ 112 | "### 建立keras模型\n", 113 | "1. 定义骨干网络;\n", 114 | "1. 
实现多标签多分类head,形成整体模型;" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 2, 120 | "metadata": { 121 | "pycharm": { 122 | "is_executing": false, 123 | "name": "#%%\n" 124 | } 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "from tensorflow import keras\n", 129 | "\n", 130 | "\n", 131 | "def build_net(input_tensor):\n", 132 | " out1 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 133 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 134 | " out2 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 135 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 136 | " return [out1, out2]\n", 137 | "\n", 138 | "\n", 139 | "feature_input = keras.layers.Input(shape=(2,), name='feature_input')\n", 140 | "outputs = build_net(feature_input)\n", 141 | "model = keras.models.Model(feature_input, outputs)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": { 147 | "pycharm": { 148 | "name": "#%% md\n" 149 | } 150 | }, 151 | "source": [ 152 | "### 定义loss函数" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 3, 158 | "metadata": { 159 | "pycharm": { 160 | "is_executing": false, 161 | "name": "#%%\n" 162 | } 163 | }, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | "__________________________________________________________________________________________________\n", 170 | "Layer (type) Output Shape Param # Connected to \n", 171 | "==================================================================================================\n", 172 | "feature_input (InputLayer) (None, 2) 0 \n", 173 | "__________________________________________________________________________________________________\n", 174 | "dense (Dense) (None, 1) 3 feature_input[0][0] \n", 175 | "__________________________________________________________________________________________________\n", 176 | "dense_1 (Dense) (None, 1) 3 feature_input[0][0] \n", 177 | "==================================================================================================\n", 178 | "Total params: 6\n", 179 | "Trainable params: 6\n", 180 | "Non-trainable params: 0\n", 181 | "__________________________________________________________________________________________________\n" 182 | ] 183 | } 184 | ], 185 | "source": [ 186 | "import tensorflow as tf\n", 187 | "\n", 188 | "\n", 189 | "def my_loss(y_dummy, pred):\n", 190 | " loss = tf.keras.losses.mean_absolute_error(y_dummy, pred)\n", 191 | " return loss\n", 192 | "\n", 193 | "\n", 194 | "model.compile(loss=my_loss, optimizer='adam', loss_weights=[0.5, 0.5])\n", 195 | "model.summary()" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### 训练与测试" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 4, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "Epoch 1/5\n", 215 | "WARNING:tensorflow:From D:\\Software\\Anaconda\\install\\Anaconda3\\envs\\tf13\\lib\\site-packages\\tensorflow\\python\\ops\\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", 216 | "Instructions for updating:\n", 217 | "Use tf.cast instead.\n", 218 | "2/2 [==============================] - 0s 213ms/step - loss: 9.9042 - dense_loss: 1.0547 - dense_1_loss: 0.8552\n", 219 | "Epoch 2/5\n", 220 | "2/2 
[==============================] - 0s 2ms/step - loss: 9.8423 - dense_loss: 1.0522 - dense_1_loss: 0.8527\n", 221 | "Epoch 3/5\n", 222 | "2/2 [==============================] - 0s 2ms/step - loss: 9.7809 - dense_loss: 1.0497 - dense_1_loss: 0.8506\n", 223 | "Epoch 4/5\n", 224 | "2/2 [==============================] - 0s 2ms/step - loss: 9.7199 - dense_loss: 1.0473 - dense_1_loss: 0.8484\n", 225 | "Epoch 5/5\n", 226 | "2/2 [==============================] - 0s 2ms/step - loss: 9.6592 - dense_loss: 1.0448 - dense_1_loss: 0.8463\n", 227 | "================================= 0 =======================================\n", 228 | "input is: \n", 229 | " tf.Tensor(\n", 230 | "[[-0.54730872 0.26720298]\n", 231 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 232 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 233 | "output is: \n", 234 | " tf.Tensor(\n", 235 | "[[1.89145406]\n", 236 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 237 | " tf.Tensor(\n", 238 | "[[-0.21285691]\n", 239 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 240 | "predictions is: \n", 241 | " [[-0.1156919]\n", 242 | " [-0.1786184]] \n", 243 | " [[-0.41703436]\n", 244 | " [-0.56524646]]\n", 245 | "================================= 1 =======================================\n", 246 | "input is: \n", 247 | " tf.Tensor(\n", 248 | "[[ 1.00501827 -0.83485065]\n", 249 | " [ 1.67905237 1.30604547]], shape=(2, 2), dtype=float64)\n", 250 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 251 | "output is: \n", 252 | " tf.Tensor(\n", 253 | "[[-1.10457522]\n", 254 | " [ 0.64685953]], shape=(2, 1), dtype=float64) \n", 255 | " tf.Tensor(\n", 256 | "[[-0.47960561]\n", 257 | " [-0.93504079]], shape=(2, 1), dtype=float64)\n", 258 | "predictions is: \n", 259 | " [[0.2573548 ]\n", 260 | " [0.23849788]] \n", 261 | " [[ 1.0578033 ]\n", 262 | " [-0.49074632]]\n", 263 | "================================= 2 =======================================\n", 264 | "input is: \n", 265 | " tf.Tensor(\n", 266 | "[[-0.54730872 0.26720298]\n", 267 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 268 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 269 | "output is: \n", 270 | " tf.Tensor(\n", 271 | "[[1.89145406]\n", 272 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 273 | " tf.Tensor(\n", 274 | "[[-0.21285691]\n", 275 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 276 | "predictions is: \n", 277 | " [[-0.1156919]\n", 278 | " [-0.1786184]] \n", 279 | " [[-0.41703436]\n", 280 | " [-0.56524646]]\n" 281 | ] 282 | } 283 | ], 284 | "source": [ 285 | "# 训练\n", 286 | "model.fit(dataset, epochs=5, steps_per_epoch=2, verbose=1)\n", 287 | "\n", 288 | "# 测试\n", 289 | "for i, data in enumerate(dataset):\n", 290 | " print('================================= {} ======================================='.format(i))\n", 291 | " print('input is: \\n', data[0])\n", 292 | " print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')\n", 293 | " print('output is: \\n', data[1][0], '\\n', data[1][1])\n", 294 | " predictions = model.predict(np.array(data[0]))\n", 295 | " print('predictions is: \\n', predictions[0], '\\n', predictions[1])\n", 296 | " if i >= 2:\n", 297 | " break" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "更细致的debug(查看梯度、打印操作等),可看详细查看本工程。\n" 305 | ] 306 | } 307 | ], 308 | "metadata": { 309 | "kernelspec": { 310 | 
"display_name": "tf13", 311 | "language": "python", 312 | "name": "tf13" 313 | }, 314 | "language_info": { 315 | "codemirror_mode": { 316 | "name": "ipython", 317 | "version": 3 318 | }, 319 | "file_extension": ".py", 320 | "mimetype": "text/x-python", 321 | "name": "python", 322 | "nbconvert_exporter": "python", 323 | "pygments_lexer": "ipython3", 324 | "version": "3.6.9" 325 | }, 326 | "pycharm": { 327 | "stem_cell": { 328 | "cell_type": "raw", 329 | "metadata": { 330 | "collapsed": false 331 | }, 332 | "source": [] 333 | } 334 | }, 335 | "toc": { 336 | "base_numbering": 1.0, 337 | "nav_menu": {}, 338 | "number_sections": true, 339 | "sideBar": true, 340 | "skip_h1_title": false, 341 | "title_cell": "Table of Contents", 342 | "title_sidebar": "Contents", 343 | "toc_cell": false, 344 | "toc_position": {}, 345 | "toc_section_display": true, 346 | "toc_window_display": false 347 | } 348 | }, 349 | "nbformat": 4, 350 | "nbformat_minor": 1 351 | } 352 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 郑煜伟 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # multi-label-classification 2 | 3 | 基于tf.keras,实现多标签分类CNN模型。 4 | 5 | ## 如何使用 6 | 7 | ### 快速上手 8 | 1. run.py同目录下新建 `logs`文件夹,存放日志文件;训练完毕会出现`models`文件夹,存放模型; 9 | 2. 查看`configs.py`并进行修改,此为参数配置文件; 10 | 3. 实际用自己的数据训练时,可能需要执行以下`utils/check_label_file.py`,确保标签文件中的图片真实可用; 11 | 4. 执行`python run.py`,会根据配置文件`configs.py`进行训练/测试/模型转换等。 12 | 13 | ### 学习掌握 14 | 1. 先看`README.md`; 15 | 2. 再看`1_learning_note`下的note; 16 | 3. 看`multi_label`下的`trainer.py`里的`__init__`函数,把整体模型串起来; 17 | 4. 
看`run.py`文件,结合着看`configs.py`。 18 | 19 | ## 目录结构 20 | 21 | - `A_learning_notes`: README后,**先查看本部分**了解本项目大致结构; 22 | - `backbone`: 模型的骨干网络脚本; 23 | - `dataset`: 数据集构造脚本; 24 | - `dataset_util.py`: 使用tf.image API进行图像数据增强,然后用tf.data进行数据集构建; 25 | - `file_util.py`: 以txt标签文件的形式,构造tf.data数据集用于训练; 26 | - `tfrecord_util.py`: 读取txt标签文件,写tfrecord,然后读取tfrecord为数据集用于训练; 27 | - `images`: 项目图片; 28 | - `logs`: 存放训练过程中的日志文件和tensorboard文件(当前可能不存在); 29 | - `models`: 存放训练好的模型文件(当前可能不存在); 30 | - `multi_label`: 多标签分类模型构建脚本; 31 | - `classifier_loss.py`: 多标签分类的损失函数,包含多种损失函数:`focal loss`、`GHM`等; 32 | - `classifier_model.py`: 多标签分类模型,负责调用`backbone`里的骨干网络和本脚本中的多标签`head`组成整体模型; 33 | - `train.py`: 模型训练接口,集成模型构建/编译/训练/debug/预测、数据集构建等功能; 34 | - `utils`: 一些工具脚本; 35 | - `generate_txt`: 扫描指定路径下的图片数据,生成训练、测试等label.txt(根据实际项目而定,当前可能不存在); 36 | - `check_label_file.py`: 在训练前检查训练集,确保标签文件中的图片真实可用; 37 | - `draw_tools.py`: 模型训练完进行测试时,绘制每个类别的混淆图; 38 | - `logger_callback.py`: 日志打印的keras回调函数; 39 | - `radam.py`: RAdam算法的tf.keras优化器实现; 40 | - `configs.py`: 配置文件; 41 | - `run.py`: 启动脚本; 42 | 43 | 44 | ## 算法说明 45 | 46 | 在**多标签多分类模型**基础上,添加功能: 47 | - loss函数改造: 48 | - `label smoothing`: 标签平滑。 49 | - `focal loss`: 给每个样本的分类loss增加一个因子项,降低分类误差小的样本的影响,解决难易样本问题。 50 | > ![focal loss类别概率和损失关系图](https://github.com/zheng-yuwei/multi-label-classification/blob/master/images/focal-loss.jpg) 51 | - `gradient harmonizing mechanism (GHM)`: 52 | 根据样本梯度密度曲线(这里的梯度是梯度范数,并且不是所有网络参数的梯度,而是最后一层的回传梯度), 53 | 取反得到梯度密度调和参数(和平衡多类别数据集一个意思,只不过这里不是按类别来平衡,而是按梯度区间来平衡), 54 | 再乘以梯度以**调整梯度贡献曲线**,从而降低高密度区域的梯度贡献比例,提升低密度区域的梯度贡献比例。 55 | > ![GHM论文梯度分布与贡献图](https://github.com/zheng-yuwei/multi-label-classification/blob/master/images/GHM-insight.jpg) 56 | > 57 | > 原论文insight: 对网络训练而言,梯度是最重要的东西,而网络训练不好,也是因为梯度没调节好。 58 | focal loss认为前背景不平衡问题,本质为难易样本不平衡问题,从而调节样本的梯度贡献,一定程度上解决了背景问题。 59 | 作者认为,类别不平衡、难易样本不平衡,造成的本质驱动是梯度不平衡。 60 | > 然后通过绘制训练好的模型在样本空间上的梯度分布曲线,发现小梯度和大梯度都是高密度区域, 61 | (作者认为小梯度对应易学习样本,大密度对应异常样本); 62 | 然后绘制正常loss和focal loss梯度贡献曲线,发现正常loss中,高密度区域的梯度贡献度很高, 63 | 而focal loss中,小梯度的高密度区域被因子项惩罚而降低梯度贡献度, 64 | 但大梯度的高密度区域的梯度贡献度依然很高。 65 | 作者认为focal loss平衡了一部分梯度贡献度,所以使得训练低密度的中间梯度的梯度贡献度影响提升, 66 | 提升了算法性能;同时,认为focal loss并没有从本质出发,所以还有残留问题(异常样本大梯度的高密度区域)。 67 | 然后提出了GHM,从梯度分布和梯度贡献角度出发,提升网络训练效果。 68 | 69 | - 分离conv层的权重衰减项$\lambda_{conv}$ 和 BN层gamma的权重衰减项$\lambda_{gamma}$ 70 | 71 | 72 | ## 缓解过拟合/标注错误/样本错误(稍微按效果分先后,按实际数据来) 73 | 74 | 1. 一定程度提高BN层中gamma的L2权重衰减,conv层的L2权重衰减可以维持不变,去掉bias;[1,2,3] 75 | 1. 加大batch,然后要用warmup(我一开始用adam+warmup,后面用radam+warmup, radam中用动态学习率);[4,5,6] 76 | 1. 白化预处理; 77 | 1. 修改网络结构,resnext18相比resnet18多了结构正则的作用,效果好些; 78 | 1. 剪枝,其实和修改网络结构一个道理,只不过剪枝可以类似NAS自动找到更好的sub-network(网络结构);[3,9,10] 79 | 1. GHM损失函数;[8] 80 | 1. 数据增强(增加数据量); 81 | 1. label smoothing:;[7] 82 | 83 | TIPS:其他试过但基本无效的手段包括: 84 | 继续加大weight decay权重,BN层的gamma不加weight decay,BN层的beta加weight decay, 85 | 全连接层加dropout,focal loss,从Adam训练改为SGDM,加warmup。 86 | 87 | [1] L2 Regularization versus Batch and Weight Normalization 88 | [2] Towards Understanding Regularization in Batch Normalization 89 | [3] Learning Efficient Convolutional Networks through Network Slimming 90 | [4] Accurate, Large Minibatch SGD:Training ImageNet in 1 Hour 91 | [5] Large Batch Training of Convolutional Networks 92 | [6] On the Variance of the Adaptive Learning Rate and Beyond 93 | [7] Rethinking the inception architecture for computer vision 94 | [8] Gradient Harmonized Single-stage Detector 95 | [9] Data-Driven Sparse Structure Selection for Deep Neural Networks 96 | [10] Rethinking the Value of Network Pruning 97 | 98 | ## TODO 99 | 1. 
解决类别不平衡的做法: 100 | - reweighted sample从而实现self-balance(参考sklearn); 101 | - 先用训练一个网络然后采样平衡数据集做finetune。 102 | 1. 使用GAN生成数据,进行数据增强; 103 | 1. Handwriting Recognition in Low-resource Scripts Using Adversarial Learning。 104 | 105 | -------------------------------------------------------------------------------- /backbone/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File __init__.py 4 | @author:ZhengYuwei 5 | """ -------------------------------------------------------------------------------- /backbone/basic_backbone.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File basic_backbone.py 4 | @author:ZhengYuwei 5 | """ 6 | from tensorflow import keras 7 | 8 | 9 | class BasicBackbone(object): 10 | """ 骨干网络基础类,其他骨干网络类需要继承该类 """ 11 | L2_CONV_DECAY = 5.e-4 # 卷积层W权重衰减系数 12 | BN_L2_GAMMA_DECAY = 1.e-5 # BN层gamma系数的权重衰减系数 13 | BN_MOMENTUM = 0.9 # BN层mean、std的指数平滑动量系数 14 | BN_EPSILON = 1e-5 15 | BATCH_SIZE_AXIS = 0 # tensorflow backend的维度顺序(N, H, W, C) 16 | ROW_AXIS = 1 17 | COL_AXIS = 2 18 | CHANNEL_AXIS = 3 19 | 20 | @classmethod 21 | def convolution(cls, input_x, filters, **conv_params): 22 | """ 23 | 卷积运算 24 | :param input_x: 卷积运算的输入 25 | :param filters: 卷积核数量,输出channel数 26 | :param conv_params: 可缺省的默认参数: 27 | kernel_size: 卷积核大小,(width, height),默认(3, 3) 28 | strides: 步长,(width, height),默认(1, 1) 29 | padding: 填充方式,默认 same 30 | use_bias: 是否使用偏置b,默认不使用 31 | kernel_initializer: 卷积核初始化方式,默认 he_normal 32 | kernel_regularizer: 卷积核正则化项,默认 L2正则化,衰减权重系数为L2_CONV_DECAY 33 | :return: 卷积运算的输出 34 | """ 35 | conv_params.setdefault('filters', filters) 36 | conv_params.setdefault('kernel_size', (3, 3)) 37 | conv_params.setdefault('strides', (1, 1)) 38 | conv_params.setdefault('padding', 'same') 39 | conv_params.setdefault('use_bias', False) 40 | conv_params.setdefault('kernel_initializer', 'he_normal') 41 | conv_params.setdefault('kernel_regularizer', keras.regularizers.l2(cls.L2_CONV_DECAY)) 42 | conv = keras.layers.Conv2D(**conv_params)(input_x) 43 | return conv 44 | 45 | @classmethod 46 | def depthwise_conv(cls, input_x, **conv_params): 47 | """ 48 | 深度可分离卷积 49 | :param input_x: 卷积运算的输入 50 | :param conv_params: 可缺省的默认参数: 51 | kernel_size: 卷积核大小,(width, height),默认(3, 3) 52 | strides: 步长,(width, height),默认(1, 1) 53 | padding: 填充方式,默认 same 54 | use_bias: 是否使用偏置b,默认不使用 55 | depthwise_initializer: 卷积核初始化方式,默认 he_normal 56 | depthwise_regularizer: 卷积核正则化项,默认 L2正则化,衰减权重系数为L2_CONV_DECAY 57 | :return: 深度可分离卷积运算的输出 58 | """ 59 | conv_params.setdefault('kernel_size', (3, 3)) 60 | conv_params.setdefault('strides', (1, 1)) 61 | conv_params.setdefault('padding', 'same') 62 | conv_params.setdefault('use_bias', False) 63 | conv_params.setdefault('depthwise_initializer', 'he_normal') 64 | conv_params.setdefault('depthwise_regularizer', keras.regularizers.l2(cls.L2_CONV_DECAY)) 65 | conv = keras.layers.DepthwiseConv2D(**conv_params)(input_x) 66 | return conv 67 | 68 | @classmethod 69 | def batch_normalization(cls, input_x): 70 | """ 71 | 对输入执行batch normalization运算 72 | :param input_x: 输入tensor 73 | :return: BN运算后的tensor 74 | """ 75 | bn = keras.layers.BatchNormalization(axis=cls.CHANNEL_AXIS, momentum=cls.BN_MOMENTUM, 76 | gamma_regularizer=keras.regularizers.l2(cls.BN_L2_GAMMA_DECAY), 77 | epsilon=cls.BN_EPSILON)(input_x) 78 | return bn 79 | 80 | @classmethod 81 | def activation(cls, input_x, activation='relu', **activation_params): 82 | """ 83 | 激活函数运算 84 | :param 
input_x: 输入tensor 85 | :param activation: 激活函数类型 86 | :param activation_params: 激活函数参数 87 | :return: 激活运算后的tensor 88 | """ 89 | output = keras.layers.Activation(activation=activation, **activation_params)(input_x) 90 | return output 91 | 92 | @classmethod 93 | def _add_hard_swish(cls): 94 | """ 添加hard swish作为keras的自定义激活函数 """ 95 | def hard_swish(input_x, max_value=6.): 96 | """ (x * ReLU6(x+3)) / 6 """ 97 | h_swish = input_x * keras.layers.ReLU(max_value=max_value)(input_x + 3.) / max_value 98 | return h_swish 99 | # e.g. keras.layers.Activation(activation = 'h_swish')(5.) 100 | keras.utils.get_custom_objects().update({'h_swish': keras.layers.Activation(hard_swish)}) 101 | 102 | @classmethod 103 | def element_wise_add(cls, identity, residual, is_nin=False): 104 | """ 105 | 逐元素加的合并单位分支和残差分支的运算 106 | :param identity: shortcut的单位量分支 107 | :param residual: shortcut的残差量分支 108 | :param is_nin: 是否对单位量实施NIN卷积操作 109 | :return: 相加合并结果tensor 110 | """ 111 | identity_shape = keras.backend.int_shape(identity) 112 | residual_shape = keras.backend.int_shape(residual) 113 | stride_width = int(round(identity_shape[cls.ROW_AXIS] / residual_shape[cls.ROW_AXIS])) 114 | stride_height = int(round(identity_shape[cls.COL_AXIS] / residual_shape[cls.COL_AXIS])) 115 | 116 | if is_nin: 117 | identity = cls.convolution(identity, 118 | filters=residual_shape[cls.CHANNEL_AXIS], 119 | kernel_size=(1, 1), 120 | strides=(stride_width, stride_height), 121 | padding='valid') 122 | identity = cls.batch_normalization(identity) 123 | 124 | merge = keras.layers.add(inputs=[identity, residual]) 125 | return merge 126 | 127 | @classmethod 128 | def conv_bn(cls, input_x, filters, **conv_params): 129 | """ 130 | 卷积 + 批归一化 运算 131 | :param input_x: 输入tensor 132 | :param filters: 卷积核数量,channel数 133 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 134 | :return: 运算后的tensor 135 | """ 136 | conv = cls.convolution(input_x, filters, **conv_params) 137 | bn = cls.batch_normalization(conv) 138 | return bn 139 | 140 | @classmethod 141 | def depthwise_conv_bn(cls, input_x, **conv_params): 142 | """ 143 | 深度可分离卷积 + 批归一化 运算 144 | :param input_x: 输入tensor 145 | :param conv_params: 深度可分离卷积参数,参见 BasicBackbone.depthwise_conv 146 | :return: 运算后的tensor 147 | """ 148 | conv = cls.depthwise_conv(input_x, **conv_params) 149 | bn = cls.batch_normalization(conv) 150 | return bn 151 | 152 | @classmethod 153 | def bn_activation(cls, input_x, activation='relu', **activation_params): 154 | """ 155 | 批归一化 + 激活 运算 156 | :param input_x: 输入tensor 157 | :param activation: 激活函数类型名称 158 | :param activation_params: 激活函数参数列表 159 | :return: 运算后的tensor 160 | """ 161 | bn = cls.batch_normalization(input_x) 162 | act = cls.activation(bn, activation=activation, **activation_params) 163 | return act 164 | -------------------------------------------------------------------------------- /backbone/mixnet18.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File mixnet.py 4 | @author: ZhengYuwei 5 | """ 6 | import numpy as np 7 | from tensorflow import keras 8 | from backbone.basic_backbone import BasicBackbone 9 | 10 | 11 | class MixNet18(BasicBackbone): 12 | """ 13 | MixNet 18:不是论文中的MixNet的结构 14 | 只是借鉴了MixNet的不同kernel size mix到一起的想法,同时也使用depthwise 15 | 不使用depthwise的话,就是resnext 18了 16 | """ 17 | 18 | MIX_KERNEL_SIZES = [(3, 3), (5, 5), (7, 7), (9, 9)] 19 | MIX_KERNEL_RATIO = np.array([0, 8, 4, 2, 2], dtype=np.float) 20 | MIX_KERNEL_RATIO = MIX_KERNEL_RATIO.cumsum() / MIX_KERNEL_RATIO.sum() 21 | 22 | 
@classmethod 23 | def _mix_residual_block(cls, input_x, filters, is_nin=True, **conv_params): 24 | """ 25 | 一个残差模块里的 block 26 | input-> conv+bn->relu-> conv+bn-> add->relu-> 27 | |-----> conv(1 X 1)+bn ------>| 28 | :param input_x: 残差block的输入 29 | :param filters: 卷积核数,残差运算后的channel数 30 | :param is_nin: shortcut是否需要进行NIN运算 31 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 32 | :return: 卷积块运算之后的tensor 33 | """ 34 | residual = cls.conv_bn(input_x, filters, **conv_params) 35 | residual = cls.activation(residual) 36 | 37 | mix_residuals = list() 38 | mix_kernel_nums = filters * cls.MIX_KERNEL_RATIO 39 | mix_kernel_nums = mix_kernel_nums.astype(np.int) 40 | 41 | for i, kernel_size in enumerate(cls.MIX_KERNEL_SIZES): 42 | mix_residual = keras.layers.Lambda(lambda x: x[:, :, :, mix_kernel_nums[i]:mix_kernel_nums[i+1]])(residual) 43 | mix_conv = cls.depthwise_conv_bn(mix_residual, kernel_size=kernel_size) 44 | mix_residuals.append(mix_conv) 45 | mix_residuals = keras.layers.concatenate(inputs=mix_residuals, axis=cls.CHANNEL_AXIS) 46 | identity = cls.element_wise_add(input_x, mix_residuals, is_nin=is_nin) 47 | identity = cls.activation(identity) 48 | return identity 49 | 50 | @classmethod 51 | def _mix_residual_module(cls, input_x, filters, **conv_params): 52 | """ 53 | 一个残差模块: 54 | input-> conv+bn->relu-> conv+bn-> add->relu-> conv+bn->relu-> conv+bn-> add -> relu 55 | |-----> conv(1 X 1)+bn ----->| |--------------------------->| 56 | :param input_x: 该残差块的输入 57 | :param filters: 卷积核数,残差运算后的channel数 58 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 59 | :return: 60 | """ 61 | first_block = cls._mix_residual_block(input_x, filters, is_nin=True, **conv_params) 62 | second_block = cls._mix_residual_block(first_block, filters, is_nin=False) 63 | return second_block 64 | 65 | @classmethod 66 | def build(cls, input_x): 67 | """ 68 | 构造mixnet18基础网络,接受layers.Input,卷积层+BN层+add层+activation层输出,tf维度为 NHWC 69 | :param input_x: layers.Input对象 70 | :return: 卷积层+BN层+add层+activation层输出,tf维度为 NHWC=(N, H/32, W/32, 512) 71 | """ 72 | net = cls.conv_bn(input_x, filters=64, kernel_size=(3, 3), strides=(2, 2), padding='same') 73 | net = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="same")(net) 74 | net = cls.activation(net) 75 | 76 | # 4 * 残差模块 77 | net = cls._mix_residual_module(net, filters=64) 78 | net = cls._mix_residual_module(net, filters=128, strides=(2, 2)) 79 | net = cls._mix_residual_module(net, filters=256, strides=(2, 2)) 80 | net = cls._mix_residual_module(net, filters=512, strides=(2, 2)) 81 | 82 | return net 83 | -------------------------------------------------------------------------------- /backbone/mobilenet_v2.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File mobilenet_v2.py 4 | @author:ZhengYuwei 5 | """ 6 | from tensorflow import keras 7 | from backbone.basic_backbone import BasicBackbone 8 | 9 | 10 | class MobileNetV2(BasicBackbone): 11 | 12 | @classmethod 13 | def _inverted_residual_module(cls, input_x, filters, expand_ratio=6, strides=(2, 2)): 14 | net = cls._expand_depthwise_linear(input_x, filters, expand_ratio, strides) 15 | net = cls.element_wise_add(input_x, net, is_nin=False) 16 | return net 17 | 18 | @classmethod 19 | def _expand_depthwise_linear(cls, input_x, filters, expand_ratio=6, strides=(2, 2)): 20 | """ 21 | MobileNet v2基本模块:expand、depthwise、linear 22 | :param input_x: 输入tensor 23 | :param filters: 模块输出的通道数 24 | :param expand_ratio: 扩张比例,默认为6 25 | :param strides: 
步长,默认(2, 2) 26 | :return: 模块运算后的输出tensor 27 | """ 28 | input_filters = keras.backend.int_shape(input_x)[-1] 29 | depthwise_filters = expand_ratio * input_filters 30 | # x6 (1, 1) expand 31 | net = cls.conv_bn(input_x, filters=depthwise_filters, kernel_size=(1, 1), strides=(1, 1), padding='same') 32 | net = cls.activation(net) 33 | # (3, 3) depthwise 34 | net = cls.depthwise_conv_bn(net, strides=strides) 35 | net = cls.activation(net) 36 | # (1, 1) linear bottleneck 37 | net = cls.conv_bn(net, filters=filters, kernel_size=(1, 1), strides=(1, 1), padding='same') 38 | return net 39 | 40 | @classmethod 41 | def build(cls, input_x): 42 | """ 43 | 构建 MobileNet v2网络,整体网络的stride也是32 44 | :param input_x: 网络输入图形矩阵 45 | :return: 网络输出tensor 46 | """ 47 | net = cls.conv_bn(input_x, filters=32, kernel_size=(3, 3), strides=(2, 2), padding='same') 48 | net = cls.activation(net) 49 | 50 | net = cls._expand_depthwise_linear(net, filters=16, expand_ratio=1, strides=(1, 1)) 51 | 52 | net = cls._expand_depthwise_linear(net, filters=24, expand_ratio=6, strides=(2, 2)) 53 | net = cls._inverted_residual_module(net, filters=24, expand_ratio=6, strides=(1, 1)) 54 | 55 | net = cls._expand_depthwise_linear(net, filters=32, expand_ratio=6, strides=(2, 2)) 56 | net = cls._inverted_residual_module(net, filters=32, expand_ratio=6, strides=(1, 1)) 57 | net = cls._inverted_residual_module(net, filters=32, expand_ratio=6, strides=(1, 1)) 58 | 59 | net = cls._expand_depthwise_linear(net, filters=64, expand_ratio=6, strides=(1, 1)) 60 | net = cls._inverted_residual_module(net, filters=64, expand_ratio=6, strides=(1, 1)) 61 | net = cls._inverted_residual_module(net, filters=64, expand_ratio=6, strides=(1, 1)) 62 | net = cls._inverted_residual_module(net, filters=64, expand_ratio=6, strides=(1, 1)) 63 | 64 | net = cls._expand_depthwise_linear(net, filters=96, expand_ratio=6, strides=(2, 2)) 65 | net = cls._inverted_residual_module(net, filters=96, expand_ratio=6, strides=(1, 1)) 66 | net = cls._inverted_residual_module(net, filters=96, expand_ratio=6, strides=(1, 1)) 67 | 68 | net = cls._expand_depthwise_linear(net, filters=160, expand_ratio=6, strides=(2, 2)) 69 | net = cls._inverted_residual_module(net, filters=160, expand_ratio=6, strides=(1, 1)) 70 | net = cls._inverted_residual_module(net, filters=160, expand_ratio=6, strides=(1, 1)) 71 | 72 | net = cls._expand_depthwise_linear(net, filters=320, expand_ratio=6, strides=(1, 1)) 73 | # 原始是1280个channel的输出 74 | net = cls.conv_bn(net, filters=512, kernel_size=(1, 1), strides=(1, 1), padding='same') 75 | net = cls.activation(net) 76 | return net 77 | -------------------------------------------------------------------------------- /backbone/resnet18.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File resnet18.py 4 | @author:ZhengYuwei 5 | 注意: 6 | 1. resnet v1中,conv层都是有bias的,resnext则没有,resnet v2部分有部分没有;(但,个人感觉可以全部都不要,因为有BN) 7 | 2. resnet v2 使用 pre-activation,resnet和resnext不用;(有没有pre-activation其实差不多,inception加强多一些) 8 | 3. 
18和34层用2层 3x3 conv层的block,50及以上的用3层(1,3,1)conv层、具有bottleneck(4倍差距)的block 9 | """ 10 | from tensorflow import keras 11 | from backbone.basic_backbone import BasicBackbone 12 | 13 | 14 | class ResNet18(BasicBackbone): 15 | """ 改动后的ResNet 18,网络前端 7x7卷积->3x3卷积 """ 16 | 17 | @classmethod 18 | def _residual_block(cls, input_x, filters, is_nin=True, **conv_params): 19 | """ 20 | 一个残差模块里的 block 21 | input-> conv+bn->relu-> conv+bn-> add->relu-> 22 | |-----> conv(1 X 1)+bn ------>| 23 | :param input_x: 残差block的输入 24 | :param filters: 卷积核数,残差运算后的channel数 25 | :param is_nin: shortcut是否需要进行NIN运算 26 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 27 | :return: 卷积块运算之后的tensor 28 | """ 29 | residual = cls.conv_bn(input_x, filters, **conv_params) 30 | residual = cls.activation(residual) 31 | conv_params.update(strides=(1, 1)) 32 | residual = cls.conv_bn(residual, filters, **conv_params) 33 | identity = cls.element_wise_add(input_x, residual, is_nin=is_nin) 34 | identity = cls.activation(identity) 35 | return identity 36 | 37 | @classmethod 38 | def _residual_module(cls, input_x, filters, **conv_params): 39 | """ 40 | 一个残差模块: 41 | input-> conv+bn->relu-> conv+bn-> add->relu-> conv+bn->relu-> conv+bn-> add -> relu 42 | |-----> conv(1 X 1)+bn ----->| |--------------------------->| 43 | :param input_x: 该残差块的输入 44 | :param filters: 卷积核数,残差运算后的channel数 45 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 46 | :return: 47 | """ 48 | first_block = cls._residual_block(input_x, filters, is_nin=True, **conv_params) 49 | second_block = cls._residual_block(first_block, filters, is_nin=False) 50 | return second_block 51 | 52 | @classmethod 53 | def build(cls, input_x): 54 | """ 55 | 构造resnet18基础网络,接受layers.Input,卷积层+BN层+add层+activation层输出,tf维度为 NHWC 56 | :param input_x: layers.Input对象 57 | :return: 卷积层+BN层+add层+activation层输出,tf维度为 NHWC=(N, H/32, W/32, 512) 58 | """ 59 | net = cls.conv_bn(input_x, filters=64, kernel_size=(3, 3), strides=(2, 2), padding='same') 60 | net = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="same")(net) 61 | net = cls.activation(net) 62 | 63 | # 4 * 残差模块 64 | net = cls._residual_module(net, filters=64) 65 | net = cls._residual_module(net, filters=128, strides=(2, 2)) 66 | net = cls._residual_module(net, filters=256, strides=(2, 2)) 67 | net = cls._residual_module(net, filters=512, strides=(2, 2)) 68 | 69 | return net 70 | -------------------------------------------------------------------------------- /backbone/resnet18_v2.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File resnet18_v2.py 4 | @author:ZhengYuwei 5 | """ 6 | from tensorflow import keras 7 | from backbone.basic_backbone import BasicBackbone 8 | 9 | 10 | class ResNet18_v2(BasicBackbone): 11 | """ 改动后的ResNet-v2 18,网络前端 7x7卷积->3x3卷积 """ 12 | 13 | @classmethod 14 | def _residual_v2_block(cls, input_x, filters, is_nin=True, **conv_params): 15 | """ 16 | 一个残差模块里的 block 17 | input-> bn+relu-> conv-> bn+relu-> conv-> add-> 18 | |----> conv(1 X 1)+bn ---->| 19 | 或 20 | input-> bn+relu-> conv-> bn+relu-> conv-> add-> 21 | |------------------------------------->| 22 | :param input_x: 残差block的输入 23 | :param filters: 卷积核数,残差运算后的channel数 24 | :param is_nin: shortcut是否需要进行NIN运算 25 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 26 | :return: 卷积块运算之后的tensor 27 | """ 28 | pre_activation = cls.bn_activation(input_x) 29 | residual = cls.convolution(pre_activation, filters=filters, **conv_params) 30 | 
conv_params.update(strides=(1, 1)) 31 | residual = cls.bn_activation(residual) 32 | residual = cls.convolution(residual, filters=filters, **conv_params) 33 | if is_nin: 34 | identity = cls.element_wise_add(pre_activation, residual, is_nin=True) 35 | else: 36 | identity = cls.element_wise_add(input_x, residual, is_nin=False) 37 | return identity 38 | 39 | @classmethod 40 | def _residual_v2_module(cls, input_x, filters, **conv_params): 41 | """ 42 | 一个resnet v2残差模块块: 43 | input-> bn+relu-> conv-> bn+relu-> conv-> add-> bn+relu-> conv-> bn+relu-> conv-> add-> 44 | |----> conv(1 X 1)+bn ---->| |------------------------------------->| 45 | :param input_x: 该残差块的输入 46 | :param filters: 卷积核数,残差运算后的channel数 47 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 48 | :return: 49 | """ 50 | first_block = cls._residual_v2_block(input_x, filters, is_nin=True, **conv_params) 51 | second_block = cls._residual_v2_block(first_block, filters, is_nin=False) 52 | return second_block 53 | 54 | @classmethod 55 | def build(cls, input_x): 56 | """ 57 | 构造resnet18 v2基础网络,接受layers.Input,卷积层+add层+BN层+activation层输出,tf维度为 NHWC 58 | :param input_x: layers.Input对象 59 | :return: 卷积层+BN层+activation层输出,tf维度为 NHWC=(N, H/32, W/32, 512) 60 | """ 61 | net = cls.convolution(input_x, filters=64, kernel_size=(3, 3), strides=(2, 2), padding='same') 62 | net = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(net) 63 | 64 | # 4 * 残差模块 65 | net = cls._residual_v2_module(net, filters=64) 66 | net = cls._residual_v2_module(net, filters=128, strides=(2, 2)) 67 | net = cls._residual_v2_module(net, filters=256, strides=(2, 2)) 68 | net = cls._residual_v2_module(net, filters=512, strides=(2, 2)) 69 | net = cls.bn_activation(net) 70 | 71 | return net 72 | -------------------------------------------------------------------------------- /backbone/resnext.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File resnext.py 4 | @author: ZhengYuwei 5 | """ 6 | import numpy as np 7 | from tensorflow import keras 8 | from backbone.basic_backbone import BasicBackbone 9 | 10 | 11 | class ResNeXt18(BasicBackbone): 12 | """ 13 | ResNeXt 18:不是论文中的ResNeXt的结构 14 | 只是借鉴了ResNeXt中 分组卷积 + 不同组不同卷积核大小 的思想 15 | 大概可以看成不使用depthwise的MixNet 18 16 | """ 17 | 18 | MIX_KERNEL_SIZES = [(3, 3), (5, 5), (7, 7), (9, 9)] 19 | # 分为32组,每组理论至少4个channel,不足的话可以把组数减半 20 | GROUP_NUMS = np.array([16, 8, 4, 4], dtype=np.int) 21 | SMALL_GROUP_NUMS = GROUP_NUMS // 2 22 | TOTAL_GROUP_NUMS = np.sum(GROUP_NUMS) 23 | SMALL_TOTAL_GROUP_NUMS = np.sum(SMALL_GROUP_NUMS) 24 | 25 | @classmethod 26 | def _inception_residual_block(cls, input_x, filters, is_nin=True, **conv_params): 27 | """ 28 | 一个残差模块里的 block 29 | input-> conv+bn->relu-> conv+bn-> add->relu-> 30 | |-----> conv(1 X 1)+bn ------>| 31 | :param input_x: 残差block的输入 32 | :param filters: 卷积核数,残差运算后的channel数 33 | :param is_nin: shortcut是否需要进行NIN运算 34 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 35 | :return: 卷积块运算之后的tensor 36 | """ 37 | residual = cls.conv_bn(input_x, filters, **conv_params) 38 | residual = cls.activation(residual) 39 | 40 | # 每组至少4个channel 41 | if filters % cls.SMALL_TOTAL_GROUP_NUMS != 0: 42 | raise ValueError('卷积核数必须可以被组数整除!') 43 | if filters / cls.SMALL_TOTAL_GROUP_NUMS < 4: 44 | raise ValueError('卷积核数分组后,每组至少有4个通道!') 45 | # 判断分为32组还是16组 46 | group_nums = cls.GROUP_NUMS 47 | total_group_num = cls.TOTAL_GROUP_NUMS 48 | if filters % cls.TOTAL_GROUP_NUMS != 0 or filters / cls.TOTAL_GROUP_NUMS < 4: 49 | 
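# Worked example: for filters=64, 64 / TOTAL_GROUP_NUMS = 64 / 32 = 2 < 4, so the half-sized
# grouping SMALL_GROUP_NUMS = [8, 4, 2, 2] (16 groups in total) is used instead,
# giving group_channel = 64 // 16 = 4 channels per group.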
group_nums = cls.SMALL_GROUP_NUMS 50 | total_group_num = cls.SMALL_TOTAL_GROUP_NUMS 51 | # 分组卷积 52 | group_channel = filters // total_group_num 53 | group_residuals = list() 54 | start_channel = 0 55 | end_channel = start_channel 56 | for i, group in enumerate(group_nums): 57 | for j in range(group): 58 | end_channel += group_channel 59 | group_residual = keras.layers.Lambda(lambda x: x[:, :, :, start_channel:end_channel])(residual) 60 | group_conv = cls.conv_bn(group_residual, filters=group_channel, kernel_size=cls.MIX_KERNEL_SIZES[i]) 61 | group_residuals.append(group_conv) 62 | group_residuals = keras.layers.concatenate(inputs=group_residuals, axis=cls.CHANNEL_AXIS) 63 | identity = cls.element_wise_add(input_x, group_residuals, is_nin=is_nin) 64 | identity = cls.activation(identity) 65 | return identity 66 | 67 | @classmethod 68 | def _inception_residual_module(cls, input_x, filters, **conv_params): 69 | """ 70 | 一个残差模块: 71 | input-> conv+bn->relu-> conv+bn-> add->relu-> conv+bn->relu-> conv+bn-> add -> relu 72 | |-----> conv(1 X 1)+bn ----->| |--------------------------->| 73 | :param input_x: 该残差块的输入 74 | :param filters: 卷积核数,残差运算后的channel数 75 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 76 | :return: 77 | """ 78 | first_block = cls._inception_residual_block(input_x, filters, is_nin=True, **conv_params) 79 | second_block = cls._inception_residual_block(first_block, filters, is_nin=False) 80 | return second_block 81 | 82 | @classmethod 83 | def build(cls, input_x): 84 | """ 85 | 构造resnext18基础网络,接受layers.Input,卷积层+BN层+add层+activation层输出,tf维度为 NHWC 86 | :param input_x: layers.Input对象 87 | :return: 卷积层+BN层+add层+activation层输出,tf维度为 NHWC=(N, H/32, W/32, 512) 88 | """ 89 | net = cls.conv_bn(input_x, filters=64, kernel_size=(3, 3), strides=(2, 2), padding='same') 90 | net = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="same")(net) 91 | net = cls.activation(net) 92 | 93 | # 4 * 残差模块 94 | net = cls._inception_residual_module(net, filters=64) 95 | net = cls._inception_residual_module(net, filters=128, strides=(2, 2)) 96 | net = cls._inception_residual_module(net, filters=256, strides=(2, 2)) 97 | net = cls._inception_residual_module(net, filters=512, strides=(2, 2)) 98 | 99 | return net 100 | -------------------------------------------------------------------------------- /configs.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File configs.py 4 | @author:ZhengYuwei 5 | """ 6 | import datetime 7 | import numpy as np 8 | from easydict import EasyDict 9 | from multi_label.multi_label_model import Classifier 10 | 11 | 12 | def lr_func(epoch): 13 | # step_epoch = [10, 20, 30, 40, 50, 60, 70, 80] 14 | # step_lr = [0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0] # 0.0001 15 | step_epoch = [10, 140, 200, 260, 300] 16 | step_lr = [0.00001, 0.001, 0.0001, 0.00001, 0.000001] 17 | i = 0 18 | while i < len(step_epoch) and epoch > step_epoch[i]: 19 | i += 1 20 | return step_lr[i] 21 | 22 | 23 | FLAGS = EasyDict() 24 | 25 | # 数据集 26 | FLAGS.train_set_dir = 'dataset/test_sample' 27 | FLAGS.train_label_path = 'dataset/test_sample/label.txt' 28 | FLAGS.test_set_dir = 'dataset/test_sample' 29 | FLAGS.test_label_path = 'dataset/test_sample/label.txt' 30 | # 模型权重的L2正则化权重直接写在对应模型的骨干网络定义文件中 31 | FLAGS.input_shape = (48, 144, 3) # (H, W, C) 32 | FLAGS.output_shapes = (34, 64, 34, 34, 34, 34, 42, 12, 2, 6) # 多标签输出,每个标签预测的类别数 33 | FLAGS.output_names = ['class_{}'.format(i+1) for i in range(10)] 34 
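# Per-head loss weights: the total loss printed during training is
# sum_i(loss_weights[i] * class_{i+1}_loss) plus the layer regularization losses (model.losses).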
| FLAGS.loss_weights = [1, 1, 1, 1, 1, 1, 1, 1, 0.5, 0.5] 35 | FLAGS.mode = 'train' # train, test, debug, save_pb, save_serving 36 | FLAGS.model_backbone = Classifier.BACKBONE_RESNET_18 37 | FLAGS.optimizer = 'radam' # sgdm, adam, adabound, radam 38 | FLAGS.is_augment = True 39 | FLAGS.is_label_smoothing = False 40 | FLAGS.is_focal_loss = False 41 | FLAGS.is_gradient_harmonized = True 42 | FLAGS.type = FLAGS.model_backbone + '-' + FLAGS.optimizer 43 | FLAGS.type += ('-aug' if FLAGS.is_augment else '') 44 | FLAGS.type += ('-smooth' if FLAGS.is_label_smoothing else '') 45 | FLAGS.type += ('-focal' if FLAGS.is_focal_loss else '') 46 | FLAGS.type += ('-ghm' if FLAGS.is_gradient_harmonized else '') 47 | FLAGS.log_path = 'logs/log-{}.txt'.format(FLAGS.type) 48 | # 训练参数 49 | FLAGS.train_set_size = 14 # 160108 50 | FLAGS.val_set_size = 14 # 35935 51 | FLAGS.batch_size = 5 # 3079 52 | FLAGS.steps_per_epoch = int(np.ceil(FLAGS.train_set_size / FLAGS.batch_size)) 53 | FLAGS.validation_steps = int(np.ceil(FLAGS.val_set_size / FLAGS.batch_size)) 54 | 55 | FLAGS.epoch = 300 56 | FLAGS.init_lr = 0.0002 # nadam推荐使用值 57 | # callback的参数 58 | FLAGS.ckpt_period = 20 # 模型保存 59 | FLAGS.stop_patience = 500 # early stop 60 | FLAGS.stop_min_delta = 0.0001 61 | FLAGS.lr_func = lr_func # 学习率更新函数 62 | # FLAGS.logger_batch = 20 # 打印训练学习的batch间隔 63 | # tensorboard日志保存目录 64 | FLAGS.tensorboard_dir = 'logs/' + 'lpr-{}-{}'.format(FLAGS.type, datetime.datetime.now().strftime('%Y%m%d-%H%M%S')) 65 | # 模型保存 66 | FLAGS.checkpoint_path = 'models/{}/'.format(FLAGS.type) 67 | FLAGS.checkpoint_name = 'lp-recognition-{}'.format(FLAGS.type) + '-{epoch: 3d}-{loss: .5f}.ckpt' 68 | FLAGS.serving_model_dir = 'models/serving' 69 | FLAGS.pb_model_dir = 'models/pb' 70 | # 测试参数 71 | FLAGS.base_confidence = 0.83 # 基础置信度 72 | # 训练gpu 73 | FLAGS.gpu_mode = 'cpu' 74 | FLAGS.gpu_num = 1 75 | FLAGS.visible_gpu = '0' # ','.join([str(_) for _ in range(FLAGS.gpu_num)]) 76 | FLAGS.gpu_device = '0' 77 | -------------------------------------------------------------------------------- /dataset/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File __init__.py 4 | @author: ZhengYuwei 5 | 功能: 6 | 构建tf.data.Dataset对象,生成训练集/验证集/测试集,以提供模型训练、测试 7 | 构建方式主要包含: 8 | 1. 直接从label文件中读取信息进行构建(file_util); 9 | 2. 由label文件读取信息生成tfrecord、读取tfrecord方式,构建数据集(tfrecord_util); 10 | """ -------------------------------------------------------------------------------- /dataset/dataset_util.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File dataset_util.py 4 | @author:ZhengYuwei 5 | 功能: 6 | 功能: 7 | 1. DatasetUtil.augment_image 8 | 对传入的image构成的tf.data.Dataset数据集,进行图像数据增强,包含: 9 | - 等概率加噪:高斯噪声、椒盐噪声、不加噪声; 10 | - 对 对比度、亮度、饱和度 进行一定范围的随机扰动 11 | - mixup(待添加及实验) 12 | - 图像平移(当前场景不适用) 13 | - 图像旋转和翻转(当前场景不适用) 14 | - 随机crop(当前场景不适用) 15 | 2. DatasetUtil.shuffle_repeat:随机扰乱数据,重复(整个数据集层面)生成数据; 16 | 3. DatasetUtil.batch_prefetch:预生成批次数据。 17 | """ 18 | import tensorflow as tf 19 | 20 | 21 | class DatasetUtil(object): 22 | """ 对传入的生成(image, label)形式的tf.data.Dataset数据集,加工为可供训练使用的数据集 """ 23 | # 数据增强的超参,这个可能需要先不使用数据增强训练,调整超参,然后再用数据增强训练对比,然后调节这些超参 24 | _random_brightness = 30. / 255. 
# 随机亮度 25 | _random_low_contrast = 0.9 # 对比度最低值 26 | _random_up_contrast = 1.1 # 对比度最大值 27 | _random_low_saturation = 0.9 # 饱和度最小值 28 | _random_up_saturation = 1.1 # 饱和度最大值 29 | _random_normal = 0.01 # 随机噪声 30 | 31 | @staticmethod 32 | def _add_gauss_noise(image): 33 | """ 加入高斯噪声 """ 34 | image = image + tf.cast(tf.random_normal(tf.shape(image), mean=0, stddev=DatasetUtil._random_normal), 35 | tf.float32) 36 | return image 37 | 38 | @staticmethod 39 | def _add_salt_pepper_noise(image): 40 | """ 加入椒盐噪声 """ 41 | shp = tf.shape(image)[:-1] 42 | mask_select = tf.keras.backend.random_binomial(shape=shp, p=DatasetUtil._random_normal) 43 | mask_noise = tf.keras.backend.random_binomial(shape=shp, p=0.5) # 同样概率的椒盐 44 | image = image * tf.expand_dims(1 - mask_select, -1) + tf.expand_dims(mask_noise * mask_select, -1) 45 | return image 46 | 47 | @staticmethod 48 | def _add_noise(image): 49 | """ 对图片进行数据增强:高斯噪声或椒盐噪声 """ 50 | # 噪声类型 51 | noise_type = tf.random_uniform([], minval=0, maxval=3, dtype=tf.int32) 52 | image = tf.case(pred_fn_pairs=[(tf.equal(noise_type, 0), 53 | lambda: DatasetUtil._add_salt_pepper_noise(image)), 54 | (tf.equal(noise_type, 1), 55 | lambda: DatasetUtil._add_gauss_noise(image))], 56 | default=lambda: image) 57 | return image 58 | 59 | @staticmethod 60 | def _augment_cond_0(image): 61 | """ 对图片进行数据增强:亮度,饱和度,对比度 """ 62 | image = tf.image.random_brightness(image, max_delta=DatasetUtil._random_brightness) 63 | image = tf.image.random_saturation(image, lower=DatasetUtil._random_low_saturation, 64 | upper=DatasetUtil._random_up_saturation) 65 | image = tf.image.random_contrast(image, lower=DatasetUtil._random_low_contrast, 66 | upper=DatasetUtil._random_up_contrast) 67 | return image 68 | 69 | @staticmethod 70 | def _augment_cond_1(image): 71 | """ 对图片进行数据增强:饱和度,亮度,对比度 """ 72 | image = tf.image.random_saturation(image, lower=DatasetUtil._random_low_saturation, 73 | upper=DatasetUtil._random_up_saturation) 74 | image = tf.image.random_brightness(image, max_delta=DatasetUtil._random_brightness) 75 | image = tf.image.random_contrast(image, lower=DatasetUtil._random_low_contrast, 76 | upper=DatasetUtil._random_up_contrast) 77 | return image 78 | 79 | @staticmethod 80 | def _augment_cond_2(image): 81 | """ 对图片进行数据增强:饱和度,对比度, 亮度 """ 82 | image = tf.image.random_saturation(image, lower=DatasetUtil._random_low_saturation, 83 | upper=DatasetUtil._random_up_saturation) 84 | image = tf.image.random_contrast(image, lower=DatasetUtil._random_low_contrast, 85 | upper=DatasetUtil._random_up_contrast) 86 | image = tf.image.random_brightness(image, max_delta=DatasetUtil._random_brightness) 87 | return image 88 | 89 | @staticmethod 90 | def _augment(image): 91 | """ 对图片进行数据增强:饱和度,对比度, 亮度,加噪 92 | :param image: 待增强图片 (H, W, ?) 
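Note: noise is added first, then brightness/saturation/contrast are perturbed in a random order, and the result is clipped back to [0, 1].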
93 | :return: 94 | """ 95 | image = DatasetUtil._add_noise(image) 96 | # 数据增强顺序 97 | color_ordering = tf.random_uniform([], minval=0, maxval=4, dtype=tf.int32) 98 | image = tf.case(pred_fn_pairs=[(tf.equal(color_ordering, 0), 99 | lambda: DatasetUtil._augment_cond_0(image)), 100 | (tf.equal(color_ordering, 1), 101 | lambda: DatasetUtil._augment_cond_1(image)), 102 | (tf.equal(color_ordering, 2), 103 | lambda: DatasetUtil._augment_cond_2(image))], 104 | default=lambda: image) 105 | image = tf.clip_by_value(image, 0.0, 1.0) # 防止数据增强越界 106 | return image 107 | 108 | @staticmethod 109 | def augment_image(image_set): 110 | """ 对传入的tf.data.Dataset数据集进行图片数据增强,构造批次 111 | :param image_set: tf.data.Dataset数据集,产生(image, label)形式的数据 112 | :return: 增强后的tf.data.Dataset对象 113 | """ 114 | # 进行数据增强(这个map需要在repeat之后,才能每次repeat都进行不一样的增强效果) 115 | image_set = image_set.map(lambda image: DatasetUtil._augment(image), 116 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 117 | return image_set 118 | 119 | @staticmethod 120 | def shuffle_repeat(dataset, batch_size): 121 | """ 对传入的tf.data.Dataset数据集进行 shuffle 和 repeat 122 | :param dataset: tf.data.Dataset数据集,产生(image, label)形式的数据 123 | :param batch_size: 训练的batch大小 124 | :return: shuffle 和 repeat后的tf.data.Dataset对象 125 | """ 126 | dataset = dataset.apply(tf.data.experimental.shuffle_and_repeat(buffer_size=5 * batch_size)) 127 | return dataset 128 | 129 | @staticmethod 130 | def batch_prefetch(dataset, batch_size): 131 | """ 生成批次并预加载 132 | :param dataset: tf.data.Dataset数据集 133 | :param batch_size: 训练的batch大小 134 | :return: 输出批次并预加载的tf.data.Dataset数据集 135 | """ 136 | # 缓存数据到内存 137 | # dataset = dataset.cache() 138 | dataset = dataset.batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE) 139 | return dataset 140 | -------------------------------------------------------------------------------- /dataset/file_util.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File file_util.py 4 | @author:ZhengYuwei 5 | """ 6 | import os 7 | import logging 8 | import functools 9 | import tensorflow as tf 10 | 11 | from dataset.dataset_util import DatasetUtil 12 | 13 | 14 | class FileUtil(object): 15 | """ 16 | 从标签文件中,构造返回(image, label)的tf.data.Dataset数据集 17 | 标签文件内容如下: 18 | image_name label0,label1,label2,... 19 | """ 20 | 21 | @staticmethod 22 | def _parse_string_line(string_line, root_path): 23 | """ 24 | 解析文本中的一行字符串行,得到图片路径(拼接图片根目录)和标签 25 | :param string_line: 文本中的一行字符串,image_name label0 label1 label2 label3 ... 
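Example line (from dataset/test_sample/label.txt): 鲁BC6T76.jpg 15 1 12 6 27 7 6 -1 0 0, where -1 marks an invalid/unknown label at that position.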
26 | :param root_path: 图片根目录 27 | :return: DatasetV1Adapter<(图片路径Tensor(shape=(), dtype=string),标签Tensor(shape=(?,), dtype=float32))> 28 | """ 29 | strings = tf.string_split([string_line], delimiter=' ').values 30 | image_path = tf.string_join([root_path, strings[0]], separator=os.sep) 31 | labels = tf.string_to_number(strings[1:]) 32 | return image_path, labels 33 | 34 | @staticmethod 35 | def _parse_image(image_path, _, image_size): 36 | """ 37 | 根据图片路径和标签,读取图片 38 | :param image_path: 图片路径, Tensor(shape=(), dtype=string) 39 | :param _: 标签Tensor(shape(?,), dtype=float32)),本函数只产生图像dataset,故不需要 40 | :param image_size: 图像需要resize到的大小 41 | :return: 归一化的图片 Tensor(shape=(48, 144, ?), dtype=float32) 42 | """ 43 | # 图片 44 | image = tf.read_file(image_path) 45 | image = tf.image.decode_jpeg(image) 46 | image = tf.image.resize_images(image, image_size, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 47 | # 这里使用tf.float32会将照片归一化,也就是 *1/255 48 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 49 | image = tf.reverse(image, axis=[2]) # 读取的是rgb,需要转为bgr 50 | return image 51 | 52 | @staticmethod 53 | def _parse_labels(_, labels, num_labels): 54 | """ 55 | 根据图片路径和标签,解析标签 56 | :param _: 图片路径, Tensor(shape=(), dtype=string),本函数只产生标签dataset,故不需要 57 | :param labels: 标签,Tensor(shape=(?,), dtype=float32) 58 | :param num_labels: 每个图像对于输出的标签数(多标签分类模型) 59 | :return: 标签 DatasetV1Adapter<(多个标签Tensor(shape=(), dtype=float32), ...)> 60 | """ 61 | label_list = list() 62 | for label_index in range(num_labels): 63 | label_list.append(labels[label_index]) 64 | return label_list 65 | 66 | @staticmethod 67 | def get_dataset(file_path, root_path, image_size, num_labels, batch_size, is_augment=True, is_test=False): 68 | """ 69 | 从标签文件读取数据,并解析为(image_path, labels)形式的列表 70 | 标签文件内容格式为: 71 | image_name label0,label1,label2,label3,... 72 | :param file_path: 标签文件路径 73 | :param root_path: 图片路径的根目录,用于和标签文件中的image_name拼接 74 | :param image_size: 图像需要resize到的尺寸 75 | :param num_labels: 每个图像对于输出的标签数(多标签分类模型) 76 | :param batch_size: 批次大小 77 | :param is_augment: 是否对图片进行数据增强 78 | :param is_test: 是否为测试阶段,测试阶段的话,输出的dataset中多包含image_path 79 | :return: tf.data.Dataset对象 80 | """ 81 | logging.info('利用标签文件、图片根目录生成tf.data数据集对象:') 82 | logging.info('1. 解析标签文件;') 83 | dataset = tf.data.TextLineDataset(file_path) 84 | dataset = DatasetUtil.shuffle_repeat(dataset, batch_size) 85 | dataset = dataset.map(functools.partial(FileUtil._parse_string_line, root_path=root_path), 86 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 87 | logging.info('2. 读取图片数据,构造image set和label set;') 88 | image_set = dataset.map(functools.partial(FileUtil._parse_image, image_size=image_size), 89 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 90 | labels_set = dataset.map(functools.partial(FileUtil._parse_labels, num_labels=num_labels), 91 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 92 | 93 | if is_augment: 94 | logging.info('2.1 image set数据增强;') 95 | image_set = DatasetUtil.augment_image(image_set) 96 | 97 | logging.info('3. image set数据标准化;') 98 | image_set = image_set.map(lambda image: tf.image.per_image_standardization(image), 99 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 100 | 101 | if is_test: 102 | logging.info('4. 完成tf.data (image, label, path) 测试数据集构造;') 103 | path_set = dataset.map(lambda image_path, label: image_path, 104 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 105 | dataset = tf.data.Dataset.zip((image_set, labels_set, path_set)) 106 | else: 107 | logging.info('4. 
完成tf.data (image, label) 训练数据集构造;') 108 | # 合并image、labels: 109 | # DatasetV1Adapter 110 | dataset = tf.data.Dataset.zip((image_set, labels_set)) 111 | logging.info('5. 构造tf.data多epoch训练模式;') 112 | dataset = DatasetUtil.batch_prefetch(dataset, batch_size) 113 | return dataset 114 | 115 | 116 | if __name__ == '__main__': 117 | import cv2 118 | import numpy as np 119 | import time 120 | 121 | # 开启eager模式进行图片读取、增强和展示 122 | tf.enable_eager_execution() 123 | train_file_path = './test_sample/label.txt' # 标签文件 124 | image_root_path = './test_sample' # 图片根目录 125 | 126 | train_batch = 100 127 | train_set = FileUtil.get_dataset(train_file_path, image_root_path, image_size=(48, 144), num_labels=10, 128 | batch_size=train_batch, is_augment=True) 129 | start = time.time() 130 | for count, data in enumerate(train_set): 131 | for i in range(data[0].shape[0]): 132 | cv2.imshow('a', np.array(data[0][i])) 133 | cv2.waitKey(1) 134 | 135 | for count, data in enumerate(train_set): 136 | print('一批(%d)图像 shape:' % train_batch, data[0].shape) 137 | for i in range(data[0].shape[0]): 138 | cv2.imshow('a', np.array(data[0][i])) 139 | cv2.waitKey(1) 140 | print('一批(%d)标签 shape:' % train_batch, len(data[1])) 141 | for i in range(len(data[1])): 142 | print(data[1][i]) 143 | if count == 100: 144 | break 145 | print('耗时:', time.time() - start) 146 | -------------------------------------------------------------------------------- /dataset/test_sample/label.txt: -------------------------------------------------------------------------------- 1 | 鲁BC6T76.jpg 15 1 12 6 27 7 6 -1 0 0 -------------------------------------------------------------------------------- /dataset/test_sample/鲁BC6T76.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zheng-yuwei/multi-label-classification/3563628060e5a9534b106414a193e71c2fa001b8/dataset/test_sample/鲁BC6T76.jpg -------------------------------------------------------------------------------- /dataset/tfrecord_util.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File tfrecord_util.py 4 | @author:ZhengYuwei 5 | 功能: 6 | 1. TFRecordUtil.generate 7 | 用tf.gfile.FastGFile的read方法读取图片(要先确保shape一致),加上label,制作example,写tfrecord文件 8 | 用tf.gfile.FastGFile读取图片而不是用cv2.imread读取图片, 9 | 因为cv2.imread读取图片会使得存储的tfrecord扩大近8~10倍大小(相比原始jpg图片),而gfile只会增大一点 10 | 另一种是不用tfrecord,而是训练时从图片名列表中读取图片,构造训练数据,速度会慢一点(个人感觉可以忽略) 11 | 在进行这一切之前,确保图片不为空(cv2.imread(file) is not None) 12 | 2. 
TFRecordUtil.get_dataset 13 | 从tfrecord数据集中,构造tf.data.Dataset,解析图片、标签,返回未初始化的迭代器 14 | """ 15 | import os 16 | import logging 17 | import time 18 | import functools 19 | import numpy as np 20 | import tensorflow as tf 21 | import cv2 22 | 23 | from dataset.dataset_util import DatasetUtil 24 | 25 | 26 | class TFRecordUtil(object): 27 | """ 图片-标签数据集 保存tfrecord,读取tfrecord """ 28 | 29 | @staticmethod 30 | def generate(image_paths, labels, tfrecord_path): 31 | """ 用tf.gfile.FastGFile的read方法读取图片(要先确保shape一致),加上label,制作example,写tfrecord文件 32 | :param image_paths: 图片路径,list 33 | :param labels: 对应的标签,list 34 | :param tfrecord_path: tfrecord文件路径 35 | """ 36 | if tf.gfile.Exists(tfrecord_path): 37 | logging.warning('TFRecord数据集(%s)已经存在,不再生成...', tfrecord_path) 38 | return 39 | 40 | total = len(image_paths) 41 | if len(labels) != total: 42 | logging.error('图片路径数量(%d)不等于标签数量(%d)', total, len(labels)) 43 | return 44 | 45 | with tf.python_io.TFRecordWriter(tfrecord_path) as writer: 46 | for index, [image_path, label] in enumerate(zip(image_paths, labels)): 47 | if (index + 1) % 1000 == 0: 48 | logging.info('\r>> %d/%d done...', index + 1, total) 49 | if not os.path.exists(image_path): 50 | logging.warning('图片不存在:%s', image_path) 51 | continue 52 | # 读取图片,open(image_path, 'rb').read()和tf.read_file(image_path)也同样效果 53 | image_data = tf.gfile.GFile(image_path, 'rb').read() # type(image_data)为bytes 54 | # 多label转化为string 55 | label = np.asanyarray(label, dtype=np.int).tostring() 56 | # 制作example,并序列化 57 | tf_serialized = tf.train.Example( 58 | features=tf.train.Features( 59 | feature={ 60 | 'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_data])), 61 | 'label': tf.train.Feature(bytes_list=tf.train.BytesList(value=[label])) 62 | })).SerializeToString() 63 | 64 | writer.write(tf_serialized) 65 | return 66 | 67 | @staticmethod 68 | def _parse_tfrecord(serialized_example): 69 | """ 从序列化的tf.train.example解析出image(归一化)和label 70 | :param serialized_example: tfrecord读取来的序列化的tf.train.example数据 71 | :return: 归一化的图片,标签 72 | """ 73 | example = tf.parse_single_example( 74 | serialized_example, 75 | features={ 76 | 'image': tf.FixedLenFeature([], tf.string), 77 | 'label': tf.FixedLenFeature([], tf.string) 78 | } 79 | ) 80 | 81 | image = tf.image.decode_jpeg(example['image']) 82 | label = tf.decode_raw(example['label'], tf.int32) 83 | label = tf.cast(label, tf.float32) 84 | return image, label 85 | 86 | @staticmethod 87 | def _parse_image(image, _, image_size): 88 | """ 89 | 根据图片路径和标签,读取图片 90 | :param image: 原始图片rgb数据, Tensor(shape=(原始尺寸), dtype=int) 91 | :param _: 标签Tensor(shape(?,), dtype=float32)),本函数只产生图像dataset,故不需要 92 | :param image_size: 图像需要resize到的大小 93 | :return: 归一化的图片 Tensor(shape=(48, 144, ?), dtype=float32) 94 | """ 95 | # 图片 96 | image = tf.image.resize_images(image, image_size, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 97 | # 这里使用tf.float32会将照片归一化,也就是 *1/255 98 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 99 | image = tf.reverse(image, axis=[2]) # 读取的是rgb,需要转为bgr 100 | return image 101 | 102 | @staticmethod 103 | def _parse_labels(_, labels, num_labels): 104 | """ 105 | 根据图片路径和标签,解析标签 106 | :param _: 图片路径, Tensor(shape=(), dtype=int),本函数只产生标签dataset,故不需要 107 | :param labels: 标签,Tensor(shape=(?,), dtype=float32) 108 | :param num_labels: 每个图像对于输出的标签数(多标签分类模型) 109 | :return: 标签 DatasetV1Adapter<(多个标签Tensor(shape=(), dtype=float32), ...)> 110 | """ 111 | label_list = list() 112 | for label_index in range(num_labels): 113 | label_list.append(labels[label_index]) 114 | 
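# Returning a list of scalar labels (rather than a single tensor) makes the y component of each
# dataset element a tuple (label_0, ..., label_{num_labels-1}) that matches the model's multiple output heads.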
return label_list 115 | 116 | @staticmethod 117 | def get_dataset(tfrecord_path_mode, image_size, num_labels, batch_size, is_augment=True): 118 | """ 从tfrecord数据集中,构造tf.data,解析图片、标签,返回未初始化的迭代器 119 | :param tfrecord_path_mode: tfrecord数据集名的模式,使用glob进行匹配 120 | :param image_size: 图像需要resize到的尺寸 121 | :param num_labels: 每个图像对于输出的标签数(多标签分类模型) 122 | :param batch_size: 训练的batch大小 123 | :param is_augment: 是否进行数据增强 124 | :return: tf.data.Dataset对象 125 | """ 126 | logging.info('1. 读取tfrecord文件,生成可初始化迭代器') 127 | # 获取tfrecord,并进行解析 128 | tfrecord_path_list = tf.data.Dataset.list_files(tfrecord_path_mode) 129 | dataset = tf.data.TFRecordDataset(tfrecord_path_list) 130 | dataset = DatasetUtil.shuffle_repeat(dataset, batch_size) 131 | 132 | logging.info('2. 读取图片数据,构造image set和label set;') 133 | dataset = dataset.map(TFRecordUtil._parse_tfrecord, num_parallel_calls=tf.data.experimental.AUTOTUNE) 134 | image_set = dataset.map(functools.partial(TFRecordUtil._parse_image, image_size=image_size), 135 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 136 | labels_set = dataset.map(functools.partial(TFRecordUtil._parse_labels, num_labels=num_labels), 137 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 138 | if is_augment: 139 | logging.info('2.1 image set数据增强;') 140 | image_set = DatasetUtil.augment_image(image_set) 141 | 142 | logging.info('3. image set数据白化;') 143 | image_set = image_set.map(lambda image: tf.image.per_image_standardization(image), 144 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 145 | 146 | logging.info('4. 完成tf.data (image, label) 整体数据集构造,多epoch训练模式;') 147 | dataset = tf.data.Dataset.zip((image_set, labels_set)) 148 | 149 | logging.info('5. 构造tf.data多epoch训练模式;') 150 | dataset = DatasetUtil.batch_prefetch(dataset, batch_size) 151 | return dataset 152 | 153 | 154 | if __name__ == '__main__': 155 | label_txt_path = './test_sample/label.txt' # 标签文件 156 | image_root_dir = './test_sample' # 图片根目录 157 | record_filename = './test_sample/{}.record' # tfrecord存储目录 158 | # 开启eager模式进行图片读取、增强和展示 159 | tf.enable_eager_execution() 160 | 161 | # 1. 得到图片路径列表、标签数据列表 162 | train_image_paths = list() 163 | train_labels = list() 164 | with open(label_txt_path, 'r', encoding='UTF-8') as label_file: 165 | for line in label_file: 166 | line = line.split(" ") 167 | train_image_paths.append(os.path.join(image_root_dir, line[0])) 168 | train_labels.append(line[1:]) 169 | 170 | # 2. 制作record 171 | file_names = record_filename.format('test_sample') 172 | TFRecordUtil.generate(train_image_paths, train_labels, file_names) 173 | 174 | # 3. 
读取record 175 | train_batch = 100 176 | train_set = TFRecordUtil.get_dataset(file_names, image_size=(48, 144), num_labels=10, 177 | batch_size=train_batch, is_augment=True) 178 | start = time.time() 179 | for count, data in enumerate(train_set): 180 | print('一批(%d)图像 shape:' % train_batch, data[0].shape) 181 | for i in range(data[0].shape[0]): 182 | cv2.imshow('a', np.array(data[0][i])) 183 | cv2.waitKey(1) 184 | print('一批(%d)标签 shape:' % train_batch, len(data[1])) 185 | for i in range(len(data[1])): 186 | print(data[1][i]) 187 | if count == 100: 188 | break 189 | print('耗时:', time.time() - start) 190 | -------------------------------------------------------------------------------- /images/GHM-insight.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zheng-yuwei/multi-label-classification/3563628060e5a9534b106414a193e71c2fa001b8/images/GHM-insight.jpg -------------------------------------------------------------------------------- /images/focal-loss.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zheng-yuwei/multi-label-classification/3563628060e5a9534b106414a193e71c2fa001b8/images/focal-loss.jpg -------------------------------------------------------------------------------- /multi_label/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File __init__.py 4 | @author:ZhengYuwei 5 | """ -------------------------------------------------------------------------------- /multi_label/multi_label_loss.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File multi_label_loss.py 4 | @author:ZhengYuwei 5 | """ 6 | import numpy as np 7 | import tensorflow as tf 8 | from tensorflow import keras 9 | 10 | 11 | class MyLoss(object): 12 | """ 损失函数 """ 13 | def __init__(self, model, **options): 14 | self.model = model 15 | self.is_label_smoothing = options.setdefault('is_label_smoothing', False) 16 | self.is_focal_loss = options.setdefault('is_focal_loss', False) 17 | self.is_gradient_harmonizing = options.setdefault('is_gradient_harmonized', False) 18 | 19 | self.loss_func = self._normal_categorical_crossentropy() 20 | # 标签平滑 21 | if self.is_label_smoothing: 22 | self.smoothing_epsilon = options.setdefault('smoothing_epsilon', 0.005) 23 | # focal loss损失函数 24 | if self.is_focal_loss: 25 | gamma = options.setdefault('focal_loss_gamma', 2.0) 26 | alpha = options.setdefault('focal_loss_alpha', 1.0) 27 | self.loss_func = self._categorical_focal_loss(gamma, alpha) 28 | # gradient harmonized mechanism 29 | if self.is_gradient_harmonizing: 30 | bins = options.setdefault('ghm_loss_bins', 30) 31 | momentum = options.setdefault('ghm_loss_momentum', 0.75) 32 | self.loss_func = self._categorical_ghm_loss(bins, momentum) 33 | 34 | @staticmethod 35 | def _normal_categorical_crossentropy(): 36 | """ 自带的多标签分类损失函数 categorical_crossentropy """ 37 | def categorical_crossentropy(y_truth, y_pred, _): 38 | return keras.backend.categorical_crossentropy(y_truth, y_pred) 39 | return categorical_crossentropy 40 | 41 | @staticmethod 42 | def _categorical_focal_loss(gamma=2.0, alpha=1.0): 43 | """ 返回多分类 focal loss 函数 44 | Formula: loss = -alpha*((1-p_t)^gamma)*log(p_t) 45 | Parameters: 46 | alpha -- the same as wighting factor in balanced cross entropy, default 0.25 47 | gamma -- focusing parameter for modulating factor (1-p), default 2.0 48 
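Note: the default in the function signature is actually alpha=1.0 (not 0.25), and the implementation
uses |y_true - y_pred| as the modulating factor, which equals (1 - p_t) on the target class for one-hot labels.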
| """ 49 | def focal_loss(y_truth, y_pred, _): 50 | epsilon = keras.backend.epsilon() 51 | y_pred = keras.backend.clip(y_pred, epsilon, 1.0 - epsilon) 52 | cross_entropy = -y_truth * keras.backend.log(y_pred) 53 | weight = alpha * keras.backend.pow(keras.backend.abs(y_truth - y_pred), gamma) 54 | loss = weight * cross_entropy 55 | loss = keras.backend.sum(loss, axis=1) 56 | return loss 57 | return focal_loss 58 | 59 | @staticmethod 60 | def _categorical_ghm_loss(bins=30, momentum=0.75): 61 | """ 返回多分类 GHM 损失函数: 62 | 把每个区间上的梯度做平均,也就是说把梯度拉平,回推到公式上等价于把loss做平均 63 | Formula: 64 | loss = sum(crossentropy_loss(p_i,p*_i) / GD(g_i)) 65 | GD(g) = S_ind(g) / delta = S_ind(g) * M 66 | S_ind(g) = momentum * S_ind(g) + (1 - momentum) * R_ind(g) 67 | R_ind(g)是 g=|p-p*| 所在梯度区间[(i-1)delta, i*delta]的样本数 68 | M = 1/delta,这个是个常数,理论上去掉只有步长影响 69 | Parameters: (论文默认) 70 | bins -- 区间个数,default 30 71 | momentum -- 使用移动平均来求区间内样本数,动量部分系数,论文说不敏感 72 | """ 73 | # 区间边界 74 | edges = np.array([i/bins for i in range(bins + 1)]) 75 | edges = np.expand_dims(np.expand_dims(edges, axis=-1), axis=-1) 76 | acc_sum = 0 77 | if momentum > 0: 78 | acc_sum = tf.zeros(shape=(bins,), dtype=tf.float32) 79 | 80 | def ghm_class_loss(y_truth, y_pred, valid_mask): 81 | epsilon = keras.backend.epsilon() 82 | y_pred = keras.backend.clip(y_pred, epsilon, 1.0 - epsilon) 83 | # 0. 计算本次mini-batch的梯度分布:R_ind(g) 84 | gradient = keras.backend.abs(y_truth - y_pred) 85 | # 获取概率最大的类别下标,将该类别的梯度做为该标签的梯度代表 86 | # 没有这部分就是每个类别的梯度都参与到GHM,实验表明没有这部分会更好些 87 | # truth_indices_1 = keras.backend.expand_dims(keras.backend.argmax(y_truth, axis=1)) 88 | # truth_indices_0 = keras.backend.expand_dims(keras.backend.arange(start=0, 89 | # stop=tf.shape(y_pred)[0], 90 | # step=1, dtype='int64')) 91 | # truth_indices = keras.backend.concatenate([truth_indices_0, truth_indices_1]) 92 | # main_gradient = tf.gather_nd(gradient, truth_indices) 93 | # gradient = tf.tile(tf.expand_dims(main_gradient, axis=-1), [1, y_pred.shape[1]]) 94 | 95 | # 求解各个梯度所在的区间,并落到对应区间内进行密度计数 96 | grads_bin = tf.logical_and(tf.greater_equal(gradient, edges[:-1, :, :]), tf.less(gradient, edges[1:, :, :])) 97 | valid_bin = tf.boolean_mask(grads_bin, valid_mask, name='valid_gradient', axis=1) 98 | valid_bin = tf.reduce_sum(tf.cast(valid_bin, dtype=tf.float32), axis=(1, 2)) 99 | # 2. 更新指数移动平均后的梯度分布:S_ind(g) 100 | nonlocal acc_sum 101 | acc_sum = tf.add(momentum * acc_sum, (1 - momentum) * valid_bin, name='update_bin_number') 102 | # sample_num = tf.reduce_sum(acc_sum) # 是否乘以总数,乘上效果反而变差了 103 | # 3. 计算本次mini-batch不同loss对应的梯度密度:GD(g) 104 | position = tf.slice(tf.where(grads_bin), [0, 1], [-1, 2]) 105 | value = tf.gather_nd(acc_sum, tf.slice(tf.where(grads_bin), [0, 0], [-1, 1])) # * bins 106 | grad_density = tf.sparse.SparseTensor(indices=position, values=value, 107 | dense_shape=tf.shape(gradient, out_type=tf.int64)) 108 | grad_density = tf.sparse.to_dense(grad_density, validate_indices=False) 109 | grad_density = grad_density * tf.expand_dims(valid_mask, -1) + (1 - tf.expand_dims(valid_mask, -1)) 110 | 111 | # 4. 
计算本次mini-batch不同样本的损失:loss 112 | cross_entropy = -y_truth * keras.backend.log(y_pred) 113 | # loss = cross_entropy / grad_density * sample_num 114 | loss = cross_entropy / grad_density 115 | loss = keras.backend.sum(loss, axis=1) 116 | """ 117 | # 调试用,打印tensor 118 | print_op = tf.print('acc_sum: ', acc_sum, '\n', 119 | 'grad_density: ', grad_density, '\n', 120 | 'cross_entropy: ', cross_entropy, '\n', 121 | 'loss:', loss, '\n', 122 | '\n', 123 | '=================================================\n', 124 | summarize=100) 125 | with tf.control_dependencies([print_op]): 126 | return tf.identity(loss) 127 | """ 128 | return loss 129 | return ghm_class_loss 130 | 131 | def categorical_crossentropy(self, y_truth, y_pred): 132 | """ 单标签多分类损失函数 133 | :param y_truth: 真实类别值, (?, ?) 134 | :param y_pred: 预测类别值, (?, num_classes) 135 | :return: loss 136 | """ 137 | num_classes = keras.backend.cast(keras.backend.int_shape(y_pred)[-1], dtype=tf.int32) # 类别数 138 | # 将sparse的truth输出flatten, 记录无效标签(-1)和有效标签(>=0)位置,后续用于乘以loss 139 | y_truth = keras.backend.flatten(y_truth) 140 | valid_mask = 1.0 - tf.cast(tf.less(y_truth, 0), dtype=tf.float32) 141 | # 转为one_hot 142 | y_truth = keras.backend.cast(y_truth, dtype=tf.uint8) 143 | y_truth = keras.backend.one_hot(indices=y_truth, num_classes=num_classes) 144 | 145 | # 标签平滑 146 | if self.is_label_smoothing: 147 | num_classes = keras.backend.cast(num_classes, dtype=y_pred.dtype) 148 | y_truth = (1.0 - self.smoothing_epsilon) * y_truth + self.smoothing_epsilon / num_classes 149 | 150 | loss = self.loss_func(y_truth, y_pred, valid_mask) 151 | loss = loss * valid_mask 152 | """ 153 | # 调试用,打印tensor 154 | print_op = tf.print( 155 | # 'y_pred: ', y_pred, '\n', 156 | # 'y_truth: ', y_truth, '\n', 157 | # 'valid_mask: ', valid_mask, '\n', 158 | # 'loss:', loss, '\n', 159 | # 'normal_loss:', self._normal_categorical_crossentropy()(y_truth, y_pred, valid_mask), '\n', 160 | 'layer losses (regularization)', tf.transpose(self.model.losses), '\n', 161 | 'mean loss:', tf.reduce_mean(loss), '\t', 162 | 'sum layer losses:', tf.reduce_sum(tf.transpose(self.model.losses)), '\n', 163 | '=================================================\n', 164 | summarize=100 165 | ) 166 | with tf.control_dependencies([print_op]): 167 | return tf.identity(loss) 168 | """ 169 | return loss 170 | -------------------------------------------------------------------------------- /multi_label/multi_label_model.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File multi_label_model.py 4 | @author:ZhengYuwei 5 | """ 6 | import logging 7 | from tensorflow import keras 8 | from backbone.resnet18 import ResNet18 9 | from backbone.resnet18_v2 import ResNet18_v2 10 | from backbone.resnext import ResNeXt18 11 | from backbone.mixnet18 import MixNet18 12 | from backbone.mobilenet_v2 import MobileNetV2 13 | 14 | 15 | class Classifier(object): 16 | """ 17 | 分类器,自定义了多标签的head(多输出keras.models.Model对象) 18 | """ 19 | BACKBONE_RESNET_18 = 'resnet-18' 20 | BACKBONE_RESNET_18_V2 = 'resnet-18-v2' 21 | BACKBONE_RESNEXT_18 = 'resnext-18' 22 | BACKBONE_MIXNET_18 = 'mixnet-18' 23 | BACKBONE_MOBILENET_V2 = 'mobilenet-v2' 24 | BACKBONE_TYPE = { 25 | BACKBONE_RESNET_18: ResNet18, 26 | BACKBONE_RESNET_18_V2: ResNet18_v2, 27 | BACKBONE_RESNEXT_18: ResNeXt18, 28 | BACKBONE_MOBILENET_V2: MobileNetV2, 29 | BACKBONE_MIXNET_18: MixNet18 30 | } 31 | 32 | @classmethod 33 | def _multi_label_head(cls, net, output_shape, output_names): 34 | """ 35 | 
多标签分类器的head,上接全连接输入,下输出多个标签的多分类softmax输出 36 | :param net: 全连接输入 37 | :param output_shape: 多标签输出的每个分支的类别数列表 38 | :param output_names: 多标签输出的每个分支的名字 39 | :return: keras.models.Model对象 40 | """ 41 | # 全连接层:先做全局平均池化,然后flatten,然后再全连接层 42 | net = keras.layers.GlobalAveragePooling2D()(net) 43 | net = keras.layers.Flatten()(net) 44 | 45 | # 不同标签分支 46 | outputs = list() 47 | for num, name in zip(output_shape, output_names): 48 | output = keras.layers.Dense(units=num, kernel_initializer=keras.initializers.RandomNormal(stddev=0.01), 49 | activation="softmax", name=name)(net) 50 | """ 51 | output = keras.layers.Dense(units=num, kernel_initializer=keras.initializers.RandomNormal(stddev=0.01), 52 | kernel_regularizer=keras.regularizers.l2(ResNet18.L2_WEIGHT), 53 | bias_regularizer=keras.regularizers.l2(ResNet18.L2_WEIGHT), 54 | activation="softmax", name=name)(net) 55 | """ 56 | outputs.append(output) 57 | return outputs 58 | 59 | @classmethod 60 | def build(cls, backbone, input_shape, output_shape, output_names): 61 | """ 62 | 构建backbone基础网络的多标签分类keras.models.Model对象 63 | :param backbone: 基础网络,枚举变量 Classifier.NetType 64 | :param input_shape: 输入尺寸 65 | :param output_shape: 多标签输出的每个分支的类别数列表 66 | :param output_names: 多标签输出的每个分支的名字 67 | :return: resnet18基础网络的多标签分类keras.models.Model对象 68 | """ 69 | if len(input_shape) != 3: 70 | raise Exception('模型输入形状必须是3维形式') 71 | 72 | if backbone in cls.BACKBONE_TYPE.keys(): 73 | backbone = cls.BACKBONE_TYPE[backbone] 74 | else: 75 | raise ValueError("没有该类型的基础网络!") 76 | 77 | if len(input_shape) != 3: 78 | raise Exception('模型输入形状必须是3维形式') 79 | 80 | logging.info('构造多标签分类模型,基础网络:%s', backbone) 81 | input_x = keras.layers.Input(shape=input_shape) 82 | backbone_model = backbone.build(input_x) 83 | outputs = Classifier._multi_label_head(backbone_model, output_shape, output_names) 84 | model = keras.models.Model(inputs=input_x, outputs=outputs, name=backbone) 85 | return model 86 | 87 | 88 | if __name__ == '__main__': 89 | """ 90 | 可视化网络结构,使用plot_model需要先用conda安装GraphViz、pydotplus 91 | """ 92 | from configs import FLAGS 93 | model_names = Classifier.BACKBONE_TYPE.keys() 94 | for model_name in model_names: 95 | test_model = Classifier.build(model_name, FLAGS.input_shape, FLAGS.output_shapes, FLAGS.output_names) 96 | keras.utils.plot_model(test_model, to_file='../images/{}.svg'.format(model_name), show_shapes=True) 97 | test_model.summary() 98 | -------------------------------------------------------------------------------- /multi_label/trainer.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File trainer.py 4 | @author:ZhengYuwei 5 | """ 6 | import os 7 | import logging 8 | import tensorflow as tf 9 | from tensorflow import saved_model 10 | from tensorflow import keras 11 | 12 | from configs import FLAGS 13 | from dataset.file_util import FileUtil 14 | from multi_label.multi_label_model import Classifier 15 | from multi_label.multi_label_loss import MyLoss 16 | 17 | 18 | class MultiLabelClassifier(object): 19 | """ 20 | 训练分类器: 21 | 1. 初始化分类器模型、训练参数等; 22 | 2. 调用prepare_data函数准备训练、验证数据集; 23 | 3. 
调用train函数训练。 24 | """ 25 | 26 | GPU_MODE = 'gpu' 27 | CPU_MODE = 'cpu' 28 | 29 | def __init__(self): 30 | """ 训练初始化 """ 31 | # 构建模型网络 32 | self.backbone = FLAGS.model_backbone # 网络类型 33 | self.input_shape = FLAGS.input_shape 34 | self.output_shapes = FLAGS.output_shapes 35 | 36 | model = Classifier.build(self.backbone, self.input_shape, self.output_shapes, FLAGS.output_names) 37 | # 训练模型: cpu,gpu 或 多gpu 38 | if FLAGS.gpu_mode == MultiLabelClassifier.GPU_MODE and FLAGS.gpu_num > 1: 39 | self.model = keras.utils.multi_gpu_model(model, gpus=FLAGS.gpu_num) 40 | else: 41 | self.model = model 42 | self.model.summary() 43 | self.history = None 44 | 45 | # 加载预训练模型(若有) 46 | self.checkpoint_path = FLAGS.checkpoint_path 47 | if self.checkpoint_path is None: 48 | self.checkpoint_path = 'models/' 49 | if os.path.isfile(self.checkpoint_path): 50 | if os.path.exists(self.checkpoint_path): 51 | self.model.load_weights(self.checkpoint_path) 52 | logging.info('加载模型成功!') 53 | else: 54 | self.checkpoint_path = os.path.dirname(self.checkpoint_path) 55 | if os.path.isdir(self.checkpoint_path): 56 | if not os.path.exists(self.checkpoint_path): 57 | os.makedirs(self.checkpoint_path) 58 | latest = tf.train.latest_checkpoint(self.checkpoint_path) 59 | if latest is not None: 60 | self.model.load_weights(latest) 61 | logging.info('加载模型成功!') 62 | logging.info(latest) 63 | else: 64 | self.checkpoint_path = os.path.dirname(self.checkpoint_path) 65 | self.checkpoint_path = os.path.join(self.checkpoint_path, FLAGS.checkpoint_name) 66 | 67 | # 设置训练过程中的回调函数 68 | tensorboard = keras.callbacks.TensorBoard(log_dir=FLAGS.tensorboard_dir) 69 | cp_callback = keras.callbacks.ModelCheckpoint(self.checkpoint_path, save_weights_only=True, 70 | verbose=1, period=FLAGS.ckpt_period) 71 | es_callback = keras.callbacks.EarlyStopping(monitor='loss', min_delta=FLAGS.stop_min_delta, 72 | patience=FLAGS.stop_patience, verbose=0, mode='min') 73 | lr_callback = keras.callbacks.LearningRateScheduler(FLAGS.lr_func) 74 | # from utils.logger_callback import NBatchProgbarLogger 75 | # log_callback = NBatchProgbarLogger(display=FLAGS.logger_batch) 76 | self.callbacks = [tensorboard, cp_callback, es_callback, lr_callback, ] 77 | 78 | # 设置模型优化方法 79 | self.loss_function = list() 80 | for _ in self.output_shapes: 81 | loss_function = MyLoss(self.model, 82 | is_label_smoothing=FLAGS.is_label_smoothing, 83 | is_focal_loss=FLAGS.is_focal_loss, 84 | is_gradient_harmonized=FLAGS.is_gradient_harmonized).categorical_crossentropy 85 | self.loss_function.append(loss_function) 86 | 87 | optimizer = keras.optimizers.SGD(lr=FLAGS.init_lr, momentum=0.95, nesterov=True) 88 | if FLAGS.optimizer == 'adam': 89 | optimizer = keras.optimizers.Adam(lr=FLAGS.init_lr, amsgrad=True) # 用AMSGrad 90 | elif FLAGS.optimizer == 'adabound': 91 | from keras_adabound import AdaBound 92 | optimizer = AdaBound(lr=1e-3, final_lr=0.1) 93 | elif FLAGS.optimizer == 'radam': 94 | from utils.radam import RAdam 95 | optimizer = RAdam(lr=1e-3) 96 | 97 | # 由于是多标签分类损失,最终答应的损失信息为: 98 | # loss: 44.6420 - class_1_loss: 7.8428 - class_2_loss: 5.8357 - class_3_loss: 4.5361 - class_4_loss: 4.7954 99 | # - class_5_loss: 4.1554 - class_6_loss: 4.6104 - class_7_loss: 5.5645 - class_8_loss: 0.6412 100 | # - class_9_loss: 0.7639 - class_10_loss: 4.1163 101 | # class_1_loss等为对应标签的平均样本损失,loss=所有标签平均样本损失 + 权重 * 罚项(正则项)误差(model.losses) 102 | self.model.compile(optimizer=optimizer, loss=self.loss_function, loss_weights=FLAGS.loss_weights) 103 | 104 | # 设置模型训练参数 105 | self.mini_batch = FLAGS.batch_size 106 | 
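# Each epoch runs FLAGS.steps_per_epoch = ceil(train_set_size / batch_size) iterations (see configs.py).
# When multi_gpu_model is used (gpu_num > 1), Keras splits every batch evenly across the GPUs,
# so batch_size here is the global batch size.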
self.epoch = FLAGS.epoch 107 | 108 | def prepare_data(self, label_file_path, image_root_dir, is_augment=False, is_test=False): 109 | """ 110 | 数据集准备,返回可初始化迭代器,使用前需要先sess.run(iterator.initializer)进行初始化 111 | :param label_file_path: 标签文件路径,格式参考 代码具体接口解释 112 | :param image_root_dir: 图片文件根目录 113 | :param is_augment: 是否进行数据增强 114 | :param is_test: 是否为测试阶段 115 | :return: tf.data.Dataset对象 116 | """ 117 | logging.info('加载数据集:%s', label_file_path) 118 | dataset = FileUtil.get_dataset(label_file_path, image_root_dir, image_size=self.input_shape[0:2], 119 | num_labels=len(self.output_shapes), batch_size=self.mini_batch, 120 | is_augment=is_augment, is_test=is_test) 121 | return dataset 122 | 123 | def train(self, train_set, val_set, train_steps=FLAGS.steps_per_epoch, val_steps=FLAGS.validation_steps): 124 | """ 125 | 使用训练集和验证集进行模型训练 126 | :param train_set: 训练数据集的tf.data.Dataset对象 127 | :param val_set: 验证数据集的tf.data.Dataset对象 128 | :param train_steps: 每个训练epoch的迭代次数 129 | :param val_steps: 每个验证epoch的迭代次数 130 | :return: 131 | """ 132 | if val_set: 133 | self.history = self.model.fit(train_set, epochs=self.epoch, validation_data=val_set, 134 | steps_per_epoch=train_steps, validation_steps=val_steps, 135 | callbacks=self.callbacks, verbose=2) 136 | else: 137 | self.history = self.model.fit(train_set, epochs=self.epoch, steps_per_epoch=train_steps, 138 | callbacks=self.callbacks, verbose=2) 139 | logging.info('模型训练完毕!') 140 | 141 | def save_serving(self): 142 | """ 使用TensorFlow Serving时的保存方式: 143 | serving-save-dir/ 144 | saved_model.pb 145 | variables/ 146 | .data & .index 147 | """ 148 | outputs = dict() 149 | for index, name in enumerate(FLAGS.output_names): 150 | outputs[name] = self.model.outputs[index] 151 | 152 | builder = saved_model.builder.SavedModelBuilder(FLAGS.serving_model_dir) 153 | signature = saved_model.signature_def_utils.predict_signature_def(inputs={'images': self.model.input}, 154 | outputs=outputs) 155 | with keras.backend.get_session() as sess: 156 | builder.add_meta_graph_and_variables(sess=sess, 157 | tags=[saved_model.tag_constants.SERVING], 158 | signature_def_map={'predict': signature}) 159 | builder.save() 160 | logging.info('serving模型保存成功!') 161 | 162 | def save_mobile(self): 163 | """ 164 | 保存模型为pb模型:先转为h5,再保存为pb(没法直接转pb) 165 | """ 166 | # 获取待保存ckpt文件的文件名 167 | latest = tf.train.latest_checkpoint(os.path.dirname(self.checkpoint_path)) 168 | model_name = os.path.splitext(os.path.basename(latest))[0] 169 | if not os.path.exists(FLAGS.pb_model_dir): 170 | os.makedirs(FLAGS.pb_model_dir) 171 | # 将整个模型保存为h5(包含图结构和参数),然后再重新加载 172 | h5_path = os.path.join(FLAGS.pb_model_dir, '{}.h5'.format(model_name)) 173 | self.model.save(h5_path, overwrite=True, include_optimizer=False) 174 | model = keras.models.load_model(h5_path) 175 | model.summary() 176 | # 保存pb 177 | with keras.backend.get_session() as sess: 178 | output_names = [out.op.name for out in model.outputs] 179 | input_graph_def = sess.graph.as_graph_def() 180 | for node in input_graph_def.node: 181 | node.device = "" 182 | graph = tf.graph_util.remove_training_nodes(input_graph_def) 183 | graph_frozen = tf.graph_util.convert_variables_to_constants(sess, graph, output_names) 184 | tf.train.write_graph(graph_frozen, FLAGS.pb_model_dir, '{}.pb'.format(model_name), as_text=False) 185 | logging.info("pb模型保存成功!") 186 | 187 | def evaluate(self, test_set, steps): 188 | """ 189 | 使用测试集进行模型评估 190 | :param test_set: 测试集的tf.data.Dataset对象 191 | :param steps: 每一个epoch评估次数 192 | :return: 193 | """ 194 | test_loss, test_acc = 
self.model.evaluate(test_set) 195 | logging.info('Test accuracy:', test_acc, steps) 196 | 197 | def predict(self, test_images): 198 | """ 199 | 使用测试图片进行模型测试 200 | :param test_images: 测试图片 201 | :return: 202 | """ 203 | predictions = self.model.predict(test_images) 204 | return predictions 205 | 206 | def get_gradients(self, images, labels, persistent=False): 207 | """ 208 | 在给定输入,获取所有可训练权重向量的梯度向量 209 | :param images: 输入图像 210 | :param labels: 标签ground truth 211 | :param persistent: 是否用持久化的tape,一般不用,除非开启debug模式在该函数内debug 212 | :return: 获取所有可训练参数的梯度 213 | """ 214 | with tf.GradientTape(persistent=persistent) as tape: 215 | y_preds = self.model(images) 216 | y_truths = labels 217 | loss = 0 218 | for y_truth, y_pred in zip(y_truths, y_preds): 219 | loss += self.loss_function(y_truth, y_pred) 220 | loss = tf.reduce_mean(loss) 221 | gradients = tape.gradient(loss, self.model.trainable_weights) 222 | gradients = [{weight.name: gradient} for gradient, weight in zip(gradients, self.model.trainable_weights)] 223 | return gradients 224 | 225 | def get_trainable_layers_func(self): 226 | """ 227 | 构造keras函数:在给定输入,获取所有layer的预测结果 228 | usage: 229 | classifier = TrainClassifier(backbone=Classifier.BACKBONE_RESNET_18) 230 | get_trainable_layers = classifier.get_trainable_layers_func() 231 | outputs = get_trainable_layers(test_images) # test_images [None, 48, 144, 3] 232 | :return: 获取所有layers的预测结果的keras函数 233 | """ 234 | trainable_names = [weight.name for weight in self.model.trainable_weights] 235 | trainable_names = set([name.split('/')[0] for name in trainable_names]) 236 | trainable_outputs = [{layer.name: layer.output} for layer in self.model.layers 237 | if layer.name in trainable_names] 238 | get_trainable_layers = keras.backend.function(inputs=[self.model.input], outputs=trainable_outputs) 239 | return get_trainable_layers 240 | 241 | def get_layers_func(self): 242 | """ 243 | 构造keras函数:在给定输入,获取所有layer的预测结果 244 | usage: 245 | classifier = TrainClassifier(backbone=Classifier.BACKBONE_RESNET_18) 246 | get_layers = classifier.get_layer_func() 247 | outputs = get_layers(test_images) # test_images [None, 48, 144, 3] 248 | :return: 获取所有layers的预测结果的keras函数 249 | """ 250 | layers_output = [layer.output for layer in self.model.layers] 251 | get_layers = keras.backend.function(inputs=[self.model.input], outputs=layers_output) 252 | return get_layers 253 | 254 | def convert_multi2single(self): 255 | """ 256 | 将多GPU训练的模型转为单GPU模型,从而可以在单GPU上运行测试 257 | :return: 258 | """ 259 | # it's necessary to save the model before use this single GPU model 260 | multi_model = self.model.layers[FLAGS.gpu_num + 1] # get single GPU model weights 261 | dir_name = self.checkpoint_path 262 | if not os.path.isdir(self.checkpoint_path): 263 | dir_name = os.path.dirname(self.checkpoint_path) 264 | latest = tf.train.latest_checkpoint(dir_name) 265 | save_path = os.path.join(dir_name, 'single_' + os.path.basename(latest)) 266 | multi_model.save_weights(save_path) 267 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy>=1.16.5 2 | seaborn==0.9.0 3 | easydict==1.9 4 | pandas==0.25.2 5 | opencv_python==4.1.1.26 6 | tensorflow>=1.13.1 7 | matplotlib==3.1.1 8 | keras_adabound==0.5.0 9 | scikit_learn==0.21.3 10 | -------------------------------------------------------------------------------- /run.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 
-*- 2 | """ 3 | Created on 2019/7/17 4 | File run.py 5 | @author:ZhengYuwei 6 | """ 7 | import os 8 | import logging 9 | import numpy as np 10 | import tensorflow as tf 11 | from tensorflow import keras 12 | from logging.handlers import RotatingFileHandler 13 | 14 | from multi_label.trainer import MultiLabelClassifier 15 | from configs import FLAGS 16 | 17 | if FLAGS.mode == 'test': 18 | tf.enable_eager_execution() 19 | if FLAGS.mode in ('train', 'debug'): 20 | keras.backend.set_learning_phase(True) 21 | else: 22 | keras.backend.set_learning_phase(False) 23 | np.random.seed(6) 24 | tf.set_random_seed(800) 25 | 26 | 27 | def generate_logger(filename, **log_params): 28 | """ 29 | 生成日志记录对象记录日志 30 | :param filename: 日志文件名称 31 | :param log_params: 日志参数 32 | :return: 33 | """ 34 | level = log_params.setdefault('level', logging.INFO) 35 | 36 | logger = logging.getLogger() 37 | logger.setLevel(level=level) 38 | formatter = logging.Formatter('%(asctime)s %(filename)s:%(lineno)d %(levelname)s %(message)s') 39 | # 定义一个RotatingFileHandler,最多备份3个日志文件,每个日志文件最大1M 40 | file_handler = RotatingFileHandler(filename, maxBytes=1 * 1024 * 1024, backupCount=3) 41 | file_handler.setFormatter(formatter) 42 | # 控制台输出 43 | console = logging.StreamHandler() 44 | console.setFormatter(formatter) 45 | 46 | logger.addHandler(file_handler) 47 | logger.addHandler(console) 48 | 49 | 50 | def run(): 51 | # gpu模式 52 | if FLAGS.gpu_mode != MultiLabelClassifier.CPU_MODE: 53 | os.environ["CUDA_VISIBLE_DEVICES"] = FLAGS.visible_gpu 54 | # tf.device('/gpu:{}'.format(FLAGS.gpu_device)) 55 | config = tf.ConfigProto() 56 | config.gpu_options.allow_growth = True # 按需 57 | sess = tf.Session(config=config) 58 | 59 | """ 60 | # 添加debug:nan或inf过滤器 61 | from tensorflow.python import debug as tf_debug 62 | from tensorflow.python.debug.lib.debug_data import InconvertibleTensorProto 63 | sess = tf_debug.LocalCLIDebugWrapperSession(sess) 64 | 65 | # nan过滤器 66 | def has_nan(datum, tensor): 67 | _ = datum # Datum metadata is unused in this predicate. 68 | if isinstance(tensor, InconvertibleTensorProto): 69 | # Uninitialized tensor doesn't have bad numerical values. 70 | # Also return False for data types that cannot be represented as numpy 71 | # arrays. 72 | return False 73 | elif (np.issubdtype(tensor.dtype, np.floating) or 74 | np.issubdtype(tensor.dtype, np.complex) or 75 | np.issubdtype(tensor.dtype, np.integer)): 76 | return np.any(np.isnan(tensor)) 77 | else: 78 | return False 79 | 80 | # inf过滤器 81 | def has_inf(datum, tensor): 82 | _ = datum # Datum metadata is unused in this predicate. 83 | if isinstance(tensor, InconvertibleTensorProto): 84 | # Uninitialized tensor doesn't have bad numerical values. 85 | # Also return False for data types that cannot be represented as numpy 86 | # arrays. 
87 | return False 88 | elif (np.issubdtype(tensor.dtype, np.floating) or 89 | np.issubdtype(tensor.dtype, np.complex) or 90 | np.issubdtype(tensor.dtype, np.integer)): 91 | return np.any(np.isinf(tensor)) 92 | else: 93 | return False 94 | 95 | # 添加过滤器 96 | sess.add_tensor_filter("has_nan", has_nan) 97 | sess.add_tensor_filter("has_inf", has_inf) 98 | sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan) 99 | """ 100 | keras.backend.set_session(sess) 101 | 102 | generate_logger(filename=FLAGS.log_path) 103 | logging.info('TensorFlow version: %s', tf.__version__) # 1.13.1 104 | logging.info('Keras version: %s', keras.__version__) # 2.2.4-tf 105 | 106 | classifier = MultiLabelClassifier() 107 | 108 | # 模型训练 109 | if FLAGS.mode == 'train': 110 | train_dataset = classifier.prepare_data(FLAGS.train_label_path, FLAGS.train_set_dir, FLAGS.is_augment) 111 | classifier.train(train_dataset, None) 112 | logging.info('训练完毕!') 113 | 114 | # 用于测试, 115 | elif FLAGS.mode == 'test': 116 | # 测试用单GPU测试,若是多GPU模型,需要先转为单GPU模型,然后再执行测试 117 | if FLAGS.gpu_num > 1: 118 | classifier.convert_multi2single() 119 | logging.info('多GPU训练模型转换单GPU运行模型成功,请使用单GPU测试!') 120 | return 121 | 122 | total_test, wrong_count, great_total_count, great_wrong_count, great_wrong_records = test_model(classifier) 123 | logging.info('预测总数:%d\t 错误数:%d', total_test, wrong_count) 124 | logging.info('大于置信度总数:%d\t 错误数:%d\t 准确率:%f', great_total_count, great_wrong_count, 125 | 1 - great_wrong_count/(great_total_count + 1e-7)) 126 | # logging.info('错误路径是:\n%s', great_wrong_records) 127 | logging.info('测试完毕!') 128 | 129 | # 用于调试,查看训练的模型中每一层的输出/梯度 130 | elif FLAGS.mode == 'debug': 131 | import cv2 132 | train_dataset = classifier.prepare_data(FLAGS.train_label_path, FLAGS.train_set_dir, FLAGS.is_augment) 133 | get_trainable_layers = classifier.get_trainable_layers_func() 134 | for images, labels in train_dataset: 135 | cv2.imshow('a', np.array(images[0])) 136 | cv2.waitKey(1) 137 | outputs = get_trainable_layers(images) # 每一个可训练层的输出 138 | gradients = classifier.get_gradients(images, labels) # 每一个可训练层的参数梯度 139 | assert outputs is not None 140 | assert gradients is not None 141 | logging.info("=============== debug ================") 142 | 143 | # 将模型保存为pb模型 144 | elif FLAGS.mode == 'save_pb': 145 | # 保存模型记得注释eager execution 146 | classifier.save_mobile() 147 | 148 | # 将模型保存为服务器pb模型 149 | elif FLAGS.mode == 'save_serving': 150 | # 保存模型记得注释eager execution 151 | classifier.save_serving() 152 | else: 153 | raise ValueError('Mode Error!') 154 | 155 | 156 | def test_model(classifier): 157 | """ 模型测试 158 | :param classifier: 训练完毕的多标签分类模型 159 | :return: 总测试样本数, 总错误样本数,大于置信度的总样本数, 大于置信度的错误样本数, 错误样本路径记录 160 | """ 161 | # import cv2 162 | # 测试集包含(image, labels, image_path) 163 | test_set = classifier.prepare_data(FLAGS.test_label_path, FLAGS.test_set_dir, is_augment=False, is_test=True) 164 | base_conf = FLAGS.base_confidence # 置信度基线 165 | 166 | # 实际标签,预测标签,预测概率(label数,验证样本数) 167 | total_test = int(np.ceil(FLAGS.val_set_size / FLAGS.batch_size) * FLAGS.batch_size) 168 | truth = np.zeros(shape=(len(FLAGS.output_shapes), total_test)) 169 | pred = np.zeros(shape=(len(FLAGS.output_shapes), total_test)) 170 | prob = np.zeros(shape=(len(FLAGS.output_shapes), total_test)) 171 | start_index, end_index = 0, FLAGS.batch_size 172 | great_wrong_records = list() # 大于置信度的错误路径集合 173 | for images, labels, paths in test_set: 174 | great_wrong_records = np.concatenate((great_wrong_records, np.array(paths)), axis=0) 175 | truth[:, start_index:end_index] = 

    # total errors, samples above the confidence baseline, and errors among those samples
    wrong_count = np.sum(wrong_result)
    great_total_count = np.sum(great_conf_result)
    great_wrong_count = np.sum(wrong_result & great_conf_result)
    # record the wrongly predicted samples that are above the confidence baseline
    if np.any(wrong_result & great_conf_result):
        great_wrong_records = [u.decode() for u in great_wrong_records[wrong_result & great_conf_result]]

    # plot_confusion_matrix(truth, pred)
    return total_test, wrong_count, great_total_count, great_wrong_count, great_wrong_records


def plot_confusion_matrix(y_trues, y_preds):
    from utils import draw_tools
    for i in range(y_trues.shape[0]):
        valid_mask = (y_trues[i] != -1)
        draw_tools.plot_confusion_matrix(y_trues[i][valid_mask], y_preds[i][valid_mask],
                                         ['cls_{}'.format(k) for k in range(FLAGS.output_shapes[i])],
                                         FLAGS.output_names[i], is_save=True)
    return


if __name__ == '__main__':
    run()
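
# Note on the modes dispatched in run(): eager execution is enabled at import time
# only when FLAGS.mode == 'test' (see the top of this file); the 'save_pb' and
# 'save_serving' paths expect graph mode, hence the reminders above to comment
# out eager execution before saving.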
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
File __init__.py
@author: ZhengYuwei
Contents:
generate_txt: generates the label file from the images and their file names (may not exist at the moment);
check_label_file.py: checks that every image referenced in the label file exists, is readable and is not empty;
draw_tools.py: plots confusion matrices;
logger_callback.py: logging callback used during training;
"""
--------------------------------------------------------------------------------
/utils/check_label_file.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
File check_label_file.py
@author:ZhengYuwei
"""
import os
import cv2

# Check that every image listed in the label file exists and can be opened
train_set_dir = '/home/train_set/images'
train_label_path = '/home/train_set/train.txt'
lines = list()
with open(train_label_path, 'r') as file:
    for line in file:
        img_name = line.strip().split(' ')[0]
        img_path = os.path.join(train_set_dir, img_name)
        if os.path.isfile(img_path):
            img = cv2.imread(img_path)
            if img is not None:
                lines.append(line)

lines[-1] = lines[-1].strip()
new_train_label_path = os.path.join(os.path.dirname(train_label_path), 'new_train.txt')
with open(new_train_label_path, 'w') as file:
    file.writelines(lines)
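# Notes on the label file handled above (inferred from the parsing logic): each
# line is expected to start with the image file name, and the remaining
# space-separated fields (presumably the label values) are written back untouched.
# Lines whose image is missing or cannot be decoded by cv2.imread are silently
# dropped, and the filtered list is saved as new_train.txt next to the original file.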
--------------------------------------------------------------------------------
/utils/draw_tools.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
File draw_tools.py
@author:ZhengYuwei
"""
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix


def plot_confusion_matrix(y_true, y_pred, labels, title='Confusion matrix', is_save=False):
    """ Plot a confusion matrix
    :param y_true: ground-truth class labels
    :param y_pred: predicted class labels
    :param labels: list of class label names
    :param title: figure title
    :param is_save: whether to save the figure to disk
    :return:
    """
    if labels:
        y_true = [labels[int(i)] for i in y_true]
        y_pred = [labels[int(i)] for i in y_pred]
    # compute the confusion matrix: y axis = true labels, x axis = predicted labels
    conf_matrix = confusion_matrix(y_true, y_pred, labels=labels)
    conf_matrix_pred_sum = np.sum(conf_matrix, axis=0, keepdims=True).astype(float) + 1e-7
    conf_matrix_percent = conf_matrix / conf_matrix_pred_sum * 100  # percentages normalized over each predicted-class column

    annot = np.empty_like(conf_matrix).astype(str)
    nrows, ncols = conf_matrix.shape
    for i in range(nrows):
        for j in range(ncols):
            c = conf_matrix[i, j]
            p = conf_matrix_percent[i, j]
            if i == j:
                s = conf_matrix_pred_sum[0][i]
                # annot[i, j] = '%.2f%%\n%d/%d' % (p, c, s)
                annot[i, j] = '%.2f%%\n%d' % (p, c)
            elif c == 0:
                annot[i, j] = ''
            else:
                annot[i, j] = '%.2f%%\n%d' % (p, c)

    # plot the confusion matrix as a heatmap
    conf_matrix = pd.DataFrame(conf_matrix, index=labels, columns=labels, dtype='float')
    fig = plt.figure(figsize=(10, 10))
    ax = fig.gca()
    # Oranges,Oranges_r,YlGnBu,Blues,RdBu, PuRd ...
    sns.heatmap(conf_matrix, annot=annot, fmt='', ax=ax, cmap='YlGnBu',
                annot_kws={"size": 11}, linewidths=0.5)
    # configure the axes
    ax.set_xticklabels(ax.get_xticklabels(), rotation=25, fontsize=10)
    ax.xaxis.set_ticks_position('none')
    ax.set_yticklabels(ax.get_yticklabels(), rotation=25, fontsize=10)
    ax.yaxis.set_ticks_position('none')

    plt.title(title, size=18)
    plt.xlabel('Predicted', size=16)
    plt.ylabel('Actual', size=16)
    plt.tight_layout()
    if is_save:
        plt.savefig(os.path.join('.', title+'.jpg'))
    else:
        plt.show()


if __name__ == '__main__':
    y_predict = np.random.randint(low=0, high=10, size=(100,))
    y_truth = np.random.randint(low=0, high=10, size=(100,))
    y_labels = [str(i)+'s' for i in range(10)]
    plot_confusion_matrix(y_truth, y_predict, y_labels)
--------------------------------------------------------------------------------
/utils/logger_callback.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
File logger_callback.py
@author:ZhengYuwei
"""
from tensorflow import keras


class NBatchProgbarLogger(keras.callbacks.ProgbarLogger):
    """ Callback that prints the training log to stdout every N batches """

    def __init__(self, count_mode='samples', stateful_metrics=None, display=1000, verbose=1):
        """
        :param count_mode: 'samples' or 'steps', passed on to keras.callbacks.ProgbarLogger
        :param stateful_metrics: names of metrics that should not be averaged over the epoch
        :param display: print one log record every `display` batches
        :param verbose: whether to print the training log at all
        """
        super(NBatchProgbarLogger, self).__init__(count_mode, stateful_metrics)
        self.display = display
        self.display_step = 1
        self.verbose = verbose
        self.epochs = 0

    def on_train_begin(self, logs=None):
        self.epochs = self.params['epochs']

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        batch_size = logs.get('size', 0)  # only used by the commented-out distributed-training logic below
        """
        # needs adjustment for distributed training
        num_steps = logs.get('num_steps', 1)
        if self.use_steps:
            self.seen += num_steps
        else:
            self.seen += batch_size * num_steps
        """
        self.seen += 1
        self.display_step += 1
        # Skip progbar update for the last batch, will be handled by on_epoch_end.
        if self.verbose and self.seen < self.target and self.display_step % self.display == 0:
            # metrics to print
            for k in self.params['metrics']:
                if k in logs:
                    self.log_values.append((k, logs[k]))
            self.progbar.update(self.seen, self.log_values)
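
# A minimal usage sketch (assuming an already compiled tf.keras `model` and a
# `dataset`; both names are placeholders):
#
#     from utils.logger_callback import NBatchProgbarLogger
#     model.fit(dataset, epochs=10, steps_per_epoch=1000, verbose=0,
#               callbacks=[NBatchProgbarLogger(count_mode='steps', display=100)])
#
# Passing verbose=0 to fit() suppresses the built-in per-batch progress bar, so
# only this callback's every-`display`-batches update is printed.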
--------------------------------------------------------------------------------
/utils/radam.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on 2019/8/15
File radam
@author: ZhengYuwei
"""
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import state_ops
from tensorflow.python.keras import backend as K


class RAdam(tf.keras.optimizers.Optimizer):
    """RAdam optimizer.

    Default parameters follow those provided in the original paper.

    Arguments:
        lr: float >= 0. Learning rate.
        beta_1: float, 0 < beta < 1. Generally close to 1.
        beta_2: float, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor. If `None`, defaults to `K.epsilon()`.
        decay: float >= 0. Learning rate decay over each update.
        amsgrad: boolean. Whether to apply the AMSGrad variant of this
            algorithm from the paper "On the Convergence of Adam and
            Beyond".
        warmup_coef: in the early training stage RAdam falls back to SGDM;
            to warm that stage up, the fallback uses warmup_lr = warmup_coef * lr.
            Defaults to 1.
    """

    def __init__(self,
                 lr=0.001,
                 beta_1=0.9,
                 beta_2=0.999,
                 epsilon=None,
                 decay=0.,
                 amsgrad=False,
                 warmup_coef=1.,
                 **kwargs):
        super(RAdam, self).__init__(**kwargs)
        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.lr = K.variable(lr, name='lr')
            self.beta_1 = K.variable(beta_1, name='beta_1')
            self.beta_2 = K.variable(beta_2, name='beta_2')
            self.decay = K.variable(decay, name='decay')
        if epsilon is None:
            epsilon = K.epsilon()
        self.epsilon = epsilon
        self.initial_decay = decay
        self.amsgrad = amsgrad
        self.warmup_coef = warmup_coef
        self.rho_inf = 2. / (1. - self.beta_2) - 1

    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = []

        lr = self.lr
        if self.initial_decay > 0:
            lr = lr * (  # pylint: disable=g-no-augmented-assignment
                1. / (1. + self.decay * math_ops.cast(self.iterations,
                                                      K.dtype(self.decay))))

        with ops.control_dependencies([state_ops.assign_add(self.iterations, 1)]):
            t = math_ops.cast(self.iterations, K.floatx())

        ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        vs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        if self.amsgrad:
            vhats = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        else:
            vhats = [K.zeros(1) for _ in params]
        self.weights = [self.iterations] + ms + vs + vhats

        beta_1_power = math_ops.pow(self.beta_1, t)
        beta_2_power = math_ops.pow(self.beta_2, t)
        rho_t = self.rho_inf - 2.0 * t * beta_2_power / (1.0 - beta_2_power)

        # rectified learning rate once the variance estimate is tractable (rho_t >= 5);
        # before that, fall back to warmed-up, bias-corrected SGDM
        lr_t = tf.where(rho_t >= 5.0,
                        K.sqrt((rho_t - 4.) * (rho_t - 2.) * self.rho_inf /
                               ((self.rho_inf - 4.) * (self.rho_inf - 2.) * rho_t)) *
                        lr * (K.sqrt(1. - beta_2_power) / (1. - beta_1_power)),
                        self.warmup_coef * lr / (1. - beta_1_power))

        for p, g, m, v, vhat in zip(params, grads, ms, vs, vhats):
            m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1. - self.beta_2) * math_ops.square(g)

            if self.amsgrad:
                vhat_t = math_ops.maximum(vhat, v_t)
                p_t = p - lr_t * tf.where(rho_t >= 5.0, m_t / (K.sqrt(vhat_t) + self.epsilon), m_t)
                self.updates.append(state_ops.assign(vhat, vhat_t))
            else:
                p_t = p - lr_t * tf.where(rho_t >= 5.0, m_t / (K.sqrt(v_t) + self.epsilon), m_t)

            self.updates.append(state_ops.assign(m, m_t))
            self.updates.append(state_ops.assign(v, v_t))
            new_p = p_t

            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)

            self.updates.append(state_ops.assign(p, new_p))
        return self.updates

    def get_config(self):
        config = {
            'lr': float(K.get_value(self.lr)),
            'beta_1': float(K.get_value(self.beta_1)),
            'beta_2': float(K.get_value(self.beta_2)),
            'decay': float(K.get_value(self.decay)),
            'epsilon': self.epsilon,
            'amsgrad': self.amsgrad,
            'warmup_coef': self.warmup_coef  # included so the optimizer round-trips through get_config/from_config
        }
        base_config = super(RAdam, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
--------------------------------------------------------------------------------
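
A minimal usage sketch for the RAdam optimizer above, under the same TensorFlow 1.13 / tf.keras setup as the rest of the repository (`my_model` and `train_dataset` are hypothetical placeholders, not part of the code base):

    from tensorflow import keras
    from utils.radam import RAdam

    # warmup_coef < 1 warms up the SGDM fallback used in the first steps (see the class docstring)
    optimizer = RAdam(lr=1e-3, warmup_coef=0.1)
    my_model.compile(optimizer=optimizer,
                     loss=keras.losses.sparse_categorical_crossentropy)
    my_model.fit(train_dataset, epochs=10, steps_per_epoch=1000)

RAdam subclasses the TF1-style `tf.keras.optimizers.Optimizer`, so it is passed to `compile` exactly like the built-in optimizers.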