├── .gitignore ├── A_learning_notes ├── .ipynb_checkpoints │ ├── Q&A-checkpoint.ipynb │ └── generate_process-checkpoint.ipynb ├── Q&A.ipynb └── generate_process.ipynb ├── LICENSE ├── README.md ├── backbone ├── __init__.py ├── basic_backbone.py ├── mixnet18.py ├── mobilenet_v2.py ├── resnet18.py ├── resnet18_v2.py └── resnext.py ├── configs.py ├── dataset ├── __init__.py ├── dataset_util.py ├── file_util.py ├── test_sample │ ├── label.txt │ └── 鲁BC6T76.jpg └── tfrecord_util.py ├── images ├── GHM-insight.jpg ├── focal-loss.jpg ├── mixnet-18.svg ├── mobilenet-v2.svg ├── resnet-18-v2.svg ├── resnet-18.svg └── resnext-18.svg ├── multi_label ├── __init__.py ├── multi_label_loss.py ├── multi_label_model.py └── trainer.py ├── requirements.txt ├── run.py └── utils ├── __init__.py ├── check_label_file.py ├── draw_tools.py ├── logger_callback.py └── radam.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/ 2 | **/*.py[cod] 3 | logs/ 4 | models/ -------------------------------------------------------------------------------- /A_learning_notes/.ipynb_checkpoints/Q&A-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true, 7 | "pycharm": { 8 | "name": "#%% md\n" 9 | } 10 | }, 11 | "source": [ 12 | "### Q. 在编译模型阶段,定义了多输出loss函数和权重,但在训练阶段,打印的loss却不等于各个loss的加权和\n", 13 | "```\n", 14 | "model.compile(loss=my_loss, optimizer='adam', loss_weights=[0.5, 0.5])\n", 15 | "model.fit(dataset, epochs=2, steps_per_epoch=2, verbose=1)\n", 16 | "```\n", 17 | "输出:loss(21.9610) != 0.5 * 1.3583 - 0.5 * 1.5867\n", 18 | "```\n", 19 | "Epoch 1/2\n", 20 | "1/2 [==============>...............] - ETA: 0s - loss: 21.9610 - dense_4_loss: 1.3583 - dense_5_loss: 1.5867\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n", 21 | "2/2 [==============================] - 0s 140ms/step - loss: 22.1960 - dense_4_loss: 1.9502 - dense_5_loss: 1.5183\n", 22 | "Epoch 2/2\n", 23 | "1/2 [==============>...............] - ETA: 0s - loss: 21.8526 - dense_4_loss: 1.3555 - dense_5_loss: 1.5861\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n", 24 | "2/2 [==============================] - 0s 1ms/step - loss: 22.0872 - dense_4_loss: 1.9470 - dense_5_loss: 1.5171\n", 25 | "```\n", 26 | "\n", 27 | "\n", 28 | "**Answer**: 因为总的loss中包含了权重正则化损失部分:\n", 29 | "```\n", 30 | "def build_net(input_tensor):\n", 31 | " out1 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 32 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 33 | " out2 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 34 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 35 | " return [out1, out2]\n", 36 | "```\n" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "---" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### Q. 
在将`.ckpt.index` + `.ckpt.data` 模型转为`pb`的时候,为什么还要先保存为`h5`,然后再加载模型,再保存为`pb`?\n", 51 | "\n", 52 | "\n", 53 | "**Answer**: 因为原来保存为`.ckpt.index` + `.ckpt.data` 的时候没有保存图信息,加载也只加载权重信息:\n", 54 | "```\n", 55 | "model.load_weights(latest)\n", 56 | "...\n", 57 | "cp_callback = ModelCheckpoint(path, save_weights_only=True, period=ckpt_period)\n", 58 | "```\n", 59 | "导致`keras.backend.get_session().graph.as_graph_def()`没有图结构信息。\n", 60 | "(理论上我是构建了网络图模型,然后再加载权重的,所以应该也得有图结构信息,但实际上没有)\n", 61 | "所以需要将模型完全保存为`h5`(包含图信息),然后重新加载进来,再保存为`pb`:\n", 62 | "```\n", 63 | "model.save(h5_path, overwrite=True, include_optimizer=False)\n", 64 | "model = keras.models.load_model(h5_path)\n", 65 | "...\n", 66 | "graph = tf.graph_util.remove_training_nodes(sess.graph.as_graph_def())\n", 67 | "graph_frozen = tf.graph_util.convert_variables_to_constants(sess, graph, output_names)\n", 68 | "tf.train.write_graph(graph_frozen, pb_model_dir, pb_model_name, as_text=False)\n", 69 | "```\n" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "---" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "\n", 84 | "### Q. 一直没办法用多GPU的模式运行?\n", 85 | "\n", 86 | "\n", 87 | "**Answer**: `tf.enable_eager_execution()`模型跑不了多GPU,要注释掉这句。\n" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "---" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "\n", 102 | "### Q. Jupyter Notebook运行tf.keras.Model对象训练、预测,在model.predict()时报错`CancelledError: [Op:StatefulPartitionedCall]`?\n", 103 | "\n", 104 | "**Answer**:不清楚为什么,但是如果选择 `Kernel -> Restart & Run All` 则能得到正确的结果。\n" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "---" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "### Q. 
TensorFlow自定义层有bug太难调了!!!\n", 119 | "\n", 120 | "\n", 121 | "**Answer**:目前我调试就只有几个操作:\n", 122 | "- tf.print() + with tf.control_dependencies(),打印信息;\n", 123 | "- with tf.name_scope('name'),给操作加名称,定位到出错的局部操作;\n", 124 | "- tf_debug.has_inf_or_nan,或自定义只有inf/nan,查看出现inf和nan的位置;\n", 125 | "- with tf.GradientTape(persistent=persistent) as tape,详细查看梯度。" 126 | ] 127 | } 128 | ], 129 | "metadata": { 130 | "kernelspec": { 131 | "display_name": "Python 3", 132 | "language": "python", 133 | "name": "python3" 134 | }, 135 | "language_info": { 136 | "codemirror_mode": { 137 | "name": "ipython", 138 | "version": 3 139 | }, 140 | "file_extension": ".py", 141 | "mimetype": "text/x-python", 142 | "name": "python", 143 | "nbconvert_exporter": "python", 144 | "pygments_lexer": "ipython3", 145 | "version": "3.6.5" 146 | }, 147 | "pycharm": { 148 | "stem_cell": { 149 | "cell_type": "raw", 150 | "metadata": { 151 | "collapsed": false 152 | }, 153 | "source": [] 154 | } 155 | }, 156 | "toc": { 157 | "base_numbering": 1, 158 | "nav_menu": {}, 159 | "number_sections": true, 160 | "sideBar": true, 161 | "skip_h1_title": false, 162 | "title_cell": "Table of Contents", 163 | "title_sidebar": "Contents", 164 | "toc_cell": false, 165 | "toc_position": {}, 166 | "toc_section_display": true, 167 | "toc_window_display": false 168 | } 169 | }, 170 | "nbformat": 4, 171 | "nbformat_minor": 1 172 | } 173 | -------------------------------------------------------------------------------- /A_learning_notes/.ipynb_checkpoints/generate_process-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "pycharm": { 7 | "name": "#%% md\n" 8 | } 9 | }, 10 | "source": [ 11 | "## 本文主要记录多标签多分类模型的实现过程\n", 12 | "\n", 13 | "### 整体流程\n", 14 | "1. 依据数据格式,实现“数据读取”功能;(单元测试)\n", 15 | "2. 基础主干网络ResNet-18实现;\n", 16 | "3. 实现多标签多分类head,形成整体模型;(与2联合测试,绘制网络)\n", 17 | "4. 多标签多分类模型损失函数实现;\n", 18 | "5. 边边角角:配置与训练脚本、测试脚本、预测脚本,等等;(整体测试)\n", 19 | "6. 
进阶修改:损失函数修改,主干网络修改,等等。(整体测试)" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### 构造训练数据集" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [ 34 | { 35 | "name": "stdout", 36 | "output_type": "stream", 37 | "text": [ 38 | "WARNING:tensorflow:From D:\\Software\\Anaconda\\install\\Anaconda3\\envs\\tf13\\lib\\site-packages\\tensorflow\\python\\data\\ops\\iterator_ops.py:532: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", 39 | "Instructions for updating:\n", 40 | "Colocations handled automatically by placer.\n", 41 | "================================= 0 =======================================\n", 42 | "input is: \n", 43 | " tf.Tensor(\n", 44 | "[[-0.54730872 0.26720298]\n", 45 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 46 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 47 | "output is: \n", 48 | " tf.Tensor(\n", 49 | "[[1.89145406]\n", 50 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 51 | " tf.Tensor(\n", 52 | "[[-0.21285691]\n", 53 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 54 | "================================= 1 =======================================\n", 55 | "input is: \n", 56 | " tf.Tensor(\n", 57 | "[[ 1.00501827 -0.83485065]\n", 58 | " [ 1.67905237 1.30604547]], shape=(2, 2), dtype=float64)\n", 59 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 60 | "output is: \n", 61 | " tf.Tensor(\n", 62 | "[[-1.10457522]\n", 63 | " [ 0.64685953]], shape=(2, 1), dtype=float64) \n", 64 | " tf.Tensor(\n", 65 | "[[-0.47960561]\n", 66 | " [-0.93504079]], shape=(2, 1), dtype=float64)\n", 67 | "================================= 2 =======================================\n", 68 | "input is: \n", 69 | " tf.Tensor(\n", 70 | "[[-0.54730872 0.26720298]\n", 71 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 72 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 73 | "output is: \n", 74 | " tf.Tensor(\n", 75 | "[[1.89145406]\n", 76 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 77 | " tf.Tensor(\n", 78 | "[[-0.21285691]\n", 79 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "import tensorflow as tf\n", 85 | "import numpy as np\n", 86 | "tf.enable_eager_execution()\n", 87 | "input = np.random.normal(0, 1, [4, 2])\n", 88 | "out_1 = np.random.normal(0, 1, [4, 1])\n", 89 | "out_2 = np.random.normal(0, 1, [4, 1])\n", 90 | "dataset = tf.data.Dataset.from_tensor_slices((input, (out_1, out_2)))\n", 91 | "dataset = dataset.repeat().batch(2).prefetch(buffer_size=4)\n", 92 | "\n", 93 | "# test\n", 94 | "for i, data in enumerate(dataset):\n", 95 | " # (input, (out_1, out_2))\n", 96 | " print('================================= {} ======================================='.format(i))\n", 97 | " print('input is: \\n', data[0])\n", 98 | " print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')\n", 99 | " print('output is: \\n', data[1][0], '\\n', data[1][1])\n", 100 | " if i >= 2:\n", 101 | " break" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": { 107 | "pycharm": { 108 | "name": "#%% md\n" 109 | } 110 | }, 111 | "source": [ 112 | "### 建立keras模型\n", 113 | "1. 定义骨干网络;\n", 114 | "1. 
实现多标签多分类head,形成整体模型;" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 2, 120 | "metadata": { 121 | "pycharm": { 122 | "is_executing": false, 123 | "name": "#%%\n" 124 | } 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "from tensorflow import keras\n", 129 | "\n", 130 | "\n", 131 | "def build_net(input_tensor):\n", 132 | " out1 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 133 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 134 | " out2 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 135 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 136 | " return [out1, out2]\n", 137 | "\n", 138 | "\n", 139 | "feature_input = keras.layers.Input(shape=(2,), name='feature_input')\n", 140 | "outputs = build_net(feature_input)\n", 141 | "model = keras.models.Model(feature_input, outputs)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": { 147 | "pycharm": { 148 | "name": "#%% md\n" 149 | } 150 | }, 151 | "source": [ 152 | "### 定义loss函数" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 3, 158 | "metadata": { 159 | "pycharm": { 160 | "is_executing": false, 161 | "name": "#%%\n" 162 | } 163 | }, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | "__________________________________________________________________________________________________\n", 170 | "Layer (type) Output Shape Param # Connected to \n", 171 | "==================================================================================================\n", 172 | "feature_input (InputLayer) (None, 2) 0 \n", 173 | "__________________________________________________________________________________________________\n", 174 | "dense (Dense) (None, 1) 3 feature_input[0][0] \n", 175 | "__________________________________________________________________________________________________\n", 176 | "dense_1 (Dense) (None, 1) 3 feature_input[0][0] \n", 177 | "==================================================================================================\n", 178 | "Total params: 6\n", 179 | "Trainable params: 6\n", 180 | "Non-trainable params: 0\n", 181 | "__________________________________________________________________________________________________\n" 182 | ] 183 | } 184 | ], 185 | "source": [ 186 | "import tensorflow as tf\n", 187 | "\n", 188 | "\n", 189 | "def my_loss(y_dummy, pred):\n", 190 | " loss = tf.keras.losses.mean_absolute_error(y_dummy, pred)\n", 191 | " return loss\n", 192 | "\n", 193 | "\n", 194 | "model.compile(loss=my_loss, optimizer='adam', loss_weights=[0.5, 0.5])\n", 195 | "model.summary()" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### 训练与测试" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 4, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "Epoch 1/5\n", 215 | "WARNING:tensorflow:From D:\\Software\\Anaconda\\install\\Anaconda3\\envs\\tf13\\lib\\site-packages\\tensorflow\\python\\ops\\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", 216 | "Instructions for updating:\n", 217 | "Use tf.cast instead.\n", 218 | "2/2 [==============================] - 0s 213ms/step - loss: 9.9042 - dense_loss: 1.0547 - dense_1_loss: 0.8552\n", 219 | "Epoch 2/5\n", 220 | "2/2 
[==============================] - 0s 2ms/step - loss: 9.8423 - dense_loss: 1.0522 - dense_1_loss: 0.8527\n", 221 | "Epoch 3/5\n", 222 | "2/2 [==============================] - 0s 2ms/step - loss: 9.7809 - dense_loss: 1.0497 - dense_1_loss: 0.8506\n", 223 | "Epoch 4/5\n", 224 | "2/2 [==============================] - 0s 2ms/step - loss: 9.7199 - dense_loss: 1.0473 - dense_1_loss: 0.8484\n", 225 | "Epoch 5/5\n", 226 | "2/2 [==============================] - 0s 2ms/step - loss: 9.6592 - dense_loss: 1.0448 - dense_1_loss: 0.8463\n", 227 | "================================= 0 =======================================\n", 228 | "input is: \n", 229 | " tf.Tensor(\n", 230 | "[[-0.54730872 0.26720298]\n", 231 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 232 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 233 | "output is: \n", 234 | " tf.Tensor(\n", 235 | "[[1.89145406]\n", 236 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 237 | " tf.Tensor(\n", 238 | "[[-0.21285691]\n", 239 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 240 | "predictions is: \n", 241 | " [[-0.1156919]\n", 242 | " [-0.1786184]] \n", 243 | " [[-0.41703436]\n", 244 | " [-0.56524646]]\n", 245 | "================================= 1 =======================================\n", 246 | "input is: \n", 247 | " tf.Tensor(\n", 248 | "[[ 1.00501827 -0.83485065]\n", 249 | " [ 1.67905237 1.30604547]], shape=(2, 2), dtype=float64)\n", 250 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 251 | "output is: \n", 252 | " tf.Tensor(\n", 253 | "[[-1.10457522]\n", 254 | " [ 0.64685953]], shape=(2, 1), dtype=float64) \n", 255 | " tf.Tensor(\n", 256 | "[[-0.47960561]\n", 257 | " [-0.93504079]], shape=(2, 1), dtype=float64)\n", 258 | "predictions is: \n", 259 | " [[0.2573548 ]\n", 260 | " [0.23849788]] \n", 261 | " [[ 1.0578033 ]\n", 262 | " [-0.49074632]]\n", 263 | "================================= 2 =======================================\n", 264 | "input is: \n", 265 | " tf.Tensor(\n", 266 | "[[-0.54730872 0.26720298]\n", 267 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 268 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 269 | "output is: \n", 270 | " tf.Tensor(\n", 271 | "[[1.89145406]\n", 272 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 273 | " tf.Tensor(\n", 274 | "[[-0.21285691]\n", 275 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 276 | "predictions is: \n", 277 | " [[-0.1156919]\n", 278 | " [-0.1786184]] \n", 279 | " [[-0.41703436]\n", 280 | " [-0.56524646]]\n" 281 | ] 282 | } 283 | ], 284 | "source": [ 285 | "# 训练\n", 286 | "model.fit(dataset, epochs=5, steps_per_epoch=2, verbose=1)\n", 287 | "\n", 288 | "# 测试\n", 289 | "for i, data in enumerate(dataset):\n", 290 | " print('================================= {} ======================================='.format(i))\n", 291 | " print('input is: \\n', data[0])\n", 292 | " print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')\n", 293 | " print('output is: \\n', data[1][0], '\\n', data[1][1])\n", 294 | " predictions = model.predict(np.array(data[0]))\n", 295 | " print('predictions is: \\n', predictions[0], '\\n', predictions[1])\n", 296 | " if i >= 2:\n", 297 | " break" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "更细致的debug(查看梯度、打印操作等),可看详细查看本工程。\n" 305 | ] 306 | } 307 | ], 308 | "metadata": { 309 | "kernelspec": { 310 | 
"display_name": "tf13", 311 | "language": "python", 312 | "name": "tf13" 313 | }, 314 | "language_info": { 315 | "codemirror_mode": { 316 | "name": "ipython", 317 | "version": 3 318 | }, 319 | "file_extension": ".py", 320 | "mimetype": "text/x-python", 321 | "name": "python", 322 | "nbconvert_exporter": "python", 323 | "pygments_lexer": "ipython3", 324 | "version": "3.6.9" 325 | }, 326 | "pycharm": { 327 | "stem_cell": { 328 | "cell_type": "raw", 329 | "metadata": { 330 | "collapsed": false 331 | }, 332 | "source": [] 333 | } 334 | }, 335 | "toc": { 336 | "base_numbering": 1, 337 | "nav_menu": {}, 338 | "number_sections": true, 339 | "sideBar": true, 340 | "skip_h1_title": false, 341 | "title_cell": "Table of Contents", 342 | "title_sidebar": "Contents", 343 | "toc_cell": false, 344 | "toc_position": {}, 345 | "toc_section_display": true, 346 | "toc_window_display": false 347 | } 348 | }, 349 | "nbformat": 4, 350 | "nbformat_minor": 1 351 | } 352 | -------------------------------------------------------------------------------- /A_learning_notes/Q&A.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true, 7 | "pycharm": { 8 | "name": "#%% md\n" 9 | } 10 | }, 11 | "source": [ 12 | "### Q. 在编译模型阶段,定义了多输出loss函数和权重,但在训练阶段,打印的loss却不等于各个loss的加权和\n", 13 | "```\n", 14 | "model.compile(loss=my_loss, optimizer='adam', loss_weights=[0.5, 0.5])\n", 15 | "model.fit(dataset, epochs=2, steps_per_epoch=2, verbose=1)\n", 16 | "```\n", 17 | "输出:loss(21.9610) != 0.5 * 1.3583 - 0.5 * 1.5867\n", 18 | "```\n", 19 | "Epoch 1/2\n", 20 | "1/2 [==============>...............] - ETA: 0s - loss: 21.9610 - dense_4_loss: 1.3583 - dense_5_loss: 1.5867\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n", 21 | "2/2 [==============================] - 0s 140ms/step - loss: 22.1960 - dense_4_loss: 1.9502 - dense_5_loss: 1.5183\n", 22 | "Epoch 2/2\n", 23 | "1/2 [==============>...............] - ETA: 0s - loss: 21.8526 - dense_4_loss: 1.3555 - dense_5_loss: 1.5861\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\n", 24 | "2/2 [==============================] - 0s 1ms/step - loss: 22.0872 - dense_4_loss: 1.9470 - dense_5_loss: 1.5171\n", 25 | "```\n", 26 | "\n", 27 | "\n", 28 | "**Answer**: 因为总的loss中包含了权重正则化损失部分:\n", 29 | "```\n", 30 | "def build_net(input_tensor):\n", 31 | " out1 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 32 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 33 | " out2 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 34 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 35 | " return [out1, out2]\n", 36 | "```\n" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "---" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### Q. 
在将`.ckpt.index` + `.ckpt.data` 模型转为`pb`的时候,为什么还要先保存为`h5`,然后再加载模型,再保存为`pb`?\n", 51 | "\n", 52 | "\n", 53 | "**Answer**: 因为原来保存为`.ckpt.index` + `.ckpt.data` 的时候没有保存图信息,加载也只加载权重信息:\n", 54 | "```\n", 55 | "model.load_weights(latest)\n", 56 | "...\n", 57 | "cp_callback = ModelCheckpoint(path, save_weights_only=True, period=ckpt_period)\n", 58 | "```\n", 59 | "导致`keras.backend.get_session().graph.as_graph_def()`没有图结构信息。\n", 60 | "(理论上我是构建了网络图模型,然后再加载权重的,所以应该也得有图结构信息,但实际上没有)\n", 61 | "所以需要将模型完全保存为`h5`(包含图信息),然后重新加载进来,再保存为`pb`:\n", 62 | "```\n", 63 | "model.save(h5_path, overwrite=True, include_optimizer=False)\n", 64 | "model = keras.models.load_model(h5_path)\n", 65 | "...\n", 66 | "graph = tf.graph_util.remove_training_nodes(sess.graph.as_graph_def())\n", 67 | "graph_frozen = tf.graph_util.convert_variables_to_constants(sess, graph, output_names)\n", 68 | "tf.train.write_graph(graph_frozen, pb_model_dir, pb_model_name, as_text=False)\n", 69 | "```\n" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "---" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "\n", 84 | "### Q. 一直没办法用多GPU的模式运行?\n", 85 | "\n", 86 | "\n", 87 | "**Answer**: `tf.enable_eager_execution()`模型跑不了多GPU,要注释掉这句。\n" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "---" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "\n", 102 | "### Q. Jupyter Notebook运行tf.keras.Model对象训练、预测,在model.predict()时报错`CancelledError: [Op:StatefulPartitionedCall]`?\n", 103 | "\n", 104 | "**Answer**:不清楚为什么,但是如果选择 `Kernel -> Restart & Run All` 则能得到正确的结果。\n" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "---" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "### Q. 
TensorFlow自定义层有bug太难调了!!!\n", 119 | "\n", 120 | "\n", 121 | "**Answer**:目前我调试就只有几个操作:\n", 122 | "- tf.print() + with tf.control_dependencies(),打印信息;\n", 123 | "- with tf.name_scope('name'),给操作加名称,定位到出错的局部操作;\n", 124 | "- tf_debug.has_inf_or_nan,或自定义只有inf/nan,查看出现inf和nan的位置;\n", 125 | "- with tf.GradientTape(persistent=persistent) as tape,详细查看梯度。" 126 | ] 127 | } 128 | ], 129 | "metadata": { 130 | "kernelspec": { 131 | "display_name": "Python 3", 132 | "language": "python", 133 | "name": "python3" 134 | }, 135 | "language_info": { 136 | "codemirror_mode": { 137 | "name": "ipython", 138 | "version": 3 139 | }, 140 | "file_extension": ".py", 141 | "mimetype": "text/x-python", 142 | "name": "python", 143 | "nbconvert_exporter": "python", 144 | "pygments_lexer": "ipython3", 145 | "version": "3.6.5" 146 | }, 147 | "pycharm": { 148 | "stem_cell": { 149 | "cell_type": "raw", 150 | "metadata": { 151 | "collapsed": false 152 | }, 153 | "source": [] 154 | } 155 | }, 156 | "toc": { 157 | "base_numbering": 1, 158 | "nav_menu": {}, 159 | "number_sections": true, 160 | "sideBar": true, 161 | "skip_h1_title": false, 162 | "title_cell": "Table of Contents", 163 | "title_sidebar": "Contents", 164 | "toc_cell": false, 165 | "toc_position": {}, 166 | "toc_section_display": true, 167 | "toc_window_display": false 168 | } 169 | }, 170 | "nbformat": 4, 171 | "nbformat_minor": 1 172 | } 173 | -------------------------------------------------------------------------------- /A_learning_notes/generate_process.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "pycharm": { 7 | "name": "#%% md\n" 8 | } 9 | }, 10 | "source": [ 11 | "## 本文主要记录多标签多分类模型的实现过程\n", 12 | "\n", 13 | "### 整体流程\n", 14 | "1. 依据数据格式,实现“数据读取”功能;(单元测试)\n", 15 | "2. 基础主干网络ResNet-18实现;\n", 16 | "3. 实现多标签多分类head,形成整体模型;(与2联合测试,绘制网络)\n", 17 | "4. 多标签多分类模型损失函数实现;\n", 18 | "5. 边边角角:配置与训练脚本、测试脚本、预测脚本,等等;(整体测试)\n", 19 | "6. 
进阶修改:损失函数修改,主干网络修改,等等。(整体测试)" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### 构造训练数据集" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [ 34 | { 35 | "name": "stdout", 36 | "output_type": "stream", 37 | "text": [ 38 | "WARNING:tensorflow:From D:\\Software\\Anaconda\\install\\Anaconda3\\envs\\tf13\\lib\\site-packages\\tensorflow\\python\\data\\ops\\iterator_ops.py:532: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", 39 | "Instructions for updating:\n", 40 | "Colocations handled automatically by placer.\n", 41 | "================================= 0 =======================================\n", 42 | "input is: \n", 43 | " tf.Tensor(\n", 44 | "[[-0.54730872 0.26720298]\n", 45 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 46 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 47 | "output is: \n", 48 | " tf.Tensor(\n", 49 | "[[1.89145406]\n", 50 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 51 | " tf.Tensor(\n", 52 | "[[-0.21285691]\n", 53 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 54 | "================================= 1 =======================================\n", 55 | "input is: \n", 56 | " tf.Tensor(\n", 57 | "[[ 1.00501827 -0.83485065]\n", 58 | " [ 1.67905237 1.30604547]], shape=(2, 2), dtype=float64)\n", 59 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 60 | "output is: \n", 61 | " tf.Tensor(\n", 62 | "[[-1.10457522]\n", 63 | " [ 0.64685953]], shape=(2, 1), dtype=float64) \n", 64 | " tf.Tensor(\n", 65 | "[[-0.47960561]\n", 66 | " [-0.93504079]], shape=(2, 1), dtype=float64)\n", 67 | "================================= 2 =======================================\n", 68 | "input is: \n", 69 | " tf.Tensor(\n", 70 | "[[-0.54730872 0.26720298]\n", 71 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 72 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 73 | "output is: \n", 74 | " tf.Tensor(\n", 75 | "[[1.89145406]\n", 76 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 77 | " tf.Tensor(\n", 78 | "[[-0.21285691]\n", 79 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "import tensorflow as tf\n", 85 | "import numpy as np\n", 86 | "tf.enable_eager_execution()\n", 87 | "input = np.random.normal(0, 1, [4, 2])\n", 88 | "out_1 = np.random.normal(0, 1, [4, 1])\n", 89 | "out_2 = np.random.normal(0, 1, [4, 1])\n", 90 | "dataset = tf.data.Dataset.from_tensor_slices((input, (out_1, out_2)))\n", 91 | "dataset = dataset.repeat().batch(2).prefetch(buffer_size=4)\n", 92 | "\n", 93 | "# test\n", 94 | "for i, data in enumerate(dataset):\n", 95 | " # (input, (out_1, out_2))\n", 96 | " print('================================= {} ======================================='.format(i))\n", 97 | " print('input is: \\n', data[0])\n", 98 | " print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')\n", 99 | " print('output is: \\n', data[1][0], '\\n', data[1][1])\n", 100 | " if i >= 2:\n", 101 | " break" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": { 107 | "pycharm": { 108 | "name": "#%% md\n" 109 | } 110 | }, 111 | "source": [ 112 | "### 建立keras模型\n", 113 | "1. 定义骨干网络;\n", 114 | "1. 
实现多标签多分类head,形成整体模型;" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 2, 120 | "metadata": { 121 | "pycharm": { 122 | "is_executing": false, 123 | "name": "#%%\n" 124 | } 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "from tensorflow import keras\n", 129 | "\n", 130 | "\n", 131 | "def build_net(input_tensor):\n", 132 | " out1 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 133 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 134 | " out2 = keras.layers.Dense(1, kernel_initializer='glorot_normal', activation='linear',\n", 135 | " kernel_regularizer=keras.regularizers.l2(10))(input_tensor)\n", 136 | " return [out1, out2]\n", 137 | "\n", 138 | "\n", 139 | "feature_input = keras.layers.Input(shape=(2,), name='feature_input')\n", 140 | "outputs = build_net(feature_input)\n", 141 | "model = keras.models.Model(feature_input, outputs)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": { 147 | "pycharm": { 148 | "name": "#%% md\n" 149 | } 150 | }, 151 | "source": [ 152 | "### 定义loss函数" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 3, 158 | "metadata": { 159 | "pycharm": { 160 | "is_executing": false, 161 | "name": "#%%\n" 162 | } 163 | }, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | "__________________________________________________________________________________________________\n", 170 | "Layer (type) Output Shape Param # Connected to \n", 171 | "==================================================================================================\n", 172 | "feature_input (InputLayer) (None, 2) 0 \n", 173 | "__________________________________________________________________________________________________\n", 174 | "dense (Dense) (None, 1) 3 feature_input[0][0] \n", 175 | "__________________________________________________________________________________________________\n", 176 | "dense_1 (Dense) (None, 1) 3 feature_input[0][0] \n", 177 | "==================================================================================================\n", 178 | "Total params: 6\n", 179 | "Trainable params: 6\n", 180 | "Non-trainable params: 0\n", 181 | "__________________________________________________________________________________________________\n" 182 | ] 183 | } 184 | ], 185 | "source": [ 186 | "import tensorflow as tf\n", 187 | "\n", 188 | "\n", 189 | "def my_loss(y_dummy, pred):\n", 190 | " loss = tf.keras.losses.mean_absolute_error(y_dummy, pred)\n", 191 | " return loss\n", 192 | "\n", 193 | "\n", 194 | "model.compile(loss=my_loss, optimizer='adam', loss_weights=[0.5, 0.5])\n", 195 | "model.summary()" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### 训练与测试" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 4, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "Epoch 1/5\n", 215 | "WARNING:tensorflow:From D:\\Software\\Anaconda\\install\\Anaconda3\\envs\\tf13\\lib\\site-packages\\tensorflow\\python\\ops\\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", 216 | "Instructions for updating:\n", 217 | "Use tf.cast instead.\n", 218 | "2/2 [==============================] - 0s 213ms/step - loss: 9.9042 - dense_loss: 1.0547 - dense_1_loss: 0.8552\n", 219 | "Epoch 2/5\n", 220 | "2/2 
[==============================] - 0s 2ms/step - loss: 9.8423 - dense_loss: 1.0522 - dense_1_loss: 0.8527\n", 221 | "Epoch 3/5\n", 222 | "2/2 [==============================] - 0s 2ms/step - loss: 9.7809 - dense_loss: 1.0497 - dense_1_loss: 0.8506\n", 223 | "Epoch 4/5\n", 224 | "2/2 [==============================] - 0s 2ms/step - loss: 9.7199 - dense_loss: 1.0473 - dense_1_loss: 0.8484\n", 225 | "Epoch 5/5\n", 226 | "2/2 [==============================] - 0s 2ms/step - loss: 9.6592 - dense_loss: 1.0448 - dense_1_loss: 0.8463\n", 227 | "================================= 0 =======================================\n", 228 | "input is: \n", 229 | " tf.Tensor(\n", 230 | "[[-0.54730872 0.26720298]\n", 231 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 232 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 233 | "output is: \n", 234 | " tf.Tensor(\n", 235 | "[[1.89145406]\n", 236 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 237 | " tf.Tensor(\n", 238 | "[[-0.21285691]\n", 239 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 240 | "predictions is: \n", 241 | " [[-0.1156919]\n", 242 | " [-0.1786184]] \n", 243 | " [[-0.41703436]\n", 244 | " [-0.56524646]]\n", 245 | "================================= 1 =======================================\n", 246 | "input is: \n", 247 | " tf.Tensor(\n", 248 | "[[ 1.00501827 -0.83485065]\n", 249 | " [ 1.67905237 1.30604547]], shape=(2, 2), dtype=float64)\n", 250 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 251 | "output is: \n", 252 | " tf.Tensor(\n", 253 | "[[-1.10457522]\n", 254 | " [ 0.64685953]], shape=(2, 1), dtype=float64) \n", 255 | " tf.Tensor(\n", 256 | "[[-0.47960561]\n", 257 | " [-0.93504079]], shape=(2, 1), dtype=float64)\n", 258 | "predictions is: \n", 259 | " [[0.2573548 ]\n", 260 | " [0.23849788]] \n", 261 | " [[ 1.0578033 ]\n", 262 | " [-0.49074632]]\n", 263 | "================================= 2 =======================================\n", 264 | "input is: \n", 265 | " tf.Tensor(\n", 266 | "[[-0.54730872 0.26720298]\n", 267 | " [-0.86050071 0.31083289]], shape=(2, 2), dtype=float64)\n", 268 | "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", 269 | "output is: \n", 270 | " tf.Tensor(\n", 271 | "[[1.89145406]\n", 272 | " [0.21500577]], shape=(2, 1), dtype=float64) \n", 273 | " tf.Tensor(\n", 274 | "[[-0.21285691]\n", 275 | " [ 0.6277284 ]], shape=(2, 1), dtype=float64)\n", 276 | "predictions is: \n", 277 | " [[-0.1156919]\n", 278 | " [-0.1786184]] \n", 279 | " [[-0.41703436]\n", 280 | " [-0.56524646]]\n" 281 | ] 282 | } 283 | ], 284 | "source": [ 285 | "# 训练\n", 286 | "model.fit(dataset, epochs=5, steps_per_epoch=2, verbose=1)\n", 287 | "\n", 288 | "# 测试\n", 289 | "for i, data in enumerate(dataset):\n", 290 | " print('================================= {} ======================================='.format(i))\n", 291 | " print('input is: \\n', data[0])\n", 292 | " print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')\n", 293 | " print('output is: \\n', data[1][0], '\\n', data[1][1])\n", 294 | " predictions = model.predict(np.array(data[0]))\n", 295 | " print('predictions is: \\n', predictions[0], '\\n', predictions[1])\n", 296 | " if i >= 2:\n", 297 | " break" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "更细致的debug(查看梯度、打印操作等),可看详细查看本工程。\n" 305 | ] 306 | } 307 | ], 308 | "metadata": { 309 | "kernelspec": { 310 | 
"display_name": "tf13", 311 | "language": "python", 312 | "name": "tf13" 313 | }, 314 | "language_info": { 315 | "codemirror_mode": { 316 | "name": "ipython", 317 | "version": 3 318 | }, 319 | "file_extension": ".py", 320 | "mimetype": "text/x-python", 321 | "name": "python", 322 | "nbconvert_exporter": "python", 323 | "pygments_lexer": "ipython3", 324 | "version": "3.6.9" 325 | }, 326 | "pycharm": { 327 | "stem_cell": { 328 | "cell_type": "raw", 329 | "metadata": { 330 | "collapsed": false 331 | }, 332 | "source": [] 333 | } 334 | }, 335 | "toc": { 336 | "base_numbering": 1.0, 337 | "nav_menu": {}, 338 | "number_sections": true, 339 | "sideBar": true, 340 | "skip_h1_title": false, 341 | "title_cell": "Table of Contents", 342 | "title_sidebar": "Contents", 343 | "toc_cell": false, 344 | "toc_position": {}, 345 | "toc_section_display": true, 346 | "toc_window_display": false 347 | } 348 | }, 349 | "nbformat": 4, 350 | "nbformat_minor": 1 351 | } 352 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 郑煜伟 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # multi-label-classification 2 | 3 | 基于tf.keras,实现多标签分类CNN模型。 4 | 5 | ## 如何使用 6 | 7 | ### 快速上手 8 | 1. run.py同目录下新建 `logs`文件夹,存放日志文件;训练完毕会出现`models`文件夹,存放模型; 9 | 2. 查看`configs.py`并进行修改,此为参数配置文件; 10 | 3. 实际用自己的数据训练时,可能需要执行以下`utils/check_label_file.py`,确保标签文件中的图片真实可用; 11 | 4. 执行`python run.py`,会根据配置文件`configs.py`进行训练/测试/模型转换等。 12 | 13 | ### 学习掌握 14 | 1. 先看`README.md`; 15 | 2. 再看`1_learning_note`下的note; 16 | 3. 看`multi_label`下的`trainer.py`里的`__init__`函数,把整体模型串起来; 17 | 4. 
看`run.py`文件,结合着看`configs.py`。 18 | 19 | ## 目录结构 20 | 21 | - `A_learning_notes`: README后,**先查看本部分**了解本项目大致结构; 22 | - `backbone`: 模型的骨干网络脚本; 23 | - `dataset`: 数据集构造脚本; 24 | - `dataset_util.py`: 使用tf.image API进行图像数据增强,然后用tf.data进行数据集构建; 25 | - `file_util.py`: 以txt标签文件的形式,构造tf.data数据集用于训练; 26 | - `tfrecord_util.py`: 读取txt标签文件,写tfrecord,然后读取tfrecord为数据集用于训练; 27 | - `images`: 项目图片; 28 | - `logs`: 存放训练过程中的日志文件和tensorboard文件(当前可能不存在); 29 | - `models`: 存放训练好的模型文件(当前可能不存在); 30 | - `multi_label`: 多标签分类模型构建脚本; 31 | - `classifier_loss.py`: 多标签分类的损失函数,包含多种损失函数:`focal loss`、`GHM`等; 32 | - `classifier_model.py`: 多标签分类模型,负责调用`backbone`里的骨干网络和本脚本中的多标签`head`组成整体模型; 33 | - `train.py`: 模型训练接口,集成模型构建/编译/训练/debug/预测、数据集构建等功能; 34 | - `utils`: 一些工具脚本; 35 | - `generate_txt`: 扫描指定路径下的图片数据,生成训练、测试等label.txt(根据实际项目而定,当前可能不存在); 36 | - `check_label_file.py`: 在训练前检查训练集,确保标签文件中的图片真实可用; 37 | - `draw_tools.py`: 模型训练完进行测试时,绘制每个类别的混淆图; 38 | - `logger_callback.py`: 日志打印的keras回调函数; 39 | - `radam.py`: RAdam算法的tf.keras优化器实现; 40 | - `configs.py`: 配置文件; 41 | - `run.py`: 启动脚本; 42 | 43 | 44 | ## 算法说明 45 | 46 | 在**多标签多分类模型**基础上,添加功能: 47 | - loss函数改造: 48 | - `label smoothing`: 标签平滑。 49 | - `focal loss`: 给每个样本的分类loss增加一个因子项,降低分类误差小的样本的影响,解决难易样本问题。 50 | > ![focal loss类别概率和损失关系图](https://github.com/zheng-yuwei/multi-label-classification/blob/master/images/focal-loss.jpg) 51 | - `gradient harmonizing mechanism (GHM)`: 52 | 根据样本梯度密度曲线(这里的梯度是梯度范数,并且不是所有网络参数的梯度,而是最后一层的回传梯度), 53 | 取反得到梯度密度调和参数(和平衡多类别数据集一个意思,只不过这里不是按类别来平衡,而是按梯度区间来平衡), 54 | 再乘以梯度以**调整梯度贡献曲线**,从而降低高密度区域的梯度贡献比例,提升低密度区域的梯度贡献比例。 55 | > ![GHM论文梯度分布与贡献图](https://github.com/zheng-yuwei/multi-label-classification/blob/master/images/GHM-insight.jpg) 56 | > 57 | > 原论文insight: 对网络训练而言,梯度是最重要的东西,而网络训练不好,也是因为梯度没调节好。 58 | focal loss认为前背景不平衡问题,本质为难易样本不平衡问题,从而调节样本的梯度贡献,一定程度上解决了背景问题。 59 | 作者认为,类别不平衡、难易样本不平衡,造成的本质驱动是梯度不平衡。 60 | > 然后通过绘制训练好的模型在样本空间上的梯度分布曲线,发现小梯度和大梯度都是高密度区域, 61 | (作者认为小梯度对应易学习样本,大密度对应异常样本); 62 | 然后绘制正常loss和focal loss梯度贡献曲线,发现正常loss中,高密度区域的梯度贡献度很高, 63 | 而focal loss中,小梯度的高密度区域被因子项惩罚而降低梯度贡献度, 64 | 但大梯度的高密度区域的梯度贡献度依然很高。 65 | 作者认为focal loss平衡了一部分梯度贡献度,所以使得训练低密度的中间梯度的梯度贡献度影响提升, 66 | 提升了算法性能;同时,认为focal loss并没有从本质出发,所以还有残留问题(异常样本大梯度的高密度区域)。 67 | 然后提出了GHM,从梯度分布和梯度贡献角度出发,提升网络训练效果。 68 | 69 | - 分离conv层的权重衰减项$\lambda_{conv}$ 和 BN层gamma的权重衰减项$\lambda_{gamma}$ 70 | 71 | 72 | ## 缓解过拟合/标注错误/样本错误(稍微按效果分先后,按实际数据来) 73 | 74 | 1. 一定程度提高BN层中gamma的L2权重衰减,conv层的L2权重衰减可以维持不变,去掉bias;[1,2,3] 75 | 1. 加大batch,然后要用warmup(我一开始用adam+warmup,后面用radam+warmup, radam中用动态学习率);[4,5,6] 76 | 1. 白化预处理; 77 | 1. 修改网络结构,resnext18相比resnet18多了结构正则的作用,效果好些; 78 | 1. 剪枝,其实和修改网络结构一个道理,只不过剪枝可以类似NAS自动找到更好的sub-network(网络结构);[3,9,10] 79 | 1. GHM损失函数;[8] 80 | 1. 数据增强(增加数据量); 81 | 1. label smoothing:;[7] 82 | 83 | TIPS:其他试过但基本无效的手段包括: 84 | 继续加大weight decay权重,BN层的gamma不加weight decay,BN层的beta加weight decay, 85 | 全连接层加dropout,focal loss,从Adam训练改为SGDM,加warmup。 86 | 87 | [1] L2 Regularization versus Batch and Weight Normalization 88 | [2] Towards Understanding Regularization in Batch Normalization 89 | [3] Learning Efficient Convolutional Networks through Network Slimming 90 | [4] Accurate, Large Minibatch SGD:Training ImageNet in 1 Hour 91 | [5] Large Batch Training of Convolutional Networks 92 | [6] On the Variance of the Adaptive Learning Rate and Beyond 93 | [7] Rethinking the inception architecture for computer vision 94 | [8] Gradient Harmonized Single-stage Detector 95 | [9] Data-Driven Sparse Structure Selection for Deep Neural Networks 96 | [10] Rethinking the Value of Network Pruning 97 | 98 | ## TODO 99 | 1. 
解决类别不平衡的做法: 100 | - reweighted sample从而实现self-balance(参考sklearn); 101 | - 先用训练一个网络然后采样平衡数据集做finetune。 102 | 1. 使用GAN生成数据,进行数据增强; 103 | 1. Handwriting Recognition in Low-resource Scripts Using Adversarial Learning。 104 | 105 | -------------------------------------------------------------------------------- /backbone/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File __init__.py 4 | @author:ZhengYuwei 5 | """ -------------------------------------------------------------------------------- /backbone/basic_backbone.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File basic_backbone.py 4 | @author:ZhengYuwei 5 | """ 6 | from tensorflow import keras 7 | 8 | 9 | class BasicBackbone(object): 10 | """ 骨干网络基础类,其他骨干网络类需要继承该类 """ 11 | L2_CONV_DECAY = 5.e-4 # 卷积层W权重衰减系数 12 | BN_L2_GAMMA_DECAY = 1.e-5 # BN层gamma系数的权重衰减系数 13 | BN_MOMENTUM = 0.9 # BN层mean、std的指数平滑动量系数 14 | BN_EPSILON = 1e-5 15 | BATCH_SIZE_AXIS = 0 # tensorflow backend的维度顺序(N, H, W, C) 16 | ROW_AXIS = 1 17 | COL_AXIS = 2 18 | CHANNEL_AXIS = 3 19 | 20 | @classmethod 21 | def convolution(cls, input_x, filters, **conv_params): 22 | """ 23 | 卷积运算 24 | :param input_x: 卷积运算的输入 25 | :param filters: 卷积核数量,输出channel数 26 | :param conv_params: 可缺省的默认参数: 27 | kernel_size: 卷积核大小,(width, height),默认(3, 3) 28 | strides: 步长,(width, height),默认(1, 1) 29 | padding: 填充方式,默认 same 30 | use_bias: 是否使用偏置b,默认不使用 31 | kernel_initializer: 卷积核初始化方式,默认 he_normal 32 | kernel_regularizer: 卷积核正则化项,默认 L2正则化,衰减权重系数为L2_CONV_DECAY 33 | :return: 卷积运算的输出 34 | """ 35 | conv_params.setdefault('filters', filters) 36 | conv_params.setdefault('kernel_size', (3, 3)) 37 | conv_params.setdefault('strides', (1, 1)) 38 | conv_params.setdefault('padding', 'same') 39 | conv_params.setdefault('use_bias', False) 40 | conv_params.setdefault('kernel_initializer', 'he_normal') 41 | conv_params.setdefault('kernel_regularizer', keras.regularizers.l2(cls.L2_CONV_DECAY)) 42 | conv = keras.layers.Conv2D(**conv_params)(input_x) 43 | return conv 44 | 45 | @classmethod 46 | def depthwise_conv(cls, input_x, **conv_params): 47 | """ 48 | 深度可分离卷积 49 | :param input_x: 卷积运算的输入 50 | :param conv_params: 可缺省的默认参数: 51 | kernel_size: 卷积核大小,(width, height),默认(3, 3) 52 | strides: 步长,(width, height),默认(1, 1) 53 | padding: 填充方式,默认 same 54 | use_bias: 是否使用偏置b,默认不使用 55 | depthwise_initializer: 卷积核初始化方式,默认 he_normal 56 | depthwise_regularizer: 卷积核正则化项,默认 L2正则化,衰减权重系数为L2_CONV_DECAY 57 | :return: 深度可分离卷积运算的输出 58 | """ 59 | conv_params.setdefault('kernel_size', (3, 3)) 60 | conv_params.setdefault('strides', (1, 1)) 61 | conv_params.setdefault('padding', 'same') 62 | conv_params.setdefault('use_bias', False) 63 | conv_params.setdefault('depthwise_initializer', 'he_normal') 64 | conv_params.setdefault('depthwise_regularizer', keras.regularizers.l2(cls.L2_CONV_DECAY)) 65 | conv = keras.layers.DepthwiseConv2D(**conv_params)(input_x) 66 | return conv 67 | 68 | @classmethod 69 | def batch_normalization(cls, input_x): 70 | """ 71 | 对输入执行batch normalization运算 72 | :param input_x: 输入tensor 73 | :return: BN运算后的tensor 74 | """ 75 | bn = keras.layers.BatchNormalization(axis=cls.CHANNEL_AXIS, momentum=cls.BN_MOMENTUM, 76 | gamma_regularizer=keras.regularizers.l2(cls.BN_L2_GAMMA_DECAY), 77 | epsilon=cls.BN_EPSILON)(input_x) 78 | return bn 79 | 80 | @classmethod 81 | def activation(cls, input_x, activation='relu', **activation_params): 82 | """ 83 | 激活函数运算 84 | :param 
input_x: 输入tensor 85 | :param activation: 激活函数类型 86 | :param activation_params: 激活函数参数 87 | :return: 激活运算后的tensor 88 | """ 89 | output = keras.layers.Activation(activation=activation, **activation_params)(input_x) 90 | return output 91 | 92 | @classmethod 93 | def _add_hard_swish(cls): 94 | """ 添加hard swish作为keras的自定义激活函数 """ 95 | def hard_swish(input_x, max_value=6.): 96 | """ (x * ReLU6(x+3)) / 6 """ 97 | h_swish = input_x * keras.layers.ReLU(max_value=max_value)(input_x + 3.) / max_value 98 | return h_swish 99 | # e.g. keras.layers.Activation(activation = 'h_swish')(5.) 100 | keras.utils.get_custom_objects().update({'h_swish': keras.layers.Activation(hard_swish)}) 101 | 102 | @classmethod 103 | def element_wise_add(cls, identity, residual, is_nin=False): 104 | """ 105 | 逐元素加的合并单位分支和残差分支的运算 106 | :param identity: shortcut的单位量分支 107 | :param residual: shortcut的残差量分支 108 | :param is_nin: 是否对单位量实施NIN卷积操作 109 | :return: 相加合并结果tensor 110 | """ 111 | identity_shape = keras.backend.int_shape(identity) 112 | residual_shape = keras.backend.int_shape(residual) 113 | stride_width = int(round(identity_shape[cls.ROW_AXIS] / residual_shape[cls.ROW_AXIS])) 114 | stride_height = int(round(identity_shape[cls.COL_AXIS] / residual_shape[cls.COL_AXIS])) 115 | 116 | if is_nin: 117 | identity = cls.convolution(identity, 118 | filters=residual_shape[cls.CHANNEL_AXIS], 119 | kernel_size=(1, 1), 120 | strides=(stride_width, stride_height), 121 | padding='valid') 122 | identity = cls.batch_normalization(identity) 123 | 124 | merge = keras.layers.add(inputs=[identity, residual]) 125 | return merge 126 | 127 | @classmethod 128 | def conv_bn(cls, input_x, filters, **conv_params): 129 | """ 130 | 卷积 + 批归一化 运算 131 | :param input_x: 输入tensor 132 | :param filters: 卷积核数量,channel数 133 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 134 | :return: 运算后的tensor 135 | """ 136 | conv = cls.convolution(input_x, filters, **conv_params) 137 | bn = cls.batch_normalization(conv) 138 | return bn 139 | 140 | @classmethod 141 | def depthwise_conv_bn(cls, input_x, **conv_params): 142 | """ 143 | 深度可分离卷积 + 批归一化 运算 144 | :param input_x: 输入tensor 145 | :param conv_params: 深度可分离卷积参数,参见 BasicBackbone.depthwise_conv 146 | :return: 运算后的tensor 147 | """ 148 | conv = cls.depthwise_conv(input_x, **conv_params) 149 | bn = cls.batch_normalization(conv) 150 | return bn 151 | 152 | @classmethod 153 | def bn_activation(cls, input_x, activation='relu', **activation_params): 154 | """ 155 | 批归一化 + 激活 运算 156 | :param input_x: 输入tensor 157 | :param activation: 激活函数类型名称 158 | :param activation_params: 激活函数参数列表 159 | :return: 运算后的tensor 160 | """ 161 | bn = cls.batch_normalization(input_x) 162 | act = cls.activation(bn, activation=activation, **activation_params) 163 | return act 164 | -------------------------------------------------------------------------------- /backbone/mixnet18.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File mixnet.py 4 | @author: ZhengYuwei 5 | """ 6 | import numpy as np 7 | from tensorflow import keras 8 | from backbone.basic_backbone import BasicBackbone 9 | 10 | 11 | class MixNet18(BasicBackbone): 12 | """ 13 | MixNet 18:不是论文中的MixNet的结构 14 | 只是借鉴了MixNet的不同kernel size mix到一起的想法,同时也使用depthwise 15 | 不使用depthwise的话,就是resnext 18了 16 | """ 17 | 18 | MIX_KERNEL_SIZES = [(3, 3), (5, 5), (7, 7), (9, 9)] 19 | MIX_KERNEL_RATIO = np.array([0, 8, 4, 2, 2], dtype=np.float) 20 | MIX_KERNEL_RATIO = MIX_KERNEL_RATIO.cumsum() / MIX_KERNEL_RATIO.sum() 21 | 22 | 
@classmethod 23 | def _mix_residual_block(cls, input_x, filters, is_nin=True, **conv_params): 24 | """ 25 | 一个残差模块里的 block 26 | input-> conv+bn->relu-> conv+bn-> add->relu-> 27 | |-----> conv(1 X 1)+bn ------>| 28 | :param input_x: 残差block的输入 29 | :param filters: 卷积核数,残差运算后的channel数 30 | :param is_nin: shortcut是否需要进行NIN运算 31 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 32 | :return: 卷积块运算之后的tensor 33 | """ 34 | residual = cls.conv_bn(input_x, filters, **conv_params) 35 | residual = cls.activation(residual) 36 | 37 | mix_residuals = list() 38 | mix_kernel_nums = filters * cls.MIX_KERNEL_RATIO 39 | mix_kernel_nums = mix_kernel_nums.astype(np.int) 40 | 41 | for i, kernel_size in enumerate(cls.MIX_KERNEL_SIZES): 42 | mix_residual = keras.layers.Lambda(lambda x: x[:, :, :, mix_kernel_nums[i]:mix_kernel_nums[i+1]])(residual) 43 | mix_conv = cls.depthwise_conv_bn(mix_residual, kernel_size=kernel_size) 44 | mix_residuals.append(mix_conv) 45 | mix_residuals = keras.layers.concatenate(inputs=mix_residuals, axis=cls.CHANNEL_AXIS) 46 | identity = cls.element_wise_add(input_x, mix_residuals, is_nin=is_nin) 47 | identity = cls.activation(identity) 48 | return identity 49 | 50 | @classmethod 51 | def _mix_residual_module(cls, input_x, filters, **conv_params): 52 | """ 53 | 一个残差模块: 54 | input-> conv+bn->relu-> conv+bn-> add->relu-> conv+bn->relu-> conv+bn-> add -> relu 55 | |-----> conv(1 X 1)+bn ----->| |--------------------------->| 56 | :param input_x: 该残差块的输入 57 | :param filters: 卷积核数,残差运算后的channel数 58 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 59 | :return: 60 | """ 61 | first_block = cls._mix_residual_block(input_x, filters, is_nin=True, **conv_params) 62 | second_block = cls._mix_residual_block(first_block, filters, is_nin=False) 63 | return second_block 64 | 65 | @classmethod 66 | def build(cls, input_x): 67 | """ 68 | 构造mixnet18基础网络,接受layers.Input,卷积层+BN层+add层+activation层输出,tf维度为 NHWC 69 | :param input_x: layers.Input对象 70 | :return: 卷积层+BN层+add层+activation层输出,tf维度为 NHWC=(N, H/32, W/32, 512) 71 | """ 72 | net = cls.conv_bn(input_x, filters=64, kernel_size=(3, 3), strides=(2, 2), padding='same') 73 | net = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="same")(net) 74 | net = cls.activation(net) 75 | 76 | # 4 * 残差模块 77 | net = cls._mix_residual_module(net, filters=64) 78 | net = cls._mix_residual_module(net, filters=128, strides=(2, 2)) 79 | net = cls._mix_residual_module(net, filters=256, strides=(2, 2)) 80 | net = cls._mix_residual_module(net, filters=512, strides=(2, 2)) 81 | 82 | return net 83 | -------------------------------------------------------------------------------- /backbone/mobilenet_v2.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File mobilenet_v2.py 4 | @author:ZhengYuwei 5 | """ 6 | from tensorflow import keras 7 | from backbone.basic_backbone import BasicBackbone 8 | 9 | 10 | class MobileNetV2(BasicBackbone): 11 | 12 | @classmethod 13 | def _inverted_residual_module(cls, input_x, filters, expand_ratio=6, strides=(2, 2)): 14 | net = cls._expand_depthwise_linear(input_x, filters, expand_ratio, strides) 15 | net = cls.element_wise_add(input_x, net, is_nin=False) 16 | return net 17 | 18 | @classmethod 19 | def _expand_depthwise_linear(cls, input_x, filters, expand_ratio=6, strides=(2, 2)): 20 | """ 21 | MobileNet v2基本模块:expand、depthwise、linear 22 | :param input_x: 输入tensor 23 | :param filters: 模块输出的通道数 24 | :param expand_ratio: 扩张比例,默认为6 25 | :param strides: 
步长,默认(2, 2) 26 | :return: 模块运算后的输出tensor 27 | """ 28 | input_filters = keras.backend.int_shape(input_x)[-1] 29 | depthwise_filters = expand_ratio * input_filters 30 | # x6 (1, 1) expand 31 | net = cls.conv_bn(input_x, filters=depthwise_filters, kernel_size=(1, 1), strides=(1, 1), padding='same') 32 | net = cls.activation(net) 33 | # (3, 3) depthwise 34 | net = cls.depthwise_conv_bn(net, strides=strides) 35 | net = cls.activation(net) 36 | # (1, 1) linear bottleneck 37 | net = cls.conv_bn(net, filters=filters, kernel_size=(1, 1), strides=(1, 1), padding='same') 38 | return net 39 | 40 | @classmethod 41 | def build(cls, input_x): 42 | """ 43 | 构建 MobileNet v2网络,整体网络的stride也是32 44 | :param input_x: 网络输入图形矩阵 45 | :return: 网络输出tensor 46 | """ 47 | net = cls.conv_bn(input_x, filters=32, kernel_size=(3, 3), strides=(2, 2), padding='same') 48 | net = cls.activation(net) 49 | 50 | net = cls._expand_depthwise_linear(net, filters=16, expand_ratio=1, strides=(1, 1)) 51 | 52 | net = cls._expand_depthwise_linear(net, filters=24, expand_ratio=6, strides=(2, 2)) 53 | net = cls._inverted_residual_module(net, filters=24, expand_ratio=6, strides=(1, 1)) 54 | 55 | net = cls._expand_depthwise_linear(net, filters=32, expand_ratio=6, strides=(2, 2)) 56 | net = cls._inverted_residual_module(net, filters=32, expand_ratio=6, strides=(1, 1)) 57 | net = cls._inverted_residual_module(net, filters=32, expand_ratio=6, strides=(1, 1)) 58 | 59 | net = cls._expand_depthwise_linear(net, filters=64, expand_ratio=6, strides=(1, 1)) 60 | net = cls._inverted_residual_module(net, filters=64, expand_ratio=6, strides=(1, 1)) 61 | net = cls._inverted_residual_module(net, filters=64, expand_ratio=6, strides=(1, 1)) 62 | net = cls._inverted_residual_module(net, filters=64, expand_ratio=6, strides=(1, 1)) 63 | 64 | net = cls._expand_depthwise_linear(net, filters=96, expand_ratio=6, strides=(2, 2)) 65 | net = cls._inverted_residual_module(net, filters=96, expand_ratio=6, strides=(1, 1)) 66 | net = cls._inverted_residual_module(net, filters=96, expand_ratio=6, strides=(1, 1)) 67 | 68 | net = cls._expand_depthwise_linear(net, filters=160, expand_ratio=6, strides=(2, 2)) 69 | net = cls._inverted_residual_module(net, filters=160, expand_ratio=6, strides=(1, 1)) 70 | net = cls._inverted_residual_module(net, filters=160, expand_ratio=6, strides=(1, 1)) 71 | 72 | net = cls._expand_depthwise_linear(net, filters=320, expand_ratio=6, strides=(1, 1)) 73 | # 原始是1280个channel的输出 74 | net = cls.conv_bn(net, filters=512, kernel_size=(1, 1), strides=(1, 1), padding='same') 75 | net = cls.activation(net) 76 | return net 77 | -------------------------------------------------------------------------------- /backbone/resnet18.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File resnet18.py 4 | @author:ZhengYuwei 5 | 注意: 6 | 1. resnet v1中,conv层都是有bias的,resnext则没有,resnet v2部分有部分没有;(但,个人感觉可以全部都不要,因为有BN) 7 | 2. resnet v2 使用 pre-activation,resnet和resnext不用;(有没有pre-activation其实差不多,inception加强多一些) 8 | 3. 
18和34层用2层 3x3 conv层的block,50及以上的用3层(1,3,1)conv层、具有bottleneck(4倍差距)的block 9 | """ 10 | from tensorflow import keras 11 | from backbone.basic_backbone import BasicBackbone 12 | 13 | 14 | class ResNet18(BasicBackbone): 15 | """ 改动后的ResNet 18,网络前端 7x7卷积->3x3卷积 """ 16 | 17 | @classmethod 18 | def _residual_block(cls, input_x, filters, is_nin=True, **conv_params): 19 | """ 20 | 一个残差模块里的 block 21 | input-> conv+bn->relu-> conv+bn-> add->relu-> 22 | |-----> conv(1 X 1)+bn ------>| 23 | :param input_x: 残差block的输入 24 | :param filters: 卷积核数,残差运算后的channel数 25 | :param is_nin: shortcut是否需要进行NIN运算 26 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 27 | :return: 卷积块运算之后的tensor 28 | """ 29 | residual = cls.conv_bn(input_x, filters, **conv_params) 30 | residual = cls.activation(residual) 31 | conv_params.update(strides=(1, 1)) 32 | residual = cls.conv_bn(residual, filters, **conv_params) 33 | identity = cls.element_wise_add(input_x, residual, is_nin=is_nin) 34 | identity = cls.activation(identity) 35 | return identity 36 | 37 | @classmethod 38 | def _residual_module(cls, input_x, filters, **conv_params): 39 | """ 40 | 一个残差模块: 41 | input-> conv+bn->relu-> conv+bn-> add->relu-> conv+bn->relu-> conv+bn-> add -> relu 42 | |-----> conv(1 X 1)+bn ----->| |--------------------------->| 43 | :param input_x: 该残差块的输入 44 | :param filters: 卷积核数,残差运算后的channel数 45 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 46 | :return: 47 | """ 48 | first_block = cls._residual_block(input_x, filters, is_nin=True, **conv_params) 49 | second_block = cls._residual_block(first_block, filters, is_nin=False) 50 | return second_block 51 | 52 | @classmethod 53 | def build(cls, input_x): 54 | """ 55 | 构造resnet18基础网络,接受layers.Input,卷积层+BN层+add层+activation层输出,tf维度为 NHWC 56 | :param input_x: layers.Input对象 57 | :return: 卷积层+BN层+add层+activation层输出,tf维度为 NHWC=(N, H/32, W/32, 512) 58 | """ 59 | net = cls.conv_bn(input_x, filters=64, kernel_size=(3, 3), strides=(2, 2), padding='same') 60 | net = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="same")(net) 61 | net = cls.activation(net) 62 | 63 | # 4 * 残差模块 64 | net = cls._residual_module(net, filters=64) 65 | net = cls._residual_module(net, filters=128, strides=(2, 2)) 66 | net = cls._residual_module(net, filters=256, strides=(2, 2)) 67 | net = cls._residual_module(net, filters=512, strides=(2, 2)) 68 | 69 | return net 70 | -------------------------------------------------------------------------------- /backbone/resnet18_v2.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File resnet18_v2.py 4 | @author:ZhengYuwei 5 | """ 6 | from tensorflow import keras 7 | from backbone.basic_backbone import BasicBackbone 8 | 9 | 10 | class ResNet18_v2(BasicBackbone): 11 | """ 改动后的ResNet-v2 18,网络前端 7x7卷积->3x3卷积 """ 12 | 13 | @classmethod 14 | def _residual_v2_block(cls, input_x, filters, is_nin=True, **conv_params): 15 | """ 16 | 一个残差模块里的 block 17 | input-> bn+relu-> conv-> bn+relu-> conv-> add-> 18 | |----> conv(1 X 1)+bn ---->| 19 | 或 20 | input-> bn+relu-> conv-> bn+relu-> conv-> add-> 21 | |------------------------------------->| 22 | :param input_x: 残差block的输入 23 | :param filters: 卷积核数,残差运算后的channel数 24 | :param is_nin: shortcut是否需要进行NIN运算 25 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 26 | :return: 卷积块运算之后的tensor 27 | """ 28 | pre_activation = cls.bn_activation(input_x) 29 | residual = cls.convolution(pre_activation, filters=filters, **conv_params) 30 | 
conv_params.update(strides=(1, 1)) 31 | residual = cls.bn_activation(residual) 32 | residual = cls.convolution(residual, filters=filters, **conv_params) 33 | if is_nin: 34 | identity = cls.element_wise_add(pre_activation, residual, is_nin=True) 35 | else: 36 | identity = cls.element_wise_add(input_x, residual, is_nin=False) 37 | return identity 38 | 39 | @classmethod 40 | def _residual_v2_module(cls, input_x, filters, **conv_params): 41 | """ 42 | 一个resnet v2残差模块块: 43 | input-> bn+relu-> conv-> bn+relu-> conv-> add-> bn+relu-> conv-> bn+relu-> conv-> add-> 44 | |----> conv(1 X 1)+bn ---->| |------------------------------------->| 45 | :param input_x: 该残差块的输入 46 | :param filters: 卷积核数,残差运算后的channel数 47 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 48 | :return: 49 | """ 50 | first_block = cls._residual_v2_block(input_x, filters, is_nin=True, **conv_params) 51 | second_block = cls._residual_v2_block(first_block, filters, is_nin=False) 52 | return second_block 53 | 54 | @classmethod 55 | def build(cls, input_x): 56 | """ 57 | 构造resnet18 v2基础网络,接受layers.Input,卷积层+add层+BN层+activation层输出,tf维度为 NHWC 58 | :param input_x: layers.Input对象 59 | :return: 卷积层+BN层+activation层输出,tf维度为 NHWC=(N, H/32, W/32, 512) 60 | """ 61 | net = cls.convolution(input_x, filters=64, kernel_size=(3, 3), strides=(2, 2), padding='same') 62 | net = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(net) 63 | 64 | # 4 * 残差模块 65 | net = cls._residual_v2_module(net, filters=64) 66 | net = cls._residual_v2_module(net, filters=128, strides=(2, 2)) 67 | net = cls._residual_v2_module(net, filters=256, strides=(2, 2)) 68 | net = cls._residual_v2_module(net, filters=512, strides=(2, 2)) 69 | net = cls.bn_activation(net) 70 | 71 | return net 72 | -------------------------------------------------------------------------------- /backbone/resnext.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File resnext.py 4 | @author: ZhengYuwei 5 | """ 6 | import numpy as np 7 | from tensorflow import keras 8 | from backbone.basic_backbone import BasicBackbone 9 | 10 | 11 | class ResNeXt18(BasicBackbone): 12 | """ 13 | ResNeXt 18:不是论文中的ResNeXt的结构 14 | 只是借鉴了ResNeXt中 分组卷积 + 不同组不同卷积核大小 的思想 15 | 大概可以看成不使用depthwise的MixNet 18 16 | """ 17 | 18 | MIX_KERNEL_SIZES = [(3, 3), (5, 5), (7, 7), (9, 9)] 19 | # 分为32组,每组理论至少4个channel,不足的话可以把组数减半 20 | GROUP_NUMS = np.array([16, 8, 4, 4], dtype=np.int) 21 | SMALL_GROUP_NUMS = GROUP_NUMS // 2 22 | TOTAL_GROUP_NUMS = np.sum(GROUP_NUMS) 23 | SMALL_TOTAL_GROUP_NUMS = np.sum(SMALL_GROUP_NUMS) 24 | 25 | @classmethod 26 | def _inception_residual_block(cls, input_x, filters, is_nin=True, **conv_params): 27 | """ 28 | 一个残差模块里的 block 29 | input-> conv+bn->relu-> conv+bn-> add->relu-> 30 | |-----> conv(1 X 1)+bn ------>| 31 | :param input_x: 残差block的输入 32 | :param filters: 卷积核数,残差运算后的channel数 33 | :param is_nin: shortcut是否需要进行NIN运算 34 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 35 | :return: 卷积块运算之后的tensor 36 | """ 37 | residual = cls.conv_bn(input_x, filters, **conv_params) 38 | residual = cls.activation(residual) 39 | 40 | # 每组至少4个channel 41 | if filters % cls.SMALL_TOTAL_GROUP_NUMS != 0: 42 | raise ValueError('卷积核数必须可以被组数整除!') 43 | if filters / cls.SMALL_TOTAL_GROUP_NUMS < 4: 44 | raise ValueError('卷积核数分组后,每组至少有4个通道!') 45 | # 判断分为32组还是16组 46 | group_nums = cls.GROUP_NUMS 47 | total_group_num = cls.TOTAL_GROUP_NUMS 48 | if filters % cls.TOTAL_GROUP_NUMS != 0 or filters / cls.TOTAL_GROUP_NUMS < 4: 49 | 
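# Worked example: for filters=64, 64 / TOTAL_GROUP_NUMS = 64 / 32 = 2 < 4, so the half-sized
# grouping SMALL_GROUP_NUMS = [8, 4, 2, 2] (16 groups in total) is used instead,
# giving group_channel = 64 // 16 = 4 channels per group.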
group_nums = cls.SMALL_GROUP_NUMS 50 | total_group_num = cls.SMALL_TOTAL_GROUP_NUMS 51 | # 分组卷积 52 | group_channel = filters // total_group_num 53 | group_residuals = list() 54 | start_channel = 0 55 | end_channel = start_channel 56 | for i, group in enumerate(group_nums): 57 | for j in range(group): 58 | end_channel += group_channel 59 | group_residual = keras.layers.Lambda(lambda x: x[:, :, :, start_channel:end_channel])(residual) 60 | group_conv = cls.conv_bn(group_residual, filters=group_channel, kernel_size=cls.MIX_KERNEL_SIZES[i]) 61 | group_residuals.append(group_conv) 62 | group_residuals = keras.layers.concatenate(inputs=group_residuals, axis=cls.CHANNEL_AXIS) 63 | identity = cls.element_wise_add(input_x, group_residuals, is_nin=is_nin) 64 | identity = cls.activation(identity) 65 | return identity 66 | 67 | @classmethod 68 | def _inception_residual_module(cls, input_x, filters, **conv_params): 69 | """ 70 | 一个残差模块: 71 | input-> conv+bn->relu-> conv+bn-> add->relu-> conv+bn->relu-> conv+bn-> add -> relu 72 | |-----> conv(1 X 1)+bn ----->| |--------------------------->| 73 | :param input_x: 该残差块的输入 74 | :param filters: 卷积核数,残差运算后的channel数 75 | :param conv_params: 卷积参数,参见 BasicBackbone.convolution 76 | :return: 77 | """ 78 | first_block = cls._inception_residual_block(input_x, filters, is_nin=True, **conv_params) 79 | second_block = cls._inception_residual_block(first_block, filters, is_nin=False) 80 | return second_block 81 | 82 | @classmethod 83 | def build(cls, input_x): 84 | """ 85 | 构造resnext18基础网络,接受layers.Input,卷积层+BN层+add层+activation层输出,tf维度为 NHWC 86 | :param input_x: layers.Input对象 87 | :return: 卷积层+BN层+add层+activation层输出,tf维度为 NHWC=(N, H/32, W/32, 512) 88 | """ 89 | net = cls.conv_bn(input_x, filters=64, kernel_size=(3, 3), strides=(2, 2), padding='same') 90 | net = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding="same")(net) 91 | net = cls.activation(net) 92 | 93 | # 4 * 残差模块 94 | net = cls._inception_residual_module(net, filters=64) 95 | net = cls._inception_residual_module(net, filters=128, strides=(2, 2)) 96 | net = cls._inception_residual_module(net, filters=256, strides=(2, 2)) 97 | net = cls._inception_residual_module(net, filters=512, strides=(2, 2)) 98 | 99 | return net 100 | -------------------------------------------------------------------------------- /configs.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File configs.py 4 | @author:ZhengYuwei 5 | """ 6 | import datetime 7 | import numpy as np 8 | from easydict import EasyDict 9 | from multi_label.multi_label_model import Classifier 10 | 11 | 12 | def lr_func(epoch): 13 | # step_epoch = [10, 20, 30, 40, 50, 60, 70, 80] 14 | # step_lr = [0.0000001, 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0] # 0.0001 15 | step_epoch = [10, 140, 200, 260, 300] 16 | step_lr = [0.00001, 0.001, 0.0001, 0.00001, 0.000001] 17 | i = 0 18 | while i < len(step_epoch) and epoch > step_epoch[i]: 19 | i += 1 20 | return step_lr[i] 21 | 22 | 23 | FLAGS = EasyDict() 24 | 25 | # 数据集 26 | FLAGS.train_set_dir = 'dataset/test_sample' 27 | FLAGS.train_label_path = 'dataset/test_sample/label.txt' 28 | FLAGS.test_set_dir = 'dataset/test_sample' 29 | FLAGS.test_label_path = 'dataset/test_sample/label.txt' 30 | # 模型权重的L2正则化权重直接写在对应模型的骨干网络定义文件中 31 | FLAGS.input_shape = (48, 144, 3) # (H, W, C) 32 | FLAGS.output_shapes = (34, 64, 34, 34, 34, 34, 42, 12, 2, 6) # 多标签输出,每个标签预测的类别数 33 | FLAGS.output_names = ['class_{}'.format(i+1) for i in range(10)] 34 
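# Per-head loss weights: the total loss printed during training is
# sum_i(loss_weights[i] * class_{i+1}_loss) plus the layer regularization losses (model.losses).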
| FLAGS.loss_weights = [1, 1, 1, 1, 1, 1, 1, 1, 0.5, 0.5] 35 | FLAGS.mode = 'train' # train, test, debug, save_pb, save_serving 36 | FLAGS.model_backbone = Classifier.BACKBONE_RESNET_18 37 | FLAGS.optimizer = 'radam' # sgdm, adam, adabound, radam 38 | FLAGS.is_augment = True 39 | FLAGS.is_label_smoothing = False 40 | FLAGS.is_focal_loss = False 41 | FLAGS.is_gradient_harmonized = True 42 | FLAGS.type = FLAGS.model_backbone + '-' + FLAGS.optimizer 43 | FLAGS.type += ('-aug' if FLAGS.is_augment else '') 44 | FLAGS.type += ('-smooth' if FLAGS.is_label_smoothing else '') 45 | FLAGS.type += ('-focal' if FLAGS.is_focal_loss else '') 46 | FLAGS.type += ('-ghm' if FLAGS.is_gradient_harmonized else '') 47 | FLAGS.log_path = 'logs/log-{}.txt'.format(FLAGS.type) 48 | # 训练参数 49 | FLAGS.train_set_size = 14 # 160108 50 | FLAGS.val_set_size = 14 # 35935 51 | FLAGS.batch_size = 5 # 3079 52 | FLAGS.steps_per_epoch = int(np.ceil(FLAGS.train_set_size / FLAGS.batch_size)) 53 | FLAGS.validation_steps = int(np.ceil(FLAGS.val_set_size / FLAGS.batch_size)) 54 | 55 | FLAGS.epoch = 300 56 | FLAGS.init_lr = 0.0002 # nadam推荐使用值 57 | # callback的参数 58 | FLAGS.ckpt_period = 20 # 模型保存 59 | FLAGS.stop_patience = 500 # early stop 60 | FLAGS.stop_min_delta = 0.0001 61 | FLAGS.lr_func = lr_func # 学习率更新函数 62 | # FLAGS.logger_batch = 20 # 打印训练学习的batch间隔 63 | # tensorboard日志保存目录 64 | FLAGS.tensorboard_dir = 'logs/' + 'lpr-{}-{}'.format(FLAGS.type, datetime.datetime.now().strftime('%Y%m%d-%H%M%S')) 65 | # 模型保存 66 | FLAGS.checkpoint_path = 'models/{}/'.format(FLAGS.type) 67 | FLAGS.checkpoint_name = 'lp-recognition-{}'.format(FLAGS.type) + '-{epoch: 3d}-{loss: .5f}.ckpt' 68 | FLAGS.serving_model_dir = 'models/serving' 69 | FLAGS.pb_model_dir = 'models/pb' 70 | # 测试参数 71 | FLAGS.base_confidence = 0.83 # 基础置信度 72 | # 训练gpu 73 | FLAGS.gpu_mode = 'cpu' 74 | FLAGS.gpu_num = 1 75 | FLAGS.visible_gpu = '0' # ','.join([str(_) for _ in range(FLAGS.gpu_num)]) 76 | FLAGS.gpu_device = '0' 77 | -------------------------------------------------------------------------------- /dataset/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File __init__.py 4 | @author: ZhengYuwei 5 | 功能: 6 | 构建tf.data.Dataset对象,生成训练集/验证集/测试集,以提供模型训练、测试 7 | 构建方式主要包含: 8 | 1. 直接从label文件中读取信息进行构建(file_util); 9 | 2. 由label文件读取信息生成tfrecord、读取tfrecord方式,构建数据集(tfrecord_util); 10 | """ -------------------------------------------------------------------------------- /dataset/dataset_util.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File dataset_util.py 4 | @author:ZhengYuwei 5 | 功能: 6 | 功能: 7 | 1. DatasetUtil.augment_image 8 | 对传入的image构成的tf.data.Dataset数据集,进行图像数据增强,包含: 9 | - 等概率加噪:高斯噪声、椒盐噪声、不加噪声; 10 | - 对 对比度、亮度、饱和度 进行一定范围的随机扰动 11 | - mixup(待添加及实验) 12 | - 图像平移(当前场景不适用) 13 | - 图像旋转和翻转(当前场景不适用) 14 | - 随机crop(当前场景不适用) 15 | 2. DatasetUtil.shuffle_repeat:随机扰乱数据,重复(整个数据集层面)生成数据; 16 | 3. DatasetUtil.batch_prefetch:预生成批次数据。 17 | """ 18 | import tensorflow as tf 19 | 20 | 21 | class DatasetUtil(object): 22 | """ 对传入的生成(image, label)形式的tf.data.Dataset数据集,加工为可供训练使用的数据集 """ 23 | # 数据增强的超参,这个可能需要先不使用数据增强训练,调整超参,然后再用数据增强训练对比,然后调节这些超参 24 | _random_brightness = 30. / 255. 
# 随机亮度 25 | _random_low_contrast = 0.9 # 对比度最低值 26 | _random_up_contrast = 1.1 # 对比度最大值 27 | _random_low_saturation = 0.9 # 饱和度最小值 28 | _random_up_saturation = 1.1 # 饱和度最大值 29 | _random_normal = 0.01 # 随机噪声 30 | 31 | @staticmethod 32 | def _add_gauss_noise(image): 33 | """ 加入高斯噪声 """ 34 | image = image + tf.cast(tf.random_normal(tf.shape(image), mean=0, stddev=DatasetUtil._random_normal), 35 | tf.float32) 36 | return image 37 | 38 | @staticmethod 39 | def _add_salt_pepper_noise(image): 40 | """ 加入椒盐噪声 """ 41 | shp = tf.shape(image)[:-1] 42 | mask_select = tf.keras.backend.random_binomial(shape=shp, p=DatasetUtil._random_normal) 43 | mask_noise = tf.keras.backend.random_binomial(shape=shp, p=0.5) # 同样概率的椒盐 44 | image = image * tf.expand_dims(1 - mask_select, -1) + tf.expand_dims(mask_noise * mask_select, -1) 45 | return image 46 | 47 | @staticmethod 48 | def _add_noise(image): 49 | """ 对图片进行数据增强:高斯噪声或椒盐噪声 """ 50 | # 噪声类型 51 | noise_type = tf.random_uniform([], minval=0, maxval=3, dtype=tf.int32) 52 | image = tf.case(pred_fn_pairs=[(tf.equal(noise_type, 0), 53 | lambda: DatasetUtil._add_salt_pepper_noise(image)), 54 | (tf.equal(noise_type, 1), 55 | lambda: DatasetUtil._add_gauss_noise(image))], 56 | default=lambda: image) 57 | return image 58 | 59 | @staticmethod 60 | def _augment_cond_0(image): 61 | """ 对图片进行数据增强:亮度,饱和度,对比度 """ 62 | image = tf.image.random_brightness(image, max_delta=DatasetUtil._random_brightness) 63 | image = tf.image.random_saturation(image, lower=DatasetUtil._random_low_saturation, 64 | upper=DatasetUtil._random_up_saturation) 65 | image = tf.image.random_contrast(image, lower=DatasetUtil._random_low_contrast, 66 | upper=DatasetUtil._random_up_contrast) 67 | return image 68 | 69 | @staticmethod 70 | def _augment_cond_1(image): 71 | """ 对图片进行数据增强:饱和度,亮度,对比度 """ 72 | image = tf.image.random_saturation(image, lower=DatasetUtil._random_low_saturation, 73 | upper=DatasetUtil._random_up_saturation) 74 | image = tf.image.random_brightness(image, max_delta=DatasetUtil._random_brightness) 75 | image = tf.image.random_contrast(image, lower=DatasetUtil._random_low_contrast, 76 | upper=DatasetUtil._random_up_contrast) 77 | return image 78 | 79 | @staticmethod 80 | def _augment_cond_2(image): 81 | """ 对图片进行数据增强:饱和度,对比度, 亮度 """ 82 | image = tf.image.random_saturation(image, lower=DatasetUtil._random_low_saturation, 83 | upper=DatasetUtil._random_up_saturation) 84 | image = tf.image.random_contrast(image, lower=DatasetUtil._random_low_contrast, 85 | upper=DatasetUtil._random_up_contrast) 86 | image = tf.image.random_brightness(image, max_delta=DatasetUtil._random_brightness) 87 | return image 88 | 89 | @staticmethod 90 | def _augment(image): 91 | """ 对图片进行数据增强:饱和度,对比度, 亮度,加噪 92 | :param image: 待增强图片 (H, W, ?) 
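Note: noise is added first, then brightness/saturation/contrast are perturbed in a random order, and the result is clipped back to [0, 1].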
93 | :return: 94 | """ 95 | image = DatasetUtil._add_noise(image) 96 | # 数据增强顺序 97 | color_ordering = tf.random_uniform([], minval=0, maxval=4, dtype=tf.int32) 98 | image = tf.case(pred_fn_pairs=[(tf.equal(color_ordering, 0), 99 | lambda: DatasetUtil._augment_cond_0(image)), 100 | (tf.equal(color_ordering, 1), 101 | lambda: DatasetUtil._augment_cond_1(image)), 102 | (tf.equal(color_ordering, 2), 103 | lambda: DatasetUtil._augment_cond_2(image))], 104 | default=lambda: image) 105 | image = tf.clip_by_value(image, 0.0, 1.0) # 防止数据增强越界 106 | return image 107 | 108 | @staticmethod 109 | def augment_image(image_set): 110 | """ 对传入的tf.data.Dataset数据集进行图片数据增强,构造批次 111 | :param image_set: tf.data.Dataset数据集,产生(image, label)形式的数据 112 | :return: 增强后的tf.data.Dataset对象 113 | """ 114 | # 进行数据增强(这个map需要在repeat之后,才能每次repeat都进行不一样的增强效果) 115 | image_set = image_set.map(lambda image: DatasetUtil._augment(image), 116 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 117 | return image_set 118 | 119 | @staticmethod 120 | def shuffle_repeat(dataset, batch_size): 121 | """ 对传入的tf.data.Dataset数据集进行 shuffle 和 repeat 122 | :param dataset: tf.data.Dataset数据集,产生(image, label)形式的数据 123 | :param batch_size: 训练的batch大小 124 | :return: shuffle 和 repeat后的tf.data.Dataset对象 125 | """ 126 | dataset = dataset.apply(tf.data.experimental.shuffle_and_repeat(buffer_size=5 * batch_size)) 127 | return dataset 128 | 129 | @staticmethod 130 | def batch_prefetch(dataset, batch_size): 131 | """ 生成批次并预加载 132 | :param dataset: tf.data.Dataset数据集 133 | :param batch_size: 训练的batch大小 134 | :return: 输出批次并预加载的tf.data.Dataset数据集 135 | """ 136 | # 缓存数据到内存 137 | # dataset = dataset.cache() 138 | dataset = dataset.batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE) 139 | return dataset 140 | -------------------------------------------------------------------------------- /dataset/file_util.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File file_util.py 4 | @author:ZhengYuwei 5 | """ 6 | import os 7 | import logging 8 | import functools 9 | import tensorflow as tf 10 | 11 | from dataset.dataset_util import DatasetUtil 12 | 13 | 14 | class FileUtil(object): 15 | """ 16 | 从标签文件中,构造返回(image, label)的tf.data.Dataset数据集 17 | 标签文件内容如下: 18 | image_name label0,label1,label2,... 19 | """ 20 | 21 | @staticmethod 22 | def _parse_string_line(string_line, root_path): 23 | """ 24 | 解析文本中的一行字符串行,得到图片路径(拼接图片根目录)和标签 25 | :param string_line: 文本中的一行字符串,image_name label0 label1 label2 label3 ... 
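Example line (from dataset/test_sample/label.txt): 鲁BC6T76.jpg 15 1 12 6 27 7 6 -1 0 0, where -1 marks an invalid/unknown label at that position.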
26 | :param root_path: 图片根目录 27 | :return: DatasetV1Adapter<(图片路径Tensor(shape=(), dtype=string),标签Tensor(shape=(?,), dtype=float32))> 28 | """ 29 | strings = tf.string_split([string_line], delimiter=' ').values 30 | image_path = tf.string_join([root_path, strings[0]], separator=os.sep) 31 | labels = tf.string_to_number(strings[1:]) 32 | return image_path, labels 33 | 34 | @staticmethod 35 | def _parse_image(image_path, _, image_size): 36 | """ 37 | 根据图片路径和标签,读取图片 38 | :param image_path: 图片路径, Tensor(shape=(), dtype=string) 39 | :param _: 标签Tensor(shape(?,), dtype=float32)),本函数只产生图像dataset,故不需要 40 | :param image_size: 图像需要resize到的大小 41 | :return: 归一化的图片 Tensor(shape=(48, 144, ?), dtype=float32) 42 | """ 43 | # 图片 44 | image = tf.read_file(image_path) 45 | image = tf.image.decode_jpeg(image) 46 | image = tf.image.resize_images(image, image_size, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 47 | # 这里使用tf.float32会将照片归一化,也就是 *1/255 48 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 49 | image = tf.reverse(image, axis=[2]) # 读取的是rgb,需要转为bgr 50 | return image 51 | 52 | @staticmethod 53 | def _parse_labels(_, labels, num_labels): 54 | """ 55 | 根据图片路径和标签,解析标签 56 | :param _: 图片路径, Tensor(shape=(), dtype=string),本函数只产生标签dataset,故不需要 57 | :param labels: 标签,Tensor(shape=(?,), dtype=float32) 58 | :param num_labels: 每个图像对于输出的标签数(多标签分类模型) 59 | :return: 标签 DatasetV1Adapter<(多个标签Tensor(shape=(), dtype=float32), ...)> 60 | """ 61 | label_list = list() 62 | for label_index in range(num_labels): 63 | label_list.append(labels[label_index]) 64 | return label_list 65 | 66 | @staticmethod 67 | def get_dataset(file_path, root_path, image_size, num_labels, batch_size, is_augment=True, is_test=False): 68 | """ 69 | 从标签文件读取数据,并解析为(image_path, labels)形式的列表 70 | 标签文件内容格式为: 71 | image_name label0,label1,label2,label3,... 72 | :param file_path: 标签文件路径 73 | :param root_path: 图片路径的根目录,用于和标签文件中的image_name拼接 74 | :param image_size: 图像需要resize到的尺寸 75 | :param num_labels: 每个图像对于输出的标签数(多标签分类模型) 76 | :param batch_size: 批次大小 77 | :param is_augment: 是否对图片进行数据增强 78 | :param is_test: 是否为测试阶段,测试阶段的话,输出的dataset中多包含image_path 79 | :return: tf.data.Dataset对象 80 | """ 81 | logging.info('利用标签文件、图片根目录生成tf.data数据集对象:') 82 | logging.info('1. 解析标签文件;') 83 | dataset = tf.data.TextLineDataset(file_path) 84 | dataset = DatasetUtil.shuffle_repeat(dataset, batch_size) 85 | dataset = dataset.map(functools.partial(FileUtil._parse_string_line, root_path=root_path), 86 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 87 | logging.info('2. 读取图片数据,构造image set和label set;') 88 | image_set = dataset.map(functools.partial(FileUtil._parse_image, image_size=image_size), 89 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 90 | labels_set = dataset.map(functools.partial(FileUtil._parse_labels, num_labels=num_labels), 91 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 92 | 93 | if is_augment: 94 | logging.info('2.1 image set数据增强;') 95 | image_set = DatasetUtil.augment_image(image_set) 96 | 97 | logging.info('3. image set数据标准化;') 98 | image_set = image_set.map(lambda image: tf.image.per_image_standardization(image), 99 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 100 | 101 | if is_test: 102 | logging.info('4. 完成tf.data (image, label, path) 测试数据集构造;') 103 | path_set = dataset.map(lambda image_path, label: image_path, 104 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 105 | dataset = tf.data.Dataset.zip((image_set, labels_set, path_set)) 106 | else: 107 | logging.info('4. 
完成tf.data (image, label) 训练数据集构造;') 108 | # 合并image、labels: 109 | # DatasetV1Adapter 110 | dataset = tf.data.Dataset.zip((image_set, labels_set)) 111 | logging.info('5. 构造tf.data多epoch训练模式;') 112 | dataset = DatasetUtil.batch_prefetch(dataset, batch_size) 113 | return dataset 114 | 115 | 116 | if __name__ == '__main__': 117 | import cv2 118 | import numpy as np 119 | import time 120 | 121 | # 开启eager模式进行图片读取、增强和展示 122 | tf.enable_eager_execution() 123 | train_file_path = './test_sample/label.txt' # 标签文件 124 | image_root_path = './test_sample' # 图片根目录 125 | 126 | train_batch = 100 127 | train_set = FileUtil.get_dataset(train_file_path, image_root_path, image_size=(48, 144), num_labels=10, 128 | batch_size=train_batch, is_augment=True) 129 | start = time.time() 130 | for count, data in enumerate(train_set): 131 | for i in range(data[0].shape[0]): 132 | cv2.imshow('a', np.array(data[0][i])) 133 | cv2.waitKey(1) 134 | 135 | for count, data in enumerate(train_set): 136 | print('一批(%d)图像 shape:' % train_batch, data[0].shape) 137 | for i in range(data[0].shape[0]): 138 | cv2.imshow('a', np.array(data[0][i])) 139 | cv2.waitKey(1) 140 | print('一批(%d)标签 shape:' % train_batch, len(data[1])) 141 | for i in range(len(data[1])): 142 | print(data[1][i]) 143 | if count == 100: 144 | break 145 | print('耗时:', time.time() - start) 146 | -------------------------------------------------------------------------------- /dataset/test_sample/label.txt: -------------------------------------------------------------------------------- 1 | 鲁BC6T76.jpg 15 1 12 6 27 7 6 -1 0 0 -------------------------------------------------------------------------------- /dataset/test_sample/鲁BC6T76.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zheng-yuwei/multi-label-classification/3563628060e5a9534b106414a193e71c2fa001b8/dataset/test_sample/鲁BC6T76.jpg -------------------------------------------------------------------------------- /dataset/tfrecord_util.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File tfrecord_util.py 4 | @author:ZhengYuwei 5 | 功能: 6 | 1. TFRecordUtil.generate 7 | 用tf.gfile.FastGFile的read方法读取图片(要先确保shape一致),加上label,制作example,写tfrecord文件 8 | 用tf.gfile.FastGFile读取图片而不是用cv2.imread读取图片, 9 | 因为cv2.imread读取图片会使得存储的tfrecord扩大近8~10倍大小(相比原始jpg图片),而gfile只会增大一点 10 | 另一种是不用tfrecord,而是训练时从图片名列表中读取图片,构造训练数据,速度会慢一点(个人感觉可以忽略) 11 | 在进行这一切之前,确保图片不为空(cv2.imread(file) is not None) 12 | 2. 
TFRecordUtil.get_dataset 13 | 从tfrecord数据集中,构造tf.data.Dataset,解析图片、标签,返回未初始化的迭代器 14 | """ 15 | import os 16 | import logging 17 | import time 18 | import functools 19 | import numpy as np 20 | import tensorflow as tf 21 | import cv2 22 | 23 | from dataset.dataset_util import DatasetUtil 24 | 25 | 26 | class TFRecordUtil(object): 27 | """ 图片-标签数据集 保存tfrecord,读取tfrecord """ 28 | 29 | @staticmethod 30 | def generate(image_paths, labels, tfrecord_path): 31 | """ 用tf.gfile.FastGFile的read方法读取图片(要先确保shape一致),加上label,制作example,写tfrecord文件 32 | :param image_paths: 图片路径,list 33 | :param labels: 对应的标签,list 34 | :param tfrecord_path: tfrecord文件路径 35 | """ 36 | if tf.gfile.Exists(tfrecord_path): 37 | logging.warning('TFRecord数据集(%s)已经存在,不再生成...', tfrecord_path) 38 | return 39 | 40 | total = len(image_paths) 41 | if len(labels) != total: 42 | logging.error('图片路径数量(%d)不等于标签数量(%d)', total, len(labels)) 43 | return 44 | 45 | with tf.python_io.TFRecordWriter(tfrecord_path) as writer: 46 | for index, [image_path, label] in enumerate(zip(image_paths, labels)): 47 | if (index + 1) % 1000 == 0: 48 | logging.info('\r>> %d/%d done...', index + 1, total) 49 | if not os.path.exists(image_path): 50 | logging.warning('图片不存在:%s', image_path) 51 | continue 52 | # 读取图片,open(image_path, 'rb').read()和tf.read_file(image_path)也同样效果 53 | image_data = tf.gfile.GFile(image_path, 'rb').read() # type(image_data)为bytes 54 | # 多label转化为string 55 | label = np.asanyarray(label, dtype=np.int).tostring() 56 | # 制作example,并序列化 57 | tf_serialized = tf.train.Example( 58 | features=tf.train.Features( 59 | feature={ 60 | 'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_data])), 61 | 'label': tf.train.Feature(bytes_list=tf.train.BytesList(value=[label])) 62 | })).SerializeToString() 63 | 64 | writer.write(tf_serialized) 65 | return 66 | 67 | @staticmethod 68 | def _parse_tfrecord(serialized_example): 69 | """ 从序列化的tf.train.example解析出image(归一化)和label 70 | :param serialized_example: tfrecord读取来的序列化的tf.train.example数据 71 | :return: 归一化的图片,标签 72 | """ 73 | example = tf.parse_single_example( 74 | serialized_example, 75 | features={ 76 | 'image': tf.FixedLenFeature([], tf.string), 77 | 'label': tf.FixedLenFeature([], tf.string) 78 | } 79 | ) 80 | 81 | image = tf.image.decode_jpeg(example['image']) 82 | label = tf.decode_raw(example['label'], tf.int32) 83 | label = tf.cast(label, tf.float32) 84 | return image, label 85 | 86 | @staticmethod 87 | def _parse_image(image, _, image_size): 88 | """ 89 | 根据图片路径和标签,读取图片 90 | :param image: 原始图片rgb数据, Tensor(shape=(原始尺寸), dtype=int) 91 | :param _: 标签Tensor(shape(?,), dtype=float32)),本函数只产生图像dataset,故不需要 92 | :param image_size: 图像需要resize到的大小 93 | :return: 归一化的图片 Tensor(shape=(48, 144, ?), dtype=float32) 94 | """ 95 | # 图片 96 | image = tf.image.resize_images(image, image_size, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 97 | # 这里使用tf.float32会将照片归一化,也就是 *1/255 98 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 99 | image = tf.reverse(image, axis=[2]) # 读取的是rgb,需要转为bgr 100 | return image 101 | 102 | @staticmethod 103 | def _parse_labels(_, labels, num_labels): 104 | """ 105 | 根据图片路径和标签,解析标签 106 | :param _: 图片路径, Tensor(shape=(), dtype=int),本函数只产生标签dataset,故不需要 107 | :param labels: 标签,Tensor(shape=(?,), dtype=float32) 108 | :param num_labels: 每个图像对于输出的标签数(多标签分类模型) 109 | :return: 标签 DatasetV1Adapter<(多个标签Tensor(shape=(), dtype=float32), ...)> 110 | """ 111 | label_list = list() 112 | for label_index in range(num_labels): 113 | label_list.append(labels[label_index]) 114 | 
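# Returning a list of scalar labels (rather than a single tensor) makes the y component of each
# dataset element a tuple (label_0, ..., label_{num_labels-1}) that matches the model's multiple output heads.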
return label_list 115 | 116 | @staticmethod 117 | def get_dataset(tfrecord_path_mode, image_size, num_labels, batch_size, is_augment=True): 118 | """ 从tfrecord数据集中,构造tf.data,解析图片、标签,返回未初始化的迭代器 119 | :param tfrecord_path_mode: tfrecord数据集名的模式,使用glob进行匹配 120 | :param image_size: 图像需要resize到的尺寸 121 | :param num_labels: 每个图像对于输出的标签数(多标签分类模型) 122 | :param batch_size: 训练的batch大小 123 | :param is_augment: 是否进行数据增强 124 | :return: tf.data.Dataset对象 125 | """ 126 | logging.info('1. 读取tfrecord文件,生成可初始化迭代器') 127 | # 获取tfrecord,并进行解析 128 | tfrecord_path_list = tf.data.Dataset.list_files(tfrecord_path_mode) 129 | dataset = tf.data.TFRecordDataset(tfrecord_path_list) 130 | dataset = DatasetUtil.shuffle_repeat(dataset, batch_size) 131 | 132 | logging.info('2. 读取图片数据,构造image set和label set;') 133 | dataset = dataset.map(TFRecordUtil._parse_tfrecord, num_parallel_calls=tf.data.experimental.AUTOTUNE) 134 | image_set = dataset.map(functools.partial(TFRecordUtil._parse_image, image_size=image_size), 135 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 136 | labels_set = dataset.map(functools.partial(TFRecordUtil._parse_labels, num_labels=num_labels), 137 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 138 | if is_augment: 139 | logging.info('2.1 image set数据增强;') 140 | image_set = DatasetUtil.augment_image(image_set) 141 | 142 | logging.info('3. image set数据白化;') 143 | image_set = image_set.map(lambda image: tf.image.per_image_standardization(image), 144 | num_parallel_calls=tf.data.experimental.AUTOTUNE) 145 | 146 | logging.info('4. 完成tf.data (image, label) 整体数据集构造,多epoch训练模式;') 147 | dataset = tf.data.Dataset.zip((image_set, labels_set)) 148 | 149 | logging.info('5. 构造tf.data多epoch训练模式;') 150 | dataset = DatasetUtil.batch_prefetch(dataset, batch_size) 151 | return dataset 152 | 153 | 154 | if __name__ == '__main__': 155 | label_txt_path = './test_sample/label.txt' # 标签文件 156 | image_root_dir = './test_sample' # 图片根目录 157 | record_filename = './test_sample/{}.record' # tfrecord存储目录 158 | # 开启eager模式进行图片读取、增强和展示 159 | tf.enable_eager_execution() 160 | 161 | # 1. 得到图片路径列表、标签数据列表 162 | train_image_paths = list() 163 | train_labels = list() 164 | with open(label_txt_path, 'r', encoding='UTF-8') as label_file: 165 | for line in label_file: 166 | line = line.split(" ") 167 | train_image_paths.append(os.path.join(image_root_dir, line[0])) 168 | train_labels.append(line[1:]) 169 | 170 | # 2. 制作record 171 | file_names = record_filename.format('test_sample') 172 | TFRecordUtil.generate(train_image_paths, train_labels, file_names) 173 | 174 | # 3. 
读取record 175 | train_batch = 100 176 | train_set = TFRecordUtil.get_dataset(file_names, image_size=(48, 144), num_labels=10, 177 | batch_size=train_batch, is_augment=True) 178 | start = time.time() 179 | for count, data in enumerate(train_set): 180 | print('一批(%d)图像 shape:' % train_batch, data[0].shape) 181 | for i in range(data[0].shape[0]): 182 | cv2.imshow('a', np.array(data[0][i])) 183 | cv2.waitKey(1) 184 | print('一批(%d)标签 shape:' % train_batch, len(data[1])) 185 | for i in range(len(data[1])): 186 | print(data[1][i]) 187 | if count == 100: 188 | break 189 | print('耗时:', time.time() - start) 190 | -------------------------------------------------------------------------------- /images/GHM-insight.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zheng-yuwei/multi-label-classification/3563628060e5a9534b106414a193e71c2fa001b8/images/GHM-insight.jpg -------------------------------------------------------------------------------- /images/focal-loss.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zheng-yuwei/multi-label-classification/3563628060e5a9534b106414a193e71c2fa001b8/images/focal-loss.jpg -------------------------------------------------------------------------------- /multi_label/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File __init__.py 4 | @author:ZhengYuwei 5 | """ -------------------------------------------------------------------------------- /multi_label/multi_label_loss.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File multi_label_loss.py 4 | @author:ZhengYuwei 5 | """ 6 | import numpy as np 7 | import tensorflow as tf 8 | from tensorflow import keras 9 | 10 | 11 | class MyLoss(object): 12 | """ 损失函数 """ 13 | def __init__(self, model, **options): 14 | self.model = model 15 | self.is_label_smoothing = options.setdefault('is_label_smoothing', False) 16 | self.is_focal_loss = options.setdefault('is_focal_loss', False) 17 | self.is_gradient_harmonizing = options.setdefault('is_gradient_harmonized', False) 18 | 19 | self.loss_func = self._normal_categorical_crossentropy() 20 | # 标签平滑 21 | if self.is_label_smoothing: 22 | self.smoothing_epsilon = options.setdefault('smoothing_epsilon', 0.005) 23 | # focal loss损失函数 24 | if self.is_focal_loss: 25 | gamma = options.setdefault('focal_loss_gamma', 2.0) 26 | alpha = options.setdefault('focal_loss_alpha', 1.0) 27 | self.loss_func = self._categorical_focal_loss(gamma, alpha) 28 | # gradient harmonized mechanism 29 | if self.is_gradient_harmonizing: 30 | bins = options.setdefault('ghm_loss_bins', 30) 31 | momentum = options.setdefault('ghm_loss_momentum', 0.75) 32 | self.loss_func = self._categorical_ghm_loss(bins, momentum) 33 | 34 | @staticmethod 35 | def _normal_categorical_crossentropy(): 36 | """ 自带的多标签分类损失函数 categorical_crossentropy """ 37 | def categorical_crossentropy(y_truth, y_pred, _): 38 | return keras.backend.categorical_crossentropy(y_truth, y_pred) 39 | return categorical_crossentropy 40 | 41 | @staticmethod 42 | def _categorical_focal_loss(gamma=2.0, alpha=1.0): 43 | """ 返回多分类 focal loss 函数 44 | Formula: loss = -alpha*((1-p_t)^gamma)*log(p_t) 45 | Parameters: 46 | alpha -- the same as wighting factor in balanced cross entropy, default 0.25 47 | gamma -- focusing parameter for modulating factor (1-p), default 2.0 48 
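Note: the default in the function signature is actually alpha=1.0 (not 0.25), and the implementation
uses |y_true - y_pred| as the modulating factor, which equals (1 - p_t) on the target class for one-hot labels.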
| """ 49 | def focal_loss(y_truth, y_pred, _): 50 | epsilon = keras.backend.epsilon() 51 | y_pred = keras.backend.clip(y_pred, epsilon, 1.0 - epsilon) 52 | cross_entropy = -y_truth * keras.backend.log(y_pred) 53 | weight = alpha * keras.backend.pow(keras.backend.abs(y_truth - y_pred), gamma) 54 | loss = weight * cross_entropy 55 | loss = keras.backend.sum(loss, axis=1) 56 | return loss 57 | return focal_loss 58 | 59 | @staticmethod 60 | def _categorical_ghm_loss(bins=30, momentum=0.75): 61 | """ 返回多分类 GHM 损失函数: 62 | 把每个区间上的梯度做平均,也就是说把梯度拉平,回推到公式上等价于把loss做平均 63 | Formula: 64 | loss = sum(crossentropy_loss(p_i,p*_i) / GD(g_i)) 65 | GD(g) = S_ind(g) / delta = S_ind(g) * M 66 | S_ind(g) = momentum * S_ind(g) + (1 - momentum) * R_ind(g) 67 | R_ind(g)是 g=|p-p*| 所在梯度区间[(i-1)delta, i*delta]的样本数 68 | M = 1/delta,这个是个常数,理论上去掉只有步长影响 69 | Parameters: (论文默认) 70 | bins -- 区间个数,default 30 71 | momentum -- 使用移动平均来求区间内样本数,动量部分系数,论文说不敏感 72 | """ 73 | # 区间边界 74 | edges = np.array([i/bins for i in range(bins + 1)]) 75 | edges = np.expand_dims(np.expand_dims(edges, axis=-1), axis=-1) 76 | acc_sum = 0 77 | if momentum > 0: 78 | acc_sum = tf.zeros(shape=(bins,), dtype=tf.float32) 79 | 80 | def ghm_class_loss(y_truth, y_pred, valid_mask): 81 | epsilon = keras.backend.epsilon() 82 | y_pred = keras.backend.clip(y_pred, epsilon, 1.0 - epsilon) 83 | # 0. 计算本次mini-batch的梯度分布:R_ind(g) 84 | gradient = keras.backend.abs(y_truth - y_pred) 85 | # 获取概率最大的类别下标,将该类别的梯度做为该标签的梯度代表 86 | # 没有这部分就是每个类别的梯度都参与到GHM,实验表明没有这部分会更好些 87 | # truth_indices_1 = keras.backend.expand_dims(keras.backend.argmax(y_truth, axis=1)) 88 | # truth_indices_0 = keras.backend.expand_dims(keras.backend.arange(start=0, 89 | # stop=tf.shape(y_pred)[0], 90 | # step=1, dtype='int64')) 91 | # truth_indices = keras.backend.concatenate([truth_indices_0, truth_indices_1]) 92 | # main_gradient = tf.gather_nd(gradient, truth_indices) 93 | # gradient = tf.tile(tf.expand_dims(main_gradient, axis=-1), [1, y_pred.shape[1]]) 94 | 95 | # 求解各个梯度所在的区间,并落到对应区间内进行密度计数 96 | grads_bin = tf.logical_and(tf.greater_equal(gradient, edges[:-1, :, :]), tf.less(gradient, edges[1:, :, :])) 97 | valid_bin = tf.boolean_mask(grads_bin, valid_mask, name='valid_gradient', axis=1) 98 | valid_bin = tf.reduce_sum(tf.cast(valid_bin, dtype=tf.float32), axis=(1, 2)) 99 | # 2. 更新指数移动平均后的梯度分布:S_ind(g) 100 | nonlocal acc_sum 101 | acc_sum = tf.add(momentum * acc_sum, (1 - momentum) * valid_bin, name='update_bin_number') 102 | # sample_num = tf.reduce_sum(acc_sum) # 是否乘以总数,乘上效果反而变差了 103 | # 3. 计算本次mini-batch不同loss对应的梯度密度:GD(g) 104 | position = tf.slice(tf.where(grads_bin), [0, 1], [-1, 2]) 105 | value = tf.gather_nd(acc_sum, tf.slice(tf.where(grads_bin), [0, 0], [-1, 1])) # * bins 106 | grad_density = tf.sparse.SparseTensor(indices=position, values=value, 107 | dense_shape=tf.shape(gradient, out_type=tf.int64)) 108 | grad_density = tf.sparse.to_dense(grad_density, validate_indices=False) 109 | grad_density = grad_density * tf.expand_dims(valid_mask, -1) + (1 - tf.expand_dims(valid_mask, -1)) 110 | 111 | # 4. 
计算本次mini-batch不同样本的损失:loss 112 | cross_entropy = -y_truth * keras.backend.log(y_pred) 113 | # loss = cross_entropy / grad_density * sample_num 114 | loss = cross_entropy / grad_density 115 | loss = keras.backend.sum(loss, axis=1) 116 | """ 117 | # 调试用,打印tensor 118 | print_op = tf.print('acc_sum: ', acc_sum, '\n', 119 | 'grad_density: ', grad_density, '\n', 120 | 'cross_entropy: ', cross_entropy, '\n', 121 | 'loss:', loss, '\n', 122 | '\n', 123 | '=================================================\n', 124 | summarize=100) 125 | with tf.control_dependencies([print_op]): 126 | return tf.identity(loss) 127 | """ 128 | return loss 129 | return ghm_class_loss 130 | 131 | def categorical_crossentropy(self, y_truth, y_pred): 132 | """ 单标签多分类损失函数 133 | :param y_truth: 真实类别值, (?, ?) 134 | :param y_pred: 预测类别值, (?, num_classes) 135 | :return: loss 136 | """ 137 | num_classes = keras.backend.cast(keras.backend.int_shape(y_pred)[-1], dtype=tf.int32) # 类别数 138 | # 将sparse的truth输出flatten, 记录无效标签(-1)和有效标签(>=0)位置,后续用于乘以loss 139 | y_truth = keras.backend.flatten(y_truth) 140 | valid_mask = 1.0 - tf.cast(tf.less(y_truth, 0), dtype=tf.float32) 141 | # 转为one_hot 142 | y_truth = keras.backend.cast(y_truth, dtype=tf.uint8) 143 | y_truth = keras.backend.one_hot(indices=y_truth, num_classes=num_classes) 144 | 145 | # 标签平滑 146 | if self.is_label_smoothing: 147 | num_classes = keras.backend.cast(num_classes, dtype=y_pred.dtype) 148 | y_truth = (1.0 - self.smoothing_epsilon) * y_truth + self.smoothing_epsilon / num_classes 149 | 150 | loss = self.loss_func(y_truth, y_pred, valid_mask) 151 | loss = loss * valid_mask 152 | """ 153 | # 调试用,打印tensor 154 | print_op = tf.print( 155 | # 'y_pred: ', y_pred, '\n', 156 | # 'y_truth: ', y_truth, '\n', 157 | # 'valid_mask: ', valid_mask, '\n', 158 | # 'loss:', loss, '\n', 159 | # 'normal_loss:', self._normal_categorical_crossentropy()(y_truth, y_pred, valid_mask), '\n', 160 | 'layer losses (regularization)', tf.transpose(self.model.losses), '\n', 161 | 'mean loss:', tf.reduce_mean(loss), '\t', 162 | 'sum layer losses:', tf.reduce_sum(tf.transpose(self.model.losses)), '\n', 163 | '=================================================\n', 164 | summarize=100 165 | ) 166 | with tf.control_dependencies([print_op]): 167 | return tf.identity(loss) 168 | """ 169 | return loss 170 | -------------------------------------------------------------------------------- /multi_label/multi_label_model.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File multi_label_model.py 4 | @author:ZhengYuwei 5 | """ 6 | import logging 7 | from tensorflow import keras 8 | from backbone.resnet18 import ResNet18 9 | from backbone.resnet18_v2 import ResNet18_v2 10 | from backbone.resnext import ResNeXt18 11 | from backbone.mixnet18 import MixNet18 12 | from backbone.mobilenet_v2 import MobileNetV2 13 | 14 | 15 | class Classifier(object): 16 | """ 17 | 分类器,自定义了多标签的head(多输出keras.models.Model对象) 18 | """ 19 | BACKBONE_RESNET_18 = 'resnet-18' 20 | BACKBONE_RESNET_18_V2 = 'resnet-18-v2' 21 | BACKBONE_RESNEXT_18 = 'resnext-18' 22 | BACKBONE_MIXNET_18 = 'mixnet-18' 23 | BACKBONE_MOBILENET_V2 = 'mobilenet-v2' 24 | BACKBONE_TYPE = { 25 | BACKBONE_RESNET_18: ResNet18, 26 | BACKBONE_RESNET_18_V2: ResNet18_v2, 27 | BACKBONE_RESNEXT_18: ResNeXt18, 28 | BACKBONE_MOBILENET_V2: MobileNetV2, 29 | BACKBONE_MIXNET_18: MixNet18 30 | } 31 | 32 | @classmethod 33 | def _multi_label_head(cls, net, output_shape, output_names): 34 | """ 35 | 
多标签分类器的head,上接全连接输入,下输出多个标签的多分类softmax输出 36 | :param net: 全连接输入 37 | :param output_shape: 多标签输出的每个分支的类别数列表 38 | :param output_names: 多标签输出的每个分支的名字 39 | :return: keras.models.Model对象 40 | """ 41 | # 全连接层:先做全局平均池化,然后flatten,然后再全连接层 42 | net = keras.layers.GlobalAveragePooling2D()(net) 43 | net = keras.layers.Flatten()(net) 44 | 45 | # 不同标签分支 46 | outputs = list() 47 | for num, name in zip(output_shape, output_names): 48 | output = keras.layers.Dense(units=num, kernel_initializer=keras.initializers.RandomNormal(stddev=0.01), 49 | activation="softmax", name=name)(net) 50 | """ 51 | output = keras.layers.Dense(units=num, kernel_initializer=keras.initializers.RandomNormal(stddev=0.01), 52 | kernel_regularizer=keras.regularizers.l2(ResNet18.L2_WEIGHT), 53 | bias_regularizer=keras.regularizers.l2(ResNet18.L2_WEIGHT), 54 | activation="softmax", name=name)(net) 55 | """ 56 | outputs.append(output) 57 | return outputs 58 | 59 | @classmethod 60 | def build(cls, backbone, input_shape, output_shape, output_names): 61 | """ 62 | 构建backbone基础网络的多标签分类keras.models.Model对象 63 | :param backbone: 基础网络,枚举变量 Classifier.NetType 64 | :param input_shape: 输入尺寸 65 | :param output_shape: 多标签输出的每个分支的类别数列表 66 | :param output_names: 多标签输出的每个分支的名字 67 | :return: resnet18基础网络的多标签分类keras.models.Model对象 68 | """ 69 | if len(input_shape) != 3: 70 | raise Exception('模型输入形状必须是3维形式') 71 | 72 | if backbone in cls.BACKBONE_TYPE.keys(): 73 | backbone = cls.BACKBONE_TYPE[backbone] 74 | else: 75 | raise ValueError("没有该类型的基础网络!") 76 | 77 | if len(input_shape) != 3: 78 | raise Exception('模型输入形状必须是3维形式') 79 | 80 | logging.info('构造多标签分类模型,基础网络:%s', backbone) 81 | input_x = keras.layers.Input(shape=input_shape) 82 | backbone_model = backbone.build(input_x) 83 | outputs = Classifier._multi_label_head(backbone_model, output_shape, output_names) 84 | model = keras.models.Model(inputs=input_x, outputs=outputs, name=backbone) 85 | return model 86 | 87 | 88 | if __name__ == '__main__': 89 | """ 90 | 可视化网络结构,使用plot_model需要先用conda安装GraphViz、pydotplus 91 | """ 92 | from configs import FLAGS 93 | model_names = Classifier.BACKBONE_TYPE.keys() 94 | for model_name in model_names: 95 | test_model = Classifier.build(model_name, FLAGS.input_shape, FLAGS.output_shapes, FLAGS.output_names) 96 | keras.utils.plot_model(test_model, to_file='../images/{}.svg'.format(model_name), show_shapes=True) 97 | test_model.summary() 98 | -------------------------------------------------------------------------------- /multi_label/trainer.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | File trainer.py 4 | @author:ZhengYuwei 5 | """ 6 | import os 7 | import logging 8 | import tensorflow as tf 9 | from tensorflow import saved_model 10 | from tensorflow import keras 11 | 12 | from configs import FLAGS 13 | from dataset.file_util import FileUtil 14 | from multi_label.multi_label_model import Classifier 15 | from multi_label.multi_label_loss import MyLoss 16 | 17 | 18 | class MultiLabelClassifier(object): 19 | """ 20 | 训练分类器: 21 | 1. 初始化分类器模型、训练参数等; 22 | 2. 调用prepare_data函数准备训练、验证数据集; 23 | 3. 
调用train函数训练。 24 | """ 25 | 26 | GPU_MODE = 'gpu' 27 | CPU_MODE = 'cpu' 28 | 29 | def __init__(self): 30 | """ 训练初始化 """ 31 | # 构建模型网络 32 | self.backbone = FLAGS.model_backbone # 网络类型 33 | self.input_shape = FLAGS.input_shape 34 | self.output_shapes = FLAGS.output_shapes 35 | 36 | model = Classifier.build(self.backbone, self.input_shape, self.output_shapes, FLAGS.output_names) 37 | # 训练模型: cpu,gpu 或 多gpu 38 | if FLAGS.gpu_mode == MultiLabelClassifier.GPU_MODE and FLAGS.gpu_num > 1: 39 | self.model = keras.utils.multi_gpu_model(model, gpus=FLAGS.gpu_num) 40 | else: 41 | self.model = model 42 | self.model.summary() 43 | self.history = None 44 | 45 | # 加载预训练模型(若有) 46 | self.checkpoint_path = FLAGS.checkpoint_path 47 | if self.checkpoint_path is None: 48 | self.checkpoint_path = 'models/' 49 | if os.path.isfile(self.checkpoint_path): 50 | if os.path.exists(self.checkpoint_path): 51 | self.model.load_weights(self.checkpoint_path) 52 | logging.info('加载模型成功!') 53 | else: 54 | self.checkpoint_path = os.path.dirname(self.checkpoint_path) 55 | if os.path.isdir(self.checkpoint_path): 56 | if not os.path.exists(self.checkpoint_path): 57 | os.makedirs(self.checkpoint_path) 58 | latest = tf.train.latest_checkpoint(self.checkpoint_path) 59 | if latest is not None: 60 | self.model.load_weights(latest) 61 | logging.info('加载模型成功!') 62 | logging.info(latest) 63 | else: 64 | self.checkpoint_path = os.path.dirname(self.checkpoint_path) 65 | self.checkpoint_path = os.path.join(self.checkpoint_path, FLAGS.checkpoint_name) 66 | 67 | # 设置训练过程中的回调函数 68 | tensorboard = keras.callbacks.TensorBoard(log_dir=FLAGS.tensorboard_dir) 69 | cp_callback = keras.callbacks.ModelCheckpoint(self.checkpoint_path, save_weights_only=True, 70 | verbose=1, period=FLAGS.ckpt_period) 71 | es_callback = keras.callbacks.EarlyStopping(monitor='loss', min_delta=FLAGS.stop_min_delta, 72 | patience=FLAGS.stop_patience, verbose=0, mode='min') 73 | lr_callback = keras.callbacks.LearningRateScheduler(FLAGS.lr_func) 74 | # from utils.logger_callback import NBatchProgbarLogger 75 | # log_callback = NBatchProgbarLogger(display=FLAGS.logger_batch) 76 | self.callbacks = [tensorboard, cp_callback, es_callback, lr_callback, ] 77 | 78 | # 设置模型优化方法 79 | self.loss_function = list() 80 | for _ in self.output_shapes: 81 | loss_function = MyLoss(self.model, 82 | is_label_smoothing=FLAGS.is_label_smoothing, 83 | is_focal_loss=FLAGS.is_focal_loss, 84 | is_gradient_harmonized=FLAGS.is_gradient_harmonized).categorical_crossentropy 85 | self.loss_function.append(loss_function) 86 | 87 | optimizer = keras.optimizers.SGD(lr=FLAGS.init_lr, momentum=0.95, nesterov=True) 88 | if FLAGS.optimizer == 'adam': 89 | optimizer = keras.optimizers.Adam(lr=FLAGS.init_lr, amsgrad=True) # 用AMSGrad 90 | elif FLAGS.optimizer == 'adabound': 91 | from keras_adabound import AdaBound 92 | optimizer = AdaBound(lr=1e-3, final_lr=0.1) 93 | elif FLAGS.optimizer == 'radam': 94 | from utils.radam import RAdam 95 | optimizer = RAdam(lr=1e-3) 96 | 97 | # 由于是多标签分类损失,最终答应的损失信息为: 98 | # loss: 44.6420 - class_1_loss: 7.8428 - class_2_loss: 5.8357 - class_3_loss: 4.5361 - class_4_loss: 4.7954 99 | # - class_5_loss: 4.1554 - class_6_loss: 4.6104 - class_7_loss: 5.5645 - class_8_loss: 0.6412 100 | # - class_9_loss: 0.7639 - class_10_loss: 4.1163 101 | # class_1_loss等为对应标签的平均样本损失,loss=所有标签平均样本损失 + 权重 * 罚项(正则项)误差(model.losses) 102 | self.model.compile(optimizer=optimizer, loss=self.loss_function, loss_weights=FLAGS.loss_weights) 103 | 104 | # 设置模型训练参数 105 | self.mini_batch = FLAGS.batch_size 106 | 
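# Each epoch runs FLAGS.steps_per_epoch = ceil(train_set_size / batch_size) iterations (see configs.py).
# When multi_gpu_model is used (gpu_num > 1), Keras splits every batch evenly across the GPUs,
# so batch_size here is the global batch size.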
self.epoch = FLAGS.epoch 107 | 108 | def prepare_data(self, label_file_path, image_root_dir, is_augment=False, is_test=False): 109 | """ 110 | 数据集准备,返回可初始化迭代器,使用前需要先sess.run(iterator.initializer)进行初始化 111 | :param label_file_path: 标签文件路径,格式参考 代码具体接口解释 112 | :param image_root_dir: 图片文件根目录 113 | :param is_augment: 是否进行数据增强 114 | :param is_test: 是否为测试阶段 115 | :return: tf.data.Dataset对象 116 | """ 117 | logging.info('加载数据集:%s', label_file_path) 118 | dataset = FileUtil.get_dataset(label_file_path, image_root_dir, image_size=self.input_shape[0:2], 119 | num_labels=len(self.output_shapes), batch_size=self.mini_batch, 120 | is_augment=is_augment, is_test=is_test) 121 | return dataset 122 | 123 | def train(self, train_set, val_set, train_steps=FLAGS.steps_per_epoch, val_steps=FLAGS.validation_steps): 124 | """ 125 | 使用训练集和验证集进行模型训练 126 | :param train_set: 训练数据集的tf.data.Dataset对象 127 | :param val_set: 验证数据集的tf.data.Dataset对象 128 | :param train_steps: 每个训练epoch的迭代次数 129 | :param val_steps: 每个验证epoch的迭代次数 130 | :return: 131 | """ 132 | if val_set: 133 | self.history = self.model.fit(train_set, epochs=self.epoch, validation_data=val_set, 134 | steps_per_epoch=train_steps, validation_steps=val_steps, 135 | callbacks=self.callbacks, verbose=2) 136 | else: 137 | self.history = self.model.fit(train_set, epochs=self.epoch, steps_per_epoch=train_steps, 138 | callbacks=self.callbacks, verbose=2) 139 | logging.info('模型训练完毕!') 140 | 141 | def save_serving(self): 142 | """ 使用TensorFlow Serving时的保存方式: 143 | serving-save-dir/ 144 | saved_model.pb 145 | variables/ 146 | .data & .index 147 | """ 148 | outputs = dict() 149 | for index, name in enumerate(FLAGS.output_names): 150 | outputs[name] = self.model.outputs[index] 151 | 152 | builder = saved_model.builder.SavedModelBuilder(FLAGS.serving_model_dir) 153 | signature = saved_model.signature_def_utils.predict_signature_def(inputs={'images': self.model.input}, 154 | outputs=outputs) 155 | with keras.backend.get_session() as sess: 156 | builder.add_meta_graph_and_variables(sess=sess, 157 | tags=[saved_model.tag_constants.SERVING], 158 | signature_def_map={'predict': signature}) 159 | builder.save() 160 | logging.info('serving模型保存成功!') 161 | 162 | def save_mobile(self): 163 | """ 164 | 保存模型为pb模型:先转为h5,再保存为pb(没法直接转pb) 165 | """ 166 | # 获取待保存ckpt文件的文件名 167 | latest = tf.train.latest_checkpoint(os.path.dirname(self.checkpoint_path)) 168 | model_name = os.path.splitext(os.path.basename(latest))[0] 169 | if not os.path.exists(FLAGS.pb_model_dir): 170 | os.makedirs(FLAGS.pb_model_dir) 171 | # 将整个模型保存为h5(包含图结构和参数),然后再重新加载 172 | h5_path = os.path.join(FLAGS.pb_model_dir, '{}.h5'.format(model_name)) 173 | self.model.save(h5_path, overwrite=True, include_optimizer=False) 174 | model = keras.models.load_model(h5_path) 175 | model.summary() 176 | # 保存pb 177 | with keras.backend.get_session() as sess: 178 | output_names = [out.op.name for out in model.outputs] 179 | input_graph_def = sess.graph.as_graph_def() 180 | for node in input_graph_def.node: 181 | node.device = "" 182 | graph = tf.graph_util.remove_training_nodes(input_graph_def) 183 | graph_frozen = tf.graph_util.convert_variables_to_constants(sess, graph, output_names) 184 | tf.train.write_graph(graph_frozen, FLAGS.pb_model_dir, '{}.pb'.format(model_name), as_text=False) 185 | logging.info("pb模型保存成功!") 186 | 187 | def evaluate(self, test_set, steps): 188 | """ 189 | 使用测试集进行模型评估 190 | :param test_set: 测试集的tf.data.Dataset对象 191 | :param steps: 每一个epoch评估次数 192 | :return: 193 | """ 194 | test_loss, test_acc = 
self.model.evaluate(test_set) 195 | logging.info('Test accuracy:', test_acc, steps) 196 | 197 | def predict(self, test_images): 198 | """ 199 | 使用测试图片进行模型测试 200 | :param test_images: 测试图片 201 | :return: 202 | """ 203 | predictions = self.model.predict(test_images) 204 | return predictions 205 | 206 | def get_gradients(self, images, labels, persistent=False): 207 | """ 208 | 在给定输入,获取所有可训练权重向量的梯度向量 209 | :param images: 输入图像 210 | :param labels: 标签ground truth 211 | :param persistent: 是否用持久化的tape,一般不用,除非开启debug模式在该函数内debug 212 | :return: 获取所有可训练参数的梯度 213 | """ 214 | with tf.GradientTape(persistent=persistent) as tape: 215 | y_preds = self.model(images) 216 | y_truths = labels 217 | loss = 0 218 | for y_truth, y_pred in zip(y_truths, y_preds): 219 | loss += self.loss_function(y_truth, y_pred) 220 | loss = tf.reduce_mean(loss) 221 | gradients = tape.gradient(loss, self.model.trainable_weights) 222 | gradients = [{weight.name: gradient} for gradient, weight in zip(gradients, self.model.trainable_weights)] 223 | return gradients 224 | 225 | def get_trainable_layers_func(self): 226 | """ 227 | 构造keras函数:在给定输入,获取所有layer的预测结果 228 | usage: 229 | classifier = TrainClassifier(backbone=Classifier.BACKBONE_RESNET_18) 230 | get_trainable_layers = classifier.get_trainable_layers_func() 231 | outputs = get_trainable_layers(test_images) # test_images [None, 48, 144, 3] 232 | :return: 获取所有layers的预测结果的keras函数 233 | """ 234 | trainable_names = [weight.name for weight in self.model.trainable_weights] 235 | trainable_names = set([name.split('/')[0] for name in trainable_names]) 236 | trainable_outputs = [{layer.name: layer.output} for layer in self.model.layers 237 | if layer.name in trainable_names] 238 | get_trainable_layers = keras.backend.function(inputs=[self.model.input], outputs=trainable_outputs) 239 | return get_trainable_layers 240 | 241 | def get_layers_func(self): 242 | """ 243 | 构造keras函数:在给定输入,获取所有layer的预测结果 244 | usage: 245 | classifier = TrainClassifier(backbone=Classifier.BACKBONE_RESNET_18) 246 | get_layers = classifier.get_layer_func() 247 | outputs = get_layers(test_images) # test_images [None, 48, 144, 3] 248 | :return: 获取所有layers的预测结果的keras函数 249 | """ 250 | layers_output = [layer.output for layer in self.model.layers] 251 | get_layers = keras.backend.function(inputs=[self.model.input], outputs=layers_output) 252 | return get_layers 253 | 254 | def convert_multi2single(self): 255 | """ 256 | 将多GPU训练的模型转为单GPU模型,从而可以在单GPU上运行测试 257 | :return: 258 | """ 259 | # it's necessary to save the model before use this single GPU model 260 | multi_model = self.model.layers[FLAGS.gpu_num + 1] # get single GPU model weights 261 | dir_name = self.checkpoint_path 262 | if not os.path.isdir(self.checkpoint_path): 263 | dir_name = os.path.dirname(self.checkpoint_path) 264 | latest = tf.train.latest_checkpoint(dir_name) 265 | save_path = os.path.join(dir_name, 'single_' + os.path.basename(latest)) 266 | multi_model.save_weights(save_path) 267 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy>=1.16.5 2 | seaborn==0.9.0 3 | easydict==1.9 4 | pandas==0.25.2 5 | opencv_python==4.1.1.26 6 | tensorflow>=1.13.1 7 | matplotlib==3.1.1 8 | keras_adabound==0.5.0 9 | scikit_learn==0.21.3 10 | -------------------------------------------------------------------------------- /run.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 
-*- 2 | """ 3 | Created on 2019/7/17 4 | File run.py 5 | @author:ZhengYuwei 6 | """ 7 | import os 8 | import logging 9 | import numpy as np 10 | import tensorflow as tf 11 | from tensorflow import keras 12 | from logging.handlers import RotatingFileHandler 13 | 14 | from multi_label.trainer import MultiLabelClassifier 15 | from configs import FLAGS 16 | 17 | if FLAGS.mode == 'test': 18 | tf.enable_eager_execution() 19 | if FLAGS.mode in ('train', 'debug'): 20 | keras.backend.set_learning_phase(True) 21 | else: 22 | keras.backend.set_learning_phase(False) 23 | np.random.seed(6) 24 | tf.set_random_seed(800) 25 | 26 | 27 | def generate_logger(filename, **log_params): 28 | """ 29 | 生成日志记录对象记录日志 30 | :param filename: 日志文件名称 31 | :param log_params: 日志参数 32 | :return: 33 | """ 34 | level = log_params.setdefault('level', logging.INFO) 35 | 36 | logger = logging.getLogger() 37 | logger.setLevel(level=level) 38 | formatter = logging.Formatter('%(asctime)s %(filename)s:%(lineno)d %(levelname)s %(message)s') 39 | # 定义一个RotatingFileHandler,最多备份3个日志文件,每个日志文件最大1M 40 | file_handler = RotatingFileHandler(filename, maxBytes=1 * 1024 * 1024, backupCount=3) 41 | file_handler.setFormatter(formatter) 42 | # 控制台输出 43 | console = logging.StreamHandler() 44 | console.setFormatter(formatter) 45 | 46 | logger.addHandler(file_handler) 47 | logger.addHandler(console) 48 | 49 | 50 | def run(): 51 | # gpu模式 52 | if FLAGS.gpu_mode != MultiLabelClassifier.CPU_MODE: 53 | os.environ["CUDA_VISIBLE_DEVICES"] = FLAGS.visible_gpu 54 | # tf.device('/gpu:{}'.format(FLAGS.gpu_device)) 55 | config = tf.ConfigProto() 56 | config.gpu_options.allow_growth = True # 按需 57 | sess = tf.Session(config=config) 58 | 59 | """ 60 | # 添加debug:nan或inf过滤器 61 | from tensorflow.python import debug as tf_debug 62 | from tensorflow.python.debug.lib.debug_data import InconvertibleTensorProto 63 | sess = tf_debug.LocalCLIDebugWrapperSession(sess) 64 | 65 | # nan过滤器 66 | def has_nan(datum, tensor): 67 | _ = datum # Datum metadata is unused in this predicate. 68 | if isinstance(tensor, InconvertibleTensorProto): 69 | # Uninitialized tensor doesn't have bad numerical values. 70 | # Also return False for data types that cannot be represented as numpy 71 | # arrays. 72 | return False 73 | elif (np.issubdtype(tensor.dtype, np.floating) or 74 | np.issubdtype(tensor.dtype, np.complex) or 75 | np.issubdtype(tensor.dtype, np.integer)): 76 | return np.any(np.isnan(tensor)) 77 | else: 78 | return False 79 | 80 | # inf过滤器 81 | def has_inf(datum, tensor): 82 | _ = datum # Datum metadata is unused in this predicate. 83 | if isinstance(tensor, InconvertibleTensorProto): 84 | # Uninitialized tensor doesn't have bad numerical values. 85 | # Also return False for data types that cannot be represented as numpy 86 | # arrays. 
87 | return False 88 | elif (np.issubdtype(tensor.dtype, np.floating) or 89 | np.issubdtype(tensor.dtype, np.complex) or 90 | np.issubdtype(tensor.dtype, np.integer)): 91 | return np.any(np.isinf(tensor)) 92 | else: 93 | return False 94 | 95 | # 添加过滤器 96 | sess.add_tensor_filter("has_nan", has_nan) 97 | sess.add_tensor_filter("has_inf", has_inf) 98 | sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan) 99 | """ 100 | keras.backend.set_session(sess) 101 | 102 | generate_logger(filename=FLAGS.log_path) 103 | logging.info('TensorFlow version: %s', tf.__version__) # 1.13.1 104 | logging.info('Keras version: %s', keras.__version__) # 2.2.4-tf 105 | 106 | classifier = MultiLabelClassifier() 107 | 108 | # 模型训练 109 | if FLAGS.mode == 'train': 110 | train_dataset = classifier.prepare_data(FLAGS.train_label_path, FLAGS.train_set_dir, FLAGS.is_augment) 111 | classifier.train(train_dataset, None) 112 | logging.info('训练完毕!') 113 | 114 | # 用于测试, 115 | elif FLAGS.mode == 'test': 116 | # 测试用单GPU测试,若是多GPU模型,需要先转为单GPU模型,然后再执行测试 117 | if FLAGS.gpu_num > 1: 118 | classifier.convert_multi2single() 119 | logging.info('多GPU训练模型转换单GPU运行模型成功,请使用单GPU测试!') 120 | return 121 | 122 | total_test, wrong_count, great_total_count, great_wrong_count, great_wrong_records = test_model(classifier) 123 | logging.info('预测总数:%d\t 错误数:%d', total_test, wrong_count) 124 | logging.info('大于置信度总数:%d\t 错误数:%d\t 准确率:%f', great_total_count, great_wrong_count, 125 | 1 - great_wrong_count/(great_total_count + 1e-7)) 126 | # logging.info('错误路径是:\n%s', great_wrong_records) 127 | logging.info('测试完毕!') 128 | 129 | # 用于调试,查看训练的模型中每一层的输出/梯度 130 | elif FLAGS.mode == 'debug': 131 | import cv2 132 | train_dataset = classifier.prepare_data(FLAGS.train_label_path, FLAGS.train_set_dir, FLAGS.is_augment) 133 | get_trainable_layers = classifier.get_trainable_layers_func() 134 | for images, labels in train_dataset: 135 | cv2.imshow('a', np.array(images[0])) 136 | cv2.waitKey(1) 137 | outputs = get_trainable_layers(images) # 每一个可训练层的输出 138 | gradients = classifier.get_gradients(images, labels) # 每一个可训练层的参数梯度 139 | assert outputs is not None 140 | assert gradients is not None 141 | logging.info("=============== debug ================") 142 | 143 | # 将模型保存为pb模型 144 | elif FLAGS.mode == 'save_pb': 145 | # 保存模型记得注释eager execution 146 | classifier.save_mobile() 147 | 148 | # 将模型保存为服务器pb模型 149 | elif FLAGS.mode == 'save_serving': 150 | # 保存模型记得注释eager execution 151 | classifier.save_serving() 152 | else: 153 | raise ValueError('Mode Error!') 154 | 155 | 156 | def test_model(classifier): 157 | """ 模型测试 158 | :param classifier: 训练完毕的多标签分类模型 159 | :return: 总测试样本数, 总错误样本数,大于置信度的总样本数, 大于置信度的错误样本数, 错误样本路径记录 160 | """ 161 | # import cv2 162 | # 测试集包含(image, labels, image_path) 163 | test_set = classifier.prepare_data(FLAGS.test_label_path, FLAGS.test_set_dir, is_augment=False, is_test=True) 164 | base_conf = FLAGS.base_confidence # 置信度基线 165 | 166 | # 实际标签,预测标签,预测概率(label数,验证样本数) 167 | total_test = int(np.ceil(FLAGS.val_set_size / FLAGS.batch_size) * FLAGS.batch_size) 168 | truth = np.zeros(shape=(len(FLAGS.output_shapes), total_test)) 169 | pred = np.zeros(shape=(len(FLAGS.output_shapes), total_test)) 170 | prob = np.zeros(shape=(len(FLAGS.output_shapes), total_test)) 171 | start_index, end_index = 0, FLAGS.batch_size 172 | great_wrong_records = list() # 大于置信度的错误路径集合 173 | for images, labels, paths in test_set: 174 | great_wrong_records = np.concatenate((great_wrong_records, np.array(paths)), axis=0) 175 | truth[:, start_index:end_index] = 

    # total errors, samples above the confidence baseline, and errors among those samples
    wrong_count = np.sum(wrong_result)
    great_total_count = np.sum(great_conf_result)
    great_wrong_count = np.sum(wrong_result & great_conf_result)
    # record the wrongly predicted samples that are above the confidence baseline
    if np.any(wrong_result & great_conf_result):
        great_wrong_records = [u.decode() for u in great_wrong_records[wrong_result & great_conf_result]]

    # plot_confusion_matrix(truth, pred)
    return total_test, wrong_count, great_total_count, great_wrong_count, great_wrong_records


def plot_confusion_matrix(y_trues, y_preds):
    from utils import draw_tools
    for i in range(y_trues.shape[0]):
        valid_mask = (y_trues[i] != -1)
        draw_tools.plot_confusion_matrix(y_trues[i][valid_mask], y_preds[i][valid_mask],
                                         ['cls_{}'.format(k) for k in range(FLAGS.output_shapes[i])],
                                         FLAGS.output_names[i], is_save=True)
    return


if __name__ == '__main__':
    run()
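
# Note on the modes dispatched in run(): eager execution is enabled at import time
# only when FLAGS.mode == 'test' (see the top of this file); the 'save_pb' and
# 'save_serving' paths expect graph mode, hence the reminders above to comment
# out eager execution before saving.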
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
File __init__.py
@author: ZhengYuwei
Contents:
generate_txt: generates the label file from the images and their file names (may not exist at the moment);
check_label_file.py: checks that every image referenced in the label file exists, is readable and is not empty;
draw_tools.py: plots confusion matrices;
logger_callback.py: logging callback used during training;
"""
--------------------------------------------------------------------------------
/utils/check_label_file.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
File check_label_file.py
@author:ZhengYuwei
"""
import os
import cv2

# Check that every image listed in the label file exists and can be opened
train_set_dir = '/home/train_set/images'
train_label_path = '/home/train_set/train.txt'
lines = list()
with open(train_label_path, 'r') as file:
    for line in file:
        img_name = line.strip().split(' ')[0]
        img_path = os.path.join(train_set_dir, img_name)
        if os.path.isfile(img_path):
            img = cv2.imread(img_path)
            if img is not None:
                lines.append(line)

lines[-1] = lines[-1].strip()
new_train_label_path = os.path.join(os.path.dirname(train_label_path), 'new_train.txt')
with open(new_train_label_path, 'w') as file:
    file.writelines(lines)
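# Notes on the label file handled above (inferred from the parsing logic): each
# line is expected to start with the image file name, and the remaining
# space-separated fields (presumably the label values) are written back untouched.
# Lines whose image is missing or cannot be decoded by cv2.imread are silently
# dropped, and the filtered list is saved as new_train.txt next to the original file.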
--------------------------------------------------------------------------------
/utils/draw_tools.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
File draw_tools.py
@author:ZhengYuwei
"""
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix


def plot_confusion_matrix(y_true, y_pred, labels, title='Confusion matrix', is_save=False):
    """ Plot a confusion matrix
    :param y_true: ground-truth class labels
    :param y_pred: predicted class labels
    :param labels: list of class label names
    :param title: figure title
    :param is_save: whether to save the figure to disk
    :return:
    """
    if labels:
        y_true = [labels[int(i)] for i in y_true]
        y_pred = [labels[int(i)] for i in y_pred]
    # compute the confusion matrix: y axis = true labels, x axis = predicted labels
    conf_matrix = confusion_matrix(y_true, y_pred, labels=labels)
    conf_matrix_pred_sum = np.sum(conf_matrix, axis=0, keepdims=True).astype(float) + 1e-7
    conf_matrix_percent = conf_matrix / conf_matrix_pred_sum * 100  # percentages normalized over each predicted-class column

    annot = np.empty_like(conf_matrix).astype(str)
    nrows, ncols = conf_matrix.shape
    for i in range(nrows):
        for j in range(ncols):
            c = conf_matrix[i, j]
            p = conf_matrix_percent[i, j]
            if i == j:
                s = conf_matrix_pred_sum[0][i]
                # annot[i, j] = '%.2f%%\n%d/%d' % (p, c, s)
                annot[i, j] = '%.2f%%\n%d' % (p, c)
            elif c == 0:
                annot[i, j] = ''
            else:
                annot[i, j] = '%.2f%%\n%d' % (p, c)

    # plot the confusion matrix as a heatmap
    conf_matrix = pd.DataFrame(conf_matrix, index=labels, columns=labels, dtype='float')
    fig = plt.figure(figsize=(10, 10))
    ax = fig.gca()
    # Oranges,Oranges_r,YlGnBu,Blues,RdBu, PuRd ...
    sns.heatmap(conf_matrix, annot=annot, fmt='', ax=ax, cmap='YlGnBu',
                annot_kws={"size": 11}, linewidths=0.5)
    # configure the axes
    ax.set_xticklabels(ax.get_xticklabels(), rotation=25, fontsize=10)
    ax.xaxis.set_ticks_position('none')
    ax.set_yticklabels(ax.get_yticklabels(), rotation=25, fontsize=10)
    ax.yaxis.set_ticks_position('none')

    plt.title(title, size=18)
    plt.xlabel('Predicted', size=16)
    plt.ylabel('Actual', size=16)
    plt.tight_layout()
    if is_save:
        plt.savefig(os.path.join('.', title+'.jpg'))
    else:
        plt.show()


if __name__ == '__main__':
    y_predict = np.random.randint(low=0, high=10, size=(100,))
    y_truth = np.random.randint(low=0, high=10, size=(100,))
    y_labels = [str(i)+'s' for i in range(10)]
    plot_confusion_matrix(y_truth, y_predict, y_labels)
--------------------------------------------------------------------------------
/utils/logger_callback.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
File logger_callback.py
@author:ZhengYuwei
"""
from tensorflow import keras


class NBatchProgbarLogger(keras.callbacks.ProgbarLogger):
    """ Callback that prints the training log to stdout every N batches """

    def __init__(self, count_mode='samples', stateful_metrics=None, display=1000, verbose=1):
        """
        :param count_mode: 'samples' or 'steps', passed on to keras.callbacks.ProgbarLogger
        :param stateful_metrics: names of metrics that should not be averaged over the epoch
        :param display: print one log record every `display` batches
        :param verbose: whether to print the training log at all
        """
        super(NBatchProgbarLogger, self).__init__(count_mode, stateful_metrics)
        self.display = display
        self.display_step = 1
        self.verbose = verbose
        self.epochs = 0

    def on_train_begin(self, logs=None):
        self.epochs = self.params['epochs']

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        batch_size = logs.get('size', 0)  # only used by the commented-out distributed-training logic below
        """
        # needs adjustment for distributed training
        num_steps = logs.get('num_steps', 1)
        if self.use_steps:
            self.seen += num_steps
        else:
            self.seen += batch_size * num_steps
        """
        self.seen += 1
        self.display_step += 1
        # Skip progbar update for the last batch, will be handled by on_epoch_end.
        if self.verbose and self.seen < self.target and self.display_step % self.display == 0:
            # metrics to print
            for k in self.params['metrics']:
                if k in logs:
                    self.log_values.append((k, logs[k]))
            self.progbar.update(self.seen, self.log_values)
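
# A minimal usage sketch (assuming an already compiled tf.keras `model` and a
# `dataset`; both names are placeholders):
#
#     from utils.logger_callback import NBatchProgbarLogger
#     model.fit(dataset, epochs=10, steps_per_epoch=1000, verbose=0,
#               callbacks=[NBatchProgbarLogger(count_mode='steps', display=100)])
#
# Passing verbose=0 to fit() suppresses the built-in per-batch progress bar, so
# only this callback's every-`display`-batches update is printed.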
--------------------------------------------------------------------------------
/utils/radam.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on 2019/8/15
File radam
@author: ZhengYuwei
"""
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import state_ops
from tensorflow.python.keras import backend as K


class RAdam(tf.keras.optimizers.Optimizer):
    """RAdam optimizer.

    Default parameters follow those provided in the original paper.

    Arguments:
        lr: float >= 0. Learning rate.
        beta_1: float, 0 < beta < 1. Generally close to 1.
        beta_2: float, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor. If `None`, defaults to `K.epsilon()`.
        decay: float >= 0. Learning rate decay over each update.
        amsgrad: boolean. Whether to apply the AMSGrad variant of this
            algorithm from the paper "On the Convergence of Adam and
            Beyond".
        warmup_coef: in the early training stage RAdam falls back to SGDM;
            to warm that stage up, the fallback uses warmup_lr = warmup_coef * lr.
            Defaults to 1.
    """

    def __init__(self,
                 lr=0.001,
                 beta_1=0.9,
                 beta_2=0.999,
                 epsilon=None,
                 decay=0.,
                 amsgrad=False,
                 warmup_coef=1.,
                 **kwargs):
        super(RAdam, self).__init__(**kwargs)
        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.lr = K.variable(lr, name='lr')
            self.beta_1 = K.variable(beta_1, name='beta_1')
            self.beta_2 = K.variable(beta_2, name='beta_2')
            self.decay = K.variable(decay, name='decay')
        if epsilon is None:
            epsilon = K.epsilon()
        self.epsilon = epsilon
        self.initial_decay = decay
        self.amsgrad = amsgrad
        self.warmup_coef = warmup_coef
        self.rho_inf = 2. / (1. - self.beta_2) - 1

    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = []

        lr = self.lr
        if self.initial_decay > 0:
            lr = lr * (  # pylint: disable=g-no-augmented-assignment
                1. / (1. + self.decay * math_ops.cast(self.iterations,
                                                      K.dtype(self.decay))))

        with ops.control_dependencies([state_ops.assign_add(self.iterations, 1)]):
            t = math_ops.cast(self.iterations, K.floatx())

        ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        vs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        if self.amsgrad:
            vhats = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        else:
            vhats = [K.zeros(1) for _ in params]
        self.weights = [self.iterations] + ms + vs + vhats

        beta_1_power = math_ops.pow(self.beta_1, t)
        beta_2_power = math_ops.pow(self.beta_2, t)
        rho_t = self.rho_inf - 2.0 * t * beta_2_power / (1.0 - beta_2_power)

        # rectified learning rate once the variance estimate is tractable (rho_t >= 5);
        # before that, fall back to warmed-up, bias-corrected SGDM
        lr_t = tf.where(rho_t >= 5.0,
                        K.sqrt((rho_t - 4.) * (rho_t - 2.) * self.rho_inf /
                               ((self.rho_inf - 4.) * (self.rho_inf - 2.) * rho_t)) *
                        lr * (K.sqrt(1. - beta_2_power) / (1. - beta_1_power)),
                        self.warmup_coef * lr / (1. - beta_1_power))

        for p, g, m, v, vhat in zip(params, grads, ms, vs, vhats):
            m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1. - self.beta_2) * math_ops.square(g)

            if self.amsgrad:
                vhat_t = math_ops.maximum(vhat, v_t)
                p_t = p - lr_t * tf.where(rho_t >= 5.0, m_t / (K.sqrt(vhat_t) + self.epsilon), m_t)
                self.updates.append(state_ops.assign(vhat, vhat_t))
            else:
                p_t = p - lr_t * tf.where(rho_t >= 5.0, m_t / (K.sqrt(v_t) + self.epsilon), m_t)

            self.updates.append(state_ops.assign(m, m_t))
            self.updates.append(state_ops.assign(v, v_t))
            new_p = p_t

            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)

            self.updates.append(state_ops.assign(p, new_p))
        return self.updates

    def get_config(self):
        config = {
            'lr': float(K.get_value(self.lr)),
            'beta_1': float(K.get_value(self.beta_1)),
            'beta_2': float(K.get_value(self.beta_2)),
            'decay': float(K.get_value(self.decay)),
            'epsilon': self.epsilon,
            'amsgrad': self.amsgrad,
            'warmup_coef': self.warmup_coef  # included so the optimizer round-trips through get_config/from_config
        }
        base_config = super(RAdam, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
--------------------------------------------------------------------------------
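
A minimal usage sketch for the RAdam optimizer above, under the same TensorFlow 1.13 / tf.keras setup as the rest of the repository (`my_model` and `train_dataset` are hypothetical placeholders, not part of the code base):

    from tensorflow import keras
    from utils.radam import RAdam

    # warmup_coef < 1 warms up the SGDM fallback used in the first steps (see the class docstring)
    optimizer = RAdam(lr=1e-3, warmup_coef=0.1)
    my_model.compile(optimizer=optimizer,
                     loss=keras.losses.sparse_categorical_crossentropy)
    my_model.fit(train_dataset, epochs=10, steps_per_epoch=1000)

RAdam subclasses the TF1-style `tf.keras.optimizers.Optimizer`, so it is passed to `compile` exactly like the built-in optimizers.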