├── README.md
├── images
    ├── GitHub-Mark-32px.png
    ├── colab_logo_32px.png
    ├── tf2_310-163.png
    ├── tf2_gif_1600-900.gif
    ├── tf_120-120.png
    ├── tf_120-23.png
    ├── tf_2056-690.png
    ├── tf_224-225.png
    ├── tf_2464-805.png
    ├── tf_300-168.png
    ├── tf_gif_740-416.gif
    └── tf_logo_32px.png
└── r2
    ├── guide
        ├── eager.md
        ├── effective_tf2.md
        ├── keras
        │   ├── functional.md
        │   ├── functional_25_0.png
        │   ├── functional_27_0.png
        │   ├── functional_45_0.png
        │   ├── functional_56_0.png
        │   ├── overview.md
        │   ├── training_and_evaluation.md
        │   └── training_and_evaluation_48_0.png
        └── migration_guide.md
    └── tutorials
        ├── eager
            ├── automatic_differentiation.md
            ├── basics.md
            ├── custom_layers.md
            ├── custom_training.md
            ├── custom_training_walkthrough.md
            └── tf_function.md
        ├── estimators
            └── linear.md
        ├── images
            ├── hub_with_keras.md
            ├── images
            │   ├── before_fine_tuning.png
            │   └── fine_tuning.png
            ├── intro_to_cnns.md
            ├── segmentation.md
            ├── transfer_learning.md
            └── transfer_learning_files
            │   ├── transfer_learning_17_0.png
            │   ├── transfer_learning_17_1.png
            │   ├── transfer_learning_53_0.png
            │   └── transfer_learning_70_0.png
        ├── keras
            ├── basic_classification.md
            ├── basic_regression.md
            ├── basic_text_classification.md
            ├── basic_text_classification_with_tfhub.md
            ├── feature_columns.md
            ├── overfit_and_underfit.md
            └── save_and_restore_models.md
        ├── quickstart
            ├── advanced.md
            └── beginner.md
        └── text
            ├── image_captioning.md
            ├── image_captioning_44_0.png
            ├── image_captioning_48_1.png
            ├── image_captioning_48_2.png
            ├── image_captioning_50_1.png
            ├── image_captioning_50_2.png
            ├── images
                ├── embedding.jpg
                ├── embedding2.png
                └── one-hot.png
            ├── nmt_with_attention.md
            ├── nmt_with_attention_43_1.png
            ├── nmt_with_attention_44_1.png
            ├── nmt_with_attention_45_1.png
            ├── nmt_with_attention_46_1.png
            ├── text_classification_rnn.md
            ├── text_classification_rnn_31_0.png
            ├── text_classification_rnn_32_0.png
            ├── text_classification_rnn_40_0.png
            ├── text_classification_rnn_41_0.png
            ├── text_generation.md
            ├── transformer.md
            ├── transformer_107_1.png
            ├── transformer_27_1.png
            ├── transformer_82_1.png
            └── word_embeddings.md


/images/GitHub-Mark-32px.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/GitHub-Mark-32px.png


--------------------------------------------------------------------------------
/images/colab_logo_32px.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/colab_logo_32px.png


--------------------------------------------------------------------------------
/images/tf2_310-163.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/tf2_310-163.png


--------------------------------------------------------------------------------
/images/tf2_gif_1600-900.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/tf2_gif_1600-900.gif


--------------------------------------------------------------------------------
/images/tf_120-120.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/tf_120-120.png


--------------------------------------------------------------------------------
/images/tf_120-23.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/tf_120-23.png


--------------------------------------------------------------------------------
/images/tf_2056-690.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/tf_2056-690.png


--------------------------------------------------------------------------------
/images/tf_224-225.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/tf_224-225.png


--------------------------------------------------------------------------------
/images/tf_2464-805.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/tf_2464-805.png


--------------------------------------------------------------------------------
/images/tf_300-168.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/tf_300-168.png


--------------------------------------------------------------------------------
/images/tf_gif_740-416.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/tf_gif_740-416.gif


--------------------------------------------------------------------------------
/images/tf_logo_32px.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/images/tf_logo_32px.png


--------------------------------------------------------------------------------
/r2/guide/effective_tf2.md:
--------------------------------------------------------------------------------
---
title: Effective TensorFlow 2.0
tags: 
    - tensorflow2.0
categories: 
    - tensorflow2 official tutorials
top: 1902
abbrlink: tensorflow/tf2-guide-effective_tf2
---

# Effective TensorFlow 2.0 (TensorFlow 2.0 official tutorial translation)

TensorFlow 2.0 includes several changes that make TensorFlow users more productive. TensorFlow 2.0 removes [redundant APIs](https://github.com/tensorflow/community/blob/master/rfcs/20180827-api-names.md), makes APIs more consistent ([unified RNNs](https://github.com/tensorflow/community/blob/master/rfcs/20180920-unify-rnn-interface.md), [unified optimizers](https://github.com/tensorflow/community/blob/master/rfcs/20181016-optimizer-unification.md)), and integrates better with the Python runtime through [Eager execution](https://www.tensorflow.org/guide/eager).

Many [RFCs](https://github.com/tensorflow/community/pulls?utf8=%E2%9C%93&q=is%3Apr) have explained the changes that TensorFlow 2.0 brings. This guide presents a vision of what development in TensorFlow 2.0 should look like. It assumes you have some familiarity with TensorFlow 1.x.

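## 1. A brief summary of major changes

### 1.1. API cleanup

Many APIs are either [gone or moved](https://github.com/tensorflow/community/blob/master/rfcs/20180827-api-names.md) in TensorFlow 2.0. Some of the major changes include removing `tf.app`, `tf.flags`, and `tf.logging` in favor of the now open-source [absl-py](https://github.com/abseil/abseil-py), rehoming projects that lived in `tf.contrib`, and cleaning up the main `tf.*` namespace by moving lesser-used functions into subpackages like `tf.math`. Some APIs have been replaced with their 2.0 equivalents: `tf.summary`, `tf.keras.metrics`, and `tf.keras.optimizers`.
The easiest way to apply these renames automatically is to use the [v2 upgrade script](https://tensorflow.google.cn/beta/guide/upgrade).
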
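### 1.2. Eager execution

TensorFlow 1.X requires users to manually stitch together an abstract syntax tree (the graph) by making `tf.*` API calls. It then requires users to manually compile the abstract syntax tree by passing a set of output tensors and input tensors to a `session.run()` call.
TensorFlow 2.0 executes eagerly by default, running code immediately (just like Python normally does). In 2.0, graphs and sessions should feel like implementation details.

One notable byproduct of eager execution is that `tf.control_dependencies()` is no longer required, because all lines of code execute in order (within a `tf.function`, code with side effects executes in the order written).
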
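### 1.3. No more globals

TensorFlow 1.X relied heavily on implicit global namespaces. When you called `tf.Variable()`, it was put into the default graph, and it remained there even if you lost track of the Python variable pointing to it.
You could then recover that `tf.Variable`, but only if you knew the name it had been created with, which was difficult if you were not in control of the variable's creation. As a result, all sorts of mechanisms proliferated to help users find their variables again, and for frameworks to find user-created variables: variable scopes, global collections, helper methods like `tf.get_global_step()` and `tf.global_variables_initializer()`, optimizers implicitly computing gradients over all trainable variables, and so on.

TensorFlow 2.0 eliminates all of these mechanisms ([Variables 2.0 RFC](https://github.com/tensorflow/community/pull/11)) in favor of the default mechanism: keep track of your variables! If you lose track of a `tf.Variable`, it gets garbage collected.

The requirement to track variables creates some extra work for the user, but with Keras objects (see below), the burden is minimized.
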
### 1.4. Functions, not sessions

A `session.run()` call is almost like a function call: you specify the inputs and the function to be called, and you get back a set of outputs.
In TensorFlow 2.0, you can decorate a Python function using `tf.function()` to mark it for JIT compilation so that TensorFlow runs it as a single graph ([Functions 2.0 RFC](https://github.com/tensorflow/community/pull/20)). This mechanism allows TensorFlow 2.0 to gain all of the benefits of graph mode:

- Performance: the function can be optimized (node pruning, kernel fusion, etc.).
- Portability: the function can be exported/reimported ([SavedModel 2.0 RFC](https://github.com/tensorflow/community/pull/34)), allowing users to reuse and share modular TensorFlow functions.

```python
# TensorFlow 1.X
outputs = session.run(f(placeholder), feed_dict={placeholder: input})
# TensorFlow 2.0
outputs = f(input)
```
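
As a minimal runnable sketch of the TensorFlow 2.0 side of this comparison (the function `scaled_sum` and its inputs are illustrative, not taken from the guide), a `tf.function`-decorated Python function is called directly with tensors, with no placeholder, `feed_dict`, or session involved:

```python
import tensorflow as tf

@tf.function
def scaled_sum(x, scale):
  # Traced into a single TensorFlow graph on the first call;
  # later calls with compatible inputs reuse that graph.
  return tf.reduce_sum(x) * scale

x = tf.constant([1.0, 2.0, 3.0])
result = scaled_sum(x, tf.constant(2.0))
print(result.numpy())  # 12.0
```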
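
With the ability to freely intersperse Python and TensorFlow code, users can take full advantage of Python's expressiveness. But portable TensorFlow executes in contexts without a Python interpreter, such as mobile, C++, and JavaScript. To help users avoid having to rewrite their code when adding `@tf.function`, [AutoGraph](https://tensorflow.google.cn/beta/guide/autograph) converts a subset of Python constructs into their TensorFlow equivalents:

* `for`/`while` -> `tf.while_loop` (`break` and `continue` are supported)
* `if` -> `tf.cond`
* `for _ in dataset` -> `dataset.reduce`

AutoGraph supports arbitrary nesting of control flow, which makes it possible to implement many complex ML programs, such as sequence models, reinforcement learning, and custom training loops, efficiently and concisely.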
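
A small sketch of this conversion, using a hypothetical `collatz_steps` function chosen only because both its loop condition and its branch depend on tensor values. Inside `tf.function`, AutoGraph should turn the `while` into a `tf.while_loop` and the `if` into a `tf.cond`:

```python
import tensorflow as tf

@tf.function
def collatz_steps(n):
  # `while` and `if` on tensor values are converted by AutoGraph
  # into tf.while_loop and tf.cond inside the traced graph.
  steps = tf.constant(0)
  while n > 1:
    if n % 2 == 0:
      n = n // 2
    else:
      n = 3 * n + 1
    steps += 1
  return steps

print(collatz_steps(tf.constant(6)).numpy())  # 8
```

To inspect what AutoGraph generated, `tf.autograph.to_code(collatz_steps.python_function)` returns the converted source as a string.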
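
## 2. Recommendations for idiomatic TensorFlow 2.0

### 2.1. Refactor your code into smaller functions

A common usage pattern in TensorFlow 1.X was the "kitchen sink" strategy, where the union of all possible computations was preemptively laid out, and then selected tensors were evaluated via `session.run()`.

In TensorFlow 2.0, users should refactor their code into smaller functions that are called as needed. In general, it is not necessary to decorate each of these smaller functions with `tf.function`; only use `tf.function` to decorate high-level computations, for example one step of training or the forward pass of your model.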
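
A minimal sketch of this pattern (all function names here are hypothetical): the small helpers stay ordinary Python functions, and only the top-level computation is decorated. When `forward` is traced, the helpers it calls are traced into the same graph.

```python
import tensorflow as tf

def normalize(x):
  # Plain Python helper: no decorator needed.
  return (x - tf.reduce_mean(x)) / (tf.math.reduce_std(x) + 1e-6)

def affine(x, w, b):
  return x * w + b

@tf.function
def forward(x, w, b):
  # Only the high-level computation is wrapped in tf.function;
  # normalize() and affine() are traced into the same graph.
  return affine(normalize(x), w, b)

out = forward(tf.constant([1.0, 2.0, 3.0]), tf.constant(2.0), tf.constant(1.0))
print(out.numpy())
```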

### 2.2. Use Keras layers and models to manage variables

Keras models and layers offer the convenient `variables` and `trainable_variables` properties, which recursively gather up all dependent variables. This makes it easy to manage variables locally, close to where they are being used.

Contrast:

```python
import tensorflow as tf

def dense(x, W, b):
  return tf.nn.sigmoid(tf.matmul(x, W) + b)

@tf.function
def multilayer_perceptron(x, w0, b0, w1, b1, w2, b2):
  x = dense(x, w0, b0)
  x = dense(x, w1, b1)
  x = dense(x, w2, b2)
  return x

# You still have to manage the w_i and b_i yourself,
# and they are defined somewhere else in the code.
```

The Keras version:

```python
import tensorflow as tf

hidden_size = 64  # illustrative value
n = 4             # illustrative number of layers

# Each layer is callable, with a signature equivalent to linear(x)
layers = [tf.keras.layers.Dense(hidden_size, activation=tf.nn.sigmoid) for _ in range(n)]
perceptron = tf.keras.Sequential(layers)

# After the first call (which creates the variables):
# layers[3].trainable_variables => returns [w3, b3]
# perceptron.trainable_variables => returns [w0, b0, ...]
```

Keras layers/models inherit from `tf.train.Checkpointable` and are integrated with `@tf.function`, which makes it possible to export SavedModels directly from Keras objects.
You do not have to use Keras's `.fit()` API to take advantage of these integrations.

Here is a transfer learning example that demonstrates how Keras makes it easy to collect a subset of relevant variables. Suppose you are training a multi-headed model with a shared trunk:

```python
trunk = tf.keras.Sequential([...])
head1 = tf.keras.Sequential([...])
head2 = tf.keras.Sequential([...])

path1 = tf.keras.Sequential([trunk, head1])
path2 = tf.keras.Sequential([trunk, head2])

# Train on the primary dataset
for x, y in main_dataset:
  with tf.GradientTape() as tape:
    prediction = path1(x)
    loss = loss_fn_head1(prediction, y)
  # Optimize the trunk and head1 weights simultaneously
  gradients = tape.gradient(loss, path1.trainable_variables)
  optimizer.apply_gradients(zip(gradients, path1.trainable_variables))

# Fine-tune the second head, reusing the trunk
for x, y in small_dataset:
  with tf.GradientTape() as tape:
    prediction = path2(x)
    loss = loss_fn_head2(prediction, y)
  # Only optimize the head2 weights, not the trunk weights
  gradients = tape.gradient(loss, head2.trainable_variables)
  optimizer.apply_gradients(zip(gradients, head2.trainable_variables))

# You can publish just the trunk computation for other people to reuse.
tf.saved_model.save(trunk, output_path)
```
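
To see concretely why this works, here is a small runnable sketch (the layer sizes and the 4-feature input are arbitrary assumptions). Because `trunk` is the same object inside both paths, its variables are shared, and `head2.trainable_variables` is exactly the part of `path2.trainable_variables` that excludes the trunk:

```python
import tensorflow as tf

# Arbitrary, illustrative layer sizes.
trunk = tf.keras.Sequential([tf.keras.layers.Dense(8, activation="relu")])
head1 = tf.keras.Sequential([tf.keras.layers.Dense(1)])
head2 = tf.keras.Sequential([tf.keras.layers.Dense(3)])

path1 = tf.keras.Sequential([trunk, head1])
path2 = tf.keras.Sequential([trunk, head2])

# Calling each path once creates the variables (input has 4 features).
x = tf.zeros([2, 4])
path1(x)
path2(x)

trunk_ids = {id(v) for v in trunk.trainable_variables}
path1_ids = {id(v) for v in path1.trainable_variables}
path2_ids = {id(v) for v in path2.trainable_variables}
head2_ids = {id(v) for v in head2.trainable_variables}

print(len(trunk_ids), len(path1_ids), len(path2_ids), len(head2_ids))  # 2 4 4 2
print(trunk_ids <= path1_ids and trunk_ids <= path2_ids)  # True: trunk is shared
print(head2_ids.isdisjoint(trunk_ids))                    # True: only head2 is tuned
```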

### 2.3. Combine tf.data.Datasets and @tf.function

When iterating over training data that fits in memory, feel free to use regular Python iteration. Otherwise, `tf.data.Dataset` is the best way to stream training data from disk.
Datasets are [iterables (not iterators)](https://docs.python.org/3/glossary.html#term-iterable), and work just like other Python iterables in eager mode.
You can fully utilize a dataset's asynchronous prefetching/streaming features by wrapping your code in `tf.function()`, which replaces Python iteration with the equivalent graph operations using AutoGraph.

```python
@tf.function
def train(model, dataset, optimizer):
  for x, y in dataset:
    with tf.GradientTape() as tape:
      prediction = model(x)
      loss = loss_fn(prediction, y)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```
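
A self-contained sketch of the same pattern, under simplifying assumptions of our own: a toy linear regression on synthetic, noise-free data (y = 3x + 2) and a hand-written gradient step on two `tf.Variable`s in place of a Keras model and optimizer, so that all state exists before tracing. Inside `tf.function`, AutoGraph converts the `for` loop over the dataset into graph operations.

```python
import tensorflow as tf

# Synthetic, noise-free data for y = 3x + 2 (illustrative only).
xs = tf.random.uniform([256, 1], seed=0)
ys = 3.0 * xs + 2.0
dataset = tf.data.Dataset.from_tensor_slices((xs, ys)).batch(32)

w = tf.Variable(0.0)
b = tf.Variable(0.0)
learning_rate = 0.5

@tf.function
def train_one_epoch(dataset):
  # AutoGraph converts this Python loop over a Dataset into graph ops.
  last_loss = tf.constant(0.0)
  for x, y in dataset:
    with tf.GradientTape() as tape:
      loss = tf.reduce_mean(tf.square(w * x + b - y))
    dw, db = tape.gradient(loss, [w, b])
    w.assign_sub(learning_rate * dw)
    b.assign_sub(learning_rate * db)
    last_loss = loss
  return last_loss

for epoch in range(100):
  loss = train_one_epoch(dataset)

print(w.numpy(), b.numpy(), loss.numpy())  # w close to 3.0, b close to 2.0
```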

If you use the Keras `.fit()` API, you won't have to worry about dataset iteration:

```python
model.compile(optimizer=optimizer, loss=loss_fn)
model.fit(dataset)
```

### 2.4. Take advantage of AutoGraph with Python control flow

AutoGraph provides a way to convert data-dependent control flow into graph-mode equivalents like `tf.cond` and `tf.while_loop`.

One common place where data-dependent control flow appears is in sequence models. `tf.keras.layers.RNN` wraps an RNN cell, allowing you to unroll the recurrence either statically or dynamically.
For demonstration's sake, you could reimplement dynamic unrolling as follows:

```python
class DynamicRNN(tf.keras.Model):

  def __init__(self, rnn_cell):
    super(DynamicRNN, self).__init__()
    self.cell = rnn_cell

  def call(self, input_data):
    # [batch, time, features] -> [time, batch, features]
    input_data = tf.transpose(input_data, [1, 0, 2])
    outputs = tf.TensorArray(tf.float32, input_data.shape[0])
    # Keras RNN cells provide get_initial_state() rather than zero_state().
    state = self.cell.get_initial_state(batch_size=input_data.shape[1], dtype=tf.float32)
    for i in tf.range(input_data.shape[0]):
      output, state = self.cell(input_data[i], state)
      outputs = outputs.write(i, output)
    return tf.transpose(outputs.stack(), [1, 0, 2]), state
```

For a more detailed overview of AutoGraph's features, see [the guide](https://tensorflow.google.cn/beta/guide/autograph).

### 2.5. Use tf.metrics to aggregate data and tf.summary to log it

To log summaries, use `tf.summary.(scalar|histogram|...)` and redirect them to a writer using a context manager. (If you omit the context manager, nothing happens.) Unlike TF 1.x, the summaries are emitted directly to the writer; there is no separate "merge" op and no separate `add_summary()` call, which means that the step value must be provided at the call site.

```python
import tensorflow as tf

summary_writer = tf.summary.create_file_writer('/tmp/summaries')
with summary_writer.as_default():
  tf.summary.scalar('loss', 0.1, step=42)
```
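
A self-contained variant of the snippet above, as a sketch: it writes to a temporary directory created with Python's `tempfile` (our assumption, so the example does not touch `/tmp/summaries`) and then checks that an event file was produced. Calling `flush()` forces buffered summaries to disk.

```python
import os
import tempfile

import tensorflow as tf

logdir = tempfile.mkdtemp()
writer = tf.summary.create_file_writer(logdir)

with writer.as_default():
  for step in range(3):
    # The step is passed explicitly at the call site.
    tf.summary.scalar('loss', 1.0 / (step + 1), step=step)
writer.flush()

# TensorBoard reads event files whose names start with events.out.tfevents
event_files = [f for f in os.listdir(logdir) if f.startswith('events.out.tfevents')]
print(event_files)
```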

To aggregate data before logging it as summaries, use `tf.metrics`. Metrics are stateful: they accumulate values and return a cumulative result when you call `.result()`. Clear accumulated values with `.reset_states()`.

```python
def train(model, optimizer, dataset, log_freq=10):
  avg_loss = tf.keras.metrics.Mean(name='loss', dtype=tf.float32)
  for images, labels in dataset:
    loss = train_step(model, optimizer, images, labels)
    avg_loss.update_state(loss)
    if tf.equal(optimizer.iterations % log_freq, 0):
      tf.summary.scalar('loss', avg_loss.result(), step=optimizer.iterations)
      avg_loss.reset_states()

def test(model, test_x, test_y, step_num):
  loss = loss_fn(model(test_x), test_y)
  tf.summary.scalar('loss', loss, step=step_num)

train_summary_writer = tf.summary.create_file_writer('/tmp/summaries/train')
test_summary_writer = tf.summary.create_file_writer('/tmp/summaries/test')

with train_summary_writer.as_default():
  train(model, optimizer, dataset)

with test_summary_writer.as_default():
  test(model, test_x, test_y, optimizer.iterations)
```
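
The stateful behavior can be seen in isolation with a single `tf.keras.metrics.Mean` instance. This sketch calls `reset_state()`, the name used by newer TensorFlow releases; the TF 2.0-era `reset_states()` shown above was later deprecated in its favor, so use whichever your installed version provides.

```python
import tensorflow as tf

avg_loss = tf.keras.metrics.Mean(name='loss', dtype=tf.float32)

# Each update_state() call adds to the running total and count.
for value in [1.0, 2.0, 3.0]:
  avg_loss.update_state(value)
first = avg_loss.result().numpy()
print(first)  # 2.0

avg_loss.update_state(6.0)
second = avg_loss.result().numpy()
print(second)  # 3.0, i.e. (1 + 2 + 3 + 6) / 4

# Clearing the state starts a fresh aggregation window.
avg_loss.reset_state()
avg_loss.update_state(10.0)
third = avg_loss.result().numpy()
print(third)  # 10.0
```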

Visualize the generated summaries by pointing TensorBoard at the summary log directory:

```shell
tensorboard --logdir /tmp/summaries
```

> Latest version: [https://www.mashangxue123.com/tensorflow/tf2-guide-effective_tf2.html](https://www.mashangxue123.com/tensorflow/tf2-guide-effective_tf2.html)
> English version: [https://tensorflow.google.cn/beta/guide/effective_tf2](https://tensorflow.google.cn/beta/guide/effective_tf2)


--------------------------------------------------------------------------------
/r2/guide/keras/functional_25_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/guide/keras/functional_25_0.png


--------------------------------------------------------------------------------
/r2/guide/keras/functional_27_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/guide/keras/functional_27_0.png


--------------------------------------------------------------------------------
/r2/guide/keras/functional_45_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/guide/keras/functional_45_0.png


--------------------------------------------------------------------------------
/r2/guide/keras/functional_56_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/guide/keras/functional_56_0.png


--------------------------------------------------------------------------------
/r2/guide/keras/training_and_evaluation_48_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/guide/keras/training_and_evaluation_48_0.png


--------------------------------------------------------------------------------
/r2/tutorials/eager/automatic_differentiation.md:
--------------------------------------------------------------------------------
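---
title: Automatic differentiation and gradient tapes, the core of gradient descent in TF
categories: tensorflow2 official tutorials
tags: tensorflow2.0 tutorials
top: 1953
abbrlink: tensorflow/tf2-tutorials-eager-automatic_differentiation
---

# Automatic differentiation and gradient tapes, the core of gradient descent in TF (TensorFlow 2.0 official tutorial translation)

In the previous tutorial we introduced tensors and operations on them. In this tutorial we will cover automatic differentiation, a key technique for optimizing machine learning models.

> Note: Before this, the machine learning community rarely made use of this powerful tool; gradients were usually computed with backpropagation, and the parameters were then updated with an optimizer such as SGD. Anyone who has implemented backprop by hand knows how complex and error-prone it is. A good framework should hide this difficulty from the user, and automatic differentiation solves the problem elegantly. Gradient descent is one of the core algorithms of machine learning, and automatic differentiation is at the heart of gradient descent: gradient descent computes the gradient of the loss function with respect to the parameters and repeatedly steps in the direction opposite to that gradient to find a minimum.
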
## 1. Import packages

```python
from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
```

## 2. Gradient tapes

TensorFlow provides the [tf.GradientTape](https://www.tensorflow.org/api_docs/python/tf/GradientTape) API for automatic differentiation, that is, computing the gradient of a computation with respect to its input variables.
TensorFlow "records" all operations executed inside the context of a `tf.GradientTape` onto a "tape". TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of the "recorded" computation using reverse mode differentiation. For example:

```python
x = tf.ones((2, 2))

with tf.GradientTape() as t:
  t.watch(x)
  y = tf.reduce_sum(x)
  z = tf.multiply(y, y)

# Derivative of z with respect to the original input tensor x
dz_dx = t.gradient(z, x)
for i in [0, 1]:
  for j in [0, 1]:
    assert dz_dx[i][j].numpy() == 8.0
```
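
In the example above, `t.watch(x)` is needed because `x` is a constant tensor. Trainable `tf.Variable`s are watched automatically, while an unwatched constant yields no gradient (`None`). A minimal sketch of the difference, with arbitrary values:

```python
import tensorflow as tf

v = tf.Variable(3.0)   # trainable variables are watched automatically
c = tf.constant(3.0)   # constants are not, unless t.watch(c) is called

with tf.GradientTape() as t:
  y = v * v + c * c

dv, dc = t.gradient(y, [v, c])
print(dv.numpy())  # 6.0
print(dc)          # None: c was never watched
```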

You can also request gradients of the output with respect to intermediate values computed during a "recorded" `tf.GradientTape` context.

```python
x = tf.ones((2, 2))

with tf.GradientTape() as t:
  t.watch(x)
  y = tf.reduce_sum(x)
  z = tf.multiply(y, y)

# Use the tape to compute the derivative of z with respect to the
# intermediate value y.
dz_dy = t.gradient(z, y)
assert dz_dy.numpy() == 8.0
```

By default, the resources held by a GradientTape are released as soon as the `GradientTape.gradient()` method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the `gradient()` method; the resources are released when the tape object is garbage collected. For example:

```python
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as t:
  t.watch(x)
  y = x * x
  z = y * y
dz_dx = t.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
dy_dx = t.gradient(y, x)  # 6.0
del t  # Drop the reference to the tape
```
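
Without `persistent=True`, the tape can be used only once. A short sketch of what happens on a second call (the exact error message may vary between TensorFlow versions):

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as t:  # persistent=False is the default
  t.watch(x)
  y = x * x
  z = y * y

print(t.gradient(z, x).numpy())  # 108.0

try:
  t.gradient(y, x)  # second call on a non-persistent tape
  second_call_failed = False
except RuntimeError as err:
  second_call_failed = True
  print("RuntimeError:", err)
```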

### 2.1. Recording control flow

Because tapes record operations as they are executed, Python control flow (using `if`s and `while`s, for example) is naturally handled:

```python
def f(x, y):
  output = 1.0
  for i in range(y):
    if i > 1 and i < 5:
      output = tf.multiply(output, x)
  return output

def grad(x, y):
  with tf.GradientTape() as t:
    t.watch(x)
    out = f(x, y)
  return t.gradient(out, x)

x = tf.convert_to_tensor(2.0)

assert grad(x, 6).numpy() == 12.0
assert grad(x, 5).numpy() == 12.0
assert grad(x, 4).numpy() == 4.0
```

### 2.2. Higher-order gradients

Operations inside of the `GradientTape` context manager are recorded for automatic differentiation. If gradients are computed in that context, then the gradient computation is recorded as well. As a result, the exact same API works for higher-order gradients too. For example:

```python
x = tf.Variable(1.0)  # Create a TensorFlow variable initialized to 1.0

with tf.GradientTape() as t:
  with tf.GradientTape() as t2:
    y = x * x * x
  # Compute the gradient inside the 't' context manager
  # which means the gradient computation is differentiable as well.
  dy_dx = t2.gradient(y, x)
d2y_dx2 = t.gradient(dy_dx, x)

assert dy_dx.numpy() == 3.0
assert d2y_dx2.numpy() == 6.0
```
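
The same nesting works for any differentiable function. As an illustrative check (the choice of `tf.sin` and the point `x = 0.5` are arbitrary), the first derivative of sin(x) should equal cos(x) and the second derivative should equal -sin(x):

```python
import tensorflow as tf

x = tf.Variable(0.5)

with tf.GradientTape() as outer:
  with tf.GradientTape() as inner:
    y = tf.sin(x)
  dy_dx = inner.gradient(y, x)       # cos(x), recorded by the outer tape
d2y_dx2 = outer.gradient(dy_dx, x)   # -sin(x)

print(dy_dx.numpy(), tf.cos(x).numpy())
print(d2y_dx2.numpy(), -tf.sin(x).numpy())
```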

## 3. Next steps

In this tutorial we covered gradient computation in TensorFlow. With that, we have enough of the primitives required to build and train neural networks.

> Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-automatic_differentiation.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-automatic_differentiation.html)
> English version: [https://tensorflow.google.cn/beta/tutorials/eager/automatic_differentiation](https://tensorflow.google.cn/beta/tutorials/eager/automatic_differentiation)



--------------------------------------------------------------------------------
/r2/tutorials/eager/basics.md:
--------------------------------------------------------------------------------
---
title: TensorFlow 2.0 tensors and operations, NumPy compatibility, GPU acceleration
categories: tensorflow2 official tutorials
tags: tensorflow2.0 tutorials
top: 1951
abbrlink: tensorflow/tf2-tutorials-eager-basics
---

# TensorFlow 2.0 tensors and operations, NumPy compatibility, GPU acceleration (TensorFlow 2.0 official tutorial translation)

This is an introductory TensorFlow tutorial that shows how to:

* Import the required package
* Create and use tensors
* Use GPU acceleration
* Demonstrate `tf.data.Dataset`

```python
from __future__ import absolute_import, division, print_function
```

## 1. Import TensorFlow

To get started, import the `tensorflow` module. As of TensorFlow 2.0, eager execution is enabled by default. This enables a more interactive frontend to TensorFlow, the details of which we will discuss later.

```python
import tensorflow as tf
```

## 2. Tensors

A tensor is a multi-dimensional array. Similar to NumPy `ndarray` objects, `tf.Tensor` objects have a data type and a shape. Additionally, a `tf.Tensor` can reside in accelerator memory (like a GPU). TensorFlow offers a rich library of operations ([tf.add](https://www.tensorflow.org/api_docs/python/tf/add), [tf.matmul](https://www.tensorflow.org/api_docs/python/tf/matmul), [tf.linalg.inv](https://www.tensorflow.org/api_docs/python/tf/linalg/inv), etc.) that consume and produce `tf.Tensor`s. These operations automatically convert native Python types, for example:

```python
print(tf.add(1, 2))
print(tf.add([1, 2], [3, 4]))
print(tf.square(5))
print(tf.reduce_sum([1, 2, 3]))

# Operator overloading is also supported
print(tf.square(2) + tf.square(3))
```

```
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor([4 6], shape=(2,), dtype=int32)
tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
```

Each `tf.Tensor` has a shape and a data type:

```python
x = tf.matmul([[1]], [[2, 3]])
print(x)
print(x.shape)
print(x.dtype)
```

```
tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)
(1, 2)
<dtype: 'int32'>
```

The most obvious differences between NumPy arrays and `tf.Tensor`s are:

1. Tensors can be backed by accelerator memory (like GPU, TPU).

2. Tensors are immutable.

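To illustrate the second point, here is a short sketch (not part of the original tutorial; the values are arbitrary): item assignment on a `tf.Tensor` raises a `TypeError`, operations return new tensors instead, and `tf.Variable` is the mutable container TensorFlow provides for state that must change.

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])
try:
  t[0] = 10  # tensors do not support in-place item assignment
  assigned = True
except TypeError as err:
  assigned = False
  print("TypeError:", err)

# Operations return new tensors instead of modifying t.
t2 = t + 1
print(t.numpy(), t2.numpy())  # [1 2 3] [2 3 4]

# tf.Variable holds mutable state; slices of a variable can be assigned.
v = tf.Variable([1, 2, 3])
v[0].assign(10)
print(v.numpy())  # [10  2  3]
```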
### 2.1 NumPy compatibility

Converting between a TensorFlow `tf.Tensor` and a NumPy `ndarray` is easy:

* TensorFlow operations automatically convert NumPy ndarrays to Tensors.

* NumPy operations automatically convert Tensors to NumPy ndarrays.

Tensors are explicitly converted to NumPy ndarrays using their `.numpy()` method. These conversions are typically cheap since the array and the `tf.Tensor` share the underlying memory representation, if possible. However, sharing the underlying representation isn't always possible: a `tf.Tensor` may be hosted in GPU memory while NumPy arrays are always backed by host memory, so the conversion involves a copy from GPU to host memory.

```python
import numpy as np

ndarray = np.ones([3, 3])

print("TensorFlow operations convert numpy arrays to Tensors automatically")
tensor = tf.multiply(ndarray, 42)
print(tensor)

print("And NumPy operations convert Tensors to numpy arrays automatically")
print(np.add(tensor, 1))

print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())
```

```
TensorFlow operations convert numpy arrays to Tensors automatically
tf.Tensor(
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]], shape=(3, 3), dtype=float64)
And NumPy operations convert Tensors to numpy arrays automatically
[[43. 43. 43.]
 [43. 43. 43.]
 [43. 43. 43.]]
The .numpy() method explicitly converts a Tensor to a numpy array
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]]
```

## 3. GPU acceleration

Many TensorFlow operations are accelerated by using the GPU for computation. Without any annotations, TensorFlow automatically decides whether to use the GPU or CPU for an operation, copying the tensor between CPU and GPU memory if necessary. Tensors produced by an operation are typically backed by the memory of the device on which the operation executed, for example:

```python
x = tf.random.uniform([3, 3])

print("Is there a GPU available:", tf.test.is_gpu_available())

print("Is the Tensor on GPU #0:", x.device.endswith('GPU:0'))
```

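### 3.1 Device names

The `Tensor.device` property provides a fully qualified string name of the device hosting the contents of the tensor. This name encodes many details, such as an identifier of the network address of the host on which this program is executing and the device within that host. This is required for distributed execution of a TensorFlow program. If the tensor is placed on GPU number `N` of the host, the string ends with `GPU:<N>`.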
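
A quick way to see such a fully qualified name is to place a tensor explicitly and print it. The exact prefix depends on the runtime; in a single-process eager program it typically looks like the string in the comment below, but treat that as an example rather than a guaranteed format.

```python
import tensorflow as tf

with tf.device("CPU:0"):
  x = tf.constant([1.0, 2.0])

# Typically something like "/job:localhost/replica:0/task:0/device:CPU:0"
print(x.device)
```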

### 3.2 Explicit device placement

In TensorFlow, *placement* refers to how individual operations are assigned (placed on) a device for execution. As mentioned, when there is no explicit guidance provided, TensorFlow automatically decides which device to execute an operation on and copies tensors to that device if needed. However, TensorFlow operations can be explicitly placed on specific devices using the `tf.device` context manager, for example:

```python
import time

def time_matmul(x):
  start = time.time()
  for loop in range(10):
    tf.matmul(x, x)

  result = time.time()-start

  print("10 loops: {:0.2f}ms".format(1000*result))

# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
  x = tf.random.uniform([1000, 1000])
  assert x.device.endswith("CPU:0")
  time_matmul(x)

# Force execution on GPU #0 if available
if tf.test.is_gpu_available():
  print("On GPU:")
  with tf.device("GPU:0"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
    x = tf.random.uniform([1000, 1000])
    assert x.device.endswith("GPU:0")
    time_matmul(x)
```

```
On CPU:
10 loops: 88.60ms
```

## 4. Datasets

This section uses the [`tf.data.Dataset` API](https://www.tensorflow.org/guide/datasets) to build a pipeline for feeding data to your model. The `tf.data.Dataset` API is used to build performant, complex input pipelines from simple, reusable pieces that will feed your model's training or evaluation loops.

### 4.1 Create a source Dataset

Create a *source* dataset using one of the factory functions like [`Dataset.from_tensors`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensors) and [`Dataset.from_tensor_slices`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensor_slices), or using objects that read from files like [`TextLineDataset`](https://www.tensorflow.org/api_docs/python/tf/data/TextLineDataset) or [`TFRecordDataset`](https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset). See the [TensorFlow Dataset guide](https://www.tensorflow.org/guide/datasets#reading_input_data) for more information.

```python
ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

# Create a small text file
import tempfile
_, filename = tempfile.mkstemp()

with open(filename, 'w') as f:
  f.write("""Line 1
Line 2
Line 3
  """)

ds_file = tf.data.TextLineDataset(filename)
```

### 4.2 Apply transformations

Use transformation functions like [`map`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map), [`batch`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch), and [`shuffle`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle) to apply transformations to dataset records.

```python
ds_tensors = ds_tensors.map(tf.square).shuffle(2).batch(2)

ds_file = ds_file.batch(2)
```
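
Because `shuffle(2)` makes the order above random, here is a deterministic sketch (dropping `shuffle` is our simplification) that shows what `map` followed by `batch` produces. Iterating with a list comprehension works because the dataset is a Python iterable in eager mode.

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])
ds = ds.map(tf.square).batch(2)  # square each element, then group in pairs

batches = [batch.numpy().tolist() for batch in ds]
print(batches)  # [[1, 4], [9, 16], [25, 36]]
```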

### 4.3 Iterate

`tf.data.Dataset` objects support iteration to loop over records:

```python
print('Elements of ds_tensors:')
for x in ds_tensors:
  print(x)

print('\nElements in ds_file:')
for x in ds_file:
  print(x)
```

```
Elements of ds_tensors:
tf.Tensor([1 9], shape=(2,), dtype=int32)
tf.Tensor([ 4 25], shape=(2,), dtype=int32)
tf.Tensor([16 36], shape=(2,), dtype=int32)

Elements in ds_file:
tf.Tensor([b'Line 1' b'Line 2'], shape=(2,), dtype=string)
tf.Tensor([b'Line 3' b'  '], shape=(2,), dtype=string)
```

> Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-basics.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-basics.html)
> English version: [https://tensorflow.google.cn/beta/tutorials/eager/basics](https://tensorflow.google.cn/beta/tutorials/eager/basics)


--------------------------------------------------------------------------------
/r2/tutorials/eager/custom_layers.md:
--------------------------------------------------------------------------------
---
title: Custom layers with Keras
categories: tensorflow2 official tutorials
tags: tensorflow2.0 tutorials
top: 1952
abbrlink: tensorflow/tf2-tutorials-eager-custom_layers
---

# Custom layers with Keras (TensorFlow 2.0 official tutorial translation)

We recommend using `tf.keras` as a high-level API for building neural networks. That said, most TensorFlow APIs are usable with eager execution.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
```

 19 | ## 1. 对图层的常用操作
 20 | 
 21 | 在编写机器学习模型的代码时，大多数情况下，您希望以比单个操作和单个变量操作更高的抽象级别上进行操作。
 22 | 
 23 | 许多机器学习模型都可以表示为相对简单的层的组合和叠加，TensorFlow提供了一组公共层和一种简单的方法，让您可以从头开始编写自己的特定于应用程序的层，也可以表示为现有层的组合。
 24 | 
 25 | TensorFlow在 `tf.keras` 中包含完整 [Keras](https://keras.io) API，而Keras层在构建自己的模型时非常有用。
 26 | 
 27 | 
 28 | ```python
 29 | # 在tf.keras.layers包中，图层是对象。要构造一个图层，只需构造一个对象。 
 30 | # 大多数层将输出维度/通道的数量作为第一个参数。 
 31 | layer = tf.keras.layers.Dense(100)
 32 | 
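 33 | # The number of input dimensions is often unnecessary, since it can be inferred the first time the layer is used,
 34 | # but you can provide it if you want to specify it manually, which is useful in some complex models.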
 35 | layer = tf.keras.layers.Dense(10, input_shape=(None, 5))
 36 | ```
 37 | 
 38 | The full list of pre-existing layers can be seen in [the documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers). It includes Dense (a fully-connected layer), Conv2D, LSTM, BatchNormalization, Dropout, and many others.
 39 | 
 40 | ```python
 41 | # To use a layer, simply call it.
 42 | layer(tf.zeros([10, 5]))
 43 | ```
 44 | 
 45 | 
 46 | ```python
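 47 | # Layers have many useful methods. For example, you can inspect all variables in a layer
 48 | # using `layer.variables` and the trainable ones using `layer.trainable_variables`.
 49 | # Here the fully-connected layer has variables for its weights (kernel) and biases.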
 50 | print(layer.variables) 
 51 | ```
 52 | 
 53 | ```python
 54 | # The variables are also accessible through convenient attribute accessors
 55 | print(layer.kernel, layer.bias)
 56 | ```
 57 | 
 58 | ## 2. Implementing custom layers with Keras
 59 | 
 60 | The best way to implement your own layer is to extend the `tf.keras.layers.Layer` class and implement:
 61 | 
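 62 |   * `__init__`, where you can do all input-independent initialization
 63 | 
 64 |   * `build`, where you know the shapes of the input tensors and can do the rest of the initialization
 65 | 
 66 |   * `call`, where you do the forward computation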
 67 | 
 68 | 
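 69 | Note that you don't have to wait until `build` is called to create your variables; you can also create them in `__init__`. However, the advantage of creating them in `build` is that it enables late variable creation based on the shape of the inputs the layer will operate on. Creating variables in `__init__`, on the other hand, means that the shapes required to create them must be specified explicitly.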
 70 | 
 71 | ```python
 72 | class MyDenseLayer(tf.keras.layers.Layer):
 73 |   def __init__(self, num_outputs):
 74 |     super(MyDenseLayer, self).__init__()
 75 |     self.num_outputs = num_outputs
 76 | 
 77 |   def build(self, input_shape):
 78 |     self.kernel = self.add_weight(name="kernel",
 79 |                                   shape=[int(input_shape[-1]),
 80 |                                          self.num_outputs])
 81 | 
 82 |   def call(self, inputs):
 83 |     return tf.matmul(inputs, self.kernel)
 84 | 
 85 | layer = MyDenseLayer(10)
 86 | print(layer(tf.zeros([10, 5])))
 87 | print(layer.trainable_variables)
 88 | ```
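Under the hood, the base `Layer` class calls `build` once, the first time the layer sees an input, and then calls `call`. The plain-Python sketch below imitates that protocol for a bias-free dense layer. It is an illustration under our own assumptions (the class name `LazyDense`, nested lists instead of tensors, uniform random initialization), not the actual Keras implementation.

```python
import random


class LazyDense:
    """Plain-Python imitation of a layer that creates its weights on first use."""

    def __init__(self, num_outputs):
        # Input-independent setup only: the input size is still unknown here.
        self.num_outputs = num_outputs
        self.kernel = None
        self.built = False

    def build(self, input_dim):
        # The input size is now known, so the kernel shape can be fixed.
        self.kernel = [[random.uniform(-0.1, 0.1) for _ in range(self.num_outputs)]
                       for _ in range(input_dim)]
        self.built = True

    def call(self, inputs):
        # Forward pass: (batch x input_dim) matrix times (input_dim x num_outputs) kernel.
        return [[sum(x * self.kernel[i][j] for i, x in enumerate(row))
                 for j in range(self.num_outputs)]
                for row in inputs]

    def __call__(self, inputs):
        # Build lazily on the first call, then run the forward pass.
        if not self.built:
            self.build(len(inputs[0]))
        return self.call(inputs)


layer = LazyDense(10)
outputs = layer([[0.0] * 5 for _ in range(10)])
print(len(layer.kernel), len(layer.kernel[0]))  # 5 10
print(len(outputs), len(outputs[0]))            # 10 10
```

Only `num_outputs` is needed at construction time; the kernel's first dimension comes from the first input, which is exactly the "late variable creation" described above.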
 89 | 
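 90 | Overall code is easier to read and maintain if it uses standard layers whenever possible, since other readers will be familiar with the behavior of standard layers. If you want to use a layer that is not present in `tf.keras.layers`, consider filing a [GitHub issue](http://github.com/tensorflow/tensorflow/issues/new) or, even better, sending us a pull request!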
 91 | 
 92 | 
 93 | ## 3. Models: composing layers
 94 | 
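 95 | Many interesting layer-like things in machine learning models are implemented by composing existing layers. For example, each residual block in a ResNet is a composition of convolutions, batch normalizations, and a shortcut connection.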
 96 | 
 97 | The main class used when creating a layer-like thing that contains other layers is `tf.keras.Model`. Implementing one is done by inheriting from `tf.keras.Model`.
 98 | 
 99 | ```python
100 | class ResnetIdentityBlock(tf.keras.Model):
101 |   def __init__(self, kernel_size, filters):
102 |     super(ResnetIdentityBlock, self).__init__(name='')
103 |     filters1, filters2, filters3 = filters
104 | 
105 |     self.conv2a = tf.keras.layers.Conv2D(filters1, (1, 1))
106 |     self.bn2a = tf.keras.layers.BatchNormalization()
107 | 
108 |     self.conv2b = tf.keras.layers.Conv2D(filters2, kernel_size, padding='same')
109 |     self.bn2b = tf.keras.layers.BatchNormalization()
110 | 
111 |     self.conv2c = tf.keras.layers.Conv2D(filters3, (1, 1))
112 |     self.bn2c = tf.keras.layers.BatchNormalization()
113 | 
114 |   def call(self, input_tensor, training=False):
115 |     x = self.conv2a(input_tensor)
116 |     x = self.bn2a(x, training=training)
117 |     x = tf.nn.relu(x)
118 | 
119 |     x = self.conv2b(x)
120 |     x = self.bn2b(x, training=training)
121 |     x = tf.nn.relu(x)
122 | 
123 |     x = self.conv2c(x)
124 |     x = self.bn2c(x, training=training)
125 | 
126 |     x += input_tensor
127 |     return tf.nn.relu(x)
128 | 
129 | 
130 | block = ResnetIdentityBlock(1, [1, 2, 3])
131 | print(block(tf.zeros([1, 2, 3, 3])))
132 | print([x.name for x in block.trainable_variables])
133 | ```
134 | 
135 | ```
136 |       tf.Tensor( [[[[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]] [[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]]]], shape=(1, 2, 3, 3), dtype=float32)
137 |       ['resnet_identity_block/conv2d/kernel:0', 'resnet_identity_block/conv2d/bias:0',
138 |       'resnet_identity_block/batch_normalization_v2/gamma:0', 'resnet_identity_block/batch_normalization_v2/beta:0',
139 |       'resnet_identity_block/conv2d_1/kernel:0', 'resnet_identity_block/conv2d_1/bias:0',
140 |       'resnet_identity_block/batch_normalization_v2_1/gamma:0', 'resnet_identity_block/batch_normalization_v2_1/beta:0',
141 |       'resnet_identity_block/conv2d_2/kernel:0', 'resnet_identity_block/conv2d_2/bias:0',
142 |       'resnet_identity_block/batch_normalization_v2_2/gamma:0', 'resnet_identity_block/batch_normalization_v2_2/beta:0']
143 | ```
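The shortcut `x += input_tensor` only works if the block's output has exactly the same shape as its input, which is why the last convolution above has 3 filters to match the 3 input channels of `tf.zeros([1, 2, 3, 3])`. A plain-Python sketch of the same `relu(f(x) + x)` pattern, where the hypothetical `transform` function stands in for the convolution and batch-normalization stack:

```python
def relu(values):
    return [max(0.0, v) for v in values]


def identity_block(x, transform):
    """Residual (identity) block: relu(transform(x) + x)."""
    y = transform(x)
    if len(y) != len(x):
        # The shortcut adds x element-wise, so the shapes must match.
        raise ValueError("transform must preserve the shape of its input")
    return relu([a + b for a, b in zip(y, x)])


def scale_and_shift(values):
    # Stand-in for the conv + batch-norm stack inside a real block.
    return [2.0 * v - 1.0 for v in values]


print(identity_block([1.0, -2.0, 0.5], scale_and_shift))  # [2.0, 0.0, 0.5]
```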
144 | 
145 | Much of the time, however, models that compose many layers simply call one layer after the other. This can be done in very little code using `tf.keras.Sequential`:
146 | 
147 | ```python
148 | my_seq = tf.keras.Sequential([tf.keras.layers.Conv2D(1, (1, 1),
149 |                                                     input_shape=(
150 |                                                         None, None, 3)),
151 |                              tf.keras.layers.BatchNormalization(),
152 |                              tf.keras.layers.Conv2D(2, 1,
153 |                                                     padding='same'),
154 |                              tf.keras.layers.BatchNormalization(),
155 |                              tf.keras.layers.Conv2D(3, (1, 1)),
156 |                              tf.keras.layers.BatchNormalization()])
157 | my_seq(tf.zeros([1, 2, 3, 3]))
158 | ```
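Conceptually, `tf.keras.Sequential` feeds the output of each layer into the next one. A minimal plain-Python sketch of that chaining (the `SimpleSequential` class and the lambda "layers" are illustrative assumptions; the real class also tracks variables, shapes, and the `training` flag):

```python
class SimpleSequential:
    """Calls each layer on the output of the previous one."""

    def __init__(self, layers):
        self.layers = list(layers)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


model = SimpleSequential([
    lambda x: x + 1,  # first "layer"
    lambda x: x * 3,  # second "layer"
])
print(model(2))  # 9
```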
159 | 
160 | ## 4. Next steps
161 | 
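162 | Now you can go back to the previous tutorial and adapt the linear regression example to use layers and models, so that it is better structured.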
163 | 
164 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_layers.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_layers.html)
165 | > English version: [https://tensorflow.google.cn/beta/tutorials/eager/custom_layers](https://tensorflow.google.cn/beta/tutorials/eager/custom_layers)
166 | > Translation suggestions (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_layers.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_layers.md)


--------------------------------------------------------------------------------
/r2/tutorials/eager/custom_training.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Custom training basics for TensorFlow 2.0 models
  3 | categories: tensorflow2官方教程
  4 | tags: tensorflow2.0教程
  5 | top: 1954
  6 | abbrlink: tensorflow/tf2-tutorials-eager-custom_training
  7 | ---
  8 | 
  9 | # Custom training basics for TensorFlow 2.0 models (translated TensorFlow 2.0 official tutorial)
 10 | 
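 11 | In the previous tutorial we covered the TensorFlow APIs for automatic differentiation, a basic building block for machine learning. In this tutorial we will use the TensorFlow primitives introduced in the prior tutorials to do some simple machine learning.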
 12 | 
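 13 | TensorFlow also includes a higher-level neural networks API (`tf.keras`), which provides useful abstractions to reduce boilerplate. We strongly recommend these higher-level APIs for people working with neural networks.
 14 | However, in this short tutorial we cover neural network training from first principles to establish a strong foundation.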
 15 | 
 16 | ## 1. Setup
 17 | 
 18 | ```python
 19 | from __future__ import absolute_import, division, print_function, unicode_literals
 20 | 
 21 | import tensorflow as tf
 22 | ```
 23 | 
 24 | ## 2. Variables
 25 | 
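 26 | Tensors in TensorFlow are immutable, stateless objects. Machine learning models, however, need changing state: as your model trains, the same code that computes predictions should behave differently over time (hopefully with a lower loss!). To represent state that needs to change over the course of your computation, you can choose to rely on the fact that Python is a stateful programming language: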
 27 | 
 28 | ```python
 29 | # Using python state
 30 | x = tf.zeros([10, 10])
 31 | x += 2  # This is equivalent to x = x + 2, which does not mutate the original
 32 |         # value of x
 33 | print(x)
 34 | ```
 35 | 
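 36 | However, TensorFlow has stateful operations built in, and these are often more pleasant to use than low-level Python representations of your state. To represent the weights in a model, for example, it is often convenient and efficient to use TensorFlow variables.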
 37 | 
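 38 | A Variable is an object that stores a value and, when used in a TensorFlow computation, implicitly reads from this stored value. There are operations (`tf.Variable.assign_sub`, `tf.Variable.scatter_update`, etc.) that manipulate the value stored in a TensorFlow variable.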
 39 | 
 40 | ```python
 41 | v = tf.Variable(1.0)
 42 | assert v.numpy() == 1.0
 43 | 
 44 | # Re-assign the value
 45 | v.assign(3.0)
 46 | assert v.numpy() == 3.0
 47 | 
 48 | # Use `v` in a TensorFlow operation like tf.square() and reassign
 49 | v.assign(tf.square(v))
 50 | assert v.numpy() == 9.0
 51 | ```
 52 | 
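 53 | Computations that use Variables are automatically traced when computing gradients. For Variables that represent embeddings, TensorFlow does sparse updates by default, which are more computation- and memory-efficient.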
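To make the sparse-update idea concrete: when a batch looks up only a few rows of an embedding table, only those rows receive a nonzero gradient, so only those rows need to be written. The plain-Python sketch below (hypothetical helper `sparse_sgd_update`, lists instead of tensors) applies an SGD step to just the looked-up rows; in TensorFlow, the gradient of such a lookup is typically represented as a `tf.IndexedSlices`.

```python
def sparse_sgd_update(table, indices, row_grads, learning_rate):
    """Apply an SGD step only to the rows listed in `indices` (in place)."""
    for idx, grad in zip(indices, row_grads):
        table[idx] = [w - learning_rate * g for w, g in zip(table[idx], grad)]
    return table


table = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
sparse_sgd_update(table, indices=[1, 3],
                  row_grads=[[1.0, 0.0], [0.0, 2.0]], learning_rate=0.5)
print(table)  # [[1.0, 1.0], [1.5, 2.0], [3.0, 3.0], [4.0, 3.0]]
```

Rows 0 and 2 are never touched, which is where the savings come from when the table has millions of rows.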
 54 | 
 55 | Using Variables is also a way to quickly let a reader of your code know that this piece of state is mutable.
 56 | 
 57 | 
 58 | ## 3. Example: fitting a linear model
 59 | 
 60 | Let's now put together the few concepts we have so far (`Tensor`, `GradientTape`, `Variable`) to build and train a simple model. This typically involves a few steps:
 61 | 
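 62 | 1. Define the model.
 63 | 
 64 | 2. Define a loss function.
 65 | 
 66 | 3. Obtain training data.
 67 | 
 68 | 4. Run through the training data and use an "optimizer" to adjust the variables to fit the data.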
 69 | 
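 70 | In this tutorial, we'll walk through a trivial example of a simple linear model, `f(x) = x * W + b`, which has two variables: `W` and `b`. Furthermore, we'll synthesize data such that a well-trained model would have `W = 3.0` and `b = 2.0`.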
 71 | 
 72 | ### 3.1. Define the model
 73 | 
 74 | Let's define a simple class to encapsulate the variables and the computation:
 75 | 
 76 | ```python
 77 | class Model(object):
 78 |   def __init__(self):
 79 |     # Initialize variable to (5.0, 0.0)
 80 |     # In practice, these should be initialized to random values.
 81 |     self.W = tf.Variable(5.0)
 82 |     self.b = tf.Variable(0.0)
 83 | 
 84 |   def __call__(self, x):
 85 |     return self.W * x + self.b
 86 | 
 87 | model = Model()
 88 | 
 89 | assert model(3.0).numpy() == 15.0
 90 | ```
 91 | 
 92 | ### 3.2. Define a loss function
 93 | 
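 94 | A loss function measures how well the output of a model for a given input matches the desired output. Let's use the standard L2 loss (mean squared error):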
 95 | 
 96 | ```python
 97 | def loss(predicted_y, desired_y):
 98 |   return tf.reduce_mean(tf.square(predicted_y - desired_y))
 99 | ```
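For reference, with the model $f(x) = W x + b$, the loss defined above and its partial derivatives are shown below, where $x_i$ are the inputs, $y_i$ the desired outputs, and $N$ the number of examples. These derivatives are what `GradientTape` computes for us in the training loop later on.

$$
L(W, b) = \frac{1}{N}\sum_{i=1}^{N}\left(W x_i + b - y_i\right)^2
$$

$$
\frac{\partial L}{\partial W} = \frac{2}{N}\sum_{i=1}^{N}\left(W x_i + b - y_i\right)x_i,
\qquad
\frac{\partial L}{\partial b} = \frac{2}{N}\sum_{i=1}^{N}\left(W x_i + b - y_i\right)
$$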
100 | 
101 | ### 3.3. Obtain training data
102 | 
103 | Let's synthesize the training data with some noise:
104 | 
105 | ```python
106 | TRUE_W = 3.0
107 | TRUE_b = 2.0
108 | NUM_EXAMPLES = 1000
109 | 
110 | inputs  = tf.random.normal(shape=[NUM_EXAMPLES])
111 | noise   = tf.random.normal(shape=[NUM_EXAMPLES])
112 | outputs = inputs * TRUE_W + TRUE_b + noise
113 | ```
114 | 
115 | Before we train the model, let's visualize where the model stands right now. We'll plot the model's predictions in red and the training data in blue.
116 | 
117 | ```python
118 | import matplotlib.pyplot as plt
119 | 
120 | plt.scatter(inputs, outputs, c='b')
121 | plt.scatter(inputs, model(inputs), c='r')
122 | plt.show()
123 | 
124 | print('Current loss:',
125 |       loss(model(inputs), outputs).numpy())
126 | ```
127 | 
128 | ### 3.4. Define a training loop
129 | 
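130 | We now have our network and our training data. Let's train it, i.e., use the training data to update the model's variables (`W` and `b`) so that the loss goes down, using gradient descent. There are many variants of the gradient descent scheme implemented in `tf.keras.optimizers`. We highly recommend using those implementations, but in the spirit of building from first principles, in this particular example we will implement the basic math ourselves.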
131 | 
132 | ```python
133 | def train(model, inputs, outputs, learning_rate):
134 |   with tf.GradientTape() as t:
135 |     current_loss = loss(model(inputs), outputs)
136 |   dW, db = t.gradient(current_loss, [model.W, model.b])
137 |   model.W.assign_sub(learning_rate * dW)
138 |   model.b.assign_sub(learning_rate * db)
139 | ```
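To see the math that `GradientTape` automates, the following plain-Python sketch (no TensorFlow; the seeded `random.gauss` data and the helper names `mse`, `gradients`, and `train_step` are our own assumptions) runs the same update rule with the closed-form gradients of the mean squared error. Because the data are generated differently, the numbers will not match the TensorFlow run exactly, but `W` and `b` should move toward 3.0 and 2.0 in the same way.

```python
import random

random.seed(0)
TRUE_W, TRUE_b = 3.0, 2.0
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
ys = [x * TRUE_W + TRUE_b + random.gauss(0.0, 1.0) for x in xs]


def mse(W, b):
    # Same quantity as loss(model(inputs), outputs) in the tutorial.
    return sum((W * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(W, b):
    # Closed-form partial derivatives of the mean squared error.
    n = len(xs)
    residuals = [W * x + b - y for x, y in zip(xs, ys)]
    dW = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
    db = 2.0 / n * sum(residuals)
    return dW, db


def train_step(W, b, learning_rate):
    # Plays the role of model.W.assign_sub(learning_rate * dW), etc.
    dW, db = gradients(W, b)
    return W - learning_rate * dW, b - learning_rate * db


W, b = 5.0, 0.0  # same starting point as Model()
initial_loss = mse(W, b)
for epoch in range(10):
    W, b = train_step(W, b, learning_rate=0.1)
print('W=%1.2f b=%1.2f loss=%2.5f' % (W, b, mse(W, b)))
```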
140 | 
141 | Finally, let's repeatedly run through the training data and see how `W` and `b` evolve.
142 | 
143 | ```python
144 | model = Model()
145 | 
146 | # Collect the history of W-values and b-values to plot later
147 | Ws, bs = [], []
148 | epochs = range(10)
149 | for epoch in epochs:
150 |   Ws.append(model.W.numpy())
151 |   bs.append(model.b.numpy())
152 |   current_loss = loss(model(inputs), outputs)
153 | 
154 |   train(model, inputs, outputs, learning_rate=0.1)
155 |   print('Epoch %2d: W=%1.2f b=%1.2f, loss=%2.5f' %
156 |         (epoch, Ws[-1], bs[-1], current_loss))
157 | 
158 | # Let's plot it all
159 | plt.plot(epochs, Ws, 'r',
160 |          epochs, bs, 'b')
161 | plt.plot([TRUE_W] * len(epochs), 'r--',
162 |          [TRUE_b] * len(epochs), 'b--')
163 | plt.legend(['W', 'b', 'true W', 'true_b'])
164 | plt.show()
165 | 
166 | ```
167 | 
168 | ```
169 |       Epoch 0: W=5.00 b=0.00, loss=9.34552 
170 |       ...
171 |       Epoch 9: W=3.22 b=1.74, loss=1.14022
172 | ```
173 | 
174 | ![png](https://tensorflow.google.cn/beta/tutorials/eager/custom_training_files/output_22_1.png)
175 | 
176 | 
177 | ## 4. Next steps
178 | 
179 | In this tutorial we covered `Variable`s, and built and trained a simple linear model using the TensorFlow primitives discussed so far.
180 | 
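181 | In theory, this is pretty much all you need to use TensorFlow for your machine learning research. In practice, particularly for neural networks, higher-level APIs like `tf.keras` are much more convenient, since they provide higher-level building blocks (called "layers"), utilities to save and restore state, a suite of loss functions, a suite of optimization strategies, and more.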
182 | 
183 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_training.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_training.html)
184 | > English version: [https://tensorflow.google.cn/beta/tutorials/eager/custom_training](https://tensorflow.google.cn/beta/tutorials/eager/custom_training)
185 | > Translation suggestions (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_training.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_training.md)
186 | 
187 | 
188 | 


--------------------------------------------------------------------------------
/r2/tutorials/estimators/linear.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Build a linear model with Estimators
  3 | tags: 
  4 |     - tensorflow2.0
  5 | categories: 
  6 |     - tensorflow2官方教程
  7 | top: 1929
  8 | abbrlink: tensorflow/tf2-tutorials-estimators-linear
  9 | ---
 10 | 
 11 | # Build a linear model with Estimators
 12 | 
 13 | ## 1. Overview
 14 | 
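 15 | This end-to-end walkthrough trains a logistic regression model using the `tf.estimator` API. The model is often used as a baseline for other, more complex, algorithms.
 16 | Estimators are TensorFlow's most scalable and production-oriented model type. For more information, see the [Estimator guide](https://www.tensorflow.org/guide/estimators).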
 17 | 
 18 | ## 2. Setup and imports
 19 | 
 20 | Install scikit-learn (imported as `sklearn`): `pip install scikit-learn`
 21 | 
 22 | ```python
 23 | from __future__ import absolute_import, division, print_function, unicode_literals
 24 | 
 25 | import os
 26 | import sys
 27 | 
 28 | import numpy as np
 29 | import pandas as pd
 30 | import matplotlib.pyplot as plt
 31 | from IPython.display import clear_output
 32 | from six.moves import urllib
 33 | ```
 34 | 
 35 | ## 3. Load the Titanic dataset
 36 | 
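 37 | You will use the Titanic dataset with the (rather morbid) goal of predicting passenger survival, given characteristics such as sex, age, class, etc.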
 38 | 
 39 | ```python
 40 | import tensorflow.compat.v2.feature_column as fc
 41 | 
 42 | import tensorflow as tf
 43 | 
 44 | # Load the dataset
 45 | dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
 46 | dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')
 47 | y_train = dftrain.pop('survived')
 48 | y_eval = dfeval.pop('survived')
 49 | ```
 50 | 
 51 | ## 4. Explore the data
 52 | 
 53 | The dataset contains the following features:
 54 | 
 55 | ```python
 56 | dftrain.head()
 57 | ```
 58 | 
 59 | |   | sex    | age  | n_siblings_spouses | parch | fare    | class | deck    | embark_town | alone |
 60 | |---|--------|------|--------------------|-------|---------|-------|---------|-------------|-------|
 61 | | 0 | male   | 22.0 | 1                  | 0     | 7.2500  | Third | unknown | Southampton | n     |
 62 | | 1 | female | 38.0 | 1                  | 0     | 71.2833 | First | C       | Cherbourg   | n     |
 63 | | 2 | female | 26.0 | 0                  | 0     | 7.9250  | Third | unknown | Southampton | y     |
 64 | | 3 | female | 35.0 | 1                  | 0     | 53.1000 | First | C       | Southampton | n     |
 65 | | 4 | male   | 28.0 | 0                  | 0     | 8.4583  | Third | unknown | Queenstown  | y     |
 66 | 
 67 | 
 68 | ```python
 69 | dftrain.describe()
 70 | ```
 71 | 
 72 | |       | age        | n_siblings_spouses | parch      | fare       |
 73 | |-------|------------|--------------------|------------|------------|
 74 | | count | 627.000000 | 627.000000         | 627.000000 | 627.000000 |
 75 | | mean  | 29.631308  | 0.545455           | 0.379585   | 34.385399  |
 76 | | std   | 12.511818  | 1.151090           | 0.792999   | 54.597730  |
 77 | | min   | 0.750000   | 0.000000           | 0.000000   | 0.000000   |
 78 | | 25%   | 23.000000  | 0.000000           | 0.000000   | 7.895800   |
 79 | | 50%   | 28.000000  | 0.000000           | 0.000000   | 15.045800  |
 80 | | 75%   | 35.000000  | 1.000000           | 0.000000   | 31.387500  |
 81 | | max   | 80.000000  | 8.000000           | 5.000000   | 512.329200 |
 82 | 
 83 | 
 84 | The training and evaluation sets have 627 and 264 examples, respectively:
 85 | 
 86 | ```python
 87 | dftrain.shape[0], dfeval.shape[0]
 88 | ```
 89 | 
 90 | ```
 91 |       (627, 264)
 92 | ```
 93 | 
 94 | The majority of passengers are in their 20s and 30s:
 95 | 
 96 | ```python
 97 | dftrain.age.hist(bins=20)
 98 | ```
 99 | 
100 | ![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_15_1.png)
101 | 
102 | 
103 | There are approximately twice as many male passengers as female passengers aboard.
104 | 
105 | ```python
106 | dftrain.sex.value_counts().plot(kind='barh')
107 | ```
108 | 
109 | ![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_17_1.png)
110 | 
111 | 
112 | The majority of passengers were in the "Third" class:
113 | 
114 | ```python
115 | dftrain['class'].value_counts().plot(kind='barh')
116 | ```
117 | 
118 | ![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_19_1.png)
119 | 
120 | 
121 | Females have a much higher chance of surviving than males. This is clearly a predictive feature for the model:
122 | 
123 | ```python
124 | pd.concat([dftrain, y_train], axis=1).groupby('sex').survived.mean().plot(kind='barh').set_xlabel('% survive')
125 | ```
126 | 
127 | ![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_21_1.png)
128 | 
129 | 
130 | ## 5. Feature engineering for the model
131 | 
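132 | Estimators use a system called [feature columns](https://www.tensorflow.org/guide/feature_columns) to describe how the model should interpret each of the raw input features. An Estimator expects a vector of numeric inputs, and feature columns describe how the model should convert each feature.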
133 | 
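134 | Selecting and crafting the right set of feature columns is key to learning an effective model. A feature column can be either one of the raw inputs in the original features `dict` (a *base feature column*), or any new column created using transformations defined over one or more base columns (a *derived feature column*).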
135 | 
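136 | The linear estimator uses both numeric and categorical features. Feature columns work with all TensorFlow estimators, and their purpose is to define the features used for modeling. Additionally, they provide some feature-engineering capabilities such as one-hot encoding, normalization, and bucketization.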
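As an illustration of bucketization, the plain-Python sketch below turns a numeric age into a one-hot vector over buckets, following the left-closed convention documented for `tf.feature_column.bucketized_column` (boundaries `[a, b]` give the buckets `(-inf, a)`, `[a, b)`, `[b, +inf)`). The helper `bucketize` and the `AGE_BOUNDARIES` values are hypothetical choices for this example, not something the tutorial uses.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """One-hot vector marking the bucket that `value` falls into."""
    one_hot = [0.0] * (len(boundaries) + 1)
    # bisect_right puts a value equal to a boundary into the bucket that starts there.
    one_hot[bisect_right(boundaries, value)] = 1.0
    return one_hot


AGE_BOUNDARIES = [18, 25, 30, 35, 40, 50, 65]  # hypothetical boundaries
print(bucketize(22.0, AGE_BOUNDARIES))  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Bucketizing lets a linear model learn a separate weight per age range instead of a single slope for age.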
137 | 
138 | 
139 | ### 5.1. Base feature columns
140 | 
141 | ```python
142 | CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
143 |                        'embark_town', 'alone']
144 | NUMERIC_COLUMNS = ['age', 'fare']
145 | 
146 | feature_columns = []
147 | for feature_name in CATEGORICAL_COLUMNS:
148 |   vocabulary = dftrain[feature_name].unique()
149 |   feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))
150 | 
151 | for feature_name in NUMERIC_COLUMNS:
152 |   feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))
153 | ```
154 | 
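155 | The `input_function` specifies how data is converted into a `tf.data.Dataset` that feeds the input pipeline in a streaming fashion. `tf.data.Dataset` can take in multiple sources, such as a DataFrame, a CSV-formatted file, and more.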
156 | 
157 | ```python
158 | def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
159 |   def input_function():
160 |     ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
161 |     if shuffle:
162 |       ds = ds.shuffle(1000)
163 |     ds = ds.batch(batch_size).repeat(num_epochs)
164 |     return ds
165 |   return input_function
166 | 
167 | train_input_fn = make_input_fn(dftrain, y_train)
168 | eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)
169 | ```
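Because `make_input_fn` calls `batch` before `repeat`, each epoch ends with a smaller final batch, and the number of training steps is `ceil(627 / 32) * 10 = 200`, which matches the `'global_step': 200` reported by `evaluate` below. The plain-Python generator here (the name `batch_then_repeat` is ours) mimics only that batching order, not the shuffling:

```python
import math


def batch_then_repeat(examples, batch_size, num_epochs):
    """Yield batches like ds.batch(batch_size).repeat(num_epochs)."""
    for _ in range(num_epochs):
        for start in range(0, len(examples), batch_size):
            yield examples[start:start + batch_size]


train_batches = list(batch_then_repeat(list(range(627)), batch_size=32, num_epochs=10))
print(len(train_batches))          # 200
print(math.ceil(627 / 32) * 10)    # 200
print(len(train_batches[19]))      # 19 (the last, partial batch of each epoch)

# Repeating first and batching afterwards would give ceil(6270 / 32) batches instead.
print(math.ceil(627 * 10 / 32))    # 196
```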
170 | 
171 | Inspect the dataset:
172 | 
173 | ```python
174 | ds = make_input_fn(dftrain, y_train, batch_size=10)()
175 | for feature_batch, label_batch in ds.take(1):
176 |   print('Some feature keys:', list(feature_batch.keys()))
177 |   print()
178 |   print('A batch of class:', feature_batch['class'].numpy())
179 |   print()
180 |   print('A batch of Labels:', label_batch.numpy())
181 | ```
182 | 
183 | You can also inspect the result of a specific feature column using the `tf.keras.layers.DenseFeatures` layer:
184 | 
185 | ```python
186 | age_column = feature_columns[7]
187 | tf.keras.layers.DenseFeatures([age_column])(feature_batch).numpy()
188 | ```
189 | 
190 | ```
191 |       array([[38.],
192 |              [39.],
193 |              [28.],
194 |              [28.],
195 |              [36.],
196 |              [71.],
197 |              [24.],
198 |              [47.],
199 |              [23.],
200 |              [28.]], dtype=float32)
201 | ```
202 | 
203 | `DenseFeatures` only accepts dense tensors; to inspect a categorical column, you first need to transform it into an indicator column:
204 | 
205 | ```python
206 | gender_column = feature_columns[0]
207 | tf.keras.layers.DenseFeatures([tf.feature_column.indicator_column(gender_column)])(feature_batch).numpy()
208 | ```
209 | 
210 | ```
211 |       array([[0., 1.],
212 |              [0., 1.],
213 |              [1., 0.],
214 |              [0., 1.],
215 |              [1., 0.],
216 |              [1., 0.],
217 |              [1., 0.],
218 |              [1., 0.],
219 |              [1., 0.],
220 |              [0., 1.]], dtype=float32)
221 | ```       
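What these two columns compute can be sketched in plain Python: `categorical_column_with_vocabulary_list` maps a string to its index in the vocabulary (with the default settings, out-of-vocabulary values map to `-1`), and `indicator_column` turns that index into a one-hot vector. The helper names and the `['male', 'female']` vocabulary order (the order in which `unique()` would meet the values in the first rows shown earlier) are assumptions for illustration. Treating `-1` as an all-zero row reflects our understanding of `indicator_column`; check the API reference before relying on it.

```python
def vocab_lookup(value, vocabulary, default_value=-1):
    """Index of `value` in `vocabulary`, or `default_value` if it is missing."""
    return vocabulary.index(value) if value in vocabulary else default_value


def indicator(value, vocabulary):
    """One-hot encoding of a vocabulary lookup; unknown values give all zeros."""
    one_hot = [0.0] * len(vocabulary)
    idx = vocab_lookup(value, vocabulary)
    if idx >= 0:
        one_hot[idx] = 1.0
    return one_hot


SEX_VOCAB = ['male', 'female']
print([indicator(v, SEX_VOCAB) for v in ['female', 'male', 'unknown']])
# [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
```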
222 | 
223 | After adding all the base features to the model, let's train the model. Training a model is just a single command using the `tf.estimator` API:
224 | 
225 | ```python
226 | linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
227 | linear_est.train(train_input_fn)
228 | result = linear_est.evaluate(eval_input_fn)
229 | 
230 | clear_output()
231 | print(result)
232 | ```
233 | 
234 | ```
235 |         {'accuracy_baseline': 0.625, 'auc': 0.83722067, 'accuracy': 0.7462121, 'recall': 0.6666667, 'global_step': 200, 'prediction/mean': 0.38311505, 'average_loss': 0.47361037, 'precision': 0.66, 'auc_precision_recall': 0.7851523, 'loss': 0.46608958, 'label/mean': 0.375}
236 | ```
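Two of these numbers can be checked by hand. `label/mean` is the fraction of positive labels (survivors) in the evaluation set, and `accuracy_baseline` is, to our understanding, the accuracy of always predicting the majority class, i.e. `max(label_mean, 1 - label_mean)`. A quick sketch of that arithmetic (the function name is ours):

```python
def accuracy_baseline(label_mean):
    # Accuracy of a classifier that always predicts the more frequent class.
    return max(label_mean, 1.0 - label_mean)


label_mean = 0.375  # 'label/mean' from the result above
num_eval = 264      # size of the evaluation set
print(label_mean * num_eval)          # 99.0 survivors in the evaluation set
print(accuracy_baseline(label_mean))  # 0.625
```

So the model's 0.746 accuracy should be read against a 0.625 floor, not against 0.5.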
237 | 
238 | ### 5.2. Derived feature columns
239 | 
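240 | Now you have reached an accuracy of 75%. Using each base feature column separately may not be enough to explain the data. For example, the correlation between age and the label may differ by gender. Therefore, if you only learn a single model weight for `gender="Male"` and `gender="Female"`, you won't capture every age-gender combination (e.g., distinguishing between `gender="Male"` AND `age="30"` and `gender="Male"` AND `age="40"`).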
241 | 
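242 | To learn the differences between feature combinations, you can add crossed feature columns to the model (you can also bucketize the age column before crossing it):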
243 | 
244 | ```python
245 | age_x_gender = tf.feature_column.crossed_column(['age', 'sex'], hash_bucket_size=100)
246 | ```
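Conceptually, a crossed column treats each combination of values (for example age 30 together with sex `male`) as its own category and hashes it into one of `hash_bucket_size` buckets, so different combinations can collide. The sketch below uses MD5 from `hashlib` and a `_X_` separator purely as stand-ins; TensorFlow uses its own fingerprint function, so these bucket ids will not match the ones `crossed_column` produces.

```python
import hashlib


def crossed_bucket(values, hash_bucket_size):
    """Hash a combination of feature values into one of hash_bucket_size buckets."""
    key = "_X_".join(str(v) for v in values).encode("utf-8")
    # MD5 is only a stand-in for TensorFlow's internal fingerprinting.
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return digest % hash_bucket_size


print(crossed_bucket([30.0, 'male'], 100))
print(crossed_bucket([40.0, 'male'], 100))

# 101 distinct combinations cannot fit into 100 buckets without a collision.
buckets = {crossed_bucket([age, 'male'], 100) for age in range(101)}
print(len(buckets) < 101)  # True
```

This is the trade-off behind `hash_bucket_size=100`: a larger value reduces collisions at the cost of more weights.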
247 | 
248 | After adding the crossed feature to the model, let's train the model again:
249 | 
250 | ```python
251 | derived_feature_columns = [age_x_gender]
252 | linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns+derived_feature_columns)
253 | linear_est.train(train_input_fn)
254 | result = linear_est.evaluate(eval_input_fn)
255 | 
256 | clear_output()
257 | print(result)
258 | ```
259 | 
260 | ```
261 |       {'accuracy_baseline': 0.625, 'auc': 0.8424855, 'accuracy': 0.7689394, 'recall': 0.6060606, 'global_step': 200, 'prediction/mean': 0.30415845, 'average_loss': 0.49316654, 'precision': 0.73170733, 'auc_precision_recall': 0.7732599, 'loss': 0.48306185, 'label/mean': 0.375}
262 | ```      
263 | 
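264 | It now achieves an accuracy of 76.9%, slightly better than the 74.6% obtained with the base features alone. You can try using more features and transformations to see whether you can do even better.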
265 | 
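266 | Now you can use the trained model to make predictions on passengers from the evaluation set. TensorFlow models are optimized to make predictions on a batch, or collection, of examples at once. Earlier, `eval_input_fn` was defined using the entire evaluation set.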
267 | 
268 | ```python
269 | pred_dicts = list(linear_est.predict(eval_input_fn))
270 | probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])
271 | 
272 | probs.plot(kind='hist', bins=20, title='predicted probabilities')
273 | ```
274 | 
275 | ![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_42_1.png)
276 | 
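277 | Finally, look at the receiver operating characteristic (ROC) curve of the results, which gives a better idea of the tradeoff between the true positive rate and the false positive rate.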
278 | 
279 | ```python
280 | from sklearn.metrics import roc_curve
281 | from matplotlib import pyplot as plt
282 | 
283 | fpr, tpr, _ = roc_curve(y_eval, probs)
284 | plt.plot(fpr, tpr)
285 | plt.title('ROC curve')
286 | plt.xlabel('false positive rate')
287 | plt.ylabel('true positive rate')
288 | plt.xlim(0,)
289 | plt.ylim(0,)
290 | ```
291 | 
292 | `(0, 1.05)`
293 | 
294 | ![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_44_1.png)
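`roc_curve` sweeps a decision threshold over the predicted probabilities and records, at each threshold, the false positive rate FP / (FP + TN) and the true positive rate TP / (TP + FN). Below is a plain-Python sketch of that sweep, plus the area under the curve by the trapezoidal rule, on a tiny hand-made example. The helper names and data are ours, and scikit-learn may drop some intermediate points that this version keeps.

```python
def roc_points(labels, scores):
    """(fpr, tpr) lists obtained by lowering the threshold one score at a time."""
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / num_neg)
        tpr.append(tp / num_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
               for k in range(1, len(fpr)))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(labels, scores)
print(fpr)            # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)            # [0.0, 0.5, 0.5, 1.0, 1.0]
print(auc(fpr, tpr))  # 0.75
```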
295 | 
296 | 
297 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-estimators-linear.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-estimators-linear.html)
298 | > English version: [https://tensorflow.google.cn/beta/tutorials/estimators/linear](https://tensorflow.google.cn/beta/tutorials/estimators/linear)
299 | > Translation suggestions (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/estimators/linear.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/estimators/linear.md)


--------------------------------------------------------------------------------
/r2/tutorials/images/hub_with_keras.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Transfer learning with TensorFlow Hub and Keras
  3 | tags: tensorflow2.0教程
  4 | categories: tensorflow2官方教程
  5 | top: 1922
  6 | abbrlink: tensorflow/tf2-tutorials-images-hub_with_keras
  7 | ---
  8 | 
  9 | # Transfer learning with TensorFlow Hub and Keras (translated TensorFlow 2.0 official tutorial)
 10 | 
 11 | [TensorFlow Hub](http://tensorflow.google.cn/hub) is a way to share pretrained model components.
 12 | 
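 13 | > TensorFlow Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models. In particular, it provides pre-trained TensorFlow models that can be reused on new tasks. (Think of it as transfer learning: you can train a model with a smaller dataset, improve generalization, and speed up training.) GitHub: [https://github.com/tensorflow/hub](https://github.com/tensorflow/hub)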
 14 | 
 15 | See the [TensorFlow Module Hub](https://tfhub.dev/) for a searchable listing of pre-trained models.
 16 | 
 17 | This tutorial demonstrates:
 18 | 1. How to use TensorFlow Hub with `tf.keras`.
 19 | 2. How to do image classification using TensorFlow Hub.
 20 | 3. How to do simple transfer learning.
 21 | 
 22 | ## 1. Setup and imports
 23 | 
 24 | Install: `pip install -U tensorflow_hub`
 25 | 
 26 | ```python
 27 | from __future__ import absolute_import, division, print_function, unicode_literals
 28 | 
 29 | import matplotlib.pylab as plt
 30 | 
 31 | import tensorflow as tf
 32 |  
 33 | import tensorflow_hub as hub
 34 | 
 35 | from tensorflow.keras import layers
 36 | ```
 37 | 
 38 | ## 2. An ImageNet classifier
 39 | 
 40 | ### 2.1. Download the classifier
 41 | 
 42 | Use `hub.KerasLayer` to load a MobileNet model and wrap it up as a Keras layer.
 43 | Any TF2-compatible [image classifier URL](https://tfhub.dev/s?q=tf2&module-type=image-classification) from tfhub.dev will work here.
 44 | 
 45 | ```python
 46 | classifier_url ="https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/2" #@param {type:"string"}
 47 | 
 48 | IMAGE_SHAPE = (224, 224)
 49 | 
 50 | classifier = tf.keras.Sequential([
 51 |     hub.KerasLayer(classifier_url, input_shape=IMAGE_SHAPE+(3,))
 52 | ])
 53 | ```
 54 | 
 55 | ### 2.2. Run it on a single image
 56 | 
 57 | Download a single image to try the model on.
 58 | 
 59 | ```python
 60 | import numpy as np
 61 | import PIL.Image as Image
 62 | 
 63 | grace_hopper = tf.keras.utils.get_file('image.jpg','https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg')
 64 | grace_hopper = Image.open(grace_hopper).resize(IMAGE_SHAPE)
 65 | grace_hopper = np.array(grace_hopper)/255.0
 66 | grace_hopper.shape
 67 | ```
 68 | `(224, 224, 3)`
 69 | 
 70 | Add a batch dimension, and pass the image to the model.
 71 | 
 72 | ```python
 73 | result = classifier.predict(grace_hopper[np.newaxis, ...])
 74 | result.shape
 75 | ```
 76 | 
 77 | The result is a 1001-element vector of logits, scoring how likely the image is to belong to each class. The top class ID can therefore be found with `argmax`:
 78 | 
 79 | ```python
 80 | predicted_class = np.argmax(result[0], axis=-1)
 81 | predicted_class
 82 | ```
 83 | ```
 84 | 653
 85 | ```
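Because softmax is strictly increasing in each logit, taking `argmax` of the raw logits gives the same class as taking it of the probabilities; you only need softmax (for example `tf.nn.softmax`) when you want actual probabilities. A plain-Python check with made-up logits (the four values are arbitrary, not model output):

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


logits = [1.0, 3.5, -2.0, 0.7]
probs = softmax(logits)
print(argmax(logits), argmax(probs))  # 1 1
```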
 86 | 
 87 | ### 2.3. Decode the predictions
 88 | 
 89 | 
 90 | We have the predicted class ID; fetch the `ImageNet` labels and decode the prediction:
 91 | 
 92 | ```python
 93 | labels_path = tf.keras.utils.get_file('ImageNetLabels.txt','https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt')
 94 | imagenet_labels = np.array(open(labels_path).read().splitlines())
 95 | 
 96 | plt.imshow(grace_hopper)
 97 | plt.axis('off')
 98 | predicted_class_name = imagenet_labels[predicted_class]
 99 | _ = plt.title("Prediction: " + predicted_class_name.title())
100 | ```
101 | ![png](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras_files/output_20_0.png)
102 | 
103 | ## 3. Simple transfer learning
104 | 
105 | With TF Hub it is simple to retrain the top layer of the model to recognize the classes from our dataset.
106 | 
107 | ### 3.1. Dataset
108 | 
109 | For this example you will use the TensorFlow flowers dataset:
110 | 
111 | ```python
112 | data_root = tf.keras.utils.get_file(
113 |   'flower_photos','https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
114 |    untar=True)
115 | ```
116 | 
117 | The simplest way to load this data into our model is using `tf.keras.preprocessing.image.ImageDataGenerator`.
118 | 
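119 | All of TensorFlow Hub's image modules expect float inputs in the `[0, 1]` range. Use the `rescale` parameter of `ImageDataGenerator` to achieve this. The image size is set through the `target_size` argument of `flow_from_directory`.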
120 | 
121 | ```python
122 | image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
123 | image_data = image_generator.flow_from_directory(str(data_root), target_size=IMAGE_SHAPE)
124 | ```
125 | 
126 | ```
127 |     Found 3670 images belonging to 5 classes.
128 | ```
129 | The resulting object is an iterator that returns `image_batch, label_batch` pairs.
130 | 
131 | ```python
132 | for image_batch, label_batch in image_data:
133 |   print("Image batch shape: ", image_batch.shape)
134 |   print("Label batch shape: ", label_batch.shape)
135 |   break
136 | ```
137 | 
138 | ```
139 |     Image batch shape:  (32, 224, 224, 3)
140 |     Label batch shape:  (32, 5)
141 | ```
142 | 
143 | ### 3.2. Run the classifier on a batch of images
144 | 
145 | Now run the classifier on the image batch.
146 | 
147 | 
148 | ```python
149 | result_batch = classifier.predict(image_batch)
150 | result_batch.shape  # (32, 1001)
151 | 
152 | predicted_class_names = imagenet_labels[np.argmax(result_batch, axis=-1)]
153 | predicted_class_names
154 | ```
155 | 
156 | ```
157 |       array(['daisy', 'sea urchin', 'ant', 'hamper', 'daisy', 'ringlet',
158 |              'daisy', 'daisy', 'daisy', 'cardoon', 'lycaenid', 'sleeping bag',
159 |              'Bedlington terrier', 'daisy', 'daisy', 'picket fence',
160 |              'coral fungus', 'daisy', 'zucchini', 'daisy', 'daisy', 'bee',
161 |              'daisy', 'daisy', 'bee', 'daisy', 'picket fence', 'bell pepper',
162 |              'daisy', 'pot', 'wolf spider', 'greenhouse'], dtype='<U30')
163 | ```
164 | 
165 | Now check how these predictions line up with the images:
166 | 
167 | ```python
168 | plt.figure(figsize=(10,9))
169 | plt.subplots_adjust(hspace=0.5)
170 | for n in range(30):
171 |   plt.subplot(6,5,n+1)
172 |   plt.imshow(image_batch[n])
173 |   plt.title(predicted_class_names[n])
174 |   plt.axis('off')
175 | _ = plt.suptitle("ImageNet predictions")
176 | ```
177 | 
178 | ![png](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras_files/output_34_0.png)
179 | 
180 | See the `LICENSE.txt` file for image attributions.
181 | 
182 | The results are far from perfect, but reasonable considering that these are not the classes the model was trained on (except "daisy").
183 | 
184 | ### 3.3. Download the headless model
185 | 
186 | TensorFlow Hub also distributes models without the top classification layer. These can be used to easily do transfer learning.
187 | 
188 | Any [TensorFlow 2 compatible image feature vector URL](https://tfhub.dev/s?module-type=image-feature-vector&q=tf2) from tfhub.dev will work here.
189 | 
190 | ```python
191 | feature_extractor_url = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2" #@param {type:"string"}
192 | ```
193 | 
194 | Create the feature extractor.
195 | 
196 | ```python
197 | feature_extractor_layer = hub.KerasLayer(feature_extractor_url,
198 |                                          input_shape=(224,224,3))
199 | ```
200 | 
201 | It returns a 1280-length vector for each image:
202 | 
203 | ```python
204 | feature_batch = feature_extractor_layer(image_batch)
205 | print(feature_batch.shape)
206 | ```
207 | `(32, 1280)`
208 | 
209 | Freeze the variables in the feature extractor layer, so that training only modifies the new classifier layer.
210 | 
211 | ```python
212 | feature_extractor_layer.trainable = False
213 | ```
214 | 
215 | ### 3.4. Attach a classification head
216 | 
217 | Now wrap the hub layer in a `tf.keras.Sequential` model, and add a new classification layer.
218 | 
219 | ```python
220 | model = tf.keras.Sequential([
221 |   feature_extractor_layer,
222 |   layers.Dense(image_data.num_classes, activation='softmax')
223 | ])
224 | 
225 | model.summary()
226 | ```
227 | ```
228 |     Model: "sequential_1"
229 |     _________________________________________________________________
230 |     Layer (type)                 Output Shape              Param #   
231 |     =================================================================
232 |     keras_layer_1 (KerasLayer)   (None, 1280)              2257984   
233 |     _________________________________________________________________
234 |     dense (Dense)                (None, 5)                 6405      
235 |     =================================================================
236 |     Total params: 2,264,389
237 |     Trainable params: 6,405
238 |     Non-trainable params: 2,257,984
239 |     _________________________________________________________________
240 | ```
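The parameter counts in this summary can be checked by hand: a `Dense` layer with `units` outputs on an `input_dim`-dimensional input has an `input_dim x units` kernel plus `units` biases, and only the new head is trainable because the feature extractor was frozen. (The helper `dense_params` is ours; the non-trainable count is copied from the summary.)

```python
def dense_params(input_dim, units):
    # Kernel weights plus one bias per output unit.
    return input_dim * units + units


trainable = dense_params(1280, 5)
non_trainable = 2257984  # frozen KerasLayer parameters, from model.summary()
print(trainable)                  # 6405
print(trainable + non_trainable)  # 2264389
```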
241 | 
242 | ```python
243 | predictions = model(image_batch)
244 | predictions.shape
245 | ```
246 | ```
247 |     TensorShape([32, 5])
248 | ```
249 | 
250 | ### 3.5. Train the model
251 | 
252 | Use `compile` to configure the training process:
253 | 
254 | ```python
255 | model.compile(
256 |   optimizer=tf.keras.optimizers.Adam(),
257 |   loss='categorical_crossentropy',
258 |   metrics=['acc'])
259 | ```
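`flow_from_directory` produces one-hot labels by default (`class_mode='categorical'`), which is why `label_batch` has shape `(32, 5)` and why `'categorical_crossentropy'` is the matching loss; with integer class ids you would use `'sparse_categorical_crossentropy'` instead. For a single example, the loss is the negative log of the probability assigned to the true class, as this plain-Python sketch shows (the helper name and numbers are illustrative; Keras additionally clips probabilities and averages over the batch):

```python
import math


def categorical_crossentropy(one_hot, probs):
    """-sum(t * log(p)) over classes; only the true class contributes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


label = [0.0, 0.0, 1.0, 0.0, 0.0]  # true class: index 2
probs = [0.1, 0.1, 0.6, 0.1, 0.1]  # softmax output of the model
print(round(categorical_crossentropy(label, probs), 4))  # 0.5108
```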
260 | 
261 | Now use the `.fit` method to train the model.
262 | 
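263 | This example trains for just two epochs. To visualize the training progress, use a custom callback that logs the loss and accuracy of each batch individually, instead of the epoch average.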
264 | 
265 | ```python
266 | class CollectBatchStats(tf.keras.callbacks.Callback):
267 |   def __init__(self):
    |     super(CollectBatchStats, self).__init__()
268 |     self.batch_losses = []
269 |     self.batch_acc = []
270 | 
271 |   def on_train_batch_end(self, batch, logs=None):
272 |     self.batch_losses.append(logs['loss'])
273 |     self.batch_acc.append(logs['acc'])
274 |     self.model.reset_metrics()
275 | 
276 | steps_per_epoch = np.ceil(image_data.samples/image_data.batch_size)
277 | 
278 | batch_stats_callback = CollectBatchStats()
279 | 
280 | history = model.fit(image_data, epochs=2,
281 |                     steps_per_epoch=steps_per_epoch,
282 |                     callbacks = [batch_stats_callback])
283 | ```
284 | 
285 | ```
286 |     Epoch 1/2
287 |     115/115 [==============================] - 22s 193ms/step - loss: 0.8613 - acc: 0.8438
288 |     Epoch 2/2
289 |     115/115 [==============================] - 23s 199ms/step - loss: 0.5083 - acc: 0.7812
290 | ```
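As we understand it, Keras metrics such as `acc` accumulate over an epoch, so without `self.model.reset_metrics()` each `logs['acc']` would be the running mean of all batches seen so far in the epoch; resetting after every batch makes each logged value a per-batch number. The 115 steps shown in the log are `ceil(3670 / 32)`. A plain-Python sketch of both points (the batch accuracies are made-up numbers):

```python
import math


def running_means(values):
    """Mean of the first k values, for k = 1..len(values)."""
    means, total = [], 0.0
    for k, value in enumerate(values, start=1):
        total += value
        means.append(total / k)
    return means


batch_acc = [0.25, 0.5, 0.75, 1.0]  # hypothetical per-batch accuracies
print(running_means(batch_acc))     # [0.25, 0.375, 0.5, 0.625]
print(math.ceil(3670 / 32))         # 115 steps per epoch
```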
291 | 
292 | Even after just a few training iterations, we can already see that the model is making progress on the task.
293 | 
294 | ```python
295 | plt.figure()
296 | plt.ylabel("Loss")
297 | plt.xlabel("Training Steps")
298 | plt.ylim([0,2])
299 | plt.plot(batch_stats_callback.batch_losses)
300 | ```
301 | 
302 | ![png](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras_files/output_53_1.png)
303 | 
304 | ```python
305 | plt.figure()
306 | plt.ylabel("Accuracy")
307 | plt.xlabel("Training Steps")
308 | plt.ylim([0,1])
309 | plt.plot(batch_stats_callback.batch_acc)
310 | ```
311 | ![png](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras_files/output_54_1.png)
312 | 
313 | ### 3.6. Check the predictions
314 | 
315 | To redo the plot from before, first get the ordered list of class names:
316 | 
317 | ```python
318 | class_names = sorted(image_data.class_indices.items(), key=lambda pair:pair[1])
319 | class_names = np.array([key.title() for key, value in class_names])
320 | class_names
321 | ```
322 | ```
323 |     array(['Daisy', 'Dandelion', 'Roses', 'Sunflowers', 'Tulips'],
324 |           dtype='<U10')
325 | ```
326 | 
327 | Run the image batch through the model and convert the indices to class names.
328 | 
329 | ```python
330 | predicted_batch = model.predict(image_batch)
331 | predicted_id = np.argmax(predicted_batch, axis=-1)
332 | predicted_label_batch = class_names[predicted_id]
333 | ```
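
The indexing step above boils down to taking the index of the largest score in each row and looking that index up in `class_names`; the plotting code below then compares it with the true index to pick the title color. A pure-Python sketch of the same logic, with hypothetical scores and labels standing in for `predicted_batch` and `label_batch`:

```python
class_names = ['Daisy', 'Dandelion', 'Roses', 'Sunflowers', 'Tulips']

def argmax(row):
    """Index of the largest value in a row (like np.argmax on one row)."""
    return max(range(len(row)), key=lambda i: row[i])

# Hypothetical softmax outputs for a batch of two images.
predicted_batch = [
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.60, 0.10, 0.10, 0.10, 0.10],
]
predicted_id = [argmax(row) for row in predicted_batch]
predicted_label_batch = [class_names[i] for i in predicted_id]

# Hypothetical true labels, used the same way as label_id in the plot.
label_id = [1, 2]
colors = ["green" if p == t else "red" for p, t in zip(predicted_id, label_id)]
print(predicted_label_batch, colors)  # ['Dandelion', 'Daisy'] ['green', 'red']
```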
334 | 
335 | Plot the result:
336 | 
337 | ```python
338 | label_id = np.argmax(label_batch, axis=-1)
339 | 
340 | plt.figure(figsize=(10,9))
341 | plt.subplots_adjust(hspace=0.5)
342 | for n in range(30):
343 |   plt.subplot(6,5,n+1)
344 |   plt.imshow(image_batch[n])
345 |   color = "green" if predicted_id[n] == label_id[n] else "red"
346 |   plt.title(predicted_label_batch[n].title(), color=color)
347 |   plt.axis('off')
348 | _ = plt.suptitle("Model predictions (green: correct, red: incorrect)")
349 | ```
350 | 
351 | ![png](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras_files/output_61_0.png)
352 | 
353 | ## 4. Export your model
354 | 
355 | Now that you have trained the model, export it as a SavedModel:
356 | 
357 | ```python
358 | import time
359 | t = time.time()
360 | 
361 | export_path = "/tmp/saved_models/{}".format(int(t))
362 | tf.keras.experimental.export_saved_model(model, export_path)
363 | 
364 | export_path
365 | ```
366 | ```
367 | '/tmp/saved_models/1557794138'
368 | ```
369 | 
370 | Now confirm that we can reload it, and that it still gives the same results:
371 | 
372 | ```python
373 | reloaded = tf.keras.experimental.load_from_saved_model(export_path, custom_objects={'KerasLayer':hub.KerasLayer})
374 | 
375 | result_batch = model.predict(image_batch)
376 | reloaded_result_batch = reloaded.predict(image_batch)
377 | 
378 | abs(reloaded_result_batch - result_batch).max()
379 | ```
380 | `0.0`
381 | 
382 | This SavedModel can be loaded for inference later, or converted to [TFLite](https://www.tensorflow.google.cn/lite/convert/) or [TFjs](https://github.com/tensorflow/tfjs-converter).
383 | 
384 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-hub_with_keras.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-hub_with_keras.html)
385 | > English version: [https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras)
386 | > Suggest translation edits (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/hub_with_keras.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/hub_with_keras.md)


--------------------------------------------------------------------------------
/r2/tutorials/images/images/before_fine_tuning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/images/images/before_fine_tuning.png


--------------------------------------------------------------------------------
/r2/tutorials/images/images/fine_tuning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/images/images/fine_tuning.png


--------------------------------------------------------------------------------
/r2/tutorials/images/intro_to_cnns.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Classifying MNIST digits with a convolutional neural network (CNN) in TF2.0
  3 | tags: TensorFlow 2.0 tutorials
  4 | categories: Official TensorFlow 2 tutorials
  5 | top: 1921
  6 | abbrlink: tensorflow/tf2-tutorials-images-intro_to_cnns
  7 | ---
  8 | 
  9 | # Classifying MNIST digits with a convolutional neural network (CNN) in TensorFlow 2.0 (translation of the official TensorFlow 2.0 tutorial)
 10 | 
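 11 | This tutorial demonstrates training a simple [convolutional neural network](https://developers.google.com/machine-learning/glossary/#convolutional_neural_network) (CNN) to classify MNIST digits. This simple network will achieve over 99% accuracy on the MNIST test set. Because this tutorial uses the [Keras Sequential API](https://www.tensorflow.org/guide/keras), creating and training our model will take just a few lines of code.
 12 | 
 13 | Note: CNNs train faster on a GPU.
 14 | 
 15 | ## 1. Import TensorFlow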
 16 | 
 17 | 
 18 | ```python
 19 | from __future__ import absolute_import, division, print_function, unicode_literals
 20 | 
 21 | import tensorflow as tf
 22 | 
 23 | from tensorflow.keras import datasets, layers, models
 24 | ```
 25 | 
 26 | ## 2. Download and preprocess the MNIST dataset
 27 | 
 28 | ```python
 29 | (train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
 30 | 
 31 | train_images = train_images.reshape((60000, 28, 28, 1))
 32 | test_images = test_images.reshape((10000, 28, 28, 1))
 33 | 
 34 | # Scale pixel values to the [0, 1] range
 35 | train_images, test_images = train_images / 255.0, test_images / 255.0
 36 | ```
 37 | 
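 38 | ## 3. Create the convolutional base
 39 | 
 40 | The 6 lines of code below define the convolutional base using a common pattern: a stack of [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) and [MaxPooling2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D) layers.
 41 | 
 42 | As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. MNIST has one color channel (because the images are grayscale), whereas color images have three (R, G, B). In this example, we will configure our CNN to process inputs of shape (28, 28, 1), which is the format of MNIST images. We do this by passing the argument `input_shape` to our first layer.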
 43 | 
 44 | ```python
 45 | model = models.Sequential()
 46 | model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
 47 | model.add(layers.MaxPooling2D((2, 2)))
 48 | model.add(layers.Conv2D(64, (3, 3), activation='relu'))
 49 | model.add(layers.MaxPooling2D((2, 2)))
 50 | model.add(layers.Conv2D(64, (3, 3), activation='relu'))
 51 | model.summary() # Display the model's architecture
 52 | ```
 53 | 
 54 | ```
 55 | Model: "sequential"
 56 | _________________________________________________________________
 57 | Layer (type)                 Output Shape              Param #   
 58 | =================================================================
 59 | conv2d (Conv2D)              (None, 26, 26, 32)        320       
 60 | _________________________________________________________________
 61 | max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
 62 | _________________________________________________________________
 63 | conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
 64 | _________________________________________________________________
 65 | max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
 66 | _________________________________________________________________
 67 | conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
 68 | =================================================================
 69 | ...
 70 | ```
 71 | 
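 72 | Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as we go deeper into the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, we can afford (computationally) to add more output channels in each Conv2D layer.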
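
The shrinking shapes in the summary follow from simple arithmetic, assuming the Keras defaults used here: a 3x3 `Conv2D` with stride 1 and `'valid'` padding removes 2 pixels from each spatial dimension, and a 2x2 `MaxPooling2D` halves it (rounding down). A small pure-Python sketch that reproduces the height/width column of the summary:

```python
def conv_valid(size, kernel=3):
    """Spatial size after a stride-1 'valid' (no padding) convolution."""
    return size - kernel + 1

def max_pool(size, pool=2):
    """Spatial size after non-overlapping pooling (stride == pool size)."""
    return size // pool

size = 28
sizes = []
# Same layer order as the model: Conv2D, MaxPooling2D, Conv2D, MaxPooling2D, Conv2D.
for op in (conv_valid, max_pool, conv_valid, max_pool, conv_valid):
    size = op(size)
    sizes.append(size)
print(sizes)  # [26, 13, 11, 5, 3]
```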
 73 | 
 74 | ## 4. Add Dense layers on top
 75 | 
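 76 | To complete our model, we will feed the last output tensor from the convolutional base (of shape (3, 3, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (1D), while the current output is a 3D tensor. First, we will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. MNIST has 10 output classes, so we use a final Dense layer with 10 outputs and a softmax activation.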
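
The softmax activation turns the final layer's raw scores into probabilities that are non-negative and sum to 1, and the predicted digit is the class with the largest probability. A minimal pure-Python sketch with three hypothetical scores (the real layer produces ten, one per digit):

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical scores for 3 classes
probs = softmax(scores)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```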
 77 | 
 78 | ```python
 79 | model.add(layers.Flatten())
 80 | model.add(layers.Dense(64, activation='relu'))
 81 | model.add(layers.Dense(10, activation='softmax'))
 82 | model.summary() # Display the model's architecture
 83 | ```
 84 | 
 85 | ```
 86 | Model: "sequential"
 87 | _________________________________________________________________
 88 | Layer (type)                 Output Shape              Param #   
 89 | =================================================================
 90 | conv2d (Conv2D)              (None, 26, 26, 32)        320       
 91 | _________________________________________________________________
 92 | max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
 93 | _________________________________________________________________
 94 | conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
 95 | _________________________________________________________________
 96 | max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
 97 | _________________________________________________________________
 98 | conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
 99 | _________________________________________________________________
100 | flatten (Flatten)            (None, 576)               0         
101 | _________________________________________________________________
102 | dense (Dense)                (None, 64)                36928     
103 | _________________________________________________________________
104 | dense_1 (Dense)              (None, 10)                650       
105 | =================================================================
106 | ...
107 | ```
108 | 
109 | As you can see, our (3, 3, 64) outputs were flattened into vectors of shape (576) before going through two Dense layers.
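
The flattened size and the `Param #` column can both be checked by hand: flattening (3, 3, 64) gives 3 x 3 x 64 = 576 values, a `Conv2D` layer has `kernel * kernel * in_channels` weights plus one bias per filter, and a `Dense` layer has one weight per input/unit pair plus one bias per unit. A pure-Python sketch of that arithmetic, which matches the summary above:

```python
def conv2d_params(kernel, in_channels, filters):
    """kernel*kernel*in_channels weights per filter, plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input/unit pair, plus one bias per unit."""
    return (inputs + 1) * units

flat = 3 * 3 * 64  # Flatten turns the (3, 3, 64) block into 576 values
counts = [
    conv2d_params(3, 1, 32),    # conv2d
    conv2d_params(3, 32, 64),   # conv2d_1
    conv2d_params(3, 64, 64),   # conv2d_2
    dense_params(flat, 64),     # dense
    dense_params(64, 10),       # dense_1
]
print(flat, counts)  # 576 [320, 18496, 36928, 36928, 650]
```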
110 | 
111 | ## 5. Compile and train the model
112 | 
113 | 
114 | ```python
115 | model.compile(optimizer='adam',
116 |               loss='sparse_categorical_crossentropy',
117 |               metrics=['accuracy'])
118 | 
119 | model.fit(train_images, train_labels, epochs=5)
120 | ```
121 | 
122 | ```
123 | ...
124 | Epoch 5/5
125 | 60000/60000 [==============================] - 15s 258us/sample - loss: 0.0190 - accuracy: 0.9941
126 | ```
127 | 
128 | ## 6. Evaluate the model
129 | 
130 | ```python
131 | test_loss, test_acc = model.evaluate(test_images, test_labels)
132 | ```
133 | ```
134 | 10000/10000 [==============================] - 1s 92us/sample - loss: 0.0272 - accuracy: 0.9921
135 | ```
136 | 
137 | ```python
138 | print(test_acc)
139 | ```
140 | 
141 | ```
142 |     0.9921
143 | ```
144 | 
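145 | As you can see, our simple CNN has achieved a test accuracy of over 99%. Not bad for a few lines of code! For another way of writing a CNN (using the Keras Subclassing API and a GradientTape), see [here](https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/quickstart/advanced.ipynb).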
146 | 
147 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-intro_to_cnns.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-intro_to_cnns.html)
148 | > English version: [https://tensorflow.google.cn/beta/tutorials/images/intro_to_cnns](https://tensorflow.google.cn/beta/tutorials/images/intro_to_cnns)
149 | > Suggest translation edits (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/intro_to_cnns.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/intro_to_cnns.md)
150 | 


--------------------------------------------------------------------------------
/r2/tutorials/images/segmentation.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Image segmentation
  3 | tags: TensorFlow 2.0 tutorials
  4 | categories: Official TensorFlow 2 tutorials
  5 | top: 1924
  6 | abbrlink: tensorflow/tf2-tutorials-images-segmentation
  7 | ---
  8 | 
  9 | # Image segmentation (translation of the official TensorFlow 2.0 tutorial)
 10 | 
 11 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-segmentation.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-segmentation.html)
 12 | > English version: [https://tensorflow.google.cn/beta/tutorials/images/segmentation](https://tensorflow.google.cn/beta/tutorials/images/segmentation)
 13 | > Suggest translation edits (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/segmentation.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/segmentation.md)
 14 | 
 15 | 
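 16 | This tutorial focuses on the task of image segmentation, using a modified [U-Net](https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/).
 17 | 
 18 | ## What is image segmentation?
 19 | 
 20 | In the previous tutorials we looked at image classification, where the network's task is to assign a label or class to each input image. However, suppose you want to know where an object is located in the image, the shape of that object, which pixel belongs to which object, and so on. In this case you will want to segment the image, i.e., give each pixel of the image a label.
 21 | 
 22 | Thus, the task of image segmentation is to train a neural network to output a pixel-wise mask of the image. This helps in understanding the image at a much lower level, i.e., the pixel level. Image segmentation has many applications in medical imaging, self-driving cars and satellite imaging, to name a few.
 23 | 
 24 | The dataset used in this tutorial is the [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/), created by Parkhi et al. The dataset consists of images, their corresponding labels, and pixel-wise masks. The masks are basically labels for each pixel. Each pixel is given one of three categories:
 25 | *   Class 1: Pixel belonging to the pet.
 26 | *   Class 2: Pixel bordering the pet.
 27 | *   Class 3: None of the above/surrounding pixel.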
 28 | 
 29 | Download the dependency project https://github.com/tensorflow/examples and
 30 | put its `tensorflow_examples` folder into your project directory; `pix2pix` is imported from it below.
 31 | 
 32 | Install TensorFlow:
 33 | 
 34 | `pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-gpu==2.0.0-beta1`
 35 | 
 36 | Install tensorflow_datasets:
 37 | 
 38 | `pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow_datasets`
 39 | 
 40 | ## Import dependencies
 41 | 
 42 | ```python
 43 | from __future__ import absolute_import, division, print_function, unicode_literals
 44 | 
 45 | import tensorflow as tf
 46 | 
 47 | from tensorflow_examples.models.pix2pix import pix2pix
 48 | 
 49 | import tensorflow_datasets as tfds
 50 | tfds.disable_progress_bar()
 51 | 
 52 | from IPython.display import clear_output
 53 | import matplotlib.pyplot as plt
 54 | ```
 55 | 
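 56 | ## Download the Oxford-IIIT Pets dataset
 57 | 
 58 | The dataset is already included in TensorFlow Datasets; all that is needed is to download it. The segmentation masks are included in version 3.0.0, which is why this particular version is used.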
 59 | 
 60 | ```python
 61 | dataset, info = tfds.load('oxford_iiit_pet:3.0.0', with_info=True)
 62 | ```
 63 | 
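 64 | The following code performs a simple augmentation: randomly flipping the image horizontally. In addition, the image is normalized to roughly the [-1, 1] range.
 65 | Finally, as mentioned above, the pixels in the segmentation mask are labeled {1, 2, 3}. For the sake of convenience, let's subtract 1 from the segmentation mask, resulting in labels {0, 1, 2}.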
 66 | 
 67 | ```python
 68 | def normalize(input_image, input_mask):
 69 |   input_image = tf.cast(input_image, tf.float32)/128.0 - 1
 70 |   input_mask -= 1
 71 |   return input_image, input_mask
 72 | 
 73 | @tf.function
 74 | def load_image_train(datapoint):
 75 |   input_image = tf.image.resize(datapoint['image'], (128, 128))
 76 |   input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128), method='nearest')  # no interpolated label values
 77 | 
 78 |   if tf.random.uniform(()) > 0.5:
 79 |     input_image = tf.image.flip_left_right(input_image)
 80 |     input_mask = tf.image.flip_left_right(input_mask)
 81 | 
 82 |   input_image, input_mask = normalize(input_image, input_mask)
 83 | 
 84 |   return input_image, input_mask
 85 | 
 86 | def load_image_test(datapoint):
 87 |   input_image = tf.image.resize(datapoint['image'], (128, 128))
 88 |   input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128), method='nearest')  # no interpolated label values
 89 | 
 90 |   input_image, input_mask = normalize(input_image, input_mask)
 91 | 
 92 |   return input_image, input_mask
 93 | ```
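
To make the preprocessing arithmetic concrete, here is a pure-Python sketch applying the same formulas as `normalize` to individual values: a pixel value `v` becomes `v / 128.0 - 1`, and each mask label has 1 subtracted. The helper names are hypothetical and only illustrate the math.

```python
def normalize_pixel(value):
    """Same arithmetic as `normalize`: scale a 0-255 pixel to about [-1, 1)."""
    return value / 128.0 - 1

def shift_label(label):
    """Map mask labels {1, 2, 3} to {0, 1, 2}."""
    return label - 1

print(normalize_pixel(0), normalize_pixel(128), normalize_pixel(255))  # -1.0 0.0 0.9921875
print([shift_label(label) for label in (1, 2, 3)])                     # [0, 1, 2]
```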
 94 | 
 95 | The dataset already contains the required train and test splits, so let's continue to use those same splits.
 96 | 
 97 | ```python
 98 | TRAIN_LENGTH = info.splits['train'].num_examples
 99 | BATCH_SIZE = 64
100 | BUFFER_SIZE = 1000
101 | STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE
102 | 
103 | train = dataset['train'].map(load_image_train, num_parallel_calls=tf.data.experimental.AUTOTUNE)
104 | test = dataset['test'].map(load_image_test)
105 | 
106 | train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
107 | train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
108 | test_dataset = test.batch(BATCH_SIZE)
109 | ```
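
Because `train_dataset` calls `.repeat()`, it never ends on its own, so `model.fit` needs `steps_per_epoch` to know where an epoch stops; the training code later also computes `VALIDATION_STEPS` from the test split. The sketch below works through that integer arithmetic, assuming split sizes of 3,680 training and 3,669 test examples for `oxford_iiit_pet` 3.x. Treat those numbers as an assumption and read the real ones from `info.splits` on your install.

```python
# Assumed split sizes for oxford_iiit_pet 3.x; check info.splits on your install.
TRAIN_LENGTH = 3680
TEST_LENGTH = 3669
BATCH_SIZE = 64
VAL_SUBSPLITS = 5

STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE                  # whole batches per epoch
VALIDATION_STEPS = TEST_LENGTH // BATCH_SIZE // VAL_SUBSPLITS  # about 1/5 of the test batches
print(STEPS_PER_EPOCH, VALIDATION_STEPS)  # 57 11
```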
110 | 
111 | Let's take a look at an image example and its corresponding mask from the dataset.
112 | 
113 | ```python
114 | def display(display_list):
115 |   plt.figure(figsize=(15, 15))
116 | 
117 |   title = ['Input Image', 'True Mask', 'Predicted Mask']
118 | 
119 |   for i in range(len(display_list)):
120 |     plt.subplot(1, len(display_list), i+1)
121 |     plt.title(title[i])
122 |     plt.imshow(tf.keras.preprocessing.image.array_to_img(display_list[i]))
123 |     plt.axis('off')
124 |   plt.show()
125 | 
126 | for image, mask in train.take(1):
127 |   sample_image, sample_mask = image, mask
128 | display([sample_image, sample_mask])
129 | ```
130 | 
131 | ![](https://www.tensorflow.org/beta/tutorials/images/segmentation_files/output_a6u_Rblkteqb_0.png)
132 | 
133 | 
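134 | ## Define the model
135 | 
136 | The model being used here is a modified U-Net. A U-Net consists of an encoder (downsampler) and a decoder (upsampler). In order to learn robust features and reduce the number of trainable parameters, a pretrained model can be used as the encoder. Thus, the encoder for this task will be a pretrained MobileNetV2 model, whose intermediate outputs will be used, and the decoder will be the upsample block already implemented in the TensorFlow Examples [Pix2pix tutorial](https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/pix2pix/pix2pix.py).
137 | 
138 | The reason to output three channels is that there are three possible labels for each pixel. Think of this as multi-class classification where each pixel is classified into one of three classes.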
139 | 
140 | ```python
141 | OUTPUT_CHANNELS = 3
142 | ```
143 | 
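144 | As mentioned, the encoder will be a pretrained MobileNetV2 model, which is prepared and ready to use in [tf.keras.applications](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/applications). The encoder consists of specific outputs from intermediate layers in the model.
145 | Note that the encoder will not be trained during the training process.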
146 | 
147 | ```python
148 | base_model = tf.keras.applications.MobileNetV2(input_shape=[128, 128, 3], include_top=False)
149 | 
150 | # Use the activations of these layers
151 | layer_names = [
152 |     'block_1_expand_relu',   # 64x64
153 |     'block_3_expand_relu',   # 32x32
154 |     'block_6_expand_relu',   # 16x16
155 |     'block_13_expand_relu',  # 8x8
156 |     'block_16_project',      # 4x4
157 | ]
158 | layers = [base_model.get_layer(name).output for name in layer_names]
159 | 
160 | # Create the feature extraction model
161 | down_stack = tf.keras.Model(inputs=base_model.input, outputs=layers)
162 | 
163 | down_stack.trainable = False
164 | ```
165 | 
166 | The decoder/upsampler is simply a series of upsample blocks implemented in TensorFlow Examples.
167 | 
168 | ```python
169 | up_stack = [
170 |     pix2pix.upsample(512, 3),  # 4x4 -> 8x8
171 |     pix2pix.upsample(256, 3),  # 8x8 -> 16x16
172 |     pix2pix.upsample(128, 3),  # 16x16 -> 32x32
173 |     pix2pix.upsample(64, 3),   # 32x32 -> 64x64
174 | ]
175 | 
176 | 
177 | def unet_model(output_channels):
178 | 
179 |   # This is the last layer of the model
180 |   last = tf.keras.layers.Conv2DTranspose(
181 |       output_channels, 3, strides=2,
182 |       padding='same', activation='softmax')  #64x64 -> 128x128
183 | 
184 |   inputs = tf.keras.layers.Input(shape=[128, 128, 3])
185 |   x = inputs
186 | 
187 |   # Downsampling through the model
188 |   skips = down_stack(x)
189 |   x = skips[-1]
190 |   skips = reversed(skips[:-1])
191 | 
192 |   # Upsampling and establishing the skip connections
193 |   for up, skip in zip(up_stack, skips):
194 |     x = up(x)
195 |     concat = tf.keras.layers.Concatenate()
196 |     x = concat([x, skip])
197 | 
198 |   x = last(x)
199 | 
200 |   return tf.keras.Model(inputs=inputs, outputs=x)
201 | ```
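
The loop in `unet_model` only works because each upsampled tensor has the same height and width as the skip connection it is concatenated with. The pure-Python sketch below tracks just the spatial sizes, taking them from the comments next to `layer_names` and `up_stack`, and assuming each `pix2pix.upsample` block doubles the size (as those comments indicate):

```python
# Spatial sizes of the encoder outputs, from the comments next to layer_names.
skip_sizes = [64, 32, 16, 8, 4]

x = skip_sizes[-1]                        # start from the 4x4 bottleneck
skips = list(reversed(skip_sizes[:-1]))   # [8, 16, 32, 64]
pairs = []
for skip in skips:
    x *= 2                                # each upsample block doubles height and width
    pairs.append((x, skip))               # Concatenate needs matching height/width
x *= 2                                    # final stride-2 Conv2DTranspose: 64 -> 128
print(pairs, x)  # [(8, 8), (16, 16), (32, 32), (64, 64)] 128
```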
202 | 
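203 | ## Train the model
204 | 
205 | Now, all that is left to do is to compile and train the model. The loss used here is `tf.keras.losses.sparse_categorical_crossentropy`. The reason to use this loss function is that the network is trying to assign each pixel a label, just like multi-class prediction. In the true segmentation mask, each pixel has a label in {0, 1, 2}. The network here outputs three channels. Essentially, each channel is trying to learn to predict one class, and `sparse_categorical_crossentropy` is the recommended loss for such a scenario. Using the output of the network, the label assigned to a pixel is the channel with the highest value. This is what the `create_mask` function does.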
206 | 
207 | ```python
208 | model = unet_model(OUTPUT_CHANNELS)
209 | model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
210 |               metrics=['accuracy'])
211 | ```
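
To make the per-pixel view concrete, the pure-Python sketch below treats a hypothetical 1x2 "image" whose network output has three channels per pixel. Taking the channel with the highest value at each pixel corresponds to what `create_mask` does, and sparse categorical cross-entropy is the negative log of the probability assigned to each pixel's true label, averaged over pixels. The numbers are made up for illustration.

```python
import math

# Hypothetical softmax outputs for a 1x2 image: [height][width][channel].
pred = [
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
]
true_mask = [
    [0, 2],
]

def argmax(probs):
    return max(range(len(probs)), key=lambda i: probs[i])

# create_mask equivalent: the predicted label is the channel with the highest value.
pred_mask = [[argmax(p) for p in row] for row in pred]

# Sparse categorical cross-entropy: -log(probability of the true class), averaged over pixels.
losses = [-math.log(p[t])
          for row_p, row_t in zip(pred, true_mask)
          for p, t in zip(row_p, row_t)]
mean_loss = sum(losses) / len(losses)
print(pred_mask, round(mean_loss, 4))  # [[0, 2]] 0.4338
```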
212 | 
213 | Let's try out the model to see what it predicts before training.
214 | 
215 | ```python
216 | def create_mask(pred_mask):
217 |   pred_mask = tf.argmax(pred_mask, axis=-1)
218 |   pred_mask = pred_mask[..., tf.newaxis]
219 |   return pred_mask[0]
220 | 
221 | def show_predictions(dataset=None, num=1):
222 |   if dataset:
223 |     for image, mask in dataset.take(num):
224 |       pred_mask = model.predict(image)
225 |       display([image[0], mask[0], create_mask(pred_mask)])
226 |   else:
227 |     display([sample_image, sample_mask,
228 |              create_mask(model.predict(sample_image[tf.newaxis, ...]))])
229 | 
230 | 
231 | show_predictions()
232 | ```
233 | 
234 | ![](https://www.tensorflow.org/beta/tutorials/images/segmentation_files/output_X_1CC0T4dho3_0.png)
235 | 
236 | 
237 | Let's observe how the model improves while it is training. To accomplish this task, a callback function is defined below.
238 | 
239 | ```python
240 | class DisplayCallback(tf.keras.callbacks.Callback):
241 |   def on_epoch_end(self, epoch, logs=None):
242 |     clear_output(wait=True)
243 |     show_predictions()
244 |     print ('\nSample Prediction after epoch {}\n'.format(epoch+1))
245 | 
246 | 
247 | EPOCHS = 20
248 | VAL_SUBSPLITS = 5
249 | VALIDATION_STEPS = info.splits['test'].num_examples//BATCH_SIZE//VAL_SUBSPLITS
250 | 
251 | model_history = model.fit(train_dataset, epochs=EPOCHS,
252 |                           steps_per_epoch=STEPS_PER_EPOCH,
253 |                           validation_steps=VALIDATION_STEPS,
254 |                           validation_data=test_dataset,
255 |                           callbacks=[DisplayCallback()])
256 | ```
257 | 
258 | ![](https://www.tensorflow.org/beta/tutorials/images/segmentation_files/output_StKDH_B9t4SD_0.png)
259 | 
260 | 
261 | Let's look at how the loss changes over the epochs:
262 | ```python
263 | loss = model_history.history['loss']
264 | val_loss = model_history.history['val_loss']
265 | 
266 | epochs = range(EPOCHS)
267 | 
268 | plt.figure()
269 | plt.plot(epochs, loss, 'r', label='Training loss')
270 | plt.plot(epochs, val_loss, 'bo', label='Validation loss')
271 | plt.title('Training and Validation Loss')
272 | plt.xlabel('Epoch')
273 | plt.ylabel('Loss Value')
274 | plt.ylim([0, 1])
275 | plt.legend()
276 | plt.show()
277 | ```
278 | 
279 | ![](https://www.tensorflow.org/beta/tutorials/images/segmentation_files/output_P_mu0SAbt40Q_0.png)
280 | 
281 | 
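282 | ## Make predictions
283 | 
284 | Let's make some predictions. In the interest of saving time, the number of epochs was kept small, but you may set it higher to achieve more accurate results.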
285 | 
286 | ```python
287 | show_predictions(test_dataset, 1)
288 | ```
289 | 
290 | Prediction results:
291 | ![](https://www.tensorflow.org/beta/tutorials/images/segmentation_files/output_ikrzoG24qwf5_0.png)
292 | 
293 | 
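294 | ## Next steps
295 | 
296 | Now that you have an understanding of what image segmentation is and how it works, you can try this tutorial with different intermediate layer outputs, or even a different pretrained model. You may also challenge yourself by trying out the [Carvana](https://www.kaggle.com/c/carvana-image-masking-challenge/overview) image masking challenge hosted on Kaggle.
297 | 
298 | You may also want to see the [Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) for other models you can retrain on your own data.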
299 | 


--------------------------------------------------------------------------------
/r2/tutorials/images/transfer_learning.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Transfer learning with a pretrained convolutional neural network
  3 | tags: TensorFlow 2.0 tutorials
  4 | categories: Official TensorFlow 2 tutorials
  5 | top: 1923
  6 | abbrlink: tensorflow/tf2-tutorials-images-transfer_learning
  7 | ---
  8 | 
  9 | # Transfer learning with a pretrained convolutional neural network (translation of the official TensorFlow 2.0 tutorial)
 10 | 
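 11 | In this tutorial you will learn how to classify images of cats and dogs by using transfer learning from a pretrained network. Main topics: using a pretrained model for feature extraction, and fine-tuning a pretrained model.
 12 | 
 13 | A pretrained model is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. You can either use the pretrained model as is or use transfer learning to customize this model to a given task.
 14 | 
 15 | The intuition behind transfer learning is that if a model is trained on a large and general enough dataset, this model will effectively serve as a generic model of the visual world. You can then take advantage of these learned feature maps without having to start from scratch by training a large model on a large dataset.
 16 | 
 17 | In this tutorial, you will try two ways to customize a pretrained model:
 18 | 1. **Feature extraction**: Use the representations learned by a previous network to extract meaningful features from new samples. You simply add a new classifier, which will be trained from scratch, on top of the pretrained model so that you can repurpose the feature maps learned previously for our dataset.
 19 | You do not need to (re)train the entire model. The base convolutional network already contains features that are generically useful for classifying pictures. However, the final classification part of the pretrained model is specific to the original classification task, and subsequently specific to the set of classes on which the model was trained.
 20 | 
 21 | 2. **Fine-tuning**: Unfreeze a few of the top layers of a frozen model base and jointly train both the newly added classifier and the last layers of the base model. This allows us to "fine-tune" the higher-order feature representations in the base model in order to make them more relevant for the specific task.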
 22 | 
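 23 | You will follow the general machine learning workflow:
 24 | 1. Examine and understand the data
 25 | 2. Build an input pipeline, in this case using `tf.data` via TensorFlow Datasets
 26 | 3. Compose the model
 27 |     * Load in the pretrained base model (and pretrained weights)
 28 |     * Stack the classification layers on top
 29 | 4. Train the model
 30 | 5. Evaluate the model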
 31 | 
 32 | 
 33 | ```python
 34 | from __future__ import absolute_import, division, print_function, unicode_literals
 35 | 
 36 | import os
 37 | 
 38 | import numpy as np
 39 | 
 40 | import matplotlib.pyplot as plt
 41 | 
 42 | import tensorflow as tf
 43 | 
 44 | keras = tf.keras
 45 | ```
 46 | 
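 47 | ## 1. Data preprocessing
 48 | 
 49 | ### 1.1. Download the data
 50 | 
 51 | Use [TensorFlow Datasets](http://tensorflow.google.cn/datasets) to load the cats vs. dogs dataset. The `tfds` package is the easiest way to load predefined data. If you have your own data and are interested in importing it with TensorFlow, see [loading image data](https://tensorflow.google.cn/beta/tutorials/load_data/images).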
 52 | 
 53 | 
 54 | ```python
 55 | import tensorflow_datasets as tfds
 56 | ```
 57 | 
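 58 | The `tfds.load` method downloads and caches the data, and returns `tf.data.Dataset` objects. These objects provide powerful, efficient methods for manipulating data and piping it into your model.
 59 | 
 60 | Since `"cats_vs_dogs"` doesn't define standard splits, use the subsplit feature to divide it into training (80%), validation (10%), and test (10%) data.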
 61 | 
 62 | ```python
 63 | SPLIT_WEIGHTS = (8, 1, 1)
 64 | splits = tfds.Split.TRAIN.subsplit(weighted=SPLIT_WEIGHTS)
 65 | 
 66 | (raw_train, raw_validation, raw_test), metadata = tfds.load(
 67 |     'cats_vs_dogs', split=list(splits),
 68 |     with_info=True, as_supervised=True)
 69 | ```
 70 | 
 71 | The resulting `tf.data.Dataset` objects contain (image, label) pairs, where the images have variable shape and 3 channels, and the label is a scalar.
 72 | 
 73 | ```python
 74 | print(raw_train)
 75 | print(raw_validation)
 76 | print(raw_test)
 77 | ```
 78 | 
 79 | ```
 80 |     <DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>
 81 |     <DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>
 82 |     <DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>
 83 | ```
 84 | 
 85 | Show the first two images and labels from the training set:
 86 | 
 87 | ```python
 88 | get_label_name = metadata.features['label'].int2str
 89 | 
 90 | for image, label in raw_train.take(2):
 91 |   plt.figure()
 92 |   plt.imshow(image)
 93 |   plt.title(get_label_name(label))
 94 | ```
 95 | 
 96 | 
 97 | ![png](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning_files/output_14_0.png)
 98 | 
 99 | ![png](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning_files/output_14_1.png)
100 | 
101 | 
102 | ### 1.2. Format the data
103 | 
104 | Use the `tf.image` module to format the images: resize them to a fixed input size, and rescale the input channels to the `[-1,1]` range.
105 | 
106 | <!-- TODO(markdaoust): fix the keras_applications preprocessing functions to work in tf2 -->
107 | 
108 | ```python
109 | IMG_SIZE = 160 # All images will be resized to 160x160
110 | 
111 | def format_example(image, label):
112 |   image = tf.cast(image, tf.float32)
113 |   image = (image/127.5) - 1
114 |   image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
115 |   return image, label
116 | ```
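
The expression `image / 127.5 - 1` maps the 0 to 255 pixel range linearly onto `[-1, 1]`, which is the input range MobileNet V2 expects. A pure-Python check of the endpoints and the midpoint, using a hypothetical helper that applies the same arithmetic to a single value:

```python
def rescale(pixel):
    """Same arithmetic as format_example: map a 0-255 pixel value to [-1, 1]."""
    return pixel / 127.5 - 1

print(rescale(0), rescale(127.5), rescale(255))  # -1.0 0.0 1.0
```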
117 | 
118 | Apply this function to each item in the dataset using the `map` method:
119 | 
120 | ```python
121 | train = raw_train.map(format_example)
122 | validation = raw_validation.map(format_example)
123 | test = raw_test.map(format_example)
124 | ```
125 | 
126 | Shuffle and batch the data:
127 | 
128 | ```python
129 | BATCH_SIZE = 32
130 | SHUFFLE_BUFFER_SIZE = 1000
131 | 
132 | train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
133 | validation_batches = validation.batch(BATCH_SIZE)
134 | test_batches = test.batch(BATCH_SIZE)
135 | ```
136 | 
137 | Inspect a batch of data:
138 | 
139 | ```python
140 | for image_batch, label_batch in train_batches.take(1):
141 |   pass
142 | 
143 | image_batch.shape
144 | ```
145 | 
146 | ```
147 |     TensorShape([32, 160, 160, 3])
148 | ```
149 | 
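150 | ## 2. Create the base model from the pretrained network
151 | 
152 | You will create the base model from the **MobileNet V2** model developed at Google. This model is pretrained on the ImageNet dataset, a large dataset of 1.4M images and 1000 classes of web images. ImageNet has a fairly arbitrary research training dataset with categories like "jackfruit" and "syringe", but this base of knowledge will help us tell apart cats and dogs from our specific dataset.
153 | 
154 | First, you need to pick which layer of MobileNet V2 you will use for feature extraction. Obviously, the very last classification layer (on "top", as most diagrams of machine learning models go from bottom to top) is not very useful. Instead, you will follow the common practice of depending on the very last layer before the flatten operation. This layer is called the "bottleneck layer". The bottleneck features retain much more generality than the final/top layer.
155 | 
156 | Then, instantiate a MobileNet V2 model preloaded with weights trained on ImageNet. By specifying the `include_top=False` argument, you load a network that doesn't include the classification layers at the top, which is ideal for feature extraction.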
157 | 
158 | ```python
159 | IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)
160 | 
161 | # Create the base model from the pretrained model MobileNet V2
162 | base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
163 |                                                include_top=False,
164 |                                                weights='imagenet')
165 | ```
166 | 
167 | This feature extractor converts each 160x160x3 image to a 5x5x1280 block of features. See what it does to the example batch of images:
168 | 
169 | ```python
170 | feature_batch = base_model(image_batch)
171 | print(feature_batch.shape)
172 | ```
173 | 
174 | ```
175 |     (32, 5, 5, 1280)
176 | ```
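
Where does 5x5 come from? MobileNet V2's convolutional base reduces the spatial resolution by a total factor of 32, and its last layer (`out_relu` in the summary below) has 1280 channels. The short sketch below, assuming input sizes divisible by 32, reproduces the 160 to 5 reduction seen above and shows what a 224x224 input would give:

```python
OUTPUT_STRIDE = 32   # total downsampling factor of MobileNetV2's convolutional base
CHANNELS = 1280      # channels of the last layer when include_top=False (alpha=1.0)

def feature_shape(image_size):
    """(height, width, channels) of the feature block for a square input."""
    side = image_size // OUTPUT_STRIDE
    return (side, side, CHANNELS)

print(feature_shape(160))  # (5, 5, 1280)
print(feature_shape(224))  # (7, 7, 1280)
```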
177 | 
178 | 
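179 | ## 3. Feature extraction
180 | 
181 | You will freeze the convolutional base created in the previous step and use it as a feature extractor, add a classifier on top of it, and train the top-level classifier.
182 | 
183 | ### 3.1. Freeze the convolutional base
184 | 
185 | It's important to freeze the convolutional base before you compile and train the model. Freezing (by setting `layer.trainable = False`) prevents the weights in a given layer from being updated during training. MobileNet V2 has many layers, so setting the entire model's `trainable` flag to `False` will freeze all of them.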
186 | 
187 | 
188 | ```python
189 | base_model.trainable = False
190 | base_model.summary() # Look at the base model architecture
191 | ```
192 | 
193 | ```
194 |     Model: "mobilenetv2_1.00_160"
195 |     __________________________________________________________________________________________________
196 |     Layer (type)                    Output Shape         Param #     Connected to
197 |     ==================================================================================================
198 |     input_1 (InputLayer)            [(None, 160, 160, 3) 0
199 |     __________________________________________________________________________________________________
200 |     Conv1_pad (ZeroPadding2D)       (None, 161, 161, 3)  0           input_1[0][0]
201 |     __________________________________________________________________________________________________
202 |     Conv1 (Conv2D)                  (None, 80, 80, 32)   864         Conv1_pad[0][0]
203 |     __________________________________________________________________________________________________
204 |     ..... (many layers omitted here)
205 |     __________________________________________________________________________________________________
206 |     Conv_1_bn (BatchNormalizationV1 (None, 5, 5, 1280)   5120        Conv_1[0][0]
207 |     __________________________________________________________________________________________________
208 |     out_relu (ReLU)                 (None, 5, 5, 1280)   0           Conv_1_bn[0][0]
209 |     ==================================================================================================
210 |     ...
211 | ```
212 | 
213 | 
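214 | ### 3.2. Add a classification head
215 | 
216 | To generate predictions from the block of features, average over the 5x5 spatial locations using a `tf.keras.layers.GlobalAveragePooling2D` layer, which converts the features to a single 1280-element vector per image.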
217 | 
218 | ```python
219 | global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
220 | feature_batch_average = global_average_layer(feature_batch)
221 | print(feature_batch_average.shape)
222 | ```
223 | 
224 | `(32, 1280)`
225 | 
226 | 
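227 | Apply a `tf.keras.layers.Dense` layer to convert these features into a single prediction per image. You don't need an activation function here because this prediction will be treated as a `logit`, or a raw prediction value. Positive numbers predict class 1, negative numbers predict class 0.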
228 | 
229 | ```python
230 | prediction_layer = keras.layers.Dense(1)
231 | prediction_batch = prediction_layer(feature_batch_average)
232 | print(prediction_batch.shape)
233 | ```
234 | 
235 | ``` 
236 |     (32, 1)
237 | ```
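
A logit is unbounded; applying the sigmoid function maps it to a probability of class 1, and a logit of 0 corresponds to a probability of exactly 0.5, which is why the sign of the logit decides the class. Because the layer outputs raw logits, a loss that expects probabilities should not be given these values directly; Keras losses such as `tf.keras.losses.BinaryCrossentropy` accept `from_logits=True` for this case. A pure-Python sketch of the logit-to-probability mapping, with hypothetical helper names:

```python
import math

def sigmoid(logit):
    """Squash a raw logit into a probability of class 1."""
    return 1.0 / (1.0 + math.exp(-logit))

def predicted_class(logit):
    """A positive logit means probability > 0.5, i.e. class 1."""
    return 1 if logit > 0 else 0

for logit in (-2.0, 0.0, 3.0):
    print(logit, round(sigmoid(logit), 3), predicted_class(logit))
# -2.0 0.119 0
# 0.0 0.5 0
# 3.0 0.953 1
```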
238 | 
239 | 
240 | Now stack the feature extractor and these two layers using a `tf.keras.Sequential` model:
241 | 
242 | ```python
243 | model = tf.keras.Sequential([
244 |   base_model,
245 |   global_average_layer,
246 |   prediction_layer
247 | ])
248 | ```
249 | 
250 | ### 3.3. Compile the model
251 | 
252 | You must compile the model before training it. Since there are two classes, use a binary cross-entropy loss:
253 | 
254 | ```python
255 | base_learning_rate = 0.0001
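256 | model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate),
257 |               loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),  # the Dense(1) head outputs raw logits
258 |               metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0, name='accuracy')])  # logit > 0 means class 1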
259 |               
260 | model.summary()
261 | ```
262 | 
263 | ```
264 |     Model: "sequential"
265 |     _________________________________________________________________
266 |     Layer (type)                 Output Shape              Param #
267 |     =================================================================
268 |     mobilenetv2_1.00_160 (Model) (None, 5, 5, 1280)        2257984
269 |     _________________________________________________________________
270 |     global_average_pooling2d (Gl (None, 1280)              0
271 |     _________________________________________________________________
272 |     dense (Dense)                (None, 1)                 1281
273 |     =================================================================
274 |     Total params: 2,259,265
275 |     Trainable params: 1,281
276 |     Non-trainable params: 2,257,984
277 |     _________________________________________________________________
278 | ```
279 | 
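280 | The roughly 2.26M parameters in MobileNet are frozen, but there are 1,281 trainable parameters in the Dense layer. These are divided between two `tf.Variable` objects: the weights and the bias.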
281 | 
282 | 
283 | ```python
284 | len(model.trainable_variables)
285 | ```
286 | 
287 | `2`
288 | 
289 | 
290 | 
291 | ### 3.4. Train the model
292 | 
293 | After training for 10 epochs, you should see about 96% accuracy.
294 | 
295 | <!-- TODO(markdaoust): delete steps_per_epoch in TensorFlow r1.14/r2.0 -->
296 | 
297 | 
298 | ```python
299 | num_train, num_val, num_test = (
300 |   metadata.splits['train'].num_examples*weight/10
301 |   for weight in SPLIT_WEIGHTS
302 | )
303 | 
304 | initial_epochs = 10
305 | steps_per_epoch = round(num_train)//BATCH_SIZE
306 | validation_steps = 20
307 | 
308 | loss0,accuracy0 = model.evaluate(validation_batches, steps = validation_steps)
309 | ```
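
The code above turns the split weights into example counts (each weight is out of 10, since 8 + 1 + 1 = 10) and derives `steps_per_epoch` with integer division. The sketch below runs the same arithmetic with a hard-coded total; the 23,262 figure is an assumption about the size of the `cats_vs_dogs` train split in the tfds version used, so read the actual value from `metadata.splits['train'].num_examples`.

```python
SPLIT_WEIGHTS = (8, 1, 1)
BATCH_SIZE = 32
TOTAL = 23262  # assumed size of the tfds cats_vs_dogs train split; check metadata on your install

# Same expression as above: each weight is out of 10 (8 + 1 + 1).
num_train, num_val, num_test = (TOTAL * weight / 10 for weight in SPLIT_WEIGHTS)
steps_per_epoch = round(num_train) // BATCH_SIZE
print(num_train, steps_per_epoch)  # 18609.6 581
```

With weights that do not sum to 10, dividing by `sum(SPLIT_WEIGHTS)` instead of 10 keeps the same proportions.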
310 | 
311 | ```
312 |     20/20 [==============================] - 4s 219ms/step - loss: 3.1885 - accuracy: 0.6109
313 | ```
314 | 
315 | 
316 | 
317 | ```python
318 | print("initial loss: {:.2f}".format(loss0))
319 | print("initial accuracy: {:.2f}".format(accuracy0))
320 | ```
321 | 
322 | ```
323 |     initial loss: 3.19
324 |     initial accuracy: 0.61
325 | ```
326 | 
327 | 
328 | 
329 | ```python
330 | history = model.fit(train_batches,
331 |                     epochs=initial_epochs,
332 |                     validation_data=validation_batches)
333 | ```
334 | 
335 | ```
336 |     Epoch 1/10
337 |     581/581 [==============================] - 102s 175ms/step - loss: 1.8917 - accuracy: 0.7606 - val_loss: 0.8860 - val_accuracy: 0.8828
338 |     ...
339 |     Epoch 10/10
340 |     581/581 [==============================] - 96s 165ms/step - loss: 0.4921 - accuracy: 0.9381 - val_loss: 0.1847 - val_accuracy: 0.9719
341 | ```
342 | 
343 | ### 3.5. Learning curves
344 | 
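345 | Let's take a look at the learning curves of the training and validation accuracy/loss when using the MobileNet V2 base model as a fixed feature extractor.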
346 | 
347 | ```python
348 | acc = history.history['accuracy']
349 | val_acc = history.history['val_accuracy']
350 | 
351 | loss = history.history['loss']
352 | val_loss = history.history['val_loss']
353 | 
354 | plt.figure(figsize=(8, 8))
355 | plt.subplot(2, 1, 1)
356 | plt.plot(acc, label='Training Accuracy')
357 | plt.plot(val_acc, label='Validation Accuracy')
358 | plt.legend(loc='lower right')
359 | plt.ylabel('Accuracy')
360 | plt.ylim([min(plt.ylim()),1])
361 | plt.title('Training and Validation Accuracy')
362 | 
363 | plt.subplot(2, 1, 2)
364 | plt.plot(loss, label='Training Loss')
365 | plt.plot(val_loss, label='Validation Loss')
366 | plt.legend(loc='upper right')
367 | plt.ylabel('Cross Entropy')
368 | plt.ylim([0,1.0])
369 | plt.title('Training and Validation Loss')
370 | plt.xlabel('epoch')
371 | plt.show()
372 | ```
373 | 
374 | 
375 | ![png](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning_files/output_50_0.png)
376 | 
377 | 
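378 | *Note: If you are wondering why the validation metrics are clearly better than the training metrics, the main factor is that layers like `tf.keras.layers.BatchNormalization` and `tf.keras.layers.Dropout` affect accuracy during training. They are turned off when calculating validation loss.*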
379 | 
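380 | To a lesser extent, it is also because training metrics report the average over an epoch, while validation metrics are evaluated after the epoch, so the validation metrics see a model that has trained slightly longer.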
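A toy illustration of the averaging effect, using made-up per-batch accuracies rather than real training output. It approximates "the model at the end of the epoch" by the last batch's accuracy, which is a simplification.

```python
# Hypothetical per-batch training accuracies within one epoch; the model
# keeps improving as the epoch progresses.
batch_accuracies = [0.70, 0.78, 0.84, 0.88, 0.91, 0.93]

# The training metric reported for the epoch averages over all batches,
# so the early (worse) batches pull it down.
reported_train_acc = sum(batch_accuracies) / len(batch_accuracies)

# Validation runs once, after the last batch, on the final weights.
# Here the last batch's accuracy stands in for that end-of-epoch model.
end_of_epoch_acc = batch_accuracies[-1]

print(f"reported training accuracy: {reported_train_acc:.2f}")  # 0.84
print(f"end-of-epoch accuracy:      {end_of_epoch_acc:.2f}")    # 0.93
```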
381 | 
382 | ## 4. Fine-tuning
383 | 
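384 | In our feature extraction experiment, you were only training a few layers on top of the MobileNet V2 base model. The weights of the pre-trained network were not updated during training.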
385 | 
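386 | One way to increase performance even further is to train (or "fine-tune") the weights of the top layers of the pre-trained model alongside the classifier you added. The training process forces the weights to be tuned from generic feature maps to features associated specifically with our dataset.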
387 | 
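388 | *Note: This should only be attempted after you have trained the top-level classifier with the pre-trained model set to non-trainable. If you add a randomly initialized classifier on top of a pre-trained model and attempt to train all layers jointly, the magnitude of the gradient updates will be too large (due to the random weights from the classifier) and your pre-trained model will forget what it has learned.*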
389 | 
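390 | Also, you should try to fine-tune a small number of top layers rather than the whole MobileNet model. In most convolutional networks, the higher up a layer is, the more specialized it is. The first few layers learn very simple and generic features that generalize to almost all types of images. As you go higher up, the features become increasingly specific to the dataset on which the model was trained. The goal of fine-tuning is to adapt these specialized features to work with the new dataset, rather than overwrite the generic learning.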
391 | 
392 | ### 4.1. Un-freeze the top layers of the model
393 | 
394 | 
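395 | All you need to do is unfreeze the `base_model` and set the bottom layers to be un-trainable. Then recompile the model (necessary for these changes to take effect) and resume training.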
396 | 
397 | 
398 | ```python
399 | base_model.trainable = True
400 | 
401 | # Check how many layers are in the base model
402 | print("Number of layers in the base model: ", len(base_model.layers))
403 | 
404 | # Fine-tune from this layer onwards
405 | fine_tune_at = 100
406 | 
407 | # Freeze all the layers before the `fine_tune_at` layer
408 | for layer in base_model.layers[:fine_tune_at]:
409 |   layer.trainable = False
410 | ```
411 | ```
412 |     Number of layers in the base model:  155
413 | ```
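The slicing logic can be checked without TensorFlow. `FakeLayer` below is a hypothetical stand-in that only carries a `trainable` flag; the 155-layer count comes from the output above.

```python
from dataclasses import dataclass


@dataclass
class FakeLayer:
    """Hypothetical stand-in for a Keras layer; only tracks `trainable`."""
    name: str
    trainable: bool = True   # like base_model.trainable = True


# 155 layers, matching the printed size of the MobileNet V2 base model.
base_layers = [FakeLayer(f"layer_{i}") for i in range(155)]

fine_tune_at = 100
for layer in base_layers[:fine_tune_at]:   # indices 0..99 are frozen again
    layer.trainable = False

num_trainable = sum(layer.trainable for layer in base_layers)
print(num_trainable)  # 55
```

Counting layers is not the same as counting variables: many layers own no weights at all, so the 55 unfrozen layers do not map one-to-one onto entries in `model.trainable_variables`.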
414 | 
415 | ### 4.2. Compile the model
416 | 
417 | Compile the model using a much lower learning rate:
418 | 
419 | ```python
420 | model.compile(loss='binary_crossentropy',
421 |               optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate/10),
422 |               metrics=['accuracy'])
423 |               
424 | model.summary()
425 | ```
426 | 
427 | ```
428 |     Model: "sequential"
429 |     _________________________________________________________________
430 |     Layer (type)                 Output Shape              Param #
431 |     =================================================================
432 |     mobilenetv2_1.00_160 (Model) (None, 5, 5, 1280)        2257984
433 |     _________________________________________________________________
434 |     global_average_pooling2d (Gl (None, 1280)              0
435 |     _________________________________________________________________
436 |     dense (Dense)                (None, 1)                 1281
437 |     =================================================================
438 |     Total params: 2,259,265
439 |     Trainable params: 1,863,873
440 |     Non-trainable params: 395,392
441 |     _________________________________________________________________
442 | ```
443 | 
444 | ```python
445 | len(model.trainable_variables)
446 | ```
447 | 
448 | ``` 
449 |    58
450 | ```
451 | 
452 | 
453 | 
454 | ### 4.3. Continue training the model
455 | 
456 | If the earlier feature-extraction training converged, this step should add a few more percentage points of accuracy.
457 | 
458 | ```python
459 | fine_tune_epochs = 10
460 | total_epochs =  initial_epochs + fine_tune_epochs
461 | 
462 | history_fine = model.fit(train_batches,
463 |                          epochs=total_epochs,
464 |                          initial_epoch = initial_epochs,
465 |                          validation_data=validation_batches)
466 | ```
467 | ```
468 |     ...
469 |     Epoch 20/20
470 |     581/581 [==============================] - 116s 199ms/step - loss: 0.1243 - accuracy: 0.9849 - val_loss: 0.1121 - val_accuracy: 0.9875
471 | ```
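Passing `initial_epoch` makes `fit` resume the epoch counter instead of restarting it, and `epochs` is then read as the final epoch rather than the number of epochs to run. A plain-Python sketch of the numbering, using the same values as above:

```python
initial_epochs = 10
fine_tune_epochs = 10
total_epochs = initial_epochs + fine_tune_epochs

# fit(epochs=total_epochs, initial_epoch=initial_epochs) runs the zero-based
# epoch indices initial_epochs .. total_epochs - 1, i.e. 10 more epochs...
epoch_indices = list(range(initial_epochs, total_epochs))

# ...and the progress bar prints them one-based.
progress_labels = [f"Epoch {i + 1}/{total_epochs}" for i in epoch_indices]

print(len(epoch_indices))   # 10
print(progress_labels[0])   # Epoch 11/20
print(progress_labels[-1])  # Epoch 20/20
```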
472 | 
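473 | Let's take a look at the learning curves of the training and validation accuracy/loss when fine-tuning the last few layers of the MobileNet V2 base model and training the classifier on top of it. Watch the gap between the two loss curves: if the validation loss climbs well above the training loss, the model is overfitting. Some overfitting is a real risk here, because the new training set is relatively small and similar to the original MobileNet V2 datasets.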
474 | 
475 | After fine-tuning, the model's validation accuracy reaches roughly 98% (0.9875 in the final epoch of the run above).
476 | 
477 | ```python
478 | acc += history_fine.history['accuracy']
479 | val_acc += history_fine.history['val_accuracy']
480 | 
481 | loss += history_fine.history['loss']
482 | val_loss += history_fine.history['val_loss']
483 | 
484 | plt.figure(figsize=(8, 8))
485 | plt.subplot(2, 1, 1)
486 | plt.plot(acc, label='Training Accuracy')
487 | plt.plot(val_acc, label='Validation Accuracy')
488 | plt.ylim([0.8, 1])
489 | plt.plot([initial_epochs-1,initial_epochs-1],
490 |           plt.ylim(), label='Start Fine Tuning')
491 | plt.legend(loc='lower right')
492 | plt.title('Training and Validation Accuracy')
493 | 
494 | plt.subplot(2, 1, 2)
495 | plt.plot(loss, label='Training Loss')
496 | plt.plot(val_loss, label='Validation Loss')
497 | plt.ylim([0, 1.0])
498 | plt.plot([initial_epochs-1,initial_epochs-1],
499 |          plt.ylim(), label='Start Fine Tuning')
500 | plt.legend(loc='upper right')
501 | plt.title('Training and Validation Loss')
502 | plt.xlabel('epoch')
503 | plt.show()
504 | ```
505 | 
506 | ![png](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning_files/output_67_0.png)
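The "Start Fine Tuning" line is drawn at `initial_epochs-1` because `acc += ...` appends the fine-tuning history to the end of the first history, and `plt.plot(acc)` uses the list index as the x coordinate. The accuracy values below are made up; only the lengths and indices matter.

```python
initial_epochs = 10
fine_tune_epochs = 10

# Hypothetical per-epoch accuracies standing in for history.history['accuracy']
# and history_fine.history['accuracy'].
acc = [0.76 + 0.02 * i for i in range(initial_epochs)]
acc_fine = [0.95 + 0.003 * i for i in range(fine_tune_epochs)]

acc += acc_fine   # list += list extends the first list in place

# plt.plot(acc) draws point k at x = k, so x runs from 0 to 19.
last_feature_extraction_x = initial_epochs - 1   # x = 9
first_fine_tune_x = initial_epochs               # x = 10

print(len(acc))  # 20
```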
507 | 
508 | 
509 | ## 5. Summary
510 | 
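511 | * **Using a pre-trained model for feature extraction:**
512 | When working with a small dataset, it is common to take advantage of features learned by a model trained on a larger dataset in the same domain. This is done by instantiating the pre-trained model and adding a fully-connected classifier on top. The pre-trained model is "frozen" and only the weights of the classifier get updated during training. In this case, the convolutional base extracted all the features associated with each image, and you just trained a classifier that determines the image class from that set of extracted features.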
513 | 
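514 | * **Fine-tuning a pre-trained model:**
515 | To further improve performance, you can repurpose the top-level layers of the pre-trained model for the new dataset via fine-tuning. In this case, you tuned the weights so that the model learned high-level features specific to the dataset. This technique is usually recommended when the training dataset is large and very similar to the original dataset that the pre-trained model was trained on.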
516 | 
517 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-transfer_learning.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-transfer_learning.html)
518 | > English version: [https://tensorflow.google.cn/beta/tutorials/images/transfer_learning](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning)
519 | > Suggest translation changes (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/transfer_learning.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/transfer_learning.md)
520 | 


--------------------------------------------------------------------------------
/r2/tutorials/images/transfer_learning_files/transfer_learning_17_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/images/transfer_learning_files/transfer_learning_17_0.png


--------------------------------------------------------------------------------
/r2/tutorials/images/transfer_learning_files/transfer_learning_17_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/images/transfer_learning_files/transfer_learning_17_1.png


--------------------------------------------------------------------------------
/r2/tutorials/images/transfer_learning_files/transfer_learning_53_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/images/transfer_learning_files/transfer_learning_53_0.png


--------------------------------------------------------------------------------
/r2/tutorials/images/transfer_learning_files/transfer_learning_70_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/images/transfer_learning_files/transfer_learning_70_0.png


--------------------------------------------------------------------------------
/r2/tutorials/keras/basic_classification.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Train your first neural network: basic classification with Fashion MNIST"
  3 | categories: 
  4 |     - tensorflow2官方教程
  5 | tags: 
  6 |     - tensorflow2.0
  7 | top: 1911
  8 | abbrlink: tensorflow/tf2-tutorials-keras-basic_classification
  9 | ---
 10 | 
 11 | # Train Your First Neural Network: Basic Classification with Fashion MNIST (TensorFlow 2.0 official tutorial, translated)
 12 | 
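 13 | This guide trains a neural network model to classify images of clothing, like sneakers and shirts. It's okay if you don't understand all the details; this is a fast-paced overview of a complete TensorFlow program, and the details are explained later.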
 14 | 
 15 | This guide uses [tf.keras](https://tensorflow.google.cn/guide/keras), a high-level API for building and training models in TensorFlow.
 16 | 
 17 | Install
 18 | 
 19 | ```shell
 20 | pip install tensorflow==2.0.0-alpha0
 21 | ```
 22 | 
 23 | Import the libraries
 24 | 
 25 | ```python
 26 | from __future__ import absolute_import, division, print_function, unicode_literals
 27 | 
 28 | # TensorFlow and tf.keras
 29 | import tensorflow as tf
 30 | from tensorflow import keras
 31 | 
 32 | # Helper libraries
 33 | import numpy as np
 34 | import matplotlib.pyplot as plt
 35 | 
 36 | print(tf.__version__)
 37 | ```
 38 | 
 39 | ## 1. Import the Fashion MNIST dataset
 40 | 
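 41 | This guide uses the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset, which contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 by 28 pixels), as seen here: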
 42 | 
 43 | <table>
 44 |   <tr><td>
 45 |     <img src="https://tensorflow.google.cn/images/fashion-mnist-sprite.png"
 46 |          alt="Fashion MNIST sprite"  width="600">
 47 |   </td></tr>
 48 |   <tr><td align="center">
 49 |     <b>Figure 1.</b> <a href="https://github.com/zalandoresearch/fashion-mnist">Fashion-MNIST samples</a>
 50 |   </td></tr>
 51 | </table>
 52 | 
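 53 | Fashion MNIST is intended as a drop-in replacement for the classic MNIST dataset, often used as the "Hello, World" of machine learning programs for computer vision. The [MNIST](http://yann.lecun.com/exdb/mnist/) dataset contains images of handwritten digits (0, 1, 2, etc.) in the same format as the articles of clothing used in this tutorial.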
 54 | 
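 55 | This guide uses Fashion MNIST for variety, and because it's a slightly more challenging problem than regular [MNIST](http://yann.lecun.com/exdb/mnist/). Both datasets are relatively small and are used to verify that an algorithm works as expected. They're good starting points to test and debug code.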
 56 | 
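 57 | We will use 60,000 images to train the network and 10,000 images to evaluate how accurately the network learned to classify images. You can access Fashion MNIST directly from TensorFlow; just import and load the data: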
 58 | 
 59 | ```python
 60 | fashion_mnist = keras.datasets.fashion_mnist
 61 | 
 62 | (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
 63 | ```
 64 | 
 65 | Loading the dataset returns four NumPy arrays:
 66 | 
 67 | * The `train_images` and `train_labels` arrays are the *training set*, the data the model uses to learn.
 68 | * The model is tested against the *test set*, the `test_images` and `test_labels` arrays.
 69 | 
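 70 | The images are 28x28 NumPy arrays, with pixel values ranging from 0 to 255. The *labels* are an array of integers ranging from 0 to 9. These correspond to the *class* of clothing the image represents: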
 71 | 
 72 | <table>
 73 |   <tr>
 74 |     <th>Label</th>
 75 |     <th>Class</th>
 76 |   </tr>
 77 |   <tr>
 78 |     <td>0</td>
 79 |     <td>T-shirt/top</td>
 80 |   </tr>
 81 |   <tr>
 82 |     <td>1</td>
 83 |     <td>Trouser</td>
 84 |   </tr>
 85 |   <tr>
 86 |     <td>2</td>
 87 |     <td>Pullover</td>
 88 |   </tr>
 89 |   <tr>
 90 |     <td>3</td>
 91 |     <td>Dress</td>
 92 |   </tr>
 93 |   <tr>
 94 |     <td>4</td>
 95 |     <td>Coat</td>
 96 |   </tr>
 97 |   <tr>
 98 |     <td>5</td>
 99 |     <td>Sandal</td>
100 |   </tr>
101 |   <tr>
102 |     <td>6</td>
103 |     <td>Shirt</td>
104 |   </tr>
105 |   <tr>
106 |     <td>7</td>
107 |     <td>Sneaker</td>
108 |   </tr>
109 |   <tr>
110 |     <td>8</td>
111 |     <td>Bag</td>
112 |   </tr>
113 |   <tr>
114 |     <td>9</td>
115 |     <td>Ankle boot</td>
116 |   </tr>
117 | </table>
118 | 
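119 | Each image is mapped to a single label. Since the *class names* are not included with the dataset, store them here to use later when plotting the images: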
120 | 
121 | ```python
122 | class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
123 |                'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
124 | ```
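Looking up a class name is plain list indexing. The labels used here are the first three values of `train_labels` shown in the next section (`9, 0, 0`).

```python
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

sample_labels = [9, 0, 0]   # first three entries of train_labels
sample_names = [class_names[label] for label in sample_labels]
print(sample_names)  # ['Ankle boot', 'T-shirt/top', 'T-shirt/top']
```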
125 | 
126 | ## 2. Explore the data
127 | 
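128 | Let's explore the format of the dataset before training the model. The following shows there are 60,000 images in the training set, with each image represented as 28 x 28 pixels: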
129 | 
130 | ```python
131 | train_images.shape
132 | ```
133 | 
134 | `(60000, 28, 28)`
135 | 
136 | Likewise, there are 60,000 labels in the training set:
137 | 
138 | ```python
139 | len(train_labels)
140 | ```
141 | 
142 | `60000`
143 | 
144 | Each label is an integer between 0 and 9:
145 | 
146 | ```python
147 | train_labels
148 | ```
149 | 
150 | `array([9, 0, 0, ..., 3, 0, 5], dtype=uint8)`
151 | 
152 | There are 10,000 images in the test set. Again, each image is represented as 28 x 28 pixels:
153 | 
154 | ```python
155 | test_images.shape
156 | ```
157 | 
158 | `(10000, 28, 28)`
159 | 
160 | And the test set contains 10,000 image labels:
161 | 
162 | ```python
163 | len(test_labels)
164 | ```
165 | 
166 | `10000`
167 | 
168 | ## 3. Preprocess the data
169 | 
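170 | The data must be preprocessed before training the network. If you inspect the first image in the training set, you will see that the pixel values fall in the range of 0 to 255: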
171 | 
172 | ```python
173 | plt.figure()
174 | plt.imshow(train_images[0])
175 | plt.colorbar()
176 | plt.grid(False)
177 | plt.show()
178 | ```
179 | 
180 | ![](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_21_0.png)
181 | 
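182 | Scale these values to a range of 0 to 1 before feeding them to the neural network model. To do so, divide the values by 255.0; the division also converts the integer pixel values to floating point.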
183 | 
184 | It's important that the training set and the test set be preprocessed in the same way:
185 | 
186 | ```python
187 | train_images = train_images / 255.0
188 | 
189 | test_images = test_images / 255.0
190 | ```
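A plain-Python sketch of the same scaling, wrapped in one helper so the training and test data cannot drift apart. `scale_pixels` is a hypothetical name used only here; the tutorial itself divides the NumPy arrays directly.

```python
def scale_pixels(pixels):
    """Map 0-255 integer intensities to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]


train_row = [0, 51, 255]    # made-up pixel values
test_row = [255, 102, 0]

# The same function is applied to both splits.
print(scale_pixels(train_row))  # [0.0, 0.2, 1.0]
print(scale_pixels(test_row))   # [1.0, 0.4, 0.0]
```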
191 | 
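192 | To verify that the data is in the correct format and that you're ready to build and train the network, let's display the first 25 images from the training set and show the class name below each image.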
193 | 
194 | ```python
195 | plt.figure(figsize=(10,10))
196 | for i in range(25):
197 |     plt.subplot(5,5,i+1)
198 |     plt.xticks([])
199 |     plt.yticks([])
200 |     plt.grid(False)
201 |     plt.imshow(train_images[i], cmap=plt.cm.binary)
202 |     plt.xlabel(class_names[train_labels[i]])
203 | plt.show()
204 | ```
205 | 
206 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_25_0.png)
207 | 
208 | ## 4. Build the model
209 | 
210 | Building the neural network requires configuring the layers of the model, then compiling the model.
211 | 
212 | ### 4.1. Set up the layers
213 | 
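214 | The basic building block of a neural network is the *layer*. Layers extract representations from the data fed into them. Hopefully, these representations are meaningful for the problem at hand.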
215 | 
216 | Most of deep learning consists of chaining together simple layers. Most layers, such as `tf.keras.layers.Dense`, have parameters that are learned during training.
217 | 
218 | ```python
219 | model = keras.Sequential([
220 |     keras.layers.Flatten(input_shape=(28, 28)),
221 |     keras.layers.Dense(128, activation='relu'),
222 |     keras.layers.Dense(10, activation='softmax')
223 | ])
224 | ```
225 | 
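226 | The first layer in this network, `tf.keras.layers.Flatten`, transforms the format of the images from a two-dimensional array (28 by 28 pixels) to a one-dimensional array (28 * 28 = 784 pixels). Think of this layer as unstacking rows of pixels in the image and lining them up end to end. This layer has no parameters to learn; it only reformats the data.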
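What flattening does to a single image can be sketched with plain Python lists, assuming a row-by-row order (row 0 first, then row 1, and so on). The position-tagged "image" is made up for illustration.

```python
ROWS, COLS = 28, 28

# Hypothetical 28x28 "image" whose pixels record their own (row, col) position.
image = [[(r, c) for c in range(COLS)] for r in range(ROWS)]

# Unstack the rows and line them up end to end.
flat = [pixel for row in image for pixel in row]

print(len(flat))  # 784
print(flat[28])   # (1, 0), the first pixel of the second row
```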
227 | 
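228 | After the pixels are flattened, the network consists of a sequence of two `tf.keras.layers.Dense` layers. These are densely connected, or fully connected, neural layers. The first `Dense` layer has 128 nodes (or neurons). The second (and last) layer is a 10-node *softmax* layer that returns an array of 10 probability scores that sum to 1. Each node contains a score that indicates the probability that the current image belongs to one of the 10 classes.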
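A minimal softmax in plain Python shows why the last layer's 10 outputs can be read as probabilities; the raw scores are made up. The snippet also counts the `Dense` parameters, with each unit holding one weight per input plus a bias.

```python
import math


def softmax(scores):
    # Subtracting the max keeps exp() from overflowing; it cancels out in
    # the ratio, so the result is unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the 10 clothing classes.
scores = [1.0, 0.5, -1.2, 0.0, 2.3, -0.7, 0.1, 3.0, -2.0, 0.4]
probs = softmax(scores)
print(round(sum(probs), 6))          # 1.0
print(probs.index(max(probs)))       # 7, the class with the highest score

# Parameter counts: weights (inputs x units) plus one bias per unit.
hidden_params = 784 * 128 + 128      # first Dense layer
output_params = 128 * 10 + 10        # softmax Dense layer
print(hidden_params, output_params)  # 100480 1290
```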
229 | 
230 | ### 4.2. Compile the model
231 | 
232 | Before the model is ready for training, it needs a few more settings. These are added during the model's *compile* step:
233 | 
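234 | * *Loss function*: measures how far the model's predictions are from the true labels during training. You want to minimize this function to "steer" the model in the right direction.
235 | * *Optimizer*: how the model is updated based on the data it sees and its loss function.
236 | * *Metrics*: used to monitor the training and testing steps. The following example uses *accuracy*, the fraction of the images that are correctly classified.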
237 | 
238 | ```python
239 | model.compile(optimizer='adam',
240 |               loss='sparse_categorical_crossentropy',
241 |               metrics=['accuracy'])
242 | ```
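`sparse_categorical_crossentropy` fits here because the labels are integers 0-9 rather than one-hot vectors. For a single example, the loss reduces to the negative log of the probability assigned to the true class. A plain-Python sketch with a made-up prediction:

```python
import math


def sparse_categorical_crossentropy(true_label, probs):
    """Loss for one example whose label is an integer class index."""
    # No one-hot vector is needed: only the probability of the true class
    # enters the loss.
    return -math.log(probs[true_label])


# Hypothetical softmax output that puts most of its mass on class 9.
probs = [0.05] * 9 + [0.55]

loss_if_label_is_9 = sparse_categorical_crossentropy(9, probs)  # about 0.598
loss_if_label_is_0 = sparse_categorical_crossentropy(0, probs)  # about 2.996
```

A confident correct prediction gives a small loss, and a confident wrong one gives a large loss, which is what the optimizer pushes against.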
243 | 
244 | ## 5. Train the model
245 | 
246 | Training the neural network model requires the following steps:
247 | 
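248 | 1. Feed the training data to the model. In this example, the training data is in the `train_images` and `train_labels` arrays.
249 | 2. The model learns to associate images and labels.
250 | 3. Ask the model to make predictions about a test set, in this example the `test_images` array, and verify that the predictions match the labels from the `test_labels` array.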
251 | 
252 | To start training, call the `model.fit` method, so called because it "fits" the model to the training data:
253 | 
254 | ```python
255 | model.fit(train_images, train_labels, epochs=5)
256 | ```
257 | 
258 | ```shell
259 | Epoch 1/5
260 | 60000/60000 [==============================] - 5s 87us/step - loss: 0.5033 - acc: 0.8242
261 | ......
262 | Epoch 5/5
263 | 60000/60000 [==============================] - 5s 88us/step - loss: 0.2941 - acc: 0.8917
264 | ```
265 | 
266 | As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 0.89 (89%) on the training data.
267 | 
268 | ## 6. Evaluate accuracy
269 | 
270 | Next, compare how the model performs on the test dataset:
271 | 
272 | ```python
273 | test_loss, test_acc = model.evaluate(test_images, test_labels)
274 | 
275 | print('\nTest accuracy:', test_acc)
276 | ```
277 | 
278 | Output:
279 | 
280 | ```output
281 | 10000/10000 [==============================] - 1s 50us/step
282 | Test accuracy: 0.8734
283 | ```
284 | 
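285 | It turns out that the accuracy on the test dataset is a little lower than the accuracy on the training dataset (0.8734 on the test set versus 0.8917 in the last training epoch). This gap between training accuracy and test accuracy represents *overfitting*: a machine learning model overfits when it performs worse on new, previously unseen data than on its training data, that is, when it generalizes poorly.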
286 | 
287 | ## 7. Make predictions
288 | 
289 | With the model trained, you can use it to make predictions about some images.
290 | 
291 | ```python
292 | predictions = model.predict(test_images)
293 | ```
294 | 
295 | Here, the model has predicted the label for each image in the test set. Let's take a look at the first prediction:
296 | 
297 | ```python
298 | predictions[0]
299 | ```
300 | 
301 | Output:
302 | 
303 | ```output
304 | array([6.2482708e-05, 2.4860196e-08, 9.7165821e-07, 4.7436039e-08,
305 |        2.0804382e-06, 1.3316551e-02, 9.8731316e-06, 3.4591161e-02,
306 |        1.2390658e-04, 9.5189297e-01], dtype=float32)
307 | ```
308 | 
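309 | A prediction is an array of 10 numbers. They represent the model's "confidence" that the image corresponds to each of the 10 different articles of clothing. You can see which label has the highest confidence value: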
310 | 
311 | ```python
312 | np.argmax(predictions[0])
313 | ```
314 | 
315 | `9`
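`np.argmax` returns the index of the largest entry. The same lookup in plain Python, applied to the ten probabilities printed above:

```python
# The ten probabilities printed for predictions[0] above.
prediction = [6.2482708e-05, 2.4860196e-08, 9.7165821e-07, 4.7436039e-08,
              2.0804382e-06, 1.3316551e-02, 9.8731316e-06, 3.4591161e-02,
              1.2390658e-04, 9.5189297e-01]

# Plain-Python equivalent of np.argmax for a 1-D sequence; like argmax,
# max() returns the first index on ties.
predicted_label = max(range(len(prediction)), key=lambda i: prediction[i])

print(predicted_label)            # 9
print(round(sum(prediction), 3))  # 1.0 (softmax output, up to rounding)
```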
316 | 
317 | So the model is most confident that this image is an ankle boot, or `class_names[9]`. Examining the test label shows whether this prediction is correct:
318 | 
319 | ```python
320 | test_labels[0]
321 | ```
322 | 
323 | `9`
324 | 
325 | We can graph this to look at the full set of 10 class predictions:
326 | 
327 | ```python
328 | def plot_image(i, predictions_array, true_label, img):
329 |   predictions_array, true_label, img = predictions_array[i], true_label[i], img[i]
330 |   plt.grid(False)
331 |   plt.xticks([])
332 |   plt.yticks([])
333 | 
334 |   plt.imshow(img, cmap=plt.cm.binary)
335 | 
336 |   predicted_label = np.argmax(predictions_array)
337 |   if predicted_label == true_label:
338 |     color = 'blue'
339 |   else:
340 |     color = 'red'
341 | 
342 |   plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
343 |                                 100*np.max(predictions_array),
344 |                                 class_names[true_label]),
345 |                                 color=color)
346 | 
347 | def plot_value_array(i, predictions_array, true_label):
348 |   predictions_array, true_label = predictions_array[i], true_label[i]
349 |   plt.grid(False)
350 |   plt.xticks([])
351 |   plt.yticks([])
352 |   thisplot = plt.bar(range(10), predictions_array, color="#777777")
353 |   plt.ylim([0, 1])
354 |   predicted_label = np.argmax(predictions_array)
355 | 
356 |   thisplot[predicted_label].set_color('red')
357 |   thisplot[true_label].set_color('blue')
358 | ```
359 | 
360 | Let's look at the 0th image, its prediction, and the prediction array.
361 | 
362 | ```python
363 | i = 0
364 | plt.figure(figsize=(6,3))
365 | plt.subplot(1,2,1)
366 | plot_image(i, predictions, test_labels, test_images)
367 | plt.subplot(1,2,2)
368 | plot_value_array(i, predictions,  test_labels)
369 | plt.show()
370 | ```
371 | 
372 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_48_0.png)
373 | 
374 | ```python
375 | i = 12
376 | plt.figure(figsize=(6,3))
377 | plt.subplot(1,2,1)
378 | plot_image(i, predictions, test_labels, test_images)
379 | plt.subplot(1,2,2)
380 | plot_value_array(i, predictions,  test_labels)
381 | plt.show()
382 | ```
383 | 
384 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_49_0.png)
385 | 
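386 | Let's plot several images with their predictions. Correct prediction labels are blue and incorrect prediction labels are red. The number gives the percentage (out of 100) for the predicted label. Note that the model can be wrong even when it is very confident.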
387 | 
388 | ```python
389 | # Plot the first X test images, their predicted labels, and the true labels.
390 | # Color correct predictions in blue and incorrect predictions in red.
391 | num_rows = 5
392 | num_cols = 3
393 | num_images = num_rows*num_cols
394 | plt.figure(figsize=(2*2*num_cols, 2*num_rows))
395 | for i in range(num_images):
396 |   plt.subplot(num_rows, 2*num_cols, 2*i+1)
397 |   plot_image(i, predictions, test_labels, test_images)
398 |   plt.subplot(num_rows, 2*num_cols, 2*i+2)
399 |   plot_value_array(i, predictions, test_labels)
400 | plt.show()
401 | ```
402 | 
403 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_51_0.png)
404 | 
405 | Finally, use the trained model to make a prediction about a single image.
406 | 
407 | ```python
408 | # Grab an image from the test dataset
409 | img = test_images[0]
410 | 
411 | print(img.shape)
412 | ```
413 | 
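414 | `tf.keras` models are optimized to make predictions on a *batch*, or collection, of examples at once. Accordingly, even though you're using a single image, you need to add it to a batch: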
415 | 
416 | ```python
417 | # Add the image to a batch where it's the only member.
418 | img = (np.expand_dims(img,0))
419 | 
420 | print(img.shape)
421 | ```
422 | 
423 | `(1, 28, 28)`
424 | 
425 | Now predict the correct label for this image:
426 | 
427 | ```python
428 | predictions_single = model.predict(img)
429 | 
430 | print(predictions_single)
431 | ```
432 | 
433 | ```python
434 | plot_value_array(0, predictions_single, test_labels)
435 | _ = plt.xticks(range(10), class_names, rotation=45)
436 | ```
437 | 
438 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_58_0.png)
439 | 
440 | `model.predict` returns one row of 10 probabilities for each image in the batch. Grab the predictions for our (only) image in the batch:
441 | 
442 | ```python
443 | np.argmax(predictions_single[0])
444 | ```
445 | 
446 | `9`
447 | 
448 | As before, the model predicts a label of 9.
449 | 
450 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_classification.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_classification.html)
451 | > English version: [https://tensorflow.google.cn/beta/tutorials/keras/basic_classification](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification)
452 | > Suggest translation changes (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_classification.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_classification.md)
453 | 


--------------------------------------------------------------------------------
/r2/tutorials/keras/basic_regression.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Regression in practice: predict fuel efficiency"
  3 | categories: tensorflow2官方教程
  4 | tags: tensorflow2.0教程
  5 | top: 1914
  6 | abbrlink: tensorflow/tf2-tutorials-keras-basic_regression
  7 | ---
  8 | 
  9 | # Regression in Practice: Predict Fuel Efficiency (TensorFlow 2.0 official tutorial, translated)
 10 | 
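 11 | In a *regression* problem, we aim to predict the output of a continuous value, like a price or a probability.
 12 | Contrast this with a *classification* problem, where we aim to select a class from a list of classes (for example, given a picture that contains an apple or an orange, recognizing which fruit is in the picture).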
 13 | 
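 14 | This tutorial uses the classic [Auto MPG](https://archive.ics.uci.edu/ml/datasets/auto+mpg) dataset and builds a model to predict the fuel efficiency of late-1970s and early-1980s automobiles. To do this, we'll provide the model with a description of many automobiles from that time period. This description includes attributes like cylinders, displacement, horsepower, and weight.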
 15 | 
 16 | This example uses the `tf.keras` API; see the [Keras guide](https://tensorflow.google.cn/guide/keras) for details.
 17 | 
 18 | ```shell
 19 | # Install seaborn, which is used below for pairplot data visualization
 20 | pip install seaborn
 21 | ```
 22 | 
 23 | ```python
 24 | from __future__ import absolute_import, division, print_function, unicode_literals
 25 | 
 26 | import pathlib
 27 | 
 28 | import matplotlib.pyplot as plt
 29 | import pandas as pd
 30 | import seaborn as sns
 31 | 
 32 | # TensorFlow 2 install command: pip install tensorflow==2.0.0-alpha0
 33 | import tensorflow as tf
 34 | 
 35 | from tensorflow import keras
 36 | from tensorflow.keras import layers
 37 | 
 38 | print(tf.__version__)
 39 | ```
 40 | 
 41 | ## 1. The Auto MPG dataset
 42 | 
 43 | The dataset is available from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/).
 44 | 
 45 | ### 1.1. Get the data
 46 | 
 47 | First download the dataset:
 48 | 
 49 | ```python
 50 | dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
 51 | dataset_path
 52 | ```
 53 | 
 54 | Import it using pandas:
 55 | 
 56 | ```python
 57 | column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
 58 |                 'Acceleration', 'Model Year', 'Origin']
 59 | raw_dataset = pd.read_csv(dataset_path, names=column_names,
 60 |                       na_values = "?", comment='\t',
 61 |                       sep=" ", skipinitialspace=True)
 62 | 
 63 | dataset = raw_dataset.copy()
 64 | dataset.tail()
 65 | ```
 66 | 
 67 | |     | MPG  | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | Origin |
 68 | |-----|------|-----------|--------------|------------|--------|--------------|------------|--------|
 69 | | 393 | 27.0 | 4         | 140.0        | 86.0       | 2790.0 | 15.6         | 82         | 1      |
 70 | | 394 | 44.0 | 4         | 97.0         | 52.0       | 2130.0 | 24.6         | 82         | 2      |
 71 | | 395 | 32.0 | 4         | 135.0        | 84.0       | 2295.0 | 11.6         | 82         | 1      |
 72 | | 396 | 28.0 | 4         | 120.0        | 79.0       | 2625.0 | 18.6         | 82         | 1      |
 73 | | 397 | 31.0 | 4         | 119.0        | 82.0       | 2720.0 | 19.4         | 82         | 1      |
 74 | 
 75 | ### 1.2. Clean the data
 76 | 
 77 | The dataset contains a few unknown values:
 78 | 
 79 | ```python
 80 | dataset.isna().sum()
 81 | ```
 82 | 
 83 | ```output
 84 | MPG             0
 85 | Cylinders       0
 86 | Displacement    0
 87 | Horsepower      6
 88 | Weight          0
 89 | Acceleration    0
 90 | Model Year      0
 91 | Origin          0
 92 | dtype: int64
 93 | ```
 94 | 
 95 | To keep this introductory tutorial simple, drop those rows:
 96 | 
 97 | ```python
 98 | dataset = dataset.dropna()
 99 | ```
100 | 
101 | The `"Origin"` column is really categorical, not numeric, so convert it to a one-hot encoding:
102 | 
103 | ```python
104 | origin = dataset.pop('Origin')
105 | ```
106 | 
107 | ```python
108 | dataset['USA'] = (origin == 1)*1.0
109 | dataset['Europe'] = (origin == 2)*1.0
110 | dataset['Japan'] = (origin == 3)*1.0
111 | dataset.tail()
112 | ```
113 | 
114 | |     | MPG  | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | USA | Europe | Japan |
115 | |-----|------|-----------|--------------|------------|--------|--------------|------------|-----|--------|-------|
116 | | 393 | 27.0 | 4         | 140.0        | 86.0       | 2790.0 | 15.6         | 82         | 1.0 | 0.0    | 0.0   |
117 | | 394 | 44.0 | 4         | 97.0         | 52.0       | 2130.0 | 24.6         | 82         | 0.0 | 1.0    | 0.0   |
118 | | 395 | 32.0 | 4         | 135.0        | 84.0       | 2295.0 | 11.6         | 82         | 1.0 | 0.0    | 0.0   |
119 | | 396 | 28.0 | 4         | 120.0        | 79.0       | 2625.0 | 18.6         | 82         | 1.0 | 0.0    | 0.0   |
120 | | 397 | 31.0 | 4         | 119.0        | 82.0       | 2720.0 | 19.4         | 82         | 1.0 | 0.0    | 0.0   |
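The one-hot step above can be restated in plain Python for a few rows. `one_hot_origin` is a hypothetical helper for this sketch; it mirrors `(origin == k)*1.0` for each origin code `k`.

```python
ORIGIN_COLUMNS = {1: 'USA', 2: 'Europe', 3: 'Japan'}


def one_hot_origin(code):
    # 1.0 in the column that matches the origin code, 0.0 everywhere else.
    return {name: float(code == k) for k, name in ORIGIN_COLUMNS.items()}


# Origin codes of rows 393, 394 and 395 in the tables above.
rows = [one_hot_origin(code) for code in (1, 2, 1)]
print(rows[1])  # {'USA': 0.0, 'Europe': 1.0, 'Japan': 0.0}
```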
121 | 
122 | ### 1.3. Split the data into training and test sets
123 | 
124 | Now split the dataset into a training set and a test set. We will use the test set in the final evaluation of our model.
125 | 
126 | ```python
127 | train_dataset = dataset.sample(frac=0.8,random_state=0)
128 | test_dataset = dataset.drop(train_dataset.index)
129 | ```
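The split sizes can be worked out by hand. This sketch assumes 398 raw rows (indices 0-397 in the tail shown earlier), minus the 6 rows with a missing `Horsepower` value; 80% of the remaining 392 rows rounds to 314, which is consistent with the `count` column of `train_stats` below. The random selection is a plain-Python stand-in and will not pick the same rows as `sample(random_state=0)`.

```python
import random

raw_rows = 398                  # assumption: indices 0-397 in the raw data
rows_with_missing_values = 6    # Horsepower NaNs from isna().sum()
n = raw_rows - rows_with_missing_values   # 392 rows after dropna()

n_train = round(0.8 * n)        # 313.6 rounds to 314
n_test = n - n_train

# "Sample 80% for training, keep the rest for testing", in plain Python.
rng = random.Random(0)
all_indices = list(range(n))
train_idx = set(rng.sample(all_indices, n_train))
test_idx = [i for i in all_indices if i not in train_idx]

print(n_train, n_test)  # 314 78
```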
130 | 
131 | ### 1.4. Inspect the data
132 | 
133 | Have a quick look at the joint distribution of a few pairs of columns from the training set:
134 | 
135 | ```python
136 | sns.pairplot(train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")
137 | ```
138 | 
139 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_20_1.png)
140 | 
141 | 另外查看整体统计数据：
142 | 
143 | ```python
144 | train_stats = train_dataset.describe()
145 | train_stats.pop("MPG")
146 | train_stats = train_stats.transpose()
147 | train_stats
148 | ```
149 | 
150 | |              | count | mean        | std        | min    | 25%     | 50%    | 75%     | max    |
151 | |--------------|-------|-------------|------------|--------|---------|--------|---------|--------|
152 | | Cylinders    | 314.0 | 5.477707    | 1.699788   | 3.0    | 4.00    | 4.0    | 8.00    | 8.0    |
153 | | Displacement | 314.0 | 195.318471  | 104.331589 | 68.0   | 105.50  | 151.0  | 265.75  | 455.0  |
154 | | Horsepower   | 314.0 | 104.869427  | 38.096214  | 46.0   | 76.25   | 94.5   | 128.00  | 225.0  |
155 | | Weight       | 314.0 | 2990.251592 | 843.898596 | 1649.0 | 2256.50 | 2822.5 | 3608.00 | 5140.0 |
156 | | Acceleration | 314.0 | 15.559236   | 2.789230   | 8.0    | 13.80   | 15.5   | 17.20   | 24.8   |
157 | | Model Year   | 314.0 | 75.898089   | 3.675642   | 70.0   | 73.00   | 76.0   | 79.00   | 82.0   |
158 | | USA          | 314.0 | 0.624204    | 0.485101   | 0.0    | 0.00    | 1.0    | 1.00    | 1.0    |
159 | | Europe       | 314.0 | 0.178344    | 0.383413   | 0.0    | 0.00    | 0.0    | 0.00    | 1.0    |
160 | | Japan        | 314.0 | 0.197452    | 0.398712   | 0.0    | 0.00    | 0.0    | 0.00    | 1.0    |
161 | 
162 | ### 1.5. Split features from labels
163 | 
164 | Separate the target value, or "label", from the features. This label is the value that you will train the model to predict:
165 | 
166 | ```python
167 | train_labels = train_dataset.pop('MPG')
168 | test_labels = test_dataset.pop('MPG')
169 | ```
170 | 
171 | ### 1.6. Normalize the data
172 | 
173 | Look again at the `train_stats` block above and note how different the ranges of each feature are.
174 | 
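175 | It is good practice to normalize features that use different scales and ranges. Although the model *might* converge without feature normalization, normalization makes training easier, and without it the resulting model depends on the choice of units used in the input.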
176 | 
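177 | Note: Although we intentionally generate these statistics from the training dataset only, they will also be used to normalize the test dataset. We need to do this to project the test dataset into the same distribution that the model has been trained on.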
178 | 
179 | ```python
180 | def norm(x):
181 |   return (x - train_stats['mean']) / train_stats['std']
182 | normed_train_data = norm(train_dataset)
183 | normed_test_data = norm(test_dataset)
184 | ```
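To make the formula concrete, here is the z-score for the `Cylinders` column in plain Python, using the mean and standard deviation from the `train_stats` table. It sketches what `norm` computes for one column; it is not a replacement for it.

```python
# Cylinders statistics from the train_stats table above.
cyl_mean, cyl_std = 5.477707, 1.699788


def norm_cylinders(x):
    # Same formula as norm(): subtract the *training* mean and divide by the
    # *training* standard deviation, whichever split x comes from.
    return (x - cyl_mean) / cyl_std


train_value = 4   # a 4-cylinder car from the training split
test_value = 8    # an 8-cylinder car from the test split

print(round(norm_cylinders(train_value), 3))  # -0.869
print(round(norm_cylinders(test_value), 3))   # 1.484
```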
185 | 
186 | This normalized data is what we will use to train the model.
187 | 
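188 | Caution: The statistics used to normalize the inputs here (mean and standard deviation) need to be applied to any other data that is fed to the model, along with the one-hot encoding that we did earlier. That includes the test set as well as live data when the model is used in production.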
189 | 
190 | ## 2. The model
191 | 
192 | ### 2.1. Build the model
193 | 
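194 | Let's build our model. Here, we'll use a `Sequential` model with two densely connected hidden layers and an output layer that returns a single, continuous value. The model building steps are wrapped in a function, `build_model`, since we'll create a second model later on.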
195 | 
196 | ```python
197 | def build_model():
198 |   model = keras.Sequential([
199 |     layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
200 |     layers.Dense(64, activation='relu'),
201 |     layers.Dense(1)
202 |   ])
203 | 
204 |   optimizer = tf.keras.optimizers.RMSprop(0.001)
205 | 
206 |   model.compile(loss='mse',
207 |                 optimizer=optimizer,
208 |                 metrics=['mae', 'mse'])
209 |   return model
210 | ```
211 | 
212 | ```python
213 | model = build_model()
214 | ```
215 | 
216 | ### 2.2. Inspect the model
217 | 
218 | Use the `.summary` method to print a simple description of the model:
219 | 
220 | ```python
221 | model.summary()
222 | ```
223 | 
224 | ```output
225 | Model: "sequential"
226 | _________________________________________________________________
227 | Layer (type)                 Output Shape              Param #   
228 | =================================================================
229 | dense (Dense)                (None, 64)                640       
230 | _________________________________________________________________
231 | dense_1 (Dense)              (None, 64)                4160      
232 | _________________________________________________________________
233 | dense_2 (Dense)              (None, 1)                 65        
234 | =================================================================
235 | Total params: 4,865
236 | Trainable params: 4,865
237 | Non-trainable params: 0
238 | _________________________________________________________________
239 | ```
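The `Param #` column follows from the layer widths: each unit has one weight per input plus one bias. The 9 inputs are the 9 feature columns left after popping `MPG` (the rows of `train_stats`).

```python
def dense_params(n_inputs, n_units):
    # Each unit has one weight per input plus one bias.
    return n_inputs * n_units + n_units


# 9 input features, then layers of 64, 64 and 1 units.
layer_widths = [9, 64, 64, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_widths, layer_widths[1:])]

print(per_layer)       # [640, 4160, 65]
print(sum(per_layer))  # 4865
```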
240 | 
241 | Now try out the model. Take a batch of 10 examples from the training data and call `model.predict` on it.
242 | 
243 | ```python
244 | example_batch = normed_train_data[:10]
245 | example_result = model.predict(example_batch)
246 | example_result
247 | ```
248 | 
249 | ```output
250 |       array([[ 0.3297699 ],
251 |             [ 0.25655937],
252 |             [-0.12460149],
253 |             [ 0.32495883],
254 |             [ 0.50459725],
255 |             [ 0.10887371],
256 |             [ 0.57305855],
257 |             [ 0.57637435],
258 |             [ 0.12094647],
259 |             [ 0.6864784 ]], dtype=float32)
260 | ```
261 | 
262 | It seems to be working, and it produces a result of the expected shape and type.
263 | 
264 | ### 2.3. Train the model
265 | 
266 | Train the model for 1000 epochs, and record the training and validation loss and metrics in the `history` object:
267 | 
268 | ```python
269 | # Display training progress by printing a single dot for each completed epoch
270 | class PrintDot(keras.callbacks.Callback):
271 |   def on_epoch_end(self, epoch, logs):
272 |     if epoch % 100 == 0: print('')
273 |     print('.', end='')
274 | 
275 | EPOCHS = 1000
276 | 
277 | history = model.fit(
278 |   normed_train_data, train_labels,
279 |   epochs=EPOCHS, validation_split = 0.2, verbose=0,
280 |   callbacks=[PrintDot()])
281 | ```
282 | 
283 | Visualize the model's training progress using the stats stored in the `history` object.
284 | 
285 | ```python
286 | hist = pd.DataFrame(history.history)
287 | hist['epoch'] = history.epoch
288 | hist.tail()
289 | ```
290 | 
291 | |     | loss     | mae      | mse      | val_loss  | val_mae  | val_mse   | epoch |
292 | |-----|----------|----------|----------|-----------|----------|-----------|-------|
293 | | 995 | 2.556746 | 0.988013 | 2.556746 | 10.210531 | 2.324411 | 10.210530 | 995   |
294 | | 996 | 2.597973 | 1.039339 | 2.597973 | 11.257273 | 2.469266 | 11.257273 | 996   |
295 | | 997 | 2.671929 | 1.040886 | 2.671929 | 10.604957 | 2.446257 | 10.604958 | 997   |
296 | | 998 | 2.634858 | 1.001898 | 2.634858 | 10.906935 | 2.373279 | 10.906935 | 998   |
297 | | 999 | 2.741717 | 1.035889 | 2.741717 | 10.698320 | 2.342703 | 10.698319 | 999   |
298 | 
299 | ```python
300 | def plot_history(history):
301 |   hist = pd.DataFrame(history.history)
302 |   hist['epoch'] = history.epoch
303 | 
304 |   plt.figure()
305 |   plt.xlabel('Epoch')
306 |   plt.ylabel('Mean Abs Error [MPG]')
307 |   plt.plot(hist['epoch'], hist['mae'],
308 |            label='Train Error')
309 |   plt.plot(hist['epoch'], hist['val_mae'],
310 |            label = 'Val Error')
311 |   plt.ylim([0,5])
312 |   plt.legend()
313 | 
314 |   plt.figure()
315 |   plt.xlabel('Epoch')
316 |   plt.ylabel('Mean Square Error [$MPG^2$]')
317 |   plt.plot(hist['epoch'], hist['mse'],
318 |            label='Train Error')
319 |   plt.plot(hist['epoch'], hist['val_mse'],
320 |            label = 'Val Error')
321 |   plt.ylim([0,20])
322 |   plt.legend()
323 |   plt.show()
324 | 
325 | 
326 | plot_history(history)
327 | ```
328 | 
329 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_42_0.png?dcb_=0.7319815786783315)
330 | 
331 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_42_1.png?dcb_=0.09774210050560783)
332 | 
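333 | This graph shows little improvement, or even degradation, in the validation error after about 100 epochs. Let's update the `model.fit` call to automatically stop training when the validation score doesn't improve. We'll use an `EarlyStopping` callback that checks a training condition at the end of every epoch; if a set number of epochs passes without improvement, it stops the training automatically.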
334 | 
335 | You can learn more about this callback [here](https://tensorflow.google.cn/versions/master/api_docs/python/tf/keras/callbacks/EarlyStopping).
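
To make the `patience` rule concrete, here is a minimal pure-Python sketch of the stopping logic, assuming the validation losses are already available as a list. It is a simplified illustration of the idea, not Keras's implementation; the helper `stop_epoch` and the loss values are hypothetical.

```python
def stop_epoch(val_losses, patience=10, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified illustration of a patience rule such as
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=...).
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None  # the rule never fired


# Hypothetical validation losses: the best value is reached at epoch 2.
losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.3]
print(stop_epoch(losses, patience=3))  # 5
```

By default the Keras callback keeps the weights from the last epoch it trained, not from the best epoch; its `restore_best_weights` argument changes that.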
336 | 
337 | ```python
338 | model = build_model()
339 | 
340 | # The patience parameter is the number of epochs to wait for an improvement
341 | early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
342 | 
343 | history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
344 |                     validation_split = 0.2, verbose=0, callbacks=[early_stop, PrintDot()])
345 | 
346 | plot_history(history)
347 | ```
348 | 
349 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_44_1.png?dcb_=0.8643233947217597)
350 | 
351 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_44_2.png?dcb_=0.8788778722328034)
352 | 
353 | The graph shows that on the validation set the average error is usually around +/- 2 MPG. Is this good? We'll leave that decision up to you.
354 | 
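355 | Let's see how well the model generalizes by using the test set, which we did not use when training the model. This tells us how well we can expect the model to predict when we use it in the real world: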
356 | 
357 | ```python
358 | loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)
359 | 
360 | print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))
361 | ```
362 | 
363 | `Testing set Mean Abs Error:  2.09 MPG`
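
As a reminder of what the `mae` and `mse` numbers measure, here is a small pure-Python sketch with hypothetical MPG values. MAE is in MPG and reads directly as "off by about this many MPG on average", while MSE is in squared MPG and weighs large misses more heavily.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted|, in the same units as the labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (true - predicted)^2, in squared label units."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical fuel-efficiency labels and predictions (MPG).
y_true = [20.0, 30.0, 25.0, 15.0]
y_pred = [22.0, 29.0, 25.0, 11.0]

print(mean_absolute_error(y_true, y_pred))  # 1.75
print(mean_squared_error(y_true, y_pred))   # 5.25
```

The single 4 MPG miss contributes 16 of the 21 squared-error units, which is why MSE reacts more strongly to outliers than MAE.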
364 | 
365 | ### 2.4. Make predictions
366 | 
367 | Finally, predict MPG values using data in the test set:
368 | 
369 | ```python
370 | test_predictions = model.predict(normed_test_data).flatten()
371 | 
372 | plt.scatter(test_labels, test_predictions)
373 | plt.xlabel('True Values [MPG]')
374 | plt.ylabel('Predictions [MPG]')
375 | plt.axis('equal')
376 | plt.axis('square')
377 | plt.xlim([0,plt.xlim()[1]])
378 | plt.ylim([0,plt.ylim()[1]])
379 | _ = plt.plot([-100, 100], [-100, 100])
380 | 
381 | ```
382 | 
383 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_48_0.png?dcb_=0.5259404035812005)
384 | 
385 | It looks like our model predicts reasonably well. Let's take a look at the error distribution:
386 | 
387 | ```python
388 | error = test_predictions - test_labels
389 | plt.hist(error, bins = 25)
390 | plt.xlabel("Prediction Error [MPG]")
391 | _ = plt.ylabel("Count")
392 | ```
393 | 
394 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_50_0.png?dcb_=0.042220469967213514)
395 | 
396 | The distribution is not quite Gaussian (normal), most likely because the number of samples is very small.
397 | 
398 | ## 3. Conclusion
399 | 
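400 | This tutorial introduced a few techniques for handling a regression problem:
401 | 
402 | * Mean Squared Error (MSE) is a common loss function for regression problems (classification problems use different loss functions).
403 | 
404 | * Similarly, the evaluation metrics used for regression differ from those used for classification. A common regression metric is Mean Absolute Error (MAE).
405 | 
406 | * When numeric input features have values with different ranges, each feature should be scaled independently to the same range.
407 | 
408 | * If there is not much training data, prefer a small network with few hidden layers to avoid overfitting.
409 | 
410 | * Early stopping is a useful technique to prevent overfitting.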
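
The scaling point above usually means standardizing each feature with statistics computed on the training set only, then applying those same statistics to the test set. A minimal pure-Python sketch, assuming a single hypothetical numeric feature and using the population standard deviation:

```python
def fit_standardizer(train_column):
    """Compute mean and (population) standard deviation on training data only."""
    n = len(train_column)
    mean = sum(train_column) / n
    std = (sum((x - mean) ** 2 for x in train_column) / n) ** 0.5
    return mean, std


def standardize(column, mean, std):
    """Scale a column with previously fitted statistics."""
    return [(x - mean) / std for x in column]


# Hypothetical feature values (e.g. vehicle weight).
train_weight = [2000.0, 3000.0, 4000.0]
test_weight = [3000.0, 3500.0]

mean, std = fit_standardizer(train_weight)
print(standardize(train_weight, mean, std))  # roughly [-1.22, 0.0, 1.22]
print(standardize(test_weight, mean, std))   # the test set reuses the training statistics
```

pandas' `DataFrame.std()` uses the sample standard deviation (`ddof=1`) by default, so statistics computed that way differ slightly from this sketch.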
411 | 
412 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_regression.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_regression.html)
413 | > English version: [https://tensorflow.google.cn/beta/tutorials/keras/basic_regression](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression)
414 | > Suggest translation edits (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_regression.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_regression.md)
415 | 


--------------------------------------------------------------------------------
/r2/tutorials/keras/basic_text_classification.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Hands-on text classification with movie reviews
  3 | categories: tensorflow2官方教程
  4 | tags: tensorflow2.0教程
  5 | top: 1927
  6 | abbrlink: tensorflow/tf2-tutorials-keras-basic_text_classification
  7 | ---
  8 | 
  9 | # Hands-on text classification with movie reviews (TensorFlow 2.0 official tutorial translation)
 10 | 
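 11 | This tutorial classifies movie reviews as positive or negative using the text of the review. This is an example of binary (or two-class) classification, an important and widely applicable kind of machine learning problem.
 12 | 
 13 | We'll use the [IMDB dataset](https://tensorflow.google.cn/api_docs/python/tf/keras/datasets/imdb) that contains the text of 50,000 movie reviews from the [Internet Movie Database](https://www.imdb.com/). These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are balanced, meaning they contain an equal number of positive and negative reviews.
 14 | 
 15 | This tutorial uses tf.keras, a high-level API to build and train models in TensorFlow. For a more advanced text classification tutorial using tf.keras, see the [MLCC Text Classification Guide](https://developers.google.cn/machine-learning/guides/text-classification/).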
 16 | 
 17 | ```python
 18 | from __future__ import absolute_import, division, print_function, unicode_literals
 19 | 
 20 | import tensorflow as tf
 21 | from tensorflow import keras
 22 | 
 23 | import numpy as np
 24 | 
 25 | print(tf.__version__)
 26 | ```
 27 | 
 28 | `2.0.0-alpha0`
 29 | 
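 30 | ## 1. Download the IMDB dataset
 31 | 
 32 | The IMDB dataset comes packaged with TensorFlow. It has already been preprocessed such that the reviews (sequences of words) have been converted to sequences of integers, where each integer represents a specific word in a dictionary.
 33 | 
 34 | The following code downloads the IMDB dataset to your machine (or uses a cached copy if you've already downloaded it):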
 35 | 
 36 | ```python
 37 | imdb = keras.datasets.imdb
 38 | 
 39 | (train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
 40 | ```  
 41 | 
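 42 | The argument `num_words=10000` keeps the 10,000 most frequently occurring words in the training data; rarer words are discarded to keep the size of the data manageable.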
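
Because the word ids in this dataset are assigned by frequency rank, keeping ids below `num_words` amounts to replacing every larger id with an out-of-vocabulary marker. The sketch below assumes the loader's default `oov_char=2`; it is a simplified illustration (the real loader has further options such as `skip_top`), and `cap_vocabulary` is a hypothetical helper.

```python
def cap_vocabulary(word_ids, num_words=10000, oov_char=2):
    """Keep ids below num_words; map every rarer word to oov_char."""
    return [i if i < num_words else oov_char for i in word_ids]


print(cap_vocabulary([1, 14, 22, 9999, 10000, 52006]))
# [1, 14, 22, 9999, 2, 2]
```

This is why id 2 is reserved as `<UNK>` in the decoding helper later in this tutorial.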
 43 | 
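 44 | ## 2. Explore the data
 45 | 
 46 | Let's take a moment to understand the format of the data. The dataset comes preprocessed: each example is an array of integers representing the words of the movie review. Each label is an integer value of either 0 or 1, where 0 is a negative review and 1 is a positive review.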
 47 | 
 48 | ```python
 49 | print("Training entries: {}, labels: {}".format(len(train_data), len(train_labels)))
 50 | ```
 51 | 
 52 | `Training entries: 25000, labels: 25000`
 53 | 
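 54 | The text of the reviews has been converted to integers, where each integer represents a specific word in a dictionary. Here's what the first review looks like: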
 55 | 
 56 | ```python
 57 | print(train_data[0])
 58 | ```
 59 | 
 60 | `[1, 14, 22, 16, 43, 530, 973, ...., 32, 15, 16, 5345, 19, 178, 32]`
 61 | 
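 62 | Movie reviews may be different lengths. The code below shows the number of words in the first and second reviews. Since inputs to a neural network must be the same length, we'll need to resolve this later.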
 63 | 
 64 | ```python
 65 | len(train_data[0]), len(train_data[1])
 66 | ```
 67 | 
 68 | `(218, 189)`
 69 | 
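 70 | ### 2.1. Convert the integers back to words
 71 | 
 72 | It may be useful to know how to convert integers back to text.
 73 | Here, we'll create a helper function to query a dictionary object that contains the integer-to-string mapping: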
 74 | 
 75 | ```python
 76 | # A dictionary mapping words to an integer index
 77 | word_index = imdb.get_word_index()
 78 | 
 79 | # The first indices are reserved
 80 | word_index = {k:(v+3) for k,v in word_index.items()}
 81 | word_index["<PAD>"] = 0
 82 | word_index["<START>"] = 1
 83 | word_index["<UNK>"] = 2  # unknown
 84 | word_index["<UNUSED>"] = 3
 85 | 
 86 | reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
 87 | 
 88 | def decode_review(text):
 89 |     return ' '.join([reverse_word_index.get(i, '?') for i in text])
 90 | ```
 91 | 
 92 | Now we can use the `decode_review` function to display the text of the first review:
 93 | 
 94 | ```python
 95 | decode_review(train_data[0])
 96 | ```
 97 | 
 98 | *"<START> this film was just brilliant casting location scenery story direction .....that was shared with us all"*
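
The `+3` shift exists because ids 0 to 3 are reserved for the special tokens. A toy version with a hypothetical four-word vocabulary makes the mapping easy to check by hand:

```python
# Hypothetical toy vocabulary in the same format as imdb.get_word_index():
# word -> frequency rank, starting at 1.
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}

# Shift every id by 3 and add the reserved tokens, as the tutorial code does.
shifted = {word: rank + 3 for word, rank in toy_word_index.items()}
shifted.update({"<PAD>": 0, "<START>": 1, "<UNK>": 2, "<UNUSED>": 3})
reverse_index = {i: word for word, i in shifted.items()}


def decode(ids):
    """Join the words for each id; unknown ids become '?'."""
    return " ".join(reverse_index.get(i, "?") for i in ids)


print(decode([1, 4, 5, 6, 7, 0, 0]))
# <START> the film was brilliant <PAD> <PAD>
```

Without the shift, "the" (rank 1) would collide with `<START>` (id 1).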
 99 | 
100 | ## 3. Prepare the data
101 | 
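102 | The reviews (the arrays of integers) must be converted to tensors before being fed into the neural network. This conversion can be done in a couple of ways:
103 | 
104 | * Convert the arrays into vectors of 0s and 1s indicating word occurrence, similar to a one-hot encoding. For example, the sequence [3, 5] would become a 10,000-dimensional vector that is all zeros except for indices 3 and 5, which are ones. Then, make this the first layer in our network, a Dense layer that can handle floating point vector data. This approach is memory intensive, though, requiring a matrix of size `num_words * num_reviews`.
105 | 
106 | * Alternatively, we can pad the arrays so they all have the same length, then create an integer tensor of shape `max_length * num_reviews`. We can use an embedding layer capable of handling this shape as the first layer in our network.
107 | 
108 | In this tutorial, we will use the second approach.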
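
For comparison, here is a minimal sketch of the first approach (a multi-hot vector, scaled down to 10 dimensions), plus the rough storage arithmetic behind "memory intensive", assuming 4-byte values (float32 or int32) and the 256-token padding length used in the next step. `multi_hot` is a hypothetical helper.

```python
def multi_hot(sequence, dimension):
    """Vector of zeros with a 1.0 at every index that appears in the sequence."""
    vector = [0.0] * dimension
    for index in sequence:
        vector[index] = 1.0
    return vector


print(multi_hot([3, 5], dimension=10))
# [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]

# Rough storage estimate for 25,000 reviews, assuming 4 bytes per value.
num_words, num_reviews, max_length = 10000, 25000, 256
multi_hot_bytes = num_words * num_reviews * 4   # 1,000,000,000 bytes (about 1 GB)
padded_bytes = max_length * num_reviews * 4     # 25,600,000 bytes (about 26 MB)
print(multi_hot_bytes, padded_bytes)
```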
109 | 
110 | Since the movie reviews must be the same length, we will use the [pad_sequences](https://tensorflow.google.cn/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences) function to standardize the lengths:
111 | 
112 | ```python
113 | train_data = keras.preprocessing.sequence.pad_sequences(train_data,
114 |                                                         value=word_index["<PAD>"],
115 |                                                         padding='post',
116 |                                                         maxlen=256)
117 | 
118 | test_data = keras.preprocessing.sequence.pad_sequences(test_data,
119 |                                                        value=word_index["<PAD>"],
120 |                                                        padding='post',
121 |                                                        maxlen=256)
122 | ```
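
What `pad_sequences` does with these arguments can be sketched in a few lines of plain Python. One detail worth knowing: `padding='post'` only controls where zeros are added; reviews longer than `maxlen` are cut according to `truncating`, whose default is `'pre'`, so tokens are dropped from the start of the review. `pad_post` is a hypothetical helper and assumes `maxlen > 0`.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad at the end, truncate at the start (like padding='post', truncating='pre')."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # keep the last maxlen tokens
        padded.append(kept + [value] * (maxlen - len(kept)))
    return padded


print(pad_post([[1, 2, 3], [4, 5, 6, 7, 8]], maxlen=4))
# [[1, 2, 3, 0], [5, 6, 7, 8]]
```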
123 | 
124 | Let's look at the length of the examples now:
125 | 
126 | ```python
127 | len(train_data[0]), len(train_data[1])
128 | ```
129 | 
130 | `(256, 256)`
131 | 
132 | And inspect the (now padded) first review:
133 | 
134 | ```python
135 | print(train_data[0])
136 | ```
137 | 
138 | ```output
139 | [   1   14   22   16   43  530  973 1622 1385   65  458 4468   66 3941
140 |     4  173   36  256    5   25  100   43  838  112   50  670    2    9
141 |   ...
142 |     0    0    0    0    0    0    0    0    0    0    0    0    0    0
143 |     0    0    0    0]
144 | ```
145 | 
146 | ## 4. Build the model
147 | 
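148 | The neural network is created by stacking layers. This requires two main architectural decisions:
149 | 
150 | * How many layers to use in the model?
151 | * How many hidden units to use for each layer?
152 | 
153 | In this example, the input data consists of an array of word indices. The labels to predict are either 0 or 1. Let's build a model for this problem: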
154 | 
155 | ```python
156 | # The input shape is the vocabulary count used for the movie reviews (10,000 words)
157 | vocab_size = 10000
158 | 
159 | model = keras.Sequential()
160 | model.add(keras.layers.Embedding(vocab_size, 16))
161 | model.add(keras.layers.GlobalAveragePooling1D())
162 | model.add(keras.layers.Dense(16, activation='relu'))
163 | model.add(keras.layers.Dense(1, activation='sigmoid'))
164 | 
165 | model.summary()
166 | ```
167 | 
168 | ```
169 | Model: "sequential"
170 | _________________________________________________________________
171 | Layer (type)                 Output Shape              Param #   
172 | =================================================================
173 | embedding (Embedding)        (None, None, 16)          160000    
174 | _________________________________________________________________
175 | global_average_pooling1d (Gl (None, 16)                0         
176 | _________________________________________________________________
177 | dense (Dense)                (None, 16)                272       
178 | _________________________________________________________________
179 | dense_1 (Dense)              (None, 1)                 17        
180 | =================================================================
181 | Total params: 160,289
182 | Trainable params: 160,289
183 | Non-trainable params: 0
184 | _________________________________________________________________
185 | ```
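
The `Param #` column can be checked by hand: an embedding stores one vector per vocabulary entry, and a dense layer stores one weight per input-unit pair plus one bias per unit. The same arithmetic reproduces the regression model summary (640, 4160 and 65 parameters), which implies 9 input features there since (640 - 64) / 64 = 9. The helper names are hypothetical.

```python
def embedding_params(vocab_size, output_dim):
    """One trainable vector of length output_dim per vocabulary entry."""
    return vocab_size * output_dim


def dense_params(input_dim, units):
    """A weight for every (input, unit) pair plus one bias per unit."""
    return input_dim * units + units


# Text classifier from the summary above.
text_model = [
    embedding_params(10000, 16),  # 160000
    0,                            # GlobalAveragePooling1D has no weights
    dense_params(16, 16),         # 272
    dense_params(16, 1),          # 17
]
print(sum(text_model))  # 160289

# Regression model (9 input features inferred from its 640-parameter first layer).
regression_model = [dense_params(9, 64), dense_params(64, 64), dense_params(64, 1)]
print(regression_model, sum(regression_model))  # [640, 4160, 65] 4865
```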
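186 | The layers are stacked sequentially to build the classifier:
187 | 
188 | 1. The first layer is an `Embedding` layer. This layer takes the integer-encoded vocabulary and looks up the embedding vector for each word index. These vectors are learned as the model trains. The vectors add a dimension to the output array. The resulting dimensions are: `(batch, sequence, embedding)`.
189 | 
190 | 2. Next, a `GlobalAveragePooling1D` layer returns a fixed-length output vector for each example by averaging over the sequence dimension. This allows the model to handle input of variable length, in the simplest way possible.
191 | 
192 | 3. This fixed-length output vector is piped through a fully connected (`Dense`) layer with 16 hidden units.
193 | 
194 | 4. The last layer is densely connected with a single output node. Using the `sigmoid` activation function, this value is a float between 0 and 1, representing a probability, or confidence level.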
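
The averaging in step 2 can be illustrated with nested Python lists standing in for tensors of shape `(batch, sequence, embedding)`. This is only a sketch of the arithmetic, not the layer itself. In this tutorial the `Embedding` layer is created without `mask_zero=True`, so, as far as we can tell, the padded positions are averaged in as well.

```python
def global_average_pool_1d(batch):
    """Average over the sequence axis: (batch, sequence, features) -> (batch, features)."""
    pooled = []
    for example in batch:
        steps = len(example)
        features = len(example[0])
        pooled.append([sum(step[f] for step in example) / steps for f in range(features)])
    return pooled


# Two hypothetical examples of different lengths, 2 features per step.
batch = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # 3 steps
    [[2.0, 4.0]],                          # 1 step
]
print(global_average_pool_1d(batch))  # [[3.0, 4.0], [2.0, 4.0]]
```

Whatever the number of steps, each example comes out as a vector with one entry per feature, which is what lets the following `Dense` layer accept it.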
195 | 
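196 | ### 4.1. Hidden units
197 | 
198 | The above model has two intermediate or "hidden" layers between the input and output. The number of outputs (units, nodes, or neurons) is the dimension of the representational space for the layer. In other words, it is the amount of freedom the network is allowed when learning an internal representation.
199 | 
200 | If a model has more hidden units (a higher-dimensional representation space) and/or more layers, then the network can learn more complex representations. However, this makes the network more computationally expensive and may lead to learning unwanted patterns: patterns that improve performance on the training data but not on the test data. This is called overfitting, and we'll explore it later.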
201 | 
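202 | ### 4.2. Loss function and optimizer
203 | 
204 | A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a probability (a single-unit layer with a sigmoid activation), we'll use the `binary_crossentropy` loss function.
205 | 
206 | This isn't the only choice for a loss function; you could, for instance, choose `mean_squared_error`. But for models that output probabilities, cross-entropy is usually the best choice.
207 | Cross-entropy is a concept from information theory that measures the distance between probability distributions, or in this case, between the ground-truth distribution and the predictions.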
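
A minimal pure-Python version of binary cross-entropy, averaged over examples, makes the comparison with mean squared error concrete. The clipping constant `eps` is an assumption of this sketch, added only to avoid `log(0)`.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def squared_error(y_true, y_pred):
    """Mean of (y - p)^2, for comparison."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# A positive review (label 1) predicted with probability 0.01: confidently wrong.
print(binary_crossentropy([1], [0.01]))  # about 4.6
print(squared_error([1], [0.01]))        # about 0.98
```

Cross-entropy grows without bound as a confident prediction approaches the wrong answer, while squared error on probabilities can never exceed 1; that is one intuition for why cross-entropy gives a stronger training signal here.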
208 | 
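209 | Later, when we explore regression problems (say, to predict the price of a house), we will see how to use another loss function called mean squared error.
210 | 
211 | Now, configure the model to use an optimizer and a loss function: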
212 | 
213 | ```python
214 | model.compile(optimizer='adam',
215 |               loss='binary_crossentropy',
216 |               metrics=['accuracy'])
217 | ```
218 | 
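219 | ## 5. Create a validation set
220 | 
221 | When training, we want to check the accuracy of the model on data it hasn't seen before. Create a validation set by setting apart 10,000 examples from the original training data. (Why not use the test set now? Our goal is to develop and tune our model using only the training data, then use the test data just once to evaluate our accuracy.)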
222 | 
223 | ```python
224 | x_val = train_data[:10000]
225 | partial_x_train = train_data[10000:]
226 | 
227 | y_val = train_labels[:10000]
228 | partial_y_train = train_labels[10000:]
229 | ```
230 | 
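231 | ## 6. Train the model
232 | 
233 | Train the model for 40 epochs in mini-batches of 512 samples. This is 40 iterations over all samples in the `partial_x_train` and `partial_y_train` tensors. While training, monitor the model's loss and accuracy on the 10,000 samples from the validation set: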
234 | 
235 | ```python
236 | history = model.fit(partial_x_train,
237 |                     partial_y_train,
238 |                     epochs=40,
239 |                     batch_size=512,
240 |                     validation_data=(x_val, y_val),
241 |                     verbose=1)
242 | ```
243 | 
244 | `Epoch 40/40
245 | 15000/15000 [==============================] - 1s 54us/sample - loss: 0.0926 - accuracy: 0.9771 - val_loss: 0.3133 - val_accuracy: 0.8824`
246 | 
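247 | ## 7. Evaluate the model
248 | 
249 | And let's see how the model performs. Two values will be returned: loss (a number which represents our error; lower values are better) and accuracy.
250 | 
251 | ```python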
252 | results = model.evaluate(test_data, test_labels)
253 | 
254 | print(results)
255 | ```
256 | 
257 | `25000/25000 [==============================] - 1s 45us/sample - loss: 0.3334 - accuracy: 0.8704
258 | [0.33341303256988525, 0.87036]`
259 | 
260 | This fairly naive approach achieves an accuracy of about 87%. With more advanced approaches, the model should get closer to 95%.
261 | 
262 | ## 8. Create a graph of accuracy and loss over time
263 | 
264 | `model.fit()` returns a `History` object that contains a dictionary with everything that happened during training:
265 | 
266 | ```python
267 | history_dict = history.history
268 | history_dict.keys()
269 | ```
270 | 
271 | ```output
272 |       dict_keys(['loss', 'val_loss', 'accuracy', 'val_accuracy'])
273 | ```
274 | 
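275 | There are four entries: one for each monitored metric during training and validation. We can use these to plot the training and validation loss for comparison, as well as the training and validation accuracy: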
276 | 
277 | ```python
278 | import matplotlib.pyplot as plt
279 | 
280 | acc = history_dict['accuracy']
281 | val_acc = history_dict['val_accuracy']
282 | loss = history_dict['loss']
283 | val_loss = history_dict['val_loss']
284 | 
285 | epochs = range(1, len(acc) + 1)
286 | 
287 | # "bo" is for "blue dot"
288 | plt.plot(epochs, loss, 'bo', label='Training loss')
289 | # b is for "solid blue line"
290 | plt.plot(epochs, val_loss, 'b', label='Validation loss')
291 | plt.title('Training and validation loss')
292 | plt.xlabel('Epochs')
293 | plt.ylabel('Loss')
294 | plt.legend()
295 | 
296 | plt.show()
297 | ```
298 | 
300 | 
301 | ```python
302 | plt.clf()   # clear figure
303 | 
304 | plt.plot(epochs, acc, 'bo', label='Training acc')
305 | plt.plot(epochs, val_acc, 'b', label='Validation acc')
306 | plt.title('Training and validation accuracy')
307 | plt.xlabel('Epochs')
308 | plt.ylabel('Accuracy')
309 | plt.legend()
310 | 
311 | plt.show()
312 | ```
313 | 
314 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification_files/output_40_0.png)
315 | 
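316 | In this plot, the dots represent the training loss and accuracy, and the solid lines are the validation loss and accuracy.
317 | 
318 | Notice that the training loss decreases with each epoch and the training accuracy increases with each epoch. This is expected when using gradient descent optimization, which should minimize the desired quantity on every iteration.
319 | 
320 | This isn't the case for the validation loss and accuracy: validation accuracy seems to peak (and validation loss to bottom out) after about 20 epochs. This is an example of overfitting: the model performs better on the training data than it does on data it has never seen before. After this point, the model over-optimizes and learns representations specific to the training data that do not generalize to test data.
321 | 
322 | For this particular case, we could prevent overfitting by simply stopping the training after 20 or so epochs. Later, you'll see how to do this automatically with a callback.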
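
Before reaching for a callback, the `history_dict` returned above can already tell you where to stop. A minimal sketch, assuming a dictionary shaped like `history.history`; the loss values here are hypothetical, and `best_epoch` is not part of Keras.

```python
def best_epoch(history_dict, monitor="val_loss"):
    """Return the 1-based epoch with the lowest value of `monitor`."""
    values = history_dict[monitor]
    best_index = min(range(len(values)), key=lambda i: values[i])
    return best_index + 1  # the plots above number epochs from 1


# Hypothetical history: validation loss bottoms out, then rises again.
history_dict = {
    "loss": [0.69, 0.55, 0.40, 0.30, 0.22],
    "val_loss": [0.68, 0.56, 0.45, 0.47, 0.52],
}
print(best_epoch(history_dict))  # 3
```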
323 | 
324 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification.html)
325 | > English version: [https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification](https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification)
326 | > Suggest translation edits (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification.md)
327 | 


--------------------------------------------------------------------------------
/r2/tutorials/keras/basic_text_classification_with_tfhub.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Text classification of movie reviews with Keras and TensorFlow Hub
  3 | categories: tensorflow2官方教程
  4 | tags: tensorflow2.0教程
  5 | top: 1918
  6 | abbrlink: tensorflow/tf2-tutorials-keras-basic_text_classification_with_tfhub
  7 | ---
  8 | 
  9 | # Text classification of movie reviews with Keras and TensorFlow Hub (TensorFlow 2.0 official tutorial translation)
 10 | 
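 11 | This tutorial classifies movie reviews as positive or negative using the text of the review. This is an example of binary (or two-class) classification, an important and widely applicable kind of machine learning problem.
 12 | 
 13 | The tutorial demonstrates the basic application of transfer learning with TensorFlow Hub and Keras.
 14 | 
 15 | We'll use the [IMDB dataset](https://tensorflow.google.cn/api_docs/python/tf/keras/datasets/imdb) that contains the text of 50,000 movie reviews from the [Internet Movie Database](https://www.imdb.com/). These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are balanced, meaning they contain an equal number of positive and negative reviews.
 16 | 
 17 | This tutorial uses [tf.keras](https://www.tensorflow.org/guide/keras), a high-level API to build and train models in TensorFlow, and [TensorFlow Hub](https://www.tensorflow.org/hub), a library and platform for transfer learning.
 18 | 
 19 | For a more advanced text classification tutorial using tf.keras, see the [MLCC Text Classification Guide](https://developers.google.cn/machine-learning/guides/text-classification/).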
 20 | 
 21 | Import the libraries:
 22 | 
 23 | ```python
 24 | from __future__ import absolute_import, division, print_function, unicode_literals
 25 | 
 26 | import numpy as np
 27 | 
 28 | import tensorflow as tf
 29 | 
 30 | import tensorflow_hub as hub
 31 | import tensorflow_datasets as tfds
 32 | 
 33 | print("Version: ", tf.__version__)
 34 | print("Eager mode: ", tf.executing_eagerly())
 35 | print("Hub version: ", hub.__version__)
 36 | print("GPU is", "available" if tf.test.is_gpu_available() else "NOT AVAILABLE")
 37 | ```
 38 | 
 39 | ## 1. Download the IMDB dataset
 40 | 
 41 | The IMDB dataset is available on [TensorFlow datasets](https://github.com/tensorflow/datasets). The following code downloads the IMDB dataset to your machine:
 42 | 
 43 | ```python
 44 | # Split the training set 60% / 40%, so we end up with 15,000 examples for training, 10,000 for validation and 25,000 for testing.
 45 | train_validation_split = tfds.Split.TRAIN.subsplit([6, 4])
 46 | 
 47 | (train_data, validation_data), test_data = tfds.load(
 48 |     name="imdb_reviews", 
 49 |     split=(train_validation_split, tfds.Split.TEST),
 50 |     as_supervised=True)
 51 | ```
 52 | 
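 53 | ## 2. Explore the data
 54 | 
 55 | Let's take a moment to understand the format of the data. Each example is a sentence representing the movie review, paired with a corresponding label. The sentence is not preprocessed in any way. The label is an integer value of either 0 or 1, where 0 is a negative review and 1 is a positive review.
 56 | 
 57 | Let's print the first 10 examples.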
 58 | 
 59 | ```python
 60 | train_examples_batch, train_labels_batch = next(iter(train_data.batch(10)))
 61 | train_examples_batch
 62 | ```
 63 | 
 64 | Let's also print the first 10 labels.
 65 | 
 66 | ```python
 67 | train_labels_batch
 68 | ```
 69 | 
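 70 | ## 3. Build the model
 71 | 
 72 | The neural network is created by stacking layers. This requires three main architectural decisions:
 73 | 
 74 | * How to represent the text?
 75 | * How many layers to use in the model?
 76 | * How many hidden units to use for each layer?
 77 | 
 78 | In this example, the input data consists of sentences. The labels to predict are either 0 or 1.
 79 | 
 80 | One way to represent the text is to convert sentences into embedding vectors. We can use a pre-trained text embedding as the first layer, which has three advantages:
 81 | *  we don't have to worry about text preprocessing,
 82 | *  we can benefit from transfer learning,
 83 | *  the embedding has a fixed size, so it's simpler to process.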
 84 | 
 85 | For this example, we will use a pre-trained text embedding model from [TensorFlow Hub](https://www.tensorflow.org/hub) called [google/tf2-preview/gnews-swivel-20dim/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1).
 86 | 
 87 | There are three other pre-trained models to test for the sake of this tutorial:
 88 | * [google/tf2-preview/gnews-swivel-20dim-with-oov/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim-with-oov/1) is the same as [google/tf2-preview/gnews-swivel-20dim/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1), but with 2.5% of the vocabulary converted to OOV buckets. This can help if the vocabulary of the task and the vocabulary of the model don't fully overlap.
 89 | 
 90 | * [google/tf2-preview/nnlm-en-dim50/1](https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1) is a much larger model with a vocabulary of about 1M and 50 dimensions.
 91 | * [google/tf2-preview/nnlm-en-dim128/1](https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1) is an even larger model with a vocabulary of about 1M and 128 dimensions.
 92 | 
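 93 | Let's first create a Keras layer that uses a TensorFlow Hub model to embed the sentences, and try it out on a couple of input examples. Note that no matter the length of the input text, the output shape of the embeddings is `(num_examples, embedding_dimension)`.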
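
A toy stand-in (not the Hub module) shows why the output size does not depend on sentence length: the sentence is split on spaces, each token is looked up in a table, and the token vectors are combined into one vector. According to its TF Hub page (worth double-checking), the module combines token vectors with a `sqrtn` combiner, meaning the sum divided by the square root of the token count, which this sketch follows. The three-dimensional `TOY_TABLE` and the choice to skip unknown tokens are assumptions for illustration.

```python
import math

# Hypothetical 3-dimensional token table; the real module uses 20 dimensions.
TOY_TABLE = {
    "great": [1.0, 0.0, 2.0],
    "movie": [0.0, 1.0, 0.0],
    "awful": [-1.0, 0.0, -2.0],
}
DIM = 3


def embed_sentence(sentence):
    """Split on spaces, look up known tokens, combine with sum / sqrt(n)."""
    tokens = [t for t in sentence.lower().split() if t in TOY_TABLE]
    if not tokens:
        return [0.0] * DIM  # this sketch returns zeros when no token is known
    summed = [sum(TOY_TABLE[t][d] for t in tokens) for d in range(DIM)]
    scale = math.sqrt(len(tokens))
    return [value / scale for value in summed]


def embed_batch(sentences):
    """(num_examples,) strings -> (num_examples, DIM) floats."""
    return [embed_sentence(s) for s in sentences]


batch = embed_batch(["great movie", "an awful awful awful movie", "hmm"])
print([len(vector) for vector in batch])  # [3, 3, 3]
```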
 94 | 
 95 | ```python
 96 | embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
 97 | hub_layer = hub.KerasLayer(embedding, input_shape=[], 
 98 |                            dtype=tf.string, trainable=True)
 99 | hub_layer(train_examples_batch[:3])
100 | ```
101 | 
102 | Let's now build the full model:
103 | 
104 | ```python
105 | model = tf.keras.Sequential()
106 | model.add(hub_layer)
107 | model.add(tf.keras.layers.Dense(16, activation='relu'))
108 | model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
109 | 
110 | model.summary()
111 | ```
112 | 
113 | ```output
114 | Model: "sequential"
115 | _________________________________________________________________
116 | Layer (type)                 Output Shape              Param #   
117 | =================================================================
118 | keras_layer (KerasLayer)     (None, 20)                400020    
119 | _________________________________________________________________
120 | dense (Dense)                (None, 16)                336       
121 | _________________________________________________________________
122 | dense_1 (Dense)              (None, 1)                 17        
123 | =================================================================
124 | Total params: 400,373
    | Trainable params: 400,373
    | Non-trainable params: 0
125 | _________________________________________________________________
126 | ```
127 | 
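128 | The layers are stacked sequentially to build the classifier:
129 | 1. The first layer is a TensorFlow Hub layer. This layer uses a pre-trained SavedModel to map a sentence into its embedding vector. The pre-trained text embedding model that we are using ([google/tf2-preview/gnews-swivel-20dim/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1)) splits the sentence into tokens, embeds each token and then combines the embeddings. The resulting dimensions are `(num_examples, embedding_dimension)`.
130 | 
131 | 2. This fixed-length output vector is piped through a fully connected (`Dense`) layer with 16 hidden units.
132 | 3. The last layer is densely connected with a single output node. Using the `sigmoid` activation function, this value is a float between 0 and 1, representing a probability or confidence level.
133 | 
134 | Let's compile the model.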
135 | 
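136 | ### 3.1. Loss function and optimizer
137 | 
138 | A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a probability (a single-unit layer with a sigmoid activation), we'll use the `binary_crossentropy` loss function.
139 | 
140 | This isn't the only choice for a loss function; you could, for instance, choose `mean_squared_error`. But, generally, `binary_crossentropy` is better for dealing with probabilities: it measures the "distance" between probability distributions, or in our case, between the ground-truth distribution and the predictions.
141 | 
142 | Later, when we explore regression problems (say, to predict the price of a house), we will see how to use another loss function called mean squared error.
143 | 
144 | Now, configure the model to use an optimizer and a loss function: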
145 | 
146 | ```python
147 | model.compile(optimizer='adam',
148 |               loss='binary_crossentropy',
149 |               metrics=['accuracy'])
150 | ```
151 | 
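152 | ## 4. Train the model
153 | 
154 | Train the model for 20 epochs in mini-batches of 512 samples. This is 20 iterations over all samples in the training dataset (`train_data`). While training, monitor the model's loss and accuracy on the 10,000 samples from the validation set: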
155 | 
156 | ```python
157 | history = model.fit(train_data.shuffle(10000).batch(512),
158 |                     epochs=20,
159 |                     validation_data=validation_data.batch(512),
160 |                     verbose=1)
161 | ```
162 | 
163 | ```output
164 | ...
165 | Epoch 20/20
166 | 30/30 [==============================] - 4s 144ms/step - loss: 0.2027 - accuracy: 0.9264 - val_loss: 0.3079 - val_accuracy: 0.8697
167 | ```
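
The `30/30` in the log counts batches (steps), not examples. With the 60/40 split from section 1, the arithmetic is:

```python
import math

train_examples = 15000  # 60% of the 25,000 training reviews
batch_size = 512

steps_per_epoch = math.ceil(train_examples / batch_size)
last_batch_size = train_examples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)   # 30
print(last_batch_size)   # 152 (the final batch is smaller)
```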
168 | 
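169 | ## 5. Evaluate the model
170 | 
171 | And let's see how the model performs. Two values will be returned: loss (a number which represents our error; lower values are better) and accuracy.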
172 | 
173 | ```python
174 | results = model.evaluate(test_data.batch(512), verbose=0)
175 | for name, value in zip(model.metrics_names, results):
176 |   print("%s: %.3f" % (name, value))
177 | ```
178 | 
179 | ```output
180 | loss: 0.324
    | accuracy: 0.860
181 | ```
182 | 
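183 | This fairly naive approach achieves an accuracy of about 86% (0.860 above). With more advanced approaches, the model should get closer to 95%.
184 | 
185 | ## 6. Further reading
186 | 
187 | For a more general way to work with string inputs, and for a more detailed analysis of the progress of accuracy and loss during training, take a look at https://www.tensorflow.org/tutorials/keras/basic_text_classification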
188 | 
189 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification_with_tfhub.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification_with_tfhub.html)
190 | > English version: [https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification_with_tfhub](https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification_with_tfhub)
191 | > Suggest translation edits (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification_with_tfhub.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification_with_tfhub.md)
192 | 
193 | 


--------------------------------------------------------------------------------
/r2/tutorials/keras/feature_columns.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Structured data classification in practice, predicting heart disease
  3 | categories: tensorflow2官方教程
  4 | tags: tensorflow2.0教程
  5 | top: 1913
  6 | abbrlink: tensorflow/tf2-tutorials-keras-feature_columns
  7 | ---
  8 | # Structured data classification in practice, predicting heart disease (TensorFlow 2.0 official tutorial translation)
  9 | 
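 10 | This tutorial demonstrates how to classify structured data (e.g. tabular data in a CSV).
 11 | We will use Keras to define the model, and [feature columns](https://tensorflow.google.cn/guide/feature_columns) as a bridge to map from columns in a CSV to features used to train the model.
 12 | This tutorial contains complete code to:
 13 | 
 14 | * Load a CSV file using [Pandas](https://pandas.pydata.org/).
 15 | * Build an input pipeline to batch and shuffle the rows using [tf.data](https://tensorflow.google.cn/guide/datasets).
 16 | * Map from columns in the CSV to features used to train the model using feature columns.
 17 | * Build, train, and evaluate a model using Keras.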
 18 | 
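 19 | ## 1. The dataset
 20 | 
 21 | We will use a small [dataset](https://archive.ics.uci.edu/ml/datasets/heart+Disease) provided by the Cleveland Clinic Foundation for Heart Disease. There are several hundred rows in the CSV. Each row describes a patient, and each column describes an attribute. We will use this information to predict whether a patient has heart disease, which in this dataset is a binary classification task.
 22 | 
 23 | Following is a [description](https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/heart-disease.names) of this dataset. Notice there are both numeric and categorical columns.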
 24 | 
 25 | >Column| Description| Feature Type | Data Type
 26 | >------------|--------------------|----------------------|-----------------
 27 | >Age | Age in years | Numerical | integer
 28 | >Sex | (1 = male; 0 = female) | Categorical | integer
 29 | >CP | Chest pain type (0, 1, 2, 3, 4) | Categorical | integer
 30 | >Trestbps | Resting blood pressure (in mm Hg on admission to the hospital) | Numerical | integer
 31 | >Chol | Serum cholesterol in mg/dl | Numerical | integer
 32 | >FBS | (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false) | Categorical | integer
 33 | >RestECG | Resting electrocardiographic results (0, 1, 2) | Categorical | integer
 34 | >Thalach | Maximum heart rate achieved | Numerical | integer
 35 | >Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical | integer
 36 | >Oldpeak | ST depression induced by exercise relative to rest | Numerical | float
 37 | >Slope | The slope of the peak exercise ST segment | Numerical | integer
 38 | >CA | Number of major vessels (0-3) colored by fluoroscopy | Numerical | integer
 39 | >Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect | Categorical | string
 40 | >Target | Diagnosis of heart disease (1 = true; 0 = false) | Classification | integer
 41 | 
 42 | ## 2. Import TensorFlow and other libraries
 43 | 
 44 | Install the scikit-learn dependency:
 45 | 
 46 | ```shell
 47 | pip install scikit-learn
 48 | ```
 49 | 
 50 | ```python
 51 | from __future__ import absolute_import, division, print_function, unicode_literals
 52 | 
 53 | import numpy as np
 54 | import pandas as pd
 55 | 
 56 | import tensorflow as tf
 57 | 
 58 | from tensorflow import feature_column
 59 | from tensorflow.keras import layers
 60 | from sklearn.model_selection import train_test_split
 61 | ```
 62 | 
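 63 | ## 3. Use Pandas to create a dataframe
 64 | 
 65 | [Pandas](https://pandas.pydata.org/) is a Python library with many helpful utilities for loading and working with structured data. We will use Pandas to download the dataset from a URL and load it into a dataframe.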
 66 | 
 67 | ```python
 68 | URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
 69 | dataframe = pd.read_csv(URL)
 70 | dataframe.head()
 71 | ```
 72 | 
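 73 | ## 4. Split the dataframe into train, validation, and test sets
 74 | 
 75 | The dataset we downloaded was a single CSV file. We will split it into train, validation, and test sets.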
 76 | 
 77 | ```python
 78 | train, test = train_test_split(dataframe, test_size=0.2)
 79 | train, val = train_test_split(train, test_size=0.2)
 80 | print(len(train), 'train examples')
 81 | print(len(val), 'validation examples')
 82 | print(len(test), 'test examples')
 83 | ```
 84 | 
 85 | ```output
 86 |       193 train examples
 87 |       49 validation examples
 88 |       61 test examples
 89 | ```
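
These counts follow from applying `test_size=0.2` twice. The sketch below assumes scikit-learn's convention of rounding the test share up, and recovers the total of 303 rows by adding the three printed sizes; `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_rows, test_size=0.2):
    """Return (n_train, n_test), rounding the test share up."""
    n_test = math.ceil(test_size * n_rows)
    return n_rows - n_test, n_test


n_rows = 193 + 49 + 61                  # 303 rows in total
train_val, test = split_sizes(n_rows)   # 242, 61
train, val = split_sizes(train_val)     # 193, 49
print(train, val, test)  # 193 49 61
```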
 90 | 
 91 | ## 5. 使用tf.data创建输入管道
 92 | 
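 93 | Next, we will wrap the dataframes with tf.data. This lets us use feature columns as a bridge to map from the columns in the Pandas dataframe to features used to train the model. If we were working with a very large CSV file (so large that it does not fit into memory), we would use tf.data to read it from disk directly; that is not covered in this tutorial.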
 94 | 
 95 | ```python
 96 | # A utility method to create a tf.data dataset from a Pandas Dataframe
 97 | def df_to_dataset(dataframe, shuffle=True, batch_size=32):
 98 |   dataframe = dataframe.copy()
 99 |   labels = dataframe.pop('target')
100 |   ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
101 |   if shuffle:
102 |     ds = ds.shuffle(buffer_size=len(dataframe))
103 |   ds = ds.batch(batch_size)
104 |   return ds
105 | ```
106 | 
107 | ```python
108 | batch_size = 5 # A small batch size is used for demonstration purposes
109 | train_ds = df_to_dataset(train, batch_size=batch_size)
110 | val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
111 | test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
112 | ```
113 | 
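114 | ## 6. Understand the input pipeline
115 | 
116 | Now that we have created the input pipeline, let's call it to see the format of the data it returns. We have used a small batch size to keep the output readable.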
117 | 
118 | ```python
119 | for feature_batch, label_batch in train_ds.take(1):
120 |   print('Every feature:', list(feature_batch.keys()))
121 |   print('A batch of ages:', feature_batch['age'])
122 |   print('A batch of targets:', label_batch )
123 | ```
124 | 
125 | ```output
126 |       Every feature: ['age', 'chol', 'fbs', 'ca', 'slope', 'restecg', 'sex', 'thal', 'thalach', 'oldpeak', 'exang', 'cp', 'trestbps']
127 |       A batch of ages: tf.Tensor([58 52 56 35 59], shape=(5,), dtype=int32)
128 |       A batch of targets: tf.Tensor([1 0 1 0 0], shape=(5,), dtype=int32)
129 | ```
130 | 
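131 | We can see that the dataset returns a dictionary of column names (from the dataframe) that map to column values from rows in the dataframe.
132 | 
133 | ## 7. Demonstrate several types of feature columns
134 | 
135 | TensorFlow provides many types of feature columns. In this section, we will create several types of feature columns and demonstrate how they transform a column from the dataframe.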
136 | 
137 | ```python
138 | # We will use this batch to demonstrate several types of feature columns
139 | example_batch = next(iter(train_ds))[0]
140 | 
141 | # A utility method to create a feature column and to transform a batch of data
142 | def demo(feature_column):
143 |   feature_layer = layers.DenseFeatures(feature_column)
144 |   print(feature_layer(example_batch).numpy())
145 | ```
146 | 
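147 | ### 7.1. Numeric columns
148 | 
149 | The output of a feature column becomes the input to the model (using the demo function defined above, we will be able to see exactly how each column from the dataframe is transformed). A [numeric column](https://tensorflow.google.cn/api_docs/python/tf/feature_column/numeric_column) is the simplest type of column. It is used to represent real-valued features. When using this column, your model will receive the column value from the dataframe unchanged.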
150 | 
151 | ```python
152 | age = feature_column.numeric_column("age")
153 | demo(age)
154 | ```
155 | 
156 | ```output
157 |       [[58.]
158 |       [52.]
159 |       [56.]
160 |       [35.]
161 |       [59.]]
162 |  ```
163 | 
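164 | In the heart disease dataset, most columns from the dataframe are numeric.
165 | 
166 | ### 7.2. Bucketized columns
167 | 
168 | Often, you don't want to feed a number directly into the model, but instead split its value into different categories based on numerical ranges. Consider raw data that represents a person's age. Instead of representing age as a numeric column, we could split the age into several buckets using a [bucketized column](https://tensorflow.google.cn/api_docs/python/tf/feature_column/bucketized_column).
169 | Notice the one-hot values below describe which age range each row matches.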
170 | 
171 | ```python
172 | age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
173 | demo(age_buckets)
174 | ```
175 | 
176 | ```output
177 |       [[0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
178 |       [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
179 |       [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
180 |       [0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
181 |       [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]
182 |  ```
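
The bucket index is simply the number of boundaries less than or equal to the value, which Python's `bisect_right` computes. This sketch (a hypothetical helper, not TensorFlow's code) reproduces the positions of the ones in the output above for the batch of ages printed earlier (58, 52, 56, 35, 59):

```python
from bisect import bisect_right

BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]


def bucketize_one_hot(value, boundaries=BOUNDARIES):
    """One-hot vector with len(boundaries) + 1 slots; buckets are [lower, upper)."""
    index = bisect_right(boundaries, value)  # count of boundaries <= value
    vector = [0.0] * (len(boundaries) + 1)
    vector[index] = 1.0
    return vector


ages = [58, 52, 56, 35, 59]
print([bucketize_one_hot(age).index(1.0) for age in ages])  # [8, 7, 8, 4, 8]
```

Ten boundaries give eleven buckets, which matches the eleven columns in the output.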
183 | 
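184 | ### 7.3. Categorical columns
185 | 
186 | In this dataset, thal is represented as a string (e.g. 'fixed', 'normal', or 'reversible'). We cannot feed strings directly to a model; instead, we must first map them to numeric values. Categorical vocabulary columns provide a way to represent strings as a one-hot vector (much like you have seen above with age buckets). The vocabulary can be passed as a list using [categorical_column_with_vocabulary_list](https://tensorflow.google.cn/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_list), or loaded from a file using [categorical_column_with_vocabulary_file](https://tensorflow.google.cn/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_file).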
187 | 
188 | ```python
189 | thal = feature_column.categorical_column_with_vocabulary_list(
190 |       'thal', ['fixed', 'normal', 'reversible'])
191 | 
192 | thal_one_hot = feature_column.indicator_column(thal)
193 | demo(thal_one_hot)
194 | ```
195 | 
196 | ```output
197 |       [[0. 0. 1.]
198 |       [0. 1. 0.]
199 |       [0. 0. 1.]
200 |       [0. 0. 1.]
201 |       [0. 0. 1.]]
202 |  ```
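Conceptually, `categorical_column_with_vocabulary_list` wrapped in `indicator_column` maps each string to its position in the vocabulary and then one-hot encodes that position. The pure-Python sketch below mirrors that idea. The function name is hypothetical. Returning an all-zero vector for out-of-vocabulary strings is an assumption of this sketch; TensorFlow's actual behavior depends on the `default_value` and `num_oov_buckets` arguments.

```python
VOCAB = ['fixed', 'normal', 'reversible']

def vocab_one_hot(value, vocab=VOCAB):
    # Look up the string's index in the vocabulary; in this sketch an
    # unknown string maps to an all-zero vector.
    vec = [0.0] * len(vocab)
    if value in vocab:
        vec[vocab.index(value)] = 1.0
    return vec

print(vocab_one_hot('reversible'))  # [0.0, 0.0, 1.0]
print(vocab_one_hot('normal'))      # [0.0, 1.0, 0.0]
```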
203 | 
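204 | In a more complex dataset, many columns would be categorical (e.g. strings). Feature columns are most valuable when working with categorical data. Although there is only one categorical column in this dataset, we will use it to demonstrate several important types of feature columns that you could use when working with other datasets.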
205 | 
206 | ### 7.4. Embedding columns
207 | 
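208 | Suppose that, instead of having just a few possible strings, we have thousands (or more) of values per category. For a number of reasons, as the number of categories grows large, it becomes infeasible to train a neural network using one-hot encodings. We can use an embedding column to overcome this limitation.
209 | Instead of representing the data as a one-hot vector of many dimensions, an [embedding column](https://tensorflow.google.cn/api_docs/python/tf/feature_column/embedding_column) represents that data as a lower-dimensional, dense vector in which each cell can contain any number, not just 0 or 1. The size of the embedding (8, in the example below) is a parameter that must be tuned.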
210 | 
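211 | Key point: an embedding column is the best choice when a categorical column has many possible values. We use one here for demonstration purposes, so that you have a complete example you can modify for a different dataset in the future.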
212 | 
213 | ```python
214 | # Notice that the input to the embedding column is the categorical column we created earlier
215 | thal_embedding = feature_column.embedding_column(thal, dimension=8)
216 | demo(thal_embedding)
217 | ```
218 | 
219 | ```output
220 | [[-0.01019966  0.23583987  0.04172783  0.34261808 -0.02596842  0.05985594
221 |    0.32729048 -0.07209085]
222 |  [ 0.08829682  0.3921798   0.32400072  0.00508362 -0.15642034 -0.17451124
223 |    0.12631968  0.15029909]
224 |  [-0.01019966  0.23583987  0.04172783  0.34261808 -0.02596842  0.05985594
225 |    0.32729048 -0.07209085]
226 |  [-0.01019966  0.23583987  0.04172783  0.34261808 -0.02596842  0.05985594
227 |    0.32729048 -0.07209085]
228 |  [-0.01019966  0.23583987  0.04172783  0.34261808 -0.02596842  0.05985594
229 |    0.32729048 -0.07209085]]
230 | ```
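An embedding column can be thought of as a trainable lookup table with one row of `dimension` numbers per vocabulary entry. The sketch below builds such a table in pure Python. Its random initial values are an assumption for illustration only, since TensorFlow initializes and trains these values itself. The sketch also shows why rows 1, 3, 4 and 5 of the output above are identical: judging by the indicator output in the previous section, they hold the same thal value, so they look up the same row.

```python
import random

VOCAB = ['fixed', 'normal', 'reversible']
DIMENSION = 8

# Hypothetical embedding table: one row of DIMENSION floats per vocabulary entry.
# Random values stand in for weights that would be learned during training.
rng = random.Random(0)
table = [[rng.uniform(-0.5, 0.5) for _ in range(DIMENSION)] for _ in VOCAB]

def embed(value):
    # An embedding lookup is just row selection by vocabulary index.
    return table[VOCAB.index(value)]

batch = ['reversible', 'normal', 'reversible', 'reversible', 'reversible']
vectors = [embed(v) for v in batch]
print(vectors[0] == vectors[2])  # same category -> same vector
```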
231 | 
232 | ### 7.5. Hashed feature columns
233 | 
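234 | Another way to represent a categorical column with a large number of values is to use a [categorical_column_with_hash_bucket](https://tensorflow.google.cn/api_docs/python/tf/feature_column/categorical_column_with_hash_bucket).
235 | This feature column calculates a hash value of the input, then selects one of the `hash_bucket_size` buckets to encode a string. When using this column, you do not need to provide the vocabulary, and you can choose to make the number of hash buckets (`hash_bucket_size`) significantly smaller than the number of actual categories to save space.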
236 | 
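237 | Key point: an important downside of this technique is that there may be collisions, in which different strings are mapped to the same bucket. In practice, this can still work well for some datasets.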
238 | 
239 | ```python
240 | thal_hashed = feature_column.categorical_column_with_hash_bucket(
241 |       'thal', hash_bucket_size=1000)
242 | demo(feature_column.indicator_column(thal_hashed))
243 | ```
244 | 
245 | ```
246 | [[0. 0. 0. ... 0. 0. 0.]
247 |  [0. 0. 0. ... 0. 0. 0.]
248 |  [0. 0. 0. ... 0. 0. 0.]
249 |  [0. 0. 0. ... 0. 0. 0.]
250 |  [0. 0. 0. ... 0. 0. 0.]]
251 | ```
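The sketch below illustrates hash bucketing in pure Python. It uses `zlib.crc32` only as a stand-in hash. TensorFlow uses its own hash function, so these bucket numbers will not match the ones `categorical_column_with_hash_bucket` produces. The point is the mechanism: no vocabulary is needed, and every string lands in `[0, hash_bucket_size)`. Once there are more distinct strings than buckets, collisions are unavoidable.

```python
import zlib

def hash_bucket(value, hash_bucket_size):
    # Deterministic hash of the UTF-8 bytes, folded into the bucket range.
    return zlib.crc32(value.encode('utf-8')) % hash_bucket_size

for v in ['fixed', 'normal', 'reversible']:
    print(v, hash_bucket(v, 1000))

# With only 2 buckets, 3 distinct strings must share at least one bucket
# (pigeonhole principle): this is the collision downside of hashing.
small = {v: hash_bucket(v, 2) for v in ['fixed', 'normal', 'reversible']}
print(len(set(small.values())) < 3)  # True
```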
252 | 
253 | ### 7.6. Crossed feature columns
254 | 
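255 | Combining features into a single feature, better known as a [feature cross](https://developers.google.com/machine-learning/glossary/#feature_cross), enables a model to learn separate weights for each combination of features.
256 | Here, we will create a new feature that is the cross of age and thal.
257 | Note that `crossed_column` does not build the full table of all possible combinations (which could be very large). Instead, it is backed by a `hashed_column`, so you can choose how large the table is.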
258 | 
259 | ```python
260 | crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)
261 | demo(feature_column.indicator_column(crossed_feature))
262 | ```
263 | 
264 | ```
265 | [[0. 0. 0. ... 0. 0. 0.]
266 |  [0. 0. 0. ... 0. 0. 0.]
267 |  [0. 0. 0. ... 0. 0. 0.]
268 |  [0. 0. 0. ... 0. 0. 0.]
269 |  [0. 0. 0. ... 0. 0. 0.]]
270 | ```
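A crossed column can be pictured as hashing the combination of its inputs into `hash_bucket_size` buckets. The pure-Python sketch below combines an age bucket index with a thal string and hashes the pair. The key format and the use of `zlib.crc32` are assumptions for illustration, not TensorFlow's internal implementation, so the indices will differ from those of the real column.

```python
import bisect
import zlib

BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]

def crossed_bucket(age, thal, hash_bucket_size=1000):
    # Step 1: bucketize age (left boundary inclusive).
    age_bucket = bisect.bisect_right(BOUNDARIES, age)
    # Step 2: build one key for the (age_bucket, thal) pair and hash it.
    key = f'{age_bucket}_x_{thal}'
    return zlib.crc32(key.encode('utf-8')) % hash_bucket_size

print(crossed_bucket(58, 'reversible'))
print(crossed_bucket(52, 'normal'))
# Ages in the same bucket with the same thal share one crossed feature.
print(crossed_bucket(56, 'reversible') == crossed_bucket(58, 'reversible'))  # True
```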
271 | 
272 | ## 8. Choose which columns to use
273 | 
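274 | We have seen how to use several types of feature columns. Now we will use them to train a model. The goal of this tutorial is to show you the complete code (i.e. the mechanics) needed to work with feature columns. We have arbitrarily selected a few columns to train our model below.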
275 | 
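276 | Key point: if your aim is to build an accurate model, try a larger dataset of your own, and think carefully about which features are the most meaningful to include and how they should be represented.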
277 | 
278 | ```python
279 | feature_columns = []
280 | 
281 | # numeric columns
282 | for header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:
283 |   feature_columns.append(feature_column.numeric_column(header))
284 | 
285 | # bucketized column
286 | age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
287 | feature_columns.append(age_buckets)
288 | 
289 | # indicator column
290 | thal = feature_column.categorical_column_with_vocabulary_list(
291 |       'thal', ['fixed', 'normal', 'reversible'])
292 | thal_one_hot = feature_column.indicator_column(thal)
293 | feature_columns.append(thal_one_hot)
294 | 
295 | # embedding column
296 | thal_embedding = feature_column.embedding_column(thal, dimension=8)
297 | feature_columns.append(thal_embedding)
298 | 
299 | # crossed column
300 | crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)
301 | crossed_feature = feature_column.indicator_column(crossed_feature)
302 | feature_columns.append(crossed_feature)
303 | ```
304 | 
305 | ### 8.1. Create a feature layer
306 | 
307 | Now that we have defined our feature columns, we will use a [DenseFeatures](https://tensorflow.google.cn/versions/r2.0/api_docs/python/tf/keras/layers/DenseFeatures) layer to feed them into our Keras model.
308 | 
309 | ```python
310 | feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
311 | ```
312 | 
313 | Earlier, we used a small batch size to demonstrate how feature columns work. Now we create a new input pipeline with a larger batch size.
314 | 
315 | ```python
316 | batch_size = 32
317 | train_ds = df_to_dataset(train, batch_size=batch_size)
318 | val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
319 | test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
320 | ```
321 | 
322 | ## 9. Create, compile, and train the model
323 | 
324 | ```python
325 | model = tf.keras.Sequential([
326 |   feature_layer,
327 |   layers.Dense(128, activation='relu'),
328 |   layers.Dense(128, activation='relu'),
329 |   layers.Dense(1, activation='sigmoid')
330 | ])
331 | 
332 | model.compile(optimizer='adam',
333 |               loss='binary_crossentropy',
334 |               metrics=['accuracy'])
335 | 
336 | model.fit(train_ds,
337 |           validation_data=val_ds,
338 |           epochs=5)
339 | ```
340 | 
341 | Output of the training process:
342 | 
343 | ```
344 | Epoch 1/5
345 | 7/7 [==============================] - 1s 79ms/step - loss: 3.8492 - accuracy: 0.4219 - val_loss: 2.7367 - val_accuracy: 0.7143
346 | ......
347 | Epoch 5/5
348 | 7/7 [==============================] - 0s 34ms/step - loss: 0.6200 - accuracy: 0.7377 - val_loss: 0.6288 - val_accuracy: 0.6327
349 | 
350 | <tensorflow.python.keras.callbacks.History at 0x7f48c044c5f8>
351 | ```
352 | 
353 | Evaluate the model on the test set:
354 | 
355 | ```python
356 | loss, accuracy = model.evaluate(test_ds)
357 | print("Accuracy", accuracy)
358 | ```
359 | 
360 | ```output
361 |       2/2 [==============================] - 0s 19ms/step - loss: 0.5538 - accuracy: 0.6721
362 |       Accuracy 0.6721311
363 | ```
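The step counters in these logs follow from the batch size. Keras runs `ceil(n / batch_size)` steps per epoch, and the final batch is smaller. The sketch below checks this for the test split. The size of 61 examples is an inference, not something stated in this section. It is consistent with both the `2/2` step counter and the reported accuracy, since 41/61 ≈ 0.6721311.

```python
import math

batch_size = 32
n_test = 61  # inferred test-split size (see the paragraph above); not stated in this section

steps = math.ceil(n_test / batch_size)
print(steps)  # 2, matching the "2/2" progress bar above

# The final batch is smaller than batch_size:
print(n_test - (steps - 1) * batch_size)  # 29

# 41 correct predictions out of 61 reproduces the reported accuracy.
print(round(41 / 61, 7))  # 0.6721311
```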
364 | 
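365 | Key point: you will typically see the best results with deep learning on much larger and more complex datasets. When working with a small dataset like this one, we recommend using a decision tree or random forest as a strong baseline.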
366 | 
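367 | The goal of this tutorial is not to train an accurate model, but to demonstrate the mechanics of working with structured data, so that you have code to use as a starting point when working with your own datasets in the future.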
368 | 
369 | ## 10. Next steps
370 | 
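371 | The best way to learn more about classifying structured data is to try it yourself. We suggest finding another dataset to work with, and training a model to classify it using code similar to the above. To improve accuracy, think carefully about which features to include in your model and how they should be represented.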
372 | 
373 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-feature_columns.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-feature_columns.html)
374 | > English version: [https://tensorflow.google.cn/beta/tutorials/keras/feature_columns](https://tensorflow.google.cn/beta/tutorials/keras/feature_columns)
376 | 


--------------------------------------------------------------------------------
/r2/tutorials/keras/overfit_and_underfit.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Explore overfitting and underfitting
  3 | categories: tensorflow2官方教程
  4 | tags: tensorflow2.0教程
  5 | top: 1915
  6 | abbrlink: tensorflow/tf2-tutorials-keras-overfit_and_underfit
  7 | ---
  8 | 
  9 | # Explore overfitting and underfitting (TensorFlow 2.0 official tutorial translation)
 10 | 
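 11 | In both of the previous examples (classifying movie reviews and predicting fuel efficiency), we saw that the accuracy of our model on the validation data would peak after training for a number of epochs, and would then start decreasing.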
 12 | 
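 13 | In other words, our model would overfit to the training data. Learning how to deal with overfitting is important. Although it's often possible to achieve high accuracy on the training set, what we really want is to develop models that generalize well to testing data (or data they haven't seen before).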
 14 | 
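 15 | The opposite of overfitting is underfitting. Underfitting occurs when there is still room for improvement on the test data. This can happen for a number of reasons: the model is not powerful enough, it is over-regularized, or it has simply not been trained long enough. It means the network has not learned the relevant patterns in the training data.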
 16 | 
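 17 | If you train for too long, though, the model will start to overfit and learn patterns from the training data that don't generalize to the test data. We need to strike a balance. Understanding how to train for an appropriate number of epochs, as we'll explore below, is a useful skill.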
 18 | 
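 19 | To prevent overfitting, the best solution is to use more training data. A model trained on more data will naturally generalize better. When that is no longer possible, the next best solution is to use techniques like regularization. These place constraints on the quantity and type of information your model can store. If a network can only afford to memorize a small number of patterns, the optimization process will force it to focus on the most prominent patterns, which have a better chance of generalizing well.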
 20 | 
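 21 | In this section, we'll explore two common regularization techniques, weight regularization and dropout, and use them to improve our IMDB movie review classification model.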
 22 | 
 23 | ```python
 24 | from __future__ import absolute_import, division, print_function, unicode_literals
 25 | 
 26 | import tensorflow as tf 
 27 | from tensorflow import keras
 28 | 
 29 | import numpy as np
 30 | import matplotlib.pyplot as plt
 31 | 
 32 | print(tf.__version__)
 33 | ```
 34 | 
 35 | ## 1. Download the IMDB dataset
 36 | 
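 37 | Rather than using an embedding as before, here we will multi-hot encode the sentences. This model will quickly overfit to the training set. It will be used to demonstrate when overfitting occurs, and how to fight it.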
 38 | 
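 39 | Multi-hot-encoding our lists means turning them into vectors of 0s and 1s. Concretely, this would mean, for instance, turning the sequence [3, 5] into a 10,000-dimensional vector that is all zeros except for indices 3 and 5, which are ones.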
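The same encoding at a smaller scale, in pure Python, is easy to check by eye. The 10-dimensional vector is just for readability (the tutorial uses 10,000), and the function name `multi_hot` is hypothetical:

```python
def multi_hot(sequence, dimension):
    # Start from all zeros and set a 1 at every word index that occurs.
    vec = [0] * dimension
    for index in sequence:
        vec[index] = 1
    return vec

print(multi_hot([3, 5], dimension=10))  # [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
```

Note that a word that appears several times still contributes a single 1. The encoding records which words occur, not how often or in what order.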
 40 | 
 41 | ```python
 42 | NUM_WORDS = 10000
 43 | 
 44 | (train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=NUM_WORDS)
 45 | 
 46 | def multi_hot_sequences(sequences, dimension):
 47 |     # Create an all-zero matrix of shape (len(sequences), dimension)
 48 |     results = np.zeros((len(sequences), dimension))
 49 |     for i, word_indices in enumerate(sequences):
 50 |         results[i, word_indices] = 1.0  # set the specified indices of results[i] to 1
 51 |     return results
 52 | 
 53 | 
 54 | train_data = multi_hot_sequences(train_data, dimension=NUM_WORDS)
 55 | test_data = multi_hot_sequences(test_data, dimension=NUM_WORDS)
 56 | ```
 57 | 
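 58 | Let's look at one of the resulting multi-hot vectors. The word indices are sorted by frequency, so we expect more 1-values near index zero, as this plot shows: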
 59 | 
 60 | ```python
 61 | plt.plot(train_data[0])
 62 | ```
 63 | 
 64 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit_files/output_7_1.png)
 65 | 
 66 | ## 2. Demonstrate overfitting
 67 | 
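 68 | The simplest way to prevent overfitting is to reduce the size of the model, i.e. the number of learnable parameters in the model (which is determined by the number of layers and the number of units per layer). In deep learning, the number of learnable parameters in a model is often referred to as the model's "capacity". Intuitively, a model with more parameters has more "memorization capacity" and can therefore easily learn a perfect dictionary-like mapping between training samples and their targets. Such a mapping has no generalization power, so it is useless when making predictions on previously unseen data.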
 69 | 
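 70 | Always keep this in mind: deep learning models tend to be good at fitting to the training data, but the real challenge is generalization, not fitting.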
 71 | 
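 72 | On the other hand, if the network has limited memorization resources, it will not be able to learn the mapping as easily. To minimize its loss, it will have to learn compressed representations that have more predictive power. At the same time, if you make your model too small, it will have difficulty fitting to the training data. There is a balance between "too much capacity" and "not enough capacity".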
 73 | 
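 74 | Unfortunately, there is no magical formula to determine the right size or architecture of your model (in terms of the number of layers, or the right size for each layer). You will have to experiment with a series of different architectures.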
 75 | 
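 76 | To find an appropriate model size, it's best to start with relatively few layers and parameters, then increase the size of the layers or add new layers until you see diminishing returns on the validation loss. Let's try this on our movie review classification network.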
 77 | 
 78 | We'll create a simple model using only `Dense` layers as a baseline, then create smaller and larger versions, and compare them.
 79 | 
 80 | ### 2.1. Create a baseline model
 81 | 
 82 | ```python
 83 | baseline_model = keras.Sequential([
 84 |     # `input_shape` is only required here so that `.summary` works.
 85 |     keras.layers.Dense(16, activation='relu', input_shape=(NUM_WORDS,)),
 86 |     keras.layers.Dense(16, activation='relu'),
 87 |     keras.layers.Dense(1, activation='sigmoid')
 88 | ])
 89 | 
 90 | baseline_model.compile(optimizer='adam',
 91 |                        loss='binary_crossentropy',
 92 |                        metrics=['accuracy', 'binary_crossentropy'])
 93 | 
 94 | baseline_model.summary()
 95 | ```
 96 | 
 97 | ```output
 98 | Model: "sequential"
 99 | _________________________________________________________________
100 | Layer (type)                 Output Shape              Param #   
101 | =================================================================
102 | dense (Dense)                (None, 16)                160016    
103 | _________________________________________________________________
104 | dense_1 (Dense)              (None, 16)                272       
105 | _________________________________________________________________
106 | dense_2 (Dense)              (None, 1)                 17        
107 | =================================================================
108 | Total params: 160,305
109 | Trainable params: 160,305
110 | Non-trainable params: 0
111 | _________________________________________________________________
112 | ```
113 | 
114 | ```python
115 | baseline_history = baseline_model.fit(train_data,
116 |                                       train_labels,
117 |                                       epochs=20,
118 |                                       batch_size=512,
119 |                                       validation_data=(test_data, test_labels),
120 |                                       verbose=2)
121 | ```
122 | 
123 | ```output
124 | Train on 25000 samples, validate on 25000 samples
125 | Epoch 1/20
126 | 25000/25000 - 3s - loss: 0.4664 - accuracy: 0.8135 - binary_crossentropy: 0.4664 - val_loss: 0.3257 - val_accuracy: 0.8808 - val_binary_crossentropy: 0.3257
127 | ......
128 | Epoch 20/20
129 | 25000/25000 - 2s - loss: 0.0037 - accuracy: 0.9999 - binary_crossentropy: 0.0037 - val_loss: 0.8219 - val_accuracy: 0.8532 - val_binary_crossentropy: 0.8219
130 | ```
131 | 
132 | ### 2.2. Create a smaller model
133 | 
134 | Let's create a model with fewer hidden units to compare against the baseline model that we just created:
135 | 
136 | ```python
137 | smaller_model = keras.Sequential([
138 |     keras.layers.Dense(4, activation='relu', input_shape=(NUM_WORDS,)),
139 |     keras.layers.Dense(4, activation='relu'),
140 |     keras.layers.Dense(1, activation='sigmoid')
141 | ])
142 | 
143 | smaller_model.compile(optimizer='adam',
144 |                       loss='binary_crossentropy',
145 |                       metrics=['accuracy', 'binary_crossentropy'])
146 | 
147 | smaller_model.summary()
148 | ```
149 | 
150 | ```output
151 | Model: "sequential_1"
152 | _________________________________________________________________
153 | Layer (type)                 Output Shape              Param #   
154 | =================================================================
155 | dense_3 (Dense)              (None, 4)                 40004     
156 | _________________________________________________________________
157 | dense_4 (Dense)              (None, 4)                 20        
158 | _________________________________________________________________
159 | dense_5 (Dense)              (None, 1)                 5         
160 | =================================================================
161 | Total params: 40,029
162 | Trainable params: 40,029
163 | Non-trainable params: 0
164 | _________________________________________________________________
165 | ```
166 | 
167 | And train the model using the same data:
168 | 
169 | ```python
170 | smaller_history = smaller_model.fit(train_data,
171 |                                     train_labels,
172 |                                     epochs=20,
173 |                                     batch_size=512,
174 |                                     validation_data=(test_data, test_labels),
175 |                                     verbose=2)
176 | ```
177 | 
178 | ```output
179 | Train on 25000 samples, validate on 25000 samples
180 | Epoch 1/20
181 | 25000/25000 - 3s - loss: 0.6189 - accuracy: 0.6439 - binary_crossentropy: 0.6189 - val_loss: 0.5482 - val_accuracy: 0.7987 - val_binary_crossentropy: 0.5482
182 | ......
183 | Epoch 20/20
184 | 25000/25000 - 2s - loss: 0.1857 - accuracy: 0.9880 - binary_crossentropy: 0.1857 - val_loss: 0.5043 - val_accuracy: 0.8632 - val_binary_crossentropy: 0.5043
185 | ```
186 | 
187 | ### 2.3. Create a bigger model
188 | 
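189 | As an exercise, you can create an even larger model and see how quickly it begins overfitting.
190 | Next, let's add to this benchmark a network that has much more capacity, far more than the problem warrants: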
191 | 
192 | ```python
193 | bigger_model = keras.models.Sequential([
194 |     keras.layers.Dense(512, activation='relu', input_shape=(NUM_WORDS,)),
195 |     keras.layers.Dense(512, activation='relu'),
196 |     keras.layers.Dense(1, activation='sigmoid')
197 | ])
198 | 
199 | bigger_model.compile(optimizer='adam',
200 |                      loss='binary_crossentropy',
201 |                      metrics=['accuracy','binary_crossentropy'])
202 | 
203 | bigger_model.summary()
204 | ```
205 | 
206 | ```output
207 | Model: "sequential_2"
208 | _________________________________________________________________
209 | Layer (type)                 Output Shape              Param #   
210 | =================================================================
211 | dense_6 (Dense)              (None, 512)               5120512   
212 | _________________________________________________________________
213 | dense_7 (Dense)              (None, 512)               262656    
214 | _________________________________________________________________
215 | dense_8 (Dense)              (None, 1)                 513       
216 | =================================================================
217 | Total params: 5,383,681
218 | Trainable params: 5,383,681
219 | Non-trainable params: 0
220 | _________________________________________________________________
221 | ```
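The `Param #` column in these summaries can be checked by hand. A `Dense` layer with `n_in` inputs and `n_out` units has an `n_in × n_out` kernel plus `n_out` biases. The short pure-Python check below uses helper names of our own and reproduces the totals of the baseline, smaller and bigger models. These totals are the "capacity" being varied in this section.

```python
def dense_params(n_in, n_out):
    # kernel weights + one bias per unit
    return n_in * n_out + n_out

def total_params(input_dim, units):
    # `units` lists the width of each Dense layer in order
    total, n_in = 0, input_dim
    for n_out in units:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total

NUM_WORDS = 10000
print(total_params(NUM_WORDS, [16, 16, 1]))    # 160305  (baseline)
print(total_params(NUM_WORDS, [4, 4, 1]))      # 40029   (smaller)
print(total_params(NUM_WORDS, [512, 512, 1]))  # 5383681 (bigger)
```

Almost all of the parameters sit in the first layer, because it connects to all 10,000 multi-hot inputs.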
222 | 
223 | And, once again, train the model using the same data:
224 | 
225 | ```python
226 | bigger_history = bigger_model.fit(train_data, train_labels,
227 |                                   epochs=20,
228 |                                   batch_size=512,
229 |                                   validation_data=(test_data, test_labels),
230 |                                   verbose=2)
231 | ```
232 | Output:
233 | ```
234 | Train on 25000 samples, validate on 25000 samples
235 | Epoch 1/20
236 | 25000/25000 - 5s - loss: 0.3392 - accuracy: 0.8581 - binary_crossentropy: 0.3392 - val_loss: 0.2947 - val_accuracy: 0.8802 - val_binary_crossentropy: 0.2947
237 | ......
238 | Epoch 20/20
239 | 25000/25000 - 5s - loss: 1.1516e-05 - accuracy: 1.0000 - binary_crossentropy: 1.1516e-05 - val_loss: 0.9571 - val_accuracy: 0.8717 - val_binary_crossentropy: 0.9571
240 | ```
241 | 
242 | ### 2.4. Plot the training and validation loss
243 | 
244 | <!--TODO(markdaoust): This should be a one-liner with tensorboard -->
245 | 
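246 | The solid lines show the training loss, and the dashed lines show the validation loss (remember: a lower validation loss indicates a better model). Here, the smaller network begins overfitting later than the baseline model (after 6 epochs rather than 4), and its performance degrades much more slowly once it starts overfitting.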
247 | 
248 | ```python
249 | def plot_history(histories, key='binary_crossentropy'):
250 |   plt.figure(figsize=(16,10))
251 | 
252 |   for name, history in histories:
253 |     val = plt.plot(history.epoch, history.history['val_'+key],
254 |                    '--', label=name.title()+' Val')
255 |     plt.plot(history.epoch, history.history[key], color=val[0].get_color(),
256 |              label=name.title()+' Train')
257 | 
258 |   plt.xlabel('Epochs')
259 |   plt.ylabel(key.replace('_',' ').title())
260 |   plt.legend()
261 | 
262 |   plt.xlim([0,max(history.epoch)])
263 | 
264 | 
265 | plot_history([('baseline', baseline_history),
266 |               ('smaller', smaller_history),
267 |               ('bigger', bigger_history)])
268 | ```
269 | 
270 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit_files/output_23_0.png?dcb_=0.12370822350480548)
271 | 
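272 | Notice that the larger network begins overfitting almost right away, after just one epoch, and overfits much more severely. The more capacity the network has, the more quickly it can model the training data (resulting in a low training loss), but the more susceptible it is to overfitting (resulting in a large difference between the training and validation loss).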
273 | 
274 | ## 3. Strategies to prevent overfitting
275 | 
276 | ### 3.1. Add weight regularization
277 | 
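278 | You may be familiar with the Occam's Razor principle: given two explanations for something, the explanation most likely to be correct is the "simplest" one, the one that makes the fewest assumptions. This also applies to the models learned by neural networks: given some training data and a network architecture, there are multiple sets of weight values (multiple models) that could explain the data, and simpler models are less likely to overfit than complex ones.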
279 | 
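280 | A "simple model" in this context is a model where the distribution of parameter values has less entropy (or a model with fewer parameters altogether, as we saw in the section above). Thus a common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weights to take only small values, which makes the distribution of weight values more "regular". This is called "weight regularization", and it is done by adding to the loss function of the network a cost associated with having large weights. This cost comes in two flavors: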
281 | 
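282 | * [L1 regularization](https://developers.google.cn/machine-learning/glossary/#L1_regularization), where the cost added is proportional to the absolute value of the weight coefficients (i.e. to what is called the "L1 norm" of the weights).
283 | 
284 | * [L2 regularization](https://developers.google.cn/machine-learning/glossary/#L2_regularization), where the cost added is proportional to the square of the value of the weight coefficients (i.e. to what is called the squared "L2 norm" of the weights). In the context of neural networks, L2 regularization is also called weight decay. Don't let the different name confuse you: for plain gradient descent, weight decay is mathematically the same as L2 regularization.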
285 | 
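286 | L1 regularization introduces sparsity by pushing some of your weight parameters to exactly zero. L2 regularization penalizes the weight parameters without making them sparse, which is one reason why L2 is more common.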
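A toy pure-Python sketch can show why L1 produces exact zeros while L2 does not. It tracks only the regularization term for a single weight and ignores the data loss. The L1 update uses soft-thresholding, the proximal step for a `lam * |w|` penalty. The L2 update is a plain gradient step on `lam * w**2`. The learning rate, penalty strength and starting weight are arbitrary assumptions.

```python
def l1_prox_step(w, lam, lr):
    # Soft-thresholding: move w toward 0 by lr * lam, stopping exactly at 0.
    shrunk = abs(w) - lr * lam
    if shrunk <= 0:
        return 0.0
    return shrunk if w > 0 else -shrunk

def l2_grad_step(w, lam, lr):
    # The gradient of lam * w**2 is 2 * lam * w, so w shrinks proportionally.
    return w - lr * 2 * lam * w

w_l1 = w_l2 = 0.05
for _ in range(100):
    w_l1 = l1_prox_step(w_l1, lam=0.01, lr=0.1)
    w_l2 = l2_grad_step(w_l2, lam=0.01, lr=0.1)

print(w_l1)  # 0.0: driven exactly to zero
print(w_l2)  # still positive, only shrunk by a constant factor per step
```

The L1 step subtracts a fixed amount regardless of the weight's size, so small weights hit zero. The L2 step subtracts a fraction of the current value, so the weight decays geometrically but never reaches zero.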
287 | 
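288 | In `tf.keras`, weight regularization is added by passing weight regularizer instances to layers as keyword arguments. Let's add L2 weight regularization now.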
289 | 
290 | ```python
291 | l2_model = keras.models.Sequential([
292 |     keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
293 |                        activation='relu', input_shape=(NUM_WORDS,)),
294 |     keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
295 |                        activation='relu'),
296 |     keras.layers.Dense(1, activation='sigmoid')
297 | ])
298 | 
299 | l2_model.compile(optimizer='adam',
300 |                  loss='binary_crossentropy',
301 |                  metrics=['accuracy', 'binary_crossentropy'])
302 | 
303 | l2_model_history = l2_model.fit(train_data, train_labels,
304 |                                 epochs=20,
305 |                                 batch_size=512,
306 |                                 validation_data=(test_data, test_labels),
307 |                                 verbose=2)
308 | ```
309 | 
310 | ```
311 | Train on 25000 samples, validate on 25000 samples
312 | Epoch 1/20
313 | 25000/25000 - 3s - loss: 0.5191 - accuracy: 0.8206 - binary_crossentropy: 0.4785 - val_loss: 0.3855 - val_accuracy: 0.8727 - val_binary_crossentropy: 0.3421
314 | ......
315 | Epoch 20/20
316 | 25000/25000 - 2s - loss: 0.1567 - accuracy: 0.9718 - binary_crossentropy: 0.0868 - val_loss: 0.5327 - val_accuracy: 0.8561 - val_binary_crossentropy: 0.4631
317 | ```
318 | 
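319 | `l2(0.001)` means that every coefficient in the weight matrix of the layer will add `0.001 * weight_coefficient_value**2` to the total loss of the network. Keras includes this penalty in the reported `loss` and `val_loss` but not in the `binary_crossentropy` metric. That is why `loss` is noticeably higher than `binary_crossentropy` in the log above (for example 0.1567 vs. 0.0868 at epoch 20). The penalty only changes the model through the weight updates made during training.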
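As a worked example, the sketch below computes the L2 term for a tiny weight matrix and adds it to a data loss. This sum is how the reported `loss` relates to the `binary_crossentropy` metric. The kernel values and the data loss are made up for illustration; only the last lines use numbers from the epoch-20 log above.

```python
def l2_penalty(weight_matrices, factor=0.001):
    # Sum of factor * w**2 over every coefficient of every regularized kernel.
    return sum(factor * w * w for matrix in weight_matrices for row in matrix for w in row)

kernel = [[0.5, -1.0], [2.0, 0.0]]  # hypothetical 2x2 kernel
data_loss = 0.30                     # hypothetical binary cross-entropy
penalty = l2_penalty([kernel])
print(penalty)                       # about 0.00525, i.e. 0.001 * (0.25 + 1 + 4 + 0)
print(data_loss + penalty)           # the total loss the optimizer would minimize

# From the epoch-20 log: loss 0.1567 vs. binary_crossentropy 0.0868,
# so the regularization term added roughly 0.07 to the reported loss.
print(round(0.1567 - 0.0868, 4))     # 0.0699
```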
320 | 
321 | Here's the impact of our L2 regularization penalty:
322 | 
323 | ```python
324 | plot_history([('baseline', baseline_history),
325 |               ('l2', l2_model_history)])
326 | ```
327 | 
328 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit_files/output_30_0.png?dcb_=0.8386779368853696)
329 | 
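330 | As you can see, the L2-regularized model has become much more resistant to overfitting than the baseline model, even though both models have the same number of parameters.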
331 | 
332 | ### 3.2. Add dropout
333 | 
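334 | Dropout, developed by Hinton and his students at the University of Toronto, is one of the most effective and most commonly used regularization techniques for neural networks. Dropout, applied to a layer, consists of randomly "dropping out" (i.e. setting to zero) a number of output features of the layer during training. Suppose a given layer would normally return the vector [0.2, 0.5, 1.3, 0.8, 1.1] for a given input sample during training. After applying dropout, this vector will have a few zero entries distributed at random, e.g. [0, 0.5, 1.3, 0, 1.1]. The "dropout rate" is the fraction of the features that are zeroed out; it is usually set between 0.2 and 0.5.
335 | At test time, no units are dropped out. In the classic formulation, the layer's output values are instead scaled down by the keep probability (1 - rate) to balance for the fact that more units are active than at training time. `tf.keras` implements the equivalent "inverted" form: it scales the surviving outputs up by 1 / (1 - rate) during training, so the layer passes its inputs through unchanged at inference.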
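Here is a minimal pure-Python sketch of "inverted" dropout, the variant that `tf.keras.layers.Dropout` implements. During training, each value is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), so no rescaling is needed at inference. The function name and the seeded `random.Random` are assumptions for illustration. A real `Dropout` layer works on tensors and draws its own random mask.

```python
import random

def dropout(values, rate, training, rng):
    # At inference time, dropout is the identity.
    if not training:
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    # Zero each value with probability `rate`, and scale the survivors up so
    # that the expected value of each output equals its input.
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

x = [0.2, 0.5, 1.3, 0.8, 1.1]
rng = random.Random(42)
print(dropout(x, rate=0.5, training=True, rng=rng))   # some zeros, survivors doubled
print(dropout(x, rate=0.5, training=False, rng=rng))  # unchanged
```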
336 | 
337 | In `tf.keras`, you can introduce dropout in a network via the `Dropout` layer, which is applied to the output of the layer right before it.
338 | 
339 | Let's add two `Dropout` layers to our IMDB network to see how well they reduce overfitting:
340 | 
341 | ```python
342 | dpt_model = keras.models.Sequential([
343 |     keras.layers.Dense(16, activation='relu', input_shape=(NUM_WORDS,)),
344 |     keras.layers.Dropout(0.5),
345 |     keras.layers.Dense(16, activation='relu'),
346 |     keras.layers.Dropout(0.5),
347 |     keras.layers.Dense(1, activation='sigmoid')
348 | ])
349 | 
350 | dpt_model.compile(optimizer='adam',
351 |                   loss='binary_crossentropy',
352 |                   metrics=['accuracy','binary_crossentropy'])
353 | 
354 | dpt_model_history = dpt_model.fit(train_data, train_labels,
355 |                                   epochs=20,
356 |                                   batch_size=512,
357 |                                   validation_data=(test_data, test_labels),
358 |                                   verbose=2)
359 | ```
360 | 
361 | ```
362 | Train on 25000 samples, validate on 25000 samples
363 | Epoch 1/20
364 | 25000/25000 - 3s - loss: 0.6355 - accuracy: 0.6373 - binary_crossentropy: 0.6355 - val_loss: 0.4929 - val_accuracy: 0.8396 - val_binary_crossentropy: 0.4929
365 | ......
366 | Epoch 20/20
367 | 25000/25000 - 3s - loss: 0.0729 - accuracy: 0.9738 - binary_crossentropy: 0.0729 - val_loss: 0.5624 - val_accuracy: 0.8747 - val_binary_crossentropy: 0.5624
368 | ```
369 | 
370 | ```python
371 | plot_history([('baseline', baseline_history),
372 |               ('dropout', dpt_model_history)])
373 | ```
374 | 
375 | ![png](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit_files/output_34_0.png?dcb_=0.9304692927609572)
376 | 
377 | As the plot above shows, adding dropout is a clear improvement over the baseline model.
378 | 
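379 | To recap, here are the most common ways to prevent overfitting in neural networks:
380 | * Get more training data
381 | * Reduce the capacity of the network
382 | * Add weight regularization
383 | * Add dropout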
384 | 
385 | Two important approaches not covered in this guide are data augmentation and batch normalization.
386 | 
387 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-overfit_and_underfit.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-overfit_and_underfit.html)
388 | > English version: [https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit)
390 | 


--------------------------------------------------------------------------------
/r2/tutorials/keras/save_and_restore_models.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Save and load models in TensorFlow 2
  3 | categories: tensorflow2官方教程
  4 | tags: tensorflow2.0教程
  5 | top: 1916
  6 | abbrlink: tensorflow/tf2-tutorials-keras-save_and_restore_models
  7 | ---
  8 | 
  9 | # Save and load models in TensorFlow 2 (TensorFlow 2.0 official tutorial translation)
 10 | 
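 11 | Model progress can be saved during and after training. This means a model can resume where it left off and avoid long training times. Saving also means you can share your model and others can recreate your work. When publishing research models and techniques, most machine learning practitioners share:
 12 | * the code to create the model, and
 13 | * the trained weights, or parameters, of the model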
 14 | 
 15 | Sharing this data helps others understand how the model works and try it themselves with new data.
 16 | 
 17 | Caution: be careful with untrusted code (TensorFlow models are code). See [Using TensorFlow Securely](https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md) for details.
 18 | 
 19 | **Options**
 20 | 
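 21 | There are different ways to save TensorFlow models, depending on the API you're using. This section uses tf.keras, a high-level API for building and training models in TensorFlow. For other approaches, see the TensorFlow [Save and Restore guide](https://tensorflow.google.cn/guide/saved_model) or [Saving in eager](https://tensorflow.google.cn/guide/eager#object-based_saving).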
 22 | 
 23 | ## 1. Setup
 24 | 
 25 | ### 1.1. Installs and imports
 26 | 
 27 | Install and import TensorFlow and its dependencies:
 28 | 
 29 | ```shell
 30 | pip install h5py pyyaml
 31 | ```
 32 | 
 33 | ### 1.2. Get an example dataset
 34 | 
 35 | We'll use the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) to train our model to demonstrate saving weights. To speed up these demonstration runs, only use the first 1000 examples:
 36 | 
 37 | ```python
 38 | from __future__ import absolute_import, division, print_function, unicode_literals
 39 | 
 40 | import os
 41 | 
 42 | !pip install tensorflow==2.0.0-alpha0
 43 | import tensorflow as tf
 44 | from tensorflow import keras
 45 | 
 46 | tf.__version__
 47 | ```
 48 | 
 49 | ```python
 50 | (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
 51 | 
 52 | train_labels = train_labels[:1000]
 53 | test_labels = test_labels[:1000]
 54 | 
 55 | train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
 56 | test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0
 57 | ```
 58 | 
 59 | ### 1.3. Define a model
 60 | 
 61 | Let's build a simple model that we'll use to demonstrate saving and loading weights.
 62 | 
 63 | ```python
 64 | # Returns a short sequential model
 65 | def create_model():
 66 |   model = tf.keras.models.Sequential([
 67 |     keras.layers.Dense(512, activation='relu', input_shape=(784,)),
 68 |     keras.layers.Dropout(0.2),
 69 |     keras.layers.Dense(10, activation='softmax')
 70 |   ])
 71 | 
 72 |   model.compile(optimizer='adam',
 73 |                 loss='sparse_categorical_crossentropy',
 74 |                 metrics=['accuracy'])
 75 | 
 76 |   return model
 77 | 
 78 | 
 79 | # Create a basic model instance
 80 | model = create_model()
 81 | model.summary()
 82 | ```
 83 | 
 84 | ```output
 85 | Model: "sequential"
 86 | _________________________________________________________________
 87 | Layer (type)                 Output Shape              Param #   
 88 | =================================================================
 89 | dense (Dense)                (None, 512)               401920    
 90 | _________________________________________________________________
 91 | dropout (Dropout)            (None, 512)               0         
 92 | _________________________________________________________________
 93 | dense_1 (Dense)              (None, 10)                5130      
 94 | =================================================================
 95 | Total params: 407,050
 96 | Trainable params: 407,050
 97 | Non-trainable params: 0
 98 | _________________________________________________________________
 99 | ```
100 | 
101 | ## 2. Save checkpoints during training
102 | 
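103 | The primary use case is to automatically save checkpoints during and at the end of training. This way you can use a trained model without having to retrain it, or pick up training where you left off in case the training process was interrupted.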
104 | 
105 | `tf.keras.callbacks.ModelCheckpoint` is a callback that performs this task. The callback takes a couple of arguments to configure checkpointing.
106 | 
107 | ### 2.1. Checkpoint callback usage
108 | 
109 | Train the model and pass it the `ModelCheckpoint` callback:
110 | 
111 | ```python
112 | checkpoint_path = "training_1/cp.ckpt"
113 | checkpoint_dir = os.path.dirname(checkpoint_path)
114 | 
115 | # Create a checkpoint callback
116 | cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
117 |                                                  save_weights_only=True,
118 |                                                  verbose=1)
119 | 
120 | model = create_model()
121 | 
122 | model.fit(train_images, train_labels,  epochs = 10,
123 |           validation_data = (test_images,test_labels),
124 |           callbacks = [cp_callback])  # pass callback to training
125 | 
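126 | # This may generate warnings related to saving the state of the optimizer.
127 | # These warnings (and similar warnings throughout this notebook) are in place to discourage outdated usage, and can be ignored.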
128 | ```
129 | 
130 | ```output
131 |   Train on 1000 samples, validate on 1000 samples
132 |   ......
133 |   Epoch 10/10
134 |   960/1000 [===========================>..] - ETA: 0s - loss: 0.0392 - accuracy: 1.0000
135 |   Epoch 00010: saving model to training_1/cp.ckpt
136 |   1000/1000 [==============================] - 0s 207us/sample - loss: 0.0393 - accuracy: 1.0000 - val_loss: 0.3976 - val_accuracy: 0.8750
137 | 
138 |   <tensorflow.python.keras.callbacks.History at 0x7efc3eba7358>
139 | ```
140 | 
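141 | This creates a single collection of TensorFlow checkpoint files that are updated at the end of each epoch.
142 | The contents of the checkpoint_dir folder are as follows (on Linux, list them with `ls`):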
143 | ```
144 | checkpoint  cp.ckpt.data-00000-of-00001  cp.ckpt.index
145 | ```
146 | 
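147 | When restoring a model from weights only, you must have a model with the same architecture as the original model. Since it's the same model architecture, we can share weights even though it's a different instance of the model.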
148 | 
149 | Now rebuild a fresh, untrained model, and evaluate it on the test set. An untrained model will perform at chance levels (~10% accuracy):
150 | 
151 | ```python
152 | model = create_model()
153 | 
154 | loss, acc = model.evaluate(test_images, test_labels)
155 | print("Untrained model, accuracy: {:5.2f}%".format(100*acc))
156 | ```
157 | 
158 | ```output
159 | 1000/1000 [==============================] - 0s 107us/sample - loss: 2.3224 - accuracy: 0.1230
160 | Untrained model, accuracy: 12.30%
161 | ```
162 | 
163 | Then load the weights from the checkpoint, and re-evaluate:
164 | 
165 | ```python
166 | model.load_weights(checkpoint_path)
167 | loss,acc = model.evaluate(test_images, test_labels)
168 | print("Restored model, accuracy: {:5.2f}%".format(100*acc))
169 | ```
170 | 
171 | ```
172 | 1000/1000 [==============================] - 0s 48us/sample - loss: 0.3976 - accuracy: 0.8750
173 | Restored model, accuracy: 87.50%
174 | ```
175 | 
176 | ### 2.2. Checkpoint callback options
177 | 
178 | The callback provides several options to give the resulting checkpoints unique names and to adjust the checkpointing frequency.
179 | 
180 | Train a new model, and save uniquely named checkpoints once every 5 epochs:
181 | 
182 | ```python
183 | # Include the epoch in the file name (uses `str.format`)
184 | checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
185 | checkpoint_dir = os.path.dirname(checkpoint_path)
186 | 
187 | cp_callback = tf.keras.callbacks.ModelCheckpoint(
188 |     checkpoint_path, verbose=1, save_weights_only=True,
189 |     # Save weights every 5 epochs
190 |     period=5)
191 | 
192 | model = create_model()
193 | model.save_weights(checkpoint_path.format(epoch=0))
194 | model.fit(train_images, train_labels,
195 |           epochs = 50, callbacks = [cp_callback],
196 |           validation_data = (test_images,test_labels),
197 |           verbose=0)
198 | ```
199 | 
200 | ```
201 | 
202 | Epoch 00005: saving model to training_2/cp-0005.ckpt
203 | ......
204 | Epoch 00050: saving model to training_2/cp-0050.ckpt
205 | <tensorflow.python.keras.callbacks.History at 0x7efc7c3bbd30>
206 | ```
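The `{epoch:04d}` placeholder is ordinary Python `str.format` syntax, so you can preview the file names this configuration produces without TensorFlow. The sketch below lists the prefixes expected from the manual `save_weights` call at epoch 0, followed by a save every 5 epochs over 50 epochs. It only reproduces the naming, not the files themselves.

```python
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"

# Epoch 0 is saved manually before fit(); the callback then saves every 5 epochs.
epochs_saved = [0] + list(range(5, 51, 5))
names = [checkpoint_path.format(epoch=e) for e in epochs_saved]

print(names[0])    # training_2/cp-0000.ckpt
print(names[1])    # training_2/cp-0005.ckpt
print(names[-1])   # training_2/cp-0050.ckpt
print(len(names))  # 11
```

Zero-padding the epoch number to four digits keeps the names in chronological order when they are sorted as plain strings.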
207 | 
208 | Now, look at the resulting checkpoints and choose the latest one:
209 | 
210 | ```python
211 | latest = tf.train.latest_checkpoint(checkpoint_dir)
212 | latest
213 | ```
214 | 
215 | ```
216 |       'training_2/cp-0050.ckpt'
217 | ```
218 | 
219 | Note: the default TensorFlow format only saves the 5 most recent checkpoints.
220 | 
221 | To test, reset the model and load the latest checkpoint:
222 | 
223 | ```python
224 | model = create_model()
225 | model.load_weights(latest)
226 | loss, acc = model.evaluate(test_images, test_labels)
227 | print("Restored model, accuracy: {:5.2f}%".format(100*acc))
228 | ```
229 | 
230 | ```
231 |       1000/1000 [==============================] - 0s 84us/sample - loss: 0.4695 - accuracy: 0.8810
232 |       Restored model, accuracy: 88.10%
233 | ```
234 | 
235 | ## 3. What are these files?
236 | 
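237 | The code above stores the weights in a collection of [checkpoint](https://tensorflow.google.cn/guide/saved_model#save_and_restore_variables)-formatted files that contain only the trained weights, in a binary format.
238 | Checkpoints contain:
239 | * one or more shards that contain your model's weights;
240 | * an index file that indicates which weights are stored in which shard.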
241 | 
242 | If you are only training a model on a single machine, you'll have one shard with the suffix `.data-00000-of-00001`.
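The shard suffix encodes the shard's zero-based index and the total number of shards. A small sketch with a hypothetical helper, `parse_shard`, reads both numbers with a regular expression:

```python
import re

SHARD_RE = re.compile(r'\.data-(\d+)-of-(\d+)$')

def parse_shard(filename):
    # Returns (shard_index, num_shards) for a checkpoint data file, else None.
    match = SHARD_RE.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(parse_shard('cp.ckpt.data-00000-of-00001'))  # (0, 1)
print(parse_shard('cp.ckpt.index'))                # None
```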
243 | 
244 | ## 4. Manually save weights
245 | 
246 | Above you saw how to load weights into a model. Manually saving the weights is just as simple: use the `Model.save_weights` method.
247 | 
248 | ```python
249 | # Save the weights
250 | model.save_weights('./checkpoints/my_checkpoint')
251 | 
252 | # Restore the weights
253 | model = create_model()
254 | model.load_weights('./checkpoints/my_checkpoint')
255 | 
256 | loss,acc = model.evaluate(test_images, test_labels)
257 | print("Restored model, accuracy: {:5.2f}%".format(100*acc))
258 | ```
259 | 
260 | ## 5. Save the entire model
261 | 
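262 | The model and optimizer can be saved to a file that contains both their state (weights and variables) and the model configuration. This lets you export a model so that it can be used without access to the original Python code. Since the optimizer state is restored, you can even resume training from exactly where you left off.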
263 | 
264 | Saving a fully functional model is very useful: you can load it in TensorFlow.js ([HDF5](https://tensorflow.google.cn/js/tutorials/import-keras.html), [Saved Model](https://tensorflow.google.cn/js/tutorials/conversion/import_saved_model)) and then train and run it in web browsers, or convert it to run on mobile devices using TensorFlow Lite ([HDF5](https://tensorflow.google.cn/lite/convert/python_api#exporting_a_tfkeras_file_), [Saved Model](https://tensorflow.google.cn/lite/convert/python_api#exporting_a_savedmodel_)).
265 | 
266 | ### 5.1. As an HDF5 file
267 | 
268 | Keras provides a basic save format using the [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) standard. For our purposes, the saved model can be treated as a single binary blob.
269 | 
270 | ```python
271 | model = create_model()
272 | 
273 | model.fit(train_images, train_labels, epochs=5)
274 | 
275 | # Save the entire model to an HDF5 file
276 | model.save('my_model.h5')
277 | ```
278 | 
279 | Now recreate the model from that file:
280 | 
281 | ```python
282 | # Recreate the exact same model, including its weights and optimizer
283 | new_model = keras.models.load_model('my_model.h5')
284 | new_model.summary()
285 | ```
286 | 
287 | ```
288 | Model: "sequential_6"
289 | _________________________________________________________________
290 | Layer (type)                 Output Shape              Param #   
291 | =================================================================
292 | dense_12 (Dense)             (None, 512)               401920    
293 | _________________________________________________________________
294 | dropout_6 (Dropout)          (None, 512)               0         
295 | _________________________________________________________________
296 | dense_13 (Dense)             (None, 10)                5130      
297 | =================================================================
298 | Total params: 407,050
299 | Trainable params: 407,050
300 | Non-trainable params: 0
301 | _________________________________________________________________
302 | ```
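The `Param #` column can be checked by hand. A `Dense` layer with `n_in` inputs and `n_out` units has an `n_in x n_out` kernel plus `n_out` biases, and `Dropout` has no weights. The sketch below assumes a 784-value input (28 x 28 pixels, flattened), which is what the 401,920 figure for the first layer implies. `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Number of weights in a Dense layer: kernel (n_in x n_out) plus one bias per unit."""
    return n_in * n_out + n_out


# (layer kind, inputs, outputs) read off the summary above; 784 = 28 * 28 is an assumption.
summary_rows = [("dense", 784, 512), ("dropout", 512, 512), ("dense", 512, 10)]
counts = [dense_params(i, o) if kind == "dense" else 0 for kind, i, o in summary_rows]
print(counts, sum(counts))
```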
303 | 
304 | Check its accuracy:
305 | 
306 | ```python
307 | loss, acc = new_model.evaluate(test_images, test_labels)
308 | print("Restored model, accuracy: {:5.2f}%".format(100*acc))
309 | ```
310 | 
311 | ```
312 | 1000/1000 [==============================] - 0s 94us/sample - loss: 0.4137 - accuracy: 0.8540
313 | Restored model, accuracy: 85.40%
314 | ```
315 | 
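316 | This technique saves everything:
317 | * The weight values
318 | * The model's configuration (architecture)
319 | * The optimizer configuration
320 | 
321 | Keras saves models by inspecting their architecture. Currently, it is not able to save TensorFlow optimizers (from `tf.train`). When using those, you will need to re-compile the model after loading, and you will lose the state of the optimizer.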
322 | 
323 | ### 5.2. As a `saved_model`
324 | 
325 | Note: This method of saving a `tf.keras` model is experimental and may change in future versions.
326 | 
327 | Create a new model:
328 | 
329 | ```python
330 | model = create_model()
331 | 
332 | model.fit(train_images, train_labels, epochs=5)
333 | ```
334 | 
335 | Create a `saved_model` and place it in a time-stamped directory:
336 | 
337 | ```python
338 | import time
339 | saved_model_path = "./saved_models/{}".format(int(time.time()))
340 | 
341 | tf.keras.experimental.export_saved_model(model, saved_model_path)
342 | saved_model_path
343 | ```
344 | 
345 | ```
346 |     './saved_models/1555630614'
347 | ```
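Because the directory name is the Unix timestamp of the export, each call writes to a new directory, and the newest export is the one with the largest number. The sketch below uses hypothetical, filesystem-free helpers in plain Python. It assumes every directory name is an integer timestamp, as produced by the code above, and compares the names as integers rather than as strings.

```python
import time


def export_path(base="./saved_models", now=None):
    """Build a time-stamped export path like the one used above."""
    stamp = int(time.time() if now is None else now)
    return "{}/{}".format(base, stamp)


def newest_export(paths):
    """Pick the export whose last path component is the largest timestamp."""
    return max(paths, key=lambda p: int(p.rstrip("/").rsplit("/", 1)[-1]))


paths = [export_path(now=1555630614), export_path(now=1555630700.9)]
print(paths, newest_export(paths))
```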
348 | 
349 | Reload a fresh Keras model from the saved model:
350 | 
351 | ```python
352 | new_model = tf.keras.experimental.load_from_saved_model(saved_model_path)
353 | new_model.summary()
354 | ```
355 | 
356 | ```
357 | Model: "sequential_7"
358 | _________________________________________________________________
359 | Layer (type)                 Output Shape              Param #   
360 | =================================================================
361 | dense_14 (Dense)             (None, 512)               401920    
362 | _________________________________________________________________
363 | dropout_7 (Dropout)          (None, 512)               0         
364 | _________________________________________________________________
365 | dense_15 (Dense)             (None, 10)                5130      
366 | =================================================================
367 | Total params: 407,050
368 | Trainable params: 407,050
369 | Non-trainable params: 0
370 | _________________________________________________________________
371 | ```
372 | 
373 | Run the restored model:
374 | 
375 | ```python
376 | new_model.predict(test_images).shape
377 | ```
378 | 
379 | ```
380 | (1000, 10)
381 | ```
382 | 
383 | ```python
384 | # The model has to be compiled before it can be evaluated.
385 | # This step is not required if the saved model is only being deployed.
386 | 
387 | new_model.compile(optimizer=model.optimizer,  # keep the optimizer that was loaded
388 |               loss='sparse_categorical_crossentropy',
389 |               metrics=['accuracy'])
390 | 
391 | # Evaluate the restored model
392 | loss, acc = new_model.evaluate(test_images, test_labels)
393 | print("Restored model, accuracy: {:5.2f}%".format(100*acc))
394 | ```
395 | 
396 | ```
397 |       1000/1000 [==============================] - 0s 102us/sample - loss: 0.4367 - accuracy: 0.8570
398 |       Restored model, accuracy: 85.70%
399 | ```
400 | 
401 | ## 6. What's next
402 | 
403 | That was a quick guide to saving and loading with `tf.keras`.
404 | 
405 | * The [tf.keras guide](https://tensorflow.google.cn/guide/keras) shows more about saving and loading models with `tf.keras`.
406 | 
407 | * See [Saving in eager](https://tensorflow.google.cn/guide/eager#object_based_saving) for saving during eager execution.
408 | 
409 | * The [Save and Restore guide](https://tensorflow.google.cn/guide/saved_model) has low-level details about TensorFlow saving.
410 | 
411 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-save_and_restore_models.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-save_and_restore_models.html)
412 | > English version: [https://tensorflow.google.cn/beta/tutorials/keras/save_and_restore_models](https://tensorflow.google.cn/beta/tutorials/keras/save_and_restore_models)
413 | > Translation suggestions (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/save_and_restore_models.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/save_and_restore_models.md)


--------------------------------------------------------------------------------
/r2/tutorials/quickstart/advanced.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: TensorFlow 2.0 quickstart for experts
  3 | categories: tensorflow2官方教程
  4 | tags: tensorflow2.0教程
  5 | top: 1906
  6 | abbrlink: tensorflow/tf2-tutorials-quickstart-advanced
  7 | ---
  8 | 
  9 | # TensorFlow 2.0 quickstart for experts: data processing, custom models, losses, metrics, and gradient descent (translation of the official TensorFlow 2.0 tutorial)
 10 | 
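 11 | The beginner quickstart uses a `tf.keras.Sequential` model, which simply stacks layers.
 12 | This quickstart for experts builds the model with the Keras model subclassing API and uses somewhat lower-level interfaces to customize the model, the loss, the evaluation metrics, and the gradient-descent training loop.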
 13 | 
 14 | 
 15 | To get started, import the TensorFlow library into your program:
 16 | 
 17 | ```python
 18 | from __future__ import absolute_import, division, print_function, unicode_literals
 19 | 
 20 | import tensorflow as tf  # Install with `pip install tensorflow-gpu==2.0.0-alpha0`
 21 | 
 22 | from tensorflow.keras.layers import Dense, Flatten, Conv2D
 23 | from tensorflow.keras import Model
 24 | ```
 25 | 
 26 | Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).
 27 | 
 28 | ```python
 29 | mnist = tf.keras.datasets.mnist
 30 | 
 31 | (x_train, y_train), (x_test, y_test) = mnist.load_data()
 32 | x_train, x_test = x_train / 255.0, x_test / 255.0
 33 | 
 34 | # Add a channels dimension
 35 | x_train = x_train[..., tf.newaxis]
 36 | x_test = x_test[..., tf.newaxis]
 37 | ```
 38 | 
 39 | Use `tf.data` to batch and shuffle the dataset:
 40 | 
 41 | ```python
 42 | train_ds = tf.data.Dataset.from_tensor_slices(
 43 |     (x_train, y_train)).shuffle(10000).batch(32)
 44 | test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
 45 | ```
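`shuffle(10000)` does not shuffle the whole dataset at once. It keeps a buffer of up to 10,000 elements, emits a randomly chosen element from that buffer, and refills the buffer from the input. `batch(32)` then groups consecutive elements and, by default, keeps a final smaller batch. The pure-Python sketch below imitates both steps on plain integers to show the idea and the resulting batch counts. It is not tf.data's implementation, and the helper names are hypothetical.

```python
import random


def buffered_shuffle(items, buffer_size, rng):
    """Yield items in a random order using a bounded shuffle buffer."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Emit a random element; swap-and-pop keeps removal O(1).
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    # Drain whatever is left once the input is exhausted.
    rng.shuffle(buffer)
    yield from buffer


def batch(items, batch_size):
    """Group consecutive items into lists; the last batch may be smaller."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


rng = random.Random(0)
train_batches = list(batch(buffered_shuffle(range(60000), 10000, rng), 32))
test_batches = list(batch(range(10000), 32))
print(len(train_batches), len(test_batches), len(test_batches[-1]))
```

With MNIST's 60,000 training and 10,000 test images, this gives 1,875 full training batches and 313 test batches, the last one holding 16 images.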
 46 | 
 47 | Build the `tf.keras` model using the Keras [model subclassing API](https://tensorflow.google.cn/guide/keras#model_subclassing):
 48 | 
 49 | ```python
 50 | class MyModel(Model):
 51 |   def __init__(self):
 52 |     super(MyModel, self).__init__()
 53 |     self.conv1 = Conv2D(32, 3, activation='relu')
 54 |     self.flatten = Flatten()
 55 |     self.d1 = Dense(128, activation='relu')
 56 |     self.d2 = Dense(10, activation='softmax')
 57 | 
 58 |   def call(self, x):
 59 |     x = self.conv1(x)
 60 |     x = self.flatten(x)
 61 |     x = self.d1(x)
 62 |     return self.d2(x)
 63 | 
 64 | model = MyModel()
 65 | ```
 66 | 
 67 | Choose an optimizer and loss function for training:
 68 | 
 69 | ```python
 70 | loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
 71 | 
 72 | optimizer = tf.keras.optimizers.Adam()
 73 | ```
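`SparseCategoricalCrossentropy` takes integer class labels (which is what MNIST provides) together with predicted class probabilities (the model ends in a softmax), and averages `-log(p[label])` over the batch. The pure-Python sketch below computes that quantity by hand for a two-example batch. The clipping constant `eps` is an assumption added here to avoid `log(0)`.

```python
import math


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log(probability assigned to the true class) over a batch."""
    losses = []
    for label, row in zip(labels, probs):
        p = min(max(row[label], eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)


example_labels = [1, 0]
example_probs = [[0.1, 0.9], [0.8, 0.2]]
print(sparse_categorical_crossentropy(example_labels, example_probs))
```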
 74 | 
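 75 | Select metrics to measure the loss and the accuracy of the model. These metrics accumulate values over the epochs and then print the overall result.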
 76 | 
 77 | ```python
 78 | train_loss = tf.keras.metrics.Mean(name='train_loss')
 79 | train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
 80 | 
 81 | test_loss = tf.keras.metrics.Mean(name='test_loss')
 82 | test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')
 83 | ```
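Each metric object keeps running totals. Calling it updates the totals, and `result()` returns the value accumulated so far. Because the training loop below never resets these metrics, the numbers printed for epoch 5 are averages over all five epochs, not over epoch 5 alone. Calling `reset_states()` on each metric at the start of an epoch gives per-epoch values instead. The class below is a hypothetical pure-Python stand-in that mimics this behavior for a `Mean` metric. The loss values are made up.

```python
class RunningMean:
    """Minimal stand-in for a streaming mean metric such as tf.keras.metrics.Mean."""

    def __init__(self):
        self.reset_states()

    def reset_states(self):
        # Clear the accumulated totals, e.g. at the start of each epoch.
        self.total = 0.0
        self.count = 0

    def __call__(self, value):
        # Update the running totals with one new value.
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count if self.count else 0.0


loss_metric = RunningMean()
epoch_losses = {1: [0.5, 0.25], 2: [0.125, 0.125]}  # made-up per-batch losses

cumulative = []
for losses in epoch_losses.values():
    for loss_value in losses:
        loss_metric(loss_value)
    cumulative.append(loss_metric.result())  # never reset: average over all epochs so far

per_epoch = []
for losses in epoch_losses.values():
    loss_metric.reset_states()  # reset: average over this epoch only
    for loss_value in losses:
        loss_metric(loss_value)
    per_epoch.append(loss_metric.result())

print(cumulative, per_epoch)  # [0.375, 0.25] [0.375, 0.125]
```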
 84 | 
 85 | Use `tf.GradientTape` to train the model:
 86 | 
 87 | ```python
 88 | @tf.function
 89 | def train_step(images, labels):
 90 |   with tf.GradientTape() as tape:
 91 |     predictions = model(images)
 92 |     loss = loss_object(labels, predictions)
 93 |   gradients = tape.gradient(loss, model.trainable_variables)
 94 |   optimizer.apply_gradients(zip(gradients, model.trainable_variables))
 95 | 
 96 |   train_loss(loss)
 97 |   train_accuracy(labels, predictions)
 98 | ```
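`train_step` follows the usual three-step pattern: compute predictions and the loss, differentiate the loss with respect to the trainable variables, and let the optimizer apply an update. The sketch below shows the same three steps in plain Python for a one-weight model `y = w * x` with a mean-squared-error loss. Two things differ from the code above. The gradient is derived by hand instead of being recorded by `tf.GradientTape`, and plain gradient descent (a simpler, swapped-in update rule) stands in for Adam. The data is made up.

```python
def sgd_step(w, xs, ys, learning_rate=0.1):
    """One gradient-descent step for y = w * x with a mean-squared-error loss."""
    n = len(xs)
    # Forward pass: predictions and loss.
    preds = [w * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    # Gradient of the loss w.r.t. w: d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x).
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    # Apply the update (what optimizer.apply_gradients does, minus Adam's extra state).
    return w - learning_rate * grad, loss


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated with the "true" weight w = 2
w = 0.0
for step in range(100):
    w, mse = sgd_step(w, xs, ys)
print(round(w, 4), mse)
```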
 99 | 
100 | Now test the model:
101 | 
102 | ```python
103 | @tf.function
104 | def test_step(images, labels):
105 |   predictions = model(images)
106 |   t_loss = loss_object(labels, predictions)
107 | 
108 |   test_loss(t_loss)
109 |   test_accuracy(labels, predictions)
110 | ```
111 | 
112 | ```python
113 | EPOCHS = 5
114 | 
115 | for epoch in range(EPOCHS):
116 |   for images, labels in train_ds:
117 |     train_step(images, labels)
118 | 
119 |   for test_images, test_labels in test_ds:
120 |     test_step(test_images, test_labels)
121 | 
122 |   template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
123 |   print (template.format(epoch+1,
124 |                          train_loss.result(),
125 |                          train_accuracy.result()*100,
126 |                          test_loss.result(),
127 |                          test_accuracy.result()*100))
128 | ```
129 | 
130 | ```
131 |       Epoch 1, Loss: 0.13177014887332916, Accuracy: 96.06000518798828, Test Loss: 0.05814294517040253, Test Accuracy: 98.04999542236328 
132 |       ...
133 |       Epoch 5, Loss: 0.042211469262838364, Accuracy: 98.72000122070312, Test Loss: 0.05708516761660576, Test Accuracy: 98.3239974975586
134 | ```
135 | 
136 | The image classifier is now trained to about 98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://tensorflow.google.cn/beta/tutorials/keras).
137 | 
138 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-advanced.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-advanced.html)
139 | > English version: [https://tensorflow.google.cn/beta/tutorials/quickstart/advanced](https://tensorflow.google.cn/beta/tutorials/quickstart/advanced)
140 | > Translation suggestions (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/advanced.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/advanced.md)
141 | 


--------------------------------------------------------------------------------
/r2/tutorials/quickstart/beginner.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: TensorFlow 2.0 quickstart for beginners
 3 | categories: tensorflow2官方教程
 4 | tags: tensorflow2.0教程
 5 | top: 1905
 6 | abbrlink: tensorflow/tf2-tutorials-quickstart-beginner
 7 | ---
 8 | 
 9 | # TensorFlow 2.0 quickstart for beginners (translation of the official TensorFlow 2.0 tutorial)
10 | 
11 | Install command:
12 | 
13 | ```shell
14 | pip install tensorflow-gpu==2.0.0-alpha0
15 | ```
16 | 
17 | To get started, import the TensorFlow library into your program:
18 | 
19 | ```python
20 | from __future__ import absolute_import, division, print_function, unicode_literals
21 | import tensorflow as tf
22 | ```
23 | 
24 | Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). Convert the samples from integers to floating-point numbers:
25 | 
26 | ```python
27 | mnist = tf.keras.datasets.mnist
28 | 
29 | (x_train, y_train), (x_test, y_test) = mnist.load_data()
30 | x_train, x_test = x_train / 255.0, x_test / 255.0
31 | ```
32 | 
33 | Build the `tf.keras.Sequential` model by stacking layers. Choose an optimizer and loss function for training:
34 | 
35 | ```python
36 | model = tf.keras.models.Sequential([
37 |   tf.keras.layers.Flatten(input_shape=(28, 28)),
38 |   tf.keras.layers.Dense(128, activation='relu'),
39 |   tf.keras.layers.Dropout(0.2),
40 |   tf.keras.layers.Dense(10, activation='softmax')
41 | ])
42 | 
43 | model.compile(optimizer='adam',
44 |               loss='sparse_categorical_crossentropy',
45 |               metrics=['accuracy'])
46 | ```
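`Dropout(0.2)` is only active during training. Each input value is dropped (set to zero) with probability 0.2, and the surviving values are scaled by `1 / (1 - 0.2)` so that the expected sum stays the same. At inference time the layer passes its input through unchanged. The pure-Python sketch below illustrates this "inverted dropout" rule with a seeded random generator. It is an illustration, not the Keras implementation.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: zero each value with probability `rate` (0 <= rate < 1), rescale the rest."""
    if not training:
        return list(values)  # identity at inference time
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(42)
inputs = [1.0] * 10000
outputs = dropout(inputs, rate=0.2, training=True, rng=rng)
zero_fraction = sum(v == 0.0 for v in outputs) / len(outputs)
print(zero_fraction, sum(outputs) / len(outputs))
```

On average, the fraction of zeros comes out near 0.2 and the mean stays near 1.0.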
47 | 
48 | Train and evaluate the model:
49 | 
50 | ```python
51 | model.fit(x_train, y_train, epochs=5)
52 | 
53 | model.evaluate(x_test, y_test)
54 | ```
55 | 
56 | The image classifier is now trained to about 98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://tensorflow.google.cn/beta/tutorials/).
57 | 
58 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-beginner.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-beginner.html)
59 | > English version: [https://tensorflow.google.cn/beta/tutorials/quickstart/beginner](https://tensorflow.google.cn/beta/tutorials/quickstart/beginner)
60 | > Translation suggestions (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/beginner.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/beginner.md)
61 | 


--------------------------------------------------------------------------------
/r2/tutorials/text/image_captioning_44_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/image_captioning_44_0.png


--------------------------------------------------------------------------------
/r2/tutorials/text/image_captioning_48_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/image_captioning_48_1.png


--------------------------------------------------------------------------------
/r2/tutorials/text/image_captioning_48_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/image_captioning_48_2.png


--------------------------------------------------------------------------------
/r2/tutorials/text/image_captioning_50_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/image_captioning_50_1.png


--------------------------------------------------------------------------------
/r2/tutorials/text/image_captioning_50_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/image_captioning_50_2.png


--------------------------------------------------------------------------------
/r2/tutorials/text/images/embedding.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/images/embedding.jpg


--------------------------------------------------------------------------------
/r2/tutorials/text/images/embedding2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/images/embedding2.png


--------------------------------------------------------------------------------
/r2/tutorials/text/images/one-hot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/images/one-hot.png


--------------------------------------------------------------------------------
/r2/tutorials/text/nmt_with_attention_43_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/nmt_with_attention_43_1.png


--------------------------------------------------------------------------------
/r2/tutorials/text/nmt_with_attention_44_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/nmt_with_attention_44_1.png


--------------------------------------------------------------------------------
/r2/tutorials/text/nmt_with_attention_45_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/nmt_with_attention_45_1.png


--------------------------------------------------------------------------------
/r2/tutorials/text/nmt_with_attention_46_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/nmt_with_attention_46_1.png


--------------------------------------------------------------------------------
/r2/tutorials/text/text_classification_rnn.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Text classification with an RNN: movie reviews
  3 | categories: tensorflow2官方教程
  4 | tags: tensorflow2.0教程
  5 | top: 1928
  6 | abbrlink: tensorflow/tf2-tutorials-text-text_classification_rnn
  7 | ---
  8 | 
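  9 | # Text classification with an RNN: movie reviews (translation of the official TensorFlow 2.0 tutorial)
 10 | 
 11 | This tutorial trains a recurrent neural network on the [IMDB large movie review dataset](http://ai.stanford.edu/~amaas/data/sentiment/) for sentiment classification.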
 12 | 
 13 | ```python
 14 | from __future__ import absolute_import, division, print_function, unicode_literals
 15 | 
 16 | # !pip install tensorflow-gpu==2.0.0-alpha0
 17 | import tensorflow_datasets as tfds
 18 | import tensorflow as tf
 19 | ```
 20 | 
 21 | Import `matplotlib` and create a helper function to plot graphs:
 22 | 
 23 | ```python
 24 | import matplotlib.pyplot as plt
 25 | 
 26 | 
 27 | def plot_graphs(history, string):
 28 |   plt.plot(history.history[string])
 29 |   plt.plot(history.history['val_'+string])
 30 |   plt.xlabel("Epochs")
 31 |   plt.ylabel(string)
 32 |   plt.legend([string, 'val_'+string])
 33 |   plt.show()
 34 | ```
 35 | 
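 36 | ## 1. Set up the input pipeline
 37 | 
 38 | The IMDB large movie review dataset is a binary classification dataset: every review has either a positive or a negative sentiment label.
 39 | 
 40 | Download the dataset using [TFDS](https://tensorflow.google.cn/datasets). The dataset comes with a built-in subword tokenizer.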
 41 | 
 42 | 
 43 | ```python
 44 | dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True,
 45 |                           as_supervised=True)
 46 | train_dataset, test_dataset = dataset['train'], dataset['test']
 47 | ```
 48 | 
 49 | Because this is a subword tokenizer, it can be passed any string, and the tokenizer will tokenize it.
 50 | 
 51 | ```python
 52 | tokenizer = info.features['text'].encoder
 53 | 
 54 | print ('Vocabulary size: {}'.format(tokenizer.vocab_size))
 55 | ```
 56 | ```
 57 |       Vocabulary size: 8185
 58 | ```
 59 | 
 60 | 
 61 | ```python
 62 | sample_string = 'TensorFlow is cool.'
 63 | 
 64 | tokenized_string = tokenizer.encode(sample_string)
 65 | print ('Tokenized string is {}'.format(tokenized_string))
 66 | 
 67 | original_string = tokenizer.decode(tokenized_string)
 68 | print ('The original string: {}'.format(original_string))
 69 | 
 70 | assert original_string == sample_string
 71 | ```
 72 | 
 73 | ```
 74 |       Tokenized string is [6307, 2327, 4043, 4265, 9, 2724, 7975]
 75 |       The original string: TensorFlow is cool.
 76 | ```
 77 | 
 78 | If a word is not in its dictionary, the tokenizer encodes it by breaking it into subwords.
 79 | 
 80 | ```python
 81 | for ts in tokenized_string:
 82 |   print ('{} ----> {}'.format(ts, tokenizer.decode([ts])))
 83 | ```
 84 | 
 85 | ```
 86 |     6307 ----> Ten
 87 |     2327 ----> sor
 88 |     4043 ----> Fl
 89 |     4265 ----> ow
 90 |     9 ----> is
 91 |     2724 ----> cool
 92 |     7975 ----> .
 93 | ```
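The idea can be illustrated with a toy encoder. It keeps a vocabulary of known pieces and splits each word greedily, left to right, into the longest pieces found in that vocabulary. The vocabulary and the greedy longest-match rule below are assumptions made for illustration. TFDS's `SubwordTextEncoder` builds its vocabulary from the corpus, and its exact procedure may differ. The pieces match the output above only because the toy vocabulary was chosen to contain them.

```python
def greedy_subwords(word, vocab):
    """Split `word` into the longest vocabulary pieces, scanning left to right."""
    pieces, start = [], 0
    while start < len(word):
        # Try the longest candidate first, then shrink it until it is in the vocabulary.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Unknown character: fall back to a single-character piece.
            pieces.append(word[start])
            start += 1
    return pieces


toy_vocab = {"Ten", "sor", "Fl", "ow", "cool"}  # hypothetical vocabulary
print(greedy_subwords("TensorFlow", toy_vocab))
print(greedy_subwords("cool", toy_vocab))
```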
 94 | 
 95 | 
 96 | ```python
 97 | BUFFER_SIZE = 10000
 98 | BATCH_SIZE = 64
 99 | 
100 | train_dataset = train_dataset.shuffle(BUFFER_SIZE)
101 | train_dataset = train_dataset.padded_batch(BATCH_SIZE, train_dataset.output_shapes)
102 | 
103 | test_dataset = test_dataset.padded_batch(BATCH_SIZE, test_dataset.output_shapes)
104 | ```
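Reviews have different lengths, so `padded_batch` pads every sequence in a batch with zeros up to the length of the longest sequence in that batch (the sequence dimension in `output_shapes` is unknown, so no fixed length is imposed). The pure-Python sketch below shows the effect on one small batch of token IDs. The helper `pad_batch` is hypothetical.

```python
def pad_batch(sequences, pad_value=0):
    """Pad every sequence with pad_value up to the longest length in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


batch_ids = [[6307, 2327, 4043], [9, 2724], [7975]]
print(pad_batch(batch_ids))
```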
105 | 
106 | ## 2. 创建模型
107 | 
108 | 构建一个`tf.keras.Sequential`模型并从嵌入层开始，嵌入层每个字存储一个向量，当被调用时，它将单词索引的序列转换为向量序列，这些向量是可训练的，在训练之后（在足够的数据上），具有相似含义的词通常具有相似的向量。
109 | 
110 | This index lookup is much more efficient than the equivalent operation of passing a one-hot encoded vector through a `tf.keras.layers.Dense` layer.
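Multiplying a one-hot vector by a weight matrix simply selects one row of that matrix. An embedding layer can therefore skip the multiplication and index the row directly. The pure-Python sketch below checks this equivalence on a tiny, made-up 4 x 3 weight matrix.

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_mat(vec, mat):
    """Row vector times matrix: what a Dense layer without bias computes."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


# Hypothetical embedding table: 4 words in the vocabulary, 3 dimensions each.
toy_weights = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

word_id = 2
via_dense = vec_mat(one_hot(word_id, len(toy_weights)), toy_weights)  # vocab_size * dim multiplications
via_lookup = toy_weights[word_id]                                     # just index the row
print(via_dense == via_lookup)
```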
111 | 
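112 | A recurrent neural network (RNN) processes sequence input by iterating through its elements. An RNN passes its output from one timestep to its input at the next timestep.
113 | 
114 | The `tf.keras.layers.Bidirectional` wrapper can also be used with an RNN layer. It propagates the input forward and backward through the RNN layer and then concatenates the outputs. This helps the RNN learn long-range dependencies.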
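Both ideas can be made concrete with a toy scalar RNN, `h_t = tanh(w_x * x_t + w_h * h_{t-1})`. The loop carries the state from one timestep to the next. The bidirectional version runs a second copy, with its own weights, over the reversed sequence and concatenates the two final states (concatenation is the default merge mode of `Bidirectional`). The weights are made up, and a real LSTM adds gates and vector-valued states.

```python
import math


def simple_rnn(xs, w_x, w_h):
    """Run a scalar RNN over xs and return the final state."""
    h = 0.0
    for x in xs:
        # The previous state h feeds into the next timestep.
        h = math.tanh(w_x * x + w_h * h)
    return h


def bidirectional(xs, forward_weights, backward_weights):
    """Run one RNN forward and another over the reversed input, then concatenate."""
    h_forward = simple_rnn(xs, *forward_weights)
    h_backward = simple_rnn(list(reversed(xs)), *backward_weights)
    return [h_forward, h_backward]


sequence = [0.5, -1.0, 2.0]
out = bidirectional(sequence, forward_weights=(0.8, 0.5), backward_weights=(0.3, 0.9))
print(out)
```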
115 | 
116 | ```python
117 | model = tf.keras.Sequential([
118 |     tf.keras.layers.Embedding(tokenizer.vocab_size, 64),
119 |     tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
120 |     tf.keras.layers.Dense(64, activation='relu'),
121 |     tf.keras.layers.Dense(1, activation='sigmoid')
122 | ])
123 | 
124 | # Compile the Keras model to configure the training process:
125 | model.compile(loss='binary_crossentropy',
126 |               optimizer='adam',
127 |               metrics=['accuracy'])
128 | ```
129 | 
130 | ## 3. Train the model
131 | 
132 | ```python
133 | history = model.fit(train_dataset, epochs=10,
134 |                     validation_data=test_dataset)
135 | ```
136 | 
137 | ```
138 |       ...
139 |       Epoch 10/10
140 |       391/391 [==============================] - 70s 180ms/step - loss: 0.3074 - accuracy: 0.8692 - val_loss: 0.5533 - val_accuracy: 0.7873
141 | ```
142 | 
143 | 
144 | ```python
145 | test_loss, test_acc = model.evaluate(test_dataset)
146 | 
147 | print('Test Loss: {}'.format(test_loss))
148 | print('Test Accuracy: {}'.format(test_acc))
149 | ```
150 | 
151 | ```
152 |           391/Unknown - 19s 47ms/step - loss: 0.5533 - accuracy: 0.7873Test Loss: 0.553319326714
153 |       Test Accuracy: 0.787320017815
154 | ```
155 | 
156 | 
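157 | The model above does not mask the padding applied to the sequences. This can lead to skew if we train on padded sequences and test on unpadded sequences. Ideally the model would learn to ignore the padding, but as you can see below, the padding does have a small effect on the output.
158 | 
159 | If the prediction is >= 0.5, the review is positive; otherwise it is negative.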
160 | 
161 | ```python
162 | def pad_to_size(vec, size):
163 |   zeros = [0] * (size - len(vec))
164 |   vec.extend(zeros)
165 |   return vec
166 | 
167 | def sample_predict(sentence, pad):
168 |   tokenized_sample_pred_text = tokenizer.encode(sentence)
169 | 
170 |   if pad:
171 |     tokenized_sample_pred_text = pad_to_size(tokenized_sample_pred_text, 64)
172 | 
173 |   predictions = model.predict(tf.expand_dims(tokenized_sample_pred_text, 0))
174 | 
175 |   return (predictions)
176 | ```
177 | 
178 | 
179 | ```python
180 | # Predict on a sample text without padding.
181 | 
182 | sample_pred_text = ('The movie was cool. The animation and the graphics '
183 |                     'were out of this world. I would recommend this movie.')
184 | predictions = sample_predict(sample_pred_text, pad=False)
185 | print (predictions)
186 | ```
187 | 
188 | ```
189 |         [[ 0.68914342]]
190 | ```
191 | 
192 | 
193 | ```python
194 | # Predict on a sample text with padding.
195 | 
196 | sample_pred_text = ('The movie was cool. The animation and the graphics '
197 |                     'were out of this world. I would recommend this movie.')
198 | predictions = sample_predict(sample_pred_text, pad=True)
199 | print (predictions)
200 | ```
201 | 
202 | ```
203 |        [[ 0.68634349]]
204 | ```
205 | 
206 | ```python
207 | plot_graphs(history, 'accuracy')
208 | ```
209 | 
210 | ![png](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn_files/output_29_0.png)
211 | 
212 | 
213 | ```python
214 | plot_graphs(history, 'loss')
215 | ```
216 | 
217 | ![png](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn_files/output_30_0.png)
218 | 
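219 | ## 4. Stack two or more LSTM layers
220 | 
221 | Keras recurrent layers have two available modes, controlled by the `return_sequences` constructor argument:
222 | 
223 | * Return the full sequence of successive outputs for each timestep (a 3D tensor of shape `(batch_size, timesteps, output_features)`).
224 | 
225 | * Return only the last output for each input sequence (a 2D tensor of shape `(batch_size, output_features)`).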
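The two modes differ only in which states are kept. The sketch below runs a toy scalar recurrence (a made-up stand-in for an LSTM) over a small batch. It returns either every timestep's state, a `(batch_size, timesteps)` nested list here because each state is a single number, or only the last state, a list of `batch_size` numbers. Stacking works because the first mode produces sequences that the next recurrent layer can consume.

```python
import math


def recurrent_layer(batch, w_x=0.5, w_h=0.5, return_sequences=False):
    """Toy scalar recurrent layer; each state is a single float."""
    outputs = []
    for sequence in batch:
        h, states = 0.0, []
        for x in sequence:
            h = math.tanh(w_x * x + w_h * h)
            states.append(h)
        # Either keep the whole sequence of states or only the final one.
        outputs.append(states if return_sequences else h)
    return outputs


toy_batch = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]
full = recurrent_layer(toy_batch, return_sequences=True)  # 2 x 3 nested list
last = recurrent_layer(toy_batch)                          # 2 floats
# Stacking: the second layer consumes the first layer's full sequences.
stacked = recurrent_layer(full)
print(len(full), len(full[0]), last == [seq[-1] for seq in full], len(stacked))
```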
226 | 
227 | ```python
228 | model = tf.keras.Sequential([
229 |     tf.keras.layers.Embedding(tokenizer.vocab_size, 64),
230 |     tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
231 |         64, return_sequences=True)),
232 |     tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
233 |     tf.keras.layers.Dense(64, activation='relu'),
234 |     tf.keras.layers.Dense(1, activation='sigmoid')
235 | ])
236 | 
237 | model.compile(loss='binary_crossentropy',
238 |               optimizer='adam',
239 |               metrics=['accuracy'])
240 | 
241 | history = model.fit(train_dataset, epochs=10,
242 |                     validation_data=test_dataset)
243 | ```
244 | 
245 | ```
246 |       ...
247 |       Epoch 10/10
248 |       391/391 [==============================] - 154s 394ms/step - loss: 0.1120 - accuracy: 0.9643 - val_loss: 0.5646 - val_accuracy: 0.8070
249 | ```
250 | 
251 | ```python
252 | test_loss, test_acc = model.evaluate(test_dataset)
253 | 
254 | print('Test Loss: {}'.format(test_loss))
255 | print('Test Accuracy: {}'.format(test_acc))
256 | ```
257 | 
258 | ```
259 |             391/Unknown - 45s 115ms/step - loss: 0.5646 - accuracy: 0.8070Test Loss: 0.564571284348
260 |         Test Accuracy: 0.80703997612
261 | ```
262 | 
263 | 
264 | ```python
265 | # Predict on a sample text without padding.
266 | 
267 | sample_pred_text = ('The movie was not good. The animation and the graphics '
268 |                     'were terrible. I would not recommend this movie.')
269 | predictions = sample_predict(sample_pred_text, pad=False)
270 | print (predictions)
271 | ```
272 | 
273 | ```
274 |        [[ 0.00393916]]
275 | ```
276 | 
277 | 
278 | ```python
279 | # Predict on a sample text with padding.
280 | 
281 | sample_pred_text = ('The movie was not good. The animation and the graphics '
282 |                     'were terrible. I would not recommend this movie.')
283 | predictions = sample_predict(sample_pred_text, pad=True)
284 | print (predictions)
285 | ```
286 | 
287 | ```
288 |       [[ 0.01098633]]
289 | ```
290 | 
291 | 
292 | ```python
293 | plot_graphs(history, 'accuracy')
294 | ```
295 | 
296 | ![png](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn_files/output_38_0.png)
297 | 
298 | 
299 | ```python
300 | plot_graphs(history, 'loss')
301 | ```
302 | 
303 | ![png](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn_files/output_39_0.png)
304 | 
305 | Check out other existing recurrent layers, such as [GRU layers](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/GRU).
306 | 
307 | > Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-text_classification_rnn.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-text_classification_rnn.html)
308 | > English version: [https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn)
309 | > Translation suggestions (PR): [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/text_classification_rnn.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/text_classification_rnn.md)
310 | 


--------------------------------------------------------------------------------
/r2/tutorials/text/text_classification_rnn_31_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/text_classification_rnn_31_0.png


--------------------------------------------------------------------------------
/r2/tutorials/text/text_classification_rnn_32_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/text_classification_rnn_32_0.png


--------------------------------------------------------------------------------
/r2/tutorials/text/text_classification_rnn_40_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/text_classification_rnn_40_0.png


--------------------------------------------------------------------------------
/r2/tutorials/text/text_classification_rnn_41_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/text_classification_rnn_41_0.png


--------------------------------------------------------------------------------
/r2/tutorials/text/transformer_107_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/transformer_107_1.png


--------------------------------------------------------------------------------
/r2/tutorials/text/transformer_27_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/transformer_27_1.png


--------------------------------------------------------------------------------
/r2/tutorials/text/transformer_82_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mashangxue/tensorflow2-zh/a9db132818277f840a47eaca66b85f2ff7d7f8db/r2/tutorials/text/transformer_82_1.png


--------------------------------------------------------------------------------
/r2/tutorials/text/word_embeddings.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Word embeddings in NLP: a hands-on project
  3 | categories: tensorflow2官方教程
  4 | tags: tensorflow2.0教程
  5 | top: 1926
  6 | abbrlink: tensorflow/tf2-tutorials-text-word_embeddings
  7 | ---
  8 | 
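  9 | # Word embeddings in NLP: a hands-on project (translation of the official TensorFlow 2.0 tutorial)
 10 | 
 11 | This tutorial introduces word embeddings. It contains complete code to train word embeddings from scratch on a small dataset and to visualize them using the [Embedding Projector](http://projector.tensorflow.org), as shown in the image below: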
 12 | 
 13 | <img src="https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/text/images/embedding.jpg?raw=1" alt="Screenshot of the embedding projector" width="400"/>
 14 | 
 15 | > Word embeddings are an important concept in NLP: an embedding converts a word into a fixed-length vector representation, which makes the text easy to process mathematically.
 16 | 
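 17 | ## 1. Representing text as numbers
 18 | 
 19 | Machine learning models take vectors (arrays of numbers) as input. When working with text, the first thing we must do is come up with a strategy to convert strings to numbers (or to "vectorize" the text) before feeding it to the model. In this section, we will look at three strategies for doing so.
 20 | 
 21 | ### 1.1. One-hot encodings
 22 | 
 23 | As a first idea, we might "one-hot" encode each word in our vocabulary. Consider the sentence "The cat sat on the mat". The vocabulary (or unique words) in this sentence is (cat, mat, on, sat, the). To represent each word, we create a zero vector with length equal to the vocabulary size, then place a one at the index that corresponds to the word. This approach is shown in the following diagram: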
 24 | 
 25 | <img src="https://raw.githubusercontent.com/tensorflow/docs/master/site/en/r2/tutorials/text/images/one-hot.png" alt="Diagram of one-hot encodings" width="400" />
 26 | 
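 27 | To create a vector that contains the encoding of the sentence, we could then concatenate the one-hot vectors for each word.
 28 | 
 29 | Key point: This approach is inefficient. A one-hot encoded vector is sparse (meaning most of its entries are zero). Imagine we have 10,000 words in the vocabulary. To one-hot encode each word, we would create a vector in which 99.99% of the elements are zero.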
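A minimal pure-Python sketch of this scheme for the example sentence follows. The vocabulary is assumed to be the sorted set of lowercased words, each word becomes a vector with a single 1, and the sentence vector is the concatenation. The last line reproduces the sparsity figure for a 10,000-word vocabulary.

```python
sentence = "The cat sat on the mat"
words = sentence.lower().split()
vocab = sorted(set(words))  # ['cat', 'mat', 'on', 'sat', 'the']
index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec


# Concatenate the one-hot vectors of all words to encode the sentence.
sentence_vector = [bit for word in words for bit in one_hot(word)]
print(one_hot("cat"), len(sentence_vector))

# Fraction of zeros in a one-hot vector over a 10,000-word vocabulary.
vocab_size = 10000
print((vocab_size - 1) / vocab_size)
```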
 30 | 
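 31 | ### 1.2. Encode each word with a unique number
 32 | 
 33 | A second approach is to encode each word with a unique number. Continuing the example above, we could assign 1 to "cat", 2 to "mat", and so on. We could then encode the sentence "The cat sat on the mat" as a dense vector such as [5, 1, 4, 3, 5, 2]. This approach is efficient: instead of a sparse vector, we now have a dense one (all elements are filled).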
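The integer scheme from the previous paragraph can be written out in plain Python. The mapping is the one the example implies (cat=1, mat=2, on=3, sat=4, the=5), which is simply alphabetical order starting at 1. Any other assignment would be equally valid, which illustrates how arbitrary such an encoding is.

```python
sentence = "The cat sat on the mat"
words = sentence.lower().split()
# 1-based ids in alphabetical order: cat=1, mat=2, on=3, sat=4, the=5
word_to_id = {word: i + 1 for i, word in enumerate(sorted(set(words)))}
encoded = [word_to_id[word] for word in words]
print(encoded)  # [5, 1, 4, 3, 5, 2]
```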
 34 | 
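 35 | However, this approach has two downsides:
 36 | 
 37 | * The integer encoding is arbitrary (it does not capture any relationship between words).
 38 | 
 39 | * An integer encoding can be challenging for a model to interpret. A linear classifier, for example, learns a single weight for each feature. Because there is no relationship between the similarity of any two words and the similarity of their encodings, this feature-weight combination is not meaningful.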
 40 | 
 41 | 
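 42 | ### 1.3. Word embeddings
 43 | 
 44 | Word embeddings give us an efficient, dense representation in which similar words have similar encodings. Importantly, we do not have to specify this encoding by hand. An embedding is a dense vector of floating-point values (the length of the vector is a parameter you specify). Instead of being specified manually, the values of the embedding are trainable parameters: weights learned by the model during training, in the same way a model learns the weights of a dense layer. It is common to see 8-dimensional word embeddings for small datasets, and up to 1024 dimensions when working with large datasets. A higher-dimensional embedding can capture fine-grained relationships between words, but it takes more data to learn.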
 45 | 
 46 | <img src="https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/text/images/embedding2.png?raw=1" alt="Diagram of an embedding" width="400" />
 47 | 
 48 | 上面是词嵌入的图表，每个单词表示为浮点值的4维向量，另一种考虑嵌入的方法是“查找表”，在学习了这些权重之后，我们可以通过查找表中对应的密集向量来编码每个单词。
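
The "lookup table" view can be checked in a few lines of plain Python. Selecting a row of the embedding matrix gives exactly the same vector as multiplying a one-hot vector by that matrix, which is why an embedding can skip the sparse one-hot step entirely. In this sketch the table is randomly initialized rather than trained, and the names `table`, `lookup` and `one_hot_matmul` are illustrative only.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

vocab = ["cat", "mat", "on", "sat", "the"]
embedding_dim = 4

# The "lookup table": one row of embedding_dim floats per word id.
# Random values stand in for weights that training would normally learn.
table = [[random.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab]

def lookup(word_id):
    """Embedding lookup is just row indexing."""
    return table[word_id]

def one_hot_matmul(word_id):
    """The same result computed as one_hot(word_id) @ table."""
    one_hot = [1.0 if i == word_id else 0.0 for i in range(len(vocab))]
    return [sum(one_hot[i] * table[i][j] for i in range(len(vocab)))
            for j in range(embedding_dim)]

cat_id = vocab.index("cat")
print(lookup(cat_id))
print(one_hot_matmul(cat_id) == lookup(cat_id))  # True
```

Indexing costs one row read, while the matrix product touches every row, so the lookup formulation is far cheaper for large vocabularies.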

## 2. Learning word embeddings with the Embedding layer

Keras makes it easy to use word embeddings. Let's take a look at the [Embedding](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Embedding) layer.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

# !pip install tf-nightly-2.0-preview
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

# The Embedding layer takes at least two arguments:
# the number of possible words in the vocabulary, here 1000 (1 + maximum word index);
# the dimensionality of the embeddings, here 32.
embedding_layer = layers.Embedding(1000, 32)
```
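
The Embedding layer can be understood as a lookup table that maps integer indices (which stand for specific words) to dense vectors (their embeddings). The dimensionality (or width) of the embedding is a parameter you can experiment with to see what works well for your problem, much as you would experiment with the number of neurons in a Dense layer.

When you create an Embedding layer, its weights are randomly initialized (just like any other layer). During training, they are gradually adjusted via backpropagation. Once trained, the learned word embeddings roughly encode similarities between words, as they were learned for the specific problem your model is trained on.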
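
To illustrate what "gradually adjusted via backpropagation" means for an embedding, here is one hand-written gradient-descent step in plain Python. The squared-error loss, the toy target vector and the learning rate are assumptions chosen for the sketch. In a real model the gradient comes from the task loss and is computed by TensorFlow. The point it shows is that only the rows of the words that were actually looked up receive a gradient.

```python
# One hand-written gradient-descent step on a tiny embedding table.
table = [[0.0, 0.0],   # word id 0
         [0.5, -0.5],  # word id 1
         [1.0, 1.0]]   # word id 2

word_id = 1              # the only word that appears in this "batch"
target = [1.0, 0.0]      # toy target vector (an assumption for illustration)
learning_rate = 0.1

def loss(vec):
    """Squared error between an embedding and the toy target."""
    return sum((v - t) ** 2 for v, t in zip(vec, target))

before = loss(table[word_id])

# d(loss)/d(vec) = 2 * (vec - target); only the looked-up row gets a gradient.
grad = [2 * (v - t) for v, t in zip(table[word_id], target)]
table[word_id] = [v - learning_rate * g for v, g in zip(table[word_id], grad)]

after = loss(table[word_id])
print(before, after)        # the loss decreases after the update
print(table[0], table[2])   # untouched rows: [0.0, 0.0] [1.0, 1.0]
```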
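
As input, the Embedding layer takes a 2D integer tensor of shape `(samples, sequence_length)`, where each entry is a sequence of integers. It can embed sequences of variable length: you could feed the embedding layer above batches of shape `(32, 10)` (a batch of 32 sequences of length 10) or `(64, 15)` (a batch of 64 sequences of length 15). All sequences within a batch must have the same length, so shorter sequences should be padded with zeros and longer sequences should be truncated.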
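
Here is a plain-Python sketch of padding and truncating a batch to a common length. `pad_or_truncate` is a hypothetical helper written for illustration. It follows the documented defaults of Keras's `pad_sequences` (`padding='pre'`, `truncating='pre'`, `value=0`), but it is not that function's implementation. The tutorial later calls the real `pad_sequences` with `padding='post'`.

```python
def pad_or_truncate(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Make every sequence exactly `maxlen` long.

    Hypothetical helper that mirrors the documented defaults of
    keras.preprocessing.sequence.pad_sequences ('pre' padding and truncation).
    """
    result = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' drops values from the start, 'post' drops them from the end.
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        pad = [value] * (maxlen - len(seq))
        result.append(pad + seq if padding == "pre" else seq + pad)
    return result

batch = [[1, 2], [3, 4, 5, 6], [7, 8, 9]]
print(pad_or_truncate(batch, maxlen=3, padding="post"))
# [[1, 2, 0], [4, 5, 6], [7, 8, 9]]
```

Note that with the default `truncating="pre"`, the long sequence loses its *first* element, not its last.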
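
As output, the Embedding layer returns a 3D floating-point tensor of shape `(samples, sequence_length, embedding_dimensionality)`. Such a 3D tensor can then be processed by an RNN layer, or it can simply be flattened or pooled and then processed by a Dense layer. This tutorial shows the second, simpler approach. See [Text classification with an RNN](https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/text/text_classification_rnn.ipynb) to learn the first approach.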
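
The two ways of reducing the 3D output before a Dense layer differ mainly in the shape they produce. The sketch below uses nested Python lists as a stand-in for a tensor. Flattening keeps every time step, so its output length depends on `sequence_length`. Average pooling yields one fixed-length vector per sample, which is what `GlobalAveragePooling1D` does in the model below. The sketch ignores masking of padded positions, and the numbers are arbitrary.

```python
# A toy "Embedding layer output": 2 samples x 3 time steps x 4 dimensions.
batch = [
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [0.0, 1.0, 0.0, 1.0]],
    [[2.0, 2.0, 2.0, 2.0], [4.0, 4.0, 4.0, 4.0], [6.0, 6.0, 6.0, 6.0]],
]

def flatten(sample):
    """Concatenate all time steps: length sequence_length * embedding_dim."""
    return [x for step in sample for x in step]

def average_pool(sample):
    """Average over the time axis: length embedding_dim, whatever the sequence length."""
    n = len(sample)
    return [sum(step[d] for step in sample) / n for d in range(len(sample[0]))]

print([len(flatten(s)) for s in batch])                           # [12, 12]
print([[round(x, 2) for x in average_pool(s)] for s in batch])
# [[2.0, 3.0, 3.33, 4.33], [4.0, 4.0, 4.0, 4.0]]
```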
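
## 3. Learning embeddings from scratch

We will train a sentiment classifier on IMDB movie reviews and learn embeddings from scratch in the process. To get started quickly, here is code that downloads and preprocesses the dataset (see this [tutorial](https://tensorflow.google.cn/tutorials/keras/basic_text_classification) for more details).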

```python
vocab_size = 10000
imdb = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=vocab_size)

print(train_data[0])
```

```
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, ...]
```
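
When imported, the review text is integer-encoded (each integer represents a specific word in a dictionary).

### 3.1. Convert integers back to words

It can be useful to know how to convert integers back to text. Here, we create a helper function that queries a dictionary object containing the integer-to-string mapping: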

```python
# A dictionary mapping words to integer indices
word_index = imdb.get_word_index()

# The first indices are reserved: shift every word index by 3
# (imdb.load_data uses index_from=3 by default)
word_index = {k:(v+3) for k,v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3

# Invert the mapping: integer index -> word
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    """Turn a list of integer indices back into a space-separated string."""
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

decode_review(train_data[0])
```

```
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 0s 0us/step

"<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ..."
```
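
To see why every index is shifted by 3, here is the same round trip on a tiny hypothetical vocabulary that needs no download. The four words and their ranks are made up. The reserved ids 0 to 3 and the `<START>`/`<UNK>` conventions follow the code above, and they correspond to the `imdb.load_data` defaults `start_char=1`, `oov_char=2` and `index_from=3`. `encode_review` is a hypothetical helper.

```python
# Toy stand-in for imdb.get_word_index(): hypothetical words, ranked from 1.
raw_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}

INDEX_FROM = 3  # matches the +3 shift used above
word_index = {w: i + INDEX_FROM for w, i in raw_word_index.items()}
word_index.update({"<PAD>": 0, "<START>": 1, "<UNK>": 2, "<UNUSED>": 3})
reverse_word_index = {i: w for w, i in word_index.items()}

def encode_review(words):
    """Prepend <START> and map out-of-vocabulary words to <UNK>."""
    return [word_index["<START>"]] + [word_index.get(w, word_index["<UNK>"]) for w in words]

def decode_review(ids):
    return " ".join(reverse_word_index.get(i, "?") for i in ids)

ids = encode_review(["the", "film", "was", "awful"])
print(ids)                 # [1, 4, 5, 6, 2]
print(decode_review(ids))  # <START> the film was <UNK>
```

Without the shift, the most frequent word ("the", rank 1) would collide with the `<START>` id.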

Movie reviews can have different lengths. We use the `pad_sequences` function to standardize the lengths of the reviews:

```python
maxlen = 500

# Pad short reviews with <PAD> (0) at the end ('post'); reviews longer than
# maxlen are truncated (pad_sequences truncates from the front by default).
train_data = keras.preprocessing.sequence.pad_sequences(train_data,
                                                        value=word_index["<PAD>"],
                                                        padding='post',
                                                        maxlen=maxlen)

test_data = keras.preprocessing.sequence.pad_sequences(test_data,
                                                       value=word_index["<PAD>"],
                                                       padding='post',
                                                       maxlen=maxlen)

print(train_data[0])
```

Inspect the first element of the padded data:

```
[   1   14   22   16   43  530  973 1622 1385   65  458 4468   66 3941
    4  173   36  256    5   25  100   43  838  112   50  670    2    9
   ...
    0    0    0    0    0    0    0    0    0    0]
```

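### 3.2. Create a simple model

We will use the [Keras Sequential API](https://www.tensorflow.org/guide/keras) to define our model.

* The first layer is an `Embedding` layer. This layer takes the integer-encoded vocabulary and looks up the embedding vector for each word index. These vectors are learned as the model trains. They add a dimension to the output array, so the resulting dimensions are `(batch, sequence, embedding)`.

* Next, a `GlobalAveragePooling1D` layer returns a fixed-length output vector for each example by averaging over the sequence dimension. This allows the model to handle input of variable length in the simplest way possible.

* This fixed-length output vector is piped through a fully connected (`Dense`) layer with 16 hidden units.

* The last layer is densely connected to a single output node. With the `sigmoid` activation function, this value is a float between 0 and 1 that represents the probability (or confidence level) that the review is positive.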
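
The `Param #` column of the model summary can be reproduced by hand, which is a quick way to check that the layers are wired as described. The arithmetic assumes the values used in this tutorial (`vocab_size=10000`, `embedding_dim=16`, and 16 hidden units in the first Dense layer).

```python
vocab_size, embedding_dim, hidden_units = 10000, 16, 16

# Embedding: one embedding_dim-long row per word in the vocabulary.
embedding_params = vocab_size * embedding_dim               # 160000
# GlobalAveragePooling1D has no weights.
pooling_params = 0
# Dense(16): one weight per (input, unit) pair plus one bias per unit.
dense_params = embedding_dim * hidden_units + hidden_units  # 272
# Dense(1): 16 weights plus 1 bias.
output_params = hidden_units * 1 + 1                        # 17

total = embedding_params + pooling_params + dense_params + output_params
print(total)  # 160289
```

Almost all of the model's parameters live in the embedding table, which is why the embedding dimension is the main knob for model size here.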

```python
embedding_dim = 16

model = keras.Sequential([
  layers.Embedding(vocab_size, embedding_dim, input_length=maxlen),
  layers.GlobalAveragePooling1D(),
  layers.Dense(16, activation='relu'),
  layers.Dense(1, activation='sigmoid')
])

model.summary()
```

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 500, 16)           160000
_________________________________________________________________
global_average_pooling1d (Gl (None, 16)                0
_________________________________________________________________
dense (Dense)                (None, 16)                272
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 17
=================================================================
Total params: 160,289
Trainable params: 160,289
Non-trainable params: 0
_________________________________________________________________
```

### 3.3. Compile and train the model

```python
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(
    train_data,
    train_labels,
    epochs=30,
    batch_size=512,
    validation_split=0.2)
```

```
Train on 20000 samples, validate on 5000 samples
...
Epoch 30/30
20000/20000 [==============================] - 1s 54us/sample - loss: 0.1639 - accuracy: 0.9449 - val_loss: 0.2840 - val_accuracy: 0.8912
```
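
With this approach, our model reaches a validation accuracy of about 89% (0.8912 in the final epoch above). Note that the model is overfitting: training accuracy (0.9449) is noticeably higher than validation accuracy.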
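
The overfitting can be quantified as the gap between training and validation accuracy. This small sketch assumes the `history.history` layout returned by Keras's `fit` (a dict of per-epoch lists keyed by metric name, which the plotting code below also relies on). `overfitting_gap` is a hypothetical helper, and the demo uses only the final-epoch numbers from the log above.

```python
def overfitting_gap(hist):
    """Training minus validation accuracy per epoch, for a dict shaped like
    Keras's `history.history` (metric name -> list of per-epoch values)."""
    return [a - v for a, v in zip(hist["accuracy"], hist["val_accuracy"])]

# Only the final epoch's values are known from the log above.
logged = {"accuracy": [0.9449], "val_accuracy": [0.8912]}
print([round(g, 4) for g in overfitting_gap(logged)])  # [0.0537]
```

With a real training run, `overfitting_gap(history.history)` returns one gap per epoch, so you can see where the gap starts to widen.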

```python
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

epochs = range(1, len(acc) + 1)

plt.figure(figsize=(12,9))
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.ylim((0.5,1))

plt.show()
```

*Figure: Training and validation accuracy (x-axis: Epochs, y-axis: Accuracy).*

## 4. Retrieve the learned embeddings

Next, let's retrieve the word embeddings learned during training. This is a matrix of shape `(vocab_size, embedding_dim)`.

```python
e = model.layers[0]
weights = e.get_weights()[0]
print(weights.shape) # shape: (vocab_size, embedding_dim)
```

```
(10000, 16)
```

We will now write the weights to disk. To use the [Embedding Projector](http://projector.tensorflow.org), we upload two files in tab-separated format: a file of vectors (containing the embeddings) and a file of metadata (containing the words).

```python
import io

# vecs.tsv: one embedding per line, values separated by tabs.
# meta.tsv: the word on the same line number as its vector in vecs.tsv.
out_v = io.open('vecs.tsv', 'w', encoding='utf-8')
out_m = io.open('meta.tsv', 'w', encoding='utf-8')
for word_num in range(vocab_size):
  word = reverse_word_index[word_num]
  embeddings = weights[word_num]
  out_m.write(word + "\n")
  out_v.write('\t'.join([str(x) for x in embeddings]) + "\n")
out_v.close()
out_m.close()
```
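
Because the two files are matched row by row (line i of the metadata labels line i of the vectors), it can be worth reading them back to check that they line up. The sketch below does this with the standard `csv` module on small in-memory files. The toy words and vectors are made up. With the real files, you would open `vecs.tsv` and `meta.tsv` instead of using `io.StringIO`.

```python
import csv
import io

# In-memory stand-ins for vecs.tsv and meta.tsv, written in the same format
# as the loop above (toy 3-word vocabulary with 2-dimensional vectors).
words = ["<PAD>", "the", "film"]
vectors = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]

vecs_file, meta_file = io.StringIO(), io.StringIO()
for word, vec in zip(words, vectors):
    meta_file.write(word + "\n")
    vecs_file.write("\t".join(str(x) for x in vec) + "\n")

# Read both files back and check that they line up row by row.
vecs_file.seek(0)
meta_file.seek(0)
rows = [[float(x) for x in row] for row in csv.reader(vecs_file, delimiter="\t")]
labels = meta_file.read().splitlines()

assert len(rows) == len(labels), "vecs.tsv and meta.tsv must have the same number of lines"
assert len({len(r) for r in rows}) == 1, "every vector must have the same dimension"
print(dict(zip(labels, rows)))
```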

If you are running this tutorial in Colaboratory, you can use the following snippet to download these files to your local machine (or use the file browser, *View -> Table of contents -> File browser*).

```python
try:
  from google.colab import files
except ImportError:
  pass
else:
  files.download('vecs.tsv')
  files.download('meta.tsv')
```
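
## 5. Visualize the embeddings

To visualize our embeddings, we will upload them to the [Embedding Projector](http://projector.tensorflow.org).

Open the [Embedding Projector](http://projector.tensorflow.org):

* Click "Load data".

* Upload the two files we created above: `vecs.tsv` and `meta.tsv`.

The embeddings you have trained will now be displayed. You can search for words to find their closest neighbors. For example, try searching for "beautiful"; you may see neighbors like "wonderful". Note: your results may differ somewhat, depending on how the weights were randomly initialized before the embedding layer was trained.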
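
The neighbor search can also be approximated offline. This sketch ranks words by cosine similarity, which is one of the distance measures the Projector offers. The 3-dimensional vectors are made up for illustration. With real data, you would build the dict from `weights` and `reverse_word_index` (word mapped to `weights[index]`). `cosine` and `nearest_neighbors` are hypothetical helpers.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_neighbors(query, embeddings, k=5):
    """Return the k words most cosine-similar to `query`, excluding itself."""
    q = embeddings[query]
    scores = [(cosine(q, vec), word) for word, vec in embeddings.items() if word != query]
    return [word for _, word in sorted(scores, reverse=True)[:k]]

# Made-up 3-d vectors standing in for trained embeddings.
toy = {
    "beautiful": [0.9, 0.8, 0.1],
    "wonderful": [0.8, 0.9, 0.0],
    "terrible": [-0.9, -0.7, 0.1],
    "boring": [-0.6, -0.9, 0.2],
}
print(nearest_neighbors("beautiful", toy, k=1))  # ['wonderful']
print(nearest_neighbors("terrible", toy, k=1))   # ['boring']
```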

*Note: experimentally, you may be able to produce more interpretable embeddings with a simpler model. Try deleting the `Dense(16)` layer, retraining the model, and visualizing the embeddings again.*

<img src="https://raw.githubusercontent.com/tensorflow/docs/master/site/en/r2/tutorials/text/images/embedding.jpg" alt="Screenshot of the embedding projector" width="400"/>

## 6. Next steps

This tutorial has shown you how to train and visualize word embeddings from scratch on a small dataset.

* To learn more about embeddings in Keras, we recommend this notebook by François Chollet: [link](https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/6.2-understanding-recurrent-neural-networks.ipynb).

* To learn more about text classification (including the overall workflow, and if you are curious about when to use embeddings versus one-hot encodings), we recommend Google's hands-on [Text Classification Guide](https://developers.google.cn/machine-learning/guides/text-classification/step-2-5).

> Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-word_embeddings.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-word_embeddings.html)
> Original English version: [https://tensorflow.google.cn/beta/tutorials/text/word_embeddings](https://tensorflow.google.cn/beta/tutorials/text/word_embeddings)


--------------------------------------------------------------------------------