├── .gitignore ├── 1-introduction ├── ch1-welcome.md └── part1.md ├── 2-background ├── ch2-introduction-to-theano.md ├── ch3-introduction-to-tensorflow.md ├── ch4-introduction-to-keras.md ├── ch5-project-develop-large-models-on-gpus-cheaply-in-the-cloud.md └── part2.md ├── 3-multi-layer-perceptrons ├── ch10-project-multiclass-classification-of-flower-species.md ├── ch11-project-binary-classification-of-sonar-returns.md ├── ch12-project-regression-of-boston-house-prices.md ├── ch6-crash-course-in-multi-layer-perceptrons.md ├── ch7-develop-your-first-neural-network-with-keras.md ├── ch8-evaluate-the-performance-of-deep-learning-models.md ├── ch9-use-keras-models-with-scikit-learn-for-general-machine-learning.md └── part3.md ├── 4-advanced-multi-layer-perceptrons-and-keras ├── ch13-save-your-models-for-later-with-serialization.md ├── ch14-keep-the-best-models-during-training-with-checkpointing.md ├── ch15-understand-model-behavior-during-training-by-plotting-history.md ├── ch16-reduce-overfitting-with-dropout-regularization.md ├── ch17-lift-performance-with-learning-rate-schedules.md └── part4.md ├── 5-convolutional-neural-networks └── part5.md ├── LICENSE ├── Preface-translator.md ├── Preface.md ├── README.md ├── SUMMARY.md └── book.json /.gitignore: -------------------------------------------------------------------------------- 1 | # Logs 2 | logs 3 | *.log 4 | npm-debug.log* 5 | yarn-debug.log* 6 | yarn-error.log* 7 | 8 | # Runtime data 9 | pids 10 | *.pid 11 | *.seed 12 | *.pid.lock 13 | 14 | # Directory for instrumented libs generated by jscoverage/JSCover 15 | lib-cov 16 | 17 | # Coverage directory used by tools like istanbul 18 | coverage 19 | 20 | # nyc test coverage 21 | .nyc_output 22 | 23 | # Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files) 24 | .grunt 25 | 26 | # Bower dependency directory (https://bower.io/) 27 | bower_components 28 | 29 | # node-waf configuration 30 | .lock-wscript 31 | 32 | # Compiled binary addons (http://nodejs.org/api/addons.html) 33 | build/Release 34 | 35 | # Dependency directories 36 | node_modules/ 37 | jspm_packages/ 38 | 39 | # Typescript v1 declaration files 40 | typings/ 41 | 42 | # Optional npm cache directory 43 | .npm 44 | 45 | # Optional eslint cache 46 | .eslintcache 47 | 48 | # Optional REPL history 49 | .node_repl_history 50 | 51 | # Output of 'npm pack' 52 | *.tgz 53 | 54 | # Yarn Integrity file 55 | .yarn-integrity 56 | 57 | # dotenv environment variables file 58 | .env 59 | 60 | -------------------------------------------------------------------------------- /1-introduction/ch1-welcome.md: -------------------------------------------------------------------------------- 1 | ## 第一章 深度学习入门 2 | 3 | 欢迎购买本书:本书旨在帮助您使用Python进行深度学习,包括如何使用Keras构建和运行深度学习模型。本书也包括深度学习的技巧、示例代码和技术内容。 4 | 5 | 深度学习的数学基础很精妙:但是一般用户不需要完全了解数学细节就可以抄起键盘开始编程。实用一点讲,深度学习并不复杂,带来的成效却很客观。教会你如何用深度学习:这就是本书的目的。 6 | 7 | #### 1.1 深度学习:如何错误入门 8 | 9 | 如果你去问大佬们深度学习如何入门,他们会怎么说?不外乎: 10 | 11 | - 线性代数是关键啊! 12 | - 你得了解传统神经网络才能干啊! 13 | - 概率论和统计学是基础的基础不是吗? 14 | - 你得先在机器学习的水里扑腾几年再来啊。 15 | - 不是计算机博士不要和我说话好吗! 
16 | - 入门挺简单的:10年经验应该差不多也行有可能就够了吧。 17 | 18 | 总结一下:只有大神才能做深度学习。 19 | 20 | **净TM扯淡!** 21 | 22 | #### 1.2 使用Python进行深度学习 23 | 24 | 本书准备把传统的教学方式倒过来:直接教你怎么深度学习。如果你觉得这东西真厉害我要好好研究一下,再去研究理论细节。本书直接让你用深度学习写出能跑的东西。 25 | 26 | 我用了不少深度学习的库:我觉得最好的还是基于Python的Keras。Python是完整的成熟语言,可以直接用于商业项目的核心,这点R是比不上的。和Java比,Python有SciPy和scikit-learn这些专业级别的包,可以快速搭建平台。 27 | 28 | Python的深度学习库有很多,最著名的是蒙特利尔大学的Theano(已死,有事烧纸)和Google的TensorFlow。这两个库都很简单,Keras都无缝支持。Keras把数值计算的部分封装掉,留下搭建神经网络和深度学习模型的重点API。 29 | 30 | 本书会带领你亲手构建神经网络和深度学习模型,告诉你如何在自己的项目中利用。废话少说,赶快开始: 31 | 32 | #### 1.3 本书结构 33 | 34 | 本书分3部分: 35 | 36 | - 课程:介绍某个神经网络的某个功能,以及如何使用Keras的API写出来 37 | - 项目:将课上的知识放在一起,写一个项目:这个项目可以作为模板 38 | - 示例:直接可以复制粘贴的代码!本书还附赠了很多代码,在Github上! 39 | 40 | ##### 1.3.1 第一部分:课程和项目 41 | 42 | 每节课是独立的,推荐一次性完成,时长短则20分钟,长则数小时 - 如果你想仔细调参数。课程分4块: 43 | 44 | - 背景 45 | - 多层感知器 46 | - 高级多层感知器和Keras 47 | - 卷积神经网络 48 | 49 | ##### 1.3.2 第二部分:背景知识 50 | 51 | 这部分我们介绍Theano、TensorFlow(TF)和Keras这3个库,以及如何在亚马逊的云服务(AWS)上用低廉的价格测试你的网络。分成4个部分: 52 | 53 | - Theano 入门 54 | - TensorFlow 入门 55 | - Keras 入门 56 | 57 | 这些是最重要的深度学习库。我们多介绍一点东西: 58 | 59 | - 项目:在云上部署GPU项目 60 | 61 | 到这里你应该准备好用Keras开发模型了。 62 | 63 | ##### 1.3.3 第三部分:多层感知器 64 | 65 | 这部分我们介绍前馈神经网络,以及如何用Keras写出自己的网络。大体分段: 66 | 67 | - 多层感知器入门 68 | - 用Keras开发第一个神经网络 69 | - 测试神经网络模型性能 70 | - 用Scikit-Learn和Keras模型进行机器学习 71 | 72 | 这里有3个项目可以帮助你开发神经网络,以及为之后的网络打下模板: 73 | 74 | - 项目:多类分类 75 | - 项目:分类问题 76 | - 项目:回归问题 77 | 78 | 到这里你已经熟悉了Keras的基本操作。 79 | 80 | ##### 1.3.4 第四部分:高级多层感知器 81 | 82 | 这部分我们进一步探索Keras的API,研究如何得到世界顶级的结果。内容包括: 83 | 84 | - 如何保存神经网络 85 | - 如何保存最好的网络 86 | - 如何边训练观察训练结果 87 | - 如何对付过拟合 88 | - 如何提高训练速度 89 | 90 | 到这里你已经可以使用Keras开发成熟的模型了。 91 | 92 | ##### 1.3.5 第五部分:卷积神经网络(CNN) 93 | 94 | 这部分我们介绍一些计算机视觉和自然语言的问题,以及如何用Keras构建神经网络出色地解决问题。内容包括: 95 | 96 | - 卷积神经网络入门 97 | - 如何增强模型效果 98 | 99 | 写代码才能真正理解网络:这里我们用CNN解决如下问题: 100 | 101 | - 项目:手写字符识别 102 | - 项目:图像物体识别 103 | - 项目:影视评论分类 104 | 105 | 到这里你可以用CNN对付你遇到的实际问题了。 106 | 107 | 108 | ##### 1.3.6 结论 109 | 110 | 这部分我们给你提供一些继续深造的资料。 111 | 112 | ##### 1.3.7 示例 113 | 114 | 边学习边积累代码库:每个问题你都写了代码,供以后使用。 115 | 116 | 本书给你所有项目的代码,以及一些没有讲到的Keras代码。自己动手积累吧! 117 | 118 | 119 | #### 1.4 本书需求 120 | 121 | ##### 1.4.1 Python和SciPy 122 | 123 | 你起码得会装Python和SciPy,本书默认你都配置好了。你可以在自己的机器上,或者虚拟机/Docker/云端配置好环境。参见第二章项目。 124 | 125 | 本书使用的软件和库: 126 | 127 | - Python 2或3:本书用版本2.7.11. 128 | - SciPy和NumPy:本书用SciPy 0.17.0和NumPy 1.11.0. 
- Matplotlib:本书用版本1.5.1
- Pandas:本书用版本0.18.0
- scikit-learn:本书用版本0.17.1。

版本不需要完全一致:但是希望安装的版本不要低于上面的要求。第二部分会带领你配置环境。

##### 1.4.2 机器学习

你不需要专业背景,但是会用scikit-learn研究简单的机器学习很有帮助。交叉检验等基本概念了解一下。书后有参考资料:简单阅读一下。

##### 1.4.3 深度学习

你不需要知道算法的数学理论,但是概念需要有所了解。本书有个神经网络和模型的入门,但是不会深入研究细节。书后有参考资料:希望你对神经网络有点概念。

注意:所有的例子都可以用CPU跑,GPU不是必备的,但是GPU可以显著加速运算。第5章会告诉你如何在云上配置GPU。

#### 1.5 本书目标

希望你看完本书后有能力从数据集上用Python开发深度学习算法。包括:

- 如何开发并测试深度学习模型
- 如何使用高级技巧
- 如何为图片和文本数据构建大模型
- 如何对图片数据进行数据增强
- 如何寻求帮助

现在可以开始了。你可以挑自己需要的主题阅读,也可以从头到尾走一遍流程。我推荐后者。

希望你亲手做每一个例子,将所思所想记录下来。我的邮箱是jason@MachineLearningMastery.com。本书希望你努力一下,尽快成为深度学习工程师。

#### 1.6 本书不是什么

本书为开发者提供深度学习的入门教程,但是挂一漏万。本书不是:

- 深度学习教科书:本书不深入神经网络的理论细节,请自行学习。
- 算法书:我们不深究算法的工作原理,请自行学习。
- Python编程书:本书不深入讲解Python的用法,希望你已经会Python了。

如果需要深入了解某个主题,请看书后的帮助。

#### 1.7 总结

此时此刻,深度学习的工具正处于历史顶峰:神经网络和深度学习的发展从未如此之快,已经在无数领域大显身手。希望你玩得开心。

##### 1.7.1 下一步

下一章我们讲解一下Theano、TensorFlow和你要用的Keras。

--------------------------------------------------------------------------------
/1-introduction/part1.md:
--------------------------------------------------------------------------------

## 第一部分 入门

--------------------------------------------------------------------------------
/2-background/ch2-introduction-to-theano.md:
--------------------------------------------------------------------------------

略

--------------------------------------------------------------------------------
/2-background/ch3-introduction-to-tensorflow.md:
--------------------------------------------------------------------------------

### 第3章 TensorFlow入门

TensorFlow是Google创造的数值运算库,作为深度学习的底层使用。本章包括:

- TensorFlow介绍
- 如何用TensorFlow定义、编译并运算表达式
- 如何寻求帮助

注意:写作本书时,TensorFlow还不支持Windows,可以用Docker或虚拟机运行;Windows用户也可以跳过这章。

#### 3.1 TensorFlow是什么?
12 | 13 | TensorFlow是开源数学计算引擎,由Google创造,用Apache 2.0协议发布。TF的API是Python的,但底层是C++。和Theano不同,TF兼顾了工业和研究,在RankBrain、DeepDream等项目中使用。TF可以在单个CPU或GPU,移动设备以及大规模分布式系统中使用。 14 | 15 | #### 3.2 安装TensorFlow 16 | 17 | TF支持Python 2.7和3.3以上。安装很简单: 18 | 19 | ```bash 20 | sudo pip install TensorFlow 21 | ``` 22 | 23 | 就好了。 24 | 25 | #### 3.3 TensorFlow例子 26 | 27 | TF的计算是用图表示的: 28 | 29 | - 节点:节点进行计算,有一个或者多个输入输出。节点间的数据叫张量:多维实数数组。 30 | - 边缘:定义数据、分支、循环和覆盖的图,也可以进行高级操作,例如等待某个计算完成。 31 | - 操作:取一个输入值,得出一个输出值,例如,加减乘除。 32 | 33 | #### 3.4 简单的TensorFlow 34 | 35 | 简单说一下TensorFlow:我们定义a和b两个浮点变量,定义一个表达式(c=a+b),将表达式变成函数,编译,进行计算: 36 | 37 | ```python 38 | import tensorflow as tf 39 | # declare two symbolic floating-point scalars 40 | a = tf.placeholder(tf.float32) 41 | b = tf.placeholder(tf.float32) 42 | # create a simple symbolic expression using the add function add = tf.add(a, b) 43 | # bind 1.5 to 'a', 2.5 to 'b', and evaluate 'c' 44 | sess = tf.Session() 45 | binding = {a: 1.5, b: 2.5} 46 | c = sess.run(add, feed_dict=binding) 47 | print(c) 48 | ``` 49 | 50 | 结果是4: 1.5+2.5=4.0。大的矩阵操作类似。 51 | 52 | #### 3.5 其他深度学习模型 53 | 54 | TensorFlow自带很多模型,可以直接调用:首先,看看TensorFlow的安装位置: 55 | 56 | ```python 57 | python -c 'import os; import inspect; import tensorflow; print(os.path.dirname(inspect.getfile(tensorflow)))' 58 | ``` 59 | 60 | 结果类似于: 61 | 62 | ``` 63 | /usr/lib/python2.7/site-packages/tensorflow 64 | ``` 65 | 66 | 进入该目录,可以看见很多例子: 67 | 68 | - 多线程word2vec mini-batch Skip-Gram模型 69 | - 多线程word2vec Skip-Gram模型 70 | - CIFAR-10的CNN模型 71 | - 类似LeNet-5的端到端的MNIST模型 72 | - 带注意力机制的端到端模型 73 | 74 | example目录带有MNIST数据集的例子,TensorFlow的网站也很有帮助,包括不同的网络、数据集。TensorFlow也有个网页版,可以直接试验。 75 | 76 | #### 3.6 总结 77 | 78 | 本章关于TensorFlow。总结一下: 79 | 80 | - TensorFlow和Theano一样,都是数值计算库 81 | - TensorFlow和Theano一样可以直接开发模型 82 | - TensorFlow比Theano包装的好一些 83 | 84 | ##### 3.6.1 下一章 85 | 86 | 下一章我们研究Keras:我们用这个库开发深度学习模型。 87 | 88 | -------------------------------------------------------------------------------- /2-background/ch4-introduction-to-keras.md: -------------------------------------------------------------------------------- 1 | ### 第4章 Keras入门 2 | 3 | Python的科学计算包主要是Theano和TensorFlow:很强大,但有点难用。Keras可以基于这两种包之一方便地建立神经网络。本章包括: 4 | 5 | - 使用Keras进行深度学习 6 | - 如何配置Keras的后端 7 | - Keras的常见操作 8 | 9 | 我们开始吧。 10 | 11 | #### 4.1 Keras是什么? 12 | 13 | Keras可以基于Theano或TensorFlow建立深度学习模型,方便研究和开发。Keras可以在Python 2.7或3.5运行,无痛调用后端的CPU或GPU网络。Keras由Google的Francois Chollet开发,遵循以下原则: 14 | 15 | - 模块化:每个模块都是单独的流程或图,深度学习的所有问题都可以通过组装模块解决 16 | - 简单化:提供解决问题的最简单办法,不加装饰,最大化可读性 17 | - 扩展性:新模块的添加特别容易,方便试验新想法 18 | - Python:不使用任何自创格式,只使用原生Python 19 | 20 | #### 4.2 安装Keras 21 | 22 | Keras很好安装,但是你需要至少安装Theano或TensorFlow之一。 23 | 24 | 使用PyPI安装Keras: 25 | 26 | ``` 27 | sudo pip install keras 28 | ``` 29 | 30 | 本书完成时,Keras的最新版本是1.0.1。下面这句话可以看Keras的版本: 31 | 32 | ``` 33 | python -c "import keras; print keras.__version__" 34 | ``` 35 | 36 | Python会显示Keras的版本号,例如: 37 | 38 | ``` 39 | 1.0.1 40 | ``` 41 | 42 | Keras的升级也是一句话: 43 | 44 | ``` 45 | sudo pip install --upgrade keras 46 | ``` 47 | 48 | #### 4.3 配置Keras的后端 49 | 50 | Keras是Theano和TensorFlow的轻量级API,所以必须配合后端使用。后端配置只需要一个文件: 51 | 52 | ``` 53 | ~/.keras/keras.json 54 | ``` 55 | 56 | 里面是: 57 | 58 | ``` 59 | {"epsilon": 1e-07, "floatx": "float32", "backend": "theano"} 60 | ``` 61 | 62 | 默认的后端是```theano ```,可以改成```tensorflow```。下面这行命令会显示Keras的后端: 63 | 64 | ``` 65 | python -c "from keras import backend; print backend._BACKEND" 66 | ``` 67 | 68 | 默认会显示: 69 | 70 | ``` 71 | Using Theano backend. 
72 | theano 73 | ``` 74 | 75 | 变量```KERAS_BACKEND```可以控制Keras的后端,例如: 76 | 77 | ``` 78 | KERAS_BACKEND=tensorflow python -c "from keras import backend; print backend._BACKEND" 79 | ``` 80 | 81 | 会输出: 82 | 83 | ``` 84 | Using TensorFlow backend. 85 | tensorflow 86 | ``` 87 | 88 | #### 4.4 使用Keras搭建深度学习模型 89 | 90 | Keras的目标就是搭建模型。最主要的模型是```Sequential```:不同层的叠加。模型创建后可以编译,调用后端进行优化,可以指定损失函数和优化算法。 91 | 92 | 编译后的模型需要导入数据:可以一批批加入数据,也可以一次性全加入。所有的计算在这步进行。训练后的模型就可以做预测或分类了。大体上的步骤是: 93 | 94 | 1. 定义模型:创建```Sequential```模型,加入每一层 95 | 2. 编译模型:指定损失函数和优化算法,使用模型的```compile()```方法 96 | 3. 拟合数据:使用模型的```fit()```方法拟合数据 97 | 4. 进行预测:使用模型的```evaluate()``` 或 ```predict()```方法进行预测 98 | 99 | #### 4.5 总结 100 | 101 | 本章关于Keras。总结一下: 102 | 103 | - Keras是Theano和TensorFlow的封装,降低了复杂性 104 | - Keras是最小化、模块化的封装,可以迅速上手 105 | - Keras可以通过定义-编译-拟合搭建模型,进行预测 106 | 107 | ##### 4.5.1 下一章 108 | 109 | 这是Python机器学习的最前沿:下个项目我们一步步在云上搭建机器学习的环境。 110 | 111 | -------------------------------------------------------------------------------- /2-background/ch5-project-develop-large-models-on-gpus-cheaply-in-the-cloud.md: -------------------------------------------------------------------------------- 1 | TODO 2 | -------------------------------------------------------------------------------- /2-background/part2.md: -------------------------------------------------------------------------------- 1 | ## 第二部分 背景 2 | 3 | -------------------------------------------------------------------------------- /3-multi-layer-perceptrons/ch10-project-multiclass-classification-of-flower-species.md: -------------------------------------------------------------------------------- 1 | ### 第10章 项目:多类花朵分类 2 | 3 | 本章我们使用Keras为多类分类开发并验证一个神经网络。本章包括: 4 | 5 | - 将CSV导入Keras 6 | - 为Keras预处理数据 7 | - 使用scikit-learn验证Keras模型 8 | 9 | 我们开始吧。 10 | 11 | #### 10.1 鸢尾花分类数据集 12 | 13 | 本章我们使用经典的鸢尾花数据集。这个数据集已经被充分研究过,4个输入变量都是数字,量纲都是厘米。每个数据代表花朵的不同参数,输出是分类结果。数据的属性是(厘米): 14 | 15 | 1. 萼片长度 16 | 2. 萼片宽度 17 | 3. 花瓣长度 18 | 4. 花瓣宽度 19 | 5. 
类别 20 | 21 | 这个问题是多类分类的:有两种以上的类别需要预测,确切的说,3种。这种问题需要对神经网络做出特殊调整。数据有150条:前5行是: 22 | 23 | ```python 24 | 5.1,3.5,1.4,0.2,Iris-setosa 25 | 4.9,3.0,1.4,0.2,Iris-setosa 26 | 4.7,3.2,1.3,0.2,Iris-setosa 27 | 4.6,3.1,1.5,0.2,Iris-setosa 28 | 5.0,3.6,1.4,0.2,Iris-setosa 29 | ``` 30 | 31 | 鸢尾花数据集已经被充分研究,模型的准确率可以达到95%到97%,作为目标很不错。本书的data目录下附带了示例代码和数据,也可以从UCI机器学习网站下载,重命名为```iris.csv```。数据集的详情请在UCI机器学习网站查询。 32 | 33 | #### 10.2 导入库和函数 34 | 35 | 我们导入所需要的库和函数,包括深度学习包Keras、数据处理包pandas和模型测试包scikit-learn。 36 | 37 | ```python 38 | import numpy 39 | import pandas 40 | from keras.models import Sequential 41 | from keras.layers import Dense 42 | from keras.wrappers.scikit_learn import KerasClassifier 43 | from keras.utils import np_utils 44 | from sklearn.cross_validation import cross_val_score 45 | from sklearn.cross_validation import KFold 46 | from sklearn.preprocessing import LabelEncoder 47 | from sklearn.pipeline import Pipeline 48 | ``` 49 | 50 | #### 10.3 指定随机数种子 51 | 52 | 我们指定一个随机数种子,这样重复运行的结果会一致,以便复现随机梯度下降的结果: 53 | 54 | ```python 55 | # fix random seed for reproducibility 56 | seed = 7 57 | numpy.random.seed(seed) 58 | ``` 59 | 60 | #### 10.4 导入数据 61 | 62 | 数据可以直接导入。因为数据包含字符,用pandas更容易。然后可以将数据的属性(列)分成输入变量(X)和输出变量(Y): 63 | 64 | ```python 65 | # load dataset 66 | dataframe = pandas.read_csv("iris.csv", header=None) 67 | dataset = dataframe.values 68 | X = dataset[:,0:4].astype(float) 69 | Y = dataset[:,4] 70 | ``` 71 | 72 | #### 10.5 输出变量编码 73 | 74 | 数据的类型是字符串:在使用神经网络时应该将类别编码成矩阵,每行每列代表所属类别。可以使用独热编码,或者加入一列。这个数据中有3个类别:```Iris-setosa```、```Iris-versicolor```和```Iris-virginica```。如果数据是 75 | 76 | ```python 77 | Iris-setosa 78 | Iris-versicolor 79 | Iris-virginica 80 | ``` 81 | 82 | 用独热编码可以编码成这种矩阵: 83 | 84 | ```python 85 | Iris-setosa, Iris-versicolor, Iris-virginica 1, 0, 0 86 | 0, 1, 0 87 | 0, 0, 1 88 | ``` 89 | 90 | scikit-learn的```LabelEncoder```可以将类别变成数字,然后用Keras的```to_categorical()```函数编码: 91 | 92 | ```python 93 | # encode class values as integers 94 | encoder = LabelEncoder() 95 | encoder.fit(Y) 96 | encoded_Y = encoder.transform(Y) 97 | # convert integers to dummy variables (i.e. 
one hot encoded) 98 | dummy_y = np_utils.to_categorical(encoded_Y) 99 | ``` 100 | 101 | #### 10.6 设计神经网络 102 | 103 | Keras提供了```KerasClassifier```,可以将网络封装,在scikit-learn上用。```KerasClassifier```的初始化变量是模型名称,返回供训练的神经网络模型。 104 | 105 | 我们写一个函数,为鸢尾花分类问题创建一个神经网络:这个全连接网络只有1个带有4个神经元的隐层,和输入的变量数相同。为了效果,隐层使用整流函数作为激活函数。因为我们用了独热编码,网络的输出必须是3个变量,每个变量代表一种花,最大的变量代表预测种类。网络的结构是: 106 | 107 | ```python 108 | 4个神经元 输入层 -> [4个神经元 隐层] -> 3个神经元 输出层 109 | ``` 110 | 111 | 输出层的函数是S型函数,把可能性映射到概率的0到1。优化算法选择ADAM随机梯度下降,损失函数是对数函数,在Keras中叫```categorical_crossentropy```: 112 | 113 | ```python 114 | # define baseline model 115 | def baseline_model(): 116 | # create model 117 | model = Sequential() 118 | model.add(Dense(4, input_dim=4, init='normal', activation='relu')) model.add(Dense(3, init='normal', activation='sigmoid')) 119 | # Compile model 120 | model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']) return model 121 | ``` 122 | 123 | 可以用这个模型创建```KerasClassifier```,也可以传入其他参数,这些参数会传递到```fit()```函数中。我们将训练次数```nb_epoch```设成150,批尺寸```batch_size```设成5,```verbose```设成0以关闭调试信息: 124 | 125 | ```python 126 | estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0) 127 | ``` 128 | 129 | #### 10.7 用K折交叉检验测试模型 130 | 131 | 现在可以测试模型效果了。scikit-learn有很多种办法可以测试模型,其中最重要的就是K折检验。我们先设定模型的测试方法:K设为10(默认值很好),在分割前随机重排数据: 132 | 133 | ```python 134 | kfold = KFold(n=len(X), n_folds=10, shuffle=True, random_state=seed) 135 | ``` 136 | 137 | 这样我们就可以在数据集(```X```和```dummy_y```)上用10折交叉检验(```kfold```)测试性能了。模型需要10秒钟就可以跑完,每次检验输出结果: 138 | 139 | ```python 140 | results = cross_val_score(estimator, X, dummy_y, cv=kfold) 141 | print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100)) 142 | ``` 143 | 144 | 输出结果的均值和标准差,这样可以验证模型的预测能力,效果拔群: 145 | 146 | ```python 147 | Baseline: 95.33% (4.27%) 148 | ``` 149 | 150 | #### 10.8 总结 151 | 152 | 本章关于使用Keras开发深度学习项目。总结一下: 153 | 154 | - 如何导入数据 155 | - 如何使用独热编码处理多类分类数据 156 | - 如何与scikit-learn一同使用Keras 157 | - 如何用Keras定义多类分类神经网络 158 | - 如何用scikit-learn通过K折交叉检验测试Keras的模型 159 | 160 | 161 | #### 10.8.1 下一章 162 | 163 | 本章完整描述了Keras项目的开发:下一章我们开发一个二分类网络,并调优。 164 | 165 | -------------------------------------------------------------------------------- /3-multi-layer-perceptrons/ch11-project-binary-classification-of-sonar-returns.md: -------------------------------------------------------------------------------- 1 | ### 第11章 项目:声呐返回值分类 2 | 3 | 本章我们使用Keras开发一个二分类网络。本章包括: 4 | 5 | - 将数据导入Keras 6 | - 为表格数据定义并训练模型 7 | - 在未知数据上测试Keras模型的性能 8 | - 处理数据以提高准确率 9 | - 调整Keras模型的拓扑和配置 10 | 11 | 我们开始吧。 12 | 13 | #### 11.1 声呐物体分类数据 14 | 15 | 本章使用声呐数据,包括声呐在不同物体的返回。数据有60个变量,代表不同角度的返回值。目标是将石头和金属筒(矿石)分开。 16 | 17 | 所有的数据都是连续的,从0到1;输出变量中M代表矿石,R代表石头,需要转换为1和0。数据集有208条数据,在本书的data目录下,也可以自行下载,重命名为```sonar.csv```。 18 | 19 | 此数据集可以作为性能测试标准:我们知道什么程度的准确率代表模型是优秀的。交叉检验后,一般的网络可以达到84%的准确率,最高可以达到88%。关于数据集详情,请到UCI机器学习网站查看。 20 | 21 | #### 11.2 简单的神经网络 22 | 23 | 先创建一个简单的神经网络试试看。导入所有的库和函数: 24 | 25 | ```python 26 | import numpy 27 | import pandas 28 | from keras.models import Sequential 29 | from keras.layers import Dense 30 | from keras.wrappers.scikit_learn import KerasClassifier 31 | from sklearn.cross_validation import cross_val_score 32 | from sklearn.preprocessing import LabelEncoder 33 | from sklearn.cross_validation import StratifiedKFold 34 | from sklearn.preprocessing import StandardScaler 35 | from sklearn.pipeline import Pipeline 36 | ``` 37 | 38 | 初始化随机数种子,这样每次的结果都一样,帮助debug: 39 | 40 | ```python 41 | # fix random seed for reproducibility 42 | seed = 7 43 | 
numpy.random.seed(seed) 44 | ``` 45 | 46 | 用pandas读入数据:前60列是输入变量(X),最后一列是输出变量(Y)。pandas处理带字符的数据比NumPy更容易。 47 | 48 | ```python 49 | # load dataset 50 | dataframe = pandas.read_csv("sonar.csv", header=None) 51 | dataset = dataframe.values 52 | # split into input (X) and output (Y) variables 53 | X = dataset[:,0:60].astype(float) 54 | Y = dataset[:,60] 55 | ``` 56 | 57 | 输出变量现在是字符串:需要编码成数字0和1。scikit-learn的```LabelEncoder```可以做到:先将数据用```fit()```方法导入,然后用```transform()```函数编码,加入一列: 58 | 59 | ```python 60 | # encode class values as integers 61 | encoder = LabelEncoder() 62 | encoder.fit(Y) 63 | encoded_Y = encoder.transform(Y) 64 | ``` 65 | 66 | 现在可以用Keras创建神经网络模型了。我们用scikit-learn进行随机K折验证,测试模型效果。Keras的模型用```KerasClassifier```封装后可以在scikit-learn中调用,取的变量建立的模型;其他变量会传入```fit()```方法中,例如训练次数和批尺寸。我们写一个函数创建这个模型:只有一个全连接层,神经元数量和输入变量数一样,作为最基础的模型。 67 | 68 | 模型的权重是比较小的高斯随机数,激活函数是整流函数,输出层只有一个神经元,激活函数是S型函数,代表某个类的概率。损失函数还是对数损失函数(```binary_crossentropy```),这个函数适用于二分类问题。优化算法是Adam随机梯度下降,每轮收集模型的准确率。 69 | 70 | ```python 71 | # baseline model 72 | def create_baseline(): 73 | # create model 74 | model = Sequential() 75 | model.add(Dense(60, input_dim=60, init='normal', activation='relu')) model.add(Dense(1, init='normal', activation='sigmoid')) 76 | # Compile model 77 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) return model 78 | ``` 79 | 80 | 用scikit-learn测试一下模型。向```KerasClassifier```传入训练次数(默认值),关闭日志: 81 | 82 | ```python 83 | # evaluate model with standardized dataset 84 | estimator = KerasClassifier(build_fn=create_baseline, nb_epoch=100, batch_size=5, verbose=0) 85 | kfold = StratifiedKFold(y=encoded_Y, n_folds=10, shuffle=True, random_state=seed) 86 | results = cross_val_score(estimator, X, encoded_Y, cv=kfold) 87 | print("Results: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100)) 88 | ``` 89 | 90 | 输出内容是测试的平均数和标准差。 91 | 92 | ```python 93 | Baseline: 81.68% (5.67%) 94 | ``` 95 | 96 | 不用很累很麻烦效果也可以很好。 97 | 98 | #### 11.3 预处理数据以增加性能 99 | 100 | 预处理数据是个好习惯。神经网络喜欢输入类型的比例和分布一致,为了达到这点可以使用正则化,让数据的平均值是0,标准差是1,这样可以保留数据的分布情况。 101 | 102 | scikit-learn的```StandardScaler```可以做到这点。不应该在整个数据集上直接应用正则化:应该只在测试数据上交叉验证时进行正则化处理,使正则化成为交叉验证的一环,让模型没有新数据的先验知识,防止模型发散。 103 | 104 | scikit-learn的```Pipeline```可以直接做到这些。我们先定义一个```StandardScaler```,然后进行验证: 105 | 106 | ```python 107 | # evaluate baseline model with standardized dataset 108 | numpy.random.seed(seed) 109 | estimators = [] 110 | estimators.append(('standardize', StandardScaler())) 111 | estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, nb_epoch=100, 112 | batch_size=5, verbose=0))) 113 | pipeline = Pipeline(estimators) 114 | kfold = StratifiedKFold(y=encoded_Y, n_folds=10, shuffle=True, random_state=seed) 115 | results = cross_val_score(pipeline, X, encoded_Y, cv=kfold) 116 | print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100)) 117 | ``` 118 | 119 | 结果如下,平均效果有一点进步。 120 | 121 | ```python 122 | Standardized: 84.07% (6.23%) 123 | ``` 124 | 125 | #### 11.4 调整模型的拓扑和神经元 126 | 127 | 神经网络有很多参数,例如初始化权重、激活函数、优化算法等等。我们一直没有说到调整网络的拓扑结构:扩大或缩小网络。我们试验一下: 128 | 129 | ##### 11.4.1 缩小网络 130 | 131 | 有可能数据中有冗余:原始数据是不同角度的信号,有可能其中某些角度有相关性。我们把第一层隐层缩小一些,强行提取特征试试。 132 | 133 | 我们把之前的模型隐层的60个神经元减半到30个,这样神经网络需要挑选最重要的信息。之前的正则化有效果:我们也一并做一下. 
134 | 135 | ```python 136 | # smaller model 137 | def create_smaller(): 138 | # create model 139 | model = Sequential() 140 | model.add(Dense(30, input_dim=60, init='normal', activation='relu')) model.add(Dense(1, init='normal', activation='sigmoid')) 141 | # Compile model 142 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) return model 143 | numpy.random.seed(seed) 144 | estimators = [] 145 | estimators.append(('standardize', StandardScaler())) 146 | estimators.append(('mlp', KerasClassifier(build_fn=create_smaller, nb_epoch=100, 147 | batch_size=5, verbose=0))) 148 | pipeline = Pipeline(estimators) 149 | kfold = StratifiedKFold(y=encoded_Y, n_folds=10, shuffle=True, random_state=seed) 150 | results = cross_val_score(pipeline, X, encoded_Y, cv=kfold) 151 | print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100)) 152 | ``` 153 | 154 | 结果如下。平均值有少许提升,方差减少很多:这么做果然有效,因为这次的训练时间只需要之前的一半! 155 | 156 | ```python 157 | Smaller: 84.61% (4.65%) 158 | ``` 159 | 160 | ##### 11.4.2 扩大网络 161 | 162 | 扩大网络后,神经网络更有可能提取关键特征,以非线性方式组合。我们对之前的网络简单修改一下:在原来的隐层后加入一层30个神经元的隐层。现在的网络是: 163 | 164 | ```python 165 | 60 inputs -> [60 -> 30] -> 1 output 166 | ``` 167 | 168 | 我们希望在缩减信息前可以对所有的变量建模,和缩小网络时的想法类似。这次我们加一层,帮助网络挑选信息: 169 | 170 | ```python 171 | # larger model 172 | def create_larger(): 173 | # create model 174 | model = Sequential() 175 | model.add(Dense(60, input_dim=60, init='normal', activation='relu')) model.add(Dense(30, init='normal', activation='relu')) model.add(Dense(1, init='normal', activation='sigmoid')) 176 | # Compile model 177 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) return model 178 | numpy.random.seed(seed) 179 | estimators = [] 180 | estimators.append(('standardize', StandardScaler())) 181 | estimators.append(('mlp', KerasClassifier(build_fn=create_larger, nb_epoch=100, 182 | batch_size=5, verbose=0))) 183 | pipeline = Pipeline(estimators) 184 | kfold = StratifiedKFold(y=encoded_Y, n_folds=10, shuffle=True, random_state=seed) 185 | results = cross_val_score(pipeline, X, encoded_Y, cv=kfold) 186 | print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100)) 187 | ``` 188 | 189 | 这次的结果好了很多,几乎达到业界最优。 190 | 191 | ```python 192 | Larger: 86.47% (3.82%) 193 | ``` 194 | 195 | 继续微调网络的结果会更好。你能做到如何? 196 | 197 | #### 11.5 总结 198 | 199 | 本章关于使用Keras开发二分类深度学习项目。总结一下: 200 | 201 | - 如何导入数据 202 | - 如何创建基准模型 203 | - 如何用scikit-learn通过K折随机交叉检验测试Keras的模型 204 | - 如何预处理数据 205 | - 如何微调网络 206 | 207 | ##### 11.5.1 下一章 208 | 209 | 多分类和二分类介绍完了:下一章是回归问题。 210 | 211 | -------------------------------------------------------------------------------- /3-multi-layer-perceptrons/ch12-project-regression-of-boston-house-prices.md: -------------------------------------------------------------------------------- 1 | ### 第12章 项目:波士顿住房价格回归 2 | 3 | 本章关于如何使用Keras和社交网络解决回归问题。本章将: 4 | 5 | - 导入CSV数据 6 | - 创建回归问题的神经网络模型 7 | - 使用scikit-learn对Keras的模型进行交叉验证 8 | - 预处理数据以增加效果 9 | - 微调网络参数 10 | 11 | 我们开始吧。 12 | 13 | #### 12.1 波士顿住房价格数据 14 | 15 | 本章我们研究波士顿住房价格数据集,即波士顿地区的住房信息。我们关心的是住房价格,单位是千美金:所以,这个问题是回归问题。数据有13个输入变量,代表房屋不同的属性: 16 | 17 | 1. CRIM:人均犯罪率 18 | 2. ZN:25,000平方英尺以上民用土地的比例 19 | 3. INDUS:城镇非零售业商用土地比例 20 | 4. CHAS:是否邻近查尔斯河,1是邻近,0是不邻近 21 | 5. NOX:一氧化氮浓度(千万分之一) 22 | 6. RM:住宅的平均房间数 23 | 7. AGE:自住且建于1940年前的房屋比例 24 | 8. DIS:到5个波士顿就业中心的加权距离 25 | 9. RAD:到高速公路的便捷度指数 26 | 10. TAX:每万元的房产税率 27 | 11. PTRATIO:城镇学生教师比例 28 | 12. B: 1000(Bk − 0.63)2 其中Bk是城镇中黑人比例 29 | 13. LSTAT:低收入人群比例 30 | 14. 
MEDV:自住房中位数价格,单位是千元

这个问题已经被深入研究过,所有的数据都是数字。数据的前5行是:

```python
0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900 1 296.0 15.30 396.90 4.98 24.00
0.02731 0.00 7.070 0 0.4690 6.4210 78.90 4.9671 2 242.0 17.80 396.90 9.14 21.60
0.02729 0.00 7.070 0 0.4690 7.1850 61.10 4.9671 2 242.0 17.80 392.83 4.03 34.70
0.03237 0.00 2.180 0 0.4580 6.9980 45.80 6.0622 3 222.0 18.70 394.63 2.94 33.40
0.06905 0.00 2.180 0 0.4580 7.1470 54.20 6.0622 3 222.0 18.70 396.90 5.33 36.20
```

数据在本书的data目录下,也可以自行下载,重命名为```housing.csv```。表现合格的模型的均方误差(MSE)大约是20,开平方即均方根误差(RMSE)约4.5,相当于平均偏差约4500美元。关于数据集详情,请到UCI机器学习网站查看。

#### 12.2 简单的神经网络

先创建一个简单的回归神经网络。导入所有的库和函数:

```python
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.cross_validation import cross_val_score
from sklearn.cross_validation import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
```

源文件是CSV格式,分隔符是空格:可以用pandas导入,然后分成输入(X)和输出(Y)变量。

```python
# load dataset
dataframe = pandas.read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]
```

Keras可以把模型封装好,交给scikit-learn使用,方便测试模型。我们写一个函数,创建神经网络。

代码如下。只有一个隐层(全连接层),神经元数量和输入变量数一致(13),激活函数还是整流函数。输出层没有激活函数,因为在回归问题中我们希望直接取数值结果。

优化算法是Adam,损失函数是MSE,和我们要优化的目标一致:这样也方便直观理解模型的预测效果,因为对MSE开平方,得到的就是以千美元计的平均误差(RMSE)。

```python
# define base model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, init='normal', activation='relu'))
    model.add(Dense(1, init='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
```

使用```KerasRegressor```封装这个模型,任何其他的参数都会传入```fit()```函数中,例如训练次数和批次大小,这里我们取默认值。老规矩,为了可以复现结果,指定一下随机数种子:

```python
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# evaluate model with standardized dataset
estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)
```

可以测试一下基准模型的结果了:用10折交叉检验看看。

```python
kfold = KFold(n=len(X), n_folds=10, random_state=seed)
results = cross_val_score(estimator, X, Y, cv=kfold)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))
```

结果是10次检验的误差均值和标准差。

```python
Results: 38.04 (28.15) MSE
```

#### 12.3 预处理数据以增加性能

这个数据集的特点是各变量的尺度不一致,所以标准化很有用。

scikit-learn的```Pipeline```可以把标准化作为交叉检验的一环进行,这样模型不会预先知道新数据的信息。代码如下:

```python
# evaluate model with standardized dataset
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, nb_epoch=50,
    batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n=len(X), n_folds=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))
```

标准化之后,MSE从38降到28,效果立竿见影:

```python
Standardized: 28.24 (26.25) MSE
```

进一步的扩展是对输出变量做同样的处理:例如把房价归一化到0和1之间,并在输出层使用S型激活函数,让网络输出和目标的比例一致,见下面的示意代码。
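(译者注:下面这段示意代码不是书中原文。假设X和Y已按上文载入,这里用scikit-learn的```MinMaxScaler```把输出缩放到0到1,输出层改用S型函数,预测后再用```inverse_transform()```把结果还原回千美元单位:)

```python
# 示意:缩放输出变量,并在输出层使用S型激活函数
from keras.models import Sequential
from keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler

# 把Y变成列向量后缩放到0-1
scaler = MinMaxScaler()
scaled_Y = scaler.fit_transform(Y.reshape(-1, 1))

# 网络结构和基准模型相同,只是输出层加了sigmoid
model = Sequential()
model.add(Dense(13, input_dim=13, init='normal', activation='relu'))
model.add(Dense(1, init='normal', activation='sigmoid'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, scaled_Y, nb_epoch=100, batch_size=5, verbose=0)

# 预测值需要还原回原始单位(千美元)
predictions = scaler.inverse_transform(model.predict(X))
```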
139 | #### 12.4 调整模型的拓扑 140 | 141 | 神经网络有很多可调的参数:最可玩的是网络的结构。这次我们用一个更深的和一个更宽的模型试试。 142 | 143 | 144 | ##### 12.4.1 更深的模型 145 | 146 | 增加神经网络的层数可以提高效果,这样模型可以提取并组合更多的特征。我们试着加几层隐层:加几句话就行。代码从上面复制下来,在第一层后加一层隐层,神经元数量是上层的一半: 147 | 148 | ```python 149 | def larger_model(): 150 | # create model 151 | model = Sequential() 152 | model.add(Dense(13, input_dim=13, init='normal', activation='relu')) model.add(Dense(6, init='normal', activation='relu')) model.add(Dense(1, init='normal')) 153 | # Compile model 154 | model.compile(loss='mean_squared_error', optimizer='adam') return model 155 | ``` 156 | 157 | 这样的结构是: 158 | 159 | ```python 160 | 13 inputs -> [13 -> 6] -> 1 output 161 | ``` 162 | 163 | 测试的方法一样,数据正则化一下: 164 | 165 | ```python 166 | numpy.random.seed(seed) 167 | estimators = [] 168 | estimators.append(('standardize', StandardScaler())) 169 | estimators.append(('mlp', KerasRegressor(build_fn=larger_model, nb_epoch=50, batch_size=5, 170 | verbose=0))) 171 | pipeline = Pipeline(estimators) 172 | kfold = KFold(n=len(X), n_folds=10, random_state=seed) 173 | results = cross_val_score(pipeline, X, Y, cv=kfold) 174 | print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std())) 175 | ``` 176 | 177 | 效果好了一点,MSE从28变成24: 178 | 179 | ```python 180 | Larger: 24.60 (25.65) MSE 181 | ``` 182 | 183 | ##### 12.4.1 更宽的模型 184 | 185 | 加宽模型可以增加网络容量。我们减去一层,把隐层的神经元数量加大,从13加到20: 186 | 187 | ```python 188 | def wider_model(): 189 | # create model 190 | model = Sequential() 191 | model.add(Dense(20, input_dim=13, init='normal', activation='relu')) model.add(Dense(1, init='normal')) 192 | # Compile model 193 | model.compile(loss='mean_squared_error', optimizer='adam') return model 194 | ``` 195 | 196 | 网络的结构是: 197 | 198 | ```python 199 | 13 inputs -> [20] -> 1 output 200 | ``` 201 | 202 | 跑一下试试: 203 | 204 | ```python 205 | numpy.random.seed(seed) 206 | estimators = [] 207 | estimators.append(('standardize', StandardScaler())) 208 | estimators.append(('mlp', KerasRegressor(build_fn=wider_model, nb_epoch=100, batch_size=5, 209 | verbose=0))) 210 | pipeline = Pipeline(estimators) 211 | kfold = KFold(n=len(X), n_folds=10, random_state=seed) 212 | results = cross_val_score(pipeline, X, Y, cv=kfold) 213 | print("Wider: %.2f (%.2f) MSE" % (results.mean(), results.std())) 214 | ``` 215 | 216 | MSE下降到21,效果不错了。 217 | 218 | ```python 219 | Wider: 21.64 (23.75) MSE 220 | ``` 221 | 222 | 很难想到,加宽模型比加深模型效果更好:这就是欧皇的力量。 223 | 224 | 225 | #### 12.5 总结 226 | 227 | 本章关于使用Keras开发回归深度学习项目。总结一下: 228 | 229 | - 如何导入数据 230 | - 如何预处理数据提高性能 231 | - 如何调整网络结构提高性能 232 | 233 | ##### 12.5.1 下一章 234 | 235 | 第三部分到此结束:你可以处理一般的机器学习问题了。下一章我们用一些奇技淫巧,使用一些Keras的高级API。 236 | 237 | -------------------------------------------------------------------------------- /3-multi-layer-perceptrons/ch6-crash-course-in-multi-layer-perceptrons.md: -------------------------------------------------------------------------------- 1 | ### 第6章 多层感知器入门 2 | 3 | 神经网络很神奇,但是一开始学起来很痛苦,涉及大量术语和算法。本章主要介绍多层感知器的术语和使用方法。本章将: 4 | 5 | - 介绍神经网络的神经元、权重和激活函数 6 | - 如何使用构建块建立网络 7 | - 如何训练网络 8 | 9 | 我们开始吧。 10 | 11 | #### 6.1 绪论 12 | 13 | 本节课的内容很多: 14 | 15 | - 多层感知器 16 | - 神经元、权重和激活函数 17 | - 神经元网络 18 | - 网络训练 19 | 20 | 我们从多层感知器开始谈起。 21 | 22 | #### 6.2 多层感知器 23 | 24 | (译者注:本书中的“神经网络”一般指“人工神经网络”) 25 | 26 | 在一般的语境中,```人工神经网络```一般指```神经网络```,或者,```多层感知器```。感知器是简单的神经元模型,大型神经网络的前体。这个领域主要研究大脑如何通过简单的生物学结构解决复杂的计算问题,例如,进行预测。最终的目标不是要建立大脑的真正模型,而是发掘可以解决复杂问题的算法。 27 | 28 | 
神经网络的能力来自它可以从输入数据中学习,对未来进行预测:在这个意义上说,神经网络学习了一种对应关系。数学上说,这种能力是一种有普适性的近似算法。神经网络的预测能力来自网络的分层或多层结构:这种结构可以找出不同尺度或分辨率下的不同特征,将其组合成更高级别的特征。例如,从线条到线条的集合到形状。 29 | 30 | #### 6.3 神经元 31 | 32 | 神经网络由人工神经元组成:这些神经元有计算能力,使用激活函数,利用输入和权重,输出一个标量。 33 | 34 | ![6.1 简单的神经元](https://i.imgur.com/Lwy7Hpy.png) 35 | 36 | ##### 6.3.1 神经元权重 37 | 38 | 线性回归的权重和这里的权重类似:每个神经元也有一个误差项,永远是1.0,必须被加权。例如,一个神经元有2个输入值,那就需要3个权重项:一个输入一个权重,加上一个误差项的权重。 39 | 40 | 权重项的初始值一般是小随机数,例如,0~0.3;也有更复杂的初始化方法。和线性回归一样,权重越大代表网络越复杂,越不稳定。我们希望让权重变小,为此可以使用正则化。 41 | 42 | ##### 6.3.2 激活函数 43 | 44 | 神经元的所有输入都被加权求和,输入激活函数中。激活函数就是输入的加权求和值到信号输出的映射。激活函数得名于其功能:控制激活哪个神经元,以及输出信号强度。历史上的激活函数是个阈值:例如,输入加权求和超过0.5,则输出1;反之输出0.0。 45 | 46 | 激活函数一般使用非线性函数,这样输入的组合方式可以更复杂,提供更多功能。非线性函数可以输出一个分布:例如,逻辑函数(也称为S型函数)输出一个0到1之间的S形分布,正切函数可以输出一个-1到1之间的S形分布。最近的研究表明,线性整流函数的效果更好。 47 | 48 | #### 6.4 神经元网络 49 | 50 | 神经元可以组成网络:每行的神经元叫做一层,一个神经网络可以有很多层。神经网络的结构叫做网络拓扑。 51 | 52 | ![6.2 简单的神经网络](https://i.imgur.com/3UxEXFE.png) 53 | 54 | ##### 6.4.1 输入层 55 | 56 | 神经网络最底下的那层叫输入层,因为直接和数据连接。一般的节点数是数据的列数。这层的神经元只将数据传输到下一层。 57 | 58 | ##### 6.4.2 隐层 59 | 60 | 输入节点后的层叫隐层,因为不直接和外界相连。最简单的网络中,隐层只有一个神经元,直接输出结果。随着算力增加,现在可以训练很复杂,层数很高的神经网络:历史上需要几辈子才能训练的网络,现在有可能几分钟就能训练好。 61 | 62 | ##### 6.4.3 输出层 63 | 64 | 神经网络的最后一层叫输出层,输出问题需要的值。这层的激活函数由问题的类型而定: 65 | 66 | - 简单的回归问题:有可能只有一个神经元,没有激活函数 67 | - 两项的分类问题:有可能只有一个神经元,激活函数是S型函数,输出一个0到1之间的概率,代表主类别的概率。也可以用0.5作为阈值:低于0.5输出0,大于0.5输出1. 68 | - 多项的分类问题:有可能有多个神经元,每个代表一类(例如,3个神经元,代表3种不同的鸢尾花 - 这是个经典问题)。激活函数可以使用Softmax函数,每个输出代表是某个类别的概率。最有可能的类别就是输出最高的那组。 69 | 70 | #### 6.5 网络训练 71 | 72 | ##### 6.5.1 预备数据 73 | 74 | 预处理一下数据:数据必须是数值,例如,实数。如果某一项是类别,需要通过独热编码将其变成数字:对于N种可能的类别,加入N列,对其取0或1代表是否属于该类别。 75 | 76 | 独热编码也可以对多个类别进行编码:建立一个二进制向量表示类别,输出的结果可以对类别进行分类。神经网络需要所有的数据单位差不多:例如,将所有的数据缩放到0和1之间,这步叫归一化。或者,对数据进行缩放(正则化),让每列的平均值为0,标准差为1。图像的像素数据也应该这样处理。文字输入可以转化为数字,例如某个单词出现的频率,或者用其他的什么办法转化。 77 | 78 | ##### 6.5.2 随机梯度下降 79 | 80 | 随机梯度下降很经典,现在还是很流行。使用的方式是正向传递:每次对网络输入一行数据,激活每层神经元,得出一个输出值。对数据进行预测也用这种方式。 81 | 82 | 我们把输出和预计值进行比较,算出误差;这个错误通过网络反向传播,更新权重数据。这个算法叫反向传播算法。我们在所有的训练数据上重复此过程,每次网络全部更新叫一轮。神经网络可以训练几十乃至成千上万轮。 83 | 84 | ##### 6.5.3 权重更新 85 | 86 | 神经网络的权重可以每次训练都更新,这种方式叫在线更新,速度很快但是有可能造成灾难性结果。或者也可以保存误差数据,最后只更新一次:这种更新叫批量更新,一般而言更稳妥。 87 | 88 | 因为数据集有可能很大,为了计算速度,每次更新的数据量一般不大,只有几十到几百个数据。权重的更新数量由学习速率(步长)这个参数控制,规定神经网络针对错误的更新速度。这个参数一般很小,0.1或者0.01,乃至更小。也可以调整其他参数: 89 | 90 | - 动量:如果上次和这次的方向一样,则加速变化,即使这次的错误不那么大。用于加速网络训练。 91 | - 学习速率衰减:随着训练次数增加而减少学习速率。在一开始加速训练,后面微调参数。 92 | 93 | ##### 6.5.4 进行预测 94 | 95 | 训练好的神经网络就可以进行预测了。可以使用测试数据进行测量,看看能不能预测新的数据;也可以部署网络,进行预测。网络只需要保存拓扑结构和最终的权重。将新的数据喂进去,经过前向传输,神经网络就会做出预测了。 96 | 97 | #### 6.6 总结 98 | 99 | 本章关于利用人工神经网络进行机器学习。总结一下: 100 | 101 | - 神经网络不是大脑的模型,而是计算模型,可以解决复杂的机器学习问题 102 | - 神经网络由带有权重和激活函数的神经元组成 103 | - 神经网络分层,用随机梯度下降训练 104 | - 应该预处理数据 105 | 106 | ##### 6.6.1 下一章 107 | 108 | 你已经了解了神经网络:下一章我们用Keras动手制作第一个神经网络。 109 | 110 | -------------------------------------------------------------------------------- /3-multi-layer-perceptrons/ch7-develop-your-first-neural-network-with-keras.md: -------------------------------------------------------------------------------- 1 | ### 第7章 使用Keras开发神经网络 2 | 3 | Keras基于Python,开发深度学习模型很容易。Keras将Theano和TensorFlow的数值计算封装好,几句话就可以配置并训练神经网络。本章开始使用Keras开发神经网络。本章将: 4 | 5 | - 将CSV数据读入Keras 6 | - 用Keras配置并编译多层感知器模型 7 | - 用验证数据集验证Keras模型 8 | 9 | 我们开始吧。 10 | 11 | #### 7.1 简介 12 | 13 | 虽然代码量不大,但是我们还是慢慢来。大体分几步: 14 | 15 | 1. 导入数据 16 | 2. 定义模型 17 | 3. 编译模型 18 | 4. 训练模型 19 | 5. 测试模型 20 | 6. 
写出程序

#### 7.2 皮马人糖尿病数据集

我们使用皮马人糖尿病数据集(Pima Indians onset of diabetes),在UCI的机器学习网站可以免费下载。数据集的内容是皮马人的医疗记录,以及过去5年内是否有糖尿病。所有的数据都是数字,输出变量是过去5年内是否发病(1或0),所以这是一个二分类问题。各输入变量的数量级不同,共有8个属性:

1. 怀孕次数
2. 2小时口服葡萄糖耐量试验中的血浆葡萄糖浓度
3. 舒张压(毫米汞柱)
4. 2小时血清胰岛素(mu U/ml)
5. 体重指数(BMI)
6. 糖尿病谱系功能
7. 年龄(年)
8. 类别:过去5年内是否有糖尿病

所有的数据都是数字,可以直接导入Keras。本书后面也会用到这个数据集。数据有768行,前5行的样本长这样:

```
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
```

数据在本书代码的```data```目录,也可以在UCI机器学习的网站下载。把数据和Python文件放在一起,改名:

```
pima-indians-diabetes.csv
```

基准准确率是65.1%,在10次交叉验证中最高的正确率是77.7%。在UCI机器学习的网站可以得到数据集的更多资料。

#### 7.3 导入资料

使用随机梯度下降时最好固定随机数种子,这样你的代码每次运行的结果都一致。这种做法在演示结果、比较算法或debug时特别有效。你可以随便选种子:

```python
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
```

现在导入皮马人数据集。NumPy的```loadtxt()```函数可以直接载入数据,输入变量是8个,输出1个。导入数据后,我们把数据分成输入(X)和输出(Y)两组:

```python
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
```

这样我们的数据每次结果都一致,可以定义模型了。

#### 7.4 定义模型

Keras的模型由层构成:我们建立一个```Sequential```模型,一层层加入神经元。第一步是确定输入层的数目正确:在创建模型时用```input_dim```参数确定。例如,有8个输入变量,就设成8。

隐层怎么设置?这个问题很难回答,需要慢慢试验。一般来说,如果网络够大,即使存在问题也不会有影响。这个例子里我们用3层全连接网络。

全连接层用```Dense```类定义:第一个参数是本层神经元个数,然后是初始化方式和激活函数。这里的初始化方法是0到0.05的连续型均匀分布(```uniform```),Keras的默认方法也是这个。也可以用高斯分布进行初始化(```normal```)。

前两层的激活函数是线性整流函数(```relu```),最后一层的激活函数是S型函数(```sigmoid```)。之前大家喜欢用S型和正切函数,但现在线性整流函数效果更好。为了保证输出是0到1的概率数字,最后一层的激活函数是S型函数,这样映射到0.5的阈值函数也容易。前两个隐层分别有12和8个神经元,最后一层是1个神经元(是否有糖尿病)。

```python
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
```

网络的结构如图:

![7.1 神经网络结构](https://i.imgur.com/6TQODzB.png)

#### 7.5 编译模型

定义好的模型可以编译:Keras会调用Theano或者TensorFlow编译模型。后端会自动选择表示网络的最佳方法,配合你的硬件。这步需要定义几个新的参数。训练神经网络的意义是:找到最好的一组权重,解决问题。

我们需要定义损失函数和优化算法,以及需要收集的数据。我们使用```binary_crossentropy```,即对数损失函数,它适合二分类问题;优化算法用```adam```,因为这东西好用。想深入了解请查阅论文:Adam: A Method for Stochastic Optimization。因为这个问题是分类问题,我们收集并报告每轮的准确率,编译代码见下面的示例。
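这行编译代码和7.8节完整程序中的写法一致:

```python
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```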
#### 7.6 训练模型

终于开始训练了!调用模型的```fit()```方法即可开始训练。

网络按轮训练,轮数通过```nb_epoch```参数控制。每批送入的数据量(批尺寸)用```batch_size```参数控制。这里我们只跑150轮,每批10个数据。多试试就知道了。

```python
# Fit the model
model.fit(X, Y, nb_epoch=150, batch_size=10)
```

现在CPU或GPU开始煎鸡蛋了。

#### 7.7 测试模型

这里我们直接在训练数据上检验模型的效果。注意:这样无法衡量模型对新数据的预测能力;严谨起见,应该把数据分成训练集和测试集。

调用模型的```evaluate()```方法,传入训练时的输入和输出。它会返回汇总结果,包括平均损失和我们配置过的其他指标,例如准确率。

```python
# evaluate the model
scores = model.evaluate(X, Y)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
```

#### 7.8 写出程序

用Keras做机器学习就是这么简单。我们把代码放在一起:

```python
# Create first network with Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, nb_epoch=150, batch_size=10)
# evaluate the model
scores = model.evaluate(X, Y)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
```

训练时每轮会输出一次损失和正确率,最后是最终的评估结果。在我的CPU上用Theano大约跑10秒:

```python
...
Epoch 143/150
768/768 [==============================] - 0s - loss: 0.4614 - acc: 0.7878
Epoch 144/150
768/768 [==============================] - 0s - loss: 0.4508 - acc: 0.7969
Epoch 145/150
768/768 [==============================] - 0s - loss: 0.4580 - acc: 0.7747
Epoch 146/150
768/768 [==============================] - 0s - loss: 0.4627 - acc: 0.7812
Epoch 147/150
768/768 [==============================] - 0s - loss: 0.4531 - acc: 0.7943
Epoch 148/150
768/768 [==============================] - 0s - loss: 0.4656 - acc: 0.7734
Epoch 149/150
768/768 [==============================] - 0s - loss: 0.4566 - acc: 0.7839
Epoch 150/150
768/768 [==============================] - 0s - loss: 0.4593 - acc: 0.7839
768/768 [==============================] - 0s
acc: 79.56%
```
#### 7.9 总结

本章关于利用Keras创建神经网络。总结一下:

- 如何导入数据
- 如何用Keras定义神经网络
- 如何调用后端编译模型
- 如何训练模型
- 如何测试模型

##### 7.9.1 下一章

现在你已经知道如何用Keras开发神经网络:下一章讲讲如何在新的数据上测试模型。

--------------------------------------------------------------------------------
/3-multi-layer-perceptrons/ch8-evaluate-the-performance-of-deep-learning-models.md:
--------------------------------------------------------------------------------

### 第8章 测试神经网络

深度学习有很多参数要调:大部分都是拍脑袋的。所以测试特别重要:本章我们讨论几种测试方法。本章将:

- 使用Keras进行自动验证
- 使用Keras进行手工验证
- 使用Keras进行K折交叉验证

我们开始吧。

#### 8.1 凭经验评估网络配置

创建神经网络时要定的参数很多:很多时候可以参考别人在类似问题上的配置,但最终还是要靠一点点实验,包括网络的拓扑结构(层数、大小、每层类型)和具体的训练参数(损失函数、激活函数、优化算法、训练次数)等。

一般深度学习的数据集都很大,数据有几十万乃至几亿个,所以可靠的测试方法至关重要。

#### 8.2 分割数据

数据量大和网络复杂会造成训练时间很长,所以需要将数据分成训练、测试或验证数据集。Keras提供两种办法:

1. 自动验证
2.
手工验证 23 | 24 | ##### 8.2.1 自动验证 25 | 26 | Keras可以将数据自动分出一部分,每次训练后进行验证。在训练时用```validation_split```参数可以指定验证数据的比例,一般是总数据的20%或者33%。下面的代码在第七章上加入了自动验证: 27 | 28 | ```python 29 | # MLP with automatic validation set 30 | from keras.models import Sequential 31 | from keras.layers import Dense 32 | import numpy 33 | # fix random seed for reproducibility 34 | seed = 7 35 | numpy.random.seed(seed) 36 | # load pima indians dataset 37 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",") 38 | # split into input (X) and output (Y) variables 39 | X = dataset[:,0:8] 40 | Y = dataset[:,8] 41 | # create model 42 | model = Sequential() 43 | model.add(Dense(12, input_dim=8, init='uniform', activation='relu')) model.add(Dense(8, init='uniform', activation='relu')) 44 | model.add(Dense(1, init='uniform', activation='sigmoid')) 45 | # Compile model 46 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # Fit the model 47 | model.fit(X, Y, validation_split=0.33, nb_epoch=150, batch_size=10) 48 | ``` 49 | 50 | 训练时,每轮会显示训练和测试数据的数据: 51 | 52 | ```python 53 | Epoch 145/150 54 | 514/514 [==============================] - 0s - loss: 0.4885 - acc: 0.7743 - val_loss: 55 | 0.5016 - val_acc: 0.7638 56 | Epoch 146/150 57 | 514/514 [==============================] - 0s - loss: 0.4862 - acc: 0.7704 - val_loss: 58 | 0.5202 - val_acc: 0.7323 59 | Epoch 147/150 60 | 514/514 [==============================] - 0s - loss: 0.4959 - acc: 0.7588 - val_loss: 61 | 0.5012 - val_acc: 0.7598 62 | Epoch 148/150 63 | 514/514 [==============================] - 0s - loss: 0.4966 - acc: 0.7665 - val_loss: 64 | 0.5244 - val_acc: 0.7520 65 | Epoch 149/150 66 | 514/514 [==============================] - 0s - loss: 0.4863 - acc: 0.7724 - val_loss: 67 | 0.5074 - val_acc: 0.7717 68 | Epoch 150/150 69 | 514/514 [==============================] - 0s - loss: 0.4884 - acc: 0.7724 - val_loss: 70 | 0.5462 - val_acc: 0.7205 71 | ``` 72 | 73 | ##### 8.2.2 手工验证 74 | 75 | Keras也可以手工进行验证。我们定义一个```train_test_split```函数,将数据分成2:1的测试和验证数据集。在调用```fit()```方法时需要加入```validation_data```参数作为验证数据,数组的项目分别是输入和输出数据。 76 | 77 | ```python 78 | # MLP with manual validation set 79 | from keras.models import Sequential 80 | from keras.layers import Dense 81 | from sklearn.cross_validation import train_test_split 82 | import numpy 83 | # fix random seed for reproducibility 84 | seed = 7 85 | numpy.random.seed(seed) 86 | # load pima indians dataset 87 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",") 88 | # split into input (X) and output (Y) variables 89 | X = dataset[:,0:8] 90 | Y = dataset[:,8] 91 | # split into 67% for train and 33% for test 92 | X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed) # create model 93 | model = Sequential() 94 | model.add(Dense(12, input_dim=8, init='uniform', activation='relu')) 95 | model.add(Dense(8, init='uniform', activation='relu')) 96 | model.add(Dense(1, init='uniform', activation='sigmoid')) 97 | # Compile model 98 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 99 | # Fit the model 100 | model.fit(X_train, y_train, validation_data=(X_test,y_test), nb_epoch=150, batch_size=10) 101 | ``` 102 | 103 | 和自动化验证一样,每轮训练后,Keras会输出训练和验证结果: 104 | 105 | ```python 106 | ... 
107 | Epoch 145/150 108 | 514/514 [==============================] - 0s - loss: 0.5001 - acc: 0.7685 - val_loss: 109 | 0.5617 - val_acc: 0.7087 110 | Epoch 146/150 111 | 514/514 [==============================] - 0s - loss: 0.5041 - acc: 0.7529 - val_loss: 112 | 0.5423 - val_acc: 0.7362 113 | Epoch 147/150 114 | 514/514 [==============================] - 0s - loss: 0.4936 - acc: 0.7685 - val_loss: 115 | 0.5426 - val_acc: 0.7283 116 | Epoch 148/150 117 | 514/514 [==============================] - 0s - loss: 0.4957 - acc: 0.7685 - val_loss: 118 | 0.5430 - val_acc: 0.7362 119 | Epoch 149/150 120 | 514/514 [==============================] - 0s - loss: 0.4953 - acc: 0.7685 - val_loss: 121 | 0.5403 - val_acc: 0.7323 122 | Epoch 150/150 123 | 514/514 [==============================] - 0s - loss: 0.4941 - acc: 0.7743 - val_loss: 124 | 0.5452 - val_acc: 0.7323 125 | ``` 126 | 127 | #### 8.3 手工K折交叉验证 128 | 129 | 机器学习的金科玉律是K折验证,以验证模型对未来数据的预测能力。K折验证的方法是:将数据分成K组,留下1组验证,其他数据用作训练,直到每种分发的性能一致。 130 | 131 | 深度学习一般不用交叉验证,因为对算力要求太高。例如,K折的次数一般是5或者10折:每组都需要训练并验证,训练时间成倍上升。然而,如果数据量小,交叉验证的效果更好,误差更小。 132 | 133 | scikit-learn有```StratifiedKFold```类,我们用它把数据分成10组。抽样方法是分层抽样,尽可能保证每组数据量一致。然后我们在每组上训练模型,使用```verbose=0```参数关闭每轮的输出。训练后,Keras会输出模型的性能,并存储模型。最终,Keras输出性能的平均值和标准差,为性能估算提供更准确的估计: 134 | 135 | ```python 136 | # MLP for Pima Indians Dataset with 10-fold cross validation 137 | from keras.models import Sequential 138 | from keras.layers import Dense 139 | from sklearn.cross_validation import StratifiedKFold 140 | import numpy 141 | # fix random seed for reproducibility 142 | seed = 7 143 | numpy.random.seed(seed) 144 | # load pima indians dataset 145 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",") 146 | # split into input (X) and output (Y) variables 147 | X = dataset[:,0:8] 148 | Y = dataset[:,8] 149 | # define 10-fold cross validation test harness 150 | kfold = StratifiedKFold(y=Y, n_folds=10, shuffle=True, random_state=seed) 151 | cvscores = [] 152 | for i, (train, test) in enumerate(kfold): 153 | # create model 154 | model = Sequential() 155 | model.add(Dense(12, input_dim=8, init='uniform', activation='relu')) model.add(Dense(8, init='uniform', activation='relu')) 156 | model.add(Dense(1, init='uniform', activation='sigmoid')) 157 | # Compile model 158 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # Fit the model 159 | model.fit(X[train], Y[train], nb_epoch=150, batch_size=10, verbose=0) 160 | # evaluate the model 161 | scores = model.evaluate(X[test], Y[test], verbose=0) 162 | print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100)) cvscores.append(scores[1] * 100) 163 | print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores))) 164 | ``` 165 | 166 | 输出是: 167 | 168 | ```python 169 | acc: 77.92% 170 | acc: 79.22% 171 | acc: 76.62% 172 | acc: 77.92% 173 | acc: 75.32% 174 | acc: 74.03% 175 | acc: 77.92% 176 | acc: 71.43% 177 | acc: 71.05% 178 | acc: 75.00% 179 | 75.64% (+/- 2.67%) 180 | ``` 181 | 182 | 每次循环都需要重新生成模型,使用对应的数据训练。下一章我们用scikit-learn直接使用Keras的模型。 183 | 184 | #### 8.4 总结 185 | 186 | 本章关于测试神经网络的性能。总结一下: 187 | 188 | - 如何自动将数据分成训练和测试组 189 | - 如何人工对数据分组 190 | - 如何使用K折法测试性能 191 | 192 | ##### 8.4.1 下一章 193 | 194 | 现在你已经知道如何如何测试神经网络的性能:下一章讲讲如何在scikit-learn中直接使用Keras的模型。 195 | 196 | -------------------------------------------------------------------------------- /3-multi-layer-perceptrons/ch9-use-keras-models-with-scikit-learn-for-general-machine-learning.md: 
-------------------------------------------------------------------------------- 1 | ### 第9章 使用Scikit-Learn调用Keras的模型 2 | 3 | scikit-learn是最受欢迎的Python机器学习库。本章我们将使用scikit-learn调用Keras生成的模型。本章将: 4 | 5 | - 使用scikit-learn封装Keras的模型 6 | - 使用scikit-learn对Keras的模型进行交叉验证 7 | - 使用scikit-learn,利用网格搜索调整Keras模型的超参 8 | 9 | 我们开始吧。 10 | 11 | #### 9.1 简介 12 | 13 | Keras在深度学习很受欢迎,但是只能做深度学习:Keras是最小化的深度学习库,目标在于快速搭建深度学习模型。基于SciPy的scikit-learn,数值运算效率很高,适用于普遍的机器学习任务,提供很多机器学习工具,包括但不限于: 14 | 15 | - 使用K折验证模型 16 | - 快速搜索并测试超参 17 | 18 | Keras为scikit-learn封装了```KerasClassifier```和```KerasRegressor```。本章我们继续使用第7章的模型。 19 | 20 | #### 9.2 使用交叉验证检验深度学习模型 21 | 22 | Keras的```KerasClassifier```和```KerasRegressor```两个类接受```build_fn```参数,传入编译好的模型。我们加入```nb_epoch=150```和```batch_size=10```这两个参数:这两个参数会传入模型的```fit()```方法。我们用scikit-learn的```StratifiedKFold```类进行10折交叉验证,测试模型在未知数据的性能,并使用```cross_val_score()```函数检测模型,打印结果。 23 | 24 | ```python 25 | # MLP for Pima Indians Dataset with 10-fold cross validation via sklearn 26 | from keras.models import Sequential 27 | from keras.layers import Dense 28 | from keras.wrappers.scikit_learn import KerasClassifier 29 | from sklearn.cross_validation import StratifiedKFold 30 | from sklearn.cross_validation import cross_val_score 31 | import numpy 32 | import pandas 33 | # Function to create model, required for KerasClassifier 34 | def create_model(): 35 | # create model 36 | model = Sequential() 37 | model.add(Dense(12, input_dim=8, init='uniform', activation='relu')) model.add(Dense(8, init='uniform', activation='relu')) model.add(Dense(1, init='uniform', activation='sigmoid')) 38 | # Compile model 39 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) return model 40 | # fix random seed for reproducibility 41 | seed = 7 42 | numpy.random.seed(seed) 43 | # load pima indians dataset 44 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",") 45 | # split into input (X) and output (Y) variables 46 | X = dataset[:,0:8] 47 | Y = dataset[:,8] 48 | # create model 49 | model = KerasClassifier(build_fn=create_model, nb_epoch=150, batch_size=10) 50 | # evaluate using 10-fold cross validation 51 | kfold = StratifiedKFold(y=Y, n_folds=10, shuffle=True, random_state=seed) 52 | results = cross_val_score(model, X, Y, cv=kfold) 53 | print(results.mean()) 54 | ``` 55 | 56 | 每轮训练会输出一次结果,加上最终的平均性能: 57 | 58 | ```python 59 | ... 
60 | Epoch 145/150 61 | 692/692 [==============================] - 0s - loss: 0.4671 - acc: 0.7803 62 | Epoch 146/150 63 | 692/692 [==============================] - 0s - loss: 0.4661 - acc: 0.7847 64 | Epoch 147/150 65 | 692/692 [==============================] - 0s - loss: 0.4581 - acc: 0.7803 66 | Epoch 148/150 67 | 692/692 [==============================] - 0s - loss: 0.4657 - acc: 0.7688 68 | Epoch 149/150 69 | 692/692 [==============================] - 0s - loss: 0.4660 - acc: 0.7659 70 | Epoch 150/150 71 | 692/692 [==============================] - 0s - loss: 0.4574 - acc: 0.7702 72 | 76/76 [==============================] - 0s 73 | 0.756442244065 74 | ``` 75 | 76 | 比起手工测试,使用scikit-learn容易的多。 77 | 78 | #### 9.3 使用网格搜索调整深度学习模型的参数 79 | 80 | 使用scikit-learn封装Keras的模型十分简单。进一步想:我们可以给```fit()```方法传入参数,```KerasClassifier```的```build_fn```方法也可以传入参数。可以利用这点进一步调整模型。 81 | 82 | 我们用网格搜索测试不同参数的性能:```create_model()```函数可以传入```optimizer```和```init```参数,虽然都有默认值。那么我们可以用不同的优化算法和初始权重调整网络。具体说,我们希望搜索: 83 | 84 | - 优化算法:搜索权重的方法 85 | - 初始权重:初始化不同的网络 86 | - 训练次数:对模型训练的次数 87 | - 批次大小:每次训练的数据量 88 | 89 | 所有的参数组成一个字典,传入scikit-learn的```GridSearchCV```类:```GridSearchCV```会对每组参数(2×3×3×3)进行训练,进行3折交叉检验。 90 | 91 | 计算量巨大:耗时巨长。如果模型小还可以取一部分数据试试。第7章的模型可以用,因为网络和数据集都不大(1000个数据内,9个参数)。最后scikit-learn会输出最好的参数和模型,以及平均值。 92 | 93 | ```python 94 | # MLP for Pima Indians Dataset with grid search via sklearn 95 | from keras.models import Sequential 96 | from keras.layers import Dense 97 | from keras.wrappers.scikit_learn import KerasClassifier 98 | from sklearn.grid_search import GridSearchCV 99 | import numpy 100 | import pandas 101 | # Function to create model, required for KerasClassifier 102 | def create_model(optimizer='rmsprop', init='glorot_uniform'): 103 | # create model 104 | model = Sequential() 105 | model.add(Dense(12, input_dim=8, init=init, activation='relu')) model.add(Dense(8, init=init, activation='relu')) model.add(Dense(1, init=init, activation='sigmoid')) 106 | # Compile model 107 | model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy']) return model 108 | # fix random seed for reproducibility 109 | seed = 7 110 | numpy.random.seed(seed) 111 | # load pima indians dataset 112 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",") 113 | # split into input (X) and output (Y) variables 114 | X = dataset[:,0:8] 115 | Y = dataset[:,8] 116 | # create model 117 | model = KerasClassifier(build_fn=create_model) 118 | # grid search epochs, batch size and optimizer 119 | optimizers = ['rmsprop', 'adam'] 120 | init = ['glorot_uniform', 'normal', 'uniform'] 121 | epochs = numpy.array([50, 100, 150]) 122 | batches = numpy.array([5, 10, 20]) 123 | param_grid = dict(optimizer=optimizers, nb_epoch=epochs, batch_size=batches, init=init) grid = GridSearchCV(estimator=model, param_grid=param_grid) 124 | grid_result = grid.fit(X, Y) 125 | # summarize results 126 | print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_)) 127 | for params, mean_score, scores in grid_result.grid_scores_: 128 | print("%f (%f) with: %r" % (scores.mean(), scores.std(), params)) 129 | ``` 130 | 131 | 用CPU差不多要5分钟,结果如下。我们发现使用均匀分布初始化,```rmsprop```优化算法,150轮,批尺寸为5时效果最好,正确率约75%: 132 | 133 | ```python 134 | Best: 0.751302 using {'init': 'uniform', 'optimizer': 'rmsprop', 'nb_epoch': 150, 'batch_size': 5} 135 | 0.653646 (0.031948) with: {'init': 'glorot_uniform', 'optimizer': 'rmsprop', 'nb_epoch': 50, 'batch_size': 5} 136 | 0.665365 (0.004872) with: {'init': 'glorot_uniform', 'optimizer': 'adam', 
'nb_epoch': 50, 'batch_size': 5} 137 | 0.683594 (0.037603) with: {'init': 'glorot_uniform', 'optimizer': 'rmsprop', 'nb_epoch': 100, 'batch_size': 5} 138 | 0.709635 (0.034987) with: {'init': 'glorot_uniform', 'optimizer': 'adam', 'nb_epoch': 100, 'batch_size': 5} 139 | 0.699219 (0.009568) with: {'init': 'glorot_uniform', 'optimizer': 'rmsprop', 'nb_epoch': 150, 'batch_size': 5} 140 | 0.725260 (0.008027) with: {'init': 'glorot_uniform', 'optimizer': 'adam', 'nb_epoch': 150, 'batch_size': 5} 141 | 0.686198 (0.024774) with: {'init': 'normal', 'optimizer': 'rmsprop', 'nb_epoch': 50, 'batch_size': 5} 142 | 0.718750 (0.014616) with: {'init': 'normal', 'optimizer': 'adam', 'nb_epoch': 50, 'batch_size': 5} 143 | 0.725260 (0.028940) with: {'init': 'normal', 'optimizer': 'rmsprop', 'nb_epoch': 100, 'batch_size': 5} 144 | 0.727865 (0.028764) with: {'init': 'normal', 'optimizer': 'adam', 'nb_epoch': 100, 'batch_size': 5} 145 | 0.748698 (0.035849) with: {'init': 'normal', 'optimizer': 'rmsprop', 'nb_epoch': 150, 'batch_size': 5} 146 | 0.712240 (0.039623) with: {'init': 'normal', 'optimizer': 'adam', 'nb_epoch': 150, 'batch_size': 5} 147 | 0.699219 (0.024910) with: {'init': 'uniform', 'optimizer': 'rmsprop', 'nb_epoch': 50, 'batch_size': 5} 148 | 0.703125 (0.011500) with: {'init': 'uniform', 'optimizer': 'adam', 'nb_epoch': 50, 'batch_size': 5} 149 | 0.720052 (0.015073) with: {'init': 'uniform', 'optimizer': 'rmsprop', 'nb_epoch': 100, 'batch_size': 5} 150 | 0.712240 (0.034987) with: {'init': 'uniform', 'optimizer': 'adam', 'nb_epoch': 100, 'batch_size': 5} 151 | 0.751302 (0.031466) with: {'init': 'uniform', 'optimizer': 'rmsprop', 'nb_epoch': 150, 'batch_size': 5} 152 | 0.734375 (0.038273) with: {'init': 'uniform', 'optimizer': 'adam', 'nb_epoch': 150, 'batch_size': 5} 153 | ... 
154 | ``` 155 | #### 9.4 总结 156 | 157 | 本章关于使用scikit-learn封装并测试神经网络的性能。总结一下: 158 | 159 | - 如何使用scikit-learn封装Keras模型 160 | - 如何使用scikit-learn测试Keras模型的性能 161 | - 如何使用scikit-learn调整Keras模型的超参 162 | 163 | 使用scikit-learn调整参数比手工调用Keras简便的多。 164 | 165 | ##### 9.4.1 下一章 166 | 167 | 现在你已经知道如何如何在scikit-learn调用Keras模型:可以开工了。接下来几章我们会用Keras创造不同的端到端模型,从多类分类问题开始。 168 | 169 | -------------------------------------------------------------------------------- /3-multi-layer-perceptrons/part3.md: -------------------------------------------------------------------------------- 1 | ## 第三部分 多层感知器 2 | 3 | -------------------------------------------------------------------------------- /4-advanced-multi-layer-perceptrons-and-keras/ch13-save-your-models-for-later-with-serialization.md: -------------------------------------------------------------------------------- 1 | ### 第13章 用序列化保存模型 2 | 3 | 深度学习的模型有可能需要好几天才能训练好,如果没有SL大法就完蛋了。本章关于如何保存和加载模型。本章将: 4 | 5 | - 使用HDF5格式保存模型 6 | - 使用JSON格式保存模型 7 | - 使用YAML格式保存模型 8 | 9 | 我们开始吧。 10 | 11 | #### 13.1 简介 12 | 13 | Keras中,模型的结构和权重数据是分开的:权重的文件格式是HDF5,这种格式保存数字矩阵效率很高。模型的结构用JSON或YAML导入导出。 14 | 15 | 本章包括如何手工修改HDF5文件,使用的模型是第7章的皮马人糖尿病模型。 16 | 17 | ##### 13.1.1 HDF5文件 18 | 19 | 分层数据格式,版本5(HDF5)可以高效保存大实数矩阵,例如神经网络的权重。HDF5的包需要安装: 20 | 21 | ```python 22 | sudo pip install h5py 23 | ``` 24 | 25 | #### 13.2 使用JSON保存网络结构 26 | 27 | JSON的格式很简单,Keras可以用```to_json()```把模型导出为JSON格式,再用```model_from_json()```加载回来。 28 | 29 | ```save_weights()```和```load_weights()```可以保存和加载模型权重。下面的代码把之前的模型保存到JSON文件```model.json```,权重保存到HDF5文件```model.h5```,然后加载回来。 30 | 31 | 模型和权重加载后需要编译一次,让Keras正确调用后端。模型的验证方法和之前一致: 32 | 33 | 导出: 34 | 35 | ```python 36 | # MLP for Pima Indians Dataset serialize to JSON and HDF5 37 | from keras.models import Sequential 38 | from keras.layers import Dense 39 | from keras.models import model_from_json 40 | import numpy 41 | import os 42 | # fix random seed for reproducibility 43 | seed = 7 44 | numpy.random.seed(seed) 45 | # load pima indians dataset 46 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",") 47 | # split into input (X) and output (Y) variables 48 | X = dataset[:,0:8] 49 | Y = dataset[:,8] 50 | # create model 51 | model = Sequential() 52 | model.add(Dense(12, input_dim=8, init='uniform', activation='relu')) model.add(Dense(8, init='uniform', activation='relu')) 53 | model.add(Dense(1, init='uniform', activation='sigmoid')) 54 | # Compile model 55 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # Fit the model 56 | model.fit(X, Y, nb_epoch=150, batch_size=10, verbose=0) 57 | # evaluate the model 58 | scores = model.evaluate(X, Y, verbose=0) 59 | print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100)) 60 | # serialize model to JSON 61 | model_json = model.to_json() 62 | with open("model.json", "w") as json_file: 63 | json_file.write(model_json) 64 | # serialize weights to HDF5 65 | model.save_weights("model.h5") 66 | print("Saved model to disk") 67 | ``` 68 | 69 | 导入: 70 | 71 | ```python 72 | # later... 
75 | # load json and create model
76 | from keras.models import model_from_json
77 | import numpy
78 | # load pima indians dataset
79 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
80 | # split into input (X) and output (Y) variables
81 | X = dataset[:,0:8]
82 | Y = dataset[:,8]
83 | json_file = open('model.json', 'r')
84 | loaded_model_json = json_file.read()
85 | json_file.close()
86 | loaded_model = model_from_json(loaded_model_json)
87 | # load weights into new model
88 | loaded_model.load_weights("model.h5")
89 | print("Loaded model from disk")
90 | # evaluate loaded model on test data
91 | loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
92 | score = loaded_model.evaluate(X, Y, verbose=0)
93 | print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))
94 | ```
95 | 
96 | 结果如下。导入的模型和之前导出时一致:
97 | 
98 | ```python
99 | acc: 79.56%
100 | Saved model to disk
101 | Loaded model from disk
102 | acc: 79.56%
103 | ```
104 | 
105 | JSON文件类似:
106 | 
107 | ```python
108 | {
109 |     "class_name": "Sequential",
110 |     "config": [{
111 |         "class_name": "Dense",
112 |         "config": {
113 |             "W_constraint": null,
114 |             "b_constraint": null,
115 |             "name": "dense_1",
116 |             "output_dim": 12,
117 |             "activity_regularizer": null,
118 |             "trainable": true,
119 |             "init": "uniform",
120 |             "input_dtype": "float32",
121 |             "input_dim": 8,
122 |             "b_regularizer": null,
123 |             "W_regularizer": null,
124 |             "activation": "relu",
125 |             "batch_input_shape": [
126 |                 null,
127 |                 8
128 |             ]
129 |         }
130 |     },
131 |     {
132 |         "class_name": "Dense",
133 |         "config": {
134 |             "W_constraint": null,
135 |             "b_constraint": null,
136 |             "name": "dense_2",
137 |             "activity_regularizer": null,
138 |             "trainable": true,
139 |             "init": "uniform",
140 |             "input_dim": null,
141 |             "b_regularizer": null,
142 |             "W_regularizer": null,
143 |             "activation": "relu",
144 |             "output_dim": 8
145 |         }
146 |     },
147 |     {
148 |         "class_name": "Dense",
149 |         "config": {
150 |             "W_constraint": null,
151 |             "b_constraint": null,
152 |             "name": "dense_3",
153 |             "activity_regularizer": null,
154 |             "trainable": true,
155 |             "init": "uniform",
156 |             "input_dim": null,
157 |             "b_regularizer": null,
158 |             "W_regularizer": null,
159 |             "activation": "sigmoid",
160 |             "output_dim": 1
161 |         }
162 |     }
163 |     ]
164 | }
165 | ```
166 | 
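167 | 补充一句:除了把结构和权重分开保存,较新版本的Keras还提供了```model.save()```和```load_model()```,把结构、权重和编译配置一起存进一个HDF5文件。下面是一个最小示意(假设你的Keras版本支持这两个接口;文件名仅为示例):
168 | 
169 | ```python
170 | # 示意:一步保存/加载完整模型(结构+权重+编译配置)
171 | from keras.models import load_model
172 | 
173 | # 把上面训练好的model整体保存到单个HDF5文件
174 | model.save('full_model.h5')
175 | # 加载回来:不需要重新定义结构,也不需要再编译
176 | restored_model = load_model('full_model.h5')
177 | scores = restored_model.evaluate(X, Y, verbose=0)
178 | print("%s: %.2f%%" % (restored_model.metrics_names[1], scores[1]*100))
179 | ```
180 | 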
181 | #### 13.3 使用YAML保存网络结构
182 | 
183 | 和之前的JSON类似,只不过文件格式变成YAML,使用的函数变成了```to_yaml()```和```model_from_yaml()```:
184 | 
185 | ```python
186 | # MLP for Pima Indians Dataset serialize to YAML and HDF5
187 | from keras.models import Sequential
188 | from keras.layers import Dense
189 | from keras.models import model_from_yaml
190 | import numpy
191 | import os
192 | # fix random seed for reproducibility
193 | seed = 7
194 | numpy.random.seed(seed)
195 | # load pima indians dataset
196 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
197 | # split into input (X) and output (Y) variables
198 | X = dataset[:,0:8]
199 | Y = dataset[:,8]
200 | # create model
201 | model = Sequential()
202 | model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
203 | model.add(Dense(8, init='uniform', activation='relu'))
204 | model.add(Dense(1, init='uniform', activation='sigmoid'))
205 | # Compile model
206 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
207 | # Fit the model
208 | model.fit(X, Y, nb_epoch=150, batch_size=10, verbose=0)
209 | # evaluate the model
210 | scores = model.evaluate(X, Y, verbose=0)
211 | print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
212 | # serialize model to YAML
213 | model_yaml = model.to_yaml()
214 | with open("model.yaml", "w") as yaml_file:
215 |     yaml_file.write(model_yaml)
216 | # serialize weights to HDF5
217 | model.save_weights("model.h5")
218 | print("Saved model to disk")
219 | 
220 | # later...
221 | # load YAML and create model
222 | yaml_file = open('model.yaml', 'r')
223 | loaded_model_yaml = yaml_file.read()
224 | yaml_file.close()
225 | loaded_model = model_from_yaml(loaded_model_yaml)
226 | # load weights into new model
227 | loaded_model.load_weights("model.h5")
228 | print("Loaded model from disk")
229 | # evaluate loaded model on test data
230 | loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
231 | score = loaded_model.evaluate(X, Y, verbose=0)
232 | print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))
233 | ```
234 | 
235 | 结果和之前的一样:
236 | 
237 | ```python
238 | acc: 79.56%
239 | Saved model to disk
240 | Loaded model from disk
241 | acc: 79.56%
242 | ```
243 | 
244 | YAML文件长这样:
245 | 
246 | ```yaml
247 | class_name: Sequential
248 | config:
249 | - class_name: Dense
250 |   config:
251 |     W_constraint: null
252 |     W_regularizer: null
253 |     activation: relu
254 |     activity_regularizer: null
255 |     b_constraint: null
256 |     b_regularizer: null
257 |     batch_input_shape: !!python/tuple [null, 8]
258 |     init: uniform
259 |     input_dim: 8
260 |     input_dtype: float32
261 |     name: dense_1
262 |     output_dim: 12
263 |     trainable: true
264 | - class_name: Dense
265 |   config: {W_constraint: null, W_regularizer: null, activation: relu, activity_regularizer: null,
266 |     b_constraint: null, b_regularizer: null, init: uniform, input_dim: null, name: dense_2,
267 |     output_dim: 8, trainable: true}
268 | - class_name: Dense
269 |   config: {W_constraint: null, W_regularizer: null, activation: sigmoid, activity_regularizer: null,
270 |     b_constraint: null, b_regularizer: null, init: uniform, input_dim: null, name: dense_3,
271 |     output_dim: 1, trainable: true}
272 | ```
273 | 
274 | #### 13.4 总结
275 | 
276 | 本章关于导入导出Keras模型。总结一下:
277 | 
278 | - 如何用HDF5保存和加载权重
279 | - 如何用JSON保存和加载模型结构
280 | - 如何用YAML保存和加载模型结构
281 | 
282 | ##### 13.4.1 下一章
283 | 
284 | 模型可以保存了:下一章关于使用保存点。
285 | 
286 | 
--------------------------------------------------------------------------------
/4-advanced-multi-layer-perceptrons-and-keras/ch14-keep-the-best-models-during-training-with-checkpointing.md:
--------------------------------------------------------------------------------
1 | ### 第14章 使用保存点保存最好的模型
2 | 
3 | 深度学习的模型有可能需要跑很长时间,如果中间断了(特别是在竞价式实例上跑的时候)就要了亲命了。本章关于在训练中途保存模型。本章将:
4 | 
5 | - 保存点很重要!
6 | - 每轮打保存点!
7 | - 挑最好的模型!
8 | 
9 | 我们开始吧。
10 | 
11 | #### 14.1 使用保存点
12 | 
13 | 长时间运行的程序需要能中途保存,以加强健壮性:从保存点恢复后,既可以接着运行,也可以直接拿来用。深度学习的保存点存储的是模型的权重:可以用来继续训练,也可以直接开始预测。
14 | 
15 | Keras有回调API,配合```ModelCheckpoint```可以每轮保存网络信息:可以定义文件位置、文件名和保存时机等。例如,损失或准确率达到某个标准就保存,文件名里可以加入轮数和准确率等。```ModelCheckpoint```需要传入```fit()```函数的```callbacks```参数,也需要先安装```h5py```库。
16 | 
17 | #### 14.2 效果变好就保存
18 | 
19 | 好习惯:每轮过后,如果效果变好就保存一下。还是用第7章的模型,拿33%的数据做验证。
20 | 
21 | 每轮结束后在验证数据集上评估,如果比之前效果好就保存权重(monitor='val_acc', mode='max')。文件名格式是```weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5```,带上轮数和当时的验证准确率。
22 | 
23 | ```python
24 | # Checkpoint the weights when validation accuracy improves
25 | from keras.models import Sequential
26 | from keras.layers import Dense
27 | from keras.callbacks import ModelCheckpoint
28 | import matplotlib.pyplot as plt
29 | import numpy
30 | # fix random seed for reproducibility
31 | seed = 7
32 | numpy.random.seed(seed)
33 | # load pima indians dataset
34 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
35 | # split into input (X) and output (Y) variables
36 | X = dataset[:,0:8]
37 | Y = dataset[:,8]
38 | # create model
39 | model = Sequential()
40 | model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
41 | model.add(Dense(8, init='uniform', activation='relu'))
42 | model.add(Dense(1, init='uniform', activation='sigmoid'))
43 | # Compile model
44 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
45 | # checkpoint
46 | filepath="weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
47 | checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True,
48 |     mode='max')
49 | callbacks_list = [checkpoint]
50 | # Fit the model
51 | model.fit(X, Y, validation_split=0.33, nb_epoch=150, batch_size=10,
52 |     callbacks=callbacks_list, verbose=0)
53 | ```
54 | 
55 | 输出的结果如下:只有效果变好时才保存。
56 | 
57 | ```python
58 | ...
59 | Epoch 00134: val_acc did not improve
60 | Epoch 00135: val_acc did not improve
61 | Epoch 00136: val_acc did not improve
62 | Epoch 00137: val_acc did not improve
63 | Epoch 00138: val_acc did not improve
64 | Epoch 00139: val_acc did not improve
65 | Epoch 00140: val_acc improved from 0.83465 to 0.83858, saving model to
66 | weights-improvement-140-0.84.hdf5
67 | Epoch 00141: val_acc did not improve
68 | Epoch 00142: val_acc did not improve
69 | Epoch 00143: val_acc did not improve
70 | Epoch 00144: val_acc did not improve
71 | Epoch 00145: val_acc did not improve
72 | Epoch 00146: val_acc improved from 0.83858 to 0.84252, saving model to
73 | weights-improvement-146-0.84.hdf5
74 | Epoch 00147: val_acc did not improve
75 | Epoch 00148: val_acc improved from 0.84252 to 0.84252, saving model to
76 | weights-improvement-148-0.84.hdf5
77 | Epoch 00149: val_acc did not improve
78 | ```
79 | 
80 | 目录下会保存历次效果变好时的模型:
81 | 
82 | ```python
83 | ...
84 | weights-improvement-74-0.81.hdf5
85 | weights-improvement-81-0.82.hdf5
86 | weights-improvement-91-0.82.hdf5
87 | weights-improvement-93-0.83.hdf5
88 | ```
89 | 
90 | 这种方法有效,但是会攒下很多文件。当然,效果最好的模型肯定已经保存下来了。
91 | 
92 | #### 14.3 保存最好的模型
93 | 
94 | 也可以只保存最好的模型:如果效果变好,就覆盖之前的权重文件。把上面例子的文件名改成固定的就可以:
95 | 
96 | ```python
97 | # Checkpoint the weights for best model on validation accuracy
98 | from keras.models import Sequential
99 | from keras.layers import Dense
100 | from keras.callbacks import ModelCheckpoint
101 | import matplotlib.pyplot as plt
102 | import numpy
103 | # fix random seed for reproducibility
104 | seed = 7
105 | numpy.random.seed(seed)
106 | # load pima indians dataset
107 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
108 | # split into input (X) and output (Y) variables
109 | X = dataset[:,0:8]
110 | Y = dataset[:,8]
111 | # create model
112 | model = Sequential()
113 | model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
114 | model.add(Dense(8, init='uniform', activation='relu'))
115 | model.add(Dense(1, init='uniform', activation='sigmoid'))
116 | # Compile model
117 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
118 | # checkpoint
119 | filepath="weights.best.hdf5"
120 | checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True,
121 |     mode='max')
122 | callbacks_list = [checkpoint]
123 | # Fit the model
124 | model.fit(X, Y, validation_split=0.33, nb_epoch=150, batch_size=10,
125 |     callbacks=callbacks_list, verbose=0)
126 | ```
127 | 
128 | 结果如下:
129 | 
130 | ```python
131 | ...
132 | Epoch 00136: val_acc did not improve
133 | Epoch 00137: val_acc did not improve
134 | Epoch 00138: val_acc did not improve
135 | Epoch 00139: val_acc did not improve
136 | Epoch 00140: val_acc improved from 0.83465 to 0.83858, saving model to weights.best.hdf5
137 | Epoch 00141: val_acc did not improve
138 | Epoch 00142: val_acc did not improve
139 | Epoch 00143: val_acc did not improve
140 | Epoch 00144: val_acc did not improve
141 | Epoch 00145: val_acc did not improve
142 | Epoch 00146: val_acc improved from 0.83858 to 0.84252, saving model to weights.best.hdf5
143 | Epoch 00147: val_acc did not improve
144 | Epoch 00148: val_acc improved from 0.84252 to 0.84252, saving model to weights.best.hdf5
145 | Epoch 00149: val_acc did not improve
146 | ```
147 | 
148 | 网络保存在:
149 | 
150 | ```
151 | weights.best.hdf5
152 | ```
153 | 
154 | #### 14.4 导入保存的模型
155 | 
156 | 保存点只保存了权重:网络结构需要另外保存(参见第13章),或者像下面这样在代码里重新定义。代码如下:
157 | 
158 | ```python
159 | # How to load and use weights from a checkpoint
160 | from keras.models import Sequential
161 | from keras.layers import Dense
162 | from keras.callbacks import ModelCheckpoint
163 | import matplotlib.pyplot as plt
164 | import numpy
165 | # fix random seed for reproducibility
166 | seed = 7
167 | numpy.random.seed(seed)
168 | # create model
169 | model = Sequential()
170 | model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
171 | model.add(Dense(8, init='uniform', activation='relu'))
172 | model.add(Dense(1, init='uniform', activation='sigmoid'))
173 | # load weights
174 | model.load_weights("weights.best.hdf5")
175 | # Compile model (required to make predictions)
176 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
177 | print("Created model and loaded weights from file")
178 | # load pima indians dataset
179 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
180 | # split into input (X) and output (Y) variables
181 | X = dataset[:,0:8]
182 | Y = dataset[:,8]
183 | # estimate accuracy on whole dataset using loaded weights
184 | scores = model.evaluate(X, Y, verbose=0)
185 | print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
186 | ```
187 | 
188 | 结果如下:
189 | 
190 | ```
191 | Created model and loaded weights from file
192 | acc: 77.73%
193 | ```
194 | 
195 | #### 14.5 总结
196 | 
197 | 本章关于在训练时打保存点。总结一下:
198 | 
199 | - 如何在训练过程中保存网络
200 | - 如何只保存最好的网络
201 | - 如何导入保存的网络
202 | 
203 | ##### 14.5.1 下一章
204 | 
205 | 本章关于建立保存点:下一章关于在训练时绘制性能图表。
206 | 
207 | 
208 | 
--------------------------------------------------------------------------------
/4-advanced-multi-layer-perceptrons-and-keras/ch15-understand-model-behavior-during-training-by-plotting-history.md:
--------------------------------------------------------------------------------
1 | ### 第15章 模型训练效果可视化
2 | 
3 | 查看训练效果的历史数据大有裨益。本章关于将模型的训练效果可视化。本章教你:
4 | 
5 | - 如何查看历史训练数据
6 | - 如何在训练时绘制准确率图像
7 | - 如何在训练时绘制损失图像
8 | 
9 | 我们开始吧。
10 | 
11 | #### 15.1 取历史数据
12 | 
13 | 上一章说到Keras支持回调API,其中默认调用```History```回调,每轮训练收集损失和准确率;如果指定了验证数据集,也会收集验证集上的数据。
14 | 
15 | 历史数据就是```fit()```函数的返回值,存在```history```对象中。看一下到底收集了什么数据:
16 | 
17 | ```python
18 | # list all data in history
19 | print(history.history.keys())
20 | ```
21 | 
22 | 如果是第7章的二分类问题:
23 | 
24 | ```python
25 | ['acc', 'loss', 'val_acc', 'val_loss']
26 | ```
27 | 
28 | 可以用这些数据画折线图,直观看到:
29 | 
30 | - 模型收敛的速度(斜率)
31 | - 模型是否已经收敛(稳定性)
32 | - 模型是否过拟合(验证数据集)
33 | 
34 | 以及更多。
35 | 
36 | #### 15.2 可视化Keras模型训练
37 | 
38 | 收集一下第7章皮马人糖尿病模型的历史数据,绘制:
39 | 
40 | 1. 训练和验证集的准确率
41 | 2. 训练和验证集的损失
42 | 
43 | ```python
44 | # Visualize training history
45 | from keras.models import Sequential
46 | from keras.layers import Dense
47 | import matplotlib.pyplot as plt
48 | import numpy
49 | # fix random seed for reproducibility
50 | seed = 7
51 | numpy.random.seed(seed)
52 | # load pima indians dataset
53 | dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
54 | # split into input (X) and output (Y) variables
55 | X = dataset[:,0:8]
56 | Y = dataset[:,8]
57 | # create model
58 | model = Sequential()
59 | model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
60 | model.add(Dense(8, init='uniform', activation='relu'))
61 | model.add(Dense(1, init='uniform', activation='sigmoid'))
62 | # Compile model
63 | model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
64 | # Fit the model
65 | history = model.fit(X, Y, validation_split=0.33, nb_epoch=150, batch_size=10, verbose=0)
66 | # list all data in history
67 | print(history.history.keys())
68 | # summarize history for accuracy
69 | plt.plot(history.history['acc'])
70 | plt.plot(history.history['val_acc'])
71 | plt.title('model accuracy')
72 | plt.ylabel('accuracy')
73 | plt.xlabel('epoch')
74 | plt.legend(['train', 'test'], loc='upper left')
75 | plt.show()
76 | # summarize history for loss
77 | plt.plot(history.history['loss'])
78 | plt.plot(history.history['val_loss'])
79 | plt.title('model loss')
80 | plt.ylabel('loss')
81 | plt.xlabel('epoch')
82 | plt.legend(['train', 'test'], loc='upper left')
83 | plt.show()
84 | ```
85 | 
86 | 图像如下。最后几轮的准确率还在上升,有可能有点过度学习;但是两个数据集的效果差不多,应该没有过拟合。
87 | 
88 | ![15.1 准确度图像](https://i.imgur.com/ngFUnWu.png)
89 | 
90 | 从损失图像看,两个数据集的性能差不多。如果两条线开始分开,有可能应该提前终止训练(参见下面的示意代码)。
91 | 
92 | ![15.2 损失图像](https://i.imgur.com/r5XHhJy.png)
93 | 
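94 | 想把"提前终止"自动化,可以用Keras自带的```EarlyStopping```回调:它监控验证损失,连续若干轮没有改善就停止训练。下面是一个最小示意(假设沿用本章的```X```、```Y```和```model```;```patience```的取值仅作演示):
95 | 
96 | ```python
97 | # 示意:验证损失连续10轮没有下降就提前终止训练
98 | from keras.callbacks import EarlyStopping
99 | early_stop = EarlyStopping(monitor='val_loss', patience=10, verbose=1)
100 | history = model.fit(X, Y, validation_split=0.33, nb_epoch=150, batch_size=10,
101 |     callbacks=[early_stop], verbose=0)
102 | ```
103 | 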
104 | #### 15.3 总结
105 | 
106 | 本章关于在训练时绘制性能图像。总结一下:
107 | 
108 | - 如何查看```history```对象
109 | - 如何绘制训练过程中的性能曲线
110 | - 如何对比训练集和验证集的性能曲线
111 | 
112 | ##### 15.3.1 下一章
113 | 
114 | Dropout可以有效防止过拟合:下一章关于这个技术、如何在Keras中实现以及最佳实践。
115 | 
116 | 
117 | 
--------------------------------------------------------------------------------
/4-advanced-multi-layer-perceptrons-and-keras/ch16-reduce-overfitting-with-dropout-regularization.md:
--------------------------------------------------------------------------------
1 | ### 第16章 使用Dropout正则化防止过拟合
2 | 
3 | Dropout虽然简单,但可以有效防止过拟合。本章关于如何在Keras中使用Dropout。本章包括:
4 | 
5 | - Dropout的原理
6 | - Dropout的使用
7 | - 在隐层上使用Dropout
8 | 
9 | 我们开始吧。
10 | 
11 | #### 16.1 Dropout正则化
12 | 
13 | 译者鄙校的Srivastava等大牛在2014年的论文《[Dropout: A Simple Way to Prevent Neural Networks from Overfitting](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)》提出了Dropout正则化。Dropout的意思是:每次训练时随机忽略一部分神经元,这些神经元被dropped-out了。换句话讲,这些神经元在正向传播时对下游神经元激活的贡献被暂时移除,反向传播时也不会更新它们的权重。
14 | 
15 | 神经网络的所谓"学习"是指,让各个神经元的权重符合需要的特性;不同的神经元组合后可以分辨数据的某个特征。每个神经元会逐渐依赖邻居神经元共同形成的特征:如果过度依赖,就会造成过拟合。如果每次随机拿走一部分神经元,剩下的神经元就需要补上消失神经元的功能,整个网络变成很多独立网络(对同一问题的不同解决方法)的合集。
16 | 
17 | Dropout的效果是,网络对个别神经元的权重变化更不敏感,泛化能力更强,更不容易过拟合。
18 | 
19 | #### 16.2 在Keras中使用Dropout正则化
20 | 
21 | Dropout就是每次训练按概率随机拿走一部分神经元,只在训练时使用。后面我们会研究其他的用法。
22 | 
23 | 以下的例子是声呐数据集(第11章),用scikit-learn进行10折交叉检验,这样可以看出区别。输入变量有60个,输出1个,数据经过标准化。基线模型有2个隐层,第一个有60个神经元,第二个有30个。训练方法是随机梯度下降,学习率和动量较低。下面是基线模型的代码:
24 | 
25 | ```python
26 | import numpy
27 | import pandas
28 | from keras.models import Sequential
29 | from keras.layers import Dense
30 | from keras.layers import Dropout
31 | from keras.wrappers.scikit_learn import KerasClassifier
32 | from keras.constraints import maxnorm
33 | from keras.optimizers import SGD
34 | from sklearn.cross_validation import cross_val_score
35 | from sklearn.preprocessing import LabelEncoder
36 | from sklearn.cross_validation import StratifiedKFold
37 | from sklearn.preprocessing import StandardScaler
38 | from sklearn.grid_search import GridSearchCV
39 | from sklearn.pipeline import Pipeline
40 | # fix random seed for reproducibility
41 | seed = 7
42 | numpy.random.seed(seed)
43 | # load dataset
44 | dataframe = pandas.read_csv("sonar.csv", header=None)
45 | dataset = dataframe.values
46 | # split into input (X) and output (Y) variables
47 | X = dataset[:,0:60].astype(float)
48 | Y = dataset[:,60]
49 | # encode class values as integers
50 | encoder = LabelEncoder()
51 | encoder.fit(Y)
52 | encoded_Y = encoder.transform(Y)
53 | 
54 | # baseline
55 | def create_baseline():
56 |     # create model
57 |     model = Sequential()
58 |     model.add(Dense(60, input_dim=60, init='normal', activation='relu'))
59 |     model.add(Dense(30, init='normal', activation='relu'))
60 |     model.add(Dense(1, init='normal', activation='sigmoid'))
61 |     # Compile model
62 |     sgd = SGD(lr=0.01, momentum=0.8, decay=0.0, nesterov=False)
63 |     model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
64 |     return model
65 | numpy.random.seed(seed)
66 | estimators = []
67 | estimators.append(('standardize', StandardScaler()))
68 | estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, nb_epoch=300,
69 |     batch_size=16, verbose=0)))
70 | pipeline = Pipeline(estimators)
71 | kfold = StratifiedKFold(y=encoded_Y, n_folds=10, shuffle=True, random_state=seed)
72 | results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
73 | print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
74 | ```
75 | 
76 | 不使用Dropout的准确率是82%。
77 | 
78 | ```python
79 | Accuracy: 82.68% (3.90%)
80 | ```
81 | 
82 | #### 16.3 对输入层使用Dropout正则化
83 | 
84 | 可以对输入层(也叫表层)用Dropout:这里我们对输入层使用20%的Dropout,意思是每轮训练中,每5个输入随机去掉1个。
85 | 
86 | 原论文推荐对每层的权重加限制,保证模不超过3:在定义全连接层时用```W_constraint```可以做到。学习率加10倍,动量加到0.9,原论文也如此推荐。对上面的模型进行修改:
87 | 
88 | ```python
89 | 
90 | # dropout in the input layer with weight constraint
91 | def create_model1():
92 |     # create model
93 |     model = Sequential()
94 |     model.add(Dropout(0.2, input_shape=(60,)))
95 |     model.add(Dense(60, init='normal', activation='relu', W_constraint=maxnorm(3)))
96 |     model.add(Dense(30, init='normal', activation='relu', W_constraint=maxnorm(3)))
97 |     model.add(Dense(1, init='normal', activation='sigmoid'))
98 |     # Compile model
99 |     sgd = SGD(lr=0.1, momentum=0.9, decay=0.0, nesterov=False)
100 |     model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
101 |     return model
102 | numpy.random.seed(seed)
103 | estimators = []
104 | estimators.append(('standardize', StandardScaler()))
105 | estimators.append(('mlp', KerasClassifier(build_fn=create_model1, nb_epoch=300,
106 |     batch_size=16, verbose=0)))
107 | pipeline = Pipeline(estimators)
108 | kfold = StratifiedKFold(y=encoded_Y, n_folds=10, shuffle=True, random_state=seed)
109 | results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
110 | print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
111 | ```
112 | 
113 | 准确率提升到86%:
114 | 
115 | ```python
116 | Accuracy: 86.04% (6.33%)
117 | ```
118 | 
119 | #### 16.4 对隐层使用Dropout正则化
120 | 
121 | 隐层当然也可以用Dropout。和上次一样,这次对两个隐层都做Dropout,概率还是20%:
122 | 
123 | ```python
124 | # dropout in hidden layers with weight constraint
125 | def create_model2():
126 |     # create model
127 |     model = Sequential()
128 |     model.add(Dense(60, input_dim=60, init='normal', activation='relu',
129 |         W_constraint=maxnorm(3)))
130 |     model.add(Dropout(0.2))
131 |     model.add(Dense(30, init='normal', activation='relu', W_constraint=maxnorm(3)))
132 |     model.add(Dropout(0.2))
133 |     model.add(Dense(1, init='normal', activation='sigmoid'))
134 |     # Compile model
135 |     sgd = SGD(lr=0.1, momentum=0.9, decay=0.0, nesterov=False)
136 |     model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
137 |     return model
138 | numpy.random.seed(seed)
139 | estimators = []
140 | estimators.append(('standardize', StandardScaler()))
141 | estimators.append(('mlp', KerasClassifier(build_fn=create_model2, nb_epoch=300,
142 |     batch_size=16, verbose=0)))
143 | pipeline = Pipeline(estimators)
144 | kfold = StratifiedKFold(y=encoded_Y, n_folds=10, shuffle=True, random_state=seed)
145 | results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
146 | print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
147 | ```
148 | 
149 | 然而并没有什么卵用,效果反而更差了。有可能需要训练更多轮。
150 | 
151 | ```python
152 | Accuracy: 82.16% (6.16%)
153 | ```
154 | 
155 | #### 16.5 使用Dropout正则化的技巧
156 | 
157 | 原论文对很多标准机器学习问题做了比较,并提出了下列建议(列表后面附一个把这些建议合起来的示意代码):
158 | 
159 | 1. Dropout概率不要太高,从20%开始,试到50%。太低的概率效果不好,太高有可能欠拟合。
160 | 2. 网络要大。更大的网络有更多机会学到多种独立的表示。
161 | 3. 每层都做Dropout,包括输入层。效果更好。
162 | 4. 学习率(带衰减的)和动量要大。直接对学习率乘10或100,动量设到0.9或0.99。
163 | 5. 限制每层的权重。学习率增大会造成权重增大,把每层权重的模限制到4或5效果更好。
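164 | 
165 | 把上面几条建议合起来,大概是:输入层和所有隐层都加20%的Dropout,权重的模限制到4,学习率乘10到0.1,动量加到0.9。下面是一个最小示意(沿用本章前面的导入、数据和Pipeline写法;具体取值仅作演示,效果需要自行实验):
166 | 
167 | ```python
168 | # 示意:输入层和所有隐层都用Dropout,并按上述建议加大学习率、动量和权重限制
169 | from keras.models import Sequential
170 | from keras.layers import Dense
171 | from keras.layers import Dropout
172 | from keras.constraints import maxnorm
173 | from keras.optimizers import SGD
174 | 
175 | def create_model_all_dropout():
176 |     # create model: 20% dropout on the visible layer and both hidden layers
177 |     model = Sequential()
178 |     model.add(Dropout(0.2, input_shape=(60,)))
179 |     model.add(Dense(60, init='normal', activation='relu', W_constraint=maxnorm(4)))
180 |     model.add(Dropout(0.2))
181 |     model.add(Dense(30, init='normal', activation='relu', W_constraint=maxnorm(4)))
182 |     model.add(Dropout(0.2))
183 |     model.add(Dense(1, init='normal', activation='sigmoid'))
184 |     # Compile model: larger learning rate and momentum, as recommended
185 |     sgd = SGD(lr=0.1, momentum=0.9, decay=0.0, nesterov=False)
186 |     model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
187 |     return model
188 | ```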
189 | 
190 | #### 16.6 总结
191 | 
192 | 本章关于使用Dropout正则化避免过拟合。总结一下:
193 | 
194 | - Dropout的工作原理是什么
195 | - 如何使用Dropout
196 | - Dropout的最佳实践是什么
197 | 
198 | ##### 16.6.1 下一章
199 | 
200 | 在训练中调节学习速度可以提升性能:下一章研究不同的学习速度调度的效果,以及如何在Keras中使用。
201 | 
202 | 
203 | 
--------------------------------------------------------------------------------
/4-advanced-multi-layer-perceptrons-and-keras/ch17-lift-performance-with-learning-rate-schedules.md:
--------------------------------------------------------------------------------
1 | ### 第17章 学习速度调度
2 | 
3 | 神经网络的训练是很困难的优化问题。传统的随机梯度下降算法配合设计好的学习速度调度,有时效果更好。本章包括:
4 | 
5 | - 调整学习速度的原因
6 | - 如何使用按时间变化的学习速度
7 | - 如何使用按轮数变化的学习速度
8 | 
9 | 我们开始吧。
10 | 
11 | #### 17.1 学习速度
12 | 
13 | 随机梯度下降算法配合设计好的学习速度调度可以增强效果,减少训练时间:也叫学习速度退火或可变学习速度。其实就是在训练过程中慢慢调整学习速度,而传统的方法中学习速度保持不变。
14 | 
15 | 最简单的调整方法是让学习速度随时间下降:一开始做大的调整加速训练,后面慢慢微调性能。两个简单的方法:
16 | 
17 | - 根据训练轮数慢慢下降
18 | - 到某个轮数就降到某个值
19 | 
20 | 我们分别探讨一下。
21 | 
22 | #### 17.2 电离层分类数据集
23 | 
24 | 本章使用电离层二分类数据集,研究电离层中的自由电子。分类g(好)意味着信号在电离层中探测到某种结构;b(坏)代表没有,信号直接穿过了电离层。数据有34个属性,351个数据。
25 | 
26 | 10折交叉检验下,最好的模型可以达到94~98%的准确度。数据在本书的data目录下,也可以自行下载,重命名为```ionosphere.csv```。数据集详情请参见UCI机器学习网站。
27 | 
28 | #### 17.3 基于时间的学习速度调度
29 | 
30 | Keras内置了一个基于时间的学习速度调度器:Keras的随机梯度下降```SGD```类有```decay```参数,按下面的公式调整速度:
31 | 
32 | ```
33 | LearnRate = LearnRate x 1 / (1 + decay x epoch)
34 | ```
35 | 
36 | ```decay```的默认值是0,即不起作用:
37 | 
38 | ```python
39 | LearningRate = 0.1 * 1/(1 + 0.0 * 1)
40 | LearningRate = 0.1
41 | ```
42 | 
43 | 如果衰减率大于0,例如0.001,效果是:
44 | 
45 | ```python
46 | Epoch Learning Rate
47 | 1 0.1
48 | 2 0.0999000999
49 | 3 0.0997006985
50 | 4 0.09940249103
51 | 5 0.09900646517
52 | ```
53 | 
54 | 到100轮的图像:
55 | 
56 | ![17.2 按时间](https://i.imgur.com/aGgYVXf.png)
57 | 
58 | 可以这样设计:
59 | 
60 | ```python
61 | Decay = LearningRate / Epochs
62 | Decay = 0.1 / 100
63 | Decay = 0.001
64 | ```
65 | 
66 | 下面的代码按时间减少学习速度。神经网络有1个隐层,34个神经元,激活函数是整流函数(ReLU)。输出层是1个神经元,激活函数是S型函数,输出类似概率的值。学习速度设到0.1,训练50轮,衰减率是0.002,也就是0.1/50。学习速度调度一般配合动量使用:动量设成0.8。代码如下:
67 | 
68 | ```python
69 | import pandas
70 | import numpy
71 | from keras.models import Sequential
72 | from keras.layers import Dense
73 | from keras.optimizers import SGD
74 | from sklearn.preprocessing import LabelEncoder
75 | # fix random seed for reproducibility
76 | seed = 7
77 | numpy.random.seed(seed)
78 | # load dataset
79 | dataframe = pandas.read_csv("ionosphere.csv", header=None)
80 | dataset = dataframe.values
81 | # split into input (X) and output (Y) variables
82 | X = dataset[:,0:34].astype(float)
83 | Y = dataset[:,34]
84 | # encode class values as integers
85 | encoder = LabelEncoder()
86 | encoder.fit(Y)
87 | Y = encoder.transform(Y)
88 | # create model
89 | model = Sequential()
90 | model.add(Dense(34, input_dim=34, init='normal', activation='relu'))
91 | model.add(Dense(1, init='normal', activation='sigmoid'))
92 | # Compile model
93 | epochs = 50
94 | learning_rate = 0.1
95 | decay_rate = learning_rate / epochs
96 | momentum = 0.8
97 | sgd = SGD(lr=learning_rate, momentum=momentum, decay=decay_rate, nesterov=False)
98 | model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
99 | # Fit the model
100 | model.fit(X, Y, validation_split=0.33, nb_epoch=epochs, batch_size=28)
101 | ```
102 | 
103 | 训练67%的数据,测试33%的数据,准确度达到了99.14%,高于不使用任何优化时的95.69%:
104 | 
105 | ```
106 | 235/235 [==============================] - 0s - loss: 0.0607 - acc: 0.9830 - val_loss:
107 | 0.0732 - val_acc: 0.9914
108 | Epoch 46/50
109 | 235/235 [==============================] - 0s - loss: 0.0570 - acc: 0.9830 - val_loss:
110 | 0.0867 - val_acc: 0.9914
111 | Epoch 47/50
112 | 235/235 [==============================] - 0s - loss: 0.0584 - acc: 0.9830 - val_loss:
113 | 0.0808 - val_acc: 0.9914
114 | Epoch 48/50
115 | 235/235 [==============================] - 0s - loss: 0.0610 - acc: 0.9872 - val_loss:
116 | 0.0653 - val_acc: 0.9828
117 | Epoch 49/50
118 | 235/235 [==============================] - 0s - loss: 0.0591 - acc: 0.9830 - val_loss:
119 | 0.0821 - val_acc: 0.9914
120 | Epoch 50/50
121 | 235/235 [==============================] - 0s - loss: 0.0598 - acc: 0.9872 - val_loss:
122 | 0.0739 - val_acc: 0.9914
123 | ```
124 | 
125 | #### 17.4 基于轮数的学习速度调度
126 | 
127 | 也可以按固定的调度调整学习速度:到某个轮数就换某个速度,每次的速度是上次的一半。例如,初始速度0.1,每10轮降低一半。画图就是:
128 | 
129 | ![17.3 轮数](https://i.imgur.com/xKKILzV.png)
130 | 
131 | Keras的```LearningRateScheduler```作为回调传入后可以控制学习速度:它接收当前的轮数,返回应有的速度。还是刚才的网络,加入一个```step_decay```函数,按下面的公式生成学习速度:
132 | 
133 | ```python
134 | LearnRate = InitialLearnRate x DropRate ^ floor((1 + Epoch) / EpochsDrop)
135 | ```
136 | 
137 | InitialLearnRate是初始速度,DropRate是每次降低的比例,EpochsDrop是每隔多少轮降低一次:
138 | 
139 | ```python
140 | import pandas
141 | import numpy
142 | import math
143 | from keras.models import Sequential
144 | from keras.layers import Dense
145 | from keras.optimizers import SGD
146 | from sklearn.preprocessing import LabelEncoder
147 | from keras.callbacks import LearningRateScheduler
148 | # learning rate schedule
149 | def step_decay(epoch):
150 |     initial_lrate = 0.1
151 |     drop = 0.5
152 |     epochs_drop = 10.0
153 |     lrate = initial_lrate * math.pow(drop, math.floor((1+epoch)/epochs_drop))
154 |     return lrate
155 | # fix random seed for reproducibility
156 | seed = 7
157 | numpy.random.seed(seed)
158 | # load dataset
159 | dataframe = pandas.read_csv("ionosphere.csv", header=None)
160 | dataset = dataframe.values
161 | # split into input (X) and output (Y) variables
162 | X = dataset[:,0:34].astype(float)
163 | Y = dataset[:,34]
164 | # encode class values as integers
165 | encoder = LabelEncoder()
166 | encoder.fit(Y)
167 | Y = encoder.transform(Y)
168 | # create model
169 | model = Sequential()
170 | model.add(Dense(34, input_dim=34, init='normal', activation='relu'))
171 | model.add(Dense(1, init='normal', activation='sigmoid'))
172 | # Compile model
173 | sgd = SGD(lr=0.0, momentum=0.9, decay=0.0, nesterov=False)
174 | model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
175 | # learning schedule callback
176 | lrate = LearningRateScheduler(step_decay)
177 | callbacks_list = [lrate]
178 | # Fit the model
179 | model.fit(X, Y, validation_split=0.33, nb_epoch=50, batch_size=28, callbacks=callbacks_list)
180 | ```
181 | 
182 | 效果也是99.14%,比什么都不做好:
183 | 
184 | ```python
185 | Epoch 45/50
186 | 235/235 [==============================] - 0s - loss: 0.0546 - acc: 0.9830 - val_loss:
187 | 0.0705 - val_acc: 0.9914
188 | Epoch 46/50
189 | 235/235 [==============================] - 0s - loss: 0.0542 - acc: 0.9830 - val_loss:
190 | 0.0676 - val_acc: 0.9914
191 | Epoch 47/50
192 | 235/235 [==============================] - 0s - loss: 0.0538 - acc: 0.9830 - val_loss:
193 | 0.0668 - val_acc: 0.9914
194 | Epoch 48/50
195 | 235/235 [==============================] - 0s - loss: 0.0539 - acc: 0.9830 - val_loss:
196 | 0.0708 - val_acc: 0.9914
197 | Epoch 49/50
198 | 235/235 [==============================] - 0s - loss: 0.0539 - acc: 0.9830 - val_loss:
199 | 0.0674 - val_acc: 0.9914
200 | Epoch 50/50
201 | 235/235 [==============================] - 0s - loss: 0.0531 - acc: 0.9830 - val_loss:
202 | 0.0694 - val_acc: 0.9914
203 | ```
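204 | 
205 | 想确认这个调度生成的数值,可以把上面定义的```step_decay```单独拿出来打印几轮(示意):
206 | 
207 | ```python
208 | # 示意:打印前30轮里step_decay给出的学习速度
209 | for epoch in range(0, 30, 5):
210 |     print("epoch=%d, lr=%.4f" % (epoch, step_decay(epoch)))
211 | ```
212 | 
213 | 可以看到速度从0.1开始,大约每10轮减半:0.1、0.05、0.025。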
214 | 
215 | #### 17.5 调整学习速度的技巧
216 | 
217 | 这些技巧可以帮助调参:
218 | 
219 | 1. 增加初始学习速度。因为速度后面会降低,一开始速度大一点可以加速收敛。
220 | 2. 动量要大。这样后期学习速度下降时,如果方向一致,还可以继续收敛。
221 | 3. 多试验。这个问题没有定论,需要多尝试:也试试指数下降,以及什么都不做。
222 | 
223 | #### 17.6 总结
224 | 
225 | 本章关于调整学习速度。总结一下:
226 | 
227 | - 调整学习速度为什么有效
228 | - 如何在Keras中使用基于时间的学习速度调度
229 | - 如何自己编写学习速度调度函数
230 | 
231 | ##### 17.6.1 下一章
232 | 
233 | 第四部分到此结束,内容包括Keras的一些高级特性和调参的高级方法。下一章研究卷积神经网络(CNN),它在图像和自然语言处理问题上尤为有效。
234 | 
235 | 
--------------------------------------------------------------------------------
/4-advanced-multi-layer-perceptrons-and-keras/part4.md:
--------------------------------------------------------------------------------
1 | ## 第四部分 Keras与高级多层感知器
--------------------------------------------------------------------------------
/5-convolutional-neural-networks/part5.md:
--------------------------------------------------------------------------------
1 | ## 第五部分 卷积神经网络
2 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 | 
4 | Copyright (C) 2007 Free Software Foundation, Inc.
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 | 
8 | Preamble
9 | 
10 | The GNU General Public License is a free, copyleft license for
11 | software and other kinds of works.
12 | 
13 | The licenses for most software and other practical works are designed
14 | to take away your freedom to share and change the works. By contrast,
15 | the GNU General Public License is intended to guarantee your freedom to
16 | share and change all versions of a program--to make sure it remains free
17 | software for all its users. We, the Free Software Foundation, use the
18 | GNU General Public License for most of our software; it applies also to
19 | any other work released this way by its authors. You can apply it to
20 | your programs, too.
21 | 
22 | When we speak of free software, we are referring to freedom, not
23 | price. Our General Public Licenses are designed to make sure that you
24 | have the freedom to distribute copies of free software (and charge for
25 | them if you wish), that you receive source code or can get it if you
26 | want it, that you can change the software or use pieces of it in new
27 | free programs, and that you know you can do these things.
28 | 
29 | To protect your rights, we need to prevent others from denying you
30 | these rights or asking you to surrender the rights. Therefore, you have
31 | certain responsibilities if you distribute copies of the software, or if
32 | you modify it: responsibilities to respect the freedom of others.
33 | 
34 | For example, if you distribute copies of such a program, whether
35 | gratis or for a fee, you must pass on to the recipients the same
36 | freedoms that you received. You must make sure that they, too, receive
37 | or can get the source code. And you must show them these terms so they
38 | know their rights.
39 | 
40 | Developers that use the GNU GPL protect your rights with two steps:
41 | (1) assert copyright on the software, and (2) offer you this License
42 | giving you legal permission to copy, distribute and/or modify it.
43 | 
44 | For the developers' and authors' protection, the GPL clearly explains
45 | that there is no warranty for this free software.
For both users' and 46 | authors' sake, the GPL requires that modified versions be marked as 47 | changed, so that their problems will not be attributed erroneously to 48 | authors of previous versions. 49 | 50 | Some devices are designed to deny users access to install or run 51 | modified versions of the software inside them, although the manufacturer 52 | can do so. This is fundamentally incompatible with the aim of 53 | protecting users' freedom to change the software. The systematic 54 | pattern of such abuse occurs in the area of products for individuals to 55 | use, which is precisely where it is most unacceptable. Therefore, we 56 | have designed this version of the GPL to prohibit the practice for those 57 | products. If such problems arise substantially in other domains, we 58 | stand ready to extend this provision to those domains in future versions 59 | of the GPL, as needed to protect the freedom of users. 60 | 61 | Finally, every program is threatened constantly by software patents. 62 | States should not allow patents to restrict development and use of 63 | software on general-purpose computers, but in those that do, we wish to 64 | avoid the special danger that patents applied to a free program could 65 | make it effectively proprietary. To prevent this, the GPL assures that 66 | patents cannot be used to render the program non-free. 67 | 68 | The precise terms and conditions for copying, distribution and 69 | modification follow. 70 | 71 | TERMS AND CONDITIONS 72 | 73 | 0. Definitions. 74 | 75 | "This License" refers to version 3 of the GNU General Public License. 76 | 77 | "Copyright" also means copyright-like laws that apply to other kinds of 78 | works, such as semiconductor masks. 79 | 80 | "The Program" refers to any copyrightable work licensed under this 81 | License. Each licensee is addressed as "you". "Licensees" and 82 | "recipients" may be individuals or organizations. 83 | 84 | To "modify" a work means to copy from or adapt all or part of the work 85 | in a fashion requiring copyright permission, other than the making of an 86 | exact copy. The resulting work is called a "modified version" of the 87 | earlier work or a work "based on" the earlier work. 88 | 89 | A "covered work" means either the unmodified Program or a work based 90 | on the Program. 91 | 92 | To "propagate" a work means to do anything with it that, without 93 | permission, would make you directly or secondarily liable for 94 | infringement under applicable copyright law, except executing it on a 95 | computer or modifying a private copy. Propagation includes copying, 96 | distribution (with or without modification), making available to the 97 | public, and in some countries other activities as well. 98 | 99 | To "convey" a work means any kind of propagation that enables other 100 | parties to make or receive copies. Mere interaction with a user through 101 | a computer network, with no transfer of a copy, is not conveying. 102 | 103 | An interactive user interface displays "Appropriate Legal Notices" 104 | to the extent that it includes a convenient and prominently visible 105 | feature that (1) displays an appropriate copyright notice, and (2) 106 | tells the user that there is no warranty for the work (except to the 107 | extent that warranties are provided), that licensees may convey the 108 | work under this License, and how to view a copy of this License. 
If 109 | the interface presents a list of user commands or options, such as a 110 | menu, a prominent item in the list meets this criterion. 111 | 112 | 1. Source Code. 113 | 114 | The "source code" for a work means the preferred form of the work 115 | for making modifications to it. "Object code" means any non-source 116 | form of a work. 117 | 118 | A "Standard Interface" means an interface that either is an official 119 | standard defined by a recognized standards body, or, in the case of 120 | interfaces specified for a particular programming language, one that 121 | is widely used among developers working in that language. 122 | 123 | The "System Libraries" of an executable work include anything, other 124 | than the work as a whole, that (a) is included in the normal form of 125 | packaging a Major Component, but which is not part of that Major 126 | Component, and (b) serves only to enable use of the work with that 127 | Major Component, or to implement a Standard Interface for which an 128 | implementation is available to the public in source code form. A 129 | "Major Component", in this context, means a major essential component 130 | (kernel, window system, and so on) of the specific operating system 131 | (if any) on which the executable work runs, or a compiler used to 132 | produce the work, or an object code interpreter used to run it. 133 | 134 | The "Corresponding Source" for a work in object code form means all 135 | the source code needed to generate, install, and (for an executable 136 | work) run the object code and to modify the work, including scripts to 137 | control those activities. However, it does not include the work's 138 | System Libraries, or general-purpose tools or generally available free 139 | programs which are used unmodified in performing those activities but 140 | which are not part of the work. For example, Corresponding Source 141 | includes interface definition files associated with source files for 142 | the work, and the source code for shared libraries and dynamically 143 | linked subprograms that the work is specifically designed to require, 144 | such as by intimate data communication or control flow between those 145 | subprograms and other parts of the work. 146 | 147 | The Corresponding Source need not include anything that users 148 | can regenerate automatically from other parts of the Corresponding 149 | Source. 150 | 151 | The Corresponding Source for a work in source code form is that 152 | same work. 153 | 154 | 2. Basic Permissions. 155 | 156 | All rights granted under this License are granted for the term of 157 | copyright on the Program, and are irrevocable provided the stated 158 | conditions are met. This License explicitly affirms your unlimited 159 | permission to run the unmodified Program. The output from running a 160 | covered work is covered by this License only if the output, given its 161 | content, constitutes a covered work. This License acknowledges your 162 | rights of fair use or other equivalent, as provided by copyright law. 163 | 164 | You may make, run and propagate covered works that you do not 165 | convey, without conditions so long as your license otherwise remains 166 | in force. You may convey covered works to others for the sole purpose 167 | of having them make modifications exclusively for you, or provide you 168 | with facilities for running those works, provided that you comply with 169 | the terms of this License in conveying all material for which you do 170 | not control copyright. 
Those thus making or running the covered works 171 | for you must do so exclusively on your behalf, under your direction 172 | and control, on terms that prohibit them from making any copies of 173 | your copyrighted material outside their relationship with you. 174 | 175 | Conveying under any other circumstances is permitted solely under 176 | the conditions stated below. Sublicensing is not allowed; section 10 177 | makes it unnecessary. 178 | 179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 180 | 181 | No covered work shall be deemed part of an effective technological 182 | measure under any applicable law fulfilling obligations under article 183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 184 | similar laws prohibiting or restricting circumvention of such 185 | measures. 186 | 187 | When you convey a covered work, you waive any legal power to forbid 188 | circumvention of technological measures to the extent such circumvention 189 | is effected by exercising rights under this License with respect to 190 | the covered work, and you disclaim any intention to limit operation or 191 | modification of the work as a means of enforcing, against the work's 192 | users, your or third parties' legal rights to forbid circumvention of 193 | technological measures. 194 | 195 | 4. Conveying Verbatim Copies. 196 | 197 | You may convey verbatim copies of the Program's source code as you 198 | receive it, in any medium, provided that you conspicuously and 199 | appropriately publish on each copy an appropriate copyright notice; 200 | keep intact all notices stating that this License and any 201 | non-permissive terms added in accord with section 7 apply to the code; 202 | keep intact all notices of the absence of any warranty; and give all 203 | recipients a copy of this License along with the Program. 204 | 205 | You may charge any price or no price for each copy that you convey, 206 | and you may offer support or warranty protection for a fee. 207 | 208 | 5. Conveying Modified Source Versions. 209 | 210 | You may convey a work based on the Program, or the modifications to 211 | produce it from the Program, in the form of source code under the 212 | terms of section 4, provided that you also meet all of these conditions: 213 | 214 | a) The work must carry prominent notices stating that you modified 215 | it, and giving a relevant date. 216 | 217 | b) The work must carry prominent notices stating that it is 218 | released under this License and any conditions added under section 219 | 7. This requirement modifies the requirement in section 4 to 220 | "keep intact all notices". 221 | 222 | c) You must license the entire work, as a whole, under this 223 | License to anyone who comes into possession of a copy. This 224 | License will therefore apply, along with any applicable section 7 225 | additional terms, to the whole of the work, and all its parts, 226 | regardless of how they are packaged. This License gives no 227 | permission to license the work in any other way, but it does not 228 | invalidate such permission if you have separately received it. 229 | 230 | d) If the work has interactive user interfaces, each must display 231 | Appropriate Legal Notices; however, if the Program has interactive 232 | interfaces that do not display Appropriate Legal Notices, your 233 | work need not make them do so. 
234 | 235 | A compilation of a covered work with other separate and independent 236 | works, which are not by their nature extensions of the covered work, 237 | and which are not combined with it such as to form a larger program, 238 | in or on a volume of a storage or distribution medium, is called an 239 | "aggregate" if the compilation and its resulting copyright are not 240 | used to limit the access or legal rights of the compilation's users 241 | beyond what the individual works permit. Inclusion of a covered work 242 | in an aggregate does not cause this License to apply to the other 243 | parts of the aggregate. 244 | 245 | 6. Conveying Non-Source Forms. 246 | 247 | You may convey a covered work in object code form under the terms 248 | of sections 4 and 5, provided that you also convey the 249 | machine-readable Corresponding Source under the terms of this License, 250 | in one of these ways: 251 | 252 | a) Convey the object code in, or embodied in, a physical product 253 | (including a physical distribution medium), accompanied by the 254 | Corresponding Source fixed on a durable physical medium 255 | customarily used for software interchange. 256 | 257 | b) Convey the object code in, or embodied in, a physical product 258 | (including a physical distribution medium), accompanied by a 259 | written offer, valid for at least three years and valid for as 260 | long as you offer spare parts or customer support for that product 261 | model, to give anyone who possesses the object code either (1) a 262 | copy of the Corresponding Source for all the software in the 263 | product that is covered by this License, on a durable physical 264 | medium customarily used for software interchange, for a price no 265 | more than your reasonable cost of physically performing this 266 | conveying of source, or (2) access to copy the 267 | Corresponding Source from a network server at no charge. 268 | 269 | c) Convey individual copies of the object code with a copy of the 270 | written offer to provide the Corresponding Source. This 271 | alternative is allowed only occasionally and noncommercially, and 272 | only if you received the object code with such an offer, in accord 273 | with subsection 6b. 274 | 275 | d) Convey the object code by offering access from a designated 276 | place (gratis or for a charge), and offer equivalent access to the 277 | Corresponding Source in the same way through the same place at no 278 | further charge. You need not require recipients to copy the 279 | Corresponding Source along with the object code. If the place to 280 | copy the object code is a network server, the Corresponding Source 281 | may be on a different server (operated by you or a third party) 282 | that supports equivalent copying facilities, provided you maintain 283 | clear directions next to the object code saying where to find the 284 | Corresponding Source. Regardless of what server hosts the 285 | Corresponding Source, you remain obligated to ensure that it is 286 | available for as long as needed to satisfy these requirements. 287 | 288 | e) Convey the object code using peer-to-peer transmission, provided 289 | you inform other peers where the object code and Corresponding 290 | Source of the work are being offered to the general public at no 291 | charge under subsection 6d. 292 | 293 | A separable portion of the object code, whose source code is excluded 294 | from the Corresponding Source as a System Library, need not be 295 | included in conveying the object code work. 
296 | 297 | A "User Product" is either (1) a "consumer product", which means any 298 | tangible personal property which is normally used for personal, family, 299 | or household purposes, or (2) anything designed or sold for incorporation 300 | into a dwelling. In determining whether a product is a consumer product, 301 | doubtful cases shall be resolved in favor of coverage. For a particular 302 | product received by a particular user, "normally used" refers to a 303 | typical or common use of that class of product, regardless of the status 304 | of the particular user or of the way in which the particular user 305 | actually uses, or expects or is expected to use, the product. A product 306 | is a consumer product regardless of whether the product has substantial 307 | commercial, industrial or non-consumer uses, unless such uses represent 308 | the only significant mode of use of the product. 309 | 310 | "Installation Information" for a User Product means any methods, 311 | procedures, authorization keys, or other information required to install 312 | and execute modified versions of a covered work in that User Product from 313 | a modified version of its Corresponding Source. The information must 314 | suffice to ensure that the continued functioning of the modified object 315 | code is in no case prevented or interfered with solely because 316 | modification has been made. 317 | 318 | If you convey an object code work under this section in, or with, or 319 | specifically for use in, a User Product, and the conveying occurs as 320 | part of a transaction in which the right of possession and use of the 321 | User Product is transferred to the recipient in perpetuity or for a 322 | fixed term (regardless of how the transaction is characterized), the 323 | Corresponding Source conveyed under this section must be accompanied 324 | by the Installation Information. But this requirement does not apply 325 | if neither you nor any third party retains the ability to install 326 | modified object code on the User Product (for example, the work has 327 | been installed in ROM). 328 | 329 | The requirement to provide Installation Information does not include a 330 | requirement to continue to provide support service, warranty, or updates 331 | for a work that has been modified or installed by the recipient, or for 332 | the User Product in which it has been modified or installed. Access to a 333 | network may be denied when the modification itself materially and 334 | adversely affects the operation of the network or violates the rules and 335 | protocols for communication across the network. 336 | 337 | Corresponding Source conveyed, and Installation Information provided, 338 | in accord with this section must be in a format that is publicly 339 | documented (and with an implementation available to the public in 340 | source code form), and must require no special password or key for 341 | unpacking, reading or copying. 342 | 343 | 7. Additional Terms. 344 | 345 | "Additional permissions" are terms that supplement the terms of this 346 | License by making exceptions from one or more of its conditions. 347 | Additional permissions that are applicable to the entire Program shall 348 | be treated as though they were included in this License, to the extent 349 | that they are valid under applicable law. 
If additional permissions 350 | apply only to part of the Program, that part may be used separately 351 | under those permissions, but the entire Program remains governed by 352 | this License without regard to the additional permissions. 353 | 354 | When you convey a copy of a covered work, you may at your option 355 | remove any additional permissions from that copy, or from any part of 356 | it. (Additional permissions may be written to require their own 357 | removal in certain cases when you modify the work.) You may place 358 | additional permissions on material, added by you to a covered work, 359 | for which you have or can give appropriate copyright permission. 360 | 361 | Notwithstanding any other provision of this License, for material you 362 | add to a covered work, you may (if authorized by the copyright holders of 363 | that material) supplement the terms of this License with terms: 364 | 365 | a) Disclaiming warranty or limiting liability differently from the 366 | terms of sections 15 and 16 of this License; or 367 | 368 | b) Requiring preservation of specified reasonable legal notices or 369 | author attributions in that material or in the Appropriate Legal 370 | Notices displayed by works containing it; or 371 | 372 | c) Prohibiting misrepresentation of the origin of that material, or 373 | requiring that modified versions of such material be marked in 374 | reasonable ways as different from the original version; or 375 | 376 | d) Limiting the use for publicity purposes of names of licensors or 377 | authors of the material; or 378 | 379 | e) Declining to grant rights under trademark law for use of some 380 | trade names, trademarks, or service marks; or 381 | 382 | f) Requiring indemnification of licensors and authors of that 383 | material by anyone who conveys the material (or modified versions of 384 | it) with contractual assumptions of liability to the recipient, for 385 | any liability that these contractual assumptions directly impose on 386 | those licensors and authors. 387 | 388 | All other non-permissive additional terms are considered "further 389 | restrictions" within the meaning of section 10. If the Program as you 390 | received it, or any part of it, contains a notice stating that it is 391 | governed by this License along with a term that is a further 392 | restriction, you may remove that term. If a license document contains 393 | a further restriction but permits relicensing or conveying under this 394 | License, you may add to a covered work material governed by the terms 395 | of that license document, provided that the further restriction does 396 | not survive such relicensing or conveying. 397 | 398 | If you add terms to a covered work in accord with this section, you 399 | must place, in the relevant source files, a statement of the 400 | additional terms that apply to those files, or a notice indicating 401 | where to find the applicable terms. 402 | 403 | Additional terms, permissive or non-permissive, may be stated in the 404 | form of a separately written license, or stated as exceptions; 405 | the above requirements apply either way. 406 | 407 | 8. Termination. 408 | 409 | You may not propagate or modify a covered work except as expressly 410 | provided under this License. Any attempt otherwise to propagate or 411 | modify it is void, and will automatically terminate your rights under 412 | this License (including any patent licenses granted under the third 413 | paragraph of section 11). 
414 | 415 | However, if you cease all violation of this License, then your 416 | license from a particular copyright holder is reinstated (a) 417 | provisionally, unless and until the copyright holder explicitly and 418 | finally terminates your license, and (b) permanently, if the copyright 419 | holder fails to notify you of the violation by some reasonable means 420 | prior to 60 days after the cessation. 421 | 422 | Moreover, your license from a particular copyright holder is 423 | reinstated permanently if the copyright holder notifies you of the 424 | violation by some reasonable means, this is the first time you have 425 | received notice of violation of this License (for any work) from that 426 | copyright holder, and you cure the violation prior to 30 days after 427 | your receipt of the notice. 428 | 429 | Termination of your rights under this section does not terminate the 430 | licenses of parties who have received copies or rights from you under 431 | this License. If your rights have been terminated and not permanently 432 | reinstated, you do not qualify to receive new licenses for the same 433 | material under section 10. 434 | 435 | 9. Acceptance Not Required for Having Copies. 436 | 437 | You are not required to accept this License in order to receive or 438 | run a copy of the Program. Ancillary propagation of a covered work 439 | occurring solely as a consequence of using peer-to-peer transmission 440 | to receive a copy likewise does not require acceptance. However, 441 | nothing other than this License grants you permission to propagate or 442 | modify any covered work. These actions infringe copyright if you do 443 | not accept this License. Therefore, by modifying or propagating a 444 | covered work, you indicate your acceptance of this License to do so. 445 | 446 | 10. Automatic Licensing of Downstream Recipients. 447 | 448 | Each time you convey a covered work, the recipient automatically 449 | receives a license from the original licensors, to run, modify and 450 | propagate that work, subject to this License. You are not responsible 451 | for enforcing compliance by third parties with this License. 452 | 453 | An "entity transaction" is a transaction transferring control of an 454 | organization, or substantially all assets of one, or subdividing an 455 | organization, or merging organizations. If propagation of a covered 456 | work results from an entity transaction, each party to that 457 | transaction who receives a copy of the work also receives whatever 458 | licenses to the work the party's predecessor in interest had or could 459 | give under the previous paragraph, plus a right to possession of the 460 | Corresponding Source of the work from the predecessor in interest, if 461 | the predecessor has it or can get it with reasonable efforts. 462 | 463 | You may not impose any further restrictions on the exercise of the 464 | rights granted or affirmed under this License. For example, you may 465 | not impose a license fee, royalty, or other charge for exercise of 466 | rights granted under this License, and you may not initiate litigation 467 | (including a cross-claim or counterclaim in a lawsuit) alleging that 468 | any patent claim is infringed by making, using, selling, offering for 469 | sale, or importing the Program or any portion of it. 470 | 471 | 11. Patents. 472 | 473 | A "contributor" is a copyright holder who authorizes use under this 474 | License of the Program or a work on which the Program is based. 
The 475 | work thus licensed is called the contributor's "contributor version". 476 | 477 | A contributor's "essential patent claims" are all patent claims 478 | owned or controlled by the contributor, whether already acquired or 479 | hereafter acquired, that would be infringed by some manner, permitted 480 | by this License, of making, using, or selling its contributor version, 481 | but do not include claims that would be infringed only as a 482 | consequence of further modification of the contributor version. For 483 | purposes of this definition, "control" includes the right to grant 484 | patent sublicenses in a manner consistent with the requirements of 485 | this License. 486 | 487 | Each contributor grants you a non-exclusive, worldwide, royalty-free 488 | patent license under the contributor's essential patent claims, to 489 | make, use, sell, offer for sale, import and otherwise run, modify and 490 | propagate the contents of its contributor version. 491 | 492 | In the following three paragraphs, a "patent license" is any express 493 | agreement or commitment, however denominated, not to enforce a patent 494 | (such as an express permission to practice a patent or covenant not to 495 | sue for patent infringement). To "grant" such a patent license to a 496 | party means to make such an agreement or commitment not to enforce a 497 | patent against the party. 498 | 499 | If you convey a covered work, knowingly relying on a patent license, 500 | and the Corresponding Source of the work is not available for anyone 501 | to copy, free of charge and under the terms of this License, through a 502 | publicly available network server or other readily accessible means, 503 | then you must either (1) cause the Corresponding Source to be so 504 | available, or (2) arrange to deprive yourself of the benefit of the 505 | patent license for this particular work, or (3) arrange, in a manner 506 | consistent with the requirements of this License, to extend the patent 507 | license to downstream recipients. "Knowingly relying" means you have 508 | actual knowledge that, but for the patent license, your conveying the 509 | covered work in a country, or your recipient's use of the covered work 510 | in a country, would infringe one or more identifiable patents in that 511 | country that you have reason to believe are valid. 512 | 513 | If, pursuant to or in connection with a single transaction or 514 | arrangement, you convey, or propagate by procuring conveyance of, a 515 | covered work, and grant a patent license to some of the parties 516 | receiving the covered work authorizing them to use, propagate, modify 517 | or convey a specific copy of the covered work, then the patent license 518 | you grant is automatically extended to all recipients of the covered 519 | work and works based on it. 520 | 521 | A patent license is "discriminatory" if it does not include within 522 | the scope of its coverage, prohibits the exercise of, or is 523 | conditioned on the non-exercise of one or more of the rights that are 524 | specifically granted under this License. 
You may not convey a covered 525 | work if you are a party to an arrangement with a third party that is 526 | in the business of distributing software, under which you make payment 527 | to the third party based on the extent of your activity of conveying 528 | the work, and under which the third party grants, to any of the 529 | parties who would receive the covered work from you, a discriminatory 530 | patent license (a) in connection with copies of the covered work 531 | conveyed by you (or copies made from those copies), or (b) primarily 532 | for and in connection with specific products or compilations that 533 | contain the covered work, unless you entered into that arrangement, 534 | or that patent license was granted, prior to 28 March 2007. 535 | 536 | Nothing in this License shall be construed as excluding or limiting 537 | any implied license or other defenses to infringement that may 538 | otherwise be available to you under applicable patent law. 539 | 540 | 12. No Surrender of Others' Freedom. 541 | 542 | If conditions are imposed on you (whether by court order, agreement or 543 | otherwise) that contradict the conditions of this License, they do not 544 | excuse you from the conditions of this License. If you cannot convey a 545 | covered work so as to satisfy simultaneously your obligations under this 546 | License and any other pertinent obligations, then as a consequence you may 547 | not convey it at all. For example, if you agree to terms that obligate you 548 | to collect a royalty for further conveying from those to whom you convey 549 | the Program, the only way you could satisfy both those terms and this 550 | License would be to refrain entirely from conveying the Program. 551 | 552 | 13. Use with the GNU Affero General Public License. 553 | 554 | Notwithstanding any other provision of this License, you have 555 | permission to link or combine any covered work with a work licensed 556 | under version 3 of the GNU Affero General Public License into a single 557 | combined work, and to convey the resulting work. The terms of this 558 | License will continue to apply to the part which is the covered work, 559 | but the special requirements of the GNU Affero General Public License, 560 | section 13, concerning interaction through a network will apply to the 561 | combination as such. 562 | 563 | 14. Revised Versions of this License. 564 | 565 | The Free Software Foundation may publish revised and/or new versions of 566 | the GNU General Public License from time to time. Such new versions will 567 | be similar in spirit to the present version, but may differ in detail to 568 | address new problems or concerns. 569 | 570 | Each version is given a distinguishing version number. If the 571 | Program specifies that a certain numbered version of the GNU General 572 | Public License "or any later version" applies to it, you have the 573 | option of following the terms and conditions either of that numbered 574 | version or of any later version published by the Free Software 575 | Foundation. If the Program does not specify a version number of the 576 | GNU General Public License, you may choose any version ever published 577 | by the Free Software Foundation. 578 | 579 | If the Program specifies that a proxy can decide which future 580 | versions of the GNU General Public License can be used, that proxy's 581 | public statement of acceptance of a version permanently authorizes you 582 | to choose that version for the Program. 
583 | 584 | Later license versions may give you additional or different 585 | permissions. However, no additional obligations are imposed on any 586 | author or copyright holder as a result of your choosing to follow a 587 | later version. 588 | 589 | 15. Disclaimer of Warranty. 590 | 591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 599 | 600 | 16. Limitation of Liability. 601 | 602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 610 | SUCH DAMAGES. 611 | 612 | 17. Interpretation of Sections 15 and 16. 613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | 623 | How to Apply These Terms to Your New Programs 624 | 625 | If you develop a new program, and you want it to be of the greatest 626 | possible use to the public, the best way to achieve this is to make it 627 | free software which everyone can redistribute and change under these terms. 628 | 629 | To do so, attach the following notices to the program. It is safest 630 | to attach them to the start of each source file to most effectively 631 | state the exclusion of warranty; and each file should have at least 632 | the "copyright" line and a pointer to where the full notice is found. 633 | 634 | <one line to give the program's name and a brief idea of what it does.> 635 | Copyright (C) <year> <name of author> 636 | 637 | This program is free software: you can redistribute it and/or modify 638 | it under the terms of the GNU General Public License as published by 639 | the Free Software Foundation, either version 3 of the License, or 640 | (at your option) any later version. 641 | 642 | This program is distributed in the hope that it will be useful, 643 | but WITHOUT ANY WARRANTY; without even the implied warranty of 644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 645 | GNU General Public License for more details. 646 | 647 | You should have received a copy of the GNU General Public License 648 | along with this program. If not, see <https://www.gnu.org/licenses/>. 649 | 650 | Also add information on how to contact you by electronic and paper mail.
651 | 652 | If the program does terminal interaction, make it output a short 653 | notice like this when it starts in an interactive mode: 654 | 655 | <program> Copyright (C) <year> <name of author> 656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 657 | This is free software, and you are welcome to redistribute it 658 | under certain conditions; type `show c' for details. 659 | 660 | The hypothetical commands `show w' and `show c' should show the appropriate 661 | parts of the General Public License. Of course, your program's commands 662 | might be different; for a GUI interface, you would use an "about box". 663 | 664 | You should also get your employer (if you work as a programmer) or school, 665 | if any, to sign a "copyright disclaimer" for the program, if necessary. 666 | For more information on this, and how to apply and follow the GNU GPL, see 667 | <https://www.gnu.org/licenses/>. 668 | 669 | The GNU General Public License does not permit incorporating your program 670 | into proprietary programs. If your program is a subroutine library, you 671 | may consider it more useful to permit linking proprietary applications with 672 | the library. If this is what you want to do, use the GNU Lesser General 673 | Public License instead of this License. But first, please read 674 | <https://www.gnu.org/philosophy/why-not-lgpl.html>. 675 | -------------------------------------------------------------------------------- /Preface-translator.md: -------------------------------------------------------------------------------- 1 | ## Translator's Note 2 | 3 | It is an honor to translate this book. 4 | 5 | I read and translated the book in one go, keeping the translation as concise as I could, even at the expense of my own royalties (if there ever are any). 6 | 7 | The style of this book resembles [fast.ai](http://www.fast.ai/). fast.ai, however, does not use Keras (otherwise this translation would not exist); if you want to follow it with Keras, [sample code is available here](https://github.com/roebius/deeplearning_keras2). 8 | 9 | If you have no Python background, please read [Liao Xuefeng's Python tutorial](https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000). 10 | 11 | The Theano chapters were translated less carefully, because [Theano's end of life has already been announced](https://groups.google.com/forum/#!msg/theano-users/7Poq8BZutbY/rNCIfvAEAwAJ): please use TensorFlow as your backend (a minimal configuration sketch follows at the end of this note). 12 | 13 | I hope this book can fill the gap in Keras material. 14 | 15 | This translation benefited from many corrections; without them the book would be far less accurate. If I have forgotten your name, please contact me at i@cnbeining.com. My thanks to everyone below. 16 | 17 | (in no particular order) 18 | 19 | msg7086, TsukaTsuki, CAMOE 20 | 21 | Beining 22 |
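If you are unsure which backend your Keras installation is actually using, here is a minimal sketch of how to check and how to switch to TensorFlow. It assumes only the standard Keras configuration mechanism, nothing specific to this book: the `backend` field of `~/.keras/keras.json`, which the `KERAS_BACKEND` environment variable overrides.

```python
import json
import os

# Inspect the Keras config file, if present; its "backend" field
# selects the numerical backend ("tensorflow" or "theano").
config_path = os.path.expanduser("~/.keras/keras.json")
if os.path.exists(config_path):
    with open(config_path) as f:
        print(json.load(f))

# Setting KERAS_BACKEND before Keras is first imported overrides the file.
os.environ["KERAS_BACKEND"] = "tensorflow"

from keras import backend as K  # imported after the env var on purpose
print(K.backend())  # should print "tensorflow"
```

If `~/.keras/keras.json` does not exist yet, importing Keras once creates it with default values; editing its `backend` field to `"tensorflow"` makes the switch permanent.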
-------------------------------------------------------------------------------- /Preface.md: -------------------------------------------------------------------------------- 1 | ## Preface 2 | 3 | Deep learning has always been fascinating: neural networks are nothing new, but only recently have they begun to advance rapidly. Thanks to improvements in hardware, techniques, and open-source software, building huge neural networks has never been easier. 4 | 5 | As neural networks grow larger and deeper, machine learning can solve many more problems. My own observation in recent years is that neural networks achieve stunning results in many areas, including but not limited to object recognition, speech recognition, classification, and machine translation. 6 | 7 | So the question arises: where should you start with deep learning? Since there was no good Python-based introduction to machine learning, I wrote this book. 8 | 9 | This book uses Keras: Keras is simple and clear, wraps away the low-level details, and exposes only the most important APIs, so you can start developing neural networks very quickly. 10 | 11 | This is the introductory book I wish I had when I started. Happy tuning! 12 | 13 | Jason Brownlee 14 | 15 | Melbourne, Australia 16 | 17 | 2016 -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Deep Learning: A Python Tutorial (Deep Learning With Python) 2 | 3 | ## Deep Learning With Python: Develop Deep Learning Models on Theano and TensorFlow Using Keras 4 | 5 | ## Develop Deep Learning Models with Keras, Python, Theano, and TensorFlow 6 | 7 | Original book website: https://machinelearningmastery.com/deep-learning-with-python/ 8 | 9 | Author: Jason Brownlee 10 | -------------------------------------------------------------------------------- /SUMMARY.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | * [Cover](README.md) 4 | * [Preface](Preface.md) 5 | * [Translator's Note](Preface-translator.md) 6 | * [I Introduction](1-introduction/part1.md) 7 | * [Chapter 1 Getting Started with Deep Learning](1-introduction/ch1-welcome.md) 8 | * [II Background](2-background/part2.md) 9 | * [Chapter 2 Introduction to Theano](2-background/ch2-introduction-to-theano.md) 10 | * [Chapter 3 Introduction to TensorFlow](2-background/ch3-introduction-to-tensorflow.md) 11 | * [Chapter 4 Introduction to Keras](2-background/ch4-introduction-to-keras.md) 12 | * [Chapter 5 Project: Set Up a Machine Learning Environment in the Cloud](2-background/ch5-project-develop-large-models-on-gpus-cheaply-in-the-cloud.md) 13 | * [III Multi-Layer Perceptrons](3-multi-layer-perceptrons/part3.md) 14 | * [Chapter 6 Introduction to Multi-Layer Perceptrons](3-multi-layer-perceptrons/ch6-crash-course-in-multi-layer-perceptrons.md) 15 | * [Chapter 7 Develop a Neural Network with Keras](3-multi-layer-perceptrons/ch7-develop-your-first-neural-network-with-keras.md) 16 | * [Chapter 8 Evaluate Neural Networks](3-multi-layer-perceptrons/ch8-evaluate-the-performance-of-deep-learning-models.md) 17 | * [Chapter 9 Use Keras Models with Scikit-Learn](3-multi-layer-perceptrons/ch9-use-keras-models-with-scikit-learn-for-general-machine-learning.md) 18 | * [Chapter 10 Project: Multiclass Classification of Flower Species](3-multi-layer-perceptrons/ch10-project-multiclass-classification-of-flower-species.md) 19 | * [Chapter 11 Project: Classification of Sonar Returns](3-multi-layer-perceptrons/ch11-project-binary-classification-of-sonar-returns.md) 20 | * [Chapter 12 Project: Regression of Boston House Prices](3-multi-layer-perceptrons/ch12-project-regression-of-boston-house-prices.md) 21 | * [IV Advanced Multi-Layer Perceptrons and Keras](4-advanced-multi-layer-perceptrons-and-keras/part4.md) 22 | * [Chapter 13 Save Models with Serialization](4-advanced-multi-layer-perceptrons-and-keras/ch13-save-your-models-for-later-with-serialization.md) 23 | * [Chapter 14 Keep the Best Models with Checkpointing](4-advanced-multi-layer-perceptrons-and-keras/ch14-keep-the-best-models-during-training-with-checkpointing.md) 24 | * [Chapter 15 Visualize Model Training History](4-advanced-multi-layer-perceptrons-and-keras/ch15-understand-model-behavior-during-training-by-plotting-history.md) 25 | * [Chapter 16 Prevent Overfitting with Dropout Regularization](4-advanced-multi-layer-perceptrons-and-keras/ch16-reduce-overfitting-with-dropout-regularization.md) 26 | * [Chapter 17 Learning Rate Schedules](4-advanced-multi-layer-perceptrons-and-keras/ch17-lift-performance-with-learning-rate-schedules.md) 27 | * [V Convolutional Neural Networks](5-convolutional-neural-networks/part5.md) 28 | -------------------------------------------------------------------------------- /book.json: -------------------------------------------------------------------------------- 1 | { 2 | "author": "Beining <i@cnbeining.com>", 3 | "description": "Develop deep learning models with Keras, Python, Theano, and TensorFlow", 4 | "extension": null, 5 | "generator": "site", 6 | "isbn": "", 7 | "links": { 8 | "sharing": { 9 | "all": null, 10 | "facebook": null, 11 | "google": null, 12 | "twitter": null, 13 | "weibo": null 14 | }, 15 | "sidebar": { 16 | "Beining's Blog": "https://www.cnbeining.com/", 17 | "Github": "https://github.com/cnbeining/deep-learning-with-python-cn", 18 | "Discussions":
"https://github.com/cnbeining/deep-learning-with-python-cn/issues" 19 | } 20 | }, 21 | "output": null, 22 | "pdf": { 23 | "fontSize": 12, 24 | "footerTemplate": null, 25 | "headerTemplate": null, 26 | "margin": { 27 | "bottom": 36, 28 | "left": 62, 29 | "right": 62, 30 | "top": 36 31 | }, 32 | "pageNumbers": false, 33 | "paperSize": "a4" 34 | }, 35 | "plugins": [ 36 | "highlight", 37 | "katex" 38 | ], 39 | "title": "深度学习:Python教程", 40 | "variables": {} 41 | } 42 | --------------------------------------------------------------------------------