├── LICENSE
├── Notes_CN
    ├── DRL.pdf
    ├── README.md
    ├── book1.png
    ├── book2.jpg
    └── book3.jpg
├── README.md
└── Slides
    ├── 1_Basics_1.pdf
    ├── 1_Basics_2.pdf
    ├── 1_Basics_3.pdf
    ├── 1_Basics_4.pdf
    ├── 1_Basics_5.pdf
    ├── 2_TD_1.pdf
    ├── 2_TD_2.pdf
    ├── 2_TD_3.pdf
    ├── 3_DQN_1.pdf
    ├── 3_DQN_2.pdf
    ├── 3_DQN_3.pdf
    ├── 4_Policy_1.pdf
    ├── 4_Policy_2.pdf
    ├── 4_Policy_3.pdf
    ├── 4_Policy_4.pdf
    ├── 5_Policy_1.pdf
    ├── 6_Continuous_1.pdf
    ├── 6_Continuous_2.pdf
    ├── 6_Continuous_3.pdf
    ├── 7_MARL_1.pdf
    └── 7_MARL_2.pdf

/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2020 Shusen Wang
2 |
3 | Permission is granted to only nonprofit organizations, including schools and
4 | research institutes. Employees of nonprofit organizations are granted, free
5 | of charge, the rights to use, copy, modify, merge, publish, and distribute
6 | the slides and lecture notes in this repo.
7 |
--------------------------------------------------------------------------------
/Notes_CN/DRL.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Notes_CN/DRL.pdf
--------------------------------------------------------------------------------
/Notes_CN/README.md:
--------------------------------------------------------------------------------
1 | # Deep Reinforcement Learning Book In Chinese
2 |
3 | The Chinese-language book was edited and published by [Posts & Telecom Press](https://www.ituring.com.cn/book/2982); it runs 294 pages and is printed in color. The draft can still be downloaded for free from my GitHub. The officially published book went through repeated revision and proofreading by the author and the editors, and adds a small amount of new content, exercises, and answers to all of the exercises. PyTorch code for the algorithms in this book is available here: [[link]](https://github.com/DeepRLChinese/DeepRL-Chinese).
4 |
5 | JD.com: [https://u.jd.com/eLdsveg](https://u.jd.com/eLdsveg)
6 |
7 | Dangdang: [http://product.dangdang.com/29490069.html](http://product.dangdang.com/29490069.html)
8 |
9 |
10 |
11 | ![book1](book1.png)
12 |
13 |
14 | ![book2](book2.jpg)
15 |
16 |
17 | ![book3](book3.jpg)
18 |
19 |
20 |
--------------------------------------------------------------------------------
/Notes_CN/book1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Notes_CN/book1.png
--------------------------------------------------------------------------------
/Notes_CN/book2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Notes_CN/book2.jpg
--------------------------------------------------------------------------------
/Notes_CN/book3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Notes_CN/book3.jpg
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Deep Reinforcement Learning
2 |
3 |
4 |
5 |
6 | 1. **Overview.**
7 |
8 |
9 |     * Reinforcement Learning
10 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/1_Basics_1.pdf)]
11 |     [[lecture note](https://github.com/wangshusen/DeepLearning/blob/master/LectureNotes/DRL/DRL.pdf)]
12 |     [[Video (in Chinese)](https://youtu.be/vmkRMvhCW5c)].
13 |
14 |     * Value-Based Learning
15 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/1_Basics_2.pdf)]
16 |     [[Video (in Chinese)](https://youtu.be/jflq6vNcZyA)].
17 |
18 |     * Policy-Based Learning
19 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/1_Basics_3.pdf)]
20 |     [[Video (in Chinese)](https://youtu.be/qI0vyfR2_Rc)].
21 |
22 |     * Actor-Critic Methods
23 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/1_Basics_4.pdf)]
24 |     [[Video (in Chinese)](https://youtu.be/xjd7Jq9wPQY)].
25 |
26 |     * AlphaGo
27 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/1_Basics_5.pdf)]
28 |     [[Video (in Chinese)](https://youtu.be/zHojAp5vkRE)].
29 |
30 |
31 |
32 |
33 |
34 | 2. **TD Learning.**
35 |
36 |     * Sarsa
37 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_TD_1.pdf)]
38 |     [[Video (in Chinese)](https://youtu.be/-cYWdUubB6Q)].
39 |
40 |     * Q-learning
41 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_TD_2.pdf)]
42 |     [[Video (in Chinese)](https://youtu.be/Ymy2w3DGn2U)].
43 |
44 |     * Multi-Step TD Target
45 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_TD_3.pdf)]
46 |     [[Video (in Chinese)](https://youtu.be/UqTP138IATc)].
47 |
48 |
49 |
50 |
51 |
52 | 3. **Advanced Topics on Value-Based Learning.**
53 |
54 |
55 |     * Experience Replay (ER) & Prioritized ER
56 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/3_DQN_1.pdf)]
57 |     [[Video (in Chinese)](https://youtu.be/rhslMPmj7SY)].
58 |
59 |     * Overestimation, Target Network, & Double DQN
60 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/3_DQN_2.pdf)]
61 |     [[Video (in Chinese)](https://youtu.be/X2-56QN79zc)].
62 |
63 |     * Dueling Networks
64 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/3_DQN_3.pdf)]
65 |     [[Video (in Chinese)](https://youtu.be/DBux6cA0EoM)].
66 |
67 |
68 |
69 |
70 | 4. **Policy Gradient with Baseline.**
71 |
72 |
73 |     * Policy Gradient with Baseline
74 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_1.pdf)]
75 |     [[Video (in Chinese)](https://youtu.be/yNEqbptitZs)].
76 |
77 |     * REINFORCE with Baseline
78 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_2.pdf)]
79 |     [[Video (in Chinese)](https://youtu.be/Ob78ADXTQNo)].
80 |
81 |     * Advantage Actor-Critic (A2C)
82 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_3.pdf)]
83 |     [[Video (in Chinese)](https://youtu.be/mtT4TSGSon8)].
84 |
85 |     * REINFORCE versus A2C
86 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_4.pdf)]
87 |     [[Video (in Chinese)](https://youtu.be/hN9WMIMMeAI)].
88 |
89 |
90 |
91 | 5. **Advanced Topics on Policy-Based Learning.**
92 |
93 |     * Trust-Region Policy Optimization (TRPO)
94 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/5_Policy_1.pdf)]
95 |     [[Video (in Chinese)](https://youtu.be/fcSYiyvPjm4)].
96 |
97 |     * Partial Observation and RNNs.
98 |
99 |
100 |
101 | 6. **Dealing with Continuous Action Space.**
102 |
103 |
104 |     * Discrete versus Continuous Control
105 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/6_Continuous_1.pdf)]
106 |     [[Video (in Chinese)](https://youtu.be/rRIjgdxSvg8)].
107 |
108 |     * Deterministic Policy Gradient (DPG) for Continuous Control
109 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/6_Continuous_2.pdf)]
110 |     [[Video (in Chinese)](https://youtu.be/cmWejKRWLA8)].
111 |
112 |     * Stochastic Policy Gradient for Continuous Control
113 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/6_Continuous_3.pdf)]
114 |     [[Video (in Chinese)](https://youtu.be/McqFyl_W5Wc)].
115 |
116 |
117 |
118 | 7. **Multi-Agent Reinforcement Learning.**
119 |
120 |
121 |     * Basics and Challenges
122 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/7_MARL_1.pdf)]
123 |     [[Video (in Chinese)](https://youtu.be/KN-XMQFTD0o)].
124 |
125 |     * Centralized VS Decentralized
126 |     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/7_MARL_2.pdf)]
127 |     [[Video (in Chinese)](https://youtu.be/0HV1hsjd1y8)].
128 |
129 |
130 |
131 | 8. **Imitation Learning.**
132 |
133 |
134 |     * Inverse Reinforcement Learning.
135 |
136 |     * Generative Adversarial Imitation Learning (GAIL).
137 |
138 |
139 |
--------------------------------------------------------------------------------
/Slides/1_Basics_1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/1_Basics_1.pdf
--------------------------------------------------------------------------------
/Slides/1_Basics_2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/1_Basics_2.pdf
--------------------------------------------------------------------------------
/Slides/1_Basics_3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/1_Basics_3.pdf
--------------------------------------------------------------------------------
/Slides/1_Basics_4.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/1_Basics_4.pdf
--------------------------------------------------------------------------------
/Slides/1_Basics_5.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/1_Basics_5.pdf
--------------------------------------------------------------------------------
/Slides/2_TD_1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/2_TD_1.pdf
--------------------------------------------------------------------------------
/Slides/2_TD_2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/2_TD_2.pdf
--------------------------------------------------------------------------------
/Slides/2_TD_3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/2_TD_3.pdf
--------------------------------------------------------------------------------
/Slides/3_DQN_1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/3_DQN_1.pdf
--------------------------------------------------------------------------------
/Slides/3_DQN_2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/3_DQN_2.pdf
--------------------------------------------------------------------------------
/Slides/3_DQN_3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/3_DQN_3.pdf
--------------------------------------------------------------------------------
/Slides/4_Policy_1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/4_Policy_1.pdf
--------------------------------------------------------------------------------
/Slides/4_Policy_2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/4_Policy_2.pdf
--------------------------------------------------------------------------------
/Slides/4_Policy_3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/4_Policy_3.pdf
--------------------------------------------------------------------------------
/Slides/4_Policy_4.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/4_Policy_4.pdf
--------------------------------------------------------------------------------
/Slides/5_Policy_1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/5_Policy_1.pdf
--------------------------------------------------------------------------------
/Slides/6_Continuous_1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/6_Continuous_1.pdf
--------------------------------------------------------------------------------
/Slides/6_Continuous_2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/6_Continuous_2.pdf
--------------------------------------------------------------------------------
/Slides/6_Continuous_3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/6_Continuous_3.pdf
--------------------------------------------------------------------------------
/Slides/7_MARL_1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/7_MARL_1.pdf
--------------------------------------------------------------------------------
/Slides/7_MARL_2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangshusen/DRL/d41e637cc5cec13f342e47e93d23e669129189dd/Slides/7_MARL_2.pdf
--------------------------------------------------------------------------------