├── IEEE Transactions on Systems, Man, and Cybernetics- Systems Volume 45 issue 3 2015 [doi 10.1109%2FTSMC.2014.2358639] Liu, Chunming; Xu, Xin; Hu, Dewen -- Multiobjective Reinforcement Learning- A Compr.pdf
├── README.md
└── DeepRL.md

/IEEE Transactions on Systems, Man, and Cybernetics- Systems Volume 45 issue 3 2015 [doi 10.1109%2FTSMC.2014.2358639] Liu, Chunming; Xu, Xin; Hu, Dewen -- Multiobjective Reinforcement Learning- A Compr.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RL-Group/tutorials-and-papers/HEAD/IEEE Transactions on Systems, Man, and Cybernetics- Systems Volume 45 issue 3 2015 [doi 10.1109%2FTSMC.2014.2358639] Liu, Chunming; Xu, Xin; Hu, Dewen -- Multiobjective Reinforcement Learning- A Compr.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# tutorials-and-papers
Collection of tutorials, exercises, and papers on reinforcement learning (RL).

## Study tracks

- [Deep Reinforcement Learning](https://github.com/RL-Group/tutorials-and-papers/blob/master/DeepRL.md)

## Other collections of resources

- [awesome-rl](https://github.com/aikorea/awesome-rl)
- [awesome-deep-reinforcement-learning](https://github.com/williamd4112/awesome-deep-reinforcement-learning)

## Tutorials (theory)

- Nervana's [Demystifying Deep Reinforcement Learning](https://www.intelnervana.com/demystifying-deep-reinforcement-learning/)

## Implementations

- PyTorch
  - [PyTorch, DQN and gym](http://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html#sphx-glr-intermediate-reinforcement-q-learning-py)
- TensorFlow
  - [Minimal and clean examples of reinforcement learning algorithms](https://github.com/rlcode/reinforcement-learning)

## Reviews

- Yuxi Li, [Deep Reinforcement Learning: An Overview](https://arxiv.org/abs/1701.07274) (66 pages)
- Arulkumaran et al., [A Brief Survey of Deep Reinforcement Learning](https://arxiv.org/abs/1708.05866) (14 pages)

## Theses

- David Silver (2009), [Reinforcement Learning and Simulation-Based Search in Computer Go](http://papersdb.cs.ualberta.ca/~papersdb/uploaded_files/1029/paper_thesis.pdf)
- John Schulman (2016), [Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs](http://joschu.net/docs/thesis.pdf)

## Sutton & Barto
Deserves its own section.

- 2nd edition draft: [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) [PDF]
- Solutions:
  - the official solutions manual (no public link collected here yet)
  - [JKCooper2/rlai-exercises](https://github.com/JKCooper2/rlai-exercises)
- Implementations:
  - [Shangtong Zhang (Sutton's student)](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction)
  - [Denny Britz (Google Brain Resident)](https://github.com/dennybritz/reinforcement-learning)

## Books

- RL
  - Marco Wiering and Martijn van Otterlo, [Reinforcement Learning: State-of-the-Art](https://smile.amazon.com/Reinforcement-Learning-State-Art-Optimization/dp/364227644X)
- DL
  - Goodfellow, Bengio, and Courville, [Deep Learning](http://www.deeplearningbook.org/)
- ML
  - Bishop, [Pattern Recognition and Machine Learning](https://smile.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738)

## Libraries

- [Tensorforce](https://github.com/reinforceio/tensorforce): a TensorFlow library for applied reinforcement learning
- [keras-rl](https://github.com/matthiasplappert/keras-rl): deep RL library built on Keras and OpenAI Gym, with several DRL algorithms implemented
--------------------------------------------------------------------------------
/DeepRL.md:
--------------------------------------------------------------------------------
# Deep Reinforcement Learning

## Lecture 1: Markov Decision Processes and Solving Finite Problems

[Video](https://www.youtube.com/watch?v=IL3gVyJMmhg&index=7&list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX) | [Slides](http://rll.berkeley.edu/deeprlcoursesp17/docs/lec1.pdf)

## Lecture 2: Policy Gradient Methods

[Video](https://www.youtube.com/watch?v=BB-BhTn6DCM&index=8&list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX) | [Slides](http://rll.berkeley.edu/deeprlcoursesp17/docs/lec2.pdf)

* Karpathy's Deep RL Tutorial: [Deep Reinforcement Learning: Pong from Pixels](http://karpathy.github.io/2016/05/31/rl/)
* John Schulman's thesis: [Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs](http://joschu.net/docs/thesis.pdf)
* R. J. Williams. [Simple statistical gradient-following algorithms for connectionist reinforcement learning](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf). Machine Learning (1992)
* R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. [Policy gradient methods for reinforcement learning with function approximation](https://web.eecs.umich.edu/~baveja/Papers/PolicyGradientNIPS99.pdf). NIPS. MIT Press, 2000
* Jan Peters and Stefan Schaal. [Policy Gradient Methods for Robotics](http://www-clmc.usc.edu/publications/P/peters-IROS2006.pdf) (2006)
* D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, et al. [Deterministic Policy Gradient Algorithms](http://proceedings.mlr.press/v32/silver14.pdf). ICML, 2014
* N. Heess, G. Wayne, D. Silver, T. Lillicrap, Y. Tassa, et al. [Learning Continuous Control Policies by Stochastic Value Gradients](https://arxiv.org/abs/1510.09142). arXiv preprint arXiv:1510.09142 (2015)
* J. Tang and P. Abbeel. [On a connection between importance sampling and the likelihood ratio policy gradient](http://rll.berkeley.edu/~jietang/pubs/nips10_Tang.pdf). Advances in Neural Information Processing Systems, 2010, pp. 1000–1008
* A3C paper: V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, et al. [Asynchronous methods for deep reinforcement learning](https://arxiv.org/abs/1602.01783) (2016)

## Lecture 3: Q-Function Learning Methods

[Video](https://www.youtube.com/watch?v=Wnl-Qh2UHGg&list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX&index=9) | [Slides](http://rll.berkeley.edu/deeprlcoursesp17/docs/lec3.pdf)

* T. Jaakkola, M. I. Jordan, and S. P. Singh. [On the convergence of stochastic iterative dynamic programming algorithms](https://www.researchgate.net/publication/220499733_On_the_Convergence_of_Stochastic_Iterative_Dynamic_Programming_Algorithms). Neural Computation (1994)
* C. J. Watkins and P. Dayan. [Q-learning](https://link.springer.com/content/pdf/10.1007/BF00992698.pdf). Machine Learning (1992)
* M. Riedmiller. [Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method](https://pdfs.semanticscholar.org/2820/01869bd502c7917db8b32b75593addfbbc68.pdf). Machine Learning: ECML 2005. Springer, 2005

## Lecture 4: Advanced Q-Function Learning Methods

[Video](https://www.youtube.com/watch?v=h1-pj4Y9-kM&list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX&index=10) | [Slides](http://rll.berkeley.edu/deeprlcoursesp17/docs/lec4.pdf)

* V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, et al. [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602) (2013)
* M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. [The Arcade Learning Environment: An Evaluation Platform for General Agents](https://arxiv.org/abs/1207.4708). Journal of Artificial Intelligence Research (2013)
* V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, et al. [Human-level control through deep reinforcement learning](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf). Nature (2015)
* H. van Hasselt. [Double Q-learning](https://papers.nips.cc/paper/3964-double-q-learning.pdf). NIPS, 2010
* H. van Hasselt, A. Guez, and D. Silver. [Deep reinforcement learning with double Q-learning](https://arxiv.org/abs/1509.06461). CoRR, abs/1509.06461 (2015)
* Z. Wang, N. de Freitas, and M. Lanctot. [Dueling network architectures for deep reinforcement learning](https://arxiv.org/abs/1511.06581). arXiv preprint arXiv:1511.06581 (2015)
* T. Schaul, J. Quan, I. Antonoglou, and D. Silver. [Prioritized experience replay](https://arxiv.org/abs/1511.05952). arXiv preprint arXiv:1511.05952 (2015)
* D. Silver, A. Huang, C. J. Maddison, et al. [Mastering the game of Go with deep neural networks and tree search](https://vk.com/doc-44016343_437229031?dl=56ce06e325d42fbc72). Nature (2016)

## Lecture 5: Advanced Policy Gradient Methods: Natural Gradient, TRPO, and More

Schulman: [Video](https://www.youtube.com/watch?v=_t5fpZuuf-4&index=15&list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX) | [Slides](http://rll.berkeley.edu/deeprlcoursesp17/docs/lec5.pdf)

Achiam: [Video](https://www.youtube.com/watch?v=ycCtmp4hcUs&list=PLkFD6_40KJIznC9CDbVTjAF2oyt8_VAe3&index=14) | [Slides](http://rll.berkeley.edu/deeprlcourse/f17docs/lecture_13_advanced_pg.pdf)

* S. Kakade. [A Natural Policy Gradient](https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf). NIPS, 2001
* S. Kakade and J. Langford. [Approximately optimal approximate reinforcement learning](https://people.eecs.berkeley.edu/~pabbeel/cs287-fa09/readings/KakadeLangford-icml2002.pdf). ICML, 2002
* J. Peters and S. Schaal. [Natural actor-critic](https://homes.cs.washington.edu/~todorov/courses/amath579/reading/NaturalActorCritic.pdf). Neurocomputing (2008)
* J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel. [Trust Region Policy Optimization](https://arxiv.org/abs/1502.05477). ICML (2015)
* Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. [Constrained Policy Optimization](https://arxiv.org/abs/1705.10528) (2017)
* John Schulman, Filip Wolski, Prafulla Dhariwal, et al. [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347) (2017)
* Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel. [Benchmarking Deep Reinforcement Learning for Continuous Control](https://arxiv.org/abs/1604.06778). ICML (2016)
* J. Martens and I. Sutskever. [Training deep and recurrent networks with Hessian-free optimization](http://www.cs.utoronto.ca/~jmartens/docs/HF_book_chapter.pdf). In: Neural Networks: Tricks of the Trade. Springer, 2012 (book chapter)
* S. Amari and S. C. Douglas. [Why Natural Gradient?](http://www.yaroslavvb.com/papers/amari-why.pdf) (1998)
* Jan Peters and Stefan Schaal. [Reinforcement learning of motor skills with policy gradients](http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Neural-Netw-2008-21-682_4867%5b0%5d.pdf) (2008)
* Nicolas Heess, Dhruva TB, Srinivasan Sriram, et al. [Emergence of Locomotion Behaviours in Rich Environments](https://arxiv.org/abs/1707.02286) (2017)
* Yuhuai Wu, Elman Mansimov, Shun Liao, et al. [Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation](https://arxiv.org/abs/1708.05144) (2017)

## Lecture 6: Variance Reduction for Policy Gradient Methods

[Video](https://www.youtube.com/watch?v=Fauwwkiy-bo&index=16&list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX) | [Slides](http://rll.berkeley.edu/deeprlcoursesp17/docs/lec6.pdf)

* A. Y. Ng, D. Harada, and S. Russell. [Policy invariance under reward transformations: Theory and application to reward shaping](http://www.robotics.stanford.edu/~ang/papers/shaping-icml99.pdf). ICML, 1999
* (Example) I. Mordatch, E. Todorov, and Z. Popović. [Discovery of complex behaviors through contact-invariant optimization](https://homes.cs.washington.edu/~todorov/papers/MordatchSIGGRAPH12.pdf). ACM Transactions on Graphics (TOG) 31.4 (2012), p. 43
* (Example) Y. Tassa, T. Erez, and E. Todorov. [Synthesis and stabilization of complex behaviors through online trajectory optimization](https://homes.cs.washington.edu/~todorov/papers/TassaIROS12.pdf). Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE, 2012, pp. 4906–4913
* J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. [High-dimensional continuous control using generalized advantage estimation](https://arxiv.org/abs/1506.02438) (2015)
* H. Kimura and S. Kobayashi. [An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function](http://www.umiacs.umd.edu/~hal/courses/2016F_RL/Kimura98.pdf). ICML, 1998
* A. Y. Ng and M. Jordan. [PEGASUS: A policy search method for large MDPs and POMDPs](http://www.robotics.stanford.edu/~ang/papers/uai00-pegasus.pdf) (2000)

## Lecture 7: Policy Gradient Methods: Pathwise Derivative Methods and Wrap-up

[Video](https://www.youtube.com/watch?v=IDSA2wAACr0&list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX&index=17) | [Slides](http://rll.berkeley.edu/deeprlcoursesp17/docs/lec7.pdf)

* T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, et al. [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971). arXiv preprint arXiv:1509.02971 (2015)
* (Example) S. Gu, T. Lillicrap, I. Sutskever, and S. Levine. [Continuous deep Q-learning with model-based acceleration](https://arxiv.org/abs/1603.00748) (2016)
* B. O’Donoghue, R. Munos, K. Kavukcuoglu, and V. Mnih. [PGQ: Combining policy gradient and Q-learning](https://arxiv.org/abs/1611.01626) (2016)
* Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, et al. [Sample Efficient Actor-Critic with Experience Replay](https://arxiv.org/abs/1611.01224) (2016)
* A. Harutyunyan, M. G. Bellemare, T. Stepleton, and R. Munos. [Q(λ) with Off-Policy Corrections](https://arxiv.org/abs/1602.04951) (2016)
* N. Jiang and L. Li. [Doubly robust off-policy value evaluation for reinforcement learning](https://arxiv.org/abs/1511.03722) (2016)
* O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans. [Bridging the Gap Between Value and Policy Based Reinforcement Learning](https://arxiv.org/abs/1702.08892) (2017)
* T. Haarnoja, H. Tang, P. Abbeel, and S. Levine. [Reinforcement Learning with Deep Energy-Based Policies](https://arxiv.org/abs/1702.08165) (2017)

## Lecture 8: Exploration

[Video](https://www.youtube.com/watch?v=SfCa1HQMkuw&index=18&list=PLkFD6_40KJIwTmSbCv9OVJB3YaO4sFwkX) | [Slides](http://rll.berkeley.edu/deeprlcoursesp17/docs/2017.03.20.Exploration.pdf)

* Sébastien Bubeck and Nicolò Cesa-Bianchi. [Regret analysis of stochastic and nonstochastic multi-armed bandit problems](https://arxiv.org/abs/1204.5721). arXiv preprint arXiv:1204.5721 (2012)
* Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. [Finite-Time Analysis of the Multi-Armed Bandit Problem](https://d2925a48-a-62cb3a1a-s-sites.googlegroups.com/site/anrexplora/bibliography/fta-2002.pdf). Machine Learning, 47 (2002), 235–256
* Daniel Russo and Benjamin Van Roy. [Learning to Optimize via Posterior Sampling](https://arxiv.org/abs/1301.2609). Mathematics of Operations Research (2014)
* O. Chapelle and L. Li. [An Empirical Evaluation of Thompson Sampling](https://papers.nips.cc/paper/4321-an-empirical-evaluation-of-thompson-sampling.pdf). NIPS, 2011
* Daniel Russo and Benjamin Van Roy. [Learning to Optimize via Information-Directed Sampling](https://arxiv.org/abs/1403.5556). NIPS (2014)
* Kearns and Singh. [Near-Optimal Reinforcement Learning in Polynomial Time](https://www.cis.upenn.edu/~mkearns/papers/KearnsSinghE3.pdf) (1999)
* Kakade. [On the sample complexity of reinforcement learning](https://homes.cs.washington.edu/~sham/papers/thesis/sham_thesis.pdf) (thesis, 2003)
* István Szita and András Lőrincz. [The many faces of optimism: a unifying approach](http://icml2008.cs.helsinki.fi/papers/490.pdf). ICML, 2008
* Teodor Mihai Moldovan and Pieter Abbeel. [Safe exploration in Markov decision processes](https://arxiv.org/abs/1205.4810). arXiv preprint arXiv:1205.4810 (2012)
* Strehl. [Probably Approximately Correct (PAC) Exploration in Reinforcement Learning](http://cs.brown.edu/~mlittman/theses/strehl.pdf) (thesis, 2007)
* Ian Osband and Benjamin Van Roy. [Bootstrapped Thompson Sampling and Deep Exploration](https://arxiv.org/abs/1507.00300) (2015)
* I. Osband, C. Blundell, A. Pritzel, and B. Van Roy. [Deep Exploration via Bootstrapped DQN](https://arxiv.org/abs/1602.04621) (2016)
* Yarin Gal and Zoubin Ghahramani. [Dropout as a Bayesian approximation: Representing model uncertainty in deep learning](https://arxiv.org/abs/1506.02142) (2015)
* Zachary Lipton et al. [Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking](https://arxiv.org/abs/1608.05081) (2016)
* Stadie et al. [Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models](https://arxiv.org/abs/1507.00814) (2015)
* Bellemare et al. [Unifying Count-Based Exploration and Intrinsic Motivation](https://arxiv.org/abs/1606.01868) (2016)
* Ostrovski et al. [Count-Based Exploration with Neural Density Models](https://arxiv.org/abs/1703.01310) (2017)
* Meire Fortunato, Mohammad Gheshlaghi Azar, et al. [Noisy Networks for Exploration](https://arxiv.org/abs/1706.10295) (2017)
* Houthooft et al. [VIME: Variational Information Maximizing Exploration](https://arxiv.org/abs/1605.09674) (2016)
* Duan et al. [RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning](https://arxiv.org/abs/1611.02779) (2017)
* Wang et al. [Learning to Reinforcement Learn](https://arxiv.org/abs/1611.05763) (2017)
* S. P. Singh, A. G. Barto, and N. Chentanez. [Intrinsically motivated reinforcement learning](https://papers.nips.cc/paper/2552-intrinsically-motivated-reinforcement-learning.pdf). NIPS, 2005
* Pierre-Yves Oudeyer and Frederic Kaplan. [How can we define intrinsic motivation?](http://www.pyoudeyer.com/epirob08OudeyerKaplan.pdf) (2008)
* Shakir Mohamed and Danilo J. Rezende. [Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning](https://arxiv.org/abs/1509.08731). arXiv, 2015 (includes a discussion of *common random numbers*)

## Scalability

* Tim Salimans, Jonathan Ho, Xi Chen, and Ilya Sutskever. [Evolution Strategies as a Scalable Alternative to Reinforcement Learning](https://arxiv.org/abs/1703.03864) (2017)
* Alfredo V. Clemente, Humberto N. Castejón, and Arjun Chandra. [Efficient Parallel Methods for Deep Reinforcement Learning](https://arxiv.org/abs/1705.04862) (2017)

## Frontiers

* Piotr Mirowski, Razvan Pascanu, Fabio Viola, et al. [Learning to Navigate in Complex Environments](https://arxiv.org/abs/1611.03673) (2017)
* Alexander Sasha Vezhnevets, Simon Osindero, et al. [FeUdal Networks for Hierarchical Reinforcement Learning](https://arxiv.org/abs/1703.01161) (2017)
* Nicolas Heess, Greg Wayne, Yuval Tassa, et al. [Learning and Transfer of Modulated Locomotor Controllers](https://arxiv.org/abs/1610.05182) (2016)
* Théophane Weber, Sébastien Racanière, et al. [Imagination-Augmented Agents for Deep Reinforcement Learning](https://arxiv.org/abs/1707.06203) (2017)
* S. Levine, C. Finn, T. Darrell, and P. Abbeel. [End-to-end training of deep visuomotor policies](https://arxiv.org/abs/1504.00702) (2016)
* Haoran Tang and Tuomas Haarnoja. [Learning Diverse Skills via Maximum Entropy Deep Reinforcement Learning](http://bair.berkeley.edu/blog/2017/10/06/soft-q-learning/) (2017)
* Matteo Hessel, Joseph Modayil, et al. [Rainbow: Combining Improvements in Deep Reinforcement Learning](https://arxiv.org/abs/1710.02298) (2017)
* Max Jaderberg, Volodymyr Mnih, et al. [Reinforcement Learning with Unsupervised Auxiliary Tasks](https://arxiv.org/abs/1611.05397) (2016)
* Aravind Rajeswaran, Vikash Kumar, et al. [Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations](https://arxiv.org/abs/1709.10087) (2017)

## Optimization

* Kerby Shedden. [Optimization in statistics](http://dept.stat.lsa.umich.edu/~kshedden/Courses/Stat606/Notes/optim.pdf)
  - It contains a nice section about the *Fisher Information* and its relation to the *Hessian* of the log-likelihood (a short reminder of that identity is sketched below).
  - It also contains a section about *Conjugate-Gradient Methods*.
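  As a quick, hedged reminder of the identity those notes discuss (stated under the usual regularity conditions; the notation below is ours, not taken from the notes):

  ```latex
  % Fisher information as the expected outer product of the score, and
  % (under regularity conditions) as the negative expected Hessian of the
  % log-likelihood -- the identity behind the natural-gradient references above.
  F(\theta)
    = \mathbb{E}_{x \sim p_\theta}\!\left[ \nabla_\theta \log p_\theta(x)\,
        \nabla_\theta \log p_\theta(x)^{\top} \right]
    = -\,\mathbb{E}_{x \sim p_\theta}\!\left[ \nabla_\theta^{2} \log p_\theta(x) \right]
  ```

  The natural policy gradient papers in Lecture 5 (Kakade 2001; Peters & Schaal 2008) precondition the vanilla policy gradient with the inverse of this matrix.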
* Andrew Gibiansky. [Conjugate Gradient (article)](http://andrew.gibiansky.com/blog/machine-learning/conjugate-gradient/) (2014)
* Andrew Gibiansky. [Hessian Free Optimization (article)](http://andrew.gibiansky.com/blog/machine-learning/hessian-free-optimization/) (2014)
* Andrew Gibiansky. [Fully Connected Neural Network Algorithms (article)](http://andrew.gibiansky.com/blog/machine-learning/fully-connected-neural-networks/) (2014)
* Andrew Gibiansky. [Gauss Newton Matrix (article)](http://andrew.gibiansky.com/blog/machine-learning/gauss-newton-matrix/) (2014)

## Useful knowledge

* Art B. Owen. [Monte Carlo theory, methods and examples](http://statweb.stanford.edu/~owen/mc/) (2013)
  - Useful for learning about **variance reduction**
--------------------------------------------------------------------------------