├── .gitignore
├── LICENSE
├── README.md
├── cpp
│   ├── ETD.cpp
│   └── TOETD.cpp
└── py3
    ├── dvtd.py
    ├── elstd.py
    ├── etd.py
    ├── gtd.py
    ├── htd.py
    ├── idbd.py
    ├── lstd.py
    ├── td.py
    └── totd.py

/.gitignore:
--------------------------------------------------------------------------------
# Compiled Object files
*.slo
*.lo
*.o
*.obj

# Precompiled Headers
*.gch
*.pch

# Compiled Dynamic libraries
*.so
*.dylib
*.dll

# Fortran module files
*.mod

# Compiled Static libraries
*.lai
*.la
*.a
*.lib

# Executables
*.exe
*.out
*.app

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2015 rldotai

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# rl-algorithms

Reinforcement learning algorithms.

There are many different variants on the basic ideas of reinforcement learning.
I have implemented some of them, with a focus on linear function approximation.

Extending these algorithms (for example, with nonlinear function approximators such as neural nets) is relatively straightforward once you are familiar with the underlying ideas.

To facilitate this, the algorithms listed are written in a plain style and thoroughly commented, with references to the relevant papers and some explanation of the reasoning behind the code.

## Implemented Algorithms

- [TD(λ): Temporal Difference Learning](py3/td.py)
- [LSTD(λ): Least-Squares Temporal Difference Learning](py3/lstd.py)
- [ETD(λ): Emphatic Temporal Difference Learning](py3/etd.py)
- [GTD(λ): Gradient Temporal Difference Learning, AKA TDC(λ)](py3/gtd.py)
- [TOTD(λ): True-Online Temporal Difference Learning, AKA TD with "Dutch Traces"](py3/totd.py)
- [ELSTD(λ): Least-Squares Emphatic Temporal Difference Learning](py3/elstd.py)
- [HTD(λ): Hybrid Temporal Difference Learning](py3/htd.py)
- [DVTD(λ) or TD-δ^2: Online Variance Estimation via temporal difference errors](py3/dvtd.py)
  - [The paper describing it](https://arxiv.org/abs/1801.08287)

## TODO

- [ ] Q-Learning
- [ ] SARSA
- [ ] Distributional RL algorithms
- [ ] Other second-order TD algorithms (e.g., NTD)
- [ ] Actor-Critic algorithms

## Contributing

Send me a pull request if you have code to contribute.

Alternatively, raise an issue with a link to the paper describing the algorithm, and I will read and implement it when I get a chance.
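
## Example sketch: linear TD(λ)

To give a feel for the linear function approximation setting mentioned above, here is a minimal sketch of TD(λ) with accumulating traces. It is not the code in `py3/td.py`: the class name `LinearTD`, the method names, the parameter order, and the two-state test chain in `run_chain` are all assumptions made for illustration, and plain Python lists are used so the snippet has no dependencies.

```python
class LinearTD:
    """Illustrative TD(lambda) with linear function approximation.

    Hypothetical names throughout; a sketch, not the repository's implementation.
    """

    def __init__(self, n):
        self.n = n
        self.theta = [0.0] * n  # weight vector
        self.z = [0.0] * n      # accumulating eligibility trace

    def value(self, x):
        """Approximate value of the state with feature vector `x`."""
        return sum(t * xi for t, xi in zip(self.theta, x))

    def learn(self, x, r, xp, alpha, gamma, lmbda):
        """Update from one transition (x, r, xp) and return the TD error."""
        delta = r + gamma * self.value(xp) - self.value(x)
        # Decay the trace, then add the current features.
        self.z = [gamma * lmbda * zi + xi for zi, xi in zip(self.z, x)]
        # Move the weights along the trace, scaled by the TD error.
        self.theta = [t + alpha * delta * zi
                      for t, zi in zip(self.theta, self.z)]
        return delta

    def reset(self):
        """Clear the trace at the start of an episode."""
        self.z = [0.0] * self.n


def run_chain(episodes=500, alpha=0.1, gamma=1.0, lmbda=0.5):
    """Train on a made-up deterministic chain: s0 -> s1 -> terminal.

    The reward is 0 on the first step and 1 on the second, so with
    gamma = 1 the true value of both states is 1.
    """
    s0, s1, terminal = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
    agent = LinearTD(2)
    for _ in range(episodes):
        agent.reset()
        agent.learn(s0, 0.0, s1, alpha, gamma, lmbda)
        agent.learn(s1, 1.0, terminal, alpha, gamma, lmbda)
    return agent


if __name__ == "__main__":
    agent = run_chain()
    # Both estimates should approach 1.0 on this chain.
    print(agent.value([1.0, 0.0]), agent.value([0.0, 1.0]))
```

The terminal state is represented by an all-zero feature vector, so its estimated value is always zero and no special case is needed in `learn`. The other algorithms listed above differ mainly in how the trace and the weight update are computed.
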

--------------------------------------------------------------------------------
/cpp/ETD.cpp:
--------------------------------------------------------------------------------
/*
 * ETD(lambda): Emphatic Temporal Difference Learning
 *
 * @author Brendan Bennett, Rich Sutton, October 2015.
 *
 * CHANGES FROM TOETD.cpp
 *  - renamed some variables
 *  - removed `gamma` as object variable, since it was unused
 *  - rearranged parameters in `learn()` so that `phi`, `r`, `phi_p` come first
 */

class ETD
{
    int n;
    double *theta;
    double *e;
    double F;
    double D;

public:
    ETD(int fvec_length) {
        n = fvec_length;
        e = new double[n];
        theta = new double[n];

        // initialize weight vector and traces
        for (int i=0; i