├── Emphatic Temporal-Difference Learning
│   ├── Emphatic_TD.ipynb
│   ├── README.md
│   ├── etd0.png
│   ├── etd0_1.png
│   ├── etd_lambda.png
│   └── rmsve.png
├── Q(sigma) and multi-step bootstrapping methods
│   ├── n_step_cliff.ipynb
│   ├── n_step_importance_sampling_cliff.ipynb
│   ├── n_step_importance_sampling_ratio_windy_grid.ipynb
│   ├── n_step_windy_grid.ipynb
│   ├── pic_1.png
│   ├── pic_2.png
│   ├── pic_3.png
│   └── pic_4.png
├── README.md
├── Real Time Dynamic Programming
│   ├── LRTA_RTDP.py
│   ├── Policy_Evaluation.ipynb
│   ├── Presentation.pdf
│   ├── README.md
│   ├── SyncDP_GaussSeidal.py
│   ├── gauss_seidel_VI.ipynb
│   ├── lrta_rtdp.mp4
│   └── syncdp_gauss.mp4
├── TD Control methods - Expected SARSA
│   ├── 1.png
│   ├── 2.png
│   ├── 3.png
│   ├── 4.png
│   ├── 5.png
│   ├── 6.png
│   ├── 7.png
│   ├── 8.png
│   ├── README.md
│   ├── es.png
│   ├── exp_sarsa.png
│   ├── q.png
│   ├── s.png
│   ├── se.png
│   ├── td_learning_3_cliff_walking.ipynb
│   ├── td_learning_3_grid_world.ipynb
│   ├── td_learning_3_windy_grid.ipynb
│   ├── u_e_sarsa.png
│   ├── u_q.png
│   └── u_sarsa.png
└── Temporal-Difference Learning by Harm van Seijen
    ├── README.md
    └── TD_FA_True_Online_lambda.ipynb

/Emphatic Temporal-Difference Learning/Emphatic_TD.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Emphatic Temporal-Difference Learning/Emphatic_TD.ipynb
--------------------------------------------------------------------------------
/Emphatic Temporal-Difference Learning/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Emphatic Temporal-Difference Learning/README.md
--------------------------------------------------------------------------------
/Emphatic Temporal-Difference Learning/etd0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Emphatic Temporal-Difference Learning/etd0.png
--------------------------------------------------------------------------------
/Emphatic Temporal-Difference Learning/etd0_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Emphatic Temporal-Difference Learning/etd0_1.png
--------------------------------------------------------------------------------
/Emphatic Temporal-Difference Learning/etd_lambda.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Emphatic Temporal-Difference Learning/etd_lambda.png
--------------------------------------------------------------------------------
/Emphatic Temporal-Difference Learning/rmsve.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Emphatic Temporal-Difference Learning/rmsve.png
--------------------------------------------------------------------------------
/Q(sigma) and multi-step bootstrapping methods/n_step_cliff.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Q(sigma) and multi-step bootstrapping methods/n_step_cliff.ipynb
--------------------------------------------------------------------------------
/Q(sigma) and multi-step bootstrapping methods/n_step_importance_sampling_cliff.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Q(sigma) and multi-step bootstrapping methods/n_step_importance_sampling_cliff.ipynb
--------------------------------------------------------------------------------
/Q(sigma) and multi-step bootstrapping methods/n_step_importance_sampling_ratio_windy_grid.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Q(sigma) and multi-step bootstrapping methods/n_step_importance_sampling_ratio_windy_grid.ipynb
--------------------------------------------------------------------------------
/Q(sigma) and multi-step bootstrapping methods/n_step_windy_grid.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Q(sigma) and multi-step bootstrapping methods/n_step_windy_grid.ipynb
--------------------------------------------------------------------------------
/Q(sigma) and multi-step bootstrapping methods/pic_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Q(sigma) and multi-step bootstrapping methods/pic_1.png
--------------------------------------------------------------------------------
/Q(sigma) and multi-step bootstrapping methods/pic_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Q(sigma) and multi-step bootstrapping methods/pic_2.png
--------------------------------------------------------------------------------
/Q(sigma) and multi-step bootstrapping methods/pic_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Q(sigma) and multi-step bootstrapping methods/pic_3.png
--------------------------------------------------------------------------------
/Q(sigma) and multi-step bootstrapping methods/pic_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Q(sigma) and multi-step bootstrapping methods/pic_4.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/README.md
--------------------------------------------------------------------------------
/Real Time Dynamic Programming/LRTA_RTDP.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Real Time Dynamic Programming/LRTA_RTDP.py
--------------------------------------------------------------------------------
/Real Time Dynamic Programming/Policy_Evaluation.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Real Time Dynamic Programming/Policy_Evaluation.ipynb
--------------------------------------------------------------------------------
/Real Time Dynamic Programming/Presentation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Real Time Dynamic Programming/Presentation.pdf
--------------------------------------------------------------------------------
/Real Time Dynamic Programming/README.md:
--------------------------------------------------------------------------------
Presentation for 3rd February 2017 by Monica Patel and Pulkit Khandelwal
--------------------------------------------------------------------------------
/Real Time Dynamic Programming/SyncDP_GaussSeidal.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Real Time Dynamic Programming/SyncDP_GaussSeidal.py
--------------------------------------------------------------------------------
/Real Time Dynamic Programming/gauss_seidel_VI.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Real Time Dynamic Programming/gauss_seidel_VI.ipynb
--------------------------------------------------------------------------------
/Real Time Dynamic Programming/lrta_rtdp.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Real Time Dynamic Programming/lrta_rtdp.mp4
--------------------------------------------------------------------------------
/Real Time Dynamic Programming/syncdp_gauss.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Real Time Dynamic Programming/syncdp_gauss.mp4
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/1.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/2.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/3.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/4.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/5.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/6.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/7.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/8.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/README.md
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/es.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/es.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/exp_sarsa.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/exp_sarsa.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/q.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/q.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/s.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/s.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/se.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/se.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/td_learning_3_cliff_walking.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/td_learning_3_cliff_walking.ipynb
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/td_learning_3_grid_world.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/td_learning_3_grid_world.ipynb
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/td_learning_3_windy_grid.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/td_learning_3_windy_grid.ipynb
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/u_e_sarsa.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/u_e_sarsa.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/u_q.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/u_q.png
--------------------------------------------------------------------------------
/TD Control methods - Expected SARSA/u_sarsa.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/TD Control methods - Expected SARSA/u_sarsa.png
--------------------------------------------------------------------------------
/Temporal-Difference Learning by Harm van Seijen/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Temporal-Difference Learning by Harm van Seijen/README.md
--------------------------------------------------------------------------------
/Temporal-Difference Learning by Harm van Seijen/TD_FA_True_Online_lambda.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Pulkit-Khandelwal/Reinforcement-Learning-Notebooks/HEAD/Temporal-Difference Learning by Harm van Seijen/TD_FA_True_Online_lambda.ipynb
--------------------------------------------------------------------------------