├── Algorithms ├── Monte_Carlo.py ├── PolicyGradient.py ├── PolicyInteration.py ├── Q_Learning.py ├── README.md ├── Sarsa.py ├── ValueIteration.py ├── __pycache__ │ ├── Monte_Carlo.cpython-311.pyc │ ├── Monte_Carlo.cpython-312.pyc │ ├── PolicyGradient.cpython-312.pyc │ ├── PolicyInteration.cpython-311.pyc │ ├── PolicyInteration.cpython-312.pyc │ ├── Q_Learning.cpython-311.pyc │ ├── Q_Learning.cpython-312.pyc │ ├── Sarsa.cpython-311.pyc │ ├── Sarsa.cpython-312.pyc │ ├── ValueIteration.cpython-311.pyc │ └── ValueIteration.cpython-312.pyc └── pics │ ├── a2c.png │ ├── consistent_optimality.png │ ├── convergence_dqn.png │ ├── epsilon.png │ ├── expected_sarsa.png │ ├── ground_truth.png │ ├── n_steps_sarsa.png │ ├── pg_convergence.png │ ├── predict_state_values.png │ └── td_comparison.png ├── README.md ├── examples ├── __pycache__ │ ├── arguments.cpython-311.pyc │ └── arguments.cpython-312.pyc ├── arguments.py ├── example_grid_world.py └── main.py └── src ├── __pycache__ ├── grid_world.cpython-311.pyc ├── grid_world.cpython-312.pyc ├── grid_world.cpython-38.pyc └── utils.cpython-311.pyc └── grid_world.py /Algorithms/Monte_Carlo.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/Monte_Carlo.py -------------------------------------------------------------------------------- /Algorithms/PolicyGradient.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/PolicyGradient.py -------------------------------------------------------------------------------- /Algorithms/PolicyInteration.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/PolicyInteration.py -------------------------------------------------------------------------------- /Algorithms/Q_Learning.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/Q_Learning.py -------------------------------------------------------------------------------- /Algorithms/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/README.md -------------------------------------------------------------------------------- /Algorithms/Sarsa.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/Sarsa.py -------------------------------------------------------------------------------- /Algorithms/ValueIteration.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/ValueIteration.py -------------------------------------------------------------------------------- /Algorithms/__pycache__/Monte_Carlo.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/Monte_Carlo.cpython-311.pyc -------------------------------------------------------------------------------- /Algorithms/__pycache__/Monte_Carlo.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/Monte_Carlo.cpython-312.pyc -------------------------------------------------------------------------------- /Algorithms/__pycache__/PolicyGradient.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/PolicyGradient.cpython-312.pyc -------------------------------------------------------------------------------- /Algorithms/__pycache__/PolicyInteration.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/PolicyInteration.cpython-311.pyc -------------------------------------------------------------------------------- /Algorithms/__pycache__/PolicyInteration.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/PolicyInteration.cpython-312.pyc -------------------------------------------------------------------------------- /Algorithms/__pycache__/Q_Learning.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/Q_Learning.cpython-311.pyc -------------------------------------------------------------------------------- /Algorithms/__pycache__/Q_Learning.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/Q_Learning.cpython-312.pyc -------------------------------------------------------------------------------- /Algorithms/__pycache__/Sarsa.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/Sarsa.cpython-311.pyc -------------------------------------------------------------------------------- /Algorithms/__pycache__/Sarsa.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/Sarsa.cpython-312.pyc -------------------------------------------------------------------------------- /Algorithms/__pycache__/ValueIteration.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/ValueIteration.cpython-311.pyc -------------------------------------------------------------------------------- /Algorithms/__pycache__/ValueIteration.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/__pycache__/ValueIteration.cpython-312.pyc -------------------------------------------------------------------------------- /Algorithms/pics/a2c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/pics/a2c.png -------------------------------------------------------------------------------- /Algorithms/pics/consistent_optimality.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/pics/consistent_optimality.png -------------------------------------------------------------------------------- /Algorithms/pics/convergence_dqn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/pics/convergence_dqn.png -------------------------------------------------------------------------------- /Algorithms/pics/epsilon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/pics/epsilon.png -------------------------------------------------------------------------------- /Algorithms/pics/expected_sarsa.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/pics/expected_sarsa.png -------------------------------------------------------------------------------- /Algorithms/pics/ground_truth.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/pics/ground_truth.png -------------------------------------------------------------------------------- /Algorithms/pics/n_steps_sarsa.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/pics/n_steps_sarsa.png -------------------------------------------------------------------------------- /Algorithms/pics/pg_convergence.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/pics/pg_convergence.png -------------------------------------------------------------------------------- /Algorithms/pics/predict_state_values.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/pics/predict_state_values.png -------------------------------------------------------------------------------- /Algorithms/pics/td_comparison.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/Algorithms/pics/td_comparison.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/README.md -------------------------------------------------------------------------------- /examples/__pycache__/arguments.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/examples/__pycache__/arguments.cpython-311.pyc -------------------------------------------------------------------------------- /examples/__pycache__/arguments.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/examples/__pycache__/arguments.cpython-312.pyc -------------------------------------------------------------------------------- /examples/arguments.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/examples/arguments.py -------------------------------------------------------------------------------- /examples/example_grid_world.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/examples/example_grid_world.py -------------------------------------------------------------------------------- /examples/main.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/examples/main.py -------------------------------------------------------------------------------- /src/__pycache__/grid_world.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/src/__pycache__/grid_world.cpython-311.pyc -------------------------------------------------------------------------------- /src/__pycache__/grid_world.cpython-312.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/src/__pycache__/grid_world.cpython-312.pyc -------------------------------------------------------------------------------- /src/__pycache__/grid_world.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/src/__pycache__/grid_world.cpython-38.pyc -------------------------------------------------------------------------------- /src/__pycache__/utils.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/src/__pycache__/utils.cpython-311.pyc -------------------------------------------------------------------------------- /src/grid_world.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning/HEAD/src/grid_world.py --------------------------------------------------------------------------------