├── Project GitHub.pdf ├── README.md ├── cliff_walking ├── Sample results │ ├── cliff env.png │ ├── cliff_data.txt │ ├── readme │ ├── rewardcliff.png │ ├── stepscliff.png │ ├── toute.png │ └── valuescliff.png ├── algo_double_qlearning.py ├── algo_qlearning.py ├── algo_sarsa.py ├── env_cliff.py ├── images │ ├── cliff.png │ ├── end.png │ ├── readme │ ├── robot.png │ └── start.png ├── plot_results.py ├── readme └── run.py ├── env1.png ├── env2.jpg ├── q_learning ├── Sample results │ ├── QLreward.png │ ├── QLsteps.png │ ├── QLvalue.png │ ├── data.txt │ ├── readme │ └── result QL.png ├── algo_qlearning.py ├── environment.py ├── images │ ├── agent.png │ ├── boot_tree.png │ ├── building.png │ ├── flag.png │ ├── garbage.png │ ├── obstacle.png │ ├── readme │ ├── rubik.png │ └── tree.png ├── plot_results.py ├── readme └── run.py ├── sarsa ├── Sample results │ ├── data.txt │ ├── readme │ ├── result sarsa.png │ ├── sarsareward.png │ ├── sarsastep.png │ └── sarsavalue.png ├── algo_sarsa.py ├── environment.py ├── images │ ├── agent.png │ ├── boot_tree.png │ ├── building.png │ ├── flag.png │ ├── garbage.png │ ├── obstacle.png │ ├── readme │ ├── rubik.png │ └── tree.png ├── plot_results.py ├── readme └── run.py ├── td_learning ├── Sample results │ ├── TD_0.png │ ├── TDreward.png │ ├── TDsteps.png │ ├── TDvalue.png │ ├── data.txt │ └── readme ├── algo_td0.py ├── environment.py ├── images │ ├── agent.png │ ├── boot_tree.png │ ├── building.png │ ├── flag.png │ ├── garbage.png │ ├── obstacle.png │ ├── readme │ ├── rubik.png │ └── tree.png ├── plot_results.py ├── readme └── run.py └── training.gif /Project GitHub.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/Project GitHub.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Global Path Planning with Reinforcement Learning Algorithms to Help the WALL-E Robot Reach EVE! 2 | 3 | ## About the Project 4 | 5 | ### Introduction 6 | 7 | This project aims to test various reinforcement learning (RL) algorithms for the global path planning of a mobile robot. The environment is designed based on the popular WALL-E animation, and the tested algorithms include Q-learning, SARSA, TD(0) learning, and double Q-learning (Temporal Difference (TD) learning is a combination of Monte Carlo principles and dynamic programming (DP) concepts). 8 | 9 | ### Environment 1 10 | 11 | The environment is designed with Tkinter, a standard Python interface to the Tcl/Tk GUI toolkit. In this environment, the WALL-E  robot wants to reach the goal, which is the EVE robot, but there are numerous obstacles in the path that it must avoid. 12 | The environment size is 15x15, in which each square is 40x40 pixels, and there are 52 obstacles inside it, including trees, buildings, garbage, road signs, a plant in the boot (based on animation), and a Rubik's cube. The upside left corner (the agent starting position) is (0, 0), and going to the right and down is +X and +Y, respectively. For example, two steps to the right and one step to the down move the agent to the [80, 40] location. 13 | The below figure shows a screenshot of the environment. The position of obstacles and the blocking area around the goal are considered in such a way that it is not easy for the agent to find an optimal path. 
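As a quick illustration of this coordinate convention, here is a minimal sketch (the constant and function names are illustrative, not part of the project code):

```python
PIXELS = 40  # size of one grid square in pixels, as used by the environments

def grid_to_canvas(col, row):
    """Convert a grid cell (col, row) into the [x, y] pixel state used by the environment."""
    return [col * PIXELS, row * PIXELS]

print(grid_to_canvas(2, 1))  # two steps right, one step down -> [80, 40]
```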
14 | 15 | ![Wall-e environment](env1.png) 16 | 17 | ### Environment 2 18 | 19 | In addition to the environment above, the algorithms are also tested in a classic benchmark environment. The second problem is Cliff Walking from *"Reinforcement Learning: An Introduction"* by Richard S. Sutton and Andrew G. Barto. This problem is solved with the same algorithms, and all results and output data are presented in the project's PDF file. You can run the code, change the variables, or even modify the environment's features. 20 | 21 | ![Cliff environment](env2.jpg) 22 | 23 | ### General concept 24 | 25 | This RL problem is modeled as a Markov decision process (MDP) with state, action, and reward sets *S*, *A*, and *R*. The environment dynamics are the probabilities *p(s', r | s, a)* over all states, actions, and rewards; however, the test environments are deterministic, so there are no stochastic transitions. 26 | During training, the agent can move up, down, right, or left to reach the goal. Each time the agent hits an obstacle, it receives a reward of -5 and the system resets to the starting point. If the robot reaches the goal, it receives a large reward of +100; all other moves yield a reward of zero. In keeping with the animation, WALL-E's interest in a Rubik's cube is modeled as a motivation along the way: it is not the goal, but it carries a -1 reward. Given the number of obstacles (especially around the goal), the size of the environment, and the motivation placed between the paths, this problem is a good benchmark for several popular RL algorithms. 27 | 28 | 29 | ## Installation 30 | 31 | The easiest setup is to create a Conda environment (Python 3.9) and install the libraries below: 32 | 33 | ```bash 34 | conda create -n rl_env python=3.9 35 | conda activate rl_env 36 | conda install -c anaconda pandas 37 | conda install -c anaconda numpy 38 | conda install matplotlib 39 | conda install -c anaconda tk 40 | conda install -c anaconda pillow 41 | ``` 42 | 43 | ## Usage 44 | 45 | First, clone the repository into your destination folder. 46 | 47 | ```bash 48 | git clone https://github.com/pouyan-asg/path-planning-with-RL-algorithms.git 49 | ``` 50 | Then, go to the folder of the algorithm you want to test and run: 51 | 52 | ```bash 53 | python run.py 54 | 55 | ``` 56 | The training process of the agent is illustrated in the animated GIF below; observable results emerge over time, and sample results are provided for each algorithm. You can also modify parameters such as the learning rate, discount factor, and number of episodes to assess and refine the results.
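For example, in the Cliff Walking experiment the training length and exploration schedule are set at the top of `cliff_walking/run.py`, and the update-rule constants live inside the algorithm classes (e.g. `QLearning.__init__` in `cliff_walking/algo_qlearning.py`); the excerpts below quote those values:

```python
# cliff_walking/run.py -- training length and exploration schedule (Q-learning run)
episodes = 500
epsilon1 = 0.1          # initial exploration rate
decay_factor1 = 0.999   # epsilon is multiplied by this factor every episode

# cliff_walking/algo_qlearning.py, QLearning.__init__ -- update-rule constants
self.alpha = 0.9        # learning rate
self.gamma = 0.9        # discount factor
```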
57 | 58 | ![agent training](training.gif) 59 | -------------------------------------------------------------------------------- /cliff_walking/Sample results/cliff env.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/Sample results/cliff env.png -------------------------------------------------------------------------------- /cliff_walking/Sample results/cliff_data.txt: -------------------------------------------------------------------------------- 1 | Q-Learning: [[0.0, 160.0], [80.0, 160.0], [160.0, 160.0], [240.0, 160.0], [320.0, 160.0], [400.0, 160.0], [480.0, 160.0], [560.0, 160.0], [640.0, 160.0], [720.0, 160.0], [800.0, 160.0], [880.0, 160.0], [880.0, 240.0]] 2 | SARSA: [[0.0, 160.0], [0.0, 80.0], [0.0, 0.0], [80.0, 0.0], [160.0, 0.0], [240.0, 0.0], [320.0, 0.0], [400.0, 0.0], [480.0, 0.0], [560.0, 0.0], [560.0, 80.0], [640.0, 80.0], [720.0, 80.0], [800.0, 80.0], [800.0, 160.0], [880.0, 160.0], [880.0, 240.0]] 3 | Double Q-Learning: [[0.0, 160.0], [0.0, 80.0], [0.0, 80.0], [0.0, 160.0], [0.0, 80.0], [0.0, 0.0], [0.0, 0.0], [80.0, 0.0], [80.0, 0.0], [80.0, 80.0], [0.0, 80.0], [0.0, 0.0], [80.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [80.0, 0.0], [80.0, 0.0], [160.0, 0.0], [80.0, 0.0], [80.0, 80.0], [80.0, 0.0], [80.0, 80.0], [80.0, 0.0], [80.0, 0.0], [160.0, 0.0], [240.0, 0.0], [240.0, 0.0], [240.0, 0.0], [240.0, 0.0], [320.0, 0.0], [320.0, 0.0], [320.0, 0.0], [400.0, 0.0], [400.0, 0.0], [480.0, 0.0], [560.0, 0.0], [560.0, 0.0], [560.0, 0.0], [560.0, 80.0], [640.0, 80.0], [720.0, 80.0], [640.0, 80.0], [640.0, 0.0], [640.0, 80.0], [640.0, 160.0], [640.0, 80.0], [720.0, 80.0], [720.0, 0.0], [720.0, 0.0], [720.0, 80.0], [800.0, 80.0], [880.0, 80.0], [880.0, 160.0], [880.0, 80.0], [880.0, 0.0], [880.0, 0.0], [880.0, 0.0], [800.0, 0.0], [880.0, 0.0], [880.0, 0.0], [880.0, 0.0], [880.0, 0.0], [880.0, 0.0], [880.0, 80.0], [800.0, 80.0], [720.0, 80.0], [720.0, 0.0], [720.0, 0.0], [640.0, 0.0], [720.0, 0.0], [800.0, 0.0], [800.0, 80.0], [880.0, 80.0], [800.0, 80.0], [880.0, 80.0], [880.0, 160.0], [880.0, 160.0], [880.0, 160.0], [880.0, 240.0]] 4 | -------------------------------------------------------------------------------- /cliff_walking/Sample results/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /cliff_walking/Sample results/rewardcliff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/Sample results/rewardcliff.png -------------------------------------------------------------------------------- /cliff_walking/Sample results/stepscliff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/Sample results/stepscliff.png -------------------------------------------------------------------------------- /cliff_walking/Sample results/toute.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/Sample results/toute.png -------------------------------------------------------------------------------- /cliff_walking/Sample results/valuescliff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/Sample results/valuescliff.png -------------------------------------------------------------------------------- /cliff_walking/algo_double_qlearning.py: -------------------------------------------------------------------------------- 1 | import random 2 | import pandas as pd 3 | import numpy as np 4 | from operator import add 5 | 6 | 7 | class Double_QLearning: 8 | def __init__(self, actions): 9 | self.actions = actions 10 | self.alpha = 0.9 # learning rate 11 | self.gamma = 0.9 # discount factor 12 | self.probability = 0.5 # fix 13 | self.q_table1 = pd.DataFrame(columns=self.actions, dtype=np.float64) 14 | self.q_table2 = pd.DataFrame(columns=self.actions, dtype=np.float64) 15 | self.q_table_final = pd.DataFrame( 16 | columns=self.actions, dtype=np.float64) 17 | 18 | # exploration and exploitation 19 | def choose_action(self, state, epsilon): 20 | self.check_state_exist1(state) 21 | self.check_state_exist2(state) 22 | if np.random.uniform(0, 1) < epsilon: 23 | action = np.random.choice(self.actions) 24 | else: 25 | state_action1 = list(self.q_table1.loc[state, :]) 26 | state_action2 = list(self.q_table2.loc[state, :]) 27 | state_action = random.shuffle( 28 | list(map(add, state_action1, state_action2))) 29 | action = np.argmax(state_action) 30 | return action 31 | 32 | # Function for learning and updating Q-table with new knowledge 33 | def learning(self, state, action, reward, next_state): 34 | self.check_state_exist1(next_state) 35 | self.check_state_exist2(next_state) 36 | q_current1 = self.q_table1.loc[state, action] 37 | q_current2 = self.q_table2.loc[state, action] 38 | arg1 = self.q_table1.loc[next_state, :].idxmax() 39 | arg2 = self.q_table2.loc[next_state, :].idxmax() 40 | if np.random.random() < self.probability: 41 | if next_state != 'Goal' or next_state != 'Obstacle': 42 | q_target1 = reward + self.gamma * \ 43 | self.q_table2.loc[next_state, arg1] 44 | else: 45 | q_target1 = reward 46 | self.q_table1.loc[state, action] += self.alpha * \ 47 | (q_target1 - q_current1) 48 | else: 49 | if next_state != 'Goal' or next_state != 'Obstacle': 50 | q_target2 = reward + self.gamma * \ 51 | self.q_table1.loc[next_state, arg2] 52 | else: 53 | q_target2 = reward 54 | self.q_table2.loc[state, action] += self.alpha * \ 55 | (q_target2 - q_current2) 56 | 57 | return self.q_table1.loc[state, 58 | action], self.q_table2.loc[state, action] 59 | 60 | # Adding to the Q-table new states (pd.series generate 1-dimensional array) 61 | def check_state_exist1(self, state): 62 | if state not in self.q_table1.index: 63 | self.q_table1 = self.q_table1.append(pd.Series( 64 | [0] * len(self.actions), index=self.q_table1.columns, name=state)) 65 | 66 | def check_state_exist2(self, state): 67 | if state not in self.q_table2.index: 68 | self.q_table2 = self.q_table2.append(pd.Series( 69 | [0] * len(self.actions), index=self.q_table2.columns, name=state)) 70 | -------------------------------------------------------------------------------- /cliff_walking/algo_qlearning.py: 
-------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | 4 | 5 | class QLearning: 6 | 7 | def __init__(self, actions): 8 | ''' 9 | Q leanring inital parameters 10 | 11 | Parameters 12 | ---------- 13 | actions : int 14 | all actions including up, down, left, right 15 | alpha : int 16 | learning rate 17 | gamma : int 18 | discount factor 19 | q_table : pandas Dataframe 20 | Q-table with actions as columns 21 | q_table_final : pandas Dataframe 22 | final Q-table 23 | ''' 24 | self.actions = actions 25 | self.alpha = 0.9 26 | self.gamma = 0.9 27 | self.q_table = pd.DataFrame(columns=self.actions, dtype=np.float64) 28 | self.q_table_final = pd.DataFrame( 29 | columns=self.actions, dtype=np.float64) 30 | 31 | # exploration and exploitation 32 | def choose_action(self, observation, epsilon): 33 | self.check_state_exist(observation) 34 | if np.random.random() < epsilon: 35 | action = np.random.choice(self.actions) # choice randomly 36 | else: 37 | state_action = self.q_table.loc[observation, :] 38 | state_action = state_action.reindex( 39 | np.random.permutation(state_action.index)) 40 | action = state_action.idxmax() 41 | return action 42 | 43 | def learning(self, state, action, reward, next_state): 44 | self.check_state_exist(next_state) 45 | # current state and action for that state 46 | q_current = self.q_table.loc[state, action] 47 | if next_state != 'Goal' or next_state != 'Obstacle': 48 | q_target = reward + self.gamma * \ 49 | self.q_table.loc[next_state, :].max() 50 | else: 51 | q_target = reward 52 | # updating Q-table 53 | self.q_table.loc[state, action] += self.alpha * (q_target - q_current) 54 | # return a value that is Q-value 55 | return self.q_table.loc[state, action] 56 | 57 | # Adding to the Q-table new states (pd.series generate 1-dimensional array) 58 | def check_state_exist(self, state): 59 | if state not in self.q_table.index: 60 | self.q_table = self.q_table.append(pd.Series( 61 | [0] * len(self.actions), index=self.q_table.columns, name=state)) 62 | -------------------------------------------------------------------------------- /cliff_walking/algo_sarsa.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | 4 | 5 | class SARSA: 6 | def __init__(self, actions): 7 | ''' 8 | SARSA inital parameters 9 | 10 | Parameters 11 | ---------- 12 | actions : int 13 | all actions including up, down, left, right 14 | alpha : int 15 | learning rate 16 | gamma : int 17 | discount factor 18 | q_table : pandas Dataframe 19 | Q-table with actions as columns 20 | q_table_final : pandas Dataframe 21 | final Q-table 22 | ''' 23 | self.actions = actions 24 | self.alpha = 0.9 25 | self.gamma = 0.9 26 | self.q_table = pd.DataFrame(columns=self.actions, dtype=np.float64) 27 | self.q_table_final = pd.DataFrame( 28 | columns=self.actions, dtype=np.float64) 29 | 30 | # exploration and exploitation 31 | def choose_action(self, observation, epsilon): 32 | self.check_state_exist(observation) 33 | if np.random.uniform(0, 1) < epsilon: 34 | action = np.random.choice(self.actions) 35 | else: 36 | state_action = self.q_table.loc[observation, :] 37 | action = state_action.idxmax() 38 | return action 39 | 40 | # Function for learning and updating Q-table with new knowledge 41 | def learning(self, state, action, reward, next_state, next_action): 42 | self.check_state_exist(next_state) 43 | # current state and action for that state 44 | q_current = 
self.q_table.loc[state, action] 45 | if next_state != 'Goal' or next_state != 'Obstacle': 46 | q_target = reward + self.gamma * \ 47 | self.q_table.loc[next_state, next_action] 48 | else: 49 | q_target = reward 50 | # updating Q-table 51 | self.q_table.loc[state, action] += self.alpha * (q_target - q_current) 52 | # return a value that is Q-value 53 | return self.q_table.loc[state, action] 54 | 55 | # Adding to the Q-table new states (pd.series generate 1-dimensional array) 56 | def check_state_exist(self, state): 57 | if state not in self.q_table.index: 58 | self.q_table = self.q_table.append(pd.Series( 59 | [0] * len(self.actions), index=self.q_table.columns, name=state)) 60 | -------------------------------------------------------------------------------- /cliff_walking/env_cliff.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tkinter as tk 3 | from PIL import Image, ImageTk 4 | 5 | 6 | global_variable = {} 7 | 8 | 9 | class Environment(tk.Tk, object): 10 | def __init__(self): 11 | ''' 12 | Constructs all the necessary attributes for the environment. 13 | 14 | Parameters 15 | ---------- 16 | num_actions: all actions including up, down, left, right 17 | pixels: environment pixels for each location 18 | env_height: number of vertical grids for the environment 19 | env_width: number of horizontal grids for the environment 20 | title: Tkinter environment title 21 | geometry: environmet geometry which is (w*px)*(h*px)+offsets 22 | comparison_dic: storing agnet pathway for each iteration 23 | path_dic: saving final pathway of teh agent 24 | key_dic: a counter for stroing paths 25 | fake: fake variable for reaching final Goal for first time 26 | longest_path: logest path to reach the Goal 27 | shortest_path: shortest path to reach the Goal 28 | ''' 29 | 30 | super(Environment, self).__init__() 31 | self.num_actions = 4 32 | self.title('Cliff Walking - Sutton Book') 33 | self.pixels = 80 34 | self.env_height = 4 35 | self.env_width = 12 36 | self.geometry( 37 | f'{self.env_width * self.pixels}x{self.env_height * self.pixels}+450+250') 38 | self.build_environment() 39 | self.comparison_dic = {} 40 | self.path_dic = {} 41 | self.key_dic = 0 42 | self.fake = True 43 | self.longest_path = 0 44 | self.shortest_path = 0 45 | 46 | def build_environment(self): 47 | ''' 48 | environment creation by Tkinter 49 | ''' 50 | 51 | self.canvas_widget = tk.Canvas( 52 | self, 53 | bg='white', 54 | height=self.env_height * 55 | self.pixels, 56 | width=self.env_width * 57 | self.pixels) 58 | for column in range(0, self.env_width * self.pixels, self.pixels): 59 | x0, y0, x1, y1 = column, 0, column, self.env_height * self.pixels 60 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 61 | for row in range(0, self.env_height * self.pixels, self.pixels): 62 | x0, y0, x1, y1 = 0, row, self.env_width * self.pixels, row 63 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 64 | 65 | img_cliff = Image.open("images/cliff.png") 66 | self.cliff_object = ImageTk.PhotoImage(img_cliff) 67 | 68 | self.obstacle1 = self.canvas_widget.create_image( 69 | self.pixels, self.pixels * 3, anchor='nw', image=self.cliff_object) 70 | self.obstacle2 = self.canvas_widget.create_image( 71 | self.pixels * 2, self.pixels * 3, anchor='nw', image=self.cliff_object) 72 | self.obstacle3 = self.canvas_widget.create_image( 73 | self.pixels * 3, self.pixels * 3, anchor='nw', image=self.cliff_object) 74 | self.obstacle4 = self.canvas_widget.create_image( 75 | 
self.pixels * 4, self.pixels * 3, anchor='nw', image=self.cliff_object) 76 | self.obstacle5 = self.canvas_widget.create_image( 77 | self.pixels * 5, self.pixels * 3, anchor='nw', image=self.cliff_object) 78 | self.obstacle6 = self.canvas_widget.create_image( 79 | self.pixels * 6, self.pixels * 3, anchor='nw', image=self.cliff_object) 80 | self.obstacle7 = self.canvas_widget.create_image( 81 | self.pixels * 7, self.pixels * 3, anchor='nw', image=self.cliff_object) 82 | self.obstacle8 = self.canvas_widget.create_image( 83 | self.pixels * 8, self.pixels * 3, anchor='nw', image=self.cliff_object) 84 | self.obstacle9 = self.canvas_widget.create_image( 85 | self.pixels * 9, self.pixels * 3, anchor='nw', image=self.cliff_object) 86 | self.obstacle10 = self.canvas_widget.create_image( 87 | self.pixels * 10, self.pixels * 3, anchor='nw', image=self.cliff_object) 88 | 89 | img_flag = Image.open("images/end.png") 90 | self.flag_object = ImageTk.PhotoImage(img_flag) 91 | self.flag = self.canvas_widget.create_image( 92 | self.pixels * 11, self.pixels * 3, anchor='nw', image=self.flag_object) 93 | 94 | img_robot = Image.open("images/robot.png") 95 | self.robot = ImageTk.PhotoImage(img_robot) 96 | self.agent = self.canvas_widget.create_image( 97 | self.pixels * 0, self.pixels * 3, anchor='nw', image=self.robot) 98 | 99 | img_start = Image.open("images/start.png") 100 | self.start_object = ImageTk.PhotoImage(img_start) 101 | self.start = self.canvas_widget.create_image( 102 | self.pixels * 0, self.pixels * 3, anchor='nw', image=self.start_object) 103 | 104 | self.canvas_widget.pack() 105 | 106 | def reset(self): 107 | ''' 108 | reset the environment and all parameters 109 | Return: 110 | the agent's current state in the format of [120.0, 40.0] 111 | ''' 112 | 113 | self.update() 114 | self.canvas_widget.delete(self.agent) 115 | self.agent = self.canvas_widget.create_image( 116 | 0, self.pixels * 3, anchor='nw', image=self.robot) 117 | self.comparison_dic = {} 118 | self.key_dic = 0 119 | return self.canvas_widget.coords(self.agent) 120 | 121 | def refresh(self): 122 | ''' 123 | update and refresh the environment before training 124 | ''' 125 | self.update() 126 | 127 | def step(self, action): 128 | ''' 129 | Moving the agent one pixel and update reward, action and next step regarding the agent next location 130 | 131 | Parameters: 132 | action: Actions = {0:'up', 1:'down', 2:'right', 3:'left} 133 | 134 | Returns: 135 | reward, next step and done flag 136 | ''' 137 | 138 | state = self.canvas_widget.coords(self.agent) 139 | base_action = np.array([0, 0]) 140 | 141 | if action == 0: 142 | if state[1] >= self.pixels: 143 | base_action[1] -= self.pixels 144 | elif action == 1: 145 | if state[1] < (self.env_height - 1) * self.pixels: 146 | base_action[1] += self.pixels 147 | elif action == 2: 148 | if state[0] < (self.env_width - 1) * self.pixels: 149 | base_action[0] += self.pixels 150 | elif action == 3: 151 | if state[0] >= self.pixels: 152 | base_action[0] -= self.pixels 153 | 154 | self.canvas_widget.move(self.agent, base_action[0], base_action[1]) 155 | self.comparison_dic[self.key_dic] = self.canvas_widget.coords( 156 | self.agent) 157 | next_state = self.comparison_dic[self.key_dic] 158 | self.key_dic += 1 159 | 160 | if next_state == self.canvas_widget.coords(self.flag): 161 | reward = 900 162 | next_state = 'Goal' 163 | done = True 164 | 165 | # filling the dictionary first time 166 | if self.fake: 167 | for j in range(len(self.comparison_dic)): 168 | self.path_dic[j] = self.comparison_dic[j] 169 | 
self.fake = False 170 | self.longest_path = len(self.comparison_dic) 171 | self.shortest_path = len(self.comparison_dic) 172 | 173 | # storing shortest path 174 | if len(self.comparison_dic) < len(self.path_dic): 175 | self.shortest_path = len(self.comparison_dic) 176 | self.path_dic = {} 177 | for j in range(len(self.comparison_dic)): 178 | self.path_dic[j] = self.comparison_dic[j] 179 | 180 | elif next_state in [self.canvas_widget.coords(self.obstacle1), 181 | self.canvas_widget.coords(self.obstacle2), 182 | self.canvas_widget.coords(self.obstacle3), 183 | self.canvas_widget.coords(self.obstacle4), 184 | self.canvas_widget.coords(self.obstacle5), 185 | self.canvas_widget.coords(self.obstacle6), 186 | self.canvas_widget.coords(self.obstacle7), 187 | self.canvas_widget.coords(self.obstacle8), 188 | self.canvas_widget.coords(self.obstacle9), 189 | self.canvas_widget.coords(self.obstacle10)]: 190 | reward = -100 191 | next_state = 'Obstacle' 192 | self.comparison_dic = {} 193 | self.key_dic = 0 194 | done = True 195 | 196 | else: 197 | reward = -1 198 | done = False 199 | 200 | return next_state, reward, done 201 | 202 | def final_path_Q(self): 203 | ''' 204 | saving final path of Q-learning algorithm 205 | ''' 206 | 207 | origin_point1 = np.array([40, 40]) 208 | path_list1 = [] 209 | self.canvas_widget.delete(self.agent) 210 | for j in range(len(self.path_dic)): 211 | path_list1.append(self.path_dic[j]) 212 | self.track = self.canvas_widget.create_oval( 213 | self.path_dic[j][0] + origin_point1[0] - 30, 214 | self.path_dic[j][1] + origin_point1[1] - 30, 215 | self.path_dic[j][0] + origin_point1[0] + 30, 216 | self.path_dic[j][1] + origin_point1[1] + 30, 217 | fill='#0C4A75', 218 | outline='#00DCFF') 219 | self.path_dic = {} 220 | self.fake = True 221 | f = open("cliff_data.txt", "w") 222 | f.write(f'Q-Learning: {str(path_list1)} \n') 223 | f.close() 224 | print('Path:', path_list1) 225 | 226 | def final_path_SARSA(self): 227 | ''' 228 | saving final path of SARSA algorithm 229 | ''' 230 | 231 | origin_point2 = np.array([40, 40]) 232 | path_list2 = [] 233 | self.canvas_widget.delete(self.agent) 234 | for j in range(len(self.path_dic)): 235 | path_list2.append(self.path_dic[j]) 236 | self.track = self.canvas_widget.create_oval( 237 | self.path_dic[j][0] + origin_point2[0] - 22, 238 | self.path_dic[j][1] + origin_point2[1] - 22, 239 | self.path_dic[j][0] + origin_point2[0] + 22, 240 | self.path_dic[j][1] + origin_point2[1] + 22, 241 | fill='#FF5885', 242 | outline='#FF5885') 243 | self.path_dic = {} 244 | self.fake = True 245 | f = open("cliff_data.txt", "a") 246 | f.write(f'SARSA: {str(path_list2)} \n') 247 | f.close() 248 | print('Path:', path_list2) 249 | 250 | def final_path_DQL(self): 251 | ''' 252 | saving final path double Q-learning algorithm 253 | ''' 254 | 255 | origin_point3 = np.array([40, 40]) 256 | path_list3 = [] 257 | self.canvas_widget.delete(self.agent) 258 | for j in range(len(self.path_dic)): 259 | path_list3.append(self.path_dic[j]) 260 | self.track = self.canvas_widget.create_oval( 261 | self.path_dic[j][0] + origin_point3[0] - 15, 262 | self.path_dic[j][1] + origin_point3[1] - 15, 263 | self.path_dic[j][0] + origin_point3[0] + 15, 264 | self.path_dic[j][1] + origin_point3[1] + 15, 265 | fill='#FF8E23', 266 | outline='#00DCFF') 267 | self.path_dic = {} 268 | self.fake = True 269 | with open("cliff_data.txt", "a") as f: 270 | f.write(f'Double Q-Learning: {str(path_list3)} \n') 271 | print('Path:', path_list3) 272 | 
-------------------------------------------------------------------------------- /cliff_walking/images/cliff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/images/cliff.png -------------------------------------------------------------------------------- /cliff_walking/images/end.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/images/end.png -------------------------------------------------------------------------------- /cliff_walking/images/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /cliff_walking/images/robot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/images/robot.png -------------------------------------------------------------------------------- /cliff_walking/images/start.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/images/start.png -------------------------------------------------------------------------------- /cliff_walking/plot_results.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import numpy as np 3 | 4 | 5 | class Plots: 6 | 7 | def plot_reward(self, reward1, reward2, reward3): 8 | plt.close() 9 | plt.plot(np.arange(len(reward1)), reward1, 'b') 10 | plt.plot(np.arange(len(reward2)), reward2, 'r') 11 | plt.plot(np.arange(len(reward3)), reward3, 'g') 12 | plt.xlabel('Episodes') 13 | plt.ylabel('Reward') 14 | plt.legend(['q-learning', 'sarsa', 'double q-learning']) 15 | plt.grid() 16 | plt.savefig('rewardcliff.png') 17 | plt.show() 18 | 19 | def plot_steps(self, steps1, steps2, steps3): 20 | plt.close() 21 | plt.plot(np.arange(len(steps1)), steps1, 'b') 22 | plt.plot(np.arange(len(steps2)), steps2, 'r') 23 | plt.plot(np.arange(len(steps3)), steps3, 'g') 24 | plt.xlabel('Episodes') 25 | plt.ylabel('Steps') 26 | plt.legend(['q-learning', 'sarsa', 'double q-learning']) 27 | plt.grid() 28 | plt.savefig('stepscliff.png') 29 | plt.show() 30 | 31 | def plot_value(self, value1, value2, value31, value32): 32 | plt.close() 33 | plt.plot(np.arange(len(value1)), value1, 'b') 34 | plt.plot(np.arange(len(value2)), value2, 'r') 35 | plt.plot(np.arange(len(value31)), value31, 'g') 36 | plt.plot(np.arange(len(value32)), value32, 'black') 37 | plt.xlabel('Episodes') 38 | plt.ylabel('Q-Values') 39 | plt.legend(['q-learning', 40 | 'sarsa', 41 | 'double q-learning 1', 42 | 'double q-learning 2']) 43 | plt.grid() 44 | plt.savefig('valuescliff.png') 45 | plt.show() 46 | -------------------------------------------------------------------------------- /cliff_walking/readme: -------------------------------------------------------------------------------- 1 | This problem is adapted from the book 'Reinforcement Learning: An Introduction' by Andrew Barto and Richard S. Sutton. 
It represents a standard, undiscounted, episodic task with designated start and goal states. The available actions include moving up, down, right, and left. The default reward is -1 for all transitions except those leading into the region marked 'The Cliff.' Entering this region results in a penalty of -100 and instantaneously relocates the agent back to the start. 2 | -------------------------------------------------------------------------------- /cliff_walking/run.py: -------------------------------------------------------------------------------- 1 | from env_cliff import Environment 2 | from algo_qlearning import QLearning 3 | from algo_sarsa import SARSA 4 | from algo_double_qlearning import Double_QLearning 5 | from plot_results import Plots 6 | 7 | 8 | env = Environment() 9 | QL = QLearning(actions=list(range(env.num_actions))) 10 | SARSA = SARSA(actions=list(range(env.num_actions))) 11 | DQL = Double_QLearning(actions=list(range(env.num_actions))) 12 | plt = Plots() 13 | 14 | 15 | total_steps1 = [] 16 | total_rewards1 = [] 17 | total_values1 = [] 18 | total_steps2 = [] 19 | total_rewards2 = [] 20 | total_values2 = [] 21 | episodes = 500 22 | epsilon1 = 0.1 23 | decay_factor1 = 0.999 24 | epsilon2 = 0.1 25 | decay_factor2 = 0.999 26 | epsilon3 = 0.8 27 | decay_factor3 = 0.999 28 | 29 | 30 | print('Q-learning') 31 | for episode in range(episodes): 32 | state1 = env.reset() 33 | step1 = 0 34 | value1 = 0 35 | reward_value1 = 0 36 | epsilon1 *= decay_factor1 37 | while True: 38 | env.refresh() 39 | action1 = QL.choose_action(str(state1), epsilon1) 40 | next_state1, reward1, done1 = env.step(action1) 41 | value1 += QL.learning(str(state1), action1, reward1, str(next_state1)) 42 | state1 = next_state1 43 | step1 += 1 44 | reward_value1 += reward1 45 | 46 | if done1: 47 | total_steps1 += [step1] 48 | total_rewards1 += [reward_value1] 49 | total_values1 += [value1] 50 | break 51 | env.final_path_Q() 52 | 53 | print() 54 | print('SARSA') 55 | for episode in range(episodes): 56 | state2 = env.reset() 57 | step2 = 0 58 | value2 = 0 59 | reward_value2 = 0 60 | epsilon2 *= decay_factor2 61 | action2 = SARSA.choose_action(str(state2), epsilon2) 62 | while True: 63 | env.refresh() 64 | next_state2, reward2, done2 = env.step(action2) 65 | next_action2 = SARSA.choose_action(str(next_state2), epsilon2) 66 | value2 += SARSA.learning(str(state2), action2, 67 | reward2, str(next_state2), next_action2) 68 | state2 = next_state2 69 | action2 = next_action2 70 | reward_value2 += reward2 71 | step2 += 1 72 | 73 | if done2: 74 | total_steps2 += [step2] 75 | total_rewards2 += [reward_value2] 76 | total_values2 += [value2] 77 | break 78 | env.final_path_SARSA() 79 | 80 | print() 81 | print('Double Q-learning') 82 | total_steps3 = [] 83 | total_rewards3 = [] 84 | total_values13 = [] 85 | total_values23 = [] 86 | value13 = 0 87 | value23 = 0 88 | for episode in range(episodes): 89 | state3 = env.reset() 90 | step3 = 0 91 | reward_value3 = 0 92 | epsilon3 *= decay_factor3 93 | while True: 94 | env.refresh() 95 | action3 = DQL.choose_action(str(state3), epsilon3) 96 | next_state3, reward3, done3 = env.step(action3) 97 | value13, value23 = DQL.learning( 98 | str(state3), action3, reward3, str(next_state3)) 99 | value13 += value13 100 | value23 += value23 101 | state3 = next_state3 102 | step3 += 1 103 | reward_value3 += reward3 104 | 105 | if done3: 106 | total_steps3 += [step3] 107 | total_rewards3 += [reward_value3] 108 | total_values13 += [value13] 109 | total_values23 += [value23] 110 | break 111 | 112 | 
env.final_path_DQL() 113 | 114 | plt.plot_reward(total_rewards1, total_rewards2, total_rewards3) 115 | plt.plot_steps(total_steps1, total_steps2, total_steps3) 116 | plt.plot_value(total_values1, total_values2, total_values13, total_values23) 117 | 118 | env.after(1000) 119 | env.mainloop() 120 | -------------------------------------------------------------------------------- /env1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/env1.png -------------------------------------------------------------------------------- /env2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/env2.jpg -------------------------------------------------------------------------------- /q_learning/Sample results/QLreward.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/Sample results/QLreward.png -------------------------------------------------------------------------------- /q_learning/Sample results/QLsteps.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/Sample results/QLsteps.png -------------------------------------------------------------------------------- /q_learning/Sample results/QLvalue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/Sample results/QLvalue.png -------------------------------------------------------------------------------- /q_learning/Sample results/data.txt: -------------------------------------------------------------------------------- 1 | The Shortest Path: 28 2 | The Longest Path: 1401 3 | Optimal Path: [[40.0, 0.0], [40.0, 40.0], [80.0, 40.0], [80.0, 80.0], [80.0, 120.0], [120.0, 120.0], [160.0, 120.0], [160.0, 160.0], [160.0, 200.0], [160.0, 240.0], [160.0, 280.0], [200.0, 280.0], [240.0, 280.0], [280.0, 280.0], [320.0, 280.0], [320.0, 320.0], [320.0, 360.0], [360.0, 360.0], [360.0, 400.0], [400.0, 400.0], [400.0, 440.0], [400.0, 480.0], [400.0, 520.0], [400.0, 560.0], [440.0, 560.0], [480.0, 560.0], [520.0, 560.0], [520.0, 520.0]] 4 | Final Path Q-table: 0 1 2 3 5 | [40.0, 0.0] 0.000000 6.461082 -5.000000 0.000000 6 | [40.0, 40.0] 0.000000 0.000000 7.178980 -5.000000 7 | [80.0, 40.0] -5.000000 7.976644 0.000000 0.000000 8 | [80.0, 80.0] 0.000000 8.862938 -5.000000 0.000000 9 | [80.0, 120.0] 0.000000 -5.000000 9.847709 0.000000 10 | [120.0, 120.0] -5.000000 0.000000 10.941899 7.976644 11 | [160.0, 120.0] 0.000000 12.157665 0.000000 0.000000 12 | [160.0, 160.0] 9.847709 13.508517 -5.000000 0.000000 13 | [160.0, 200.0] 0.000000 15.009464 0.000000 0.000000 14 | [160.0, 240.0] 0.000000 16.677182 0.000000 -4.950000 15 | [160.0, 280.0] 0.000000 0.000000 18.530202 0.000000 16 | [200.0, 280.0] 0.000000 -4.950000 20.589113 0.000000 17 | [240.0, 280.0] -4.999500 0.000000 22.876792 0.000000 18 | [280.0, 280.0] 0.000000 0.000000 25.418658 0.000000 19 | [320.0, 280.0] 0.000000 28.242954 
-0.900000 0.000000 20 | [320.0, 320.0] 0.000000 31.381060 -4.500000 0.000000 21 | [320.0, 360.0] 0.000000 0.000000 34.867844 -4.500000 22 | [360.0, 360.0] -4.500000 38.742049 0.000000 0.000000 23 | [360.0, 400.0] 0.000000 -4.500000 43.046721 0.000000 24 | [400.0, 400.0] 0.000000 47.829690 0.000000 0.000000 25 | [400.0, 440.0] 0.000000 53.144100 0.000000 -4.500000 26 | [400.0, 480.0] 0.000000 59.049000 0.000000 43.046717 27 | [400.0, 520.0] 0.000000 65.610000 0.000000 0.000000 28 | [400.0, 560.0] 0.000000 0.000000 72.900000 0.000000 29 | [440.0, 560.0] 0.000000 0.000000 81.000000 0.000000 30 | [480.0, 560.0] -4.950000 0.000000 90.000000 0.000000 31 | [520.0, 560.0] 100.000000 0.000000 -4.500000 0.000000 32 | Full Q-table:: 0 1 2 3 33 | [0.0, 0.0] 0.0 -5.000000 5.814974 0.0 34 | Obstacle 0.0 0.000000 0.000000 0.0 35 | [40.0, 0.0] 0.0 6.461082 -5.000000 0.0 36 | [40.0, 40.0] 0.0 0.000000 7.178980 -5.0 37 | [80.0, 40.0] -5.0 7.976644 0.000000 0.0 38 | ... ... ... ... ... 39 | [560.0, 360.0] -4.5 0.000000 0.000000 0.0 40 | [520.0, 360.0] 0.0 0.000000 0.000000 0.0 41 | [560.0, 160.0] -4.5 0.000000 0.000000 -4.5 42 | [560.0, 40.0] -4.5 0.000000 0.000000 -4.5 43 | [520.0, 0.0] 0.0 -4.500000 -4.500000 0.0 44 | 45 | [176 rows x 4 columns] 46 | -------------------------------------------------------------------------------- /q_learning/Sample results/readme: -------------------------------------------------------------------------------- 1 | Some results from my run (they may differ if you run it). 2 | -------------------------------------------------------------------------------- /q_learning/Sample results/result QL.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/Sample results/result QL.png -------------------------------------------------------------------------------- /q_learning/algo_qlearning.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from environment import final_states 4 | 5 | 6 | class QLearning: 7 | 8 | def __init__(self, actions): 9 | ''' 10 | Q leanring inital parameters 11 | 12 | Parameters 13 | ---------- 14 | actions : int 15 | all actions including up, down, left, right 16 | alpha : int 17 | learning rate 18 | gamma : int 19 | discount factor 20 | epsilon : int 21 | probability 22 | decay_factor : int 23 | q_table : pandas Dataframe 24 | Q-table with actions as columns 25 | q_table_final : pandas Dataframe 26 | final Q-table 27 | ''' 28 | self.actions = actions 29 | self.alpha = 0.9 30 | self.gamma = 0.9 31 | self.epsilon = 0.5 32 | self.decay_factor = 0.9999 33 | self.q_table = pd.DataFrame(columns=self.actions, 34 | dtype=np.float64) 35 | self.q_table_final = pd.DataFrame(columns=self.actions, 36 | dtype=np.float64) 37 | 38 | def choose_action(self, observation): 39 | ''' 40 | Returns an action through exploration and exploitation 41 | 42 | Parameters: 43 | observation: current state of 44 | the agent in the format of state = '[5.0, 40.0]' 45 | 46 | Returns: 47 | action number 48 | ''' 49 | self.check_state_exist(observation) 50 | self.epsilon *= self.decay_factor # epsilon greedy 51 | if np.random.random() < self.epsilon: 52 | action = np.random.choice(self.actions) 53 | else: 54 | # access a group of rows and columns [row , column] 55 | state_action = self.q_table.loc[observation, :] 56 | # reindex: based on previous 
DataFrame, regenerate new indexes 57 | state_action = state_action.reindex( 58 | np.random.permutation(state_action.index)) 59 | action = state_action.idxmax() # return index of first occurrence of maximum value 60 | return action 61 | 62 | def learning(self, state, action, reward, next_state): 63 | ''' 64 | Function for learning and updating Q-table with new data 65 | 66 | Parameters: 67 | state: current state of the agent 68 | action: chosen action 69 | reward: received reward 70 | next_state: next state that the agent will move 71 | 72 | Returns: 73 | update Q-table 74 | ''' 75 | self.check_state_exist(next_state) 76 | q_current = self.q_table.loc[state, action] 77 | if next_state != 'Goal' or next_state != 'Obstacle' or next_state != 'Rubik': 78 | q_target = reward + self.gamma * \ 79 | self.q_table.loc[next_state, :].max() 80 | else: 81 | q_target = reward 82 | 83 | self.q_table.loc[state, action] += self.alpha * \ 84 | (q_target - q_current) # updating Q-table 85 | return self.q_table.loc[state, action] 86 | 87 | def check_state_exist(self, state): 88 | ''' 89 | Adding new states to the Q-table 90 | (pd.series generate 1-dimensional array) 91 | ''' 92 | if state not in self.q_table.index: 93 | self.q_table = self.q_table.append(pd.Series( 94 | [0] * len(self.actions), index=self.q_table.columns, name=state)) 95 | 96 | def print_q_table(self): 97 | ''' 98 | Saving final Q-table 99 | ''' 100 | final_route = final_states() 101 | for i in range(len(final_route)): 102 | state = str(final_route[i]) 103 | for j in range(len(self.q_table.index)): 104 | if self.q_table.index[j] == state: 105 | self.q_table_final.loc[state, 106 | :] = self.q_table.loc[state, :] 107 | 108 | with open('data.txt', 'a') as f: 109 | f.write(f'Final Path Q-table: {str(self.q_table_final)} \n') 110 | f.write(f'Full Q-table:: {str(self.q_table)} \n') 111 | -------------------------------------------------------------------------------- /q_learning/environment.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tkinter as tk 3 | from tkinter import * 4 | from PIL import Image, ImageTk 5 | 6 | 7 | global_variable = {} 8 | 9 | 10 | class Environment(tk.Tk, object): 11 | def __init__(self): 12 | ''' 13 | Constructs all the necessary attributes for the environment. 
14 | 15 | Parameters 16 | ---------- 17 | num_actions: all actions including up, down, left, right 18 | pixels: environment pixels for each location 19 | env_height: number of vertical grids for the environment 20 | env_width: number of horizontal grids for the environment 21 | title: Tkinter environment title 22 | geometry: environmet geometry which is (w*px)*(h*px)+offsets 23 | comparison_dic: storing agnet pathway for each iteration 24 | path_dic: saving final pathway of teh agent 25 | key_dic: a counter for stroing paths 26 | fake: fake variable for reaching final Goal for first time 27 | longest_path: logest path to reach the Goal 28 | shortest_path: shortest path to reach the Goal 29 | ''' 30 | 31 | super(Environment, self).__init__() 32 | self.num_actions = 4 33 | self.pixels = 40 34 | self.env_height = 15 35 | self.env_width = 15 36 | self.title('Path Planing with Reinforcement Learning') 37 | self.geometry( 38 | f'{self.env_width* self.pixels}x{self.env_height * self.pixels}+600+250') 39 | self.build_environment() 40 | self.comparison_dic = {} 41 | self.path_dic = {} 42 | self.key_dic = 0 43 | self.fake = True 44 | self.longest_path = 0 45 | self.shortest_path = 0 46 | 47 | def build_environment(self): 48 | ''' 49 | environment creation by Tkinter 50 | ''' 51 | 52 | self.canvas_widget = tk.Canvas( 53 | self, 54 | bg='white', 55 | height=self.env_height * 56 | self.pixels, 57 | width=self.env_width * 58 | self.pixels) 59 | 60 | for column in range(0, self.env_width * self.pixels, self.pixels): 61 | x0, y0, x1, y1 = column, 0, column, self.env_height * self.pixels 62 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 63 | for row in range(0, self.env_height * self.pixels, self.pixels): 64 | x0, y0, x1, y1 = 0, row, self.env_height * self.pixels, row 65 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 66 | 67 | img_obstacle = Image.open('images/obstacle.png') 68 | self.obstacle_object = ImageTk.PhotoImage(img_obstacle) 69 | 70 | img_tree = Image.open('images/tree.png') 71 | self.tree_object = ImageTk.PhotoImage(img_tree) 72 | 73 | img_shop = Image.open('images/boot_tree.png') 74 | self.shop_object = ImageTk.PhotoImage(img_shop) 75 | 76 | img_building = Image.open('images/building.png') 77 | self.building_object = ImageTk.PhotoImage(img_building) 78 | 79 | img_cube = Image.open('images/rubik.png') 80 | self.cube_object = ImageTk.PhotoImage(img_cube) 81 | 82 | img_garbage = Image.open('images/garbage.png') 83 | self.garbage_object = ImageTk.PhotoImage(img_garbage) 84 | 85 | self.obstacle1 = self.canvas_widget.create_image( 86 | self.pixels * 2, 0, anchor='nw', image=self.obstacle_object) 87 | self.obstacle2 = self.canvas_widget.create_image( 88 | self.pixels * 9, 0, anchor='nw', image=self.tree_object) 89 | self.obstacle3 = self.canvas_widget.create_image( 90 | self.pixels * 11, 0, anchor='nw', image=self.obstacle_object) 91 | self.obstacle4 = self.canvas_widget.create_image( 92 | self.pixels * 14, 0, anchor='nw', image=self.shop_object) 93 | self.obstacle5 = self.canvas_widget.create_image( 94 | self.pixels * 5, 0, anchor='nw', image=self.obstacle_object) 95 | self.obstacle6 = self.canvas_widget.create_image( 96 | 0, self.pixels, anchor='nw', image=self.building_object) 97 | self.obstacle7 = self.canvas_widget.create_image( 98 | self.pixels * 7, self.pixels, anchor='nw', image=self.obstacle_object) 99 | self.obstacle8 = self.canvas_widget.create_image( 100 | self.pixels * 9, self.pixels, anchor='nw', image=self.obstacle_object) 101 | self.obstacle9 = 
self.canvas_widget.create_image( 102 | self.pixels * 13, self.pixels, anchor='nw', image=self.obstacle_object) 103 | self.obstacle10 = self.canvas_widget.create_image( 104 | self.pixels * 3, self.pixels * 2, anchor='nw', image=self.tree_object) 105 | self.obstacle11 = self.canvas_widget.create_image( 106 | self.pixels * 5, self.pixels * 2, anchor='nw', image=self.obstacle_object) 107 | self.obstacle12 = self.canvas_widget.create_image( 108 | self.pixels * 11, self.pixels * 2, anchor='nw', image=self.obstacle_object) 109 | self.obstacle13 = self.canvas_widget.create_image( 110 | self.pixels * 0, self.pixels * 3, anchor='nw', image=self.building_object) 111 | self.obstacle14 = self.canvas_widget.create_image( 112 | self.pixels * 2, self.pixels * 4, anchor='nw', image=self.shop_object) 113 | self.obstacle15 = self.canvas_widget.create_image( 114 | self.pixels * 8, self.pixels * 3, anchor='nw', image=self.obstacle_object) 115 | self.obstacle16 = self.canvas_widget.create_image( 116 | self.pixels * 9, self.pixels * 3, anchor='nw', image=self.tree_object) 117 | self.obstacle17 = self.canvas_widget.create_image( 118 | self.pixels * 14, self.pixels * 3, anchor='nw', image=self.obstacle_object) 119 | self.obstacle19 = self.canvas_widget.create_image( 120 | self.pixels * 5, self.pixels * 4, anchor='nw', image=self.building_object) 121 | self.obstacle20 = self.canvas_widget.create_image( 122 | self.pixels * 10, self.pixels * 4, anchor='nw', image=self.obstacle_object) 123 | self.obstacle21 = self.canvas_widget.create_image( 124 | self.pixels * 13, self.pixels * 4, anchor='nw', image=self.obstacle_object) 125 | self.obstacle22 = self.canvas_widget.create_image( 126 | self.pixels * 8, self.pixels * 5, anchor='nw', image=self.shop_object) 127 | self.obstacle23 = self.canvas_widget.create_image( 128 | self.pixels * 3, self.pixels * 6, anchor='nw', image=self.obstacle_object) 129 | self.obstacle24 = self.canvas_widget.create_image( 130 | self.pixels * 6, self.pixels * 6, anchor='nw', image=self.obstacle_object) 131 | self.obstacle25 = self.canvas_widget.create_image( 132 | self.pixels * 11, self.pixels * 6, anchor='nw', image=self.tree_object) 133 | self.obstacle26 = self.canvas_widget.create_image( 134 | self.pixels * 14, self.pixels * 6, anchor='nw', image=self.obstacle_object) 135 | self.obstacle27 = self.canvas_widget.create_image( 136 | self.pixels * 0, self.pixels * 7, anchor='nw', image=self.obstacle_object) 137 | self.obstacle28 = self.canvas_widget.create_image( 138 | self.pixels * 1, self.pixels * 7, anchor='nw', image=self.tree_object) 139 | self.obstacle29 = self.canvas_widget.create_image( 140 | self.pixels * 9, self.pixels * 7, anchor='nw', image=self.cube_object) 141 | self.obstacle30 = self.canvas_widget.create_image( 142 | self.pixels * 3, self.pixels * 8, anchor='nw', image=self.building_object) 143 | self.obstacle31 = self.canvas_widget.create_image( 144 | self.pixels * 5, self.pixels * 8, anchor='nw', image=self.obstacle_object) 145 | self.obstacle32 = self.canvas_widget.create_image( 146 | self.pixels * 9, self.pixels * 8, anchor='nw', image=self.shop_object) 147 | self.obstacle33 = self.canvas_widget.create_image( 148 | self.pixels * 12, self.pixels * 8, anchor='nw', image=self.tree_object) 149 | self.obstacle34 = self.canvas_widget.create_image( 150 | self.pixels * 14, self.pixels * 8, anchor='nw', image=self.obstacle_object) 151 | self.obstacle35 = self.canvas_widget.create_image( 152 | self.pixels * 0, self.pixels * 9, anchor='nw', image=self.shop_object) 153 | self.obstacle36 = 
self.canvas_widget.create_image( 154 | self.pixels * 7, self.pixels * 9, anchor='nw', image=self.obstacle_object) 155 | self.obstacle37 = self.canvas_widget.create_image( 156 | self.pixels * 3, self.pixels * 10, anchor='nw', image=self.building_object) 157 | self.obstacle38 = self.canvas_widget.create_image( 158 | self.pixels * 5, self.pixels * 10, anchor='nw', image=self.tree_object) 159 | self.obstacle39 = self.canvas_widget.create_image( 160 | self.pixels * 12, self.pixels * 10, anchor='nw', image=self.obstacle_object) 161 | self.obstacle40 = self.canvas_widget.create_image( 162 | self.pixels * 1, self.pixels * 11, anchor='nw', image=self.tree_object) 163 | self.obstacle41 = self.canvas_widget.create_image( 164 | self.pixels * 6, self.pixels * 11, anchor='nw', image=self.obstacle_object) 165 | self.obstacle42 = self.canvas_widget.create_image( 166 | self.pixels * 9, self.pixels * 11, anchor='nw', image=self.tree_object) 167 | self.obstacle43 = self.canvas_widget.create_image( 168 | self.pixels * 12, self.pixels * 12, anchor='nw', image=self.garbage_object) 169 | self.obstacle44 = self.canvas_widget.create_image( 170 | self.pixels * 13, self.pixels * 12, anchor='nw', image=self.garbage_object) 171 | self.obstacle45 = self.canvas_widget.create_image( 172 | self.pixels * 14, self.pixels * 12, anchor='nw', image=self.garbage_object) 173 | self.obstacle46 = self.canvas_widget.create_image( 174 | self.pixels * 2, self.pixels * 13, anchor='nw', image=self.obstacle_object) 175 | self.obstacle47 = self.canvas_widget.create_image( 176 | self.pixels * 4, self.pixels * 13, anchor='nw', image=self.building_object) 177 | self.obstacle48 = self.canvas_widget.create_image( 178 | self.pixels * 7, self.pixels * 13, anchor='nw', image=self.obstacle_object) 179 | self.obstacle49 = self.canvas_widget.create_image( 180 | self.pixels * 12, self.pixels * 13, anchor='nw', image=self.garbage_object) 181 | self.obstacle50 = self.canvas_widget.create_image( 182 | self.pixels * 14, self.pixels * 13, anchor='nw', image=self.garbage_object) 183 | self.obstacle51 = self.canvas_widget.create_image( 184 | self.pixels * 0, self.pixels * 14, anchor='nw', image=self.building_object) 185 | self.obstacle52 = self.canvas_widget.create_image( 186 | self.pixels * 14, self.pixels * 14, anchor='nw', image=self.garbage_object) 187 | 188 | img_flag = Image.open('images/flag.png') 189 | self.flag_object = ImageTk.PhotoImage(img_flag) 190 | self.flag = self.canvas_widget.create_image( 191 | self.pixels * 13, self.pixels * 13, anchor='nw', image=self.flag_object) 192 | 193 | img_robot = Image.open('images/agent.png') 194 | self.robot = ImageTk.PhotoImage(img_robot) 195 | self.agent = self.canvas_widget.create_image( 196 | 0, 0, anchor='nw', image=self.robot) 197 | 198 | self.canvas_widget.pack() 199 | 200 | def reset(self): 201 | ''' 202 | reset the environment and all parameters 203 | Return: 204 | the agent's current state in the format of [120.0, 40.0] 205 | ''' 206 | 207 | self.update() 208 | self.canvas_widget.delete(self.agent) 209 | self.agent = self.canvas_widget.create_image( 210 | 0, 0, anchor='nw', image=self.robot) 211 | self.comparison_dic = {} 212 | self.key_dic = 0 213 | return self.canvas_widget.coords(self.agent) 214 | 215 | def refresh(self): 216 | ''' 217 | update and refresh the environment before training 218 | ''' 219 | self.update() 220 | 221 | def step(self, action): 222 | ''' 223 | Moving the agent one pixel and update reward, action and next step regarding the agent next location 224 | 225 | Parameters: 226 | 
action: Actions = {0:'up', 1:'down', 2:'right', 3:'left} 227 | 228 | Returns: 229 | reward, next step and done flag 230 | ''' 231 | 232 | state = self.canvas_widget.coords(self.agent) 233 | base_action = np.array([0, 0]) 234 | 235 | if action == 0: 236 | if state[1] >= self.pixels: 237 | base_action[1] -= self.pixels 238 | elif action == 1: 239 | if state[1] < (self.env_height - 1) * self.pixels: 240 | base_action[1] += self.pixels 241 | elif action == 2: 242 | if state[0] < (self.env_width - 1) * self.pixels: 243 | base_action[0] += self.pixels 244 | elif action == 3: 245 | if state[0] >= self.pixels: 246 | base_action[0] -= self.pixels 247 | 248 | self.canvas_widget.move(self.agent, base_action[0], base_action[1]) 249 | self.comparison_dic[self.key_dic] = self.canvas_widget.coords( 250 | self.agent) # storing new position of agent 251 | next_state = self.comparison_dic[self.key_dic] 252 | self.key_dic += 1 # add next key in dictionary 253 | 254 | if next_state == self.canvas_widget.coords(self.flag): 255 | reward = 100 256 | next_state = 'Goal' 257 | done = True 258 | 259 | # filling the dictionary first time 260 | if self.fake: 261 | for j in range(len(self.comparison_dic)): 262 | self.path_dic[j] = self.comparison_dic[j] 263 | self.fake = False 264 | self.longest_path = len(self.comparison_dic) 265 | self.shortest_path = len(self.comparison_dic) 266 | 267 | # storing shortest path 268 | if len(self.comparison_dic) < len(self.path_dic): 269 | self.shortest_path = len(self.comparison_dic) 270 | self.path_dic = {} 271 | for j in range(len(self.comparison_dic)): 272 | self.path_dic[j] = self.comparison_dic[j] 273 | 274 | # storing longest path 275 | if len(self.comparison_dic) > self.longest_path: 276 | self.longest_path = len(self.comparison_dic) 277 | 278 | elif next_state in [self.canvas_widget.coords(self.obstacle1), 279 | self.canvas_widget.coords(self.obstacle2), 280 | self.canvas_widget.coords(self.obstacle3), 281 | self.canvas_widget.coords(self.obstacle4), 282 | self.canvas_widget.coords(self.obstacle5), 283 | self.canvas_widget.coords(self.obstacle6), 284 | self.canvas_widget.coords(self.obstacle7), 285 | self.canvas_widget.coords(self.obstacle8), 286 | self.canvas_widget.coords(self.obstacle9), 287 | self.canvas_widget.coords(self.obstacle10), 288 | self.canvas_widget.coords(self.obstacle11), 289 | self.canvas_widget.coords(self.obstacle12), 290 | self.canvas_widget.coords(self.obstacle13), 291 | self.canvas_widget.coords(self.obstacle14), 292 | self.canvas_widget.coords(self.obstacle15), 293 | self.canvas_widget.coords(self.obstacle16), 294 | self.canvas_widget.coords(self.obstacle17), 295 | self.canvas_widget.coords(self.obstacle19), 296 | self.canvas_widget.coords(self.obstacle20), 297 | self.canvas_widget.coords(self.obstacle21), 298 | self.canvas_widget.coords(self.obstacle22), 299 | self.canvas_widget.coords(self.obstacle23), 300 | self.canvas_widget.coords(self.obstacle24), 301 | self.canvas_widget.coords(self.obstacle25), 302 | self.canvas_widget.coords(self.obstacle26), 303 | self.canvas_widget.coords(self.obstacle27), 304 | self.canvas_widget.coords(self.obstacle28), 305 | self.canvas_widget.coords(self.obstacle30), 306 | self.canvas_widget.coords(self.obstacle31), 307 | self.canvas_widget.coords(self.obstacle32), 308 | self.canvas_widget.coords(self.obstacle33), 309 | self.canvas_widget.coords(self.obstacle34), 310 | self.canvas_widget.coords(self.obstacle35), 311 | self.canvas_widget.coords(self.obstacle36), 312 | self.canvas_widget.coords(self.obstacle37), 313 | 
self.canvas_widget.coords(self.obstacle38), 314 | self.canvas_widget.coords(self.obstacle39), 315 | self.canvas_widget.coords(self.obstacle40), 316 | self.canvas_widget.coords(self.obstacle41), 317 | self.canvas_widget.coords(self.obstacle42), 318 | self.canvas_widget.coords(self.obstacle43), 319 | self.canvas_widget.coords(self.obstacle44), 320 | self.canvas_widget.coords(self.obstacle45), 321 | self.canvas_widget.coords(self.obstacle46), 322 | self.canvas_widget.coords(self.obstacle47), 323 | self.canvas_widget.coords(self.obstacle48), 324 | self.canvas_widget.coords(self.obstacle49), 325 | self.canvas_widget.coords(self.obstacle50), 326 | self.canvas_widget.coords(self.obstacle51), 327 | self.canvas_widget.coords(self.obstacle52)]: 328 | reward = -5 329 | done = True 330 | next_state = 'Obstacle' 331 | self.comparison_dic = {} 332 | self.key_dic = 0 333 | 334 | elif next_state in [self.canvas_widget.coords(self.obstacle29)]: 335 | reward = -1 336 | done = True 337 | next_state = 'Rubik' 338 | self.comparison_dic = {} 339 | self.key_dic = 0 340 | 341 | else: 342 | reward = 0 343 | done = False 344 | 345 | return next_state, reward, done 346 | 347 | def final_path(self): 348 | ''' 349 | saving final path and showing graphically by balck ovals 350 | ''' 351 | 352 | origin_point = np.array([20, 20]) 353 | path_list = [] 354 | self.canvas_widget.delete(self.agent) 355 | for j in range(len(self.path_dic)): 356 | path_list.append(self.path_dic[j]) 357 | self.track = self.canvas_widget.create_oval( 358 | self.path_dic[j][0] + origin_point[0] - 12, 359 | self.path_dic[j][1] + origin_point[1] - 12, 360 | self.path_dic[j][0] + origin_point[0] + 12, 361 | self.path_dic[j][1] + origin_point[1] + 12, 362 | fill='black', 363 | outline='black') 364 | # putting the final route in a global variable 365 | global_variable[j] = self.path_dic[j] 366 | 367 | with open('data.txt', 'w') as f: 368 | f.write(f'The Shortest Path: {str(self.shortest_path)} \n') 369 | f.write(f'The Longest Path: {str(self.longest_path)} \n') 370 | f.write(f'Optimal Path: {str(path_list)} \n') 371 | 372 | 373 | def final_states(): 374 | '''final route coordination for plotting''' 375 | 376 | return global_variable 377 | -------------------------------------------------------------------------------- /q_learning/images/agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/agent.png -------------------------------------------------------------------------------- /q_learning/images/boot_tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/boot_tree.png -------------------------------------------------------------------------------- /q_learning/images/building.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/building.png -------------------------------------------------------------------------------- /q_learning/images/flag.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/flag.png -------------------------------------------------------------------------------- /q_learning/images/garbage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/garbage.png -------------------------------------------------------------------------------- /q_learning/images/obstacle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/obstacle.png -------------------------------------------------------------------------------- /q_learning/images/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /q_learning/images/rubik.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/rubik.png -------------------------------------------------------------------------------- /q_learning/images/tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/tree.png -------------------------------------------------------------------------------- /q_learning/plot_results.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import numpy as np 3 | 4 | 5 | class Plots: 6 | 7 | def plot_reward(self, reward): 8 | plt.close() 9 | plt.plot(np.arange(len(reward)), reward, 'b') 10 | plt.title('Episodes vs Reward') 11 | plt.xlabel('Episodes') 12 | plt.ylabel('Reward') 13 | plt.grid() 14 | plt.savefig('reward.png') 15 | plt.show() 16 | 17 | def plot_steps(self, steps): 18 | plt.close() 19 | plt.plot(np.arange(len(steps)), steps, 'r') 20 | plt.title('Episodes vs Steps') 21 | plt.xlabel('Episodes') 22 | plt.ylabel('Steps') 23 | plt.grid() 24 | plt.savefig('steps.png') 25 | plt.show() 26 | 27 | def plot_value(self, value): 28 | plt.close() 29 | plt.plot(np.arange(len(value)), value, 'g') 30 | plt.title('Episodes vs Values') 31 | plt.xlabel('Episodes') 32 | plt.ylabel('Q-Values') 33 | plt.grid() 34 | plt.savefig('value.png') 35 | plt.show() 36 | -------------------------------------------------------------------------------- /q_learning/readme: -------------------------------------------------------------------------------- 1 | Q-learning is an off-policy Temporal Difference (TD) control algorithm that operates based on the action-value function. The action selection mechanism in this program follows an epsilon-greedy approach. With a probability of epsilon, the action is chosen randomly, while with a probability of 1 - epsilon, it is selected based on the maximum Q-value. Subsequently, the agent interacts with the environment to observe the next state and the corresponding reward. Finally, the Q-learning formula is updated with all the data collected during the environment interaction. 
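A minimal sketch of that selection-and-update step is shown below. It uses hypothetical stand-alone names (q, alpha, gamma, epsilon) and illustrative hyperparameter values rather than the repository's own QLearning class, which works on state strings such as '[40.0, 0.0]'.

```python
import random
from collections import defaultdict

ACTIONS = [0, 1, 2, 3]                        # 0: up, 1: down, 2: right, 3: left
alpha, gamma, epsilon = 0.9, 0.9, 0.1         # illustrative values only
q = defaultdict(lambda: [0.0] * len(ACTIONS))  # q[state][action]

def choose_action(state):
    # epsilon-greedy: explore with probability epsilon, otherwise act greedily
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])

def q_update(state, action, reward, next_state, done):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

The max over the next state's action values is what makes the update off-policy; SARSA (see the sarsa folder) instead bootstraps from the action it actually takes next.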
This loop continues for each episode until it reaches the final step. As mentioned earlier, when the agent encounters an obstacle or reaches the goal, the 'Done' flag activates, and the entire process restarts. 2 | -------------------------------------------------------------------------------- /q_learning/run.py: -------------------------------------------------------------------------------- 1 | '''This is the main file. When you run it, the agent start training process.''' 2 | 3 | from environment import Environment 4 | from algo_qlearning import QLearning 5 | from plot_results import Plots 6 | 7 | 8 | def main(): 9 | total_steps = [] 10 | total_rewards = [] 11 | total_values = [] 12 | episodes = 2000 13 | 14 | for episode in range(episodes): 15 | state = env.reset() # it returns the coordination of the agent 16 | step = 0 17 | value = 0 18 | reward_value = 0 19 | while True: 20 | env.refresh() 21 | action = RL.choose_action(str(state)) 22 | next_state, reward, done = env.step(action) 23 | value += RL.learning(str(state), action, reward, str(next_state)) 24 | state = next_state 25 | step += 1 26 | reward_value += reward 27 | 28 | if done: 29 | total_steps += [step] 30 | total_rewards += [reward_value] 31 | total_values += [value] 32 | break 33 | 34 | env.final_path() 35 | plot.plot_reward(total_rewards) 36 | plot.plot_steps(total_steps) 37 | plot.plot_value(total_values) 38 | RL.print_q_table() 39 | 40 | 41 | if __name__ == '__main__': 42 | env = Environment() 43 | RL = QLearning(actions=list(range(env.num_actions))) 44 | plot = Plots() 45 | env.after(10, main) 46 | env.mainloop() 47 | -------------------------------------------------------------------------------- /sarsa/Sample results/data.txt: -------------------------------------------------------------------------------- 1 | The Shortest Path: 28 2 | The Longest Path: 550 3 | Optimal Path: [[40.0, 0.0], [40.0, 40.0], [40.0, 80.0], [40.0, 120.0], [40.0, 160.0], [40.0, 200.0], [40.0, 240.0], [80.0, 240.0], [80.0, 280.0], [120.0, 280.0], [160.0, 280.0], [160.0, 320.0], [160.0, 360.0], [200.0, 360.0], [240.0, 360.0], [240.0, 400.0], [280.0, 400.0], [320.0, 400.0], [320.0, 440.0], [320.0, 480.0], [360.0, 480.0], [360.0, 520.0], [360.0, 560.0], [400.0, 560.0], [440.0, 560.0], [480.0, 560.0], [520.0, 560.0], [520.0, 520.0]] 4 | Final Path Q-table: 0 1 2 3 5 | [40.0, 0.0] -4.104407 2.099550 -5.000000 -2.055044 6 | [40.0, 40.0] -3.302516 1.967389 -4.061254 -5.000000 7 | [40.0, 80.0] -4.119704 1.462201 -0.919095 -0.012449 8 | [40.0, 120.0] -3.295571 0.998683 1.750480 -5.000000 9 | [40.0, 160.0] -2.239626 0.895676 -5.000000 -3.647892 10 | [40.0, 200.0] -0.719129 -3.280719 0.391534 -1.899991 11 | [40.0, 240.0] -0.233413 -5.000000 9.847704 -3.284249 12 | [80.0, 240.0] -4.000866 1.387104 -5.000000 -4.050001 13 | [80.0, 280.0] -2.664114 -3.163169 0.324364 -5.000000 14 | [120.0, 280.0] -5.000000 -5.000000 0.170468 -1.978908 15 | [160.0, 280.0] -3.439421 0.036780 -3.314143 -0.500884 16 | [160.0, 320.0] -2.579885 -2.332648 -5.000000 -5.000000 17 | [160.0, 360.0] -0.973352 -0.428563 -3.493924 -4.050894 18 | [200.0, 360.0] -5.000000 -5.000000 -2.542879 15.517152 19 | [240.0, 360.0] -3.282946 -1.290420 -5.000000 -4.048767 20 | [240.0, 400.0] -4.020465 -4.999995 1.036072 -5.000000 21 | [280.0, 400.0] -5.000000 -1.160020 7.942310 -4.110057 22 | [320.0, 400.0] -2.777621 16.501423 -4.033209 -2.777935 23 | [320.0, 440.0] -3.287799 26.688280 -5.000000 -3.087450 24 | [320.0, 480.0] -1.166957 1.711174 7.072903 -0.391565 25 | [360.0, 480.0] -5.000000 43.939945 
0.138748 0.238742 26 | [360.0, 520.0] 4.678612 5.156711 58.510487 4.770994 27 | [360.0, 560.0] 5.121811 4.392585 3.671136 6.237530 28 | [400.0, 560.0] 59.049000 53.883428 72.220537 7.833147 29 | [440.0, 560.0] 50.404889 13.880242 81.000000 3.740890 30 | [480.0, 560.0] -5.000000 -2.835820 90.000000 4.278974 31 | [520.0, 560.0] 100.000000 90.000000 -5.000000 49.840349 32 | Full Q-table:: 0 1 2 3 33 | [0.0, 0.0] -4.083781 -5.000000 2.638752 -4.418550 34 | Obstacle 0.000000 0.000000 0.000000 0.000000 35 | [40.0, 0.0] -4.104407 2.099550 -5.000000 -2.055044 36 | [40.0, 40.0] -3.302516 1.967389 -4.061254 -5.000000 37 | [40.0, 80.0] -4.119704 1.462201 -0.919095 -0.012449 38 | ... ... ... ... ... 39 | [520.0, 440.0] -0.018998 -4.950000 -2.952450 -3.645000 40 | [560.0, 440.0] -0.003294 -4.995000 -1.404229 -0.144136 41 | [560.0, 400.0] -4.089535 -4.009859 -1.768169 -4.010247 42 | [560.0, 360.0] -5.000000 -0.363298 -4.057762 -2.952618 43 | [520.0, 400.0] -2.954453 -2.130813 -0.287858 -4.999500 44 | 45 | [176 rows x 4 columns] 46 | -------------------------------------------------------------------------------- /sarsa/Sample results/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /sarsa/Sample results/result sarsa.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/Sample results/result sarsa.png -------------------------------------------------------------------------------- /sarsa/Sample results/sarsareward.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/Sample results/sarsareward.png -------------------------------------------------------------------------------- /sarsa/Sample results/sarsastep.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/Sample results/sarsastep.png -------------------------------------------------------------------------------- /sarsa/Sample results/sarsavalue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/Sample results/sarsavalue.png -------------------------------------------------------------------------------- /sarsa/algo_sarsa.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from environment import final_states 4 | 5 | 6 | class SARSA: 7 | 8 | def __init__(self, actions): 9 | ''' 10 | SARSA inital parameters 11 | 12 | Parameters 13 | ---------- 14 | actions : int 15 | all actions including up, down, left, right 16 | alpha : int 17 | learning rate 18 | gamma : int 19 | discount factor 20 | epsilon : int 21 | probability 22 | decay_factor : int 23 | q_table : pandas Dataframe 24 | Q-table with actions as columns 25 | q_table_final : pandas Dataframe 26 | final Q-table 27 | ''' 28 | self.actions = actions 29 | self.alpha = 0.9 30 | self.gamma = 0.9 31 | self.epsilon = 0.5 32 | self.decay_factor = 0.99995 33 | self.q_table = 
pd.DataFrame(columns=self.actions, dtype=np.float64) 34 | self.q_table_final = pd.DataFrame( 35 | columns=self.actions, dtype=np.float64) 36 | 37 | def choose_action(self, observation): 38 | ''' 39 | Returns an action through exploration and exploitation (epsilon-greedy) 40 | 41 | Parameters: 42 | observation: current state of 43 | the agent as a string, e.g. '[5.0, 40.0]' 44 | 45 | Returns: 46 | the selected action index 47 | ''' 48 | self.check_state_exist(observation) 49 | self.epsilon *= self.decay_factor # decay the exploration rate 50 | if np.random.uniform(0, 1) < self.epsilon: 51 | action = np.random.choice(self.actions) # explore 52 | else: 53 | state_action = self.q_table.loc[observation, :] 54 | #state_action = state_action.reindex(np.random.permutation(state_action.index)) 55 | action = state_action.idxmax() # exploit the greedy action 56 | return action 57 | 58 | def learning(self, state, action, reward, next_state, next_action): 59 | ''' 60 | Updates the Q-table with the newly observed transition 61 | 62 | Parameters: 63 | state: current state of the agent 64 | action: chosen action 65 | reward: received reward 66 | next_state: next state that the agent will move to 67 | next_action: action selected in the next state (on-policy) 68 | Returns: 69 | the updated Q-value of the (state, action) pair 70 | ''' 71 | self.check_state_exist(next_state) 72 | q_current = self.q_table.loc[state, action] 73 | if next_state not in ('Goal', 'Obstacle', 'Rubik'): # non-terminal: bootstrap 74 | q_target = reward + self.gamma * \ 75 | self.q_table.loc[next_state, next_action] 76 | else: 77 | q_target = reward 78 | self.q_table.loc[state, action] += self.alpha * \ 79 | (q_target - q_current) # updating Q-table 80 | return self.q_table.loc[state, action] 81 | 82 | def check_state_exist(self, state): 83 | ''' 84 | Adds a new state to the Q-table if it has not been visited yet 85 | (pd.Series generates a 1-dimensional array) 86 | ''' 87 | if state not in self.q_table.index: 88 | self.q_table = self.q_table.append(pd.Series( 89 | [0] * len(self.actions), index=self.q_table.columns, name=state)) 90 | 91 | def print_q_table(self): 92 | ''' 93 | Saving the final-path Q-table and the full Q-table to data.txt 94 | ''' 95 | final_route = final_states() 96 | for i in range(len(final_route)): 97 | state = str(final_route[i]) 98 | for j in range(len(self.q_table.index)): 99 | if self.q_table.index[j] == state: 100 | self.q_table_final.loc[state, 101 | :] = self.q_table.loc[state, :] 102 | 103 | with open('data.txt', 'a') as f: 104 | f.write(f'Final Path Q-table: {str(self.q_table_final)} \n') 105 | f.write(f'Full Q-table:: {str(self.q_table)} \n') 106 | -------------------------------------------------------------------------------- /sarsa/environment.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tkinter as tk 3 | from tkinter import * 4 | from PIL import Image, ImageTk 5 | 6 | 7 | global_variable = {} 8 | 9 | 10 | class Environment(tk.Tk, object): 11 | def __init__(self): 12 | ''' 13 | Constructs all the necessary attributes for the environment.
14 | 15 | Parameters 16 | ---------- 17 | num_actions: all actions including up, down, left, right 18 | pixels: environment pixels for each location 19 | env_height: number of vertical grids for the environment 20 | env_width: number of horizontal grids for the environment 21 | title: Tkinter environment title 22 | geometry: environmet geometry which is (w*px)*(h*px)+offsets 23 | comparison_dic: storing agnet pathway for each iteration 24 | path_dic: saving final pathway of teh agent 25 | key_dic: a counter for stroing paths 26 | fake: fake variable for reaching final Goal for first time 27 | longest_path: logest path to reach the Goal 28 | shortest_path: shortest path to reach the Goal 29 | ''' 30 | 31 | super(Environment, self).__init__() 32 | self.num_actions = 4 33 | self.pixels = 40 34 | self.env_height = 15 35 | self.env_width = 15 36 | self.title('Path Planing with Reinforcement Learning') 37 | self.geometry( 38 | f'{self.env_width * self.pixels}x{self.env_height * self.pixels}+600+250') 39 | self.build_environment() 40 | self.comparison_dic = {} 41 | self.path_dic = {} 42 | self.key_dic = 0 43 | self.fake = True 44 | self.longest_path = 0 45 | self.shortest_path = 0 46 | 47 | def build_environment(self): 48 | ''' 49 | environment creation by Tkinter 50 | ''' 51 | 52 | self.canvas_widget = tk.Canvas( 53 | self, 54 | bg='white', 55 | height=self.env_height * 56 | self.pixels, 57 | width=self.env_width * 58 | self.pixels) 59 | 60 | for column in range(0, self.env_width * self.pixels, self.pixels): 61 | x0, y0, x1, y1 = column, 0, column, self.env_height * self.pixels 62 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 63 | for row in range(0, self.env_height * self.pixels, self.pixels): 64 | x0, y0, x1, y1 = 0, row, self.env_height * self.pixels, row 65 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 66 | 67 | img_obstacle = Image.open("images/obstacle.png") 68 | self.obstacle_object = ImageTk.PhotoImage(img_obstacle) 69 | 70 | img_tree = Image.open("images/tree.png") 71 | self.tree_object = ImageTk.PhotoImage(img_tree) 72 | 73 | img_shop = Image.open("images/boot_tree.png") 74 | self.shop_object = ImageTk.PhotoImage(img_shop) 75 | 76 | img_building = Image.open("images/building.png") 77 | self.building_object = ImageTk.PhotoImage(img_building) 78 | 79 | img_cube = Image.open("images/rubik.png") 80 | self.cube_object = ImageTk.PhotoImage(img_cube) 81 | 82 | img_garbage = Image.open("images/garbage.png") 83 | self.garbage_object = ImageTk.PhotoImage(img_garbage) 84 | 85 | self.obstacle1 = self.canvas_widget.create_image( 86 | self.pixels * 2, 0, anchor='nw', image=self.obstacle_object) 87 | self.obstacle2 = self.canvas_widget.create_image( 88 | self.pixels * 9, 0, anchor='nw', image=self.tree_object) 89 | self.obstacle3 = self.canvas_widget.create_image( 90 | self.pixels * 11, 0, anchor='nw', image=self.obstacle_object) 91 | self.obstacle4 = self.canvas_widget.create_image( 92 | self.pixels * 14, 0, anchor='nw', image=self.shop_object) 93 | self.obstacle5 = self.canvas_widget.create_image( 94 | self.pixels * 5, 0, anchor='nw', image=self.obstacle_object) 95 | self.obstacle6 = self.canvas_widget.create_image( 96 | 0, self.pixels, anchor='nw', image=self.building_object) 97 | self.obstacle7 = self.canvas_widget.create_image( 98 | self.pixels * 7, self.pixels, anchor='nw', image=self.obstacle_object) 99 | self.obstacle8 = self.canvas_widget.create_image( 100 | self.pixels * 9, self.pixels, anchor='nw', image=self.obstacle_object) 101 | self.obstacle9 = 
self.canvas_widget.create_image( 102 | self.pixels * 13, self.pixels, anchor='nw', image=self.obstacle_object) 103 | self.obstacle10 = self.canvas_widget.create_image( 104 | self.pixels * 3, self.pixels * 2, anchor='nw', image=self.tree_object) 105 | self.obstacle11 = self.canvas_widget.create_image( 106 | self.pixels * 5, self.pixels * 2, anchor='nw', image=self.obstacle_object) 107 | self.obstacle12 = self.canvas_widget.create_image( 108 | self.pixels * 11, self.pixels * 2, anchor='nw', image=self.obstacle_object) 109 | self.obstacle13 = self.canvas_widget.create_image( 110 | self.pixels * 0, self.pixels * 3, anchor='nw', image=self.building_object) 111 | self.obstacle14 = self.canvas_widget.create_image( 112 | self.pixels * 2, self.pixels * 4, anchor='nw', image=self.shop_object) 113 | self.obstacle15 = self.canvas_widget.create_image( 114 | self.pixels * 8, self.pixels * 3, anchor='nw', image=self.obstacle_object) 115 | self.obstacle16 = self.canvas_widget.create_image( 116 | self.pixels * 9, self.pixels * 3, anchor='nw', image=self.tree_object) 117 | self.obstacle17 = self.canvas_widget.create_image( 118 | self.pixels * 14, self.pixels * 3, anchor='nw', image=self.obstacle_object) 119 | self.obstacle19 = self.canvas_widget.create_image( 120 | self.pixels * 5, self.pixels * 4, anchor='nw', image=self.building_object) 121 | self.obstacle20 = self.canvas_widget.create_image( 122 | self.pixels * 10, self.pixels * 4, anchor='nw', image=self.obstacle_object) 123 | self.obstacle21 = self.canvas_widget.create_image( 124 | self.pixels * 13, self.pixels * 4, anchor='nw', image=self.obstacle_object) 125 | self.obstacle22 = self.canvas_widget.create_image( 126 | self.pixels * 8, self.pixels * 5, anchor='nw', image=self.shop_object) 127 | self.obstacle23 = self.canvas_widget.create_image( 128 | self.pixels * 3, self.pixels * 6, anchor='nw', image=self.obstacle_object) 129 | self.obstacle24 = self.canvas_widget.create_image( 130 | self.pixels * 6, self.pixels * 6, anchor='nw', image=self.obstacle_object) 131 | self.obstacle25 = self.canvas_widget.create_image( 132 | self.pixels * 11, self.pixels * 6, anchor='nw', image=self.tree_object) 133 | self.obstacle26 = self.canvas_widget.create_image( 134 | self.pixels * 14, self.pixels * 6, anchor='nw', image=self.obstacle_object) 135 | self.obstacle27 = self.canvas_widget.create_image( 136 | self.pixels * 0, self.pixels * 7, anchor='nw', image=self.obstacle_object) 137 | self.obstacle28 = self.canvas_widget.create_image( 138 | self.pixels * 1, self.pixels * 7, anchor='nw', image=self.tree_object) 139 | self.obstacle29 = self.canvas_widget.create_image( 140 | self.pixels * 9, self.pixels * 7, anchor='nw', image=self.cube_object) 141 | self.obstacle30 = self.canvas_widget.create_image( 142 | self.pixels * 3, self.pixels * 8, anchor='nw', image=self.building_object) 143 | self.obstacle31 = self.canvas_widget.create_image( 144 | self.pixels * 5, self.pixels * 8, anchor='nw', image=self.obstacle_object) 145 | self.obstacle32 = self.canvas_widget.create_image( 146 | self.pixels * 9, self.pixels * 8, anchor='nw', image=self.shop_object) 147 | self.obstacle33 = self.canvas_widget.create_image( 148 | self.pixels * 12, self.pixels * 8, anchor='nw', image=self.tree_object) 149 | self.obstacle34 = self.canvas_widget.create_image( 150 | self.pixels * 14, self.pixels * 8, anchor='nw', image=self.obstacle_object) 151 | self.obstacle35 = self.canvas_widget.create_image( 152 | self.pixels * 0, self.pixels * 9, anchor='nw', image=self.shop_object) 153 | self.obstacle36 = 
self.canvas_widget.create_image( 154 | self.pixels * 7, self.pixels * 9, anchor='nw', image=self.obstacle_object) 155 | self.obstacle37 = self.canvas_widget.create_image( 156 | self.pixels * 3, self.pixels * 10, anchor='nw', image=self.building_object) 157 | self.obstacle38 = self.canvas_widget.create_image( 158 | self.pixels * 5, self.pixels * 10, anchor='nw', image=self.tree_object) 159 | self.obstacle39 = self.canvas_widget.create_image( 160 | self.pixels * 12, self.pixels * 10, anchor='nw', image=self.obstacle_object) 161 | self.obstacle40 = self.canvas_widget.create_image( 162 | self.pixels * 1, self.pixels * 11, anchor='nw', image=self.tree_object) 163 | self.obstacle41 = self.canvas_widget.create_image( 164 | self.pixels * 6, self.pixels * 11, anchor='nw', image=self.obstacle_object) 165 | self.obstacle42 = self.canvas_widget.create_image( 166 | self.pixels * 9, self.pixels * 11, anchor='nw', image=self.tree_object) 167 | self.obstacle43 = self.canvas_widget.create_image( 168 | self.pixels * 12, self.pixels * 12, anchor='nw', image=self.garbage_object) 169 | self.obstacle44 = self.canvas_widget.create_image( 170 | self.pixels * 13, self.pixels * 12, anchor='nw', image=self.garbage_object) 171 | self.obstacle45 = self.canvas_widget.create_image( 172 | self.pixels * 14, self.pixels * 12, anchor='nw', image=self.garbage_object) 173 | self.obstacle46 = self.canvas_widget.create_image( 174 | self.pixels * 2, self.pixels * 13, anchor='nw', image=self.obstacle_object) 175 | self.obstacle47 = self.canvas_widget.create_image( 176 | self.pixels * 4, self.pixels * 13, anchor='nw', image=self.building_object) 177 | self.obstacle48 = self.canvas_widget.create_image( 178 | self.pixels * 7, self.pixels * 13, anchor='nw', image=self.obstacle_object) 179 | self.obstacle49 = self.canvas_widget.create_image( 180 | self.pixels * 12, self.pixels * 13, anchor='nw', image=self.garbage_object) 181 | self.obstacle50 = self.canvas_widget.create_image( 182 | self.pixels * 14, self.pixels * 13, anchor='nw', image=self.garbage_object) 183 | self.obstacle51 = self.canvas_widget.create_image( 184 | self.pixels * 0, self.pixels * 14, anchor='nw', image=self.building_object) 185 | self.obstacle52 = self.canvas_widget.create_image( 186 | self.pixels * 14, self.pixels * 14, anchor='nw', image=self.garbage_object) 187 | 188 | img_flag = Image.open("images/flag.png") 189 | self.flag_object = ImageTk.PhotoImage(img_flag) 190 | self.flag = self.canvas_widget.create_image( 191 | self.pixels * 13, self.pixels * 13, anchor='nw', image=self.flag_object) 192 | 193 | img_robot = Image.open("images/agent.png") 194 | self.robot = ImageTk.PhotoImage(img_robot) 195 | self.agent = self.canvas_widget.create_image( 196 | 0, 0, anchor='nw', image=self.robot) 197 | 198 | self.canvas_widget.pack() 199 | 200 | def reset(self): 201 | ''' 202 | reset the environment and all parameters 203 | Return: 204 | the agent's current state in the format of [120.0, 40.0] 205 | ''' 206 | 207 | self.update() 208 | self.canvas_widget.delete(self.agent) 209 | self.agent = self.canvas_widget.create_image( 210 | 0, 0, anchor='nw', image=self.robot) 211 | self.comparison_dic = {} 212 | self.key_dic = 0 213 | return self.canvas_widget.coords(self.agent) 214 | 215 | def refresh(self): 216 | ''' 217 | update and refresh the environment before training 218 | ''' 219 | 220 | self.update() 221 | 222 | def step(self, action): 223 | ''' 224 | Moving the agent one pixel and update reward, action and next step regarding the agent next location 225 | 226 | Parameters: 
227 | action: Actions = {0:'up', 1:'down', 2:'right', 3:'left} 228 | 229 | Returns: 230 | reward, next step and done flag 231 | ''' 232 | 233 | state = self.canvas_widget.coords(self.agent) 234 | base_action = np.array([0, 0]) 235 | 236 | if action == 0: 237 | if state[1] >= self.pixels: 238 | base_action[1] -= self.pixels 239 | elif action == 1: 240 | if state[1] < (self.env_height - 1) * self.pixels: 241 | base_action[1] += self.pixels 242 | elif action == 2: 243 | if state[0] < (self.env_width - 1) * self.pixels: 244 | base_action[0] += self.pixels 245 | elif action == 3: 246 | if state[0] >= self.pixels: 247 | base_action[0] -= self.pixels 248 | 249 | self.canvas_widget.move(self.agent, base_action[0], base_action[1]) 250 | self.comparison_dic[self.key_dic] = self.canvas_widget.coords( 251 | self.agent) # storing new position of agent 252 | next_state = self.comparison_dic[self.key_dic] 253 | self.key_dic += 1 # add next key in dictionary 254 | 255 | if next_state == self.canvas_widget.coords(self.flag): 256 | reward = 100 257 | next_state = 'Goal' 258 | done = True 259 | 260 | # filling the dictionary first time 261 | if self.fake: 262 | for j in range(len(self.comparison_dic)): 263 | self.path_dic[j] = self.comparison_dic[j] 264 | self.fake = False 265 | self.longest_path = len(self.comparison_dic) 266 | self.shortest_path = len(self.comparison_dic) 267 | 268 | # storing shortest path 269 | if len(self.comparison_dic) < len(self.path_dic): 270 | self.shortest_path = len(self.comparison_dic) 271 | self.path_dic = {} 272 | for j in range(len(self.comparison_dic)): 273 | self.path_dic[j] = self.comparison_dic[j] 274 | 275 | # storing longest path 276 | if len(self.comparison_dic) > self.longest_path: 277 | self.longest_path = len(self.comparison_dic) 278 | 279 | elif next_state in [self.canvas_widget.coords(self.obstacle1), 280 | self.canvas_widget.coords(self.obstacle2), 281 | self.canvas_widget.coords(self.obstacle3), 282 | self.canvas_widget.coords(self.obstacle4), 283 | self.canvas_widget.coords(self.obstacle5), 284 | self.canvas_widget.coords(self.obstacle6), 285 | self.canvas_widget.coords(self.obstacle7), 286 | self.canvas_widget.coords(self.obstacle8), 287 | self.canvas_widget.coords(self.obstacle9), 288 | self.canvas_widget.coords(self.obstacle10), 289 | self.canvas_widget.coords(self.obstacle11), 290 | self.canvas_widget.coords(self.obstacle12), 291 | self.canvas_widget.coords(self.obstacle13), 292 | self.canvas_widget.coords(self.obstacle14), 293 | self.canvas_widget.coords(self.obstacle15), 294 | self.canvas_widget.coords(self.obstacle16), 295 | self.canvas_widget.coords(self.obstacle17), 296 | self.canvas_widget.coords(self.obstacle19), 297 | self.canvas_widget.coords(self.obstacle20), 298 | self.canvas_widget.coords(self.obstacle21), 299 | self.canvas_widget.coords(self.obstacle22), 300 | self.canvas_widget.coords(self.obstacle23), 301 | self.canvas_widget.coords(self.obstacle24), 302 | self.canvas_widget.coords(self.obstacle25), 303 | self.canvas_widget.coords(self.obstacle26), 304 | self.canvas_widget.coords(self.obstacle27), 305 | self.canvas_widget.coords(self.obstacle28), 306 | self.canvas_widget.coords(self.obstacle30), 307 | self.canvas_widget.coords(self.obstacle31), 308 | self.canvas_widget.coords(self.obstacle32), 309 | self.canvas_widget.coords(self.obstacle33), 310 | self.canvas_widget.coords(self.obstacle34), 311 | self.canvas_widget.coords(self.obstacle35), 312 | self.canvas_widget.coords(self.obstacle36), 313 | self.canvas_widget.coords(self.obstacle37), 314 
| self.canvas_widget.coords(self.obstacle38), 315 | self.canvas_widget.coords(self.obstacle39), 316 | self.canvas_widget.coords(self.obstacle40), 317 | self.canvas_widget.coords(self.obstacle41), 318 | self.canvas_widget.coords(self.obstacle42), 319 | self.canvas_widget.coords(self.obstacle43), 320 | self.canvas_widget.coords(self.obstacle44), 321 | self.canvas_widget.coords(self.obstacle45), 322 | self.canvas_widget.coords(self.obstacle46), 323 | self.canvas_widget.coords(self.obstacle47), 324 | self.canvas_widget.coords(self.obstacle48), 325 | self.canvas_widget.coords(self.obstacle49), 326 | self.canvas_widget.coords(self.obstacle50), 327 | self.canvas_widget.coords(self.obstacle51), 328 | self.canvas_widget.coords(self.obstacle52)]: 329 | reward = -5 330 | done = True 331 | next_state = 'Obstacle' 332 | self.comparison_dic = {} 333 | self.key_dic = 0 334 | 335 | elif next_state in [self.canvas_widget.coords(self.obstacle29)]: 336 | reward = -1 337 | done = True 338 | next_state = 'Rubik' 339 | self.comparison_dic = {} 340 | self.key_dic = 0 341 | 342 | else: 343 | reward = 0 344 | done = False 345 | 346 | return next_state, reward, done 347 | 348 | def final_path(self): 349 | ''' 350 | saving final path and showing graphically by balck ovals 351 | ''' 352 | 353 | origin_point = np.array([20, 20]) 354 | path_list = [] 355 | self.canvas_widget.delete(self.agent) 356 | for j in range(len(self.path_dic)): 357 | path_list.append(self.path_dic[j]) 358 | self.track = self.canvas_widget.create_oval( 359 | self.path_dic[j][0] + origin_point[0] - 12, 360 | self.path_dic[j][1] + origin_point[1] - 12, 361 | self.path_dic[j][0] + origin_point[0] + 12, 362 | self.path_dic[j][1] + origin_point[1] + 12, 363 | fill='black', 364 | outline='black') 365 | # putting the final route in a global variable 366 | global_variable[j] = self.path_dic[j] 367 | 368 | with open('data.txt', 'w') as f: 369 | f.write(f'The Shortest Path: {str(self.shortest_path)} \n') 370 | f.write(f'The Longest Path: {str(self.longest_path)} \n') 371 | f.write(f'Optimal Path: {str(path_list)} \n') 372 | 373 | 374 | def final_states(): 375 | '''final route coordination for plotting''' 376 | 377 | return global_variable 378 | -------------------------------------------------------------------------------- /sarsa/images/agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/agent.png -------------------------------------------------------------------------------- /sarsa/images/boot_tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/boot_tree.png -------------------------------------------------------------------------------- /sarsa/images/building.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/building.png -------------------------------------------------------------------------------- /sarsa/images/flag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/flag.png 
-------------------------------------------------------------------------------- /sarsa/images/garbage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/garbage.png -------------------------------------------------------------------------------- /sarsa/images/obstacle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/obstacle.png -------------------------------------------------------------------------------- /sarsa/images/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /sarsa/images/rubik.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/rubik.png -------------------------------------------------------------------------------- /sarsa/images/tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/tree.png -------------------------------------------------------------------------------- /sarsa/plot_results.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import numpy as np 3 | 4 | 5 | class Plots: 6 | 7 | def plot_reward(self, reward): 8 | plt.close() 9 | plt.plot(np.arange(len(reward)), reward, 'b') 10 | plt.title('Episodes vs Reward') 11 | plt.xlabel('Episodes') 12 | plt.ylabel('Reward') 13 | plt.grid() 14 | plt.savefig('reward.png') 15 | plt.show() 16 | 17 | def plot_steps(self, steps): 18 | plt.close() 19 | plt.plot(np.arange(len(steps)), steps, 'r') 20 | plt.title('Episodes vs Steps') 21 | plt.xlabel('Episodes') 22 | plt.ylabel('Steps') 23 | plt.grid() 24 | plt.savefig('steps.png') 25 | plt.show() 26 | 27 | def plot_value(self, value): 28 | plt.close() 29 | plt.plot(np.arange(len(value)), value, 'g') 30 | plt.title('Episodes vs Values') 31 | plt.xlabel('Episodes') 32 | plt.ylabel('Q-Values') 33 | plt.grid() 34 | plt.savefig('value.png') 35 | plt.show() -------------------------------------------------------------------------------- /sarsa/readme: -------------------------------------------------------------------------------- 1 | SARSA is an on-policy Temporal Difference (TD) control method that shares common features with Q-learning. The convergence properties of the SARSA algorithm depend on the nature of the policy's reliance on Q. As evident, SARSA necessitates the next action to update the Q table, and therefore, it selects the next action based on the next state. The action selection mechanism follows an epsilon-greedy approach, and all procedures are akin to Q-learning, with the exception of episodes during which SARSA requires more time to discover the optimal path. 2 | -------------------------------------------------------------------------------- /sarsa/run.py: -------------------------------------------------------------------------------- 1 | '''This is the main file. 
When you run it, the agent start training process.''' 2 | 3 | from environment import Environment 4 | from algo_sarsa import SARSA 5 | from plot_results import Plots 6 | 7 | 8 | def main(): 9 | 10 | total_steps = [] 11 | total_rewards = [] 12 | total_values = [] 13 | episodes = 10000 14 | 15 | for episode in range(episodes): 16 | state = env.reset() # it returns the coordination of the agent 17 | step = 0 18 | value = 0 19 | reward_value = 0 20 | action = RL.choose_action(str(state)) 21 | while True: 22 | env.refresh() 23 | next_state, reward, done = env.step(action) 24 | next_action = RL.choose_action(str(next_state)) 25 | value += RL.learning(str(state), action, reward, 26 | str(next_state), next_action) 27 | state = next_state 28 | action = next_action 29 | reward_value += reward 30 | step += 1 31 | 32 | if done: 33 | total_steps += [step] 34 | total_rewards += [reward_value] 35 | total_values += [value] 36 | break 37 | 38 | env.final_path() 39 | plot.plot_reward(total_rewards) 40 | plot.plot_steps(total_steps) 41 | plot.plot_value(total_values) 42 | RL.print_q_table() 43 | 44 | 45 | if __name__ == "__main__": 46 | env = Environment() 47 | RL = SARSA(actions=list(range(env.num_actions))) 48 | plot = Plots() 49 | env.after(10, main) 50 | env.mainloop() 51 | -------------------------------------------------------------------------------- /td_learning/Sample results/TD_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/Sample results/TD_0.png -------------------------------------------------------------------------------- /td_learning/Sample results/TDreward.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/Sample results/TDreward.png -------------------------------------------------------------------------------- /td_learning/Sample results/TDsteps.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/Sample results/TDsteps.png -------------------------------------------------------------------------------- /td_learning/Sample results/TDvalue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/Sample results/TDvalue.png -------------------------------------------------------------------------------- /td_learning/Sample results/data.txt: -------------------------------------------------------------------------------- 1 | The Shortest Path: 95 2 | The Longest Path: 336 3 | Optimal Path: [[0.0, 0.0], [40.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [40.0, 0.0], [40.0, 0.0], [0.0, 0.0], [40.0, 0.0], [0.0, 0.0], [40.0, 0.0], [40.0, 0.0], [40.0, 40.0], [40.0, 80.0], [40.0, 120.0], [40.0, 80.0], [40.0, 120.0], [40.0, 80.0], [40.0, 120.0], [40.0, 80.0], [80.0, 80.0], [80.0, 120.0], [40.0, 120.0], [40.0, 160.0], [0.0, 160.0], [0.0, 200.0], [0.0, 160.0], [0.0, 160.0], [0.0, 200.0], [40.0, 200.0], [80.0, 200.0], [120.0, 200.0], [120.0, 160.0], [160.0, 160.0], [160.0, 120.0], 
[200.0, 120.0], [240.0, 120.0], [240.0, 160.0], [240.0, 200.0], [200.0, 200.0], [200.0, 240.0], [200.0, 200.0], [200.0, 240.0], [160.0, 240.0], [160.0, 280.0], [160.0, 240.0], [160.0, 280.0], [160.0, 240.0], [160.0, 280.0], [160.0, 240.0], [160.0, 200.0], [120.0, 200.0], [80.0, 200.0], [120.0, 200.0], [160.0, 200.0], [200.0, 200.0], [240.0, 200.0], [280.0, 200.0], [280.0, 240.0], [280.0, 200.0], [240.0, 200.0], [200.0, 200.0], [200.0, 240.0], [200.0, 280.0], [160.0, 280.0], [160.0, 320.0], [160.0, 360.0], [160.0, 400.0], [160.0, 440.0], [200.0, 440.0], [200.0, 480.0], [240.0, 480.0], [280.0, 480.0], [320.0, 480.0], [360.0, 480.0], [360.0, 520.0], [400.0, 520.0], [400.0, 560.0], [440.0, 560.0], [480.0, 560.0], [440.0, 560.0], [480.0, 560.0], [480.0, 560.0], [480.0, 560.0], [480.0, 560.0], [520.0, 560.0], [520.0, 560.0], [520.0, 560.0], [520.0, 520.0]] 4 | Final Path V-table: state value 5 | [0.0, 0.0] -0.048587 6 | [40.0, 0.0] -0.232032 7 | [40.0, 40.0] -1.110985 8 | [40.0, 80.0] -0.677959 9 | [40.0, 120.0] -0.082712 10 | [80.0, 80.0] -0.580085 11 | [80.0, 120.0] -0.226284 12 | [40.0, 160.0] -0.109950 13 | [0.0, 160.0] -0.126974 14 | [0.0, 200.0] -0.119014 15 | [40.0, 200.0] -0.097870 16 | [80.0, 200.0] -0.199502 17 | [120.0, 200.0] -0.271317 18 | [120.0, 160.0] -0.133922 19 | [160.0, 160.0] -0.299506 20 | [160.0, 120.0] -3.808410 21 | [200.0, 120.0] -1.061209 22 | [240.0, 120.0] -3.797218 23 | [240.0, 160.0] -0.847080 24 | [240.0, 200.0] -0.352839 25 | [200.0, 200.0] -0.517903 26 | [200.0, 240.0] -0.222678 27 | [160.0, 240.0] -2.248170 28 | [160.0, 280.0] -0.168093 29 | [160.0, 200.0] -0.177441 30 | [280.0, 200.0] -0.182979 31 | [280.0, 240.0] -0.450525 32 | [200.0, 280.0] -0.207689 33 | [160.0, 320.0] -4.799294 34 | [160.0, 360.0] -2.910989 35 | [160.0, 400.0] -4.860101 36 | [160.0, 440.0] -1.272738 37 | [200.0, 440.0] -4.681947 38 | [200.0, 480.0] -3.562581 39 | [240.0, 480.0] -3.315708 40 | [280.0, 480.0] -0.111897 41 | [320.0, 480.0] 0.119484 42 | [360.0, 480.0] 0.306290 43 | [360.0, 520.0] 0.335131 44 | [400.0, 520.0] 0.408170 45 | [400.0, 560.0] 3.452500 46 | [440.0, 560.0] 6.391877 47 | [480.0, 560.0] 25.508254 48 | [520.0, 560.0] 65.003347 49 | Full V-table:: state value 50 | [0.0, 0.0] -0.048587 51 | [40.0, 0.0] -0.232032 52 | [40.0, 40.0] -1.110985 53 | [40.0, 80.0] -0.677959 54 | [0.0, 80.0] -4.789541 55 | ... ... 
56 | [520.0, 560.0] 65.003347 57 | [480.0, 0.0] -1.821095 58 | Goal 0.000000 59 | [520.0, 0.0] -3.951864 60 | [560.0, 160.0] -4.993713 61 | 62 | [176 rows x 1 columns] 63 | -------------------------------------------------------------------------------- /td_learning/Sample results/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /td_learning/algo_td0.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from environment import final_states 4 | 5 | 6 | class TemporalDifference: 7 | 8 | def __init__(self): 9 | ''' 10 | TD(0) initial parameters 11 | 12 | Parameters 13 | ---------- 14 | alpha : float 15 | learning rate 16 | gamma : float 17 | discount factor 18 | epsilon : float 19 | exploration probability 20 | decay_factor : float 21 | v_table : pandas DataFrame 22 | V-table with state values as a column 23 | v_table_final : pandas DataFrame 24 | final V-table 25 | ''' 26 | 27 | self.alpha = 0.9 28 | self.gamma = 0.9 29 | self.epsilon = 0.5 30 | self.decay_factor = 0.99995 31 | self.v_table = pd.DataFrame(columns=['state value'], dtype=np.float64) 32 | self.v_table_final = pd.DataFrame( 33 | columns=['state value'], dtype=np.float64) 34 | 35 | def learning(self, state, reward, next_state): 36 | ''' 37 | Updates the V-table with the newly observed transition 38 | 39 | Parameters: 40 | state: current state of the agent 41 | reward: received reward 42 | next_state: next state that the agent 43 | will move to 44 | 45 | Returns: 46 | the updated value of the current state 47 | ''' 48 | self.check_state_exist(next_state) 49 | v_current = self.v_table.loc[state] 50 | if next_state not in ('Goal', 'Obstacle', 'Rubik'): # non-terminal: bootstrap 51 | v_target = reward + self.gamma * self.v_table.loc[next_state] 52 | else: 53 | v_target = reward 54 | self.v_table.loc[state] += self.alpha * (v_target - v_current) 55 | return self.v_table.loc[state] 56 | 57 | def check_state_exist(self, state): 58 | ''' 59 | Adds a new state to the V-table if it has not been visited yet 60 | ''' 61 | if state not in self.v_table.index: 62 | self.v_table = self.v_table.append( 63 | pd.Series([0] * 1, index=self.v_table.columns, name=state)) 64 | 65 | def print_v_table(self): 66 | ''' 67 | Saving the final-path V-table and the full V-table to data.txt 68 | ''' 69 | final_route = final_states() 70 | for i in range(len(final_route)): 71 | state = str(final_route[i]) 72 | for j in range(len(self.v_table.index)): 73 | if self.v_table.index[j] == state: 74 | self.v_table_final.loc[state] = self.v_table.loc[state] 75 | 76 | with open('data.txt', 'a') as f: 77 | f.write(f'Final Path V-table: {str(self.v_table_final)} \n') 78 | f.write(f'Full V-table:: {str(self.v_table)} \n') 79 | -------------------------------------------------------------------------------- /td_learning/environment.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from operator import add 3 | import tkinter as tk 4 | from PIL import Image, ImageTk 5 | 6 | 7 | global_variable = {} 8 | 9 | 10 | class Environment(tk.Tk, object): 11 | def __init__(self): 12 | ''' 13 | Constructs all the necessary attributes for the environment.
14 | 15 | Parameters 16 | ---------- 17 | num_actions: all actions including up, down, left, right 18 | pixels: environment pixels for each location 19 | env_height: number of vertical grids for the environment 20 | env_width: number of horizontal grids for the environment 21 | title: Tkinter environment title 22 | geometry: environmet geometry which is (w*px)*(h*px)+offsets 23 | comparison_dic: storing agnet pathway for each iteration 24 | path_dic: saving final pathway of teh agent 25 | key_dic: a counter for stroing paths 26 | fake: fake variable for reaching final Goal for first time 27 | longest_path: logest path to reach the Goal 28 | shortest_path: shortest path to reach the Goal 29 | ''' 30 | super(Environment, self).__init__() 31 | self.num_actions = 4 32 | self.pixels = 40 33 | self.env_height = 15 34 | self.env_width = 15 35 | self.title('Path Planing with Reinforcement Learning') 36 | self.geometry( 37 | f'{self.env_width * self.pixels }x{self.env_height * self.pixels }+600+250') 38 | self.build_environment() 39 | self.comparison_dic = {} 40 | self.path_dic = {} 41 | self.key_dic = 0 42 | self.fake = True 43 | self.longest_path = 0 44 | self.shortest_path = 0 45 | 46 | def build_environment(self): 47 | ''' 48 | environment creation by Tkinter 49 | ''' 50 | self.canvas_widget = tk.Canvas( 51 | self, 52 | bg='white', 53 | height=self.env_height * 54 | self.pixels, 55 | width=self.env_width * 56 | self.pixels) 57 | for column in range(0, self.env_width * self.pixels, self.pixels): 58 | x0, y0, x1, y1 = column, 0, column, self.env_height * self.pixels 59 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 60 | for row in range(0, self.env_height * self.pixels, self.pixels): 61 | x0, y0, x1, y1 = 0, row, self.env_height * self.pixels, row 62 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 63 | 64 | img_obstacle = Image.open("images/obstacle.png") 65 | self.obstacle_object = ImageTk.PhotoImage(img_obstacle) 66 | 67 | img_tree = Image.open("images/tree.png") 68 | self.tree_object = ImageTk.PhotoImage(img_tree) 69 | 70 | img_shop = Image.open("images/boot_tree.png") 71 | self.shop_object = ImageTk.PhotoImage(img_shop) 72 | 73 | img_building = Image.open("images/building.png") 74 | self.building_object = ImageTk.PhotoImage(img_building) 75 | 76 | img_cube = Image.open("images/rubik.png") 77 | self.cube_object = ImageTk.PhotoImage(img_cube) 78 | 79 | img_garbage = Image.open("images/garbage.png") 80 | self.garbage_object = ImageTk.PhotoImage(img_garbage) 81 | 82 | self.obstacle1 = self.canvas_widget.create_image( 83 | self.pixels * 2, 0, anchor='nw', image=self.obstacle_object) 84 | self.obstacle2 = self.canvas_widget.create_image( 85 | self.pixels * 9, 0, anchor='nw', image=self.tree_object) 86 | self.obstacle3 = self.canvas_widget.create_image( 87 | self.pixels * 11, 0, anchor='nw', image=self.obstacle_object) 88 | self.obstacle4 = self.canvas_widget.create_image( 89 | self.pixels * 14, 0, anchor='nw', image=self.shop_object) 90 | self.obstacle5 = self.canvas_widget.create_image( 91 | self.pixels * 5, 0, anchor='nw', image=self.obstacle_object) 92 | self.obstacle6 = self.canvas_widget.create_image( 93 | 0, self.pixels, anchor='nw', image=self.building_object) 94 | self.obstacle7 = self.canvas_widget.create_image( 95 | self.pixels * 7, self.pixels, anchor='nw', image=self.obstacle_object) 96 | self.obstacle8 = self.canvas_widget.create_image( 97 | self.pixels * 9, self.pixels, anchor='nw', image=self.obstacle_object) 98 | self.obstacle9 = 
self.canvas_widget.create_image( 99 | self.pixels * 13, self.pixels, anchor='nw', image=self.obstacle_object) 100 | self.obstacle10 = self.canvas_widget.create_image( 101 | self.pixels * 3, self.pixels * 2, anchor='nw', image=self.tree_object) 102 | self.obstacle11 = self.canvas_widget.create_image( 103 | self.pixels * 5, self.pixels * 2, anchor='nw', image=self.obstacle_object) 104 | self.obstacle12 = self.canvas_widget.create_image( 105 | self.pixels * 11, self.pixels * 2, anchor='nw', image=self.obstacle_object) 106 | self.obstacle13 = self.canvas_widget.create_image( 107 | self.pixels * 0, self.pixels * 3, anchor='nw', image=self.building_object) 108 | self.obstacle14 = self.canvas_widget.create_image( 109 | self.pixels * 2, self.pixels * 4, anchor='nw', image=self.shop_object) 110 | self.obstacle15 = self.canvas_widget.create_image( 111 | self.pixels * 8, self.pixels * 3, anchor='nw', image=self.obstacle_object) 112 | self.obstacle16 = self.canvas_widget.create_image( 113 | self.pixels * 9, self.pixels * 3, anchor='nw', image=self.tree_object) 114 | self.obstacle17 = self.canvas_widget.create_image( 115 | self.pixels * 14, self.pixels * 3, anchor='nw', image=self.obstacle_object) 116 | self.obstacle19 = self.canvas_widget.create_image( 117 | self.pixels * 5, self.pixels * 4, anchor='nw', image=self.building_object) 118 | self.obstacle20 = self.canvas_widget.create_image( 119 | self.pixels * 10, self.pixels * 4, anchor='nw', image=self.obstacle_object) 120 | self.obstacle21 = self.canvas_widget.create_image( 121 | self.pixels * 13, self.pixels * 4, anchor='nw', image=self.obstacle_object) 122 | self.obstacle22 = self.canvas_widget.create_image( 123 | self.pixels * 8, self.pixels * 5, anchor='nw', image=self.shop_object) 124 | self.obstacle23 = self.canvas_widget.create_image( 125 | self.pixels * 3, self.pixels * 6, anchor='nw', image=self.obstacle_object) 126 | self.obstacle24 = self.canvas_widget.create_image( 127 | self.pixels * 6, self.pixels * 6, anchor='nw', image=self.obstacle_object) 128 | self.obstacle25 = self.canvas_widget.create_image( 129 | self.pixels * 11, self.pixels * 6, anchor='nw', image=self.tree_object) 130 | self.obstacle26 = self.canvas_widget.create_image( 131 | self.pixels * 14, self.pixels * 6, anchor='nw', image=self.obstacle_object) 132 | self.obstacle27 = self.canvas_widget.create_image( 133 | self.pixels * 0, self.pixels * 7, anchor='nw', image=self.obstacle_object) 134 | self.obstacle28 = self.canvas_widget.create_image( 135 | self.pixels * 1, self.pixels * 7, anchor='nw', image=self.tree_object) 136 | self.obstacle29 = self.canvas_widget.create_image( 137 | self.pixels * 9, self.pixels * 7, anchor='nw', image=self.cube_object) 138 | self.obstacle30 = self.canvas_widget.create_image( 139 | self.pixels * 3, self.pixels * 8, anchor='nw', image=self.building_object) 140 | self.obstacle31 = self.canvas_widget.create_image( 141 | self.pixels * 5, self.pixels * 8, anchor='nw', image=self.obstacle_object) 142 | self.obstacle32 = self.canvas_widget.create_image( 143 | self.pixels * 9, self.pixels * 8, anchor='nw', image=self.shop_object) 144 | self.obstacle33 = self.canvas_widget.create_image( 145 | self.pixels * 12, self.pixels * 8, anchor='nw', image=self.tree_object) 146 | self.obstacle34 = self.canvas_widget.create_image( 147 | self.pixels * 14, self.pixels * 8, anchor='nw', image=self.obstacle_object) 148 | self.obstacle35 = self.canvas_widget.create_image( 149 | self.pixels * 0, self.pixels * 9, anchor='nw', image=self.shop_object) 150 | self.obstacle36 = 
self.canvas_widget.create_image( 151 | self.pixels * 7, self.pixels * 9, anchor='nw', image=self.obstacle_object) 152 | self.obstacle37 = self.canvas_widget.create_image( 153 | self.pixels * 3, self.pixels * 10, anchor='nw', image=self.building_object) 154 | self.obstacle38 = self.canvas_widget.create_image( 155 | self.pixels * 5, self.pixels * 10, anchor='nw', image=self.tree_object) 156 | self.obstacle39 = self.canvas_widget.create_image( 157 | self.pixels * 12, self.pixels * 10, anchor='nw', image=self.obstacle_object) 158 | self.obstacle40 = self.canvas_widget.create_image( 159 | self.pixels * 1, self.pixels * 11, anchor='nw', image=self.tree_object) 160 | self.obstacle41 = self.canvas_widget.create_image( 161 | self.pixels * 6, self.pixels * 11, anchor='nw', image=self.obstacle_object) 162 | self.obstacle42 = self.canvas_widget.create_image( 163 | self.pixels * 9, self.pixels * 11, anchor='nw', image=self.tree_object) 164 | self.obstacle43 = self.canvas_widget.create_image( 165 | self.pixels * 12, self.pixels * 12, anchor='nw', image=self.garbage_object) 166 | self.obstacle44 = self.canvas_widget.create_image( 167 | self.pixels * 13, self.pixels * 12, anchor='nw', image=self.garbage_object) 168 | self.obstacle45 = self.canvas_widget.create_image( 169 | self.pixels * 14, self.pixels * 12, anchor='nw', image=self.garbage_object) 170 | self.obstacle46 = self.canvas_widget.create_image( 171 | self.pixels * 2, self.pixels * 13, anchor='nw', image=self.obstacle_object) 172 | self.obstacle47 = self.canvas_widget.create_image( 173 | self.pixels * 4, self.pixels * 13, anchor='nw', image=self.building_object) 174 | self.obstacle48 = self.canvas_widget.create_image( 175 | self.pixels * 7, self.pixels * 13, anchor='nw', image=self.obstacle_object) 176 | self.obstacle49 = self.canvas_widget.create_image( 177 | self.pixels * 12, self.pixels * 13, anchor='nw', image=self.garbage_object) 178 | self.obstacle50 = self.canvas_widget.create_image( 179 | self.pixels * 14, self.pixels * 13, anchor='nw', image=self.garbage_object) 180 | self.obstacle51 = self.canvas_widget.create_image( 181 | self.pixels * 0, self.pixels * 14, anchor='nw', image=self.building_object) 182 | self.obstacle52 = self.canvas_widget.create_image( 183 | self.pixels * 14, self.pixels * 14, anchor='nw', image=self.garbage_object) 184 | 185 | img_flag = Image.open("images/flag.png") 186 | self.flag_object = ImageTk.PhotoImage(img_flag) 187 | self.flag = self.canvas_widget.create_image( 188 | self.pixels * 13, self.pixels * 13, anchor='nw', image=self.flag_object) 189 | 190 | img_robot = Image.open("images/agent.png") 191 | self.robot = ImageTk.PhotoImage(img_robot) 192 | self.agent = self.canvas_widget.create_image( 193 | 0, 0, anchor='nw', image=self.robot) 194 | 195 | self.canvas_widget.pack() 196 | 197 | self.obstacles = [self.canvas_widget.coords(self.obstacle1), 198 | self.canvas_widget.coords(self.obstacle2), 199 | self.canvas_widget.coords(self.obstacle3), 200 | self.canvas_widget.coords(self.obstacle4), 201 | self.canvas_widget.coords(self.obstacle5), 202 | self.canvas_widget.coords(self.obstacle6), 203 | self.canvas_widget.coords(self.obstacle7), 204 | self.canvas_widget.coords(self.obstacle8), 205 | self.canvas_widget.coords(self.obstacle9), 206 | self.canvas_widget.coords(self.obstacle10), 207 | self.canvas_widget.coords(self.obstacle11), 208 | self.canvas_widget.coords(self.obstacle12), 209 | self.canvas_widget.coords(self.obstacle13), 210 | self.canvas_widget.coords(self.obstacle14), 211 | 
self.canvas_widget.coords(self.obstacle15), 212 | self.canvas_widget.coords(self.obstacle16), 213 | self.canvas_widget.coords(self.obstacle17), 214 | self.canvas_widget.coords(self.obstacle19), 215 | self.canvas_widget.coords(self.obstacle20), 216 | self.canvas_widget.coords(self.obstacle21), 217 | self.canvas_widget.coords(self.obstacle22), 218 | self.canvas_widget.coords(self.obstacle23), 219 | self.canvas_widget.coords(self.obstacle24), 220 | self.canvas_widget.coords(self.obstacle25), 221 | self.canvas_widget.coords(self.obstacle26), 222 | self.canvas_widget.coords(self.obstacle27), 223 | self.canvas_widget.coords(self.obstacle28), 224 | self.canvas_widget.coords(self.obstacle30), 225 | self.canvas_widget.coords(self.obstacle31), 226 | self.canvas_widget.coords(self.obstacle32), 227 | self.canvas_widget.coords(self.obstacle33), 228 | self.canvas_widget.coords(self.obstacle34), 229 | self.canvas_widget.coords(self.obstacle35), 230 | self.canvas_widget.coords(self.obstacle36), 231 | self.canvas_widget.coords(self.obstacle37), 232 | self.canvas_widget.coords(self.obstacle38), 233 | self.canvas_widget.coords(self.obstacle39), 234 | self.canvas_widget.coords(self.obstacle40), 235 | self.canvas_widget.coords(self.obstacle41), 236 | self.canvas_widget.coords(self.obstacle42), 237 | self.canvas_widget.coords(self.obstacle43), 238 | self.canvas_widget.coords(self.obstacle44), 239 | self.canvas_widget.coords(self.obstacle45), 240 | self.canvas_widget.coords(self.obstacle46), 241 | self.canvas_widget.coords(self.obstacle47), 242 | self.canvas_widget.coords(self.obstacle48), 243 | self.canvas_widget.coords(self.obstacle49), 244 | self.canvas_widget.coords(self.obstacle50), 245 | self.canvas_widget.coords(self.obstacle51), 246 | self.canvas_widget.coords(self.obstacle52)] 247 | 248 | def reset(self): 249 | ''' 250 | reset the environment and all parameters 251 | Return: 252 | the agent's current state as canvas coordinates, e.g. [120.0, 40.0] 253 | ''' 254 | self.update() 255 | self.canvas_widget.delete(self.agent) 256 | self.agent = self.canvas_widget.create_image( 257 | 0, 0, anchor='nw', image=self.robot) 258 | self.comparison_dic = {} 259 | self.key_dic = 0 260 | return self.canvas_widget.coords(self.agent) 261 | 262 | def refresh(self): 263 | ''' 264 | update and refresh the environment at each training step 265 | ''' 266 | self.update() 267 | 268 | def policy(self, state): 269 | ''' 270 | choose an action based on the agent's current state; any direction that leads straight into an obstacle is excluded from the random choice 271 | ''' 272 | # right 273 | if list(map(add, state, [40, 0])) in self.obstacles: 274 | action = np.random.choice([0, 1, 3]) 275 | elif list(map(add, state, [-40, 0])) in self.obstacles:  # left 276 | action = np.random.choice([0, 1, 2]) 277 | # down 278 | elif list(map(add, state, [0, 40])) in self.obstacles: 279 | action = np.random.choice([0, 2, 3]) 280 | elif list(map(add, state, [0, -40])) in self.obstacles:  # up 281 | action = np.random.choice([1, 2, 3]) 282 | else: 283 | action = np.random.choice(self.num_actions) 284 | return action 285 | 286 | def step(self, action): 287 | ''' 288 | Move the agent by one cell (40 pixels) and update the reward, next state, and done flag according to the agent's new location 289 | 290 | Parameters: 291 | action: Actions = {0:'up', 1:'down', 2:'right', 3:'left'} 292 | 293 | Returns: 294 | next state, reward and done flag 295 | ''' 296 | state = self.canvas_widget.coords(self.agent) 297 | base_action = np.array([0, 0]) 298 | 299 | if action == 0: 300 | if state[1] >= self.pixels: 301 | base_action[1] -= self.pixels 302 | elif action == 1: 303 | if state[1] < 
(self.env_height - 1) * self.pixels: 304 | base_action[1] += self.pixels 305 | elif action == 2: 306 | if state[0] < (self.env_width - 1) * self.pixels: 307 | base_action[0] += self.pixels 308 | elif action == 3: 309 | if state[0] >= self.pixels: 310 | base_action[0] -= self.pixels 311 | 312 | self.canvas_widget.move(self.agent, base_action[0], base_action[1]) 313 | self.comparison_dic[self.key_dic] = self.canvas_widget.coords( 314 | self.agent) # storing the agent's new position 315 | next_state = self.comparison_dic[self.key_dic] 316 | self.key_dic += 1 # advance to the next key in the dictionary 317 | 318 | if next_state == self.canvas_widget.coords(self.flag): 319 | reward = 100 320 | next_state = 'Goal' 321 | done = True 322 | 323 | # filling the path dictionary the first time the goal is reached 324 | if self.fake: 325 | for j in range(len(self.comparison_dic)): 326 | self.path_dic[j] = self.comparison_dic[j] 327 | self.fake = False 328 | self.longest_path = len(self.comparison_dic) 329 | self.shortest_path = len(self.comparison_dic) 330 | 331 | # storing the shortest path 332 | if len(self.comparison_dic) < len(self.path_dic): 333 | self.shortest_path = len(self.comparison_dic) 334 | self.path_dic = {} 335 | for j in range(len(self.comparison_dic)): 336 | self.path_dic[j] = self.comparison_dic[j] 337 | 338 | # storing the longest path 339 | if len(self.comparison_dic) > self.longest_path: 340 | self.longest_path = len(self.comparison_dic) 341 | 342 | elif next_state in self.obstacles: 343 | reward = -5 344 | done = True 345 | next_state = 'Obstacle' 346 | self.comparison_dic = {} 347 | self.key_dic = 0 348 | 349 | elif next_state in [self.canvas_widget.coords(self.obstacle29)]: 350 | reward = -1 351 | done = True 352 | next_state = 'Rubik' 353 | self.comparison_dic = {} 354 | self.key_dic = 0 355 | 356 | else: 357 | reward = 0 358 | done = False 359 | 360 | return next_state, reward, done 361 | 362 | def final_path(self): 363 | ''' 364 | save the final path and display it graphically with black ovals 365 | ''' 366 | origin_point = np.array([20, 20]) 367 | path_list = [] 368 | self.canvas_widget.delete(self.agent) 369 | for j in range(len(self.path_dic)): 370 | path_list.append(self.path_dic[j]) 371 | self.track = self.canvas_widget.create_oval( 372 | self.path_dic[j][0] + origin_point[0] - 12, 373 | self.path_dic[j][1] + origin_point[1] - 12, 374 | self.path_dic[j][0] + origin_point[0] + 12, 375 | self.path_dic[j][1] + origin_point[1] + 12, 376 | fill='black', 377 | outline='black') 378 | # putting the final route in a global variable 379 | global_variable[j] = self.path_dic[j] 380 | 381 | with open('data.txt', 'w') as f: 382 | f.write(f'The Shortest Path: {str(self.shortest_path)} \n') 383 | f.write(f'The Longest Path: {str(self.longest_path)} \n') 384 | f.write(f'Optimal Path: {str(path_list)} \n') 385 | 386 | 387 | def final_states(): 388 | '''final route coordinates for plotting''' 389 | 390 | return global_variable 391 | -------------------------------------------------------------------------------- /td_learning/images/agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/agent.png -------------------------------------------------------------------------------- /td_learning/images/boot_tree.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/boot_tree.png -------------------------------------------------------------------------------- /td_learning/images/building.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/building.png -------------------------------------------------------------------------------- /td_learning/images/flag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/flag.png -------------------------------------------------------------------------------- /td_learning/images/garbage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/garbage.png -------------------------------------------------------------------------------- /td_learning/images/obstacle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/obstacle.png -------------------------------------------------------------------------------- /td_learning/images/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /td_learning/images/rubik.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/rubik.png -------------------------------------------------------------------------------- /td_learning/images/tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/tree.png -------------------------------------------------------------------------------- /td_learning/plot_results.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import numpy as np 3 | 4 | 5 | class Plots: 6 | 7 | def plot_reward(self, reward): 8 | plt.close() 9 | plt.plot(np.arange(len(reward)), reward, 'b') 10 | plt.title('Episodes vs Reward') 11 | plt.xlabel('Episodes') 12 | plt.ylabel('Reward') 13 | plt.grid() 14 | plt.savefig('reward.png') 15 | plt.show() 16 | 17 | def plot_steps(self, steps): 18 | plt.close() 19 | plt.plot(np.arange(len(steps)), steps, 'r') 20 | plt.title('Episodes vs Steps') 21 | plt.xlabel('Episodes') 22 | plt.ylabel('Steps') 23 | plt.grid() 24 | plt.savefig('steps.png') 25 | plt.show() 26 | 27 | def plot_value(self, value): 28 | plt.close() 29 | plt.plot(np.arange(len(value)), value, 'g') 30 | plt.title('Episodes vs Values') 31 | plt.xlabel('Episodes') 32 | plt.ylabel('Q-Values') 33 | plt.grid() 34 | plt.savefig('value.png') 35 | plt.show() 36 | -------------------------------------------------------------------------------- 
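The `Plots` class above is a thin wrapper around matplotlib: each method draws one curve over the episode index, saves it to a fixed filename (reward.png, steps.png, value.png) and then shows it. Below is a minimal standalone usage sketch; the per-episode lists here are hypothetical placeholders, whereas in practice run.py (shown next) fills them during training.

```python
# Standalone usage sketch for the Plots helper above (dummy data only).
from plot_results import Plots

plot = Plots()
dummy_rewards = [-5, -5, -1, 0, 100]      # hypothetical per-episode returns
dummy_steps = [12, 30, 25, 40, 18]        # hypothetical per-episode step counts
dummy_values = [0.1, 0.4, 0.9, 1.3, 2.0]  # hypothetical per-episode value sums

plot.plot_reward(dummy_rewards)  # saves reward.png and shows the figure
plot.plot_steps(dummy_steps)     # saves steps.png
plot.plot_value(dummy_values)    # saves value.png
```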
/td_learning/readme: -------------------------------------------------------------------------------- 1 | "The Temporal Difference (TD) method uses experience to address the prediction problem. Given experience gathered while following a policy, it updates its estimate V for the non-terminal states encountered in that experience. In this implementation, actions are selected by a simple behaviour policy: in each state the agent chooses an action at random, and whenever a move (for example, going right) would lead into an obstacle, that action is excluded and the agent picks randomly among the remaining ones; the same rule applies to the other directions." 2 | -------------------------------------------------------------------------------- /td_learning/run.py: -------------------------------------------------------------------------------- 1 | '''This is the main file. When you run it, the agent starts the training process.''' 2 | 3 | from environment import Environment 4 | from algo_td0 import TemporalDifference 5 | from plot_results import Plots 6 | 7 | 8 | def main(): 9 | total_steps = [] 10 | total_rewards = [] 11 | total_values = [] 12 | episodes = 20000 13 | 14 | for episode in range(episodes): 15 | state = env.reset() 16 | step = 0 17 | value = 0 18 | reward_value = 0 19 | while True: 20 | env.refresh() 21 | action = env.policy(state) 22 | next_state, reward, done = env.step(action) 23 | value += RL.learning(str(state), reward, str(next_state)) 24 | state = next_state 25 | reward_value += reward 26 | step += 1 27 | 28 | if done: 29 | total_steps += [step] 30 | total_rewards += [reward_value] 31 | total_values += [value] 32 | break 33 | 34 | env.final_path() 35 | plot.plot_reward(total_rewards) 36 | plot.plot_steps(total_steps) 37 | plot.plot_value(total_values) 38 | RL.print_v_table() 39 | 40 | 41 | if __name__ == "__main__": 42 | env = Environment() 43 | # RL = TemporalDifference(actions=list(range(env.num_actions))) 44 | RL = TemporalDifference() 45 | plot = Plots() 46 | env.after(10, main) 47 | env.mainloop() 48 | -------------------------------------------------------------------------------- /training.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/training.gif --------------------------------------------------------------------------------
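A closing note on the TD(0) learner driven by td_learning/run.py: algo_td0.py itself is not reproduced in this listing, so the sketch below only illustrates the standard TD(0) update, V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)], that the calls in run.py suggest; it is not the repository's actual implementation. The method names `learning()` and `print_v_table()` and the terminal labels 'Goal', 'Obstacle' and 'Rubik' come from the files above, while the default alpha and gamma values and the internal table layout are assumptions.

```python
# Hedged sketch of a TD(0) state-value learner compatible with run.py above.
# Not the repository's algo_td0.py; default alpha/gamma and the dict-based
# value table are illustrative assumptions.
import pandas as pd


class TemporalDifference:
    def __init__(self, alpha=0.01, gamma=0.9):
        self.alpha = alpha   # learning rate (assumed default)
        self.gamma = gamma   # discount factor (assumed default)
        self.v_table = {}    # state (as str) -> estimated value V(s)

    def _check_state(self, state):
        # lazily register unseen states with an initial value of 0
        if state not in self.v_table:
            self.v_table[state] = 0.0

    def learning(self, state, reward, next_state):
        '''TD(0) update: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]'''
        self._check_state(state)
        # terminal outcomes in environment.py are labelled 'Goal',
        # 'Obstacle' and 'Rubik'; their value is taken as zero
        if next_state in ('Goal', 'Obstacle', 'Rubik'):
            target = reward
        else:
            self._check_state(next_state)
            target = reward + self.gamma * self.v_table[next_state]
        self.v_table[state] += self.alpha * (target - self.v_table[state])
        return self.v_table[state]

    def print_v_table(self):
        # dump the learned state values, mirroring the call in run.py
        print(pd.Series(self.v_table).sort_index())
```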