├── Project GitHub.pdf ├── README.md ├── cliff_walking ├── Sample results │ ├── cliff env.png │ ├── cliff_data.txt │ ├── readme │ ├── rewardcliff.png │ ├── stepscliff.png │ ├── toute.png │ └── valuescliff.png ├── algo_double_qlearning.py ├── algo_qlearning.py ├── algo_sarsa.py ├── env_cliff.py ├── images │ ├── cliff.png │ ├── end.png │ ├── readme │ ├── robot.png │ └── start.png ├── plot_results.py ├── readme └── run.py ├── env1.png ├── env2.jpg ├── q_learning ├── Sample results │ ├── QLreward.png │ ├── QLsteps.png │ ├── QLvalue.png │ ├── data.txt │ ├── readme │ └── result QL.png ├── algo_qlearning.py ├── environment.py ├── images │ ├── agent.png │ ├── boot_tree.png │ ├── building.png │ ├── flag.png │ ├── garbage.png │ ├── obstacle.png │ ├── readme │ ├── rubik.png │ └── tree.png ├── plot_results.py ├── readme └── run.py ├── sarsa ├── Sample results │ ├── data.txt │ ├── readme │ ├── result sarsa.png │ ├── sarsareward.png │ ├── sarsastep.png │ └── sarsavalue.png ├── algo_sarsa.py ├── environment.py ├── images │ ├── agent.png │ ├── boot_tree.png │ ├── building.png │ ├── flag.png │ ├── garbage.png │ ├── obstacle.png │ ├── readme │ ├── rubik.png │ └── tree.png ├── plot_results.py ├── readme └── run.py ├── td_learning ├── Sample results │ ├── TD_0.png │ ├── TDreward.png │ ├── TDsteps.png │ ├── TDvalue.png │ ├── data.txt │ └── readme ├── algo_td0.py ├── environment.py ├── images │ ├── agent.png │ ├── boot_tree.png │ ├── building.png │ ├── flag.png │ ├── garbage.png │ ├── obstacle.png │ ├── readme │ ├── rubik.png │ └── tree.png ├── plot_results.py ├── readme └── run.py └── training.gif /Project GitHub.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/Project GitHub.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Global Path Planning with Reinforcement Learning Algorithms to Help the WALL-E Robot Reach EVE! 2 | 3 | ## About the Project 4 | 5 | ### Introduction 6 | 7 | This project aims to test various reinforcement learning (RL) algorithms for the global path planning of a mobile robot. The environment is designed based on the popular WALL-E animation, and the tested algorithms include Q-learning, SARSA, TD(0) learning, and double Q-learning (Temporal Difference (TD) learning is a combination of Monte Carlo principles and dynamic programming (DP) concepts). 8 | 9 | ### Environment 1 10 | 11 | The environment is designed with Tkinter, a standard Python interface to the Tcl/Tk GUI toolkit. In this environment, the WALL-E  robot wants to reach the goal, which is the EVE robot, but there are numerous obstacles in the path that it must avoid. 12 | The environment size is 15x15, in which each square is 40x40 pixels, and there are 52 obstacles inside it, including trees, buildings, garbage, road signs, a plant in the boot (based on animation), and a Rubik's cube. The upside left corner (the agent starting position) is (0, 0), and going to the right and down is +X and +Y, respectively. For example, two steps to the right and one step to the down move the agent to the [80, 40] location. 13 | The below figure shows a screenshot of the environment. The position of obstacles and the blocking area around the goal are considered in such a way that it is not easy for the agent to find an optimal path. 
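As a quick illustration of this coordinate convention, here is a minimal sketch (the constant and function names are illustrative, not part of the project code):

```python
PIXELS = 40  # size of one grid square in pixels, as used by the environments

def grid_to_canvas(col, row):
    """Convert a grid cell (col, row) into the [x, y] pixel state used by the environment."""
    return [col * PIXELS, row * PIXELS]

print(grid_to_canvas(2, 1))  # two steps right, one step down -> [80, 40]
```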
14 | 15 | ![Wall-e environment](env1.png) 16 | 17 | ### Environment 2 18 | 19 | In addition to the environment above, the algorithms are also tested in a classic benchmark environment. The second problem is Cliff Walking from *"Reinforcement Learning: An Introduction"* by Richard S. Sutton and Andrew G. Barto. This problem is solved with the same algorithms, and all results and output data are presented in the project's PDF file. You can run the code, change the variables, or even modify the environment's features. 20 | 21 | ![Cliff environment](env2.jpg) 22 | 23 | ### General concept 24 | 25 | This RL problem is modeled as a Markov decision process (MDP) with state, action, and reward sets *S*, *A*, and *R*. The environment dynamics are the probabilities *p(s', r | s, a)* over all states, actions, and rewards; however, the test environments are deterministic, so there are no stochastic transitions. 26 | During training, the agent can move up, down, right, or left to reach the goal. Each time the agent hits an obstacle, it receives a reward of -5 and the system resets to the starting point. If the robot reaches the goal, it receives a large reward of +100; all other moves yield a reward of zero. In keeping with the animation, WALL-E's interest in a Rubik's cube is modeled as a motivation along the way: it is not the goal, but it carries a -1 reward. Given the number of obstacles (especially around the goal), the size of the environment, and the motivation placed between the paths, this problem is a good benchmark for several popular RL algorithms. 27 | 28 | 29 | ## Installation 30 | 31 | The easiest setup is to create a Conda environment (Python 3.9) and install the libraries below: 32 | 33 | ```bash 34 | conda create -n rl_env python=3.9 35 | conda activate rl_env 36 | conda install -c anaconda pandas 37 | conda install -c anaconda numpy 38 | conda install matplotlib 39 | conda install -c anaconda tk 40 | conda install -c anaconda pillow 41 | ``` 42 | 43 | ## Usage 44 | 45 | First, clone the repository into your destination folder. 46 | 47 | ```bash 48 | git clone https://github.com/pouyan-asg/path-planning-with-RL-algorithms.git 49 | ``` 50 | Then, go to the folder of the algorithm you want to test and run: 51 | 52 | ```bash 53 | python run.py 54 | 55 | ``` 56 | The training process of the agent is illustrated in the animated GIF below; observable results emerge over time, and sample results are provided for each algorithm. You can also modify parameters such as the learning rate, discount factor, and number of episodes to assess and refine the results.
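For example, in the Cliff Walking experiment the training length and exploration schedule are set at the top of `cliff_walking/run.py`, and the update-rule constants live inside the algorithm classes (e.g. `QLearning.__init__` in `cliff_walking/algo_qlearning.py`); the excerpts below quote those values:

```python
# cliff_walking/run.py -- training length and exploration schedule (Q-learning run)
episodes = 500
epsilon1 = 0.1          # initial exploration rate
decay_factor1 = 0.999   # epsilon is multiplied by this factor every episode

# cliff_walking/algo_qlearning.py, QLearning.__init__ -- update-rule constants
self.alpha = 0.9        # learning rate
self.gamma = 0.9        # discount factor
```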
57 | 58 | ![agent training](training.gif) 59 | -------------------------------------------------------------------------------- /cliff_walking/Sample results/cliff env.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/Sample results/cliff env.png -------------------------------------------------------------------------------- /cliff_walking/Sample results/cliff_data.txt: -------------------------------------------------------------------------------- 1 | Q-Learning: [[0.0, 160.0], [80.0, 160.0], [160.0, 160.0], [240.0, 160.0], [320.0, 160.0], [400.0, 160.0], [480.0, 160.0], [560.0, 160.0], [640.0, 160.0], [720.0, 160.0], [800.0, 160.0], [880.0, 160.0], [880.0, 240.0]] 2 | SARSA: [[0.0, 160.0], [0.0, 80.0], [0.0, 0.0], [80.0, 0.0], [160.0, 0.0], [240.0, 0.0], [320.0, 0.0], [400.0, 0.0], [480.0, 0.0], [560.0, 0.0], [560.0, 80.0], [640.0, 80.0], [720.0, 80.0], [800.0, 80.0], [800.0, 160.0], [880.0, 160.0], [880.0, 240.0]] 3 | Double Q-Learning: [[0.0, 160.0], [0.0, 80.0], [0.0, 80.0], [0.0, 160.0], [0.0, 80.0], [0.0, 0.0], [0.0, 0.0], [80.0, 0.0], [80.0, 0.0], [80.0, 80.0], [0.0, 80.0], [0.0, 0.0], [80.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [80.0, 0.0], [80.0, 0.0], [160.0, 0.0], [80.0, 0.0], [80.0, 80.0], [80.0, 0.0], [80.0, 80.0], [80.0, 0.0], [80.0, 0.0], [160.0, 0.0], [240.0, 0.0], [240.0, 0.0], [240.0, 0.0], [240.0, 0.0], [320.0, 0.0], [320.0, 0.0], [320.0, 0.0], [400.0, 0.0], [400.0, 0.0], [480.0, 0.0], [560.0, 0.0], [560.0, 0.0], [560.0, 0.0], [560.0, 80.0], [640.0, 80.0], [720.0, 80.0], [640.0, 80.0], [640.0, 0.0], [640.0, 80.0], [640.0, 160.0], [640.0, 80.0], [720.0, 80.0], [720.0, 0.0], [720.0, 0.0], [720.0, 80.0], [800.0, 80.0], [880.0, 80.0], [880.0, 160.0], [880.0, 80.0], [880.0, 0.0], [880.0, 0.0], [880.0, 0.0], [800.0, 0.0], [880.0, 0.0], [880.0, 0.0], [880.0, 0.0], [880.0, 0.0], [880.0, 0.0], [880.0, 80.0], [800.0, 80.0], [720.0, 80.0], [720.0, 0.0], [720.0, 0.0], [640.0, 0.0], [720.0, 0.0], [800.0, 0.0], [800.0, 80.0], [880.0, 80.0], [800.0, 80.0], [880.0, 80.0], [880.0, 160.0], [880.0, 160.0], [880.0, 160.0], [880.0, 240.0]] 4 | -------------------------------------------------------------------------------- /cliff_walking/Sample results/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /cliff_walking/Sample results/rewardcliff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/Sample results/rewardcliff.png -------------------------------------------------------------------------------- /cliff_walking/Sample results/stepscliff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/Sample results/stepscliff.png -------------------------------------------------------------------------------- /cliff_walking/Sample results/toute.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/Sample results/toute.png -------------------------------------------------------------------------------- /cliff_walking/Sample results/valuescliff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/Sample results/valuescliff.png -------------------------------------------------------------------------------- /cliff_walking/algo_double_qlearning.py: -------------------------------------------------------------------------------- 1 | import random 2 | import pandas as pd 3 | import numpy as np 4 | from operator import add 5 | 6 | 7 | class Double_QLearning: 8 | def __init__(self, actions): 9 | self.actions = actions 10 | self.alpha = 0.9 # learning rate 11 | self.gamma = 0.9 # discount factor 12 | self.probability = 0.5 # fix 13 | self.q_table1 = pd.DataFrame(columns=self.actions, dtype=np.float64) 14 | self.q_table2 = pd.DataFrame(columns=self.actions, dtype=np.float64) 15 | self.q_table_final = pd.DataFrame( 16 | columns=self.actions, dtype=np.float64) 17 | 18 | # exploration and exploitation 19 | def choose_action(self, state, epsilon): 20 | self.check_state_exist1(state) 21 | self.check_state_exist2(state) 22 | if np.random.uniform(0, 1) < epsilon: 23 | action = np.random.choice(self.actions) 24 | else: 25 | state_action1 = list(self.q_table1.loc[state, :]) 26 | state_action2 = list(self.q_table2.loc[state, :]) 27 | state_action = random.shuffle( 28 | list(map(add, state_action1, state_action2))) 29 | action = np.argmax(state_action) 30 | return action 31 | 32 | # Function for learning and updating Q-table with new knowledge 33 | def learning(self, state, action, reward, next_state): 34 | self.check_state_exist1(next_state) 35 | self.check_state_exist2(next_state) 36 | q_current1 = self.q_table1.loc[state, action] 37 | q_current2 = self.q_table2.loc[state, action] 38 | arg1 = self.q_table1.loc[next_state, :].idxmax() 39 | arg2 = self.q_table2.loc[next_state, :].idxmax() 40 | if np.random.random() < self.probability: 41 | if next_state != 'Goal' or next_state != 'Obstacle': 42 | q_target1 = reward + self.gamma * \ 43 | self.q_table2.loc[next_state, arg1] 44 | else: 45 | q_target1 = reward 46 | self.q_table1.loc[state, action] += self.alpha * \ 47 | (q_target1 - q_current1) 48 | else: 49 | if next_state != 'Goal' or next_state != 'Obstacle': 50 | q_target2 = reward + self.gamma * \ 51 | self.q_table1.loc[next_state, arg2] 52 | else: 53 | q_target2 = reward 54 | self.q_table2.loc[state, action] += self.alpha * \ 55 | (q_target2 - q_current2) 56 | 57 | return self.q_table1.loc[state, 58 | action], self.q_table2.loc[state, action] 59 | 60 | # Adding to the Q-table new states (pd.series generate 1-dimensional array) 61 | def check_state_exist1(self, state): 62 | if state not in self.q_table1.index: 63 | self.q_table1 = self.q_table1.append(pd.Series( 64 | [0] * len(self.actions), index=self.q_table1.columns, name=state)) 65 | 66 | def check_state_exist2(self, state): 67 | if state not in self.q_table2.index: 68 | self.q_table2 = self.q_table2.append(pd.Series( 69 | [0] * len(self.actions), index=self.q_table2.columns, name=state)) 70 | -------------------------------------------------------------------------------- /cliff_walking/algo_qlearning.py: 
-------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | 4 | 5 | class QLearning: 6 | 7 | def __init__(self, actions): 8 | ''' 9 | Q leanring inital parameters 10 | 11 | Parameters 12 | ---------- 13 | actions : int 14 | all actions including up, down, left, right 15 | alpha : int 16 | learning rate 17 | gamma : int 18 | discount factor 19 | q_table : pandas Dataframe 20 | Q-table with actions as columns 21 | q_table_final : pandas Dataframe 22 | final Q-table 23 | ''' 24 | self.actions = actions 25 | self.alpha = 0.9 26 | self.gamma = 0.9 27 | self.q_table = pd.DataFrame(columns=self.actions, dtype=np.float64) 28 | self.q_table_final = pd.DataFrame( 29 | columns=self.actions, dtype=np.float64) 30 | 31 | # exploration and exploitation 32 | def choose_action(self, observation, epsilon): 33 | self.check_state_exist(observation) 34 | if np.random.random() < epsilon: 35 | action = np.random.choice(self.actions) # choice randomly 36 | else: 37 | state_action = self.q_table.loc[observation, :] 38 | state_action = state_action.reindex( 39 | np.random.permutation(state_action.index)) 40 | action = state_action.idxmax() 41 | return action 42 | 43 | def learning(self, state, action, reward, next_state): 44 | self.check_state_exist(next_state) 45 | # current state and action for that state 46 | q_current = self.q_table.loc[state, action] 47 | if next_state != 'Goal' or next_state != 'Obstacle': 48 | q_target = reward + self.gamma * \ 49 | self.q_table.loc[next_state, :].max() 50 | else: 51 | q_target = reward 52 | # updating Q-table 53 | self.q_table.loc[state, action] += self.alpha * (q_target - q_current) 54 | # return a value that is Q-value 55 | return self.q_table.loc[state, action] 56 | 57 | # Adding to the Q-table new states (pd.series generate 1-dimensional array) 58 | def check_state_exist(self, state): 59 | if state not in self.q_table.index: 60 | self.q_table = self.q_table.append(pd.Series( 61 | [0] * len(self.actions), index=self.q_table.columns, name=state)) 62 | -------------------------------------------------------------------------------- /cliff_walking/algo_sarsa.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | 4 | 5 | class SARSA: 6 | def __init__(self, actions): 7 | ''' 8 | SARSA inital parameters 9 | 10 | Parameters 11 | ---------- 12 | actions : int 13 | all actions including up, down, left, right 14 | alpha : int 15 | learning rate 16 | gamma : int 17 | discount factor 18 | q_table : pandas Dataframe 19 | Q-table with actions as columns 20 | q_table_final : pandas Dataframe 21 | final Q-table 22 | ''' 23 | self.actions = actions 24 | self.alpha = 0.9 25 | self.gamma = 0.9 26 | self.q_table = pd.DataFrame(columns=self.actions, dtype=np.float64) 27 | self.q_table_final = pd.DataFrame( 28 | columns=self.actions, dtype=np.float64) 29 | 30 | # exploration and exploitation 31 | def choose_action(self, observation, epsilon): 32 | self.check_state_exist(observation) 33 | if np.random.uniform(0, 1) < epsilon: 34 | action = np.random.choice(self.actions) 35 | else: 36 | state_action = self.q_table.loc[observation, :] 37 | action = state_action.idxmax() 38 | return action 39 | 40 | # Function for learning and updating Q-table with new knowledge 41 | def learning(self, state, action, reward, next_state, next_action): 42 | self.check_state_exist(next_state) 43 | # current state and action for that state 44 | q_current = 
self.q_table.loc[state, action] 45 | if next_state != 'Goal' or next_state != 'Obstacle': 46 | q_target = reward + self.gamma * \ 47 | self.q_table.loc[next_state, next_action] 48 | else: 49 | q_target = reward 50 | # updating Q-table 51 | self.q_table.loc[state, action] += self.alpha * (q_target - q_current) 52 | # return a value that is Q-value 53 | return self.q_table.loc[state, action] 54 | 55 | # Adding to the Q-table new states (pd.series generate 1-dimensional array) 56 | def check_state_exist(self, state): 57 | if state not in self.q_table.index: 58 | self.q_table = self.q_table.append(pd.Series( 59 | [0] * len(self.actions), index=self.q_table.columns, name=state)) 60 | -------------------------------------------------------------------------------- /cliff_walking/env_cliff.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tkinter as tk 3 | from PIL import Image, ImageTk 4 | 5 | 6 | global_variable = {} 7 | 8 | 9 | class Environment(tk.Tk, object): 10 | def __init__(self): 11 | ''' 12 | Constructs all the necessary attributes for the environment. 13 | 14 | Parameters 15 | ---------- 16 | num_actions: all actions including up, down, left, right 17 | pixels: environment pixels for each location 18 | env_height: number of vertical grids for the environment 19 | env_width: number of horizontal grids for the environment 20 | title: Tkinter environment title 21 | geometry: environmet geometry which is (w*px)*(h*px)+offsets 22 | comparison_dic: storing agnet pathway for each iteration 23 | path_dic: saving final pathway of teh agent 24 | key_dic: a counter for stroing paths 25 | fake: fake variable for reaching final Goal for first time 26 | longest_path: logest path to reach the Goal 27 | shortest_path: shortest path to reach the Goal 28 | ''' 29 | 30 | super(Environment, self).__init__() 31 | self.num_actions = 4 32 | self.title('Cliff Walking - Sutton Book') 33 | self.pixels = 80 34 | self.env_height = 4 35 | self.env_width = 12 36 | self.geometry( 37 | f'{self.env_width * self.pixels}x{self.env_height * self.pixels}+450+250') 38 | self.build_environment() 39 | self.comparison_dic = {} 40 | self.path_dic = {} 41 | self.key_dic = 0 42 | self.fake = True 43 | self.longest_path = 0 44 | self.shortest_path = 0 45 | 46 | def build_environment(self): 47 | ''' 48 | environment creation by Tkinter 49 | ''' 50 | 51 | self.canvas_widget = tk.Canvas( 52 | self, 53 | bg='white', 54 | height=self.env_height * 55 | self.pixels, 56 | width=self.env_width * 57 | self.pixels) 58 | for column in range(0, self.env_width * self.pixels, self.pixels): 59 | x0, y0, x1, y1 = column, 0, column, self.env_height * self.pixels 60 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 61 | for row in range(0, self.env_height * self.pixels, self.pixels): 62 | x0, y0, x1, y1 = 0, row, self.env_width * self.pixels, row 63 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 64 | 65 | img_cliff = Image.open("images/cliff.png") 66 | self.cliff_object = ImageTk.PhotoImage(img_cliff) 67 | 68 | self.obstacle1 = self.canvas_widget.create_image( 69 | self.pixels, self.pixels * 3, anchor='nw', image=self.cliff_object) 70 | self.obstacle2 = self.canvas_widget.create_image( 71 | self.pixels * 2, self.pixels * 3, anchor='nw', image=self.cliff_object) 72 | self.obstacle3 = self.canvas_widget.create_image( 73 | self.pixels * 3, self.pixels * 3, anchor='nw', image=self.cliff_object) 74 | self.obstacle4 = self.canvas_widget.create_image( 75 | 
self.pixels * 4, self.pixels * 3, anchor='nw', image=self.cliff_object) 76 | self.obstacle5 = self.canvas_widget.create_image( 77 | self.pixels * 5, self.pixels * 3, anchor='nw', image=self.cliff_object) 78 | self.obstacle6 = self.canvas_widget.create_image( 79 | self.pixels * 6, self.pixels * 3, anchor='nw', image=self.cliff_object) 80 | self.obstacle7 = self.canvas_widget.create_image( 81 | self.pixels * 7, self.pixels * 3, anchor='nw', image=self.cliff_object) 82 | self.obstacle8 = self.canvas_widget.create_image( 83 | self.pixels * 8, self.pixels * 3, anchor='nw', image=self.cliff_object) 84 | self.obstacle9 = self.canvas_widget.create_image( 85 | self.pixels * 9, self.pixels * 3, anchor='nw', image=self.cliff_object) 86 | self.obstacle10 = self.canvas_widget.create_image( 87 | self.pixels * 10, self.pixels * 3, anchor='nw', image=self.cliff_object) 88 | 89 | img_flag = Image.open("images/end.png") 90 | self.flag_object = ImageTk.PhotoImage(img_flag) 91 | self.flag = self.canvas_widget.create_image( 92 | self.pixels * 11, self.pixels * 3, anchor='nw', image=self.flag_object) 93 | 94 | img_robot = Image.open("images/robot.png") 95 | self.robot = ImageTk.PhotoImage(img_robot) 96 | self.agent = self.canvas_widget.create_image( 97 | self.pixels * 0, self.pixels * 3, anchor='nw', image=self.robot) 98 | 99 | img_start = Image.open("images/start.png") 100 | self.start_object = ImageTk.PhotoImage(img_start) 101 | self.start = self.canvas_widget.create_image( 102 | self.pixels * 0, self.pixels * 3, anchor='nw', image=self.start_object) 103 | 104 | self.canvas_widget.pack() 105 | 106 | def reset(self): 107 | ''' 108 | reset the environment and all parameters 109 | Return: 110 | the agent's current state in the format of [120.0, 40.0] 111 | ''' 112 | 113 | self.update() 114 | self.canvas_widget.delete(self.agent) 115 | self.agent = self.canvas_widget.create_image( 116 | 0, self.pixels * 3, anchor='nw', image=self.robot) 117 | self.comparison_dic = {} 118 | self.key_dic = 0 119 | return self.canvas_widget.coords(self.agent) 120 | 121 | def refresh(self): 122 | ''' 123 | update and refresh the environment before training 124 | ''' 125 | self.update() 126 | 127 | def step(self, action): 128 | ''' 129 | Moving the agent one pixel and update reward, action and next step regarding the agent next location 130 | 131 | Parameters: 132 | action: Actions = {0:'up', 1:'down', 2:'right', 3:'left} 133 | 134 | Returns: 135 | reward, next step and done flag 136 | ''' 137 | 138 | state = self.canvas_widget.coords(self.agent) 139 | base_action = np.array([0, 0]) 140 | 141 | if action == 0: 142 | if state[1] >= self.pixels: 143 | base_action[1] -= self.pixels 144 | elif action == 1: 145 | if state[1] < (self.env_height - 1) * self.pixels: 146 | base_action[1] += self.pixels 147 | elif action == 2: 148 | if state[0] < (self.env_width - 1) * self.pixels: 149 | base_action[0] += self.pixels 150 | elif action == 3: 151 | if state[0] >= self.pixels: 152 | base_action[0] -= self.pixels 153 | 154 | self.canvas_widget.move(self.agent, base_action[0], base_action[1]) 155 | self.comparison_dic[self.key_dic] = self.canvas_widget.coords( 156 | self.agent) 157 | next_state = self.comparison_dic[self.key_dic] 158 | self.key_dic += 1 159 | 160 | if next_state == self.canvas_widget.coords(self.flag): 161 | reward = 900 162 | next_state = 'Goal' 163 | done = True 164 | 165 | # filling the dictionary first time 166 | if self.fake: 167 | for j in range(len(self.comparison_dic)): 168 | self.path_dic[j] = self.comparison_dic[j] 169 | 
self.fake = False 170 | self.longest_path = len(self.comparison_dic) 171 | self.shortest_path = len(self.comparison_dic) 172 | 173 | # storing shortest path 174 | if len(self.comparison_dic) < len(self.path_dic): 175 | self.shortest_path = len(self.comparison_dic) 176 | self.path_dic = {} 177 | for j in range(len(self.comparison_dic)): 178 | self.path_dic[j] = self.comparison_dic[j] 179 | 180 | elif next_state in [self.canvas_widget.coords(self.obstacle1), 181 | self.canvas_widget.coords(self.obstacle2), 182 | self.canvas_widget.coords(self.obstacle3), 183 | self.canvas_widget.coords(self.obstacle4), 184 | self.canvas_widget.coords(self.obstacle5), 185 | self.canvas_widget.coords(self.obstacle6), 186 | self.canvas_widget.coords(self.obstacle7), 187 | self.canvas_widget.coords(self.obstacle8), 188 | self.canvas_widget.coords(self.obstacle9), 189 | self.canvas_widget.coords(self.obstacle10)]: 190 | reward = -100 191 | next_state = 'Obstacle' 192 | self.comparison_dic = {} 193 | self.key_dic = 0 194 | done = True 195 | 196 | else: 197 | reward = -1 198 | done = False 199 | 200 | return next_state, reward, done 201 | 202 | def final_path_Q(self): 203 | ''' 204 | saving final path of Q-learning algorithm 205 | ''' 206 | 207 | origin_point1 = np.array([40, 40]) 208 | path_list1 = [] 209 | self.canvas_widget.delete(self.agent) 210 | for j in range(len(self.path_dic)): 211 | path_list1.append(self.path_dic[j]) 212 | self.track = self.canvas_widget.create_oval( 213 | self.path_dic[j][0] + origin_point1[0] - 30, 214 | self.path_dic[j][1] + origin_point1[1] - 30, 215 | self.path_dic[j][0] + origin_point1[0] + 30, 216 | self.path_dic[j][1] + origin_point1[1] + 30, 217 | fill='#0C4A75', 218 | outline='#00DCFF') 219 | self.path_dic = {} 220 | self.fake = True 221 | f = open("cliff_data.txt", "w") 222 | f.write(f'Q-Learning: {str(path_list1)} \n') 223 | f.close() 224 | print('Path:', path_list1) 225 | 226 | def final_path_SARSA(self): 227 | ''' 228 | saving final path of SARSA algorithm 229 | ''' 230 | 231 | origin_point2 = np.array([40, 40]) 232 | path_list2 = [] 233 | self.canvas_widget.delete(self.agent) 234 | for j in range(len(self.path_dic)): 235 | path_list2.append(self.path_dic[j]) 236 | self.track = self.canvas_widget.create_oval( 237 | self.path_dic[j][0] + origin_point2[0] - 22, 238 | self.path_dic[j][1] + origin_point2[1] - 22, 239 | self.path_dic[j][0] + origin_point2[0] + 22, 240 | self.path_dic[j][1] + origin_point2[1] + 22, 241 | fill='#FF5885', 242 | outline='#FF5885') 243 | self.path_dic = {} 244 | self.fake = True 245 | f = open("cliff_data.txt", "a") 246 | f.write(f'SARSA: {str(path_list2)} \n') 247 | f.close() 248 | print('Path:', path_list2) 249 | 250 | def final_path_DQL(self): 251 | ''' 252 | saving final path double Q-learning algorithm 253 | ''' 254 | 255 | origin_point3 = np.array([40, 40]) 256 | path_list3 = [] 257 | self.canvas_widget.delete(self.agent) 258 | for j in range(len(self.path_dic)): 259 | path_list3.append(self.path_dic[j]) 260 | self.track = self.canvas_widget.create_oval( 261 | self.path_dic[j][0] + origin_point3[0] - 15, 262 | self.path_dic[j][1] + origin_point3[1] - 15, 263 | self.path_dic[j][0] + origin_point3[0] + 15, 264 | self.path_dic[j][1] + origin_point3[1] + 15, 265 | fill='#FF8E23', 266 | outline='#00DCFF') 267 | self.path_dic = {} 268 | self.fake = True 269 | with open("cliff_data.txt", "a") as f: 270 | f.write(f'Double Q-Learning: {str(path_list3)} \n') 271 | print('Path:', path_list3) 272 | 
-------------------------------------------------------------------------------- /cliff_walking/images/cliff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/images/cliff.png -------------------------------------------------------------------------------- /cliff_walking/images/end.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/images/end.png -------------------------------------------------------------------------------- /cliff_walking/images/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /cliff_walking/images/robot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/images/robot.png -------------------------------------------------------------------------------- /cliff_walking/images/start.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/cliff_walking/images/start.png -------------------------------------------------------------------------------- /cliff_walking/plot_results.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import numpy as np 3 | 4 | 5 | class Plots: 6 | 7 | def plot_reward(self, reward1, reward2, reward3): 8 | plt.close() 9 | plt.plot(np.arange(len(reward1)), reward1, 'b') 10 | plt.plot(np.arange(len(reward2)), reward2, 'r') 11 | plt.plot(np.arange(len(reward3)), reward3, 'g') 12 | plt.xlabel('Episodes') 13 | plt.ylabel('Reward') 14 | plt.legend(['q-learning', 'sarsa', 'double q-learning']) 15 | plt.grid() 16 | plt.savefig('rewardcliff.png') 17 | plt.show() 18 | 19 | def plot_steps(self, steps1, steps2, steps3): 20 | plt.close() 21 | plt.plot(np.arange(len(steps1)), steps1, 'b') 22 | plt.plot(np.arange(len(steps2)), steps2, 'r') 23 | plt.plot(np.arange(len(steps3)), steps3, 'g') 24 | plt.xlabel('Episodes') 25 | plt.ylabel('Steps') 26 | plt.legend(['q-learning', 'sarsa', 'double q-learning']) 27 | plt.grid() 28 | plt.savefig('stepscliff.png') 29 | plt.show() 30 | 31 | def plot_value(self, value1, value2, value31, value32): 32 | plt.close() 33 | plt.plot(np.arange(len(value1)), value1, 'b') 34 | plt.plot(np.arange(len(value2)), value2, 'r') 35 | plt.plot(np.arange(len(value31)), value31, 'g') 36 | plt.plot(np.arange(len(value32)), value32, 'black') 37 | plt.xlabel('Episodes') 38 | plt.ylabel('Q-Values') 39 | plt.legend(['q-learning', 40 | 'sarsa', 41 | 'double q-learning 1', 42 | 'double q-learning 2']) 43 | plt.grid() 44 | plt.savefig('valuescliff.png') 45 | plt.show() 46 | -------------------------------------------------------------------------------- /cliff_walking/readme: -------------------------------------------------------------------------------- 1 | This problem is adapted from the book 'Reinforcement Learning: An Introduction' by Andrew Barto and Richard S. Sutton. 
It represents a standard, undiscounted, episodic task with designated start and goal states. The available actions include moving up, down, right, and left. The default reward is -1 for all transitions except those leading into the region marked 'The Cliff.' Entering this region results in a penalty of -100 and instantaneously relocates the agent back to the start. 2 | -------------------------------------------------------------------------------- /cliff_walking/run.py: -------------------------------------------------------------------------------- 1 | from env_cliff import Environment 2 | from algo_qlearning import QLearning 3 | from algo_sarsa import SARSA 4 | from algo_double_qlearning import Double_QLearning 5 | from plot_results import Plots 6 | 7 | 8 | env = Environment() 9 | QL = QLearning(actions=list(range(env.num_actions))) 10 | SARSA = SARSA(actions=list(range(env.num_actions))) 11 | DQL = Double_QLearning(actions=list(range(env.num_actions))) 12 | plt = Plots() 13 | 14 | 15 | total_steps1 = [] 16 | total_rewards1 = [] 17 | total_values1 = [] 18 | total_steps2 = [] 19 | total_rewards2 = [] 20 | total_values2 = [] 21 | episodes = 500 22 | epsilon1 = 0.1 23 | decay_factor1 = 0.999 24 | epsilon2 = 0.1 25 | decay_factor2 = 0.999 26 | epsilon3 = 0.8 27 | decay_factor3 = 0.999 28 | 29 | 30 | print('Q-learning') 31 | for episode in range(episodes): 32 | state1 = env.reset() 33 | step1 = 0 34 | value1 = 0 35 | reward_value1 = 0 36 | epsilon1 *= decay_factor1 37 | while True: 38 | env.refresh() 39 | action1 = QL.choose_action(str(state1), epsilon1) 40 | next_state1, reward1, done1 = env.step(action1) 41 | value1 += QL.learning(str(state1), action1, reward1, str(next_state1)) 42 | state1 = next_state1 43 | step1 += 1 44 | reward_value1 += reward1 45 | 46 | if done1: 47 | total_steps1 += [step1] 48 | total_rewards1 += [reward_value1] 49 | total_values1 += [value1] 50 | break 51 | env.final_path_Q() 52 | 53 | print() 54 | print('SARSA') 55 | for episode in range(episodes): 56 | state2 = env.reset() 57 | step2 = 0 58 | value2 = 0 59 | reward_value2 = 0 60 | epsilon2 *= decay_factor2 61 | action2 = SARSA.choose_action(str(state2), epsilon2) 62 | while True: 63 | env.refresh() 64 | next_state2, reward2, done2 = env.step(action2) 65 | next_action2 = SARSA.choose_action(str(next_state2), epsilon2) 66 | value2 += SARSA.learning(str(state2), action2, 67 | reward2, str(next_state2), next_action2) 68 | state2 = next_state2 69 | action2 = next_action2 70 | reward_value2 += reward2 71 | step2 += 1 72 | 73 | if done2: 74 | total_steps2 += [step2] 75 | total_rewards2 += [reward_value2] 76 | total_values2 += [value2] 77 | break 78 | env.final_path_SARSA() 79 | 80 | print() 81 | print('Double Q-learning') 82 | total_steps3 = [] 83 | total_rewards3 = [] 84 | total_values13 = [] 85 | total_values23 = [] 86 | value13 = 0 87 | value23 = 0 88 | for episode in range(episodes): 89 | state3 = env.reset() 90 | step3 = 0 91 | reward_value3 = 0 92 | epsilon3 *= decay_factor3 93 | while True: 94 | env.refresh() 95 | action3 = DQL.choose_action(str(state3), epsilon3) 96 | next_state3, reward3, done3 = env.step(action3) 97 | value13, value23 = DQL.learning( 98 | str(state3), action3, reward3, str(next_state3)) 99 | value13 += value13 100 | value23 += value23 101 | state3 = next_state3 102 | step3 += 1 103 | reward_value3 += reward3 104 | 105 | if done3: 106 | total_steps3 += [step3] 107 | total_rewards3 += [reward_value3] 108 | total_values13 += [value13] 109 | total_values23 += [value23] 110 | break 111 | 112 | 
env.final_path_DQL() 113 | 114 | plt.plot_reward(total_rewards1, total_rewards2, total_rewards3) 115 | plt.plot_steps(total_steps1, total_steps2, total_steps3) 116 | plt.plot_value(total_values1, total_values2, total_values13, total_values23) 117 | 118 | env.after(1000) 119 | env.mainloop() 120 | -------------------------------------------------------------------------------- /env1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/env1.png -------------------------------------------------------------------------------- /env2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/env2.jpg -------------------------------------------------------------------------------- /q_learning/Sample results/QLreward.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/Sample results/QLreward.png -------------------------------------------------------------------------------- /q_learning/Sample results/QLsteps.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/Sample results/QLsteps.png -------------------------------------------------------------------------------- /q_learning/Sample results/QLvalue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/Sample results/QLvalue.png -------------------------------------------------------------------------------- /q_learning/Sample results/data.txt: -------------------------------------------------------------------------------- 1 | The Shortest Path: 28 2 | The Longest Path: 1401 3 | Optimal Path: [[40.0, 0.0], [40.0, 40.0], [80.0, 40.0], [80.0, 80.0], [80.0, 120.0], [120.0, 120.0], [160.0, 120.0], [160.0, 160.0], [160.0, 200.0], [160.0, 240.0], [160.0, 280.0], [200.0, 280.0], [240.0, 280.0], [280.0, 280.0], [320.0, 280.0], [320.0, 320.0], [320.0, 360.0], [360.0, 360.0], [360.0, 400.0], [400.0, 400.0], [400.0, 440.0], [400.0, 480.0], [400.0, 520.0], [400.0, 560.0], [440.0, 560.0], [480.0, 560.0], [520.0, 560.0], [520.0, 520.0]] 4 | Final Path Q-table: 0 1 2 3 5 | [40.0, 0.0] 0.000000 6.461082 -5.000000 0.000000 6 | [40.0, 40.0] 0.000000 0.000000 7.178980 -5.000000 7 | [80.0, 40.0] -5.000000 7.976644 0.000000 0.000000 8 | [80.0, 80.0] 0.000000 8.862938 -5.000000 0.000000 9 | [80.0, 120.0] 0.000000 -5.000000 9.847709 0.000000 10 | [120.0, 120.0] -5.000000 0.000000 10.941899 7.976644 11 | [160.0, 120.0] 0.000000 12.157665 0.000000 0.000000 12 | [160.0, 160.0] 9.847709 13.508517 -5.000000 0.000000 13 | [160.0, 200.0] 0.000000 15.009464 0.000000 0.000000 14 | [160.0, 240.0] 0.000000 16.677182 0.000000 -4.950000 15 | [160.0, 280.0] 0.000000 0.000000 18.530202 0.000000 16 | [200.0, 280.0] 0.000000 -4.950000 20.589113 0.000000 17 | [240.0, 280.0] -4.999500 0.000000 22.876792 0.000000 18 | [280.0, 280.0] 0.000000 0.000000 25.418658 0.000000 19 | [320.0, 280.0] 0.000000 28.242954 
-0.900000 0.000000 20 | [320.0, 320.0] 0.000000 31.381060 -4.500000 0.000000 21 | [320.0, 360.0] 0.000000 0.000000 34.867844 -4.500000 22 | [360.0, 360.0] -4.500000 38.742049 0.000000 0.000000 23 | [360.0, 400.0] 0.000000 -4.500000 43.046721 0.000000 24 | [400.0, 400.0] 0.000000 47.829690 0.000000 0.000000 25 | [400.0, 440.0] 0.000000 53.144100 0.000000 -4.500000 26 | [400.0, 480.0] 0.000000 59.049000 0.000000 43.046717 27 | [400.0, 520.0] 0.000000 65.610000 0.000000 0.000000 28 | [400.0, 560.0] 0.000000 0.000000 72.900000 0.000000 29 | [440.0, 560.0] 0.000000 0.000000 81.000000 0.000000 30 | [480.0, 560.0] -4.950000 0.000000 90.000000 0.000000 31 | [520.0, 560.0] 100.000000 0.000000 -4.500000 0.000000 32 | Full Q-table:: 0 1 2 3 33 | [0.0, 0.0] 0.0 -5.000000 5.814974 0.0 34 | Obstacle 0.0 0.000000 0.000000 0.0 35 | [40.0, 0.0] 0.0 6.461082 -5.000000 0.0 36 | [40.0, 40.0] 0.0 0.000000 7.178980 -5.0 37 | [80.0, 40.0] -5.0 7.976644 0.000000 0.0 38 | ... ... ... ... ... 39 | [560.0, 360.0] -4.5 0.000000 0.000000 0.0 40 | [520.0, 360.0] 0.0 0.000000 0.000000 0.0 41 | [560.0, 160.0] -4.5 0.000000 0.000000 -4.5 42 | [560.0, 40.0] -4.5 0.000000 0.000000 -4.5 43 | [520.0, 0.0] 0.0 -4.500000 -4.500000 0.0 44 | 45 | [176 rows x 4 columns] 46 | -------------------------------------------------------------------------------- /q_learning/Sample results/readme: -------------------------------------------------------------------------------- 1 | Some results from my run (they may differ if you run it). 2 | -------------------------------------------------------------------------------- /q_learning/Sample results/result QL.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/Sample results/result QL.png -------------------------------------------------------------------------------- /q_learning/algo_qlearning.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from environment import final_states 4 | 5 | 6 | class QLearning: 7 | 8 | def __init__(self, actions): 9 | ''' 10 | Q leanring inital parameters 11 | 12 | Parameters 13 | ---------- 14 | actions : int 15 | all actions including up, down, left, right 16 | alpha : int 17 | learning rate 18 | gamma : int 19 | discount factor 20 | epsilon : int 21 | probability 22 | decay_factor : int 23 | q_table : pandas Dataframe 24 | Q-table with actions as columns 25 | q_table_final : pandas Dataframe 26 | final Q-table 27 | ''' 28 | self.actions = actions 29 | self.alpha = 0.9 30 | self.gamma = 0.9 31 | self.epsilon = 0.5 32 | self.decay_factor = 0.9999 33 | self.q_table = pd.DataFrame(columns=self.actions, 34 | dtype=np.float64) 35 | self.q_table_final = pd.DataFrame(columns=self.actions, 36 | dtype=np.float64) 37 | 38 | def choose_action(self, observation): 39 | ''' 40 | Returns an action through exploration and exploitation 41 | 42 | Parameters: 43 | observation: current state of 44 | the agent in the format of state = '[5.0, 40.0]' 45 | 46 | Returns: 47 | action number 48 | ''' 49 | self.check_state_exist(observation) 50 | self.epsilon *= self.decay_factor # epsilon greedy 51 | if np.random.random() < self.epsilon: 52 | action = np.random.choice(self.actions) 53 | else: 54 | # access a group of rows and columns [row , column] 55 | state_action = self.q_table.loc[observation, :] 56 | # reindex: based on previous 
DataFrame, regenerate new indexes 57 | state_action = state_action.reindex( 58 | np.random.permutation(state_action.index)) 59 | action = state_action.idxmax() # return index of first occurrence of maximum value 60 | return action 61 | 62 | def learning(self, state, action, reward, next_state): 63 | ''' 64 | Function for learning and updating Q-table with new data 65 | 66 | Parameters: 67 | state: current state of the agent 68 | action: chosen action 69 | reward: received reward 70 | next_state: next state that the agent will move 71 | 72 | Returns: 73 | update Q-table 74 | ''' 75 | self.check_state_exist(next_state) 76 | q_current = self.q_table.loc[state, action] 77 | if next_state != 'Goal' or next_state != 'Obstacle' or next_state != 'Rubik': 78 | q_target = reward + self.gamma * \ 79 | self.q_table.loc[next_state, :].max() 80 | else: 81 | q_target = reward 82 | 83 | self.q_table.loc[state, action] += self.alpha * \ 84 | (q_target - q_current) # updating Q-table 85 | return self.q_table.loc[state, action] 86 | 87 | def check_state_exist(self, state): 88 | ''' 89 | Adding new states to the Q-table 90 | (pd.series generate 1-dimensional array) 91 | ''' 92 | if state not in self.q_table.index: 93 | self.q_table = self.q_table.append(pd.Series( 94 | [0] * len(self.actions), index=self.q_table.columns, name=state)) 95 | 96 | def print_q_table(self): 97 | ''' 98 | Saving final Q-table 99 | ''' 100 | final_route = final_states() 101 | for i in range(len(final_route)): 102 | state = str(final_route[i]) 103 | for j in range(len(self.q_table.index)): 104 | if self.q_table.index[j] == state: 105 | self.q_table_final.loc[state, 106 | :] = self.q_table.loc[state, :] 107 | 108 | with open('data.txt', 'a') as f: 109 | f.write(f'Final Path Q-table: {str(self.q_table_final)} \n') 110 | f.write(f'Full Q-table:: {str(self.q_table)} \n') 111 | -------------------------------------------------------------------------------- /q_learning/environment.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tkinter as tk 3 | from tkinter import * 4 | from PIL import Image, ImageTk 5 | 6 | 7 | global_variable = {} 8 | 9 | 10 | class Environment(tk.Tk, object): 11 | def __init__(self): 12 | ''' 13 | Constructs all the necessary attributes for the environment. 
14 | 15 | Parameters 16 | ---------- 17 | num_actions: all actions including up, down, left, right 18 | pixels: environment pixels for each location 19 | env_height: number of vertical grids for the environment 20 | env_width: number of horizontal grids for the environment 21 | title: Tkinter environment title 22 | geometry: environmet geometry which is (w*px)*(h*px)+offsets 23 | comparison_dic: storing agnet pathway for each iteration 24 | path_dic: saving final pathway of teh agent 25 | key_dic: a counter for stroing paths 26 | fake: fake variable for reaching final Goal for first time 27 | longest_path: logest path to reach the Goal 28 | shortest_path: shortest path to reach the Goal 29 | ''' 30 | 31 | super(Environment, self).__init__() 32 | self.num_actions = 4 33 | self.pixels = 40 34 | self.env_height = 15 35 | self.env_width = 15 36 | self.title('Path Planing with Reinforcement Learning') 37 | self.geometry( 38 | f'{self.env_width* self.pixels}x{self.env_height * self.pixels}+600+250') 39 | self.build_environment() 40 | self.comparison_dic = {} 41 | self.path_dic = {} 42 | self.key_dic = 0 43 | self.fake = True 44 | self.longest_path = 0 45 | self.shortest_path = 0 46 | 47 | def build_environment(self): 48 | ''' 49 | environment creation by Tkinter 50 | ''' 51 | 52 | self.canvas_widget = tk.Canvas( 53 | self, 54 | bg='white', 55 | height=self.env_height * 56 | self.pixels, 57 | width=self.env_width * 58 | self.pixels) 59 | 60 | for column in range(0, self.env_width * self.pixels, self.pixels): 61 | x0, y0, x1, y1 = column, 0, column, self.env_height * self.pixels 62 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 63 | for row in range(0, self.env_height * self.pixels, self.pixels): 64 | x0, y0, x1, y1 = 0, row, self.env_height * self.pixels, row 65 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 66 | 67 | img_obstacle = Image.open('images/obstacle.png') 68 | self.obstacle_object = ImageTk.PhotoImage(img_obstacle) 69 | 70 | img_tree = Image.open('images/tree.png') 71 | self.tree_object = ImageTk.PhotoImage(img_tree) 72 | 73 | img_shop = Image.open('images/boot_tree.png') 74 | self.shop_object = ImageTk.PhotoImage(img_shop) 75 | 76 | img_building = Image.open('images/building.png') 77 | self.building_object = ImageTk.PhotoImage(img_building) 78 | 79 | img_cube = Image.open('images/rubik.png') 80 | self.cube_object = ImageTk.PhotoImage(img_cube) 81 | 82 | img_garbage = Image.open('images/garbage.png') 83 | self.garbage_object = ImageTk.PhotoImage(img_garbage) 84 | 85 | self.obstacle1 = self.canvas_widget.create_image( 86 | self.pixels * 2, 0, anchor='nw', image=self.obstacle_object) 87 | self.obstacle2 = self.canvas_widget.create_image( 88 | self.pixels * 9, 0, anchor='nw', image=self.tree_object) 89 | self.obstacle3 = self.canvas_widget.create_image( 90 | self.pixels * 11, 0, anchor='nw', image=self.obstacle_object) 91 | self.obstacle4 = self.canvas_widget.create_image( 92 | self.pixels * 14, 0, anchor='nw', image=self.shop_object) 93 | self.obstacle5 = self.canvas_widget.create_image( 94 | self.pixels * 5, 0, anchor='nw', image=self.obstacle_object) 95 | self.obstacle6 = self.canvas_widget.create_image( 96 | 0, self.pixels, anchor='nw', image=self.building_object) 97 | self.obstacle7 = self.canvas_widget.create_image( 98 | self.pixels * 7, self.pixels, anchor='nw', image=self.obstacle_object) 99 | self.obstacle8 = self.canvas_widget.create_image( 100 | self.pixels * 9, self.pixels, anchor='nw', image=self.obstacle_object) 101 | self.obstacle9 = 
self.canvas_widget.create_image( 102 | self.pixels * 13, self.pixels, anchor='nw', image=self.obstacle_object) 103 | self.obstacle10 = self.canvas_widget.create_image( 104 | self.pixels * 3, self.pixels * 2, anchor='nw', image=self.tree_object) 105 | self.obstacle11 = self.canvas_widget.create_image( 106 | self.pixels * 5, self.pixels * 2, anchor='nw', image=self.obstacle_object) 107 | self.obstacle12 = self.canvas_widget.create_image( 108 | self.pixels * 11, self.pixels * 2, anchor='nw', image=self.obstacle_object) 109 | self.obstacle13 = self.canvas_widget.create_image( 110 | self.pixels * 0, self.pixels * 3, anchor='nw', image=self.building_object) 111 | self.obstacle14 = self.canvas_widget.create_image( 112 | self.pixels * 2, self.pixels * 4, anchor='nw', image=self.shop_object) 113 | self.obstacle15 = self.canvas_widget.create_image( 114 | self.pixels * 8, self.pixels * 3, anchor='nw', image=self.obstacle_object) 115 | self.obstacle16 = self.canvas_widget.create_image( 116 | self.pixels * 9, self.pixels * 3, anchor='nw', image=self.tree_object) 117 | self.obstacle17 = self.canvas_widget.create_image( 118 | self.pixels * 14, self.pixels * 3, anchor='nw', image=self.obstacle_object) 119 | self.obstacle19 = self.canvas_widget.create_image( 120 | self.pixels * 5, self.pixels * 4, anchor='nw', image=self.building_object) 121 | self.obstacle20 = self.canvas_widget.create_image( 122 | self.pixels * 10, self.pixels * 4, anchor='nw', image=self.obstacle_object) 123 | self.obstacle21 = self.canvas_widget.create_image( 124 | self.pixels * 13, self.pixels * 4, anchor='nw', image=self.obstacle_object) 125 | self.obstacle22 = self.canvas_widget.create_image( 126 | self.pixels * 8, self.pixels * 5, anchor='nw', image=self.shop_object) 127 | self.obstacle23 = self.canvas_widget.create_image( 128 | self.pixels * 3, self.pixels * 6, anchor='nw', image=self.obstacle_object) 129 | self.obstacle24 = self.canvas_widget.create_image( 130 | self.pixels * 6, self.pixels * 6, anchor='nw', image=self.obstacle_object) 131 | self.obstacle25 = self.canvas_widget.create_image( 132 | self.pixels * 11, self.pixels * 6, anchor='nw', image=self.tree_object) 133 | self.obstacle26 = self.canvas_widget.create_image( 134 | self.pixels * 14, self.pixels * 6, anchor='nw', image=self.obstacle_object) 135 | self.obstacle27 = self.canvas_widget.create_image( 136 | self.pixels * 0, self.pixels * 7, anchor='nw', image=self.obstacle_object) 137 | self.obstacle28 = self.canvas_widget.create_image( 138 | self.pixels * 1, self.pixels * 7, anchor='nw', image=self.tree_object) 139 | self.obstacle29 = self.canvas_widget.create_image( 140 | self.pixels * 9, self.pixels * 7, anchor='nw', image=self.cube_object) 141 | self.obstacle30 = self.canvas_widget.create_image( 142 | self.pixels * 3, self.pixels * 8, anchor='nw', image=self.building_object) 143 | self.obstacle31 = self.canvas_widget.create_image( 144 | self.pixels * 5, self.pixels * 8, anchor='nw', image=self.obstacle_object) 145 | self.obstacle32 = self.canvas_widget.create_image( 146 | self.pixels * 9, self.pixels * 8, anchor='nw', image=self.shop_object) 147 | self.obstacle33 = self.canvas_widget.create_image( 148 | self.pixels * 12, self.pixels * 8, anchor='nw', image=self.tree_object) 149 | self.obstacle34 = self.canvas_widget.create_image( 150 | self.pixels * 14, self.pixels * 8, anchor='nw', image=self.obstacle_object) 151 | self.obstacle35 = self.canvas_widget.create_image( 152 | self.pixels * 0, self.pixels * 9, anchor='nw', image=self.shop_object) 153 | self.obstacle36 = 
self.canvas_widget.create_image( 154 | self.pixels * 7, self.pixels * 9, anchor='nw', image=self.obstacle_object) 155 | self.obstacle37 = self.canvas_widget.create_image( 156 | self.pixels * 3, self.pixels * 10, anchor='nw', image=self.building_object) 157 | self.obstacle38 = self.canvas_widget.create_image( 158 | self.pixels * 5, self.pixels * 10, anchor='nw', image=self.tree_object) 159 | self.obstacle39 = self.canvas_widget.create_image( 160 | self.pixels * 12, self.pixels * 10, anchor='nw', image=self.obstacle_object) 161 | self.obstacle40 = self.canvas_widget.create_image( 162 | self.pixels * 1, self.pixels * 11, anchor='nw', image=self.tree_object) 163 | self.obstacle41 = self.canvas_widget.create_image( 164 | self.pixels * 6, self.pixels * 11, anchor='nw', image=self.obstacle_object) 165 | self.obstacle42 = self.canvas_widget.create_image( 166 | self.pixels * 9, self.pixels * 11, anchor='nw', image=self.tree_object) 167 | self.obstacle43 = self.canvas_widget.create_image( 168 | self.pixels * 12, self.pixels * 12, anchor='nw', image=self.garbage_object) 169 | self.obstacle44 = self.canvas_widget.create_image( 170 | self.pixels * 13, self.pixels * 12, anchor='nw', image=self.garbage_object) 171 | self.obstacle45 = self.canvas_widget.create_image( 172 | self.pixels * 14, self.pixels * 12, anchor='nw', image=self.garbage_object) 173 | self.obstacle46 = self.canvas_widget.create_image( 174 | self.pixels * 2, self.pixels * 13, anchor='nw', image=self.obstacle_object) 175 | self.obstacle47 = self.canvas_widget.create_image( 176 | self.pixels * 4, self.pixels * 13, anchor='nw', image=self.building_object) 177 | self.obstacle48 = self.canvas_widget.create_image( 178 | self.pixels * 7, self.pixels * 13, anchor='nw', image=self.obstacle_object) 179 | self.obstacle49 = self.canvas_widget.create_image( 180 | self.pixels * 12, self.pixels * 13, anchor='nw', image=self.garbage_object) 181 | self.obstacle50 = self.canvas_widget.create_image( 182 | self.pixels * 14, self.pixels * 13, anchor='nw', image=self.garbage_object) 183 | self.obstacle51 = self.canvas_widget.create_image( 184 | self.pixels * 0, self.pixels * 14, anchor='nw', image=self.building_object) 185 | self.obstacle52 = self.canvas_widget.create_image( 186 | self.pixels * 14, self.pixels * 14, anchor='nw', image=self.garbage_object) 187 | 188 | img_flag = Image.open('images/flag.png') 189 | self.flag_object = ImageTk.PhotoImage(img_flag) 190 | self.flag = self.canvas_widget.create_image( 191 | self.pixels * 13, self.pixels * 13, anchor='nw', image=self.flag_object) 192 | 193 | img_robot = Image.open('images/agent.png') 194 | self.robot = ImageTk.PhotoImage(img_robot) 195 | self.agent = self.canvas_widget.create_image( 196 | 0, 0, anchor='nw', image=self.robot) 197 | 198 | self.canvas_widget.pack() 199 | 200 | def reset(self): 201 | ''' 202 | reset the environment and all parameters 203 | Return: 204 | the agent's current state in the format of [120.0, 40.0] 205 | ''' 206 | 207 | self.update() 208 | self.canvas_widget.delete(self.agent) 209 | self.agent = self.canvas_widget.create_image( 210 | 0, 0, anchor='nw', image=self.robot) 211 | self.comparison_dic = {} 212 | self.key_dic = 0 213 | return self.canvas_widget.coords(self.agent) 214 | 215 | def refresh(self): 216 | ''' 217 | update and refresh the environment before training 218 | ''' 219 | self.update() 220 | 221 | def step(self, action): 222 | ''' 223 | Moving the agent one pixel and update reward, action and next step regarding the agent next location 224 | 225 | Parameters: 226 | 
action: Actions = {0:'up', 1:'down', 2:'right', 3:'left} 227 | 228 | Returns: 229 | reward, next step and done flag 230 | ''' 231 | 232 | state = self.canvas_widget.coords(self.agent) 233 | base_action = np.array([0, 0]) 234 | 235 | if action == 0: 236 | if state[1] >= self.pixels: 237 | base_action[1] -= self.pixels 238 | elif action == 1: 239 | if state[1] < (self.env_height - 1) * self.pixels: 240 | base_action[1] += self.pixels 241 | elif action == 2: 242 | if state[0] < (self.env_width - 1) * self.pixels: 243 | base_action[0] += self.pixels 244 | elif action == 3: 245 | if state[0] >= self.pixels: 246 | base_action[0] -= self.pixels 247 | 248 | self.canvas_widget.move(self.agent, base_action[0], base_action[1]) 249 | self.comparison_dic[self.key_dic] = self.canvas_widget.coords( 250 | self.agent) # storing new position of agent 251 | next_state = self.comparison_dic[self.key_dic] 252 | self.key_dic += 1 # add next key in dictionary 253 | 254 | if next_state == self.canvas_widget.coords(self.flag): 255 | reward = 100 256 | next_state = 'Goal' 257 | done = True 258 | 259 | # filling the dictionary first time 260 | if self.fake: 261 | for j in range(len(self.comparison_dic)): 262 | self.path_dic[j] = self.comparison_dic[j] 263 | self.fake = False 264 | self.longest_path = len(self.comparison_dic) 265 | self.shortest_path = len(self.comparison_dic) 266 | 267 | # storing shortest path 268 | if len(self.comparison_dic) < len(self.path_dic): 269 | self.shortest_path = len(self.comparison_dic) 270 | self.path_dic = {} 271 | for j in range(len(self.comparison_dic)): 272 | self.path_dic[j] = self.comparison_dic[j] 273 | 274 | # storing longest path 275 | if len(self.comparison_dic) > self.longest_path: 276 | self.longest_path = len(self.comparison_dic) 277 | 278 | elif next_state in [self.canvas_widget.coords(self.obstacle1), 279 | self.canvas_widget.coords(self.obstacle2), 280 | self.canvas_widget.coords(self.obstacle3), 281 | self.canvas_widget.coords(self.obstacle4), 282 | self.canvas_widget.coords(self.obstacle5), 283 | self.canvas_widget.coords(self.obstacle6), 284 | self.canvas_widget.coords(self.obstacle7), 285 | self.canvas_widget.coords(self.obstacle8), 286 | self.canvas_widget.coords(self.obstacle9), 287 | self.canvas_widget.coords(self.obstacle10), 288 | self.canvas_widget.coords(self.obstacle11), 289 | self.canvas_widget.coords(self.obstacle12), 290 | self.canvas_widget.coords(self.obstacle13), 291 | self.canvas_widget.coords(self.obstacle14), 292 | self.canvas_widget.coords(self.obstacle15), 293 | self.canvas_widget.coords(self.obstacle16), 294 | self.canvas_widget.coords(self.obstacle17), 295 | self.canvas_widget.coords(self.obstacle19), 296 | self.canvas_widget.coords(self.obstacle20), 297 | self.canvas_widget.coords(self.obstacle21), 298 | self.canvas_widget.coords(self.obstacle22), 299 | self.canvas_widget.coords(self.obstacle23), 300 | self.canvas_widget.coords(self.obstacle24), 301 | self.canvas_widget.coords(self.obstacle25), 302 | self.canvas_widget.coords(self.obstacle26), 303 | self.canvas_widget.coords(self.obstacle27), 304 | self.canvas_widget.coords(self.obstacle28), 305 | self.canvas_widget.coords(self.obstacle30), 306 | self.canvas_widget.coords(self.obstacle31), 307 | self.canvas_widget.coords(self.obstacle32), 308 | self.canvas_widget.coords(self.obstacle33), 309 | self.canvas_widget.coords(self.obstacle34), 310 | self.canvas_widget.coords(self.obstacle35), 311 | self.canvas_widget.coords(self.obstacle36), 312 | self.canvas_widget.coords(self.obstacle37), 313 | 
self.canvas_widget.coords(self.obstacle38), 314 | self.canvas_widget.coords(self.obstacle39), 315 | self.canvas_widget.coords(self.obstacle40), 316 | self.canvas_widget.coords(self.obstacle41), 317 | self.canvas_widget.coords(self.obstacle42), 318 | self.canvas_widget.coords(self.obstacle43), 319 | self.canvas_widget.coords(self.obstacle44), 320 | self.canvas_widget.coords(self.obstacle45), 321 | self.canvas_widget.coords(self.obstacle46), 322 | self.canvas_widget.coords(self.obstacle47), 323 | self.canvas_widget.coords(self.obstacle48), 324 | self.canvas_widget.coords(self.obstacle49), 325 | self.canvas_widget.coords(self.obstacle50), 326 | self.canvas_widget.coords(self.obstacle51), 327 | self.canvas_widget.coords(self.obstacle52)]: 328 | reward = -5 329 | done = True 330 | next_state = 'Obstacle' 331 | self.comparison_dic = {} 332 | self.key_dic = 0 333 | 334 | elif next_state in [self.canvas_widget.coords(self.obstacle29)]: 335 | reward = -1 336 | done = True 337 | next_state = 'Rubik' 338 | self.comparison_dic = {} 339 | self.key_dic = 0 340 | 341 | else: 342 | reward = 0 343 | done = False 344 | 345 | return next_state, reward, done 346 | 347 | def final_path(self): 348 | ''' 349 | saving final path and showing graphically by balck ovals 350 | ''' 351 | 352 | origin_point = np.array([20, 20]) 353 | path_list = [] 354 | self.canvas_widget.delete(self.agent) 355 | for j in range(len(self.path_dic)): 356 | path_list.append(self.path_dic[j]) 357 | self.track = self.canvas_widget.create_oval( 358 | self.path_dic[j][0] + origin_point[0] - 12, 359 | self.path_dic[j][1] + origin_point[1] - 12, 360 | self.path_dic[j][0] + origin_point[0] + 12, 361 | self.path_dic[j][1] + origin_point[1] + 12, 362 | fill='black', 363 | outline='black') 364 | # putting the final route in a global variable 365 | global_variable[j] = self.path_dic[j] 366 | 367 | with open('data.txt', 'w') as f: 368 | f.write(f'The Shortest Path: {str(self.shortest_path)} \n') 369 | f.write(f'The Longest Path: {str(self.longest_path)} \n') 370 | f.write(f'Optimal Path: {str(path_list)} \n') 371 | 372 | 373 | def final_states(): 374 | '''final route coordination for plotting''' 375 | 376 | return global_variable 377 | -------------------------------------------------------------------------------- /q_learning/images/agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/agent.png -------------------------------------------------------------------------------- /q_learning/images/boot_tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/boot_tree.png -------------------------------------------------------------------------------- /q_learning/images/building.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/building.png -------------------------------------------------------------------------------- /q_learning/images/flag.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/flag.png -------------------------------------------------------------------------------- /q_learning/images/garbage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/garbage.png -------------------------------------------------------------------------------- /q_learning/images/obstacle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/obstacle.png -------------------------------------------------------------------------------- /q_learning/images/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /q_learning/images/rubik.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/rubik.png -------------------------------------------------------------------------------- /q_learning/images/tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/q_learning/images/tree.png -------------------------------------------------------------------------------- /q_learning/plot_results.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import numpy as np 3 | 4 | 5 | class Plots: 6 | 7 | def plot_reward(self, reward): 8 | plt.close() 9 | plt.plot(np.arange(len(reward)), reward, 'b') 10 | plt.title('Episodes vs Reward') 11 | plt.xlabel('Episodes') 12 | plt.ylabel('Reward') 13 | plt.grid() 14 | plt.savefig('reward.png') 15 | plt.show() 16 | 17 | def plot_steps(self, steps): 18 | plt.close() 19 | plt.plot(np.arange(len(steps)), steps, 'r') 20 | plt.title('Episodes vs Steps') 21 | plt.xlabel('Episodes') 22 | plt.ylabel('Steps') 23 | plt.grid() 24 | plt.savefig('steps.png') 25 | plt.show() 26 | 27 | def plot_value(self, value): 28 | plt.close() 29 | plt.plot(np.arange(len(value)), value, 'g') 30 | plt.title('Episodes vs Values') 31 | plt.xlabel('Episodes') 32 | plt.ylabel('Q-Values') 33 | plt.grid() 34 | plt.savefig('value.png') 35 | plt.show() 36 | -------------------------------------------------------------------------------- /q_learning/readme: -------------------------------------------------------------------------------- 1 | Q-learning is an off-policy Temporal Difference (TD) control algorithm that operates based on the action-value function. The action selection mechanism in this program follows an epsilon-greedy approach. With a probability of epsilon, the action is chosen randomly, while with a probability of 1 - epsilon, it is selected based on the maximum Q-value. Subsequently, the agent interacts with the environment to observe the next state and the corresponding reward. Finally, the Q-learning formula is updated with all the data collected during the environment interaction. 
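A minimal sketch of that selection-and-update step is shown below. It uses hypothetical stand-alone names (q, alpha, gamma, epsilon) and illustrative hyperparameter values rather than the repository's own QLearning class, which works on state strings such as '[40.0, 0.0]'.

```python
import random
from collections import defaultdict

ACTIONS = [0, 1, 2, 3]                        # 0: up, 1: down, 2: right, 3: left
alpha, gamma, epsilon = 0.9, 0.9, 0.1         # illustrative values only
q = defaultdict(lambda: [0.0] * len(ACTIONS))  # q[state][action]

def choose_action(state):
    # epsilon-greedy: explore with probability epsilon, otherwise act greedily
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])

def q_update(state, action, reward, next_state, done):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

The max over the next state's action values is what makes the update off-policy; SARSA (see the sarsa folder) instead bootstraps from the action it actually takes next.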
This loop continues for each episode until it reaches the final step. As mentioned earlier, when the agent encounters an obstacle or reaches the goal, the 'Done' flag activates, and the entire process restarts. 2 | -------------------------------------------------------------------------------- /q_learning/run.py: -------------------------------------------------------------------------------- 1 | '''This is the main file. When you run it, the agent start training process.''' 2 | 3 | from environment import Environment 4 | from algo_qlearning import QLearning 5 | from plot_results import Plots 6 | 7 | 8 | def main(): 9 | total_steps = [] 10 | total_rewards = [] 11 | total_values = [] 12 | episodes = 2000 13 | 14 | for episode in range(episodes): 15 | state = env.reset() # it returns the coordination of the agent 16 | step = 0 17 | value = 0 18 | reward_value = 0 19 | while True: 20 | env.refresh() 21 | action = RL.choose_action(str(state)) 22 | next_state, reward, done = env.step(action) 23 | value += RL.learning(str(state), action, reward, str(next_state)) 24 | state = next_state 25 | step += 1 26 | reward_value += reward 27 | 28 | if done: 29 | total_steps += [step] 30 | total_rewards += [reward_value] 31 | total_values += [value] 32 | break 33 | 34 | env.final_path() 35 | plot.plot_reward(total_rewards) 36 | plot.plot_steps(total_steps) 37 | plot.plot_value(total_values) 38 | RL.print_q_table() 39 | 40 | 41 | if __name__ == '__main__': 42 | env = Environment() 43 | RL = QLearning(actions=list(range(env.num_actions))) 44 | plot = Plots() 45 | env.after(10, main) 46 | env.mainloop() 47 | -------------------------------------------------------------------------------- /sarsa/Sample results/data.txt: -------------------------------------------------------------------------------- 1 | The Shortest Path: 28 2 | The Longest Path: 550 3 | Optimal Path: [[40.0, 0.0], [40.0, 40.0], [40.0, 80.0], [40.0, 120.0], [40.0, 160.0], [40.0, 200.0], [40.0, 240.0], [80.0, 240.0], [80.0, 280.0], [120.0, 280.0], [160.0, 280.0], [160.0, 320.0], [160.0, 360.0], [200.0, 360.0], [240.0, 360.0], [240.0, 400.0], [280.0, 400.0], [320.0, 400.0], [320.0, 440.0], [320.0, 480.0], [360.0, 480.0], [360.0, 520.0], [360.0, 560.0], [400.0, 560.0], [440.0, 560.0], [480.0, 560.0], [520.0, 560.0], [520.0, 520.0]] 4 | Final Path Q-table: 0 1 2 3 5 | [40.0, 0.0] -4.104407 2.099550 -5.000000 -2.055044 6 | [40.0, 40.0] -3.302516 1.967389 -4.061254 -5.000000 7 | [40.0, 80.0] -4.119704 1.462201 -0.919095 -0.012449 8 | [40.0, 120.0] -3.295571 0.998683 1.750480 -5.000000 9 | [40.0, 160.0] -2.239626 0.895676 -5.000000 -3.647892 10 | [40.0, 200.0] -0.719129 -3.280719 0.391534 -1.899991 11 | [40.0, 240.0] -0.233413 -5.000000 9.847704 -3.284249 12 | [80.0, 240.0] -4.000866 1.387104 -5.000000 -4.050001 13 | [80.0, 280.0] -2.664114 -3.163169 0.324364 -5.000000 14 | [120.0, 280.0] -5.000000 -5.000000 0.170468 -1.978908 15 | [160.0, 280.0] -3.439421 0.036780 -3.314143 -0.500884 16 | [160.0, 320.0] -2.579885 -2.332648 -5.000000 -5.000000 17 | [160.0, 360.0] -0.973352 -0.428563 -3.493924 -4.050894 18 | [200.0, 360.0] -5.000000 -5.000000 -2.542879 15.517152 19 | [240.0, 360.0] -3.282946 -1.290420 -5.000000 -4.048767 20 | [240.0, 400.0] -4.020465 -4.999995 1.036072 -5.000000 21 | [280.0, 400.0] -5.000000 -1.160020 7.942310 -4.110057 22 | [320.0, 400.0] -2.777621 16.501423 -4.033209 -2.777935 23 | [320.0, 440.0] -3.287799 26.688280 -5.000000 -3.087450 24 | [320.0, 480.0] -1.166957 1.711174 7.072903 -0.391565 25 | [360.0, 480.0] -5.000000 43.939945 
0.138748 0.238742 26 | [360.0, 520.0] 4.678612 5.156711 58.510487 4.770994 27 | [360.0, 560.0] 5.121811 4.392585 3.671136 6.237530 28 | [400.0, 560.0] 59.049000 53.883428 72.220537 7.833147 29 | [440.0, 560.0] 50.404889 13.880242 81.000000 3.740890 30 | [480.0, 560.0] -5.000000 -2.835820 90.000000 4.278974 31 | [520.0, 560.0] 100.000000 90.000000 -5.000000 49.840349 32 | Full Q-table:: 0 1 2 3 33 | [0.0, 0.0] -4.083781 -5.000000 2.638752 -4.418550 34 | Obstacle 0.000000 0.000000 0.000000 0.000000 35 | [40.0, 0.0] -4.104407 2.099550 -5.000000 -2.055044 36 | [40.0, 40.0] -3.302516 1.967389 -4.061254 -5.000000 37 | [40.0, 80.0] -4.119704 1.462201 -0.919095 -0.012449 38 | ... ... ... ... ... 39 | [520.0, 440.0] -0.018998 -4.950000 -2.952450 -3.645000 40 | [560.0, 440.0] -0.003294 -4.995000 -1.404229 -0.144136 41 | [560.0, 400.0] -4.089535 -4.009859 -1.768169 -4.010247 42 | [560.0, 360.0] -5.000000 -0.363298 -4.057762 -2.952618 43 | [520.0, 400.0] -2.954453 -2.130813 -0.287858 -4.999500 44 | 45 | [176 rows x 4 columns] 46 | -------------------------------------------------------------------------------- /sarsa/Sample results/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /sarsa/Sample results/result sarsa.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/Sample results/result sarsa.png -------------------------------------------------------------------------------- /sarsa/Sample results/sarsareward.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/Sample results/sarsareward.png -------------------------------------------------------------------------------- /sarsa/Sample results/sarsastep.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/Sample results/sarsastep.png -------------------------------------------------------------------------------- /sarsa/Sample results/sarsavalue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/Sample results/sarsavalue.png -------------------------------------------------------------------------------- /sarsa/algo_sarsa.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from environment import final_states 4 | 5 | 6 | class SARSA: 7 | 8 | def __init__(self, actions): 9 | ''' 10 | SARSA inital parameters 11 | 12 | Parameters 13 | ---------- 14 | actions : int 15 | all actions including up, down, left, right 16 | alpha : int 17 | learning rate 18 | gamma : int 19 | discount factor 20 | epsilon : int 21 | probability 22 | decay_factor : int 23 | q_table : pandas Dataframe 24 | Q-table with actions as columns 25 | q_table_final : pandas Dataframe 26 | final Q-table 27 | ''' 28 | self.actions = actions 29 | self.alpha = 0.9 30 | self.gamma = 0.9 31 | self.epsilon = 0.5 32 | self.decay_factor = 0.99995 33 | self.q_table = 
pd.DataFrame(columns=self.actions, dtype=np.float64) 34 | self.q_table_final = pd.DataFrame( 35 | columns=self.actions, dtype=np.float64) 36 | 37 | def choose_action(self, observation): 38 | ''' 39 | Returns an action through exploration and exploitation (epsilon-greedy) 40 | 41 | Parameters: 42 | observation: current state of 43 | the agent as a string, e.g. '[5.0, 40.0]' 44 | 45 | Returns: 46 | the selected action index 47 | ''' 48 | self.check_state_exist(observation) 49 | self.epsilon *= self.decay_factor # decay the exploration rate 50 | if np.random.uniform(0, 1) < self.epsilon: 51 | action = np.random.choice(self.actions) # explore 52 | else: 53 | state_action = self.q_table.loc[observation, :] 54 | #state_action = state_action.reindex(np.random.permutation(state_action.index)) 55 | action = state_action.idxmax() # exploit the greedy action 56 | return action 57 | 58 | def learning(self, state, action, reward, next_state, next_action): 59 | ''' 60 | Updates the Q-table with the newly observed transition 61 | 62 | Parameters: 63 | state: current state of the agent 64 | action: chosen action 65 | reward: received reward 66 | next_state: next state that the agent will move to 67 | next_action: action selected in the next state (on-policy) 68 | Returns: 69 | the updated Q-value of the (state, action) pair 70 | ''' 71 | self.check_state_exist(next_state) 72 | q_current = self.q_table.loc[state, action] 73 | if next_state not in ('Goal', 'Obstacle', 'Rubik'): # non-terminal: bootstrap 74 | q_target = reward + self.gamma * \ 75 | self.q_table.loc[next_state, next_action] 76 | else: 77 | q_target = reward 78 | self.q_table.loc[state, action] += self.alpha * \ 79 | (q_target - q_current) # updating Q-table 80 | return self.q_table.loc[state, action] 81 | 82 | def check_state_exist(self, state): 83 | ''' 84 | Adds a new state to the Q-table if it has not been visited yet 85 | (pd.Series generates a 1-dimensional array) 86 | ''' 87 | if state not in self.q_table.index: 88 | self.q_table = self.q_table.append(pd.Series( 89 | [0] * len(self.actions), index=self.q_table.columns, name=state)) 90 | 91 | def print_q_table(self): 92 | ''' 93 | Saving the final-path Q-table and the full Q-table to data.txt 94 | ''' 95 | final_route = final_states() 96 | for i in range(len(final_route)): 97 | state = str(final_route[i]) 98 | for j in range(len(self.q_table.index)): 99 | if self.q_table.index[j] == state: 100 | self.q_table_final.loc[state, 101 | :] = self.q_table.loc[state, :] 102 | 103 | with open('data.txt', 'a') as f: 104 | f.write(f'Final Path Q-table: {str(self.q_table_final)} \n') 105 | f.write(f'Full Q-table:: {str(self.q_table)} \n') 106 | -------------------------------------------------------------------------------- /sarsa/environment.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tkinter as tk 3 | from tkinter import * 4 | from PIL import Image, ImageTk 5 | 6 | 7 | global_variable = {} 8 | 9 | 10 | class Environment(tk.Tk, object): 11 | def __init__(self): 12 | ''' 13 | Constructs all the necessary attributes for the environment.
14 | 15 | Parameters 16 | ---------- 17 | num_actions: all actions including up, down, left, right 18 | pixels: environment pixels for each location 19 | env_height: number of vertical grids for the environment 20 | env_width: number of horizontal grids for the environment 21 | title: Tkinter environment title 22 | geometry: environmet geometry which is (w*px)*(h*px)+offsets 23 | comparison_dic: storing agnet pathway for each iteration 24 | path_dic: saving final pathway of teh agent 25 | key_dic: a counter for stroing paths 26 | fake: fake variable for reaching final Goal for first time 27 | longest_path: logest path to reach the Goal 28 | shortest_path: shortest path to reach the Goal 29 | ''' 30 | 31 | super(Environment, self).__init__() 32 | self.num_actions = 4 33 | self.pixels = 40 34 | self.env_height = 15 35 | self.env_width = 15 36 | self.title('Path Planing with Reinforcement Learning') 37 | self.geometry( 38 | f'{self.env_width * self.pixels}x{self.env_height * self.pixels}+600+250') 39 | self.build_environment() 40 | self.comparison_dic = {} 41 | self.path_dic = {} 42 | self.key_dic = 0 43 | self.fake = True 44 | self.longest_path = 0 45 | self.shortest_path = 0 46 | 47 | def build_environment(self): 48 | ''' 49 | environment creation by Tkinter 50 | ''' 51 | 52 | self.canvas_widget = tk.Canvas( 53 | self, 54 | bg='white', 55 | height=self.env_height * 56 | self.pixels, 57 | width=self.env_width * 58 | self.pixels) 59 | 60 | for column in range(0, self.env_width * self.pixels, self.pixels): 61 | x0, y0, x1, y1 = column, 0, column, self.env_height * self.pixels 62 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 63 | for row in range(0, self.env_height * self.pixels, self.pixels): 64 | x0, y0, x1, y1 = 0, row, self.env_height * self.pixels, row 65 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 66 | 67 | img_obstacle = Image.open("images/obstacle.png") 68 | self.obstacle_object = ImageTk.PhotoImage(img_obstacle) 69 | 70 | img_tree = Image.open("images/tree.png") 71 | self.tree_object = ImageTk.PhotoImage(img_tree) 72 | 73 | img_shop = Image.open("images/boot_tree.png") 74 | self.shop_object = ImageTk.PhotoImage(img_shop) 75 | 76 | img_building = Image.open("images/building.png") 77 | self.building_object = ImageTk.PhotoImage(img_building) 78 | 79 | img_cube = Image.open("images/rubik.png") 80 | self.cube_object = ImageTk.PhotoImage(img_cube) 81 | 82 | img_garbage = Image.open("images/garbage.png") 83 | self.garbage_object = ImageTk.PhotoImage(img_garbage) 84 | 85 | self.obstacle1 = self.canvas_widget.create_image( 86 | self.pixels * 2, 0, anchor='nw', image=self.obstacle_object) 87 | self.obstacle2 = self.canvas_widget.create_image( 88 | self.pixels * 9, 0, anchor='nw', image=self.tree_object) 89 | self.obstacle3 = self.canvas_widget.create_image( 90 | self.pixels * 11, 0, anchor='nw', image=self.obstacle_object) 91 | self.obstacle4 = self.canvas_widget.create_image( 92 | self.pixels * 14, 0, anchor='nw', image=self.shop_object) 93 | self.obstacle5 = self.canvas_widget.create_image( 94 | self.pixels * 5, 0, anchor='nw', image=self.obstacle_object) 95 | self.obstacle6 = self.canvas_widget.create_image( 96 | 0, self.pixels, anchor='nw', image=self.building_object) 97 | self.obstacle7 = self.canvas_widget.create_image( 98 | self.pixels * 7, self.pixels, anchor='nw', image=self.obstacle_object) 99 | self.obstacle8 = self.canvas_widget.create_image( 100 | self.pixels * 9, self.pixels, anchor='nw', image=self.obstacle_object) 101 | self.obstacle9 = 
self.canvas_widget.create_image( 102 | self.pixels * 13, self.pixels, anchor='nw', image=self.obstacle_object) 103 | self.obstacle10 = self.canvas_widget.create_image( 104 | self.pixels * 3, self.pixels * 2, anchor='nw', image=self.tree_object) 105 | self.obstacle11 = self.canvas_widget.create_image( 106 | self.pixels * 5, self.pixels * 2, anchor='nw', image=self.obstacle_object) 107 | self.obstacle12 = self.canvas_widget.create_image( 108 | self.pixels * 11, self.pixels * 2, anchor='nw', image=self.obstacle_object) 109 | self.obstacle13 = self.canvas_widget.create_image( 110 | self.pixels * 0, self.pixels * 3, anchor='nw', image=self.building_object) 111 | self.obstacle14 = self.canvas_widget.create_image( 112 | self.pixels * 2, self.pixels * 4, anchor='nw', image=self.shop_object) 113 | self.obstacle15 = self.canvas_widget.create_image( 114 | self.pixels * 8, self.pixels * 3, anchor='nw', image=self.obstacle_object) 115 | self.obstacle16 = self.canvas_widget.create_image( 116 | self.pixels * 9, self.pixels * 3, anchor='nw', image=self.tree_object) 117 | self.obstacle17 = self.canvas_widget.create_image( 118 | self.pixels * 14, self.pixels * 3, anchor='nw', image=self.obstacle_object) 119 | self.obstacle19 = self.canvas_widget.create_image( 120 | self.pixels * 5, self.pixels * 4, anchor='nw', image=self.building_object) 121 | self.obstacle20 = self.canvas_widget.create_image( 122 | self.pixels * 10, self.pixels * 4, anchor='nw', image=self.obstacle_object) 123 | self.obstacle21 = self.canvas_widget.create_image( 124 | self.pixels * 13, self.pixels * 4, anchor='nw', image=self.obstacle_object) 125 | self.obstacle22 = self.canvas_widget.create_image( 126 | self.pixels * 8, self.pixels * 5, anchor='nw', image=self.shop_object) 127 | self.obstacle23 = self.canvas_widget.create_image( 128 | self.pixels * 3, self.pixels * 6, anchor='nw', image=self.obstacle_object) 129 | self.obstacle24 = self.canvas_widget.create_image( 130 | self.pixels * 6, self.pixels * 6, anchor='nw', image=self.obstacle_object) 131 | self.obstacle25 = self.canvas_widget.create_image( 132 | self.pixels * 11, self.pixels * 6, anchor='nw', image=self.tree_object) 133 | self.obstacle26 = self.canvas_widget.create_image( 134 | self.pixels * 14, self.pixels * 6, anchor='nw', image=self.obstacle_object) 135 | self.obstacle27 = self.canvas_widget.create_image( 136 | self.pixels * 0, self.pixels * 7, anchor='nw', image=self.obstacle_object) 137 | self.obstacle28 = self.canvas_widget.create_image( 138 | self.pixels * 1, self.pixels * 7, anchor='nw', image=self.tree_object) 139 | self.obstacle29 = self.canvas_widget.create_image( 140 | self.pixels * 9, self.pixels * 7, anchor='nw', image=self.cube_object) 141 | self.obstacle30 = self.canvas_widget.create_image( 142 | self.pixels * 3, self.pixels * 8, anchor='nw', image=self.building_object) 143 | self.obstacle31 = self.canvas_widget.create_image( 144 | self.pixels * 5, self.pixels * 8, anchor='nw', image=self.obstacle_object) 145 | self.obstacle32 = self.canvas_widget.create_image( 146 | self.pixels * 9, self.pixels * 8, anchor='nw', image=self.shop_object) 147 | self.obstacle33 = self.canvas_widget.create_image( 148 | self.pixels * 12, self.pixels * 8, anchor='nw', image=self.tree_object) 149 | self.obstacle34 = self.canvas_widget.create_image( 150 | self.pixels * 14, self.pixels * 8, anchor='nw', image=self.obstacle_object) 151 | self.obstacle35 = self.canvas_widget.create_image( 152 | self.pixels * 0, self.pixels * 9, anchor='nw', image=self.shop_object) 153 | self.obstacle36 = 
self.canvas_widget.create_image( 154 | self.pixels * 7, self.pixels * 9, anchor='nw', image=self.obstacle_object) 155 | self.obstacle37 = self.canvas_widget.create_image( 156 | self.pixels * 3, self.pixels * 10, anchor='nw', image=self.building_object) 157 | self.obstacle38 = self.canvas_widget.create_image( 158 | self.pixels * 5, self.pixels * 10, anchor='nw', image=self.tree_object) 159 | self.obstacle39 = self.canvas_widget.create_image( 160 | self.pixels * 12, self.pixels * 10, anchor='nw', image=self.obstacle_object) 161 | self.obstacle40 = self.canvas_widget.create_image( 162 | self.pixels * 1, self.pixels * 11, anchor='nw', image=self.tree_object) 163 | self.obstacle41 = self.canvas_widget.create_image( 164 | self.pixels * 6, self.pixels * 11, anchor='nw', image=self.obstacle_object) 165 | self.obstacle42 = self.canvas_widget.create_image( 166 | self.pixels * 9, self.pixels * 11, anchor='nw', image=self.tree_object) 167 | self.obstacle43 = self.canvas_widget.create_image( 168 | self.pixels * 12, self.pixels * 12, anchor='nw', image=self.garbage_object) 169 | self.obstacle44 = self.canvas_widget.create_image( 170 | self.pixels * 13, self.pixels * 12, anchor='nw', image=self.garbage_object) 171 | self.obstacle45 = self.canvas_widget.create_image( 172 | self.pixels * 14, self.pixels * 12, anchor='nw', image=self.garbage_object) 173 | self.obstacle46 = self.canvas_widget.create_image( 174 | self.pixels * 2, self.pixels * 13, anchor='nw', image=self.obstacle_object) 175 | self.obstacle47 = self.canvas_widget.create_image( 176 | self.pixels * 4, self.pixels * 13, anchor='nw', image=self.building_object) 177 | self.obstacle48 = self.canvas_widget.create_image( 178 | self.pixels * 7, self.pixels * 13, anchor='nw', image=self.obstacle_object) 179 | self.obstacle49 = self.canvas_widget.create_image( 180 | self.pixels * 12, self.pixels * 13, anchor='nw', image=self.garbage_object) 181 | self.obstacle50 = self.canvas_widget.create_image( 182 | self.pixels * 14, self.pixels * 13, anchor='nw', image=self.garbage_object) 183 | self.obstacle51 = self.canvas_widget.create_image( 184 | self.pixels * 0, self.pixels * 14, anchor='nw', image=self.building_object) 185 | self.obstacle52 = self.canvas_widget.create_image( 186 | self.pixels * 14, self.pixels * 14, anchor='nw', image=self.garbage_object) 187 | 188 | img_flag = Image.open("images/flag.png") 189 | self.flag_object = ImageTk.PhotoImage(img_flag) 190 | self.flag = self.canvas_widget.create_image( 191 | self.pixels * 13, self.pixels * 13, anchor='nw', image=self.flag_object) 192 | 193 | img_robot = Image.open("images/agent.png") 194 | self.robot = ImageTk.PhotoImage(img_robot) 195 | self.agent = self.canvas_widget.create_image( 196 | 0, 0, anchor='nw', image=self.robot) 197 | 198 | self.canvas_widget.pack() 199 | 200 | def reset(self): 201 | ''' 202 | reset the environment and all parameters 203 | Return: 204 | the agent's current state in the format of [120.0, 40.0] 205 | ''' 206 | 207 | self.update() 208 | self.canvas_widget.delete(self.agent) 209 | self.agent = self.canvas_widget.create_image( 210 | 0, 0, anchor='nw', image=self.robot) 211 | self.comparison_dic = {} 212 | self.key_dic = 0 213 | return self.canvas_widget.coords(self.agent) 214 | 215 | def refresh(self): 216 | ''' 217 | update and refresh the environment before training 218 | ''' 219 | 220 | self.update() 221 | 222 | def step(self, action): 223 | ''' 224 | Moving the agent one pixel and update reward, action and next step regarding the agent next location 225 | 226 | Parameters: 
227 | action: Actions = {0:'up', 1:'down', 2:'right', 3:'left} 228 | 229 | Returns: 230 | reward, next step and done flag 231 | ''' 232 | 233 | state = self.canvas_widget.coords(self.agent) 234 | base_action = np.array([0, 0]) 235 | 236 | if action == 0: 237 | if state[1] >= self.pixels: 238 | base_action[1] -= self.pixels 239 | elif action == 1: 240 | if state[1] < (self.env_height - 1) * self.pixels: 241 | base_action[1] += self.pixels 242 | elif action == 2: 243 | if state[0] < (self.env_width - 1) * self.pixels: 244 | base_action[0] += self.pixels 245 | elif action == 3: 246 | if state[0] >= self.pixels: 247 | base_action[0] -= self.pixels 248 | 249 | self.canvas_widget.move(self.agent, base_action[0], base_action[1]) 250 | self.comparison_dic[self.key_dic] = self.canvas_widget.coords( 251 | self.agent) # storing new position of agent 252 | next_state = self.comparison_dic[self.key_dic] 253 | self.key_dic += 1 # add next key in dictionary 254 | 255 | if next_state == self.canvas_widget.coords(self.flag): 256 | reward = 100 257 | next_state = 'Goal' 258 | done = True 259 | 260 | # filling the dictionary first time 261 | if self.fake: 262 | for j in range(len(self.comparison_dic)): 263 | self.path_dic[j] = self.comparison_dic[j] 264 | self.fake = False 265 | self.longest_path = len(self.comparison_dic) 266 | self.shortest_path = len(self.comparison_dic) 267 | 268 | # storing shortest path 269 | if len(self.comparison_dic) < len(self.path_dic): 270 | self.shortest_path = len(self.comparison_dic) 271 | self.path_dic = {} 272 | for j in range(len(self.comparison_dic)): 273 | self.path_dic[j] = self.comparison_dic[j] 274 | 275 | # storing longest path 276 | if len(self.comparison_dic) > self.longest_path: 277 | self.longest_path = len(self.comparison_dic) 278 | 279 | elif next_state in [self.canvas_widget.coords(self.obstacle1), 280 | self.canvas_widget.coords(self.obstacle2), 281 | self.canvas_widget.coords(self.obstacle3), 282 | self.canvas_widget.coords(self.obstacle4), 283 | self.canvas_widget.coords(self.obstacle5), 284 | self.canvas_widget.coords(self.obstacle6), 285 | self.canvas_widget.coords(self.obstacle7), 286 | self.canvas_widget.coords(self.obstacle8), 287 | self.canvas_widget.coords(self.obstacle9), 288 | self.canvas_widget.coords(self.obstacle10), 289 | self.canvas_widget.coords(self.obstacle11), 290 | self.canvas_widget.coords(self.obstacle12), 291 | self.canvas_widget.coords(self.obstacle13), 292 | self.canvas_widget.coords(self.obstacle14), 293 | self.canvas_widget.coords(self.obstacle15), 294 | self.canvas_widget.coords(self.obstacle16), 295 | self.canvas_widget.coords(self.obstacle17), 296 | self.canvas_widget.coords(self.obstacle19), 297 | self.canvas_widget.coords(self.obstacle20), 298 | self.canvas_widget.coords(self.obstacle21), 299 | self.canvas_widget.coords(self.obstacle22), 300 | self.canvas_widget.coords(self.obstacle23), 301 | self.canvas_widget.coords(self.obstacle24), 302 | self.canvas_widget.coords(self.obstacle25), 303 | self.canvas_widget.coords(self.obstacle26), 304 | self.canvas_widget.coords(self.obstacle27), 305 | self.canvas_widget.coords(self.obstacle28), 306 | self.canvas_widget.coords(self.obstacle30), 307 | self.canvas_widget.coords(self.obstacle31), 308 | self.canvas_widget.coords(self.obstacle32), 309 | self.canvas_widget.coords(self.obstacle33), 310 | self.canvas_widget.coords(self.obstacle34), 311 | self.canvas_widget.coords(self.obstacle35), 312 | self.canvas_widget.coords(self.obstacle36), 313 | self.canvas_widget.coords(self.obstacle37), 314 
| self.canvas_widget.coords(self.obstacle38), 315 | self.canvas_widget.coords(self.obstacle39), 316 | self.canvas_widget.coords(self.obstacle40), 317 | self.canvas_widget.coords(self.obstacle41), 318 | self.canvas_widget.coords(self.obstacle42), 319 | self.canvas_widget.coords(self.obstacle43), 320 | self.canvas_widget.coords(self.obstacle44), 321 | self.canvas_widget.coords(self.obstacle45), 322 | self.canvas_widget.coords(self.obstacle46), 323 | self.canvas_widget.coords(self.obstacle47), 324 | self.canvas_widget.coords(self.obstacle48), 325 | self.canvas_widget.coords(self.obstacle49), 326 | self.canvas_widget.coords(self.obstacle50), 327 | self.canvas_widget.coords(self.obstacle51), 328 | self.canvas_widget.coords(self.obstacle52)]: 329 | reward = -5 330 | done = True 331 | next_state = 'Obstacle' 332 | self.comparison_dic = {} 333 | self.key_dic = 0 334 | 335 | elif next_state in [self.canvas_widget.coords(self.obstacle29)]: 336 | reward = -1 337 | done = True 338 | next_state = 'Rubik' 339 | self.comparison_dic = {} 340 | self.key_dic = 0 341 | 342 | else: 343 | reward = 0 344 | done = False 345 | 346 | return next_state, reward, done 347 | 348 | def final_path(self): 349 | ''' 350 | saving final path and showing graphically by balck ovals 351 | ''' 352 | 353 | origin_point = np.array([20, 20]) 354 | path_list = [] 355 | self.canvas_widget.delete(self.agent) 356 | for j in range(len(self.path_dic)): 357 | path_list.append(self.path_dic[j]) 358 | self.track = self.canvas_widget.create_oval( 359 | self.path_dic[j][0] + origin_point[0] - 12, 360 | self.path_dic[j][1] + origin_point[1] - 12, 361 | self.path_dic[j][0] + origin_point[0] + 12, 362 | self.path_dic[j][1] + origin_point[1] + 12, 363 | fill='black', 364 | outline='black') 365 | # putting the final route in a global variable 366 | global_variable[j] = self.path_dic[j] 367 | 368 | with open('data.txt', 'w') as f: 369 | f.write(f'The Shortest Path: {str(self.shortest_path)} \n') 370 | f.write(f'The Longest Path: {str(self.longest_path)} \n') 371 | f.write(f'Optimal Path: {str(path_list)} \n') 372 | 373 | 374 | def final_states(): 375 | '''final route coordination for plotting''' 376 | 377 | return global_variable 378 | -------------------------------------------------------------------------------- /sarsa/images/agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/agent.png -------------------------------------------------------------------------------- /sarsa/images/boot_tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/boot_tree.png -------------------------------------------------------------------------------- /sarsa/images/building.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/building.png -------------------------------------------------------------------------------- /sarsa/images/flag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/flag.png 
-------------------------------------------------------------------------------- /sarsa/images/garbage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/garbage.png -------------------------------------------------------------------------------- /sarsa/images/obstacle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/obstacle.png -------------------------------------------------------------------------------- /sarsa/images/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /sarsa/images/rubik.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/rubik.png -------------------------------------------------------------------------------- /sarsa/images/tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/sarsa/images/tree.png -------------------------------------------------------------------------------- /sarsa/plot_results.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import numpy as np 3 | 4 | 5 | class Plots: 6 | 7 | def plot_reward(self, reward): 8 | plt.close() 9 | plt.plot(np.arange(len(reward)), reward, 'b') 10 | plt.title('Episodes vs Reward') 11 | plt.xlabel('Episodes') 12 | plt.ylabel('Reward') 13 | plt.grid() 14 | plt.savefig('reward.png') 15 | plt.show() 16 | 17 | def plot_steps(self, steps): 18 | plt.close() 19 | plt.plot(np.arange(len(steps)), steps, 'r') 20 | plt.title('Episodes vs Steps') 21 | plt.xlabel('Episodes') 22 | plt.ylabel('Steps') 23 | plt.grid() 24 | plt.savefig('steps.png') 25 | plt.show() 26 | 27 | def plot_value(self, value): 28 | plt.close() 29 | plt.plot(np.arange(len(value)), value, 'g') 30 | plt.title('Episodes vs Values') 31 | plt.xlabel('Episodes') 32 | plt.ylabel('Q-Values') 33 | plt.grid() 34 | plt.savefig('value.png') 35 | plt.show() -------------------------------------------------------------------------------- /sarsa/readme: -------------------------------------------------------------------------------- 1 | SARSA is an on-policy Temporal Difference (TD) control method that shares common features with Q-learning. The convergence properties of the SARSA algorithm depend on the nature of the policy's reliance on Q. As evident, SARSA necessitates the next action to update the Q table, and therefore, it selects the next action based on the next state. The action selection mechanism follows an epsilon-greedy approach, and all procedures are akin to Q-learning, with the exception of episodes during which SARSA requires more time to discover the optimal path. 2 | -------------------------------------------------------------------------------- /sarsa/run.py: -------------------------------------------------------------------------------- 1 | '''This is the main file. 
When you run it, the agent start training process.''' 2 | 3 | from environment import Environment 4 | from algo_sarsa import SARSA 5 | from plot_results import Plots 6 | 7 | 8 | def main(): 9 | 10 | total_steps = [] 11 | total_rewards = [] 12 | total_values = [] 13 | episodes = 10000 14 | 15 | for episode in range(episodes): 16 | state = env.reset() # it returns the coordination of the agent 17 | step = 0 18 | value = 0 19 | reward_value = 0 20 | action = RL.choose_action(str(state)) 21 | while True: 22 | env.refresh() 23 | next_state, reward, done = env.step(action) 24 | next_action = RL.choose_action(str(next_state)) 25 | value += RL.learning(str(state), action, reward, 26 | str(next_state), next_action) 27 | state = next_state 28 | action = next_action 29 | reward_value += reward 30 | step += 1 31 | 32 | if done: 33 | total_steps += [step] 34 | total_rewards += [reward_value] 35 | total_values += [value] 36 | break 37 | 38 | env.final_path() 39 | plot.plot_reward(total_rewards) 40 | plot.plot_steps(total_steps) 41 | plot.plot_value(total_values) 42 | RL.print_q_table() 43 | 44 | 45 | if __name__ == "__main__": 46 | env = Environment() 47 | RL = SARSA(actions=list(range(env.num_actions))) 48 | plot = Plots() 49 | env.after(10, main) 50 | env.mainloop() 51 | -------------------------------------------------------------------------------- /td_learning/Sample results/TD_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/Sample results/TD_0.png -------------------------------------------------------------------------------- /td_learning/Sample results/TDreward.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/Sample results/TDreward.png -------------------------------------------------------------------------------- /td_learning/Sample results/TDsteps.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/Sample results/TDsteps.png -------------------------------------------------------------------------------- /td_learning/Sample results/TDvalue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/Sample results/TDvalue.png -------------------------------------------------------------------------------- /td_learning/Sample results/data.txt: -------------------------------------------------------------------------------- 1 | The Shortest Path: 95 2 | The Longest Path: 336 3 | Optimal Path: [[0.0, 0.0], [40.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [40.0, 0.0], [40.0, 0.0], [0.0, 0.0], [40.0, 0.0], [0.0, 0.0], [40.0, 0.0], [40.0, 0.0], [40.0, 40.0], [40.0, 80.0], [40.0, 120.0], [40.0, 80.0], [40.0, 120.0], [40.0, 80.0], [40.0, 120.0], [40.0, 80.0], [80.0, 80.0], [80.0, 120.0], [40.0, 120.0], [40.0, 160.0], [0.0, 160.0], [0.0, 200.0], [0.0, 160.0], [0.0, 160.0], [0.0, 200.0], [40.0, 200.0], [80.0, 200.0], [120.0, 200.0], [120.0, 160.0], [160.0, 160.0], [160.0, 120.0], 
[200.0, 120.0], [240.0, 120.0], [240.0, 160.0], [240.0, 200.0], [200.0, 200.0], [200.0, 240.0], [200.0, 200.0], [200.0, 240.0], [160.0, 240.0], [160.0, 280.0], [160.0, 240.0], [160.0, 280.0], [160.0, 240.0], [160.0, 280.0], [160.0, 240.0], [160.0, 200.0], [120.0, 200.0], [80.0, 200.0], [120.0, 200.0], [160.0, 200.0], [200.0, 200.0], [240.0, 200.0], [280.0, 200.0], [280.0, 240.0], [280.0, 200.0], [240.0, 200.0], [200.0, 200.0], [200.0, 240.0], [200.0, 280.0], [160.0, 280.0], [160.0, 320.0], [160.0, 360.0], [160.0, 400.0], [160.0, 440.0], [200.0, 440.0], [200.0, 480.0], [240.0, 480.0], [280.0, 480.0], [320.0, 480.0], [360.0, 480.0], [360.0, 520.0], [400.0, 520.0], [400.0, 560.0], [440.0, 560.0], [480.0, 560.0], [440.0, 560.0], [480.0, 560.0], [480.0, 560.0], [480.0, 560.0], [480.0, 560.0], [520.0, 560.0], [520.0, 560.0], [520.0, 560.0], [520.0, 520.0]] 4 | Final Path V-table: state value 5 | [0.0, 0.0] -0.048587 6 | [40.0, 0.0] -0.232032 7 | [40.0, 40.0] -1.110985 8 | [40.0, 80.0] -0.677959 9 | [40.0, 120.0] -0.082712 10 | [80.0, 80.0] -0.580085 11 | [80.0, 120.0] -0.226284 12 | [40.0, 160.0] -0.109950 13 | [0.0, 160.0] -0.126974 14 | [0.0, 200.0] -0.119014 15 | [40.0, 200.0] -0.097870 16 | [80.0, 200.0] -0.199502 17 | [120.0, 200.0] -0.271317 18 | [120.0, 160.0] -0.133922 19 | [160.0, 160.0] -0.299506 20 | [160.0, 120.0] -3.808410 21 | [200.0, 120.0] -1.061209 22 | [240.0, 120.0] -3.797218 23 | [240.0, 160.0] -0.847080 24 | [240.0, 200.0] -0.352839 25 | [200.0, 200.0] -0.517903 26 | [200.0, 240.0] -0.222678 27 | [160.0, 240.0] -2.248170 28 | [160.0, 280.0] -0.168093 29 | [160.0, 200.0] -0.177441 30 | [280.0, 200.0] -0.182979 31 | [280.0, 240.0] -0.450525 32 | [200.0, 280.0] -0.207689 33 | [160.0, 320.0] -4.799294 34 | [160.0, 360.0] -2.910989 35 | [160.0, 400.0] -4.860101 36 | [160.0, 440.0] -1.272738 37 | [200.0, 440.0] -4.681947 38 | [200.0, 480.0] -3.562581 39 | [240.0, 480.0] -3.315708 40 | [280.0, 480.0] -0.111897 41 | [320.0, 480.0] 0.119484 42 | [360.0, 480.0] 0.306290 43 | [360.0, 520.0] 0.335131 44 | [400.0, 520.0] 0.408170 45 | [400.0, 560.0] 3.452500 46 | [440.0, 560.0] 6.391877 47 | [480.0, 560.0] 25.508254 48 | [520.0, 560.0] 65.003347 49 | Full V-table:: state value 50 | [0.0, 0.0] -0.048587 51 | [40.0, 0.0] -0.232032 52 | [40.0, 40.0] -1.110985 53 | [40.0, 80.0] -0.677959 54 | [0.0, 80.0] -4.789541 55 | ... ... 
56 | [520.0, 560.0] 65.003347 57 | [480.0, 0.0] -1.821095 58 | Goal 0.000000 59 | [520.0, 0.0] -3.951864 60 | [560.0, 160.0] -4.993713 61 | 62 | [176 rows x 1 columns] 63 | -------------------------------------------------------------------------------- /td_learning/Sample results/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /td_learning/algo_td0.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | from environment import final_states 4 | 5 | 6 | class TemporalDifference: 7 | 8 | def __init__(self): 9 | ''' 10 | TD(0) initial parameters 11 | 12 | Parameters 13 | ---------- 14 | alpha : float 15 | learning rate 16 | gamma : float 17 | discount factor 18 | epsilon : float 19 | exploration probability 20 | decay_factor : float 21 | v_table : pandas DataFrame 22 | V-table with state values as a column 23 | v_table_final : pandas DataFrame 24 | final V-table 25 | ''' 26 | 27 | self.alpha = 0.9 28 | self.gamma = 0.9 29 | self.epsilon = 0.5 30 | self.decay_factor = 0.99995 31 | self.v_table = pd.DataFrame(columns=['state value'], dtype=np.float64) 32 | self.v_table_final = pd.DataFrame( 33 | columns=['state value'], dtype=np.float64) 34 | 35 | def learning(self, state, reward, next_state): 36 | ''' 37 | Updates the V-table with the newly observed transition 38 | 39 | Parameters: 40 | state: current state of the agent 41 | reward: received reward 42 | next_state: next state that the agent 43 | will move to 44 | 45 | Returns: 46 | the updated value of the current state 47 | ''' 48 | self.check_state_exist(next_state) 49 | v_current = self.v_table.loc[state] 50 | if next_state not in ('Goal', 'Obstacle', 'Rubik'): # non-terminal: bootstrap 51 | v_target = reward + self.gamma * self.v_table.loc[next_state] 52 | else: 53 | v_target = reward 54 | self.v_table.loc[state] += self.alpha * (v_target - v_current) 55 | return self.v_table.loc[state] 56 | 57 | def check_state_exist(self, state): 58 | ''' 59 | Adds a new state to the V-table if it has not been visited yet 60 | ''' 61 | if state not in self.v_table.index: 62 | self.v_table = self.v_table.append( 63 | pd.Series([0] * 1, index=self.v_table.columns, name=state)) 64 | 65 | def print_v_table(self): 66 | ''' 67 | Saving the final-path V-table and the full V-table to data.txt 68 | ''' 69 | final_route = final_states() 70 | for i in range(len(final_route)): 71 | state = str(final_route[i]) 72 | for j in range(len(self.v_table.index)): 73 | if self.v_table.index[j] == state: 74 | self.v_table_final.loc[state] = self.v_table.loc[state] 75 | 76 | with open('data.txt', 'a') as f: 77 | f.write(f'Final Path V-table: {str(self.v_table_final)} \n') 78 | f.write(f'Full V-table:: {str(self.v_table)} \n') 79 | -------------------------------------------------------------------------------- /td_learning/environment.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from operator import add 3 | import tkinter as tk 4 | from PIL import Image, ImageTk 5 | 6 | 7 | global_variable = {} 8 | 9 | 10 | class Environment(tk.Tk, object): 11 | def __init__(self): 12 | ''' 13 | Constructs all the necessary attributes for the environment.
14 | 15 | Parameters 16 | ---------- 17 | num_actions: all actions including up, down, left, right 18 | pixels: environment pixels for each location 19 | env_height: number of vertical grids for the environment 20 | env_width: number of horizontal grids for the environment 21 | title: Tkinter environment title 22 | geometry: environmet geometry which is (w*px)*(h*px)+offsets 23 | comparison_dic: storing agnet pathway for each iteration 24 | path_dic: saving final pathway of teh agent 25 | key_dic: a counter for stroing paths 26 | fake: fake variable for reaching final Goal for first time 27 | longest_path: logest path to reach the Goal 28 | shortest_path: shortest path to reach the Goal 29 | ''' 30 | super(Environment, self).__init__() 31 | self.num_actions = 4 32 | self.pixels = 40 33 | self.env_height = 15 34 | self.env_width = 15 35 | self.title('Path Planing with Reinforcement Learning') 36 | self.geometry( 37 | f'{self.env_width * self.pixels }x{self.env_height * self.pixels }+600+250') 38 | self.build_environment() 39 | self.comparison_dic = {} 40 | self.path_dic = {} 41 | self.key_dic = 0 42 | self.fake = True 43 | self.longest_path = 0 44 | self.shortest_path = 0 45 | 46 | def build_environment(self): 47 | ''' 48 | environment creation by Tkinter 49 | ''' 50 | self.canvas_widget = tk.Canvas( 51 | self, 52 | bg='white', 53 | height=self.env_height * 54 | self.pixels, 55 | width=self.env_width * 56 | self.pixels) 57 | for column in range(0, self.env_width * self.pixels, self.pixels): 58 | x0, y0, x1, y1 = column, 0, column, self.env_height * self.pixels 59 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 60 | for row in range(0, self.env_height * self.pixels, self.pixels): 61 | x0, y0, x1, y1 = 0, row, self.env_height * self.pixels, row 62 | self.canvas_widget.create_line(x0, y0, x1, y1, fill='grey') 63 | 64 | img_obstacle = Image.open("images/obstacle.png") 65 | self.obstacle_object = ImageTk.PhotoImage(img_obstacle) 66 | 67 | img_tree = Image.open("images/tree.png") 68 | self.tree_object = ImageTk.PhotoImage(img_tree) 69 | 70 | img_shop = Image.open("images/boot_tree.png") 71 | self.shop_object = ImageTk.PhotoImage(img_shop) 72 | 73 | img_building = Image.open("images/building.png") 74 | self.building_object = ImageTk.PhotoImage(img_building) 75 | 76 | img_cube = Image.open("images/rubik.png") 77 | self.cube_object = ImageTk.PhotoImage(img_cube) 78 | 79 | img_garbage = Image.open("images/garbage.png") 80 | self.garbage_object = ImageTk.PhotoImage(img_garbage) 81 | 82 | self.obstacle1 = self.canvas_widget.create_image( 83 | self.pixels * 2, 0, anchor='nw', image=self.obstacle_object) 84 | self.obstacle2 = self.canvas_widget.create_image( 85 | self.pixels * 9, 0, anchor='nw', image=self.tree_object) 86 | self.obstacle3 = self.canvas_widget.create_image( 87 | self.pixels * 11, 0, anchor='nw', image=self.obstacle_object) 88 | self.obstacle4 = self.canvas_widget.create_image( 89 | self.pixels * 14, 0, anchor='nw', image=self.shop_object) 90 | self.obstacle5 = self.canvas_widget.create_image( 91 | self.pixels * 5, 0, anchor='nw', image=self.obstacle_object) 92 | self.obstacle6 = self.canvas_widget.create_image( 93 | 0, self.pixels, anchor='nw', image=self.building_object) 94 | self.obstacle7 = self.canvas_widget.create_image( 95 | self.pixels * 7, self.pixels, anchor='nw', image=self.obstacle_object) 96 | self.obstacle8 = self.canvas_widget.create_image( 97 | self.pixels * 9, self.pixels, anchor='nw', image=self.obstacle_object) 98 | self.obstacle9 = 
self.canvas_widget.create_image( 99 | self.pixels * 13, self.pixels, anchor='nw', image=self.obstacle_object) 100 | self.obstacle10 = self.canvas_widget.create_image( 101 | self.pixels * 3, self.pixels * 2, anchor='nw', image=self.tree_object) 102 | self.obstacle11 = self.canvas_widget.create_image( 103 | self.pixels * 5, self.pixels * 2, anchor='nw', image=self.obstacle_object) 104 | self.obstacle12 = self.canvas_widget.create_image( 105 | self.pixels * 11, self.pixels * 2, anchor='nw', image=self.obstacle_object) 106 | self.obstacle13 = self.canvas_widget.create_image( 107 | self.pixels * 0, self.pixels * 3, anchor='nw', image=self.building_object) 108 | self.obstacle14 = self.canvas_widget.create_image( 109 | self.pixels * 2, self.pixels * 4, anchor='nw', image=self.shop_object) 110 | self.obstacle15 = self.canvas_widget.create_image( 111 | self.pixels * 8, self.pixels * 3, anchor='nw', image=self.obstacle_object) 112 | self.obstacle16 = self.canvas_widget.create_image( 113 | self.pixels * 9, self.pixels * 3, anchor='nw', image=self.tree_object) 114 | self.obstacle17 = self.canvas_widget.create_image( 115 | self.pixels * 14, self.pixels * 3, anchor='nw', image=self.obstacle_object) 116 | self.obstacle19 = self.canvas_widget.create_image( 117 | self.pixels * 5, self.pixels * 4, anchor='nw', image=self.building_object) 118 | self.obstacle20 = self.canvas_widget.create_image( 119 | self.pixels * 10, self.pixels * 4, anchor='nw', image=self.obstacle_object) 120 | self.obstacle21 = self.canvas_widget.create_image( 121 | self.pixels * 13, self.pixels * 4, anchor='nw', image=self.obstacle_object) 122 | self.obstacle22 = self.canvas_widget.create_image( 123 | self.pixels * 8, self.pixels * 5, anchor='nw', image=self.shop_object) 124 | self.obstacle23 = self.canvas_widget.create_image( 125 | self.pixels * 3, self.pixels * 6, anchor='nw', image=self.obstacle_object) 126 | self.obstacle24 = self.canvas_widget.create_image( 127 | self.pixels * 6, self.pixels * 6, anchor='nw', image=self.obstacle_object) 128 | self.obstacle25 = self.canvas_widget.create_image( 129 | self.pixels * 11, self.pixels * 6, anchor='nw', image=self.tree_object) 130 | self.obstacle26 = self.canvas_widget.create_image( 131 | self.pixels * 14, self.pixels * 6, anchor='nw', image=self.obstacle_object) 132 | self.obstacle27 = self.canvas_widget.create_image( 133 | self.pixels * 0, self.pixels * 7, anchor='nw', image=self.obstacle_object) 134 | self.obstacle28 = self.canvas_widget.create_image( 135 | self.pixels * 1, self.pixels * 7, anchor='nw', image=self.tree_object) 136 | self.obstacle29 = self.canvas_widget.create_image( 137 | self.pixels * 9, self.pixels * 7, anchor='nw', image=self.cube_object) 138 | self.obstacle30 = self.canvas_widget.create_image( 139 | self.pixels * 3, self.pixels * 8, anchor='nw', image=self.building_object) 140 | self.obstacle31 = self.canvas_widget.create_image( 141 | self.pixels * 5, self.pixels * 8, anchor='nw', image=self.obstacle_object) 142 | self.obstacle32 = self.canvas_widget.create_image( 143 | self.pixels * 9, self.pixels * 8, anchor='nw', image=self.shop_object) 144 | self.obstacle33 = self.canvas_widget.create_image( 145 | self.pixels * 12, self.pixels * 8, anchor='nw', image=self.tree_object) 146 | self.obstacle34 = self.canvas_widget.create_image( 147 | self.pixels * 14, self.pixels * 8, anchor='nw', image=self.obstacle_object) 148 | self.obstacle35 = self.canvas_widget.create_image( 149 | self.pixels * 0, self.pixels * 9, anchor='nw', image=self.shop_object) 150 | self.obstacle36 = 
self.canvas_widget.create_image( 151 | self.pixels * 7, self.pixels * 9, anchor='nw', image=self.obstacle_object) 152 | self.obstacle37 = self.canvas_widget.create_image( 153 | self.pixels * 3, self.pixels * 10, anchor='nw', image=self.building_object) 154 | self.obstacle38 = self.canvas_widget.create_image( 155 | self.pixels * 5, self.pixels * 10, anchor='nw', image=self.tree_object) 156 | self.obstacle39 = self.canvas_widget.create_image( 157 | self.pixels * 12, self.pixels * 10, anchor='nw', image=self.obstacle_object) 158 | self.obstacle40 = self.canvas_widget.create_image( 159 | self.pixels * 1, self.pixels * 11, anchor='nw', image=self.tree_object) 160 | self.obstacle41 = self.canvas_widget.create_image( 161 | self.pixels * 6, self.pixels * 11, anchor='nw', image=self.obstacle_object) 162 | self.obstacle42 = self.canvas_widget.create_image( 163 | self.pixels * 9, self.pixels * 11, anchor='nw', image=self.tree_object) 164 | self.obstacle43 = self.canvas_widget.create_image( 165 | self.pixels * 12, self.pixels * 12, anchor='nw', image=self.garbage_object) 166 | self.obstacle44 = self.canvas_widget.create_image( 167 | self.pixels * 13, self.pixels * 12, anchor='nw', image=self.garbage_object) 168 | self.obstacle45 = self.canvas_widget.create_image( 169 | self.pixels * 14, self.pixels * 12, anchor='nw', image=self.garbage_object) 170 | self.obstacle46 = self.canvas_widget.create_image( 171 | self.pixels * 2, self.pixels * 13, anchor='nw', image=self.obstacle_object) 172 | self.obstacle47 = self.canvas_widget.create_image( 173 | self.pixels * 4, self.pixels * 13, anchor='nw', image=self.building_object) 174 | self.obstacle48 = self.canvas_widget.create_image( 175 | self.pixels * 7, self.pixels * 13, anchor='nw', image=self.obstacle_object) 176 | self.obstacle49 = self.canvas_widget.create_image( 177 | self.pixels * 12, self.pixels * 13, anchor='nw', image=self.garbage_object) 178 | self.obstacle50 = self.canvas_widget.create_image( 179 | self.pixels * 14, self.pixels * 13, anchor='nw', image=self.garbage_object) 180 | self.obstacle51 = self.canvas_widget.create_image( 181 | self.pixels * 0, self.pixels * 14, anchor='nw', image=self.building_object) 182 | self.obstacle52 = self.canvas_widget.create_image( 183 | self.pixels * 14, self.pixels * 14, anchor='nw', image=self.garbage_object) 184 | 185 | img_flag = Image.open("images/flag.png") 186 | self.flag_object = ImageTk.PhotoImage(img_flag) 187 | self.flag = self.canvas_widget.create_image( 188 | self.pixels * 13, self.pixels * 13, anchor='nw', image=self.flag_object) 189 | 190 | img_robot = Image.open("images/agent.png") 191 | self.robot = ImageTk.PhotoImage(img_robot) 192 | self.agent = self.canvas_widget.create_image( 193 | 0, 0, anchor='nw', image=self.robot) 194 | 195 | self.canvas_widget.pack() 196 | 197 | self.obstacles = [self.canvas_widget.coords(self.obstacle1), 198 | self.canvas_widget.coords(self.obstacle2), 199 | self.canvas_widget.coords(self.obstacle3), 200 | self.canvas_widget.coords(self.obstacle4), 201 | self.canvas_widget.coords(self.obstacle5), 202 | self.canvas_widget.coords(self.obstacle6), 203 | self.canvas_widget.coords(self.obstacle7), 204 | self.canvas_widget.coords(self.obstacle8), 205 | self.canvas_widget.coords(self.obstacle9), 206 | self.canvas_widget.coords(self.obstacle10), 207 | self.canvas_widget.coords(self.obstacle11), 208 | self.canvas_widget.coords(self.obstacle12), 209 | self.canvas_widget.coords(self.obstacle13), 210 | self.canvas_widget.coords(self.obstacle14), 211 | 
self.canvas_widget.coords(self.obstacle15), 212 | self.canvas_widget.coords(self.obstacle16), 213 | self.canvas_widget.coords(self.obstacle17), 214 | self.canvas_widget.coords(self.obstacle19), 215 | self.canvas_widget.coords(self.obstacle20), 216 | self.canvas_widget.coords(self.obstacle21), 217 | self.canvas_widget.coords(self.obstacle22), 218 | self.canvas_widget.coords(self.obstacle23), 219 | self.canvas_widget.coords(self.obstacle24), 220 | self.canvas_widget.coords(self.obstacle25), 221 | self.canvas_widget.coords(self.obstacle26), 222 | self.canvas_widget.coords(self.obstacle27), 223 | self.canvas_widget.coords(self.obstacle28), 224 | self.canvas_widget.coords(self.obstacle30), 225 | self.canvas_widget.coords(self.obstacle31), 226 | self.canvas_widget.coords(self.obstacle32), 227 | self.canvas_widget.coords(self.obstacle33), 228 | self.canvas_widget.coords(self.obstacle34), 229 | self.canvas_widget.coords(self.obstacle35), 230 | self.canvas_widget.coords(self.obstacle36), 231 | self.canvas_widget.coords(self.obstacle37), 232 | self.canvas_widget.coords(self.obstacle38), 233 | self.canvas_widget.coords(self.obstacle39), 234 | self.canvas_widget.coords(self.obstacle40), 235 | self.canvas_widget.coords(self.obstacle41), 236 | self.canvas_widget.coords(self.obstacle42), 237 | self.canvas_widget.coords(self.obstacle43), 238 | self.canvas_widget.coords(self.obstacle44), 239 | self.canvas_widget.coords(self.obstacle45), 240 | self.canvas_widget.coords(self.obstacle46), 241 | self.canvas_widget.coords(self.obstacle47), 242 | self.canvas_widget.coords(self.obstacle48), 243 | self.canvas_widget.coords(self.obstacle49), 244 | self.canvas_widget.coords(self.obstacle50), 245 | self.canvas_widget.coords(self.obstacle51), 246 | self.canvas_widget.coords(self.obstacle52)] 247 | 248 | def reset(self): 249 | ''' 250 | reset the environment and all parameters 251 | Return: 252 | the agent's current state as canvas coordinates, e.g. [120.0, 40.0] 253 | ''' 254 | self.update() 255 | self.canvas_widget.delete(self.agent) 256 | self.agent = self.canvas_widget.create_image( 257 | 0, 0, anchor='nw', image=self.robot) 258 | self.comparison_dic = {} 259 | self.key_dic = 0 260 | return self.canvas_widget.coords(self.agent) 261 | 262 | def refresh(self): 263 | ''' 264 | update and refresh the environment at each training step 265 | ''' 266 | self.update() 267 | 268 | def policy(self, state): 269 | ''' 270 | choose an action based on the agent's current state; any direction that leads straight into an obstacle is excluded from the random choice 271 | ''' 272 | # right 273 | if list(map(add, state, [40, 0])) in self.obstacles: 274 | action = np.random.choice([0, 1, 3]) 275 | elif list(map(add, state, [-40, 0])) in self.obstacles:  # left 276 | action = np.random.choice([0, 1, 2]) 277 | # down 278 | elif list(map(add, state, [0, 40])) in self.obstacles: 279 | action = np.random.choice([0, 2, 3]) 280 | elif list(map(add, state, [0, -40])) in self.obstacles:  # up 281 | action = np.random.choice([1, 2, 3]) 282 | else: 283 | action = np.random.choice(self.num_actions) 284 | return action 285 | 286 | def step(self, action): 287 | ''' 288 | Move the agent by one cell (40 pixels) and update the reward, next state, and done flag according to the agent's new location 289 | 290 | Parameters: 291 | action: Actions = {0:'up', 1:'down', 2:'right', 3:'left'} 292 | 293 | Returns: 294 | next state, reward and done flag 295 | ''' 296 | state = self.canvas_widget.coords(self.agent) 297 | base_action = np.array([0, 0]) 298 | 299 | if action == 0: 300 | if state[1] >= self.pixels: 301 | base_action[1] -= self.pixels 302 | elif action == 1: 303 | if state[1] < 
(self.env_height - 1) * self.pixels: 304 | base_action[1] += self.pixels 305 | elif action == 2: 306 | if state[0] < (self.env_width - 1) * self.pixels: 307 | base_action[0] += self.pixels 308 | elif action == 3: 309 | if state[0] >= self.pixels: 310 | base_action[0] -= self.pixels 311 | 312 | self.canvas_widget.move(self.agent, base_action[0], base_action[1]) 313 | self.comparison_dic[self.key_dic] = self.canvas_widget.coords( 314 | self.agent) # storing the agent's new position 315 | next_state = self.comparison_dic[self.key_dic] 316 | self.key_dic += 1 # advance to the next key in the dictionary 317 | 318 | if next_state == self.canvas_widget.coords(self.flag): 319 | reward = 100 320 | next_state = 'Goal' 321 | done = True 322 | 323 | # filling the path dictionary the first time the goal is reached 324 | if self.fake: 325 | for j in range(len(self.comparison_dic)): 326 | self.path_dic[j] = self.comparison_dic[j] 327 | self.fake = False 328 | self.longest_path = len(self.comparison_dic) 329 | self.shortest_path = len(self.comparison_dic) 330 | 331 | # storing the shortest path 332 | if len(self.comparison_dic) < len(self.path_dic): 333 | self.shortest_path = len(self.comparison_dic) 334 | self.path_dic = {} 335 | for j in range(len(self.comparison_dic)): 336 | self.path_dic[j] = self.comparison_dic[j] 337 | 338 | # storing the longest path 339 | if len(self.comparison_dic) > self.longest_path: 340 | self.longest_path = len(self.comparison_dic) 341 | 342 | elif next_state in self.obstacles: 343 | reward = -5 344 | done = True 345 | next_state = 'Obstacle' 346 | self.comparison_dic = {} 347 | self.key_dic = 0 348 | 349 | elif next_state in [self.canvas_widget.coords(self.obstacle29)]: 350 | reward = -1 351 | done = True 352 | next_state = 'Rubik' 353 | self.comparison_dic = {} 354 | self.key_dic = 0 355 | 356 | else: 357 | reward = 0 358 | done = False 359 | 360 | return next_state, reward, done 361 | 362 | def final_path(self): 363 | ''' 364 | save the final path and display it graphically with black ovals 365 | ''' 366 | origin_point = np.array([20, 20]) 367 | path_list = [] 368 | self.canvas_widget.delete(self.agent) 369 | for j in range(len(self.path_dic)): 370 | path_list.append(self.path_dic[j]) 371 | self.track = self.canvas_widget.create_oval( 372 | self.path_dic[j][0] + origin_point[0] - 12, 373 | self.path_dic[j][1] + origin_point[1] - 12, 374 | self.path_dic[j][0] + origin_point[0] + 12, 375 | self.path_dic[j][1] + origin_point[1] + 12, 376 | fill='black', 377 | outline='black') 378 | # putting the final route in a global variable 379 | global_variable[j] = self.path_dic[j] 380 | 381 | with open('data.txt', 'w') as f: 382 | f.write(f'The Shortest Path: {str(self.shortest_path)} \n') 383 | f.write(f'The Longest Path: {str(self.longest_path)} \n') 384 | f.write(f'Optimal Path: {str(path_list)} \n') 385 | 386 | 387 | def final_states(): 388 | '''final route coordinates for plotting''' 389 | 390 | return global_variable 391 | -------------------------------------------------------------------------------- /td_learning/images/agent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/agent.png -------------------------------------------------------------------------------- /td_learning/images/boot_tree.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/boot_tree.png -------------------------------------------------------------------------------- /td_learning/images/building.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/building.png -------------------------------------------------------------------------------- /td_learning/images/flag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/flag.png -------------------------------------------------------------------------------- /td_learning/images/garbage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/garbage.png -------------------------------------------------------------------------------- /td_learning/images/obstacle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/obstacle.png -------------------------------------------------------------------------------- /td_learning/images/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /td_learning/images/rubik.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/rubik.png -------------------------------------------------------------------------------- /td_learning/images/tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/td_learning/images/tree.png -------------------------------------------------------------------------------- /td_learning/plot_results.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import numpy as np 3 | 4 | 5 | class Plots: 6 | 7 | def plot_reward(self, reward): 8 | plt.close() 9 | plt.plot(np.arange(len(reward)), reward, 'b') 10 | plt.title('Episodes vs Reward') 11 | plt.xlabel('Episodes') 12 | plt.ylabel('Reward') 13 | plt.grid() 14 | plt.savefig('reward.png') 15 | plt.show() 16 | 17 | def plot_steps(self, steps): 18 | plt.close() 19 | plt.plot(np.arange(len(steps)), steps, 'r') 20 | plt.title('Episodes vs Steps') 21 | plt.xlabel('Episodes') 22 | plt.ylabel('Steps') 23 | plt.grid() 24 | plt.savefig('steps.png') 25 | plt.show() 26 | 27 | def plot_value(self, value): 28 | plt.close() 29 | plt.plot(np.arange(len(value)), value, 'g') 30 | plt.title('Episodes vs Values') 31 | plt.xlabel('Episodes') 32 | plt.ylabel('Q-Values') 33 | plt.grid() 34 | plt.savefig('value.png') 35 | plt.show() 36 | -------------------------------------------------------------------------------- 
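The `Plots` class above is a thin wrapper around matplotlib: each method draws one curve over the episode index, saves it to a fixed filename (reward.png, steps.png, value.png) and then shows it. Below is a minimal standalone usage sketch; the per-episode lists here are hypothetical placeholders, whereas in practice run.py (shown next) fills them during training.

```python
# Standalone usage sketch for the Plots helper above (dummy data only).
from plot_results import Plots

plot = Plots()
dummy_rewards = [-5, -5, -1, 0, 100]      # hypothetical per-episode returns
dummy_steps = [12, 30, 25, 40, 18]        # hypothetical per-episode step counts
dummy_values = [0.1, 0.4, 0.9, 1.3, 2.0]  # hypothetical per-episode value sums

plot.plot_reward(dummy_rewards)  # saves reward.png and shows the figure
plot.plot_steps(dummy_steps)     # saves steps.png
plot.plot_value(dummy_values)    # saves value.png
```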
/td_learning/readme: -------------------------------------------------------------------------------- 1 | "The Temporal Difference (TD) method uses experience to address the prediction problem. Given experience gathered while following a policy, it updates its estimate V for the non-terminal states encountered in that experience. In this implementation, actions are selected by a simple behaviour policy: in each state the agent chooses an action at random, and whenever a move (for example, going right) would lead into an obstacle, that action is excluded and the agent picks randomly among the remaining ones; the same rule applies to the other directions." 2 | -------------------------------------------------------------------------------- /td_learning/run.py: -------------------------------------------------------------------------------- 1 | '''This is the main file. When you run it, the agent starts the training process.''' 2 | 3 | from environment import Environment 4 | from algo_td0 import TemporalDifference 5 | from plot_results import Plots 6 | 7 | 8 | def main(): 9 | total_steps = [] 10 | total_rewards = [] 11 | total_values = [] 12 | episodes = 20000 13 | 14 | for episode in range(episodes): 15 | state = env.reset() 16 | step = 0 17 | value = 0 18 | reward_value = 0 19 | while True: 20 | env.refresh() 21 | action = env.policy(state) 22 | next_state, reward, done = env.step(action) 23 | value += RL.learning(str(state), reward, str(next_state)) 24 | state = next_state 25 | reward_value += reward 26 | step += 1 27 | 28 | if done: 29 | total_steps += [step] 30 | total_rewards += [reward_value] 31 | total_values += [value] 32 | break 33 | 34 | env.final_path() 35 | plot.plot_reward(total_rewards) 36 | plot.plot_steps(total_steps) 37 | plot.plot_value(total_values) 38 | RL.print_v_table() 39 | 40 | 41 | if __name__ == "__main__": 42 | env = Environment() 43 | # RL = TemporalDifference(actions=list(range(env.num_actions))) 44 | RL = TemporalDifference() 45 | plot = Plots() 46 | env.after(10, main) 47 | env.mainloop() 48 | -------------------------------------------------------------------------------- /training.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pouyan-asg/Path-planning-with-RL-algorithms/aa277272722b15a683ed2fd93e3635cc3860863e/training.gif --------------------------------------------------------------------------------
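A closing note on the TD(0) learner driven by td_learning/run.py: algo_td0.py itself is not reproduced in this listing, so the sketch below only illustrates the standard TD(0) update, V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)], that the calls in run.py suggest; it is not the repository's actual implementation. The method names `learning()` and `print_v_table()` and the terminal labels 'Goal', 'Obstacle' and 'Rubik' come from the files above, while the default alpha and gamma values and the internal table layout are assumptions.

```python
# Hedged sketch of a TD(0) state-value learner compatible with run.py above.
# Not the repository's algo_td0.py; default alpha/gamma and the dict-based
# value table are illustrative assumptions.
import pandas as pd


class TemporalDifference:
    def __init__(self, alpha=0.01, gamma=0.9):
        self.alpha = alpha   # learning rate (assumed default)
        self.gamma = gamma   # discount factor (assumed default)
        self.v_table = {}    # state (as str) -> estimated value V(s)

    def _check_state(self, state):
        # lazily register unseen states with an initial value of 0
        if state not in self.v_table:
            self.v_table[state] = 0.0

    def learning(self, state, reward, next_state):
        '''TD(0) update: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]'''
        self._check_state(state)
        # terminal outcomes in environment.py are labelled 'Goal',
        # 'Obstacle' and 'Rubik'; their value is taken as zero
        if next_state in ('Goal', 'Obstacle', 'Rubik'):
            target = reward
        else:
            self._check_state(next_state)
            target = reward + self.gamma * self.v_table[next_state]
        self.v_table[state] += self.alpha * (target - self.v_table[state])
        return self.v_table[state]

    def print_v_table(self):
        # dump the learned state values, mirroring the call in run.py
        print(pd.Series(self.v_table).sort_index())
```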