├── .gitignore
├── README.md
├── assets
│   ├── car_perspective_polar.png
│   ├── laser_range.png
│   └── neuroracer.gif
├── docker
│   ├── Dockerfile
│   ├── clone_build.sh
│   ├── install_gzweb.sh
│   ├── install_ros.sh
│   ├── requirements.txt
│   ├── setup.sh
│   └── start.sh
├── neuroracer_gym
│   ├── CMakeLists.txt
│   ├── package.xml
│   ├── setup.py
│   └── src
│       └── neuroracer_gym
│           ├── __init__.py
│           ├── neuroracer_env.py
│           └── tasks
│               ├── __init__.py
│               ├── neuroracer_continuous_task.py
│               └── neuroracer_discrete_task.py
├── neuroracer_gym_rl
│   ├── CMakeLists.txt
│   ├── __init__.py
│   ├── config
│   │   └── params.yaml
│   ├── launch
│   │   ├── ddpg_learning.launch
│   │   └── start.launch
│   ├── package.xml
│   └── scripts
│       ├── __init__.py
│       ├── ddpg.py
│       ├── double_dqn.py
│       ├── double_drqn.py
│       ├── dqn.py
│       ├── drqn.py
│       ├── neuroracer_discrete.py
│       ├── training.py
│       └── utils.py
└── q_learning.ipynb

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | overlay/*
2 | .ipynb_checkpoints/
3 | 
4 | *.pyc
5 | *.h5
6 | *.hdf5
7 | 
8 | .vscode/*
9 | !.vscode/settings.json
10 | !.vscode/tasks.json
11 | !.vscode/launch.json
12 | !.vscode/extensions.json
13 | 
14 | # OS generated files #
15 | ######################
16 | .DS_Store
17 | .DS_Store?
18 | ._*
19 | .Spotlight-V100
20 | .Trashes
21 | ehthumbs.db
22 | Thumbs.db
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ![](assets/neuroracer.gif)
2 | 
3 | 
4 | The goal of this project is to provide an easy-to-use framework for simulating the training of a self-driving car using OpenAI Gym, ROS and Gazebo. The project is based on the openai_ros package, which implements an architecture proposed by The Construct team.
5 | 
6 | # Software requirements #
7 | * Ubuntu 18.04 or Windows WSL Ubuntu 18.04
8 | * Python 2.7 and pip2
9 | * Tensorflow CPU or GPU
10 | 
11 | # Installation #
12 | The whole setup can be found in [setup.sh](docker/setup.sh), which is used to set up Docker ([go to the Docker section](#Docker)). Docker is the simplest way to launch the simulation.
13 | 
14 | ### ROS Melodic ###
15 | Official instructions can be found [here](http://wiki.ros.org/melodic/Installation/Ubuntu)
16 | ```bash
17 | chmod +x ./docker/install_ros.sh
18 | ./docker/install_ros.sh
19 | ```
20 | [install_ros.sh](docker/install_ros.sh)
21 | 
22 | 
23 | ### Catkin workspace ###
24 | Create the `catkin_ws` directory, then clone and build all projects:
25 | ```bash
26 | chmod +x ./docker/clone_build.sh
27 | ./docker/clone_build.sh
28 | ```
29 | [clone_build.sh](docker/clone_build.sh)
30 | 
31 | 
32 | ### Python packages ###
33 | ```bash
34 | pip install -U -r ./docker/requirements.txt
35 | ```
36 | [requirements.txt](docker/requirements.txt)
37 | 
38 | 
39 | # Start training #
40 | Start the MIT racecar simulation in its own terminal:
41 | ```bash
42 | source ~/catkin_ws/devel/setup.bash
43 | roslaunch racecar_gazebo racecar_tunnel.launch
44 | ```
45 | 
46 | Start the training in a second terminal:
47 | ```bash
48 | source ~/catkin_ws/devel/setup.bash
49 | roslaunch neuroracer_gym_rl start.launch agent:=<agent_name>
50 | ```
51 | There are two implemented agents: `dqn` and `double_dqn`.
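For example, to train the plain DQN agent (a usage sketch, assuming the workspace was built under `~/catkin_ws` as above):

```bash
source ~/catkin_ws/devel/setup.bash
roslaunch neuroracer_gym_rl start.launch agent:=dqn
```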
52 | 
53 | 
54 | Note: if you get an `Exception sending a message` error, set the `IGN_IP` environment variable ([details](http://answers.gazebosim.org/question/21103/exception-sending-a-message/?answer=22276#post-id-22276)) before launching the ROS packages:
55 | ```bash
56 | export IGN_IP=127.0.0.1
57 | ```
58 | # WSL and headless setup #
59 | Sometimes a headless setup is needed, for example when there is only SSH access to the server or when the server runs on Windows.
60 | 
61 | ### Xvfb ###
62 | A headless setup has a couple of special requirements. To get the camera rendering a view, you need an X server running. This can be achieved in several ways; the universal solution is Xvfb.
63 | 
64 | >Xvfb or X virtual framebuffer is a display server implementing the X11 display server protocol. In contrast to other display servers, Xvfb performs all graphical operations in virtual memory without showing any screen output.
65 | >[wikipedia](https://en.wikipedia.org/wiki/Xvfb)
66 | ```bash
67 | sudo apt install xvfb
68 | ```
69 | 
70 | ### Gazebo Web ###
71 | >Gzweb is a WebGL client for Gazebo. Like gzclient, it's a front-end graphical interface to gzserver and provides visualization of the simulation. However, Gzweb is a thin client in comparison, and lets you interact with the simulation from the comfort of a web browser. This means cross-platform support, minimal client-side installation, and support for mobile devices.
72 | >[Gzweb](http://gazebosim.org/gzweb.html)
73 | 
74 | ```bash
75 | chmod +x ./docker/install_gzweb.sh
76 | ./docker/install_gzweb.sh path/to/home_dir
77 | ```
78 | [install_gzweb.sh](docker/install_gzweb.sh)
79 | 
80 | 
81 | # Docker #
82 | Just pull the image:
83 | ```bash
84 | docker pull karay/neuroracer
85 | ```
86 | 
87 | and run it with the following parameters:
88 | ```bash
89 | docker run -d --runtime=nvidia -p 8080:8080 -p 8888:8888 karay/neuroracer
90 | ```
91 | `http://localhost:8080` serves Gazebo Web and `http://localhost:8888` serves Jupyter Lab. There is also an example notebook:
92 | http://localhost:8888/lab/tree/catkin_ws/src/neuroracer/q_learning.ipynb
93 | 
94 | Note: this image is set up for CUDA 10 and TensorFlow GPU, so `docker-ce` and [nvidia-docker2](https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)) are required.
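As a quick sanity check that the NVIDIA runtime is wired up before starting the full stack (a sketch; `nvidia-smi` is injected into the container by nvidia-docker2 when the `utility` driver capability is enabled, as it is in the base CUDA image):

```bash
docker run --rm --runtime=nvidia karay/neuroracer nvidia-smi
```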
--------------------------------------------------------------------------------
/assets/car_perspective_polar.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/karray/neuroracer/47be6366bc9574b41146b9e47002847cd9ae4356/assets/car_perspective_polar.png
--------------------------------------------------------------------------------
/assets/laser_range.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/karray/neuroracer/47be6366bc9574b41146b9e47002847cd9ae4356/assets/laser_range.png
--------------------------------------------------------------------------------
/assets/neuroracer.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/karray/neuroracer/47be6366bc9574b41146b9e47002847cd9ae4356/assets/neuroracer.gif
--------------------------------------------------------------------------------
/docker/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
2 | 
3 | # nvidia-container-runtime
4 | ENV NVIDIA_VISIBLE_DEVICES \
5 |     ${NVIDIA_VISIBLE_DEVICES:-all}
6 | ENV NVIDIA_DRIVER_CAPABILITIES \
7 |     ${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics
8 | 
9 | WORKDIR /home
10 | COPY setup.sh start.sh install_ros.sh clone_build.sh install_gzweb.sh requirements.txt ./
11 | RUN ["chmod", "+x", "setup.sh", "start.sh", "install_ros.sh", "clone_build.sh", "install_gzweb.sh"]
12 | RUN ./setup.sh
13 | 
14 | # Expose Jupyter
15 | EXPOSE 8888
16 | # Expose Gazebo web
17 | EXPOSE 8080
18 | 
19 | CMD ./start.sh
--------------------------------------------------------------------------------
/docker/clone_build.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | CATKIN_HOME=$1/catkin_ws
4 | CATKIN_SRC=$CATKIN_HOME/src
5 | 
6 | mkdir -p $CATKIN_SRC
7 | 
8 | cd $CATKIN_SRC
9 | git clone https://github.com/mit-racecar/racecar.git
10 | git clone https://github.com/mit-racecar/racecar-simulator.git
11 | 
12 | git clone https://bitbucket.org/theconstructcore/openai_ros.git
13 | 
14 | git clone https://github.com/karray/neuroracer.git
15 | 
16 | cd $CATKIN_HOME
17 | source "/opt/ros/melodic/setup.bash" && catkin_make
--------------------------------------------------------------------------------
/docker/install_gzweb.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | HOME=$1
4 | 
5 | MODEL_PATH="/home/catkin_ws/src/racecar-simulator/racecar_description/models"
6 | 
7 | apt-get install -yq --no-install-recommends libjansson-dev mercurial libboost-dev imagemagick libtinyxml-dev cmake build-essential
8 | 
9 | cd $HOME
10 | hg clone https://bitbucket.org/osrf/gzweb
11 | cd gzweb
12 | hg up gzweb_1.4.0
13 | # hg up default
14 | 
15 | source "/usr/share/gazebo/setup.sh"
16 | 
17 | # The first time you build, you'll need to gather all the Gazebo models that you want to simulate in the right directory ('http/client/assets') and prepare them for the web.
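# (Annotation, not part of the original script: the commented-out
# GAZEBO_MODEL_PATH export below is one way to make extra models visible to
# deploy.sh; this script instead copies the racecar models straight into
# http/client/assets.)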
18 | source /usr/share/gazebo/setup.sh
19 | # export GAZEBO_MODEL_PATH=$MODEL_PATH:$GAZEBO_MODEL_PATH
20 | mkdir -p /home/gzweb/http/client/assets/
21 | cp -rf /home/catkin_ws/src/racecar-simulator/racecar_description/ /home/gzweb/http/client/assets/
22 | 
23 | ./deploy.sh -m local -t
24 | # xvfb-run -s "-screen 0 640x480x24" ./deploy.sh -m local -t
25 | 
--------------------------------------------------------------------------------
/docker/install_ros.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | apt-get install --no-install-recommends -yq lsb-release
4 | 
5 | sh -c 'echo "deb http://packages.ros.org/ros/ubuntu $(lsb_release -sc) main" > /etc/apt/sources.list.d/ros-latest.list'
6 | apt-key adv --keyserver 'hkp://keyserver.ubuntu.com:80' --recv-key C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654
7 | 
8 | apt-get update
9 | 
10 | apt-get install -yq --no-install-recommends ros-melodic-desktop-full \
11 |     ros-melodic-ackermann-msgs ros-melodic-effort-controllers ros-melodic-joy ros-melodic-tf2-sensor-msgs
12 | 
13 | rosdep init
14 | rosdep update
15 | echo "source /opt/ros/melodic/setup.bash" >> ~/.bashrc
--------------------------------------------------------------------------------
/docker/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | scipy
3 | matplotlib
4 | scikit-learn
5 | opencv-python
6 | keras
7 | jupyterlab
8 | gym==0.10.9
9 | tensorflow-gpu==1.13.1
--------------------------------------------------------------------------------
/docker/setup.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | HOME=/home
4 | 
5 | echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
6 | 
7 | apt-get update; apt-get install --no-install-recommends -yq \
8 |     git lsb-core curl wget xvfb \
9 |     python-pip python-dev \
10 |     dirmngr gnupg2 python-setuptools
11 | 
12 | curl -sL https://deb.nodesource.com/setup_10.x | bash -
13 | apt-get install -yq --no-install-recommends nodejs && npm install -g npm
14 | 
15 | ./install_ros.sh
16 | ./clone_build.sh $HOME
17 | 
18 | ./install_gzweb.sh $HOME
19 | 
20 | pip install --no-cache-dir -r requirements.txt
21 | 
22 | rm -rf /var/lib/apt/lists/*
--------------------------------------------------------------------------------
/docker/start.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | HOME=/home
4 | CATKIN_HOME=$HOME/catkin_ws
5 | GZWEB_HOME=$HOME/gzweb
6 | 
7 | source "$CATKIN_HOME/devel/setup.bash"
8 | 
9 | nohup xvfb-run -s "-screen 0 640x480x24" roslaunch racecar_gazebo racecar_tunnel.launch &
10 | 
11 | nohup jupyter lab --notebook-dir $HOME --port 8888 --ip 0.0.0.0 --allow-root --no-browser --LabApp.token='' &
12 | 
13 | cd $GZWEB_HOME
14 | sleep 10; echo 'Starting gzweb'; npm start
--------------------------------------------------------------------------------
/neuroracer_gym/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | cmake_minimum_required(VERSION 2.8.3)
2 | project(neuroracer_gym)
3 | 
4 | find_package(catkin REQUIRED COMPONENTS
5 |   openai_ros
6 |   rospy
7 | )
8 | 
9 | catkin_python_setup()
10 | 
11 | catkin_package(
12 | 
13 | )
14 | 
15 | include_directories(
16 |   ${catkin_INCLUDE_DIRS}
17 | )
--------------------------------------------------------------------------------
/neuroracer_gym/package.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0"?>
2 | <package format="2">
3 |   <name>neuroracer_gym</name>
4 |   <version>0.0.0</version>
5 |   <description>neuroracer_gym package</description>
6 | 
7 |   <maintainer>aray</maintainer>
8 | 
9 |   <license>TODO</license>
10 | 
11 |   <buildtool_depend>catkin</buildtool_depend>
12 | 
13 |   <exec_depend>racecar_gazebo</exec_depend>
14 | 
15 |   <build_depend>openai_ros</build_depend>
16 |   <build_depend>rospy</build_depend>
17 |   <exec_depend>openai_ros</exec_depend>
18 |   <exec_depend>rospy</exec_depend>
19 | 
20 |   <export>
21 | 
22 |   </export>
23 | </package>
--------------------------------------------------------------------------------
/neuroracer_gym/setup.py:
--------------------------------------------------------------------------------
1 | ## ! DO NOT MANUALLY INVOKE THIS setup.py, USE CATKIN INSTEAD
2 | 
3 | from distutils.core import setup
4 | from catkin_pkg.python_setup import generate_distutils_setup
5 | 
6 | # fetch values from package.xml
7 | setup_args = generate_distutils_setup(
8 |     packages=['neuroracer_gym'],
9 |     package_dir={'': 'src'},
10 | )
11 | 
12 | setup(**setup_args)
--------------------------------------------------------------------------------
/neuroracer_gym/src/neuroracer_gym/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/karray/neuroracer/47be6366bc9574b41146b9e47002847cd9ae4356/neuroracer_gym/src/neuroracer_gym/__init__.py
--------------------------------------------------------------------------------
/neuroracer_gym/src/neuroracer_gym/neuroracer_env.py:
--------------------------------------------------------------------------------
1 | import time
2 | 
3 | import numpy as np
4 | 
5 | import rospy
6 | from gazebo_msgs.msg import ModelState
7 | from gazebo_msgs.srv import SetModelState
8 | 
9 | from openai_ros import robot_gazebo_env
10 | from sensor_msgs.msg import LaserScan, CompressedImage
11 | from cv_bridge import CvBridge, CvBridgeError
12 | # from std_msgs.msg import Float64
13 | # from sensor_msgs.msg import Image
14 | # from tf.transformations import quaternion_from_euler
15 | 
16 | from ackermann_msgs.msg import AckermannDriveStamped
17 | 
18 | from gym import spaces
19 | from gym.envs.registration import register
20 | 
21 | # import cv2
22 | 
23 | default_sleep = 1
24 | 
25 | class NeuroRacerEnv(robot_gazebo_env.RobotGazeboEnv):
26 |     def __init__(self):
27 | 
28 |         self.initial_position = None
29 | 
30 |         self.min_distance = .255
31 | 
32 |         self.bridge = CvBridge()
33 | 
34 |         # Doesn't have any accessible controllers
35 |         self.controllers_list = []
36 | 
37 |         # It doesn't use a namespace
38 |         self.robot_name_space = ""
39 | 
40 |         # We launch the init function of the parent class robot_gazebo_env.RobotGazeboEnv
41 |         super(NeuroRacerEnv, self).__init__(controllers_list=self.controllers_list,
42 |                                             robot_name_space=self.robot_name_space,
43 |                                             reset_controls=False,
44 |                                             start_init_physics_parameters=False)
45 | 
46 |         rospy.wait_for_service('/gazebo/set_model_state')
47 |         try:
48 |             self.set_model_state = rospy.ServiceProxy('/gazebo/set_model_state', SetModelState)
49 |         except rospy.ServiceException as e:
50 |             print("Service call failed: %s" % e)
51 | 
52 | 
53 |         self.gazebo.unpauseSim()
54 |         time.sleep(default_sleep)
55 | 
56 |         #self.controllers_object.reset_controllers()
57 |         self._check_all_sensors_ready()
58 | 
59 |         self._init_camera()
60 | 
61 |         self.laser_subscription = rospy.Subscriber("/scan", LaserScan, self._laser_scan_callback)
62 | 
63 |         self.drive_control_publisher = rospy.Publisher("/vesc/ackermann_cmd_mux/input/navigation",
64 |                                                        AckermannDriveStamped,
65 |                                                        queue_size=20)
66 | 
67 |         self._check_publishers_connection()
68 | 
69 |         self.gazebo.pauseSim()
70 | 
71 |         rospy.logdebug("Finished NeuroRacerEnv INIT...")
72 | 
73 |     def reset_position(self):
74 |         if not self.initial_position:
75 |             return
76 |         state_msg = ModelState()
77 |         state_msg.model_name = 'racecar'
78 |         state_msg.pose.position.x = self.initial_position['p_x']
79 |         state_msg.pose.position.y = self.initial_position['p_y']
80 |         state_msg.pose.position.z = self.initial_position['p_z']
81 |         state_msg.pose.orientation.x = self.initial_position['o_x']
82 |         state_msg.pose.orientation.y = self.initial_position['o_y']
83 |         state_msg.pose.orientation.z = self.initial_position['o_z']
84 |         state_msg.pose.orientation.w = self.initial_position['o_w']
85 | 
86 |         self.set_model_state(state_msg)
87 | 
88 |     def reset(self):
89 |         super(NeuroRacerEnv, self).reset()
90 |         self.gazebo.unpauseSim()
91 |         self.reset_position()
92 | 
93 |         time.sleep(default_sleep)
94 |         self.gazebo.pauseSim()
95 | 
96 |         return self._get_obs()
97 | 
98 |     def _check_all_systems_ready(self):
99 |         """
100 |         Checks that all the sensors, publishers and other simulation systems are
101 |         operational.
102 |         """
103 |         self._check_all_sensors_ready()
104 |         return True
105 | 
106 | 
107 |     # virtual methods
108 |     # ----------------------------
109 | 
110 |     def _check_all_sensors_ready(self):
111 |         rospy.logdebug("START ALL SENSORS READY")
112 |         self._check_laser_scan_ready()
113 |         self._check_camera_ready()
114 |         rospy.logdebug("ALL SENSORS READY")
115 | 
116 |     def _check_camera_ready(self):
117 |         self.camera_msg = None
118 |         rospy.logdebug("Waiting for /camera/zed/rgb/image_rect_color/compressed to be READY...")
119 |         while self.camera_msg is None and not rospy.is_shutdown():
120 |             try:
121 |                 self.camera_msg = rospy.wait_for_message('/camera/zed/rgb/image_rect_color/compressed',
122 |                                                          CompressedImage,
123 |                                                          timeout=1.0)
124 |             except:
125 |                 rospy.logerr("Camera not ready yet, retrying to get camera_msg")
126 | 
127 |     def _init_camera(self):
128 |         img = self.get_camera_image()
129 | 
130 |         # self.color_scale = "bgr8" # config["color_scale"]
131 |         self.input_shape = img.shape
132 |         obs_low = 0
133 |         obs_high = 1
134 |         self.observation_space = spaces.Box(low=obs_low, high=obs_high, shape=self.input_shape)
135 | 
136 |         img_dims = img.shape[0]*img.shape[1]*img.shape[2]
137 |         byte_size = 4
138 |         overhead = 2 # reserving memory for the ROS message header
139 |         buff_size = img_dims*byte_size*overhead
140 |         self.camera_subscription = rospy.Subscriber("/camera/zed/rgb/image_rect_color/compressed",
141 |                                                     CompressedImage, self._camera_callback, queue_size=1,
142 |                                                     buff_size=buff_size)
143 |         rospy.logdebug("== Camera READY ==")
144 | 
145 |     def _check_laser_scan_ready(self):
146 |         self.laser_scan = None
147 |         rospy.logdebug("Waiting for /scan to be READY...")
148 |         while self.laser_scan is None and not rospy.is_shutdown():
149 |             try:
150 |                 self.laser_scan = rospy.wait_for_message("/scan", LaserScan, timeout=1.0)
151 |                 rospy.logdebug("Current /scan READY=>")
152 | 
153 |             except:
154 |                 rospy.logerr("Current /scan not ready yet, retrying to get laser_scan")
155 |         return self.laser_scan
156 | 
157 |     # def _get_additional_laser_scan(self):
158 |     #     laser_scans = []
159 |     #     self.gazebo.unpauseSim()
160 |     #     while len(laser_scans) < 2 and not rospy.is_shutdown():
161 |     #         try:
162 |     #             data = rospy.wait_for_message("/scan", LaserScan, timeout=1.0)
163 |     #             laser_scans.append(data.ranges)
164 |     #         except Exception as e:
165 |     #             rospy.logerr("getting laser data...")
166 |     #             print(e)
167 |     #     self.gazebo.pauseSim()
168 | 
169 |     #     return laser_scans
170 | 
171 |     def _laser_scan_callback(self, data):
172 |         self.laser_scan = data
173 | 
174 |     def _camera_callback(self, msg):
175 |         self.camera_msg = msg
176 | 
177 |     def _check_publishers_connection(self):
178 |         """
179 |         Checks that all the publishers are working
180 |         :return:
181 |         """
182 |         rate = rospy.Rate(10)  # 10hz
183 |         while self.drive_control_publisher.get_num_connections() == 0 and not rospy.is_shutdown():
184 |             rospy.logdebug("No subscribers to drive_control_publisher yet, so we wait and try again")
185 |             try:
186 |                 rate.sleep()
187 |             except rospy.ROSInterruptException:
188 |                 # This avoids an error when the world is reset and time runs backwards.
189 |                 pass
190 |         rospy.logdebug("drive_control_publisher Publisher Connected")
191 | 
192 |         rospy.logdebug("All Publishers READY")
193 | 
194 |     def _set_init_pose(self):
195 |         """Sets the Robot in its init pose
196 |         """
197 |         raise NotImplementedError()
198 | 
199 |     def _init_env_variables(self):
200 |         """Inits variables needed to be initialised each time we reset at the start
201 |         of an episode.
202 |         """
203 |         raise NotImplementedError()
204 | 
205 |     def _compute_reward(self, observations, done):
206 |         """Calculates the reward to give based on the observations given.
207 |         """
208 |         raise NotImplementedError()
209 | 
210 |     def _set_action(self, action):
211 |         """Applies the given action to the simulation.
212 |         """
213 |         raise NotImplementedError()
214 | 
215 |     def _get_obs(self):
216 |         return self.get_camera_image()
217 | 
218 |     def _is_done(self, observations):
219 |         self._episode_done = self._is_collided()
220 |         return self._episode_done
221 | 
222 |     def _create_steering_command(self, steering_angle, speed):
223 |         # steering_angle = np.clip(steering_angle, self.steering_angle_min, self.steering_angle_max)
224 | 
225 |         a_d_s = AckermannDriveStamped()
226 |         a_d_s.drive.steering_angle = steering_angle
227 |         a_d_s.drive.steering_angle_velocity = 0.0
228 |         a_d_s.drive.speed = speed  # from 0 to 1
229 |         a_d_s.drive.acceleration = 0.0
230 |         a_d_s.drive.jerk = 0.0
231 | 
232 |         return a_d_s
233 | 
234 |     def steering(self, steering_angle, speed):
235 |         command = self._create_steering_command(steering_angle, speed)
236 |         self.drive_control_publisher.publish(command)
237 | 
238 |     # def get_odom(self):
239 |     #     return self.odom
240 | 
241 |     # def get_imu(self):
242 |     #     return self.imu
243 | 
244 |     def get_laser_scan(self):
245 |         return np.array(self.laser_scan.ranges, dtype=np.float32)
246 | 
247 |     def get_camera_image(self):
248 |         try:
249 |             return self.bridge.compressed_imgmsg_to_cv2(self.camera_msg).astype('float32')
250 |         except Exception as e:
251 |             rospy.logerr("CvBridgeError: Error converting image")
252 |             rospy.logerr(e)
253 |             return None
254 | 
255 |     def _is_collided(self):
256 |         r = self.get_laser_scan()
257 |         crashed = np.any(r <= self.min_distance)
258 |         if crashed:
259 |             # rospy.logdebug('the car crashed! :(')
260 |             # rospy.logdebug('distance: {}'.format(r.min()))
261 |             # data = np.array(self._get_additional_laser_scan(), dtype=np.float32)
262 |             # data = np.concatenate((np.expand_dims(r, axis=0), data), axis=0)
263 |             # data_mean = np.mean(data, axis=0)
264 |             min_range_idx = r.argmin()
265 |             min_idx = min_range_idx - 5
266 |             if min_idx < 0:
267 |                 min_idx = 0
268 |             max_idx = min_idx + 10
269 |             if max_idx >= r.shape[0]:
270 |                 max_idx = r.shape[0] - 1
271 |                 min_idx = max_idx - 10
272 |             mean_distance = r[min_idx:max_idx].mean()
273 | 
274 |             crashed = np.any(mean_distance <= self.min_distance)
275 | 
276 |         return crashed
277 | 
--------------------------------------------------------------------------------
/neuroracer_gym/src/neuroracer_gym/tasks/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/karray/neuroracer/47be6366bc9574b41146b9e47002847cd9ae4356/neuroracer_gym/src/neuroracer_gym/tasks/__init__.py
--------------------------------------------------------------------------------
/neuroracer_gym/src/neuroracer_gym/tasks/neuroracer_continuous_task.py:
--------------------------------------------------------------------------------
1 | import time
2 | 
3 | import numpy as np
4 | 
5 | import rospy
6 | 
7 | from gym.envs.registration import register
8 | from gym import spaces
9 | 
10 | from neuroracer_gym import neuroracer_env
11 | 
12 | register(
13 |     id='NeuroRacer-v1',
14 |     entry_point='neuroracer_gym:tasks.neuroracer_continuous_task.NeuroracerContinuousTask',
15 |     # timestep_limit=timestep_limit_per_episode,
16 | )
17 | 
18 | class NeuroracerContinuousTask(neuroracer_env.NeuroRacerEnv):
19 |     def __init__(self):
20 |         self.cumulated_steps = 0.0
21 |         self.last_action = np.zeros(2)
22 | 
23 |         self.steering_angle_min = -1  # rospy.get_param('neuroracer_env/action_space/steering_angle_min')
24 |         self.steering_angle_max = 1   # rospy.get_param('neuroracer_env/action_space/steering_angle_max')
25 |         self.action_space = spaces.Box(low=np.array([self.steering_angle_min], dtype=np.float32),
26 |                                        high=np.array([self.steering_angle_max], dtype=np.float32))
27 | 
28 |         super(NeuroracerContinuousTask, self).__init__()
29 | 
30 |     def _set_init_pose(self):
31 |         self.steering(0, speed=0)
32 |         return True
33 | 
34 |     def _init_env_variables(self):
35 |         self.cumulated_reward = 0.0
36 |         self._episode_done = False
37 |         self.last_action = np.zeros(2)
38 | 
39 |     def _compute_reward(self, observations, done):
40 |         print(self.last_action)
41 |         reward = 1 - np.abs(self.last_action).sum()
42 |         print(reward)
43 | 
44 |         self.cumulated_reward += reward
45 |         self.cumulated_steps += 1
46 | 
47 |         return reward
48 | 
49 |     def _set_action(self, action):
50 |         steering_angle = np.clip(action, self.steering_angle_min, self.steering_angle_max)
51 | 
52 |         self.last_action = np.array([action, steering_angle], dtype=np.float32)
53 |         self.steering(steering_angle, speed=10)
--------------------------------------------------------------------------------
/neuroracer_gym/src/neuroracer_gym/tasks/neuroracer_discrete_task.py:
--------------------------------------------------------------------------------
1 | import time
2 | import sys
3 | 
4 | import rospy
5 | 
6 | from neuroracer_gym import neuroracer_env
7 | 
8 | from gym import spaces
9 | from gym.envs.registration import register
10 | 
11 | import numpy as np
12 | 
13 | np.set_printoptions(threshold=sys.maxsize)
14 | 
15 | # timestep_limit_per_episode = 10000 # Can be any Value
16 | 
17 | default_sleep = 2
18 | 
19 | print(register(
20 |     id='NeuroRacer-v0',
21 |     entry_point='neuroracer_gym:tasks.neuroracer_discrete_task.NeuroRacerDiscreteTask',
22 |     # timestep_limit=timestep_limit_per_episode,
23 | ))
24 | 
25 | class NeuroRacerDiscreteTask(neuroracer_env.NeuroRacerEnv):
26 |     def __init__(self):
27 |         self.cumulated_steps = 0.0
28 |         self.last_action = 1
29 |         self.right_left = 0
30 |         self.action_space = spaces.Discrete(3)
31 |         self.rate = None
32 |         self.speed = 1
33 |         self.set_sleep_rate(100)
34 |         self.number_of_sleeps = 10
35 | 
36 | 
37 |         super(NeuroRacerDiscreteTask, self).__init__()
38 | 
39 |     def set_sleep_rate(self, hz):
40 |         self.rate = None
41 |         if hz > 0:
42 |             self.rate = rospy.Rate(hz)
43 | 
44 |     def _set_init_pose(self):
45 |         self.steering(0, speed=0)
46 |         return True
47 | 
48 |     def _init_env_variables(self):
49 |         self.cumulated_reward = 0.0
50 |         self.last_action = 1
51 |         # self.right_left = False
52 |         self._episode_done = False
53 | 
54 |     def _get_distances(self):
55 |         ranges = self.get_laser_scan()
56 |         right_distance = np.clip(ranges[175:185], None, 10).mean()
57 |         left_distance = np.clip(ranges[895:905], None, 10).mean()
58 |         middle_distance = np.clip(ranges[525:555], None, 10).mean()
59 |         return right_distance, left_distance, middle_distance
60 | 
61 |     def _compute_reward(self, observations, done):
62 |         if not done:
63 |             right_distance, left_distance, middle_distance = self._get_distances()
64 |             # print(left_distance, middle_distance, right_distance)
65 |             reward = (middle_distance - 3) - np.abs(left_distance - right_distance)
66 |             # if self.last_action != 1:
67 |             #     reward -= 0.001*(1.7**self.right_left - 1)
68 |         else:
69 |             reward = -100
70 | 
71 |         self.cumulated_reward += reward
72 |         self.cumulated_steps += 1
73 | 
74 |         return reward
75 | 
76 | 
77 |     def _set_action(self, action):
78 |         steering_angle = 0
79 | 
80 |         if action == 0:  # right
81 |             steering_angle = -1
82 |         if action == 2:  # left
83 |             steering_angle = 1
84 | 
85 |         if action == 1:
86 |             self.right_left = 0
87 |         else:
88 |             self.right_left += 1
89 | 
90 |         # self.right_left = action != 1 & self.last_action != 1 & self.last_action != action
91 | 
92 |         self.last_action = action
93 |         self.steering(steering_angle, self.speed)
94 |         if self.rate:
95 |             for i in range(int(self.number_of_sleeps)):
96 |                 self.rate.sleep()
97 |                 self.steering(steering_angle, self.speed)
--------------------------------------------------------------------------------
/neuroracer_gym_rl/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | cmake_minimum_required(VERSION 2.8.3)
2 | project(neuroracer_gym_rl)
3 | 
4 | find_package(catkin REQUIRED COMPONENTS
5 |   neuroracer_gym
6 |   rospy
7 | )
8 | 
9 | 
10 | catkin_package(
11 | 
12 | )
13 | 
14 | include_directories(
15 |   ${catkin_INCLUDE_DIRS}
16 | )
--------------------------------------------------------------------------------
/neuroracer_gym_rl/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/karray/neuroracer/47be6366bc9574b41146b9e47002847cd9ae4356/neuroracer_gym_rl/__init__.py
--------------------------------------------------------------------------------
/neuroracer_gym_rl/config/params.yaml:
--------------------------------------------------------------------------------
1 | neuroracer_gym: # namespace
2 | 
3 |     running_step: 0.04 # amount of time the control will be executed
4 |     pos_step: 0.016 # increment in position for each command
5 | 
6 |     # qlearn parameters
7 |     alpha: 1.0
8 |     gamma: 0.7
9 |     epsilon: 0.9
10 |     epsilon_discount: 0.999
11 |     nepisodes: 500
12 |     nsteps: 500
13 |     number_splits: 10 # set to change the number of state splits for the continuous problem and also the number of env_variable splits
14 | 
15 |     running_step: 0.06 # Time for each step
16 |     wait_time: 0.1 # Time to wait in the reset phases
17 | 
18 |     n_actions: 3 # We have 3 actions: Forwards, TurnLeft, TurnRight
19 |     n_observations: 6 # We have 6 different observations
20 | 
21 |     speed_step: 1.0 # Increment in speed for each command
22 | 
23 |     linear_forward_speed: 0.5 # Speed for going forwards
24 |     linear_turn_speed: 0.1 # Linear speed when turning
25 |     angular_speed: 0.3 # Angular speed when turning Left or Right
26 |     init_linear_forward_speed: 0.0 # Initial linear speed at which we start each episode
27 |     init_linear_turn_speed: 0.0 # Initial angular speed at which we start each episode
28 | 
29 |     new_ranges: 5 # How many laser readings we jump in each observation reading; the bigger, the lower the laser resolution
30 |     min_range: 0.5 # Minimum meters below which we consider we have crashed
31 |     max_laser_value: 6 # Value considered OK, no wall
32 |     min_laser_value: 0 # Value considered to be an obstacle or a crash
33 | 
34 |     number_of_sectors: 3 # How many sectors we have
35 |     min_range: 0.5 # Minimum meters below which we consider we have crashed
36 |     middle_range: 1.0 # Meters below which we consider the middle range
37 |     danger_laser_value: 2 # Value encoding a dangerous reading
38 |     middle_laser_value: 1 # Middle value
39 |     safe_laser_value: 0 # Value encoding a safe reading
40 | 
41 |     forwards_reward: 5 # Points given to go forwards
42 |     turn_reward: 1 # Points given to turn as action
43 |     end_episode_points: 200 # Points given when ending an episode
44 | 
--------------------------------------------------------------------------------
/neuroracer_gym_rl/launch/ddpg_learning.launch:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | 
4 | 
5 | 
6 | 
7 | 
8 | 
--------------------------------------------------------------------------------
/neuroracer_gym_rl/launch/start.launch:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | 
4 | 
5 | 
6 | 
7 | 
8 | 
9 | 
10 | 
11 | 
12 | 
13 | 
14 | 
15 | 
--------------------------------------------------------------------------------
/neuroracer_gym_rl/package.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0"?>
2 | <package format="2">
3 |   <name>neuroracer_gym_rl</name>
4 |   <version>0.0.0</version>
5 |   <description>The neuroracer_gym_rl package</description>
6 | 
7 |   <maintainer>aray</maintainer>
8 | 
9 |   <license>TODO</license>
10 | 
11 |   <buildtool_depend>catkin</buildtool_depend>
12 | 
13 |   <build_depend>neuroracer_gym</build_depend>
14 |   <build_depend>rospy</build_depend>
15 |   <exec_depend>neuroracer_gym</exec_depend>
16 |   <exec_depend>rospy</exec_depend>
17 | 
18 |   <export>
19 | 
20 |   </export>
21 | </package>
--------------------------------------------------------------------------------
/neuroracer_gym_rl/scripts/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/karray/neuroracer/47be6366bc9574b41146b9e47002847cd9ae4356/neuroracer_gym_rl/scripts/__init__.py
--------------------------------------------------------------------------------
/neuroracer_gym_rl/scripts/ddpg.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | 
3 | import time
4 | import os
5 | import csv
6 | 
7 | import numpy as np
8 | 
9 | import rospy
10 | import rospkg
11 | 
12 | import gym
13 | from neuroracer_gym.tasks import neuroracer_continuous_task
14 | 
15 | # from gym.spaces import Box
16 | 
17 | # from keras import backend as k
18 | # from keras.layers.core import Reshape
19 | # from
keras.optimizers import Adam 20 | from keras.initializers import RandomUniform, VarianceScaling 21 | from keras.layers import Conv2D, Dense, Dropout, Flatten, Input, concatenate, add 22 | from keras.models import Model 23 | from keras.regularizers import l2 24 | from keras.callbacks import Callback as KerasCallback 25 | from keras.optimizers import Adam 26 | 27 | from rl.agents import DDPGAgent 28 | from rl.memory import SequentialMemory 29 | from rl.random import OrnsteinUhlenbeckProcess 30 | 31 | 32 | 33 | class Agent: 34 | def __init__(self, env): 35 | # rospack = rospkg.RosPack() 36 | # self.working_dir = rospack.get_path('neuroracer_gym_rl') 37 | # self.weight_backup = os.path.join(self.working_dir, "neuroracer.h5") 38 | self.env = env 39 | 40 | self.observation_space = self.env.observation_space 41 | self.action_space = self.env.action_space 42 | self.nb_actions = self.env.action_space.shape[0] 43 | self.batch_size = 16 44 | self.max_buffer = 100000 45 | self.window_length = 16 46 | self.memory = SequentialMemory(limit=self.max_buffer, window_length=self.window_length) 47 | self.learning_rate_actor = 0.0001 48 | self.learning_rate_critic = 0.001 49 | self.gamma = 0.9 50 | self.exploration_rate = 0.95 51 | self.exploration_min = 0.01 52 | self.exploration_decay = 0.995 53 | 54 | random_process = OrnsteinUhlenbeckProcess(size=self.nb_actions, theta=.15, mu=0., sigma=.2) 55 | 56 | actor = self._create_actor() 57 | critic, critic_action_input = self._create_critic(self.nb_actions) 58 | 59 | self.model = DDPGAgent(nb_actions=self.nb_actions, 60 | actor=actor, 61 | critic=critic, 62 | critic_action_input=critic_action_input, 63 | memory=self.memory, 64 | nb_steps_warmup_critic=500, 65 | nb_steps_warmup_actor=500, 66 | random_process=random_process, 67 | gamma=self.gamma, 68 | target_model_update=.001, 69 | # processor=self.processor, 70 | batch_size=self.batch_size) 71 | self.model.compile( 72 | (Adam(lr=self.learning_rate_actor, clipnorm=1.), Adam(lr=self.learning_rate_critic, clipnorm=1.)), 73 | metrics=['mse']) 74 | 75 | 76 | def _create_actor(self): 77 | # input_shape = (self.window_length,) + self.observation_space.shape 78 | S = Input(shape=self.observation_space.shape) 79 | # S_reshape = Reshape(input_shape)(S) 80 | c1 = Conv2D(32, kernel_size=(4, 4), activation='relu', padding="valid", 81 | kernel_initializer=VarianceScaling(mode='fan_in', distribution='uniform'))(S) 82 | c2 = Conv2D(32, kernel_size=(4, 4), activation='relu', 83 | kernel_initializer=VarianceScaling(mode='fan_in', distribution='uniform'))(c1) 84 | c3 = Conv2D(32, kernel_size=(4, 4), activation='relu', 85 | kernel_initializer=VarianceScaling(mode='fan_in', distribution='uniform'))(c2) 86 | c3_flatten = Flatten(name='flattened_observation')(c3) 87 | d1 = Dense(200, activation='relu', kernel_initializer='glorot_uniform')(c3_flatten) 88 | d2 = Dense(200, activation='relu', kernel_initializer='glorot_uniform')(d1) 89 | 90 | Steer = Dense(self.nb_actions, 91 | activation='tanh', 92 | name='prediction', 93 | bias_initializer='zeros', 94 | kernel_initializer=RandomUniform(minval=-3e-4, maxval=3e-4) 95 | )(d2) 96 | # Speed = Dense(1, 97 | # activation='sigmoid', 98 | # bias_initializer='zeros', 99 | # kernel_initializer=RandomUniform(minval=0.0, maxval=3e-4) 100 | # )(d2) 101 | # V = concatenate([Steer, Speed], name='merge_concatenate') 102 | 103 | model = Model(inputs=S, outputs=Steer) # TODO use 'V' once multi output is supported by keras-rl 104 | 105 | print(model.summary()) 106 | 107 | return model 108 | 109 | def 
_create_critic(self, nb_actions=None): 110 | # input_shape = (self.window_length,) + self.observation_space.shape 111 | S = Input(shape=self.observation_space.shape) 112 | # S_reshape = Reshape(input_shape)(S) 113 | c1 = Conv2D(32, kernel_size=(4, 4), activation='relu', padding="valid", 114 | kernel_initializer=VarianceScaling(mode='fan_in', distribution='uniform'), 115 | kernel_regularizer=l2(0.01))(S) 116 | c2 = Conv2D(32, kernel_size=(4, 4), activation='relu', 117 | kernel_initializer=VarianceScaling(mode='fan_in', distribution='uniform'), 118 | kernel_regularizer=l2(0.01))(c1) 119 | c3 = Conv2D(32, kernel_size=(4, 4), activation='relu', 120 | kernel_initializer=VarianceScaling(mode='fan_in', distribution='uniform'), 121 | kernel_regularizer=l2(0.01))(c2) 122 | observation_flattened = Flatten()(c3) 123 | O = Dense(200, activation='relu', kernel_initializer='glorot_uniform', 124 | kernel_regularizer=l2(0.01))(observation_flattened) 125 | 126 | A = Input(shape=self.action_space.shape, name='action_input') 127 | 128 | # TODO activate once Speed is activated 129 | # a1 = Dense(200, activation='relu', kernel_initializer='glorot_uniform', 130 | # kernel_regularizer=l2(0.01))(A) 131 | # h1 = add([O,a1], name='merge_sum') 132 | 133 | # TODO use upper h1 instead of this one with Speed activated 134 | h1 = concatenate([O, A], name='merge_concatenate') 135 | h2 = Dense(units=200, activation='relu', kernel_initializer='glorot_uniform', 136 | kernel_regularizer=l2(0.01))(h1) 137 | V = Dense(nb_actions, 138 | activation='linear', 139 | bias_initializer='zeros', 140 | kernel_initializer=RandomUniform(minval=-3e-4, maxval=3e-4), 141 | kernel_regularizer=l2(0.01) 142 | )(h2) 143 | model = Model(inputs=[S, A], outputs=V) 144 | 145 | print (model.summary()) 146 | 147 | return model, A 148 | 149 | def train(self, env=None, nb_steps=50000, nb_max_episode_steps=5000, nb_episodes_test=20, action_repetition=1, 150 | verbose=0): 151 | 152 | # TODO callback for save depending on loss value? 
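        # One possible wiring for the TODO above (a sketch, not part of the
        # original script): keras-rl's fit() accepts a callbacks= list, so the
        # StatsCallback defined below could be passed here to checkpoint weights
        # every `interval` steps. Note that keras-rl itself calls _set_env(env)
        # on each callback, so the extra path/interval arguments of the helper
        # would need defaults for that to work.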
153 | self.model.fit(self.env, 154 | nb_max_episode_steps=nb_max_episode_steps, 155 | nb_steps=nb_steps, 156 | action_repetition=action_repetition, 157 | visualize=False) 158 | 159 | self.model.test(self.env, 160 | nb_episodes=nb_episodes_test, 161 | visualize=False, 162 | nb_max_episode_steps=nb_max_episode_steps) 163 | 164 | 165 | 166 | class StatsCallback(KerasCallback): 167 | def _set_env(self, env, path, interval=5000): 168 | self.model_file_path = os.path.join(path, 'model_checkpoint.h5') 169 | self.stats_file_path = os.path.join(path, 'stats_{}.csv'.format(time.time())) 170 | self.env = env 171 | self.episode_time = time.time() 172 | self.total_time = time.time() 173 | self.interval = interval 174 | self.total_steps = 0 175 | 176 | # def on_episode_begin(self, episode, logs={}): 177 | # pass 178 | 179 | def on_episode_end(self, episode, logs={}): 180 | rospy.loginfo("Episode {}; steps{}; reward {}".format(episode, self.env.cumulated_steps, self.env.cumulated_reward)) 181 | rospy.loginfo("Time {}, total {}".format(self.format_time(self.episode_time), 182 | self.format_time(self.total_time))) 183 | # with open(self.stats_file_path, newline='') as csvfile: 184 | # writer = csv.writer(csvfile, delimiter=';') 185 | # writer.writerow([episode, self.env.cumulated_steps, self.env.cumulated_reward, self.episode_time, self.total_time]) 186 | 187 | # def on_step_begin(self, step, logs={}): 188 | # pass 189 | 190 | def on_step_end(self, step, logs={}): 191 | self.total_steps += 1 192 | if self.total_steps % self.interval != 0: 193 | return 194 | # filepath = self.model_file_path.format(step=self.total_steps) 195 | # rospy.loginfo('Saving model to {}'.format(self.total_steps, filepath)) 196 | self.model.save_weights(self.model_file_path, overwrite=True) 197 | # def on_action_begin(self, action, logs={}): 198 | # pass 199 | 200 | # def on_action_end(self, action, logs={}): 201 | # pass 202 | 203 | def format_time(self, t): 204 | m, s = divmod(int(time.time() - t), 60) 205 | h, m = divmod(m, 60) 206 | return "%d:%02d:%02d" % (h, m, s) 207 | 208 | if __name__ == '__main__': 209 | rospy.init_node('neuroracer_ddpg', anonymous=True, log_level=rospy.INFO) 210 | 211 | env = gym.make('NeuroRacer-v1') 212 | agent = Agent(env) 213 | agent.train() -------------------------------------------------------------------------------- /neuroracer_gym_rl/scripts/double_dqn.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import time 4 | 5 | import numpy as np 6 | 7 | import keras 8 | from keras.models import Sequential,Input,Model 9 | from keras.layers import Dense, Dropout, Flatten 10 | from keras.layers import Conv2D, MaxPooling2D 11 | # from keras.layers.normalization import BatchNormalization 12 | from keras.layers.advanced_activations import LeakyReLU 13 | from keras.optimizers import Adam 14 | 15 | # ROS packages required 16 | import rospy 17 | import rospkg 18 | 19 | from utils import Memory, H5Buffer 20 | 21 | class Agent(): 22 | def __init__(self, state_size, action_size, buffer_max_size, chunk_size, add_flipped, always_explore=False): 23 | file_name = 'double_dqn'+'_'+str(state_size[2])+'f' 24 | if add_flipped: 25 | file_name+='_flip' 26 | rospack = rospkg.RosPack() 27 | 28 | self.chunk_size = chunk_size 29 | self.add_flipped = add_flipped 30 | self.always_explore = always_explore 31 | self.working_dir = rospack.get_path('neuroracer_gym_rl') 32 | self.weight_backup = os.path.join(self.working_dir, file_name+'.h5') 33 | 34 | 
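        # (Annotation, not part of the original script) state_size arrives as
        # (height, width, n_frames) after the preprocessing in
        # neuroracer_discrete.py, so the state_size[2] used in the file name
        # above is the frame-stack depth.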
self.state_size = state_size 35 | self.action_size = action_size 36 | self.buffer = H5Buffer(state_size, buffer_max_size) 37 | self.learning_rate = 0.001 38 | self.gamma = 0.9 39 | self.exploration_rate = 0.85 40 | self.exploration_min = 0.01 41 | self.exploration_decay = 0.99 42 | self.model = self._build_model() 43 | self.target_model = self._build_model() 44 | self.training_count = 0 45 | 46 | 47 | def _build_model(self): 48 | 49 | model = Sequential() 50 | model.add(Conv2D(16, kernel_size=(3, 3), strides=(1, 1), input_shape=self.state_size,padding='same')) 51 | model.add(LeakyReLU(alpha=0.1)) 52 | model.add(MaxPooling2D((2, 2),padding='same')) 53 | model.add(Dropout(0.25)) 54 | 55 | model.add(Conv2D(32,kernel_size=(3, 3), strides=(1, 1),padding='same')) 56 | model.add(LeakyReLU(alpha=0.1)) 57 | model.add(Dropout(0.25)) 58 | 59 | model.add(Conv2D(64, kernel_size=(3, 3), strides=(2, 2), padding='same')) 60 | model.add(LeakyReLU(alpha=0.1)) 61 | model.add(Dropout(0.25)) 62 | 63 | model.add(Flatten()) 64 | 65 | model.add(Dense(256)) 66 | model.add(LeakyReLU(alpha=0.1)) 67 | model.add(Dropout(0.25)) 68 | 69 | model.add(Dense(128)) 70 | model.add(LeakyReLU(alpha=0.1)) 71 | model.add(Dropout(0.1)) 72 | 73 | model.add(Dense(self.action_size, activation='linear')) 74 | 75 | model.compile(loss='mse', optimizer=Adam(lr=self.learning_rate), metrics=['accuracy']) 76 | model.summary() 77 | 78 | if os.path.isfile(self.weight_backup): 79 | model.load_weights(self.weight_backup) 80 | if not self.always_explore: 81 | self.exploration_rate = self.exploration_min 82 | 83 | return model 84 | 85 | # def to_grayscale(self, img): 86 | # return np.mean(img, axis=2).astype(np.uint8) 87 | 88 | # def downsample(self, img): 89 | # return img[::2, ::2] 90 | 91 | 92 | 93 | def save_model(self): 94 | rospy.loginfo("Model saved") 95 | self.model.save(self.weight_backup) 96 | 97 | def act(self, state): 98 | if np.random.rand() <= self.exploration_rate: 99 | return random.randrange(self.action_size) 100 | act_values = self.model.predict(state) 101 | return np.argmax(act_values[0]) 102 | 103 | def flip(self, actions, states, next_states, rewards, not_done): 104 | actions_flipped = 2-actions 105 | states_flipped = np.flip(states, axis=2) 106 | next_states_flipped = np.flip(next_states, axis=2) 107 | rewards_flipped = np.copy(rewards) 108 | 109 | next_pred_flipped = self.target_model.predict(next_states_flipped[not_done]).max(axis=1) 110 | rewards_flipped[not_done]+= self.gamma * next_pred_flipped 111 | targets_flipped = self.model.predict(states_flipped) 112 | targets_flipped[np.arange(len(actions_flipped)), actions_flipped] = rewards_flipped 113 | 114 | return states_flipped, targets_flipped 115 | 116 | def replay(self, new_data): 117 | rospy.loginfo("Replaying..."), 118 | 119 | self.buffer.extend(new_data) 120 | buffer_length = self.buffer.length() 121 | 122 | chunks = buffer_length / self.chunk_size 123 | 124 | chunk_n = 2 125 | if chunks < 2: 126 | chunk_n = 1 127 | chunks=1 128 | print('buffer length', buffer_length) 129 | print('chunks', chunks) 130 | 131 | for i in np.random.choice(range(chunks), chunk_n, False): 132 | print('fitting', i) 133 | start_idx = i * self.chunk_size 134 | end_idx = start_idx + self.chunk_size 135 | 136 | loading_time = time.time() 137 | actions, states, next_states, rewards, terminates = self.buffer.sample(start_idx, end_idx) 138 | print('loading {} samples time: {}'.format(self.chunk_size, time.time()-loading_time)) 139 | 140 | not_done = np.invert(terminates) 141 | rewards_new = 
np.copy(rewards) 142 | 143 | tmp_pred = self.target_model.predict(next_states[not_done], batch_size=1000) 144 | 145 | next_pred = tmp_pred.max(axis=1) 146 | rewards_new[not_done]+= self.gamma * next_pred 147 | targets = self.model.predict(states) 148 | targets[np.arange(len(actions)), actions] = rewards_new 149 | 150 | if self.add_flipped: 151 | states_flipped, targets_flipped = self.flip(actions, states, next_states, rewards, not_done) 152 | states = np.concatenate((states,states_flipped)) 153 | targets = np.concatenate((targets,targets_flipped)) 154 | fit_time = time.time() 155 | self.model.fit(states, targets, shuffle=True, batch_size=1000, epochs=1, verbose=0) 156 | print('fit time:', time.time()-fit_time) 157 | 158 | if self.training_count == 0 or self.training_count % 10 == 0: 159 | print('Updating weights') 160 | self.target_model.set_weights(self.model.get_weights()) 161 | self.training_count+=1 162 | 163 | if self.exploration_rate > self.exploration_min: 164 | self.exploration_rate *= self.exploration_decay 165 | 166 | self.save_model() 167 | -------------------------------------------------------------------------------- /neuroracer_gym_rl/scripts/double_drqn.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import time 4 | 5 | import numpy as np 6 | 7 | import keras 8 | from keras.models import Sequential,Input,Model 9 | from keras.layers import Dense, Dropout, Flatten 10 | from keras.layers import Conv2D, MaxPooling2D 11 | from keras.layers.wrappers import TimeDistributed 12 | from keras.layers.recurrent import LSTM 13 | # from keras.layers.normalization import BatchNormalization 14 | from keras.layers.advanced_activations import LeakyReLU 15 | from keras.optimizers import Adam 16 | 17 | # ROS packages required 18 | import rospy 19 | import rospkg 20 | 21 | from utils import Memory, H5Buffer 22 | 23 | class Agent(): 24 | def __init__(self, state_size, action_size, buffer_max_size, chunk_size, add_flipped, always_explore=False): 25 | file_name = 'double_drqn'+'_'+str(state_size[2])+'f' 26 | if add_flipped: 27 | file_name+='_flip' 28 | rospack = rospkg.RosPack() 29 | 30 | self.model_batch = 128 31 | 32 | self.chunk_size = chunk_size 33 | self.add_flipped = add_flipped 34 | self.always_explore = always_explore 35 | self.working_dir = rospack.get_path('neuroracer_gym_rl') 36 | self.weight_backup = os.path.join(self.working_dir, file_name+'.h5') 37 | 38 | self.state_size = (state_size[2], state_size[0], state_size[1], 1) 39 | self.action_size = action_size 40 | self.buffer = H5Buffer(state_size, buffer_max_size) 41 | self.learning_rate = 0.001 42 | self.gamma = 0.9 43 | self.exploration_rate = 0.85 44 | self.exploration_min = 0.01 45 | self.exploration_decay = 0.99 46 | self.model = self._build_model() 47 | self.target_model = self._build_model() 48 | self.training_count = 0 49 | 50 | 51 | def _build_model(self): 52 | 53 | model = Sequential() 54 | model.add(TimeDistributed(Conv2D(16, kernel_size=3, strides=1, padding='same'), input_shape=self.state_size)) 55 | model.add(TimeDistributed(LeakyReLU(alpha=0.1))) 56 | model.add(TimeDistributed(MaxPooling2D((2, 2),padding='same'))) 57 | model.add(TimeDistributed(Dropout(0.25))) 58 | 59 | model.add(TimeDistributed(Conv2D (32, kernel_size=3, strides=1, padding='same'))) 60 | model.add(TimeDistributed(LeakyReLU(alpha=0.1))) 61 | model.add(TimeDistributed(Dropout(0.25))) 62 | 63 | model.add(TimeDistributed(Conv2D (64, kernel_size=3, strides=2, padding='same'))) 64 | 
model.add(TimeDistributed(LeakyReLU(alpha=0.1))) 65 | model.add(TimeDistributed(Dropout(0.25))) 66 | 67 | model.add(TimeDistributed(Flatten())) 68 | 69 | # Use all traces for training 70 | #model.add(LSTM(512, return_sequences=True, activation='tanh')) 71 | #model.add(TimeDistributed(Dense(output_dim=action_size, activation='linear'))) 72 | 73 | # Use last trace for training 74 | model.add(LSTM(512, activation='tanh')) 75 | model.add(Dense(output_dim=self.action_size, activation='linear')) 76 | 77 | adam = Adam(lr=self.learning_rate) 78 | model.compile(loss='mse',optimizer=adam) 79 | 80 | if os.path.isfile(self.weight_backup): 81 | model.load_weights(self.weight_backup) 82 | if not self.always_explore: 83 | self.exploration_rate = self.exploration_min 84 | 85 | return model 86 | 87 | 88 | def save_model(self): 89 | rospy.loginfo("Model saved") 90 | self.model.save(self.weight_backup) 91 | 92 | def act(self, state): 93 | if np.random.rand() <= self.exploration_rate: 94 | return random.randrange(self.action_size) 95 | 96 | state = np.expand_dims(np.moveaxis(state, -1, 1), axis=-1) 97 | 98 | act_values = self.model.predict(state) 99 | return np.argmax(act_values[0]) 100 | 101 | def flip(self, actions, states, next_states, rewards, not_done): 102 | actions_flipped = 2-actions 103 | states_flipped = np.flip(states, axis=2) 104 | next_states_flipped = np.flip(next_states, axis=2) 105 | rewards_flipped = np.copy(rewards) 106 | 107 | next_pred_flipped = self.target_model.predict(next_states_flipped[not_done]).max(axis=1) 108 | rewards_flipped[not_done]+= self.gamma * next_pred_flipped 109 | targets_flipped = self.model.predict(states_flipped) 110 | targets_flipped[np.arange(len(actions_flipped)), actions_flipped] = rewards_flipped 111 | 112 | return states_flipped, targets_flipped 113 | 114 | def replay(self, new_data): 115 | rospy.loginfo("Replaying..."), 116 | 117 | self.buffer.extend(new_data) 118 | buffer_length = self.buffer.length() 119 | 120 | chunks = buffer_length / self.chunk_size 121 | 122 | chunk_n = 2 123 | if chunks < 2: 124 | chunk_n = 1 125 | chunks=1 126 | 127 | print('buffer length', buffer_length) 128 | print('chunks', chunks) 129 | 130 | for i in np.random.choice(range(chunks), chunk_n, False): 131 | print('fitting', i) 132 | start_idx = i * self.chunk_size 133 | end_idx = start_idx + self.chunk_size 134 | 135 | loading_time = time.time() 136 | actions, states, next_states, rewards, terminates = self.buffer.sample(start_idx, end_idx) 137 | print('loading {} samples time: {}'.format(self.chunk_size, time.time()-loading_time)) 138 | 139 | states = np.expand_dims(np.moveaxis(states, -1, 1), axis=-1) 140 | next_states = np.expand_dims(np.moveaxis(next_states, -1, 1), axis=-1) 141 | print('================== shapes:') 142 | print(states.shape,next_states.shape) 143 | 144 | not_done = np.invert(terminates) 145 | rewards_new = np.copy(rewards) 146 | 147 | tmp_pred = self.target_model.predict(next_states[not_done], batch_size=self.model_batch) 148 | 149 | next_pred = tmp_pred.max(axis=1) 150 | rewards_new[not_done]+= self.gamma * next_pred 151 | targets = self.model.predict(states) 152 | targets[np.arange(len(actions)), actions] = rewards_new 153 | 154 | if self.add_flipped: 155 | states_flipped, targets_flipped = self.flip(actions, states, next_states, rewards, not_done) 156 | states = np.concatenate((states,states_flipped)) 157 | targets = np.concatenate((targets,targets_flipped)) 158 | 159 | fit_time = time.time() 160 | self.model.fit(states, targets, shuffle=True, 
batch_size=self.model_batch, epochs=1, verbose=0) 161 | print('fit time:', time.time()-fit_time) 162 | 163 | if self.training_count == 0 or self.training_count % 10 == 0: 164 | print('Updating weights') 165 | self.target_model.set_weights(self.model.get_weights()) 166 | self.training_count+=1 167 | 168 | if self.exploration_rate > self.exploration_min: 169 | self.exploration_rate *= self.exploration_decay 170 | 171 | self.save_model() 172 | -------------------------------------------------------------------------------- /neuroracer_gym_rl/scripts/dqn.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import time 4 | 5 | import numpy as np 6 | 7 | import keras 8 | from keras.models import Sequential,Input,Model 9 | from keras.layers import Dense, Dropout, Flatten 10 | from keras.layers import Conv2D, MaxPooling2D 11 | # from keras.layers.normalization import BatchNormalization 12 | from keras.layers.advanced_activations import LeakyReLU 13 | from keras.optimizers import Adam 14 | 15 | # ROS packages required 16 | import rospy 17 | import rospkg 18 | 19 | from utils import Memory, H5Buffer 20 | 21 | class Agent(): 22 | def __init__(self, state_size, action_size, buffer_max_size, chunk_size, add_flipped, always_explore=False): 23 | file_name = 'dqn'+'_'+str(state_size[2])+'f' 24 | if add_flipped: 25 | file_name+='_flip' 26 | rospack = rospkg.RosPack() 27 | 28 | self.chunk_size = chunk_size 29 | self.add_flipped = add_flipped 30 | self.always_explore = always_explore 31 | self.working_dir = rospack.get_path('neuroracer_gym_rl') 32 | self.weight_backup = os.path.join(self.working_dir, file_name+'.h5') 33 | 34 | self.state_size = state_size 35 | self.action_size = action_size 36 | self.buffer = H5Buffer(state_size, buffer_max_size) 37 | self.learning_rate = 0.001 38 | self.gamma = 0.9 39 | self.exploration_rate = 0.85 40 | self.exploration_min = 0.01 41 | self.exploration_decay = 0.99 42 | self.model = self._build_model() 43 | 44 | 45 | def _build_model(self): 46 | 47 | model = Sequential() 48 | model.add(Conv2D(16, kernel_size=(3, 3), strides=(1, 1), input_shape=self.state_size,padding='same')) 49 | model.add(LeakyReLU(alpha=0.1)) 50 | model.add(MaxPooling2D((2, 2),padding='same')) 51 | model.add(Dropout(0.25)) 52 | 53 | model.add(Conv2D(32,kernel_size=(3, 3), strides=(1, 1),padding='same')) 54 | model.add(LeakyReLU(alpha=0.1)) 55 | model.add(Dropout(0.25)) 56 | 57 | model.add(Conv2D(64, kernel_size=(3, 3), strides=(2, 2), padding='same')) 58 | model.add(LeakyReLU(alpha=0.1)) 59 | model.add(Dropout(0.25)) 60 | 61 | model.add(Flatten()) 62 | 63 | model.add(Dense(256)) 64 | model.add(LeakyReLU(alpha=0.1)) 65 | model.add(Dropout(0.25)) 66 | 67 | model.add(Dense(128)) 68 | model.add(LeakyReLU(alpha=0.1)) 69 | model.add(Dropout(0.1)) 70 | 71 | model.add(Dense(self.action_size, activation='linear')) 72 | 73 | model.compile(loss='mse', optimizer=Adam(lr=self.learning_rate), metrics=['accuracy']) 74 | model.summary() 75 | 76 | if os.path.isfile(self.weight_backup): 77 | model.load_weights(self.weight_backup) 78 | if not self.always_explore: 79 | self.exploration_rate = self.exploration_min 80 | 81 | return model 82 | 83 | # def to_grayscale(self, img): 84 | # return np.mean(img, axis=2).astype(np.uint8) 85 | 86 | # def downsample(self, img): 87 | # return img[::2, ::2] 88 | 89 | 90 | 91 | def save_model(self): 92 | rospy.loginfo("Model saved") 93 | self.model.save(self.weight_backup) 94 | 95 | def act(self, state): 96 | if 
np.random.rand() <= self.exploration_rate: 97 | return random.randrange(self.action_size) 98 | act_values = self.model.predict(state) 99 | return np.argmax(act_values[0]) 100 | 101 | def flip(self, actions, states, next_states, rewards, not_done): 102 | actions_flipped = 2-actions 103 | states_flipped = np.flip(states, axis=2) 104 | next_states_flipped = np.flip(next_states, axis=2) 105 | rewards_flipped = np.copy(rewards) 106 | 107 | next_pred_flipped = self.model.predict(next_states_flipped[not_done]).max(axis=1) 108 | rewards_flipped[not_done]+= self.gamma * next_pred_flipped 109 | targets_flipped = self.model.predict(states_flipped) 110 | targets_flipped[np.arange(len(actions_flipped)), actions_flipped] = rewards_flipped 111 | 112 | return states_flipped, targets_flipped 113 | 114 | def replay(self, new_data): 115 | rospy.loginfo("Replaying..."), 116 | 117 | self.buffer.extend(new_data) 118 | buffer_length = self.buffer.length() 119 | 120 | chunks = buffer_length / self.chunk_size 121 | 122 | chunk_n = 2 123 | if chunks < 2: 124 | chunk_n = 1 125 | chunks=1 126 | print('buffer length', buffer_length) 127 | print('chunks', chunks) 128 | 129 | for i in np.random.choice(range(chunks), chunk_n, False): 130 | print('fitting', i) 131 | start_idx = i * self.chunk_size 132 | end_idx = start_idx + self.chunk_size 133 | 134 | loading_time = time.time() 135 | actions, states, next_states, rewards, terminates = self.buffer.sample(start_idx, end_idx) 136 | print('loading {} samples time: {}'.format(self.chunk_size, time.time()-loading_time)) 137 | 138 | not_done = np.invert(terminates) 139 | rewards_new = np.copy(rewards) 140 | 141 | tmp_pred = self.model.predict(next_states[not_done], batch_size=1000) 142 | 143 | next_pred = tmp_pred.max(axis=1) 144 | rewards_new[not_done]+= self.gamma * next_pred 145 | targets = self.model.predict(states) 146 | targets[np.arange(len(actions)), actions] = rewards_new 147 | 148 | if self.add_flipped: 149 | states_flipped, targets_flipped = self.flip(actions, states, next_states, rewards, not_done) 150 | states = np.concatenate((states,states_flipped)) 151 | targets = np.concatenate((targets,targets_flipped)) 152 | fit_time = time.time() 153 | self.model.fit(states, targets, shuffle=True, batch_size=1000, epochs=1, verbose=0) 154 | print('fit time:', time.time()-fit_time) 155 | 156 | if self.exploration_rate > self.exploration_min: 157 | self.exploration_rate *= self.exploration_decay 158 | 159 | self.save_model() 160 | -------------------------------------------------------------------------------- /neuroracer_gym_rl/scripts/drqn.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import time 4 | 5 | import numpy as np 6 | 7 | import keras 8 | from keras.models import Sequential,Input,Model 9 | from keras.layers import Dense, Dropout, Flatten 10 | from keras.layers import Conv2D, MaxPooling2D 11 | from keras.layers.wrappers import TimeDistributed 12 | from keras.layers.recurrent import LSTM 13 | # from keras.layers.normalization import BatchNormalization 14 | from keras.layers.advanced_activations import LeakyReLU 15 | from keras.optimizers import Adam 16 | 17 | # ROS packages required 18 | import rospy 19 | import rospkg 20 | 21 | from utils import Memory, H5Buffer 22 | 23 | class Agent(): 24 | def __init__(self, state_size, action_size, buffer_max_size, chunk_size, add_flipped, always_explore=False): 25 | file_name = 'drqn'+'_'+str(state_size[2])+'f' 26 | if add_flipped: 27 | file_name+='_flip' 
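        # (Annotation, not part of the original script) this yields weight-file
        # names like 'drqn_4f_flip': agent type, frame-stack depth from
        # state_size[2], and the flipped-frame augmentation flag.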
28 |         rospack = rospkg.RosPack()
29 | 
30 |         self.model_batch = 128
31 | 
32 |         self.chunk_size = chunk_size
33 |         self.add_flipped = add_flipped
34 |         self.always_explore = always_explore
35 |         self.working_dir = rospack.get_path('neuroracer_gym_rl')
36 |         self.weight_backup = os.path.join(self.working_dir, file_name + '.h5')
37 | 
38 |         self.state_size = (state_size[2], state_size[0], state_size[1], 1)  # (frames, height, width, channels)
39 |         self.action_size = action_size
40 |         self.buffer = H5Buffer(state_size, buffer_max_size)
41 |         self.learning_rate = 0.001
42 |         self.gamma = 0.9
43 |         self.exploration_rate = 0.85
44 |         self.exploration_min = 0.01
45 |         self.exploration_decay = 0.99
46 |         self.model = self._build_model()
47 | 
48 | 
49 |     def _build_model(self):
50 | 
51 |         model = Sequential()
52 |         model.add(TimeDistributed(Conv2D(16, kernel_size=3, strides=1, padding='same'), input_shape=self.state_size))
53 |         model.add(TimeDistributed(LeakyReLU(alpha=0.1)))
54 |         model.add(TimeDistributed(MaxPooling2D((2, 2), padding='same')))
55 |         model.add(TimeDistributed(Dropout(0.25)))
56 | 
57 |         model.add(TimeDistributed(Conv2D(32, kernel_size=3, strides=1, padding='same')))
58 |         model.add(TimeDistributed(LeakyReLU(alpha=0.1)))
59 |         model.add(TimeDistributed(Dropout(0.25)))
60 | 
61 |         model.add(TimeDistributed(Conv2D(64, kernel_size=3, strides=2, padding='same')))
62 |         model.add(TimeDistributed(LeakyReLU(alpha=0.1)))
63 |         model.add(TimeDistributed(Dropout(0.25)))
64 | 
65 |         model.add(TimeDistributed(Flatten()))
66 | 
67 |         # Use all traces for training
68 |         # model.add(LSTM(512, return_sequences=True, activation='tanh'))
69 |         # model.add(TimeDistributed(Dense(action_size, activation='linear')))
70 | 
71 |         # Use last trace for training
72 |         model.add(LSTM(512, activation='tanh'))
73 |         model.add(Dense(self.action_size, activation='linear'))
74 | 
75 |         adam = Adam(lr=self.learning_rate)
76 |         model.compile(loss='mse', optimizer=adam)
77 | 
78 |         if os.path.isfile(self.weight_backup):
79 |             model.load_weights(self.weight_backup)
80 |             if not self.always_explore:
81 |                 self.exploration_rate = self.exploration_min
82 | 
83 |         return model
84 | 
85 | 
86 |     def save_model(self):
87 |         rospy.loginfo("Model saved")
88 |         self.model.save(self.weight_backup)
89 | 
90 |     def act(self, state):
91 |         if np.random.rand() <= self.exploration_rate:
92 |             return random.randrange(self.action_size)
93 | 
94 |         state = np.expand_dims(np.moveaxis(state, -1, 1), axis=-1)  # (1, H, W, frames) -> (1, frames, H, W, 1)
95 | 
96 |         act_values = self.model.predict(state)
97 |         return np.argmax(act_values[0])
98 | 
99 |     def flip(self, actions, states, next_states, rewards, not_done):
100 |         actions_flipped = 2 - actions  # mirror steering: swaps left/right indices (0 <-> 2), straight (1) is symmetric
101 |         states_flipped = np.flip(states, axis=3)  # width is axis 3 once frames sit on the time axis
102 |         next_states_flipped = np.flip(next_states, axis=3)
103 |         rewards_flipped = np.copy(rewards)
104 | 
105 |         next_pred_flipped = self.model.predict(next_states_flipped[not_done]).max(axis=1)
106 |         rewards_flipped[not_done] += self.gamma * next_pred_flipped
107 |         targets_flipped = self.model.predict(states_flipped)
108 |         targets_flipped[np.arange(len(actions_flipped)), actions_flipped] = rewards_flipped
109 | 
110 |         return states_flipped, targets_flipped
111 | 
112 |     def replay(self, new_data):
113 |         rospy.loginfo("Replaying...")
114 | 
115 |         self.buffer.extend(new_data)
116 |         buffer_length = self.buffer.length()
117 | 
118 |         chunks = buffer_length // self.chunk_size  # integer division: number of full chunks in the buffer
119 | 
120 |         chunk_n = 2
121 |         if chunks < 2:
122 |             chunk_n = 1
123 |             chunks = 1
124 | 
125 |         print('buffer length', buffer_length)
126 |         print('chunks', chunks)
127 | 
128 |         for i in np.random.choice(range(chunks), chunk_n, False):
129 |             print('fitting', i)
130 |             start_idx = i * self.chunk_size
131 |             end_idx = start_idx + self.chunk_size
132 | 
133 |             loading_time = time.time()
134 |             actions, states, next_states, rewards, terminates = self.buffer.sample(start_idx, end_idx)
135 |             print('loading {} samples time: {}'.format(self.chunk_size, time.time() - loading_time))
136 | 
137 |             states = np.expand_dims(np.moveaxis(states, -1, 1), axis=-1)
138 |             next_states = np.expand_dims(np.moveaxis(next_states, -1, 1), axis=-1)
139 |             print('================== shapes:')
140 |             print(states.shape, next_states.shape)
141 | 
142 |             not_done = np.invert(terminates)
143 |             rewards_new = np.copy(rewards)
144 | 
145 |             tmp_pred = self.model.predict(next_states[not_done], batch_size=self.model_batch)
146 | 
147 |             next_pred = tmp_pred.max(axis=1)
148 |             rewards_new[not_done] += self.gamma * next_pred
149 |             targets = self.model.predict(states)
150 |             targets[np.arange(len(actions)), actions] = rewards_new
151 | 
152 |             if self.add_flipped:
153 |                 states_flipped, targets_flipped = self.flip(actions, states, next_states, rewards, not_done)
154 |                 states = np.concatenate((states, states_flipped))
155 |                 targets = np.concatenate((targets, targets_flipped))
156 | 
157 |             fit_time = time.time()
158 |             self.model.fit(states, targets, shuffle=True, batch_size=self.model_batch, epochs=1, verbose=0)
159 |             print('fit time:', time.time() - fit_time)
160 | 
161 |         if self.exploration_rate > self.exploration_min:
162 |             self.exploration_rate *= self.exploration_decay
163 | 
164 |         self.save_model()
165 | 
--------------------------------------------------------------------------------
/neuroracer_gym_rl/scripts/neuroracer_discrete.py:
--------------------------------------------------------------------------------
1 | from collections import deque
2 | import time
3 | 
4 | import numpy as np
5 | import gym
6 | 
7 | import rospy
8 | 
9 | from neuroracer_gym.tasks import neuroracer_discrete_task
10 | from utils import preprocess, Memory
11 | 
12 | class NeuroRacer:
13 |     def __init__(self, agent_class, sample_batch_size, n_frames, buffer_max_size, chunk_size, add_flipped, always_explore=False):
14 |         self.sample_batch_size = sample_batch_size
15 |         self.env = gym.make('NeuroRacer-v0')
16 | 
17 |         self.highest_reward = -np.inf
18 | 
19 |         self.n_frames = n_frames
20 | 
21 |         self.img_y_offset = 200
22 |         self.img_y_scale = 0.2
23 |         self.img_x_scale = 0.2
24 | 
25 |         state_size = self.env.observation_space.shape
26 |         self.state_size = (int((state_size[0] - self.img_y_offset) * self.img_y_scale), int(state_size[1] * self.img_x_scale), n_frames)
27 |         rospy.loginfo("State size")
28 |         rospy.loginfo(self.state_size)
29 | 
30 |         self.action_size = self.env.action_space.n
31 |         self.agent = agent_class(self.state_size, self.action_size, buffer_max_size, chunk_size, add_flipped, always_explore=always_explore)
32 | 
33 | 
34 |     def format_time(self, t):
35 |         m, s = divmod(int(time.time() - t), 60)
36 |         h, m = divmod(m, 60)
37 |         return "%d:%02d:%02d" % (h, m, s)
38 | 
39 | 
40 | 
41 |     def run(self):
42 |         total_time = time.time()
43 |         steps = 0
44 | 
45 |         try:
46 |             save_interval = 0
47 | 
48 |             memory = Memory()
49 | 
50 |             do_training = True
51 | 
52 |             while do_training:
53 |                 episode_time = time.time()
54 | 
55 |                 self.env.initial_position = {'p_x': np.random.uniform(1, 4), 'p_y': 3.7, 'p_z': 0.05, 'o_x': 0, 'o_y': 0.0, 'o_z': np.random.uniform(0.4, 1), 'o_w': 0.855}
56 |                 state = self.env.reset()
57 |                 state = preprocess(state, self.img_y_offset, self.img_x_scale, self.img_y_scale)
58 |                 # state = np.expand_dims(state, axis=0)
59 | 
60 |                 done = False
61 |                 cumulated_reward = 0
62 | 
63 |                 stacked_states = deque(maxlen=self.n_frames)
64 |                 stacked_next_states = deque(maxlen=self.n_frames)
65 |                 for i in range(self.n_frames):
66 |                     stacked_states.append(state)
67 |                     stacked_next_states.append(state)
68 | 
69 |                 episode_steps = 0
70 |                 while not done:
71 |                     steps += 1
72 |                     episode_steps += 1
73 | 
74 |                     action = self.agent.act(np.expand_dims(np.stack(stacked_states, axis=2), axis=0))
75 | 
76 |                     next_state, reward, done, _ = self.env.step(action)
77 |                     next_state = preprocess(next_state, self.img_y_offset, self.img_x_scale, self.img_y_scale)
78 |                     stacked_next_states.append(next_state)
79 | 
80 |                     memory.append(action, np.stack(stacked_states, axis=2), np.stack(stacked_next_states, axis=2), reward, done)
81 |                     stacked_states.append(next_state)
82 | 
83 |                     cumulated_reward += reward
84 | 
85 |                     save_interval += 1
86 |                     if save_interval >= self.sample_batch_size:
87 |                         save_interval = 0
88 |                         replay_time = time.time()
89 |                         self.agent.replay(memory)
90 |                         rospy.loginfo("Replay time {}".format(time.time() - replay_time))
91 |                         if steps >= self.sample_batch_size * 200:
92 |                             do_training = False
93 |                         memory = Memory()
94 | 
95 | 
96 |                 if self.highest_reward < cumulated_reward:
97 |                     self.highest_reward = cumulated_reward
98 | 
99 |                 rospy.loginfo("total episode_steps {}, reward {}/{}".format(episode_steps, cumulated_reward, self.highest_reward))
100 |                 rospy.loginfo("Episode time {}, total {}".format(self.format_time(episode_time),
101 |                                                                  self.format_time(total_time)))
102 |                 rospy.loginfo("exploration_rate {}".format(self.agent.exploration_rate))
103 | 
104 |         finally:
105 |             self.env.close()
106 |             rospy.loginfo("Total time: {}".format(self.format_time(total_time)))
107 |             rospy.loginfo("Total steps: {}".format(steps))
108 | 
--------------------------------------------------------------------------------
/neuroracer_gym_rl/scripts/training.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | 
3 | import rospy
4 | 
5 | from neuroracer_discrete import NeuroRacer
6 | 
7 | if __name__ == '__main__':
8 | 
9 |     rospy.init_node('neuroracer_qlearn', anonymous=True, log_level=rospy.INFO)
10 | 
11 |     always_explore = rospy.get_param('/neuroracer_gym/always_explore')
12 |     agent_name = rospy.get_param('/neuroracer_gym/agent_name')
13 | 
14 |     module = __import__(agent_name)
15 |     agent_class = getattr(module, 'Agent')
16 | 
17 |     game = NeuroRacer(agent_class,
18 |                       sample_batch_size=1000,
19 |                       n_frames=16,
20 |                       buffer_max_size=100000,
21 |                       chunk_size=20000,
22 |                       add_flipped=False)
23 |     rospy.logwarn("Gym environment done. always_explore = " + str(always_explore))
always_explore = " + str(always_explore)) 24 | rospy.logwarn("Agent is " + agent_name) 25 | 26 | # Set the logging system 27 | # rospack = rospkg.RosPack() 28 | # pkg_path = rospack.get_path('demo') 29 | # outdir = pkg_path + '/training_results' 30 | # rospy.loginfo("Monitor Wrapper started") 31 | 32 | game.run() -------------------------------------------------------------------------------- /neuroracer_gym_rl/scripts/utils.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import cv2 4 | import h5py 5 | import numpy as np 6 | from sklearn.utils import shuffle 7 | from collections import deque 8 | import os 9 | 10 | def preprocess(img, y_offset, x_scale, y_scale, interpolation=cv2.INTER_LINEAR): 11 | return cv2.resize(cv2.cvtColor(img[y_offset:,:], cv2.COLOR_RGB2GRAY), None, fx=x_scale, fy=y_scale, interpolation=interpolation)/255.0 12 | 13 | 14 | class H5Buffer(): 15 | def __init__(self, state_shape, maxlen): 16 | self.maxlen = maxlen 17 | self.current_idx = 0 18 | 19 | self.file = h5py.File("buffer.hdf5", "w") 20 | 21 | self.actions = self.file.create_dataset('actions', (0,), maxshape=(maxlen,), dtype=np.ubyte) 22 | self.states = self.file.create_dataset('states', (0,)+state_shape, maxshape=(maxlen,)+state_shape, dtype=np.float32) 23 | self.next_states = self.file.create_dataset('next_states', (0,)+state_shape, maxshape=(maxlen,)+state_shape, dtype=np.float32) 24 | self.rewards = self.file.create_dataset('rewards', (0,), maxshape=(maxlen,), dtype=np.float32) 25 | self.terminates = self.file.create_dataset('terminates', (0,), maxshape=(maxlen,), dtype=np.bool) 26 | 27 | def append(self, actions, states, next_states, rewards, terminates): 28 | add_size = actions.shape[0] 29 | if self.actions.shape[0]= self.maxlen: 36 | add_idx-= end_idx - self.maxlen 37 | end_idx = self.maxlen 38 | 39 | self.actions[self.current_idx:end_idx] = actions[:add_idx] 40 | self.states[self.current_idx:end_idx] = states[:add_idx] 41 | self.next_states[self.current_idx:end_idx] = next_states[:add_idx] 42 | self.rewards[self.current_idx:end_idx] = rewards[:add_idx] 43 | self.terminates[self.current_idx:end_idx] = terminates[:add_idx] 44 | 45 | self.current_idx = end_idx 46 | if self.current_idx == self.maxlen: 47 | self.current_idx = 0 48 | if add_idx != add_size: 49 | self.append(actions[add_idx:], states[add_idx:], next_states[add_idx:], rewards[add_idx:], terminates[add_idx:]) 50 | 51 | def extend(self, obj): 52 | self.append(np.array(obj.action, dtype=np.ubyte), \ 53 | np.array(obj.state, dtype=np.float32), \ 54 | np.array(obj.next_state, dtype=np.float32), \ 55 | np.array(obj.reward, dtype=np.float32), \ 56 | np.array(obj.terminate, dtype=np.bool)) 57 | 58 | def _resize(self, current_size, add_size): 59 | new_size = current_size + add_size 60 | if new_size > self.maxlen: 61 | new_size = self.maxlen 62 | self.actions.resize(new_size, axis=0) 63 | self.states.resize(new_size, axis=0) 64 | self.next_states.resize(new_size, axis=0) 65 | self.rewards.resize(new_size, axis=0) 66 | self.terminates.resize(new_size, axis=0) 67 | 68 | def sample(self, start_idx, end_idx): 69 | # length = self.length() 70 | # if length <= n_samples: 71 | # return self.actions[:], \ 72 | # self.states[:], \ 73 | # self.next_state[:], \ 74 | # self.rewards[:], \ 75 | # self.terminates[:] 76 | 77 | # start_idx = np.random.randint(length-n_samples+1) 78 | # end_idx = start_idx+n_samples 79 | 80 | return self.actions[start_idx:end_idx], \ 81 | self.states[start_idx:end_idx], 
82 |                self.next_states[start_idx:end_idx], \
83 |                self.rewards[start_idx:end_idx], \
84 |                self.terminates[start_idx:end_idx]
85 | 
86 |     def length(self):
87 |         return len(self.actions)
88 | 
89 |     def close(self):
90 |         if self.file:
91 |             self.file.close()
92 |             os.remove('buffer.hdf5')
93 |             self.file = None
94 | 
95 |     def __del__(self):
96 |         self.close()
97 | 
98 | class Memory():
99 |     def __init__(self, maxlen=None):
100 |         self.action = deque(maxlen=maxlen)
101 |         self.state = deque(maxlen=maxlen)
102 |         self.next_state = deque(maxlen=maxlen)
103 |         self.reward = deque(maxlen=maxlen)
104 |         self.terminate = deque(maxlen=maxlen)
105 | 
106 |     def append(self, action, state, next_state, reward, terminate):
107 |         self.action.append(action)
108 |         self.state.append(state)
109 |         self.next_state.append(next_state)
110 |         self.reward.append(reward)
111 |         self.terminate.append(terminate)
112 | 
113 |     def sample(self, n_samples=None):
114 |         if not n_samples or len(self.action) <= n_samples:
115 |             return np.array(self.action, dtype=np.int), np.array(self.state, dtype=np.float32), np.array(self.next_state, dtype=np.float32), np.array(self.reward, dtype=np.float32), np.array(self.terminate, dtype=np.bool)
116 | 
117 |         action, state, next_state, reward, terminate = shuffle(self.action, self.state, self.next_state, self.reward, self.terminate, n_samples=n_samples)
118 |         return np.array(action, dtype=np.int), np.array(state, dtype=np.float32), np.array(next_state, dtype=np.float32), np.array(reward, dtype=np.float32), np.array(terminate, dtype=np.bool)
119 | 
120 |     def length(self):
121 |         return len(self.action)
122 | 
123 |     def extend(self, obj):
124 |         self.action.extend(obj.action)
125 |         self.state.extend(obj.state)
126 |         self.next_state.extend(obj.next_state)
127 |         self.reward.extend(obj.reward)
128 |         self.terminate.extend(obj.terminate)
--------------------------------------------------------------------------------
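
Both `dqn.py` and `drqn.py` build their training targets the same way inside `replay()`: the network's own predictions are kept for the actions that were not taken (so they contribute zero error), while the taken action's slot is overwritten with `r + gamma * max_a' Q(s', a')`, or with the raw reward `r` for terminal transitions. The following is a minimal, NumPy-only sketch of that target construction; the function name `build_q_targets` and the toy numbers are illustrative, not part of the package, and the model predictions are passed in as plain arrays instead of calling Keras:

```python
import numpy as np

def build_q_targets(q_now, q_next, actions, rewards, terminates, gamma=0.9):
    # Start from the network's current predictions so untaken actions
    # produce no gradient; bool indexing selects non-terminal rows.
    not_done = np.invert(terminates)
    targets = q_now.copy()
    new_rewards = rewards.copy()
    # One-step Q-learning update for non-terminal transitions.
    new_rewards[not_done] += gamma * q_next[not_done].max(axis=1)
    # Overwrite only the slot of the action that was actually taken.
    targets[np.arange(len(actions)), actions] = new_rewards
    return targets

# Toy example: 2 transitions, 3 discrete actions.
q_now = np.array([[0.1, 0.5, 0.2],
                  [0.3, 0.0, 0.4]])   # Q(s, .)
q_next = np.array([[0.2, 0.6, 0.1],
                   [0.0, 0.0, 0.0]])  # Q(s', .)
actions = np.array([1, 2])
rewards = np.array([1.0, -1.0])
terminates = np.array([False, True])

print(build_q_targets(q_now, q_next, actions, rewards, terminates))
# [[ 0.1   1.54  0.2 ]      row 0: 1.0 + 0.9 * 0.6 at action 1
#  [ 0.3   0.   -1.  ]]     row 1: terminal, keeps the raw reward at action 2
```

With `add_flipped` enabled, `flip()` applies the same construction a second time to horizontally mirrored frames and mirrored action indices, which doubles the batch that goes into `model.fit()`.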