├── LICENSE
├── README.md
└── src
    ├── Dockerfile
    ├── docker.sh
    ├── kcs
    │   ├── batch_train.py
    │   ├── dqn_test.py
    │   ├── dqn_train.py
    │   ├── environment.py
    │   ├── hyperparams.py
    │   ├── kcs.py
    │   ├── saved_models
    │   │   └── model_012
    │   │       ├── model_012.mat
    │   │       ├── model_012_ep_7000
    │   │       │   ├── policy_specs.pbtxt
    │   │       │   ├── saved_model.pb
    │   │       │   └── variables
    │   │       │       ├── variables.data-00000-of-00001
    │   │       │       └── variables.index
    │   │       └── parameters.txt
    │   └── test_wp_track.py
    ├── requirements_docker.txt
    ├── requirements_tf_260.txt
    ├── requirements_tf_291.txt
    └── run_docker.sh

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 Marine Autonomous Robotic Systems (MARS) Laboratory

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation

Achieves path following of an autonomous surface vessel (ASV) with a Deep Q-Network (DQN) agent. This code uses reinforcement learning to train an agent to control the rudder angle of the Kriso Container Ship (KCS) to achieve waypoint tracking in calm waters and in the presence of wind.

# Why is the project useful?
A majority of marine accidents can be attributed to errors in human decision-making. Through automation, the occurrence of such incidents can be minimized, and automation in the marine industry has therefore been receiving increased attention in recent years. This work investigates the automation of the path-following action of a ship. A deep Q-learning approach is proposed to solve the path-following problem. This method belongs to the broader area of deep reinforcement learning (DRL) and is well suited for such tasks, as it can learn to take optimal decisions through sufficient experience. The algorithm also balances exploration and exploitation by the agent operating in the environment. A three-degree-of-freedom (3-DOF) dynamic model is adopted to describe the ship’s motion. The Kriso Container Ship (KCS) is chosen for this study as it is a benchmark hull used in several studies and its hydrodynamic coefficients are readily available for numerical modeling.
Numerical simulations of the turning circle and zig-zag maneuver tests are performed to verify the accuracy of the proposed dynamic model. A reinforcement learning (RL) agent is trained to interact with this numerical model to achieve waypoint tracking. Finally, wind forces are modelled and the performance of the RL-based controller is evaluated in the presence of wind.

# What does the project do?
This repository contains code for reinforcement-learning-based control for the path following of autonomous ships. Training is carried out inside a Docker container to keep the setup easily portable.

# How to get started with the project?
The project is set up with a Dockerfile that builds the Docker image for the container in which the code will be executed. It is presumed that you have Docker installed on your system. Follow the steps below to get the code up and running.

**Prerequisites:**

1. An Ubuntu OS (tested on Ubuntu 20.04). Note that this will not work on Windows, as X11 forwarding needs to be handled separately there and NVIDIA GPU support for Docker containers is, as of now, not available on Windows.
2. Docker installed on your system ([Link](https://docs.docker.com/engine/install/ubuntu/))
3. Complete the steps to use Docker as a non-root user ([Link](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user))
4. NVIDIA driver installed (for GPU use) - typing `nvidia-smi` in a terminal should tell you whether the NVIDIA drivers are present (if not, follow the instructions at this [Link](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html))
5. NVIDIA Container Toolkit installed to expose GPUs to Docker containers ([Link](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit))

**Note:** Some of the links listed above may move over time. In that case, please locate the current installation instructions yourself.

**Step 1:**

Clone the repository to your system

```commandline
git clone https://github.com/MarineAutonomy/Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation.git
```

and cd into the src folder. The prompt on the terminal should look like ```.../DQN-ASV-path-follow/src$```

**Step 2:**

Make the shell scripts executable
```commandline
chmod +x docker.sh
chmod +x run_docker.sh
```

Execute the script docker.sh to build the Docker image

```commandline
./docker.sh
```

**Step 3:**

Type the following command

```commandline
xauth list
```
and you will see a response similar to the following (you may see more lines on your computer)

```
iit-m/unix: MIT-MAGIC-COOKIE-1 958745625f2yu22358u3ebe5cc4ad453
#ffff#6969742d6d#: MIT-MAGIC-COOKIE-1 958745625f2yu22358u3ebe5cc4ad453
```

Copy the line of the response that corresponds to your computer name (as seen after the @ in the terminal prompt). For example, for ```user@iit-m$``` displayed on the prompt, the computer name is ```iit-m```.

**Step 4:**

Run the Docker container

```commandline
./run_docker.sh
```

Notice that this binds the current /src directory on the host machine to the /src directory in the container.
Thus, the code can be edited on the host machine and the changes are reflected inside the container instantly.

**Step 5:**

Type the following command inside the container (check that the terminal prompt reflects something like ```docker@iit-m:~/DQN-ASV-path-follow/src$``` with ```iit-m``` replaced by your computer name)

```commandline
xauth add <cookie>
```
where ```<cookie>``` is a placeholder for the cookie line you copied in Step 3. Notice, however, that a ```0``` must be appended to the display name at the start of the cookie (```iit-m/unix``` becomes ```iit-m/unix:0```), for example

```
xauth add iit-m/unix:0 MIT-MAGIC-COOKIE-1 958745625f2yu22358u3ebe5cc4ad453
```

**Step 6:**
The Python scripts can now be executed inside the Docker container.

# Important Python scripts

**kcs folder**

```environment.py``` contains the Python environment description for the path following of the KCS. Edit this file to modify the dynamics of the vehicle and how it interacts with the environment. This study currently uses a 3-DOF MMG model to mimic the dynamics of the KCS vessel in calm water and in wind. To start a single training run, edit the training hyperparameters in ```hyperparams.py``` and execute ```dqn_train.py``` inside the Docker container. Before starting a training run, ensure that the model and any plots are saved to the correct directories. To test various trajectories (single waypoint tracking, an elliptical trajectory and others) in calm water or in wind, run ```dqn_test.py``` (make sure to load the correct model). To run several training runs sequentially, set the required hyperparameters in ```batch_train.py``` and execute ```batch_train.py``` inside the Docker container. A quick-reference sketch of these commands is given at the end of this README.

# Where can you get help with your project?
Help on this project can be sought by emailing R S Sanjeev Kumar at `sanjeevrs2000@gmail.com`, Md Shadab Alam at `shaadalam.5u@gmail.com`, or the supervisor Abhilash Somayajula at `abhilash@iitm.ac.in`. For any proposed updates to the code, please raise a pull request.

# Who maintains and contributes to the project?
This project is currently maintained by the Marine Autonomous Vessels (MAV) Laboratory within the Department of Ocean Engineering at IIT Madras. The project was originally undertaken by Rohit Deraj, R S Sanjeev Kumar and Md Shadab Alam, and was supervised by Abhilash Somayajula. Contributions to the repository have been made by R S Sanjeev Kumar, Md Shadab Alam and Abhilash Somayajula.
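
# Quick reference

The steps above end with a shell inside the container. As a sketch of a typical session (assuming the default workdir set by run_docker.sh, ```~/DQN-ASV-path-follow/src```, and that ```python3``` is the interpreter available inside the container), the scripts described earlier would be run roughly as follows:

```commandline
cd kcs

# single training run, using the settings in hyperparams.py
python3 dqn_train.py

# several training runs back to back, using the settings in batch_train.py
python3 batch_train.py

# evaluate saved policies on the trajectories configured in dqn_test.py
python3 dqn_test.py
```

Trained policies, the .mat file with the training curves and the generated plots are written under ```kcs/saved_models/<model_name>/``` (see ```dqn_train.py``` and ```test_wp_track.py```).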
101 | -------------------------------------------------------------------------------- /src/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM tensorflow/tensorflow:2.9.1-gpu 2 | 3 | LABEL maintainer="abhilash" 4 | LABEL version="2.0" 5 | LABEL description="Docker environment for DQN-ASV-path-follow" 6 | 7 | #RUN apt-get update 8 | RUN apt-get -y install software-properties-common libsm6 libxext6 libxrender1 libfontconfig1 libgl1 sudo xauth 9 | #RUN apt-get update 10 | 11 | RUN pip install --upgrade pip 12 | COPY ./requirements_tf_291.txt /requirements.txt 13 | RUN pip install -r requirements.txt 14 | 15 | ENV USERNAME docker 16 | RUN useradd -m $USERNAME && \ 17 | echo "$USERNAME:$USERNAME" | chpasswd && \ 18 | usermod --shell /bin/bash $USERNAME && \ 19 | usermod -aG sudo $USERNAME && \ 20 | echo "$USERNAME ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers.d/$USERNAME && \ 21 | chmod 0440 /etc/sudoers.d/$USERNAME 22 | 23 | WORKDIR /home/docker 24 | USER docker 25 | 26 | CMD ["bash"] 27 | -------------------------------------------------------------------------------- /src/docker.sh: -------------------------------------------------------------------------------- 1 | docker build . -f Dockerfile -t abhilashiit/dqn_asv:2.0 2 | #docker push abhilashiit/dqn_asv:2.0 3 | -------------------------------------------------------------------------------- /src/kcs/batch_train.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import hyperparams as params 3 | import dqn_train 4 | import time 5 | import tensorflow as tf 6 | 7 | max_batch_train = 1 8 | 9 | for batch_train in range(0, max_batch_train): 10 | params.model_name = 'model_' + '{:03}'.format(batch_train + 1) 11 | 12 | if batch_train < max_batch_train: 13 | params.duration = 160 14 | params.initial_learning_rate = 1e-3 15 | params.decay_steps = 5000 16 | params.decay_rate = 0.5 17 | params.fc_layer_params = (64, 64) 18 | params.discount_factor = 0.95 19 | params.target_update_tau = 0.01 20 | params.target_update_period = 1 21 | params.replay_buffer_max_length = 100000 22 | params.num_parallel_calls = 2 23 | params.sample_batch_size = 128 24 | params.num_steps = 2 25 | params.prefetch = 3 26 | params.max_episodes = 10001 27 | params.epsilon_greedy_episodes = 5000 28 | params.random_seed = 69574#np.random.randint(100000) 29 | params.DQN_update_time_steps = 20 30 | params.DQN_policy_store_frequency = 1000 31 | params.DQN_loss_avg_interval = 100 32 | 33 | tf.keras.utils.set_random_seed(params.random_seed) 34 | tf.config.experimental.enable_op_determinism() 35 | dqn_train.dqn_train(params) 36 | 37 | print(f'Execution Completed Successfully') 38 | 39 | 40 | -------------------------------------------------------------------------------- /src/kcs/dqn_test.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import test_wp_track as wpt 3 | 4 | model_name = 'model_001' 5 | 6 | def ellipse(flag=0): 7 | npoints = 12 8 | 9 | if flag == 0: 10 | # Starboard turning ellipse 11 | theta = np.linspace(0, 2 * np.pi, num=npoints, endpoint=True) 12 | theta_des = np.linspace(0, 2 * np.pi, num=1000) 13 | psi0 = np.pi / 2 14 | else: 15 | # Port turning ellipse 16 | theta = np.linspace(0, -2 * np.pi, num=npoints, endpoint=True) 17 | theta_des = np.linspace(0, -2 * np.pi, num=1000) 18 | psi0 = -np.pi / 2 19 | 20 | a_ellipse = 12 21 | b_ellipse = 12 22 | x_wp = a_ellipse * np.cos(theta) 23 | y_wp = b_ellipse * 
np.sin(theta) 24 | 25 | xdes = a_ellipse * np.cos(theta_des) 26 | ydes = b_ellipse * np.sin(theta_des) 27 | 28 | return npoints, x_wp, y_wp, psi0, xdes, ydes 29 | 30 | def straight(): 31 | npoints = 9 32 | psi0 = 0 33 | x_wp = np.linspace(0, 64, num=npoints, endpoint=True) 34 | y_wp = 0 * x_wp 35 | xdes = x_wp 36 | ydes = y_wp 37 | return npoints, x_wp, y_wp, psi0, xdes, ydes 38 | 39 | def eight(flag=0): 40 | npoints = 15 41 | psi0 = 0 42 | circ_dia = 9 43 | 44 | if flag == 0: 45 | theta1 = np.linspace(-0.5 * np.pi, 1.5 * np.pi, npoints - npoints // 2, endpoint=True) 46 | theta1_des = np.linspace(-0.5 * np.pi, 1.5 * np.pi, 1000) 47 | x1 = circ_dia * np.cos(theta1) 48 | y1 = circ_dia * np.sin(theta1) + circ_dia 49 | x1des = circ_dia * np.cos(theta1_des) 50 | y1des = circ_dia * np.sin(theta1_des) + circ_dia 51 | 52 | theta2 = np.linspace(0.5 * np.pi, 2.5 * np.pi, npoints // 2, endpoint=False) 53 | theta2_des = np.linspace(0.5 * np.pi, 2.5 * np.pi, 1000) 54 | x2 = circ_dia * np.cos(theta2) 55 | y2 = circ_dia * np.sin(theta2) - circ_dia 56 | x2des = circ_dia * np.cos(theta2_des) 57 | y2des = circ_dia * np.sin(theta2_des) - circ_dia 58 | 59 | x_wp = np.append(x1, x2[::-1]) 60 | y_wp = np.append(y1, y2[::-1]) 61 | 62 | xdes = np.append(x1des, x2des[::-1]) 63 | ydes = np.append(y1des, y2des[::-1]) 64 | 65 | else: 66 | theta1 = np.linspace(-0.5 * np.pi, 1.5 * np.pi, npoints - npoints // 2, endpoint=True) 67 | theta1_des = np.linspace(-0.5 * np.pi, 1.5 * np.pi, 1000) 68 | x1 = circ_dia * np.cos(theta1) 69 | y1 = circ_dia * np.sin(theta1) + circ_dia 70 | x1des = circ_dia * np.cos(theta1_des) 71 | y1des = circ_dia * np.sin(theta1_des) + circ_dia 72 | 73 | theta2 = np.linspace(2.5 * np.pi, 0.5 * np.pi, npoints // 2, endpoint=False) 74 | theta2_des = np.linspace(2.5 * np.pi, 0.5 * np.pi, 1000) 75 | x2 = circ_dia * np.cos(theta2) 76 | y2 = circ_dia * np.sin(theta2) - circ_dia 77 | x2des = circ_dia * np.cos(theta2_des) 78 | y2des = circ_dia * np.sin(theta2_des) - circ_dia 79 | 80 | x_wp = np.append(x2, x1) 81 | y_wp = np.append(y2, y1) 82 | xdes = np.append(x2des, x1des[::-1]) 83 | ydes = np.append(y2des, y1des[::-1]) 84 | 85 | return npoints, x_wp, y_wp, psi0, xdes, ydes 86 | 87 | def single_wp(quadrant=1, len=10): 88 | npoints = 2 89 | psi0 = 0 90 | 91 | if quadrant == 1: 92 | x_wp = np.array([0, len]) 93 | y_wp = np.array([0, len]) 94 | elif quadrant == 2: 95 | x_wp = np.array([0, -len]) 96 | y_wp = np.array([0, len]) 97 | elif quadrant == 3: 98 | x_wp = np.array([0, -len]) 99 | y_wp = np.array([0, -len]) 100 | elif quadrant == 4: 101 | x_wp = np.array([0, len]) 102 | y_wp = np.array([0, -len]) 103 | 104 | xdes = x_wp 105 | ydes = y_wp 106 | return npoints, x_wp, y_wp, psi0, xdes, ydes 107 | 108 | # No wind condition 109 | 110 | # Single waypoint tracking in four quadrants 111 | # n, xwp, ywp, psi0, xdes, ydes = single_wp(quadrant=1) 112 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='quadrant_01', xdes=xdes, ydes=ydes) 113 | # n, xwp, ywp, psi0, xdes, ydes = single_wp(quadrant=2) 114 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='quadrant_02', xdes=xdes, ydes=ydes) 115 | # n, xwp, ywp, psi0, xdes, ydes = single_wp(quadrant=3) 116 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='quadrant_03', xdes=xdes, ydes=ydes) 117 | # n, xwp, ywp, psi0, xdes, ydes = single_wp(quadrant=4) 118 | # 
wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='quadrant_04', xdes=xdes, ydes=ydes) 119 | 120 | # Ellipse starboard turn 121 | # n, xwp, ywp, psi0, xdes, ydes = ellipse(flag=0) 122 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='ellipse_stbd', xdes=xdes, ydes=ydes) 123 | 124 | # Ellipse port turn 125 | n, xwp, ywp, psi0, xdes, ydes = ellipse(flag=1) 126 | wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='ellipse_port', xdes=xdes, ydes=ydes) 127 | 128 | # Straight line 129 | # n, xwp, ywp, psi0, xdes, ydes = straight() 130 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='straight', xdes=xdes, ydes=ydes) 131 | 132 | # # Eight bottom first 133 | # n, xwp, ywp, psi0, xdes, ydes = eight(flag=0) 134 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='eight_bottom', xdes=xdes, ydes=ydes) 135 | 136 | # # Eight top first 137 | # n, xwp, ywp, psi0, xdes, ydes = eight(flag=1) 138 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='eight_top', xdes=xdes, ydes=ydes) 139 | -------------------------------------------------------------------------------- /src/kcs/dqn_train.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy.io import savemat 3 | import matplotlib.pyplot as plt 4 | import tensorflow as tf 5 | import tf_agents 6 | from tf_agents.environments import tf_py_environment 7 | from tf_agents.specs import array_spec 8 | from tf_agents.agents.dqn import dqn_agent 9 | from tf_agents.networks import q_network 10 | from tf_agents.policies import policy_saver 11 | from tf_agents.replay_buffers import tf_uniform_replay_buffer 12 | from tf_agents.trajectories import trajectory 13 | from tf_agents.utils import common 14 | from tf_agents.trajectories import time_step as ts 15 | from tf_agents.environments import wrappers 16 | from tf_agents.policies import random_tf_policy 17 | from tensorflow.keras.layers import Dense, Input 18 | from tensorflow.keras.optimizers.schedules import ExponentialDecay, PolynomialDecay 19 | from environment import ship_environment 20 | import os 21 | import time 22 | 23 | # collecting data with random policy to populate replay buffer 24 | def collect_init_data(environment, policy, buffer): 25 | time_step = environment.reset() 26 | episode_return = 0 27 | while not np.equal(time_step.step_type, 2): 28 | 29 | action_step = policy.action(time_step) 30 | next_time_step = environment.step(action_step.action) 31 | episode_return += next_time_step.reward.numpy()[0] 32 | traj = trajectory.from_transition(time_step, action_step, next_time_step) 33 | buffer.add_batch(traj) 34 | time_step=next_time_step 35 | 36 | return episode_return 37 | 38 | # collect data through epsilon-greedy policy and train agent 39 | def collect_data(envr, policy, buffer, dataset, agent, ep_counter, params): 40 | time_step = envr.reset() 41 | episode_return = 0 42 | global timestep_counter 43 | 44 | # time_step.step_type: 0-> initial step,1->intermediate, 2-> terminal step 45 | while not np.equal(time_step.step_type, 2): 46 | 47 | action_step = policy.action(time_step) 48 | 49 | #using custom annealing epsilon-greedy policy to collect 
training data 50 | epsilon = params.epsilon_greedy(ep_counter) 51 | if np.random.random() < epsilon: 52 | action_no = np.random.randint(0, 3) 53 | action = tf.constant([action_no], shape=(1,), dtype=np.int64,name='action') 54 | action_step = tf_agents.trajectories.policy_step.PolicyStep(action) 55 | 56 | next_time_step = envr.step(action_step) 57 | 58 | traj = trajectory.from_transition(time_step, action_step, next_time_step) 59 | episode_return = next_time_step.reward.numpy()[0] + episode_return 60 | buffer.add_batch(traj) 61 | time_step = next_time_step 62 | 63 | #update q-network once every some steps 64 | if timestep_counter % params.DQN_update_time_steps == 0: 65 | iterator=iter(dataset) 66 | experience, unused_info = next(iterator) 67 | agent.train(experience) 68 | 69 | timestep_counter+=1 70 | print("EPISODE RETURN", episode_return) 71 | return episode_return, agent 72 | 73 | def dqn_train(params): 74 | mname = params.model_name 75 | if not os.path.exists('saved_models/' + mname): 76 | if not os.path.exists('saved_models'): 77 | os.mkdir('saved_models') 78 | os.mkdir('saved_models/' + mname) 79 | os.mkdir('saved_models/' + mname + '/plots') 80 | 81 | params.print_params('saved_models/' + mname) 82 | 83 | env = wrappers.TimeLimit(ship_environment(train_test_flag=0, wind_flag=params.wind_flag, 84 | wind_speed=params.wind_speed, wind_dir=params.wind_dir, 85 | wave_flag=params.wave_flag, wave_height=params.wave_height, 86 | wave_period=params.wave_period, wave_dir=params.wave_dir), 87 | duration=params.duration) 88 | tf_env = tf_py_environment.TFPyEnvironment(env) 89 | 90 | #learning rate scheduler to decay learning rate with time 91 | lr_schedule=ExponentialDecay(initial_learning_rate=params.initial_learning_rate, 92 | decay_steps=params.decay_steps, 93 | decay_rate=params.decay_rate) 94 | # lr_schedule = PolynomialDecay(initial_learning_rate=0.01, decay_steps=50000, end_learning_rate=0.001, power=5) 95 | 96 | optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule) 97 | # optimizer = tf.keras.optimizers.Adam(learning_rate=0.001) 98 | 99 | q_net = q_network.QNetwork(tf_env.observation_spec(),tf_env.action_spec(), 100 | fc_layer_params=params.fc_layer_params, 101 | activation_fn=tf.keras.activations.tanh) 102 | 103 | train_step_counter = tf.Variable(0) 104 | 105 | agent = dqn_agent.DqnAgent( 106 | tf_env.time_step_spec(), 107 | tf_env.action_spec(), 108 | q_network=q_net, 109 | optimizer=optimizer, 110 | gamma=params.discount_factor, 111 | target_update_tau=params.target_update_tau, 112 | target_update_period=params.target_update_period, 113 | td_errors_loss_fn=common.element_wise_squared_loss, 114 | train_step_counter=train_step_counter) 115 | 116 | agent.initialize() 117 | 118 | replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer( 119 | data_spec=agent.collect_data_spec, 120 | batch_size=tf_env.batch_size, 121 | max_length=params.replay_buffer_max_length) 122 | 123 | random_policy = random_tf_policy.RandomTFPolicy(tf_env.time_step_spec(),tf_env.action_spec()) 124 | 125 | #collecting samples with random policy 126 | for __ in range(10): 127 | collect_init_data(tf_env, random_policy, replay_buffer) 128 | 129 | dataset = replay_buffer.as_dataset( 130 | num_parallel_calls=params.num_parallel_calls, 131 | sample_batch_size=params.sample_batch_size, 132 | num_steps=params.num_steps).prefetch(params.prefetch) 133 | 134 | # Reset the train step 135 | agent.train_step_counter.assign(0) 136 | 137 | ep_counter=0 138 | global timestep_counter 139 | timestep_counter=0 140 | 141 | 
episodes = params.max_episodes 142 | returns = [] 143 | train_loss = [] 144 | 145 | while ep_counter < episodes: 146 | 147 | ereturn, agent = collect_data(tf_env, agent.policy, replay_buffer, dataset, agent, ep_counter, params) 148 | returns.append(ereturn) 149 | 150 | # Sample a batch of data from the buffer and update the agent's network. 151 | iterator = iter(dataset) 152 | experience, unused_info = next(iterator) 153 | eloss = agent.train(experience).loss 154 | train_loss.append(eloss) 155 | 156 | if len(train_loss) < 110: 157 | print('episode = {0}, Loss = {1}'.format(ep_counter, eloss)) 158 | else: 159 | print('episode = {0}, Avg Loss (100 ep) = {1}'.format(ep_counter, 160 | (sum(train_loss[-100:-1]) + train_loss[-1]) / 100)) 161 | 162 | ep_counter +=1 163 | 164 | # saving agent's policy at intervals 165 | if ep_counter % params.DQN_policy_store_frequency == 0 and ep_counter >= 1: 166 | policy_dir = os.path.join('saved_models', mname, mname + '_ep_' + str(ep_counter)) 167 | tf_policy_saver = policy_saver.PolicySaver(agent.policy) 168 | tf_policy_saver.save(policy_dir) 169 | 170 | # checkpoint_dir = 'saved_models/jan8_' 171 | # train_checkpointer = common.Checkpointer( 172 | # ckpt_dir=checkpoint_dir, 173 | # max_to_keep=1, 174 | # agent=agent, 175 | # policy=agent.policy, 176 | # replay_buffer=replay_buffer, 177 | # ) 178 | 179 | train_loss = np.array(train_loss) 180 | interval = params.DQN_loss_avg_interval 181 | avg_losses, avg_returns = [], [] 182 | 183 | for i in range(len(train_loss) - interval): 184 | avg_returns.append(sum(returns[i:i + interval]) / interval) 185 | avg_losses.append(sum(train_loss[i:i + interval]) / interval) 186 | 187 | mat_dict = {'returns':np.array(returns), 'loss':np.array(train_loss), 188 | 'avg_returns':np.array(avg_returns), 'avg_loss':np.array(avg_losses)} 189 | 190 | savemat('saved_models/'+mname+'/'+mname+'.mat', mat_dict) 191 | 192 | plt.figure() 193 | plt.title("Returns vs. Episodes") 194 | plt.ylabel("Returns") 195 | plt.plot(avg_returns) 196 | plt.xlabel("Episodes") 197 | plt.grid() 198 | plt.savefig('saved_models/'+mname+'/plots/'+mname+'_return.png', dpi=600) 199 | 200 | plt.figure() 201 | plt.title("Train-loss vs. 
Episodes") 202 | plt.xlabel("Episodes") 203 | plt.ylabel("Loss") 204 | plt.plot(avg_losses) 205 | plt.grid() 206 | plt.savefig('saved_models/'+mname+'/plots/'+mname+'_loss.png', dpi=600) 207 | # plt.show() 208 | 209 | if __name__ == "__main__": 210 | import hyperparams as params 211 | dqn_train(params) 212 | -------------------------------------------------------------------------------- /src/kcs/environment.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | from tf_agents.environments import py_environment 4 | from tf_agents.environments import tf_py_environment 5 | from tf_agents.specs import array_spec 6 | from tf_agents.utils import common 7 | from tf_agents.trajectories import time_step as ts 8 | from scipy.integrate import solve_ivp 9 | # import hyperparams as params 10 | import kcs 11 | 12 | 13 | class ship_environment(py_environment.PyEnvironment): 14 | 15 | def __init__(self, train_test_flag=0, 16 | wind_flag=0, 17 | wind_speed=0, 18 | wind_dir=0, 19 | wave_flag=0, 20 | wave_height=0, 21 | wave_period=0, 22 | wave_dir=0, 23 | test_x_waypoints=None, 24 | test_y_waypoints=None, 25 | nwp=None, 26 | initial_obs_state=None): 27 | 28 | self._action_spec = array_spec.BoundedArraySpec(shape=(), dtype=np.int64, minimum=0, maximum=2,name='action') 29 | self._observation_spec = array_spec.BoundedArraySpec(shape=(4,), dtype=np.float32, minimum=[-20, -np.pi, 0, -1], 30 | maximum=[20, np.pi, 50, 1], name='observation') 31 | self.episode_ended = False 32 | self.counter = 0 33 | self.x_goal = 0 34 | self.y_goal = 0 35 | self.distance = 0 36 | self.obs_state = [1, 0, 0, 0, 0, 0, 0] 37 | 38 | self.train_test_flag = train_test_flag 39 | self.wind_flag = wind_flag 40 | self.wind_speed = wind_speed 41 | self.wind_dir = wind_dir 42 | 43 | self.wave_flag = wave_flag 44 | self.wave_height = wave_height 45 | self.wave_period = wave_period 46 | self.wave_dir = wave_dir 47 | 48 | # Test waypoints 49 | self.test_x_waypoints = test_x_waypoints 50 | self.test_y_waypoints = test_y_waypoints 51 | self.nwp = nwp 52 | self.wp_counter = 1 53 | self.initial_obs_state = initial_obs_state 54 | 55 | # Trajectories of state variables 56 | # Only stored in testing 57 | self.x_traj = [] 58 | self.y_traj = [] 59 | self.psi_traj = [] 60 | self.u_traj = [] 61 | self.v_traj = [] 62 | self.r_traj = [] 63 | self.delta_traj = [] 64 | 65 | # Trajectories of observation states 66 | # Only stored in testing 67 | self.cross_trk_err_traj = [] 68 | self.course_ang_err_traj =[] 69 | self.distance_traj = [] 70 | 71 | # Trajectories of action 72 | # Only stored in testing 73 | self.action_traj = [] 74 | 75 | # Reward in trajectory 76 | # Only stored in testing 77 | self.total_reward = [] 78 | self.reward_01 = [] # Cross track error 79 | self.reward_02 = [] # Course angle error 80 | self.reward_03 = [] # Distance to goal 81 | 82 | def action_spec(self): 83 | return self._action_spec 84 | 85 | def observation_spec(self): 86 | return self._observation_spec 87 | 88 | def _step(self, action_no): 89 | 90 | # Action selection 91 | action_set = [-35*np.pi/180,0,35*np.pi/180] 92 | delta_c = action_set[action_no] 93 | if self.train_test_flag == 1: 94 | self.action_traj.append(delta_c) 95 | 96 | tspan = (0, 0.3) 97 | yinit = self.obs_state 98 | 99 | sol = solve_ivp(lambda t,v: kcs.KCS_ode(t,v,delta_c, wind_flag=self.wind_flag, 100 | wind_speed=self.wind_speed, wind_dir=self.wind_dir, 101 | wave_flag=self.wave_flag, wave_height=self.wave_height, 102 | 
wave_period=self.wave_period, wave_dir=self.wave_dir), 103 | tspan, yinit, t_eval=tspan, dense_output=True) 104 | 105 | # sol.y outputs 0)surge vel 1) sway vel 2) yaw vel 3) x_coord 4) y_coord 5) psi 6) delta 106 | 107 | u = sol.y[0][-1] 108 | v = sol.y[1][-1] 109 | r = sol.y[2][-1] 110 | x = sol.y[3][-1] 111 | y = sol.y[4][-1] 112 | psi_rad = sol.y[5][-1] # psi 113 | psi = psi_rad % (2 * np.pi) 114 | delta = sol.y[6][-1] 115 | 116 | self.obs_state[0] = sol.y[0][-1] # Surge velocity 117 | self.obs_state[1] = sol.y[1][-1] # Sway velocity 118 | self.obs_state[2] = sol.y[2][-1] # Yaw velocity 119 | self.obs_state[3] = sol.y[3][-1] # X cooridnate 120 | self.obs_state[4] = sol.y[4][-1] # Y coordinate 121 | self.obs_state[5] = psi # Heading angle 122 | self.obs_state[6] = sol.y[6][-1] # Actual rudder angle 123 | 124 | if self.train_test_flag == 1: 125 | self.x_traj.append(x) 126 | self.y_traj.append(y) 127 | self.psi_traj.append(psi_rad) 128 | self.u_traj.append(u) 129 | self.v_traj.append(v) 130 | self.r_traj.append(r) 131 | self.delta_traj.append(delta) 132 | 133 | x_init = 0 134 | y_init = 0 135 | x_goal = self.x_goal 136 | y_goal = self.y_goal 137 | 138 | if self.train_test_flag == 1: 139 | x_init = self.test_x_waypoints[self.wp_counter-1] 140 | y_init = self.test_y_waypoints[self.wp_counter-1] 141 | x_goal = self.test_x_waypoints[self.wp_counter] 142 | y_goal = self.test_y_waypoints[self.wp_counter] 143 | 144 | # DISTANCE TO GOAL 145 | distance = ((x - x_goal) ** 2 + (y - y_goal) ** 2) ** 0.5 146 | self.distance = distance 147 | if self.train_test_flag == 1: 148 | self.distance_traj.append(distance) 149 | 150 | # CROSS TRACK ERROR 151 | vec1 = np.array([x_goal - x_init, y_goal - y_init]) 152 | vec2 = np.array([x_goal - x, y_goal - y]) 153 | vec1_hat = vec1 / np.linalg.norm(vec1) 154 | cross_track_error = np.cross(vec2, vec1_hat) 155 | 156 | if self.train_test_flag == 1: 157 | self.cross_trk_err_traj.append(cross_track_error) 158 | 159 | # COURSE ANGLE ERROR 160 | x_dot = u * np.cos(psi) - v * np.sin(psi) 161 | y_dot = u * np.sin(psi) + v * np.cos(psi) 162 | 163 | Uvec = np.array([x_dot, y_dot]) 164 | Uvec_hat = Uvec / np.linalg.norm(Uvec) 165 | vec2_hat = vec2 / np.linalg.norm(vec2) 166 | 167 | course_angle = np.arctan2(Uvec[1], Uvec[0]) 168 | psi_vec2 = np.arctan2(vec2[1], vec2[0]) 169 | 170 | course_angle_err = course_angle - psi_vec2 171 | course_angle_err = (course_angle_err + np.pi) % (2 * np.pi) - np.pi 172 | 173 | if self.train_test_flag == 1: 174 | self.course_ang_err_traj.append(course_angle_err) 175 | 176 | # REWARD FUNCTIONS 177 | R1 = 2 * np.exp(-0.08 * cross_track_error ** 2) - 1 178 | R2 = 1.3 * np.exp(-10 * (abs(course_angle_err))) - 0.3 179 | R3 = -distance * 0.25 180 | reward = R1 + R2 + R3 181 | 182 | if self.train_test_flag == 1: 183 | self.reward_01.append(R1) 184 | self.reward_02.append(R2) 185 | self.reward_03.append(R3) 186 | self.total_reward.append(reward) 187 | 188 | observation = [cross_track_error, course_angle_err, distance, r] 189 | 190 | # DESTINATION CHECK 191 | if abs(distance) <= 0.5: 192 | if self.train_test_flag == 1: 193 | print(f"Destination {self.wp_counter} reached") 194 | if self.wp_counter == self.nwp - 1: 195 | return ts.termination(np.array(observation, dtype=np.float32), reward) 196 | self.wp_counter += 1 197 | 198 | else: 199 | reward = 100 200 | self.episode_ended = True 201 | print('Destination reached') 202 | return ts.termination(np.array(observation, dtype=np.float32), reward) 203 | 204 | # TERMINATION CHECK 205 | angle_btw23 = 
np.arccos(np.dot(vec2_hat, Uvec_hat)) 206 | angle_btw12 = np.arccos(np.dot(vec1_hat, vec2_hat)) 207 | if angle_btw12 > np.pi / 2 and angle_btw23 > np.pi / 2: 208 | self.episode_ended = True 209 | if self.train_test_flag == 1: 210 | print(f"Destination {self.wp_counter} missed") 211 | return ts.termination(np.array(observation, dtype=np.float32), reward) 212 | else: 213 | return ts.transition(np.array(observation, dtype=np.float32), reward) 214 | 215 | def _reset(self): 216 | 217 | # print("Next episode") 218 | 219 | if self.train_test_flag == 0: 220 | self.obs_state = [1, 0, 0, 0, 0, 0, 0] 221 | 222 | radius = np.random.randint(8, 28) 223 | random_theta = 2 * np.pi * np.random.random() 224 | 225 | self.x_goal = radius * np.cos(random_theta) 226 | self.y_goal = radius * np.sin(random_theta) 227 | 228 | self.episode_ended = False 229 | 230 | print("GOAL COORD", self.x_goal, self.y_goal) 231 | x_goal = self.x_goal 232 | y_goal = self.y_goal 233 | 234 | x_dot = 1 235 | y_dot = 0 236 | 237 | vec2 = np.array([x_goal, y_goal]) 238 | Uvec = np.array([x_dot, y_dot]) 239 | 240 | course_angle = np.arctan2(Uvec[1], Uvec[0]) 241 | psi_vec2 = np.arctan2(vec2[1], vec2[0]) 242 | course_angle_err = course_angle - psi_vec2 243 | course_angle_err = (course_angle_err + np.pi) % (2 * np.pi) - np.pi 244 | 245 | self.counter = self.counter + 1 246 | observation = np.array([0, course_angle_err, radius, 0], dtype=np.float32) 247 | 248 | else: 249 | self.obs_state = self.initial_obs_state 250 | 251 | self.x_goal = self.test_x_waypoints[1] 252 | self.y_goal = self.test_y_waypoints[1] 253 | 254 | x = self.test_x_waypoints[0] 255 | y = self.test_y_waypoints[0] 256 | 257 | print("GOAL COORD", self.x_goal, self.y_goal) 258 | x_goal = self.x_goal 259 | y_goal = self.y_goal 260 | 261 | u = self.initial_obs_state[0] 262 | v = self.initial_obs_state[1] 263 | psi = self.initial_obs_state[5] 264 | x_dot = u * np.cos(psi) - v * np.sin(psi) 265 | y_dot = u * np.sin(psi) + v * np.cos(psi) 266 | 267 | vec2 = np.array([x_goal, y_goal]) 268 | Uvec = np.array([x_dot, y_dot]) 269 | 270 | course_angle = np.arctan2(Uvec[1], Uvec[0]) 271 | psi_vec2 = np.arctan2(vec2[1], vec2[0]) 272 | course_angle_err = course_angle - psi_vec2 273 | course_angle_err = (course_angle_err + np.pi) % (2 * np.pi) - np.pi 274 | 275 | dist_to_goal = np.sqrt((x_goal - x) ** 2 + (y_goal - y) ** 2) 276 | 277 | observation = np.array([0, course_angle_err, dist_to_goal, 0], dtype=np.float32) 278 | 279 | 280 | return ts.restart(observation) 281 | 282 | 283 | # ENDS HERE 284 | -------------------------------------------------------------------------------- /src/kcs/hyperparams.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import os 3 | 4 | # Wind Parameters 5 | wind_flag = 0 6 | wind_speed = 0 7 | wind_dir = 0 8 | 9 | # Wave Parameters 10 | wave_flag = 0 11 | wave_height = 0 12 | wave_period = 0 13 | wave_dir = 0 14 | 15 | # TRAINING HYPERPARAMETERS 16 | 17 | model_name = 'model_001' 18 | duration = 160 19 | 20 | # Learning rate parameters - Exponential decay 21 | initial_learning_rate = 0.005 22 | decay_steps = 5000 23 | decay_rate = 0.8 24 | 25 | fc_layer_params = (64,64) 26 | discount_factor = 0.95 27 | target_update_tau = 0.01 28 | target_update_period = 1 29 | 30 | replay_buffer_max_length = 100000 31 | num_parallel_calls = 2 32 | sample_batch_size = 128 33 | num_steps = 2 34 | prefetch = 3 35 | 36 | max_episodes = 8001 37 | epsilon_greedy_episodes = 4000 38 | random_seed = 12345 39 | 40 | 
DQN_update_time_steps = 20 # Updates DQN parameters every these many time steps 41 | DQN_policy_store_frequency = 1000 # Stores DQN policy every these many episodes 42 | DQN_loss_avg_interval = 100 # Computes DQN loss and returns by averaging over these many episodes 43 | 44 | def epsilon_greedy(ep_counter): 45 | # Epsilon greedy algorithm to balance exploration and exploitation 46 | return np.maximum(1 - (ep_counter /epsilon_greedy_episodes), 0) 47 | 48 | def print_params(path): 49 | fid = open(os.path.join(path,'parameters.txt'),'w') 50 | fid.write(f'model_name:{model_name}\nduration:{duration}\n\n') 51 | fid.write(f'initial_learning_rate:{initial_learning_rate}\ndecay_steps:{decay_steps}\ndecay_rate:{decay_rate}\n\n') 52 | fid.write(f'fc_layer_params:{fc_layer_params}\ndiscount_factor:{discount_factor}\ntarget_update_tau:{target_update_tau}\ntarget_update_period:{target_update_period}\n\n') 53 | fid.write(f'replay_buffer_max_length:{replay_buffer_max_length}\nnum_parallel_calls:{num_parallel_calls}\nsample_batch_size:{sample_batch_size}\nnum_steps:{num_steps}\nprefetch:{prefetch}\n\n') 54 | fid.write(f'max_episodes:{max_episodes}\nepsilon_greedy_episodes:{epsilon_greedy_episodes}\nrandom_seed:{random_seed}\n\n') 55 | fid.write(f'DQN_update_time_steps:{DQN_update_time_steps}\nDQN_policy_store_frequency:{DQN_policy_store_frequency}\nDQN_loss_avg_interval:{DQN_loss_avg_interval}\n') 56 | fid.close() 57 | 58 | -------------------------------------------------------------------------------- /src/kcs/kcs.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | # Ship Geometry and Constants 4 | L = 230 5 | B = 32.2 6 | d_em = 10.8 7 | rho = 1025 8 | g = 9.80665 9 | 10 | Cb = 0.651 11 | Dsp = Cb * L * B * d_em 12 | Fn = 0.26 13 | U_des = Fn * np.sqrt(g * L) 14 | xG = -3.404 15 | kzzp = 0.25 16 | 17 | # Surge Hydrodynamic Derivatives in non-dimensional form 18 | X0 = -0.0167 19 | Xbb = -0.0549 20 | Xbr_minus_my = -0.1084 21 | Xrr = -0.0120 22 | Xbbbb = -0.0417 23 | 24 | # Sway Hydrodynamic Derivatives in non-dimensional form 25 | Yb = 0.2252 26 | Yr_minus_mx = 0.0398 27 | Ybbb = 1.7179 28 | Ybbr = -0.4832 29 | Ybrr = 0.8341 30 | Yrrr = -0.0050 31 | 32 | # Yaw Hydrodynamic Derivatives in non-dimensional form 33 | Nb = 0.1111 34 | Nr = -0.0465 35 | Nbbb = 0.1752 36 | Nbbr = -0.6168 37 | Nbrr = 0.0512 38 | Nrrr = -0.0387 39 | 40 | n_prop = 115.5 / 60 41 | Dp = 7.9 42 | wp = 1 - 0.645 # Effective Wake Fraction of the Propeller 43 | tp = 1 - 0.793 # Thrust Deduction Factor 44 | 45 | eps = 0.956 46 | eta = 0.7979 47 | kappa = 0.633 48 | xp_P = -0.4565 # Assuming propeller location is 10 m ahead of AP (Rudder Location) 49 | xp_R = -0.5 50 | 51 | mp = Dsp / (0.5 * (L ** 2) * d_em) 52 | xGp = xG / L 53 | # Added Mass and Mass Moment of Inertia (from MDLHydroD) 54 | mxp = 1790.85 / (0.5 * (L ** 2) * d_em) 55 | myp = 44324.18 / (0.5 * (L ** 2) * d_em) 56 | Jzzp = 140067300 / (0.5 * (L ** 4) * d_em) 57 | Izzp = mp * (kzzp ** 2) + mp * (xGp ** 2) 58 | 59 | a0 = 0.5228 60 | a1 = -0.4390 61 | a2 = -0.0609 62 | 63 | tR = 1 - 0.742 64 | aH = 0.361 65 | xp_H = -0.436 66 | 67 | A_R = L * d_em / 54.86 68 | Lamda = 2.164 69 | f_alp = 6.13 * Lamda / (2.25 + Lamda) 70 | 71 | 72 | def KCS_ode(t, v, delta_c, wind_flag=0, wind_speed=0, wind_dir=0, 73 | wave_flag=0, wave_height=0, wave_period=0, wave_dir=0): 74 | 75 | # Nondimensional State Space Variables 76 | up = v[0] 77 | vp = v[1] 78 | rp = v[2] 79 | xp = v[3] 80 | yp = v[4] 81 | psi = v[5] 82 | delta = v[6] 
83 | # n_prop = v[7] 84 | 85 | # Derived kinematic variables 86 | b = np.arctan2(-vp, up) # Drift angle 87 | 88 | # ---------------------------------------------------- 89 | # Hull Force Calculation 90 | # ---------------------------------------------------- 91 | 92 | # Non-dimensional Surge Hull Hydrodynamic Force 93 | Xp_H = X0 * (up ** 2) \ 94 | + Xbb * (b ** 2) + Xbr_minus_my * b * rp \ 95 | + Xrr * (rp ** 2) + Xbbbb * (b ** 4) 96 | 97 | # Non-dimensional Sway Hull Hydrodynamic Force 98 | Yp_H = Yb * b + Yr_minus_mx * rp + Ybbb * (b ** 3) \ 99 | + Ybbr * (b ** 2) * rp + Ybrr * b * (rp ** 2) \ 100 | + Yrrr * (rp ** 3) 101 | 102 | # Non-dimensional Yaw Hull Hydrodynamic Moment 103 | Np_H = Nb * b + Nr * rp + Nbbb * (b ** 3) \ 104 | + Nbbr * (b ** 2) * rp + Nbrr * b * (rp ** 2) \ 105 | + Nrrr * (rp ** 3) 106 | 107 | # ---------------------------------------------------- 108 | # Propulsion Force Calculation 109 | # ---------------------------------------------------- 110 | # The value self propulsion RPM is taken from Yoshimura's SIMMAN study 111 | # Analysis of steady hydrodynamic force components and prediction of 112 | # manoeuvering ship motion with KVLCC1, KVLCC2 and KCS 113 | 114 | J = (up * U_des) * (1 - wp) / (n_prop * Dp) # Advance Coefficient 115 | 116 | Kt = a0 + a1 * J + a2 * (J ** 2) # Thrust Coefficient 117 | 118 | # Dimensional Propulsion Force 119 | X_P = (1 - tp) * rho * Kt * (Dp ** 4) * (n_prop ** 2) 120 | 121 | # Non-dimensional Propulsion Force 122 | Xp_P = X_P / (0.5 * rho * L * d_em * (U_des ** 2)) 123 | 124 | # ---------------------------------------------------- 125 | # Rudder Force Calculation 126 | # ---------------------------------------------------- 127 | 128 | b_p = b - xp_P * rp 129 | 130 | if b_p > 0: 131 | gamma_R = 0.492 132 | else: 133 | gamma_R = 0.338 134 | 135 | lp_R = -0.755 136 | 137 | up_R = eps * (1 - wp) * up * np.sqrt(eta * (1 + kappa * (np.sqrt(1 + 8 * Kt / (np.pi * (J ** 2))) - 1)) ** 2 + (1 - eta)) 138 | 139 | vp_R = gamma_R * (vp + rp * lp_R) 140 | 141 | Up_R = np.sqrt(up_R ** 2 + vp_R ** 2) 142 | alpha_R = delta - np.arctan2(-vp_R, up_R) 143 | 144 | F_N = A_R / (L * d_em) * f_alp * (Up_R ** 2) * np.sin(alpha_R) 145 | 146 | Xp_R = - (1 - tR) * F_N * np.sin(delta) 147 | Yp_R = - (1 + aH) * F_N * np.cos(delta) 148 | Np_R = - (xp_R + aH * xp_H) * F_N * np.cos(delta) 149 | 150 | # ---------------------------------------------------- 151 | # Coriolis terms 152 | # ---------------------------------------------------- 153 | 154 | mp = Dsp / (0.5 * (L ** 2) * d_em) 155 | xGp = xG / L 156 | 157 | Xp_C = mp * vp * rp + mp * xGp * (rp ** 2) 158 | Yp_C = -mp * up * rp 159 | Np_C = -mp * xGp * up * rp 160 | 161 | # ---------------------------------------------------- 162 | # Wind Force Calculation 163 | # ---------------------------------------------------- 164 | 165 | if wind_flag == 1: 166 | Vw = wind_speed # wind speed 167 | betaw = wind_dir * (np.pi/180) # wind direction 168 | Lp = 3.0464 169 | de = 0.1430 170 | uw = Vw * np.cos(betaw - psi) 171 | vw = Vw * np.sin(betaw - psi) 172 | urw = up - uw 173 | vrw = vp - vw 174 | Uwr = (urw ** 2 + vrw ** 2) ** 0.5 175 | gammaw = np.arctan2(-vrw, -urw) 176 | # print(gammaw,"gamma") 177 | 178 | rhow = 1025 179 | rhoa = 1.225 180 | Ax = (0.4265 * (0.2517 - 0.1430)) 181 | Ay = ((0.2517 - 0.1430) * Lp) 182 | # print(Ax,Ay,"AX") 183 | 184 | Cwx = 1 * np.cos(gammaw) 185 | Cwy = 1 * np.sin(gammaw) 186 | Cwpsi = 0.5 * np.sin(gammaw) 187 | 188 | Xp_W = (Ax * Cwx * Uwr * abs(Uwr)) * rhoa / (Lp * de * rhow) 189 | Yp_W = 
(Ay * Cwy * Uwr * abs(Uwr)) * rhoa / (Lp * de * rhow) 190 | Np_W = (Ay * Lp * Cwpsi * Uwr * abs(Uwr)) * rhoa / ((Lp ** 2) * de * rhow) 191 | else: 192 | Xp_W = 0.0 193 | Yp_W = 0.0 194 | Np_W = 0.0 195 | 196 | # Net non-dimensional force and moment computation 197 | Xp = Xp_H + Xp_R + Xp_C + Xp_W + Xp_P 198 | Yp = Yp_H + Yp_R + Yp_C + Yp_W 199 | Np = Np_H + Np_R + Np_C + Np_W 200 | 201 | # Net force vector computation in Abkowitz non-dimensionalization 202 | X = Xp 203 | Y = Yp 204 | N = Np 205 | 206 | # Added Mass and Mass Moment of Inertia (from MDLHydroD) 207 | mxp = 1790.85 / (0.5 * (L ** 2) * d_em) 208 | myp = 44324.18 / (0.5 * (L ** 2) * d_em) 209 | Jzzp = 140067300 / (0.5 * (L ** 4) * d_em) 210 | Izzp = mp * (kzzp ** 2) + mp * (xGp ** 2) 211 | 212 | Mmat = np.zeros((3, 3)) 213 | 214 | Mmat[0, 0] = mp + mxp 215 | Mmat[1, 1] = mp + myp 216 | Mmat[2, 2] = Izzp + Jzzp 217 | Mmat[1, 2] = mp * xGp 218 | Mmat[2, 1] = mp * xGp 219 | 220 | Mmatinv = np.linalg.inv(Mmat) 221 | 222 | tau = np.array([X, Y, N]) 223 | 224 | vel_der = Mmatinv @ tau 225 | 226 | # Derivative of state vector 227 | vd = np.zeros(7) 228 | 229 | vd[0:3] = vel_der 230 | vd[3] = up * np.cos(psi) - vp * np.sin(psi) 231 | vd[4] = up * np.sin(psi) + vp * np.cos(psi) 232 | vd[5] = rp 233 | 234 | # # Commanded Rudder Angle 235 | # delta_c = KCS_rudder_angle(t, v) 236 | 237 | T_rud = 0.1 # Corresponds to a time constant of 0.1 * L / U_des = 2 seconds 238 | deltad = (delta_c - delta) / T_rud 239 | 240 | deltad_max = 5 * np.pi / 180 * (L / U_des) # Maximum rudder rate of 5 degrees per second 241 | 242 | # Rudder rate saturation 243 | if np.abs(deltad) > deltad_max: 244 | deltad = np.sign(deltad) * deltad_max 245 | 246 | vd[6] = deltad 247 | 248 | return vd -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/model_012.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarineAutonomy/Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation/e39720cdf759dd4a3a8bb2664d4375ab7352ed44/src/kcs/saved_models/model_012/model_012.mat -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/model_012_ep_7000/policy_specs.pbtxt: -------------------------------------------------------------------------------- 1 | dict_value { 2 | fields { 3 | key: "collect_data_spec" 4 | value { 5 | named_tuple_value { 6 | name: "Trajectory" 7 | values { 8 | key: "step_type" 9 | value { 10 | tensor_spec_value { 11 | name: "step_type" 12 | shape { 13 | } 14 | dtype: DT_INT32 15 | } 16 | } 17 | } 18 | values { 19 | key: "observation" 20 | value { 21 | bounded_tensor_spec_value { 22 | name: "observation" 23 | shape { 24 | dim { 25 | size: 4 26 | } 27 | } 28 | dtype: DT_FLOAT 29 | minimum { 30 | dtype: DT_FLOAT 31 | tensor_shape { 32 | dim { 33 | size: 4 34 | } 35 | } 36 | tensor_content: "\000\000\240\301\333\017I\300\000\000\000\000\000\000\200\277" 37 | } 38 | maximum { 39 | dtype: DT_FLOAT 40 | tensor_shape { 41 | dim { 42 | size: 4 43 | } 44 | } 45 | tensor_content: "\000\000\240A\333\017I@\000\000HB\000\000\200?" 
46 | } 47 | } 48 | } 49 | } 50 | values { 51 | key: "action" 52 | value { 53 | bounded_tensor_spec_value { 54 | name: "action" 55 | shape { 56 | } 57 | dtype: DT_INT64 58 | minimum { 59 | dtype: DT_INT64 60 | tensor_shape { 61 | } 62 | int64_val: 0 63 | } 64 | maximum { 65 | dtype: DT_INT64 66 | tensor_shape { 67 | } 68 | int64_val: 2 69 | } 70 | } 71 | } 72 | } 73 | values { 74 | key: "policy_info" 75 | value { 76 | tuple_value { 77 | } 78 | } 79 | } 80 | values { 81 | key: "next_step_type" 82 | value { 83 | tensor_spec_value { 84 | name: "step_type" 85 | shape { 86 | } 87 | dtype: DT_INT32 88 | } 89 | } 90 | } 91 | values { 92 | key: "reward" 93 | value { 94 | tensor_spec_value { 95 | name: "reward" 96 | shape { 97 | } 98 | dtype: DT_FLOAT 99 | } 100 | } 101 | } 102 | values { 103 | key: "discount" 104 | value { 105 | bounded_tensor_spec_value { 106 | name: "discount" 107 | shape { 108 | } 109 | dtype: DT_FLOAT 110 | minimum { 111 | dtype: DT_FLOAT 112 | tensor_shape { 113 | } 114 | float_val: 0.0 115 | } 116 | maximum { 117 | dtype: DT_FLOAT 118 | tensor_shape { 119 | } 120 | float_val: 1.0 121 | } 122 | } 123 | } 124 | } 125 | } 126 | } 127 | } 128 | fields { 129 | key: "policy_state_spec" 130 | value { 131 | tuple_value { 132 | } 133 | } 134 | } 135 | } 136 | -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/model_012_ep_7000/saved_model.pb: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarineAutonomy/Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation/e39720cdf759dd4a3a8bb2664d4375ab7352ed44/src/kcs/saved_models/model_012/model_012_ep_7000/saved_model.pb -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/model_012_ep_7000/variables/variables.data-00000-of-00001: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarineAutonomy/Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation/e39720cdf759dd4a3a8bb2664d4375ab7352ed44/src/kcs/saved_models/model_012/model_012_ep_7000/variables/variables.data-00000-of-00001 -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/model_012_ep_7000/variables/variables.index: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarineAutonomy/Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation/e39720cdf759dd4a3a8bb2664d4375ab7352ed44/src/kcs/saved_models/model_012/model_012_ep_7000/variables/variables.index -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/parameters.txt: -------------------------------------------------------------------------------- 1 | model_name:model_012 2 | duration:160 3 | 4 | initial_learning_rate:0.001 5 | decay_steps:5000 6 | decay_rate:0.5 7 | 8 | fc_layer_params:(64, 64) 9 | discount_factor:0.95 10 | target_update_tau:0.01 11 | target_update_period:1 12 | 13 | replay_buffer_max_length:100000 14 | num_parallel_calls:2 15 | sample_batch_size:128 16 | num_steps:2 17 | prefetch:3 18 | 19 | max_episodes:10001 20 | epsilon_greedy_episodes:5000 21 | random_seed:69574 22 | 23 | DQN_update_time_steps:20 24 | DQN_policy_store_frequency:1000 25 | DQN_loss_avg_interval:100 26 | -------------------------------------------------------------------------------- 
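
The parameters above describe the one pretrained policy that ships with the repository (model_012, with a policy snapshot saved at episode 7000). As a sketch, and assuming the container workdir from the README, this policy can be evaluated with the existing test script by pointing it at the bundled model:

```commandline
cd kcs
# the plots folder may not exist in a fresh clone, so create it first
mkdir -p saved_models/model_012/plots
# edit dqn_test.py so that model_name = 'model_012', then run
python3 dqn_test.py
```

test_wp_track.py (below) then loads every policy snapshot found under saved_models/model_012, simulates the selected trajectory, saves the trajectory plots into saved_models/model_012/plots and prints the cross-track-error RMSE.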
/src/kcs/test_wp_track.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | from tf_agents.environments import py_environment 5 | from tf_agents.environments import tf_py_environment 6 | from tf_agents.specs import array_spec 7 | from tf_agents.trajectories import time_step as ts 8 | from tf_agents.environments import wrappers 9 | from scipy.integrate import solve_ivp 10 | import os 11 | from environment import ship_environment 12 | import re 13 | 14 | def wp_track(model_name, wind_flag=0, wind_speed=0,wind_dir=0, 15 | wave_flag=0, wave_height=0, wave_period=0, wave_dir=0, 16 | npoints=None, x_wp=None, y_wp=None, psi0=None,traj_str='', 17 | xdes=None, ydes=None): 18 | 19 | dir_list = os.listdir('saved_models/' + model_name) 20 | regex = re.compile(model_name + '_ep_') 21 | new_dir_list = list(filter(regex.match, dir_list)) 22 | 23 | for policy_name in new_dir_list: 24 | 25 | env = wrappers.TimeLimit(ship_environment(train_test_flag=1, 26 | wind_flag=wind_flag, wind_speed=wind_speed, wind_dir=wind_dir, 27 | wave_flag=wave_flag, wave_height=wave_height, 28 | wave_period=wave_period, wave_dir=wave_dir, 29 | test_x_waypoints=x_wp, 30 | test_y_waypoints=y_wp, 31 | nwp=npoints, 32 | initial_obs_state=[1, 0, 0, x_wp[0], y_wp[0], psi0, 0]), duration=500) 33 | 34 | tf_env = tf_py_environment.TFPyEnvironment(env) 35 | 36 | train_step_counter = tf.Variable(0) 37 | 38 | # Reset the train step 39 | episodes = 1 40 | 41 | saved_policy = tf.compat.v2.saved_model.load('saved_models/'+ model_name + '/' + policy_name) 42 | 43 | for _ in range(episodes): 44 | 45 | time_step = tf_env.reset() 46 | # time = 0 47 | 48 | while not np.equal(time_step.step_type, 2): 49 | action_step = saved_policy.action(time_step) 50 | time_step = tf_env.step(action_step.action) 51 | 52 | plt.figure() 53 | X = env.x_traj 54 | Y = env.y_traj 55 | Psi = env.psi_traj 56 | x_ship = np.array([-0.5, -0.5, 0.25, 0.5, 0.25, -0.5, -0.5, 0.5, 0.25, 0, 0]) 57 | y_ship = 16.1 / 230 * np.array([-1, 1, 1, 0, -1, -1, 0, 0, 1, 1, -1]) 58 | 59 | plt.plot(X,Y) 60 | plt.plot(x_wp[0], y_wp[0], 'ro') 61 | 62 | # Waypoints 63 | x_wp = env.test_x_waypoints 64 | y_wp = env.test_y_waypoints 65 | 66 | plt.plot(xdes,ydes) 67 | plt.scatter(x_wp,y_wp) 68 | 69 | # Plot ship geometry on path 70 | m = 9 71 | for i in range(1,m): 72 | m_indx = i * (len(X) // m) 73 | psi_new = Psi[m_indx] 74 | x_new_ship = X[m_indx] + x_ship * np.cos(psi_new) - y_ship * np.sin(psi_new) 75 | y_new_ship = Y[m_indx] + x_ship * np.sin(psi_new) + y_ship * np.cos(psi_new) 76 | plt.plot(x_new_ship, y_new_ship, 'r') 77 | 78 | plt.xlabel("X/L") 79 | plt.ylabel("Y/L") 80 | plt.axis('equal') 81 | plt.legend(["Path Followed","Start Waypoint","Predefined Path"],loc="center") 82 | ax = plt.gca() 83 | ax.invert_yaxis() 84 | plt.grid(linewidth = 1) 85 | if wind_flag == 0 and wave_flag == 0: 86 | plt.title(f'Circle') 87 | plt_str = f'saved_models/{model_name}/plots/{traj_str}_{policy_name}.svg' 88 | plt.savefig(plt_str, dpi=600) 89 | elif wind_flag == 1: 90 | plt.title(f'V_w={wind_speed} \\beta_w={wind_dir}') 91 | plt_str = f'saved_models/{model_name}/plots/{traj_str}_{policy_name}_WS{wind_speed}_WD{wind_dir}.svg' 92 | plt.savefig(plt_str, dpi=600) 93 | 94 | print("rmse:" , (np.sqrt(np.mean(np.square(env.cross_trk_err_traj))))) 95 | 96 | 97 | -------------------------------------------------------------------------------- /src/requirements_docker.txt: 
-------------------------------------------------------------------------------- 1 | absl-py==1.0.0 2 | astunparse==1.6.3 3 | cachetools==5.1.0 4 | certifi==2019.11.28 5 | chardet==3.0.4 6 | cloudpickle==2.1.0 7 | cycler==0.11.0 8 | dbus-python==1.2.16 9 | decorator==5.1.1 10 | dm-tree==0.1.7 11 | flatbuffers==1.12 12 | fonttools==4.36.0 13 | gast==0.4.0 14 | gin-config==0.5.0 15 | google-auth==2.6.6 16 | google-auth-oauthlib==0.4.6 17 | google-pasta==0.2.0 18 | grpcio==1.46.3 19 | gym==0.23.0 20 | gym-notices==0.0.8 21 | h5py==3.6.0 22 | idna==2.8 23 | importlib-metadata==4.11.4 24 | keras==2.9.0 25 | Keras-Preprocessing==1.1.2 26 | kiwisolver==1.4.4 27 | libclang==14.0.1 28 | Markdown==3.3.7 29 | matplotlib==3.5.3 30 | numpy==1.22.4 31 | oauthlib==3.2.0 32 | opt-einsum==3.3.0 33 | packaging==21.3 34 | Pillow==9.2.0 35 | protobuf==3.19.4 36 | pyasn1==0.4.8 37 | pyasn1-modules==0.2.8 38 | pygame==2.1.0 39 | PyGObject==3.36.0 40 | pyparsing==3.0.9 41 | python-apt==2.0.0+ubuntu0.20.4.7 42 | python-dateutil==2.8.2 43 | requests==2.22.0 44 | requests-oauthlib==1.3.1 45 | requests-unixsocket==0.2.0 46 | rsa==4.8 47 | scipy==1.9.0 48 | six==1.14.0 49 | tensorboard==2.9.0 50 | tensorboard-data-server==0.6.1 51 | tensorboard-plugin-wit==1.8.1 52 | tensorflow==2.9.1 53 | tensorflow-estimator==2.9.0 54 | tensorflow-io-gcs-filesystem==0.26.0 55 | tensorflow-probability==0.17.0 56 | termcolor==1.1.0 57 | tf-agents==0.13.0 58 | typing_extensions==4.2.0 59 | urllib3==1.25.8 60 | Werkzeug==2.1.2 61 | wrapt==1.14.1 62 | zipp==3.8.0 63 | -------------------------------------------------------------------------------- /src/requirements_tf_260.txt: -------------------------------------------------------------------------------- 1 | tf-agents==0.10.0 2 | scipy==1.4.1 3 | gym==0.24.1 4 | matplotlib==3.3.4 5 | numpy==1.19.5 6 | tensorflow-probability==0.14.0 7 | -------------------------------------------------------------------------------- /src/requirements_tf_291.txt: -------------------------------------------------------------------------------- 1 | tf-agents==0.13.0 2 | scipy==1.9.0 3 | gym==0.23.0 4 | matplotlib==3.5.3 5 | numpy==1.22.4 6 | tensorflow-probability==0.17.0 7 | -------------------------------------------------------------------------------- /src/run_docker.sh: -------------------------------------------------------------------------------- 1 | docker run -it --rm --gpus all --name dqn_asv \ 2 | --net=host \ 3 | --env="DISPLAY" \ 4 | --workdir="/home/docker/DQN-ASV-path-follow/src" \ 5 | --volume=$(pwd):"/home/docker/DQN-ASV-path-follow/src" \ 6 | --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \ 7 | abhilashiit/dqn_asv:2.0 bash 8 | --------------------------------------------------------------------------------