├── LICENSE
├── README.md
└── src
    ├── Dockerfile
    ├── docker.sh
    ├── kcs
    │   ├── batch_train.py
    │   ├── dqn_test.py
    │   ├── dqn_train.py
    │   ├── environment.py
    │   ├── hyperparams.py
    │   ├── kcs.py
    │   ├── saved_models
    │   │   └── model_012
    │   │       ├── model_012.mat
    │   │       ├── model_012_ep_7000
    │   │       │   ├── policy_specs.pbtxt
    │   │       │   ├── saved_model.pb
    │   │       │   └── variables
    │   │       │       ├── variables.data-00000-of-00001
    │   │       │       └── variables.index
    │   │       └── parameters.txt
    │   └── test_wp_track.py
    ├── requirements_docker.txt
    ├── requirements_tf_260.txt
    ├── requirements_tf_291.txt
    └── run_docker.sh

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 Marine Autonomous Robotic Systems (MARS) Laboratory

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation

Achieves path following of an autonomous surface vessel (ASV) with a Deep Q-Network (DQN) agent. This code uses reinforcement learning to train an agent to control the rudder angle of the Kriso Container Ship (KCS) to achieve waypoint tracking in calm waters and in the presence of wind.

# Why is the project useful?
A majority of marine accidents can be attributed to errors in human decision-making. Through automation, the occurrence of such incidents can be minimized, and automation in the marine industry has therefore been receiving increased attention in recent years. This work investigates the automation of the path-following action of a ship. A deep Q-learning approach is proposed to solve the path-following problem. This method belongs to the broader area of deep reinforcement learning (DRL) and is well suited for such tasks, as it can learn to take optimal decisions through sufficient experience. The algorithm also balances exploration and exploitation by the agent operating in the environment. A three-degree-of-freedom (3-DOF) dynamic model is adopted to describe the ship’s motion. The Kriso Container Ship (KCS) is chosen for this study as it is a benchmark hull used in several studies and its hydrodynamic coefficients are readily available for numerical modeling.
Numerical simulations of the turning circle and zig-zag maneuver tests are performed to verify the accuracy of the proposed dynamic model. A reinforcement learning (RL) agent is trained to interact with this numerical model to achieve waypoint tracking. Finally, wind forces are modelled and the performance of the RL-based controller is evaluated in the presence of wind.

# What does the project do?
This repository contains code for reinforcement-learning-based control for the path following of autonomous ships. Training is carried out inside a Docker container to keep the setup easily portable.

# How to get started with the project?
The project is set up with a Dockerfile that builds the Docker image for the container in which the code will be executed. It is presumed that you have Docker installed on your system. Follow the steps below to get the code up and running.

**Prerequisites:**

1. An Ubuntu OS (tested on Ubuntu 20.04). Note that this will not work on Windows, as X11 forwarding needs to be handled separately there and NVIDIA GPU support for Docker containers is, as of now, not available on Windows.
2. Docker installed on your system ([Link](https://docs.docker.com/engine/install/ubuntu/))
3. Complete the steps to use Docker as a non-root user ([Link](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user))
4. NVIDIA driver installed (for GPU use) - typing `nvidia-smi` in a terminal should tell you whether the NVIDIA drivers are present (if not, follow the instructions at this [Link](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html))
5. NVIDIA Container Toolkit installed to expose GPUs to Docker containers ([Link](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit))

**Note:** Some of the links listed above may move over time. In that case, please locate the current installation instructions yourself.

**Step 1:**

Clone the repository to your system

```commandline
git clone https://github.com/MarineAutonomy/Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation.git
```

and cd into the src folder. The prompt on the terminal should look like ```.../DQN-ASV-path-follow/src$```

**Step 2:**

Make the shell scripts executable
```commandline
chmod +x docker.sh
chmod +x run_docker.sh
```

Execute the script docker.sh to build the Docker image

```commandline
./docker.sh
```

**Step 3:**

Type the following command

```commandline
xauth list
```
and you will see a response similar to the following (you may see more lines on your computer)

```
iit-m/unix: MIT-MAGIC-COOKIE-1 958745625f2yu22358u3ebe5cc4ad453
#ffff#6969742d6d#: MIT-MAGIC-COOKIE-1 958745625f2yu22358u3ebe5cc4ad453
```

Copy the line of the response that corresponds to your computer name (as seen after the @ in the terminal prompt). For example, for ```user@iit-m$``` displayed on the prompt, the computer name is ```iit-m```.

**Step 4:**

Run the Docker container

```commandline
./run_docker.sh
```

Notice that this binds the current /src directory on the host machine to the /src directory in the container.
Thus, the code can be edited on the host machine and the changes are reflected inside the container instantly.

**Step 5:**

Type the following command inside the container (check that the terminal prompt reflects something like ```docker@iit-m:~/DQN-ASV-path-follow/src$``` with ```iit-m``` replaced by your computer name)

```commandline
xauth add <cookie>
```
where ```<cookie>``` is a placeholder for the cookie line you copied in Step 3. Notice, however, that a ```0``` must be appended to the display name at the start of the cookie (```iit-m/unix``` becomes ```iit-m/unix:0```), for example

```
xauth add iit-m/unix:0 MIT-MAGIC-COOKIE-1 958745625f2yu22358u3ebe5cc4ad453
```

**Step 6:**
The Python scripts can now be executed inside the Docker container.

# Important Python scripts

**kcs folder**

```environment.py``` contains the Python environment description for the path following of the KCS. Edit this file to modify the dynamics of the vehicle and how it interacts with the environment. This study currently uses a 3-DOF MMG model to mimic the dynamics of the KCS vessel in calm water and in wind. To start a single training run, edit the training hyperparameters in ```hyperparams.py``` and execute ```dqn_train.py``` inside the Docker container. Before starting a training run, ensure that the model and any plots are saved to the correct directories. To test various trajectories (single waypoint tracking, an elliptical trajectory and others) in calm water or in wind, run ```dqn_test.py``` (make sure to load the correct model). To run several training runs sequentially, set the required hyperparameters in ```batch_train.py``` and execute ```batch_train.py``` inside the Docker container. A quick-reference sketch of these commands is given at the end of this README.

# Where can you get help with your project?
Help on this project can be sought by emailing R S Sanjeev Kumar at `sanjeevrs2000@gmail.com`, Md Shadab Alam at `shaadalam.5u@gmail.com`, or the supervisor Abhilash Somayajula at `abhilash@iitm.ac.in`. For any proposed updates to the code, please raise a pull request.

# Who maintains and contributes to the project?
This project is currently maintained by the Marine Autonomous Vessels (MAV) Laboratory within the Department of Ocean Engineering at IIT Madras. The project was originally undertaken by Rohit Deraj, R S Sanjeev Kumar and Md Shadab Alam, and was supervised by Abhilash Somayajula. Contributions to the repository have been made by R S Sanjeev Kumar, Md Shadab Alam and Abhilash Somayajula.
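
# Quick reference

The steps above end with a shell inside the container. As a sketch of a typical session (assuming the default workdir set by run_docker.sh, ```~/DQN-ASV-path-follow/src```, and that ```python3``` is the interpreter available inside the container), the scripts described earlier would be run roughly as follows:

```commandline
cd kcs

# single training run, using the settings in hyperparams.py
python3 dqn_train.py

# several training runs back to back, using the settings in batch_train.py
python3 batch_train.py

# evaluate saved policies on the trajectories configured in dqn_test.py
python3 dqn_test.py
```

Trained policies, the .mat file with the training curves and the generated plots are written under ```kcs/saved_models/<model_name>/``` (see ```dqn_train.py``` and ```test_wp_track.py```).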
101 | -------------------------------------------------------------------------------- /src/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM tensorflow/tensorflow:2.9.1-gpu 2 | 3 | LABEL maintainer="abhilash" 4 | LABEL version="2.0" 5 | LABEL description="Docker environment for DQN-ASV-path-follow" 6 | 7 | #RUN apt-get update 8 | RUN apt-get -y install software-properties-common libsm6 libxext6 libxrender1 libfontconfig1 libgl1 sudo xauth 9 | #RUN apt-get update 10 | 11 | RUN pip install --upgrade pip 12 | COPY ./requirements_tf_291.txt /requirements.txt 13 | RUN pip install -r requirements.txt 14 | 15 | ENV USERNAME docker 16 | RUN useradd -m $USERNAME && \ 17 | echo "$USERNAME:$USERNAME" | chpasswd && \ 18 | usermod --shell /bin/bash $USERNAME && \ 19 | usermod -aG sudo $USERNAME && \ 20 | echo "$USERNAME ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers.d/$USERNAME && \ 21 | chmod 0440 /etc/sudoers.d/$USERNAME 22 | 23 | WORKDIR /home/docker 24 | USER docker 25 | 26 | CMD ["bash"] 27 | -------------------------------------------------------------------------------- /src/docker.sh: -------------------------------------------------------------------------------- 1 | docker build . -f Dockerfile -t abhilashiit/dqn_asv:2.0 2 | #docker push abhilashiit/dqn_asv:2.0 3 | -------------------------------------------------------------------------------- /src/kcs/batch_train.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import hyperparams as params 3 | import dqn_train 4 | import time 5 | import tensorflow as tf 6 | 7 | max_batch_train = 1 8 | 9 | for batch_train in range(0, max_batch_train): 10 | params.model_name = 'model_' + '{:03}'.format(batch_train + 1) 11 | 12 | if batch_train < max_batch_train: 13 | params.duration = 160 14 | params.initial_learning_rate = 1e-3 15 | params.decay_steps = 5000 16 | params.decay_rate = 0.5 17 | params.fc_layer_params = (64, 64) 18 | params.discount_factor = 0.95 19 | params.target_update_tau = 0.01 20 | params.target_update_period = 1 21 | params.replay_buffer_max_length = 100000 22 | params.num_parallel_calls = 2 23 | params.sample_batch_size = 128 24 | params.num_steps = 2 25 | params.prefetch = 3 26 | params.max_episodes = 10001 27 | params.epsilon_greedy_episodes = 5000 28 | params.random_seed = 69574#np.random.randint(100000) 29 | params.DQN_update_time_steps = 20 30 | params.DQN_policy_store_frequency = 1000 31 | params.DQN_loss_avg_interval = 100 32 | 33 | tf.keras.utils.set_random_seed(params.random_seed) 34 | tf.config.experimental.enable_op_determinism() 35 | dqn_train.dqn_train(params) 36 | 37 | print(f'Execution Completed Successfully') 38 | 39 | 40 | -------------------------------------------------------------------------------- /src/kcs/dqn_test.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import test_wp_track as wpt 3 | 4 | model_name = 'model_001' 5 | 6 | def ellipse(flag=0): 7 | npoints = 12 8 | 9 | if flag == 0: 10 | # Starboard turning ellipse 11 | theta = np.linspace(0, 2 * np.pi, num=npoints, endpoint=True) 12 | theta_des = np.linspace(0, 2 * np.pi, num=1000) 13 | psi0 = np.pi / 2 14 | else: 15 | # Port turning ellipse 16 | theta = np.linspace(0, -2 * np.pi, num=npoints, endpoint=True) 17 | theta_des = np.linspace(0, -2 * np.pi, num=1000) 18 | psi0 = -np.pi / 2 19 | 20 | a_ellipse = 12 21 | b_ellipse = 12 22 | x_wp = a_ellipse * np.cos(theta) 23 | y_wp = b_ellipse * 
np.sin(theta) 24 | 25 | xdes = a_ellipse * np.cos(theta_des) 26 | ydes = b_ellipse * np.sin(theta_des) 27 | 28 | return npoints, x_wp, y_wp, psi0, xdes, ydes 29 | 30 | def straight(): 31 | npoints = 9 32 | psi0 = 0 33 | x_wp = np.linspace(0, 64, num=npoints, endpoint=True) 34 | y_wp = 0 * x_wp 35 | xdes = x_wp 36 | ydes = y_wp 37 | return npoints, x_wp, y_wp, psi0, xdes, ydes 38 | 39 | def eight(flag=0): 40 | npoints = 15 41 | psi0 = 0 42 | circ_dia = 9 43 | 44 | if flag == 0: 45 | theta1 = np.linspace(-0.5 * np.pi, 1.5 * np.pi, npoints - npoints // 2, endpoint=True) 46 | theta1_des = np.linspace(-0.5 * np.pi, 1.5 * np.pi, 1000) 47 | x1 = circ_dia * np.cos(theta1) 48 | y1 = circ_dia * np.sin(theta1) + circ_dia 49 | x1des = circ_dia * np.cos(theta1_des) 50 | y1des = circ_dia * np.sin(theta1_des) + circ_dia 51 | 52 | theta2 = np.linspace(0.5 * np.pi, 2.5 * np.pi, npoints // 2, endpoint=False) 53 | theta2_des = np.linspace(0.5 * np.pi, 2.5 * np.pi, 1000) 54 | x2 = circ_dia * np.cos(theta2) 55 | y2 = circ_dia * np.sin(theta2) - circ_dia 56 | x2des = circ_dia * np.cos(theta2_des) 57 | y2des = circ_dia * np.sin(theta2_des) - circ_dia 58 | 59 | x_wp = np.append(x1, x2[::-1]) 60 | y_wp = np.append(y1, y2[::-1]) 61 | 62 | xdes = np.append(x1des, x2des[::-1]) 63 | ydes = np.append(y1des, y2des[::-1]) 64 | 65 | else: 66 | theta1 = np.linspace(-0.5 * np.pi, 1.5 * np.pi, npoints - npoints // 2, endpoint=True) 67 | theta1_des = np.linspace(-0.5 * np.pi, 1.5 * np.pi, 1000) 68 | x1 = circ_dia * np.cos(theta1) 69 | y1 = circ_dia * np.sin(theta1) + circ_dia 70 | x1des = circ_dia * np.cos(theta1_des) 71 | y1des = circ_dia * np.sin(theta1_des) + circ_dia 72 | 73 | theta2 = np.linspace(2.5 * np.pi, 0.5 * np.pi, npoints // 2, endpoint=False) 74 | theta2_des = np.linspace(2.5 * np.pi, 0.5 * np.pi, 1000) 75 | x2 = circ_dia * np.cos(theta2) 76 | y2 = circ_dia * np.sin(theta2) - circ_dia 77 | x2des = circ_dia * np.cos(theta2_des) 78 | y2des = circ_dia * np.sin(theta2_des) - circ_dia 79 | 80 | x_wp = np.append(x2, x1) 81 | y_wp = np.append(y2, y1) 82 | xdes = np.append(x2des, x1des[::-1]) 83 | ydes = np.append(y2des, y1des[::-1]) 84 | 85 | return npoints, x_wp, y_wp, psi0, xdes, ydes 86 | 87 | def single_wp(quadrant=1, len=10): 88 | npoints = 2 89 | psi0 = 0 90 | 91 | if quadrant == 1: 92 | x_wp = np.array([0, len]) 93 | y_wp = np.array([0, len]) 94 | elif quadrant == 2: 95 | x_wp = np.array([0, -len]) 96 | y_wp = np.array([0, len]) 97 | elif quadrant == 3: 98 | x_wp = np.array([0, -len]) 99 | y_wp = np.array([0, -len]) 100 | elif quadrant == 4: 101 | x_wp = np.array([0, len]) 102 | y_wp = np.array([0, -len]) 103 | 104 | xdes = x_wp 105 | ydes = y_wp 106 | return npoints, x_wp, y_wp, psi0, xdes, ydes 107 | 108 | # No wind condition 109 | 110 | # Single waypoint tracking in four quadrants 111 | # n, xwp, ywp, psi0, xdes, ydes = single_wp(quadrant=1) 112 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='quadrant_01', xdes=xdes, ydes=ydes) 113 | # n, xwp, ywp, psi0, xdes, ydes = single_wp(quadrant=2) 114 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='quadrant_02', xdes=xdes, ydes=ydes) 115 | # n, xwp, ywp, psi0, xdes, ydes = single_wp(quadrant=3) 116 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='quadrant_03', xdes=xdes, ydes=ydes) 117 | # n, xwp, ywp, psi0, xdes, ydes = single_wp(quadrant=4) 118 | # 
wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='quadrant_04', xdes=xdes, ydes=ydes) 119 | 120 | # Ellipse starboard turn 121 | # n, xwp, ywp, psi0, xdes, ydes = ellipse(flag=0) 122 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='ellipse_stbd', xdes=xdes, ydes=ydes) 123 | 124 | # Ellipse port turn 125 | n, xwp, ywp, psi0, xdes, ydes = ellipse(flag=1) 126 | wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='ellipse_port', xdes=xdes, ydes=ydes) 127 | 128 | # Straight line 129 | # n, xwp, ywp, psi0, xdes, ydes = straight() 130 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='straight', xdes=xdes, ydes=ydes) 131 | 132 | # # Eight bottom first 133 | # n, xwp, ywp, psi0, xdes, ydes = eight(flag=0) 134 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='eight_bottom', xdes=xdes, ydes=ydes) 135 | 136 | # # Eight top first 137 | # n, xwp, ywp, psi0, xdes, ydes = eight(flag=1) 138 | # wpt.wp_track(model_name, wind_flag=0, wind_speed=0, wind_dir=0, npoints=n, x_wp=xwp, y_wp=ywp, psi0=psi0, traj_str='eight_top', xdes=xdes, ydes=ydes) 139 | -------------------------------------------------------------------------------- /src/kcs/dqn_train.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy.io import savemat 3 | import matplotlib.pyplot as plt 4 | import tensorflow as tf 5 | import tf_agents 6 | from tf_agents.environments import tf_py_environment 7 | from tf_agents.specs import array_spec 8 | from tf_agents.agents.dqn import dqn_agent 9 | from tf_agents.networks import q_network 10 | from tf_agents.policies import policy_saver 11 | from tf_agents.replay_buffers import tf_uniform_replay_buffer 12 | from tf_agents.trajectories import trajectory 13 | from tf_agents.utils import common 14 | from tf_agents.trajectories import time_step as ts 15 | from tf_agents.environments import wrappers 16 | from tf_agents.policies import random_tf_policy 17 | from tensorflow.keras.layers import Dense, Input 18 | from tensorflow.keras.optimizers.schedules import ExponentialDecay, PolynomialDecay 19 | from environment import ship_environment 20 | import os 21 | import time 22 | 23 | # collecting data with random policy to populate replay buffer 24 | def collect_init_data(environment, policy, buffer): 25 | time_step = environment.reset() 26 | episode_return = 0 27 | while not np.equal(time_step.step_type, 2): 28 | 29 | action_step = policy.action(time_step) 30 | next_time_step = environment.step(action_step.action) 31 | episode_return += next_time_step.reward.numpy()[0] 32 | traj = trajectory.from_transition(time_step, action_step, next_time_step) 33 | buffer.add_batch(traj) 34 | time_step=next_time_step 35 | 36 | return episode_return 37 | 38 | # collect data through epsilon-greedy policy and train agent 39 | def collect_data(envr, policy, buffer, dataset, agent, ep_counter, params): 40 | time_step = envr.reset() 41 | episode_return = 0 42 | global timestep_counter 43 | 44 | # time_step.step_type: 0-> initial step,1->intermediate, 2-> terminal step 45 | while not np.equal(time_step.step_type, 2): 46 | 47 | action_step = policy.action(time_step) 48 | 49 | #using custom annealing epsilon-greedy policy to collect 
training data 50 | epsilon = params.epsilon_greedy(ep_counter) 51 | if np.random.random() < epsilon: 52 | action_no = np.random.randint(0, 3) 53 | action = tf.constant([action_no], shape=(1,), dtype=np.int64,name='action') 54 | action_step = tf_agents.trajectories.policy_step.PolicyStep(action) 55 | 56 | next_time_step = envr.step(action_step) 57 | 58 | traj = trajectory.from_transition(time_step, action_step, next_time_step) 59 | episode_return = next_time_step.reward.numpy()[0] + episode_return 60 | buffer.add_batch(traj) 61 | time_step = next_time_step 62 | 63 | #update q-network once every some steps 64 | if timestep_counter % params.DQN_update_time_steps == 0: 65 | iterator=iter(dataset) 66 | experience, unused_info = next(iterator) 67 | agent.train(experience) 68 | 69 | timestep_counter+=1 70 | print("EPISODE RETURN", episode_return) 71 | return episode_return, agent 72 | 73 | def dqn_train(params): 74 | mname = params.model_name 75 | if not os.path.exists('saved_models/' + mname): 76 | if not os.path.exists('saved_models'): 77 | os.mkdir('saved_models') 78 | os.mkdir('saved_models/' + mname) 79 | os.mkdir('saved_models/' + mname + '/plots') 80 | 81 | params.print_params('saved_models/' + mname) 82 | 83 | env = wrappers.TimeLimit(ship_environment(train_test_flag=0, wind_flag=params.wind_flag, 84 | wind_speed=params.wind_speed, wind_dir=params.wind_dir, 85 | wave_flag=params.wave_flag, wave_height=params.wave_height, 86 | wave_period=params.wave_period, wave_dir=params.wave_dir), 87 | duration=params.duration) 88 | tf_env = tf_py_environment.TFPyEnvironment(env) 89 | 90 | #learning rate scheduler to decay learning rate with time 91 | lr_schedule=ExponentialDecay(initial_learning_rate=params.initial_learning_rate, 92 | decay_steps=params.decay_steps, 93 | decay_rate=params.decay_rate) 94 | # lr_schedule = PolynomialDecay(initial_learning_rate=0.01, decay_steps=50000, end_learning_rate=0.001, power=5) 95 | 96 | optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule) 97 | # optimizer = tf.keras.optimizers.Adam(learning_rate=0.001) 98 | 99 | q_net = q_network.QNetwork(tf_env.observation_spec(),tf_env.action_spec(), 100 | fc_layer_params=params.fc_layer_params, 101 | activation_fn=tf.keras.activations.tanh) 102 | 103 | train_step_counter = tf.Variable(0) 104 | 105 | agent = dqn_agent.DqnAgent( 106 | tf_env.time_step_spec(), 107 | tf_env.action_spec(), 108 | q_network=q_net, 109 | optimizer=optimizer, 110 | gamma=params.discount_factor, 111 | target_update_tau=params.target_update_tau, 112 | target_update_period=params.target_update_period, 113 | td_errors_loss_fn=common.element_wise_squared_loss, 114 | train_step_counter=train_step_counter) 115 | 116 | agent.initialize() 117 | 118 | replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer( 119 | data_spec=agent.collect_data_spec, 120 | batch_size=tf_env.batch_size, 121 | max_length=params.replay_buffer_max_length) 122 | 123 | random_policy = random_tf_policy.RandomTFPolicy(tf_env.time_step_spec(),tf_env.action_spec()) 124 | 125 | #collecting samples with random policy 126 | for __ in range(10): 127 | collect_init_data(tf_env, random_policy, replay_buffer) 128 | 129 | dataset = replay_buffer.as_dataset( 130 | num_parallel_calls=params.num_parallel_calls, 131 | sample_batch_size=params.sample_batch_size, 132 | num_steps=params.num_steps).prefetch(params.prefetch) 133 | 134 | # Reset the train step 135 | agent.train_step_counter.assign(0) 136 | 137 | ep_counter=0 138 | global timestep_counter 139 | timestep_counter=0 140 | 141 | 
episodes = params.max_episodes 142 | returns = [] 143 | train_loss = [] 144 | 145 | while ep_counter < episodes: 146 | 147 | ereturn, agent = collect_data(tf_env, agent.policy, replay_buffer, dataset, agent, ep_counter, params) 148 | returns.append(ereturn) 149 | 150 | # Sample a batch of data from the buffer and update the agent's network. 151 | iterator = iter(dataset) 152 | experience, unused_info = next(iterator) 153 | eloss = agent.train(experience).loss 154 | train_loss.append(eloss) 155 | 156 | if len(train_loss) < 110: 157 | print('episode = {0}, Loss = {1}'.format(ep_counter, eloss)) 158 | else: 159 | print('episode = {0}, Avg Loss (100 ep) = {1}'.format(ep_counter, 160 | (sum(train_loss[-100:-1]) + train_loss[-1]) / 100)) 161 | 162 | ep_counter +=1 163 | 164 | # saving agent's policy at intervals 165 | if ep_counter % params.DQN_policy_store_frequency == 0 and ep_counter >= 1: 166 | policy_dir = os.path.join('saved_models', mname, mname + '_ep_' + str(ep_counter)) 167 | tf_policy_saver = policy_saver.PolicySaver(agent.policy) 168 | tf_policy_saver.save(policy_dir) 169 | 170 | # checkpoint_dir = 'saved_models/jan8_' 171 | # train_checkpointer = common.Checkpointer( 172 | # ckpt_dir=checkpoint_dir, 173 | # max_to_keep=1, 174 | # agent=agent, 175 | # policy=agent.policy, 176 | # replay_buffer=replay_buffer, 177 | # ) 178 | 179 | train_loss = np.array(train_loss) 180 | interval = params.DQN_loss_avg_interval 181 | avg_losses, avg_returns = [], [] 182 | 183 | for i in range(len(train_loss) - interval): 184 | avg_returns.append(sum(returns[i:i + interval]) / interval) 185 | avg_losses.append(sum(train_loss[i:i + interval]) / interval) 186 | 187 | mat_dict = {'returns':np.array(returns), 'loss':np.array(train_loss), 188 | 'avg_returns':np.array(avg_returns), 'avg_loss':np.array(avg_losses)} 189 | 190 | savemat('saved_models/'+mname+'/'+mname+'.mat', mat_dict) 191 | 192 | plt.figure() 193 | plt.title("Returns vs. Episodes") 194 | plt.ylabel("Returns") 195 | plt.plot(avg_returns) 196 | plt.xlabel("Episodes") 197 | plt.grid() 198 | plt.savefig('saved_models/'+mname+'/plots/'+mname+'_return.png', dpi=600) 199 | 200 | plt.figure() 201 | plt.title("Train-loss vs. 
Episodes") 202 | plt.xlabel("Episodes") 203 | plt.ylabel("Loss") 204 | plt.plot(avg_losses) 205 | plt.grid() 206 | plt.savefig('saved_models/'+mname+'/plots/'+mname+'_loss.png', dpi=600) 207 | # plt.show() 208 | 209 | if __name__ == "__main__": 210 | import hyperparams as params 211 | dqn_train(params) 212 | -------------------------------------------------------------------------------- /src/kcs/environment.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | from tf_agents.environments import py_environment 4 | from tf_agents.environments import tf_py_environment 5 | from tf_agents.specs import array_spec 6 | from tf_agents.utils import common 7 | from tf_agents.trajectories import time_step as ts 8 | from scipy.integrate import solve_ivp 9 | # import hyperparams as params 10 | import kcs 11 | 12 | 13 | class ship_environment(py_environment.PyEnvironment): 14 | 15 | def __init__(self, train_test_flag=0, 16 | wind_flag=0, 17 | wind_speed=0, 18 | wind_dir=0, 19 | wave_flag=0, 20 | wave_height=0, 21 | wave_period=0, 22 | wave_dir=0, 23 | test_x_waypoints=None, 24 | test_y_waypoints=None, 25 | nwp=None, 26 | initial_obs_state=None): 27 | 28 | self._action_spec = array_spec.BoundedArraySpec(shape=(), dtype=np.int64, minimum=0, maximum=2,name='action') 29 | self._observation_spec = array_spec.BoundedArraySpec(shape=(4,), dtype=np.float32, minimum=[-20, -np.pi, 0, -1], 30 | maximum=[20, np.pi, 50, 1], name='observation') 31 | self.episode_ended = False 32 | self.counter = 0 33 | self.x_goal = 0 34 | self.y_goal = 0 35 | self.distance = 0 36 | self.obs_state = [1, 0, 0, 0, 0, 0, 0] 37 | 38 | self.train_test_flag = train_test_flag 39 | self.wind_flag = wind_flag 40 | self.wind_speed = wind_speed 41 | self.wind_dir = wind_dir 42 | 43 | self.wave_flag = wave_flag 44 | self.wave_height = wave_height 45 | self.wave_period = wave_period 46 | self.wave_dir = wave_dir 47 | 48 | # Test waypoints 49 | self.test_x_waypoints = test_x_waypoints 50 | self.test_y_waypoints = test_y_waypoints 51 | self.nwp = nwp 52 | self.wp_counter = 1 53 | self.initial_obs_state = initial_obs_state 54 | 55 | # Trajectories of state variables 56 | # Only stored in testing 57 | self.x_traj = [] 58 | self.y_traj = [] 59 | self.psi_traj = [] 60 | self.u_traj = [] 61 | self.v_traj = [] 62 | self.r_traj = [] 63 | self.delta_traj = [] 64 | 65 | # Trajectories of observation states 66 | # Only stored in testing 67 | self.cross_trk_err_traj = [] 68 | self.course_ang_err_traj =[] 69 | self.distance_traj = [] 70 | 71 | # Trajectories of action 72 | # Only stored in testing 73 | self.action_traj = [] 74 | 75 | # Reward in trajectory 76 | # Only stored in testing 77 | self.total_reward = [] 78 | self.reward_01 = [] # Cross track error 79 | self.reward_02 = [] # Course angle error 80 | self.reward_03 = [] # Distance to goal 81 | 82 | def action_spec(self): 83 | return self._action_spec 84 | 85 | def observation_spec(self): 86 | return self._observation_spec 87 | 88 | def _step(self, action_no): 89 | 90 | # Action selection 91 | action_set = [-35*np.pi/180,0,35*np.pi/180] 92 | delta_c = action_set[action_no] 93 | if self.train_test_flag == 1: 94 | self.action_traj.append(delta_c) 95 | 96 | tspan = (0, 0.3) 97 | yinit = self.obs_state 98 | 99 | sol = solve_ivp(lambda t,v: kcs.KCS_ode(t,v,delta_c, wind_flag=self.wind_flag, 100 | wind_speed=self.wind_speed, wind_dir=self.wind_dir, 101 | wave_flag=self.wave_flag, wave_height=self.wave_height, 102 | 
wave_period=self.wave_period, wave_dir=self.wave_dir), 103 | tspan, yinit, t_eval=tspan, dense_output=True) 104 | 105 | # sol.y outputs 0)surge vel 1) sway vel 2) yaw vel 3) x_coord 4) y_coord 5) psi 6) delta 106 | 107 | u = sol.y[0][-1] 108 | v = sol.y[1][-1] 109 | r = sol.y[2][-1] 110 | x = sol.y[3][-1] 111 | y = sol.y[4][-1] 112 | psi_rad = sol.y[5][-1] # psi 113 | psi = psi_rad % (2 * np.pi) 114 | delta = sol.y[6][-1] 115 | 116 | self.obs_state[0] = sol.y[0][-1] # Surge velocity 117 | self.obs_state[1] = sol.y[1][-1] # Sway velocity 118 | self.obs_state[2] = sol.y[2][-1] # Yaw velocity 119 | self.obs_state[3] = sol.y[3][-1] # X cooridnate 120 | self.obs_state[4] = sol.y[4][-1] # Y coordinate 121 | self.obs_state[5] = psi # Heading angle 122 | self.obs_state[6] = sol.y[6][-1] # Actual rudder angle 123 | 124 | if self.train_test_flag == 1: 125 | self.x_traj.append(x) 126 | self.y_traj.append(y) 127 | self.psi_traj.append(psi_rad) 128 | self.u_traj.append(u) 129 | self.v_traj.append(v) 130 | self.r_traj.append(r) 131 | self.delta_traj.append(delta) 132 | 133 | x_init = 0 134 | y_init = 0 135 | x_goal = self.x_goal 136 | y_goal = self.y_goal 137 | 138 | if self.train_test_flag == 1: 139 | x_init = self.test_x_waypoints[self.wp_counter-1] 140 | y_init = self.test_y_waypoints[self.wp_counter-1] 141 | x_goal = self.test_x_waypoints[self.wp_counter] 142 | y_goal = self.test_y_waypoints[self.wp_counter] 143 | 144 | # DISTANCE TO GOAL 145 | distance = ((x - x_goal) ** 2 + (y - y_goal) ** 2) ** 0.5 146 | self.distance = distance 147 | if self.train_test_flag == 1: 148 | self.distance_traj.append(distance) 149 | 150 | # CROSS TRACK ERROR 151 | vec1 = np.array([x_goal - x_init, y_goal - y_init]) 152 | vec2 = np.array([x_goal - x, y_goal - y]) 153 | vec1_hat = vec1 / np.linalg.norm(vec1) 154 | cross_track_error = np.cross(vec2, vec1_hat) 155 | 156 | if self.train_test_flag == 1: 157 | self.cross_trk_err_traj.append(cross_track_error) 158 | 159 | # COURSE ANGLE ERROR 160 | x_dot = u * np.cos(psi) - v * np.sin(psi) 161 | y_dot = u * np.sin(psi) + v * np.cos(psi) 162 | 163 | Uvec = np.array([x_dot, y_dot]) 164 | Uvec_hat = Uvec / np.linalg.norm(Uvec) 165 | vec2_hat = vec2 / np.linalg.norm(vec2) 166 | 167 | course_angle = np.arctan2(Uvec[1], Uvec[0]) 168 | psi_vec2 = np.arctan2(vec2[1], vec2[0]) 169 | 170 | course_angle_err = course_angle - psi_vec2 171 | course_angle_err = (course_angle_err + np.pi) % (2 * np.pi) - np.pi 172 | 173 | if self.train_test_flag == 1: 174 | self.course_ang_err_traj.append(course_angle_err) 175 | 176 | # REWARD FUNCTIONS 177 | R1 = 2 * np.exp(-0.08 * cross_track_error ** 2) - 1 178 | R2 = 1.3 * np.exp(-10 * (abs(course_angle_err))) - 0.3 179 | R3 = -distance * 0.25 180 | reward = R1 + R2 + R3 181 | 182 | if self.train_test_flag == 1: 183 | self.reward_01.append(R1) 184 | self.reward_02.append(R2) 185 | self.reward_03.append(R3) 186 | self.total_reward.append(reward) 187 | 188 | observation = [cross_track_error, course_angle_err, distance, r] 189 | 190 | # DESTINATION CHECK 191 | if abs(distance) <= 0.5: 192 | if self.train_test_flag == 1: 193 | print(f"Destination {self.wp_counter} reached") 194 | if self.wp_counter == self.nwp - 1: 195 | return ts.termination(np.array(observation, dtype=np.float32), reward) 196 | self.wp_counter += 1 197 | 198 | else: 199 | reward = 100 200 | self.episode_ended = True 201 | print('Destination reached') 202 | return ts.termination(np.array(observation, dtype=np.float32), reward) 203 | 204 | # TERMINATION CHECK 205 | angle_btw23 = 
np.arccos(np.dot(vec2_hat, Uvec_hat)) 206 | angle_btw12 = np.arccos(np.dot(vec1_hat, vec2_hat)) 207 | if angle_btw12 > np.pi / 2 and angle_btw23 > np.pi / 2: 208 | self.episode_ended = True 209 | if self.train_test_flag == 1: 210 | print(f"Destination {self.wp_counter} missed") 211 | return ts.termination(np.array(observation, dtype=np.float32), reward) 212 | else: 213 | return ts.transition(np.array(observation, dtype=np.float32), reward) 214 | 215 | def _reset(self): 216 | 217 | # print("Next episode") 218 | 219 | if self.train_test_flag == 0: 220 | self.obs_state = [1, 0, 0, 0, 0, 0, 0] 221 | 222 | radius = np.random.randint(8, 28) 223 | random_theta = 2 * np.pi * np.random.random() 224 | 225 | self.x_goal = radius * np.cos(random_theta) 226 | self.y_goal = radius * np.sin(random_theta) 227 | 228 | self.episode_ended = False 229 | 230 | print("GOAL COORD", self.x_goal, self.y_goal) 231 | x_goal = self.x_goal 232 | y_goal = self.y_goal 233 | 234 | x_dot = 1 235 | y_dot = 0 236 | 237 | vec2 = np.array([x_goal, y_goal]) 238 | Uvec = np.array([x_dot, y_dot]) 239 | 240 | course_angle = np.arctan2(Uvec[1], Uvec[0]) 241 | psi_vec2 = np.arctan2(vec2[1], vec2[0]) 242 | course_angle_err = course_angle - psi_vec2 243 | course_angle_err = (course_angle_err + np.pi) % (2 * np.pi) - np.pi 244 | 245 | self.counter = self.counter + 1 246 | observation = np.array([0, course_angle_err, radius, 0], dtype=np.float32) 247 | 248 | else: 249 | self.obs_state = self.initial_obs_state 250 | 251 | self.x_goal = self.test_x_waypoints[1] 252 | self.y_goal = self.test_y_waypoints[1] 253 | 254 | x = self.test_x_waypoints[0] 255 | y = self.test_y_waypoints[0] 256 | 257 | print("GOAL COORD", self.x_goal, self.y_goal) 258 | x_goal = self.x_goal 259 | y_goal = self.y_goal 260 | 261 | u = self.initial_obs_state[0] 262 | v = self.initial_obs_state[1] 263 | psi = self.initial_obs_state[5] 264 | x_dot = u * np.cos(psi) - v * np.sin(psi) 265 | y_dot = u * np.sin(psi) + v * np.cos(psi) 266 | 267 | vec2 = np.array([x_goal, y_goal]) 268 | Uvec = np.array([x_dot, y_dot]) 269 | 270 | course_angle = np.arctan2(Uvec[1], Uvec[0]) 271 | psi_vec2 = np.arctan2(vec2[1], vec2[0]) 272 | course_angle_err = course_angle - psi_vec2 273 | course_angle_err = (course_angle_err + np.pi) % (2 * np.pi) - np.pi 274 | 275 | dist_to_goal = np.sqrt((x_goal - x) ** 2 + (y_goal - y) ** 2) 276 | 277 | observation = np.array([0, course_angle_err, dist_to_goal, 0], dtype=np.float32) 278 | 279 | 280 | return ts.restart(observation) 281 | 282 | 283 | # ENDS HERE 284 | -------------------------------------------------------------------------------- /src/kcs/hyperparams.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import os 3 | 4 | # Wind Parameters 5 | wind_flag = 0 6 | wind_speed = 0 7 | wind_dir = 0 8 | 9 | # Wave Parameters 10 | wave_flag = 0 11 | wave_height = 0 12 | wave_period = 0 13 | wave_dir = 0 14 | 15 | # TRAINING HYPERPARAMETERS 16 | 17 | model_name = 'model_001' 18 | duration = 160 19 | 20 | # Learning rate parameters - Exponential decay 21 | initial_learning_rate = 0.005 22 | decay_steps = 5000 23 | decay_rate = 0.8 24 | 25 | fc_layer_params = (64,64) 26 | discount_factor = 0.95 27 | target_update_tau = 0.01 28 | target_update_period = 1 29 | 30 | replay_buffer_max_length = 100000 31 | num_parallel_calls = 2 32 | sample_batch_size = 128 33 | num_steps = 2 34 | prefetch = 3 35 | 36 | max_episodes = 8001 37 | epsilon_greedy_episodes = 4000 38 | random_seed = 12345 39 | 40 | 
DQN_update_time_steps = 20 # Updates DQN parameters every these many time steps 41 | DQN_policy_store_frequency = 1000 # Stores DQN policy every these many episodes 42 | DQN_loss_avg_interval = 100 # Computes DQN loss and returns by averaging over these many episodes 43 | 44 | def epsilon_greedy(ep_counter): 45 | # Epsilon greedy algorithm to balance exploration and exploitation 46 | return np.maximum(1 - (ep_counter /epsilon_greedy_episodes), 0) 47 | 48 | def print_params(path): 49 | fid = open(os.path.join(path,'parameters.txt'),'w') 50 | fid.write(f'model_name:{model_name}\nduration:{duration}\n\n') 51 | fid.write(f'initial_learning_rate:{initial_learning_rate}\ndecay_steps:{decay_steps}\ndecay_rate:{decay_rate}\n\n') 52 | fid.write(f'fc_layer_params:{fc_layer_params}\ndiscount_factor:{discount_factor}\ntarget_update_tau:{target_update_tau}\ntarget_update_period:{target_update_period}\n\n') 53 | fid.write(f'replay_buffer_max_length:{replay_buffer_max_length}\nnum_parallel_calls:{num_parallel_calls}\nsample_batch_size:{sample_batch_size}\nnum_steps:{num_steps}\nprefetch:{prefetch}\n\n') 54 | fid.write(f'max_episodes:{max_episodes}\nepsilon_greedy_episodes:{epsilon_greedy_episodes}\nrandom_seed:{random_seed}\n\n') 55 | fid.write(f'DQN_update_time_steps:{DQN_update_time_steps}\nDQN_policy_store_frequency:{DQN_policy_store_frequency}\nDQN_loss_avg_interval:{DQN_loss_avg_interval}\n') 56 | fid.close() 57 | 58 | -------------------------------------------------------------------------------- /src/kcs/kcs.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | # Ship Geometry and Constants 4 | L = 230 5 | B = 32.2 6 | d_em = 10.8 7 | rho = 1025 8 | g = 9.80665 9 | 10 | Cb = 0.651 11 | Dsp = Cb * L * B * d_em 12 | Fn = 0.26 13 | U_des = Fn * np.sqrt(g * L) 14 | xG = -3.404 15 | kzzp = 0.25 16 | 17 | # Surge Hydrodynamic Derivatives in non-dimensional form 18 | X0 = -0.0167 19 | Xbb = -0.0549 20 | Xbr_minus_my = -0.1084 21 | Xrr = -0.0120 22 | Xbbbb = -0.0417 23 | 24 | # Sway Hydrodynamic Derivatives in non-dimensional form 25 | Yb = 0.2252 26 | Yr_minus_mx = 0.0398 27 | Ybbb = 1.7179 28 | Ybbr = -0.4832 29 | Ybrr = 0.8341 30 | Yrrr = -0.0050 31 | 32 | # Yaw Hydrodynamic Derivatives in non-dimensional form 33 | Nb = 0.1111 34 | Nr = -0.0465 35 | Nbbb = 0.1752 36 | Nbbr = -0.6168 37 | Nbrr = 0.0512 38 | Nrrr = -0.0387 39 | 40 | n_prop = 115.5 / 60 41 | Dp = 7.9 42 | wp = 1 - 0.645 # Effective Wake Fraction of the Propeller 43 | tp = 1 - 0.793 # Thrust Deduction Factor 44 | 45 | eps = 0.956 46 | eta = 0.7979 47 | kappa = 0.633 48 | xp_P = -0.4565 # Assuming propeller location is 10 m ahead of AP (Rudder Location) 49 | xp_R = -0.5 50 | 51 | mp = Dsp / (0.5 * (L ** 2) * d_em) 52 | xGp = xG / L 53 | # Added Mass and Mass Moment of Inertia (from MDLHydroD) 54 | mxp = 1790.85 / (0.5 * (L ** 2) * d_em) 55 | myp = 44324.18 / (0.5 * (L ** 2) * d_em) 56 | Jzzp = 140067300 / (0.5 * (L ** 4) * d_em) 57 | Izzp = mp * (kzzp ** 2) + mp * (xGp ** 2) 58 | 59 | a0 = 0.5228 60 | a1 = -0.4390 61 | a2 = -0.0609 62 | 63 | tR = 1 - 0.742 64 | aH = 0.361 65 | xp_H = -0.436 66 | 67 | A_R = L * d_em / 54.86 68 | Lamda = 2.164 69 | f_alp = 6.13 * Lamda / (2.25 + Lamda) 70 | 71 | 72 | def KCS_ode(t, v, delta_c, wind_flag=0, wind_speed=0, wind_dir=0, 73 | wave_flag=0, wave_height=0, wave_period=0, wave_dir=0): 74 | 75 | # Nondimensional State Space Variables 76 | up = v[0] 77 | vp = v[1] 78 | rp = v[2] 79 | xp = v[3] 80 | yp = v[4] 81 | psi = v[5] 82 | delta = v[6] 
83 | # n_prop = v[7] 84 | 85 | # Derived kinematic variables 86 | b = np.arctan2(-vp, up) # Drift angle 87 | 88 | # ---------------------------------------------------- 89 | # Hull Force Calculation 90 | # ---------------------------------------------------- 91 | 92 | # Non-dimensional Surge Hull Hydrodynamic Force 93 | Xp_H = X0 * (up ** 2) \ 94 | + Xbb * (b ** 2) + Xbr_minus_my * b * rp \ 95 | + Xrr * (rp ** 2) + Xbbbb * (b ** 4) 96 | 97 | # Non-dimensional Sway Hull Hydrodynamic Force 98 | Yp_H = Yb * b + Yr_minus_mx * rp + Ybbb * (b ** 3) \ 99 | + Ybbr * (b ** 2) * rp + Ybrr * b * (rp ** 2) \ 100 | + Yrrr * (rp ** 3) 101 | 102 | # Non-dimensional Yaw Hull Hydrodynamic Moment 103 | Np_H = Nb * b + Nr * rp + Nbbb * (b ** 3) \ 104 | + Nbbr * (b ** 2) * rp + Nbrr * b * (rp ** 2) \ 105 | + Nrrr * (rp ** 3) 106 | 107 | # ---------------------------------------------------- 108 | # Propulsion Force Calculation 109 | # ---------------------------------------------------- 110 | # The value self propulsion RPM is taken from Yoshimura's SIMMAN study 111 | # Analysis of steady hydrodynamic force components and prediction of 112 | # manoeuvering ship motion with KVLCC1, KVLCC2 and KCS 113 | 114 | J = (up * U_des) * (1 - wp) / (n_prop * Dp) # Advance Coefficient 115 | 116 | Kt = a0 + a1 * J + a2 * (J ** 2) # Thrust Coefficient 117 | 118 | # Dimensional Propulsion Force 119 | X_P = (1 - tp) * rho * Kt * (Dp ** 4) * (n_prop ** 2) 120 | 121 | # Non-dimensional Propulsion Force 122 | Xp_P = X_P / (0.5 * rho * L * d_em * (U_des ** 2)) 123 | 124 | # ---------------------------------------------------- 125 | # Rudder Force Calculation 126 | # ---------------------------------------------------- 127 | 128 | b_p = b - xp_P * rp 129 | 130 | if b_p > 0: 131 | gamma_R = 0.492 132 | else: 133 | gamma_R = 0.338 134 | 135 | lp_R = -0.755 136 | 137 | up_R = eps * (1 - wp) * up * np.sqrt(eta * (1 + kappa * (np.sqrt(1 + 8 * Kt / (np.pi * (J ** 2))) - 1)) ** 2 + (1 - eta)) 138 | 139 | vp_R = gamma_R * (vp + rp * lp_R) 140 | 141 | Up_R = np.sqrt(up_R ** 2 + vp_R ** 2) 142 | alpha_R = delta - np.arctan2(-vp_R, up_R) 143 | 144 | F_N = A_R / (L * d_em) * f_alp * (Up_R ** 2) * np.sin(alpha_R) 145 | 146 | Xp_R = - (1 - tR) * F_N * np.sin(delta) 147 | Yp_R = - (1 + aH) * F_N * np.cos(delta) 148 | Np_R = - (xp_R + aH * xp_H) * F_N * np.cos(delta) 149 | 150 | # ---------------------------------------------------- 151 | # Coriolis terms 152 | # ---------------------------------------------------- 153 | 154 | mp = Dsp / (0.5 * (L ** 2) * d_em) 155 | xGp = xG / L 156 | 157 | Xp_C = mp * vp * rp + mp * xGp * (rp ** 2) 158 | Yp_C = -mp * up * rp 159 | Np_C = -mp * xGp * up * rp 160 | 161 | # ---------------------------------------------------- 162 | # Wind Force Calculation 163 | # ---------------------------------------------------- 164 | 165 | if wind_flag == 1: 166 | Vw = wind_speed # wind speed 167 | betaw = wind_dir * (np.pi/180) # wind direction 168 | Lp = 3.0464 169 | de = 0.1430 170 | uw = Vw * np.cos(betaw - psi) 171 | vw = Vw * np.sin(betaw - psi) 172 | urw = up - uw 173 | vrw = vp - vw 174 | Uwr = (urw ** 2 + vrw ** 2) ** 0.5 175 | gammaw = np.arctan2(-vrw, -urw) 176 | # print(gammaw,"gamma") 177 | 178 | rhow = 1025 179 | rhoa = 1.225 180 | Ax = (0.4265 * (0.2517 - 0.1430)) 181 | Ay = ((0.2517 - 0.1430) * Lp) 182 | # print(Ax,Ay,"AX") 183 | 184 | Cwx = 1 * np.cos(gammaw) 185 | Cwy = 1 * np.sin(gammaw) 186 | Cwpsi = 0.5 * np.sin(gammaw) 187 | 188 | Xp_W = (Ax * Cwx * Uwr * abs(Uwr)) * rhoa / (Lp * de * rhow) 189 | Yp_W = 
(Ay * Cwy * Uwr * abs(Uwr)) * rhoa / (Lp * de * rhow) 190 | Np_W = (Ay * Lp * Cwpsi * Uwr * abs(Uwr)) * rhoa / ((Lp ** 2) * de * rhow) 191 | else: 192 | Xp_W = 0.0 193 | Yp_W = 0.0 194 | Np_W = 0.0 195 | 196 | # Net non-dimensional force and moment computation 197 | Xp = Xp_H + Xp_R + Xp_C + Xp_W + Xp_P 198 | Yp = Yp_H + Yp_R + Yp_C + Yp_W 199 | Np = Np_H + Np_R + Np_C + Np_W 200 | 201 | # Net force vector computation in Abkowitz non-dimensionalization 202 | X = Xp 203 | Y = Yp 204 | N = Np 205 | 206 | # Added Mass and Mass Moment of Inertia (from MDLHydroD) 207 | mxp = 1790.85 / (0.5 * (L ** 2) * d_em) 208 | myp = 44324.18 / (0.5 * (L ** 2) * d_em) 209 | Jzzp = 140067300 / (0.5 * (L ** 4) * d_em) 210 | Izzp = mp * (kzzp ** 2) + mp * (xGp ** 2) 211 | 212 | Mmat = np.zeros((3, 3)) 213 | 214 | Mmat[0, 0] = mp + mxp 215 | Mmat[1, 1] = mp + myp 216 | Mmat[2, 2] = Izzp + Jzzp 217 | Mmat[1, 2] = mp * xGp 218 | Mmat[2, 1] = mp * xGp 219 | 220 | Mmatinv = np.linalg.inv(Mmat) 221 | 222 | tau = np.array([X, Y, N]) 223 | 224 | vel_der = Mmatinv @ tau 225 | 226 | # Derivative of state vector 227 | vd = np.zeros(7) 228 | 229 | vd[0:3] = vel_der 230 | vd[3] = up * np.cos(psi) - vp * np.sin(psi) 231 | vd[4] = up * np.sin(psi) + vp * np.cos(psi) 232 | vd[5] = rp 233 | 234 | # # Commanded Rudder Angle 235 | # delta_c = KCS_rudder_angle(t, v) 236 | 237 | T_rud = 0.1 # Corresponds to a time constant of 0.1 * L / U_des = 2 seconds 238 | deltad = (delta_c - delta) / T_rud 239 | 240 | deltad_max = 5 * np.pi / 180 * (L / U_des) # Maximum rudder rate of 5 degrees per second 241 | 242 | # Rudder rate saturation 243 | if np.abs(deltad) > deltad_max: 244 | deltad = np.sign(deltad) * deltad_max 245 | 246 | vd[6] = deltad 247 | 248 | return vd -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/model_012.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarineAutonomy/Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation/e39720cdf759dd4a3a8bb2664d4375ab7352ed44/src/kcs/saved_models/model_012/model_012.mat -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/model_012_ep_7000/policy_specs.pbtxt: -------------------------------------------------------------------------------- 1 | dict_value { 2 | fields { 3 | key: "collect_data_spec" 4 | value { 5 | named_tuple_value { 6 | name: "Trajectory" 7 | values { 8 | key: "step_type" 9 | value { 10 | tensor_spec_value { 11 | name: "step_type" 12 | shape { 13 | } 14 | dtype: DT_INT32 15 | } 16 | } 17 | } 18 | values { 19 | key: "observation" 20 | value { 21 | bounded_tensor_spec_value { 22 | name: "observation" 23 | shape { 24 | dim { 25 | size: 4 26 | } 27 | } 28 | dtype: DT_FLOAT 29 | minimum { 30 | dtype: DT_FLOAT 31 | tensor_shape { 32 | dim { 33 | size: 4 34 | } 35 | } 36 | tensor_content: "\000\000\240\301\333\017I\300\000\000\000\000\000\000\200\277" 37 | } 38 | maximum { 39 | dtype: DT_FLOAT 40 | tensor_shape { 41 | dim { 42 | size: 4 43 | } 44 | } 45 | tensor_content: "\000\000\240A\333\017I@\000\000HB\000\000\200?" 
46 | } 47 | } 48 | } 49 | } 50 | values { 51 | key: "action" 52 | value { 53 | bounded_tensor_spec_value { 54 | name: "action" 55 | shape { 56 | } 57 | dtype: DT_INT64 58 | minimum { 59 | dtype: DT_INT64 60 | tensor_shape { 61 | } 62 | int64_val: 0 63 | } 64 | maximum { 65 | dtype: DT_INT64 66 | tensor_shape { 67 | } 68 | int64_val: 2 69 | } 70 | } 71 | } 72 | } 73 | values { 74 | key: "policy_info" 75 | value { 76 | tuple_value { 77 | } 78 | } 79 | } 80 | values { 81 | key: "next_step_type" 82 | value { 83 | tensor_spec_value { 84 | name: "step_type" 85 | shape { 86 | } 87 | dtype: DT_INT32 88 | } 89 | } 90 | } 91 | values { 92 | key: "reward" 93 | value { 94 | tensor_spec_value { 95 | name: "reward" 96 | shape { 97 | } 98 | dtype: DT_FLOAT 99 | } 100 | } 101 | } 102 | values { 103 | key: "discount" 104 | value { 105 | bounded_tensor_spec_value { 106 | name: "discount" 107 | shape { 108 | } 109 | dtype: DT_FLOAT 110 | minimum { 111 | dtype: DT_FLOAT 112 | tensor_shape { 113 | } 114 | float_val: 0.0 115 | } 116 | maximum { 117 | dtype: DT_FLOAT 118 | tensor_shape { 119 | } 120 | float_val: 1.0 121 | } 122 | } 123 | } 124 | } 125 | } 126 | } 127 | } 128 | fields { 129 | key: "policy_state_spec" 130 | value { 131 | tuple_value { 132 | } 133 | } 134 | } 135 | } 136 | -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/model_012_ep_7000/saved_model.pb: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarineAutonomy/Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation/e39720cdf759dd4a3a8bb2664d4375ab7352ed44/src/kcs/saved_models/model_012/model_012_ep_7000/saved_model.pb -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/model_012_ep_7000/variables/variables.data-00000-of-00001: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarineAutonomy/Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation/e39720cdf759dd4a3a8bb2664d4375ab7352ed44/src/kcs/saved_models/model_012/model_012_ep_7000/variables/variables.data-00000-of-00001 -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/model_012_ep_7000/variables/variables.index: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarineAutonomy/Deep-Reinforcement-Learning-Based-Control-for-Ship-Navigation/e39720cdf759dd4a3a8bb2664d4375ab7352ed44/src/kcs/saved_models/model_012/model_012_ep_7000/variables/variables.index -------------------------------------------------------------------------------- /src/kcs/saved_models/model_012/parameters.txt: -------------------------------------------------------------------------------- 1 | model_name:model_012 2 | duration:160 3 | 4 | initial_learning_rate:0.001 5 | decay_steps:5000 6 | decay_rate:0.5 7 | 8 | fc_layer_params:(64, 64) 9 | discount_factor:0.95 10 | target_update_tau:0.01 11 | target_update_period:1 12 | 13 | replay_buffer_max_length:100000 14 | num_parallel_calls:2 15 | sample_batch_size:128 16 | num_steps:2 17 | prefetch:3 18 | 19 | max_episodes:10001 20 | epsilon_greedy_episodes:5000 21 | random_seed:69574 22 | 23 | DQN_update_time_steps:20 24 | DQN_policy_store_frequency:1000 25 | DQN_loss_avg_interval:100 26 | -------------------------------------------------------------------------------- 
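
The parameters above describe the one pretrained policy that ships with the repository (model_012, with a policy snapshot saved at episode 7000). As a sketch, and assuming the container workdir from the README, this policy can be evaluated with the existing test script by pointing it at the bundled model:

```commandline
cd kcs
# the plots folder may not exist in a fresh clone, so create it first
mkdir -p saved_models/model_012/plots
# edit dqn_test.py so that model_name = 'model_012', then run
python3 dqn_test.py
```

test_wp_track.py (below) then loads every policy snapshot found under saved_models/model_012, simulates the selected trajectory, saves the trajectory plots into saved_models/model_012/plots and prints the cross-track-error RMSE.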
/src/kcs/test_wp_track.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | from tf_agents.environments import py_environment 5 | from tf_agents.environments import tf_py_environment 6 | from tf_agents.specs import array_spec 7 | from tf_agents.trajectories import time_step as ts 8 | from tf_agents.environments import wrappers 9 | from scipy.integrate import solve_ivp 10 | import os 11 | from environment import ship_environment 12 | import re 13 | 14 | def wp_track(model_name, wind_flag=0, wind_speed=0,wind_dir=0, 15 | wave_flag=0, wave_height=0, wave_period=0, wave_dir=0, 16 | npoints=None, x_wp=None, y_wp=None, psi0=None,traj_str='', 17 | xdes=None, ydes=None): 18 | 19 | dir_list = os.listdir('saved_models/' + model_name) 20 | regex = re.compile(model_name + '_ep_') 21 | new_dir_list = list(filter(regex.match, dir_list)) 22 | 23 | for policy_name in new_dir_list: 24 | 25 | env = wrappers.TimeLimit(ship_environment(train_test_flag=1, 26 | wind_flag=wind_flag, wind_speed=wind_speed, wind_dir=wind_dir, 27 | wave_flag=wave_flag, wave_height=wave_height, 28 | wave_period=wave_period, wave_dir=wave_dir, 29 | test_x_waypoints=x_wp, 30 | test_y_waypoints=y_wp, 31 | nwp=npoints, 32 | initial_obs_state=[1, 0, 0, x_wp[0], y_wp[0], psi0, 0]), duration=500) 33 | 34 | tf_env = tf_py_environment.TFPyEnvironment(env) 35 | 36 | train_step_counter = tf.Variable(0) 37 | 38 | # Reset the train step 39 | episodes = 1 40 | 41 | saved_policy = tf.compat.v2.saved_model.load('saved_models/'+ model_name + '/' + policy_name) 42 | 43 | for _ in range(episodes): 44 | 45 | time_step = tf_env.reset() 46 | # time = 0 47 | 48 | while not np.equal(time_step.step_type, 2): 49 | action_step = saved_policy.action(time_step) 50 | time_step = tf_env.step(action_step.action) 51 | 52 | plt.figure() 53 | X = env.x_traj 54 | Y = env.y_traj 55 | Psi = env.psi_traj 56 | x_ship = np.array([-0.5, -0.5, 0.25, 0.5, 0.25, -0.5, -0.5, 0.5, 0.25, 0, 0]) 57 | y_ship = 16.1 / 230 * np.array([-1, 1, 1, 0, -1, -1, 0, 0, 1, 1, -1]) 58 | 59 | plt.plot(X,Y) 60 | plt.plot(x_wp[0], y_wp[0], 'ro') 61 | 62 | # Waypoints 63 | x_wp = env.test_x_waypoints 64 | y_wp = env.test_y_waypoints 65 | 66 | plt.plot(xdes,ydes) 67 | plt.scatter(x_wp,y_wp) 68 | 69 | # Plot ship geometry on path 70 | m = 9 71 | for i in range(1,m): 72 | m_indx = i * (len(X) // m) 73 | psi_new = Psi[m_indx] 74 | x_new_ship = X[m_indx] + x_ship * np.cos(psi_new) - y_ship * np.sin(psi_new) 75 | y_new_ship = Y[m_indx] + x_ship * np.sin(psi_new) + y_ship * np.cos(psi_new) 76 | plt.plot(x_new_ship, y_new_ship, 'r') 77 | 78 | plt.xlabel("X/L") 79 | plt.ylabel("Y/L") 80 | plt.axis('equal') 81 | plt.legend(["Path Followed","Start Waypoint","Predefined Path"],loc="center") 82 | ax = plt.gca() 83 | ax.invert_yaxis() 84 | plt.grid(linewidth = 1) 85 | if wind_flag == 0 and wave_flag == 0: 86 | plt.title(f'Circle') 87 | plt_str = f'saved_models/{model_name}/plots/{traj_str}_{policy_name}.svg' 88 | plt.savefig(plt_str, dpi=600) 89 | elif wind_flag == 1: 90 | plt.title(f'V_w={wind_speed} \\beta_w={wind_dir}') 91 | plt_str = f'saved_models/{model_name}/plots/{traj_str}_{policy_name}_WS{wind_speed}_WD{wind_dir}.svg' 92 | plt.savefig(plt_str, dpi=600) 93 | 94 | print("rmse:" , (np.sqrt(np.mean(np.square(env.cross_trk_err_traj))))) 95 | 96 | 97 | -------------------------------------------------------------------------------- /src/requirements_docker.txt: 
-------------------------------------------------------------------------------- 1 | absl-py==1.0.0 2 | astunparse==1.6.3 3 | cachetools==5.1.0 4 | certifi==2019.11.28 5 | chardet==3.0.4 6 | cloudpickle==2.1.0 7 | cycler==0.11.0 8 | dbus-python==1.2.16 9 | decorator==5.1.1 10 | dm-tree==0.1.7 11 | flatbuffers==1.12 12 | fonttools==4.36.0 13 | gast==0.4.0 14 | gin-config==0.5.0 15 | google-auth==2.6.6 16 | google-auth-oauthlib==0.4.6 17 | google-pasta==0.2.0 18 | grpcio==1.46.3 19 | gym==0.23.0 20 | gym-notices==0.0.8 21 | h5py==3.6.0 22 | idna==2.8 23 | importlib-metadata==4.11.4 24 | keras==2.9.0 25 | Keras-Preprocessing==1.1.2 26 | kiwisolver==1.4.4 27 | libclang==14.0.1 28 | Markdown==3.3.7 29 | matplotlib==3.5.3 30 | numpy==1.22.4 31 | oauthlib==3.2.0 32 | opt-einsum==3.3.0 33 | packaging==21.3 34 | Pillow==9.2.0 35 | protobuf==3.19.4 36 | pyasn1==0.4.8 37 | pyasn1-modules==0.2.8 38 | pygame==2.1.0 39 | PyGObject==3.36.0 40 | pyparsing==3.0.9 41 | python-apt==2.0.0+ubuntu0.20.4.7 42 | python-dateutil==2.8.2 43 | requests==2.22.0 44 | requests-oauthlib==1.3.1 45 | requests-unixsocket==0.2.0 46 | rsa==4.8 47 | scipy==1.9.0 48 | six==1.14.0 49 | tensorboard==2.9.0 50 | tensorboard-data-server==0.6.1 51 | tensorboard-plugin-wit==1.8.1 52 | tensorflow==2.9.1 53 | tensorflow-estimator==2.9.0 54 | tensorflow-io-gcs-filesystem==0.26.0 55 | tensorflow-probability==0.17.0 56 | termcolor==1.1.0 57 | tf-agents==0.13.0 58 | typing_extensions==4.2.0 59 | urllib3==1.25.8 60 | Werkzeug==2.1.2 61 | wrapt==1.14.1 62 | zipp==3.8.0 63 | -------------------------------------------------------------------------------- /src/requirements_tf_260.txt: -------------------------------------------------------------------------------- 1 | tf-agents==0.10.0 2 | scipy==1.4.1 3 | gym==0.24.1 4 | matplotlib==3.3.4 5 | numpy==1.19.5 6 | tensorflow-probability==0.14.0 7 | -------------------------------------------------------------------------------- /src/requirements_tf_291.txt: -------------------------------------------------------------------------------- 1 | tf-agents==0.13.0 2 | scipy==1.9.0 3 | gym==0.23.0 4 | matplotlib==3.5.3 5 | numpy==1.22.4 6 | tensorflow-probability==0.17.0 7 | -------------------------------------------------------------------------------- /src/run_docker.sh: -------------------------------------------------------------------------------- 1 | docker run -it --rm --gpus all --name dqn_asv \ 2 | --net=host \ 3 | --env="DISPLAY" \ 4 | --workdir="/home/docker/DQN-ASV-path-follow/src" \ 5 | --volume=$(pwd):"/home/docker/DQN-ASV-path-follow/src" \ 6 | --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \ 7 | abhilashiit/dqn_asv:2.0 bash 8 | --------------------------------------------------------------------------------