├── Figure
    ├── image.png
    └── task_cue.png
├── README.md
├── Run_VLM_Agent.py
├── VLM_Agent
    ├── agent_VLM.py
    ├── api_yoloVLM.py
    ├── prompt_yoloVLM.py
    ├── readme.md
    ├── requirements.txt
    └── solution_yoloVLM.py
├── dockerfile
├── gym_rescue
    ├── __init__.py
    ├── __pycache__
    │   └── __init__.cpython-39.pyc
    ├── envs
    │   ├── __init__.py
    │   ├── __pycache__
    │   │   ├── __init__.cpython-39.pyc
    │   │   ├── base_env.cpython-39.pyc
    │   │   └── rescue.cpython-39.pyc
    │   ├── agent
    │   │   ├── __init__.py
    │   │   ├── __pycache__
    │   │   │   ├── __init__.cpython-39.pyc
    │   │   │   └── character.cpython-39.pyc
    │   │   └── character.py
    │   ├── base_env.py
    │   ├── rescue.py
    │   ├── utils
    │   │   ├── __init__.py
    │   │   ├── __pycache__
    │   │   │   ├── __init__.cpython-39.pyc
    │   │   │   ├── keyboard_util.cpython-39.pyc
    │   │   │   └── misc.cpython-39.pyc
    │   │   ├── keyboard_util.py
    │   │   └── misc.py
    │   └── wrappers
    │   │   ├── __init__.py
    │   │   ├── __pycache__
    │   │       ├── __init__.cpython-39.pyc
    │   │       ├── configUE.cpython-39.pyc
    │   │       ├── early_done.cpython-39.pyc
    │   │       ├── monitor.cpython-39.pyc
    │   │       ├── population.cpython-39.pyc
    │   │       ├── task_cue.cpython-39.pyc
    │   │       └── time_dilation.cpython-39.pyc
    │   │   ├── configUE.py
    │   │   ├── early_done.py
    │   │   ├── monitor.py
    │   │   ├── population.py
    │   │   ├── switch_env.py
    │   │   ├── task_cue.py
    │   │   └── time_dilation.py
    └── example
    │   └── rescue_demo.py
├── main.py
├── requirements.txt
├── run.sh
├── run_main.bat
├── run_main.sh
├── server.py
├── solution.py
└── solution_random.py


/Figure/image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/Figure/image.png


--------------------------------------------------------------------------------
/Figure/task_cue.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/Figure/task_cue.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ATEC_competition_demo_code_2025
 2 | ===
 3 | 
 4 | [//]: # (Gym-Rescue: )
 5 | Note that the Unreal Environment Binaries and example dataset in JSONL format are not included in this repository. Please register on the [ATEC official website](https://www.atecup.cn/competitions/atec2025) to access and download the completed demo.
 6 | 
 7 | **This project integrates Unreal Engine with OpenAI Gym for a rescue task based on [UnrealCV](http://unrealcv.org/).**  
 8 | 
 9 | ## 🔥 News
10 | 
11 | - 04/05/2025. We released two additional baseline methods: **VLM-based** and **RL-based**. 
12 |   ```
13 |   # An example of implementing RL-based algorithm, user could try to improve it by modifying the reward function, integrating VFMs etc.  
14 |   python train_ppo.py --num_envs 4 --num_epochs 100000 --device cuda --num_steps 60 --test_jsonl test_L1.jsonl --gamma 0.99 --ppo_epochs 20 --clip_param 0.2
15 |   
16 |   # An example of implementing VLM-based method for zero-shot reasoning and action execution.
17 |   python Run_VLM_Agent.py
18 |   ```
19 | 
20 | # Introduction
21 | 
22 | **In this competition, the agent will be given image and text cues at the initial stage to help it respond to distress signals from an injured person in need of rescue. The agent must navigate a complex 3D environment, locate the objects, and transport them to designated stretchers as quickly as possible.**
23 | 
24 | **1.Simulation Interaction Environment:**  
25 | Different from traditional computer vision tasks, contestants need to conduct real-time dynamic interactions in the virtual environment provided by this competition. Contestants can use this virtual simulation platform for data collection and strategy training, and the final scoring of the competition will also be evaluated on the same platform.
26 | 
27 | Examples of the simulation environment used in the competition and the first-person perspective images of the intelligent agent are shown below:
28 | ![Description of the image](./Figure/image.png)
29 | 
30 | **2.Interaction Interface:**  
31 | Contestants can interact with the environment through a gym-like Python interface to obtain the initial clues (a color image and a text description), as well as the first-person perspective observation information (RGB image) of the intelligent agent. They can also control the real-time movement of the intelligent agent in the environment, execute actions, and receive reward signals.
32 | 
33 | Examples of the clues obtained by the intelligent agent at the initial stage.  
34 | ![Description of the image](./Figure/task_cue.png)
35 | 
36 | # Installation
37 | 
38 | ## Dependencies
39 | UnrealCV, Gym, CV2, Numpy, Docker(Optional), Nvidia-Docker(Optional)
40 |  
41 | We recommend you use [anaconda](https://www.continuum.io/downloads) to install and manage your Python environment.
42 | 
43 | ```CV2``` is used for image processing, like extracting object masks and bounding boxes. ```Matplotlib``` is used for visualization.
44 | 
45 | 
46 | 
47 | 
48 | It is easy to install, just activate your python environment and install dependency package
49 | ```
50 | pip install -r requirements.txt
51 | ```
52 | 
53 | 
54 | ## Run the baseline code
55 | ```
56 | ./run_main.sh
57 | ```
58 | ## Task Demonstration
59 | We provide a task execution example to help participants better understand the task. Participants can:  
60 |     - Control the agent via keyboard to complete the rescue mission ```--keyboard```.  
61 |     - Observe a randomly controlled agent navigating the environment.  
62 | By adding the parameters ```--render``` and ```--record_video```, participants can visualize the agent’s first-person perspective and save the entire observation sequence as an MP4 file, gaining a clear understanding of the success and failure criteria.
63 | ### Run a keyboard controlled agent, visualize and save to output.mp4
64 |     - Use `i`, `j`, `k`, `l` for agent movement  
65 |     - `1` for pick  
66 |     - `2` for drop  
67 |     - `e` for open the door  
68 |     - `space` for jump  
69 |     - `ctrl` for crouch  
70 | 
71 | ```
72 | python example/rescue_demo.py --render --record_video  --keyboard
73 | ```
74 | 
75 | ### Run a random agent, visualize it
76 | ```
77 | python example/rescue_demo.py --render 
78 | ```
79 | 
80 | 
81 | ##  Acknowledgments
82 | **This repository is a specialized branch of the [unrealzoo project](http://unrealzoo.site/), tailored to simulate rescue task scenarios.**
83 | We acknowledge the following projects for their contributions:
84 | - [UnrealCV](https://unrealcv.org/)
85 | - [OpenAI Gym](https://gym.openai.com/)
86 | - [Unreal Engine](https://www.unrealengine.com/)
87 | - [UnrealZoo](http://unrealzoo.site/)
88 | 
89 | 
90 | 
91 | 


--------------------------------------------------------------------------------
/Run_VLM_Agent.py:
--------------------------------------------------------------------------------
  1 | import gym_rescue
  2 | from gym_rescue.envs.wrappers import switch_env, configUE, early_done
  3 | import gym
  4 | import numpy as np
  5 | import cv2
  6 | import os
  7 | import json
  8 | import base64
  9 | import time
 10 | import traceback
 11 | 
 12 | from VLM_Agent.solution_yoloVLM import AlgSolution
 13 | 
 14 | UnrealEnv = os.environ.get('UnrealEnv')
 15 | TEST_JSONL = os.path.join(UnrealEnv, 'test_L1.jsonl')
 16 | 
 17 | def img2base64(img):
 18 |     buf = cv2.imencode('.png', img)[1]
 19 |     b64 = base64.b64encode(buf).decode('utf-8')
 20 |     return b64
 21 | 
 22 | 
 23 | agent = AlgSolution()
 24 | 
 25 | results = []
 26 | seed = 0
 27 | 
 28 | rewards = []
 29 | ep_times = []
 30 | 
 31 | env_old = None
 32 | with open(TEST_JSONL, "r") as fp:
 33 |     for line in fp.readlines():
 34 | 
 35 |         content = json.loads(line)
 36 |         level = content.get('level')
 37 |         env_id = content.get("env_id")
 38 |         agent_loc = content.get("agent_loc")
 39 |         injured_player_loc = content.get("injured_player_loc")
 40 |         injured_agent_id = content.get("injured_agent_id")
 41 |         stretcher_loc = content.get("stretcher_loc")
 42 |         ambulance_loc= content.get("ambulance_loc")
 43 |         reference_text = content.get("reference_text")
 44 |         reference_image_path = content.get("reference_image_path")
 45 |         max_steps = 9999999
 46 |         timeout = content.get("timeout")
 47 | 
 48 |         env = gym.make(env_id, action_type='Mixed', observation_type='Color', reset_type=level)
 49 |         if env_old is not None:
 50 |             env = switch_env.SwitchEnvWrapper(env, env_old)
 51 |         env_old = env
 52 |         try:
 53 |             env._max_episode_steps = max_steps
 54 |             env = configUE.ConfigUEWrapper(env, offscreen=True, resolution=(640,480))
 55 |             env = early_done.EarlyDoneWrapper(env, int(timeout))
 56 | 
 57 |             if reference_image_path:
 58 |                 reference_image_path = os.path.join(UnrealEnv, 'ref_image', reference_image_path[0])
 59 |                 reference_image = base64.b64encode(open(reference_image_path, 'rb').read()).decode("utf-8")
 60 |             else:
 61 |                 reference_image = None
 62 | 
 63 |             agent.reset(reference_text, reference_image)
 64 | 
 65 |             env.seed(seed)
 66 |             seed += 1
 67 | 
 68 |             env.unwrapped.injured_player_pose = injured_player_loc
 69 |             env.unwrapped.rescue_pose = stretcher_loc
 70 |             env.unwrapped.agent_pose = agent_loc
 71 |             env.unwrapped.ambulance_pose = ambulance_loc
 72 |             ob, info = env.reset()
 73 |             env.unwrapped.unrealcv.set_appearance(env.unwrapped.injured_agent, injured_agent_id)
 74 | 
 75 |             ob, reward, termination, truncation, info = env.step([[(0, 0), 0, 0]])
 76 |             current_step = 0
 77 |             picked_reward = 0.
 78 |             success = True
 79 | 
 80 |             start_t = time.time()
 81 |             while True:
 82 |                 pred = agent.predicts(
 83 |                     img2base64(ob),
 84 |                     success,
 85 |                 )
 86 |                 action = [(pred['angular'], pred['velocity']), pred['viewport'], pred['interaction']]
 87 |                 ob, reward, termination, truncation, info = env.step([action])
 88 |                 if info['picked']:
 89 |                     picked_reward = 0.5
 90 | 
 91 |                 if pred['interaction'] == 3:
 92 |                     if info['picked']:
 93 |                         success = True
 94 |                     else:
 95 |                         success = False
 96 |                 else:
 97 |                     success = True
 98 | 
 99 |                 if current_step % 1 == 0:
100 |                     print('step', current_step)
101 |                 end_t = time.time()
102 |                 if termination:
103 |                     rewards.append(1)
104 |                     ep_times.append(end_t - start_t)
105 |                     break
106 |                 if truncation or pred['interaction'] == 4:
107 |                     rewards.append(picked_reward)
108 |                     ep_times.append(end_t - start_t)
109 |                     break
110 | 
111 |                 current_step += 1
112 |         except:
113 |             env.close()
114 |             traceback.print_exc()
115 |             exit()
116 | 
117 | env.close()
118 | 
119 | print('Rewards', rewards)
120 | print('Times', ep_times)
121 | rewards = np.mean(rewards)
122 | ep_times = np.sum(ep_times)
123 | print('Rewards', rewards, 'Times', ep_times)
124 | 
125 | 


--------------------------------------------------------------------------------
/VLM_Agent/agent_VLM.py:
--------------------------------------------------------------------------------
  1 | from VLM_Agent.prompt_yoloVLM import *
  2 | from VLM_Agent.api_yoloVLM import *
  3 | import os
  4 | import re
  5 | #import argparse
  6 | #import gym
  7 | import cv2
  8 | import time
  9 | import numpy as np
 10 | import base64
 11 | from PIL import Image
 12 | import io
 13 | from datetime import datetime
 14 | from collections import deque
 15 | import random
 16 | from ultralytics import YOLO
 17 | 
 18 | 
 19 | class agent:
 20 | 
 21 |     def __init__(self,reference_text,reference_image):
 22 |         self.clue = ""
 23 |         self.landmark = []
 24 |         self.landmark_back = []
 25 |         self.target = ""
 26 |         self.action = ""
 27 |         
 28 |         # Initialize obs and info
 29 |         self.obs = None
 30 |         self.info = None
 31 | 
 32 |         self.phase = 0
 33 |         self.initialized = False
 34 |         
 35 |         # Create buffers for actions and observations
 36 |         self.clue = reference_text
 37 | 
 38 |         self.image_clue = reference_image
 39 | 
 40 |         self.person_text = ""
 41 |         self.landmark = []
 42 |         self.landmark_back = []
 43 |         self.target = ""
 44 |         self.action = ""
 45 | 
 46 |         # Initialize obs and info
 47 |         self.obs = None
 48 |         self.info = None
 49 | 
 50 |         self.phase = 0
 51 | 
 52 |         # Create buffers for actions and observations
 53 |         self.action_buffer = []
 54 |         self.obs_buffer = []
 55 |         self.image_buffer = []
 56 | 
 57 |         # State tracking
 58 |         self.current_step = 0
 59 |         self.move_fail_count = 0
 60 |         self.final_move_count = 0
 61 |         self.search_in_progress = False
 62 |         self.move_to_landmark_in_progress = False
 63 |         self.observe_in_progress = False
 64 |         self.move_obstacle_in_progress = False
 65 |         self.search_move_in_progress = False
 66 |         self.pending_action = None
 67 |         self.check = 0
 68 |         self.obs_storage_indices = []
 69 |         self.around_image = None
 70 | 
 71 |         # Add step counters for limiting move_to_landmark iterations
 72 |         self.move_steps = 0
 73 |         self.max_move_steps = 2  # Match original code limit of 2 iterations
 74 | 
 75 |         self.myinfo = {'picked': 0}
 76 |         
 77 |         return
 78 | 
 79 |     def predict(self, obs, info):
 80 |         # Add a 1-second delay at the beginning of predict
 81 |         time.sleep(1)
 82 | 
 83 |         # Store the current observation and info
 84 |         self.obs = obs
 85 |         self.info = info
 86 | 
 87 |         # Check if there are pending actions in the buffer
 88 |         if self.action_buffer:
 89 |             action = self.action_buffer.pop(0)
 90 | 
 91 |             # If this action should store an observation, mark it
 92 |             if self.obs_buffer and len(self.obs_buffer) > 0:
 93 |                 if self.obs_buffer[0]:
 94 |                     # Store the current observation for future use
 95 |                     self.image_buffer.append(obs.copy())
 96 |                 self.obs_buffer.pop(0)
 97 | 
 98 |             return action
 99 | 
100 |         # Start the main logic chain if no pending actions
101 |         if self.phase == 0:
102 |             # Initialize by analyzing the clue
103 |             return self._handle_initial_phase()
104 | 
105 |         elif self.phase == 1:
106 |             # Search for the injured person
107 |             return self._handle_search_phase()
108 | 
109 |         elif self.phase == 2:
110 |             # Rescue the injured person
111 |             return self._handle_rescue_phase()
112 | 
113 |         elif self.phase == 3:
114 |             # Search for the stretcher/ambulance
115 |             return self._handle_return_phase()
116 | 
117 |         elif self.phase == 4:
118 |             # Place the person on the stretcher
119 |             return self._handle_placement_phase()
120 | 
121 | 
122 |         # Default action if nothing else to do
123 |         return ([0, 0], 0, 0)
124 | 
125 |     def _handle_initial_phase(self):
126 |         # First time through, analyze the clue
127 |         if not self.landmark:
128 |             self.analyse_initial_clue()
129 |             self.analyse_initial_image()
130 |             print(f"\n\n\nLANDMARK LIST: {', '.join(self.landmark)}")
131 |             if self.action_buffer:
132 |                 return self.action_buffer.pop(0)
133 |             # Directly search in the initial direction without rotating
134 |         prompt = search_prompt_begin(self.landmark, self.person_text)
135 |         max_retries = 3
136 |         for attempt in range(max_retries):
137 |             try:
138 |                 # res1 = call_api_vlm(prompt = """Please observe the image to determine if there is a relatively open area in front of you or if there are objects close enough for clear observation? Please only return "1" to indicate emptiness, or "0" to indicate not emptiness""", base64_image=self.encode_image_array(self.obs[0]))
139 |                 # if res1 == "1":
140 |                 #     for i in range(5):
141 |                 #         self.action_buffer.append(([0, 100], 0, 0))
142 |                 res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.obs))
143 |                 print(f"[SEARCH] \n {res} \n\n\n")
144 |                 pattern = re.compile(r'<a>(.*?)</a>\s*<b>(.*?)</b>', re.DOTALL)
145 |                 match = pattern.search(res)
146 | 
147 |                 if match:
148 |                     analysis = match.group(1).strip()
149 |                     landmark = match.group(2).strip()
150 | 
151 |                     if landmark != "None" and landmark != "NONE":
152 |                         self.target = landmark
153 |                         self.phase = 1
154 |                         self.move_to_landmark_in_progress = True
155 |                         # Queue up the initial forward movement
156 |                         self.action_buffer.append(([0, 50], 0, 1))
157 |                         # if "njured" not in self.target:
158 |                         #     self.action_buffer.append(([0, 100], 0, 0))
159 |                         #     self.action_buffer.append(([0, 100], 0, 0))
160 |                         return ([0, 0], 0, 0)
161 |                     else:
162 |                         self.phase = 1
163 |                         self.search_in_progress = True
164 |                         return self._start_search()
165 | 
166 |             except Exception as e:
167 |                 if attempt == max_retries - 1:
168 |                     raise ValueError(f"[SEARCH] Failed after {max_retries} attempts: {str(e)}")
169 | 
170 |             # If we get here, no landmark was found
171 |             self.phase = 1
172 |             self.search_in_progress = True
173 |             return self._start_search()
174 | 
175 |         # Return default action if already initialized
176 |         return ([0, 0], 0, 0)
177 | 
178 |     def _handle_search_phase(self):
179 |         if self.info['picked'] == 1:
180 |             self.phase = 3
181 |             self.action_buffer.clear()
182 |             self.obs_buffer.clear()
183 |             print("[SPEAK] I have rescued them. It's time to go back.")
184 |             return self._start_search()
185 |         # If search is currently in progress, continue it
186 |         if self.search_in_progress:
187 |             result = self._process_search_result()
188 |             self.search_in_progress = False
189 | 
190 |             if result == True:  # landmark exists
191 |                 self.move_to_landmark_in_progress = True
192 |                 self.move_steps = 0  # Reset step counter when starting new movement
193 |                 return self._start_move_to_landmark()
194 |             else:
195 |                 # self.search_move_in_progress = True
196 |                 # return self._start_search_move()
197 |                 self.action_buffer.append(([0, 100], 0, 0))
198 |                 self.action_buffer.append(([0, 100], 0, 0))
199 |                 self.action_buffer.append(([0, 100], 0, 0))
200 |                 return self._start_search()
201 | 
202 |         # If moving to landmark is in progress
203 |         elif self.move_to_landmark_in_progress:
204 |             if self.move_steps > self.max_move_steps:
205 |                 # If reached max steps, go back to search
206 |                 self.move_to_landmark_in_progress = False
207 |                 self.search_in_progress = True
208 |                 return self._start_search()
209 | 
210 |             result = self._process_move_result()
211 | 
212 |             if result == False:
213 |                 self.move_fail_count += 1
214 |                 self.move_to_landmark_in_progress = False
215 |                 self.search_in_progress = True
216 |                 return self._start_search()
217 | 
218 |             # Check if we've reached max steps for move_to_landmark
219 |             self.move_steps += 1
220 | 
221 |             if "injured" in self.target:
222 |                 self.phase = 2
223 |                 self.move_to_landmark_in_progress = False
224 |                 print("[SPEAK] I have found the injured person! Now I will try to rescue them.")
225 | 
226 |                 # Queue up two forward actions
227 |                 self.action_buffer.append(([0, 50], 0, 3))
228 |                 self.obs_buffer.append(False)
229 |                 return ([0, 50], 0, 0)
230 | 
231 |             # Continue moving to landmark
232 |             return self._start_move_to_landmark()
233 | 
234 |         # If neither search nor move is in progress, start search
235 |         else:
236 |             self.search_in_progress = True
237 |             return self._start_search()
238 | 
239 |     def _handle_rescue_phase(self):
240 |         # If person is picked, move to next phase
241 |         if self.info['picked'] == 1:
242 |             self.phase = 3
243 |             print("[SPEAK] I have rescued them. It's time to go back.")
244 |             return ([0, 0], 0, 0)
245 | 
246 |         # If moving to landmark is in progress
247 |         elif self.move_to_landmark_in_progress:
248 |             result = self._process_move_result()
249 | 
250 |             if result == False:
251 |                 self.move_to_landmark_in_progress = False
252 |                 self.phase = 1
253 |                 self.target = ""
254 |                 print("[SPEAK] I lost the injured person. Let me try to search again.")
255 |                 return ([0, 20], 0, 0)
256 | 
257 |             # Continue moving to landmark
258 |             return self._start_move_to_landmark()
259 | 
260 |         # Start moving to landmark
261 |         else:
262 |             self.move_to_landmark_in_progress = True
263 |             self.move_steps = 0  # Reset step counter
264 |             return self._start_move_to_landmark()
265 | 
266 |     def _handle_return_phase(self):
267 |         # Update landmarks for the return journey
268 |         if not self.landmark_back:
269 |             self.landmark = ['ambulance', 'orange bench']
270 |             self.landmark_back = self.landmark
271 | 
272 |         # If search is in progress
273 |         if self.search_in_progress:
274 |             result = self._process_search_result()
275 |             self.search_in_progress = False
276 | 
277 |             if "range" in self.target or "ench" in self.target:
278 |                 self.phase = 4
279 |                 print("[SPEAK] I found the stretcher. Let me place the injured person on it.")
280 |                 return self._start_move_to_landmark()
281 | 
282 |             if result == True:  # landmark exists in view
283 |                 if self.move_obstacle_in_progress:
284 |                     obstacle_result = self._process_obstacle_result()
285 |                     if obstacle_result == True:
286 |                         self.move_obstacle_in_progress = False
287 |                         return self._start_search()
288 | 
289 |                 self.move_to_landmark_in_progress = True
290 |                 return self._start_move_to_landmark()
291 |             else:
292 |                 self.search_in_progress = True
293 |                 return self._start_search()
294 | 
295 |         elif self.move_obstacle_in_progress:
296 |             result = self._process_obstacle_result()
297 |             self.move_obstacle_in_progress = False
298 |             self.move_to_landmark_in_progress = True
299 |             return self._start_move_to_landmark()
300 | 
301 |         # If moving to landmark is in progress
302 |         elif self.move_to_landmark_in_progress:
303 |             if self.move_steps > self.max_move_steps:
304 |                 # If reached max steps, go back to search
305 |                 self.move_to_landmark_in_progress = False
306 |                 self.search_in_progress = True
307 |                 return self._start_search()
308 | 
309 |             result = self._process_move_result()
310 | 
311 |             if result == False:
312 |                 self.move_to_landmark_in_progress = False
313 |                 self.search_in_progress = True
314 |                 return self._start_search()
315 |             else:
316 |                 self.move_steps += 1
317 | 
318 |             # Keep trying to move to landmark
319 |             return self._start_move_to_landmark()
320 | 
321 |         # Start the search process
322 |         else:
323 |             self.search_in_progress = True
324 |             return self._start_search()
325 | 
326 |     # def _handle_placement_phase(self):
327 |     #     # If moving to landmark is in progress
328 |     #     if self.move_to_landmark_in_progress:
329 |     #         result = self._process_move_result()
330 |     #         self.final_move_count += 1
331 | 
332 |     #         if result == False or self.final_move_count >= 10:
333 |     #             self.move_to_landmark_in_progress = False
334 |     #             print("[SPEAK] I am going to place the injured person.")
335 |     #             self.phase = 1
336 |     #             # Queue backward movements and drop action
337 |     #             self.action_buffer.append(([0, -100], 0, 0))
338 |     #             self.action_buffer.append(([0, 0], 0, 4))
339 |     #             return ([0, -100], 0, 0)
340 | 
341 |     #         # Continue moving to landmark
342 |     #         return self._start_move_to_landmark()
343 | 
344 |     #     # Start moving to landmark
345 |     #     else:
346 |     #         self.move_to_landmark_in_progress = True
347 |     #         return self._start_move_to_landmark()
348 | 
349 |     def _handle_placement_phase(self):
350 |         # if moving to landmark is in progress
351 |         if self.move_to_landmark_in_progress:
352 |             result = self._process_move_result()
353 |             self.final_move_count += 1
354 | 
355 |             if result == False:
356 |                 if self.check == 0:
357 |                     self.check = 1
358 |                     # move backward
359 |                     self.action_buffer.clear()
360 |                     self.obs_buffer.clear()
361 |                     self.action_buffer.append(([0, -100], 0, 0))
362 |                     self.obs_buffer.append(False)
363 | 
364 |                     # observe
365 |                     self.action_buffer.append(([0, 0], 2, 0))
366 |                     self.obs_buffer.append(True)  # observe to check
367 | 
368 |                     print("[SPEAK] let me check the stretcher")
369 |                     return ([0, -100], 0, 0)
370 |                 elif self.check == 1:
371 |                     self.check = 0
372 |                     self.move_to_landmark_in_progress = False
373 |                     self.search_in_progress = True
374 |                     self.phase = 3
375 |                     return self._start_search()
376 |                 elif self.check == 2:
377 |                     self.action_buffer.append(([0, -100], 0, 0))
378 |                     self.action_buffer.append(([0, -100], 0, 0))
379 |                     self.action_buffer.append(([0, -50], 0, 4))
380 |             else:
381 |                 if self.final_move_count >= 10:
382 |                     self.move_to_landmark_in_progress = False
383 |                     print("[SPEAK] I have moved too many times. Maybe I am stuck.")
384 |                     self.phase = 3
385 |                     self.action_buffer.append(([0, -50], 0, 0))
386 |                     self.action_buffer.append(([0, 0], 0, 4))
387 |                     return ([0, -100], 0, 0)
388 |                 if self.check == 1:
389 |                     self.check = 2
390 |             # keep moving
391 |             return self._start_move_to_landmark()
392 | 
393 |         # move
394 |         else:
395 |             self.move_to_landmark_in_progress = True
396 |             return self._start_move_to_landmark()
397 | 
398 |     # Helper methods to start various processes
399 |     def _start_search(self):
400 |         # Prepare for search by observing surroundings
401 |         self.observe_in_progress = True
402 |         self.around_image = None
403 |         self.image_buffer = []
404 | 
405 |         # For phase 3 or 4, start rotation to observe surroundings
406 |         self.action_buffer = []
407 |         self.obs_buffer = []
408 | 
409 |         if self.phase == 3 or self.phase == 4:
410 |             # Prepare for 4 observations while rotating
411 |             for i in range(4):
412 |                 # For first observation, don't rotate
413 |                 if i == 0:
414 |                     self.action_buffer.append(([0, 0], 0, 0))
415 |                     self.obs_buffer.append(True)
416 |                 else:
417 |                     # Add 3 rotations (first 2 don't save observations)
418 |                     for j in range(3):
419 |                         self.action_buffer.append(([30, 0], 0, 0))
420 |                         # Only save observation on the last rotation of each set
421 |                         self.obs_buffer.append(j == 2)
422 |             for i in range(3):
423 |                 self.action_buffer.append(([30, 0], 0, 0))
424 |                 self.obs_buffer.append(False)
425 | 
426 |             # Return first action
427 |             action = self.action_buffer.pop(0)
428 |             should_save = self.obs_buffer.pop(0)
429 |             if should_save:
430 |                 self.image_buffer.append(self.obs.copy())
431 |             return action
432 |         else:
433 |             # First rotate left 3 times (don't save observations)
434 |             for _ in range(3):
435 |                 self.action_buffer.append(([-30, 0], 0, 0))
436 |                 self.obs_buffer.append(False)
437 | 
438 |             # Now we're facing left - save this observation
439 |             self.action_buffer.append(([0, 0], 0, 0))
440 |             self.obs_buffer.append(True)
441 | 
442 |             # Rotate right 3 times to face forward (don't save)
443 |             for _ in range(3):
444 |                 self.action_buffer.append(([30, 0], 0, 0))
445 |                 self.obs_buffer.append(False)
446 | 
447 |             # Now we're facing forward - save this observation
448 |             self.action_buffer.append(([0, 0], 0, 0))
449 |             self.obs_buffer.append(True)
450 | 
451 |             # Rotate right 3 more times to face right (don't save)
452 |             for _ in range(3):
453 |                 self.action_buffer.append(([30, 0], 0, 0))
454 |                 self.obs_buffer.append(False)
455 | 
456 |             # Now we're facing right - save this observation
457 |             self.action_buffer.append(([0, 0], 0, 0))
458 |             self.obs_buffer.append(True)
459 | 
460 |             # Rotate left 3 times to return to center (don't save)
461 |             for _ in range(3):
462 |                 self.action_buffer.append(([-30, 0], 0, 0))
463 |                 self.obs_buffer.append(False)
464 | 
465 |             # Get first action and observation status
466 |             action = self.action_buffer.pop(0)
467 |             should_save = self.obs_buffer.pop(0)
468 |             if should_save:
469 |                 self.image_buffer.append(self.obs.copy())
470 | 
471 |             return action
472 | 
473 |     def _start_move_to_landmark(self):
474 |         # This action will be handled immediately, the result processed on next prediction
475 |         return ([0, 0], 0, 0)
476 | 
477 |     def _start_move_obstacle(self):
478 |         # Prepare for obstacle check
479 |         self.action_buffer = [
480 |             ([0, -50], 0, 0),
481 |             ([0, 0], 2, 0)
482 |         ]
483 |         self.obs_buffer = [False, False]
484 | 
485 |         # Return first action
486 |         action = self.action_buffer.pop(0)
487 |         self.obs_buffer.pop(0)
488 |         return action
489 | 
490 |     def _start_search_move(self):
491 |         # Prepare for search move
492 |         self.observe_in_progress = True
493 |         self.image_buffer = []
494 | 
495 |         # Set up observation sequence
496 |         self.action_buffer = []
497 |         self.obs_buffer = []
498 | 
499 |         if self.phase == 3 or self.phase == 4:
500 |             # Similar to search but with a focus on movement
501 |             for i in range(4):
502 |                 if i == 0:
503 |                     self.action_buffer.append(([0, 0], 0, 0))
504 |                     self.obs_buffer.append(True)
505 |                 else:
506 |                     for j in range(3):
507 |                         self.action_buffer.append(([30, 0], 0, 0))
508 |                         self.obs_buffer.append(j == 2)
509 | 
510 |             # Get first action and observation status
511 |             action = self.action_buffer.pop(0)
512 |             should_save = self.obs_buffer.pop(0)
513 |             if should_save:
514 |                 self.image_buffer.append(self.obs.copy())
515 | 
516 |             return action
517 |         else:
518 |             # First rotate left 3 times (don't save observations)
519 |             for _ in range(3):
520 |                 self.action_buffer.append(([-30, 0], 0, 0))
521 |                 self.obs_buffer.append(False)
522 | 
523 |             # Now we're facing left - save this observation
524 |             self.action_buffer.append(([0, 0], 0, 0))
525 |             self.obs_buffer.append(True)
526 | 
527 |             # Rotate right 3 times to face forward (don't save)
528 |             for _ in range(3):
529 |                 self.action_buffer.append(([30, 0], 0, 0))
530 |                 self.obs_buffer.append(False)
531 | 
532 |             # Now we're facing forward - save this observation
533 |             self.action_buffer.append(([0, 0], 0, 0))
534 |             self.obs_buffer.append(True)
535 | 
536 |             # Rotate right 3 more times to face right (don't save)
537 |             for _ in range(3):
538 |                 self.action_buffer.append(([30, 0], 0, 0))
539 |                 self.obs_buffer.append(False)
540 | 
541 |             # Now we're facing right - save this observation
542 |             self.action_buffer.append(([0, 0], 0, 0))
543 |             self.obs_buffer.append(True)
544 | 
545 |             # Rotate left 3 times to return to center (don't save)
546 |             for _ in range(3):
547 |                 self.action_buffer.append(([-30, 0], 0, 0))
548 |                 self.obs_buffer.append(False)
549 | 
550 |             # Get first action and observation status
551 |             action = self.action_buffer.pop(0)
552 |             should_save = self.obs_buffer.pop(0)
553 |             if should_save:
554 |                 self.image_buffer.append(self.obs.copy())
555 | 
556 |             return action
557 | 
558 |     # Methods to process results of various actions
559 |     def _process_search_result(self):
560 |         # If we've collected observations, process them
561 |         if self.image_buffer:
562 |             self.around_image = self.concatenate_images(self.image_buffer)
563 |             self.image_buffer = []
564 | 
565 |             if self.phase == 0:
566 |                 self.phase = 1
567 |                 prompt = search_prompt_begin(self.landmark, self.person_text)
568 |                 max_retries = 3
569 |                 for attempt in range(max_retries):
570 |                     try:
571 |                         res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.obs))
572 |                         print(f"[SEARCH] \n {res} \n\n\n")
573 |                         pattern = re.compile(r'<a>(.*?)</a>\s*<b>(.*?)</b>', re.DOTALL)
574 |                         match = pattern.search(res)
575 | 
576 |                         if match:
577 |                             analysis = match.group(1).strip()
578 |                             landmark = match.group(2).strip()
579 | 
580 |                             if landmark != "None" and landmark != "NONE":
581 |                                 self.target = landmark
582 |                                 # Queue search confirmation action
583 |                                 self.action_buffer.append(([0, 100], 0, 1))
584 |                                 self.action_buffer.append(([0, 100], 0, 1))
585 |                                 self.obs_buffer.append(False)  # Don't save observation after this action
586 |                                 return True
587 |                             else:
588 |                                 return False
589 | 
590 |                     except Exception as e:
591 |                         if attempt == max_retries - 1:
592 |                             raise ValueError(f"[SEARCH] Failed after {max_retries} attempts: {str(e)}")
593 | 
594 |             elif self.phase == 3 or self.phase == 4:
595 |                 landmark = random.choice([self.landmark, ["orange bench"]])
596 |                 prompt = search_prompt_back(landmark, self.person_text)
597 |                 max_retries = 3
598 |                 for attempt in range(max_retries):
599 |                     try:
600 |                         res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.around_image))
601 |                         print(f"[SEARCH] \n {res} \n\n\n")
602 |                         pattern = re.compile(r'<a>(.*?)</a>\s*<b>(.*?)</b>\s*<c>(.*?)</c>', re.DOTALL)
603 |                         match = pattern.search(res)
604 | 
605 |                         if match:
606 |                             analysis = match.group(1).strip()
607 |                             landmark = match.group(2).strip()
608 |                             side = match.group(3).strip()
609 | 
610 |                             if landmark != "None" and landmark != "NONE":
611 |                                 self.target = landmark
612 |                                 # Queue appropriate rotation actions based on side
613 |                                 if side == "left":
614 |                                     for _ in range(3):
615 |                                         self.action_buffer.append(([-30, 0], 0, 0))
616 |                                         self.obs_buffer.append(False)  # Don't save observations during rotation
617 |                                 elif side == "right":
618 |                                     for _ in range(3):
619 |                                         self.action_buffer.append(([30, 0], 0, 0))
620 |                                         self.obs_buffer.append(False)  # Don't save observations during rotation
621 |                                 elif side == "back":
622 |                                     for _ in range(6):
623 |                                         self.action_buffer.append(([30, 0], 0, 0))
624 |                                         self.obs_buffer.append(False)  # Don't save observations during rotation
625 |                                 # Add forward action
626 |                                 self.action_buffer.append(([0, 100], 0, 0))
627 |                                 self.action_buffer.append(([0, 100], 0, 0))
628 |                                 self.obs_buffer.append(False)  # Don't save observation after this action
629 |                                 return True
630 |                             else:
631 |                                 return False
632 | 
633 |                     except Exception as e:
634 |                         if attempt == max_retries - 1:
635 |                             raise ValueError(f"[SEARCH] Failed after {max_retries} attempts: {str(e)}")
636 | 
637 |             else:
638 |                 prompt = search_prompt(self.landmark, self.person_text)
639 |                 max_retries = 3
640 |                 for attempt in range(max_retries):
641 |                     try:
642 |                         res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.around_image))
643 |                         print(f"[SEARCH] \n {res} \n\n\n")
644 |                         pattern = re.compile(r'<a>(.*?)</a>\s*<b>(.*?)</b>\s*<c>(.*?)</c>', re.DOTALL)
645 |                         match = pattern.search(res)
646 | 
647 |                         if match:
648 |                             analysis = match.group(1).strip()
649 |                             landmark = match.group(2).strip()
650 |                             side = match.group(3).strip()
651 | 
652 |                             if landmark != "None" and landmark != "NONE":
653 |                                 self.target = landmark
654 |                                 self.action_buffer.append(([0, 10], 0, 3))
655 |                                 # Queue appropriate rotation actions based on side
656 |                                 if side == "left":
657 |                                     for _ in range(3):
658 |                                         self.action_buffer.append(([-30, 0], 0, 0))
659 |                                         self.obs_buffer.append(False)  # Don't save observations during rotation
660 |                                 elif side == "right":
661 |                                     for _ in range(3):
662 |                                         self.action_buffer.append(([30, 0], 0, 0))
663 |                                         self.obs_buffer.append(False)  # Don't save observations during rotation
664 |                                 # Add forward action
665 | 
666 |                                 self.action_buffer.append(([0, 100], 0, 0))
667 |                                 self.obs_buffer.append(False)  # Don't save observation after this action
668 |                                 return True
669 |                             else:
670 |                                 return False
671 | 
672 |                     except Exception as e:
673 |                         if attempt == max_retries - 1:
674 |                             raise ValueError(f"[SEARCH] Failed after {max_retries} attempts: {str(e)}")
675 | 
676 |         return False
677 | 
678 |     def _process_move_result(self):
679 |         prompt = move_forward_prompt(self.target, self.person_text)
680 |         max_retries = 3
681 |         for attempt in range(max_retries):
682 |             try:
683 |                 res = call_api_vlm(prompt=prompt,
684 |                                    base64_image=self.encode_image_array(self.add_vertical_lines(self.obs)))
685 |                 print(f"[MOVING] \n {res} \n\n\n")
686 |                 pattern = re.compile(r'<a>(.*?)</a>\s*<b>(.*?)</b>', re.DOTALL)
687 |                 match = pattern.search(res)
688 | 
689 |                 if match:
690 |                     tag = match.group(1).strip()
691 |                     direction = match.group(2).strip()
692 | 
693 |                     if tag == "no" or tag == "No" or tag == "NO":
694 |                         return False
695 |                     elif tag == "yes" or tag == "Yes" or tag == "YES":
696 |                         # Queue appropriate actions based on direction
697 |                         if direction == "left":
698 |                             self.action_buffer.append(([-20, 0], 0, 0))
699 |                             self.action_buffer.append(([-20, 0], 0, 0))
700 |                         elif direction == "right":
701 |                             self.action_buffer.append(([20, 0], 0, 0))
702 |                             self.action_buffer.append(([20, 0], 0, 0))
703 |                         elif direction == "middle":
704 |                             self.action_buffer.append(([0, 100], 0, 0))
705 |                             self.action_buffer.append(([0, 100], 0, 0))
706 |                         # forward action
707 |                         self.action_buffer.append(([0, 50], 0, 3))
708 |                         self.obs_buffer.append(False)  # Don't save observation during forward movement
709 | 
710 |                         # Special handling for injured person target
711 |                         # if "njured" in self.target and self.info['picked'] == 0:
712 |                         #     # Add pickup action (3) as in original code
713 |                         #     self.action_buffer.append(([0, 0], 0, 3))
714 |                         #     self.obs_buffer.append(False)  # Don't save observation during pickup
715 | 
716 |                         # Special handling for cross the door
717 |                         if "door" in self.target:
718 |                             self.action_buffer.append(([0, 0], 0, 5))
719 |                             self.action_buffer.append(([0, 100], 0, 0))
720 |                         # jump can fix many bugs!
721 |                         if self.info['picked'] != 1:
722 |                             self.action_buffer.append(([0, 40], 0, 1))
723 |                             self.obs_buffer.append(False)  # Don't save observation during pickup check
724 | 
725 |                         return True
726 | 
727 |             except Exception as e:
728 |                 if attempt == max_retries - 1:
729 |                     raise ValueError(f"[MOVE] Failed after {max_retries} attempts: {str(e)}")
730 | 
731 |         return False
732 | 
733 |     def _process_obstacle_result(self):
734 |         prompt = move_obstacle_prompt()
735 |         max_retries = 3
736 |         for attempt in range(max_retries):
737 |             try:
738 |                 res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.obs))
739 |                 print(f"[CHECKING OBSTACLE] \n {res} \n\n\n")
740 |                 pattern = re.compile(r'<a>(.*?)</a>\s*<b>(.*?)</b>', re.DOTALL)
741 |                 match = pattern.search(res)
742 | 
743 |                 if match:
744 |                     tag = match.group(1).strip()
745 |                     obstacle = match.group(2).strip()
746 | 
747 |                     if tag == "1":
748 |                         # Queue random turning and movement to avoid obstacle
749 |                         turn = random.choice([-30, 30])
750 |                         self.action_buffer.append(([turn, 0], 0, 0))
751 |                         self.obs_buffer.append(False)  # Don't save observation during turning
752 | 
753 |                         self.action_buffer.append(([turn, 0], 0, 0))
754 |                         self.obs_buffer.append(False)  # Don't save observation during turning
755 | 
756 |                         # Add forward movements
757 |                         for _ in range(5):
758 |                             self.action_buffer.append(([0, 100], 0, 0))
759 |                             self.obs_buffer.append(False)  # Don't save observations during forward movement
760 | 
761 |                         # Add return rotation
762 |                         self.action_buffer.append(([-1 * turn, 0], 0, 0))
763 |                         self.obs_buffer.append(False)  # Don't save observation during return rotation
764 | 
765 |                         self.action_buffer.append(([-1 * turn, 0], 0, 0))
766 |                         self.obs_buffer.append(False)  # Don't save observation during return rotation
767 | 
768 |                         return True
769 |                     else:
770 |                         return False
771 | 
772 |             except Exception as e:
773 |                 if attempt == max_retries - 1:
774 |                     raise ValueError(f"[MOVE_OBSTACLE] Failed after {max_retries} attempts: {str(e)}")
775 | 
776 |         return False
777 | 
778 |     def _process_search_move_result(self):
779 |         prompt = search_move_prompt()
780 |         max_retries = 3
781 |         for attempt in range(max_retries):
782 |             try:
783 |                 res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.around_image))
784 |                 print(f"[SEARCH-MOVE] \n {res} \n\n\n")
785 |                 pattern = re.compile(r'<think>(.*?)</think>\s*<output>(.*?)</output>', re.DOTALL)
786 |                 match = pattern.search(res)
787 | 
788 |                 if match:
789 |                     output = match.group(2).strip()
790 | 
791 |                     # Queue appropriate rotation actions
792 |                     if output == "left":
793 |                         for _ in range(3):
794 |                             self.action_buffer.append(([-30, 0], 0, 0))
795 |                     elif output == "right":
796 |                         for _ in range(3):
797 |                             self.action_buffer.append(([30, 0], 0, 0))
798 | 
799 |                     # Add forward actions
800 |                     for _ in range(3):
801 |                         self.action_buffer.append(([0, 100], 0, 0))
802 | 
803 |                     # Add a search action if not carrying
804 |                     if self.info["picked"] != 1:
805 |                         self.action_buffer.append(([0, 50], 0, 1))
806 | 
807 |                     return True
808 | 
809 |             except Exception as e:
810 |                 if attempt == max_retries - 1:
811 |                     raise ValueError(f"[SEARCH_MOVE] Failed after {max_retries} attempts: {str(e)}")
812 | 
813 |         return False
814 | 
815 |     # Original helper methods preserved
816 |     def analyse_initial_clue(self):
817 |         # res0 = call_api_vlm(prompt=in_out_door_prompt(self.clue), base64_image=self.image_clue)
818 |         prompt = initial_clue_prompt(self.clue)
819 |         # prompt = initial_clue_prompt(self.clue)
820 |         max_retries = 3
821 |         for attempt in range(max_retries):
822 |             try:
823 |                 res = call_api_llm(prompt=prompt)
824 |                 print(res)
825 |                 pattern = re.compile(r'<a>(.*?)</a>\s*<b>(.*?)</b>\s*<c>(.*?)</c>', re.DOTALL)
826 |                 match = pattern.search(res)
827 | 
828 |                 if match:
829 |                     direction = match.group(2).strip()
830 |                     # Queue appropriate rotation actions based on direction
831 |                     if direction == "left":
832 |                         for _ in range(3):
833 |                             self.action_buffer.append(([-30, 0], 0, 0))
834 |                             self.obs_buffer.append(False)
835 |                     elif direction == "right":
836 |                         for _ in range(3):
837 |                             self.action_buffer.append(([30, 0], 0, 0))
838 |                             self.obs_buffer.append(False)
839 |                     elif direction == "back":
840 |                         for _ in range(6):
841 |                             self.action_buffer.append(([30, 0], 0, 0))
842 |                             self.obs_buffer.append(False)
843 |                     elif "front" in direction and "left" in direction:
844 |                         self.action_buffer.append(([-30, 0], 0, 0))
845 |                         self.obs_buffer.append(False)
846 |                     elif "front" in direction and "right" in direction:
847 |                         self.action_buffer.append(([30, 0], 0, 0))
848 |                         self.obs_buffer.append(False)
849 |                     elif "rear" in direction and "left" in direction:
850 |                         for _ in range(4):
851 |                             self.action_buffer.append(([-30, 0], 0, 0))
852 |                             self.obs_buffer.append(False)
853 |                     elif "rear" in direction and "right" in direction:
854 |                         for _ in range(4):
855 |                             self.action_buffer.append(([30, 0], 0, 0))
856 |                             self.obs_buffer.append(False)
857 |                     # if res0 == "0":
858 |                     #    for _ in range(10):
859 |                     #        self.action_buffer.append(([0, 100], 0, 0))
860 |                     # self.action_buffer.append(([0, 0], 0, 0))
861 |                     # self.obs_buffer.append(True)
862 |                     self.action_buffer.append(([0, 0], 0, 0))
863 |                     self.obs_buffer.append(True)
864 | 
865 |                     # Parse landmarks
866 |                     # self.landmark = ["ambulance"]
867 |                     # if res0 == "0":
868 |                     #    self.landmark = ["door"]
869 |                     self.landmark += match.group(3).strip().split(",")
870 |                     self.landmark = [
871 |                         "injured person" if landmark.strip().lower() == "injured woman" or landmark.strip().lower() == "injured man"
872 |                         else landmark.strip() for landmark in self.landmark]
873 |                     self.landmark = [landmark for landmark in self.landmark if
874 |                                      "ench" not in landmark and "mbulance" not in landmark]
875 |                     # if res0 == "0":
876 |                     #    self.landmark_back = ['door', 'ambulance', 'stretcher']
877 |                     # else:
878 |                     self.landmark_back = ['ambulance', 'orange bench']
879 |                     return
880 | 
881 |             except Exception as e:
882 |                 if attempt == max_retries - 1:
883 |                     raise ValueError(f"[CLUE ANALYSE] Failed after {max_retries} attempts: {str(e)}")
884 | 
885 |     def analyse_initial_image(self):
886 |         prompt = initial_image_prompt()
887 |         max_retries = 3
888 |         for attempt in range(max_retries):
889 |             try:
890 |                 res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.image_clue))
891 |                 self.person_text = res
892 |                 print(res)
893 |                 return True
894 | 
895 |             except Exception as e:
896 |                 if attempt == max_retries - 1:
897 |                     raise ValueError(f"[IMAGE ANALYSE] Failed after {max_retries} attempts: {str(e)}")
898 | 
899 |         return False
900 | 
901 |     def encode_image_array(self, image_array):
902 |         # Convert the image array to a PIL Image object
903 |         image = Image.fromarray(np.uint8(image_array))
904 | 
905 |         # Save the PIL Image object to a bytes buffer
906 |         buffered = io.BytesIO()
907 |         image.save(buffered, format="JPEG")
908 | 
909 |         # Encode the bytes buffer to Base64
910 |         img_str = base64.b64encode(buffered.getvalue()).decode('utf-8')
911 | 
912 |         return img_str
913 | 
914 |     def concatenate_images(self, image_list):
915 |         height, width, channels = image_list[0].shape
916 | 
917 |         total_width = width * len(image_list)
918 |         concatenated_image = np.zeros((height, total_width, channels), dtype=np.uint8)
919 | 
920 |         for i, img in enumerate(image_list):
921 |             concatenated_image[:, i * width:(i + 1) * width, :] = img
922 | 
923 |         return concatenated_image
924 | 
925 |     def add_vertical_lines(self, image_array):
926 |         h, w, c = image_array.shape
927 | 
928 |         line1 = w // 3
929 |         line2 = w * 2 // 3
930 |         line_color = (0, 0, 255)
931 |         line_thickness = 2
932 | 
933 |         # Create a copy to avoid modifying the original
934 |         image_copy = image_array.copy()
935 |         cv2.line(image_copy, (line1, 0), (line1, h), line_color, line_thickness)
936 |         cv2.line(image_copy, (line2, 0), (line2, h), line_color, line_thickness)
937 | 
938 |         return image_copy
939 | 
940 | 
941 | 
942 |     def reset(self, text, image):
943 |         self.clue = text
944 |         self.image_clue = image
945 |         return
946 | 
947 | 
948 | 


--------------------------------------------------------------------------------
/VLM_Agent/api_yoloVLM.py:
--------------------------------------------------------------------------------
 1 | 
 2 | import requests
 3 | import ollama
 4 | 
 5 | 
 6 | def call_api_vlm(prompt, base64_image=None):
 7 | 
 8 | 
 9 |     ollama_client = ollama.Client()
10 | 
11 |     response = ollama_client.chat(model='gemma3:27b',
12 |     messages=[{
13 |         'role': 'system',
14 |         'content': prompt,
15 |         },
16 |         {
17 |             'role': 'user',
18 |             'content':"Robot's observation",
19 |             "images": [base64_image]
20 |         }
21 |     ]
22 |     )
23 | 
24 |     return response['message']['content']
25 | 
26 | 
27 | 
28 | 
29 | 
30 | 
31 | 
32 | def call_api_llm(prompt):
33 | 
34 |     input_messages = [
35 |         {"role": "system", "content": "You are a helpful assistant."},
36 |         {"role": "user", "content": prompt}
37 |     ]
38 | 
39 |     ollama_client = ollama.Client()
40 |     response = ollama_client.chat(model='gemma3:27b',
41 |     messages=input_messages
42 |     )
43 | 
44 |     return response['message']['content']
45 | 
46 | 
47 | 
48 | 
49 | 
50 | 


--------------------------------------------------------------------------------
/VLM_Agent/prompt_yoloVLM.py:
--------------------------------------------------------------------------------
  1 | 
  2 | def initial_clue_prompt(clue):
  3 |     p = f"""
  4 |         You are a rescue robot, and now you have the following initial prompts about the rescue target:
  5 |         Clue: {clue}
  6 | 
  7 |         Please analyze two major questions:
  8 |         1. Does the prompt mention where the first target is located on the robot? (Front, right side, etc.)
  9 |         2. What objects or symbols are mentioned in the prompt? According to the logical description of the prompt, arrange the mentioned landmark objects as nodes in the order from the starting point (away from the injured person) to the ending point (injured person).
 10 | 
 11 |         Output format with XML tags:
 12 |         <a> Your step-by-step think process </a>
 13 |         <b>first target's direction (front/back/left/right/none)</b>
 14 |         <c>Sorted landmark objects<c>
 15 | 
 16 |         Example:
 17 |         Clue: The injured person is under a fruit tree. After the fruit tree enters the orchard, turn left. There is a fence outside the orchard and a car is parked.
 18 |         <a> (think process) </a>
 19 |         <b>front</b>
 20 |         <c>car,fence,orchard entrance,fruit tree,injured person</c>
 21 | 
 22 | 
 23 |     """
 24 |     return p
 25 | 
 26 | 
 27 | 
 28 | def initial_clue_prompt_indoor(clue):
 29 |     p = f"""
 30 |         You are a rescue robot, and now you have the following initial prompts about the rescue target:
 31 |         Clue: {clue}
 32 | 
 33 |         Please analyze two major questions:
 34 |         1. Does the prompt mention where the first target is located on the robot? (Front, right side, etc.)
 35 |         2. What objects or symbols are mentioned in the prompt? According to the logical description of the prompt, arrange the mentioned landmark objects as nodes in the order from the starting point (away from the injured person) to the ending point (injured person).
 36 |         - Attention, you already know that the injured person is indoors, so the node must include a "door".
 37 |         Output format with XML tags:
 38 |         <a> Your step-by-step think process </a>
 39 |         <b>first target's direction (front/back/left/right/none)</b>
 40 |         <c>Sorted landmark objects<c>
 41 | 
 42 |         Example:
 43 |         Clue: The injured person is under a fruit tree. After the fruit tree enters the orchard, turn left. There is a fence outside the orchard and a car is parked.
 44 |         <a> (think process) </a>
 45 |         <b>front</b>
 46 |         <c>car,fence,orchard entrance,fruit tree,injured person</c>
 47 | 
 48 | 
 49 |     """
 50 |     return p
 51 | 
 52 | 
 53 | 
 54 | def in_out_door_prompt(clue):
 55 |     p = f"""
 56 |     Here are clue about the target:
 57 |     {clue}
 58 |     and you will get the image of the target.
 59 |     Please determine whether the target is indoors or outdoors based on this text. 
 60 |     If indoors, please return 0; If outdoors, please return to 1. 
 61 |     Do not provide any additional responses.
 62 | 
 63 |     """
 64 |     return p
 65 | 
 66 | def initial_image_prompt():
 67 |     p = f"""
 68 |     Please detect the human lying on the ground in the picture and output the color of the clothes they are wearing.
 69 |     Please only output colors and do not include any other explanatory text.
 70 |     Example 1 (correct):
 71 |     Blue and white
 72 |     Example 2 (incorrect, as only color output is allowed):
 73 |     The color of the clothes passed on by this person is blue and white
 74 |     """
 75 |     return p
 76 | 
 77 | 
 78 | 
 79 | 
 80 | 
 81 | def search_prompt(landmark_list, person_text):
 82 | 
 83 |     landmark_str = ", ".join(landmark_list)
 84 | 
 85 |     p = f"""
 86 | 
 87 |         You are a rescue robot, you need to find the wounded.
 88 | 
 89 |         The input RGB image is a concatenation of three images, reflecting the robot's field of view on the left, front, and right sides, respectively.
 90 | 
 91 |         You now have a list of landmarks:[{landmark_str}]
 92 |         the list was sort in descending order of priority. 
 93 | 
 94 |         Please complete the following tasks:
 95 |         1. Please analyze the information and objects contained in each image separately;
 96 |         2. Check if the objects in the list appear within the field of view.
 97 | 
 98 |         If your target is an Injured person, please note that they are human beings lying on the ground and the color of their clothes is {person_text}.
 99 | 
100 | 
101 |         Use XML tags to output results in the following format:
102 |         <a>Check whether the objects in the list appear in order based on the information within the field of view</a>
103 |         <b>the object in the views with the highest priority, or outputs NONE to indicate that all objects in the list are invisible</b> (When describing an object, please ensure consistency with the expression in the list)
104 |         <c> choose one from left/front/right (Which image does the chosen object belong to) </c>
105 | 
106 |         Example:
107 |         <a>The left view includes red car, trash can, and lawn. There is a wall in front, which should be the wall of the house. There is a swimming pool on the right, with parasols and lounge chairs next to it. In order, first check the injured person - none of the three pictures are present. Then there is the red car, which exists in the field of view and belongs to the left view. I should output red car.</a>
108 |         <b>red car</b>
109 |         <c>left</c>
110 | 
111 |     """
112 |     return p
113 | 
114 | 
115 | 
116 | 
117 | 
118 | def search_prompt_begin(landmark_list, person_text):
119 | 
120 |     landmark_str = ", ".join(landmark_list)
121 | 
122 |     p = f"""
123 | 
124 |         You are a rescue robot, you need to find the wounded.
125 | 
126 |         The input RGB image now displays the visual field in front of the robot.
127 | 
128 |         You now have a list of landmarks:[{landmark_str}]
129 |         the objects in the list are some iconic objects that may be seen on the path from your current location to the injured person. The objects at the front are of low priority, while the injured at the bottom have the highest priority.
130 | 
131 |         Please complete the following tasks:
132 |         1. Analyze which objects are present in the field of view;
133 |         2. Corresponding to the iconic objects in the list, find the object with the highest priority in the field of view.
134 |         Please note that the description should be consistent with the description in the list.
135 | 
136 |         Use XML tags to output results in the following format:
137 |         <a>Analysis of information within the field of view</a>
138 |         <b>The highest priority landmark object (If there is no landmark object in the list within the field of view, output NONE)</b>
139 | 
140 |         Example:
141 |         <a>In the field of view, you can see the entrance of the orchard, the fence, and the garbage bin, both of which are landmarks on the list. Since the priority of the entrance is higher than that of the fence, I will choose the fence output.</b>
142 |         <b>fence</b>
143 | 
144 |     """
145 |     return p
146 | 
147 | 
148 | 
149 | def search_prompt_back(landmark_list, person_text):
150 | 
151 |     landmark_str = ", ".join(landmark_list)
152 | 
153 |     p = f"""
154 | 
155 |         You are a rescue robot, you need to find the wounded.
156 | 
157 |         The input RGB image is a concatenation of four images, reflecting the robot's field of view on the front, right ,back and left sides, respectively.
158 | 
159 |         You need to find the stretcher within your field of view (if it exists). Stretchers will always appear next to the ambulance, so when you cannot see the stretcher, you should output the ambulance.
160 | 
161 |         Please complete the following tasks:
162 | 
163 |         1. Please analyze the information and objects contained in each image separately;
164 |         2. Check if there is an ambulance or stretcher in the field of view. If there is, output it. When both are visible, prioritize outputting the stretcher and only output one object.
165 | 
166 |         Use XML tags to output results in the following format:
167 | 
168 |         <a>Check if there is a stretcher or ambulance in the field of view</a>
169 |         <b>stretcher/ambulance/NONE</b>
170 |         <c> select one from front/right/back/left (which image does the selected object belong to)</c>
171 | 
172 |         example:
173 |         <a>The left view includes red cars, trash cans, and lawns. There are stretchers and ambulances ahead. There is a swimming pool on the right, with parasols and lounge chairs next to it. The image behind is a wall. The stretcher and ambulance are both within sight, I should prioritize the delivery of the stretcher.</a>
174 |         <b>stretcher</b>
175 |         <c>front</c>
176 | 
177 |     """
178 |     return p
179 | 
180 | 
181 | def move_forward_prompt(target, person_text):
182 |     p = f"""
183 | 
184 | You are a rescue robot, you need to find the wounded.
185 | 
186 | The input RGB image now displays the visual field in front of the robot. You are currently moving towards the target :[{target}].
187 | The image is divided into three parts by two red vertical lines: left, middle, right. Please identify the following issue:
188 | 1. Is the target still within sight?
189 | 2. Is the target mainly located on the left, middle, or right side of the image?
190 | 
191 | If your target is an Injured person, please note that they are human beings lying on the ground and the color of their clothes is {person_text}.
192 | 
193 | Use XML tags to output results in the following format:
194 | <a>yes/no (Determine if the target is still within the field of view)</a>
195 | <b>left/middle/right(Determine which area the target is in)</b> 
196 | 
197 | Example:
198 | <a>yes</a>
199 | <b>middle</b> 
200 | 
201 |     """
202 |     return p
203 | 
204 | 
205 | 
206 | 
207 | 
208 | 
209 | 
210 | 
211 | def search_move_prompt():
212 |     p = f"""
213 | 
214 |         You are a rescue robot, you need to explore the environment and find the wounded.
215 |         The input RGB image now is a concatenation of three images, showing the visual content of the robot's left, front, and right sides respectively.
216 |         Please analyze the field of view information within each of the three images and choose the direction that is more worth exploring.
217 |         How to determine if it is worth exploring:
218 |         If there are distant extensions or new areas such as corners or channels connecting in the picture, it is more worth exploring; If it's a wall or a corner, then it's obviously not worth exploring.
219 |         Use XML tags to output results in the following format:
220 |         <think> Your analysis process </think>
221 |         <output>The direction you choose can be left, front, or right</output>
222 | 
223 |         Example:
224 |         <think> In front of me is a wall, and on the right is an empty space, which seems to have no exploratory value. There is a door on my left, and the space inside the door is an unknown area worth exploring. </think>
225 |         <output>left</output>
226 |     """
227 |     return p
228 | 
229 | 
230 | def access_search_prompt():
231 |     p = f"""
232 |         # background
233 | 
234 |         You are a rescue robot that needs to explore unknown environments and find mission targets (wounded and stretchers)
235 |         A person was injured and lying next to the red car on the right side of the house.
236 | 
237 |         # Input information:
238 |         RGB image: composed of four images pieced together, representing the content of your field of view in the front, right, back, and left directions.
239 | 
240 |         Please perform the following tasks in order:
241 | 
242 |         1. Analyze the information within your field of view to determine if you are currently on the road?
243 |         2. If you are not on the road, please find the nearest intersection to help you move from your current area to the road.
244 | 
245 |         Output (including XML format):
246 |         <a>whether you are currently on the road</a> (select from yes, no)
247 |         <b>To move the target, you need to describe the intersection you want to pass through</b>
248 |         <c> In which direction is the intersection within the field of view? (Select from front, right, back, left)</c>
249 | 
250 | 
251 |         Example:
252 |         <a>no</a>
253 |         <b>the left side of the white fence can be accessed</b>
254 |         <c>front</c>
255 | 
256 | 
257 |     """
258 | 
259 |     return p
260 | 
261 | 
262 | 
263 | 
264 | 
265 | def move_obstacle_prompt():
266 |     p = f"""
267 |         You are helping a robot move, and the incoming RGB image is its downward view. 
268 |         Please confirm if there are any horizontal obstacles in the view that may hinder our movement, such as a wall, a horizontal fence, etc. And other objects, such as a tree or a box, although they are also obstacles, their width is limited, so we can easily navigate around them, and these are not considered horizontal obstacles.
269 |         In addition, obstacles also need to have sufficient height, and if they are just objects on the ground, they can also be ignored.
270 | 
271 |         Please output according to the format, including XML tags:
272 |         <a>Whether there are horizontal obstacles</a> (Choose from 0 and 1, where 0 represents none and 1 represents presence)
273 |         <b>If it exists, describe what obstacle it is</b>
274 | 
275 | 
276 |         Example:
277 |         <a>1</a>
278 |         <b>There is a white fence in front of me</b>
279 | 
280 |         """
281 |     return p
282 | 
283 | def moving_suggestion(clue):
284 |     p = f"""
285 |     You are assisting a robot in locating an injured person lying on the ground. Your task is to provide a **moving direction suggestion** based on the **text clue**, the robot's **current visual observation**, and a **scratch trajectory image** that aids spatial reasoning.
286 | 
287 |     ## Inputs:
288 |     1. **Text Clue**: {clue}
289 |     2. **Robot’s Observation Sequence**: A concatenation of RGB images representing the robot's recent first-person views.
290 |     3. **Trajectory Image**: A top-down sketch of the robot’s movement history. The **green point** marks the starting location, the **red point** marks the current position, and the **upward direction** represents the robot’s initial orientation. The robot has **no memory** of previously visited paths, so this image is crucial for reasoning.
291 | 
292 |     ## Your Task:
293 |     Analyze the text clue, image observations, and the trajectory to determine which direction the robot should move in next.
294 | 
295 |     ## Output Format:
296 |     Provide your suggestion using the following XML format:
297 |     <a>YOUR_SUGGESTION</a>
298 |     (Choose from: front, right, back, left, jump)
299 | 
300 |     ## Example Output:
301 |     <a>front</a>
302 |     """
303 |     return p
304 | 
305 | 
306 | 
307 | def moving_back_suggestion():
308 |     p = f"""
309 |     You are helping a robot to finding the yellow stretcher beside a ambulance car, your task is to give a moving direction suggestion, considering the robot's observations.
310 | 
311 |     #Input
312 |     1. Robot's observation sequence: The input RGB image is a concatenation of robot's continuous observation
313 | 
314 |     Please analyze the robot's observations to determine the direction the robot should explore.
315 |     **Considering Strategy**
316 |     1. First determine if the ambulance car is visible in the current view, if not, turn around to explore.
317 |     2. The ambulance car generally are parking near a large free area, like middle of the road, so avoid moving toward area with clutter structures or buildings.
318 |     
319 |     Please output according to the format, including XML tags:
320 |         <a>The moving suggestion</a> (Select from front, right, back, left)</c>
321 |     
322 |     Example:
323 |     <a>front</a>
324 | 
325 |     """
326 |     return p


--------------------------------------------------------------------------------
/VLM_Agent/readme.md:
--------------------------------------------------------------------------------
 1 | # Hybrid YOLO-VLM Rescue Agent README
 2 | 
 3 | ## Overview
 4 | This agent is designed to solve rescue missions in a simulated environment by combining object detection (YOLO) and vision-language models (VLM) to create a two-tiered decision system. The agent navigates through the environment, finds an injured person, and transports them to a stretcher or ambulance.
 5 | 
 6 | The agent successfully completes rescue tasks by implementing a hybrid approach:
 7 | 1. **Primary Tier: YOLO Detection** - Provides precise object detection and movement when targets are clearly visible
 8 | 2. **Secondary Tier: VLM Landmark Navigation** - Takes over when YOLO detection fails, using contextual understanding to search for landmarks
 9 | 
10 | ## Installation
11 | This guide will help you set up the environment for running the YOLO-VLM integration project, which combines object detection with vision-language capabilities.
12 | 
13 | ### Step 1: Clone the Repository
14 | ```bash
15 | git clone https://github.com/atecup/atec2025_software_algorithm.git
16 | cd atec2025_software_algorithm/VLM_Agent
17 | ```
18 | 
19 | ### Step 2: Install Dependencies
20 | ```bash
21 | pip install -r requirements.txt
22 | ```
23 | 
24 | ### Step 3: Set Up Ollama
25 | 1. Install Ollama from: https://ollama.com/download
26 | 2. Pull the Gemma 3 model (or other models if you need):
27 | ```bash
28 | ollama pull gemma3:27b
29 | ```
30 | You can then use solution_yoloVLM.py as the solution.py required to complete the full task.
31 | 
32 | 
33 | ## System Architecture
34 | 
35 | ### Two-Tier Decision System
36 | The agent implements a hierarchical decision-making process:
37 | 
38 | 1. **YOLO Object Detection (Primary)**
39 |    - Runs first on each frame to detect key objects (person, stretcher, truck/ambulance)
40 |    - Provides precise positioning and movement when objects are detected
41 |    - Handles direct movement control with defined thresholds for actions
42 | 
43 | 2. **VLM Landmark Navigation (Secondary)**
44 |    - Activates when YOLO fails to detect relevant objects
45 |    - Analyzes textual clues and visual environment
46 |    - Uses landmark-based navigation to systematically search the environment
47 | 
48 | ### YOLO Detection System
49 | - **Object Detection**: Uses YOLO model to detect persons, stretchers (suitcases), trucks, and buses with confidence thresholds
50 | - **Precision Control**: 
51 |   - Divides screen into navigation zones (left, center, right)
52 |   - Uses object coordinates to determine optimal movement action
53 |   - Implements proximity thresholds for pickup and drop actions
54 | - **State Tracking**: Maintains state of rescue operation (whether person has been picked up)
55 | 
56 | ### Landmark-Based Navigation (VLM)
57 | The agent parses the initial text clue to identify potential landmarks that might lead to the injured person. For example, if the clue is "The injured person is by the roadside garbage bin, and there is a car nearby," landmarks might include [roadside, car, garbage bin, injured person]. The agent will detect them in priority order and move towards them.
58 | 
59 | ### Action and Observation Buffers
60 | - **Action Buffer**: Maintains a queue of pending actions to be executed, allowing the agent to plan multiple steps ahead.
61 | - **Observation Buffer**: Coordinates with the action buffer to determine which observations should be saved for later analysis.
62 | 
63 | These two buffers allow the VLM to perform continuous action control on the agent during the loop process.
64 | 
65 | ## Technical Implementation Details
66 | 
67 | ### YOLO Detection Logic
68 | - **Confidence Thresholds**: Adapts confidence level based on whether a person has been picked up (0.1 when carrying, 0.2 when searching)
69 | - **Object Prioritization**: Hierarchical detection logic that prioritizes persons when searching and stretcher/vehicles when carrying
70 | - **Position-Based Actions**:
71 |   - Central region (220-420 px): Move forward
72 |   - Left region (<220 px): Turn left
73 |   - Right region (>420 px): Turn right
74 |   - Proximity triggers (y-coordinate thresholds): Initiate carry/drop actions
75 | 
76 | ### VLM Navigation Features
77 | - **Obstacle Avoidance**: The agent alternates between walking and jumping to navigate around obstacles when searching.
78 | - **Repeated Pickup Attempts**: After finding the target person, the agent attempts to pick them up after each move.
79 | - **Optimal Placement Distance**: When placing the injured person on the stretcher, the agent backs up before dropping.
80 | - **Multi-Directional Observation**: The agent captures images from multiple directions and concatenates them.
81 | - **Visual Guidance Lines**: When moving forward, the agent divides the image into regions using vertical lines.
82 | 
83 | 
84 | 


--------------------------------------------------------------------------------
/VLM_Agent/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==2.2.4
2 | opencv_python==4.11.0.86
3 | Pillow==11.2.1
4 | Requests==2.32.3
5 | torch==2.4.0+cu124
6 | ultralytics==8.3.107
7 | ollama


--------------------------------------------------------------------------------
/VLM_Agent/solution_yoloVLM.py:
--------------------------------------------------------------------------------
  1 | import argparse
  2 | import cv2
  3 | import time
  4 | import numpy as np
  5 | import os
  6 | import json
  7 | from VLM_Agent.agent_VLM import agent
  8 | from ultralytics import YOLO
  9 | import torch
 10 | import base64
 11 | 
 12 | class AlgSolution:
 13 |     def __init__(self):
 14 |         os.system("OLLAMA_MODELS=./checkpoints ./ollama/bin/ollama serve &")
 15 |         time.sleep(5)
 16 |         if os.path.exists('/home/admin/workspace/job/logs/'):
 17 |             self.handle = open('/home/admin/workspace/job/logs/user.log', 'w')
 18 |         else:
 19 |             self.handle = open('user.log', 'w')
 20 |         self.handle.write("ollama loaded\n")
 21 |         # Initialize VLM agent
 22 |         self.vlm_agent = agent(reference_text=None,reference_image=None)
 23 |         
 24 |         # Initialize detection model
 25 |         self.yolo_model = YOLO('checkpoints/yolo11x.pt')
 26 |         
 27 |         # State tracking
 28 |         self.person_detected = False
 29 |         self.stretcher_detected = False
 30 |         self.current_target = None  # 'person' or 'stretcher'
 31 |         
 32 |         self.myinfo = {'picked': 0}
 33 |         self.prev = {
 34 |             'angular': 0,
 35 |             'velocity': 0,
 36 |             'viewport': 0, 
 37 |             'interaction': 0,
 38 |         }
 39 | 
 40 |         self.foreward = ([0,50],0,0)
 41 | 
 42 |         self.backward = ([0,-50],0,0)
 43 |         self.turnleft = ([-20,0],0,0)
 44 |         self.turnright = ([20,0],0,0)
 45 |         self.carry = ([0,0],0,3)
 46 |         self.drop = ([0,0],0,4)
 47 |         self.noaction = ([0,0],0,0)
 48 | 
 49 | 
 50 |     def predicts(self, ob, success):
 51 |         if self.prev['interaction'] == 3 and success:
 52 |             self.myinfo = {'picked': 1}
 53 | 
 54 |         ob = base64.b64decode(ob)
 55 |         ob = cv2.imdecode(np.frombuffer(ob, np.uint8), cv2.IMREAD_COLOR)
 56 |         pred = self.predict(ob, self.myinfo)
 57 |         res = {
 58 |             'angular': pred[0][0],
 59 |             'velocity': pred[0][1],
 60 |             'viewport': pred[1],
 61 |             'interaction': pred[2],
 62 |         }
 63 |         self.prev = res
 64 |         return res
 65 | 
 66 | 
 67 | 
 68 |     def reset(self, reference_text, reference_image):
 69 |         reference_image=base64.b64decode(reference_image)
 70 |         reference_image = cv2.imdecode(np.frombuffer(reference_image, np.uint8), cv2.IMREAD_COLOR)
 71 |         # print('reset obs shape after decode', reference_image.shape)
 72 |         self.vlm_agent.reset(reference_text, reference_image)
 73 |         self.person_detected = False
 74 |         self.stretcher_detected = False
 75 |         self.current_target = None
 76 |         self.myinfo = {'picked': 0}
 77 | 
 78 | 
 79 |     def draw_bbox_on_obs(self,obs, boxes, labels, color=(0, 255, 0), thickness=2):
 80 |         """
 81 |         Draw bounding boxes on the observation image.
 82 | 
 83 |         Args:
 84 |             obs: The observation image (numpy array).
 85 |             boxes: List of bounding boxes, each in the format [x, y, w, h].
 86 |             labels: List of labels corresponding to the bounding boxes.
 87 |             color: Color of the bounding box (default: green).
 88 |             thickness: Thickness of the bounding box lines (default: 2).
 89 |         """
 90 |         for box, label in zip(boxes, labels):
 91 |             x, y, w, h = box
 92 |             top_left = (int(x - w / 2), int(y - h / 2))
 93 |             bottom_right = (int(x + w / 2), int(y + h / 2))
 94 |             cv2.rectangle(obs, top_left, bottom_right, color, thickness)
 95 |             cv2.putText(obs, label, (top_left[0], top_left[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
 96 | 
 97 |         return obs
 98 | 
 99 |     def predict(self, obs, info):
100 |         # First try to detect objects using YOLO
101 |         if info['picked'] == 1:
102 |             results = self.yolo_model(source=obs, imgsz=640, conf=0.1)
103 |         else:
104 |             results = self.yolo_model(source=obs, imgsz=640, conf=0.2)
105 | 
106 |         boxes = results[0].boxes  # get all detected bounding box
107 | 
108 |         boxes_tmp = [box.xywh[0].tolist() for box in boxes]
109 |         labels_tmp = [self.yolo_model.names[int(box.cls.item())] for box in results[0].boxes]
110 | 
111 |         # Draw bounding boxes on the observation image
112 |         #obs_with_bbox = self.draw_bbox_on_obs(obs, boxes_tmp, labels_tmp)
113 |         #cv2.imshow('Observation with BBox', obs_with_bbox)
114 |         #cv2.waitKey(1)
115 |         # Check for person and stretcher in detections
116 |         person_box = None
117 |         stretcher_box = None
118 |         truck_box=None
119 |         if info['picked']==1:
120 |             self.person_detected=True
121 |         for box in boxes:
122 |             cls = int(box.cls.item())
123 |             if self.yolo_model.names[cls] == 'person' and not self.person_detected:
124 |                 person_box = box.xywh[0].tolist()
125 |             elif self.yolo_model.names[cls] == 'suitcase' and self.person_detected:
126 |                 stretcher_box = box.xywh[0].tolist()
127 |             # elif self.yolo_model.names[cls] == 'bench'  and self.person_detected:
128 |             #     stretcher_box = box.xywh[0].tolist()
129 |             elif self.yolo_model.names[cls] =='truck'and self.person_detected:
130 |                 truck_box = box.xywh[0].tolist()
131 |             elif self.yolo_model.names[cls] =='bus'and self.person_detected:
132 |                 truck_box = box.xywh[0].tolist()
133 | 
134 |         # If we have a detection, use detection-based movement
135 |         if person_box and not self.person_detected:
136 |             return self._move_based_on_detection(person_box, 'person')
137 |         elif stretcher_box and self.person_detected:
138 |             return self._move_based_on_detection(stretcher_box, 'stretcher')
139 |         elif truck_box and self.person_detected:
140 |             return self._move_based_on_detection(truck_box, 'truck')
141 | 
142 | 
143 |         # If no detection, use VLM agent for exploration
144 |         print('call VLM for inference...')
145 |         return self.vlm_agent.predict(obs, info)
146 | 
147 |     def _move_based_on_detection(self, box, target_type):
148 |         x0, y0, w_, h_ = box
149 |         
150 |         if target_type == 'person':
151 |             # if w_ > h_:
152 |             if y0 - 0.5*h_ > 390 and x0>220 and x0<420:
153 |                 # self.person_detected = True
154 |                 return self.carry
155 |             else:
156 |                 if x0 < 220:
157 |                     return self.turnleft
158 |                 elif x0 > 420:
159 |                     return self.turnright
160 |                 else:
161 |                     return self.foreward
162 |         elif target_type =='stretcher':
163 |             if y0 - 0.5*h_ > 350  and x0>220 and x0<420 :
164 |                 print('drop')
165 |                 return self.drop
166 |             else:
167 |                 if x0 < 220:
168 |                     return self.turnleft
169 |                 elif x0 > 420:
170 |                     return self.turnright
171 |                 else:
172 |                     return self.foreward
173 |         elif target_type == 'truck':  # stretcher
174 |             if (w_> 320 and h_>320)  and x0>220 and x0<420 :
175 |                 print('drop')
176 |                 return self.drop
177 |             else:
178 |                 if x0 < 220:
179 |                     return self.turnleft
180 |                 elif x0 > 420:
181 |                     return self.turnright
182 |                 else:
183 |                     return self.foreward
184 | 
185 | 
186 | 


--------------------------------------------------------------------------------
/dockerfile:
--------------------------------------------------------------------------------
 1 | FROM ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/pytorch-ubuntu:2.3.0-cuda12.1.1-ubuntu22.04
 2 | RUN pip install flask opencv-python
 3 | RUN pip install opencv-python-headless matplotlib tqdm pyyaml psutil
 4 | RUN pip install --no-deps ultralytics
 5 | RUN mkdir -p /home/admin/atec_project
 6 | COPY checkpoints /home/admin/atec_project/checkpoints
 7 | COPY *.py /home/admin/atec_project/
 8 | COPY *.sh /home/admin/atec_project/
 9 | RUN ln -s /bin/python3 /bin/python
10 | 


--------------------------------------------------------------------------------
/gym_rescue/__init__.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import glob
 3 | from gym.envs.registration import register
 4 | import logging
 5 | from gym_rescue.envs.utils.misc import load_env_setting
 6 | logger = logging.getLogger(__name__)
 7 | use_docker = False  # True: use nvidia docker   False: do not use nvidia-docker
 8 | 
 9 | 
10 | Tasks = ['Rescue']
11 | Observations = ['Color', 'Depth', 'Rgbd', 'Mask', 'Pose','MaskDepth','ColorMask']
12 | Actions = ['Discrete', 'Continuous', 'Mixed']
13 | 
14 | # Task-oriented envs
15 | UnrealEnv = os.environ.get('UnrealEnv')
16 | setting_files = glob.glob(
17 |     os.path.join(UnrealEnv , 'settings', '*.json')
18 | )
19 | for setting_file in setting_files:
20 |     
21 |     env = os.path.split(setting_file)[-1].split('.')[0]
22 |     for task in Tasks:
23 |         name = f'Unreal{task}-{env}'
24 |         register(
25 |             id=name,
26 |             entry_point=f'gym_rescue.envs:{task}',
27 |             kwargs={'env_file': setting_file,},
28 |             max_episode_steps=10000
29 |         )
30 | 
31 | 


--------------------------------------------------------------------------------
/gym_rescue/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/__pycache__/__init__.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/__init__.py:
--------------------------------------------------------------------------------
1 | from gym_rescue.envs.rescue import Rescue
2 | 
3 | 


--------------------------------------------------------------------------------
/gym_rescue/envs/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/__pycache__/__init__.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/__pycache__/base_env.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/__pycache__/base_env.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/__pycache__/rescue.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/__pycache__/rescue.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/agent/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/agent/__init__.py


--------------------------------------------------------------------------------
/gym_rescue/envs/agent/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/agent/__pycache__/__init__.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/agent/__pycache__/character.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/agent/__pycache__/character.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/agent/character.py:
--------------------------------------------------------------------------------
  1 | from unrealcv.api import UnrealCv_API
  2 | import numpy as np
  3 | import math
  4 | import time
  5 | import json
  6 | import re
  7 | from io import BytesIO
  8 | import PIL.Image
  9 | class Character_API(UnrealCv_API):
 10 |     def __init__(self, port=9000, ip='127.0.0.1', resolution=(160, 120), comm_mode='tcp'):
 11 |         super(Character_API, self).__init__(port=port, ip=ip, resolution=resolution, mode=comm_mode)
 12 |         self.obstacles = []
 13 |         self.targets = []
 14 |         self.img_color = np.zeros((resolution[1], resolution[0], 3))
 15 |         self.img_depth = np.zeros((resolution[1], resolution[0], 1))
 16 |         self.animation_dict = {
 17 |             'stand': self.set_standup,
 18 |             'jump': self.set_jump,
 19 |             'crouch': self.set_crouch,
 20 |             'liedown': self.set_liedown,
 21 |             'open_door': self.set_open_door,
 22 |             'enter_vehicle': self.enter_exit_car,
 23 |             'carry':self.carry_body,
 24 |             'drop':self.drop_body
 25 |         }
 26 | 
 27 |     def config_ue(self, resolution=(320, 240), quality=1, disable_all_screen_messages=True, Lumen=False):
 28 |         """
 29 |         Configure Unreal Engine settings.
 30 | 
 31 |         Args:
 32 |             resolution (tuple): The resolution of the display window.
 33 |             quality (bool): The quality of the display window. From 0 to 3.
 34 |             disable_all_screen_messages (bool): Whether to disable all screen messages. Default is True.
 35 |         """
 36 |         self.check_connection()
 37 |         [w, h] = resolution
 38 |         self.client.request(f'vrun setres {w}x{h}w', -1)  # set resolution of the display window
 39 |         if disable_all_screen_messages:
 40 |             self.client.request('DisableAllScreenMessages', -1)  # disable all screen messages
 41 |         if quality > 0:
 42 |             self.client.request(f'vrun sg.ShadowQuality {quality}', -1)  # set shadow quality to low
 43 |             self.client.request(f'vrun sg.TextureQuality {quality}', -1)  # set texture quality to low
 44 |             self.client.request(f'vrun sg.EffectsQuality {quality}', -1)  # set effects quality to low
 45 |         if Lumen:
 46 |             self.client.request('vset /camera/1/illumination Lumen', -1)
 47 |             self.client.request('vset /camera/1/reflection Lumen', -1)
 48 |         time.sleep(0.1)
 49 | 
 50 |     def init_mask_color(self, targets=None):
 51 |         if targets == 'all':
 52 |             self.targets = self.get_objects()
 53 |             self.color_dict = self.build_color_dict(self.targets)
 54 |         elif targets is not None:
 55 |             self.targets = targets
 56 |             self.color_dict = self.build_color_dict(self.targets)
 57 | 
 58 |     def get_observation(self, cam_id, observation_type, mode='bmp'):
 59 |         if observation_type == 'Color':
 60 |             self.img_color = state = self.get_image(cam_id, 'lit', mode)
 61 |         elif observation_type == 'Mask':
 62 |             self.img_color = state = self.get_image(cam_id, 'object_mask', mode)
 63 |         elif observation_type == 'Depth':
 64 |             self.img_depth = state = self.get_depth(cam_id)
 65 |         elif observation_type == 'Rgbd':
 66 |             state = self.get_image_multimodal(cam_id, ['lit', 'depth'], [mode, 'npy'])
 67 |             self.img_color = state[..., :3]
 68 |             self.img_depth = state[..., 3:]
 69 |             state = np.append(self.img_color, self.img_depth, axis=2)
 70 |         elif observation_type == 'Gray':
 71 |             self.img_color = self.read_image(cam_id, 'lit', mode)
 72 |             self.img_gray = self.img_color.mean(2)
 73 |             self.img_gray = np.expand_dims(self.img_gray, -1)
 74 |         elif observation_type == 'Pose':
 75 |             state = self.get_cam_pose(cam_id) # camera 6D pose
 76 |         return state
 77 | 
 78 |     # functions for character setting
 79 |     def set_max_speed(self, player, speed, return_cmd=False):
 80 |         # set the max velocity of the agent object
 81 |         cmd = f'vbp {player} set_speed {speed}'
 82 |         res = None
 83 |         while res is None:
 84 |             res = self.client.request(cmd)
 85 |         return speed
 86 | 
 87 |     def set_acceleration(self, player, acc):
 88 |         cmd = f'vbp {player} set_acc {acc}'
 89 |         res = None
 90 |         while res is None:
 91 |             res = self.client.request(cmd.format(player=player, acc=acc))
 92 |         return acc
 93 | 
 94 |     def set_appearance(self, player, id):
 95 |         # set the appearance of the agent object, id: 0~19
 96 |         cmd = f'vbp {player} set_app {id}'
 97 |         res = None
 98 |         while res is None:
 99 |             res = self.client.request(cmd.format(player=player, id=id), -1)
100 |         return id
101 | 
102 |     def move_cam_2d(self, cam_id, angle, distance):
103 |         # move the camera in 2D as a mobile agent
104 |         self.move_cam_forward(cam_id, angle, distance, height=0, pitch=0)
105 | 
106 |     def get_speed(self, player):
107 |         cmd = f'vbp {player} get_speed'
108 |         res = None
109 |         while res is None:
110 |             res = self.client.request(cmd)
111 |         return self.decoder.string2vector(res)[0]
112 | 
113 |     def get_angle(self, player):
114 |         cmd = f'vbp {player} get_angle'
115 |         res = None
116 |         while res is None:
117 |             res = self.client.request(cmd)
118 |         return self.decoder.string2vector(res)[0]
119 | 
120 |     def reset_player(self, player):
121 |         cmd = f'vbp {player} reset'
122 |         res=None
123 |         while res is None:
124 |             res = self.client.request(cmd)
125 | 
126 |     def set_phy(self, obj, state):
127 |         cmd = f'vbp {obj} set_phy {state}'
128 |         res=None
129 |         while res is None:
130 |             res = self.client.request(cmd, -1)
131 | 
132 |     def simulate_physics(self, objects):
133 |         res = [self.set_phy(obj, 1) for obj in objects]
134 | 
135 |     def set_move_bp(self, player, params, return_cmd=False):
136 |         '''
137 |         new move function, can adapt to different number of params
138 |         2 params: [v_angle, v_linear], used for agents moving in plane, e.g. human, car, animal
139 |         4 params: [v_ x, v_y, v_z, v_yaw], used for agents moving in 3D space, e.g. drone
140 |         '''
141 |         params_str = ' '.join([str(param) for param in params])
142 |         cmd = f'vbp {player} set_move {params_str}'
143 |         if return_cmd:
144 |             return cmd
145 |         res = None
146 |         while res is None:
147 |             res = self.client.request(cmd, -1)
148 | 
149 |     # functions for character actions
150 |     def set_jump(self, player, return_cmd=False):
151 |         cmd = f'vbp {player} set_jump'
152 |         if return_cmd:
153 |             return cmd
154 |         res = None
155 |         while res is None:
156 |             res = self.client.request(cmd, -1)
157 | 
158 |     def set_crouch(self, player, return_cmd=False):
159 |         cmd = f'vbp {player} set_crouch'
160 |         if return_cmd:
161 |             return cmd
162 |         res = None
163 |         while res is None:
164 |             res = self.client.request(cmd, -1)
165 | 
166 |     def set_liedown(self, player, directions=(100, 100), return_cmd=False):
167 |         frontback = directions[0]
168 |         leftright = directions[1]
169 |         cmd = f'vbp {player} set_liedown {frontback} {leftright}'
170 |         if return_cmd:
171 |             return cmd
172 |         res = None
173 |         while res is None:
174 |             res = self.client.request(cmd, -1)
175 | 
176 |     def set_standup(self, player, return_cmd=False):
177 |         cmd = f'vbp {player} set_standup'
178 |         if return_cmd:
179 |             return cmd
180 |         res = None
181 |         while res is None:
182 |             res = self.client.request(cmd, -1)
183 | 
184 |     def set_animation(self, player, anim_id, return_cmd=False):
185 |         return self.animation_dict[anim_id](player, return_cmd=return_cmd)
186 | 
187 |     def get_hit(self, player):
188 |         cmd = f'vbp {player} get_hit'
189 |         res = None
190 |         while res is None:
191 |             res = self.client.request(cmd)
192 |         if '1' in res:
193 |             return True
194 |         if '0' in res:
195 |             return False
196 | 
197 |     def set_random(self, player, value=1):
198 |         cmd = f'vbp {player} set_random {value}'
199 |         res=None
200 |         while res is None:
201 |             res = self.client.request(cmd, -1)
202 | 
203 |     def set_interval(self, player, interval):
204 |         cmd = f'vbp {player} set_interval {interval}'
205 |         res = None
206 |         while res is None:
207 |             res = self.client.request(cmd, -1)
208 | 
209 |     def init_objects(self, objects):
210 |         self.objects_dict = dict()
211 |         for obj in objects:
212 |             # print (obj)
213 |             self.objects_dict[obj] = self.get_obj_location(obj)
214 |         return self.objects_dict
215 | 
216 |     def random_obstacles(self, objects, img_dirs, num, area, start_area, texture=False):
217 |         sample_index = np.random.choice(len(objects), num, replace=False)
218 |         for id in sample_index:
219 |             obstacle = objects[id]
220 |             self.obstacles.append(obstacle)
221 |             # texture
222 |             if texture:
223 |                 img_dir = img_dirs[np.random.randint(0, len(img_dirs))]
224 |                 self.set_texture(obstacle, (1, 1, 1), np.random.uniform(0, 1, 3), img_dir, np.random.randint(1, 4))
225 |             # scale
226 |             # self.set_obj_scale(obstacle, np.random.uniform(0.3, 3, 3))
227 |             self.set_obj_scale(obstacle, np.random.uniform(0.5, 4, 3))
228 | 
229 |             # location
230 |             obstacle_loc = [start_area[0], start_area[2], 0]
231 |             while start_area[0] <= obstacle_loc[0] <= start_area[1] and start_area[2] <= obstacle_loc[1] <= start_area[3]:
232 |                 obstacle_loc[0] = np.random.uniform(area[0]+100, area[1]-100)
233 |                 obstacle_loc[1] = np.random.uniform(area[2]+100, area[3]-100)
234 |                 obstacle_loc[2] = np.random.uniform(area[4], area[5]) -100
235 |             self.set_obj_location(obstacle, obstacle_loc)
236 |             time.sleep(0.01)
237 | 
238 |     def clean_obstacles(self):
239 |         for obj in self.obstacles:
240 |             self.set_obj_location(obj, self.objects_dict[obj])
241 |         self.obstacles = []
242 | 
243 |     def new_obj(self, obj_class_name, obj_name, loc, rot=[0, 0, 0]):
244 |         # spawn, set obj pose, enable physics
245 |         [x, y, z] = loc
246 |         [pitch, yaw, roll] = rot
247 |         if obj_class_name =="bp_character_C" or obj_class_name =="target_C":
248 |             cmd = [f'vset /objects/spawn {obj_class_name} {obj_name}',
249 |                    f'vset /object/{obj_name}/location {x} {y} {z}',
250 |                    f'vset /object/{obj_name}/rotation {pitch} {yaw} {roll}',
251 |                    f'vbp {obj_name} set_phy 0'
252 |                    ]
253 |         else:
254 |             cmd = [f'vset /objects/spawn {obj_class_name} {obj_name}',
255 |                    f'vset /object/{obj_name}/location {x} {y} {z}',
256 |                    f'vset /object/{obj_name}/rotation {pitch} {yaw} {roll}',
257 |                    f'vbp {obj_name} set_phy 1'
258 |                    ]
259 |         self.client.request(cmd, -1)
260 |         return obj_name
261 | 
262 |     def set_cam(self, obj, loc=[0, 30, 70], rot=[0, 0, 0], return_cmd=False):
263 |         # set the camera pose relative to a actor
264 |         x, y, z = loc
265 |         roll, pitch, yaw = rot
266 |         cmd = f'vbp {obj} set_cam {x} {y} {z} {roll} {pitch} {yaw}'
267 |         if return_cmd:
268 |             return cmd
269 |         res = self.client.request(cmd, -1)
270 |         return res
271 | 
272 |     def adjust_fov(self, cam_id, delta_fov, min_max=[45, 135]):  # increase/decrease fov
273 |         return self.set_fov(cam_id, np.clip(self.cam[cam_id]['fov']+delta_fov, min_max[0], min_max[1]))
274 | 
275 |     def stop_car(self, obj):
276 |         cmd = f'vbp {obj} set_stop'
277 |         res = self.client.request(cmd, -1)
278 |         return res
279 | 
280 |     def nav_to_goal(self, obj, loc): # navigate the agent to a goal location
281 |         # Assign the agent a navigation goal, and use Navmesh to automatically control its movement to reach the goal via the shortest path.
282 |         # The goal should be reachable in the environment.
283 |         x, y, z = loc
284 |         cmd = f'vbp {obj} nav_to_goal {x} {y} {z}'
285 |         res = self.client.request(cmd, -1)
286 |         return res
287 | 
288 |     def nav_to_goal_bypath(self, obj, loc): # navigate the agent to a goal location
289 |         # Assign the agent a navigation goal, and use Navmesh to automatically control its movement to reach the goal via the shortest path.
290 |         # The goal should be reachable in the environment.
291 |         x, y, z = loc
292 |         cmd = f'vbp {obj} nav_to_goal {x} {y} {z}'
293 |         res = self.client.request(cmd, -1)
294 |         return res
295 | 
296 |     def nav_to_random(self, obj, radius, loop): # navigate the agent to a random location
297 |         # Agent randomly selects a point within its own radius range for navigation.
298 |         # The loop parameter controls whether continuous navigation is performed.（True for continuous navigation).
299 |         # Return with the randomly sampled location.
300 |         cmd = f'vbp {obj} nav_random {radius} {loop}'
301 |         res = self.client.request(cmd)
302 |         return res
303 | 
304 |     def nav_to_obj(self, obj, target_obj, distance=200): # navigate the agent to a target object
305 |         # Assign the agent a navigation goal, and use Navmesh to automatically control its movement to reach the goal via the shortest path.
306 |         cmd = f'vbp {obj} nav_to_target {target_obj} {distance}'
307 |         res = self.client.request(cmd, -1)
308 |         return res
309 | 
310 |     def nav_random(self, player, radius, loop): # navigate the agent to a random location
311 |         # Agent randomly selects a point within its own radius range for navigation.
312 |         # The loop parameter controls whether continuous navigation is performed.（True for continuous navigation).
313 |         # Return with the randomly sampled location.
314 |         cmd = f'vbp {player} nav_random {radius} {loop}'
315 |         res = self.client.request(cmd)
316 |         return self.decoder.string2vector(res)
317 | 
318 |     def generate_nav_goal(self, player, radius_max,radius_min):  # navigate the agent to a random location
319 |         # Agent randomly selects a point within its own radius range for navigation.
320 |         # The loop parameter controls whether continuous navigation is performed.（True for continuous navigation).
321 |         # Return with the randomly sampled location.
322 |         cmd = f'vbp {player} generate_nav_goal {radius_max} {radius_min} '
323 |         res = self.client.request(cmd)
324 |         answer_dict = json.loads(res)
325 |         try:
326 |             loc = answer_dict["nav_goal"]
327 |         except:
328 |             loc = answer_dict["Nav_goal"]
329 |         coordinates = re.findall(r"[-+]?\d*\.\d+|\d+", loc)
330 |         # Convert the numbers to floats and store them in an array
331 |         coordinates = [float(coordinate) for coordinate in coordinates]
332 |         return coordinates[0],coordinates[1],coordinates[2]
333 | 
334 |     def set_max_nav_speed(self, obj, max_vel): # set the maximum navigation speed of the car
335 |         cmd = f'vbp {obj} set_nav_speed {max_vel}'
336 |         res = self.client.request(cmd, -1)
337 |         return res
338 | 
339 |     def enter_exit_car(self, obj, player_index):
340 |         # enter or exit the car for a player.
341 |         # If the player is already in the car, it will exit the car. Otherwise, it will enter the car.
342 |         cmd = f'vbp {obj} enter_exit_car {player_index}'
343 |         res = self.client.request(cmd, -1)
344 |         return res
345 | 
346 |     def set_open_door(self, player, return_cmd=False):
347 |         # state: 0 close, 1 open
348 |         cmd = f'vbp {player} set_open_door 1'
349 |         if return_cmd:
350 |             return cmd
351 |         else:
352 |             self.client.request(cmd, -1)
353 |     def carry_body(self,player,return_cmd=False):
354 |         cmd = f'vbp {player} carry_body'
355 |         if return_cmd:
356 |             return cmd
357 |         else:
358 |             self.client.request(cmd, -1)
359 | 
360 |     def drop_body(self,player, return_cmd=False):
361 |         cmd = f'vbp {player} drop_body'
362 |         if return_cmd:
363 |             return cmd
364 |         else:
365 |             self.client.request(cmd, -1)
366 | 
367 |     def Is_picked(self,player, return_cmd = False):
368 |         # check if the player is picked up
369 |         cmd = f'vbp {player} is_picked'
370 |         if return_cmd:
371 |             return cmd
372 |         else:
373 |             res = self.client.request(cmd)
374 |             if '1' in res:
375 |                 return True
376 |             if '0' in res:
377 |                 return False
378 |     def is_carrying(self,player, return_cmd = False):
379 |         # check if the player is carrying a body
380 |         cmd = f'vbp {player} is_carrying'
381 |         if return_cmd:
382 |             return cmd
383 |         else:
384 |             res = self.client.request(cmd)
385 |             if '1' in res:
386 |                 return True
387 |             if '0' in res:
388 |                 return False
389 | 
390 |     def set_viewport(self, player):
391 |         # set the game window to the player's view
392 |         cmd = f'vbp {player} set_viewport'
393 |         res = self.client.request(cmd, -1)
394 |         return res
395 | 
396 |     def get_pose_img_batch(self, objs_list, cam_ids, img_flag=[False, True, False, False]):
397 |         # get pose and image of objects in objs_list from cameras in cam_ids
398 |         cmd_list = []
399 |         decoder_list = []
400 |         [use_cam_pose, use_color, use_mask, use_depth] = img_flag
401 |         for obj in objs_list:
402 |             cmd_list.extend([self.get_obj_location(obj, True),
403 |                              self.get_obj_rotation(obj, True)])
404 | 
405 |         for cam_id in cam_ids:
406 |             if cam_id < 0:
407 |                 continue
408 |             if use_cam_pose:
409 |                 cmd_list.extend([self.get_cam_location(cam_id, return_cmd=True),
410 |                                  self.get_cam_rotation(cam_id, return_cmd=True)])
411 |             if use_color:
412 |                 cmd_list.append(self.get_image(cam_id, 'lit', 'bmp', return_cmd=True))
413 |             if use_mask:
414 |                 cmd_list.append(self.get_image(cam_id, 'object_mask', 'bmp', return_cmd=True))
415 |             if use_depth:
416 |                 cmd_list.append(f'vget /camera/{cam_id}/depth npy')
417 |                 # cmd_list.append(self.get_image(cam_id, 'depth', 'bmp', return_cmd=True))
418 | 
419 |         decoders = [self.decoder.decode_map[self.decoder.cmd2key(cmd)] for cmd in cmd_list]
420 |         res_list = self.batch_cmd(cmd_list, decoders)
421 |         obj_pose_list = []
422 |         cam_pose_list = []
423 |         img_list = []
424 |         mask_list = []
425 |         depth_list = []
426 |         # start to read results
427 |         start_point = 0
428 |         for i, obj in enumerate(objs_list):
429 |             obj_pose_list.append(res_list[start_point] + res_list[start_point+1])
430 |             start_point += 2
431 |         for i, cam_id in enumerate(cam_ids):
432 |             # print(cam_id)
433 |             if cam_id < 0:
434 |                 img_list.append(np.zeros((self.resolution[1], self.resolution[0], 3), dtype=np.uint8))
435 |                 continue
436 |             if use_cam_pose:
437 |                 cam_pose_list.append(res_list[start_point] + res_list[start_point+1])
438 |                 start_point += 2
439 |             if use_color:
440 |                 # image = self.decoder.decode_bmp(res_list[start_point])
441 |                 img_list.append(res_list[start_point])
442 |                 start_point += 1
443 |             if use_mask:
444 |                 # image = self.decoder.decode_bmp(res_list[start_point])
445 |                 mask_list.append(res_list[start_point])
446 |                 start_point += 1
447 |             if use_depth:
448 |                 # image = 1 / self.decoder.decode_depth(res_list[start_point],bytesio=False)
449 |                 # image = self.decoder.decode_depth(res_list[start_point],bytesio=False)
450 |                 image = self.get_depth(cam_id,show=False)
451 |                 # image = np.expand_dims(image, axis=-1)
452 |                 depth_list.append(image)  # 500 is the default max depth of most depth cameras
453 |                 # depth_list.append(res_list[start_point])  # 500 is the default max depth of most depth cameras
454 |                 start_point += 1
455 | 
456 |         return obj_pose_list, cam_pose_list, img_list, mask_list, depth_list
457 | 
458 |     # Domain Randomization Functions: randomize texture
459 |     def set_texture(self, player, color=(1, 1, 1), param=(0, 0, 0), picpath=None, tiling=1, e_num=0): #[r, g, b, meta, spec, rough, tiling, picpath]
460 |         param = param / param.max()
461 |         r, g, b = color
462 |         meta, spec, rough = param
463 |         cmd = f'vbp {player} set_mat {e_num} {r} {g} {b} {meta} {spec} {rough} {tiling} {picpath}'
464 |         self.client.request(cmd, -1)
465 | 
466 |     def set_light(self, obj, direction, intensity, color): # param num out of range
467 |         [roll, yaw, pitch] = direction
468 |         color = color / color.max()
469 |         [r, g, b] = color
470 |         cmd = f'vbp {obj} set_light {roll} {yaw} {pitch} {intensity} {r} {g} {b}'
471 |         self.client.request(cmd, -1)
472 | 
473 |     def random_texture(self, backgrounds, img_dirs, num=5):
474 |         if num < 0:
475 |             sample_index = range(len(backgrounds))
476 |         else:
477 |             sample_index = np.random.choice(len(backgrounds), num, replace=False)
478 |         for id in sample_index:
479 |             obj = backgrounds[id]
480 |             img_dir = img_dirs[np.random.randint(0, len(img_dirs))]
481 |             self.set_texture(obj, (1, 1, 1), np.random.uniform(0, 1, 3), img_dir, np.random.randint(1, 4))
482 |             time.sleep(0.03)
483 | 
484 |     def random_player_texture(self, player, img_dirs, num):
485 |         sample_index = np.random.choice(5, num)
486 |         for id in sample_index:
487 |             img_dir = img_dirs[np.random.randint(0, len(img_dirs))]
488 |             self.set_texture(player, (1, 1, 1), np.random.uniform(0, 1, 3),
489 |                              img_dir, np.random.randint(2, 6), id)
490 |             time.sleep(0.03)
491 | 
492 |     def random_character(self, player):  # appearance, speed, acceleration
493 |         self.set_max_speed(player, np.random.randint(40, 100))
494 |         self.set_acceleration(player, np.random.randint(100, 300))
495 | 
496 |     def random_lit(self, light_list):
497 |         for lit in light_list:
498 |             if 'sky' in lit:
499 |                 self.set_skylight(lit, [1, 1, 1], np.random.uniform(1, 10))
500 |             else:
501 |                 lit_direction = np.random.uniform(-1, 1, 3)
502 |                 if 'directional' in lit:
503 |                     lit_direction[0] = lit_direction[0] * 60
504 |                     lit_direction[1] = lit_direction[1] * 80
505 |                     lit_direction[2] = lit_direction[2] * 60
506 |                 else:
507 |                     lit_direction *= 180
508 |                 self.set_light(lit, lit_direction, np.random.uniform(1, 5), np.random.uniform(0.3, 1, 3))
509 | 
510 |     def set_skylight(self, obj, color, intensity): # param num out of range
511 |         [r, g, b] = color
512 |         cmd = f'vbp {obj} set_light {r} {g} {b} {intensity}'
513 |         self.client.request(cmd, -1)
514 | 
515 |     def get_obj_speed(self,obj):
516 |         cmd = f'vbp {obj} get_speed'
517 |         res = self.client.request(cmd)
518 |         answer_dict = json.loads(res)
519 |         speed = float(answer_dict["Speed"])
520 |         return speed
521 | 
522 |     def check_visibility(self, tracker_cam_id, target_obj):
523 |         # get the percentage of the target object visible in the tracker camera
524 |         mask = self.get_image(tracker_cam_id, 'object_mask')
525 |         print(mask.shape)
526 |         mask, bbox = self.get_bbox(mask, target_obj, normalize=False)
527 |         mask_percent = mask.sum()/(mask.shape[0]*mask.shape[1])
528 |         return mask_percent


--------------------------------------------------------------------------------
/gym_rescue/envs/base_env.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import warnings
  3 | import gym
  4 | import numpy as np
  5 | from gym import spaces
  6 | from gym_rescue.envs.utils import misc
  7 | from unrealcv.launcher import RunUnreal
  8 | from gym_rescue.envs.agent.character import Character_API
  9 | import random
 10 | import sys
 11 | import time
 12 | import warnings
 13 | import cv2
 14 | # warnings.filterwarnings("ignore", category=UserWarning)
 15 | ''' 
 16 | It is a base env for general purpose agent-env interaction, including single/multi-agent navigation, tracking, etc.
 17 | Observation : raw color image and depth
 18 | Action:  Discrete/Continuous
 19 | Done : define by the task wrapper
 20 | '''
 21 | 
 22 | class UnrealCv_base(gym.Env):
 23 |     def __init__(self,
 24 |                  setting_file,  # the setting file to define the task
 25 |                  action_type='Discrete',  # 'discrete', 'continuous'
 26 |                  observation_type='Color',  # 'color', 'depth', 'rgbd', 'Gray'
 27 |                  resolution=(160, 160),
 28 |                  reset_type=0
 29 |                  ):
 30 | 
 31 |         setting = misc.load_env_setting(setting_file)
 32 |         self.env_name = setting['env_name']
 33 |         # self.max_steps = setting['max_steps']
 34 |         self.height = setting['height']
 35 |         self.cam_id = [setting['third_cam']['cam_id']]
 36 |         self.agent_configs = setting['agents']
 37 |         self.env_configs = setting["env"]
 38 |         self.agents = misc.convert_dict(self.agent_configs)
 39 |         self.reset_type=reset_type
 40 |         # TODO: it is useless.
 41 |         self.character = {
 42 |             'player': [], # the list of player to control
 43 |             'npc': [], # the list of Non-player character
 44 |             'freeze': [], # the list of player that exists in the scene, but it is frozen
 45 |         }
 46 | 
 47 |         self.height_top_view = setting['third_cam']['height_top_view']
 48 | 
 49 |         # self.env_obj_list = self.env_configs[""]
 50 |         self.objects_list = []
 51 |         self.reset_area = setting['reset_area']
 52 | 
 53 |         self.safe_start = setting['safe_start']
 54 |         self.interval = setting['interval']
 55 |         self.random_init = setting['random_init']
 56 |         self.start_area = self.get_start_area(self.safe_start[0], 500)
 57 |         # the start area of the agent, where we don't put obstacles when using environment augmentation
 58 |         self.count_eps = 0
 59 |         self.count_steps = 0
 60 | 
 61 |         # env configs
 62 |         self.docker = False
 63 |         self.resolution = resolution
 64 |         self.display = None
 65 |         self.use_opengl = False
 66 |         self.offscreen_rendering = False
 67 |         self.nullrhi = False
 68 |         self.gpu_id = None  # None means using the default gpu
 69 |         self.sleep_time = 5
 70 |         self.launched = False
 71 |         self.comm_mode = 'tcp' # 'tcp' or 'unix', 'unix' is only available in Linux
 72 |         self.render_quality = 2
 73 |         self.use_lumen = False
 74 | 
 75 |         self.agents_category = ['player', 'animal'] # the agent category we use in the env
 76 |         self.protagonist_id = 0 # the player index that we use to interact with the env
 77 | 
 78 |         # init agents
 79 |         self.player_list = list(self.agents.keys())
 80 |         self.cam_list = [self.agents[player]['cam_id'] for player in self.player_list]
 81 | 
 82 |         # define action space
 83 |         self.action_type = action_type
 84 |         assert self.action_type in ['Discrete', 'Continuous', 'Mixed']
 85 |         # self.action_space = [self.define_action_space(self.action_type, self.agents[obj]) for obj in self.player_list]
 86 |         self.action_space = self.define_action_space(self.action_type,self.agents[self.player_list[self.protagonist_id]])
 87 |         # define observation space,
 88 |         # color, depth, rgbd,...
 89 |         self.observation_type = observation_type
 90 |         assert self.observation_type in ['Color', 'Depth', 'Rgbd', 'Gray', 'CG', 'Mask', 'Pose','MaskDepth','ColorMask']
 91 |         # self.observation_space = [self.define_observation_space(self.cam_list[i], self.observation_type, resolution)
 92 |         #                           for i in range(len(self.player_list))]
 93 |         self.observation_space = self.define_observation_space(self.cam_list[self.protagonist_id], self.observation_type, resolution)
 94 |         self.goal_show = np.zeros((resolution[1], resolution[0], 3), dtype=np.uint8)
 95 |         # config unreal env
 96 |         if 'linux' in sys.platform:
 97 |             env_bin = setting['env_bin']["Linux"]
 98 |         elif 'darwin' in sys.platform:
 99 |             env_bin = setting['env_bin']["Mac"]
100 |         elif 'win' in sys.platform:
101 |             env_bin = setting['env_bin']["Win"]
102 | 
103 |         if 'env_map' in setting.keys():
104 |             self.env_map = setting['env_map']
105 |         else:
106 |             self.env_map = None
107 | 
108 |         self.ue_binary = RunUnreal(ENV_BIN=env_bin, ENV_MAP=self.env_map) # the binary file to launch the UE5
109 |         self.info = dict(
110 |             Collision=0,
111 |             termination=False,
112 |             truncation=False,
113 |             Reward=0.0,
114 |             Action=None,
115 |             Pose=[],
116 |             Steps=0,
117 |             Direction=None,
118 |             Distance=None,
119 |             Color=None,
120 |             Depth=None,
121 |             Relative_Pose=[],
122 |             picked = False,
123 |             reference_text=[],
124 |             reference_image=[],
125 |         )
126 |     def step(self, actions):
127 | 
128 |         actions2move, actions2turn, actions2animate = self.action_mapping(actions, self.player_list)
129 |         if self.info['picked'] and actions2animate[self.protagonist_id] == 'jump':
130 |             # the agent is holding the object, it can't jump
131 |             actions2animate[self.protagonist_id] = 'stand'
132 | 
133 |         move_cmds = [self.unrealcv.set_move_bp(obj, actions2move[i], return_cmd=True) for i, obj in enumerate(self.player_list) if actions2move[i] is not None]
134 |         head_cmds = [self.unrealcv.set_cam(obj, self.agents[obj]['relative_location'], actions2turn[i], return_cmd=True) for i, obj in enumerate(self.player_list) if actions2turn[i] is not None]
135 |         anim_cmds = [self.unrealcv.set_animation(obj, actions2animate[i], return_cmd=True) for i, obj in enumerate(self.player_list) if actions2animate[i] is not None]
136 |         if actions2animate[self.protagonist_id] == 'carry' or actions2animate[self.protagonist_id] == 'drop':
137 |             time.sleep(1)
138 |         self.unrealcv.batch_cmd(move_cmds+head_cmds+anim_cmds, None)
139 |         self.count_steps += 1
140 | 
141 |         # get states (multi agent)
142 |         obj_poses, cam_poses, imgs, masks, depths = self.unrealcv.get_pose_img_batch(self.player_list, self.cam_list, self.cam_flag)
143 |         self.obj_poses = obj_poses
144 |         observations = self.prepare_observation(self.observation_type, imgs, masks, depths, obj_poses)
145 |         self.img_show = self.prepare_img2show(self.protagonist_id, observations)
146 |         pose_obs, relative_pose = self.get_pose_states(obj_poses)
147 | 
148 |         # prepare the info
149 |         self.info['Pose'] = obj_poses
150 |         self.info['Relative_Pose'] = relative_pose
151 |         self.info['Pose_Obs'] = pose_obs
152 |         self.info['Reward'] = 0
153 | 
154 |         return observations[0], self.info['Reward'], self.info['termination'], self.info['truncation'],self.info
155 | 
156 |     def reset(self):
157 | 
158 |         if not self.launched:  # first time to launch
159 |             self.launched = self.launch_ue_env() # launch the UE4 binary and connect to UnrealCV
160 |             self.init_agents() # init agents
161 |             self.init_objects() # init objects
162 | 
163 |         self.count_close = 0
164 |         self.count_steps = 0
165 |         self.count_eps += 1
166 | 
167 |         # stop move and disable physics
168 |         for i, obj in enumerate(self.player_list):
169 |             if self.agents[obj]['agent_type'] in self.agents_category:
170 |                 if not self.agents[obj]['internal_nav']:
171 |                     # self.unrealcv.set_move_bp(obj, [0, 100])
172 |                     # self.unrealcv.set_max_speed(obj, 100)
173 |                     continue
174 |                     # self.unrealcv.set_phy(obj, 1)
175 |             elif self.agents[obj]['agent_type'] == 'drone':
176 |                 self.unrealcv.set_move_bp(obj, [0, 0, 0, 0])
177 |                 self.unrealcv.set_phy(obj, 0)
178 | 
179 |         # reset target location
180 |         if self.random_init:
181 |             init_poses = self.sample_init_pose(self.random_init, len(self.player_list))
182 |             for i, obj in enumerate(self.player_list):
183 |                 self.unrealcv.set_obj_location(obj, init_poses[i])
184 |             # set view point
185 |                 self.unrealcv.set_cam(obj, self.agents[obj]['relative_location'], self.agents[obj]['relative_rotation'])
186 |         # get state
187 |         observations, self.obj_poses, self.img_show = self.update_observation(self.player_list, self.cam_list, self.cam_flag, self.observation_type)
188 | 
189 |         #latest gym version support return info in reset()
190 |         #return info
191 |         self.info['pose']=self.obj_poses
192 |         self.info['Steps']=self.count_steps
193 |         self.info['termination']=False
194 |         self.info['picked']=False
195 | 
196 |         return observations[0], self.info
197 | 
198 |     def close(self):
199 |         if self.launched:
200 |             self.unrealcv.client.disconnect()
201 |             self.ue_binary.close()
202 |             self.launched = False
203 | 
204 |     def render(self, mode='obs', show=True,save=None):
205 |         if mode == 'obs':
206 |             frame = self.img_show
207 |         elif mode =='ref':
208 |             frame = self.goal_show
209 |         else: # ref+obs
210 |             self.goal_show = cv2.resize(self.goal_show, (int(1.5*self.resolution[1]), self.resolution[1]))
211 |             frame = cv2.hconcat((self.goal_show, self.img_show))
212 |         if save is not None:
213 |             self.goal_show = cv2.resize(self.goal_show, (int(1.5*self.resolution[1]), self.resolution[1]))
214 |             frame = cv2.hconcat((self.goal_show, self.img_show))
215 |             save.write(frame)
216 |         if show:
217 |             cv2.imshow('frame', frame)
218 |             cv2.waitKey(1)
219 |         return frame,save
220 | 
221 |     def seed(self, seed=None):
222 |         np.random.seed(seed)
223 |         # if seed is not None:
224 |         #     self.player_num = seed % (self.max_player_num-2) + 2
225 | 
226 |     def update_observation(self, player_list, cam_list, cam_flag, observation_type):
227 |         obj_poses, cam_poses, imgs, masks, depths = self.unrealcv.get_pose_img_batch(player_list, cam_list, cam_flag)
228 |         observations = self.prepare_observation(observation_type, imgs, masks, depths, obj_poses)
229 |         img_show = self.prepare_img2show(self.protagonist_id, observations)
230 |         return observations, obj_poses, img_show
231 | 
232 |     def get_start_area(self, safe_start, safe_range):
233 |         start_area = [safe_start[0]-safe_range, safe_start[0]+safe_range,
234 |                      safe_start[1]-safe_range, safe_start[1]+safe_range]
235 |         return start_area
236 | 
237 |     def set_topview(self, current_pose, cam_id):
238 |         cam_loc = current_pose[:3]
239 |         cam_loc[-1] = self.height_top_view
240 |         cam_rot = [-90, 0, 0]
241 |         self.unrealcv.set_cam_location(cam_id, cam_loc)
242 |         self.unrealcv.set_cam_rotation(cam_id, cam_rot)
243 | 
244 |     def get_relative(self, pose0, pose1):  # pose0-centric
245 |         delt_yaw = pose1[4] - pose0[4]
246 |         angle = misc.get_direction(pose0, pose1)
247 |         distance = self.unrealcv.get_distance(pose1, pose0, 3)
248 |         # distance_norm = distance / self.exp_distance
249 |         obs_vector = [np.sin(delt_yaw/180*np.pi), np.cos(delt_yaw/180*np.pi),
250 |                       np.sin(angle/180*np.pi), np.cos(angle/180*np.pi),
251 |                       distance]
252 |         return obs_vector, distance, angle
253 | 
254 |     def prepare_observation(self, observation_type, img_list, mask_list, depth_list, pose_list):
255 |         if observation_type == 'Depth':
256 |             return np.array(depth_list)
257 |         elif observation_type == 'Mask':
258 |             return np.array(mask_list)
259 |         elif observation_type == 'Color':
260 |             return np.array(img_list)
261 |         elif observation_type == 'Rgbd':
262 |             return np.append(np.array(img_list), np.array(depth_list), axis=-1)
263 |         elif observation_type == 'Pose':
264 |             return np.array(pose_list)
265 |         elif observation_type == 'MaskDepth':
266 |             return np.append(np.array(mask_list), np.array(depth_list), axis=-1)
267 |         elif observation_type =='ColorMask':
268 |             return np.append(np.array(img_list), np.array(mask_list), axis=-1)
269 | 
270 |     def relative_metrics(self, relative_pose):
271 |         # compute the relative relation (collision, in-the-view, misleading) among agents for rewards and evaluation metrics
272 |         info = dict()
273 |         relative_dis = relative_pose[:, :, 0]
274 |         relative_ori = relative_pose[:, :, 1]
275 |         collision_mat = np.zeros_like(relative_dis)
276 |         collision_mat[np.where(relative_dis < 100)] = 1
277 |         collision_mat[np.where(np.fabs(relative_ori) > 45)] = 0  # collision should be at the front view
278 |         info['collision'] = collision_mat
279 |         info['dis_ave'] = relative_dis.mean() # average distance among players, regard as a kind of density metric
280 | 
281 |         return info
282 | 
283 |     def add_agent(self, name, loc, refer_agent):
284 |         new_dict = refer_agent.copy()
285 |         cam_num = self.unrealcv.get_camera_num()
286 |         self.unrealcv.new_obj(refer_agent['class_name'], name, random.sample(self.safe_start, 1)[0])
287 |         self.player_list.append(name)
288 |         if self.unrealcv.get_camera_num() > cam_num:
289 |             new_dict['cam_id'] = cam_num
290 |         else:
291 |             new_dict['cam_id'] = -1
292 |         self.cam_list.append(new_dict['cam_id'])
293 |         self.unrealcv.set_obj_scale(name, refer_agent['scale'])
294 |         self.unrealcv.set_obj_color(name, np.random.randint(0, 255, 3))
295 |         self.unrealcv.set_random(name, 0)
296 |         self.unrealcv.set_interval(self.interval, name)
297 |         self.unrealcv.set_obj_location(name, loc)
298 |         self.unrealcv.set_phy(name, 0)
299 |         return new_dict
300 | 
301 |     def remove_agent(self, name):
302 |         # print(f'remove {name}')
303 |         agent_index = self.player_list.index(name)
304 |         self.player_list.remove(name)
305 |         last_cam_list = self.cam_list
306 |         self.cam_list = self.remove_cam(name)
307 |         self.action_space.pop(agent_index)
308 |         self.observation_space.pop(agent_index)
309 |         self.unrealcv.destroy_obj(name)  # the agent is removed from the scene
310 |         self.agents.pop(name)
311 |         st_time = time.time()
312 |         while self.unrealcv.get_camera_num() > len(last_cam_list) + 1:
313 |             if (time.time() - st_time) > 10:
314 |                 print('remove agent timeout')
315 |                 break
316 |             pass
317 | 
318 |     def remove_cam(self, name):
319 |         cam_id = self.agents[name]['cam_id']
320 |         cam_list = []
321 |         for obj in self.player_list:
322 |             if self.agents[obj]['cam_id'] > cam_id and cam_id > 0:
323 |                 self.agents[obj]['cam_id'] -= 1
324 |             cam_list.append(self.agents[obj]['cam_id'])
325 |         return cam_list
326 | 
327 |     def define_action_space(self, action_type, agent_info):
328 |         if action_type == 'Discrete':
329 |             return spaces.Discrete(len(agent_info["move_action"]))
330 |         elif action_type == 'Continuous':
331 |             return spaces.Box(low=np.array(agent_info["move_action_continuous"]['low']),
332 |                               high=np.array(agent_info["move_action_continuous"]['high']), dtype=np.float32)
333 |         else:  # Hybrid
334 |             move_space = spaces.Box(low=np.array(agent_info["move_action_continuous"]['low']),
335 |                                     high=np.array(agent_info["move_action_continuous"]['high']), dtype=np.float32)
336 |             turn_space = None
337 |             animation_space = None
338 |             if "head_action" in agent_info.keys():
339 |                 turn_space = spaces.Discrete(len(agent_info["head_action"]))
340 |             if "animation_action" in agent_info.keys():
341 |                 animation_space = spaces.Discrete(len(agent_info["animation_action"]))
342 |             return spaces.Tuple((move_space, turn_space, animation_space))
343 | 
344 |     def define_observation_space(self, cam_id, observation_type, resolution=(160, 120)):
345 |         if observation_type == 'Pose' or cam_id < 0:
346 |             observation_space = spaces.Box(low=-100, high=100, shape=(6,),
347 |                                                dtype=np.float16)  # TODO check the range and shape
348 |         else:
349 |             if observation_type == 'Color' or observation_type == 'CG' or observation_type == 'Mask':
350 |                 img_shape = (resolution[1], resolution[0], 3)
351 |                 observation_space = spaces.Box(low=0, high=255, shape=img_shape, dtype=np.uint8)
352 |             elif observation_type == 'Depth':
353 |                 img_shape = (resolution[1], resolution[0], 1)
354 |                 observation_space = spaces.Box(low=0, high=100, shape=img_shape, dtype=np.float16)
355 |             elif observation_type == 'Rgbd':
356 |                 s_low = np.zeros((resolution[1], resolution[0], 4))
357 |                 s_high = np.ones((resolution[1], resolution[0], 4))
358 |                 s_high[:, :, -1] = 100.0  # max_depth
359 |                 s_high[:, :, :-1] = 255  # max_rgb
360 |                 observation_space = spaces.Box(low=s_low, high=s_high, dtype=np.float16)
361 |             elif observation_type == 'MaskDepth':
362 |                 s_low = np.zeros((resolution[1], resolution[0], 4))
363 |                 s_high = np.ones((resolution[1], resolution[0], 4))
364 |                 s_high[:, :, -1] = 100.0  # max_depth
365 |                 s_high[:, :, :-1] = 255  # max_rgb
366 |                 observation_space = spaces.Box(low=s_low, high=s_high, dtype=np.float16)
367 |             elif observation_type=='ColorMask':
368 |                 img_shape = (resolution[1], resolution[0], 6)
369 |                 observation_space = spaces.Box(low=0, high=255, shape=img_shape, dtype=np.uint8)
370 |         return observation_space
371 | 
372 |     def sample_init_pose(self, use_reset_area=False, num_agents=1):
373 |         # sample poses to reset the agents
374 |         if num_agents > len(self.safe_start):
375 |             use_reset_area = True
376 |             warnings.warn('The number of agents is less than the number of pre-defined start points, random sample points from the pre-defined area instead.')
377 |         if use_reset_area:
378 |             locations = self.sample_from_area(self.reset_area, num_agents)  # sample from a pre-defined area
379 |         else:
380 |             locations = random.sample(self.safe_start, num_agents) # sample one pre-defined start point
381 |         return locations
382 | 
383 |     def random_app(self):
384 |         app_map = {
385 |             'player': range(1, 19),
386 |             'animal': range(0, 27),
387 |         }
388 |         for obj in self.player_list:
389 |             category = self.agents[obj]['agent_type']
390 |             if category not in app_map.keys():
391 |                 continue
392 |             app_id = np.random.choice(app_map[category])
393 |             self.unrealcv.set_appearance(obj, app_id)
394 | 
395 |     def environment_augmentation(self, player_mesh=False, player_texture=False,
396 |                                  light=False, background_texture=False,
397 |                                  layout=False, layout_texture=False):
398 |         app_map = {
399 |             'player': range(1, 19),
400 |             'animal': range(0, 27),
401 |         }
402 |         if player_mesh:  # random human mesh
403 |             for obj in self.player_list:
404 |                 app_id = np.random.choice(app_map[self.agents[obj]['agent_type']])
405 |                 self.unrealcv.set_appearance(obj, app_id)
406 |         # random light and texture of the agents
407 |         if player_texture:
408 |             if self.env_name == 'FlexibleRoom':  # random target texture
409 |                 for obj in self.player_list:
410 |                     if self.agents[obj]['agent_type'] == 'player':
411 |                         self.unrealcv.random_player_texture(obj, self.textures_list, 3)
412 |         if light:
413 |             self.unrealcv.random_lit(self.env_configs["lights"])
414 | 
415 |         # random the texture of the background
416 |         if background_texture:
417 |             self.unrealcv.random_texture(self.env_configs["backgrounds"], self.textures_list, 3)
418 | 
419 |         # random place the obstacle
420 |         if layout:
421 |             self.unrealcv.clean_obstacles()
422 |             self.unrealcv.random_obstacles(self.objects_list, self.textures_list,
423 |                                            15, self.reset_area, self.start_area, layout_texture)
424 | 
425 |     def get_pose_states(self, obj_pos):
426 |         # get the relative pose of each agent and the absolute location and orientation of the agent
427 |         pose_obs = []
428 |         player_num = len(obj_pos)
429 |         np.zeros((player_num, player_num, 2))
430 |         relative_pose = np.zeros((player_num, player_num, 2))
431 |         for j in range(player_num):
432 |             vectors = []
433 |             for i in range(player_num):
434 |                 obs, distance, direction = self.get_relative(obj_pos[j], obj_pos[i])
435 |                 yaw = obj_pos[j][4]/180*np.pi
436 |                 # rescale the absolute location and orientation
437 |                 abs_loc = [obj_pos[i][0], obj_pos[i][1],
438 |                            obj_pos[i][2], np.cos(yaw), np.sin(yaw)]
439 |                 obs = obs + abs_loc
440 |                 vectors.append(obs)
441 |                 relative_pose[j, i] = np.array([distance, direction])
442 |             pose_obs.append(vectors)
443 | 
444 |         return np.array(pose_obs), relative_pose
445 | 
446 |     def launch_ue_env(self):
447 |         # launch the UE4 binary and connect to UnrealCV
448 |         env_ip, env_port = self.ue_binary.start(docker=self.docker, resolution=self.resolution, display=self.display,
449 |                                                opengl=self.use_opengl, offscreen=self.offscreen_rendering,
450 |                                                nullrhi=self.nullrhi,sleep_time=10)
451 |         # connect UnrealCV
452 |         self.unrealcv = Character_API(port=env_port, ip=env_ip, resolution=self.resolution, comm_mode=self.comm_mode)
453 |         self.unrealcv.set_map(self.env_map)
454 |         self.unrealcv.config_ue(quality=self.render_quality, Lumen=self.use_lumen)
455 |         return True
456 | 
457 |     def init_agents(self):
458 |         for obj in self.player_list.copy(): # the agent will be fully removed in self.agents
459 |             if self.agents[obj]['agent_type'] not in self.agents_category:
460 |                 self.remove_agent(obj)
461 | 
462 |         for obj in self.player_list:
463 |             self.unrealcv.set_obj_scale(obj, self.agents[obj]['scale'])
464 |             self.unrealcv.set_random(obj, 0)
465 |             self.unrealcv.set_interval(self.interval, obj)
466 | 
467 |         self.unrealcv.build_color_dict(self.player_list)
468 |         self.cam_flag = self.get_cam_flag(self.observation_type)
469 | 
470 |     def init_objects(self):
471 |         self.unrealcv.init_objects(self.objects_list)
472 | 
473 |     def prepare_img2show(self, index, states):
474 |         if self.observation_type == 'Rgbd':
475 |             return states[index][:, :, :3]
476 |         elif self.observation_type in ['Color', 'Gray', 'CG', 'Mask']:
477 |             return states[index]
478 |         elif self.observation_type == 'Depth':
479 |             return states[index]/states[index].max()  # normalize the depth image
480 |         else:
481 |             return None
482 | 
483 |     def set_population(self, num_agents):
484 |         while len(self.player_list) < num_agents:
485 |             refer_agent = self.agents[random.choice(list(self.agents.keys()))]
486 |             name = f'{refer_agent["agent_type"]}_EP{self.count_eps}_{len(self.player_list)}'
487 |             self.agents[name] = self.add_agent(name, random.choice(self.safe_start), refer_agent)
488 |         while len(self.player_list) > num_agents:
489 |             self.remove_agent(self.player_list[-1])  # remove the last one
490 | 
491 |     def set_npc(self):
492 |         # TODO: set the NPC agent
493 |         return self.player_list.index(random.choice([x for x in self.player_list if x > 0]))
494 | 
495 |     def set_agent(self):
496 |         # the agent is controlled by the external controller
497 |         return self.cam_list.index(random.choice([x for x in self.cam_list if x > 0]))
498 | 
499 |     def action_mapping(self, actions, player_list):
500 |         actions2move = []
501 |         actions2animate = []
502 |         actions2head = []
503 |         for i, obj in enumerate(player_list):
504 |             action_space = self.action_space
505 |             act = actions[i]
506 |             if act is None:  # if the action is None, then we don't control this agent
507 |                 actions2move.append(None)  # place holder
508 |                 actions2animate.append(None)
509 |                 actions2head.append(None)
510 |                 continue
511 |             if isinstance(action_space, spaces.Discrete):
512 |                 actions2move.append(self.agents[obj]["move_action"][act])
513 |                 actions2animate.append(None)
514 |                 actions2head.append(None)
515 |             elif isinstance(action_space, spaces.Box):
516 |                 actions2move.append(act)
517 |                 actions2animate.append(None)
518 |                 actions2head.append(None)
519 |             elif isinstance(action_space, spaces.Tuple):
520 |                 for j, action in enumerate(actions[i]):
521 |                     if j == 0:
522 |                         if isinstance(action, int):
523 |                             actions2move.append(self.agents[obj]["move_action"][action])
524 |                         else:
525 |                             action = np.clip(action, self.action_space[0].low, self.action_space[0].high)
526 |                             actions2move.append(action)
527 |                     elif j == 1:
528 |                         if isinstance(action, int):
529 |                             actions2head.append(self.agents[obj]["head_action"][action])
530 |                         else:
531 |                             actions2head.append(action)
532 |                     elif j == 2:
533 |                         actions2animate.append(self.agents[obj]["animation_action"][action])
534 |         return actions2move, actions2head, actions2animate
535 | 
536 | 
537 |     def get_cam_flag(self, observation_type, use_color=False, use_mask=False, use_depth=False, use_cam_pose=False):
538 |         # get flag for camera
539 |         # observation_type: 'color', 'depth', 'mask', 'cam_pose'
540 |         flag = [False, False, False, False]
541 |         flag[0] = use_cam_pose
542 |         flag[1] = observation_type == 'Color' or observation_type == 'Rgbd' or use_color or observation_type == 'ColorMask'
543 |         flag[2] = observation_type == 'Mask' or use_mask or observation_type == 'MaskDepth' or observation_type == 'ColorMask'
544 |         flag[3] = observation_type == 'Depth' or observation_type == 'Rgbd' or use_depth or observation_type == 'MaskDepth'
545 |         return flag
546 | 
547 |     def sample_from_area(self, area, num):
548 |         x = np.random.randint(area[0], area[1], num)
549 |         y = np.random.randint(area[2], area[3], num)
550 |         z = np.random.randint(area[4], area[5], num)
551 |         return np.vstack((x, y, z)).T
552 | 
553 |     def get_startpoint(self, target_pos=[], distance=None, reset_area=[], exp_height=200, direction=None):
554 |         for i in range(5):  # searching a safe point
555 |             if direction == None:
556 |                 direction = 2 * np.pi * np.random.sample(1)
557 |             else:
558 |                 direction = direction % (2 * np.pi)
559 |             if distance == None:
560 |                 x = np.random.randint(reset_area[0], reset_area[1])
561 |                 y = np.random.randint(reset_area[2], reset_area[3])
562 |             else:
563 |                 dx = float(distance * np.cos(direction))
564 |                 dy = float(distance * np.sin(direction))
565 |                 x = dx + target_pos[0]
566 |                 y = dy + target_pos[1]
567 |             cam_pos_exp = [x, y, exp_height]
568 |             if reset_area[0] < x < reset_area[1] and reset_area[2] < y < reset_area[3]:
569 |                 cam_pos_exp[0] = x
570 |                 cam_pos_exp[1] = y
571 |                 return cam_pos_exp
572 |         return []
573 | 


--------------------------------------------------------------------------------
/gym_rescue/envs/rescue.py:
--------------------------------------------------------------------------------
  1 | # from gym_unrealcv import settings
  2 | import time
  3 | from gym_rescue.envs.base_env import UnrealCv_base
  4 | import numpy as np
  5 | import random
  6 | '''
  7 | Tasks: The task is to make the agents find the target(injured person) and rescue him. 
  8 | The agents are allowed to communicate with others.
  9 | The agents observe the environment with their own camera.
 10 | The agents are rewarded based on the distance and orientation to the target.
 11 | The episode ends when the agents reach the target(injured agent) or the maximum steps are reached.
 12 | '''
 13 | 
 14 | class Rescue(UnrealCv_base):
 15 |     def __init__(self,
 16 |                  env_file,  # the setting file to define the task
 17 |                  task_file=None,  # the file to define the task TODO: use this file to config task specific parameters
 18 |                  action_type='Discrete',  # 'discrete', 'continuous'
 19 |                  observation_type='Color',  # 'color', 'depth', 'rgbd', 'Gray'
 20 |                  resolution=(160, 160),
 21 |                  reset_type=0
 22 |                  ):
 23 |         super(Rescue, self).__init__(setting_file=env_file,  # the setting file to define the task
 24 |                                      action_type=action_type,  # 'discrete', 'continuous'
 25 |                                      observation_type=observation_type,  # 'color', 'depth', 'rgbd', 'Gray'
 26 |                                      resolution=resolution,
 27 |                                      reset_type=reset_type)
 28 |         self.count_reach = 0
 29 |         self.max_reach_steps = 5
 30 |         self.distance_threshold = 120
 31 |         self.agents_category = ['player']
 32 |         self.injured_agent = self.env_configs['injured_player'][0]
 33 |         self.stretcher=self.env_configs['stretcher'][0]
 34 |         self.ambulance = self.env_configs['ambulance'][0]
 35 | 
 36 |         self.injured_player_pose = None
 37 |         self.rescue_pose = None
 38 |         self.agent_pose = None
 39 |         self.ambulance_pose = None
 40 | 
 41 |     def step(self, action):
 42 |         obs, reward, termination, truncation, info = super(Rescue, self).step(action)
 43 |         # compute the useful metrics for rewards and done condition
 44 |         current_pose = [self.unrealcv.get_cam_pose(self.cam_list[self.protagonist_id])]
 45 |         metrics = self.rescue_metrics(current_pose, self.target_pose)
 46 |         info['picked']=metrics['picked']
 47 |         if info['picked']:
 48 |             self.target_pose = self.rescue_pose
 49 |             if self.first_picked is False:  # the first time the agent pick the injured agent
 50 |                 reward = 0.5
 51 |                 self.first_picked = True
 52 | 
 53 |         # done condition
 54 |         current_injured_player_pose = self.unrealcv.get_obj_location(self.injured_agent)+self.unrealcv.get_obj_rotation(self.injured_agent)
 55 |         _, dis_tmp, dire_tmp=self.get_relative(current_injured_player_pose, self.rescue_pose)
 56 | 
 57 |         if not metrics['picked'] and dis_tmp<200: # the agent place the injured agent on the stretcher
 58 |             info['termination'] = True
 59 |             info['truncation'] = False
 60 |             reward = 0.5
 61 |             # if current_injured_player_pose[2]-self.rescue_pose[2] > 120:#the injured agent is on the stretcher
 62 |             #     self.count_reach+=1
 63 |             #     if self.count_reach > self.max_reach_steps:
 64 |             #         info['termination']=True
 65 |             #         info['truncation']=False
 66 |         else:
 67 |             self.count_reach= 0
 68 |         info['Reward'] = reward
 69 | 
 70 |         return obs, reward, info['termination'],info['truncation'], info
 71 | 
 72 |     def reset(self):
 73 |         # initialize the environment
 74 |         states, info = super(Rescue, self).reset()
 75 |         # super(Rescue, self).random_app()
 76 |         # for i in range(15):
 77 |         #     self.unrealcv.set_appearance(self.injured_agent, i)
 78 |         #     self.unrealcv.get_obj_location(self.injured_agent)
 79 |         #     time.sleep(2)
 80 |         # self.unrealcv.set_appearance(self.player_list[self.protagonist_id], 10)
 81 |         self.target_pose = self.injured_player_pose
 82 |         self.count_reach = 0
 83 |         self.first_picked = False
 84 | 
 85 |         #NPC will randomly walking in the environment
 86 |         for i in range(len(self.player_list)):
 87 |             if i !=self.protagonist_id:
 88 |                 self.unrealcv.nav_random(self.player_list[i],1000,1)
 89 |         self.player_list = [self.player_list[self.protagonist_id]]
 90 | 
 91 | 
 92 |         #initialize start location
 93 |         self.unrealcv.set_obj_rotation(self.player_list[self.protagonist_id], self.unrealcv.get_obj_rotation(self.injured_agent))
 94 |         self.unrealcv.set_obj_location(self.player_list[self.protagonist_id], self.unrealcv.get_obj_location(self.injured_agent))
 95 |         self.unrealcv.carry_body(self.player_list[self.protagonist_id])
 96 |         time.sleep(1)
 97 |         self.unrealcv.drop_body(self.player_list[self.protagonist_id])
 98 |         time.sleep(1)
 99 | 
100 |         self.unrealcv.set_obj_rotation(self.player_list[self.protagonist_id], self.agent_pose[3:])
101 |         self.unrealcv.set_obj_location(self.player_list[self.protagonist_id], self.agent_pose[:3])
102 |         self.unrealcv.set_cam_fov(self.cam_list[self.protagonist_id],110)
103 |         time.sleep(1)
104 |         self.unrealcv.set_obj_rotation(self.injured_agent, self.injured_player_pose[3:])
105 |         self.unrealcv.set_obj_location(self.injured_agent, self.injured_player_pose[:3])
106 |         random_color = color = np.random.randint(100, 255, 3) #assign the injured agent a color for mask mode
107 |         self.unrealcv.set_obj_color(self.injured_agent, random_color)
108 | 
109 |         time.sleep(1)
110 |         self.unrealcv.set_obj_rotation(self.stretcher, self.rescue_pose[3:])
111 |         self.unrealcv.set_obj_location(self.stretcher, self.rescue_pose[:3])
112 |         time.sleep(1)
113 |         self.unrealcv.set_phy(self.ambulance, 1)
114 |         self.unrealcv.set_obj_rotation(self.ambulance, self.ambulance_pose[3:])
115 |         self.unrealcv.set_obj_location(self.ambulance, self.ambulance_pose[:3])
116 |         time.sleep(1)
117 |         self.unrealcv.set_phy(self.ambulance, 0)
118 |         # self.unrealcv.init_mask_color(self.injured_agent)
119 |         # self.unrealcv.check_visibility(self.cam_list[self.protagonist_id], self.injured_agent)
120 |         time.sleep(1)
121 |         return states,info
122 | 
123 |     def reward(self, metrics):
124 |         # individual reward
125 |         if 'individual' in self.reward_type:
126 |             if 'sparse' in self.reward_type:
127 |                 rewards = metrics['reach']  # only the agent who reach the target get the reward
128 |             else:
129 |                 rewards = 1 - metrics['dis_each']/self.distance_threshold - np.fabs(metrics['ori_each'])/180 + metrics['reach']
130 |         elif 'shared' in self.reward_type:
131 |             if 'sparse' in self.reward_type:
132 |                 rewards = metrics['reach'].max()
133 |             else:
134 |                 rewards = 1 - metrics['dis_min']/self.distance_threshold + metrics['reach'].max()
135 |         else:
136 |             raise ValueError('reward type is not defined')
137 |         return rewards
138 | 
139 |     def rescue_metrics(self, objs_pose, target_loc):
140 |         # compute the relative relation (distance, collision) among agents for rewards and evaluation metrics
141 |         info = dict()
142 |         relative_pose = []
143 |         for obj_pos in objs_pose:
144 |             obs, distance, direction = self.get_relative(obj_pos, target_loc)
145 |             relative_pose.append(np.array([distance, direction]))
146 |         relative_pose = np.array(relative_pose)
147 |         relative_dis = relative_pose[:, 0]
148 |         relative_ori = relative_pose[:, 1]
149 |         reach_mat = np.zeros_like(relative_dis)
150 |         reach_mat[np.where(relative_dis < self.distance_threshold)] = 1
151 |         reach_mat[np.where(np.fabs(relative_ori) > 45)] = 0  # collision should be at the front view
152 | 
153 |         info['reach'] = reach_mat
154 |         info['dis_min'] = relative_dis.min(-1)  # minimal distance from agents to target
155 |         info['dis_each'] = relative_dis  # distance from agents to target
156 |         info['ori_each'] = relative_ori  # orientation from agents to target
157 |         info['picked']  = self.unrealcv.Is_picked(self.injured_agent)
158 |         return info
159 | 


--------------------------------------------------------------------------------
/gym_rescue/envs/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/utils/__init__.py


--------------------------------------------------------------------------------
/gym_rescue/envs/utils/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/utils/__pycache__/__init__.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/utils/__pycache__/keyboard_util.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/utils/__pycache__/keyboard_util.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/utils/__pycache__/misc.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/utils/__pycache__/misc.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/utils/keyboard_util.py:
--------------------------------------------------------------------------------
 1 | from pynput import keyboard
 2 | 
 3 | key_state = {
 4 |     'i': False,
 5 |     'j': False,
 6 |     'k': False,
 7 |     'l': False,
 8 |     'space': False,
 9 |     'ctrl':False,
10 |     '1': False,
11 |     '2': False,
12 |     'head_up': False,
13 |     'head_down': False,
14 |     'e': False
15 | }
16 | 
17 | def on_press(key):
18 |     try:
19 |         if key.char in key_state:
20 |             key_state[key.char] = True
21 |     except AttributeError:
22 |         if key == keyboard.Key.space:
23 |             key_state['space'] = True
24 |         if key == keyboard.Key.up:
25 |             key_state['head_up'] = True
26 |         if key == keyboard.Key.down:
27 |             key_state['head_down'] = True
28 |         if key ==keyboard.Key.ctrl_l:
29 |             key_state['ctrl'] = True
30 | 
31 | 
32 | def on_release(key):
33 |     try:
34 |         if key.char in key_state:
35 |             key_state[key.char] = False
36 |     except AttributeError:
37 |         if key == keyboard.Key.space:
38 |             key_state['space'] = False
39 |         if key == keyboard.Key.up:
40 |             key_state['head_up'] = False
41 |         if key == keyboard.Key.down:
42 |             key_state['head_down'] = False
43 |         if key ==keyboard.Key.ctrl_l:
44 |             key_state['ctrl'] = False
45 | def get_key_action():
46 |     action = ([0, 0], 0, 0)
47 |     action = list(action)  # Convert tuple to list for modification
48 |     action[0] = list(action[0])  # Convert inner tuple to list for modification
49 | 
50 |     if key_state['i']:
51 |         action[0][1] = 200
52 |     if key_state['k']:
53 |         action[0][1] = -200
54 |     if key_state['j']:
55 |         action[0][0] = -30
56 |     if key_state['l']:
57 |         action[0][0] = 30
58 |     if key_state['space']:
59 |         action[2] = 1
60 |     if key_state['ctrl']:
61 |         action[2] = 2
62 |     if key_state['1']:
63 |         action[2] = 3
64 |     if key_state['2']:
65 |         action[2] = 4
66 |     if key_state['head_up']:
67 |         action[1] = 1
68 |     if key_state['head_down']:
69 |         action[1] = 2
70 |     if key_state['e']:
71 |         action[2] = 5
72 | 
73 |     action[0] = tuple(action[0])  # Convert inner list back to tuple
74 |     action = tuple(action)  # Convert list back to tuple
75 |     return action
76 | 


--------------------------------------------------------------------------------
/gym_rescue/envs/utils/misc.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import numpy as np
 3 | import json
 4 | import unrealcv
 5 | def load_env_setting(filename):
 6 |     f = open(get_settingpath(filename))
 7 |     type = os.path.splitext(filename)[1]
 8 |     assert type in ['.json'], type + ' is not supported'
 9 |     setting = json.load(f)
10 |     return setting
11 | 
12 | 
13 | def get_settingpath(filename):
14 |     import gym_rescue
15 |     gympath = os.path.dirname(gym_rescue.__file__)
16 |     return os.path.join(gympath, 'envs/setting', filename)
17 | 
18 | 
19 | def get_action_size(action):
20 |     return len(action)
21 | 
22 | 
23 | def get_direction(current_pose, target_pose):  # get relative angle between current pose and target pose in x-y plane
24 |     y_delt = target_pose[1] - current_pose[1]
25 |     x_delt = target_pose[0] - current_pose[0]
26 |     if x_delt == 0 and y_delt == 0:  # if the same position
27 |         return 0
28 |     angle_abs = np.arctan2(y_delt, x_delt)/np.pi*180
29 |     angle_relative = angle_abs - current_pose[4]
30 |     if angle_relative > 180:
31 |         angle_relative -= 360
32 |     if angle_relative < -180:
33 |         angle_relative += 360
34 |     return angle_relative
35 | 
36 | 
37 | def get_textures(texture_name="textures", docker=False):
38 |     try:
39 |         texture_dir = os.path.join(unrealcv.util.get_path2UnrealEnv(), "textures")
40 |     except AttributeError:
41 |         raise ImportError(
42 |             "Function get_path2UnrealEnv() not found. "
43 |             "Please upgrade unrealcv to version 1.1.5 or higher using: \n"
44 |             "pip install --upgrade unrealcv"
45 |             )
46 |     textures_list = os.listdir(texture_dir)
47 |     # relative to abs
48 |     for i in range(len(textures_list)):
49 |         if docker:
50 |             textures_list[i] = os.path.join('/unreal', texture_dir, textures_list[i])
51 |         else:
52 |             textures_list[i] = os.path.join(texture_dir, textures_list[i])
53 |     return textures_list
54 | 
55 | def convert_dict(old_dict):
56 |     new_dict = {}
57 |     for agent, info in old_dict.items():
58 |         names = info["name"]
59 |         for i, name in enumerate(names):
60 |             new_dict[name] = {
61 |                 "agent_type": agent,
62 |             }
63 |             for key in info.keys():
64 |                 if key == "name" or key == "cam_id" or key == "class_name":
65 |                     new_dict[name][key] = info[key][i]
66 |                 else:
67 |                     new_dict[name][key] = info[key]
68 |     return new_dict


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__init__.py


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/__init__.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/__pycache__/configUE.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/configUE.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/__pycache__/early_done.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/early_done.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/__pycache__/monitor.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/monitor.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/__pycache__/population.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/population.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/__pycache__/task_cue.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/task_cue.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/__pycache__/time_dilation.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/time_dilation.cpython-39.pyc


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/configUE.py:
--------------------------------------------------------------------------------
 1 | import gym
 2 | from gym import Wrapper
 3 | 
 4 | class ConfigUEWrapper(Wrapper):
 5 |     def __init__(self, env, docker=False, resolution=(160, 160), display=None, offscreen=False,
 6 |                             use_opengl=False, nullrhi=False, gpu_id=None, render_quality=2, use_lumen=False, sleep_time=5, comm_mode='tcp'):
 7 |         super().__init__(env)
 8 |         env.unwrapped.docker = docker
 9 |         env.unwrapped.display = display
10 |         env.unwrapped.offscreen_rendering = offscreen
11 |         env.unwrapped.use_opengl = use_opengl
12 |         env.unwrapped.nullrhi = nullrhi
13 |         env.unwrapped.gpu_id = gpu_id
14 |         env.unwrapped.sleep_time = sleep_time
15 |         env.unwrapped.resolution = resolution
16 |         env.unwrapped.comm_mode = comm_mode
17 |         env.unwrapped.use_lumen = use_lumen
18 |         env.unwrapped.render_quality = render_quality
19 |         # reset the observation space based on the new resolution
20 |         self.observation_space = self.define_observation_space(self.cam_list[self.protagonist_id], self.observation_type, resolution)
21 | 
22 | 
23 |     def step(self, action):
24 |         obs, reward, termination,truncation, info = self.env.step(action)
25 |         return obs, reward, termination,truncation, info
26 | 
27 |     def reset(self, **kwargs):
28 |         states,info = self.env.reset(**kwargs)
29 |         return states,info


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/early_done.py:
--------------------------------------------------------------------------------
 1 | import gym
 2 | from gym import Wrapper
 3 | import time
 4 | 
 5 | class EarlyDoneWrapper(Wrapper):
 6 |     def __init__(self, env, max_time=180):
 7 |         super().__init__(env)
 8 |         self.max_time = max_time
 9 |         self.count_lost = 0  # Initialize count_lost
10 |         self.start_time = time.time()  # Initialize start time
11 | 
12 |     def step(self, action):
13 |         obs, reward, terminated, truncated, info = self.env.step(action)  # take a step in the wrapped environment
14 |         elapsed_time = time.time() - self.start_time
15 |         if elapsed_time > self.max_time:
16 |             truncated = True
17 |         if action[0][2]==4: # if drop action is triggerd, the task is truncated
18 |             truncated=True
19 |         if terminated or truncated:
20 |             info['execution_time'] = elapsed_time
21 |             print('elapsed_time:', elapsed_time)
22 |         return obs, reward, terminated, truncated, info  # return the same results as the wrapped environment
23 | 
24 |     def reset(self, **kwargs):
25 |         obs, info = self.env.reset(**kwargs)
26 |         self.start_time = time.time()
27 |         return obs, info  # return the same results as the wrapped environment


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/monitor.py:
--------------------------------------------------------------------------------
 1 | import gym
 2 | from gym import Wrapper
 3 | import time
 4 | import cv2
 5 | 
 6 | class DisplayWrapper(Wrapper):
 7 |     def __init__(self, env, dynamic_top_down=True, fix_camera=False, get_bbox=False):
 8 |         super().__init__(env)
 9 |         self.dynamic_top_down = dynamic_top_down
10 |         self.fix_camera = fix_camera
11 |         self.get_bbox = get_bbox
12 | 
13 |     def step(self, action):
14 |         obs, reward, termination, truncation, info = self.env.step(action) # take a step in the wrapped environment
15 |         # set top_down camera
16 |         env = self.env.unwrapped
17 | 
18 |         # for recording demo
19 |         if self.get_bbox:
20 |             mask = env.unrealcv.get_image(env.cam_list[env.protagonist_id], 'object_mask', 'bmp')
21 |             mask, bbox = env.unrealcv.get_bbox(mask, env.unwrapped.injured_agent, normalize=False)
22 |             self.show_bbox(env.img_show.copy(), bbox)
23 |             info['bbox'] = bbox
24 | 
25 |         if self.dynamic_top_down:
26 |             env.set_topview(info['Pose'][env.protagonist_id], env.cam_id[0]) # set top_down camera
27 | 
28 |         return obs, reward, termination, truncation, info # return the same results as the wrapped environment
29 | 
30 |     def reset(self, **kwargs):
31 |         states = self.env.reset(**kwargs)
32 |         env = self.env.unwrapped
33 |         if self.fix_camera:
34 |             # current_pose = env.unrealcv.get_cam_pose(env.cam_list[env.protagonist_id])
35 |             center_pos = [self.ambulance_pose[0], self.ambulance_pose[1], self.ambulance_pose[2]+2000]
36 |             env.set_topview(center_pos, env.cam_id[0])
37 |         if self.get_bbox:
38 |             self.bbox_init = []
39 |             mask = env.unrealcv.get_image(env.cam_list[env.protagonist_id], 'object_mask', 'bmp')
40 |             mask, bbox = env.unrealcv.get_bbox(mask, env.unwrapped.injured_agent, normalize=False)
41 |             self.mask_percent = mask.sum()/(255 * env.resolution[0] * env.resolution[1])
42 |             self.bbox_init.append(bbox)
43 |         # cv2.imshow('init', env.img_show)
44 |         # cv2.waitKey(1)
45 |         return states # return the same results as the wrapped environment
46 | 
47 |     def show_bbox(self, img2disp, bbox):
48 |         # im_disp = states[0][:, :, :3].copy()
49 |         cv2.rectangle(img2disp, (int(bbox[0]), int(bbox[1])), (int(bbox[2] + bbox[0]), int(bbox[3] + bbox[1])), (0, 255, 0), 5)
50 |         cv2.imshow('track_res', img2disp)
51 |         cv2.waitKey(1)


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/population.py:
--------------------------------------------------------------------------------
 1 | import gym
 2 | from gym import Wrapper
 3 | import numpy as np
 4 | 
 5 | class RandomPopulationWrapper(Wrapper):
 6 |     def __init__(self, env,  num_min=2, num_max=10, random_target=False, random_tracker=False):
 7 |         super().__init__(env)
 8 |         self.min_num = num_min
 9 |         self.max_num = num_max
10 |         self.random_target_id = random_target
11 |         self.random_tracker_id = random_tracker
12 | 
13 |     def step(self, action):
14 |         obs, reward, termination,truncation, info = self.env.step(action)
15 |         return obs,reward, termination,truncation, info
16 | 
17 |     def reset(self, **kwargs):
18 |         env = self.env.unwrapped
19 |         if not env.launched:  # we need to launch the environment
20 |             env.launched = env.launch_ue_env()
21 |             env.init_agents()
22 |             env.init_objects()
23 | 
24 |         if self.min_num == self.max_num:
25 |             env.num_agents = self.min_num
26 |         else:
27 |             # Randomize the number of agents
28 |             env.num_agents = np.random.randint(self.min_num, self.max_num)
29 |         env.set_population(env.num_agents)
30 |         if self.random_tracker_id:
31 |             env.tracker_id = env.sample_tracker()
32 |         if self.random_target_id:
33 |             new_target = env.sample_target()
34 |             if new_target != env.tracker_id:  # set target object mask to white
35 |                 env.unrealcv.build_color_dict(env.player_list)
36 |                 env.unrealcv.set_obj_color(env.player_list[env.target_id], env.unrealcv.color_dict[env.player_list[new_target]])
37 |                 env.unrealcv.set_obj_color(env.player_list[new_target], [255, 255, 255])
38 |                 env.target_id = new_target
39 |         states,info = self.env.reset(**kwargs)
40 | 
41 | 
42 | 
43 |         return states,info


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/switch_env.py:
--------------------------------------------------------------------------------
 1 | import gym
 2 | from gym import Wrapper
 3 | import time
 4 | 
 5 | class SwitchEnvWrapper(Wrapper):
 6 |     def __init__(self, env, env_old):
 7 |         super().__init__(env)
 8 |         env.unwrapped.unrealcv = env_old.unwrapped.unrealcv
 9 |         env.unwrapped.ue_binary = env_old.unwrapped.ue_binary
10 |         env.unwrapped.unrealcv.set_map(env.unwrapped.env_name)
11 |         print('Switching to environment:', env.unwrapped.env_name)
12 |         env.unwrapped.launched = True
13 |         env.unwrapped.init_agents()
14 |         env.unwrapped.init_objects()
15 | 
16 |     def step(self, action):
17 |         obs, reward, terminated, truncated, info = self.env.step(action)  # take a step in the wrapped environment
18 |         return obs, reward, terminated, truncated, info  # return the same results as the wrapped environment
19 | 
20 |     def reset(self, **kwargs):
21 |         obs, info = self.env.reset(**kwargs)
22 |         return obs, info  # return the same results as the wrapped environment
23 | 


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/task_cue.py:
--------------------------------------------------------------------------------
 1 | import os.path
 2 | import cv2
 3 | import gym
 4 | import gym_rescue
 5 | from gym import Wrapper
 6 | class TaskCueWrapper(Wrapper):
 7 |     def __init__(self, env, test_point):
 8 |         super().__init__(env)
 9 |         self.test_point = test_point
10 |         env.unwrapped.injured_player_pose = self.test_point['injured_player_loc']
11 |         env.unwrapped.rescue_pose = self.test_point['stretcher_loc']
12 |         env.unwrapped.agent_pose = self.test_point['agent_loc']
13 |         env.unwrapped.ambulance_pose = self.test_point['ambulance_loc']
14 | 
15 |     def step(self, action):
16 |         obs, reward, termination,truncation, info = self.env.step(action)
17 |         return obs, reward, termination,truncation, info
18 | 
19 |     def reset(self, **kwargs):
20 |         states,info = self.env.reset(**kwargs)
21 | 
22 |         injured_agent_appid = self.test_point['injured_agent_id']
23 |         self.env.unwrapped.unrealcv.set_appearance(self.env.unwrapped.injured_agent, injured_agent_appid)
24 | 
25 |         UnrealEnv = os.environ.get('UnrealEnv')
26 |         reference_image_path = os.path.join(UnrealEnv, 'ref_image', self.test_point['reference_image_path'][0])
27 | 
28 |         image = cv2.imread(reference_image_path)
29 |         text = self.test_point['reference_text'][0]
30 |         info['reference_image'] = image
31 |         info['reference_text']=text
32 |         # Display the reference image and text
33 |         goal_show = cv2.resize(info['reference_image'], (480, 320))
34 |         reference_text = info['reference_text']
35 |         for idx, line in enumerate([reference_text[i:i + 50] for i in range(0, len(reference_text), 50)]):
36 |             y = 30 + idx * 20
37 |             # Create rectangle background
38 |             overlay = goal_show.copy()
39 |             cv2.rectangle(overlay, (5, y - 15), (len(line) * 10 + 15, y + 5), (128, 128, 128), -1)
40 |             # Apply semi-transparent overlay
41 |             alpha = 0.6
42 |             cv2.addWeighted(overlay, alpha, goal_show, 1 - alpha, 0, goal_show)
43 |             # Draw text on top of background
44 |             cv2.putText(goal_show, line, (10, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
45 |                         (0, 0, 0), 1, cv2.LINE_AA)
46 |         self.env.unwrapped.goal_show = goal_show
47 |         return states,info


--------------------------------------------------------------------------------
/gym_rescue/envs/wrappers/time_dilation.py:
--------------------------------------------------------------------------------
 1 | import gym
 2 | from gym import Wrapper
 3 | import time
 4 | 
 5 | class TimeDilationWrapper(Wrapper):
 6 |     def __init__(self, env, reference_fps=10, update_steps=10, update_dilation=True):
 7 |         super().__init__(env)
 8 |         self.dilation_factor = 1  # the factor by which to slow down the environment
 9 |         self.reference_fps = reference_fps  # the fps at which the environment should run
10 |         self.update_steps = update_steps  # the number of steps after which to update the time dilation
11 |         self.update_dilation = update_dilation  # whether to update the time dilation or not
12 |     def step(self, action):
13 |         obs, reward, done, info = self.env.step(action)  # take a step in the wrapped environment
14 |         self.count_steps += 1
15 |         if self.count_steps % self.update_steps == 0:  # update the time dilation every 10 steps
16 |             fps = self.count_steps / (time.time() - self.start_time)
17 |             dilation_factor_new = fps / self.reference_fps  # reference fps: 10
18 |             # print(f'FPS: {fps}', f'Dilation factor: {dilation_factor_new}')
19 |             if  dilation_factor_new / self.dilation_factor > 1.1 or dilation_factor_new / self.dilation_factor < 0.9:
20 |                 env = self.env.unwrapped
21 |                 if self.update_dilation:
22 |                     self.dilation_factor = dilation_factor_new
23 |                     env.unrealcv.set_global_time_dilation(self.dilation_factor)
24 |         return obs, reward, done, info  # return the same results as the wrapped environment
25 | 
26 |     def reset(self, **kwargs):
27 |         states = self.env.reset(**kwargs)
28 |         self.start_time = time.time()
29 |         self.count_steps = 0
30 |         return states # return the same results as the wrapped environment


--------------------------------------------------------------------------------
/gym_rescue/example/rescue_demo.py:
--------------------------------------------------------------------------------
  1 | import argparse
  2 | import gym
  3 | import cv2
  4 | import time
  5 | from gym_rescue.envs.wrappers import time_dilation, early_done, monitor, population, configUE,task_cue
  6 | from gym_rescue.envs.utils.keyboard_util import get_key_action,on_press,on_release
  7 | from pynput import keyboard
  8 | import os
  9 | import json
 10 | 
 11 | 
 12 | #start keyboard listener
 13 | listener = keyboard.Listener(on_press=on_press, on_release=on_release)
 14 | listener.start()
 15 | 
 16 | 
 17 | class RandomAgent(object):
 18 |     """The world's simplest agent!"""
 19 |     def __init__(self, action_space):
 20 |         self.action_space = action_space
 21 | 
 22 |     def act(self, observation):
 23 |         return self.action_space.sample()
 24 | 
 25 | if __name__ == '__main__':
 26 |     parser = argparse.ArgumentParser(description=None)
 27 |     parser.add_argument("-r", '--render', dest='render', action='store_true', help='show env using cv2')
 28 |     parser.add_argument("-s", '--seed', dest='seed', type=int, default=1, help='number of population')
 29 |     parser.add_argument("-t", '--time-dilation', dest='time_dilation', type=int, default=-1, help='time_dilation to keep fps in simulator')
 30 |     parser.add_argument("-n", '--nav-agent', dest='nav_agent', action='store_true', help='use nav agent to control the agents')
 31 |     parser.add_argument("-m", '--monitor', dest='monitor', action='store_true', help='auto_monitor')
 32 |     parser.add_argument("-k", '--keyboard', dest='keyboard', action='store_true', help='use keyboard to control the agents')
 33 |     parser.add_argument("-o", '--offscreen', dest='offscreen', action='store_true', help='offscreen rendering')
 34 |     parser.add_argument("-q", '--quality', dest='quality', type=int, default=3, help='render quality')
 35 |     parser.add_argument("-u", '--use_lumen', dest='use_lumen', action='store_true', help='use lumen')
 36 |     parser.add_argument("-v", '--record_video', dest='record_video', action='store_true', help='record task cues and observation into video')
 37 | 
 38 |     UnrealEnv = os.environ.get('UnrealEnv')
 39 |     TEST_JSONL = os.path.join(UnrealEnv, 'test_L1.jsonl')
 40 |     with open(TEST_JSONL, "r") as fp:
 41 |         try:
 42 |             point_id = 0
 43 |             for line in fp.readlines():
 44 |                 content = json.loads(line)
 45 |                 env_id = content.get("env_id")
 46 |                 timeout = content.get("timeout")
 47 | 
 48 |                 InterruptedException = False
 49 |                 args = parser.parse_args()
 50 |                 resolution = (320, 320)
 51 |                 if args.record_video:
 52 |                     out = cv2.VideoWriter('output_{}.mp4'.format(point_id), cv2.VideoWriter_fourcc(*'mp4v'), 20.0,
 53 |                                           (int(2.5 * resolution[1]), resolution[1]))
 54 |                 else:
 55 |                     out = None
 56 |                 # create the environment
 57 |                 env = gym.make(env_id, action_type='Mixed', observation_type='Color')  # available observation_type : ['Color', 'Depth', 'Rgbd', 'Mask', 'Pose','MaskDepth','ColorMask']
 58 |                 # set the rendering mode, resolution and render quality, gpu_id
 59 |                 env = configUE.ConfigUEWrapper(env, offscreen=args.offscreen, resolution=resolution, use_lumen=args.use_lumen, render_quality=args.quality)
 60 | 
 61 |                 if int(args.time_dilation) > 0:  # -1 means no time_dilation, the number means the expected FPS in simulator
 62 |                     env = time_dilation.TimeDilationWrapper(env, int(args.time_dilation))
 63 |                 if int(timeout) > 0:  # -1 means no early_done, 180 mean 180 seconds
 64 |                     env = early_done.EarlyDoneWrapper(env, timeout)
 65 |                 if args.monitor: # if dynamic_top_down is True, the top-down view will follow the agent
 66 |                     env = monitor.DisplayWrapper(env, dynamic_top_down=False, fix_camera=True,get_bbox=True)
 67 |                 env = task_cue.TaskCueWrapper(env=env, test_point=content)
 68 | 
 69 |                 agent = RandomAgent(env.action_space)
 70 |                 rewards = 0
 71 |                 done = False
 72 |                 Total_rewards = 0
 73 |                 obs,info = env.reset() # environment reset
 74 |                 count_step = 0
 75 |                 t0=time.time()
 76 |                 while True:
 77 |                     try:
 78 |                         # print(info['reference_text'])  # print the text hints to terminal
 79 |                         if args.keyboard:
 80 |                             action = get_key_action() # get the action from keyboard
 81 |                         else:
 82 |                             action = agent.act(obs) # get the action from random agent
 83 |                         obs, reward, termination, truncation, info= env.step([action]) #environment update
 84 |                         rewards+=reward
 85 |                         if reward != 0:
 86 |                             print('Reward:', reward)
 87 |                         if args.render or args.record_video: # show the visual cues & text cues and first-person view observation
 88 |                             frame, out=env.render(mode='ref+obs', show=args.render,save=out)
 89 |                         count_step+=1
 90 | 
 91 |                         if termination:
 92 |                             fps = count_step / (time.time() - t0)
 93 |                             if out is not None:
 94 |                                 out.release()
 95 |                             print('Success')
 96 |                             print('reward:',rewards)
 97 |                             print('Fps:' + str(fps))
 98 |                             break
 99 |                         if truncation or action[2]==4:
100 |                             fps = count_step / (time.time() - t0)
101 |                             if out is not None:
102 |                                 out.release()
103 |                             print('Failed')
104 |                             print('reward:',rewards)
105 |                             print('Fps:' + str(fps))
106 |                             break
107 |                     except KeyboardInterrupt:
108 |                         print('\nReceived CTRL+C. Cleaning up...')
109 |                         if out is not None:
110 |                             out.release()
111 |                         InterruptedException = True
112 |                         break
113 |                     except Exception as e:
114 |                         if out is not None:
115 |                             out.release()
116 |                         print(e)
117 |                         InterruptedException = True
118 |                         break
119 |                 point_id+=1
120 |                 if InterruptedException:
121 |                     if out is not None:
122 |                         out.release()
123 |                     env.close()
124 |                     break
125 |                 env.close()
126 |         except Exception as e:
127 |             print(e)
128 |             if out is not None:
129 |                 out.release()
130 |             env.close()
131 |             exit(0)


--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
  1 | import gym_rescue
  2 | from gym_rescue.envs.wrappers import switch_env, configUE, early_done
  3 | import gym
  4 | import numpy as np
  5 | import cv2
  6 | import os
  7 | import json
  8 | import base64
  9 | import time
 10 | import traceback
 11 | 
 12 | from solution import AlgSolution
 13 | 
 14 | UnrealEnv = os.environ.get('UnrealEnv')
 15 | TEST_JSONL = os.path.join(UnrealEnv, 'test_L1.jsonl')
 16 | 
 17 | def img2base64(img):
 18 |     buf = cv2.imencode('.png', img)[1]
 19 |     b64 = base64.b64encode(buf).decode('utf-8')
 20 |     return b64
 21 | 
 22 | 
 23 | agent = AlgSolution()
 24 | 
 25 | results = []
 26 | seed = 0
 27 | 
 28 | rewards = []
 29 | ep_times = []
 30 | 
 31 | env_old = None
 32 | with open(TEST_JSONL, "r") as fp:
 33 |     for line in fp.readlines():
 34 | 
 35 |         content = json.loads(line)
 36 |         level = content.get('level')
 37 |         env_id = content.get("env_id")
 38 |         agent_loc = content.get("agent_loc")
 39 |         injured_player_loc = content.get("injured_player_loc")
 40 |         injured_agent_id = content.get("injured_agent_id")
 41 |         stretcher_loc = content.get("stretcher_loc")
 42 |         ambulance_loc= content.get("ambulance_loc")
 43 |         reference_text = content.get("reference_text")
 44 |         reference_image_path = content.get("reference_image_path")
 45 |         max_steps = 9999999
 46 |         timeout = content.get("timeout")
 47 | 
 48 |         env = gym.make(env_id, action_type='Mixed', observation_type='Color', reset_type=level)
 49 |         if env_old is not None:
 50 |             env = switch_env.SwitchEnvWrapper(env, env_old)
 51 |         env_old = env
 52 |         try:
 53 |             env._max_episode_steps = max_steps
 54 |             env = configUE.ConfigUEWrapper(env, offscreen=True, resolution=(640,480))
 55 |             env = early_done.EarlyDoneWrapper(env, int(timeout))
 56 | 
 57 |             if reference_image_path:
 58 |                 reference_image_path = os.path.join(UnrealEnv, 'ref_image', reference_image_path[0])
 59 |                 reference_image = base64.b64encode(open(reference_image_path, 'rb').read()).decode("utf-8")
 60 |             else:
 61 |                 reference_image = None
 62 | 
 63 |             agent.reset(reference_text, reference_image)
 64 | 
 65 |             env.seed(seed)
 66 |             seed += 1
 67 | 
 68 |             env.unwrapped.injured_player_pose = injured_player_loc
 69 |             env.unwrapped.rescue_pose = stretcher_loc
 70 |             env.unwrapped.agent_pose = agent_loc
 71 |             env.unwrapped.ambulance_pose = ambulance_loc
 72 |             ob, info = env.reset()
 73 |             env.unwrapped.unrealcv.set_appearance(env.unwrapped.injured_agent, injured_agent_id)
 74 | 
 75 |             ob, reward, termination, truncation, info = env.step([[(0, 0), 0, 0]])
 76 |             current_step = 0
 77 |             picked_reward = 0.
 78 |             success = True
 79 | 
 80 |             start_t = time.time()
 81 |             while True:
 82 |                 pred = agent.predicts(
 83 |                     img2base64(ob),
 84 |                     success,
 85 |                 )
 86 |                 action = [(pred['angular'], pred['velocity']), pred['viewport'], pred['interaction']]
 87 |                 ob, reward, termination, truncation, info = env.step([action])
 88 |                 if info['picked']:
 89 |                     picked_reward = 0.5
 90 | 
 91 |                 if pred['interaction'] == 3:
 92 |                     if info['picked']:
 93 |                         success = True
 94 |                     else:
 95 |                         success = False
 96 |                 else:
 97 |                     success = True
 98 | 
 99 |                 if current_step % 1 == 0:
100 |                     print('step', current_step)
101 |                 end_t = time.time()
102 |                 if termination:
103 |                     rewards.append(1)
104 |                     ep_times.append(end_t - start_t)
105 |                     break
106 |                 if truncation or pred['interaction'] == 4:
107 |                     rewards.append(picked_reward)
108 |                     ep_times.append(end_t - start_t)
109 |                     break
110 | 
111 |                 current_step += 1
112 |         except:
113 |             env.close()
114 |             traceback.print_exc()
115 |             exit()
116 | 
117 | env.close()
118 | 
119 | print('Rewards', rewards)
120 | print('Times', ep_times)
121 | rewards = np.mean(rewards)
122 | ep_times = np.sum(ep_times)
123 | print('Rewards', rewards, 'Times', ep_times)
124 | 
125 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | unrealcv
 2 | gym==0.26
 3 | matplotlib
 4 | numpy==1.21.6
 5 | wget
 6 | opencv-python
 7 | pynput
 8 | docker
 9 | ultralytics
10 | 


--------------------------------------------------------------------------------
/run.sh:
--------------------------------------------------------------------------------
1 | cd /home/admin/atec_project/
2 | mkdir -p /home/admin/workspace/job/logs
3 | python server.py
4 | 


--------------------------------------------------------------------------------
/run_main.bat:
--------------------------------------------------------------------------------
1 | REM UnrealEnv need absolute path
2 | set UnrealEnv=%CD%\dataset
3 | python main.py
4 | 


--------------------------------------------------------------------------------
/run_main.sh:
--------------------------------------------------------------------------------
1 | DIR_PATH=`pwd`
2 | # UnrealEnv need absolute path
3 | UnrealEnv=${DIR_PATH}/dataset PYTHONPATH=./ python main.py
4 | 
5 | 


--------------------------------------------------------------------------------
/server.py:
--------------------------------------------------------------------------------
 1 | from flask import Flask, jsonify, request
 2 | from solution import AlgSolution
 3 | import logging
 4 | import os
 5 | import sys
 6 | 
 7 | agent = AlgSolution()
 8 | app = Flask(__name__)
 9 | 
10 | app.config['MAX_FORM_MEMORY_SIZE'] = 64 * 1024 * 1024
11 | 
12 | 
13 | @app.route('/step', methods=['POST'])
14 | def step():
15 |     form = request.form
16 |     action = agent.predicts(**form)
17 |     return jsonify(**action)
18 | 
19 | @app.route('/reset', methods=['POST'])
20 | def reset():
21 |     form = request.form
22 |     agent.reset(**form)
23 |     return jsonify(message="success")
24 | 
25 | @app.route('/synchronize', methods=['GET'])
26 | def synchroize():
27 |     return jsonify(message="success")
28 | 
29 | @app.route('/stop', methods=['POST'])
30 | def stop():
31 |     msg = request.form['msg']
32 |     f = open('/home/admin/workspace/job/logs/user.log', 'a')
33 |     f.write(msg)
34 |     f.close()
35 |     return jsonify(message="success")
36 | 
37 | if __name__ == '__main__':
38 |     app.run(host='0.0.0.0', port=5000, debug=True)
39 | 


--------------------------------------------------------------------------------
/solution.py:
--------------------------------------------------------------------------------
  1 | import time
  2 | import sys
  3 | import json
  4 | import cv2
  5 | import numpy as np
  6 | import os
  7 | from io import BytesIO
  8 | import base64
  9 | from ultralytics import YOLO
 10 | # 加载预训练的YOLO模型，如果下载了特定版本的权重，请指定路径
 11 | 
 12 | class AlgSolution:
 13 | 
 14 |     def __init__(self):
 15 |         self.yolo_model = YOLO('checkpoints/yolo11x.pt')  #  'yolov8n.pt' is the pretrained model file name for YOLOv8 nano version, replace it according to the actual situation
 16 |         if os.path.exists('/home/admin/workspace/job/logs/'):
 17 |             self.handle = open('/home/admin/workspace/job/logs/user.log', 'w')
 18 |         else:
 19 |             self.handle = open('user.log', 'w')
 20 |         self.handle.write("model loaded\n")
 21 |         self.handle.flush()
 22 |         self.foreward = {
 23 |             'angular': 0, # [-30, 30]
 24 |             'velocity': 50, # [-100, 100],
 25 |             'viewport': 0, # {0: stay, 1: look up, 2: look down},
 26 |             'interaction': 0, #
 27 |         }
 28 |         self.backward = {
 29 |             'angular': 0, # [-30, 30]
 30 |             'velocity': -50, # [-100, 100],
 31 |             'viewport': 0, # {0: stay, 1: look up, 2: look down},
 32 |             'interaction': 0, #
 33 |         }
 34 |         self.turnleft = {
 35 |             'angular': -20, # [-30, 30]
 36 |             'velocity': 0, # [-100, 100],
 37 |             'viewport': 0, # {0: stay, 1: look up, 2: look down},
 38 |             'interaction': 0, #
 39 |         }
 40 |         self.turnright = {
 41 |             'angular': 20, # [-30, 30]
 42 |             'velocity': 0, # [-100, 100],
 43 |             'viewport': 0, # {0: stay, 1: look up, 2: look down},
 44 |             'interaction': 0, #
 45 |         }
 46 |         self.carry = {
 47 |             'angular': 0, # [-30, 30]
 48 |             'velocity': 0, # [-100, 100],
 49 |             'viewport': 0, # {0: keep, 1: up, 2: down},
 50 |             'interaction': 3,
 51 |         }
 52 |         self.drop = {
 53 |             'angular': 0, # [-30, 30]
 54 |             'velocity': 0, # [-100, 100],
 55 |             'viewport': 0, # {0: keep, 1: up, 2: down},
 56 |             'interaction': 4,
 57 |         }
 58 |         self.noaction = {
 59 |             'angular': 0, # [-30, 30]
 60 |             'velocity': 0, # [-100, 100],
 61 |             'viewport': 0, # {0: stay, 1: look up, 2: look down},
 62 |             'interaction': 0, # {0: stand, 1: jump, 2: crouch, 3: carry, 4: drop, 5: open door}
 63 |         }
 64 | 
 65 |         self.person_success = False
 66 |         self.right_times = 0
 67 |         self.foreward_times = 0
 68 |         self.backward_times = 0
 69 |         self.try_times = 10
 70 |         self.idx = 0
 71 | 
 72 |     def reset(self, reference_text=None, reference_image=None):
 73 |         self.person_success = False
 74 |         self.right_times = 0
 75 |         self.foreward_times = 0
 76 |         self.backward_times = 0
 77 |         self.try_times = 10
 78 |         self.idx = 0
 79 | 
 80 |     def predicts(self, ob,success):
 81 |         # person_exist = False
 82 |         # truck_exist = False
 83 |         self.handle.write('Step %d\n'%self.idx)
 84 |         self.handle.flush()
 85 |         self.idx += 1
 86 |         if self.idx == 200:
 87 |             return self.drop
 88 | 
 89 |         ob = base64.b64decode(ob)
 90 |         ob = cv2.imdecode(np.frombuffer(ob, np.uint8), cv2.IMREAD_COLOR)
 91 |         results = self.yolo_model(source=ob,imgsz=640,conf=0.1) 
 92 |         boxes = results[0].boxes  # get all detected bounding box
 93 |         if boxes.shape[0] == 0:
 94 |             # randomly choose a number from list[1,2]
 95 |             if self.right_times >= self.try_times:
 96 |                 if self.foreward_times<self.try_times:
 97 |                     action = self.foreward
 98 |                     time.sleep(2)
 99 |                     self.handle.write(" ---------------- no object foreward ------------------\n")
100 |                     self.foreward_times += 1
101 |                     self.handle.write(json.dumps(action) + '\n')
102 |                     self.handle.flush()
103 |                     return action
104 |                 else:
105 |                     if self.backward_times<self.try_times:
106 |                         action = self.backward
107 |                         time.sleep(2)
108 |                         self.handle.write(" ---------------- no object backward ------------------\n")
109 |                         self.backward_times += 1
110 |                         self.handle.write(json.dumps(action) + '\n')
111 |                         self.handle.flush()
112 |                         return action
113 |             else:
114 |                 action = self.turnright
115 |                 time.sleep(2)
116 |                 self.handle.write(" ---------------- no object turnright ------------------\n")
117 |                 self.right_times += 1
118 |                 self.handle.write(json.dumps(action) + '\n')
119 |                 self.handle.flush()
120 |                 return action
121 |             if self.right_times >= self.try_times and self.foreward_times>=self.try_times and self.backward_times>=self.try_times:
122 |                 #self.right_times = 9
123 |                 self.right_times = 9
124 |                 self.foreward_times = 0
125 |                 self.backward_times= 0
126 |                 action = self.turnright
127 |                 time.sleep(2)
128 |                 self.handle.write(" ---------------- no object turnright ------------------\n")
129 |                 self.handle.write(json.dumps(action) + '\n')
130 |                 self.handle.flush()
131 |                 return action
132 |         for box in boxes:
133 |             cls = int(box.cls.item())  # get class ID
134 |             if self.yolo_model.names[cls] == 'person' and self.person_success is False:  # 检查是否是person类别
135 |                 # get bounding box coordinate（x1, y1, x2, y2）
136 |                 #x0, y0, w_, h_ = box.xywh
137 |                 res_ = box.xywh
138 |                 self.handle.write("res_person: %s\n"%res_)
139 |                 self.handle.flush()
140 | 
141 |                 x0, y0, w_, h_ = res_[0].tolist()
142 |                 self.handle.write("x0, y0, w_, h_: %s, %s, %s, %s\n"%(x0, y0, w_, h_))
143 |                 self.handle.flush()
144 |                 if w_ > h_:
145 |                     if y0-0.5*h_ > 400:
146 |                         action = self.carry 
147 |                         time.sleep(2)
148 |                         self.handle.write("carry ！！！！！！！！！！！！！！！！！！\n")
149 |                         self.person_success = True
150 |                         self.handle.write(json.dumps(action) + '\n')
151 |                         self.handle.flush()
152 |                         return action
153 |                     else:
154 |                         if x0 < 220:
155 |                             action = self.turnleft
156 |                             time.sleep(2)
157 |                             self.handle.write(" ---------------- turnleft ------------------\n")
158 |                             self.handle.write(json.dumps(action) + '\n')
159 |                             self.handle.flush()
160 |                             return action
161 |                         elif x0 > 420:
162 |                             action = self.turnright
163 |                             time.sleep(2)
164 |                             self.handle.write(" ---------------- turnright ------------------\n")
165 |                             self.handle.write(json.dumps(action) + '\n')
166 |                             self.handle.flush()
167 |                             return action
168 |                         else:
169 |                             action = self.foreward
170 |                             time.sleep(2)
171 |                             self.handle.write(" ---------------- foreward ------------------\n")
172 |                             self.handle.write(json.dumps(action) + '\n')
173 |                             self.handle.flush()
174 |                             return action
175 |             elif self.yolo_model.names[cls] == 'truck' and self.person_success:  #  Check if it is the 'person' category
176 |                 # get bounding box coordinate（x1, y1, x2, y2）
177 |                 #x0, y0, w_, h_ = box.xywh
178 |                 res_ = box.xywh
179 |                 self.handle.write("res_truck: %s\n"%res_)
180 |                 self.handle.flush()
181 | 
182 |                 x0, y0, w_, h_ = res_[0].tolist()
183 |                 if w_ > 390:
184 |                     action = self.drop 
185 |                     time.sleep(2)
186 |                     self.handle.write(" ============================================= drop =============================================\n")
187 |                     self.handle.write(json.dumps(action) + '\n')
188 |                     self.handle.flush()
189 |                     return action
190 |                 else:
191 |                     if x0 < 220:
192 |                         action = self.turnleft
193 |                         time.sleep(2)
194 |                         self.handle.write(" =============== turnleft ===============\n")
195 |                         self.handle.write(json.dumps(action) + '\n')
196 |                         self.handle.flush()
197 |                         return action
198 |                     elif x0 > 420:
199 |                         action = self.turnright
200 |                         self.handle.write(" =============== turnright ===============\n")
201 |                         time.sleep(2)
202 |                         self.handle.write(json.dumps(action) + '\n')
203 |                         self.handle.flush()
204 |                         return action
205 |                     else:
206 |                         action = self.foreward
207 |                         time.sleep(2)
208 |                         self.handle.write(" =============== forward ===============\n")
209 |                         self.handle.write(json.dumps(action) + '\n')
210 |                         self.handle.flush()
211 |                         return action
212 |         
213 |         if self.right_times >= self.try_times:
214 |             if self.foreward_times<self.try_times:
215 |                 action = self.foreward
216 |                 time.sleep(2)
217 |                 self.handle.write(" ---------------- no person and truck foreward------------------\n")
218 |                 self.foreward_times += 1
219 |                 self.handle.write(json.dumps(action) + '\n')
220 |                 self.handle.flush()
221 |                 return action
222 |             else:
223 |                 if self.backward_times<self.try_times:
224 |                     action = self.backward
225 |                     time.sleep(2)
226 |                     self.handle.write(" ---------------- no person and truck backward------------------\n")
227 |                     self.backward_times += 1
228 |                     self.handle.write(json.dumps(action) + '\n')
229 |                     self.handle.flush()
230 |                     return action
231 |         else:
232 |             action = self.turnright
233 |             time.sleep(2)
234 |             self.handle.write(" ---------------- no person and truck turnright------------------\n")
235 |             self.right_times += 1
236 |             self.handle.write(json.dumps(action) + '\n')
237 |             self.handle.flush()
238 |             return action
239 | 
240 |         if self.right_times >= self.try_times and self.foreward_times>=self.try_times and self.backward_times>=self.try_times:
241 |             #self.right_times = 9
242 |             self.right_times = 9
243 |             self.foreward_times = 0
244 |             self.backward_times= 0
245 |             action = self.turnright
246 |             time.sleep(2)
247 |             self.handle.write(" ---------------- no person and truck turnright------------------\n")
248 |             self.handle.write(json.dumps(action) + '\n')
249 |             self.handle.flush()
250 |             return action
251 |         else:
252 |             action = self.noaction
253 |             time.sleep(2)
254 |             self.handle.write(" ---------------- no person and truck turnright------------------\n")
255 |             self.handle.write(json.dumps(action) + '\n')
256 |             self.handle.flush()
257 |             return action
258 | 
259 | 


--------------------------------------------------------------------------------
/solution_random.py:
--------------------------------------------------------------------------------
 1 | import random
 2 | 
 3 | class AlgSolution: # 类名不能变
 4 | 
 5 |     def __init__(self):
 6 |         pass
 7 | 
 8 |     def reset(self, reference_text, reference_image):
 9 |         # This function is called before each new test data inference;
10 |         # reference_text and reference_image are the reference text and reference image for the data, respectively. If None, it means there is none.
11 |         # reference_image is a base64 encoded png image
12 |         # This function completes some state initialization, cache clearing, etc.
13 |         pass
14 | 
15 |     def predicts(self, ob, success):
16 |         # ob: a base64 encoded png image with a resolution of 320*320, representing the current observation image
17 |         # success: a boolean variable, True/False represents whether the last command was successfully executed
18 |         # Most of the time, success is True. When the player performs the "carry" operation to lift the wounded, if it is not within the allowable distance, the action may fail.
19 | 
20 |         linear = random.randint(-100, 100)
21 |         angular = random.randint(-30,30)
22 | 
23 |         action = {
24 |             'angular': angular,  # [-30, 30]
25 |             'velocity': linear,  # [-100, 100],
26 |             'viewport': 0,  # {0: stay, 1: look up, 2: look down},
27 |             'interaction': 0,  # {0: stand, 1: jump, 2: crouch, 3: carry, 4: drop, 5: open door}
28 |         }
29 |         return action


--------------------------------------------------------------------------------