├── Figure ├── image.png └── task_cue.png ├── README.md ├── Run_VLM_Agent.py ├── VLM_Agent ├── agent_VLM.py ├── api_yoloVLM.py ├── prompt_yoloVLM.py ├── readme.md ├── requirements.txt └── solution_yoloVLM.py ├── dockerfile ├── gym_rescue ├── __init__.py ├── __pycache__ │ └── __init__.cpython-39.pyc ├── envs │ ├── __init__.py │ ├── __pycache__ │ │ ├── __init__.cpython-39.pyc │ │ ├── base_env.cpython-39.pyc │ │ └── rescue.cpython-39.pyc │ ├── agent │ │ ├── __init__.py │ │ ├── __pycache__ │ │ │ ├── __init__.cpython-39.pyc │ │ │ └── character.cpython-39.pyc │ │ └── character.py │ ├── base_env.py │ ├── rescue.py │ ├── utils │ │ ├── __init__.py │ │ ├── __pycache__ │ │ │ ├── __init__.cpython-39.pyc │ │ │ ├── keyboard_util.cpython-39.pyc │ │ │ └── misc.cpython-39.pyc │ │ ├── keyboard_util.py │ │ └── misc.py │ └── wrappers │ │ ├── __init__.py │ │ ├── __pycache__ │ │ ├── __init__.cpython-39.pyc │ │ ├── configUE.cpython-39.pyc │ │ ├── early_done.cpython-39.pyc │ │ ├── monitor.cpython-39.pyc │ │ ├── population.cpython-39.pyc │ │ ├── task_cue.cpython-39.pyc │ │ └── time_dilation.cpython-39.pyc │ │ ├── configUE.py │ │ ├── early_done.py │ │ ├── monitor.py │ │ ├── population.py │ │ ├── switch_env.py │ │ ├── task_cue.py │ │ └── time_dilation.py └── example │ └── rescue_demo.py ├── main.py ├── requirements.txt ├── run.sh ├── run_main.bat ├── run_main.sh ├── server.py ├── solution.py └── solution_random.py /Figure/image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/Figure/image.png -------------------------------------------------------------------------------- /Figure/task_cue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/Figure/task_cue.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ATEC_competition_demo_code_2025 2 | === 3 | 4 | [//]: # (Gym-Rescue: ) 5 | Note that the Unreal Environment Binaries and example dataset in JSONL format are not included in this repository. Please register on the [ATEC official website](https://www.atecup.cn/competitions/atec2025) to access and download the completed demo. 6 | 7 | **This project integrates Unreal Engine with OpenAI Gym for a rescue task based on [UnrealCV](http://unrealcv.org/).** 8 | 9 | ## 🔥 News 10 | 11 | - 04/05/2025. We released two additional baseline methods: **VLM-based** and **RL-based**. 12 | ``` 13 | # An example of implementing RL-based algorithm, user could try to improve it by modifying the reward function, integrating VFMs etc. 14 | python train_ppo.py --num_envs 4 --num_epochs 100000 --device cuda --num_steps 60 --test_jsonl test_L1.jsonl --gamma 0.99 --ppo_epochs 20 --clip_param 0.2 15 | 16 | # An example of implementing VLM-based method for zero-shot reasoning and action execution. 17 | python Run_VLM_Agent.py 18 | ``` 19 | 20 | # Introduction 21 | 22 | **In this competition, the agent will be given image and text cues at the initial stage to help it respond to distress signals from an injured person in need of rescue. 
The agent must navigate a complex 3D environment, locate the injured person, and transport them to the designated stretcher as quickly as possible.** 23 | 24 | **1. Simulation Interaction Environment:** 25 | Unlike traditional computer vision tasks, this competition requires contestants to interact with the provided virtual environment dynamically and in real time. Contestants can use this simulation platform for data collection and strategy training, and the final scoring of the competition is carried out on the same platform. 26 | 27 | Examples of the simulation environment used in the competition and the agent's first-person perspective images are shown below: 28 | ![Description of the image](./Figure/image.png) 29 | 30 | **2. Interaction Interface:** 31 | Contestants interact with the environment through a gym-like Python interface to obtain the initial clues (a color image and a text description) as well as the agent's first-person RGB observations. They can also control the agent's movement in real time, execute actions, and receive reward signals. 32 | 33 | Examples of the clues provided to the agent at the initial stage are shown below: 34 | ![Description of the image](./Figure/task_cue.png) 35 | 36 | # Installation 37 | 38 | ## Dependencies 39 | UnrealCV, Gym, CV2, Numpy, Docker (optional), Nvidia-Docker (optional) 40 | 41 | We recommend using [anaconda](https://www.continuum.io/downloads) to install and manage your Python environment. 42 | 43 | ```CV2``` is used for image processing, such as extracting object masks and bounding boxes. ```Matplotlib``` is used for visualization. 44 | 45 | 46 | 47 | 48 | Installation is simple: activate your Python environment and install the dependencies 49 | ``` 50 | pip install -r requirements.txt 51 | ``` 52 | 53 | 54 | ## Run the baseline code 55 | ``` 56 | ./run_main.sh 57 | ``` 58 | ## Task Demonstration 59 | We provide a task execution example to help participants better understand the task. Participants can: 60 | - Control the agent via keyboard to complete the rescue mission with ```--keyboard```. 61 | - Observe a randomly controlled agent navigating the environment. 62 | By adding the ```--render``` and ```--record_video``` flags, participants can visualize the agent’s first-person perspective and save the entire observation sequence as an MP4 file, gaining a clear understanding of the success and failure criteria. 
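Participants who prefer to drive the agent programmatically can also use the gym-like interface described above directly. The snippet below is only an illustrative sketch: the environment id is a placeholder (the real ids and difficulty levels are provided with the downloaded test JSONL file), while the `gym.make` keyword arguments, the `[[(angular, velocity), viewport, interaction]]` action layout, and the reset/step return values mirror `Run_VLM_Agent.py`.

```python
import gym
import gym_rescue  # registers the rescue environments

# 'RescueDemo-v0' is a placeholder id; use the env_id values from the test JSONL entries.
env = gym.make('RescueDemo-v0', action_type='Mixed', observation_type='Color', reset_type=0)
ob, info = env.reset()                      # first-person RGB observation and episode info
for _ in range(100):
    # Each action is [[(angular, velocity), viewport, interaction]]
    action = [[(0, 50), 0, 0]]              # no turn, walk forward, keep viewport, no interaction
    ob, reward, termination, truncation, info = env.step(action)
    if termination or truncation:           # episode ends on success, timeout, or early termination
        break
env.close()
```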
63 | ### Run a keyboard controlled agent, visualize and save to output.mp4 64 | - Use `i`, `j`, `k`, `l` for agent movement 65 | - `1` for pick 66 | - `2` for drop 67 | - `e` for open the door 68 | - `space` for jump 69 | - `ctrl` for crouch 70 | 71 | ``` 72 | python example/rescue_demo.py --render --record_video --keyboard 73 | ``` 74 | 75 | ### Run a random agent, visualize it 76 | ``` 77 | python example/rescue_demo.py --render 78 | ``` 79 | 80 | 81 | ## Acknowledgments 82 | **This repository is a specialized branch of the [unrealzoo project](http://unrealzoo.site/), tailored to simulate rescue task scenarios.** 83 | We acknowledge the following projects for their contributions: 84 | - [UnrealCV](https://unrealcv.org/) 85 | - [OpenAI Gym](https://gym.openai.com/) 86 | - [Unreal Engine](https://www.unrealengine.com/) 87 | - [UnrealZoo](http://unrealzoo.site/) 88 | 89 | 90 | 91 | -------------------------------------------------------------------------------- /Run_VLM_Agent.py: -------------------------------------------------------------------------------- 1 | import gym_rescue 2 | from gym_rescue.envs.wrappers import switch_env, configUE, early_done 3 | import gym 4 | import numpy as np 5 | import cv2 6 | import os 7 | import json 8 | import base64 9 | import time 10 | import traceback 11 | 12 | from VLM_Agent.solution_yoloVLM import AlgSolution 13 | 14 | UnrealEnv = os.environ.get('UnrealEnv') 15 | TEST_JSONL = os.path.join(UnrealEnv, 'test_L1.jsonl') 16 | 17 | def img2base64(img): 18 | buf = cv2.imencode('.png', img)[1] 19 | b64 = base64.b64encode(buf).decode('utf-8') 20 | return b64 21 | 22 | 23 | agent = AlgSolution() 24 | 25 | results = [] 26 | seed = 0 27 | 28 | rewards = [] 29 | ep_times = [] 30 | 31 | env_old = None 32 | with open(TEST_JSONL, "r") as fp: 33 | for line in fp.readlines(): 34 | 35 | content = json.loads(line) 36 | level = content.get('level') 37 | env_id = content.get("env_id") 38 | agent_loc = content.get("agent_loc") 39 | injured_player_loc = content.get("injured_player_loc") 40 | injured_agent_id = content.get("injured_agent_id") 41 | stretcher_loc = content.get("stretcher_loc") 42 | ambulance_loc= content.get("ambulance_loc") 43 | reference_text = content.get("reference_text") 44 | reference_image_path = content.get("reference_image_path") 45 | max_steps = 9999999 46 | timeout = content.get("timeout") 47 | 48 | env = gym.make(env_id, action_type='Mixed', observation_type='Color', reset_type=level) 49 | if env_old is not None: 50 | env = switch_env.SwitchEnvWrapper(env, env_old) 51 | env_old = env 52 | try: 53 | env._max_episode_steps = max_steps 54 | env = configUE.ConfigUEWrapper(env, offscreen=True, resolution=(640,480)) 55 | env = early_done.EarlyDoneWrapper(env, int(timeout)) 56 | 57 | if reference_image_path: 58 | reference_image_path = os.path.join(UnrealEnv, 'ref_image', reference_image_path[0]) 59 | reference_image = base64.b64encode(open(reference_image_path, 'rb').read()).decode("utf-8") 60 | else: 61 | reference_image = None 62 | 63 | agent.reset(reference_text, reference_image) 64 | 65 | env.seed(seed) 66 | seed += 1 67 | 68 | env.unwrapped.injured_player_pose = injured_player_loc 69 | env.unwrapped.rescue_pose = stretcher_loc 70 | env.unwrapped.agent_pose = agent_loc 71 | env.unwrapped.ambulance_pose = ambulance_loc 72 | ob, info = env.reset() 73 | env.unwrapped.unrealcv.set_appearance(env.unwrapped.injured_agent, injured_agent_id) 74 | 75 | ob, reward, termination, truncation, info = env.step([[(0, 0), 0, 0]]) 76 | current_step = 0 77 | 
picked_reward = 0. 78 | success = True 79 | 80 | start_t = time.time() 81 | while True: 82 | pred = agent.predicts( 83 | img2base64(ob), 84 | success, 85 | ) 86 | action = [(pred['angular'], pred['velocity']), pred['viewport'], pred['interaction']] 87 | ob, reward, termination, truncation, info = env.step([action]) 88 | if info['picked']: 89 | picked_reward = 0.5 90 | 91 | if pred['interaction'] == 3: 92 | if info['picked']: 93 | success = True 94 | else: 95 | success = False 96 | else: 97 | success = True 98 | 99 | if current_step % 1 == 0: 100 | print('step', current_step) 101 | end_t = time.time() 102 | if termination: 103 | rewards.append(1) 104 | ep_times.append(end_t - start_t) 105 | break 106 | if truncation or pred['interaction'] == 4: 107 | rewards.append(picked_reward) 108 | ep_times.append(end_t - start_t) 109 | break 110 | 111 | current_step += 1 112 | except: 113 | env.close() 114 | traceback.print_exc() 115 | exit() 116 | 117 | env.close() 118 | 119 | print('Rewards', rewards) 120 | print('Times', ep_times) 121 | rewards = np.mean(rewards) 122 | ep_times = np.sum(ep_times) 123 | print('Rewards', rewards, 'Times', ep_times) 124 | 125 | -------------------------------------------------------------------------------- /VLM_Agent/agent_VLM.py: -------------------------------------------------------------------------------- 1 | from VLM_Agent.prompt_yoloVLM import * 2 | from VLM_Agent.api_yoloVLM import * 3 | import os 4 | import re 5 | #import argparse 6 | #import gym 7 | import cv2 8 | import time 9 | import numpy as np 10 | import base64 11 | from PIL import Image 12 | import io 13 | from datetime import datetime 14 | from collections import deque 15 | import random 16 | from ultralytics import YOLO 17 | 18 | 19 | class agent: 20 | 21 | def __init__(self,reference_text,reference_image): 22 | self.clue = "" 23 | self.landmark = [] 24 | self.landmark_back = [] 25 | self.target = "" 26 | self.action = "" 27 | 28 | # Initialize obs and info 29 | self.obs = None 30 | self.info = None 31 | 32 | self.phase = 0 33 | self.initialized = False 34 | 35 | # Create buffers for actions and observations 36 | self.clue = reference_text 37 | 38 | self.image_clue = reference_image 39 | 40 | self.person_text = "" 41 | self.landmark = [] 42 | self.landmark_back = [] 43 | self.target = "" 44 | self.action = "" 45 | 46 | # Initialize obs and info 47 | self.obs = None 48 | self.info = None 49 | 50 | self.phase = 0 51 | 52 | # Create buffers for actions and observations 53 | self.action_buffer = [] 54 | self.obs_buffer = [] 55 | self.image_buffer = [] 56 | 57 | # State tracking 58 | self.current_step = 0 59 | self.move_fail_count = 0 60 | self.final_move_count = 0 61 | self.search_in_progress = False 62 | self.move_to_landmark_in_progress = False 63 | self.observe_in_progress = False 64 | self.move_obstacle_in_progress = False 65 | self.search_move_in_progress = False 66 | self.pending_action = None 67 | self.check = 0 68 | self.obs_storage_indices = [] 69 | self.around_image = None 70 | 71 | # Add step counters for limiting move_to_landmark iterations 72 | self.move_steps = 0 73 | self.max_move_steps = 2 # Match original code limit of 2 iterations 74 | 75 | self.myinfo = {'picked': 0} 76 | 77 | return 78 | 79 | def predict(self, obs, info): 80 | # Add a 1-second delay at the beginning of predict 81 | time.sleep(1) 82 | 83 | # Store the current observation and info 84 | self.obs = obs 85 | self.info = info 86 | 87 | # Check if there are pending actions in the buffer 88 | if self.action_buffer: 89 | 
action = self.action_buffer.pop(0) 90 | 91 | # If this action should store an observation, mark it 92 | if self.obs_buffer and len(self.obs_buffer) > 0: 93 | if self.obs_buffer[0]: 94 | # Store the current observation for future use 95 | self.image_buffer.append(obs.copy()) 96 | self.obs_buffer.pop(0) 97 | 98 | return action 99 | 100 | # Start the main logic chain if no pending actions 101 | if self.phase == 0: 102 | # Initialize by analyzing the clue 103 | return self._handle_initial_phase() 104 | 105 | elif self.phase == 1: 106 | # Search for the injured person 107 | return self._handle_search_phase() 108 | 109 | elif self.phase == 2: 110 | # Rescue the injured person 111 | return self._handle_rescue_phase() 112 | 113 | elif self.phase == 3: 114 | # Search for the stretcher/ambulance 115 | return self._handle_return_phase() 116 | 117 | elif self.phase == 4: 118 | # Place the person on the stretcher 119 | return self._handle_placement_phase() 120 | 121 | 122 | # Default action if nothing else to do 123 | return ([0, 0], 0, 0) 124 | 125 | def _handle_initial_phase(self): 126 | # First time through, analyze the clue 127 | if not self.landmark: 128 | self.analyse_initial_clue() 129 | self.analyse_initial_image() 130 | print(f"\n\n\nLANDMARK LIST: {', '.join(self.landmark)}") 131 | if self.action_buffer: 132 | return self.action_buffer.pop(0) 133 | # Directly search in the initial direction without rotating 134 | prompt = search_prompt_begin(self.landmark, self.person_text) 135 | max_retries = 3 136 | for attempt in range(max_retries): 137 | try: 138 | # res1 = call_api_vlm(prompt = """Please observe the image to determine if there is a relatively open area in front of you or if there are objects close enough for clear observation? Please only return "1" to indicate emptiness, or "0" to indicate not emptiness""", base64_image=self.encode_image_array(self.obs[0])) 139 | # if res1 == "1": 140 | # for i in range(5): 141 | # self.action_buffer.append(([0, 100], 0, 0)) 142 | res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.obs)) 143 | print(f"[SEARCH] \n {res} \n\n\n") 144 | pattern = re.compile(r'(.*?)\s*(.*?)', re.DOTALL) 145 | match = pattern.search(res) 146 | 147 | if match: 148 | analysis = match.group(1).strip() 149 | landmark = match.group(2).strip() 150 | 151 | if landmark != "None" and landmark != "NONE": 152 | self.target = landmark 153 | self.phase = 1 154 | self.move_to_landmark_in_progress = True 155 | # Queue up the initial forward movement 156 | self.action_buffer.append(([0, 50], 0, 1)) 157 | # if "njured" not in self.target: 158 | # self.action_buffer.append(([0, 100], 0, 0)) 159 | # self.action_buffer.append(([0, 100], 0, 0)) 160 | return ([0, 0], 0, 0) 161 | else: 162 | self.phase = 1 163 | self.search_in_progress = True 164 | return self._start_search() 165 | 166 | except Exception as e: 167 | if attempt == max_retries - 1: 168 | raise ValueError(f"[SEARCH] Failed after {max_retries} attempts: {str(e)}") 169 | 170 | # If we get here, no landmark was found 171 | self.phase = 1 172 | self.search_in_progress = True 173 | return self._start_search() 174 | 175 | # Return default action if already initialized 176 | return ([0, 0], 0, 0) 177 | 178 | def _handle_search_phase(self): 179 | if self.info['picked'] == 1: 180 | self.phase = 3 181 | self.action_buffer.clear() 182 | self.obs_buffer.clear() 183 | print("[SPEAK] I have rescued them. 
It's time to go back.") 184 | return self._start_search() 185 | # If search is currently in progress, continue it 186 | if self.search_in_progress: 187 | result = self._process_search_result() 188 | self.search_in_progress = False 189 | 190 | if result == True: # landmark exists 191 | self.move_to_landmark_in_progress = True 192 | self.move_steps = 0 # Reset step counter when starting new movement 193 | return self._start_move_to_landmark() 194 | else: 195 | # self.search_move_in_progress = True 196 | # return self._start_search_move() 197 | self.action_buffer.append(([0, 100], 0, 0)) 198 | self.action_buffer.append(([0, 100], 0, 0)) 199 | self.action_buffer.append(([0, 100], 0, 0)) 200 | return self._start_search() 201 | 202 | # If moving to landmark is in progress 203 | elif self.move_to_landmark_in_progress: 204 | if self.move_steps > self.max_move_steps: 205 | # If reached max steps, go back to search 206 | self.move_to_landmark_in_progress = False 207 | self.search_in_progress = True 208 | return self._start_search() 209 | 210 | result = self._process_move_result() 211 | 212 | if result == False: 213 | self.move_fail_count += 1 214 | self.move_to_landmark_in_progress = False 215 | self.search_in_progress = True 216 | return self._start_search() 217 | 218 | # Check if we've reached max steps for move_to_landmark 219 | self.move_steps += 1 220 | 221 | if "injured" in self.target: 222 | self.phase = 2 223 | self.move_to_landmark_in_progress = False 224 | print("[SPEAK] I have found the injured person! Now I will try to rescue them.") 225 | 226 | # Queue up two forward actions 227 | self.action_buffer.append(([0, 50], 0, 3)) 228 | self.obs_buffer.append(False) 229 | return ([0, 50], 0, 0) 230 | 231 | # Continue moving to landmark 232 | return self._start_move_to_landmark() 233 | 234 | # If neither search nor move is in progress, start search 235 | else: 236 | self.search_in_progress = True 237 | return self._start_search() 238 | 239 | def _handle_rescue_phase(self): 240 | # If person is picked, move to next phase 241 | if self.info['picked'] == 1: 242 | self.phase = 3 243 | print("[SPEAK] I have rescued them. It's time to go back.") 244 | return ([0, 0], 0, 0) 245 | 246 | # If moving to landmark is in progress 247 | elif self.move_to_landmark_in_progress: 248 | result = self._process_move_result() 249 | 250 | if result == False: 251 | self.move_to_landmark_in_progress = False 252 | self.phase = 1 253 | self.target = "" 254 | print("[SPEAK] I lost the injured person. Let me try to search again.") 255 | return ([0, 20], 0, 0) 256 | 257 | # Continue moving to landmark 258 | return self._start_move_to_landmark() 259 | 260 | # Start moving to landmark 261 | else: 262 | self.move_to_landmark_in_progress = True 263 | self.move_steps = 0 # Reset step counter 264 | return self._start_move_to_landmark() 265 | 266 | def _handle_return_phase(self): 267 | # Update landmarks for the return journey 268 | if not self.landmark_back: 269 | self.landmark = ['ambulance', 'orange bench'] 270 | self.landmark_back = self.landmark 271 | 272 | # If search is in progress 273 | if self.search_in_progress: 274 | result = self._process_search_result() 275 | self.search_in_progress = False 276 | 277 | if "range" in self.target or "ench" in self.target: 278 | self.phase = 4 279 | print("[SPEAK] I found the stretcher. 
Let me place the injured person on it.") 280 | return self._start_move_to_landmark() 281 | 282 | if result == True: # landmark exists in view 283 | if self.move_obstacle_in_progress: 284 | obstacle_result = self._process_obstacle_result() 285 | if obstacle_result == True: 286 | self.move_obstacle_in_progress = False 287 | return self._start_search() 288 | 289 | self.move_to_landmark_in_progress = True 290 | return self._start_move_to_landmark() 291 | else: 292 | self.search_in_progress = True 293 | return self._start_search() 294 | 295 | elif self.move_obstacle_in_progress: 296 | result = self._process_obstacle_result() 297 | self.move_obstacle_in_progress = False 298 | self.move_to_landmark_in_progress = True 299 | return self._start_move_to_landmark() 300 | 301 | # If moving to landmark is in progress 302 | elif self.move_to_landmark_in_progress: 303 | if self.move_steps > self.max_move_steps: 304 | # If reached max steps, go back to search 305 | self.move_to_landmark_in_progress = False 306 | self.search_in_progress = True 307 | return self._start_search() 308 | 309 | result = self._process_move_result() 310 | 311 | if result == False: 312 | self.move_to_landmark_in_progress = False 313 | self.search_in_progress = True 314 | return self._start_search() 315 | else: 316 | self.move_steps += 1 317 | 318 | # Keep trying to move to landmark 319 | return self._start_move_to_landmark() 320 | 321 | # Start the search process 322 | else: 323 | self.search_in_progress = True 324 | return self._start_search() 325 | 326 | # def _handle_placement_phase(self): 327 | # # If moving to landmark is in progress 328 | # if self.move_to_landmark_in_progress: 329 | # result = self._process_move_result() 330 | # self.final_move_count += 1 331 | 332 | # if result == False or self.final_move_count >= 10: 333 | # self.move_to_landmark_in_progress = False 334 | # print("[SPEAK] I am going to place the injured person.") 335 | # self.phase = 1 336 | # # Queue backward movements and drop action 337 | # self.action_buffer.append(([0, -100], 0, 0)) 338 | # self.action_buffer.append(([0, 0], 0, 4)) 339 | # return ([0, -100], 0, 0) 340 | 341 | # # Continue moving to landmark 342 | # return self._start_move_to_landmark() 343 | 344 | # # Start moving to landmark 345 | # else: 346 | # self.move_to_landmark_in_progress = True 347 | # return self._start_move_to_landmark() 348 | 349 | def _handle_placement_phase(self): 350 | # if moving to landmark is in progress 351 | if self.move_to_landmark_in_progress: 352 | result = self._process_move_result() 353 | self.final_move_count += 1 354 | 355 | if result == False: 356 | if self.check == 0: 357 | self.check = 1 358 | # move backward 359 | self.action_buffer.clear() 360 | self.obs_buffer.clear() 361 | self.action_buffer.append(([0, -100], 0, 0)) 362 | self.obs_buffer.append(False) 363 | 364 | # observe 365 | self.action_buffer.append(([0, 0], 2, 0)) 366 | self.obs_buffer.append(True) # observe to check 367 | 368 | print("[SPEAK] let me check the stretcher") 369 | return ([0, -100], 0, 0) 370 | elif self.check == 1: 371 | self.check = 0 372 | self.move_to_landmark_in_progress = False 373 | self.search_in_progress = True 374 | self.phase = 3 375 | return self._start_search() 376 | elif self.check == 2: 377 | self.action_buffer.append(([0, -100], 0, 0)) 378 | self.action_buffer.append(([0, -100], 0, 0)) 379 | self.action_buffer.append(([0, -50], 0, 4)) 380 | else: 381 | if self.final_move_count >= 10: 382 | self.move_to_landmark_in_progress = False 383 | print("[SPEAK] I have moved 
too many times. Maybe I am stuck.") 384 | self.phase = 3 385 | self.action_buffer.append(([0, -50], 0, 0)) 386 | self.action_buffer.append(([0, 0], 0, 4)) 387 | return ([0, -100], 0, 0) 388 | if self.check == 1: 389 | self.check = 2 390 | # keep moving 391 | return self._start_move_to_landmark() 392 | 393 | # move 394 | else: 395 | self.move_to_landmark_in_progress = True 396 | return self._start_move_to_landmark() 397 | 398 | # Helper methods to start various processes 399 | def _start_search(self): 400 | # Prepare for search by observing surroundings 401 | self.observe_in_progress = True 402 | self.around_image = None 403 | self.image_buffer = [] 404 | 405 | # For phase 3 or 4, start rotation to observe surroundings 406 | self.action_buffer = [] 407 | self.obs_buffer = [] 408 | 409 | if self.phase == 3 or self.phase == 4: 410 | # Prepare for 4 observations while rotating 411 | for i in range(4): 412 | # For first observation, don't rotate 413 | if i == 0: 414 | self.action_buffer.append(([0, 0], 0, 0)) 415 | self.obs_buffer.append(True) 416 | else: 417 | # Add 3 rotations (first 2 don't save observations) 418 | for j in range(3): 419 | self.action_buffer.append(([30, 0], 0, 0)) 420 | # Only save observation on the last rotation of each set 421 | self.obs_buffer.append(j == 2) 422 | for i in range(3): 423 | self.action_buffer.append(([30, 0], 0, 0)) 424 | self.obs_buffer.append(False) 425 | 426 | # Return first action 427 | action = self.action_buffer.pop(0) 428 | should_save = self.obs_buffer.pop(0) 429 | if should_save: 430 | self.image_buffer.append(self.obs.copy()) 431 | return action 432 | else: 433 | # First rotate left 3 times (don't save observations) 434 | for _ in range(3): 435 | self.action_buffer.append(([-30, 0], 0, 0)) 436 | self.obs_buffer.append(False) 437 | 438 | # Now we're facing left - save this observation 439 | self.action_buffer.append(([0, 0], 0, 0)) 440 | self.obs_buffer.append(True) 441 | 442 | # Rotate right 3 times to face forward (don't save) 443 | for _ in range(3): 444 | self.action_buffer.append(([30, 0], 0, 0)) 445 | self.obs_buffer.append(False) 446 | 447 | # Now we're facing forward - save this observation 448 | self.action_buffer.append(([0, 0], 0, 0)) 449 | self.obs_buffer.append(True) 450 | 451 | # Rotate right 3 more times to face right (don't save) 452 | for _ in range(3): 453 | self.action_buffer.append(([30, 0], 0, 0)) 454 | self.obs_buffer.append(False) 455 | 456 | # Now we're facing right - save this observation 457 | self.action_buffer.append(([0, 0], 0, 0)) 458 | self.obs_buffer.append(True) 459 | 460 | # Rotate left 3 times to return to center (don't save) 461 | for _ in range(3): 462 | self.action_buffer.append(([-30, 0], 0, 0)) 463 | self.obs_buffer.append(False) 464 | 465 | # Get first action and observation status 466 | action = self.action_buffer.pop(0) 467 | should_save = self.obs_buffer.pop(0) 468 | if should_save: 469 | self.image_buffer.append(self.obs.copy()) 470 | 471 | return action 472 | 473 | def _start_move_to_landmark(self): 474 | # This action will be handled immediately, the result processed on next prediction 475 | return ([0, 0], 0, 0) 476 | 477 | def _start_move_obstacle(self): 478 | # Prepare for obstacle check 479 | self.action_buffer = [ 480 | ([0, -50], 0, 0), 481 | ([0, 0], 2, 0) 482 | ] 483 | self.obs_buffer = [False, False] 484 | 485 | # Return first action 486 | action = self.action_buffer.pop(0) 487 | self.obs_buffer.pop(0) 488 | return action 489 | 490 | def _start_search_move(self): 491 | # Prepare for search 
move 492 | self.observe_in_progress = True 493 | self.image_buffer = [] 494 | 495 | # Set up observation sequence 496 | self.action_buffer = [] 497 | self.obs_buffer = [] 498 | 499 | if self.phase == 3 or self.phase == 4: 500 | # Similar to search but with a focus on movement 501 | for i in range(4): 502 | if i == 0: 503 | self.action_buffer.append(([0, 0], 0, 0)) 504 | self.obs_buffer.append(True) 505 | else: 506 | for j in range(3): 507 | self.action_buffer.append(([30, 0], 0, 0)) 508 | self.obs_buffer.append(j == 2) 509 | 510 | # Get first action and observation status 511 | action = self.action_buffer.pop(0) 512 | should_save = self.obs_buffer.pop(0) 513 | if should_save: 514 | self.image_buffer.append(self.obs.copy()) 515 | 516 | return action 517 | else: 518 | # First rotate left 3 times (don't save observations) 519 | for _ in range(3): 520 | self.action_buffer.append(([-30, 0], 0, 0)) 521 | self.obs_buffer.append(False) 522 | 523 | # Now we're facing left - save this observation 524 | self.action_buffer.append(([0, 0], 0, 0)) 525 | self.obs_buffer.append(True) 526 | 527 | # Rotate right 3 times to face forward (don't save) 528 | for _ in range(3): 529 | self.action_buffer.append(([30, 0], 0, 0)) 530 | self.obs_buffer.append(False) 531 | 532 | # Now we're facing forward - save this observation 533 | self.action_buffer.append(([0, 0], 0, 0)) 534 | self.obs_buffer.append(True) 535 | 536 | # Rotate right 3 more times to face right (don't save) 537 | for _ in range(3): 538 | self.action_buffer.append(([30, 0], 0, 0)) 539 | self.obs_buffer.append(False) 540 | 541 | # Now we're facing right - save this observation 542 | self.action_buffer.append(([0, 0], 0, 0)) 543 | self.obs_buffer.append(True) 544 | 545 | # Rotate left 3 times to return to center (don't save) 546 | for _ in range(3): 547 | self.action_buffer.append(([-30, 0], 0, 0)) 548 | self.obs_buffer.append(False) 549 | 550 | # Get first action and observation status 551 | action = self.action_buffer.pop(0) 552 | should_save = self.obs_buffer.pop(0) 553 | if should_save: 554 | self.image_buffer.append(self.obs.copy()) 555 | 556 | return action 557 | 558 | # Methods to process results of various actions 559 | def _process_search_result(self): 560 | # If we've collected observations, process them 561 | if self.image_buffer: 562 | self.around_image = self.concatenate_images(self.image_buffer) 563 | self.image_buffer = [] 564 | 565 | if self.phase == 0: 566 | self.phase = 1 567 | prompt = search_prompt_begin(self.landmark, self.person_text) 568 | max_retries = 3 569 | for attempt in range(max_retries): 570 | try: 571 | res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.obs)) 572 | print(f"[SEARCH] \n {res} \n\n\n") 573 | pattern = re.compile(r'(.*?)\s*(.*?)', re.DOTALL) 574 | match = pattern.search(res) 575 | 576 | if match: 577 | analysis = match.group(1).strip() 578 | landmark = match.group(2).strip() 579 | 580 | if landmark != "None" and landmark != "NONE": 581 | self.target = landmark 582 | # Queue search confirmation action 583 | self.action_buffer.append(([0, 100], 0, 1)) 584 | self.action_buffer.append(([0, 100], 0, 1)) 585 | self.obs_buffer.append(False) # Don't save observation after this action 586 | return True 587 | else: 588 | return False 589 | 590 | except Exception as e: 591 | if attempt == max_retries - 1: 592 | raise ValueError(f"[SEARCH] Failed after {max_retries} attempts: {str(e)}") 593 | 594 | elif self.phase == 3 or self.phase == 4: 595 | landmark = random.choice([self.landmark, ["orange 
bench"]]) 596 | prompt = search_prompt_back(landmark, self.person_text) 597 | max_retries = 3 598 | for attempt in range(max_retries): 599 | try: 600 | res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.around_image)) 601 | print(f"[SEARCH] \n {res} \n\n\n") 602 | pattern = re.compile(r'(.*?)\s*(.*?)\s*(.*?)', re.DOTALL) 603 | match = pattern.search(res) 604 | 605 | if match: 606 | analysis = match.group(1).strip() 607 | landmark = match.group(2).strip() 608 | side = match.group(3).strip() 609 | 610 | if landmark != "None" and landmark != "NONE": 611 | self.target = landmark 612 | # Queue appropriate rotation actions based on side 613 | if side == "left": 614 | for _ in range(3): 615 | self.action_buffer.append(([-30, 0], 0, 0)) 616 | self.obs_buffer.append(False) # Don't save observations during rotation 617 | elif side == "right": 618 | for _ in range(3): 619 | self.action_buffer.append(([30, 0], 0, 0)) 620 | self.obs_buffer.append(False) # Don't save observations during rotation 621 | elif side == "back": 622 | for _ in range(6): 623 | self.action_buffer.append(([30, 0], 0, 0)) 624 | self.obs_buffer.append(False) # Don't save observations during rotation 625 | # Add forward action 626 | self.action_buffer.append(([0, 100], 0, 0)) 627 | self.action_buffer.append(([0, 100], 0, 0)) 628 | self.obs_buffer.append(False) # Don't save observation after this action 629 | return True 630 | else: 631 | return False 632 | 633 | except Exception as e: 634 | if attempt == max_retries - 1: 635 | raise ValueError(f"[SEARCH] Failed after {max_retries} attempts: {str(e)}") 636 | 637 | else: 638 | prompt = search_prompt(self.landmark, self.person_text) 639 | max_retries = 3 640 | for attempt in range(max_retries): 641 | try: 642 | res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.around_image)) 643 | print(f"[SEARCH] \n {res} \n\n\n") 644 | pattern = re.compile(r'(.*?)\s*(.*?)\s*(.*?)', re.DOTALL) 645 | match = pattern.search(res) 646 | 647 | if match: 648 | analysis = match.group(1).strip() 649 | landmark = match.group(2).strip() 650 | side = match.group(3).strip() 651 | 652 | if landmark != "None" and landmark != "NONE": 653 | self.target = landmark 654 | self.action_buffer.append(([0, 10], 0, 3)) 655 | # Queue appropriate rotation actions based on side 656 | if side == "left": 657 | for _ in range(3): 658 | self.action_buffer.append(([-30, 0], 0, 0)) 659 | self.obs_buffer.append(False) # Don't save observations during rotation 660 | elif side == "right": 661 | for _ in range(3): 662 | self.action_buffer.append(([30, 0], 0, 0)) 663 | self.obs_buffer.append(False) # Don't save observations during rotation 664 | # Add forward action 665 | 666 | self.action_buffer.append(([0, 100], 0, 0)) 667 | self.obs_buffer.append(False) # Don't save observation after this action 668 | return True 669 | else: 670 | return False 671 | 672 | except Exception as e: 673 | if attempt == max_retries - 1: 674 | raise ValueError(f"[SEARCH] Failed after {max_retries} attempts: {str(e)}") 675 | 676 | return False 677 | 678 | def _process_move_result(self): 679 | prompt = move_forward_prompt(self.target, self.person_text) 680 | max_retries = 3 681 | for attempt in range(max_retries): 682 | try: 683 | res = call_api_vlm(prompt=prompt, 684 | base64_image=self.encode_image_array(self.add_vertical_lines(self.obs))) 685 | print(f"[MOVING] \n {res} \n\n\n") 686 | pattern = re.compile(r'(.*?)\s*(.*?)', re.DOTALL) 687 | match = pattern.search(res) 688 | 689 | if match: 690 | tag = 
match.group(1).strip() 691 | direction = match.group(2).strip() 692 | 693 | if tag == "no" or tag == "No" or tag == "NO": 694 | return False 695 | elif tag == "yes" or tag == "Yes" or tag == "YES": 696 | # Queue appropriate actions based on direction 697 | if direction == "left": 698 | self.action_buffer.append(([-20, 0], 0, 0)) 699 | self.action_buffer.append(([-20, 0], 0, 0)) 700 | elif direction == "right": 701 | self.action_buffer.append(([20, 0], 0, 0)) 702 | self.action_buffer.append(([20, 0], 0, 0)) 703 | elif direction == "middle": 704 | self.action_buffer.append(([0, 100], 0, 0)) 705 | self.action_buffer.append(([0, 100], 0, 0)) 706 | # forward action 707 | self.action_buffer.append(([0, 50], 0, 3)) 708 | self.obs_buffer.append(False) # Don't save observation during forward movement 709 | 710 | # Special handling for injured person target 711 | # if "njured" in self.target and self.info['picked'] == 0: 712 | # # Add pickup action (3) as in original code 713 | # self.action_buffer.append(([0, 0], 0, 3)) 714 | # self.obs_buffer.append(False) # Don't save observation during pickup 715 | 716 | # Special handling for cross the door 717 | if "door" in self.target: 718 | self.action_buffer.append(([0, 0], 0, 5)) 719 | self.action_buffer.append(([0, 100], 0, 0)) 720 | # jump can fix many bugs! 721 | if self.info['picked'] != 1: 722 | self.action_buffer.append(([0, 40], 0, 1)) 723 | self.obs_buffer.append(False) # Don't save observation during pickup check 724 | 725 | return True 726 | 727 | except Exception as e: 728 | if attempt == max_retries - 1: 729 | raise ValueError(f"[MOVE] Failed after {max_retries} attempts: {str(e)}") 730 | 731 | return False 732 | 733 | def _process_obstacle_result(self): 734 | prompt = move_obstacle_prompt() 735 | max_retries = 3 736 | for attempt in range(max_retries): 737 | try: 738 | res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.obs)) 739 | print(f"[CHECKING OBSTACLE] \n {res} \n\n\n") 740 | pattern = re.compile(r'(.*?)\s*(.*?)', re.DOTALL) 741 | match = pattern.search(res) 742 | 743 | if match: 744 | tag = match.group(1).strip() 745 | obstacle = match.group(2).strip() 746 | 747 | if tag == "1": 748 | # Queue random turning and movement to avoid obstacle 749 | turn = random.choice([-30, 30]) 750 | self.action_buffer.append(([turn, 0], 0, 0)) 751 | self.obs_buffer.append(False) # Don't save observation during turning 752 | 753 | self.action_buffer.append(([turn, 0], 0, 0)) 754 | self.obs_buffer.append(False) # Don't save observation during turning 755 | 756 | # Add forward movements 757 | for _ in range(5): 758 | self.action_buffer.append(([0, 100], 0, 0)) 759 | self.obs_buffer.append(False) # Don't save observations during forward movement 760 | 761 | # Add return rotation 762 | self.action_buffer.append(([-1 * turn, 0], 0, 0)) 763 | self.obs_buffer.append(False) # Don't save observation during return rotation 764 | 765 | self.action_buffer.append(([-1 * turn, 0], 0, 0)) 766 | self.obs_buffer.append(False) # Don't save observation during return rotation 767 | 768 | return True 769 | else: 770 | return False 771 | 772 | except Exception as e: 773 | if attempt == max_retries - 1: 774 | raise ValueError(f"[MOVE_OBSTACLE] Failed after {max_retries} attempts: {str(e)}") 775 | 776 | return False 777 | 778 | def _process_search_move_result(self): 779 | prompt = search_move_prompt() 780 | max_retries = 3 781 | for attempt in range(max_retries): 782 | try: 783 | res = call_api_vlm(prompt=prompt, 
base64_image=self.encode_image_array(self.around_image)) 784 | print(f"[SEARCH-MOVE] \n {res} \n\n\n") 785 | pattern = re.compile(r'(.*?)\s*(.*?)', re.DOTALL) 786 | match = pattern.search(res) 787 | 788 | if match: 789 | output = match.group(2).strip() 790 | 791 | # Queue appropriate rotation actions 792 | if output == "left": 793 | for _ in range(3): 794 | self.action_buffer.append(([-30, 0], 0, 0)) 795 | elif output == "right": 796 | for _ in range(3): 797 | self.action_buffer.append(([30, 0], 0, 0)) 798 | 799 | # Add forward actions 800 | for _ in range(3): 801 | self.action_buffer.append(([0, 100], 0, 0)) 802 | 803 | # Add a search action if not carrying 804 | if self.info["picked"] != 1: 805 | self.action_buffer.append(([0, 50], 0, 1)) 806 | 807 | return True 808 | 809 | except Exception as e: 810 | if attempt == max_retries - 1: 811 | raise ValueError(f"[SEARCH_MOVE] Failed after {max_retries} attempts: {str(e)}") 812 | 813 | return False 814 | 815 | # Original helper methods preserved 816 | def analyse_initial_clue(self): 817 | # res0 = call_api_vlm(prompt=in_out_door_prompt(self.clue), base64_image=self.image_clue) 818 | prompt = initial_clue_prompt(self.clue) 819 | # prompt = initial_clue_prompt(self.clue) 820 | max_retries = 3 821 | for attempt in range(max_retries): 822 | try: 823 | res = call_api_llm(prompt=prompt) 824 | print(res) 825 | pattern = re.compile(r'(.*?)\s*(.*?)\s*(.*?)', re.DOTALL) 826 | match = pattern.search(res) 827 | 828 | if match: 829 | direction = match.group(2).strip() 830 | # Queue appropriate rotation actions based on direction 831 | if direction == "left": 832 | for _ in range(3): 833 | self.action_buffer.append(([-30, 0], 0, 0)) 834 | self.obs_buffer.append(False) 835 | elif direction == "right": 836 | for _ in range(3): 837 | self.action_buffer.append(([30, 0], 0, 0)) 838 | self.obs_buffer.append(False) 839 | elif direction == "back": 840 | for _ in range(6): 841 | self.action_buffer.append(([30, 0], 0, 0)) 842 | self.obs_buffer.append(False) 843 | elif "front" in direction and "left" in direction: 844 | self.action_buffer.append(([-30, 0], 0, 0)) 845 | self.obs_buffer.append(False) 846 | elif "front" in direction and "right" in direction: 847 | self.action_buffer.append(([30, 0], 0, 0)) 848 | self.obs_buffer.append(False) 849 | elif "rear" in direction and "left" in direction: 850 | for _ in range(4): 851 | self.action_buffer.append(([-30, 0], 0, 0)) 852 | self.obs_buffer.append(False) 853 | elif "rear" in direction and "right" in direction: 854 | for _ in range(4): 855 | self.action_buffer.append(([30, 0], 0, 0)) 856 | self.obs_buffer.append(False) 857 | # if res0 == "0": 858 | # for _ in range(10): 859 | # self.action_buffer.append(([0, 100], 0, 0)) 860 | # self.action_buffer.append(([0, 0], 0, 0)) 861 | # self.obs_buffer.append(True) 862 | self.action_buffer.append(([0, 0], 0, 0)) 863 | self.obs_buffer.append(True) 864 | 865 | # Parse landmarks 866 | # self.landmark = ["ambulance"] 867 | # if res0 == "0": 868 | # self.landmark = ["door"] 869 | self.landmark += match.group(3).strip().split(",") 870 | self.landmark = [ 871 | "injured person" if landmark.strip().lower() == "injured woman" or landmark.strip().lower() == "injured man" 872 | else landmark.strip() for landmark in self.landmark] 873 | self.landmark = [landmark for landmark in self.landmark if 874 | "ench" not in landmark and "mbulance" not in landmark] 875 | # if res0 == "0": 876 | # self.landmark_back = ['door', 'ambulance', 'stretcher'] 877 | # else: 878 | self.landmark_back = 
['ambulance', 'orange bench'] 879 | return 880 | 881 | except Exception as e: 882 | if attempt == max_retries - 1: 883 | raise ValueError(f"[CLUE ANALYSE] Failed after {max_retries} attempts: {str(e)}") 884 | 885 | def analyse_initial_image(self): 886 | prompt = initial_image_prompt() 887 | max_retries = 3 888 | for attempt in range(max_retries): 889 | try: 890 | res = call_api_vlm(prompt=prompt, base64_image=self.encode_image_array(self.image_clue)) 891 | self.person_text = res 892 | print(res) 893 | return True 894 | 895 | except Exception as e: 896 | if attempt == max_retries - 1: 897 | raise ValueError(f"[IMAGE ANALYSE] Failed after {max_retries} attempts: {str(e)}") 898 | 899 | return False 900 | 901 | def encode_image_array(self, image_array): 902 | # Convert the image array to a PIL Image object 903 | image = Image.fromarray(np.uint8(image_array)) 904 | 905 | # Save the PIL Image object to a bytes buffer 906 | buffered = io.BytesIO() 907 | image.save(buffered, format="JPEG") 908 | 909 | # Encode the bytes buffer to Base64 910 | img_str = base64.b64encode(buffered.getvalue()).decode('utf-8') 911 | 912 | return img_str 913 | 914 | def concatenate_images(self, image_list): 915 | height, width, channels = image_list[0].shape 916 | 917 | total_width = width * len(image_list) 918 | concatenated_image = np.zeros((height, total_width, channels), dtype=np.uint8) 919 | 920 | for i, img in enumerate(image_list): 921 | concatenated_image[:, i * width:(i + 1) * width, :] = img 922 | 923 | return concatenated_image 924 | 925 | def add_vertical_lines(self, image_array): 926 | h, w, c = image_array.shape 927 | 928 | line1 = w // 3 929 | line2 = w * 2 // 3 930 | line_color = (0, 0, 255) 931 | line_thickness = 2 932 | 933 | # Create a copy to avoid modifying the original 934 | image_copy = image_array.copy() 935 | cv2.line(image_copy, (line1, 0), (line1, h), line_color, line_thickness) 936 | cv2.line(image_copy, (line2, 0), (line2, h), line_color, line_thickness) 937 | 938 | return image_copy 939 | 940 | 941 | 942 | def reset(self, text, image): 943 | self.clue = text 944 | self.image_clue = image 945 | return 946 | 947 | 948 | -------------------------------------------------------------------------------- /VLM_Agent/api_yoloVLM.py: -------------------------------------------------------------------------------- 1 | 2 | import requests 3 | import ollama 4 | 5 | 6 | def call_api_vlm(prompt, base64_image=None): 7 | 8 | 9 | ollama_client = ollama.Client() 10 | 11 | response = ollama_client.chat(model='gemma3:27b', 12 | messages=[{ 13 | 'role': 'system', 14 | 'content': prompt, 15 | }, 16 | { 17 | 'role': 'user', 18 | 'content':"Robot's observation", 19 | "images": [base64_image] 20 | } 21 | ] 22 | ) 23 | 24 | return response['message']['content'] 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | def call_api_llm(prompt): 33 | 34 | input_messages = [ 35 | {"role": "system", "content": "You are a helpful assistant."}, 36 | {"role": "user", "content": prompt} 37 | ] 38 | 39 | ollama_client = ollama.Client() 40 | response = ollama_client.chat(model='gemma3:27b', 41 | messages=input_messages 42 | ) 43 | 44 | return response['message']['content'] 45 | 46 | 47 | 48 | 49 | 50 | -------------------------------------------------------------------------------- /VLM_Agent/prompt_yoloVLM.py: -------------------------------------------------------------------------------- 1 | 2 | def initial_clue_prompt(clue): 3 | p = f""" 4 | You are a rescue robot, and now you have the following initial prompts about the rescue target: 5 | 
Clue: {clue} 6 | 7 | Please analyze two major questions: 8 | 1. Does the prompt mention where the first target is located on the robot? (Front, right side, etc.) 9 | 2. What objects or symbols are mentioned in the prompt? According to the logical description of the prompt, arrange the mentioned landmark objects as nodes in the order from the starting point (away from the injured person) to the ending point (injured person). 10 | 11 | Output format with XML tags: 12 | Your step-by-step think process 13 | first target's direction (front/back/left/right/none) 14 | Sorted landmark objects 15 | 16 | Example: 17 | Clue: The injured person is under a fruit tree. After the fruit tree enters the orchard, turn left. There is a fence outside the orchard and a car is parked. 18 | (think process) 19 | front 20 | car,fence,orchard entrance,fruit tree,injured person 21 | 22 | 23 | """ 24 | return p 25 | 26 | 27 | 28 | def initial_clue_prompt_indoor(clue): 29 | p = f""" 30 | You are a rescue robot, and now you have the following initial prompts about the rescue target: 31 | Clue: {clue} 32 | 33 | Please analyze two major questions: 34 | 1. Does the prompt mention where the first target is located on the robot? (Front, right side, etc.) 35 | 2. What objects or symbols are mentioned in the prompt? According to the logical description of the prompt, arrange the mentioned landmark objects as nodes in the order from the starting point (away from the injured person) to the ending point (injured person). 36 | - Attention, you already know that the injured person is indoors, so the node must include a "door". 37 | Output format with XML tags: 38 | Your step-by-step think process 39 | first target's direction (front/back/left/right/none) 40 | Sorted landmark objects 41 | 42 | Example: 43 | Clue: The injured person is under a fruit tree. After the fruit tree enters the orchard, turn left. There is a fence outside the orchard and a car is parked. 44 | (think process) 45 | front 46 | car,fence,orchard entrance,fruit tree,injured person 47 | 48 | 49 | """ 50 | return p 51 | 52 | 53 | 54 | def in_out_door_prompt(clue): 55 | p = f""" 56 | Here are clue about the target: 57 | {clue} 58 | and you will get the image of the target. 59 | Please determine whether the target is indoors or outdoors based on this text. 60 | If indoors, please return 0; If outdoors, please return to 1. 61 | Do not provide any additional responses. 62 | 63 | """ 64 | return p 65 | 66 | def initial_image_prompt(): 67 | p = f""" 68 | Please detect the human lying on the ground in the picture and output the color of the clothes they are wearing. 69 | Please only output colors and do not include any other explanatory text. 70 | Example 1 (correct): 71 | Blue and white 72 | Example 2 (incorrect, as only color output is allowed): 73 | The color of the clothes passed on by this person is blue and white 74 | """ 75 | return p 76 | 77 | 78 | 79 | 80 | 81 | def search_prompt(landmark_list, person_text): 82 | 83 | landmark_str = ", ".join(landmark_list) 84 | 85 | p = f""" 86 | 87 | You are a rescue robot, you need to find the wounded. 88 | 89 | The input RGB image is a concatenation of three images, reflecting the robot's field of view on the left, front, and right sides, respectively. 90 | 91 | You now have a list of landmarks:[{landmark_str}] 92 | the list was sort in descending order of priority. 93 | 94 | Please complete the following tasks: 95 | 1. Please analyze the information and objects contained in each image separately; 96 | 2. 
Check if the objects in the list appear within the field of view. 97 | 98 | If your target is an Injured person, please note that they are human beings lying on the ground and the color of their clothes is {person_text}. 99 | 100 | 101 | Use XML tags to output results in the following format: 102 | Check whether the objects in the list appear in order based on the information within the field of view 103 | the object in the views with the highest priority, or outputs NONE to indicate that all objects in the list are invisible (When describing an object, please ensure consistency with the expression in the list) 104 | choose one from left/front/right (Which image does the chosen object belong to) 105 | 106 | Example: 107 | The left view includes red car, trash can, and lawn. There is a wall in front, which should be the wall of the house. There is a swimming pool on the right, with parasols and lounge chairs next to it. In order, first check the injured person - none of the three pictures are present. Then there is the red car, which exists in the field of view and belongs to the left view. I should output red car. 108 | red car 109 | left 110 | 111 | """ 112 | return p 113 | 114 | 115 | 116 | 117 | 118 | def search_prompt_begin(landmark_list, person_text): 119 | 120 | landmark_str = ", ".join(landmark_list) 121 | 122 | p = f""" 123 | 124 | You are a rescue robot, you need to find the wounded. 125 | 126 | The input RGB image now displays the visual field in front of the robot. 127 | 128 | You now have a list of landmarks:[{landmark_str}] 129 | the objects in the list are some iconic objects that may be seen on the path from your current location to the injured person. The objects at the front are of low priority, while the injured at the bottom have the highest priority. 130 | 131 | Please complete the following tasks: 132 | 1. Analyze which objects are present in the field of view; 133 | 2. Corresponding to the iconic objects in the list, find the object with the highest priority in the field of view. 134 | Please note that the description should be consistent with the description in the list. 135 | 136 | Use XML tags to output results in the following format: 137 | Analysis of information within the field of view 138 | The highest priority landmark object (If there is no landmark object in the list within the field of view, output NONE) 139 | 140 | Example: 141 | In the field of view, you can see the entrance of the orchard, the fence, and the garbage bin, both of which are landmarks on the list. Since the priority of the entrance is higher than that of the fence, I will choose the fence output. 142 | fence 143 | 144 | """ 145 | return p 146 | 147 | 148 | 149 | def search_prompt_back(landmark_list, person_text): 150 | 151 | landmark_str = ", ".join(landmark_list) 152 | 153 | p = f""" 154 | 155 | You are a rescue robot, you need to find the wounded. 156 | 157 | The input RGB image is a concatenation of four images, reflecting the robot's field of view on the front, right ,back and left sides, respectively. 158 | 159 | You need to find the stretcher within your field of view (if it exists). Stretchers will always appear next to the ambulance, so when you cannot see the stretcher, you should output the ambulance. 160 | 161 | Please complete the following tasks: 162 | 163 | 1. Please analyze the information and objects contained in each image separately; 164 | 2. Check if there is an ambulance or stretcher in the field of view. If there is, output it. 
When both are visible, prioritize outputting the stretcher and only output one object. 165 | 166 | Use XML tags to output results in the following format: 167 | 168 | Check if there is a stretcher or ambulance in the field of view 169 | stretcher/ambulance/NONE 170 | select one from front/right/back/left (which image does the selected object belong to) 171 | 172 | example: 173 | The left view includes red cars, trash cans, and lawns. There are stretchers and ambulances ahead. There is a swimming pool on the right, with parasols and lounge chairs next to it. The image behind is a wall. The stretcher and ambulance are both within sight, I should prioritize the delivery of the stretcher. 174 | stretcher 175 | front 176 | 177 | """ 178 | return p 179 | 180 | 181 | def move_forward_prompt(target, person_text): 182 | p = f""" 183 | 184 | You are a rescue robot, you need to find the wounded. 185 | 186 | The input RGB image now displays the visual field in front of the robot. You are currently moving towards the target :[{target}]. 187 | The image is divided into three parts by two red vertical lines: left, middle, right. Please identify the following issue: 188 | 1. Is the target still within sight? 189 | 2. Is the target mainly located on the left, middle, or right side of the image? 190 | 191 | If your target is an Injured person, please note that they are human beings lying on the ground and the color of their clothes is {person_text}. 192 | 193 | Use XML tags to output results in the following format: 194 | yes/no (Determine if the target is still within the field of view) 195 | left/middle/right(Determine which area the target is in) 196 | 197 | Example: 198 | yes 199 | middle 200 | 201 | """ 202 | return p 203 | 204 | 205 | 206 | 207 | 208 | 209 | 210 | 211 | def search_move_prompt(): 212 | p = f""" 213 | 214 | You are a rescue robot, you need to explore the environment and find the wounded. 215 | The input RGB image now is a concatenation of three images, showing the visual content of the robot's left, front, and right sides respectively. 216 | Please analyze the field of view information within each of the three images and choose the direction that is more worth exploring. 217 | How to determine if it is worth exploring: 218 | If there are distant extensions or new areas such as corners or channels connecting in the picture, it is more worth exploring; If it's a wall or a corner, then it's obviously not worth exploring. 219 | Use XML tags to output results in the following format: 220 | Your analysis process 221 | The direction you choose can be left, front, or right 222 | 223 | Example: 224 | In front of me is a wall, and on the right is an empty space, which seems to have no exploratory value. There is a door on my left, and the space inside the door is an unknown area worth exploring. 225 | left 226 | """ 227 | return p 228 | 229 | 230 | def access_search_prompt(): 231 | p = f""" 232 | # background 233 | 234 | You are a rescue robot that needs to explore unknown environments and find mission targets (wounded and stretchers) 235 | A person was injured and lying next to the red car on the right side of the house. 236 | 237 | # Input information: 238 | RGB image: composed of four images pieced together, representing the content of your field of view in the front, right, back, and left directions. 239 | 240 | Please perform the following tasks in order: 241 | 242 | 1. Analyze the information within your field of view to determine if you are currently on the road? 243 | 2. 
If you are not on the road, please find the nearest intersection to help you move from your current area to the road. 244 | 245 | Output (including XML format): 246 | whether you are currently on the road (select from yes, no) 247 | To move the target, you need to describe the intersection you want to pass through 248 | In which direction is the intersection within the field of view? (Select from front, right, back, left) 249 | 250 | 251 | Example: 252 | no 253 | the left side of the white fence can be accessed 254 | front 255 | 256 | 257 | """ 258 | 259 | return p 260 | 261 | 262 | 263 | 264 | 265 | def move_obstacle_prompt(): 266 | p = f""" 267 | You are helping a robot move, and the incoming RGB image is its downward view. 268 | Please confirm if there are any horizontal obstacles in the view that may hinder our movement, such as a wall, a horizontal fence, etc. And other objects, such as a tree or a box, although they are also obstacles, their width is limited, so we can easily navigate around them, and these are not considered horizontal obstacles. 269 | In addition, obstacles also need to have sufficient height, and if they are just objects on the ground, they can also be ignored. 270 | 271 | Please output according to the format, including XML tags: 272 | Whether there are horizontal obstacles (Choose from 0 and 1, where 0 represents none and 1 represents presence) 273 | If it exists, describe what obstacle it is 274 | 275 | 276 | Example: 277 | 1 278 | There is a white fence in front of me 279 | 280 | """ 281 | return p 282 | 283 | def moving_suggestion(clue): 284 | p = f""" 285 | You are assisting a robot in locating an injured person lying on the ground. Your task is to provide a **moving direction suggestion** based on the **text clue**, the robot's **current visual observation**, and a **scratch trajectory image** that aids spatial reasoning. 286 | 287 | ## Inputs: 288 | 1. **Text Clue**: {clue} 289 | 2. **Robot’s Observation Sequence**: A concatenation of RGB images representing the robot's recent first-person views. 290 | 3. **Trajectory Image**: A top-down sketch of the robot’s movement history. The **green point** marks the starting location, the **red point** marks the current position, and the **upward direction** represents the robot’s initial orientation. The robot has **no memory** of previously visited paths, so this image is crucial for reasoning. 291 | 292 | ## Your Task: 293 | Analyze the text clue, image observations, and the trajectory to determine which direction the robot should move in next. 294 | 295 | ## Output Format: 296 | Provide your suggestion using the following XML format: 297 | YOUR_SUGGESTION 298 | (Choose from: front, right, back, left, jump) 299 | 300 | ## Example Output: 301 | front 302 | """ 303 | return p 304 | 305 | 306 | 307 | def moving_back_suggestion(): 308 | p = f""" 309 | You are helping a robot to finding the yellow stretcher beside a ambulance car, your task is to give a moving direction suggestion, considering the robot's observations. 310 | 311 | #Input 312 | 1. Robot's observation sequence: The input RGB image is a concatenation of robot's continuous observation 313 | 314 | Please analyze the robot's observations to determine the direction the robot should explore. 315 | **Considering Strategy** 316 | 1. First determine if the ambulance car is visible in the current view, if not, turn around to explore. 317 | 2. 
Ambulances are generally parked near a large open area, such as the middle of the road, so avoid moving toward areas with cluttered structures or buildings. 318 | 319 | Please output according to the format, including XML tags: 320 | The moving suggestion (Select from front, right, back, left) 321 | 322 | Example: 323 | front 324 | 325 | """ 326 | return p -------------------------------------------------------------------------------- /VLM_Agent/readme.md: -------------------------------------------------------------------------------- 1 | # Hybrid YOLO-VLM Rescue Agent README 2 | 3 | ## Overview 4 | This agent is designed to solve rescue missions in a simulated environment by combining object detection (YOLO) and vision-language models (VLM) to create a two-tiered decision system. The agent navigates through the environment, finds an injured person, and transports them to a stretcher or ambulance. 5 | 6 | The agent successfully completes rescue tasks by implementing a hybrid approach: 7 | 1. **Primary Tier: YOLO Detection** - Provides precise object detection and movement when targets are clearly visible 8 | 2. **Secondary Tier: VLM Landmark Navigation** - Takes over when YOLO detection fails, using contextual understanding to search for landmarks 9 | 10 | ## Installation 11 | This guide will help you set up the environment for running the YOLO-VLM integration project, which combines object detection with vision-language capabilities. 12 | 13 | ### Step 1: Clone the Repository 14 | ```bash 15 | git clone https://github.com/atecup/atec2025_software_algorithm.git 16 | cd atec2025_software_algorithm/VLM_Agent 17 | ``` 18 | 19 | ### Step 2: Install Dependencies 20 | ```bash 21 | pip install -r requirements.txt 22 | ``` 23 | 24 | ### Step 3: Set Up Ollama 25 | 1. Install Ollama from: https://ollama.com/download 26 | 2. Pull the Gemma 3 model (or other models if needed): 27 | ```bash 28 | ollama pull gemma3:27b 29 | ``` 30 | You can then use solution_yoloVLM.py as the solution.py required to complete the full task. 31 | 32 | 33 | ## System Architecture 34 | 35 | ### Two-Tier Decision System 36 | The agent implements a hierarchical decision-making process: 37 | 38 | 1. **YOLO Object Detection (Primary)** 39 | - Runs first on each frame to detect key objects (person, stretcher, truck/ambulance) 40 | - Provides precise positioning and movement when objects are detected 41 | - Handles direct movement control with defined thresholds for actions 42 | 43 | 2. **VLM Landmark Navigation (Secondary)** 44 | - Activates when YOLO fails to detect relevant objects 45 | - Analyzes textual clues and the visual environment 46 | - Uses landmark-based navigation to systematically search the environment 47 | 48 | ### YOLO Detection System 49 | - **Object Detection**: Uses a YOLO model to detect persons, stretchers (suitcases), trucks, and buses with confidence thresholds 50 | - **Precision Control**: 51 | - Divides screen into navigation zones (left, center, right) 52 | - Uses object coordinates to determine optimal movement action 53 | - Implements proximity thresholds for pickup and drop actions 54 | - **State Tracking**: Maintains state of rescue operation (whether person has been picked up) 55 | 56 | ### Landmark-Based Navigation (VLM) 57 | The agent parses the initial text clue to identify potential landmarks that might lead to the injured person. 
For example, if the clue is "The injured person is by the roadside garbage bin, and there is a car nearby," landmarks might include [roadside, car, garbage bin, injured person]. The agent will detect them in priority order and move towards them. 58 | 59 | ### Action and Observation Buffers 60 | - **Action Buffer**: Maintains a queue of pending actions to be executed, allowing the agent to plan multiple steps ahead. 61 | - **Observation Buffer**: Coordinates with the action buffer to determine which observations should be saved for later analysis. 62 | 63 | These two buffers allow the VLM to perform continuous action control on the agent during the loop process. 64 | 65 | ## Technical Implementation Details 66 | 67 | ### YOLO Detection Logic 68 | - **Confidence Thresholds**: Adapts confidence level based on whether a person has been picked up (0.1 when carrying, 0.2 when searching) 69 | - **Object Prioritization**: Hierarchical detection logic that prioritizes persons when searching and stretcher/vehicles when carrying 70 | - **Position-Based Actions**: 71 | - Central region (220-420 px): Move forward 72 | - Left region (<220 px): Turn left 73 | - Right region (>420 px): Turn right 74 | - Proximity triggers (y-coordinate thresholds): Initiate carry/drop actions 75 | 76 | ### VLM Navigation Features 77 | - **Obstacle Avoidance**: The agent alternates between walking and jumping to navigate around obstacles when searching. 78 | - **Repeated Pickup Attempts**: After finding the target person, the agent attempts to pick them up after each move. 79 | - **Optimal Placement Distance**: When placing the injured person on the stretcher, the agent backs up before dropping. 80 | - **Multi-Directional Observation**: The agent captures images from multiple directions and concatenates them. 81 | - **Visual Guidance Lines**: When moving forward, the agent divides the image into regions using vertical lines. 
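
### Zone-Based Action Selection (Illustrative Sketch)

The snippet below is an illustrative sketch (not part of the repository code) of the position-based action logic listed above. It mirrors the behaviour of `_move_based_on_detection` in `solution_yoloVLM.py`, using the same `([angular, velocity], viewport, interaction)` action tuples, but it simplifies the per-target proximity rules (the actual code, for instance, switches to a bounding-box size check for trucks). The pixel thresholds appear to assume a roughly 640×480 first-person frame.

```python
# Simplified, self-contained sketch of the zone logic; thresholds follow the bullets above.
TURN_LEFT  = ([-20, 0], 0, 0)   # ([angular, velocity], viewport, interaction)
TURN_RIGHT = ([20, 0], 0, 0)
FORWARD    = ([0, 50], 0, 0)
CARRY      = ([0, 0], 0, 3)     # interaction 3 = pick up the injured person
DROP       = ([0, 0], 0, 4)     # interaction 4 = drop onto the stretcher

def select_action(box_xywh, carrying):
    """Map a YOLO detection (center x, center y, width, height) to an action."""
    x, y, w, h = box_xywh
    in_center = 220 < x < 420                      # central navigation zone
    top_edge = y - 0.5 * h                         # top of the box; a large value means the object fills the lower frame
    near = top_edge > (350 if carrying else 390)   # proximity trigger for drop / carry
    if in_center and near:
        return DROP if carrying else CARRY
    if x < 220:
        return TURN_LEFT
    if x > 420:
        return TURN_RIGHT
    return FORWARD

# Example: a person detected left of center and still far away -> turn left.
print(select_action((180, 300, 60, 120), carrying=False))
```

If no relevant object is detected at the current confidence threshold, `predict` in `solution_yoloVLM.py` falls back to the VLM agent, which is the hand-off between the two tiers described in the System Architecture section.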
82 | 83 | 84 | -------------------------------------------------------------------------------- /VLM_Agent/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==2.2.4 2 | opencv_python==4.11.0.86 3 | Pillow==11.2.1 4 | Requests==2.32.3 5 | torch==2.4.0+cu124 6 | ultralytics==8.3.107 7 | ollama -------------------------------------------------------------------------------- /VLM_Agent/solution_yoloVLM.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import cv2 3 | import time 4 | import numpy as np 5 | import os 6 | import json 7 | from VLM_Agent.agent_VLM import agent 8 | from ultralytics import YOLO 9 | import torch 10 | import base64 11 | 12 | class AlgSolution: 13 | def __init__(self): 14 | os.system("OLLAMA_MODELS=./checkpoints ./ollama/bin/ollama serve &") 15 | time.sleep(5) 16 | if os.path.exists('/home/admin/workspace/job/logs/'): 17 | self.handle = open('/home/admin/workspace/job/logs/user.log', 'w') 18 | else: 19 | self.handle = open('user.log', 'w') 20 | self.handle.write("ollama loaded\n") 21 | # Initialize VLM agent 22 | self.vlm_agent = agent(reference_text=None,reference_image=None) 23 | 24 | # Initialize detection model 25 | self.yolo_model = YOLO('checkpoints/yolo11x.pt') 26 | 27 | # State tracking 28 | self.person_detected = False 29 | self.stretcher_detected = False 30 | self.current_target = None # 'person' or 'stretcher' 31 | 32 | self.myinfo = {'picked': 0} 33 | self.prev = { 34 | 'angular': 0, 35 | 'velocity': 0, 36 | 'viewport': 0, 37 | 'interaction': 0, 38 | } 39 | 40 | self.foreward = ([0,50],0,0) 41 | 42 | self.backward = ([0,-50],0,0) 43 | self.turnleft = ([-20,0],0,0) 44 | self.turnright = ([20,0],0,0) 45 | self.carry = ([0,0],0,3) 46 | self.drop = ([0,0],0,4) 47 | self.noaction = ([0,0],0,0) 48 | 49 | 50 | def predicts(self, ob, success): 51 | if self.prev['interaction'] == 3 and success: 52 | self.myinfo = {'picked': 1} 53 | 54 | ob = base64.b64decode(ob) 55 | ob = cv2.imdecode(np.frombuffer(ob, np.uint8), cv2.IMREAD_COLOR) 56 | pred = self.predict(ob, self.myinfo) 57 | res = { 58 | 'angular': pred[0][0], 59 | 'velocity': pred[0][1], 60 | 'viewport': pred[1], 61 | 'interaction': pred[2], 62 | } 63 | self.prev = res 64 | return res 65 | 66 | 67 | 68 | def reset(self, reference_text, reference_image): 69 | reference_image=base64.b64decode(reference_image) 70 | reference_image = cv2.imdecode(np.frombuffer(reference_image, np.uint8), cv2.IMREAD_COLOR) 71 | # print('reset obs shape after decode', reference_image.shape) 72 | self.vlm_agent.reset(reference_text, reference_image) 73 | self.person_detected = False 74 | self.stretcher_detected = False 75 | self.current_target = None 76 | self.myinfo = {'picked': 0} 77 | 78 | 79 | def draw_bbox_on_obs(self,obs, boxes, labels, color=(0, 255, 0), thickness=2): 80 | """ 81 | Draw bounding boxes on the observation image. 82 | 83 | Args: 84 | obs: The observation image (numpy array). 85 | boxes: List of bounding boxes, each in the format [x, y, w, h]. 86 | labels: List of labels corresponding to the bounding boxes. 87 | color: Color of the bounding box (default: green). 88 | thickness: Thickness of the bounding box lines (default: 2). 
89 | """ 90 | for box, label in zip(boxes, labels): 91 | x, y, w, h = box 92 | top_left = (int(x - w / 2), int(y - h / 2)) 93 | bottom_right = (int(x + w / 2), int(y + h / 2)) 94 | cv2.rectangle(obs, top_left, bottom_right, color, thickness) 95 | cv2.putText(obs, label, (top_left[0], top_left[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1) 96 | 97 | return obs 98 | 99 | def predict(self, obs, info): 100 | # First try to detect objects using YOLO 101 | if info['picked'] == 1: 102 | results = self.yolo_model(source=obs, imgsz=640, conf=0.1) 103 | else: 104 | results = self.yolo_model(source=obs, imgsz=640, conf=0.2) 105 | 106 | boxes = results[0].boxes # get all detected bounding box 107 | 108 | boxes_tmp = [box.xywh[0].tolist() for box in boxes] 109 | labels_tmp = [self.yolo_model.names[int(box.cls.item())] for box in results[0].boxes] 110 | 111 | # Draw bounding boxes on the observation image 112 | #obs_with_bbox = self.draw_bbox_on_obs(obs, boxes_tmp, labels_tmp) 113 | #cv2.imshow('Observation with BBox', obs_with_bbox) 114 | #cv2.waitKey(1) 115 | # Check for person and stretcher in detections 116 | person_box = None 117 | stretcher_box = None 118 | truck_box=None 119 | if info['picked']==1: 120 | self.person_detected=True 121 | for box in boxes: 122 | cls = int(box.cls.item()) 123 | if self.yolo_model.names[cls] == 'person' and not self.person_detected: 124 | person_box = box.xywh[0].tolist() 125 | elif self.yolo_model.names[cls] == 'suitcase' and self.person_detected: 126 | stretcher_box = box.xywh[0].tolist() 127 | # elif self.yolo_model.names[cls] == 'bench' and self.person_detected: 128 | # stretcher_box = box.xywh[0].tolist() 129 | elif self.yolo_model.names[cls] =='truck'and self.person_detected: 130 | truck_box = box.xywh[0].tolist() 131 | elif self.yolo_model.names[cls] =='bus'and self.person_detected: 132 | truck_box = box.xywh[0].tolist() 133 | 134 | # If we have a detection, use detection-based movement 135 | if person_box and not self.person_detected: 136 | return self._move_based_on_detection(person_box, 'person') 137 | elif stretcher_box and self.person_detected: 138 | return self._move_based_on_detection(stretcher_box, 'stretcher') 139 | elif truck_box and self.person_detected: 140 | return self._move_based_on_detection(truck_box, 'truck') 141 | 142 | 143 | # If no detection, use VLM agent for exploration 144 | print('call VLM for inference...') 145 | return self.vlm_agent.predict(obs, info) 146 | 147 | def _move_based_on_detection(self, box, target_type): 148 | x0, y0, w_, h_ = box 149 | 150 | if target_type == 'person': 151 | # if w_ > h_: 152 | if y0 - 0.5*h_ > 390 and x0>220 and x0<420: 153 | # self.person_detected = True 154 | return self.carry 155 | else: 156 | if x0 < 220: 157 | return self.turnleft 158 | elif x0 > 420: 159 | return self.turnright 160 | else: 161 | return self.foreward 162 | elif target_type =='stretcher': 163 | if y0 - 0.5*h_ > 350 and x0>220 and x0<420 : 164 | print('drop') 165 | return self.drop 166 | else: 167 | if x0 < 220: 168 | return self.turnleft 169 | elif x0 > 420: 170 | return self.turnright 171 | else: 172 | return self.foreward 173 | elif target_type == 'truck': # stretcher 174 | if (w_> 320 and h_>320) and x0>220 and x0<420 : 175 | print('drop') 176 | return self.drop 177 | else: 178 | if x0 < 220: 179 | return self.turnleft 180 | elif x0 > 420: 181 | return self.turnright 182 | else: 183 | return self.foreward 184 | 185 | 186 | -------------------------------------------------------------------------------- /dockerfile: 
-------------------------------------------------------------------------------- 1 | FROM ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/pytorch-ubuntu:2.3.0-cuda12.1.1-ubuntu22.04 2 | RUN pip install flask opencv-python 3 | RUN pip install opencv-python-headless matplotlib tqdm pyyaml psutil 4 | RUN pip install --no-deps ultralytics 5 | RUN mkdir -p /home/admin/atec_project 6 | COPY checkpoints /home/admin/atec_project/checkpoints 7 | COPY *.py /home/admin/atec_project/ 8 | COPY *.sh /home/admin/atec_project/ 9 | RUN ln -s /bin/python3 /bin/python 10 | -------------------------------------------------------------------------------- /gym_rescue/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | import glob 3 | from gym.envs.registration import register 4 | import logging 5 | from gym_rescue.envs.utils.misc import load_env_setting 6 | logger = logging.getLogger(__name__) 7 | use_docker = False # True: use nvidia docker False: do not use nvidia-docker 8 | 9 | 10 | Tasks = ['Rescue'] 11 | Observations = ['Color', 'Depth', 'Rgbd', 'Mask', 'Pose','MaskDepth','ColorMask'] 12 | Actions = ['Discrete', 'Continuous', 'Mixed'] 13 | 14 | # Task-oriented envs 15 | UnrealEnv = os.environ.get('UnrealEnv') 16 | setting_files = glob.glob( 17 | os.path.join(UnrealEnv , 'settings', '*.json') 18 | ) 19 | for setting_file in setting_files: 20 | 21 | env = os.path.split(setting_file)[-1].split('.')[0] 22 | for task in Tasks: 23 | name = f'Unreal{task}-{env}' 24 | register( 25 | id=name, 26 | entry_point=f'gym_rescue.envs:{task}', 27 | kwargs={'env_file': setting_file,}, 28 | max_episode_steps=10000 29 | ) 30 | 31 | -------------------------------------------------------------------------------- /gym_rescue/__pycache__/__init__.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/__pycache__/__init__.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/__init__.py: -------------------------------------------------------------------------------- 1 | from gym_rescue.envs.rescue import Rescue 2 | 3 | -------------------------------------------------------------------------------- /gym_rescue/envs/__pycache__/__init__.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/__pycache__/__init__.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/__pycache__/base_env.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/__pycache__/base_env.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/__pycache__/rescue.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/__pycache__/rescue.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/agent/__init__.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/agent/__init__.py -------------------------------------------------------------------------------- /gym_rescue/envs/agent/__pycache__/__init__.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/agent/__pycache__/__init__.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/agent/__pycache__/character.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/agent/__pycache__/character.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/agent/character.py: -------------------------------------------------------------------------------- 1 | from unrealcv.api import UnrealCv_API 2 | import numpy as np 3 | import math 4 | import time 5 | import json 6 | import re 7 | from io import BytesIO 8 | import PIL.Image 9 | class Character_API(UnrealCv_API): 10 | def __init__(self, port=9000, ip='127.0.0.1', resolution=(160, 120), comm_mode='tcp'): 11 | super(Character_API, self).__init__(port=port, ip=ip, resolution=resolution, mode=comm_mode) 12 | self.obstacles = [] 13 | self.targets = [] 14 | self.img_color = np.zeros((resolution[1], resolution[0], 3)) 15 | self.img_depth = np.zeros((resolution[1], resolution[0], 1)) 16 | self.animation_dict = { 17 | 'stand': self.set_standup, 18 | 'jump': self.set_jump, 19 | 'crouch': self.set_crouch, 20 | 'liedown': self.set_liedown, 21 | 'open_door': self.set_open_door, 22 | 'enter_vehicle': self.enter_exit_car, 23 | 'carry':self.carry_body, 24 | 'drop':self.drop_body 25 | } 26 | 27 | def config_ue(self, resolution=(320, 240), quality=1, disable_all_screen_messages=True, Lumen=False): 28 | """ 29 | Configure Unreal Engine settings. 30 | 31 | Args: 32 | resolution (tuple): The resolution of the display window. 33 | quality (bool): The quality of the display window. From 0 to 3. 34 | disable_all_screen_messages (bool): Whether to disable all screen messages. Default is True. 
35 | """ 36 | self.check_connection() 37 | [w, h] = resolution 38 | self.client.request(f'vrun setres {w}x{h}w', -1) # set resolution of the display window 39 | if disable_all_screen_messages: 40 | self.client.request('DisableAllScreenMessages', -1) # disable all screen messages 41 | if quality > 0: 42 | self.client.request(f'vrun sg.ShadowQuality {quality}', -1) # set shadow quality to low 43 | self.client.request(f'vrun sg.TextureQuality {quality}', -1) # set texture quality to low 44 | self.client.request(f'vrun sg.EffectsQuality {quality}', -1) # set effects quality to low 45 | if Lumen: 46 | self.client.request('vset /camera/1/illumination Lumen', -1) 47 | self.client.request('vset /camera/1/reflection Lumen', -1) 48 | time.sleep(0.1) 49 | 50 | def init_mask_color(self, targets=None): 51 | if targets == 'all': 52 | self.targets = self.get_objects() 53 | self.color_dict = self.build_color_dict(self.targets) 54 | elif targets is not None: 55 | self.targets = targets 56 | self.color_dict = self.build_color_dict(self.targets) 57 | 58 | def get_observation(self, cam_id, observation_type, mode='bmp'): 59 | if observation_type == 'Color': 60 | self.img_color = state = self.get_image(cam_id, 'lit', mode) 61 | elif observation_type == 'Mask': 62 | self.img_color = state = self.get_image(cam_id, 'object_mask', mode) 63 | elif observation_type == 'Depth': 64 | self.img_depth = state = self.get_depth(cam_id) 65 | elif observation_type == 'Rgbd': 66 | state = self.get_image_multimodal(cam_id, ['lit', 'depth'], [mode, 'npy']) 67 | self.img_color = state[..., :3] 68 | self.img_depth = state[..., 3:] 69 | state = np.append(self.img_color, self.img_depth, axis=2) 70 | elif observation_type == 'Gray': 71 | self.img_color = self.read_image(cam_id, 'lit', mode) 72 | self.img_gray = self.img_color.mean(2) 73 | self.img_gray = np.expand_dims(self.img_gray, -1) 74 | elif observation_type == 'Pose': 75 | state = self.get_cam_pose(cam_id) # camera 6D pose 76 | return state 77 | 78 | # functions for character setting 79 | def set_max_speed(self, player, speed, return_cmd=False): 80 | # set the max velocity of the agent object 81 | cmd = f'vbp {player} set_speed {speed}' 82 | res = None 83 | while res is None: 84 | res = self.client.request(cmd) 85 | return speed 86 | 87 | def set_acceleration(self, player, acc): 88 | cmd = f'vbp {player} set_acc {acc}' 89 | res = None 90 | while res is None: 91 | res = self.client.request(cmd.format(player=player, acc=acc)) 92 | return acc 93 | 94 | def set_appearance(self, player, id): 95 | # set the appearance of the agent object, id: 0~19 96 | cmd = f'vbp {player} set_app {id}' 97 | res = None 98 | while res is None: 99 | res = self.client.request(cmd.format(player=player, id=id), -1) 100 | return id 101 | 102 | def move_cam_2d(self, cam_id, angle, distance): 103 | # move the camera in 2D as a mobile agent 104 | self.move_cam_forward(cam_id, angle, distance, height=0, pitch=0) 105 | 106 | def get_speed(self, player): 107 | cmd = f'vbp {player} get_speed' 108 | res = None 109 | while res is None: 110 | res = self.client.request(cmd) 111 | return self.decoder.string2vector(res)[0] 112 | 113 | def get_angle(self, player): 114 | cmd = f'vbp {player} get_angle' 115 | res = None 116 | while res is None: 117 | res = self.client.request(cmd) 118 | return self.decoder.string2vector(res)[0] 119 | 120 | def reset_player(self, player): 121 | cmd = f'vbp {player} reset' 122 | res=None 123 | while res is None: 124 | res = self.client.request(cmd) 125 | 126 | def set_phy(self, obj, 
state): 127 | cmd = f'vbp {obj} set_phy {state}' 128 | res=None 129 | while res is None: 130 | res = self.client.request(cmd, -1) 131 | 132 | def simulate_physics(self, objects): 133 | res = [self.set_phy(obj, 1) for obj in objects] 134 | 135 | def set_move_bp(self, player, params, return_cmd=False): 136 | ''' 137 | new move function, can adapt to different number of params 138 | 2 params: [v_angle, v_linear], used for agents moving in plane, e.g. human, car, animal 139 | 4 params: [v_ x, v_y, v_z, v_yaw], used for agents moving in 3D space, e.g. drone 140 | ''' 141 | params_str = ' '.join([str(param) for param in params]) 142 | cmd = f'vbp {player} set_move {params_str}' 143 | if return_cmd: 144 | return cmd 145 | res = None 146 | while res is None: 147 | res = self.client.request(cmd, -1) 148 | 149 | # functions for character actions 150 | def set_jump(self, player, return_cmd=False): 151 | cmd = f'vbp {player} set_jump' 152 | if return_cmd: 153 | return cmd 154 | res = None 155 | while res is None: 156 | res = self.client.request(cmd, -1) 157 | 158 | def set_crouch(self, player, return_cmd=False): 159 | cmd = f'vbp {player} set_crouch' 160 | if return_cmd: 161 | return cmd 162 | res = None 163 | while res is None: 164 | res = self.client.request(cmd, -1) 165 | 166 | def set_liedown(self, player, directions=(100, 100), return_cmd=False): 167 | frontback = directions[0] 168 | leftright = directions[1] 169 | cmd = f'vbp {player} set_liedown {frontback} {leftright}' 170 | if return_cmd: 171 | return cmd 172 | res = None 173 | while res is None: 174 | res = self.client.request(cmd, -1) 175 | 176 | def set_standup(self, player, return_cmd=False): 177 | cmd = f'vbp {player} set_standup' 178 | if return_cmd: 179 | return cmd 180 | res = None 181 | while res is None: 182 | res = self.client.request(cmd, -1) 183 | 184 | def set_animation(self, player, anim_id, return_cmd=False): 185 | return self.animation_dict[anim_id](player, return_cmd=return_cmd) 186 | 187 | def get_hit(self, player): 188 | cmd = f'vbp {player} get_hit' 189 | res = None 190 | while res is None: 191 | res = self.client.request(cmd) 192 | if '1' in res: 193 | return True 194 | if '0' in res: 195 | return False 196 | 197 | def set_random(self, player, value=1): 198 | cmd = f'vbp {player} set_random {value}' 199 | res=None 200 | while res is None: 201 | res = self.client.request(cmd, -1) 202 | 203 | def set_interval(self, player, interval): 204 | cmd = f'vbp {player} set_interval {interval}' 205 | res = None 206 | while res is None: 207 | res = self.client.request(cmd, -1) 208 | 209 | def init_objects(self, objects): 210 | self.objects_dict = dict() 211 | for obj in objects: 212 | # print (obj) 213 | self.objects_dict[obj] = self.get_obj_location(obj) 214 | return self.objects_dict 215 | 216 | def random_obstacles(self, objects, img_dirs, num, area, start_area, texture=False): 217 | sample_index = np.random.choice(len(objects), num, replace=False) 218 | for id in sample_index: 219 | obstacle = objects[id] 220 | self.obstacles.append(obstacle) 221 | # texture 222 | if texture: 223 | img_dir = img_dirs[np.random.randint(0, len(img_dirs))] 224 | self.set_texture(obstacle, (1, 1, 1), np.random.uniform(0, 1, 3), img_dir, np.random.randint(1, 4)) 225 | # scale 226 | # self.set_obj_scale(obstacle, np.random.uniform(0.3, 3, 3)) 227 | self.set_obj_scale(obstacle, np.random.uniform(0.5, 4, 3)) 228 | 229 | # location 230 | obstacle_loc = [start_area[0], start_area[2], 0] 231 | while start_area[0] <= obstacle_loc[0] <= start_area[1] and 
start_area[2] <= obstacle_loc[1] <= start_area[3]: 232 | obstacle_loc[0] = np.random.uniform(area[0]+100, area[1]-100) 233 | obstacle_loc[1] = np.random.uniform(area[2]+100, area[3]-100) 234 | obstacle_loc[2] = np.random.uniform(area[4], area[5]) -100 235 | self.set_obj_location(obstacle, obstacle_loc) 236 | time.sleep(0.01) 237 | 238 | def clean_obstacles(self): 239 | for obj in self.obstacles: 240 | self.set_obj_location(obj, self.objects_dict[obj]) 241 | self.obstacles = [] 242 | 243 | def new_obj(self, obj_class_name, obj_name, loc, rot=[0, 0, 0]): 244 | # spawn, set obj pose, enable physics 245 | [x, y, z] = loc 246 | [pitch, yaw, roll] = rot 247 | if obj_class_name =="bp_character_C" or obj_class_name =="target_C": 248 | cmd = [f'vset /objects/spawn {obj_class_name} {obj_name}', 249 | f'vset /object/{obj_name}/location {x} {y} {z}', 250 | f'vset /object/{obj_name}/rotation {pitch} {yaw} {roll}', 251 | f'vbp {obj_name} set_phy 0' 252 | ] 253 | else: 254 | cmd = [f'vset /objects/spawn {obj_class_name} {obj_name}', 255 | f'vset /object/{obj_name}/location {x} {y} {z}', 256 | f'vset /object/{obj_name}/rotation {pitch} {yaw} {roll}', 257 | f'vbp {obj_name} set_phy 1' 258 | ] 259 | self.client.request(cmd, -1) 260 | return obj_name 261 | 262 | def set_cam(self, obj, loc=[0, 30, 70], rot=[0, 0, 0], return_cmd=False): 263 | # set the camera pose relative to a actor 264 | x, y, z = loc 265 | roll, pitch, yaw = rot 266 | cmd = f'vbp {obj} set_cam {x} {y} {z} {roll} {pitch} {yaw}' 267 | if return_cmd: 268 | return cmd 269 | res = self.client.request(cmd, -1) 270 | return res 271 | 272 | def adjust_fov(self, cam_id, delta_fov, min_max=[45, 135]): # increase/decrease fov 273 | return self.set_fov(cam_id, np.clip(self.cam[cam_id]['fov']+delta_fov, min_max[0], min_max[1])) 274 | 275 | def stop_car(self, obj): 276 | cmd = f'vbp {obj} set_stop' 277 | res = self.client.request(cmd, -1) 278 | return res 279 | 280 | def nav_to_goal(self, obj, loc): # navigate the agent to a goal location 281 | # Assign the agent a navigation goal, and use Navmesh to automatically control its movement to reach the goal via the shortest path. 282 | # The goal should be reachable in the environment. 283 | x, y, z = loc 284 | cmd = f'vbp {obj} nav_to_goal {x} {y} {z}' 285 | res = self.client.request(cmd, -1) 286 | return res 287 | 288 | def nav_to_goal_bypath(self, obj, loc): # navigate the agent to a goal location 289 | # Assign the agent a navigation goal, and use Navmesh to automatically control its movement to reach the goal via the shortest path. 290 | # The goal should be reachable in the environment. 291 | x, y, z = loc 292 | cmd = f'vbp {obj} nav_to_goal {x} {y} {z}' 293 | res = self.client.request(cmd, -1) 294 | return res 295 | 296 | def nav_to_random(self, obj, radius, loop): # navigate the agent to a random location 297 | # Agent randomly selects a point within its own radius range for navigation. 298 | # The loop parameter controls whether continuous navigation is performed.(True for continuous navigation). 299 | # Return with the randomly sampled location. 300 | cmd = f'vbp {obj} nav_random {radius} {loop}' 301 | res = self.client.request(cmd) 302 | return res 303 | 304 | def nav_to_obj(self, obj, target_obj, distance=200): # navigate the agent to a target object 305 | # Assign the agent a navigation goal, and use Navmesh to automatically control its movement to reach the goal via the shortest path. 
306 | cmd = f'vbp {obj} nav_to_target {target_obj} {distance}' 307 | res = self.client.request(cmd, -1) 308 | return res 309 | 310 | def nav_random(self, player, radius, loop): # navigate the agent to a random location 311 | # Agent randomly selects a point within its own radius range for navigation. 312 | # The loop parameter controls whether continuous navigation is performed.(True for continuous navigation). 313 | # Return with the randomly sampled location. 314 | cmd = f'vbp {player} nav_random {radius} {loop}' 315 | res = self.client.request(cmd) 316 | return self.decoder.string2vector(res) 317 | 318 | def generate_nav_goal(self, player, radius_max,radius_min): # navigate the agent to a random location 319 | # Agent randomly selects a point within its own radius range for navigation. 320 | # The loop parameter controls whether continuous navigation is performed.(True for continuous navigation). 321 | # Return with the randomly sampled location. 322 | cmd = f'vbp {player} generate_nav_goal {radius_max} {radius_min} ' 323 | res = self.client.request(cmd) 324 | answer_dict = json.loads(res) 325 | try: 326 | loc = answer_dict["nav_goal"] 327 | except: 328 | loc = answer_dict["Nav_goal"] 329 | coordinates = re.findall(r"[-+]?\d*\.\d+|\d+", loc) 330 | # Convert the numbers to floats and store them in an array 331 | coordinates = [float(coordinate) for coordinate in coordinates] 332 | return coordinates[0],coordinates[1],coordinates[2] 333 | 334 | def set_max_nav_speed(self, obj, max_vel): # set the maximum navigation speed of the car 335 | cmd = f'vbp {obj} set_nav_speed {max_vel}' 336 | res = self.client.request(cmd, -1) 337 | return res 338 | 339 | def enter_exit_car(self, obj, player_index): 340 | # enter or exit the car for a player. 341 | # If the player is already in the car, it will exit the car. Otherwise, it will enter the car. 
342 | cmd = f'vbp {obj} enter_exit_car {player_index}' 343 | res = self.client.request(cmd, -1) 344 | return res 345 | 346 | def set_open_door(self, player, return_cmd=False): 347 | # state: 0 close, 1 open 348 | cmd = f'vbp {player} set_open_door 1' 349 | if return_cmd: 350 | return cmd 351 | else: 352 | self.client.request(cmd, -1) 353 | def carry_body(self,player,return_cmd=False): 354 | cmd = f'vbp {player} carry_body' 355 | if return_cmd: 356 | return cmd 357 | else: 358 | self.client.request(cmd, -1) 359 | 360 | def drop_body(self,player, return_cmd=False): 361 | cmd = f'vbp {player} drop_body' 362 | if return_cmd: 363 | return cmd 364 | else: 365 | self.client.request(cmd, -1) 366 | 367 | def Is_picked(self,player, return_cmd = False): 368 | # check if the player is picked up 369 | cmd = f'vbp {player} is_picked' 370 | if return_cmd: 371 | return cmd 372 | else: 373 | res = self.client.request(cmd) 374 | if '1' in res: 375 | return True 376 | if '0' in res: 377 | return False 378 | def is_carrying(self,player, return_cmd = False): 379 | # check if the player is carrying a body 380 | cmd = f'vbp {player} is_carrying' 381 | if return_cmd: 382 | return cmd 383 | else: 384 | res = self.client.request(cmd) 385 | if '1' in res: 386 | return True 387 | if '0' in res: 388 | return False 389 | 390 | def set_viewport(self, player): 391 | # set the game window to the player's view 392 | cmd = f'vbp {player} set_viewport' 393 | res = self.client.request(cmd, -1) 394 | return res 395 | 396 | def get_pose_img_batch(self, objs_list, cam_ids, img_flag=[False, True, False, False]): 397 | # get pose and image of objects in objs_list from cameras in cam_ids 398 | cmd_list = [] 399 | decoder_list = [] 400 | [use_cam_pose, use_color, use_mask, use_depth] = img_flag 401 | for obj in objs_list: 402 | cmd_list.extend([self.get_obj_location(obj, True), 403 | self.get_obj_rotation(obj, True)]) 404 | 405 | for cam_id in cam_ids: 406 | if cam_id < 0: 407 | continue 408 | if use_cam_pose: 409 | cmd_list.extend([self.get_cam_location(cam_id, return_cmd=True), 410 | self.get_cam_rotation(cam_id, return_cmd=True)]) 411 | if use_color: 412 | cmd_list.append(self.get_image(cam_id, 'lit', 'bmp', return_cmd=True)) 413 | if use_mask: 414 | cmd_list.append(self.get_image(cam_id, 'object_mask', 'bmp', return_cmd=True)) 415 | if use_depth: 416 | cmd_list.append(f'vget /camera/{cam_id}/depth npy') 417 | # cmd_list.append(self.get_image(cam_id, 'depth', 'bmp', return_cmd=True)) 418 | 419 | decoders = [self.decoder.decode_map[self.decoder.cmd2key(cmd)] for cmd in cmd_list] 420 | res_list = self.batch_cmd(cmd_list, decoders) 421 | obj_pose_list = [] 422 | cam_pose_list = [] 423 | img_list = [] 424 | mask_list = [] 425 | depth_list = [] 426 | # start to read results 427 | start_point = 0 428 | for i, obj in enumerate(objs_list): 429 | obj_pose_list.append(res_list[start_point] + res_list[start_point+1]) 430 | start_point += 2 431 | for i, cam_id in enumerate(cam_ids): 432 | # print(cam_id) 433 | if cam_id < 0: 434 | img_list.append(np.zeros((self.resolution[1], self.resolution[0], 3), dtype=np.uint8)) 435 | continue 436 | if use_cam_pose: 437 | cam_pose_list.append(res_list[start_point] + res_list[start_point+1]) 438 | start_point += 2 439 | if use_color: 440 | # image = self.decoder.decode_bmp(res_list[start_point]) 441 | img_list.append(res_list[start_point]) 442 | start_point += 1 443 | if use_mask: 444 | # image = self.decoder.decode_bmp(res_list[start_point]) 445 | mask_list.append(res_list[start_point]) 446 | start_point 
+= 1 447 | if use_depth: 448 | # image = 1 / self.decoder.decode_depth(res_list[start_point],bytesio=False) 449 | # image = self.decoder.decode_depth(res_list[start_point],bytesio=False) 450 | image = self.get_depth(cam_id,show=False) 451 | # image = np.expand_dims(image, axis=-1) 452 | depth_list.append(image) # 500 is the default max depth of most depth cameras 453 | # depth_list.append(res_list[start_point]) # 500 is the default max depth of most depth cameras 454 | start_point += 1 455 | 456 | return obj_pose_list, cam_pose_list, img_list, mask_list, depth_list 457 | 458 | # Domain Randomization Functions: randomize texture 459 | def set_texture(self, player, color=(1, 1, 1), param=(0, 0, 0), picpath=None, tiling=1, e_num=0): #[r, g, b, meta, spec, rough, tiling, picpath] 460 | param = param / param.max() 461 | r, g, b = color 462 | meta, spec, rough = param 463 | cmd = f'vbp {player} set_mat {e_num} {r} {g} {b} {meta} {spec} {rough} {tiling} {picpath}' 464 | self.client.request(cmd, -1) 465 | 466 | def set_light(self, obj, direction, intensity, color): # param num out of range 467 | [roll, yaw, pitch] = direction 468 | color = color / color.max() 469 | [r, g, b] = color 470 | cmd = f'vbp {obj} set_light {roll} {yaw} {pitch} {intensity} {r} {g} {b}' 471 | self.client.request(cmd, -1) 472 | 473 | def random_texture(self, backgrounds, img_dirs, num=5): 474 | if num < 0: 475 | sample_index = range(len(backgrounds)) 476 | else: 477 | sample_index = np.random.choice(len(backgrounds), num, replace=False) 478 | for id in sample_index: 479 | obj = backgrounds[id] 480 | img_dir = img_dirs[np.random.randint(0, len(img_dirs))] 481 | self.set_texture(obj, (1, 1, 1), np.random.uniform(0, 1, 3), img_dir, np.random.randint(1, 4)) 482 | time.sleep(0.03) 483 | 484 | def random_player_texture(self, player, img_dirs, num): 485 | sample_index = np.random.choice(5, num) 486 | for id in sample_index: 487 | img_dir = img_dirs[np.random.randint(0, len(img_dirs))] 488 | self.set_texture(player, (1, 1, 1), np.random.uniform(0, 1, 3), 489 | img_dir, np.random.randint(2, 6), id) 490 | time.sleep(0.03) 491 | 492 | def random_character(self, player): # appearance, speed, acceleration 493 | self.set_max_speed(player, np.random.randint(40, 100)) 494 | self.set_acceleration(player, np.random.randint(100, 300)) 495 | 496 | def random_lit(self, light_list): 497 | for lit in light_list: 498 | if 'sky' in lit: 499 | self.set_skylight(lit, [1, 1, 1], np.random.uniform(1, 10)) 500 | else: 501 | lit_direction = np.random.uniform(-1, 1, 3) 502 | if 'directional' in lit: 503 | lit_direction[0] = lit_direction[0] * 60 504 | lit_direction[1] = lit_direction[1] * 80 505 | lit_direction[2] = lit_direction[2] * 60 506 | else: 507 | lit_direction *= 180 508 | self.set_light(lit, lit_direction, np.random.uniform(1, 5), np.random.uniform(0.3, 1, 3)) 509 | 510 | def set_skylight(self, obj, color, intensity): # param num out of range 511 | [r, g, b] = color 512 | cmd = f'vbp {obj} set_light {r} {g} {b} {intensity}' 513 | self.client.request(cmd, -1) 514 | 515 | def get_obj_speed(self,obj): 516 | cmd = f'vbp {obj} get_speed' 517 | res = self.client.request(cmd) 518 | answer_dict = json.loads(res) 519 | speed = float(answer_dict["Speed"]) 520 | return speed 521 | 522 | def check_visibility(self, tracker_cam_id, target_obj): 523 | # get the percentage of the target object visible in the tracker camera 524 | mask = self.get_image(tracker_cam_id, 'object_mask') 525 | print(mask.shape) 526 | mask, bbox = self.get_bbox(mask, target_obj, 
normalize=False) 527 | mask_percent = mask.sum()/(mask.shape[0]*mask.shape[1]) 528 | return mask_percent -------------------------------------------------------------------------------- /gym_rescue/envs/base_env.py: -------------------------------------------------------------------------------- 1 | import os 2 | import warnings 3 | import gym 4 | import numpy as np 5 | from gym import spaces 6 | from gym_rescue.envs.utils import misc 7 | from unrealcv.launcher import RunUnreal 8 | from gym_rescue.envs.agent.character import Character_API 9 | import random 10 | import sys 11 | import time 12 | import warnings 13 | import cv2 14 | # warnings.filterwarnings("ignore", category=UserWarning) 15 | ''' 16 | It is a base env for general purpose agent-env interaction, including single/multi-agent navigation, tracking, etc. 17 | Observation : raw color image and depth 18 | Action: Discrete/Continuous 19 | Done : define by the task wrapper 20 | ''' 21 | 22 | class UnrealCv_base(gym.Env): 23 | def __init__(self, 24 | setting_file, # the setting file to define the task 25 | action_type='Discrete', # 'discrete', 'continuous' 26 | observation_type='Color', # 'color', 'depth', 'rgbd', 'Gray' 27 | resolution=(160, 160), 28 | reset_type=0 29 | ): 30 | 31 | setting = misc.load_env_setting(setting_file) 32 | self.env_name = setting['env_name'] 33 | # self.max_steps = setting['max_steps'] 34 | self.height = setting['height'] 35 | self.cam_id = [setting['third_cam']['cam_id']] 36 | self.agent_configs = setting['agents'] 37 | self.env_configs = setting["env"] 38 | self.agents = misc.convert_dict(self.agent_configs) 39 | self.reset_type=reset_type 40 | # TODO: it is useless. 41 | self.character = { 42 | 'player': [], # the list of player to control 43 | 'npc': [], # the list of Non-player character 44 | 'freeze': [], # the list of player that exists in the scene, but it is frozen 45 | } 46 | 47 | self.height_top_view = setting['third_cam']['height_top_view'] 48 | 49 | # self.env_obj_list = self.env_configs[""] 50 | self.objects_list = [] 51 | self.reset_area = setting['reset_area'] 52 | 53 | self.safe_start = setting['safe_start'] 54 | self.interval = setting['interval'] 55 | self.random_init = setting['random_init'] 56 | self.start_area = self.get_start_area(self.safe_start[0], 500) 57 | # the start area of the agent, where we don't put obstacles when using environment augmentation 58 | self.count_eps = 0 59 | self.count_steps = 0 60 | 61 | # env configs 62 | self.docker = False 63 | self.resolution = resolution 64 | self.display = None 65 | self.use_opengl = False 66 | self.offscreen_rendering = False 67 | self.nullrhi = False 68 | self.gpu_id = None # None means using the default gpu 69 | self.sleep_time = 5 70 | self.launched = False 71 | self.comm_mode = 'tcp' # 'tcp' or 'unix', 'unix' is only available in Linux 72 | self.render_quality = 2 73 | self.use_lumen = False 74 | 75 | self.agents_category = ['player', 'animal'] # the agent category we use in the env 76 | self.protagonist_id = 0 # the player index that we use to interact with the env 77 | 78 | # init agents 79 | self.player_list = list(self.agents.keys()) 80 | self.cam_list = [self.agents[player]['cam_id'] for player in self.player_list] 81 | 82 | # define action space 83 | self.action_type = action_type 84 | assert self.action_type in ['Discrete', 'Continuous', 'Mixed'] 85 | # self.action_space = [self.define_action_space(self.action_type, self.agents[obj]) for obj in self.player_list] 86 | self.action_space = 
self.define_action_space(self.action_type,self.agents[self.player_list[self.protagonist_id]]) 87 | # define observation space, 88 | # color, depth, rgbd,... 89 | self.observation_type = observation_type 90 | assert self.observation_type in ['Color', 'Depth', 'Rgbd', 'Gray', 'CG', 'Mask', 'Pose','MaskDepth','ColorMask'] 91 | # self.observation_space = [self.define_observation_space(self.cam_list[i], self.observation_type, resolution) 92 | # for i in range(len(self.player_list))] 93 | self.observation_space = self.define_observation_space(self.cam_list[self.protagonist_id], self.observation_type, resolution) 94 | self.goal_show = np.zeros((resolution[1], resolution[0], 3), dtype=np.uint8) 95 | # config unreal env 96 | if 'linux' in sys.platform: 97 | env_bin = setting['env_bin']["Linux"] 98 | elif 'darwin' in sys.platform: 99 | env_bin = setting['env_bin']["Mac"] 100 | elif 'win' in sys.platform: 101 | env_bin = setting['env_bin']["Win"] 102 | 103 | if 'env_map' in setting.keys(): 104 | self.env_map = setting['env_map'] 105 | else: 106 | self.env_map = None 107 | 108 | self.ue_binary = RunUnreal(ENV_BIN=env_bin, ENV_MAP=self.env_map) # the binary file to launch the UE5 109 | self.info = dict( 110 | Collision=0, 111 | termination=False, 112 | truncation=False, 113 | Reward=0.0, 114 | Action=None, 115 | Pose=[], 116 | Steps=0, 117 | Direction=None, 118 | Distance=None, 119 | Color=None, 120 | Depth=None, 121 | Relative_Pose=[], 122 | picked = False, 123 | reference_text=[], 124 | reference_image=[], 125 | ) 126 | def step(self, actions): 127 | 128 | actions2move, actions2turn, actions2animate = self.action_mapping(actions, self.player_list) 129 | if self.info['picked'] and actions2animate[self.protagonist_id] == 'jump': 130 | # the agent is holding the object, it can't jump 131 | actions2animate[self.protagonist_id] = 'stand' 132 | 133 | move_cmds = [self.unrealcv.set_move_bp(obj, actions2move[i], return_cmd=True) for i, obj in enumerate(self.player_list) if actions2move[i] is not None] 134 | head_cmds = [self.unrealcv.set_cam(obj, self.agents[obj]['relative_location'], actions2turn[i], return_cmd=True) for i, obj in enumerate(self.player_list) if actions2turn[i] is not None] 135 | anim_cmds = [self.unrealcv.set_animation(obj, actions2animate[i], return_cmd=True) for i, obj in enumerate(self.player_list) if actions2animate[i] is not None] 136 | if actions2animate[self.protagonist_id] == 'carry' or actions2animate[self.protagonist_id] == 'drop': 137 | time.sleep(1) 138 | self.unrealcv.batch_cmd(move_cmds+head_cmds+anim_cmds, None) 139 | self.count_steps += 1 140 | 141 | # get states (multi agent) 142 | obj_poses, cam_poses, imgs, masks, depths = self.unrealcv.get_pose_img_batch(self.player_list, self.cam_list, self.cam_flag) 143 | self.obj_poses = obj_poses 144 | observations = self.prepare_observation(self.observation_type, imgs, masks, depths, obj_poses) 145 | self.img_show = self.prepare_img2show(self.protagonist_id, observations) 146 | pose_obs, relative_pose = self.get_pose_states(obj_poses) 147 | 148 | # prepare the info 149 | self.info['Pose'] = obj_poses 150 | self.info['Relative_Pose'] = relative_pose 151 | self.info['Pose_Obs'] = pose_obs 152 | self.info['Reward'] = 0 153 | 154 | return observations[0], self.info['Reward'], self.info['termination'], self.info['truncation'],self.info 155 | 156 | def reset(self): 157 | 158 | if not self.launched: # first time to launch 159 | self.launched = self.launch_ue_env() # launch the UE4 binary and connect to UnrealCV 160 | self.init_agents() # 
init agents 161 | self.init_objects() # init objects 162 | 163 | self.count_close = 0 164 | self.count_steps = 0 165 | self.count_eps += 1 166 | 167 | # stop move and disable physics 168 | for i, obj in enumerate(self.player_list): 169 | if self.agents[obj]['agent_type'] in self.agents_category: 170 | if not self.agents[obj]['internal_nav']: 171 | # self.unrealcv.set_move_bp(obj, [0, 100]) 172 | # self.unrealcv.set_max_speed(obj, 100) 173 | continue 174 | # self.unrealcv.set_phy(obj, 1) 175 | elif self.agents[obj]['agent_type'] == 'drone': 176 | self.unrealcv.set_move_bp(obj, [0, 0, 0, 0]) 177 | self.unrealcv.set_phy(obj, 0) 178 | 179 | # reset target location 180 | if self.random_init: 181 | init_poses = self.sample_init_pose(self.random_init, len(self.player_list)) 182 | for i, obj in enumerate(self.player_list): 183 | self.unrealcv.set_obj_location(obj, init_poses[i]) 184 | # set view point 185 | self.unrealcv.set_cam(obj, self.agents[obj]['relative_location'], self.agents[obj]['relative_rotation']) 186 | # get state 187 | observations, self.obj_poses, self.img_show = self.update_observation(self.player_list, self.cam_list, self.cam_flag, self.observation_type) 188 | 189 | #latest gym version support return info in reset() 190 | #return info 191 | self.info['pose']=self.obj_poses 192 | self.info['Steps']=self.count_steps 193 | self.info['termination']=False 194 | self.info['picked']=False 195 | 196 | return observations[0], self.info 197 | 198 | def close(self): 199 | if self.launched: 200 | self.unrealcv.client.disconnect() 201 | self.ue_binary.close() 202 | self.launched = False 203 | 204 | def render(self, mode='obs', show=True,save=None): 205 | if mode == 'obs': 206 | frame = self.img_show 207 | elif mode =='ref': 208 | frame = self.goal_show 209 | else: # ref+obs 210 | self.goal_show = cv2.resize(self.goal_show, (int(1.5*self.resolution[1]), self.resolution[1])) 211 | frame = cv2.hconcat((self.goal_show, self.img_show)) 212 | if save is not None: 213 | self.goal_show = cv2.resize(self.goal_show, (int(1.5*self.resolution[1]), self.resolution[1])) 214 | frame = cv2.hconcat((self.goal_show, self.img_show)) 215 | save.write(frame) 216 | if show: 217 | cv2.imshow('frame', frame) 218 | cv2.waitKey(1) 219 | return frame,save 220 | 221 | def seed(self, seed=None): 222 | np.random.seed(seed) 223 | # if seed is not None: 224 | # self.player_num = seed % (self.max_player_num-2) + 2 225 | 226 | def update_observation(self, player_list, cam_list, cam_flag, observation_type): 227 | obj_poses, cam_poses, imgs, masks, depths = self.unrealcv.get_pose_img_batch(player_list, cam_list, cam_flag) 228 | observations = self.prepare_observation(observation_type, imgs, masks, depths, obj_poses) 229 | img_show = self.prepare_img2show(self.protagonist_id, observations) 230 | return observations, obj_poses, img_show 231 | 232 | def get_start_area(self, safe_start, safe_range): 233 | start_area = [safe_start[0]-safe_range, safe_start[0]+safe_range, 234 | safe_start[1]-safe_range, safe_start[1]+safe_range] 235 | return start_area 236 | 237 | def set_topview(self, current_pose, cam_id): 238 | cam_loc = current_pose[:3] 239 | cam_loc[-1] = self.height_top_view 240 | cam_rot = [-90, 0, 0] 241 | self.unrealcv.set_cam_location(cam_id, cam_loc) 242 | self.unrealcv.set_cam_rotation(cam_id, cam_rot) 243 | 244 | def get_relative(self, pose0, pose1): # pose0-centric 245 | delt_yaw = pose1[4] - pose0[4] 246 | angle = misc.get_direction(pose0, pose1) 247 | distance = self.unrealcv.get_distance(pose1, pose0, 3) 248 | # 
distance_norm = distance / self.exp_distance 249 | obs_vector = [np.sin(delt_yaw/180*np.pi), np.cos(delt_yaw/180*np.pi), 250 | np.sin(angle/180*np.pi), np.cos(angle/180*np.pi), 251 | distance] 252 | return obs_vector, distance, angle 253 | 254 | def prepare_observation(self, observation_type, img_list, mask_list, depth_list, pose_list): 255 | if observation_type == 'Depth': 256 | return np.array(depth_list) 257 | elif observation_type == 'Mask': 258 | return np.array(mask_list) 259 | elif observation_type == 'Color': 260 | return np.array(img_list) 261 | elif observation_type == 'Rgbd': 262 | return np.append(np.array(img_list), np.array(depth_list), axis=-1) 263 | elif observation_type == 'Pose': 264 | return np.array(pose_list) 265 | elif observation_type == 'MaskDepth': 266 | return np.append(np.array(mask_list), np.array(depth_list), axis=-1) 267 | elif observation_type =='ColorMask': 268 | return np.append(np.array(img_list), np.array(mask_list), axis=-1) 269 | 270 | def relative_metrics(self, relative_pose): 271 | # compute the relative relation (collision, in-the-view, misleading) among agents for rewards and evaluation metrics 272 | info = dict() 273 | relative_dis = relative_pose[:, :, 0] 274 | relative_ori = relative_pose[:, :, 1] 275 | collision_mat = np.zeros_like(relative_dis) 276 | collision_mat[np.where(relative_dis < 100)] = 1 277 | collision_mat[np.where(np.fabs(relative_ori) > 45)] = 0 # collision should be at the front view 278 | info['collision'] = collision_mat 279 | info['dis_ave'] = relative_dis.mean() # average distance among players, regard as a kind of density metric 280 | 281 | return info 282 | 283 | def add_agent(self, name, loc, refer_agent): 284 | new_dict = refer_agent.copy() 285 | cam_num = self.unrealcv.get_camera_num() 286 | self.unrealcv.new_obj(refer_agent['class_name'], name, random.sample(self.safe_start, 1)[0]) 287 | self.player_list.append(name) 288 | if self.unrealcv.get_camera_num() > cam_num: 289 | new_dict['cam_id'] = cam_num 290 | else: 291 | new_dict['cam_id'] = -1 292 | self.cam_list.append(new_dict['cam_id']) 293 | self.unrealcv.set_obj_scale(name, refer_agent['scale']) 294 | self.unrealcv.set_obj_color(name, np.random.randint(0, 255, 3)) 295 | self.unrealcv.set_random(name, 0) 296 | self.unrealcv.set_interval(self.interval, name) 297 | self.unrealcv.set_obj_location(name, loc) 298 | self.unrealcv.set_phy(name, 0) 299 | return new_dict 300 | 301 | def remove_agent(self, name): 302 | # print(f'remove {name}') 303 | agent_index = self.player_list.index(name) 304 | self.player_list.remove(name) 305 | last_cam_list = self.cam_list 306 | self.cam_list = self.remove_cam(name) 307 | self.action_space.pop(agent_index) 308 | self.observation_space.pop(agent_index) 309 | self.unrealcv.destroy_obj(name) # the agent is removed from the scene 310 | self.agents.pop(name) 311 | st_time = time.time() 312 | while self.unrealcv.get_camera_num() > len(last_cam_list) + 1: 313 | if (time.time() - st_time) > 10: 314 | print('remove agent timeout') 315 | break 316 | pass 317 | 318 | def remove_cam(self, name): 319 | cam_id = self.agents[name]['cam_id'] 320 | cam_list = [] 321 | for obj in self.player_list: 322 | if self.agents[obj]['cam_id'] > cam_id and cam_id > 0: 323 | self.agents[obj]['cam_id'] -= 1 324 | cam_list.append(self.agents[obj]['cam_id']) 325 | return cam_list 326 | 327 | def define_action_space(self, action_type, agent_info): 328 | if action_type == 'Discrete': 329 | return spaces.Discrete(len(agent_info["move_action"])) 330 | elif action_type == 
'Continuous': 331 | return spaces.Box(low=np.array(agent_info["move_action_continuous"]['low']), 332 | high=np.array(agent_info["move_action_continuous"]['high']), dtype=np.float32) 333 | else: # Hybrid 334 | move_space = spaces.Box(low=np.array(agent_info["move_action_continuous"]['low']), 335 | high=np.array(agent_info["move_action_continuous"]['high']), dtype=np.float32) 336 | turn_space = None 337 | animation_space = None 338 | if "head_action" in agent_info.keys(): 339 | turn_space = spaces.Discrete(len(agent_info["head_action"])) 340 | if "animation_action" in agent_info.keys(): 341 | animation_space = spaces.Discrete(len(agent_info["animation_action"])) 342 | return spaces.Tuple((move_space, turn_space, animation_space)) 343 | 344 | def define_observation_space(self, cam_id, observation_type, resolution=(160, 120)): 345 | if observation_type == 'Pose' or cam_id < 0: 346 | observation_space = spaces.Box(low=-100, high=100, shape=(6,), 347 | dtype=np.float16) # TODO check the range and shape 348 | else: 349 | if observation_type == 'Color' or observation_type == 'CG' or observation_type == 'Mask': 350 | img_shape = (resolution[1], resolution[0], 3) 351 | observation_space = spaces.Box(low=0, high=255, shape=img_shape, dtype=np.uint8) 352 | elif observation_type == 'Depth': 353 | img_shape = (resolution[1], resolution[0], 1) 354 | observation_space = spaces.Box(low=0, high=100, shape=img_shape, dtype=np.float16) 355 | elif observation_type == 'Rgbd': 356 | s_low = np.zeros((resolution[1], resolution[0], 4)) 357 | s_high = np.ones((resolution[1], resolution[0], 4)) 358 | s_high[:, :, -1] = 100.0 # max_depth 359 | s_high[:, :, :-1] = 255 # max_rgb 360 | observation_space = spaces.Box(low=s_low, high=s_high, dtype=np.float16) 361 | elif observation_type == 'MaskDepth': 362 | s_low = np.zeros((resolution[1], resolution[0], 4)) 363 | s_high = np.ones((resolution[1], resolution[0], 4)) 364 | s_high[:, :, -1] = 100.0 # max_depth 365 | s_high[:, :, :-1] = 255 # max_rgb 366 | observation_space = spaces.Box(low=s_low, high=s_high, dtype=np.float16) 367 | elif observation_type=='ColorMask': 368 | img_shape = (resolution[1], resolution[0], 6) 369 | observation_space = spaces.Box(low=0, high=255, shape=img_shape, dtype=np.uint8) 370 | return observation_space 371 | 372 | def sample_init_pose(self, use_reset_area=False, num_agents=1): 373 | # sample poses to reset the agents 374 | if num_agents > len(self.safe_start): 375 | use_reset_area = True 376 | warnings.warn('The number of agents is less than the number of pre-defined start points, random sample points from the pre-defined area instead.') 377 | if use_reset_area: 378 | locations = self.sample_from_area(self.reset_area, num_agents) # sample from a pre-defined area 379 | else: 380 | locations = random.sample(self.safe_start, num_agents) # sample one pre-defined start point 381 | return locations 382 | 383 | def random_app(self): 384 | app_map = { 385 | 'player': range(1, 19), 386 | 'animal': range(0, 27), 387 | } 388 | for obj in self.player_list: 389 | category = self.agents[obj]['agent_type'] 390 | if category not in app_map.keys(): 391 | continue 392 | app_id = np.random.choice(app_map[category]) 393 | self.unrealcv.set_appearance(obj, app_id) 394 | 395 | def environment_augmentation(self, player_mesh=False, player_texture=False, 396 | light=False, background_texture=False, 397 | layout=False, layout_texture=False): 398 | app_map = { 399 | 'player': range(1, 19), 400 | 'animal': range(0, 27), 401 | } 402 | if player_mesh: # random human mesh 
403 | for obj in self.player_list: 404 | app_id = np.random.choice(app_map[self.agents[obj]['agent_type']]) 405 | self.unrealcv.set_appearance(obj, app_id) 406 | # random light and texture of the agents 407 | if player_texture: 408 | if self.env_name == 'FlexibleRoom': # random target texture 409 | for obj in self.player_list: 410 | if self.agents[obj]['agent_type'] == 'player': 411 | self.unrealcv.random_player_texture(obj, self.textures_list, 3) 412 | if light: 413 | self.unrealcv.random_lit(self.env_configs["lights"]) 414 | 415 | # random the texture of the background 416 | if background_texture: 417 | self.unrealcv.random_texture(self.env_configs["backgrounds"], self.textures_list, 3) 418 | 419 | # random place the obstacle 420 | if layout: 421 | self.unrealcv.clean_obstacles() 422 | self.unrealcv.random_obstacles(self.objects_list, self.textures_list, 423 | 15, self.reset_area, self.start_area, layout_texture) 424 | 425 | def get_pose_states(self, obj_pos): 426 | # get the relative pose of each agent and the absolute location and orientation of the agent 427 | pose_obs = [] 428 | player_num = len(obj_pos) 429 | np.zeros((player_num, player_num, 2)) 430 | relative_pose = np.zeros((player_num, player_num, 2)) 431 | for j in range(player_num): 432 | vectors = [] 433 | for i in range(player_num): 434 | obs, distance, direction = self.get_relative(obj_pos[j], obj_pos[i]) 435 | yaw = obj_pos[j][4]/180*np.pi 436 | # rescale the absolute location and orientation 437 | abs_loc = [obj_pos[i][0], obj_pos[i][1], 438 | obj_pos[i][2], np.cos(yaw), np.sin(yaw)] 439 | obs = obs + abs_loc 440 | vectors.append(obs) 441 | relative_pose[j, i] = np.array([distance, direction]) 442 | pose_obs.append(vectors) 443 | 444 | return np.array(pose_obs), relative_pose 445 | 446 | def launch_ue_env(self): 447 | # launch the UE4 binary and connect to UnrealCV 448 | env_ip, env_port = self.ue_binary.start(docker=self.docker, resolution=self.resolution, display=self.display, 449 | opengl=self.use_opengl, offscreen=self.offscreen_rendering, 450 | nullrhi=self.nullrhi,sleep_time=10) 451 | # connect UnrealCV 452 | self.unrealcv = Character_API(port=env_port, ip=env_ip, resolution=self.resolution, comm_mode=self.comm_mode) 453 | self.unrealcv.set_map(self.env_map) 454 | self.unrealcv.config_ue(quality=self.render_quality, Lumen=self.use_lumen) 455 | return True 456 | 457 | def init_agents(self): 458 | for obj in self.player_list.copy(): # the agent will be fully removed in self.agents 459 | if self.agents[obj]['agent_type'] not in self.agents_category: 460 | self.remove_agent(obj) 461 | 462 | for obj in self.player_list: 463 | self.unrealcv.set_obj_scale(obj, self.agents[obj]['scale']) 464 | self.unrealcv.set_random(obj, 0) 465 | self.unrealcv.set_interval(self.interval, obj) 466 | 467 | self.unrealcv.build_color_dict(self.player_list) 468 | self.cam_flag = self.get_cam_flag(self.observation_type) 469 | 470 | def init_objects(self): 471 | self.unrealcv.init_objects(self.objects_list) 472 | 473 | def prepare_img2show(self, index, states): 474 | if self.observation_type == 'Rgbd': 475 | return states[index][:, :, :3] 476 | elif self.observation_type in ['Color', 'Gray', 'CG', 'Mask']: 477 | return states[index] 478 | elif self.observation_type == 'Depth': 479 | return states[index]/states[index].max() # normalize the depth image 480 | else: 481 | return None 482 | 483 | def set_population(self, num_agents): 484 | while len(self.player_list) < num_agents: 485 | refer_agent = 
self.agents[random.choice(list(self.agents.keys()))] 486 | name = f'{refer_agent["agent_type"]}_EP{self.count_eps}_{len(self.player_list)}' 487 | self.agents[name] = self.add_agent(name, random.choice(self.safe_start), refer_agent) 488 | while len(self.player_list) > num_agents: 489 | self.remove_agent(self.player_list[-1]) # remove the last one 490 | 491 | def set_npc(self): 492 | # TODO: set the NPC agent 493 | return self.player_list.index(random.choice([x for x in self.player_list if x > 0])) 494 | 495 | def set_agent(self): 496 | # the agent is controlled by the external controller 497 | return self.cam_list.index(random.choice([x for x in self.cam_list if x > 0])) 498 | 499 | def action_mapping(self, actions, player_list): 500 | actions2move = [] 501 | actions2animate = [] 502 | actions2head = [] 503 | for i, obj in enumerate(player_list): 504 | action_space = self.action_space 505 | act = actions[i] 506 | if act is None: # if the action is None, then we don't control this agent 507 | actions2move.append(None) # place holder 508 | actions2animate.append(None) 509 | actions2head.append(None) 510 | continue 511 | if isinstance(action_space, spaces.Discrete): 512 | actions2move.append(self.agents[obj]["move_action"][act]) 513 | actions2animate.append(None) 514 | actions2head.append(None) 515 | elif isinstance(action_space, spaces.Box): 516 | actions2move.append(act) 517 | actions2animate.append(None) 518 | actions2head.append(None) 519 | elif isinstance(action_space, spaces.Tuple): 520 | for j, action in enumerate(actions[i]): 521 | if j == 0: 522 | if isinstance(action, int): 523 | actions2move.append(self.agents[obj]["move_action"][action]) 524 | else: 525 | action = np.clip(action, self.action_space[0].low, self.action_space[0].high) 526 | actions2move.append(action) 527 | elif j == 1: 528 | if isinstance(action, int): 529 | actions2head.append(self.agents[obj]["head_action"][action]) 530 | else: 531 | actions2head.append(action) 532 | elif j == 2: 533 | actions2animate.append(self.agents[obj]["animation_action"][action]) 534 | return actions2move, actions2head, actions2animate 535 | 536 | 537 | def get_cam_flag(self, observation_type, use_color=False, use_mask=False, use_depth=False, use_cam_pose=False): 538 | # get flag for camera 539 | # observation_type: 'color', 'depth', 'mask', 'cam_pose' 540 | flag = [False, False, False, False] 541 | flag[0] = use_cam_pose 542 | flag[1] = observation_type == 'Color' or observation_type == 'Rgbd' or use_color or observation_type == 'ColorMask' 543 | flag[2] = observation_type == 'Mask' or use_mask or observation_type == 'MaskDepth' or observation_type == 'ColorMask' 544 | flag[3] = observation_type == 'Depth' or observation_type == 'Rgbd' or use_depth or observation_type == 'MaskDepth' 545 | return flag 546 | 547 | def sample_from_area(self, area, num): 548 | x = np.random.randint(area[0], area[1], num) 549 | y = np.random.randint(area[2], area[3], num) 550 | z = np.random.randint(area[4], area[5], num) 551 | return np.vstack((x, y, z)).T 552 | 553 | def get_startpoint(self, target_pos=[], distance=None, reset_area=[], exp_height=200, direction=None): 554 | for i in range(5): # searching a safe point 555 | if direction == None: 556 | direction = 2 * np.pi * np.random.sample(1) 557 | else: 558 | direction = direction % (2 * np.pi) 559 | if distance == None: 560 | x = np.random.randint(reset_area[0], reset_area[1]) 561 | y = np.random.randint(reset_area[2], reset_area[3]) 562 | else: 563 | dx = float(distance * np.cos(direction)) 564 | dy = 
float(distance * np.sin(direction)) 565 | x = dx + target_pos[0] 566 | y = dy + target_pos[1] 567 | cam_pos_exp = [x, y, exp_height] 568 | if reset_area[0] < x < reset_area[1] and reset_area[2] < y < reset_area[3]: 569 | cam_pos_exp[0] = x 570 | cam_pos_exp[1] = y 571 | return cam_pos_exp 572 | return [] 573 | -------------------------------------------------------------------------------- /gym_rescue/envs/rescue.py: -------------------------------------------------------------------------------- 1 | # from gym_unrealcv import settings 2 | import time 3 | from gym_rescue.envs.base_env import UnrealCv_base 4 | import numpy as np 5 | import random 6 | ''' 7 | Task: the agent must find the injured person (the target), pick them up (carry), and place them on the stretcher next to the ambulance. 8 | The agent acts alone; all other characters in the scene are NPCs. 9 | The agent observes the environment with its own first-person camera. 10 | The agent receives a reward of 0.5 when it first picks up the injured person and another 0.5 when it delivers them to the stretcher. 11 | The episode terminates when the injured person is placed near the stretcher, and is truncated when the time limit is reached. 12 | ''' 13 | 14 | class Rescue(UnrealCv_base): 15 | def __init__(self, 16 | env_file, # the setting file to define the task 17 | task_file=None, # the file to define the task TODO: use this file to config task specific parameters 18 | action_type='Discrete', # 'discrete', 'continuous' 19 | observation_type='Color', # 'color', 'depth', 'rgbd', 'Gray' 20 | resolution=(160, 160), 21 | reset_type=0 22 | ): 23 | super(Rescue, self).__init__(setting_file=env_file, # the setting file to define the task 24 | action_type=action_type, # 'discrete', 'continuous' 25 | observation_type=observation_type, # 'color', 'depth', 'rgbd', 'Gray' 26 | resolution=resolution, 27 | reset_type=reset_type) 28 | self.count_reach = 0 29 | self.max_reach_steps = 5 30 | self.distance_threshold = 120 31 | self.agents_category = ['player'] 32 | self.injured_agent = self.env_configs['injured_player'][0] 33 | self.stretcher=self.env_configs['stretcher'][0] 34 | self.ambulance = self.env_configs['ambulance'][0] 35 | 36 | self.injured_player_pose = None 37 | self.rescue_pose = None 38 | self.agent_pose = None 39 | self.ambulance_pose = None 40 | 41 | def step(self, action): 42 | obs, reward, termination, truncation, info = super(Rescue, self).step(action) 43 | # compute the useful metrics for rewards and done condition 44 | current_pose = [self.unrealcv.get_cam_pose(self.cam_list[self.protagonist_id])] 45 | metrics = self.rescue_metrics(current_pose, self.target_pose) 46 | info['picked']=metrics['picked'] 47 | if info['picked']: 48 | self.target_pose = self.rescue_pose 49 | if self.first_picked is False: # the first time the agent picks up the injured person 50 | reward = 0.5 51 | self.first_picked = True 52 | 53 | # done condition 54 | current_injured_player_pose = self.unrealcv.get_obj_location(self.injured_agent)+self.unrealcv.get_obj_rotation(self.injured_agent) 55 | _, dis_tmp, dire_tmp=self.get_relative(current_injured_player_pose, self.rescue_pose) 56 | 57 | if not metrics['picked'] and dis_tmp<200: # the agent places the injured person on the stretcher 58 | info['termination'] = True 59 | info['truncation'] = False 60 | reward = 0.5 61 | # if current_injured_player_pose[2]-self.rescue_pose[2] > 120:#the injured agent is on the stretcher 62 | # self.count_reach+=1 63 | # if self.count_reach > self.max_reach_steps: 64 | # info['termination']=True 65 | # info['truncation']=False 66 | else: 67 | self.count_reach= 0 68 | info['Reward'] = reward 69 | 70 | return obs, reward,
info['termination'],info['truncation'], info 71 | 72 | def reset(self): 73 | # initialize the environment 74 | states, info = super(Rescue, self).reset() 75 | # super(Rescue, self).random_app() 76 | # for i in range(15): 77 | # self.unrealcv.set_appearance(self.injured_agent, i) 78 | # self.unrealcv.get_obj_location(self.injured_agent) 79 | # time.sleep(2) 80 | # self.unrealcv.set_appearance(self.player_list[self.protagonist_id], 10) 81 | self.target_pose = self.injured_player_pose 82 | self.count_reach = 0 83 | self.first_picked = False 84 | 85 | #NPC will randomly walking in the environment 86 | for i in range(len(self.player_list)): 87 | if i !=self.protagonist_id: 88 | self.unrealcv.nav_random(self.player_list[i],1000,1) 89 | self.player_list = [self.player_list[self.protagonist_id]] 90 | 91 | 92 | #initialize start location 93 | self.unrealcv.set_obj_rotation(self.player_list[self.protagonist_id], self.unrealcv.get_obj_rotation(self.injured_agent)) 94 | self.unrealcv.set_obj_location(self.player_list[self.protagonist_id], self.unrealcv.get_obj_location(self.injured_agent)) 95 | self.unrealcv.carry_body(self.player_list[self.protagonist_id]) 96 | time.sleep(1) 97 | self.unrealcv.drop_body(self.player_list[self.protagonist_id]) 98 | time.sleep(1) 99 | 100 | self.unrealcv.set_obj_rotation(self.player_list[self.protagonist_id], self.agent_pose[3:]) 101 | self.unrealcv.set_obj_location(self.player_list[self.protagonist_id], self.agent_pose[:3]) 102 | self.unrealcv.set_cam_fov(self.cam_list[self.protagonist_id],110) 103 | time.sleep(1) 104 | self.unrealcv.set_obj_rotation(self.injured_agent, self.injured_player_pose[3:]) 105 | self.unrealcv.set_obj_location(self.injured_agent, self.injured_player_pose[:3]) 106 | random_color = color = np.random.randint(100, 255, 3) #assign the injured agent a color for mask mode 107 | self.unrealcv.set_obj_color(self.injured_agent, random_color) 108 | 109 | time.sleep(1) 110 | self.unrealcv.set_obj_rotation(self.stretcher, self.rescue_pose[3:]) 111 | self.unrealcv.set_obj_location(self.stretcher, self.rescue_pose[:3]) 112 | time.sleep(1) 113 | self.unrealcv.set_phy(self.ambulance, 1) 114 | self.unrealcv.set_obj_rotation(self.ambulance, self.ambulance_pose[3:]) 115 | self.unrealcv.set_obj_location(self.ambulance, self.ambulance_pose[:3]) 116 | time.sleep(1) 117 | self.unrealcv.set_phy(self.ambulance, 0) 118 | # self.unrealcv.init_mask_color(self.injured_agent) 119 | # self.unrealcv.check_visibility(self.cam_list[self.protagonist_id], self.injured_agent) 120 | time.sleep(1) 121 | return states,info 122 | 123 | def reward(self, metrics): 124 | # individual reward 125 | if 'individual' in self.reward_type: 126 | if 'sparse' in self.reward_type: 127 | rewards = metrics['reach'] # only the agent who reach the target get the reward 128 | else: 129 | rewards = 1 - metrics['dis_each']/self.distance_threshold - np.fabs(metrics['ori_each'])/180 + metrics['reach'] 130 | elif 'shared' in self.reward_type: 131 | if 'sparse' in self.reward_type: 132 | rewards = metrics['reach'].max() 133 | else: 134 | rewards = 1 - metrics['dis_min']/self.distance_threshold + metrics['reach'].max() 135 | else: 136 | raise ValueError('reward type is not defined') 137 | return rewards 138 | 139 | def rescue_metrics(self, objs_pose, target_loc): 140 | # compute the relative relation (distance, collision) among agents for rewards and evaluation metrics 141 | info = dict() 142 | relative_pose = [] 143 | for obj_pos in objs_pose: 144 | obs, distance, direction = self.get_relative(obj_pos, 
target_loc) 145 | relative_pose.append(np.array([distance, direction])) 146 | relative_pose = np.array(relative_pose) 147 | relative_dis = relative_pose[:, 0] 148 | relative_ori = relative_pose[:, 1] 149 | reach_mat = np.zeros_like(relative_dis) 150 | reach_mat[np.where(relative_dis < self.distance_threshold)] = 1 151 | reach_mat[np.where(np.fabs(relative_ori) > 45)] = 0 # collision should be at the front view 152 | 153 | info['reach'] = reach_mat 154 | info['dis_min'] = relative_dis.min(-1) # minimal distance from agents to target 155 | info['dis_each'] = relative_dis # distance from agents to target 156 | info['ori_each'] = relative_ori # orientation from agents to target 157 | info['picked'] = self.unrealcv.Is_picked(self.injured_agent) 158 | return info 159 | -------------------------------------------------------------------------------- /gym_rescue/envs/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/utils/__init__.py -------------------------------------------------------------------------------- /gym_rescue/envs/utils/__pycache__/__init__.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/utils/__pycache__/__init__.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/utils/__pycache__/keyboard_util.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/utils/__pycache__/keyboard_util.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/utils/__pycache__/misc.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/utils/__pycache__/misc.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/utils/keyboard_util.py: -------------------------------------------------------------------------------- 1 | from pynput import keyboard 2 | 3 | key_state = { 4 | 'i': False, 5 | 'j': False, 6 | 'k': False, 7 | 'l': False, 8 | 'space': False, 9 | 'ctrl':False, 10 | '1': False, 11 | '2': False, 12 | 'head_up': False, 13 | 'head_down': False, 14 | 'e': False 15 | } 16 | 17 | def on_press(key): 18 | try: 19 | if key.char in key_state: 20 | key_state[key.char] = True 21 | except AttributeError: 22 | if key == keyboard.Key.space: 23 | key_state['space'] = True 24 | if key == keyboard.Key.up: 25 | key_state['head_up'] = True 26 | if key == keyboard.Key.down: 27 | key_state['head_down'] = True 28 | if key ==keyboard.Key.ctrl_l: 29 | key_state['ctrl'] = True 30 | 31 | 32 | def on_release(key): 33 | try: 34 | if key.char in key_state: 35 | key_state[key.char] = False 36 | except AttributeError: 37 | if key == keyboard.Key.space: 38 | key_state['space'] = False 39 | if key == keyboard.Key.up: 40 | key_state['head_up'] = False 41 | if key == keyboard.Key.down: 42 | key_state['head_down'] = False 43 | if key ==keyboard.Key.ctrl_l: 44 | key_state['ctrl'] = False 
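# Key-to-action mapping used by the keyboard demo (interaction codes follow the legend in solution_random.py):
#   i / k        -> velocity +200 / -200   (move forward / backward)
#   j / l        -> angular  -30  / +30    (turn left / right)
#   up / down    -> viewport 1 / 2         (look up / look down)
#   space / ctrl -> interaction 1 / 2      (jump / crouch)
#   1 / 2 / e    -> interaction 3 / 4 / 5  (carry / drop / open door)
# get_key_action() below packs the pressed keys into a tuple of the form ((angular, velocity), viewport, interaction).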
45 | def get_key_action(): 46 | action = ([0, 0], 0, 0) 47 | action = list(action) # Convert tuple to list for modification 48 | action[0] = list(action[0]) # Convert inner tuple to list for modification 49 | 50 | if key_state['i']: 51 | action[0][1] = 200 52 | if key_state['k']: 53 | action[0][1] = -200 54 | if key_state['j']: 55 | action[0][0] = -30 56 | if key_state['l']: 57 | action[0][0] = 30 58 | if key_state['space']: 59 | action[2] = 1 60 | if key_state['ctrl']: 61 | action[2] = 2 62 | if key_state['1']: 63 | action[2] = 3 64 | if key_state['2']: 65 | action[2] = 4 66 | if key_state['head_up']: 67 | action[1] = 1 68 | if key_state['head_down']: 69 | action[1] = 2 70 | if key_state['e']: 71 | action[2] = 5 72 | 73 | action[0] = tuple(action[0]) # Convert inner list back to tuple 74 | action = tuple(action) # Convert list back to tuple 75 | return action 76 | -------------------------------------------------------------------------------- /gym_rescue/envs/utils/misc.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import json 4 | import unrealcv 5 | def load_env_setting(filename): 6 | f = open(get_settingpath(filename)) 7 | type = os.path.splitext(filename)[1] 8 | assert type in ['.json'], type + ' is not supported' 9 | setting = json.load(f) 10 | return setting 11 | 12 | 13 | def get_settingpath(filename): 14 | import gym_rescue 15 | gympath = os.path.dirname(gym_rescue.__file__) 16 | return os.path.join(gympath, 'envs/setting', filename) 17 | 18 | 19 | def get_action_size(action): 20 | return len(action) 21 | 22 | 23 | def get_direction(current_pose, target_pose): # get relative angle between current pose and target pose in x-y plane 24 | y_delt = target_pose[1] - current_pose[1] 25 | x_delt = target_pose[0] - current_pose[0] 26 | if x_delt == 0 and y_delt == 0: # if the same position 27 | return 0 28 | angle_abs = np.arctan2(y_delt, x_delt)/np.pi*180 29 | angle_relative = angle_abs - current_pose[4] 30 | if angle_relative > 180: 31 | angle_relative -= 360 32 | if angle_relative < -180: 33 | angle_relative += 360 34 | return angle_relative 35 | 36 | 37 | def get_textures(texture_name="textures", docker=False): 38 | try: 39 | texture_dir = os.path.join(unrealcv.util.get_path2UnrealEnv(), "textures") 40 | except AttributeError: 41 | raise ImportError( 42 | "Function get_path2UnrealEnv() not found. 
" 43 | "Please upgrade unrealcv to version 1.1.5 or higher using: \n" 44 | "pip install --upgrade unrealcv" 45 | ) 46 | textures_list = os.listdir(texture_dir) 47 | # relative to abs 48 | for i in range(len(textures_list)): 49 | if docker: 50 | textures_list[i] = os.path.join('/unreal', texture_dir, textures_list[i]) 51 | else: 52 | textures_list[i] = os.path.join(texture_dir, textures_list[i]) 53 | return textures_list 54 | 55 | def convert_dict(old_dict): 56 | new_dict = {} 57 | for agent, info in old_dict.items(): 58 | names = info["name"] 59 | for i, name in enumerate(names): 60 | new_dict[name] = { 61 | "agent_type": agent, 62 | } 63 | for key in info.keys(): 64 | if key == "name" or key == "cam_id" or key == "class_name": 65 | new_dict[name][key] = info[key][i] 66 | else: 67 | new_dict[name][key] = info[key] 68 | return new_dict -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__init__.py -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/__pycache__/__init__.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/__init__.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/__pycache__/configUE.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/configUE.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/__pycache__/early_done.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/early_done.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/__pycache__/monitor.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/monitor.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/__pycache__/population.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/population.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/__pycache__/task_cue.cpython-39.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/task_cue.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/__pycache__/time_dilation.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/atecup/atec2025_software_algorithm/1aa52b3a275c9624acd5761f2f15b08ac5af033d/gym_rescue/envs/wrappers/__pycache__/time_dilation.cpython-39.pyc -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/configUE.py: -------------------------------------------------------------------------------- 1 | import gym 2 | from gym import Wrapper 3 | 4 | class ConfigUEWrapper(Wrapper): 5 | def __init__(self, env, docker=False, resolution=(160, 160), display=None, offscreen=False, 6 | use_opengl=False, nullrhi=False, gpu_id=None, render_quality=2, use_lumen=False, sleep_time=5, comm_mode='tcp'): 7 | super().__init__(env) 8 | env.unwrapped.docker = docker 9 | env.unwrapped.display = display 10 | env.unwrapped.offscreen_rendering = offscreen 11 | env.unwrapped.use_opengl = use_opengl 12 | env.unwrapped.nullrhi = nullrhi 13 | env.unwrapped.gpu_id = gpu_id 14 | env.unwrapped.sleep_time = sleep_time 15 | env.unwrapped.resolution = resolution 16 | env.unwrapped.comm_mode = comm_mode 17 | env.unwrapped.use_lumen = use_lumen 18 | env.unwrapped.render_quality = render_quality 19 | # reset the observation space based on the new resolution 20 | self.observation_space = self.define_observation_space(self.cam_list[self.protagonist_id], self.observation_type, resolution) 21 | 22 | 23 | def step(self, action): 24 | obs, reward, termination,truncation, info = self.env.step(action) 25 | return obs, reward, termination,truncation, info 26 | 27 | def reset(self, **kwargs): 28 | states,info = self.env.reset(**kwargs) 29 | return states,info -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/early_done.py: -------------------------------------------------------------------------------- 1 | import gym 2 | from gym import Wrapper 3 | import time 4 | 5 | class EarlyDoneWrapper(Wrapper): 6 | def __init__(self, env, max_time=180): 7 | super().__init__(env) 8 | self.max_time = max_time 9 | self.count_lost = 0 # Initialize count_lost 10 | self.start_time = time.time() # Initialize start time 11 | 12 | def step(self, action): 13 | obs, reward, terminated, truncated, info = self.env.step(action) # take a step in the wrapped environment 14 | elapsed_time = time.time() - self.start_time 15 | if elapsed_time > self.max_time: 16 | truncated = True 17 | if action[0][2]==4: # if drop action is triggerd, the task is truncated 18 | truncated=True 19 | if terminated or truncated: 20 | info['execution_time'] = elapsed_time 21 | print('elapsed_time:', elapsed_time) 22 | return obs, reward, terminated, truncated, info # return the same results as the wrapped environment 23 | 24 | def reset(self, **kwargs): 25 | obs, info = self.env.reset(**kwargs) 26 | self.start_time = time.time() 27 | return obs, info # return the same results as the wrapped environment -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/monitor.py: -------------------------------------------------------------------------------- 1 | import gym 2 | from gym import Wrapper 3 | import time 4 | 
import cv2 5 | 6 | class DisplayWrapper(Wrapper): 7 | def __init__(self, env, dynamic_top_down=True, fix_camera=False, get_bbox=False): 8 | super().__init__(env) 9 | self.dynamic_top_down = dynamic_top_down 10 | self.fix_camera = fix_camera 11 | self.get_bbox = get_bbox 12 | 13 | def step(self, action): 14 | obs, reward, termination, truncation, info = self.env.step(action) # take a step in the wrapped environment 15 | # set top_down camera 16 | env = self.env.unwrapped 17 | 18 | # for recording demo 19 | if self.get_bbox: 20 | mask = env.unrealcv.get_image(env.cam_list[env.protagonist_id], 'object_mask', 'bmp') 21 | mask, bbox = env.unrealcv.get_bbox(mask, env.unwrapped.injured_agent, normalize=False) 22 | self.show_bbox(env.img_show.copy(), bbox) 23 | info['bbox'] = bbox 24 | 25 | if self.dynamic_top_down: 26 | env.set_topview(info['Pose'][env.protagonist_id], env.cam_id[0]) # set top_down camera 27 | 28 | return obs, reward, termination, truncation, info # return the same results as the wrapped environment 29 | 30 | def reset(self, **kwargs): 31 | states = self.env.reset(**kwargs) 32 | env = self.env.unwrapped 33 | if self.fix_camera: 34 | # current_pose = env.unrealcv.get_cam_pose(env.cam_list[env.protagonist_id]) 35 | center_pos = [self.ambulance_pose[0], self.ambulance_pose[1], self.ambulance_pose[2]+2000] 36 | env.set_topview(center_pos, env.cam_id[0]) 37 | if self.get_bbox: 38 | self.bbox_init = [] 39 | mask = env.unrealcv.get_image(env.cam_list[env.protagonist_id], 'object_mask', 'bmp') 40 | mask, bbox = env.unrealcv.get_bbox(mask, env.unwrapped.injured_agent, normalize=False) 41 | self.mask_percent = mask.sum()/(255 * env.resolution[0] * env.resolution[1]) 42 | self.bbox_init.append(bbox) 43 | # cv2.imshow('init', env.img_show) 44 | # cv2.waitKey(1) 45 | return states # return the same results as the wrapped environment 46 | 47 | def show_bbox(self, img2disp, bbox): 48 | # im_disp = states[0][:, :, :3].copy() 49 | cv2.rectangle(img2disp, (int(bbox[0]), int(bbox[1])), (int(bbox[2] + bbox[0]), int(bbox[3] + bbox[1])), (0, 255, 0), 5) 50 | cv2.imshow('track_res', img2disp) 51 | cv2.waitKey(1) -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/population.py: -------------------------------------------------------------------------------- 1 | import gym 2 | from gym import Wrapper 3 | import numpy as np 4 | 5 | class RandomPopulationWrapper(Wrapper): 6 | def __init__(self, env, num_min=2, num_max=10, random_target=False, random_tracker=False): 7 | super().__init__(env) 8 | self.min_num = num_min 9 | self.max_num = num_max 10 | self.random_target_id = random_target 11 | self.random_tracker_id = random_tracker 12 | 13 | def step(self, action): 14 | obs, reward, termination,truncation, info = self.env.step(action) 15 | return obs,reward, termination,truncation, info 16 | 17 | def reset(self, **kwargs): 18 | env = self.env.unwrapped 19 | if not env.launched: # we need to launch the environment 20 | env.launched = env.launch_ue_env() 21 | env.init_agents() 22 | env.init_objects() 23 | 24 | if self.min_num == self.max_num: 25 | env.num_agents = self.min_num 26 | else: 27 | # Randomize the number of agents 28 | env.num_agents = np.random.randint(self.min_num, self.max_num) 29 | env.set_population(env.num_agents) 30 | if self.random_tracker_id: 31 | env.tracker_id = env.sample_tracker() 32 | if self.random_target_id: 33 | new_target = env.sample_target() 34 | if new_target != env.tracker_id: # set target object mask to white 35 | 
env.unrealcv.build_color_dict(env.player_list) 36 | env.unrealcv.set_obj_color(env.player_list[env.target_id], env.unrealcv.color_dict[env.player_list[new_target]]) 37 | env.unrealcv.set_obj_color(env.player_list[new_target], [255, 255, 255]) 38 | env.target_id = new_target 39 | states,info = self.env.reset(**kwargs) 40 | 41 | 42 | 43 | return states,info -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/switch_env.py: -------------------------------------------------------------------------------- 1 | import gym 2 | from gym import Wrapper 3 | import time 4 | 5 | class SwitchEnvWrapper(Wrapper): 6 | def __init__(self, env, env_old): 7 | super().__init__(env) 8 | env.unwrapped.unrealcv = env_old.unwrapped.unrealcv 9 | env.unwrapped.ue_binary = env_old.unwrapped.ue_binary 10 | env.unwrapped.unrealcv.set_map(env.unwrapped.env_name) 11 | print('Switching to environment:', env.unwrapped.env_name) 12 | env.unwrapped.launched = True 13 | env.unwrapped.init_agents() 14 | env.unwrapped.init_objects() 15 | 16 | def step(self, action): 17 | obs, reward, terminated, truncated, info = self.env.step(action) # take a step in the wrapped environment 18 | return obs, reward, terminated, truncated, info # return the same results as the wrapped environment 19 | 20 | def reset(self, **kwargs): 21 | obs, info = self.env.reset(**kwargs) 22 | return obs, info # return the same results as the wrapped environment 23 | -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/task_cue.py: -------------------------------------------------------------------------------- 1 | import os.path 2 | import cv2 3 | import gym 4 | import gym_rescue 5 | from gym import Wrapper 6 | class TaskCueWrapper(Wrapper): 7 | def __init__(self, env, test_point): 8 | super().__init__(env) 9 | self.test_point = test_point 10 | env.unwrapped.injured_player_pose = self.test_point['injured_player_loc'] 11 | env.unwrapped.rescue_pose = self.test_point['stretcher_loc'] 12 | env.unwrapped.agent_pose = self.test_point['agent_loc'] 13 | env.unwrapped.ambulance_pose = self.test_point['ambulance_loc'] 14 | 15 | def step(self, action): 16 | obs, reward, termination,truncation, info = self.env.step(action) 17 | return obs, reward, termination,truncation, info 18 | 19 | def reset(self, **kwargs): 20 | states,info = self.env.reset(**kwargs) 21 | 22 | injured_agent_appid = self.test_point['injured_agent_id'] 23 | self.env.unwrapped.unrealcv.set_appearance(self.env.unwrapped.injured_agent, injured_agent_appid) 24 | 25 | UnrealEnv = os.environ.get('UnrealEnv') 26 | reference_image_path = os.path.join(UnrealEnv, 'ref_image', self.test_point['reference_image_path'][0]) 27 | 28 | image = cv2.imread(reference_image_path) 29 | text = self.test_point['reference_text'][0] 30 | info['reference_image'] = image 31 | info['reference_text']=text 32 | # Display the reference image and text 33 | goal_show = cv2.resize(info['reference_image'], (480, 320)) 34 | reference_text = info['reference_text'] 35 | for idx, line in enumerate([reference_text[i:i + 50] for i in range(0, len(reference_text), 50)]): 36 | y = 30 + idx * 20 37 | # Create rectangle background 38 | overlay = goal_show.copy() 39 | cv2.rectangle(overlay, (5, y - 15), (len(line) * 10 + 15, y + 5), (128, 128, 128), -1) 40 | # Apply semi-transparent overlay 41 | alpha = 0.6 42 | cv2.addWeighted(overlay, alpha, goal_show, 1 - alpha, 0, goal_show) 43 | # Draw text on top of background 44 | 
cv2.putText(goal_show, line, (10, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 45 | (0, 0, 0), 1, cv2.LINE_AA) 46 | self.env.unwrapped.goal_show = goal_show 47 | return states,info -------------------------------------------------------------------------------- /gym_rescue/envs/wrappers/time_dilation.py: -------------------------------------------------------------------------------- 1 | import gym 2 | from gym import Wrapper 3 | import time 4 | 5 | class TimeDilationWrapper(Wrapper): 6 | def __init__(self, env, reference_fps=10, update_steps=10, update_dilation=True): 7 | super().__init__(env) 8 | self.dilation_factor = 1 # the factor by which to slow down the environment 9 | self.reference_fps = reference_fps # the fps at which the environment should run 10 | self.update_steps = update_steps # the number of steps after which to update the time dilation 11 | self.update_dilation = update_dilation # whether to update the time dilation or not 12 | def step(self, action): 13 | obs, reward, done, info = self.env.step(action) # take a step in the wrapped environment 14 | self.count_steps += 1 15 | if self.count_steps % self.update_steps == 0: # update the time dilation every 10 steps 16 | fps = self.count_steps / (time.time() - self.start_time) 17 | dilation_factor_new = fps / self.reference_fps # reference fps: 10 18 | # print(f'FPS: {fps}', f'Dilation factor: {dilation_factor_new}') 19 | if dilation_factor_new / self.dilation_factor > 1.1 or dilation_factor_new / self.dilation_factor < 0.9: 20 | env = self.env.unwrapped 21 | if self.update_dilation: 22 | self.dilation_factor = dilation_factor_new 23 | env.unrealcv.set_global_time_dilation(self.dilation_factor) 24 | return obs, reward, done, info # return the same results as the wrapped environment 25 | 26 | def reset(self, **kwargs): 27 | states = self.env.reset(**kwargs) 28 | self.start_time = time.time() 29 | self.count_steps = 0 30 | return states # return the same results as the wrapped environment -------------------------------------------------------------------------------- /gym_rescue/example/rescue_demo.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import gym 3 | import cv2 4 | import time 5 | from gym_rescue.envs.wrappers import time_dilation, early_done, monitor, population, configUE,task_cue 6 | from gym_rescue.envs.utils.keyboard_util import get_key_action,on_press,on_release 7 | from pynput import keyboard 8 | import os 9 | import json 10 | 11 | 12 | #start keyboard listener 13 | listener = keyboard.Listener(on_press=on_press, on_release=on_release) 14 | listener.start() 15 | 16 | 17 | class RandomAgent(object): 18 | """The world's simplest agent!""" 19 | def __init__(self, action_space): 20 | self.action_space = action_space 21 | 22 | def act(self, observation): 23 | return self.action_space.sample() 24 | 25 | if __name__ == '__main__': 26 | parser = argparse.ArgumentParser(description=None) 27 | parser.add_argument("-r", '--render', dest='render', action='store_true', help='show env using cv2') 28 | parser.add_argument("-s", '--seed', dest='seed', type=int, default=1, help='number of population') 29 | parser.add_argument("-t", '--time-dilation', dest='time_dilation', type=int, default=-1, help='time_dilation to keep fps in simulator') 30 | parser.add_argument("-n", '--nav-agent', dest='nav_agent', action='store_true', help='use nav agent to control the agents') 31 | parser.add_argument("-m", '--monitor', dest='monitor', action='store_true', help='auto_monitor') 32 | 
parser.add_argument("-k", '--keyboard', dest='keyboard', action='store_true', help='use keyboard to control the agents') 33 | parser.add_argument("-o", '--offscreen', dest='offscreen', action='store_true', help='offscreen rendering') 34 | parser.add_argument("-q", '--quality', dest='quality', type=int, default=3, help='render quality') 35 | parser.add_argument("-u", '--use_lumen', dest='use_lumen', action='store_true', help='use lumen') 36 | parser.add_argument("-v", '--record_video', dest='record_video', action='store_true', help='record task cues and observation into video') 37 | 38 | UnrealEnv = os.environ.get('UnrealEnv') 39 | TEST_JSONL = os.path.join(UnrealEnv, 'test_L1.jsonl') 40 | with open(TEST_JSONL, "r") as fp: 41 | try: 42 | point_id = 0 43 | for line in fp.readlines(): 44 | content = json.loads(line) 45 | env_id = content.get("env_id") 46 | timeout = content.get("timeout") 47 | 48 | InterruptedException = False 49 | args = parser.parse_args() 50 | resolution = (320, 320) 51 | if args.record_video: 52 | out = cv2.VideoWriter('output_{}.mp4'.format(point_id), cv2.VideoWriter_fourcc(*'mp4v'), 20.0, 53 | (int(2.5 * resolution[1]), resolution[1])) 54 | else: 55 | out = None 56 | # create the environment 57 | env = gym.make(env_id, action_type='Mixed', observation_type='Color') # available observation_type : ['Color', 'Depth', 'Rgbd', 'Mask', 'Pose','MaskDepth','ColorMask'] 58 | # set the rendering mode, resolution and render quality, gpu_id 59 | env = configUE.ConfigUEWrapper(env, offscreen=args.offscreen, resolution=resolution, use_lumen=args.use_lumen, render_quality=args.quality) 60 | 61 | if int(args.time_dilation) > 0: # -1 means no time_dilation, the number means the expected FPS in simulator 62 | env = time_dilation.TimeDilationWrapper(env, int(args.time_dilation)) 63 | if int(timeout) > 0: # -1 means no early_done, 180 mean 180 seconds 64 | env = early_done.EarlyDoneWrapper(env, timeout) 65 | if args.monitor: # if dynamic_top_down is True, the top-down view will follow the agent 66 | env = monitor.DisplayWrapper(env, dynamic_top_down=False, fix_camera=True,get_bbox=True) 67 | env = task_cue.TaskCueWrapper(env=env, test_point=content) 68 | 69 | agent = RandomAgent(env.action_space) 70 | rewards = 0 71 | done = False 72 | Total_rewards = 0 73 | obs,info = env.reset() # environment reset 74 | count_step = 0 75 | t0=time.time() 76 | while True: 77 | try: 78 | # print(info['reference_text']) # print the text hints to terminal 79 | if args.keyboard: 80 | action = get_key_action() # get the action from keyboard 81 | else: 82 | action = agent.act(obs) # get the action from random agent 83 | obs, reward, termination, truncation, info= env.step([action]) #environment update 84 | rewards+=reward 85 | if reward != 0: 86 | print('Reward:', reward) 87 | if args.render or args.record_video: # show the visual cues & text cues and first-person view observation 88 | frame, out=env.render(mode='ref+obs', show=args.render,save=out) 89 | count_step+=1 90 | 91 | if termination: 92 | fps = count_step / (time.time() - t0) 93 | if out is not None: 94 | out.release() 95 | print('Success') 96 | print('reward:',rewards) 97 | print('Fps:' + str(fps)) 98 | break 99 | if truncation or action[2]==4: 100 | fps = count_step / (time.time() - t0) 101 | if out is not None: 102 | out.release() 103 | print('Failed') 104 | print('reward:',rewards) 105 | print('Fps:' + str(fps)) 106 | break 107 | except KeyboardInterrupt: 108 | print('\nReceived CTRL+C. 
Cleaning up...') 109 | if out is not None: 110 | out.release() 111 | InterruptedException = True 112 | break 113 | except Exception as e: 114 | if out is not None: 115 | out.release() 116 | print(e) 117 | InterruptedException = True 118 | break 119 | point_id+=1 120 | if InterruptedException: 121 | if out is not None: 122 | out.release() 123 | env.close() 124 | break 125 | env.close() 126 | except Exception as e: 127 | print(e) 128 | if out is not None: 129 | out.release() 130 | env.close() 131 | exit(0) -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import gym_rescue 2 | from gym_rescue.envs.wrappers import switch_env, configUE, early_done 3 | import gym 4 | import numpy as np 5 | import cv2 6 | import os 7 | import json 8 | import base64 9 | import time 10 | import traceback 11 | 12 | from solution import AlgSolution 13 | 14 | UnrealEnv = os.environ.get('UnrealEnv') 15 | TEST_JSONL = os.path.join(UnrealEnv, 'test_L1.jsonl') 16 | 17 | def img2base64(img): 18 | buf = cv2.imencode('.png', img)[1] 19 | b64 = base64.b64encode(buf).decode('utf-8') 20 | return b64 21 | 22 | 23 | agent = AlgSolution() 24 | 25 | results = [] 26 | seed = 0 27 | 28 | rewards = [] 29 | ep_times = [] 30 | 31 | env_old = None 32 | with open(TEST_JSONL, "r") as fp: 33 | for line in fp.readlines(): 34 | 35 | content = json.loads(line) 36 | level = content.get('level') 37 | env_id = content.get("env_id") 38 | agent_loc = content.get("agent_loc") 39 | injured_player_loc = content.get("injured_player_loc") 40 | injured_agent_id = content.get("injured_agent_id") 41 | stretcher_loc = content.get("stretcher_loc") 42 | ambulance_loc= content.get("ambulance_loc") 43 | reference_text = content.get("reference_text") 44 | reference_image_path = content.get("reference_image_path") 45 | max_steps = 9999999 46 | timeout = content.get("timeout") 47 | 48 | env = gym.make(env_id, action_type='Mixed', observation_type='Color', reset_type=level) 49 | if env_old is not None: 50 | env = switch_env.SwitchEnvWrapper(env, env_old) 51 | env_old = env 52 | try: 53 | env._max_episode_steps = max_steps 54 | env = configUE.ConfigUEWrapper(env, offscreen=True, resolution=(640,480)) 55 | env = early_done.EarlyDoneWrapper(env, int(timeout)) 56 | 57 | if reference_image_path: 58 | reference_image_path = os.path.join(UnrealEnv, 'ref_image', reference_image_path[0]) 59 | reference_image = base64.b64encode(open(reference_image_path, 'rb').read()).decode("utf-8") 60 | else: 61 | reference_image = None 62 | 63 | agent.reset(reference_text, reference_image) 64 | 65 | env.seed(seed) 66 | seed += 1 67 | 68 | env.unwrapped.injured_player_pose = injured_player_loc 69 | env.unwrapped.rescue_pose = stretcher_loc 70 | env.unwrapped.agent_pose = agent_loc 71 | env.unwrapped.ambulance_pose = ambulance_loc 72 | ob, info = env.reset() 73 | env.unwrapped.unrealcv.set_appearance(env.unwrapped.injured_agent, injured_agent_id) 74 | 75 | ob, reward, termination, truncation, info = env.step([[(0, 0), 0, 0]]) 76 | current_step = 0 77 | picked_reward = 0. 
78 | success = True 79 | 80 | start_t = time.time() 81 | while True: 82 | pred = agent.predicts( 83 | img2base64(ob), 84 | success, 85 | ) 86 | action = [(pred['angular'], pred['velocity']), pred['viewport'], pred['interaction']] 87 | ob, reward, termination, truncation, info = env.step([action]) 88 | if info['picked']: 89 | picked_reward = 0.5 90 | 91 | if pred['interaction'] == 3: 92 | if info['picked']: 93 | success = True 94 | else: 95 | success = False 96 | else: 97 | success = True 98 | 99 | if current_step % 1 == 0: 100 | print('step', current_step) 101 | end_t = time.time() 102 | if termination: 103 | rewards.append(1) 104 | ep_times.append(end_t - start_t) 105 | break 106 | if truncation or pred['interaction'] == 4: 107 | rewards.append(picked_reward) 108 | ep_times.append(end_t - start_t) 109 | break 110 | 111 | current_step += 1 112 | except: 113 | env.close() 114 | traceback.print_exc() 115 | exit() 116 | 117 | env.close() 118 | 119 | print('Rewards', rewards) 120 | print('Times', ep_times) 121 | rewards = np.mean(rewards) 122 | ep_times = np.sum(ep_times) 123 | print('Rewards', rewards, 'Times', ep_times) 124 | 125 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | unrealcv 2 | gym==0.26 3 | matplotlib 4 | numpy==1.21.6 5 | wget 6 | opencv-python 7 | pynput 8 | docker 9 | ultralytics 10 | -------------------------------------------------------------------------------- /run.sh: -------------------------------------------------------------------------------- 1 | cd /home/admin/atec_project/ 2 | mkdir -p /home/admin/workspace/job/logs 3 | python server.py 4 | -------------------------------------------------------------------------------- /run_main.bat: -------------------------------------------------------------------------------- 1 | REM UnrealEnv need absolute path 2 | set UnrealEnv=%CD%\dataset 3 | python main.py 4 | -------------------------------------------------------------------------------- /run_main.sh: -------------------------------------------------------------------------------- 1 | DIR_PATH=`pwd` 2 | # UnrealEnv need absolute path 3 | UnrealEnv=${DIR_PATH}/dataset PYTHONPATH=./ python main.py 4 | 5 | -------------------------------------------------------------------------------- /server.py: -------------------------------------------------------------------------------- 1 | from flask import Flask, jsonify, request 2 | from solution import AlgSolution 3 | import logging 4 | import os 5 | import sys 6 | 7 | agent = AlgSolution() 8 | app = Flask(__name__) 9 | 10 | app.config['MAX_FORM_MEMORY_SIZE'] = 64 * 1024 * 1024 11 | 12 | 13 | @app.route('/step', methods=['POST']) 14 | def step(): 15 | form = request.form 16 | action = agent.predicts(**form) 17 | return jsonify(**action) 18 | 19 | @app.route('/reset', methods=['POST']) 20 | def reset(): 21 | form = request.form 22 | agent.reset(**form) 23 | return jsonify(message="success") 24 | 25 | @app.route('/synchronize', methods=['GET']) 26 | def synchroize(): 27 | return jsonify(message="success") 28 | 29 | @app.route('/stop', methods=['POST']) 30 | def stop(): 31 | msg = request.form['msg'] 32 | f = open('/home/admin/workspace/job/logs/user.log', 'a') 33 | f.write(msg) 34 | f.close() 35 | return jsonify(message="success") 36 | 37 | if __name__ == '__main__': 38 | app.run(host='0.0.0.0', port=5000, debug=True) 39 | 
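# Hypothetical local test client for the endpoints above (not part of the repository; shown only to
# illustrate the expected form fields). Note that Flask delivers form values as strings, so
# AlgSolution.reset()/predicts() receive string values when called this way:
#
#   import base64, requests
#   ob_b64 = base64.b64encode(open('frame.png', 'rb').read()).decode('utf-8')  # any PNG observation frame
#   requests.post('http://127.0.0.1:5000/reset',
#                 data={'reference_text': 'The injured person lies near the ambulance.',
#                       'reference_image': ob_b64})
#   action = requests.post('http://127.0.0.1:5000/step',
#                          data={'ob': ob_b64, 'success': 'True'}).json()
#   # action == {'angular': ..., 'velocity': ..., 'viewport': ..., 'interaction': ...}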
-------------------------------------------------------------------------------- /solution.py: -------------------------------------------------------------------------------- 1 | import time 2 | import sys 3 | import json 4 | import cv2 5 | import numpy as np 6 | import os 7 | from io import BytesIO 8 | import base64 9 | from ultralytics import YOLO 10 | # Load the pretrained YOLO model; if you have downloaded specific weights, point the path below to them 11 | 12 | class AlgSolution: 13 | 14 | def __init__(self): 15 | self.yolo_model = YOLO('checkpoints/yolo11x.pt') # 'yolov8n.pt' is the pretrained model file name for YOLOv8 nano version, replace it according to the actual situation 16 | if os.path.exists('/home/admin/workspace/job/logs/'): 17 | self.handle = open('/home/admin/workspace/job/logs/user.log', 'w') 18 | else: 19 | self.handle = open('user.log', 'w') 20 | self.handle.write("model loaded\n") 21 | self.handle.flush() 22 | self.foreward = { 23 | 'angular': 0, # [-30, 30] 24 | 'velocity': 50, # [-100, 100], 25 | 'viewport': 0, # {0: stay, 1: look up, 2: look down}, 26 | 'interaction': 0, # 27 | } 28 | self.backward = { 29 | 'angular': 0, # [-30, 30] 30 | 'velocity': -50, # [-100, 100], 31 | 'viewport': 0, # {0: stay, 1: look up, 2: look down}, 32 | 'interaction': 0, # 33 | } 34 | self.turnleft = { 35 | 'angular': -20, # [-30, 30] 36 | 'velocity': 0, # [-100, 100], 37 | 'viewport': 0, # {0: stay, 1: look up, 2: look down}, 38 | 'interaction': 0, # 39 | } 40 | self.turnright = { 41 | 'angular': 20, # [-30, 30] 42 | 'velocity': 0, # [-100, 100], 43 | 'viewport': 0, # {0: stay, 1: look up, 2: look down}, 44 | 'interaction': 0, # 45 | } 46 | self.carry = { 47 | 'angular': 0, # [-30, 30] 48 | 'velocity': 0, # [-100, 100], 49 | 'viewport': 0, # {0: keep, 1: up, 2: down}, 50 | 'interaction': 3, 51 | } 52 | self.drop = { 53 | 'angular': 0, # [-30, 30] 54 | 'velocity': 0, # [-100, 100], 55 | 'viewport': 0, # {0: keep, 1: up, 2: down}, 56 | 'interaction': 4, 57 | } 58 | self.noaction = { 59 | 'angular': 0, # [-30, 30] 60 | 'velocity': 0, # [-100, 100], 61 | 'viewport': 0, # {0: stay, 1: look up, 2: look down}, 62 | 'interaction': 0, # {0: stand, 1: jump, 2: crouch, 3: carry, 4: drop, 5: open door} 63 | } 64 | 65 | self.person_success = False 66 | self.right_times = 0 67 | self.foreward_times = 0 68 | self.backward_times = 0 69 | self.try_times = 10 70 | self.idx = 0 71 | 72 | def reset(self, reference_text=None, reference_image=None): 73 | self.person_success = False 74 | self.right_times = 0 75 | self.foreward_times = 0 76 | self.backward_times = 0 77 | self.try_times = 10 78 | self.idx = 0 79 | 80 | def predicts(self, ob,success): 81 | # person_exist = False 82 | # truck_exist = False 83 | self.handle.write('Step %d\n'%self.idx) 84 | self.handle.flush() 85 | self.idx += 1 86 | if self.idx == 200: 87 | return self.drop 88 | 89 | ob = base64.b64decode(ob) 90 | ob = cv2.imdecode(np.frombuffer(ob, np.uint8), cv2.IMREAD_COLOR) 91 | results = self.yolo_model(source=ob,imgsz=640,conf=0.1) 92 | boxes = results[0].boxes # get all detected bounding boxes 93 | if boxes.shape[0] == 0: 94 | # randomly choose a number from list[1,2] 95 | if self.right_times >= self.try_times: 96 | if self.foreward_times= self.try_times and self.foreward_times>=self.try_times and self.backward_times>=self.try_times: 122 | #self.right_times = 9 123 | self.right_times = 9 124 | self.foreward_times = 0 125 | self.backward_times= 0 126 | action = self.turnright 127 | time.sleep(2) 128 | self.handle.write(" ---------------- no object turnright ------------------\n") 129 |
self.handle.write(json.dumps(action) + '\n') 130 | self.handle.flush() 131 | return action 132 | for box in boxes: 133 | cls = int(box.cls.item()) # get class ID 134 | if self.yolo_model.names[cls] == 'person' and self.person_success is False: # check whether the detection is the 'person' category 135 | # get bounding box coordinate(x1, y1, x2, y2) 136 | #x0, y0, w_, h_ = box.xywh 137 | res_ = box.xywh 138 | self.handle.write("res_person: %s\n"%res_) 139 | self.handle.flush() 140 | 141 | x0, y0, w_, h_ = res_[0].tolist() 142 | self.handle.write("x0, y0, w_, h_: %s, %s, %s, %s\n"%(x0, y0, w_, h_)) 143 | self.handle.flush() 144 | if w_ > h_: 145 | if y0-0.5*h_ > 400: 146 | action = self.carry 147 | time.sleep(2) 148 | self.handle.write("carry !!!!!!!!!!!!!!!!!!\n") 149 | self.person_success = True 150 | self.handle.write(json.dumps(action) + '\n') 151 | self.handle.flush() 152 | return action 153 | else: 154 | if x0 < 220: 155 | action = self.turnleft 156 | time.sleep(2) 157 | self.handle.write(" ---------------- turnleft ------------------\n") 158 | self.handle.write(json.dumps(action) + '\n') 159 | self.handle.flush() 160 | return action 161 | elif x0 > 420: 162 | action = self.turnright 163 | time.sleep(2) 164 | self.handle.write(" ---------------- turnright ------------------\n") 165 | self.handle.write(json.dumps(action) + '\n') 166 | self.handle.flush() 167 | return action 168 | else: 169 | action = self.foreward 170 | time.sleep(2) 171 | self.handle.write(" ---------------- forward ------------------\n") 172 | self.handle.write(json.dumps(action) + '\n') 173 | self.handle.flush() 174 | return action 175 | elif self.yolo_model.names[cls] == 'truck' and self.person_success: # check whether the detection is the 'truck' category 176 | # get bounding box coordinate(x1, y1, x2, y2) 177 | #x0, y0, w_, h_ = box.xywh 178 | res_ = box.xywh 179 | self.handle.write("res_truck: %s\n"%res_) 180 | self.handle.flush() 181 | 182 | x0, y0, w_, h_ = res_[0].tolist() 183 | if w_ > 390: 184 | action = self.drop 185 | time.sleep(2) 186 | self.handle.write(" ============================================= drop =============================================\n") 187 | self.handle.write(json.dumps(action) + '\n') 188 | self.handle.flush() 189 | return action 190 | else: 191 | if x0 < 220: 192 | action = self.turnleft 193 | time.sleep(2) 194 | self.handle.write(" =============== turnleft ===============\n") 195 | self.handle.write(json.dumps(action) + '\n') 196 | self.handle.flush() 197 | return action 198 | elif x0 > 420: 199 | action = self.turnright 200 | self.handle.write(" =============== turnright ===============\n") 201 | time.sleep(2) 202 | self.handle.write(json.dumps(action) + '\n') 203 | self.handle.flush() 204 | return action 205 | else: 206 | action = self.foreward 207 | time.sleep(2) 208 | self.handle.write(" =============== forward ===============\n") 209 | self.handle.write(json.dumps(action) + '\n') 210 | self.handle.flush() 211 | return action 212 | 213 | if self.right_times >= self.try_times: 214 | if self.foreward_times= self.try_times and self.foreward_times>=self.try_times and self.backward_times>=self.try_times: 241 | #self.right_times = 9 242 | self.right_times = 9 243 | self.foreward_times = 0 244 | self.backward_times= 0 245 | action = self.turnright 246 | time.sleep(2) 247 | self.handle.write(" ---------------- no person or truck matched, turnright ------------------\n") 248 | self.handle.write(json.dumps(action) + '\n') 249 | self.handle.flush() 250 | return action 251 | else: 252 | action = self.noaction 253 | time.sleep(2) 254 |
self.handle.write(" ---------------- no person and truck turnright------------------\n") 255 | self.handle.write(json.dumps(action) + '\n') 256 | self.handle.flush() 257 | return action 258 | 259 | -------------------------------------------------------------------------------- /solution_random.py: -------------------------------------------------------------------------------- 1 | import random 2 | 3 | class AlgSolution: # 类名不能变 4 | 5 | def __init__(self): 6 | pass 7 | 8 | def reset(self, reference_text, reference_image): 9 | # This function is called before each new test data inference; 10 | # reference_text and reference_image are the reference text and reference image for the data, respectively. If None, it means there is none. 11 | # reference_image is a base64 encoded png image 12 | # This function completes some state initialization, cache clearing, etc. 13 | pass 14 | 15 | def predicts(self, ob, success): 16 | # ob: a base64 encoded png image with a resolution of 320*320, representing the current observation image 17 | # success: a boolean variable, True/False represents whether the last command was successfully executed 18 | # Most of the time, success is True. When the player performs the "carry" operation to lift the wounded, if it is not within the allowable distance, the action may fail. 19 | 20 | linear = random.randint(-100, 100) 21 | angular = random.randint(-30,30) 22 | 23 | action = { 24 | 'angular': angular, # [-30, 30] 25 | 'velocity': linear, # [-100, 100], 26 | 'viewport': 0, # {0: stay, 1: look up, 2: look down}, 27 | 'interaction': 0, # {0: stand, 1: jump, 2: crouch, 3: carry, 4: drop, 5: open door} 28 | } 29 | return action --------------------------------------------------------------------------------