├── .gitignore ├── README.md ├── assets └── Viz.png ├── gpt-driver ├── __init__.py ├── create_data.py ├── create_data_uniad.py ├── finetune.py ├── incontext_learning.py ├── pack_incontext_dict.py ├── prompt_message.py ├── search_invalid_tokens.py └── test.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | *.json 2 | *.pkl 3 | *.jsonl 4 | __pycache__ 5 | viz/* -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # GPT-Driver 2 | 3 | This is the repository for our arXiv preprint [GPT-Driver](https://arxiv.org/abs/2310.01415) [[Project Page](https://pointscoder.github.io/projects/gpt_driver/index.html)]. 4 | 5 | Note: Running GPT-Driver requires an [OpenAI API account](https://platform.openai.com/). 6 | 7 | Note: The code for evaluating open-loop motion planning performance on nuScenes is [here](https://drive.google.com/drive/folders/1NCqPtdK8agPi1q3sr9-8-vPdYj08OCAE?usp=sharing). 8 | 9 | ## Introduction 10 | 11 | We present a simple yet effective approach that can transform the OpenAI GPT-3.5 model into a reliable motion planner for autonomous vehicles. Motion planning is a core challenge in autonomous driving, aiming to plan a driving trajectory that is safe and comfortable. Existing motion planners predominantly leverage heuristic methods to forecast driving trajectories, yet these approaches demonstrate insufficient generalization capabilities in the face of novel and unseen driving scenarios. In this paper, we propose a novel approach to motion planning that capitalizes on the strong reasoning capabilities and generalization potential inherent to Large Language Models (LLMs). The fundamental insight of our approach is the reformulation of motion planning as a language modeling problem, a perspective not previously explored. 
Specifically, we represent the planner inputs and outputs as language tokens, and leverage the LLM to generate driving trajectories through a language description of coordinate positions. Furthermore, we propose a novel prompting-reasoning-finetuning strategy to stimulate the numerical reasoning potential of the LLM. With this strategy, the LLM can describe highly precise trajectory coordinates and also its internal decision-making process in natural language. We evaluate our approach on the large-scale nuScenes dataset, and extensive experiments substantiate the effectiveness, generalization ability, and interpretability of our GPT-based motion planner. 12 | 13 | ![GPT-Driver visualization](assets/Viz.png) 14 | 15 | ## Installation 16 | a. Clone this repository. 17 | ```shell 18 | git clone https://github.com/PointsCoder/GPT-Driver.git 19 | ``` 20 | 21 | b. Install the dependencies: 22 | 23 | ```shell 24 | pip install -r requirements.txt 25 | ``` 26 | 27 | ## Data Preparation 28 | 29 | a. We pre-cached the required information (detections, predictions, trajectories, etc.) from the nuScenes dataset (`cached_nuscenes_info.pkl`) and from pretrained UniAD models (`detection_motion_result_trainval.jsonl`). The data can be downloaded from [Google Drive](https://drive.google.com/drive/folders/1hUb1dsaDUABbUKnhj63vQBi0n4AZaZyM?usp=sharing). 30 | 31 | b. Place the downloaded data as follows: 32 | ``` 33 | GPT-Driver 34 | ├── data 35 | │ ├── cached_nuscenes_info.pkl 36 | │ ├── detection_motion_result_trainval.jsonl 37 | │ ├── split.json 38 | ├── gpt-driver 39 | ├── outputs 40 | ``` 41 | 42 | c. Fine-tuning through the OpenAI API requires a JSON Lines file (one chat example per line) that contains the prompts and answers. 
To build this `train.json` file, run: 43 | ``` 44 | python gpt-driver/create_data.py 45 | ``` 46 | This writes `train.json` to the `data` folder: 47 | ``` 48 | GPT-Driver 49 | ├── data 50 | │ ├── cached_nuscenes_info.pkl 51 | │ ├── detection_motion_result_trainval.jsonl 52 | │ ├── split.json 53 | │ ├── train.json 54 | ├── gpt-driver 55 | ├── outputs 56 | ``` 57 | 58 | ## Fine-Tuning via OpenAI API 59 | 60 | a. To fine-tune your own model, first register an [OpenAI API account](https://platform.openai.com/). 61 | 62 | b. After registration, generate an API key in your account settings. For example: 63 | 64 | ``` 65 | openai.api_key = "sk-I**p" 66 | ``` 67 | Set this key wherever the code requires it. The key is personal and linked to your billing, so keep it confidential and never share it! 68 | 69 | c. To submit a fine-tuning job to OpenAI, run the following commands in a Python console (they use the pre-1.0 `openai` Python SDK interface): 70 | ```python 71 | import openai 72 | openai.api_key = "sk-I**p" 73 | 74 | # Upload data/train.json (written by create_data.py) to the OpenAI server; this may take some time. 75 | response = openai.File.create(file=open("data/train.json", "rb"), purpose='fine-tune') 76 | 77 | # Get the file ID. The uploaded file may need a few minutes of processing before it can be used. 78 | train_file_id = response["id"] 79 | 80 | # Launch a fine-tuning job. Fine-tuning takes several hours to complete. 81 | response = openai.FineTuningJob.create(training_file=train_file_id, model="gpt-3.5-turbo", hyperparameters={"n_epochs": 1}) 82 | 83 | # Optionally, check the status of the fine-tuning job. 84 | finetune_job_id = response["id"] 85 | openai.FineTuningJob.retrieve(finetune_job_id) 86 | ``` 87 | These commands are also in `gpt-driver/finetune.py`. 88 | 89 | **Note:** Fine-tuning costs money; see the [pricing page](https://openai.com/pricing). As a rough guide, fine-tuning on the full nuScenes training set for one epoch (about 10M tokens) costs around 80 USD.
You can use shorter prompts to reduce the cost. 90 | 91 | d. When your fine-tuning job completes, you will receive an email with the ID of your fine-tuned model, for example: 92 | ``` 93 | ft:gpt-3.5-turbo-0613:**::8**O 94 | ``` 95 | This model ID identifies your GPT motion planner and is used for evaluation. 96 | 97 | ## Evaluation 98 | 99 | a. With your model ID, run the following command to generate motion planning results for the nuScenes validation set: 100 | ``` 101 | python gpt-driver/test.py -i your_model_id -o your_output_file_name 102 | ``` 103 | This produces `your_output_file_name.pkl`, a `Dict[token, np.ndarray]` that maps each test sample token to its 3-second planned trajectory, stored as a `(6, 2)` array of waypoints. This pickle file can be used directly for evaluation on nuScenes. 104 | 105 | b. The code and data for evaluating motion planning performance on nuScenes are available [here](https://drive.google.com/drive/folders/1NCqPtdK8agPi1q3sr9-8-vPdYj08OCAE?usp=sharing).
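Both `incontext_learning.py` and `pack_incontext_dict.py` recover each planned trajectory the same way. They take the last line of the model's reply and pass it to `ast.literal_eval`. `pack_incontext_dict.py` also discards any result that is not 6×2.

The block below is a minimal sketch of that parsing step as a standalone helper, under a few assumptions:

- `parse_trajectory` is a hypothetical name, not a function in this repo.
- The sketch assumes the reply ends with a line in the `[(x1,y1), ..., (x6,y6)]` format written by `generate_assistant_message`.
- Unlike the repo scripts, it strips trailing whitespace before taking the last line.
- It uses only the standard library and returns plain `(x, y)` tuples. Calling `np.array(...)` on the result gives the `(6, 2)` array stored in the output pickle.

```python
import ast
from typing import List, Optional, Tuple

Waypoint = Tuple[float, float]


def parse_trajectory(response: str, num_waypoints: int = 6) -> Optional[List[Waypoint]]:
    """Hypothetical helper: parse the planned trajectory from a planner reply.

    Assumes the reply's last line is a Python-literal list such as
    "[(0.00,2.80), (0.01,5.61), ...]". Returns None when that line cannot
    be parsed or does not hold exactly `num_waypoints` numeric (x, y) pairs.
    """
    # Strip trailing whitespace so a trailing newline does not yield an empty last line.
    last_line = response.strip().split("\n")[-1]
    try:
        points = ast.literal_eval(last_line)
    except (ValueError, SyntaxError, TypeError):
        return None  # not a valid Python literal

    if not isinstance(points, (list, tuple)) or len(points) != num_waypoints:
        return None  # wrong container type or wrong number of waypoints

    trajectory: List[Waypoint] = []
    for point in points:
        if not isinstance(point, (list, tuple)) or len(point) != 2:
            return None  # each waypoint must be an (x, y) pair
        x, y = point
        if not all(isinstance(v, (int, float)) for v in (x, y)):
            return None  # coordinates must be numbers
        trajectory.append((float(x), float(y)))
    return trajectory


if __name__ == "__main__":
    # Made-up reply for illustration only (not real model output).
    reply = (
        "Meta Action: MOVE FORWARD WITH A CONSTANT SPEED\n"
        "Trajectory:\n"
        "[(0.00,2.80), (0.01,5.61), (0.02,8.42), (0.03,11.23), (0.04,14.05), (0.05,16.87)]"
    )
    print(parse_trajectory(reply))  # a list of six (x, y) tuples
    print(parse_trajectory("Trajectory:\n[(0.00,2.80)]"))  # too few waypoints, prints None
```

Returning `None` instead of raising lets a batch loop record the token as invalid and move on. This mirrors how both scripts skip unparsable replies.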
106 | 107 | ## Citation 108 | If you find this project useful in your research, please consider citing: 109 | 110 | ``` 111 | @article{gptdriver, 112 | title={GPT-Driver: Learning to Drive with GPT}, 113 | author={Mao, Jiageng and Qian, Yuxi and Zhao, Hang and Wang, Yue}, 114 | year={2023} 115 | } 116 | ``` 117 | -------------------------------------------------------------------------------- /assets/Viz.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PointsCoder/GPT-Driver/6294144d196df7d8a447242ebe7c5e5a1d0ff565/assets/Viz.png -------------------------------------------------------------------------------- /gpt-driver/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PointsCoder/GPT-Driver/6294144d196df7d8a447242ebe7c5e5a1d0ff565/gpt-driver/__init__.py -------------------------------------------------------------------------------- /gpt-driver/create_data.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import ndjson 3 | import json 4 | import tiktoken 5 | from prompt_message import system_message, generate_user_message, generate_assistant_message 6 | 7 | data = pickle.load(open('data/cached_nuscenes_info.pkl', 'rb')) 8 | split = json.load(open('data/split.json', 'r')) 9 | 10 | train_tokens = split["train"] 11 | val_tokens = split["val"] 12 | num_train_samples = len(train_tokens) 13 | train_ratio = 1 14 | 15 | encoding = tiktoken.encoding_for_model("gpt-3.5-turbo") 16 | 17 | num_language_tokens = 0 18 | num_system_tokens = 0 19 | num_user_tokens = 0 20 | num_assistant_tokens = 0 21 | 22 | traj_only = False 23 | 24 | train_messages = [] 25 | for token_i, token in enumerate(train_tokens): 26 | if token_i >= train_ratio * num_train_samples: 27 | break 28 | user_message = generate_user_message(data, token) 29 | assitant_message = generate_assistant_message(data,
token, traj_only=traj_only) 30 | if len(assitant_message.split("\n")) > 6: 31 | print() 32 | print(token) 33 | print(system_message) 34 | print(user_message) 35 | print(assitant_message) 36 | num_language_tokens += len(encoding.encode(system_message)) 37 | num_system_tokens += len(encoding.encode(system_message)) 38 | num_language_tokens += len(encoding.encode(user_message)) 39 | num_user_tokens += len(encoding.encode(user_message)) 40 | num_language_tokens += len(encoding.encode(assitant_message)) 41 | num_assistant_tokens += len(encoding.encode(assitant_message)) 42 | 43 | 44 | train_message = {"messages": 45 | [ 46 | {"role": "system", "content": system_message}, 47 | {"role": "user", "content": user_message}, 48 | {"role": "assistant", "content": assitant_message} 49 | ] 50 | } 51 | train_messages.append(train_message) 52 | 53 | print("#### Cost Summarization ####") 54 | print(f"Number of system tokens: {num_system_tokens}") 55 | print(f"Number of user tokens: {num_user_tokens}") 56 | print(f"Number of assistant tokens: {num_assistant_tokens}") 57 | print(f"Number of total tokens: {num_language_tokens}") 58 | 59 | with open("data/train.json", "w") as f: 60 | ndjson.dump(train_messages, f) -------------------------------------------------------------------------------- /gpt-driver/create_data_uniad.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import ndjson 3 | import json 4 | import tiktoken 5 | import numpy as np 6 | from prompt_message import system_message, generate_user_message, generate_assistant_message 7 | 8 | def jsonl_to_dict(filename): 9 | data_dict = {} 10 | with open(filename, 'r') as file: 11 | for line in file: 12 | json_obj = json.loads(line.strip()) 13 | token = json_obj['token'] 14 | data_dict[token] = json_obj['detections'] 15 | return data_dict 16 | 17 | data = pickle.load(open('data/cached_nuscenes_info.pkl', 'rb')) 18 | uniad_perceptions = 
jsonl_to_dict('data/detection_motion_result_trainval.jsonl') 19 | split = json.load(open('data/split.json', 'r')) 20 | 21 | train_tokens = split["train"] 22 | val_tokens = split["val"] 23 | num_train_samples = len(train_tokens) 24 | train_ratio = 1 25 | 26 | encoding = tiktoken.encoding_for_model("gpt-3.5-turbo") 27 | 28 | num_language_tokens = 0 29 | num_system_tokens = 0 30 | num_user_tokens = 0 31 | num_assistant_tokens = 0 32 | 33 | traj_only = False 34 | 35 | train_messages = [] 36 | for token_i, token in enumerate(train_tokens): 37 | if token_i >= train_ratio * num_train_samples: 38 | break 39 | 40 | uniad_perception = uniad_perceptions[token] # list of dicts 41 | uniad_boxes, uniad_names, uniad_trajs = [], [], [] 42 | for obj in uniad_perception: 43 | uniad_names.append(obj['name']) 44 | uniad_boxes.append(obj['box']) 45 | box_center = np.array(obj['box'][:2]) # (2) 46 | full_traj = np.array(obj['traj'][:6]) # [6, 2] 47 | rel_traj = full_traj - box_center[None,:] 48 | rel_traj = np.concatenate([np.zeros((1,2)), rel_traj], axis=0) # [7, 2] 49 | rel_diff_traj = rel_traj[1:] - rel_traj[:-1] # [6, 2] 50 | uniad_trajs.append(rel_diff_traj) 51 | if len(uniad_trajs) == 0: 52 | data[token]['gt_agent_fut_trajs'] = np.zeros((0,6,2)) # [num_objs, 6, 2] 53 | data[token]['gt_boxes'] = np.zeros((0,9)) 54 | data[token]['gt_names'] = np.array([]) 55 | else: 56 | data[token]['gt_boxes'] = np.array(uniad_boxes) 57 | data[token]['gt_names'] = np.array(uniad_names) 58 | data[token]['gt_agent_fut_trajs'] = np.stack(uniad_trajs, axis=0) # [num_objs, 6, 2] 59 | data[token]['gt_agent_fut_masks'] = np.ones((len(uniad_boxes), 6)) # [num_objs, 6] 60 | 61 | user_message = generate_user_message(data, token) 62 | assitant_message = generate_assistant_message(data, token, traj_only=traj_only) 63 | if len(assitant_message.split("\n")) > 6: 64 | print() 65 | print(token) 66 | print(system_message) 67 | print(user_message) 68 | print(assitant_message) 69 | num_language_tokens += 
len(encoding.encode(system_message)) 70 | num_system_tokens += len(encoding.encode(system_message)) 71 | num_language_tokens += len(encoding.encode(user_message)) 72 | num_user_tokens += len(encoding.encode(user_message)) 73 | num_language_tokens += len(encoding.encode(assitant_message)) 74 | num_assistant_tokens += len(encoding.encode(assitant_message)) 75 | 76 | 77 | train_message = {"messages": 78 | [ 79 | {"role": "system", "content": system_message}, 80 | {"role": "user", "content": user_message}, 81 | {"role": "assistant", "content": assitant_message} 82 | ] 83 | } 84 | train_messages.append(train_message) 85 | 86 | print("#### Cost Summarization ####") 87 | print(f"Number of system tokens: {num_system_tokens}") 88 | print(f"Number of user tokens: {num_user_tokens}") 89 | print(f"Number of assistant tokens: {num_assistant_tokens}") 90 | print(f"Number of total tokens: {num_language_tokens}") 91 | 92 | with open("data/train_uniad.json", "w") as f: 93 | ndjson.dump(train_messages, f) -------------------------------------------------------------------------------- /gpt-driver/finetune.py: -------------------------------------------------------------------------------- 1 | import openai 2 | 3 | openai.api_key = "" # insert your API key here 4 | 5 | response = openai.File.create(file=open("data/train.json", "rb"), purpose='fine-tune') # file written by create_data.py 6 | print(response) 7 | train_file_id = response["id"] 8 | 9 | response = openai.FineTuningJob.create(training_file=train_file_id, model="gpt-3.5-turbo", hyperparameters={"n_epochs": 1}) 10 | print(response) 11 | finetune_job_id = response["id"] 12 | 13 | # List up to 10 fine-tuning jobs 14 | print(openai.FineTuningJob.list(limit=10)) 15 | 16 | # Retrieve the state of the fine-tuning job 17 | print(openai.FineTuningJob.retrieve(finetune_job_id)) 18 | 19 | # List up to 10 events from the fine-tuning job 20 | print(openai.FineTuningJob.list_events(id=finetune_job_id, limit=10)) -------------------------------------------------------------------------------- 
/gpt-driver/incontext_learning.py: -------------------------------------------------------------------------------- 1 | import openai 2 | import pickle 3 | import json 4 | import ast 5 | import tiktoken 6 | import numpy as np 7 | import time 8 | import argparse 9 | from prompt_message import system_message, generate_user_message, generate_assistant_message, generate_incontext_message 10 | from tenacity import ( 11 | retry, 12 | stop_after_attempt, 13 | wait_random_exponential, 14 | ) # for exponential backoff 15 | 16 | @retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6)) 17 | def completion_with_backoff(**kwargs): 18 | return openai.ChatCompletion.create(**kwargs) 19 | 20 | parser = argparse.ArgumentParser(description="GPT-Driver in-context learning test.") 21 | parser.add_argument("-o", "--output", type=str, required=True, help="output file name (without extension), written under outputs/") 22 | args = parser.parse_args() 23 | 24 | encoding = tiktoken.encoding_for_model("gpt-3.5-turbo") 25 | 26 | saved_traj_name = "outputs/" + args.output + ".pkl" 27 | saved_text_name = "outputs/" + args.output + "_text.pkl" 28 | temp_text_name = "outputs/" + args.output + "_temp.jsonl" 29 | 30 | openai.api_key = "" # insert your API key here 31 | 32 | data = pickle.load(open('data/cached_nuscenes_info.pkl', 'rb')) 33 | split = json.load(open('data/split.json', 'r')) 34 | 35 | train_tokens = split["train"] 36 | test_tokens = split["val"] 37 | 38 | text_dict, traj_dict = {}, {} 39 | 40 | invalid_tokens = [] 41 | 42 | untest_tokens = [ 43 | ] 44 | 45 | num_incontext_prompts = 5 46 | 47 | for token_index, token in enumerate(test_tokens): 48 | if len(untest_tokens) > 0 and token not in untest_tokens: 49 | continue 50 | 51 | print() 52 | print(token) 53 | 54 | time.sleep(1) 55 | incontext_message = "" 56 | for i in range(num_incontext_prompts): 57 | train_token_id = token_index * num_incontext_prompts + i 58 | if train_token_id >= len(train_tokens): 59 | train_token_id = train_token_id % len(train_tokens) 60 | train_token = train_tokens[train_token_id] 61
| incontext_message += generate_incontext_message(data, train_token) 62 | system_incontext_message = system_message + incontext_message 63 | user_message = generate_user_message(data, token) 64 | 65 | num_system_tokens = len(encoding.encode(system_incontext_message)) 66 | num_user_tokens = len(encoding.encode(user_message)) 67 | if num_system_tokens + num_user_tokens > 4096: # overflow: drop the in-context examples 68 | system_incontext_message = system_message 69 | num_system_tokens = len(encoding.encode(system_incontext_message)) 70 | if num_system_tokens + num_user_tokens > 4096: # overflow again: drop the system message too 71 | system_incontext_message = "" 72 | 73 | assitant_message = generate_assistant_message(data, token) 74 | # print(f"System:\n {system_incontext_message}") 75 | completion = completion_with_backoff( 76 | model="gpt-3.5-turbo", 77 | messages=[ 78 | {"role": "system", "content": system_incontext_message}, 79 | {"role": "user", "content": user_message}, 80 | ] 81 | ) 82 | # import pdb; pdb.set_trace() 83 | result = completion.choices[0].message["content"] 84 | print("#### Result ####") 85 | print(f"GPT Planner:\n {result}") 86 | print(f"Ground Truth:\n {assitant_message}") 87 | output_dict = { 88 | "token": token, 89 | "GPT": result, 90 | "GT": assitant_message, 91 | } 92 | 93 | text_dict[token] = result 94 | 95 | traj = result.split("\n")[-1] 96 | try: 97 | traj = ast.literal_eval(traj) 98 | traj = np.array(traj) 99 | except Exception: 100 | print(f"Invalid token: {token}") 101 | invalid_tokens.append(token) 102 | continue 103 | traj_dict[token] = traj 104 | 105 | with open(temp_text_name, "a+") as file: 106 | file.write(json.dumps(output_dict) + '\n') 107 | 108 | # output_dicts = [] 109 | # with open(temp_text_name, "r") as file: 110 | # for line in file: 111 | # output_dicts.append(json.loads(line)) 112 | 113 | if len(untest_tokens) > 0: 114 | exist_dict = pickle.load(open(saved_traj_name, 'rb')) 115 | exist_dict.update(traj_dict) 116 | with open(saved_traj_name, "wb") as fd: 117 | pickle.dump(exist_dict, fd) 118
| 119 | print("#### Invalid Tokens ####") 120 | for token in invalid_tokens: 121 | print(token) 122 | 123 | if len(untest_tokens) == 0: 124 | with open(saved_text_name, "wb") as f: 125 | pickle.dump(text_dict, f) 126 | with open(saved_traj_name, "wb") as f: 127 | pickle.dump(traj_dict, f) -------------------------------------------------------------------------------- /gpt-driver/pack_incontext_dict.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import json 3 | import ast 4 | import numpy as np 5 | 6 | filename = "outputs/gpt_incontext_temp.jsonl" 7 | data_dict = {} 8 | with open(filename, 'r') as file: 9 | for line in file: 10 | json_obj = json.loads(line.strip()) 11 | token = json_obj['token'] 12 | try: 13 | gpt_text = json_obj['GPT'] 14 | traj = gpt_text.split("\n")[-1] 15 | traj = ast.literal_eval(traj) 16 | traj = np.array(traj) 17 | if traj.shape != (6, 2): # expect 6 (x, y) waypoints 18 | print(f"Invalid token: {token}") 19 | continue 20 | data_dict[token] = traj 21 | except Exception: 22 | print(f"Invalid token: {token}") 23 | continue 24 | 25 | with open("outputs/gpt_incontext.pkl", "wb") as f: 26 | pickle.dump(data_dict, f) 27 | -------------------------------------------------------------------------------- /gpt-driver/prompt_message.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | system_message = """ 4 | **Autonomous Driving Planner** 5 | Role: You are the brain of an autonomous vehicle. Plan a safe 3-second driving trajectory. Avoid collisions with other objects. 6 | 7 | Context 8 | - Coordinates: X-axis is perpendicular, and Y-axis is parallel to the direction you're facing. You're at point (0,0). 9 | - Objective: Create a 3-second route using 6 waypoints, one every 0.5 seconds. 10 | 11 | Inputs 12 | 1. Perception & Prediction: Info about surrounding objects and their predicted movements. 13 | 2.
Historical Trajectory: Your past 2-second route, given by 4 waypoints. 14 | 3. Ego-States: Your current state including velocity, heading angular velocity, can bus data, heading speed, and steering signal. 15 | 4. Mission Goal: Goal location for the next 3 seconds. 16 | 17 | Task 18 | - Thought Process: Note down critical objects and potential effects from your perceptions and predictions. 19 | - Action Plan: Detail your meta-actions based on your analysis. 20 | - Trajectory Planning: Develop a safe and feasible 3-second route using 6 new waypoints. 21 | 22 | Output 23 | - Thoughts: 24 | - Notable Objects 25 | Potential Effects 26 | - Meta Action 27 | - Trajectory (MOST IMPORTANT): 28 | - [(x1,y1), (x2,y2), ... , (x6,y6)] 29 | """ 30 | 31 | system_message_cot = """ 32 | **Autonomous Driving Planner** 33 | Role: You are the brain of an autonomous vehicle. Plan a safe 3-second driving trajectory. Avoid collisions with other objects. 34 | 35 | Output 36 | - Thoughts: identify critical objects and potential effects from perceptions and predictions. 37 | - Meta Action 38 | - Trajectory (MOST IMPORTANT): 6 waypoints, one every 0.5 seconds 39 | - [(x1,y1), (x2,y2), ... , (x6,y6)] 40 | """ 41 | 42 | system_message_short = """ 43 | **Autonomous Driving Planner** 44 | Role: You are the brain of an autonomous vehicle. Plan a safe 3-second driving trajectory. Avoid collisions with other objects. 45 | 46 | Output 47 | - Trajectory (MOST IMPORTANT): 6 waypoints, one every 0.5 seconds 48 | - [(x1,y1), (x2,y2), ... 
, (x6,y6)] 49 | """ 50 | 51 | def generate_user_message(data, token, perception_range=20.0, short=True): 52 | 53 | # user_message = f"You have received new input data to help you plan your route.\n" 54 | user_message = f"\n" 55 | 56 | data_dict = data[token] 57 | 58 | """ 59 | Perception and Prediction Outputs: 60 | object_boxes: [N, 7] 61 | object_names: [N] 62 | object_velocity: [N, 2] 63 | object_rel_fut_trajs: [N, 12] # diff movements in their local frames 64 | object_fut_mask: [N, 6] 65 | """ 66 | object_boxes = data_dict['gt_boxes'] 67 | object_names = data_dict['gt_names'] 68 | # object_velocity = data_dict['gt_velocity'] 69 | object_rel_fut_trajs = data_dict['gt_agent_fut_trajs'].reshape(-1, 6, 2) 70 | object_fut_trajs = np.cumsum(object_rel_fut_trajs, axis=1) + object_boxes[:, None, :2] 71 | object_fut_mask = data_dict['gt_agent_fut_masks'] 72 | user_message += f"Perception and Prediction:\n" 73 | num_objects = object_boxes.shape[0] 74 | for i in range(num_objects): 75 | if ((object_fut_trajs[i, :, 1] <= 0).all()) and (object_boxes[i, 1] <= 0): # Y <= 0 throughout: the object stays behind the ego vehicle, so skip it 76 | continue 77 | if ((np.abs(object_fut_trajs[i, :, :]) > perception_range).any()) or (np.abs(object_boxes[i, :2]) > perception_range).any(): # skip far-away objects (beyond perception_range, 20 m by default) to keep the prompt short 78 | continue 79 | if not short: 80 | object_name = object_names[i] 81 | ox, oy = object_boxes[i, :2] 82 | user_message += f" - {object_name} at ({ox:.2f},{oy:.2f}). 
" 83 | user_message += f"Future trajectory: [" 84 | prediction_ts = 6 85 | for t in range(prediction_ts): 86 | if object_fut_mask[i, t] > 0: 87 | ox, oy = object_fut_trajs[i, t] 88 | user_message += f"({ox:.2f},{oy:.2f})" 89 | else: 90 | ox, oy = "UN", "UN" 91 | user_message += f"({ox},{oy})" 92 | if t != prediction_ts -1: 93 | user_message += f", " 94 | user_message += f"]\n" 95 | else: 96 | object_name = object_names[i] 97 | object_name = object_name.split(".")[-1] 98 | ox, oy = object_boxes[i, :2] 99 | user_message += f" - {object_name} at ({ox:.2f},{oy:.2f}), " 100 | ex, ey = object_fut_trajs[i, -1] 101 | if object_fut_mask[i, -1] > 0: 102 | user_message += f"moving to ({ex:.2f},{ey:.2f}).\n" 103 | else: 104 | user_message += f"moving to unknown location.\n" 105 | 106 | """ 107 | Ego-States: 108 | gt_ego_lcf_feat: [vx, vy, ?, ?, v_yaw (rad/s), ego_length, ego_width, v0 (vy from canbus), Kappa (steering)] 109 | """ 110 | vx = data_dict['gt_ego_lcf_feat'][0]*0.5 # scale velocity to meters per 0.5 s (one waypoint interval) 111 | vy = data_dict['gt_ego_lcf_feat'][1]*0.5 112 | v_yaw = data_dict['gt_ego_lcf_feat'][4] 113 | ax = data_dict['gt_ego_his_diff'][-1, 0] - data_dict['gt_ego_his_diff'][-2, 0] 114 | ay = data_dict['gt_ego_his_diff'][-1, 1] - data_dict['gt_ego_his_diff'][-2, 1] 115 | cx = data_dict['gt_ego_lcf_feat'][2] 116 | cy = data_dict['gt_ego_lcf_feat'][3] 117 | vhead = data_dict['gt_ego_lcf_feat'][7]*0.5 118 | steering = data_dict['gt_ego_lcf_feat'][8] 119 | user_message += f"Ego-States:\n" 120 | user_message += f" - Velocity (vx,vy): ({vx:.2f},{vy:.2f})\n" 121 | user_message += f" - Heading Angular Velocity (v_yaw): ({v_yaw:.2f})\n" 122 | user_message += f" - Acceleration (ax,ay): ({ax:.2f},{ay:.2f})\n" 123 | user_message += f" - Can Bus: ({cx:.2f},{cy:.2f})\n" 124 | user_message += f" - Heading Speed: ({vhead:.2f})\n" 125 | user_message += f" - Steering: ({steering:.2f})\n" 126 | 127 | """ 128 | Historical Trajectory: 129 | gt_ego_his_trajs: [5, 2] last 2 seconds 130 | gt_ego_his_diff: [4, 2] last 2 seconds,
differential format, viewed as velocity 131 | """ 132 | xh1 = data_dict['gt_ego_his_trajs'][0][0] 133 | yh1 = data_dict['gt_ego_his_trajs'][0][1] 134 | xh2 = data_dict['gt_ego_his_trajs'][1][0] 135 | yh2 = data_dict['gt_ego_his_trajs'][1][1] 136 | xh3 = data_dict['gt_ego_his_trajs'][2][0] 137 | yh3 = data_dict['gt_ego_his_trajs'][2][1] 138 | xh4 = data_dict['gt_ego_his_trajs'][3][0] 139 | yh4 = data_dict['gt_ego_his_trajs'][3][1] 140 | user_message += f"Historical Trajectory (last 2 seconds):" 141 | user_message += f" [({xh1:.2f},{yh1:.2f}), ({xh2:.2f},{yh2:.2f}), ({xh3:.2f},{yh3:.2f}), ({xh4:.2f},{yh4:.2f})]\n" 142 | 143 | """ 144 | Mission goal: 145 | gt_ego_fut_cmd 146 | """ 147 | cmd_vec = data_dict['gt_ego_fut_cmd'] 148 | right, left, forward = cmd_vec 149 | if right > 0: 150 | mission_goal = "RIGHT" 151 | elif left > 0: 152 | mission_goal = "LEFT" 153 | else: 154 | assert forward > 0 155 | mission_goal = "FORWARD" 156 | user_message += f"Mission Goal: " 157 | user_message += f"{mission_goal}\n" 158 | 159 | return user_message 160 | 161 | def generate_assistant_message(data, token, traj_only = False): 162 | 163 | data_dict = data[token] 164 | if traj_only: 165 | assitant_message = "" 166 | else: 167 | assitant_message = generate_chain_of_thoughts(data_dict) 168 | 169 | x1 = data_dict['gt_ego_fut_trajs'][1][0] 170 | x2 = data_dict['gt_ego_fut_trajs'][2][0] 171 | x3 = data_dict['gt_ego_fut_trajs'][3][0] 172 | x4 = data_dict['gt_ego_fut_trajs'][4][0] 173 | x5 = data_dict['gt_ego_fut_trajs'][5][0] 174 | x6 = data_dict['gt_ego_fut_trajs'][6][0] 175 | y1 = data_dict['gt_ego_fut_trajs'][1][1] 176 | y2 = data_dict['gt_ego_fut_trajs'][2][1] 177 | y3 = data_dict['gt_ego_fut_trajs'][3][1] 178 | y4 = data_dict['gt_ego_fut_trajs'][4][1] 179 | y5 = data_dict['gt_ego_fut_trajs'][5][1] 180 | y6 = data_dict['gt_ego_fut_trajs'][6][1] 181 | if not traj_only: 182 | assitant_message += f"Trajectory:\n" 183 | assitant_message += f"[({x1:.2f},{y1:.2f}), ({x2:.2f},{y2:.2f}), 
({x3:.2f},{y3:.2f}), ({x4:.2f},{y4:.2f}), ({x5:.2f},{y5:.2f}), ({x6:.2f},{y6:.2f})]" 184 | # assitant_message += f"[ {x1:.2f},{x2:.2f},{x3:.2f},{x4:.2f},{x5:.2f},{x6:.2f},{y1:.2f},{y2:.2f},{y3:.2f},{y4:.2f},{y5:.2f},{y6:.2f} ]" 185 | return assitant_message 186 | 187 | def generate_chain_of_thoughts(data_dict, perception_range=20.0, short=True): 188 | """ 189 | Generate chain of thoughts reasoning and prompting by simple rules 190 | """ 191 | ego_fut_trajs = data_dict['gt_ego_fut_trajs'] 192 | ego_his_trajs = data_dict['gt_ego_his_trajs'] 193 | ego_fut_diff = data_dict['gt_ego_fut_diff'] 194 | ego_his_diff = data_dict['gt_ego_his_diff'] 195 | vx = data_dict['gt_ego_lcf_feat'][0]*0.5 196 | vy = data_dict['gt_ego_lcf_feat'][1]*0.5 197 | ax = data_dict['gt_ego_his_diff'][-1, 0] - data_dict['gt_ego_his_diff'][-2, 0] 198 | ay = data_dict['gt_ego_his_diff'][-1, 1] - data_dict['gt_ego_his_diff'][-2, 1] 199 | ego_estimate_velos = [ 200 | [0, 0], 201 | [vx, vy], 202 | [vx+ax, vy+ay], 203 | [vx+2*ax, vy+2*ay], 204 | [vx+3*ax, vy+3*ay], 205 | [vx+4*ax, vy+4*ay], 206 | [vx+5*ax, vy+5*ay], 207 | ] 208 | ego_estimate_trajs = np.cumsum(ego_estimate_velos, axis=0) # [7, 2] 209 | # print(ego_estimate_trajs) 210 | object_boxes = data_dict['gt_boxes'] 211 | object_names = data_dict['gt_names'] 212 | 213 | object_rel_fut_trajs = data_dict['gt_agent_fut_trajs'].reshape(-1, 6, 2) 214 | object_fut_trajs = np.cumsum(object_rel_fut_trajs, axis=1) + object_boxes[:, None, :2] 215 | object_fut_trajs = np.concatenate([object_boxes[:, None, :2], object_fut_trajs], axis=1) 216 | object_fut_mask = data_dict['gt_agent_fut_masks'] 217 | num_objects = object_boxes.shape[0] 218 | 219 | num_future_horizon = 7 # include current 220 | object_collisons = np.zeros((num_objects, num_future_horizon)) 221 | for i in range(num_objects): 222 | if (object_fut_trajs[i, :, 1] <= 0).all(): # negative Y, meaning the object is always behind us, we don't care 223 | continue 224 | if (np.abs(object_fut_trajs[i, :, :]) 
> perception_range).any(): # filter faraway (> 20m) objects in case there are too many outputs 225 | continue 226 | for t in range(num_future_horizon): 227 | mask = object_fut_mask[i, t-1] > 0 if t > 0 else True 228 | if not mask: continue 229 | ego_x, ego_y = ego_estimate_trajs[t] 230 | object_x, object_y = object_fut_trajs[i, t] 231 | size_x, size_y = object_boxes[i, 3:5] * 0.5 # half size 232 | collision = collision_detection(ego_x, ego_y, 0.925, 2.04, object_x, object_y, size_x, size_y) 233 | if collision: 234 | object_collisons[i, t] = 1 235 | # import pdb; pdb.set_trace() 236 | break 237 | 238 | assitant_message = f"Thoughts:\n" 239 | if (object_collisons==0).all(): # nothing to care about 240 | assitant_message += f" - Notable Objects from Perception: None\n" 241 | assitant_message += f" Potential Effects from Prediction: None\n" 242 | # assitant_message += f" Nothing to care.\n" 243 | else: 244 | for i in range(num_objects): 245 | for t in range(num_future_horizon): 246 | if object_collisons[i, t] > 0: 247 | object_name = object_names[i] 248 | if short: 249 | object_name = object_name.split(".")[-1] 250 | ox, oy = object_boxes[i, :2] 251 | time = t*0.5 252 | # assitant_message += f" ################################################################################\n" 253 | assitant_message += f" - Notable Objects from Perception: {object_name} at ({ox:.2f},{oy:.2f})\n" 254 | assitant_message += f" Potential Effects from Prediction: within the safe zone of the ego-vehicle at the {time}-second timestep\n" 255 | meta_action = generate_meta_action( 256 | ego_fut_diff=ego_fut_diff, 257 | ego_fut_trajs=ego_fut_trajs, 258 | ego_his_diff=ego_his_diff, 259 | ego_his_trajs=ego_his_trajs 260 | ) 261 | assitant_message += ("Meta Action: " + meta_action) 262 | return assitant_message 263 | 264 | def collision_detection(x1, y1, sx1, sy1, x2, y2, sx2, sy2, x_space=1.0, y_space=3.0): # safe distance 265 | if (np.abs(x1-x2) < sx1+sx2+x_space) and (y2 > y1) and (y2 - y1 < 
sy1+sy2+y_space): # in front of you 266 | return True 267 | else: 268 | return False 269 | 270 | def generate_meta_action( 271 | ego_fut_diff, 272 | ego_fut_trajs, 273 | ego_his_diff, 274 | ego_his_trajs, 275 | ): 276 | meta_action = "" 277 | 278 | # speed meta 279 | constant_eps = 0.5 280 | his_velos = np.linalg.norm(ego_his_diff, axis=1) 281 | fut_velos = np.linalg.norm(ego_fut_diff, axis=1) 282 | cur_velo = his_velos[-1] 283 | end_velo = fut_velos[-1] 284 | 285 | if cur_velo < constant_eps and end_velo < constant_eps: 286 | speed_meta = "stop" 287 | elif end_velo < constant_eps: 288 | speed_meta = "a deceleration to zero" 289 | elif np.abs(end_velo - cur_velo) < constant_eps: 290 | speed_meta = "a constant speed" 291 | else: 292 | if cur_velo > end_velo: 293 | if cur_velo > 2 * end_velo: 294 | speed_meta = "a quick deceleration" 295 | else: 296 | speed_meta = "a deceleration" 297 | else: 298 | if end_velo > 2 * cur_velo: 299 | speed_meta = "a quick acceleration" 300 | else: 301 | speed_meta = "an acceleration" 302 | 303 | # behavior_meta 304 | if speed_meta == "stop": 305 | meta_action += (speed_meta + "\n") 306 | return meta_action.upper() 307 | else: 308 | forward_th = 2.0 309 | lane_changing_th = 4.0 310 | if (np.abs(ego_fut_trajs[:, 0]) < forward_th).all(): 311 | behavior_meta = "move forward" 312 | else: 313 | if ego_fut_trajs[-1, 0] < 0: # left 314 | if np.abs(ego_fut_trajs[-1, 0]) > lane_changing_th: 315 | behavior_meta = "turn left" 316 | else: 317 | behavior_meta = "change lane to left" 318 | elif ego_fut_trajs[-1, 0] > 0: # right 319 | if np.abs(ego_fut_trajs[-1, 0]) > lane_changing_th: 320 | behavior_meta = "turn right" 321 | else: 322 | behavior_meta = "change lane to right" 323 | else: 324 | raise ValueError(f"Undefined behaviors: {ego_fut_trajs}") 325 | 326 | # heading-based rules 327 | # ego_fut_headings = np.arctan(ego_fut_diff[:,0]/(ego_fut_diff[:,1]+1e-4))*180/np.pi # in degree 328 | # ego_his_headings =
np.arctan(ego_his_diff[:,0]/(ego_his_diff[:,1]+1e-4))*180/np.pi # in degree 329 | 330 | # forward_heading_th = 5 # forward heading is always near 0 331 | # turn_heading_th = 45 332 | 333 | # if (np.abs(ego_fut_headings) < forward_heading_th).all(): 334 | # behavior_meta = "move forward" 335 | # else: 336 | # # we extract a 5-s curve, if the largest heading change is above 45 degrees, we view it as turn 337 | # curve_headings = np.concatenate([ego_his_headings, ego_fut_headings]) 338 | # min_heading, max_heading = curve_headings.min(), curve_headings.max() 339 | # if ego_fut_trajs[-1, 0] < 0: # left 340 | # if np.abs(max_heading - min_heading) > turn_heading_th: 341 | # behavior_meta = "turn left" 342 | # else: 343 | # behavior_meta = "chane lane to left" 344 | # elif ego_fut_trajs[-1, 0] > 0: # right 345 | # if np.abs(max_heading - min_heading) > turn_heading_th: 346 | # behavior_meta = "turn right" 347 | # else: 348 | # behavior_meta = "chane lane to right" 349 | # else: 350 | # raise ValueError(f"Undefined behaviors: {ego_fut_trajs}") 351 | 352 | meta_action += (behavior_meta + " with " + speed_meta + "\n") 353 | return meta_action.upper() 354 | 355 | 356 | # system_message = """ 357 | # As a professional autonomous driving system, you are tasked with plotting a secure and human-like path within a 3-second window using the following guidelines and inputs: 358 | 359 | # ### Context 360 | # - **Coordinate System**: You are in the ego-vehicle coordinate system positioned at (0,0). The X-axis is perpendicular to your heading direction, while the Y-axis represents the heading direction. 361 | # - **Location**: You are mounted at the center of an ego-vehicle that has 4.08 meters length and 1.85 meters width. 362 | # - **Objective**: Generate a route characterized by 6 waypoints, with a new waypoint established every 0.5 seconds. 363 | 364 | # ### Inputs 365 | # 1. 
**Perception & Prediction** (You observe the surrounding objects and estimate their future movements): 366 | # - object name at (ox1, ox2). Future trajectory: [(oxt1, oyt1), ..., (oxt6, oyt6)], 6 waypoints in 3 seconds, UN denotes future location at that timestep is unknown 367 | # - ... 368 | 369 | # 2. **Historical Trajectory** (Your historital trajectory from the last 2 seconds, presented as 4 waypoints): 370 | # - [(xh1, yh1), (xh2, yh2), (xh3, yh3), (xh4, yh4)] 371 | 372 | # 3. **Ego-States** (Your current states): 373 | # - **Velocity** (vx, vy) # meters per 0.5 second 374 | # - **Heading Angular Velocity** (v_yaw) # ego-vehicle heading change rate, rad per second 375 | # - **Acceleration** (ax, ay) # velocity change rate per 0.5 second 376 | # - **Heading Speed** # meters per 0.5 second 377 | # - **Steering** # steering signal 378 | 379 | # 4. **Mission Goal**: Instructions outlining your objectives for the upcoming 3 seconds. 380 | 381 | # ### Task 382 | # - Integrate and process all the above inputs to construct a driving route. 383 | # - Thinking about what you have received and make driving decisions. Write down your thoughts and the action. 384 | # - Output a set of 6 new waypoints for the upcoming 3 seconds (Note: This task is of the most importance!). These should be formatted as coordinate pairs: 385 | # - (x1, y1) # 0.5 second 386 | # - (x2, y2) # 1.0 second 387 | # - (x3, y3) # 1.5 second 388 | # - (x4, y4) # 2.0 second 389 | # - (x5, y5) # 2.5 second 390 | # - (x6, y6) # 3.0 second 391 | # - Final output format: 392 | # Thoughts: 393 | # - Notable Objects from Perception: ... 394 | # Potential Effects from Prediction: ... 395 | # Meta Action: 396 | # ... 397 | # Trajectory: 398 | # - [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5), (x6, y6)] 399 | 400 | # Ensure the safety and feasibility of the path devised within the given 3-second timeframe. Let's work on crafting a safe route! 
# """

def generate_incontext_message(data, token):
    incontext_message = "\nFor example:\n"
    incontext_message += "Input:\n"
    user_message = generate_user_message(data, token)
    incontext_message += user_message
    incontext_message += "You should generate the following content:\n"
    assistant_message = generate_assistant_message(data, token)
    incontext_message += assistant_message
    return incontext_message
--------------------------------------------------------------------------------
/gpt-driver/search_invalid_tokens.py:
--------------------------------------------------------------------------------
import pickle
import json
import ast
import numpy as np

filename = "outputs/gpt_uniad.pkl"
data_dict = pickle.load(open(filename, "rb"))

split = json.load(open('data/split.json', 'r'))

train_tokens = split["train"]
test_tokens = split["val"]

untest_tokens = [
]

# a validation token is "invalid" if no parsed trajectory was saved for it
for token in test_tokens:
    if token not in data_dict:
        untest_tokens.append(token)

print("#### Invalid Tokens ####")
for token in untest_tokens:
    print(token)
--------------------------------------------------------------------------------
/gpt-driver/test.py:
--------------------------------------------------------------------------------
import openai
import pickle
import json
import ast
import numpy as np
import time
import argparse
from prompt_message import system_message, generate_user_message, generate_assistant_message
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
) # for exponential backoff

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)

parser = argparse.ArgumentParser(description="GPT-Driver test.")
parser.add_argument("-i", "--id", type=str, help="GPT model id")
parser.add_argument("-o", "--output", type=str, help="output file name")
args = parser.parse_args()

saved_traj_name = "outputs/" + args.output + ".pkl"
saved_text_name = "outputs/" + args.output + "_text.pkl"
temp_text_name = "outputs/" + args.output + "_temp.jsonl"

openai.api_key = "" # insert your API key here

data = pickle.load(open('data/cached_nuscenes_info.pkl', 'rb'))
split = json.load(open('data/split.json', 'r'))

train_tokens = split["train"]
test_tokens = split["val"]

text_dict, traj_dict = {}, {}

invalid_tokens = []

untest_tokens = [
]

for token in test_tokens:
    # when untest_tokens is non-empty, only re-run those tokens
    if len(untest_tokens) > 0 and token not in untest_tokens:
        continue

    print()
    print(token)

    time.sleep(1)
    user_message = generate_user_message(data, token)
    assitant_message = generate_assistant_message(data, token)
    model_id = args.id
    completion = completion_with_backoff(
        model=model_id,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ]
    )
    # import pdb; pdb.set_trace()
    result = completion.choices[0].message["content"]
    print(f"GPT Planner:\n {result}")
    print(f"Ground Truth:\n {assitant_message}")
    output_dict = {
        "token": token,
        "GPT": result,
        "GT": assitant_message,
    }

    text_dict[token] = result

    # the trajectory is expected on the last line of the response
    traj = result.split("\n")[-1]
    try:
        traj = ast.literal_eval(traj)
        traj = np.array(traj)
    except Exception: # malformed or non-literal trajectory line
        print(f"Invalid token: {token}")
        invalid_tokens.append(token)
        continue
    traj_dict[token] = traj

    with open(temp_text_name, "a+") as file:
        file.write(json.dumps(output_dict) + '\n')

# output_dicts = []
# with open(temp_text_name, "r") as file:
#     for line in file:
#         output_dicts.append(json.loads(line))

if len(untest_tokens) > 0:
    # merge re-run results into the existing trajectory file
    with open(saved_traj_name, "rb") as fd:
        exist_dict = pickle.load(fd)
    exist_dict.update(traj_dict)
    with open(saved_traj_name, "wb") as fd:
        pickle.dump(exist_dict, fd)

print("#### Invalid Tokens ####")
for token in invalid_tokens:
    print(token)

if len(untest_tokens) == 0:
    with open(saved_text_name, "wb") as f:
        pickle.dump(text_dict, f)
    with open(saved_traj_name, "wb") as f:
        pickle.dump(traj_dict, f)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
aiohttp==3.8.5
aiosignal==1.3.1
async-timeout==4.0.3
attrs==23.1.0
certifi==2023.7.22
charset-normalizer==3.2.0
frozenlist==1.4.0
idna==3.4
multidict==6.0.4
ndjson==0.3.1
numpy==1.26.0
openai==0.28.0
regex==2023.8.8
requests==2.31.0
tenacity==8.2.3
tiktoken==0.5.1
tqdm==4.66.1
urllib3==2.0.4
yarl==1.9.2
--------------------------------------------------------------------------------
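The safe-zone rule applied by `collision_detection` in `gpt-driver/prompt_message.py` can be exercised on its own. It is used when building the "Notable Objects" part of the chain-of-thought labels. The sketch below copies that function unchanged and checks it against a few placements. The object size (half-size 0.95 m x 2.3 m) and all positions are hypothetical values chosen for illustration. The ego half-size (0.925 m, 2.04 m) is half of the 1.85 m x 4.08 m vehicle described in the commented-out system prompt.

```python
import numpy as np


def collision_detection(x1, y1, sx1, sy1, x2, y2, sx2, sy2, x_space=1.0, y_space=3.0):
    # Copied from gpt-driver/prompt_message.py: an object is flagged when it is laterally
    # within sx1+sx2+x_space meters and ahead (y2 > y1) by less than sy1+sy2+y_space meters.
    if (np.abs(x1 - x2) < sx1 + sx2 + x_space) and (y2 > y1) and (y2 - y1 < sy1 + sy2 + y_space):
        return True
    else:
        return False


EGO_HALF_W, EGO_HALF_L = 0.925, 2.04  # half of the 1.85 m width and 4.08 m length

# Hypothetical object half-size, for illustration only.
car_half_w, car_half_l = 0.95, 2.3

# Ego at the origin; the Y-axis is the heading direction.
ahead = collision_detection(0.0, 0.0, EGO_HALF_W, EGO_HALF_L, 0.0, 5.0, car_half_w, car_half_l)
beside = collision_detection(0.0, 0.0, EGO_HALF_W, EGO_HALF_L, 4.0, 5.0, car_half_w, car_half_l)
behind = collision_detection(0.0, 0.0, EGO_HALF_W, EGO_HALF_L, 0.0, -5.0, car_half_w, car_half_l)
far_ahead = collision_detection(0.0, 0.0, EGO_HALF_W, EGO_HALF_L, 0.0, 10.0, car_half_w, car_half_l)

print(ahead, beside, behind, far_ahead)  # True False False False
```

With these sizes the lateral threshold is 0.925 + 0.95 + 1.0 = 2.875 m and the longitudinal threshold is 2.04 + 2.3 + 3.0 = 7.34 m. Only the object 5 m straight ahead is flagged. Because the rule requires `y2 > y1`, objects behind the ego vehicle are never reported, however close they are.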