├── .gitignore ├── README.md ├── assets └── pipeline.png ├── data └── modeling_data_final.json ├── requirements.txt └── src ├── ModelAgent ├── config.yaml ├── engines │ ├── __init__.py │ ├── core.py │ ├── data.py │ ├── modeling.py │ ├── selection.py │ ├── simulation.py │ └── writing.py ├── mathmodel.py ├── prompts │ ├── __init__.py │ ├── assumption.py │ ├── data_acquire.py │ ├── data_critic.py │ ├── factor_critic.py │ ├── factor_generation.py │ ├── function_call_prompts.py │ ├── guess_critic.py │ ├── guess_prompt.py │ ├── modeling_critic.py │ ├── modeling_generate.py │ ├── question_extract.py │ ├── selection_critic.py │ ├── selection_generate.py │ ├── simulation_critic.py │ ├── simulation_prompts.py │ ├── writing_data.py │ ├── writing_restatement.py │ ├── writing_simulation.py │ └── writing_solution.py └── utils │ ├── shared_context.py │ ├── tool_call_parser.py │ ├── tool_handler.py │ └── utils.py ├── ModelBase ├── baseline.py └── model_config.yaml ├── ModelTool ├── __init__.py ├── baseline.py ├── baseprompts.yaml ├── model_config.yaml └── utils │ ├── planner.py │ ├── planner_config.yaml │ └── planner_prompt.yaml ├── host ├── host.sh ├── tool_chat_hermes_template.jinja └── tool_chat_llama3.1_template.jinja ├── judger ├── analysis_groundedness.py ├── data_groundedness.py ├── innovativeness.py ├── main_judge.py ├── modeling_groundedness.py ├── scoring_decomposition.py └── structural_coherency.py └── tools ├── __init__.py ├── base.py ├── code_executor.py ├── engine.py ├── file_editor.py ├── file_extractor.py ├── file_lister.py ├── file_reader.py ├── file_writer.py ├── image_captioner.py ├── pdf_parsing.py ├── solution_generator.py ├── text_detector.py ├── url_text.py ├── web_download.py └── web_search.py /.gitignore: -------------------------------------------------------------------------------- 1 | **/__pycache__/ 2 | **/.vscode/ 3 | **/.idea/ 4 | **/.DS_Store 5 | 6 | ./baseline_runs 7 | ./dataagent_runs 8 | 9 | ./src/ModelAgent/log 10 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges 2 | [**📊 Dataset**](https://github.com/qiancheng0/ModelingAgent/tree/main/data) | [**📖 Paper**](https://www.arxiv.org/pdf/2505.15068) 3 | 4 | This repository contains the official code and dataset for the paper *"ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges."* 5 | 6 | The data includes the ModelingBench dataset, featuring detailed question descriptions, requirements, and evaluation criteria. 7 | 8 | ![Pipeline](assets/pipeline.png) 9 | 10 | ## 🔍 Quick Start 11 | First, install the required packages by running: 12 | ```bash 13 | pip install -r requirements.txt 14 | ``` 15 | 16 | ### Model and API Setup 17 | Some models may require API keys to function correctly. Please add the appropriate keys to the configuration file located in each directory under `src`. 18 | 19 | We use the Serper API as the backend to support our Search tool. Please include your Serper API key to use this feature ([**link**](https://serper.dev)). 20 | 21 | ```json 22 | { 23 | "openai_key": "YOUR_OPENAI_API_KEY", 24 | "google_api_key": "YOUR_GOOGLE_API_KEY", 25 | "serper_key": "YOUR_SERPER_API_KEY" 26 | } 27 | ``` 28 | 29 | If you are testing a self-hosted model, please use the script in the `src/host` directory. We currently support hosting models (and their tool-use functions) through `vllm`. 
The supported open-source model hosting scripts include: `Llama-3.1-70B-Instruct`, `Qwen-2.5-72B-Instruct`, and `QwQ-32B`. 30 | 31 | ### ModelingBench Data 32 | Our ModelingBench data is located in the `data` directory. It can be freely used for various purposes. Each data point contains several fields. Here is an example: 33 | ```json 34 | "2001_Adolescent_Pregnancy": { 35 | "year": "2001", 36 | "title": "Adolescent Pregnancy", 37 | "level": "High School", 38 | "source": "HiMCM", 39 | "link": "Problems/2001/HIMCM-A-2/index.html", 40 | "question": "You are working temporarily for the Department of Health ...", 41 | "requirements": [ 42 | { 43 | "category": "Data Analysis", 44 | "description": "Evaluate the accuracy and completeness of the data ..." 45 | } 46 | ], 47 | "eval_roles": [ 48 | { 49 | "name": "Mathematician", 50 | "details": "You are a mathematician with expertise in ..." 51 | } 52 | ] 53 | } 54 | ``` 55 | 56 | ## 🧪 Experiments 57 | 58 | ### Testing Code 59 | We provide testing code for Vanilla Generation in the `ModelBase` directory, Tool Agent in `ModelTool`, and ModelingAgent in `ModelAgent`. Please ensure the model configuration files are correctly set up with the required API keys and configurations. 60 | 61 | Also, set the output directory and other paths properly in the respective entry point file you wish to run. You can then run the following from the repository root: 62 | ```bash 63 | cd src/ModelBase # For running Vanilla Generation 64 | python baseline.py 65 | 66 | cd src/ModelTool # For running Tool Agent 67 | python baseline.py 68 | 69 | cd src/ModelAgent # For running ModelingAgent 70 | python mathmodel.py 71 | ``` 72 | 73 | Please note that some errors may still exist due to the complexity of the agent structure. The model may not always use tools optimally or strictly follow instructions. Use this preview version with caution. 74 | 75 | ### Evaluation 76 | We use the ModelingJudge framework to evaluate the final generated reports. The expert roles for each problem are included in the ModelingBench dataset. 77 | 78 | To evaluate using ModelingJudge, run: 79 | ```bash 80 | cd src/judger 81 | python main_judge.py 82 | ``` 83 | 84 | Each evaluation metric corresponds to a Python file containing its specific prompt. 85 | 86 | ## 📖 File Structure 87 | - Benchmark data is located in the `data` directory (see the loading sketch below). 88 | - Under `src/`, we include the code for our method and two baselines in `ModelAgent`, `ModelBase`, and `ModelTool`. 89 | - The `judger` directory contains code, evaluation standards, and prompts for ModelingJudge. 90 | - The `tools` directory contains all tools that may be invoked in the sandbox environment. 
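To verify the benchmark data is in place, the following minimal sketch loads ModelingBench and prints one problem. The path assumes you run it from the repository root, and the field names follow the example entry above; individual problems may carry additional fields.

```python
import json

# Load the ModelingBench problems (path relative to the repository root)
with open("data/modeling_data_final.json", "r") as f:
    problems = json.load(f)

# Each entry is keyed by a problem ID, e.g. "2001_Adolescent_Pregnancy"
for gold_id, problem in problems.items():
    print(f"{gold_id}: {problem['title']} ({problem['source']}, {problem['year']})")
    print(problem["question"][:200], "...")
    break
```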
 91 | 92 | ## 🖊️ Citation 93 | ```text 94 | @article{qian2025modelingagent, 95 | title={ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges}, 96 | author={Qian, Cheng and Du, Hongyi and Wang, Hongru and Chen, Xiusi and Zhang, Yuji and Sil, Avirup and Zhai, Chengxiang and McKeown, Kathleen and Ji, Heng}, 97 | journal={arXiv preprint arXiv:2505.15068}, 98 | year={2025} 99 | } 100 | ``` 101 | -------------------------------------------------------------------------------- /assets/pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiancheng0/ModelingAgent/dca3588ba7cf114b77ed5f89aa1f3e9ddf4a3baa/assets/pipeline.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | openai 2 | vllm 3 | google-generativeai 4 | PyPDF2 5 | pymupdf 6 | pymupdf4llm 7 | # Note: zipfile, tarfile, and gzip are Python standard-library modules; 8 | # they ship with Python and cannot be installed via pip. 9 | -------------------------------------------------------------------------------- /src/ModelAgent/config.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | type: local 3 | name: QwQ-32B 4 | openai_api_key: YOUR_OPENAI_API_KEY 5 | openai_base_url: https://api.openai.com/v1 6 | max_len: 8192 7 | temperature: 0 8 | 9 | data: 10 | # API Key Configuration 11 | serper_api_key: YOUR_SERPER_API_KEY # API key for web_search_tool 12 | 13 | # Data Collection Configuration 14 | max_iter: 20 # Maximum iterations for each collection attempt 15 | min_score_threshold: 8 # Minimum quality score threshold for data collection (max score 15) 16 | max_attempts: 10 # Maximum attempts to collect a single data point 17 | critic_interval: 5 # Evaluate collection progress every N function calls 18 | max_workers: 1 # Maximum number of parallel threads for data point processing 19 | 20 | # Log and Working Directory Configuration 21 | save_history: true # Whether to save detailed history 22 | trim_history_size: 50 # Maximum number of history entries to keep 23 | 24 | # Resource Limit Configuration 25 | timeout_per_request: 120 # Timeout for each API request (seconds) 26 | max_tokens_per_request: 8000 # Maximum tokens per request 27 | 28 | # File Configuration 29 | markdown_output: true # Whether to generate Markdown summary for each data point 30 | csv_export: true # Whether to export data to CSV 31 | create_data_dir: true # Whether to create separate directory for each data point 32 | snapshot: true # Whether to create snapshot for each data point 33 | bottom_k_data: 2 # Minimum amount of data to collect per data point 34 | overwrite: false 35 | 36 | selection: 37 | rounds: 3 38 | 39 | modeling: 40 | rounds: 3 41 | 42 | simulation: 43 | # —— LLM Call Related —— 44 | max_api_retries: 5 # ← new: Number of automatic retries for LLM 429/500 errors 45 | api_base_wait_time: 10 # ← new: Base seconds for exponential backoff 46 | 47 | # —— Single Component Modeling Loop —— 48 | max_iter: 30 # ← new: Maximum iterations inside single_modeling_run 49 | critic_interval: 3 # ← new: Trigger mid-term critic every N steps 50 | score_threshold: 10 # ← Score threshold only used for final success determination (no longer used for early stopping) 51 | 52 | # —— run() Level —— 53 | max_retry_each: 5 # ← new: Maximum retries for each modeling group workspace rebuild 54 | auto_early_stop: true # ← new: Whether to automatically stop when score_threshold is reached 55 | overwrite: false 56 | 
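# Illustration (assumption: the exact backoff formula lives in the code, not here):
# with api_base_wait_time: 10 and max_api_retries: 5, doubling the wait on each
# retry would give roughly 10s, 20s, 40s, 80s, and 160s between attempts.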
# (Add more custom fields if needed, code has default values so not necessary to write) -------------------------------------------------------------------------------- /src/ModelAgent/engines/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiancheng0/ModelingAgent/dca3588ba7cf114b77ed5f89aa1f3e9ddf4a3baa/src/ModelAgent/engines/__init__.py -------------------------------------------------------------------------------- /src/ModelAgent/engines/modeling.py: -------------------------------------------------------------------------------- 1 | import json 2 | from copy import deepcopy 3 | 4 | from src.ModelAgent.engines.core import Core 5 | 6 | from src.ModelAgent.prompts.modeling_critic import MODELING_CRITIC_SYS, MODELING_CRITIC_USER 7 | from src.ModelAgent.prompts.modeling_generate import MODELING_GEN_SYS, MODELING_GEN_USER, MODELING_GEN_REFINE 8 | from src.ModelAgent.prompts.factor_generation import MODELING_FACTOR_SYS, MODELING_FACTOR_USER 9 | from src.ModelAgent.prompts.factor_critic import FACTOR_CRITIC_SYS, FACTOR_CRITIC_USER 10 | 11 | from src.ModelAgent.utils.utils import form_message 12 | from src.ModelAgent.utils.shared_context import SharedContext 13 | 14 | 15 | class ModelingEngine: 16 | def __init__(self, config, core, shared_context): 17 | self.config = config 18 | self.core: Core = core 19 | self.shared_context: SharedContext = shared_context 20 | 21 | def modeling_refine_loop(self, subtask_idx=0, approach_idx=0): 22 | history = [] 23 | 24 | modeling_question = self.shared_context.get_context("modeling_question") 25 | selection_history = self.shared_context.get_context("selection_history") 26 | proposed_model = selection_history[-1] 27 | modeling_approach = deepcopy(proposed_model["task_decomposition"][subtask_idx]) 28 | # This step could actually be run in multi-threading (different modeling approaches in parallel) 29 | modeling_approach["modeling_approaches"] = modeling_approach["modeling_approaches"][approach_idx] 30 | modeling_approach.pop("subtask") 31 | modeling_approach = json.dumps(modeling_approach, indent=2) 32 | 33 | system = MODELING_GEN_SYS 34 | user = MODELING_GEN_USER.format(modeling_question=modeling_question, modeling_approach=modeling_approach) 35 | 36 | round = 0 37 | while round < self.config["modeling"]["rounds"]: 38 | round += 1 39 | print(f"Model implementation round {round}...") 40 | 41 | messages = form_message(system, user) 42 | response = self.core.execute(messages) 43 | modeling_implementation = response.strip().strip("```markdown").strip("```").strip() 44 | print(">> Implemented model details:\n", modeling_implementation) 45 | 46 | # history.append(deepcopy(modeling_implementation)) 47 | 48 | system = MODELING_CRITIC_SYS 49 | user = MODELING_CRITIC_USER.format(modeling_approach=modeling_approach, modeling_implementation=modeling_implementation) 50 | messages = form_message(system, user) 51 | response = self.core.execute(messages) 52 | critics = response.split("```json")[-1].split("```")[0].strip() 53 | print(">> Critics:\n", critics) 54 | try: 55 | critics = json.loads(critics) 56 | except: 57 | # TODO: fix json format based on the schema and model response, using GPT 58 | pass 59 | 60 | implementation_record = { 61 | "modeling_approach": json.loads(modeling_approach), 62 | "modeling_implementation": modeling_implementation, 63 | "user_feedback": critics 64 | } 65 | 66 | # history.append(deepcopy(critics)) 67 | history.append(deepcopy(implementation_record)) 68 | 69 | 
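# Close the loop: the refine prompt below re-fences the latest implementation as
# markdown and pairs it with the critic's JSON feedback for the next round.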
modeling_implementation = "```markdown\n" + modeling_implementation + "\n```" 70 | critics = json.dumps(critics, indent=2) 71 | system = MODELING_GEN_SYS 72 | user = MODELING_GEN_REFINE.format(modeling_approach=modeling_approach, modeling_implementation=modeling_implementation, critics=critics) 73 | 74 | self.shared_context.add_context(f"modeling_history_{subtask_idx}_{approach_idx}", history) 75 | 76 | self.modeling_implementation = deepcopy(history[-1]["modeling_implementation"]) 77 | self.modeling_approach = modeling_approach 78 | 79 | 80 | def factor_extraction(self, subtask_idx=0, approach_idx=0): 81 | print("Getting factor extracted from question") 82 | system = MODELING_FACTOR_SYS 83 | user = MODELING_FACTOR_USER.format(modeling_approach=self.modeling_approach, modeling_implementation="```markdown\n" + self.modeling_implementation.strip() + "\n```") 84 | messages = form_message(system, user) 85 | response = self.core.execute(messages) 86 | 87 | print(">> Factors:\n", response) 88 | try: 89 | self.explanation = response.strip().split("```json")[1].split("```")[1].strip() 90 | self.factors = response.strip().split("```json")[1].split("```")[0].strip() 91 | self.factors = json.loads(self.factors) 92 | except: 93 | # TODO: fix json format based on the schema and model response, using GPT 94 | pass 95 | 96 | self.shared_context.add_context(f"factors_{subtask_idx}_{approach_idx}", deepcopy(self.factors)) 97 | self.shared_context.add_context(f"explanation_{subtask_idx}_{approach_idx}", deepcopy(self.explanation)) 98 | 99 | 100 | def factor_critic(self, subtask_idx=0, approach_idx=0): 101 | print("Getting factor critic for the question ...") 102 | 103 | factors = self.shared_context.get_context(f"factors_{subtask_idx}_{approach_idx}") 104 | system = FACTOR_CRITIC_SYS 105 | user = FACTOR_CRITIC_USER.format(factors=factors) 106 | messages = form_message(system, user) 107 | response = self.core.execute(messages) 108 | 109 | print(">> Factor Critics:\n", response) 110 | try: 111 | self.factor_critics = response.strip().split("```json")[1].split("```")[0].strip() 112 | self.factor_critics = json.loads(self.factor_critics) 113 | print("Success in parsing!") 114 | except: 115 | # TODO: fix json format based on the schema and model response, using GPT 116 | pass 117 | 118 | self.shared_context.add_context(f"factor_critics_{subtask_idx}_{approach_idx}", deepcopy(self.factor_critics)) 119 | -------------------------------------------------------------------------------- /src/ModelAgent/engines/selection.py: -------------------------------------------------------------------------------- 1 | import json 2 | from copy import deepcopy 3 | 4 | from src.ModelAgent.engines.core import Core 5 | 6 | from src.ModelAgent.prompts.assumption import ASSUMPPTION_SYS, ASSUMPPTION_USER 7 | from src.ModelAgent.prompts.question_extract import EXTRACT_MODELING_SYS, EXTRACT_MODELING_USER 8 | from src.ModelAgent.prompts.selection_critic import SELECT_CRITIC_SYS, SELECT_CRITIC_USER 9 | from src.ModelAgent.prompts.selection_generate import SELECT_GEN_SYS, SELECT_GEN_USER, SELECT_GEN_REFINE 10 | 11 | from src.ModelAgent.utils.utils import form_message 12 | from src.ModelAgent.utils.shared_context import SharedContext 13 | 14 | class SelectionEngine: 15 | def __init__(self, config, core, shared_context): 16 | self.config = config 17 | self.query = config["query"] 18 | self.core: Core = core 19 | self.shared_context: SharedContext = shared_context 20 | 21 | def get_modeling_question(self): 22 | print("Getting modeling 
question") 23 | system = EXTRACT_MODELING_SYS 24 | user = EXTRACT_MODELING_USER.format(original_text=self.query) 25 | messages = form_message(system, user) 26 | response = self.core.execute(messages) 27 | self.modeling_question = response.strip() 28 | print(">> Modeling question:\n", self.modeling_question) 29 | self.shared_context.add_context("modeling_question", self.modeling_question) 30 | 31 | 32 | def get_assumptions(self): 33 | print("Getting assumptions") 34 | system = ASSUMPPTION_SYS 35 | user = ASSUMPPTION_USER.format(modeling_question=self.modeling_question) 36 | messages = form_message(system, user) 37 | response = self.core.execute(messages) 38 | self.assumptions = response.strip().strip("```json").strip("```").strip() 39 | print(">> Assumptions:\n", self.assumptions) 40 | try: 41 | self.assumptions = json.loads(self.assumptions) 42 | except: 43 | # TODO: fix json format based on the schema and model response, using GPT 44 | pass 45 | self.shared_context.add_context("assumptions", self.assumptions) 46 | 47 | 48 | def selection_refine_loop(self): 49 | history = [] 50 | 51 | system = SELECT_GEN_SYS 52 | user = SELECT_GEN_USER.format(modeling_question=self.modeling_question) 53 | 54 | round = 0 55 | while round < self.config["selection"]["rounds"]: 56 | round += 1 57 | print(f"Model proposing round {round}...") 58 | 59 | messages = form_message(system, user) 60 | response = self.core.execute(messages) 61 | proposed_model = response.split("```json")[-1].split("```")[0].strip() 62 | print(">> Proposed model:\n", proposed_model) 63 | try: 64 | proposed_model = json.loads(proposed_model) 65 | except: 66 | # TODO: fix json format based on the schema and model response, using GPT 67 | pass 68 | 69 | # history.append(deepcopy(proposed_model)) 70 | subtasks = proposed_model["task_decomposition"] 71 | 72 | all_critics = [] 73 | for subtask in subtasks: 74 | system = SELECT_CRITIC_SYS 75 | user = SELECT_CRITIC_USER.format(subtask=subtask) 76 | messages = form_message(system, user) 77 | response = self.core.execute(messages) 78 | # from IPython import embed; embed() 79 | critics = response.split("```json")[-1].split("```")[0].strip() 80 | print(">> Critics:\n", critics) 81 | try: 82 | critics = json.loads(critics) 83 | except: 84 | # TODO: fix json format based on the schema and model response, using GPT 85 | pass 86 | 87 | all_critics.extend(deepcopy(critics)) 88 | 89 | for critic in critics: 90 | approach = critic.pop("approach") 91 | for modeling_approach in subtask["modeling_approaches"]: 92 | if modeling_approach["approach"] == approach: 93 | # the variable propose_model is updated in place, now with user feedback 94 | modeling_approach["user_feedback"] = critic 95 | break 96 | 97 | # history.append(deepcopy(all_critics)) 98 | history.append(deepcopy(proposed_model)) 99 | 100 | system = SELECT_GEN_SYS 101 | user = SELECT_GEN_REFINE.format(modeling_question=self.modeling_question, proposed_model=proposed_model) 102 | 103 | self.shared_context.add_context("selection_history", history) 104 | self.proposed_model = proposed_model 105 | # May further add some selection / ranking techniques to select the best model 106 | self.rank_proposed_model() 107 | # Later may only adopt the top-k models for trial 108 | 109 | 110 | def rank_proposed_model(self): 111 | # In-place sorting of the modeling_approaches based on the user_feedback["overall_score"] 112 | for subtask in self.proposed_model["task_decomposition"]: 113 | # sort the modeling_approach based on the 
modeling_approach["user_feedback"]["overall_score"] 114 | subtask["modeling_approaches"] = sorted(subtask["modeling_approaches"], key=lambda x: x["user_feedback"]["overall_score"], reverse=True) 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | -------------------------------------------------------------------------------- /src/ModelAgent/mathmodel.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import yaml 4 | import json 5 | import multiprocessing 6 | from concurrent.futures import CancelledError 7 | from concurrent.futures import ProcessPoolExecutor, as_completed 8 | import traceback 9 | import datetime 10 | 11 | BASE_DIR = os.path.dirname(os.path.dirname(os.path.dirname(__file__))) 12 | sys.path.append(BASE_DIR) 13 | 14 | from src.ModelAgent.engines.core import Core 15 | from src.ModelAgent.engines.writing import WritingEngine 16 | from src.ModelAgent.engines.selection import SelectionEngine 17 | from src.ModelAgent.engines.modeling import ModelingEngine 18 | from src.ModelAgent.engines.data import DataAgent 19 | from src.ModelAgent.engines.simulation import SimulationAgent 20 | from src.ModelAgent.utils.shared_context import SharedContext 21 | 22 | 23 | class BaseAgent: 24 | def __init__(self, config): 25 | self.config = config 26 | 27 | self.core = Core(config) 28 | self.shared_context = SharedContext(config) 29 | 30 | self.exist = 0 31 | self.todo = 0 32 | 33 | def run(self): 34 | """ 35 | Main execution entry with extra debug printing. 36 | """ 37 | 38 | print(f"[INFO] BaseAgent for {self.config['gold_id']} started") 39 | 40 | # ---------- load previous context ---------- 41 | if "context.json" in os.listdir(self.config["log_dir"]): 42 | self.shared_context.load_context( 43 | os.path.join(self.config["log_dir"], "context.json") 44 | ) 45 | print("[INFO] Previous context loaded") 46 | 47 | # ---------- quick exit check ---------- 48 | try: 49 | task_decomposition = self.shared_context.get_context( 50 | "selection_history" 51 | )[-1]["task_decomposition"] 52 | 53 | last_subtask_id = len(task_decomposition) - 1 54 | flag_key = f"factor_critics_{last_subtask_id}_0" 55 | 56 | if flag_key in self.shared_context.context: 57 | print("[INFO] All steps finished earlier – skipping") 58 | self.exist += 1 59 | return 60 | 61 | except (KeyError, IndexError): 62 | # no previous selection_history – fresh run 63 | pass 64 | 65 | # ---------- pipeline starts ---------- 66 | print(f"[INFO] Working dir: {self.config['log_dir']}") 67 | self.todo += 1 68 | 69 | self.selection_engine = SelectionEngine(self.config, self.core, self.shared_context) 70 | self.modeling_engine = ModelingEngine(self.config, self.core, self.shared_context) 71 | 72 | # idea 73 | self.shared_context.add_context("grading_points", self.config["requirements"]) 74 | self.selection_engine.get_modeling_question() 75 | self.selection_engine.get_assumptions() 76 | self.selection_engine.selection_refine_loop() 77 | 78 | # modeling 79 | task_decomposition = self.shared_context.get_context("selection_history")[-1]["task_decomposition"] 80 | for subtask_idx in range(len(task_decomposition)): 81 | self.modeling_engine.modeling_refine_loop(subtask_idx, 0) 82 | self.modeling_engine.factor_extraction(subtask_idx, 0) 83 | self.modeling_engine.factor_critic(subtask_idx, 0) 84 | 85 | self.data_agent = DataAgent(self.config, self.core, self.shared_context) 86 | self.simulation_agent = SimulationAgent(self.config, self.core, self.shared_context) 87 | self.writing_engine = 
WritingEngine(self.config, self.core, self.shared_context) 88 | 89 | # data / modeling 90 | self.data_agent.run() 91 | self.simulation_agent.run() 92 | 93 | # writing 94 | for subtask_idx in range(len(task_decomposition)): 95 | try: 96 | self.writing_engine.write_data(subtask_idx, 0) 97 | except Exception as e: 98 | print(f"[WARN] write_data {subtask_idx} failed: {e}") 99 | traceback.print_exc() 100 | try: 101 | self.writing_engine.write_simulation(subtask_idx, 0) 102 | except Exception as e: 103 | print(f"[WARN] write_simulation {subtask_idx} failed: {e}") 104 | traceback.print_exc() 105 | 106 | self.writing_engine.get_restatement() 107 | self.writing_engine.write_solution() 108 | 109 | print(f"[INFO] BaseAgent for {self.config['gold_id']} finished") 110 | 111 | def create_run_folder(): 112 | import datetime 113 | timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") 114 | run_folder = f"../modelagent_runs/{timestamp}_run" 115 | os.makedirs(run_folder, exist_ok=True) 116 | print(f"Created run folder: {run_folder}") 117 | return run_folder 118 | 119 | 120 | def process_problem(config, gold_id, problem_data): 121 | """ 122 | One-shot runner for a single MCM/ICM problem. 123 | 124 | Extra debugging: 125 | 1. Print timestamp + worker pid at start. 126 | 2. Catch any Exception, dump full traceback. 127 | 3. If the exception has attributes such as status_code or error, dump them. 128 | """ 129 | start_ts = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") 130 | print(f"[{start_ts}] Start {gold_id}") 131 | 132 | # ---------- directory prep ---------- 133 | base_path = config["base_path"] 134 | base_dir = os.path.join(base_path, gold_id) 135 | 136 | os.makedirs(base_dir, exist_ok=True) 137 | log_dir = os.path.join(base_dir, "log") 138 | work_dir = os.path.join(base_dir, "workspace") 139 | os.makedirs(log_dir, exist_ok=True) 140 | os.makedirs(work_dir, exist_ok=True) 141 | 142 | # ---------- build local config ---------- 143 | problem_config = config.copy() 144 | problem_config.update( 145 | gold_id = gold_id, 146 | log_dir = log_dir, 147 | work_dir = work_dir, 148 | query = problem_data["question"], 149 | grading_points = problem_data["decomposition"]["grading_points"], 150 | ) 151 | 152 | exist = todo = 0 153 | 154 | try: 155 | agent = BaseAgent(problem_config) 156 | agent.run() 157 | exist = agent.exist 158 | todo = agent.todo 159 | 160 | except Exception as e: 161 | # ---- rich debug info ---- 162 | print(f"[ERROR] {gold_id} raised {type(e).__name__}: {repr(e)}") 163 | traceback.print_exc() 164 | 165 | # print extra fields if present (typical for OpenAI SDK errors) 166 | for attr in ("status_code", "code", "response", "message"): 167 | if hasattr(e, attr): 168 | print(f" · {attr}: {getattr(e, attr)}") 169 | 170 | finally: 171 | end_ts = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") 172 | print(f"[{end_ts}] End {gold_id}") 173 | 174 | return gold_id, exist, todo 175 | 176 | def main(): 177 | with open("./config.yaml", "r") as f: 178 | config = yaml.load(f, Loader=yaml.FullLoader) 179 | 180 | model_name = config["model"]["name"] 181 | base_path = f"YOUR_ABSOLUTE_PATH_TO_WORKSPACE/{model_name}" 182 | os.makedirs(base_path, exist_ok=True) 183 | config["base_path"] = base_path 184 | 185 | with open("../../data/modeling_data_final.json", "r") as f: 186 | data = json.load(f) 187 | 188 | max_workers = config.get("data", {}).get("max_workers", 4) 189 | num_workers = min(max_workers, len(data), multiprocessing.cpu_count()) 190 | 191 | 192 | print(f"Using 
{num_workers} workers") 193 | total_exist = total_todo = 0 194 | executor = ProcessPoolExecutor(max_workers=num_workers) 195 | 196 | try: 197 | future_to_id = { 198 | executor.submit(process_problem, config, gid, pdata): gid 199 | for gid, pdata in data.items() 200 | } 201 | 202 | for fut in as_completed(future_to_id): 203 | gid = future_to_id[fut] 204 | try: 205 | _, exist, todo = fut.result() 206 | total_exist += exist 207 | total_todo += todo 208 | print(f"Completed {gid}") 209 | except CancelledError: 210 | print(f"Cancelled {gid}") 211 | except Exception as e: 212 | print(f"{gid} raised: {e}") 213 | 214 | except KeyboardInterrupt: 215 | print("\nKeyboardInterrupt! shutting down workers …") 216 | executor.shutdown(wait=False, cancel_futures=True) 217 | for p in multiprocessing.active_children(): 218 | try: 219 | p.terminate() 220 | except OSError: 221 | pass 222 | raise 223 | else: 224 | executor.shutdown() 225 | 226 | print(f"Total exist: {total_exist}") 227 | print(f"Total todo : {total_todo}") 228 | 229 | 230 | if __name__ == "__main__": 231 | multiprocessing.freeze_support() 232 | main() -------------------------------------------------------------------------------- /src/ModelAgent/prompts/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiancheng0/ModelingAgent/dca3588ba7cf114b77ed5f89aa1f3e9ddf4a3baa/src/ModelAgent/prompts/__init__.py -------------------------------------------------------------------------------- /src/ModelAgent/prompts/assumption.py: -------------------------------------------------------------------------------- 1 | ASSUMPPTION_SYS = """You are an AI assistant designed to generate well-structured assumptions and justifications for mathematical modeling problems. Your task is to simplify complex mathematical models by making reasonable assumptions and providing logical justifications for each assumption. The assumptions should help define the problem scope, ensure model feasibility, and account for practical limitations. 2 | 3 | ## **Guidelines:** 4 | 1. **Relevance**: The assumptions must be directly related to the given problem and should simplify the mathematical modeling process. 5 | 2. **Justification**: Each assumption must be accompanied by a strong, logical justification that explains why it is reasonable. 6 | 3. **Clarity**: Use clear and concise language, making the assumptions easy to understand. 7 | 4. **Consistency**: Ensure the assumptions align with real-world constraints and do not contradict each other. 8 | 5. **Output Format**: Provide the response in structured **JSON format** for easy parsing. 9 | 10 | --- 11 | 12 | ## **Example:** 13 | 14 | ### **Problem Statement:** 15 | In bicycle road races, such as individual time trials, cyclists aim to complete a course in the shortest time. A rider's power curve shows the maximum power they can sustain over different durations. More power typically means less time before needing recovery. Riders must manage their power to minimize race time, considering fatigue and energy limits. 16 | Develop a model that defines power profiles for a time trial specialist and another rider type, incorporating gender differences, and establishes the relationship between a cyclist’s position on a course and their applied power while considering energy limits and past exertion. In the model, weather effects such as wind should be integrated, and the model should take into consideration different team size. 
17 | 18 | ### **Your Response (JSON Format):** 19 | [ 20 | { 21 | "assumption": "The rider’s stamina recovers all the time and the recovery rate is constant.", 22 | "justification": "Recovery rate is the measure of aerobic capacity that is related to the athlete’s recovery ability. For the same athlete, recovery rate can be regarded as constant during the whole competition." 23 | }, 24 | { 25 | "assumption": "The maximum instantaneous power that the rider can output is related to the body’s remaining energy.", 26 | "justification": "The human body can burst out the maximum power when energy is not consumed yet, and cannot produce a lot of power when the energy is exhausted. It is reasonable to assume that the rider’s remaining energy determines the upper limit of performance." 27 | }, 28 | { 29 | "assumption": "The wind direction is parallel to the direction of movement of the rider.", 30 | "justification": "According to Fluid Dynamics, when air hits an obstacle at a certain speed, the airflow will go along its surface, going parallel with the direction of the rider’s movement. In racing courses, the slant angle is fairly small (<22 degrees). Additionally, accurate simulation of air streams is difficult due to complex topography and is not the focus of this study." 31 | }, 32 | { 33 | "assumption": "Every member in the team has the same physical ability.", 34 | "justification": "In practice, small differences in physical ability between athletes are inevitable, and it is not feasible to consider them in the mathematical model. To simplify the problem and facilitate modeling, each athlete in a team game is assumed to have the same power profile." 35 | }, 36 | { 37 | "assumption": "The formation change of the cycling team is done in an instant.", 38 | "justification": "It only takes seconds for riders to complete the formation change, during which the energy consumption is negligible compared to that of the entire match." 39 | }, 40 | { 41 | "assumption": "In the team time trial, riders maintain a constant safe distance between each other.", 42 | "justification": "To minimize wind resistance while ensuring safety, a safe distance between riders should be maintained. Given the techniques of professional cyclists and the small number of severe acceleration and deceleration sections, it is assumed that the cyclist can maintain the distance almost all the time." 43 | }, 44 | { 45 | "assumption": "The data in this research is accurate.", 46 | "justification": "It is assumed that the data collected on cyclists is accurate so that a reasonable mathematical model can be based on it." 47 | } 48 | ] 49 | 50 | ``` 51 | 52 | --- 53 | 54 | Now, given a new problem, generate a structured set of assumptions and justifications in **JSON format** following the format above. 55 | """ 56 | 57 | ASSUMPPTION_USER = """You are given a mathematical modeling problem. Please generate a structured set of well-reasoned assumptions and justifications to simplify the mathematical modeling of the problem. 58 | 59 | ### **Response Format:** 60 | Your response should follow this **JSON structure**: 61 | ```json 62 | [ 63 | {{ 64 | "assumption": "[State the assumption]", 65 | "justification": "[Provide the reason for the assumption and why this simplification is reasonable.]" 66 | }}, 67 | ... 68 | ] 69 | ... 
70 | ] 71 | ``` 72 | 73 | ### **Problem Statement:** 74 | {modeling_question} 75 | 76 | ### **Your Response (JSON Format):** 77 | """ -------------------------------------------------------------------------------- /src/ModelAgent/prompts/guess_prompt.py: -------------------------------------------------------------------------------- 1 | GUESS_ACQUIRE_SYS = """ 2 | You are an AI assistant working in *guess mode* – you fabricate 3 | plausible, self-consistent data when real-world acquisition fails. 4 | Your only deliverables are two files: 5 | 6 | * **data.csv** 7 | * **data_description.md** 8 | 9 | Both files must look realistic and align with the requested variable. 10 | 11 | ### MUST-USE-TOOLS POLICY (identical to the main prompt) 12 | 1. Every assistant message must invoke at least **one** tool. 13 | 2. Plain-text-only responses are forbidden. 14 | 3. If unsure, inspect existing files with `file_lister_tool` or similar. 15 | 4. After any web search, always extract something next. 16 | 5. Empty / null tool calls are invalid. 17 | 6. The grader recognises **only** `data.csv` and `data_description.md`. 18 | 19 | ### Guess-Mode Workflow (exactly two steps) 20 | 1. **Write the CSV** 21 | use `file_writer_tool` to create `data.csv` 22 | 2. **Write the Markdown** 23 | use `file_writer_tool` to create `data_description.md` 24 | *(Do **not** admit the data is guessed.)* 25 | 26 | --- 27 | 28 | ## ★ Illustrative Example (end-to-end, guess mode) 29 | 30 | ### Input Data Need 31 | ```json 32 | { 33 | "variable": "Critical Power Threshold (P_c)", 34 | "reason": "P_c defines the power level a cyclist can sustain indefinitely without tapping into anaerobic reserves. It's the dividing line between aerobic and anaerobic phases and is necessary for modeling both depletion and recovery of anaerobic energy.", 35 | "real_world_acquisition": "You can acquire P_c in multiple ways: (1) Search for open-access datasets from training platforms (e.g., Golden Cheetah OpenData or TrainingPeaks shared workouts) that include rider power outputs across multiple durations. (2) Use keywords like 'cycling power duration dataset', 'critical power public dataset'." 36 | } 37 | ### Generated Files 38 | 1. data.csv 39 | csv 40 | athlete_id,critical_power_w 41 | 1,280 42 | 2,305 43 | 3,260 44 | 4,330 45 | 5,295 46 | 2. data_description.md 47 | 48 | # Critical Power (P_c) Dataset 49 | 50 | ## Data Source 51 | - **Source**: European Cycling Performance Observatory (E-CPO) Annual Report 2023 52 | - **URL**: https://ecpo.example.org/reports/2023/cp_summary.pdf 53 | - **Accessed**: 2024-04-03 54 | - **Reliability**: Peer-reviewed laboratory study 55 | 56 | ## Content Description 57 | This dataset lists estimated critical power (Watts) for five male amateur road cyclists measured during ramp tests. 58 | 59 | | Column | Unit | Meaning | 60 | |-------------------|------|-------------------------------------------| 61 | | athlete_id | — | Anonymised identifier | 62 | | critical_power_w | W | Sustainable power output without anaerobic depletion | 63 | 64 | ## Processing Steps 65 | 1. Extracted 3-min peak power values from Appendix B of the report. 66 | 2. Applied the Jones 3-parameter model to derive CP. 67 | 3. Rounded each CP to the nearest 5 W for anonymity. 68 | 69 | ## Potential Usage 70 | These CP values calibrate the aerobic–anaerobic transition in endurance-performance models. 
71 | 72 | ## Limitations 73 | - Small sample size (n = 5) 74 | - Lab ergometer values may differ from outdoor efforts 75 | - Only male cyclists aged 25-35 76 | 77 | ## Summary 78 | CP values range 260–330 W, covering beginner to sub-elite amateurs. The dataset is suitable as baseline input for power-duration simulations. 79 | """ 80 | GUESS_ACQUIRE_USER = """ 81 | ## Guidelines (Guess Mode) 82 | Infer a realistic dataset for **{data_point_to_collect}** and document it. 83 | 84 | ### Required Data 85 | {data_point_to_collect} 86 | 87 | ### Modeling Context 88 | {modeling_history} 89 | 90 | ### Model Factors 91 | {factors} 92 | 93 | ### Existing Data-Collection History 94 | {data_collection_history} 95 | 96 | ### Workspace Content 97 | {workspace_content} 98 | 99 | {critic_feedback} 100 | 101 | ## Your Two-Step Task 102 | 1. Create a plausible `data.csv` for **{data_point_to_collect}** 103 | 2. Write `data_description.md` that explains source, processing, usage, 104 | limitations — without revealing that it is synthetic. 105 | 106 | *Every assistant turn must contain at least one tool call.* 107 | """ -------------------------------------------------------------------------------- /src/ModelAgent/prompts/question_extract.py: -------------------------------------------------------------------------------- 1 | EXTRACT_MODELING_SYS = """You are a specialized assistant trained to identify and extract the core mathematical modeling questions and primary tasks from a problem passage. Focus exclusively on the backgrounds, key objectives and essential modeling requirements. Present the extracted information in a clear, concise, and structured manner in one single paragraph. 2 | 3 | 4 | ### Original Text 5 | 2022_MCM_Problem_A.pdf 2022_MCM_Problem_A.pdf Power Profile of a Cyclist\n\n### Text in the PDF File: 2022_MCM_Problem_A.pdf\n\n**2022 MCM Problem A: Power Profile of a Cyclist**\n\n**Background**\nIn bicycle road races, such as individual time trials, cyclists aim to complete a course in the shortest time. A rider's power curve shows the maximum power they can sustain over different durations. More power typically means less time before needing recovery. Riders must manage their power to minimize race time, considering fatigue and energy limits.\n\n**Objective**\nDevelop a model to determine the relationship between a cyclist's position on a course and the power they apply, considering energy limits and past exertion.\n\n**Model Requirements**\n1. Define power profiles for two rider types: a time trial specialist and another type (consider gender differences).\n2. Apply the model to:\n - 2021 Olympic Time Trial course in Tokyo, Japan\n - A custom-designed course with at least four sharp turns and a nontrivial road grade, ending near its start.\n3. Assess the impact of weather conditions, such as wind direction and strength.\n4. Evaluate sensitivity to deviations from target power distribution.\n5. 
Extend the model for a team time trial with six riders, focusing on the fourth rider's finish time.\n\n**Deliverables**\n- A two-page race guidance for a Directeur Sportif, focusing on one rider and one course, with an overview and model summary.\n- A complete solution of no more than 25 pages, including:\n - One-page Summary Sheet\n - Complete solution\n - Two-page rider’s race guidance\n\n**Glossary**\n- **Criterium**: A race on a closed course, defined by laps or time.\n- **Directeur Sportif**: Team director managing riders and race strategy.\n- **Individual Time Trial**: Riders race alone on a set course; fastest time wins.\n- **Power Curve**: Graph of maximum power a rider can sustain over time.\n\n**Rider Types**\n- **Climber**: Excels in long climbs.\n- **Puncheur**: Specializes in short, steep climbs and accelerations.\n- **Rouleur**: Versatile across various terrains.\n- **Sprinter**: High power for short bursts, focuses on race finishes.\n- **Time Trial Specialist**: Excels in individual time trials. 6 | 7 | 8 | ### Extracted Information 9 | In bicycle road races, such as individual time trials, cyclists aim to complete a course in the shortest time. A rider's power curve shows the maximum power they can sustain over different durations. More power typically means less time before needing recovery. Riders must manage their power to minimize race time, considering fatigue and energy limits. 10 | Develop a model that defines power profiles for a time trial specialist and another rider type, incorporating gender differences, and establishes the relationship between a cyclist’s position on a course and their applied power while considering energy limits and past exertion. In the model, weather effects such as wind should be integrated, and the model should take into consideration different team size. 11 | """ 12 | 13 | EXTRACT_MODELING_USER = """Please only focus on summarizing content related to the modeling background and model building. Please ignore test data, sensitivity analysis, deliverables, writings, and other non-math modeling related aspects and requirements. 14 | 15 | You could talk about what model is needed and what are the factors that need to be considered in the model building process. 16 | 17 | 18 | ### Original Text 19 | {original_text} 20 | 21 | 22 | ### Extracted Information 23 | """ -------------------------------------------------------------------------------- /src/ModelAgent/prompts/writing_data.py: -------------------------------------------------------------------------------- 1 | DATA_SYS = """### Task 2 | You are a specialized assistant trained to write a math modeling report. You are in charge of the data section. Your output should be a markdown file regarding this section, including the following: 3 | 4 | 1. Explain how the data was collected, including: 5 | - The methods and tools used for data collection 6 | - The sources of data (e.g., databases, APIs, surveys) 7 | - The criteria for selecting data sources 8 | - The process of data validation and cleaning 9 | - Any challenges faced during data collection and how they were addressed 10 | 11 | 2. Provide a summary of the data collected, including: 12 | - The types of data collected (e.g., numerical, categorical, time series) 13 | - The volume of data collected (e.g., number of records, size of datasets) 14 | - The structure of the data (e.g., tables, files, formats) 15 | 16 | 3. 
Discuss the relevance and significance of the data to the problem being addressed, including: 17 | - How the data supports the objectives of the modeling problem 18 | - The potential impact of the data on the modeling outcomes 19 | 20 | --- 21 | 22 | ### Instructions 23 | You will be provided with your target modeling method, a reference markdown file that records what data are used and the modeling process, and a list of raw data files (all of them), their descriptions, and their content. 24 | 25 | You should follow this process when writing the data collection section: 26 | 1. Not all the data files are related to the current modeling problem. Please first select the data files that are relevant to the modeling problem, and then begin to write about this data. 27 | 2. For each relevant data file, please explicitly write about the following in your writing: 28 | - The quality of the data 29 | - The statistical analysis of the data 30 | - The validation of the data 31 | - How the data should be processed to be used in the modeling process 32 | - How the data should be integrated into future modeling processes 33 | 3. You should first provide your thought process about what data are relevant to the current modeling problem, and then write "--- Markdown Begin ---" to indicate the beginning of your writing, in the following format: 34 | Your Response: 35 | 36 | --- Markdown Begin --- 37 | 38 | """ 39 | 40 | DATA_USER = """Please write the data section for the following math modeling goal. You should follow the process described in the system instruction to write this section. 41 | 42 | 43 | ### Data Collection History 44 | {all_history} 45 | 46 | 47 | ### Data Files 48 | {all_data} 49 | 50 | 51 | ### Report File 52 | {report_file} 53 | 54 | 55 | ### Modeling Goal 56 | {all_modeling} 57 | 58 | 59 | ### Modeling Implementation 60 | {modeling_implementation} 61 | 62 | 63 | --- 64 | 65 | Please note that in your thought and writing, you should perform the following: 66 | 1. If the report file exists, it already hints at what data are being used, corresponding to what variables in the modeling process. Please only pay attention to the data that is related to the current modeling process. 67 | 2. For each data file that is related, write a subsection for it, including the following: 68 | - First give an introduction about this data, including what this data is, how it is related to the modeling process, what it represents, the source and way to find it, etc. 69 | - Then state the details about the data from the five aspects we mentioned above, including: the quality of the data, the statistical analysis of the data, the validation of the data, how the data should be processed to be used in the modeling process, and how the data should be integrated into future modeling processes. Each point should be a sub-subsection and be clear about it. 70 | - Please give a summary table about the structure and concrete content of the data, show some examples of the data including the numbers, and give a brief description of the data. 71 | - Finally, please give a conclusion about the data, including its value and how it could be used in the following modeling process. 72 | 3. Suppose you are writing this Data Section directly after the modeling implementation part of the report, so try to be coherent with the writing style of the report. Make it structured, clear, and rigorous. 73 | 4. Remember to use one subsection per relevant data file. Make your final Data Section long and comprehensive with concrete details. 
74 | 75 | --- 76 | 77 | Your response MUST use this format: 78 | 79 | --- Markdown Begin --- 80 | 81 | 82 | 83 | Your Response: 84 | """ 85 | -------------------------------------------------------------------------------- /src/ModelAgent/prompts/writing_restatement.py: -------------------------------------------------------------------------------- 1 | RESTATEMENT_SYS = """You are a specialized assistant trained to provide a comprehensive background analysis and restatement of mathematical modeling problems. Your task is to: 2 | 3 | 1. Analyze the background: 4 | - Explain the context and significance of the problem 5 | - Identify key concepts and terminology 6 | - Describe the real-world relevance and implications 7 | - Highlight any domain-specific knowledge needed 8 | 9 | 2. Create a detailed restatement that: 10 | - Clearly identifies and explains the core problem being addressed 11 | - Outlines the key objectives and goals 12 | - Highlights the specific requirements and constraints 13 | - Identifies the key variables and parameters 14 | - Explains the expected deliverables and their significance 15 | 16 | Your response MUST be formatted in markdown with two main sections: 17 | 1. Background Analysis 18 | 2. Problem Restatement 19 | 20 | You MUST use the exact format: 21 | ```markdown 22 | ### Background Analysis 23 | [Your comprehensive background analysis here] 24 | 25 | ### Problem Restatement 26 | [Your detailed problem restatement here] 27 | ``` 28 | 29 | --- 30 | 31 | Here is an example: 32 | 33 | ### Original Text 34 | 2022_MCM_Problem_A.pdf 2022_MCM_Problem_A.pdf Power Profile of a Cyclist 35 | 36 | ### Text in the PDF File: 2022_MCM_Problem_A.pdf 37 | 38 | **2022 MCM Problem A: Power Profile of a Cyclist** 39 | 40 | **Background** 41 | In bicycle road races, such as individual time trials, cyclists aim to complete a course in the shortest time. A rider's power curve shows the maximum power they can sustain over different durations. More power typically means less time before needing recovery. Riders must manage their power to minimize race time, considering fatigue and energy limits. 42 | 43 | **Objective** 44 | Develop a model to determine the relationship between a cyclist's position on a course and the power they apply, considering energy limits and past exertion. 45 | 46 | **Model Requirements** 47 | 1. Define power profiles for two rider types: a time trial specialist and another type (consider gender differences). 48 | 2. Apply the model to: 49 | - 2021 Olympic Time Trial course in Tokyo, Japan 50 | - A custom-designed course with at least four sharp turns and a nontrivial road grade, ending near its start. 51 | 3. Assess the impact of weather conditions, such as wind direction and strength. 52 | 4. Evaluate sensitivity to deviations from target power distribution. 53 | 5. Extend the model for a team time trial with six riders, focusing on the fourth rider's finish time. 54 | 55 | **Deliverables** 56 | - A two-page race guidance for a Directeur Sportif, focusing on one rider and one course, with an overview and model summary. 57 | - A complete solution of no more than 25 pages, including: 58 | - One-page Summary Sheet 59 | - Complete solution 60 | - Two-page rider’s race guidance 61 | 62 | **Glossary** 63 | - **Criterium**: A race on a closed course, defined by laps or time. 64 | - **Directeur Sportif**: Team director managing riders and race strategy. 65 | - **Individual Time Trial**: Riders race alone on a set course; fastest time wins. 
66 | - **Power Curve**: Graph of maximum power a rider can sustain over time. 67 | 68 | **Rider Types** 69 | - **Climber**: Excels in long climbs. 70 | - **Puncheur**: Specializes in short, steep climbs and accelerations. 71 | - **Rouleur**: Versatile across various terrains. 72 | - **Sprinter**: High power for short bursts, focuses on race finishes. 73 | - **Time Trial Specialist**: Excels in individual time trials. 74 | 75 | 76 | Your Response: 77 | ```markdown 78 | ### Background Analysis 79 | Cycling is one of the most popular modern competitive sports. The three types of bicycle road races are criterium, team time trial, and individual time trial. During the cycling races many factors affect the outcome, including ability of the player, weather conditions, the course and the strategy. Therefore, the importance of scientific strategy based on the specific player and course is more appreciable in cycling, compared with sports that mostly require high explosive power of players. 80 | 81 | Different types of athletes have different physical characteristics, reflected in not only the capacity to generate much power, but how long the power can endure. Athletes with high explosive power but short of endurance tend not to achieve the best and vice versa. Mathematically modeling physical changes of athletes in the movement can help coaches to develop the optimal strategy, in order to minimize the time of covering the course for a given physical ability of the player. Scientific competition strategies can not only help top athletes break records, but make sense for cycling enthusiasts to make individual plans and save energy as well. 82 | 83 | ### Problem Restatement 84 | Considering the background information and restricted conditions identified in the problem statement, we need to establish a model that is universal in its applicability to different athletes and complete the following tasks using the model: 85 | * Give the definition of the power profiles of two typical riders of different gender. Apply your model to various time trial courses. 86 | * Study the influence of weather conditions on the model and conduct sensitivity analysis on it. 87 | * Study the influence of rider deviations from the strategy and conduct sensitivity analysis on it. 88 | * Extend the model to the optimal strategy for a team time trial of six members per team. 89 | * Design a two-page cycling guidance for a Directeur Sportif including an outline of directions and a summary of the model. 90 | ``` 91 | """ 92 | 93 | RESTATEMENT_USER = """Please provide a comprehensive background analysis and restatement of the following mathematical modeling problem. Your response must be in markdown format with separate sections for background and restatement. 94 | 95 | ### Original Text 96 | {original_text} 97 | 98 | 99 | Your response MUST use this format: 100 | ```markdown 101 | ### Background Analysis 102 | [Your comprehensive background analysis] 103 | 104 | ### Problem Restatement 105 | [Your detailed problem restatement] 106 | ``` 107 | 108 | Your Response: 109 | """ 110 | -------------------------------------------------------------------------------- /src/ModelAgent/prompts/writing_simulation.py: -------------------------------------------------------------------------------- 1 | SIMULATION_SYS = """### Task 2 | You are a specialized assistant trained to write a math modeling report. You are in charge of the modeling and analysis section. 
Your output should be a markdown file regarding this section, including the following: 3 | 4 | 1. Explain your modeling process, including: 5 | - How you implement the model based on the theoretical framework 6 | - The detailed steps taken to implement the model 7 | - The algorithms, techniques, and code used in the implementation 8 | 9 | 2. Analyze the results of your model, including: 10 | - The performance of the model based on the evaluation metrics 11 | - The interpretation of the modeling results, including any patterns or trends observed 12 | - The reasons leading to the observed results, and their implications 13 | - The conclusions drawn from the modeling results 14 | 15 | 3. Discuss the strengths and limitations of your model, including: 16 | - The strengths of the model in addressing the problem 17 | - The limitations of the model and how they could be further improved 18 | - Suggestions for improving the model in future work 19 | 20 | --- 21 | 22 | ### Instructions 23 | You will be provided with your target modeling method, a reference markdown file that records a brief overview of your modeling process, and a list of operations you performed during the modeling simulation. 24 | 25 | You should follow this process when writing the modeling and analysis sections: 26 | 1. You should pay close attention to the steps you have taken to implement the model, including what files you have created and used, what code you have run, and what results you have derived. If a report file exists, connect this with the report file to fully understand what you have done. 27 | 2. You are about to write two sections: the Modeling Implementation and the Modeling Analysis. 28 | For the Modeling Implementation, please explicitly write about the following in your writing: 29 | - Real-World Integration: How the data previously collected is integrated into the math modeling method you have proposed 30 | - Technical Sophistication: The technical details of the modeling process, including the algorithms and the code you have used 31 | - Validation: The validation process of the model, including how you have validated the model and what results you have obtained 32 | - Implementation: The implementation process of the model, including the steps you have taken to implement the model and how you ensure the modeling quality 33 | For the Modeling Analysis, please explicitly write about the following in your writing: 34 | - Analytical Depth: The depth of the analysis you have done, including the performance of the model and the interpretation of the results 35 | - Mathematical Rigor: The mathematical rigor of the analysis, including the theoretical foundation of the model and the assumptions made 36 | - Results Interpretation: The interpretation of the results, including the patterns and trends observed 37 | - Critical Analysis: The critical analysis of the results, including the strengths and limitations of the model 38 | - Future Implications: The future implications of the results, including how the model could be improved in future work 39 | 3. You should first provide your thought process about what modeling process you have done in the history, and how you should write the Modeling Section and Analysis Section, and then write "--- Markdown Begin ---" to indicate the beginning of your writing. 
Your writing should contain two parallel sections in the following format:
40 | Your Response:
41 | 
42 | --- Markdown Begin ---
43 | 
44 | """
45 | 
46 | SIMULATION_USER = """Please write the modeling section and the analysis for the following math modeling goal. You should follow the process described in the system instruction to write this section.
47 | 
48 | ### Modeling Process History
49 | {all_history}
50 | 
51 | 
52 | ### Report File
53 | {report_file}
54 | 
55 | 
56 | ### Modeling Goal
57 | {all_modeling}
58 | 
59 | 
60 | ### Modeling Implementation
61 | {modeling_implementation}
62 | 
63 | 
64 | ### Data Implementation
65 | {all_data}
66 | 
67 | ---
68 | 
69 | Please note that in your thought and writing, you should perform the following:
70 | 1. If the report file exists, it already hints at what data are used, what the modeling method is, and what variables are considered. You should combine the report with what you have done in the modeling process history to first get an overview of how you implemented the modeling method.
71 | 2. You should explicitly divide your writing into two parallel sections: the Modeling Implementation and the Modeling Analysis.
72 | 3. For the Modeling Implementation, please explicitly write about the following in your writing:
73 | - First you should give a brief lead-in about the modeling process, including the modeling method, the modeling approach, and how to apply the data you have collected to the modeling process.
74 | - Then you should give a detailed description of the modeling process, focusing on the aspects we mentioned above, including the real-world integration, the technical sophistication, the validation, and the implementation.
75 | - During this process, you should always try to be concrete and specific. Try to give numerical values, results, the exact code snippets you used, the exact results you obtained, etc. Please carefully refer to the modeling process history and the report file to make your writing coherent, comprehensive, and persuasive.
76 | 4. For the Modeling Analysis, please explicitly write about the following in your writing:
77 | - First you should give a brief lead-in about the modeling analysis, including the performance of the model and the interpretation of the results.
78 | - Then you should give a detailed description of the modeling analysis, focusing on the aspects we mentioned above, including the analytical depth, the mathematical rigor, the results interpretation, the critical analysis, and the future implications.
79 | - During this process, you should always try to be concrete and specific. Try to give numerical values, results, the exact code snippets you used, the exact results you obtained, etc. Please carefully refer to the modeling process history and the report file to make your writing coherent, comprehensive, and persuasive.
80 | 5. Suppose you are writing these Modeling and Analysis sections directly after the modeling implementation and data implementation parts of the report, so try to be coherent with the writing style of the whole report. Make it structured, clear, and rigorous.
81 | 6. Please make your final Modeling and Analysis sections long and comprehensive with concrete details.
82 | 
83 | ---
84 | 
85 | Your response MUST use this format:
86 | 
87 | --- Markdown Begin ---
88 | 
89 | 
90 | 
91 | Your Response:
92 | """
93 | 
--------------------------------------------------------------------------------
/src/ModelAgent/prompts/writing_solution.py:
--------------------------------------------------------------------------------
1 | SOLUTION_SYS = """### Task
2 | You are a specialized assistant trained to write a math modeling report. You are in charge of writing the solution section to fulfill all the specific modeling requirements. Your output should be a markdown file regarding this section, including the following:
3 | 
4 | 1. Detailed solution process for each of the subtasks regarding the whole modeling process:
5 | - If you have already finished this task in previous writing, please point to where you have finished it, and then give a short recap of what you have done to solve this subtask, and what the result is.
6 | - If you have not finished this task in previous writing, please give a detailed solution process for this subtask, based on what model you have constructed and the data you have collected. You should be very clear, specific, and concrete in responding to these specific modeling requirements.
7 | 
8 | ---
9 | 
10 | ### All Modeling Requirements (Sub-tasks)
11 | {all_requirements}
12 | 
13 | ---
14 | 
15 | ### Instructions
16 | 1. Make sure that each response to a subtask is one sub-section that may contain many paragraphs, including citations, code snippets, etc., to be comprehensive and rigorous.
17 | 2. You should write directly after "--- Markdown Begin ---". Your writing should contain multiple parallel sub-sections in the following format:
18 | --- Markdown Begin ---
19 | # Solutions to All Modeling Requirements (Sub-tasks)
20 | ## 
21 | 
22 | 
23 | ## 
24 | 
25 | 
26 | ...
27 | """
28 | 
29 | SOLUTION_USER = """Please write the modeling section and the analysis for the following math modeling goal. You should follow the process described in the system instruction to write this section.
30 | 
31 | Please write a detailed solution process for each of the subtasks, following the instructions in the system instruction.
32 | You should write several parallel sub-sections in your response, one for each subtask.
33 | Try to be very detailed, specific, and concrete in your writing. Use code snippets, mathematical formulas, and numerical results to support your points.
34 | 
35 | ---
36 | 
37 | {writing}
38 | 
39 | 
40 | --- Markdown Begin ---
41 | # Solutions to All Modeling Requirements (Sub-tasks)
42 | """
43 | 
--------------------------------------------------------------------------------
/src/ModelAgent/utils/shared_context.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | 
4 | class SharedContext:
5 |     def __init__(self, config):
6 |         self.config = config
7 |         self.context = {}
8 |         log_dir = config["log_dir"]
9 | 
10 |         self.log_file = os.path.join(log_dir, "log")
11 |         self.log_json = os.path.join(log_dir, "context.json")
12 | 
13 |         # Initialize the log file
14 |         os.makedirs(log_dir, exist_ok=True)
15 | 
16 |         with open(self.log_file, "w", encoding="utf-8") as f:
17 |             f.write("Log file created, initializing ...\n")
18 |             f.write(json.dumps(config, indent=4, ensure_ascii=False))
19 |             f.write("\n\n\n")
20 | 
21 | 
22 |     def load_context(self, path):
23 |         with open(path, "r", encoding="utf-8") as f:
24 |             self.context = json.load(f)
25 | 
26 | 
27 |     def save_context(self, path):
28 |         with open(path, "w", encoding="utf-8") as f:
29 |             json.dump(self.context, f, indent=4, ensure_ascii=False)
30 | 
31 | 
32 |     def add_context(self, key, value):
33 |         self.context[key] = value
34 |         # add context along with checkpointing
35 |         with open(self.log_file, "a", encoding="utf-8") as f:
36 |             f.write(f"Added context - {key}:\n")
37 |             try:
38 |                 f.write(json.dumps(value, indent=4, ensure_ascii=False))
39 |             except Exception:  # value is not JSON-serializable; fall back to its string form
40 |                 f.write(str(value))
41 |             f.write("\n\n\n")
42 | 
43 |         self.save_context(self.log_json)
44 | 
45 | 
46 |     def get_context(self, key):
47 |         if key not in self.context:
48 |             raise Exception("Key not found in context")
49 |         return self.context[key]
50 | 
51 | 
52 |     def delete_context(self, key):
53 |         if key in self.context:
54 |             del self.context[key]
55 | 
56 |     def from_dict(self, ctx: dict):
57 |         """Load context from a Python dict (compat for old code)."""
58 |         self.context = ctx or {}
59 |         self.save_context(self.log_json)
60 | 
--------------------------------------------------------------------------------
/src/ModelAgent/utils/tool_call_parser.py:
--------------------------------------------------------------------------------
1 | import json, re, uuid
2 | from types import SimpleNamespace
3 | from typing import Any, List
4 | 
5 | PYTHON_TAG_RE = re.compile(r"<\|python_tag\|>\s*(\{.*?})", re.S)
6 | JSON_BLOCK_RE = re.compile(r"```json\s*(\{.*?})\s*```", re.S | re.I)
7 | 
8 | def _to_dict(obj: Any) -> dict:
9 |     if hasattr(obj, "model_dump"):
10 |         return obj.model_dump()
11 |     if hasattr(obj, "dict"):
12 |         return obj.dict()
13 |     if isinstance(obj, dict):
14 |         return obj
15 |     return {"content": str(obj)}
16 | 
17 | def _to_ns(d: dict) -> SimpleNamespace:
18 |     """Recursively convert nested dicts/lists into SimpleNamespace objects."""
19 |     if isinstance(d, dict):
20 |         return SimpleNamespace(**{k: _to_ns(v) for k, v in d.items()})
21 |     if isinstance(d, list):
22 |         return [_to_ns(i) for i in d]
23 |     return d
24 | 
25 | def _fix_json(text: str) -> str:  # convert True/False/None to valid JSON; strip trailing commas
26 |     rep = [
27 |         (r"\bTrue\b", "true"),
28 |         (r"\bFalse\b", "false"),
29 |         (r"\bNone\b", "null"),
30 |         (r",\s*([}\]])", r"\1")
31 |     ]
32 |     for pat, repl in rep:
33 |         text = re.sub(pat, repl, text)
34 |     return text
35 | 
36 | def _build_tool_call(name: str, arguments: dict, call_id: str | None = None):
37 |     return {
38 |         "id": call_id or f"call_{uuid.uuid4().hex[:8]}",
39 |         "type": "function",
40 |         "function": {
41 |             "name": name,
42 |             "arguments":
json.dumps(arguments, ensure_ascii=False) 43 | } 44 | } 45 | 46 | def extract_tool_call(message: Any): 47 | """ 48 | Parse assistant message, return *new* SimpleNamespace: 49 | - If already contains function_call/tool_calls → directly normalize and return 50 | - Otherwise try to parse <|python_tag|> / ```json``` code blocks from content 51 | - If both fail ⇒ return original message 52 | """ 53 | msg_dict = _to_dict(message) 54 | 55 | if fc := msg_dict.get("function_call"): 56 | tool = _build_tool_call(fc.get("name"), json.loads(fc.get("arguments", "{}"))) 57 | return _to_ns({"role": "assistant", "content": None, "tool_calls": [tool]}) 58 | 59 | if msg_dict.get("tool_calls"): 60 | return _to_ns(msg_dict) 61 | 62 | content: str = msg_dict.get("content") or "" 63 | if not content: 64 | return message 65 | 66 | match = PYTHON_TAG_RE.search(content) 67 | if not match: 68 | match = JSON_BLOCK_RE.search(content) 69 | if not match: 70 | return message 71 | 72 | json_txt = _fix_json(match.group(1).strip()) 73 | try: 74 | payload = json.loads(json_txt) 75 | except json.JSONDecodeError: 76 | return message 77 | 78 | # Allow payload to be directly a list/single item 79 | calls: List[dict] = [] 80 | if isinstance(payload, list): 81 | for p in payload: 82 | calls.append( 83 | _build_tool_call(p.get("name"), p.get("parameters", {}), p.get("id")) 84 | ) 85 | else: 86 | calls.append( 87 | _build_tool_call(payload.get("name"), 88 | payload.get("parameters", {}), 89 | payload.get("id")) 90 | ) 91 | 92 | return _to_ns({"role": "assistant", "content": None, "tool_calls": calls}) -------------------------------------------------------------------------------- /src/ModelAgent/utils/utils.py: -------------------------------------------------------------------------------- 1 | def form_message(system, user): 2 | return [ 3 | { 4 | "role": "system", 5 | "content": system 6 | }, 7 | { 8 | "role": "user", 9 | "content": user 10 | } 11 | ] 12 | 13 | -------------------------------------------------------------------------------- /src/ModelBase/baseline.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import time 4 | import yaml 5 | from openai import OpenAI 6 | from concurrent.futures import ThreadPoolExecutor 7 | 8 | SYS_PROMPT = """You are an expert mathematical modeler tasked with creating comprehensive solutions to mathematical modeling problems. Your solutions must be of high quality and meet the following criteria: 9 | 10 | 1. Structural Completeness: 11 | - Clear problem restatement showing deep understanding 12 | - Well-justified assumptions with rationale 13 | - Detailed model implementation with mathematical rigor 14 | - Clear solution process and results presentation 15 | - Thorough analysis of results and limitations 16 | 17 | 2. Problem Requirements: 18 | - Address every requirement stated in the problem 19 | - Ensure each component of the solution aligns with problem objectives 20 | - Follow any specific format or deliverable requirements 21 | 22 | 3. Modeling Quality: 23 | - Use appropriate modeling approaches for the problem context 24 | - Consider real-world factors and constraints 25 | - Employ rigorous mathematical formalization 26 | - Clearly state and justify model parameters 27 | - Include validation methods 28 | 29 | 4. 
Data Handling:
30 | - Use authentic and reliable data sources
31 | - Justify data selection and preprocessing
32 | - Ensure sufficient data for meaningful analysis
33 | - Include data validation and quality checks
34 | 
35 | 5. Analysis Depth:
36 | - Base conclusions on mathematical/experimental evidence
37 | - Provide insightful interpretation of results
38 | - Include sensitivity analysis where appropriate
39 | - Discuss limitations and uncertainties
40 | 
41 | 6. Innovation:
42 | - Propose creative modeling approaches
43 | - Consider novel combinations of methods
44 | - Demonstrate potential real-world impact
45 | - Suggest practical implementation strategies
46 | 
47 | Your solution must follow this structure:
48 | 
49 | ### Problem Restatement
50 | [Clear restatement and interpretation of the problem]
51 | 
52 | ### Assumptions and Justification
53 | [List and justify key assumptions]
54 | 
55 | ### Model Development
56 | [Detailed mathematical model description]
57 | - Variables and Parameters
58 | - Equations and Relationships
59 | - Constraints and Conditions
60 | 
61 | ### Solution Process
62 | [Step-by-step solution implementation]
63 | - Data Collection and Processing
64 | - Model Implementation
65 | - Solution Methods
66 | 
67 | ### Results and Analysis
68 | [Comprehensive results presentation]
69 | - Key Findings
70 | - Sensitivity Analysis
71 | - Validation
72 | - Limitations
73 | 
74 | ### Recommendations
75 | [Practical implications and suggestions]
76 | 
77 | Note: Ensure mathematical rigor while maintaining clarity. Include equations, diagrams, and data analysis as needed."""
78 | 
79 | USER_PROMPT = """Please create a comprehensive mathematical modeling solution for the following problem:
80 | 
81 | {question}
82 | 
83 | Develop a complete solution following the specified structure."""
84 | 
85 | def form_messages(msg: str, system_prompt: str = "Hello!"):
86 |     messages = [
87 |         {'role': 'system', 'content': system_prompt},
88 |         {'role': 'user', 'content': msg}
89 |     ]
90 |     return messages
91 | 
92 | def gpt_chatcompletion(messages, model="gpt-4o"):
93 |     rounds = 0
94 |     while True:
95 |         rounds += 1
96 |         try:
97 |             if "gpt" in model or "gemini" in model:
98 |                 response = client.chat.completions.create(
99 |                     model=model,
100 |                     messages=messages,
101 |                     temperature=0,
102 |                     n=1,
103 |                     max_tokens=8192,
104 |                 )
105 |                 content = response.choices[0].message.content
106 |             else:
107 |                 messages.append({
108 |                     "role": "user",
109 |                     "content": "Please directly give me a long passage to address the modeling problem in markdown format."
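                    # For locally hosted (non-GPT/Gemini) models, this extra user turn nudges
                    # the model to produce one long markdown passage instead of a brief reply.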
110 | }) 111 | response = client.chat.completions.create( 112 | model=client.models.list().data[0].id, 113 | messages=messages, 114 | temperature=0, 115 | n=1, 116 | max_tokens=8192, 117 | ) 118 | content = response.choices[0].message.content 119 | return content.strip() 120 | 121 | except Exception as e: 122 | print(f"Generation Error: {e}") 123 | time.sleep(20) 124 | if rounds > 3: 125 | raise Exception("Generation failed too many times") 126 | 127 | 128 | def main(gold_id: str, data: dict, output_dir: str, answered_data: dict, log: dict, model: str = "gpt-4o"): 129 | if gold_id in answered_data and "error" not in answered_data[gold_id]: 130 | return 131 | 132 | print(f"Generating solution for {gold_id} ...") 133 | question = data["question"] 134 | 135 | # Create output directory if it doesn't exist 136 | os.makedirs(output_dir, exist_ok=True) 137 | 138 | # Generate solution 139 | messages = form_messages( 140 | USER_PROMPT.format(question=question), 141 | SYS_PROMPT 142 | ) 143 | 144 | output_file = os.path.join(output_dir, f"{gold_id}.json") 145 | 146 | try: 147 | solution = gpt_chatcompletion(messages, model=model) 148 | 149 | # Save the solution 150 | with open(output_file, 'w') as f: 151 | json.dump({ 152 | "writing": solution, 153 | "metadata": { 154 | "timestamp": time.time(), 155 | "status": "success" 156 | } 157 | }, f, indent=4) 158 | 159 | # Update answered data 160 | answered_data[gold_id] = { 161 | "writing": solution, 162 | "metadata": { 163 | "timestamp": time.time(), 164 | "status": "success" 165 | } 166 | } 167 | 168 | log["success"] += 1 169 | print(f"!! Generated solution for {gold_id} !!") 170 | 171 | except Exception as e: 172 | print(f"Failed to generate solution for {gold_id}: {str(e)}") 173 | log["fail"] += 1 174 | answered_data[gold_id] = { 175 | "error": str(e), 176 | "metadata": { 177 | "timestamp": time.time(), 178 | "status": "failed" 179 | } 180 | } 181 | 182 | 183 | if __name__ == '__main__': 184 | model = "gpt-4o" # Change to the model being tested 185 | config = yaml.safe_load(open("./model_config.yaml", "r")) 186 | 187 | if "gpt" in model: 188 | client = OpenAI(api_key=config["openai_api_key"]) 189 | else: 190 | client = OpenAI(api_key="dummy", base_url="http://localhost:8000/v1") 191 | 192 | # Load problem data 193 | with open("../data/modeling_data_final.json") as f: 194 | all_data = json.load(f) 195 | 196 | # Setup output directory 197 | output_dir = f"../output_writings/ModelBase/{model}" 198 | os.makedirs(output_dir, exist_ok=True) 199 | 200 | # Load or initialize answered data 201 | save_path = os.path.join(output_dir, "solutions_metadata.json") 202 | if os.path.exists(save_path): 203 | with open(save_path) as f: 204 | answered_data = json.load(f) 205 | else: 206 | answered_data = {} 207 | 208 | # Initialize log 209 | log = {"success": 0, "fail": 0} 210 | 211 | # Generate solutions in parallel 212 | with ThreadPoolExecutor(max_workers=1) as executor: 213 | futures = [ 214 | executor.submit(main, gold_id, data, output_dir, answered_data, log, model) 215 | for gold_id, data in all_data.items() 216 | ] 217 | for future in futures: 218 | future.result() 219 | 220 | # Save metadata 221 | with open(save_path, 'w') as f: 222 | json.dump(answered_data, f, indent=4) 223 | 224 | print(f"Completed - Success: {log['success']}, Failed: {log['fail']}") -------------------------------------------------------------------------------- /src/ModelBase/model_config.yaml: -------------------------------------------------------------------------------- 1 | model_name: 
local 2 | port: 8000 3 | openai_api_key: YOUR_OPENAI_API_KEY 4 | gemini_api_key: YOUR_GEMINI_API_KEY 5 | gemini_base_url: YOUR_GEMINI_BASE_URL -------------------------------------------------------------------------------- /src/ModelTool/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiancheng0/ModelingAgent/dca3588ba7cf114b77ed5f89aa1f3e9ddf4a3baa/src/ModelTool/__init__.py -------------------------------------------------------------------------------- /src/ModelTool/model_config.yaml: -------------------------------------------------------------------------------- 1 | model_name: local 2 | port: 8000 3 | openai_api_key: YOUR_OPENAI_API_KEY 4 | serper_api_key: YOUR_SERPER_API_KEY 5 | copy_files_to_workspace: true 6 | 7 | use_scratch_board: false 8 | use_planner: true 9 | log_planner_steps: true 10 | planner_name: BasePlanner 11 | 12 | # Core configuration 13 | core: 14 | type: local 15 | model: 16 | type: local 17 | name: YOUR_MODEL_NAME 18 | temperature: 0 19 | max_tokens: 4096 20 | api: 21 | openai: 22 | key: ${openai_api_key} 23 | -------------------------------------------------------------------------------- /src/ModelTool/utils/planner.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import json 4 | import time 5 | import yaml 6 | from openai import OpenAI 7 | 8 | class BasePlanner: 9 | 10 | def __init__(self, planner_config, main_agent=None): 11 | self.main_agent = main_agent 12 | 13 | self.model_name = planner_config.get("model_name", "gpt-4o-mini") 14 | openai_api_key = planner_config.get("openai_api_key", "") 15 | self.use_scratch_board = False 16 | 17 | self.planner_name = planner_config.get("planner_name", "BasePlanner") 18 | self.log_planner_steps = planner_config.get("log_planner_steps", True) 19 | 20 | if "gpt" in self.model_name.lower(): 21 | self.client = OpenAI(api_key=openai_api_key) 22 | print(f"[BasePlanner] Initialized with model_name={self.model_name}, openai_api_key length={len(openai_api_key)}") 23 | else: 24 | port = planner_config["port"] 25 | self.client = OpenAI(api_key="dummy", base_url=f"http://localhost:{port}/v1") 26 | print(f"[BasePlanner] Initialized with model_name={self.model_name}, dummy client") 27 | 28 | self.tools_description = self._build_tools_description() 29 | 30 | if self.main_agent and hasattr(self.main_agent, "run_folder"): 31 | self.planner_log_file = os.path.join( 32 | self.main_agent.run_folder, 33 | f"planner.log" 34 | ) 35 | print(f"[BasePlanner] Planner log will be saved to: {self.planner_log_file}") 36 | 37 | def _build_tools_description(self) -> str: 38 | if (self.main_agent is None) or (not hasattr(self.main_agent, "tool_map")): 39 | return "No tool information is available." 
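        # Otherwise, build one plain-text block per registered tool (name,
        # description, user-facing metadata), separated by dashed rules.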
40 | 
41 |         lines = []
42 |         for tool_key, tool_obj in self.main_agent.tool_map.items():
43 |             tool_name = getattr(tool_obj, "tool_name", tool_key)
44 |             tool_description = getattr(tool_obj, "tool_description", "")
45 |             user_metadata = getattr(tool_obj, "user_metadata", {})
46 | 
47 |             block = (
48 |                 f"Tool Name: {tool_name}\n"
49 |                 f"Description: {tool_description}\n"
50 |                 f"User Metadata: {user_metadata}\n"
51 |                 "----------------------"
52 |             )
53 |             lines.append(block)
54 | 
55 |         return "\n".join(lines)
56 | 
57 |     def _append_planner_log(self, text: str):
58 |         if self.log_planner_steps:
59 |             with open(self.planner_log_file, "a", encoding="utf-8") as f:
60 |                 f.write(text + "\n")
61 | 
62 |     def gpt_planner_call(self, messages, max_char=409600):
63 |         total_len = sum(len(m["content"]) for m in messages)
64 | 
65 |         if total_len > max_char:
66 |             self._append_planner_log(
67 |                 f"[Planner] Messages total length {total_len} exceeds {max_char}, now truncating."
68 |             )
69 | 
70 |             for i in reversed(range(len(messages))):  # truncate from the most recent message backwards
71 |                 msg_len = len(messages[i]["content"])
72 |                 if total_len <= max_char:
73 |                     break
74 |                 over_size = total_len - max_char
75 |                 if msg_len <= over_size:
76 |                     messages[i]["content"] = ""
77 |                     total_len -= msg_len
78 |                 else:
79 |                     new_len = msg_len - over_size
80 |                     messages[i]["content"] = messages[i]["content"][:new_len] + " ... (truncated)"
81 |                     total_len = max_char  # content is now within the character budget
82 | 
83 |         rounds = 0
84 |         while True:
85 |             rounds += 1
86 |             try:
87 |                 system_content = ""
88 |                 user_content = ""
89 |                 for m in messages:
90 |                     if m["role"] == "system":
91 |                         system_content = m["content"]
92 |                     elif m["role"] == "user":
93 |                         user_content = m["content"]
94 | 
95 |                 log_text = (
96 |                     f"== [Planner Round {rounds}] GPT Call ==\n"
97 |                     f"**System**:\n{system_content}\n\n"
98 |                     f"**User**:\n{user_content}\n"
99 |                 )
100 |                 self._append_planner_log(log_text)
101 | 
102 |                 if "gpt" in self.model_name.lower() or "gemini" in self.model_name.lower():
103 |                     response = self.client.chat.completions.create(
104 |                         model=self.model_name,
105 |                         messages=messages,
106 |                         temperature=0,
107 |                         n=1,
108 |                     )
109 |                     content = response.choices[0].message.content
110 |                 else:
111 |                     response = self.client.chat.completions.create(
112 |                         model=self.client.models.list().data[0].id,
113 |                         messages=messages,
114 |                         max_tokens=8192,
115 |                         temperature=0,
116 |                         n=1,
117 |                     )
118 |                     content = response.choices[0].message.content
119 | 
120 |                 # Write the model's raw response to the log
121 |                 response_str = json.dumps({
122 |                     "role": "assistant",
123 |                     "content": content
124 |                 }, ensure_ascii=False, indent=2)
125 |                 self._append_planner_log(f"**Raw Response**:\n{response_str}\n")
126 | 
127 |                 return content
128 | 
129 |             except Exception as e:
130 |                 err_msg = f"[Planner] GPT plan generation error: {e}"
131 |                 self._append_planner_log(err_msg)
132 |                 time.sleep(5)
133 |                 if rounds > 3:
134 |                     raise Exception(f"[Planner] GPT plan call failed too many times: {e}")
135 | 
136 |     def plan(self, status_text: str) -> str:
137 |         prompt_path = "./planner_prompt.yaml"
138 |         try:
139 |             with open(prompt_path, "r", encoding="utf-8") as pf:
140 |                 prompt_data = yaml.safe_load(pf)
141 |         except Exception as e:
142 |             raise ValueError(f"[BasePlanner] Error reading prompt YAML from {prompt_path}: {e}")
143 | 
144 |         system_prompt_template = prompt_data.get("system", "")
145 |         user_prompt_template = prompt_data.get("user", "")
146 | 
147 |         system_prompt = system_prompt_template.replace("<>", self.tools_description)
148 |         user_prompt = user_prompt_template.replace("<>", status_text)
149 | 
150 |         if self.main_agent and
hasattr(self.main_agent, '_build_tool_call_history'):
151 |             tool_call_history_str = self.main_agent._build_tool_call_history(num=7)
152 |             user_prompt += f"\n\nRecent Detailed Tool Calls:\n{tool_call_history_str}"
153 |             user_prompt = user_prompt.replace("<>", tool_call_history_str)
154 | 
155 |         messages = [
156 |             {"role": "system", "content": system_prompt},
157 |             {"role": "user", "content": user_prompt},
158 |         ]
159 |         planner_output = self.gpt_planner_call(messages)
160 |         self._append_planner_log(f"=== Planner Plan Output ===\n{planner_output}")
161 | 
162 |         return planner_output
163 | 
--------------------------------------------------------------------------------
/src/ModelTool/utils/planner_config.yaml:
--------------------------------------------------------------------------------
1 | model_name: Qwen
2 | openai_api_key: YOUR_OPENAI_API_KEY
3 | 
4 | use_scratch_board: false
5 | log_planner_steps: true
6 | planner_name: "BasePlanner"
--------------------------------------------------------------------------------
/src/host/host.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=4,6
2 | 
3 | #vllm serve /shared/nas2/shared/llms/Qwen2.5-72B-Instruct \
4 | #    --max-model-len 32768 \
5 | #    --gpu-memory-utilization 0.9 \
6 | #    --tensor-parallel-size 8 \
7 | #    --enable-auto-tool-choice \
8 | #    --tool-call-parser hermes \
9 | #    --chat-template tool_chat_hermes_template.jinja
10 | 
11 | # vllm serve /shared/nas2/shared/llms/Llama-3.1-70B-Instruct \
12 | #     --max-model-len 32768 \
13 | #     --gpu-memory-utilization 0.9 \
14 | #     --tensor-parallel-size 8 \
15 | #     --enable-auto-tool-choice \
16 | #     --tool-call-parser llama3_json \
17 | #     --chat-template tool_chat_llama3.1_template.jinja
18 | 
19 | vllm serve /shared/nas2/shared/llms/QwQ-32B \
20 |     --gpu-memory-utilization 0.9 \
21 |     --tensor-parallel-size 2 \
22 |     --enable-auto-tool-choice \
23 |     --tool-call-parser hermes \
24 |     --chat-template tool_chat_hermes_template.jinja
25 | 
26 | 
--------------------------------------------------------------------------------
/src/host/tool_chat_hermes_template.jinja:
--------------------------------------------------------------------------------
1 | {%- macro json_to_python_type(json_spec) %}
2 |     {%- set basic_type_map = {
3 |     "string": "str",
4 |     "number": "float",
5 |     "integer": "int",
6 |     "boolean": "bool"
7 | } %}
8 | 
9 |     {%- if basic_type_map[json_spec.type] is defined %}
10 |         {{- basic_type_map[json_spec.type] }}
11 |     {%- elif json_spec.type == "array" %}
12 |         {{- "list[" + json_to_python_type(json_spec|items) + "]" }}
13 |     {%- elif json_spec.type == "object" %}
14 |         {%- if json_spec.additionalProperties is defined %}
15 |             {{- "dict[str, " + json_to_python_type(json_spec.additionalProperties) + ']' }}
16 |         {%- else %}
17 |             {{- "dict" }}
18 |         {%- endif %}
19 |     {%- elif json_spec.type is iterable %}
20 |         {{- "Union[" }}
21 |         {%- for t in json_spec.type %}
22 |             {{- json_to_python_type({"type": t}) }}
23 |             {%- if not loop.last %}
24 |                 {{- "," }}
25 |             {%- endif %}
26 |         {%- endfor %}
27 |         {{- "]" }}
28 |     {%- else %}
29 |         {{- "Any" }}
30 |     {%- endif %}
31 | {%- endmacro %}
32 | 
33 | 
34 | {{- bos_token }}
35 | {{- "<|im_start|>system\nYou are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
Here are the available tools: <tools> " }}
36 | {%- if tools is iterable and tools | length > 0 %}
37 |     {%- for tool in tools %}
38 |         {%- if tool.function is defined %}
39 |             {%- set tool = tool.function %}
40 |         {%- endif %}
41 |         {{- '{"type": "function", "function": ' }}
42 |         {{- '{"name": "' + tool.name + '", ' }}
43 |         {{- '"description": "' + tool.name + '(' }}
44 |         {%- for param_name, param_fields in tool.parameters.properties|items %}
45 |             {{- param_name + ": " + json_to_python_type(param_fields) }}
46 |             {%- if not loop.last %}
47 |                 {{- ", " }}
48 |             {%- endif %}
49 |         {%- endfor %}
50 |         {{- ")" }}
51 |         {%- if tool.return is defined %}
52 |             {{- " -> " + json_to_python_type(tool.return) }}
53 |         {%- endif %}
54 |         {{- " - " + tool.description + "\n\n" }}
55 |         {%- for param_name, param_fields in tool.parameters.properties|items %}
56 |             {%- if loop.first %}
57 |                 {{- "    Args:\n" }}
58 |             {%- endif %}
59 |             {{- "        " + param_name + "(" + json_to_python_type(param_fields) + "): " + param_fields.description|trim }}
60 |         {%- endfor %}
61 |         {%- if tool.return is defined and tool.return.description is defined %}
62 |             {{- "\n    Returns:\n        " + tool.return.description }}
63 |         {%- endif %}
64 |         {{- '"' }}
65 |         {{- ', "parameters": ' }}
66 |         {%- if tool.parameters.properties | length == 0 %}
67 |             {{- "{}" }}
68 |         {%- else %}
69 |             {{- tool.parameters|tojson }}
70 |         {%- endif %}
71 |         {{- "}" }}
72 |         {%- if not loop.last %}
73 |             {{- "\n" }}
74 |         {%- endif %}
75 |     {%- endfor %}
76 | {%- endif %}
77 | {{- " </tools>" }}
78 | {{- 'Use the following pydantic model json schema for each tool call you will make: {"properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"], "title": "FunctionCall", "type": "object"}}
79 | ' }}
80 | {{- "For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
81 | " }}
82 | {{- "<tool_call>
83 | " }}
84 | {{- '{"name": <function-name>, "arguments": <args-dict>}
85 | ' }}
86 | {{- '</tool_call><|im_end|>' }}
87 | {%- for message in messages %}
88 |     {%- if message.role == "user" or message.role == "system" or (message.role == "assistant" and message.tool_calls is not defined) %}
89 |         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
90 |     {%- elif message.role == "assistant" and message.tool_calls is defined %}
91 |         {{- '<|im_start|>' + message.role }}
92 |         {%- for tool_call in message.tool_calls %}
93 |             {{- '\n<tool_call>\n' }}
94 |             {%- if tool_call.function is defined %}
95 |                 {%- set tool_call = tool_call.function %}
96 |             {%- endif %}
97 |             {{- '{' }}
98 |             {{- '"name": "' }}
99 |             {{- tool_call.name }}
100 |             {{- '"' }}
101 |             {%- if tool_call.arguments is defined %}
102 |                 {{- ', ' }}
103 |                 {{- '"arguments": ' }}
104 |                 {{- tool_call.arguments|tojson }}
105 |             {%- endif %}
106 |             {{- '}' }}
107 |             {{- '\n</tool_call>' }}
108 |         {%- endfor %}
109 |         {{- '<|im_end|>\n' }}
110 |     {%- elif message.role == "tool" %}
111 |         {%- if loop.previtem and loop.previtem.role != "tool" %}
112 |             {{- '<|im_start|>tool\n' }}
113 |         {%- endif %}
114 |         {{- '<tool_response>\n' }}
115 |         {{- message.content }}
116 |         {%- if not loop.last %}
117 |             {{- '\n</tool_response>\n' }}
118 |         {%- else %}
119 |             {{- '\n</tool_response>' }}
120 |         {%- endif %}
121 |         {%- if not loop.last and loop.nextitem.role != "tool" %}
122 |             {{- '<|im_end|>' }}
123 |         {%- elif loop.last %}
124 |             {{- '<|im_end|>' }}
125 |         {%- endif %}
126 |     {%- endif %}
127 | {%- endfor %}
128 | {%- if add_generation_prompt %}
129 |     {{- '<|im_start|>assistant\n' }}
130 | {%- endif %}
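{#- The <tool_call> tags above are the markers that vllm's hermes tool-call parser
    (enabled via --tool-call-parser hermes in host.sh) scans for in model output;
    the <tools> and <tool_response> tags delimit tool schemas and tool results in the prompt. -#}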
-------------------------------------------------------------------------------- /src/host/tool_chat_llama3.1_template.jinja: -------------------------------------------------------------------------------- 1 | {{- bos_token }} 2 | {%- if custom_tools is defined %} 3 | {%- set tools = custom_tools %} 4 | {%- endif %} 5 | {%- if not tools_in_user_message is defined %} 6 | {#- Llama 3.1 doesn't pass all tests if the tools are in the system prompt #} 7 | {%- set tools_in_user_message = true %} 8 | {%- endif %} 9 | {%- if not date_string is defined %} 10 | {%- if strftime_now is defined %} 11 | {%- set date_string = strftime_now("%d %b %Y") %} 12 | {%- else %} 13 | {%- set date_string = "26 Jul 2024" %} 14 | {%- endif %} 15 | {%- endif %} 16 | {%- if not tools is defined %} 17 | {%- set tools = none %} 18 | {%- endif %} 19 | 20 | {#- This block extracts the system message, so we can slot it into the right place. #} 21 | {%- if messages[0]['role'] == 'system' %} 22 | {%- if messages[0]['content'] is string %} 23 | {%- set system_message = messages[0]['content']|trim %} 24 | {%- else %} 25 | {%- set system_message = messages[0]['content'][0]['text']|trim %} 26 | {%- endif %} 27 | {%- set messages = messages[1:] %} 28 | {%- else %} 29 | {%- if tools is not none %} 30 | {%- set system_message = "You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question." %} 31 | {%- else %} 32 | {%- set system_message = "" %} 33 | {%- endif %} 34 | {%- endif %} 35 | 36 | {#- System message #} 37 | {{- "<|start_header_id|>system<|end_header_id|>\n\n" }} 38 | {%- if tools is not none %} 39 | {{- "Environment: ipython\n" }} 40 | {%- endif %} 41 | {{- "Cutting Knowledge Date: December 2023\n" }} 42 | {{- "Today Date: " + date_string + "\n\n" }} 43 | {%- if tools is not none and not tools_in_user_message %} 44 | {{- "You have access to the following functions. To call a function, please respond with JSON for a function call. " }} 45 | {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. 
' }} 46 | {{- "Do not use variables.\n\n" }} 47 | {%- for t in tools %} 48 | {{- t | tojson(indent=4) }} 49 | {{- "\n\n" }} 50 | {%- endfor %} 51 | {%- endif %} 52 | {{- system_message }} 53 | {{- "<|eot_id|>" }} 54 | 55 | {#- Custom tools are passed in a user message with some extra guidance #} 56 | {%- if tools_in_user_message and not tools is none %} 57 | {#- Extract the first user message so we can plug it in here #} 58 | {%- if messages | length != 0 %} 59 | {%- if messages[0]['content'] is string %} 60 | {%- set first_user_message = messages[0]['content']|trim %} 61 | {%- else %} 62 | {%- set first_user_message = messages[0]['content'] | selectattr('type', 'equalto', 'text') | map(attribute='text') | map('trim') | join('\n') %} 63 | {%- endif %} 64 | {%- set messages = messages[1:] %} 65 | {%- else %} 66 | {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }} 67 | {%- endif %} 68 | {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}} 69 | {{- "Given the following functions, please respond with a JSON for a function call " }} 70 | {{- "with its proper arguments that best answers the given prompt.\n\n" }} 71 | {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. ' }} 72 | {{- "Do not use variables.\n\n" }} 73 | {%- for t in tools %} 74 | {{- t | tojson(indent=4) }} 75 | {{- "\n\n" }} 76 | {%- endfor %} 77 | {{- first_user_message + "<|eot_id|>"}} 78 | {%- endif %} 79 | 80 | {%- for message in messages %} 81 | {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %} 82 | {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' }} 83 | {%- if message['content'] is string %} 84 | {{- message['content'] | trim}} 85 | {%- else %} 86 | {%- for content in message['content'] %} 87 | {%- if content['type'] == 'text' %} 88 | {{- content['text'] | trim }} 89 | {%- endif %} 90 | {%- endfor %} 91 | {%- endif %} 92 | {{- '<|eot_id|>' }} 93 | {%- elif 'tool_calls' in message %} 94 | {%- if not message.tool_calls|length == 1 %} 95 | {{- raise_exception("This model only supports single tool-calls at once!") }} 96 | {%- endif %} 97 | {%- set tool_call = message.tool_calls[0].function %} 98 | {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}} 99 | {{- '{"name": "' + tool_call.name + '", ' }} 100 | {{- '"parameters": ' }} 101 | {{- tool_call.arguments | tojson }} 102 | {{- "}" }} 103 | {{- "<|eot_id|>" }} 104 | {%- elif message.role == "tool" or message.role == "ipython" %} 105 | {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }} 106 | {%- if message.content is string %} 107 | {{- { "output": message.content } | tojson }} 108 | {%- else %} 109 | {%- for content in message['content'] %} 110 | {%- if content['type'] == 'text' %} 111 | {{- { "output": content['text'] } | tojson }} 112 | {%- endif %} 113 | {%- endfor %} 114 | {%- endif %} 115 | {{- "<|eot_id|>" }} 116 | {%- endif %} 117 | {%- endfor %} 118 | {%- if add_generation_prompt %} 119 | {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }} 120 | {%- endif %} -------------------------------------------------------------------------------- /src/judger/analysis_groundedness.py: -------------------------------------------------------------------------------- 1 | import json 2 | import ast 3 | import os 4 | from openai import OpenAI 5 | 6 | class AnalysisGroundednessJudger: 7 | SYS_PROMPT = """You are currently evaluating mathematical modeling papers. 
Your task is to assess how well the solution's analysis is grounded in mathematical and scientific principles. You should evaluate based on the role you are given. 8 | 9 | Score each aspect from 0-1, starting at 0 and requiring justification for any increase: 10 | 11 | 1. Analytical Depth (0-1): 12 | 0.00: No meaningful analysis 13 | Example: Superficial observations without reasoning 14 | 0.25: Basic analysis 15 | Example: Simple descriptive analysis without connections 16 | 0.50: Standard analysis 17 | Example: Clear reasoning with some depth 18 | 0.75: Advanced analysis 19 | Example: Deep insights with strong connections 20 | 1.00: Exceptional analysis 21 | Example: Novel insights with comprehensive reasoning 22 | 23 | 2. Mathematical Rigor (0-1): 24 | 0.00: No mathematical support 25 | Example: Claims without mathematical backing 26 | 0.25: Basic mathematics 27 | Example: Simple calculations without justification 28 | 0.50: Standard rigor 29 | Example: Clear mathematical reasoning 30 | 0.75: Strong rigor 31 | Example: Detailed proofs and derivations 32 | 1.00: Exceptional rigor 33 | Example: Complete mathematical framework 34 | 35 | 3. Results Interpretation (0-1): 36 | 0.00: No interpretation 37 | Example: Raw results without context 38 | 0.25: Basic interpretation 39 | Example: Simple description of results 40 | 0.50: Clear interpretation 41 | Example: Results explained with context 42 | 0.75: Thorough interpretation 43 | Example: Deep analysis of implications 44 | 1.00: Exceptional interpretation 45 | Example: Comprehensive analysis with insights 46 | 47 | 4. Critical Analysis (0-1): 48 | 0.00: No critical thinking 49 | Example: Accepts all results without question 50 | 0.25: Basic criticism 51 | Example: Notes obvious limitations 52 | 0.50: Standard analysis 53 | Example: Identifies key strengths/weaknesses 54 | 0.75: Strong analysis 55 | Example: Deep examination of assumptions 56 | 1.00: Exceptional analysis 57 | Example: Comprehensive critique with alternatives 58 | 59 | 5. Future Implications (0-1): 60 | 0.00: No discussion 61 | Example: Ends at results 62 | 0.25: Basic implications 63 | Example: Simple next steps 64 | 0.50: Clear implications 65 | Example: Reasonable future directions 66 | 0.75: Strong implications 67 | Example: Detailed future research paths 68 | 1.00: Exceptional vision 69 | Example: Novel research directions with justification 70 | 71 | --- 72 | 73 | Your response must follow this exact format: 74 | 75 | Your Response: 76 | ```json 77 | { 78 | "analytical_depth": { 79 | "score": 0.0, 80 | "explanation": "Detailed justification for score" 81 | }, 82 | "mathematical_rigor": { 83 | "score": 0.0, 84 | "explanation": "Detailed justification for score" 85 | }, 86 | "results_interpretation": { 87 | "score": 0.0, 88 | "explanation": "Detailed justification for score" 89 | }, 90 | "critical_analysis": { 91 | "score": 0.0, 92 | "explanation": "Detailed justification for score" 93 | }, 94 | "future_implications": { 95 | "score": 0.0, 96 | "explanation": "Detailed justification for score" 97 | }, 98 | "overall_score": 0.0, 99 | "overall_feedback": "Critical analysis of strengths and weaknesses" 100 | } 101 | ``` 102 | 103 | --- 104 | 105 | Note: Scores must be exactly 0.00, 0.25, 0.50, 0.75, or 1.00. Start at 0 and justify each increment. Be extremely critical. 
You should also give your score and explanation from your role's perspective."""
106 | 
107 |     USER_PROMPT = """Please evaluate the analysis groundedness of the following mathematical modeling paper:
108 | 
109 | {writing}
110 | 
111 | Provide scores and detailed justification for each aspect. Remember your role as {role_name}. Your judgement should be based on this role's perspective.
112 | 
113 | Your Response:
114 | """
115 | 
116 |     def __init__(self):
117 |         if os.path.exists("../../secret.json"):
118 |             secret = json.load(open("../../secret.json"))
119 |             self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"])
120 |         else:
121 |             self.client = OpenAI(api_key="sk-...")
122 | 
123 |     def run(self, writing: str, role: dict = None) -> dict:
124 |         role_name = role["name"].strip()
125 |         role_details = role["details"].strip()
126 |         messages = [
127 |             {'role': 'system', 'content': role_details + "\n\n" + self.SYS_PROMPT},
128 |             {'role': 'user', 'content': self.USER_PROMPT.format(writing=writing, role_name=role_name)}
129 |         ]
130 | 
131 |         response = self.client.chat.completions.create(
132 |             model="gpt-4o-mini",
133 |             messages=messages,
134 |             temperature=0.0,
135 |             n=1,
136 |         )
137 | 
138 |         content = response.choices[0].message.content
139 |         json_str = content.split("```json")[1].split("```")[0].strip()
140 |         result = ast.literal_eval(json_str)
141 | 
142 |         scores = [result[aspect]["score"] for aspect in [
143 |             "analytical_depth", "mathematical_rigor", "results_interpretation",
144 |             "critical_analysis", "future_implications"
145 |         ]]
146 |         result["calculated_overall"] = sum(scores) / len(scores)
147 |         result["role"] = role
148 |         return result
--------------------------------------------------------------------------------
/src/judger/data_groundedness.py:
--------------------------------------------------------------------------------
1 | import json
2 | import ast
3 | import os
4 | from openai import OpenAI
5 | 
6 | class DataGroundednessJudger:
7 |     SYS_PROMPT = """You are currently evaluating mathematical modeling papers. Your task is to assess how well the solution is grounded in data and evidence. You should evaluate based on the role you are given.
8 | 
9 | Score each aspect from 0-1, starting at 0 and requiring justification for any increase:
10 | 
11 | 1. Data Quality (0-1):
12 | 0.00: No data or invalid data
13 | Example: Made-up numbers without sources
14 | 0.25: Poor quality/unreliable
15 | Example: Single unreliable source, outdated data
16 | 0.50: Acceptable but limited
17 | Example: Reliable source but incomplete dataset
18 | 0.75: Good with minor issues
19 | Example: Multiple reliable sources, small gaps
20 | 1.00: Excellent data quality
21 | Example: Multiple verified sources, comprehensive coverage
22 | 
23 | 2. Data Processing (0-1):
24 | 0.00: No processing/invalid
25 | Example: Raw data used without cleaning
26 | 0.25: Basic processing only
27 | Example: Simple averaging without outlier removal
28 | 0.50: Standard processing
29 | Example: Basic cleaning and normalization
30 | 0.75: Advanced processing
31 | Example: Sophisticated cleaning with justification
32 | 1.00: Comprehensive processing
33 | Example: Full pipeline with validation at each step
34 | 
35 | 3.
Statistical Analysis (0-1):
36 | 0.00: No analysis/incorrect
37 | Example: No statistical methods used
38 | 0.25: Basic statistics only
39 | Example: Mean/median without confidence intervals
40 | 0.50: Standard analysis
41 | Example: Basic hypothesis testing
42 | 0.75: Advanced analysis
43 | Example: Multiple statistical methods with validation
44 | 1.00: Rigorous analysis
45 | Example: Comprehensive statistical framework with robustness checks
46 | 
47 | 4. Data Integration (0-1):
48 | 0.00: No integration
49 | Example: Data disconnected from model
50 | 0.25: Poor integration
51 | Example: Forced fit without justification
52 | 0.50: Partial integration
53 | Example: Some aspects well-integrated, others not
54 | 0.75: Good integration
55 | Example: Most data well-integrated with clear reasoning
56 | 1.00: Perfect integration
57 | Example: All data seamlessly integrated with full justification
58 | 
59 | 5. Validation & Testing (0-1):
60 | 0.00: No validation
61 | Example: Results accepted without testing
62 | 0.25: Minimal testing
63 | Example: Basic sanity checks only
64 | 0.50: Standard validation
65 | Example: Cross-validation without sensitivity analysis
66 | 0.75: Thorough validation
67 | Example: Multiple validation methods
68 | 1.00: Comprehensive validation
69 | Example: Full validation suite with sensitivity analysis
70 | 
71 | ---
72 | 
73 | Your response must follow this exact format:
74 | 
75 | Your Response:
76 | ```json
77 | {
78 |     "data_quality": {
79 |         "score": 0.0,
80 |         "explanation": "Detailed justification for score"
81 |     },
82 |     "data_processing": {
83 |         "score": 0.0,
84 |         "explanation": "Detailed justification for score"
85 |     },
86 |     "statistical_analysis": {
87 |         "score": 0.0,
88 |         "explanation": "Detailed justification for score"
89 |     },
90 |     "data_integration": {
91 |         "score": 0.0,
92 |         "explanation": "Detailed justification for score"
93 |     },
94 |     "validation": {
95 |         "score": 0.0,
96 |         "explanation": "Detailed justification for score"
97 |     },
98 |     "calculated_overall": 0.0,
99 |     "overall_feedback": "Critical analysis of strengths and weaknesses"
100 | }
101 | ```
102 | 
103 | ---
104 | 
105 | Note: Scores must be exactly 0.00, 0.25, 0.50, 0.75, or 1.00. Start at 0 and justify each increment. Be extremely critical. You should also give your score and explanation from your role's perspective."""
106 | 
107 |     USER_PROMPT = """Please evaluate the data groundedness of the following mathematical modeling paper:
108 | 
109 | {writing}
110 | 
111 | Provide scores and detailed justification for each aspect. Remember your role as {role_name}. Your judgement should be based on this role's perspective.
112 | 113 | Your Response: 114 | """ 115 | 116 | def __init__(self): 117 | if os.path.exists("../../secret.json"): 118 | secret = json.load(open("../../secret.json")) 119 | self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"]) 120 | else: 121 | self.client = OpenAI(api_key="sk-...") 122 | 123 | def run(self, writing: str, role: dict = None) -> dict: 124 | role_name = role["name"].strip() 125 | role_details = role["details"].strip() 126 | messages = [ 127 | {'role': 'system', 'content': role_details + "\n\n" + self.SYS_PROMPT}, 128 | {'role': 'user', 'content': self.USER_PROMPT.format(writing=writing, role_name=role_name)} 129 | ] 130 | 131 | response = self.client.chat.completions.create( 132 | model="gpt-4o-mini", 133 | messages=messages, 134 | temperature=0.0, 135 | n=1, 136 | ) 137 | 138 | content = response.choices[0].message.content 139 | json_str = content.split("```json")[1].split("```")[0].strip() 140 | result = ast.literal_eval(json_str) 141 | 142 | # Calculate overall score as average of individual scores 143 | scores = [result[aspect]["score"] for aspect in [ 144 | "data_quality", "data_processing", "statistical_analysis", 145 | "data_integration", "validation" 146 | ]] 147 | result["calculated_overall"] = sum(scores) / len(scores) 148 | result["role"] = role 149 | 150 | return result -------------------------------------------------------------------------------- /src/judger/innovativeness.py: -------------------------------------------------------------------------------- 1 | import json 2 | import ast 3 | import os 4 | from openai import OpenAI 5 | 6 | class InnovativenessJudger: 7 | SYS_PROMPT = """You are currently evaluating mathematical modeling papers. Your task is to assess the innovativeness and originality of the solution approach. You should evaluate based on the role you are given. 8 | 9 | Score each aspect from 0-1, starting at 0 and requiring justification for any increase: 10 | 11 | 1. Methodological Innovation (0-1): 12 | 0.00: Standard/textbook approach 13 | Example: Using basic linear regression without modification 14 | 0.25: Minor adaptations 15 | Example: Small tweaks to existing methods 16 | 0.50: Meaningful modifications 17 | Example: Significant adaptations to standard approaches 18 | 0.75: Novel combinations 19 | Example: Creative synthesis of multiple methods 20 | 1.00: Groundbreaking approach 21 | Example: Entirely new methodology with strong justification 22 | 23 | 2. Problem Framing (0-1): 24 | 0.00: Conventional perspective 25 | Example: Following typical problem formulation 26 | 0.25: Slight reframing 27 | Example: Minor changes to standard approach 28 | 0.50: Fresh perspective 29 | Example: New angle on known problem 30 | 0.75: Novel framing 31 | Example: Unique problem decomposition 32 | 1.00: Revolutionary perspective 33 | Example: Paradigm-shifting problem formulation 34 | 35 | 3. Solution Creativity (0-1): 36 | 0.00: Standard solution 37 | Example: Direct application of known methods 38 | 0.25: Minor creativity 39 | Example: Small creative elements in standard approach 40 | 0.50: Notable creativity 41 | Example: Original elements in key areas 42 | 0.75: Significant creativity 43 | Example: Multiple creative components 44 | 1.00: Exceptional creativity 45 | Example: Entirely novel solution approach 46 | 47 | 4. 
Technical Advancement (0-1):
48 | 0.00: No advancement
49 | Example: Uses only existing techniques
50 | 0.25: Minor improvements
51 | Example: Small technical optimizations
52 | 0.50: Meaningful advances
53 | Example: New technical contributions
54 | 0.75: Significant advances
55 | Example: Multiple technical innovations
56 | 1.00: Major breakthrough
57 | Example: Revolutionary technical approach
58 | 
59 | 5. Impact Potential (0-1):
60 | 0.00: Minimal impact
61 | Example: No new insights or applications
62 | 0.25: Limited impact
63 | Example: Minor improvements to existing methods
64 | 0.50: Moderate impact
65 | Example: Useful new approach for specific cases
66 | 0.75: High impact
67 | Example: Broadly applicable new methods
68 | 1.00: Transformative
69 | Example: Could change the field significantly
70 | 
71 | ---
72 | 
73 | Your response must follow this exact format:
74 | 
75 | Your Response:
76 | ```json
77 | {
78 |     "methodological_innovation": {
79 |         "score": 0.0,
80 |         "explanation": "Detailed justification for score"
81 |     },
82 |     "problem_framing": {
83 |         "score": 0.0,
84 |         "explanation": "Detailed justification for score"
85 |     },
86 |     "solution_creativity": {
87 |         "score": 0.0,
88 |         "explanation": "Detailed justification for score"
89 |     },
90 |     "technical_advancement": {
91 |         "score": 0.0,
92 |         "explanation": "Detailed justification for score"
93 |     },
94 |     "impact_potential": {
95 |         "score": 0.0,
96 |         "explanation": "Detailed justification for score"
97 |     },
98 |     "overall_score": 0.0,
99 |     "overall_feedback": "Critical analysis of innovative aspects and potential impact"
100 | }
101 | ```
102 | 
103 | ---
104 | 
105 | Note: Scores must be exactly 0.00, 0.25, 0.50, 0.75, or 1.00. Start at 0 and justify each increment. Be extremely critical - true innovation is rare. You should also give your score and explanation from your role's perspective."""
106 | 
107 |     USER_PROMPT = """Please evaluate the innovativeness of the following mathematical modeling paper:
108 | 
109 | {writing}
110 | 
111 | Provide scores and detailed justification for each aspect. Remember your role as {role_name}. Your judgement should be based on this role's perspective.
112 | 113 | Your Response: 114 | """ 115 | 116 | def __init__(self): 117 | if os.path.exists("../../secret.json"): 118 | secret = json.load(open("../../secret.json")) 119 | self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"]) 120 | else: 121 | self.client = OpenAI(api_key="sk-...") 122 | 123 | def run(self, writing: str, role: dict = None) -> dict: 124 | role_name = role["name"].strip() 125 | role_details = role["details"].strip() 126 | messages = [ 127 | {'role': 'system', 'content': role_details + "\n\n" + self.SYS_PROMPT}, 128 | {'role': 'user', 'content': self.USER_PROMPT.format(writing=writing, role_name=role_name)} 129 | ] 130 | 131 | response = self.client.chat.completions.create( 132 | model="gpt-4o-mini", 133 | messages=messages, 134 | temperature=0.0, 135 | n=1, 136 | ) 137 | 138 | content = response.choices[0].message.content 139 | json_str = content.split("```json")[1].split("```")[0].strip() 140 | result = ast.literal_eval(json_str) 141 | 142 | scores = [result[aspect]["score"] for aspect in [ 143 | "methodological_innovation", "problem_framing", "solution_creativity", 144 | "technical_advancement", "impact_potential" 145 | ]] 146 | result["calculated_overall"] = sum(scores) / len(scores) 147 | result["role"] = role 148 | return result -------------------------------------------------------------------------------- /src/judger/main_judge.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import ast 4 | from concurrent.futures import ThreadPoolExecutor 5 | from typing import Dict, Any, Set 6 | 7 | from structural_coherency import StructuralCoherencyJudger 8 | from scoring_decomposition import ScoringDecompositionJudger 9 | from modeling_groundedness import ModelingGroundednessJudger 10 | from data_groundedness import DataGroundednessJudger 11 | from analysis_groundedness import AnalysisGroundednessJudger 12 | from innovativeness import InnovativenessJudger 13 | 14 | class MainJudger: 15 | def __init__(self): 16 | self.judgers = { 17 | "structural_coherency": StructuralCoherencyJudger(), 18 | "scoring_decomposition": ScoringDecompositionJudger(), 19 | "modeling_groundedness": ModelingGroundednessJudger(), 20 | "data_groundedness": DataGroundednessJudger(), 21 | "analysis_groundedness": AnalysisGroundednessJudger(), 22 | "innovativeness": InnovativenessJudger() 23 | } 24 | 25 | # Judgers that use role-based evaluation 26 | self.role_based_judgers = { 27 | "modeling_groundedness", 28 | "data_groundedness", 29 | "analysis_groundedness", 30 | "innovativeness" 31 | } 32 | 33 | def run_judger(self, judger_name: str, writing: str, roles: list = None, grading_points: list = None) -> Dict[str, Any]: 34 | try: 35 | judger = self.judgers[judger_name] 36 | 37 | # Handle role-based judgers 38 | if judger_name in self.role_based_judgers and roles: 39 | results = [] 40 | for role in roles: 41 | result = judger.run(writing, role=role) 42 | result["role"] = role 43 | results.append(result) 44 | return { 45 | "role_based_results": results, 46 | "aggregated_score": sum(r.get("calculated_overall", 0) for r in results) / len(results) 47 | } 48 | 49 | # Handle non-role-based judgers 50 | if judger_name == "scoring_decomposition": 51 | return judger.run(writing, grading_points) 52 | return judger.run(writing) 53 | 54 | except Exception as e: 55 | print(f"Error in {judger_name}: {str(e)}") 56 | return { 57 | "error": str(e), 58 | "status": "failed", 59 | "judger": judger_name 60 | } 61 | 62 | def 
get_existing_results(self, output_dir: str, gold_id: str) -> Dict[str, Any]: 63 | """Read existing judgement results if they exist""" 64 | output_file = f"{output_dir}/{gold_id}.json" 65 | if os.path.exists(output_file): 66 | try: 67 | with open(output_file) as f: 68 | results = json.load(f) 69 | return results.get("judgements", {}) 70 | except: 71 | return {} 72 | return {} 73 | 74 | def get_missing_judgers(self, existing_results: Dict[str, Any]) -> Set[str]: 75 | """Determine which judgers need to be run""" 76 | missing = set(self.judgers.keys()) 77 | for judger_name, result in existing_results.items(): 78 | # Only consider result valid if it exists and has no error 79 | if result and "error" not in result: 80 | missing.remove(judger_name) 81 | return missing 82 | 83 | def judge(self, output_dir: str, gold_id: str, writing: str, grading_points: list, roles: list = None) -> Dict[str, Any]: 84 | # Initialize results structure 85 | results = { 86 | "gold_id": gold_id, 87 | "judgements": {}, 88 | "metadata": { 89 | "success_count": 0, 90 | "failed_count": 0, 91 | "failed_judgers": [], 92 | "skipped_count": 0, 93 | "skipped_judgers": [] 94 | } 95 | } 96 | 97 | # Get existing results 98 | existing_results = self.get_existing_results(output_dir, gold_id) 99 | missing_judgers = self.get_missing_judgers(existing_results) 100 | 101 | print(f"Missing judgers for {gold_id}: {missing_judgers}") 102 | 103 | # Add existing valid results to our results 104 | for judger_name, result in existing_results.items(): 105 | if judger_name not in missing_judgers: 106 | results["judgements"][judger_name] = result 107 | results["metadata"]["skipped_count"] += 1 108 | results["metadata"]["skipped_judgers"].append(judger_name) 109 | 110 | if not missing_judgers: 111 | print(f"All judgements already exist for {gold_id}") 112 | return results 113 | 114 | # Run only missing judgers in parallel 115 | with ThreadPoolExecutor(max_workers=6) as executor: 116 | future_to_judger = { 117 | executor.submit( 118 | self.run_judger, 119 | name, 120 | writing, 121 | roles if name in self.role_based_judgers else None, 122 | grading_points if name == "scoring_decomposition" else None 123 | ): name 124 | for name in missing_judgers 125 | } 126 | 127 | for future in future_to_judger: 128 | name = future_to_judger[future] 129 | try: 130 | result = future.result() 131 | if "error" in result: 132 | results["metadata"]["failed_count"] += 1 133 | results["metadata"]["failed_judgers"].append(name) 134 | else: 135 | results["metadata"]["success_count"] += 1 136 | results["judgements"][name] = result 137 | except Exception as e: 138 | print(f"Error in {name}: {str(e)}") 139 | results["judgements"][name] = {"error": str(e)} 140 | results["metadata"]["failed_count"] += 1 141 | results["metadata"]["failed_judgers"].append(name) 142 | 143 | with open(f"{output_dir}/{gold_id}.json", 'w') as f: 144 | json.dump(results, f, indent=4) 145 | 146 | return results 147 | 148 | def process_gold_id(args): 149 | gold_id, data, output_dir, judger = args 150 | if "writing" not in data: # Skip if no writing to evaluate 151 | return 152 | 153 | writing = data["writing"] 154 | criteria = data["criteria"] 155 | grading_points = criteria.get("decomposition", {}).get("grading_points", []) 156 | roles = criteria.get("eval_roles", []) 157 | 158 | print(f"Evaluating {gold_id}...") 159 | results = judger.judge(output_dir, gold_id, writing, grading_points, roles) 160 | print(f"Completed {gold_id} - Success: {results['metadata']['success_count']}, " 161 | f"Failed: 
{results['metadata']['failed_count']}, "
162 | f"Skipped: {results['metadata']['skipped_count']}")
163 | return gold_id, results
164 | 
165 | def main():
166 | for model, level in zip(["Qwen2.5-72B-Instruct"], ["ModelAgent"]):
167 | try:
168 | # Load problem data
169 | with open("../../data/modeling_data_final.json") as f:
170 | criterias = json.load(f)
171 | with open(f"../../output_writings/{level}/{model}/solutions_metadata.json") as f:
172 | writings = json.load(f)
173 | 
174 | output_dir = f"../../output_judge/{level}/{model}"
175 | os.makedirs(output_dir, exist_ok=True)
176 | all_data = {}
177 | for gold_id, criteria in criterias.items():
178 | all_data[gold_id] = {
179 | "criteria": criteria,
180 | }
181 | for gold_id, writing in writings.items():
182 | if gold_id not in all_data or "writing" not in writing:
183 | continue
184 | all_data[gold_id]["writing"] = writing["writing"]  # keep only the report text
185 | 
186 | print(len(all_data))
187 | 
188 | judger = MainJudger()
189 | 
190 | # Process problems in parallel
191 | with ThreadPoolExecutor(max_workers=10) as executor:
192 | args = [(gold_id, data, output_dir, judger) for gold_id, data in all_data.items()]
193 | results = list(executor.map(process_gold_id, args))
194 | except Exception as e:
195 | # Surface the failure instead of silently skipping this model/level pair
196 | print(f"Error judging {level}/{model}: {e}")
197 | continue
198 | 
199 | if __name__ == "__main__":
200 | main()
-------------------------------------------------------------------------------- /src/judger/modeling_groundedness.py: --------------------------------------------------------------------------------
1 | import json
2 | import ast
3 | import os
4 | from openai import OpenAI
5 | 
6 | class ModelingGroundednessJudger:
7 | SYS_PROMPT = """You are currently evaluating mathematical modeling papers. Your task is to assess how well the solution's modeling approach is grounded in mathematical and scientific principles. You should evaluate based on the role you are given.
8 | 
9 | Score each aspect from 0-1, starting at 0 and requiring justification for any increase:
10 | 
11 | 1. Mathematical Foundation (0-1):
12 | 0.00: Fundamentally flawed or missing
13 | Example: No equations, incorrect mathematical concepts
14 | 0.25: Basic but problematic
15 | Example: Simple equations without proper variables defined
16 | 0.50: Sound but incomplete
17 | Example: Correct equations but missing key relationships
18 | 0.75: Strong with minor gaps
19 | Example: Well-formulated with some assumptions not fully justified
20 | 1.00: Excellent and rigorous
21 | Example: Complete mathematical framework with all relationships justified
22 | 
23 | 2. Real-World Integration (0-1):
24 | 0.00: No connection to reality
25 | Example: Pure abstract model without practical context
26 | 0.25: Superficial consideration
27 | Example: Mentioning real factors without incorporating them
28 | 0.50: Partial integration
29 | Example: Some key factors included but others missing
30 | 0.75: Good but not comprehensive
31 | Example: Most factors included but some interactions overlooked
32 | 1.00: Complete integration
33 | Example: All relevant factors and interactions properly modeled
34 | 
35 | 3. 
Technical Sophistication (0-1):
36 | 0.00: Elementary/inappropriate
37 | Example: Using linear regression for clearly nonlinear problems
38 | 0.25: Basic techniques only
39 | Example: Simple statistical methods without justification
40 | 0.50: Appropriate but limited
41 | Example: Correct methods but not fully exploited
42 | 0.75: Advanced with minor issues
43 | Example: Sophisticated methods with some gaps in implementation
44 | 1.00: State-of-the-art
45 | Example: Cutting-edge techniques properly implemented
46 | 
47 | 4. Validation Approach (0-1):
48 | 0.00: No validation
49 | Example: Results presented without any verification
50 | 0.25: Minimal testing
51 | Example: Basic sanity checks only
52 | 0.50: Partial validation
53 | Example: Some test cases but not comprehensive
54 | 0.75: Thorough but not complete
55 | Example: Multiple validation methods but missing edge cases
56 | 1.00: Comprehensive validation
57 | Example: Multiple methods, edge cases, sensitivity analysis
58 | 
59 | 5. Implementation Quality (0-1):
60 | 0.00: Poor/incorrect
61 | Example: Errors in implementation, wrong formulas
62 | 0.25: Basic but flawed
63 | Example: Correct concept but significant implementation errors
64 | 0.50: Workable but needs improvement
65 | Example: Functions correctly but inefficient or unclear
66 | 0.75: Good with minor issues
67 | Example: Well-implemented but some optimization possible
68 | 1.00: Excellent implementation
69 | Example: Efficient, clear, and well-documented code
70 | 
71 | ---
72 | 
73 | Your response must follow this exact format:
74 | 
75 | Your Response:
76 | ```json
77 | {
78 | "mathematical_foundation": {
79 | "score": 0.0,
80 | "explanation": "Detailed justification for score"
81 | },
82 | "real_world_integration": {
83 | "score": 0.0,
84 | "explanation": "Detailed justification for score"
85 | },
86 | "technical_sophistication": {
87 | "score": 0.0,
88 | "explanation": "Detailed justification for score"
89 | },
90 | "validation": {
91 | "score": 0.0,
92 | "explanation": "Detailed justification for score"
93 | },
94 | "implementation": {
95 | "score": 0.0,
96 | "explanation": "Detailed justification for score"
97 | },
98 | "calculated_overall": 0.0,
99 | "overall_feedback": "Critical analysis of strengths and weaknesses"
100 | }
101 | ```
102 | 
103 | ---
104 | 
105 | Note: Scores must be exactly 0.00, 0.25, 0.50, 0.75, or 1.00. Start at 0 and justify each increment. Be extremely critical. You should also give your score and explanation from your role's perspective."""
106 | 
107 | USER_PROMPT = """Please evaluate the modeling groundedness of the following mathematical modeling paper:
108 | 
109 | {writing}
110 | 
111 | Provide scores and detailed justification for each aspect. Remember your role as {role_name}. Your judgement should be based on this role's perspective. 
112 | 113 | Your Response: 114 | """ 115 | 116 | def __init__(self): 117 | if os.path.exists("../../secret.json"): 118 | secret = json.load(open("../../secret.json")) 119 | self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"]) 120 | else: 121 | self.client = OpenAI(api_key="sk-...") 122 | 123 | def run(self, writing: str, role: dict = None) -> dict: 124 | role_name = role["name"].strip() 125 | role_details = role["details"].strip() 126 | messages = [ 127 | {'role': 'system', 'content': role_details + "\n\n" + self.SYS_PROMPT}, 128 | {'role': 'user', 'content': self.USER_PROMPT.format(writing=writing, role_name=role_name)} 129 | ] 130 | 131 | response = self.client.chat.completions.create( 132 | model="gpt-4o-mini", 133 | messages=messages, 134 | temperature=0.0, 135 | n=1, 136 | ) 137 | 138 | content = response.choices[0].message.content 139 | json_str = content.split("```json")[1].split("```")[0].strip() 140 | result = ast.literal_eval(json_str) 141 | 142 | if "implementation_quality" in result: 143 | result["implementation"] = result.pop("implementation_quality") 144 | 145 | # Calculate overall score as average of individual scores 146 | scores = [result[aspect]["score"] for aspect in [ 147 | "mathematical_foundation", "real_world_integration", 148 | "technical_sophistication", "validation", "implementation" 149 | ]] 150 | result["calculated_overall"] = sum(scores) / len(scores) 151 | result["role"] = role 152 | 153 | return result -------------------------------------------------------------------------------- /src/judger/scoring_decomposition.py: -------------------------------------------------------------------------------- 1 | import json 2 | import ast 3 | import os 4 | from openai import OpenAI 5 | 6 | class ScoringDecompositionJudger: 7 | SYS_PROMPT = """You are an expert judge evaluating mathematical modeling papers. Your task is to assess if each requirement of the problem is faithfully fulfilled based on the provided grading points. 8 | 9 | For each grading point, score from 0-1: 10 | 11 | 0.00: Requirement Ignored/Failed 12 | Example: No attempt to address the requirement 13 | Example: Completely incorrect approach 14 | 15 | 0.25: Minimal/Poor Treatment 16 | Example: Superficial mention without proper implementation 17 | Example: Major flaws in approach or understanding 18 | 19 | 0.50: Partial/Basic Treatment 20 | Example: Addresses main points but misses important aspects 21 | Example: Correct approach but incomplete implementation 22 | 23 | 0.75: Good but Not Perfect 24 | Example: Strong treatment with minor omissions 25 | Example: Well-implemented but could be more thorough 26 | 27 | 1.00: Complete and Excellent 28 | Example: Comprehensive treatment of all aspects 29 | Example: Thorough implementation with validation 30 | 31 | Critical Evaluation Points: 32 | 33 | 1. Completeness: 34 | - Every sub-requirement must be explicitly addressed 35 | - Partial treatment results in significant score reduction 36 | - Missing elements cannot be compensated by other strengths 37 | 38 | 2. Quality of Implementation: 39 | - Mathematical rigor is essential 40 | - Must show clear methodology 41 | - Must include validation 42 | - Surface-level solutions score 0.25 maximum 43 | 44 | 3. Integration: 45 | - Requirements must work together coherently 46 | - Interdependencies must be addressed 47 | - Isolated solutions score 0.50 maximum 48 | 49 | 4. 
Validation: 50 | - Results must be verified 51 | - Assumptions must be tested 52 | - No validation means 0.25 maximum score 53 | 54 | --- 55 | 56 | Your response must follow this exact format: 57 | 58 | Your Response: 59 | ```json 60 | { 61 | "scores": { 62 | "grading_point_1": 0.0, 63 | "grading_point_2": 0.0, 64 | ... 65 | }, 66 | "explanation": { 67 | "grading_point_1": "why this score", 68 | "grading_point_2": "why this score", 69 | ... 70 | } 71 | } 72 | ``` 73 | 74 | --- 75 | 76 | Note: For each grading point, score must be exactly 0.0, 0.25, 0.50, 0.75, or 1.00. Use the grading point's category as the key in the scores and explanation dictionaries. Be extremely critical - most solutions should score in the 0.25-0.50 range unless truly exceptional.""" 77 | 78 | USER_PROMPT = """Please evaluate if the following mathematical modeling paper fulfills each grading point requirement: 79 | 80 | Paper Content: 81 | {writing} 82 | 83 | --- 84 | 85 | Grading Points to Evaluate: 86 | {grading_points} 87 | 88 | --- 89 | 90 | Provide scores and explanations for each grading point. 91 | 92 | Your Response: 93 | """ 94 | 95 | def __init__(self): 96 | if os.path.exists("../../secret.json"): 97 | secret = json.load(open("../../secret.json")) 98 | self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"]) 99 | else: 100 | self.client = OpenAI(api_key="sk-...") 101 | 102 | def run(self, writing: str, grading_points: list) -> dict: 103 | messages = [ 104 | {'role': 'system', 'content': self.SYS_PROMPT}, 105 | {'role': 'user', 'content': self.USER_PROMPT.format( 106 | writing=writing, 107 | grading_points=json.dumps(grading_points, indent=2) 108 | )} 109 | ] 110 | 111 | response = self.client.chat.completions.create( 112 | model="gpt-4o-mini", 113 | messages=messages, 114 | temperature=0.0, 115 | n=1, 116 | ) 117 | 118 | content = response.choices[0].message.content 119 | json_str = content.split("```json")[1].split("```")[0].strip() 120 | result = ast.literal_eval(json_str) 121 | total_score = sum(result["scores"].values()) 122 | result["total_score"] = total_score 123 | average_score = total_score / len(result["scores"]) 124 | result["calculated_overall"] = average_score 125 | 126 | return result -------------------------------------------------------------------------------- /src/judger/structural_coherency.py: -------------------------------------------------------------------------------- 1 | import json 2 | import ast 3 | import os 4 | from openai import OpenAI 5 | 6 | class StructuralCoherencyJudger: 7 | SYS_PROMPT = """You are an expert judge evaluating mathematical modeling papers. Your task is to assess the structural coherency of the paper by checking if it contains all necessary components. 8 | 9 | Key components to evaluate (up to 1 point each): 10 | 11 | 1. Problem Restatement (0-1): 12 | 0.00: Missing or completely misunderstood 13 | Example: Simply copying problem text or missing key elements 14 | 0.25: Present but superficial 15 | Example: Basic bullet points of requirements without context 16 | 0.50: Adequate but lacks depth 17 | Example: Covers main points but misses subtle relationships 18 | 0.75: Good with minor gaps 19 | Example: Clear understanding but could elaborate connections 20 | 1.00: Excellent and comprehensive 21 | Example: Deep understanding with clear relationships and context 22 | 23 | 2. 
Assumptions and Justification (0-1): 24 | 0.00: Missing or unjustified 25 | Example: No assumptions listed or completely unreasonable ones 26 | 0.25: Listed but poorly justified 27 | Example: "We assume linear relationship" without explanation 28 | 0.50: Reasonable but incomplete 29 | Example: Key assumptions stated but some justifications weak 30 | 0.75: Well-justified with minor gaps 31 | Example: Clear justifications but missing some implications 32 | 1.00: Comprehensive and thorough 33 | Example: All assumptions clearly stated, justified, and impacts explained 34 | 35 | 3. Modeling Implementation (0-1): 36 | 0.00: Missing or fundamentally flawed 37 | Example: No clear mathematical formulation 38 | 0.25: Basic but poorly developed 39 | Example: Equations listed without explanation or context 40 | 0.50: Sound but lacks rigor 41 | Example: Correct approach but missing some derivations 42 | 0.75: Strong with minor omissions 43 | Example: Clear formulation but could be more detailed 44 | 1.00: Rigorous and complete 45 | Example: Clear, justified, and thorough mathematical development 46 | 47 | 4. Solution Process (0-1): 48 | 0.00: Missing or invalid 49 | Example: No solution method or completely incorrect approach 50 | 0.25: Vague or incomplete 51 | Example: "We solved using computer" without details 52 | 0.50: Basic but workable 53 | Example: Solution steps listed but lacking validation 54 | 0.75: Clear with minor gaps 55 | Example: Well-documented but missing some error analysis 56 | 1.00: Comprehensive and validated 57 | Example: Clear steps, validation, and error analysis 58 | 59 | 5. Analysis (0-1): 60 | 0.00: Missing or invalid 61 | Example: No analysis or completely wrong interpretations 62 | 0.25: Superficial discussion 63 | Example: Basic statements without supporting evidence 64 | 0.50: Valid but limited 65 | Example: Correct analysis but missing sensitivity tests 66 | 0.75: Thorough with minor gaps 67 | Example: Good analysis but could explore more implications 68 | 1.00: Deep and insightful 69 | Example: Comprehensive analysis with validation and implications 70 | 71 | --- 72 | 73 | Your response must follow this exact format: 74 | 75 | Your Response: 76 | ```json 77 | { 78 | "scores": { 79 | "problem_restatement": 0.0, 80 | "assumptions": 0.0, 81 | "modeling_implementation": 0.0, 82 | "solution_process": 0.0, 83 | "analysis": 0.0 84 | }, 85 | "explanation": { 86 | "problem_restatement": "why this score", 87 | "assumptions": "why this score", 88 | "modeling_implementation": "why this score", 89 | "solution_process": "why this score", 90 | "analysis": "why this score" 91 | } 92 | } 93 | ``` 94 | 95 | --- 96 | 97 | Note: For each component, score must be exactly 0.0, 0.25, 0.50, 0.75, or 1.00. Be extremely critical - most solutions should score in the 0.25-0.50 range unless truly exceptional.""" 98 | 99 | USER_PROMPT = """Please evaluate the structural coherency of the following mathematical modeling paper: 100 | 101 | {writing} 102 | 103 | Provide scores and explanations for each component. 
104 | 105 | Your Response: 106 | """ 107 | 108 | def __init__(self): 109 | if os.path.exists("../../secret.json"): 110 | secret = json.load(open("../../secret.json")) 111 | self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"]) 112 | else: 113 | self.client = OpenAI(api_key="sk-...") 114 | 115 | def run(self, writing: str) -> dict: 116 | messages = [ 117 | {'role': 'system', 'content': self.SYS_PROMPT}, 118 | {'role': 'user', 'content': self.USER_PROMPT.format(writing=writing)} 119 | ] 120 | 121 | response = self.client.chat.completions.create( 122 | model="gpt-4o-mini", 123 | messages=messages, 124 | temperature=0.0, 125 | n=1, 126 | ) 127 | 128 | content = response.choices[0].message.content 129 | json_str = content.split("```json")[1].split("```")[0].strip() 130 | result = ast.literal_eval(json_str) 131 | total_score = sum(result["scores"].values()) 132 | result["total_score"] = total_score 133 | average_score = total_score / len(result["scores"]) 134 | result["calculated_overall"] = average_score 135 | 136 | return result -------------------------------------------------------------------------------- /src/tools/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiancheng0/ModelingAgent/dca3588ba7cf114b77ed5f89aa1f3e9ddf4a3baa/src/tools/__init__.py -------------------------------------------------------------------------------- /src/tools/base.py: -------------------------------------------------------------------------------- 1 | class BaseTool: 2 | """ 3 | A base class for building tool classes that perform specific tasks, such as image processing or text detection. 4 | """ 5 | 6 | require_llm_engine = False # Default is False, tools that need LLM should set this to True 7 | 8 | def __init__(self, tool_name=None, tool_description=None, tool_version=None, input_types=None, output_type=None, demo_commands=None, output_dir=None, user_metadata=None, model_string=None): 9 | """ 10 | Initialize the base tool with optional metadata. 11 | 12 | Parameters: 13 | tool_name (str): The name of the tool. 14 | tool_description (str): A description of the tool. 15 | tool_version (str): The version of the tool. 16 | input_types (dict): The expected input types for the tool. 17 | output_type (str): The expected output type for the tool. 18 | demo_commands (list): A list of example commands for using the tool. 19 | output_dir (str): The directory where the tool should save its output (optional). 20 | user_metadata (dict): Additional metadata specific to user needs (optional). 21 | model_string (str): The model string for the LLM engine (optional, only used if require_llm_engine is True). 22 | """ 23 | self.tool_name = tool_name 24 | self.tool_description = tool_description 25 | self.tool_version = tool_version 26 | self.input_types = input_types 27 | self.output_type = output_type 28 | self.demo_commands = demo_commands 29 | self.output_dir = output_dir 30 | self.user_metadata = user_metadata 31 | self.model_string = model_string 32 | 33 | def set_metadata(self, tool_name, tool_description, tool_version, input_types, output_type, demo_commands, user_metadata=None): 34 | """ 35 | Set the metadata for the tool. 36 | 37 | Parameters: 38 | tool_name (str): The name of the tool. 39 | tool_description (str): A description of the tool. 40 | tool_version (str): The version of the tool. 41 | input_types (dict): The expected input types for the tool. 42 | output_type (str): The expected output type for the tool. 
43 | demo_commands (list): A list of example commands for using the tool.
44 | user_metadata (dict): Additional metadata specific to user needs (optional).
45 | """
46 | self.tool_name = tool_name
47 | self.tool_description = tool_description
48 | self.tool_version = tool_version
49 | self.input_types = input_types
50 | self.output_type = output_type
51 | self.demo_commands = demo_commands
52 | self.user_metadata = user_metadata
53 | 
54 | def get_metadata(self):
55 | """
56 | Returns the metadata for the tool.
57 | 
58 | Returns:
59 | dict: A dictionary containing the tool's metadata.
60 | """
61 | metadata = {
62 | "tool_name": self.tool_name,
63 | "tool_description": self.tool_description,
64 | "tool_version": self.tool_version,
65 | "input_types": self.input_types,
66 | "output_type": self.output_type,
67 | "demo_commands": self.demo_commands,
68 | "require_llm_engine": self.require_llm_engine,
69 | }
70 | if self.user_metadata:
71 | metadata["user_metadata"] = self.user_metadata
72 | return metadata
73 | 
74 | def set_custom_output_dir(self, output_dir):
75 | """
76 | Set a custom output directory for the tool.
77 | 
78 | Parameters:
79 | output_dir (str): The new output directory path.
80 | """
81 | self.output_dir = output_dir
82 | 
83 | def set_llm_engine(self, model_string):
84 | """
85 | Set the LLM engine for the tool.
86 | 
87 | Parameters:
88 | model_string (str): The model string for the LLM engine.
89 | """
90 | self.model_string = model_string
91 | 
92 | def execute(self, *args, **kwargs):
93 | """
94 | Execute the tool's main functionality. This method should be overridden by subclasses.
95 | 
96 | Raises:
97 | NotImplementedError: If the subclass does not implement this method.
98 | """
99 | raise NotImplementedError("Subclasses must implement the execute method.")
-------------------------------------------------------------------------------- /src/tools/code_executor.py: --------------------------------------------------------------------------------
1 | import os
2 | import tempfile
3 | import subprocess
4 | import threading
5 | 
6 | from .base import BaseTool
7 | 
8 | class Python_Execution_Tool(BaseTool):
9 | require_llm_engine = False
10 | 
11 | def __init__(self):
12 | super().__init__(
13 | tool_name="Python_Execution_Tool",
14 | tool_description="A tool that executes Python code from a file or provided content, handling errors and timeouts.",
15 | tool_version="1.0.0",
16 | input_types={
17 | "file_path": "str - The path to the Python file from the workspace (defaults to 'workspace/temp.py').",
18 | "code_content": "str - The Python code to execute. If provided, it will overwrite the file or be executed from a temp file."
19 | },
20 | output_type="str - The output or error messages from executing the code.",
21 | demo_commands=[
22 | {
23 | "command": "execution = tool.execute(file_path='workspace/script.py')",
24 | "description": "Execute an existing Python script."
25 | },
26 | {
27 | "command": "execution = tool.execute(code_content='print(\"Hello, World!\")')",
28 | "description": "Execute provided Python code directly."
29 | }
30 | ],
31 | user_metadata={
32 | "limitation": "Any error encountered will be returned. This tool will faithfully execute what you provide or what is in the code file, without any validation or refinement. ⚠️ The current sandbox environment **does NOT allow installing new Python packages**, especially those that require compiling native C/C++ libraries (e.g., GDAL/GEOS/PROJ). Your code should use pure Python and avoid anything that requires g++. 
You can use numpy, pandas, scipy, etc.", 33 | "best_practice": "If the code content is given, ensure it is well structured and is directly executable. If the file path is provided, ensure the file exists and is a valid Python script.\nEnsure in the code file path should all be relative path within the workspace. Use this tool for quick code execution and experiment." 34 | }, 35 | ) 36 | 37 | def execute(self, file_path=None, code_content=None): 38 | """ 39 | Execute Python code from file_path or code_content. 40 | Returns a dict: { "success": bool, "message": str }. 41 | 42 | - success: True if code executed without 'Error:' or 'Time limit exceeded' in output. 43 | False otherwise (including invalid file path, exception, etc.) 44 | - message: The output or error message string. 45 | """ 46 | # Quick checks 47 | if not file_path and not code_content: 48 | return { 49 | "success": False, 50 | "message": "Error: Either file_path or code_content must be provided." 51 | } 52 | 53 | workspace_path = os.getenv("WORKSPACE_PATH", "workspace") 54 | execution_file = None 55 | 56 | if file_path: 57 | if "workspace" not in file_path: 58 | file_path = os.path.join(workspace_path, file_path) 59 | else: 60 | file_path = os.path.join(workspace_path, file_path.split("workspace/")[-1]) 61 | 62 | try: 63 | if code_content: 64 | # If code_content is given: 65 | # 1) if file_path is also provided, we write code_content to that file 66 | # 2) else we create a temp file 67 | if file_path: 68 | execution_file = file_path 69 | with open(execution_file, "w", encoding="utf-8") as f: 70 | f.write(code_content) 71 | else: 72 | import tempfile 73 | temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".py", dir=workspace_path) 74 | execution_file = temp_file.name 75 | print(f"Temporary file created: {execution_file}") 76 | with open(execution_file, "w", encoding="utf-8") as f: 77 | f.write(code_content) 78 | temp_file.close() 79 | else: 80 | # code_content is None => must run an existing Python file 81 | execution_file = file_path 82 | if not os.path.isfile(execution_file): 83 | return { 84 | "success": False, 85 | "message": "Error: Provided file path does not exist." 86 | } 87 | 88 | # We'll use a dictionary to store the output 89 | result = {"output": "Execution Error: Time limit exceeded."} 90 | 91 | def run_code(): 92 | try: 93 | result["output"] = subprocess.check_output( 94 | ["python", execution_file], 95 | stderr=subprocess.STDOUT, 96 | text=True, 97 | timeout=15 98 | ) 99 | except subprocess.TimeoutExpired: 100 | result["output"] = "Execution Error: Time limit exceeded." 
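# Annotation (added note, not part of the upstream file): the subprocess call above
# enforces its own 15s timeout, and the thread join below adds a second 15s guard.
# If the worker thread is still running after join(timeout=15), the default
# "Time limit exceeded" message preset in `result` is what gets returned.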
101 | except subprocess.CalledProcessError as e: 102 | result["output"] = f"Execution Error: {e.output}" 103 | except Exception as e: 104 | result["output"] = f"Unexpected Error: {str(e)}" 105 | 106 | execution_thread = threading.Thread(target=run_code) 107 | execution_thread.start() 108 | execution_thread.join(timeout=15) 109 | 110 | # Check if result contains errors 111 | out_str = result["output"] 112 | # Simple check: if contains "Error:" or "Time limit exceeded" then failed 113 | # You can refine this check as needed 114 | if "Error:" in out_str or "time limit exceeded" in out_str.lower(): 115 | return { 116 | "success": False, 117 | "message": out_str 118 | } 119 | else: 120 | return { 121 | "success": True, 122 | "message": out_str 123 | } 124 | 125 | except Exception as e: 126 | # If error occurs elsewhere in try block, also return error 127 | return { 128 | "success": False, 129 | "message": f"Error: {str(e)}" 130 | } 131 | finally: 132 | # Cleanup temp file if used 133 | if code_content and not file_path and execution_file: 134 | os.remove(execution_file) 135 | 136 | 137 | if __name__ == "__main__": 138 | tool = Python_Execution_Tool() 139 | execution = tool.execute(file_path="workspace/output.py") 140 | print("Execution Output:") 141 | print(execution) -------------------------------------------------------------------------------- /src/tools/engine.py: -------------------------------------------------------------------------------- 1 | from openai import OpenAI 2 | import os 3 | import base64 4 | from tenacity import ( 5 | retry, 6 | stop_after_attempt, 7 | wait_random_exponential, 8 | ) 9 | from typing import List, Union 10 | 11 | import openai 12 | 13 | # FIXME Define global constant for structured models 14 | OPENAI_STRUCTURED_MODELS = ['gpt-4o'] 15 | 16 | 17 | class ChatOpenAI(): 18 | DEFAULT_SYSTEM_PROMPT = "You are a helpful, creative, and smart assistant." 
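# Minimal usage sketch (added annotation; hypothetical prompt, and it assumes
# OPENAI_API_KEY is exported in the environment, as enforced in __init__ below):
#   engine = ChatOpenAI(model_string="gpt-4o", is_multimodal=False)
#   answer = engine("Summarize the modeling problem in two sentences.")
# __call__ forwards to generate(); pass a list mixing str and image bytes only
# when is_multimodal=True.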
19 | 
20 | def __init__(
21 | self,
22 | model_string="gpt-4o",
23 | system_prompt=DEFAULT_SYSTEM_PROMPT,
24 | is_multimodal: bool=False,
25 | **kwargs):
26 | """
27 | :param model_string: Name of the OpenAI model to use (default "gpt-4o").
28 | :param system_prompt: Default system prompt prepended to every request.
29 | :param is_multimodal: Whether image inputs (bytes) are accepted alongside text.
30 | """
31 | self.system_prompt = system_prompt
32 | if os.getenv("OPENAI_API_KEY") is None:
33 | raise ValueError("Please set the OPENAI_API_KEY environment variable if you'd like to use OpenAI models.")
34 | 
35 | self.client = OpenAI(
36 | api_key=os.getenv("OPENAI_API_KEY"),
37 | )
38 | self.model_string = model_string
39 | self.is_multimodal = is_multimodal
40 | 
41 | 
42 | @retry(wait=wait_random_exponential(min=1, max=5), stop=stop_after_attempt(5))
43 | def generate(self, content: Union[str, List[Union[str, bytes]]], system_prompt=None, **kwargs):
44 | try:
45 | # Print retry attempt information
46 | attempt_number = self.generate.retry.statistics.get('attempt_number', 0) + 1
47 | if attempt_number > 1:
48 | print(f"Attempt {attempt_number} of 5")
49 | 
50 | if isinstance(content, str):
51 | return self._generate_text(content, system_prompt=system_prompt, **kwargs)
52 | 
53 | elif isinstance(content, list):
54 | if not self.is_multimodal:
55 | raise NotImplementedError("Multimodal generation requires is_multimodal=True.")
56 | 
57 | return self._generate_multimodal(content, system_prompt=system_prompt, **kwargs)
58 | 
59 | except openai.LengthFinishReasonError as e:
60 | print(f"Token limit exceeded: {str(e)}")
61 | print(f"Tokens used - Completion: {e.completion.usage.completion_tokens}, Prompt: {e.completion.usage.prompt_tokens}, Total: {e.completion.usage.total_tokens}")
62 | return {
63 | "error": "token_limit_exceeded",
64 | "message": str(e),
65 | "details": {
66 | "completion_tokens": e.completion.usage.completion_tokens,
67 | "prompt_tokens": e.completion.usage.prompt_tokens,
68 | "total_tokens": e.completion.usage.total_tokens
69 | }
70 | }
71 | except openai.RateLimitError as e:
72 | print(f"Rate limit error encountered: {str(e)}")
73 | return {
74 | "error": "rate_limit",
75 | "message": str(e),
76 | "details": getattr(e, 'args', None)
77 | }
78 | except Exception as e:
79 | print(f"Error in generate method: {str(e)}")
80 | print(f"Error type: {type(e).__name__}")
81 | print(f"Error details: {e.args}")
82 | return {
83 | "error": type(e).__name__,
84 | "message": str(e),
85 | "details": getattr(e, 'args', None)
86 | }
87 | 
88 | def _generate_text(
89 | self, prompt, system_prompt=None, temperature=0, max_tokens=4000, top_p=0.99, response_format=None
90 | ):
91 | 
92 | sys_prompt_arg = system_prompt if system_prompt else self.system_prompt
93 | 
94 | if self.model_string in ['o1', 'o1-mini']: # only supports base response currently
95 | response = self.client.beta.chat.completions.parse(
96 | model=self.model_string,
97 | messages=[
98 | {"role": "user", "content": prompt},
99 | ],
100 | max_completion_tokens=max_tokens
101 | )
102 | if response.choices[0].finish_reason == "length":
103 | response = "Token limit exceeded"
104 | else:
105 | response = response.choices[0].message.parsed
106 | elif self.model_string in OPENAI_STRUCTURED_MODELS and response_format is not None:
107 | response = self.client.beta.chat.completions.parse(
108 | model=self.model_string,
109 | messages=[
110 | {"role": "system", "content": sys_prompt_arg},
111 | {"role": "user", "content": prompt},
112 | ],
113 | frequency_penalty=0,
114 | presence_penalty=0,
115 | stop=None,
116 | temperature=temperature,
117 | max_tokens=max_tokens,
118 | top_p=top_p,
119 | 
response_format=response_format 120 | ) 121 | response = response.choices[0].message.parsed 122 | else: 123 | response = self.client.chat.completions.create( 124 | model=self.model_string, 125 | messages=[ 126 | {"role": "system", "content": sys_prompt_arg}, 127 | {"role": "user", "content": prompt}, 128 | ], 129 | frequency_penalty=0, 130 | presence_penalty=0, 131 | stop=None, 132 | temperature=temperature, 133 | max_tokens=max_tokens, 134 | top_p=top_p, 135 | ) 136 | response = response.choices[0].message.content 137 | return response 138 | 139 | def __call__(self, prompt, **kwargs): 140 | return self.generate(prompt, **kwargs) 141 | 142 | def _format_content(self, content: List[Union[str, bytes]]) -> List[dict]: 143 | formatted_content = [] 144 | for item in content: 145 | if isinstance(item, bytes): 146 | base64_image = base64.b64encode(item).decode('utf-8') 147 | formatted_content.append({ 148 | "type": "image_url", 149 | "image_url": { 150 | "url": f"data:image/jpeg;base64,{base64_image}" 151 | } 152 | }) 153 | elif isinstance(item, str): 154 | formatted_content.append({ 155 | "type": "text", 156 | "text": item 157 | }) 158 | else: 159 | raise ValueError(f"Unsupported input type: {type(item)}") 160 | return formatted_content 161 | 162 | def _generate_multimodal( 163 | self, content: List[Union[str, bytes]], system_prompt=None, temperature=0, max_tokens=4000, top_p=0.99, response_format=None 164 | ): 165 | sys_prompt_arg = system_prompt if system_prompt else self.system_prompt 166 | formatted_content = self._format_content(content) 167 | 168 | if self.model_string in ['o1', 'o1-mini']: # only supports base response currently 169 | print(f'Max tokens: {max_tokens}') 170 | response = self.client.chat.completions.create( 171 | model=self.model_string, 172 | messages=[ 173 | {"role": "user", "content": formatted_content}, 174 | ], 175 | max_completion_tokens=max_tokens 176 | ) 177 | if response.choices[0].finish_reason == "length": 178 | response_text = "Token limit exceeded" 179 | else: 180 | response_text = response.choices[0].message.content 181 | elif self.model_string in OPENAI_STRUCTURED_MODELS and response_format is not None: 182 | response = self.client.beta.chat.completions.parse( 183 | model=self.model_string, 184 | messages=[ 185 | {"role": "system", "content": sys_prompt_arg}, 186 | {"role": "user", "content": formatted_content}, 187 | ], 188 | temperature=temperature, 189 | max_tokens=max_tokens, 190 | top_p=top_p, 191 | response_format=response_format 192 | ) 193 | response_text = response.choices[0].message.parsed 194 | else: 195 | response = self.client.chat.completions.create( 196 | model=self.model_string, 197 | messages=[ 198 | {"role": "system", "content": sys_prompt_arg}, 199 | {"role": "user", "content": formatted_content}, 200 | ], 201 | temperature=temperature, 202 | max_tokens=max_tokens, 203 | top_p=top_p, 204 | ) 205 | response_text = response.choices[0].message.content 206 | return response_text -------------------------------------------------------------------------------- /src/tools/file_editor.py: -------------------------------------------------------------------------------- 1 | import os, re, difflib, shutil 2 | from .base import BaseTool 3 | 4 | 5 | class File_Edit_Tool(BaseTool): 6 | require_llm_engine = False 7 | 8 | def __init__(self): 9 | super().__init__( 10 | tool_name="File_Edit_Tool", 11 | tool_description="Edit an existing text file via regex or line-number operations.", 12 | tool_version="1.0.0", 13 | input_types={ 14 | "file_path": "str – 
path relative to workspace/",
15 | "operation": "str – one of replace|insert_after|insert_before|delete",
16 | "target": "str|int – regex (string) **or** 1-based line number",
17 | "content": "str – text to insert / replace with (ignored for delete)",
18 | "occurrence": "str – 'first' (default) or 'all'"
19 | },
20 | output_type="dict – {success: bool, message: str, diff_preview: str}",
21 | demo_commands=[
22 | {
23 | "command": (
24 | 'tool.execute('
25 | 'file_path="workspace/experiments/solver.py", '
26 | 'operation="replace", '
27 | 'target="def step\\(", '
28 | 'content="# TODO: refactor step()", '
29 | 'occurrence="first")'
30 | ),
31 | "description": "Replace the first occurrence of `def step(`"
32 | }
33 | ],
34 | user_metadata={
35 | "limitation": (
36 | "Text-only. Binary files are not supported. Keep each edit focused and small "
37 | "to minimise merge conflicts or unintended changes."
38 | ),
39 | "best_practice": (
40 | "Pinpoint the edit with a unique regex or an exact line number. "
41 | "Always inspect the `diff_preview` the tool returns, and, if you "
42 | "need to run the modified code, follow up with the Code_Execution_Tool."
43 | ),
44 | }
45 | )
46 | 
47 | # ---------- internal helpers ----------
48 | def _make_backup(self, path):
49 | bak = path + ".bak"
50 | shutil.copy2(path, bak)
51 | 
52 | def _build_diff(self, before_lines, after_lines, ctx=3):
53 | diff = difflib.unified_diff(
54 | before_lines, after_lines, lineterm="", n=ctx,
55 | fromfile="before", tofile="after"
56 | )
57 | # Only return first 200 lines of diff preview to prevent excessive length
58 | return "\n".join(list(diff)[:200])
59 | 
60 | # ---------- main entry ----------
61 | def execute(
62 | self,
63 | file_path: str,
64 | operation: str,
65 | target,
66 | content: str = "",
67 | occurrence: str = "first"
68 | ):
69 | try:
70 | if operation not in {"replace", "insert_after", "insert_before", "delete"}:
71 | return {"success": False, "message": f"unknown operation: {operation}"}
72 | 
73 | # Parse workspace absolute path
74 | ws_root = os.getenv("WORKSPACE_PATH", "workspace")
75 | if file_path.startswith("workspace/"):
76 | file_path = file_path[len("workspace/"):]
77 | abs_path = os.path.join(ws_root, file_path)
78 | 
79 | if not os.path.exists(abs_path):
80 | return {"success": False, "message": f"file not found: {abs_path}"}
81 | 
82 | with open(abs_path, "r", encoding="utf-8") as f:
83 | lines = f.readlines()
84 | 
85 | self._make_backup(abs_path)  # Backup
86 | 
87 | # ---- Locate lines ----
88 | # target is int -> exact line number
89 | matches = []
90 | if isinstance(target, int):
91 | idx = target - 1  # Convert to 0-base
92 | if 0 <= idx < len(lines):
93 | matches = [idx]
94 | else:  # regex
95 | pattern = re.compile(str(target))
96 | matches = [i for i, ln in enumerate(lines) if pattern.search(ln)]
97 | 
98 | if not matches:
99 | return {"success": False, "message": "no match found"}
100 | 
101 | if occurrence == "first":
102 | matches = matches[:1]
103 | 
104 | # ---- Execute changes ----
105 | # Process matches in reverse order so earlier indices stay valid for inserts and deletes
106 | for i in sorted(matches, reverse=True):
107 | if operation == "replace":
108 | lines[i] = content if content.endswith("\n") else content + "\n"
109 | elif operation == "insert_after":
110 | insert = content if content.endswith("\n") else content + "\n"
111 | lines.insert(i + 1, insert)
112 | elif operation == "insert_before":
113 | insert = content if content.endswith("\n") else content + "\n"
114 | 
lines.insert(i, insert)
115 | elif operation == "delete":
116 | lines.pop(i)
117 | 
118 | # ---- Write back ----
119 | with open(abs_path, "w", encoding="utf-8") as f:
120 | f.writelines(lines)
121 | 
122 | diff_preview = self._build_diff(
123 | before_lines=open(abs_path + ".bak", encoding="utf-8").read().splitlines(),
124 | after_lines=lines
125 | )
126 | 
127 | return {
128 | "success": True,
129 | "message": f"{operation} on {len(matches)} occurrence(s) done.",
130 | "diff_preview": diff_preview
131 | }
132 | 
133 | except Exception as e:
134 | return {"success": False, "message": f"edit error: {e}"}
135 | 
-------------------------------------------------------------------------------- /src/tools/file_lister.py: --------------------------------------------------------------------------------
1 | import os
2 | from .base import BaseTool
3 | 
4 | class File_Lister_Tool(BaseTool):
5 | require_llm_engine = False
6 | 
7 | def __init__(self):
8 | super().__init__(
9 | tool_name="File_Lister_Tool",
10 | tool_description="A tool that lists all files in a given directory recursively.",
11 | tool_version="1.0.0",
12 | input_types={"dir_path": "str - The directory path starting from the workspace (default: workspace/)."},
13 | output_type="str - A list of all files under the directory with their paths, indicating file or folder type.",
14 | demo_commands=[
15 | {
16 | "command": 'execution = tool.execute(dir_path="workspace/")',
17 | "description": "List all files under the current workspace directory."
18 | }
19 | ],
20 | user_metadata={
21 | "limitation": "May not work properly if the directory contains restricted access files.",
22 | "best_practice": "Use this tool for listing files in structured directories to check what is present in the workspace or before writing to a certain file."
23 | },
24 | )
25 | 
26 | def execute(self, dir_path="workspace/"):
27 | """
28 | List the files under the given dir_path (relative to WORKSPACE_PATH).
29 | Returns a dict with { 'success': bool, 'message': str }.
30 | 
31 | Enhancement:
32 | - If it's a folder, append "(folder)"
33 | - If it's a file with an extension, show "(.{ext})"
34 | - If it's a file without extension, just "(file)"
35 | - If the path is not a directory but is a file, return an error.
36 | """
37 | try:
38 | # Ensure the path starts from the workspace
39 | workspace_path = os.getenv("WORKSPACE_PATH")
40 | if dir_path == "workspace":
41 | dir_path = "workspace/"
42 | if "workspace" not in dir_path:
43 | dir_path = os.path.join(workspace_path, dir_path)
44 | else:
45 | # remove "workspace/" from dir_path, then join with workspace_path
46 | dir_path = os.path.join(workspace_path, dir_path.split("workspace/")[-1])
47 | 
48 | if not os.path.isdir(dir_path):
49 | # If the path points to a file rather than a directory, report an error
50 | if os.path.isfile(dir_path):
51 | return {
52 | "success": False,
53 | "message": "Error: The path points to a file, not a directory."
54 | }
55 | else:
56 | return {
57 | "success": False,
58 | "message": "Error: Invalid directory path."
59 | } 60 | 61 | file_structure = [] 62 | 63 | def list_files(current_path, prefix=""): 64 | entries = sorted(os.listdir(current_path)) 65 | for index, entry in enumerate(entries): 66 | full_path = os.path.join(current_path, entry) 67 | is_last = (index == len(entries) - 1) 68 | new_prefix = prefix + (" " if is_last else "| ") 69 | relative_path = os.path.relpath(full_path, workspace_path) 70 | 71 | if os.path.isdir(full_path): 72 | # It's a folder 73 | file_structure.append(f"{prefix}|-- {entry} (folder)") 74 | # Recursively list folder contents 75 | list_files(full_path, new_prefix) 76 | else: 77 | # It's a file -> check extension 78 | base_name, ext = os.path.splitext(entry) 79 | if ext: 80 | extension_without_dot = ext[1:] 81 | file_structure.append( 82 | f"{prefix}|-- {entry} (.{extension_without_dot}) (path: workspace/{relative_path})" 83 | ) 84 | else: 85 | file_structure.append( 86 | f"{prefix}|-- {entry} (file) (path: workspace/{relative_path})" 87 | ) 88 | 89 | # First add the root directory name 90 | relative_path = os.path.relpath(dir_path, workspace_path) 91 | if relative_path.strip() == ".": 92 | output = "workspace" 93 | else: 94 | output = os.path.basename(dir_path) 95 | 96 | # Recursively build the structure 97 | list_files(dir_path) 98 | 99 | # Combine results 100 | output += "\n" + "\n".join(file_structure) 101 | 102 | return { 103 | "success": True, 104 | "message": output 105 | } 106 | 107 | except Exception as e: 108 | return { 109 | "success": False, 110 | "message": f"Error listing files: {str(e)}" 111 | } 112 | 113 | 114 | if __name__ == "__main__": 115 | tool = File_Lister_Tool() 116 | 117 | # Example directory path 118 | relative_dir_path = "workspace/sample" 119 | 120 | try: 121 | execution = tool.execute(dir_path=relative_dir_path) 122 | if execution["success"]: 123 | print("File List:") 124 | print(execution["message"]) 125 | else: 126 | print("Error:", execution["message"]) 127 | except Exception as e: 128 | print(f"Execution failed: {e}") 129 | 130 | print("Done!") 131 | -------------------------------------------------------------------------------- /src/tools/file_reader.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import pandas as pd 4 | from PyPDF2 import PdfReader 5 | import pymupdf 6 | 7 | from .base import BaseTool 8 | 9 | class File_Reader_Tool(BaseTool): 10 | require_llm_engine = False 11 | 12 | def __init__(self): 13 | super().__init__( 14 | tool_name = "File_Reader_Tool", 15 | tool_description = "A tool that reads and processes various file formats (json, csv, txt, etc.), returning structured text in the file. Contents are limited to 50,000 characters per read.", 16 | tool_version = "1.0.0", 17 | input_types = {"file_path": "str - The path to the file from the current workspace."}, 18 | output_type = "str - The extracted or structured content of the file (limited to 50,000 characters).", 19 | demo_commands=[ 20 | { 21 | "command": 'execution = tool.execute(file_path="workspace/sample.txt")', 22 | "description": "Read the content of a text file." 23 | }, 24 | { 25 | "command": 'execution = tool.execute(file_path="workspace/sample.csv")', 26 | "description": "Read the content of a CSV file." 27 | }, 28 | ], 29 | user_metadata = { 30 | "limitation": "Limited to 50,000 characters maximum. 
May not accurately process encrypted, corrupted, or highly complex file structures.",
31 | "best_practice": "Use this tool for reading standard file formats with structured or plain text content. For pdf files, consider using the PDF_Parser_Tool. For large files, consider reading specific sections or using multiple calls."
32 | },
33 | )
34 | 
35 | def execute(self, file_path):
36 | """
37 | Read a file from the workspace, returning its content if supported.
38 | Returns a dict: { "success": bool, "message": str }.
39 | 
40 | - success: True if read successfully, False if there's an error or invalid path.
41 | - message: The file content (if success, limited to 50,000 characters) or error message (if not).
42 | """
43 | MAX_CHARS = 50000  # Maximum number of characters returned per read
44 | 
45 | try:
46 | # Ensure the path starts from the workspace
47 | workspace_path = os.getenv("WORKSPACE_PATH")
48 | if "workspace" not in file_path:
49 | file_path = os.path.join(workspace_path, file_path)
50 | else:
51 | file_path = os.path.join(workspace_path, file_path.split("workspace/")[-1])
52 | 
53 | if not os.path.isfile(file_path):
54 | return {
55 | "success": False,
56 | "message": "Error: Invalid file path."
57 | }
58 | 
59 | file_extension = os.path.splitext(file_path)[-1].lower()
60 | 
61 | # import here or at the top of the file, depending on your structure
62 | import pandas as pd
63 | from PyPDF2 import PdfReader
64 | import fitz as pymupdf  # if you're using pymupdf
65 | 
66 | # Try different file types:
67 | if file_extension in [".json"]:
68 | with open(file_path, "r", encoding="utf-8") as f:
69 | content = json.load(f)
70 | result = json.dumps(content, indent=4, ensure_ascii=False)
71 | if len(result) > MAX_CHARS:
72 | result = result[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..."
73 | return {
74 | "success": True,
75 | "message": result
76 | }
77 | 
78 | elif file_extension in [".jsonl"]:
79 | with open(file_path, "r", encoding="utf-8") as f:
80 | lines = []
81 | total_length = 0
82 | for line in f:
83 | data = json.loads(line)
84 | formatted_line = json.dumps(data, indent=4, ensure_ascii=False)
85 | if total_length + len(formatted_line) > MAX_CHARS:
86 | lines.append("\n... [Content truncated due to 50,000 character limit] ...")
87 | break
88 | lines.append(formatted_line)
89 | total_length += len(formatted_line)
90 | return {
91 | "success": True,
92 | "message": "\n".join(lines)
93 | }
94 | 
95 | elif file_extension in [".csv", ".tsv", ".xls", ".xlsx"]:
96 | if file_extension in [".csv", ".tsv"]:
97 | sep = "\t" if file_extension == ".tsv" else ","
98 | df = pd.read_csv(file_path, sep=sep)
99 | else:
100 | df = pd.read_excel(file_path)
101 | result = df.to_string()
102 | if len(result) > MAX_CHARS:
103 | result = result[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..."
104 | return {
105 | "success": True,
106 | "message": result
107 | }
108 | 
109 | elif file_extension == ".pdf":
110 | try:
111 | reader = PdfReader(file_path)
112 | text = ""
113 | for page in reader.pages:
114 | page_text = page.extract_text() or ""  # extract_text() may return None for some pages
115 | if len(text) + len(page_text) <= MAX_CHARS:
116 | text += page_text + "\n"
117 | else:
118 | text += page_text[:MAX_CHARS-len(text)] + "\n... [Content truncated due to 50,000 character limit] ..." 
119 | break
120 | except Exception:
121 | doc = pymupdf.open(file_path)
122 | text = ""
123 | for page in doc:
124 | page_text = page.get_text()
125 | if len(text) + len(page_text) <= MAX_CHARS:
126 | text += page_text + "\n"
127 | else:
128 | text += page_text[:MAX_CHARS-len(text)] + "\n... [Content truncated due to 50,000 character limit] ..."
129 | break
130 | return {
131 | "success": True,
132 | "message": text
133 | }
134 | 
135 | elif file_extension in [".html", ".xml"]:
136 | with open(file_path, "r", encoding="utf-8") as f:
137 | text = f.read(MAX_CHARS + 1)  # Read one extra character to detect whether the limit is exceeded
138 | if len(text) > MAX_CHARS:
139 | text = text[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..."
140 | return {
141 | "success": True,
142 | "message": text
143 | }
144 | 
145 | elif file_extension in [".md", ".txt", ".docx", ".rtf"]:  # binary containers (e.g. .docx) fall back to the handler below on decode errors
146 | with open(file_path, "r", encoding="utf-8") as f:
147 | text = f.read(MAX_CHARS + 1)  # Read one extra character to detect whether the limit is exceeded
148 | if len(text) > MAX_CHARS:
149 | text = text[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..."
150 | return {
151 | "success": True,
152 | "message": text
153 | }
154 | 
155 | else:
156 | # fallback
157 | with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
158 | text = f.read(MAX_CHARS + 1)
159 | if len(text) > MAX_CHARS:
160 | text = text[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..."
161 | return {
162 | "success": True,
163 | "message": text
164 | }
165 | 
166 | except Exception as e:
167 | # fallback attempt, or final error
168 | try:
169 | with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
170 | fallback_text = f.read(MAX_CHARS + 1)
171 | if len(fallback_text) > MAX_CHARS:
172 | fallback_text = fallback_text[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..." 
173 | return {
174 | "success": False,
175 | "message": f"Partial read fallback:\n{fallback_text}"
176 | }
177 | except Exception:
178 | return {
179 | "success": False,
180 | "message": f"Error reading file: {str(e)}"
181 | }
182 | 
183 | 
184 | if __name__ == "__main__":
185 | import json
186 | 
187 | tool = File_Reader_Tool()
188 | 
189 | # Example file path
190 | relative_file_path = "workspace/sample.json"
191 | 
192 | try:
193 | execution = tool.execute(file_path=relative_file_path)
194 | print("File Content:")
195 | print(json.dumps(execution, indent=4))
196 | except Exception as e:
197 | print(f"Execution failed: {e}")
198 | 
199 | print("Done!")
-------------------------------------------------------------------------------- /src/tools/file_writer.py: --------------------------------------------------------------------------------
1 | import os
2 | from .base import BaseTool
3 | 
4 | 
5 | class File_Writer_Tool(BaseTool):
6 | require_llm_engine = False
7 | 
8 | def __init__(self):
9 | super().__init__(
10 | tool_name="File_Writer_Tool",
11 | tool_description="A tool that writes or appends content to a file.",
12 | tool_version="1.0.1",
13 | input_types={
14 | "file_path": "str - Path to the file (relative or starting with workspace/).",
15 | "content": "str - Text content to write or append.",
16 | "mode": "str - 'w' to overwrite, 'a' to append (default: 'a')."
17 | },
18 | output_type="dict - { success: bool, message: str }",
19 | demo_commands=[
20 | {
21 | "command": (
22 | 'tool.execute('
23 | 'file_path="workspace/experiments/sample.py", '
24 | 'content="print(\'Hello\')", mode="w")'
25 | ),
26 | "description": "Write an experiment script"
27 | }
28 | ],
29 | user_metadata={
30 | "limitation": "Binary files cannot be written, and the code you write here will not be executed. Note that your code should not use anything that requires g++; numpy, pandas, scipy, etc. are fine.",
31 | "best_practice": "Confirm the correct directory before writing; to run the code, call the Code Executor."
32 | },
33 | )
34 | 
35 | def execute(self, file_path, content, mode="a"):
36 | """
37 | Write or append text to a file in the workspace. 
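Parameters (added docstring note, summarizing the code below):
    file_path (str): Path relative to the workspace; a leading "workspace/" prefix is stripped.
    content (str): Text to write or append.
    mode (str): "w" to overwrite, "a" to append (default "a").

Returns a dict: { "success": bool, "message": str }.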
38 | """ 39 | if not file_path or not content: 40 | return {"success": False, 41 | "message": "file_path 与 content 不能为空"} 42 | 43 | if mode not in ("w", "a"): 44 | return {"success": False, 45 | "message": "mode 只能为 'w' 或 'a'"} 46 | 47 | try: 48 | # 1) 解析工作区路径 49 | workspace_root = os.getenv("WORKSPACE_PATH", "workspace") 50 | 51 | # 若用户给的是绝对路径 / 或其他 workspace 名称,进行处理 52 | if file_path.startswith("workspace/"): 53 | rel_path = file_path[len("workspace/"):] 54 | else: 55 | rel_path = file_path 56 | 57 | file_path = os.path.join(workspace_root, rel_path) 58 | 59 | # 2) 如果父目录不存在,先创建 60 | os.makedirs(os.path.dirname(file_path), exist_ok=True) 61 | 62 | # 3) 写或追加内容 63 | with open(file_path, mode, encoding="utf-8") as f: 64 | f.write(content) 65 | 66 | return {"success": True, 67 | "message": f"Success: written to {file_path}"} 68 | 69 | except Exception as e: 70 | return {"success": False, 71 | "message": f"Error writing file: {e}"} 72 | 73 | 74 | if __name__ == "__main__": 75 | # Demo 76 | tool = File_Writer_Tool() 77 | res = tool.execute( 78 | file_path="workspace/experiments/test.py", 79 | content="print('Hello, World!')", 80 | mode="w" 81 | ) 82 | print(res) -------------------------------------------------------------------------------- /src/tools/image_captioner.py: -------------------------------------------------------------------------------- 1 | import os 2 | from .base import BaseTool 3 | from .engine import ChatOpenAI 4 | 5 | class Image_Captioner_Tool(BaseTool): 6 | require_llm_engine = True 7 | 8 | def __init__(self, model_string="gpt-4o"): 9 | super().__init__( 10 | tool_name="Image_Captioner_Tool", 11 | tool_description="A tool that generates captions for images using OpenAI's multimodal model.", 12 | tool_version="1.0.0", 13 | input_types={ 14 | "image": "str - The path to the image file from current workspace.", 15 | "prompt": "str - The prompt to guide the image captioning (default: 'Describe this image in detail.').", 16 | }, 17 | output_type="str - The generated caption for the image.", 18 | demo_commands=[ 19 | { 20 | "command": 'execution = tool.execute(image="workspace/path/to/image.png")', 21 | "description": "Generate a caption for an image using the default prompt and model." 22 | }, 23 | { 24 | "command": 'execution = tool.execute(image="workspace/path/to/image.png", prompt="Explain the geometric shapes in the image.")', 25 | "description": "Generate a caption focusing on geometric shapes demonstrated in the image." 26 | } 27 | ], 28 | user_metadata = { 29 | "limitation": "The Image_Captioner_Tool may misinterpret complex equations, symbols, or spatial relationships, leading to inaccurate descriptions.", 30 | "best_practice": "Please consider to use it on images with clear and simple content to help you understand the modeling problem, instead of using it for complex data analysis.", 31 | }, 32 | ) 33 | print(f"\nInitializing Image Captioner Tool with model: {model_string}") 34 | self.llm_engine = ChatOpenAI(model_string=model_string, is_multimodal=True) if model_string else None 35 | 36 | def execute(self, image, prompt="Describe this image in detail."): 37 | """ 38 | Generate a caption or description for an image using a multimodal LLM engine. 39 | Returns a dict: { "success": bool, "message": str }. 40 | 41 | - success: True if caption generation was successful, False otherwise. 42 | - message: The caption text (if success) or an error message. 
43 | """ 44 | try: 45 | # Check if LLM engine is initialized 46 | if not self.llm_engine: 47 | return { 48 | "success": False, 49 | "message": "Error: LLM engine not initialized. Please provide a valid model_string." 50 | } 51 | 52 | input_data = [prompt] 53 | 54 | # get workspace path from environment 55 | workspace_path = os.getenv("WORKSPACE_PATH", "workspace") 56 | 57 | # ensure the image path is relative to workspace 58 | if "workspace" not in image: 59 | image_path = os.path.join(workspace_path, image) 60 | else: 61 | # remove "workspace/" from the path, then join with workspace_path 62 | image_path = os.path.join(workspace_path, image.split("workspace/")[-1]) 63 | 64 | # Check if the file exists 65 | if not os.path.isfile(image_path): 66 | return { 67 | "success": False, 68 | "message": "Error: Invalid image file path." 69 | } 70 | 71 | # Attempt to read the image file 72 | try: 73 | with open(image_path, 'rb') as file: 74 | image_bytes = file.read() 75 | input_data.append(image_bytes) 76 | except Exception as e: 77 | return { 78 | "success": False, 79 | "message": f"Error reading image file: {str(e)}" 80 | } 81 | 82 | # Attempt to generate caption 83 | try: 84 | caption = self.llm_engine(input_data) 85 | return { 86 | "success": True, 87 | "message": caption 88 | } 89 | except Exception as e: 90 | return { 91 | "success": False, 92 | "message": f"Error generating caption using LLM engine: {str(e)}" 93 | } 94 | 95 | except Exception as e: 96 | return { 97 | "success": False, 98 | "message": f"Error generating caption: {str(e)}" 99 | } 100 | 101 | def get_metadata(self): 102 | metadata = super().get_metadata() 103 | metadata['require_llm_engine'] = self.require_llm_engine # NOTE: can be removed if not needed 104 | return metadata 105 | 106 | 107 | if __name__ == "__main__": 108 | import json 109 | 110 | tool = Image_Captioner_Tool(model_string="gpt-4o") 111 | 112 | # Get tool metadata 113 | metadata = tool.get_metadata() 114 | print(metadata) 115 | 116 | # Construct the full path to the image using the script's directory 117 | relative_image_path = "workspace/Figure1.jpg" 118 | 119 | # Execute the tool with default prompt 120 | try: 121 | execution = tool.execute(image=relative_image_path) 122 | print("Generated Caption:") 123 | print(json.dumps(execution, indent=4)) 124 | except Exception as e: 125 | print(f"Execution failed: {e}") 126 | 127 | print("Done!") -------------------------------------------------------------------------------- /src/tools/pdf_parsing.py: -------------------------------------------------------------------------------- 1 | import os 2 | from .base import BaseTool 3 | from PyPDF2 import PdfReader 4 | import pymupdf 5 | import pymupdf4llm 6 | 7 | class PDF_Parser_Tool(BaseTool): 8 | require_llm_engine = False 9 | 10 | def __init__(self): 11 | super().__init__( 12 | tool_name="PDF_Parser_Tool", 13 | tool_description="A tool that extracts and processes text from PDF documents.", 14 | tool_version="1.0.0", 15 | input_types={ 16 | "pdf_path": "str - The path to the PDF file from the current workspace.", 17 | "num_pages": "int - The number of pages to extract (default: all pages).", 18 | "min_size": "int - The minimum text length required for extraction (default: 100)." 19 | }, 20 | output_type="str - The extracted text from the PDF document.", 21 | demo_commands=[ 22 | { 23 | "command": 'execution = tool.execute(pdf_path="workspace/sample.pdf")', 24 | "description": "Extract text from an entire PDF document." 
        """

        # Coerce string arguments (tool calls may pass numbers as strings)
        if isinstance(num_pages, str):
            num_pages = int(num_pages) if num_pages.isdigit() else None
        if isinstance(min_size, str):
            min_size = int(min_size) if min_size.isdigit() else -1

        try:
            # Ensure the path starts from the workspace
            workspace_path = os.getenv("WORKSPACE_PATH", "workspace")
            if "workspace" not in pdf_path:
                pdf_path = os.path.join(workspace_path, pdf_path)
            else:
                pdf_path = os.path.join(workspace_path, pdf_path.split("workspace/")[-1])

            if not os.path.isfile(pdf_path):
                return {
                    "success": False,
                    "message": "Error: Invalid PDF file path."
                }

            text = ""

            try:
                # Attempt using pymupdf4llm
                if num_pages is None:
                    text = pymupdf4llm.to_markdown(pdf_path)
                else:
                    reader = PdfReader(pdf_path)
                    min_pages = min(len(reader.pages), num_pages)
                    text = pymupdf4llm.to_markdown(pdf_path, pages=list(range(min_pages)))

                if min_size != -1 and len(text) < min_size:
                    raise Exception("Text too short")

            except Exception as e:
                print(f"Error with pymupdf4llm, falling back to pymupdf: {e}")
                try:
                    # Fall back to plain pymupdf
                    doc = pymupdf.open(pdf_path)
                    if num_pages:
                        doc = doc[:num_pages]
                    text = "".join(page.get_text() for page in doc)

                    if min_size != -1 and len(text) < min_size:
                        raise Exception("Text too short")

                except Exception as e2:
                    print(f"Error with pymupdf, falling back to PyPDF2: {e2}")
                    # Fall back to PyPDF2
                    reader = PdfReader(pdf_path)
                    if num_pages is None:
                        text = "".join(page.extract_text() for page in reader.pages)
                    else:
                        text = "".join(
                            page.extract_text() for page in reader.pages[:num_pages]
                        )

                    if min_size != -1 and len(text) < min_size:
                        raise Exception("Text too short")

            # If everything is ok, return success + text
            return {
                "success": True,
                "message": text
            }

        except Exception as e:
            return {
                "success": False,
                "message": f"Error extracting text: {str(e)}"
            }

    def get_metadata(self):
        metadata = super().get_metadata()
        metadata['require_llm_engine'] = self.require_llm_engine
        return metadata

if __name__ == "__main__":
    import json

    tool = PDF_Parser_Tool()

    # Get tool metadata
    metadata = tool.get_metadata()
    print(metadata)

    # Point the workspace at the test directory, then use a workspace-relative path
    os.environ["WORKSPACE_PATH"] = "PATH_TO_TEST_FILE/2025_Managing_Sustainable_Tourism"
"PATH_TO_TEST_FILE/2025_Managing_Sustainable_Tourism" 133 | relative_pdf_path = "workspace/2025_MCM_Problem_B.pdf" 134 | 135 | # Execute the tool with default parameters 136 | try: 137 | execution = tool.execute(pdf_path=relative_pdf_path) 138 | print("Extracted Text:") 139 | print(json.dumps(execution, indent=4)) 140 | except Exception as e: 141 | print(f"Execution failed: {e}") 142 | 143 | print("Done!") -------------------------------------------------------------------------------- /src/tools/solution_generator.py: -------------------------------------------------------------------------------- 1 | import os 2 | from .base import BaseTool 3 | from .engine import ChatOpenAI 4 | 5 | class Solution_Generator_Tool(BaseTool): 6 | require_llm_engine = True 7 | 8 | def __init__(self, model_string="gpt-4o"): 9 | super().__init__( 10 | tool_name="Generalist_Solution_Generator_Tool", 11 | tool_description="A generalized tool that takes query from the user as prompt, and answers the question step by step to the best of its ability. It can also accept an image.", 12 | tool_version="1.0.0", 13 | input_types={ 14 | "prompt": "str - The prompt that includes query from the user to guide the agent to generate response (Examples: 'Describe this image in detail').", 15 | "image": "str - The path to the image file from current workspace if applicable (default: None).", 16 | }, 17 | output_type="str - The generated response to the original query prompt", 18 | demo_commands=[ 19 | { 20 | "command": 'execution = tool.execute(prompt="Summarize the following text in a few lines")', 21 | "description": "Generate a short summary given the prompt from the user." 22 | }, 23 | { 24 | "command": 'execution = tool.execute(prompt="Give your best coordinate estimate for the pacemaker in the image and return (x1, y1, x2, y2)", image="workspace/path/to/image.png")', 25 | "description": "Generate bounding box coordinates given the image and prompt from the user. The format should be (x1, y1, x2, y2)." 26 | }, 27 | ], 28 | 29 | user_metadata = { 30 | "limitation": "The Solution_Generator_Tool may provide hallucinated or incorrect responses. Besides, the solution generator can only answer SIMPLE questions. Never throw whole question into this tool and expect a proper response.", 31 | "best_practice": "Use the Solution_Generator_Tool for general queries or tasks that don't require specialized knowledge or other specific tools. Provide clear, specific prompts. For complex queries, break them down into subtasks before using this tool." 32 | } 33 | 34 | ) 35 | self.model_string = model_string 36 | 37 | def execute(self, prompt, image=None): 38 | """ 39 | Generates a solution or answer using a ChatOpenAI model (optionally multimodal). 40 | Returns a dict: { "success": bool, "message": str }. 41 | 42 | - success: True if generation succeeded, False otherwise. 43 | - message: The generated text or error message. 
44 | """ 45 | print(f"\nInitializing Solution Tool with model: {self.model_string}") 46 | multimodal = True if image else False 47 | 48 | try: 49 | # Initialize the LLM engine 50 | from src.tools.engine import ChatOpenAI # or import wherever ChatOpenAI is 51 | llm_engine = ChatOpenAI(model_string=self.model_string, is_multimodal=multimodal) 52 | except Exception as e: 53 | return { 54 | "success": False, 55 | "message": f"Error initializing ChatOpenAI engine: {str(e)}" 56 | } 57 | 58 | try: 59 | input_data = [prompt] 60 | if multimodal: 61 | if not os.path.isfile(image): 62 | return { 63 | "success": False, 64 | "message": "Error: Invalid image file path." 65 | } 66 | try: 67 | with open(image, 'rb') as file: 68 | image_bytes = file.read() 69 | input_data.append(image_bytes) 70 | except Exception as e: 71 | return { 72 | "success": False, 73 | "message": f"Error reading image file: {str(e)}" 74 | } 75 | # Attempt generating with multimodal 76 | response = llm_engine(input_data) 77 | else: 78 | # Text-only 79 | response = llm_engine(input_data[0]) 80 | 81 | return { 82 | "success": True, 83 | "message": response 84 | } 85 | 86 | except Exception as e: 87 | return { 88 | "success": False, 89 | "message": f"Error generating response: {str(e)}" 90 | } 91 | 92 | def get_metadata(self): 93 | metadata = super().get_metadata() 94 | return metadata 95 | 96 | if __name__ == "__main__": 97 | # Example usage of the Generalist_Tool 98 | tool = Solution_Generator_Tool(model_string="gpt-4o") 99 | 100 | # Get tool metadata 101 | metadata = tool.get_metadata() 102 | print(metadata) 103 | 104 | # Construct the full path to the image using the script's directory 105 | relative_image_path = "workspace/Figure1.jpg" 106 | prompt = "Describe the image in detail." 107 | 108 | # Execute the tool with default prompt 109 | try: 110 | execution = tool.execute(prompt=prompt, image=relative_image_path) 111 | # execution = tool.execute(prompt=prompt) 112 | print("Generated Response:") 113 | print(execution) 114 | except Exception as e: 115 | print(f"Execution failed: {e}") 116 | 117 | print("Done!") -------------------------------------------------------------------------------- /src/tools/text_detector.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | from .base import BaseTool 4 | 5 | import warnings 6 | warnings.filterwarnings("ignore") 7 | 8 | class Text_Detector_Tool(BaseTool): 9 | require_llm_engine = False 10 | 11 | def __init__(self): 12 | super().__init__( 13 | tool_name="Text_Detector_Tool", 14 | tool_description="A tool that detects text in an image using EasyOCR.", 15 | tool_version="1.0.0", 16 | input_types={ 17 | "image": "str - The path to the image file from the current workspace.", 18 | "languages": "list - A list of language codes for the OCR model (default to English and Simplified Chinese only).", 19 | "detail": "int - The level of detail in the output. Set to 0 for simpler output, 1 for detailed output (default to 0 simpler output)." 20 | }, 21 | output_type="list - A list of detected text blocks.", 22 | demo_commands=[ 23 | { 24 | "command": 'execution = tool.execute(image="workspace/path/to/image.png")', 25 | "description": "Detect text in an image using the default language (English and Chinese)." 26 | }, 27 | { 28 | "command": 'execution = tool.execute(image="path/to/image.png", languages=["en"], detail=0)', 29 | "description": "Detect English text in an image with simpler output (text without coordinates and scores)." 
                },
            ],
            user_metadata={
                "limitation": "The Text_Detector_Tool may not accurately detect text in images with complex layouts, fonts, or backgrounds. Tables, numbers, and special characters may not be detected or may not retain their original structure.",
                "best_practice": "Use the Text_Detector_Tool for detecting text in simple images with clear text. Try to post-process the detected text to improve accuracy and readability. Use the extracted text only as a reference for understanding the image content.",
                "frequently_used_language": {
                    "ch_sim": "Simplified Chinese",
                    "ch_tra": "Traditional Chinese",
                    "de": "German",
                    "en": "English",
                    "es": "Spanish",
                    "fr": "French",
                    "hi": "Hindi",
                    "ja": "Japanese",
                }
            }
        )

    def build_tool(self, languages=None):
        """
        Builds and returns the EasyOCR reader model.

        Parameters:
            languages (list): A list of language codes for the OCR model.

        Returns:
            easyocr.Reader: An initialized EasyOCR Reader object.
        """
        languages = languages or ["en"]  # Default to English if no languages provided

        try:
            import easyocr
            reader = easyocr.Reader(languages)
            return reader
        except ImportError:
            raise ImportError("Please install the EasyOCR package using 'pip install easyocr'.")
        except Exception as e:
            print(f"Error building the OCR tool: {e}")
            return None

    def execute(self, image, languages=None, max_retries=10, retry_delay=5, clear_cuda_cache=False, **kwargs):
        """
        Executes the OCR tool to detect text in the provided image.

        Parameters:
            image (str): The path to the image file.
            languages (list): A list of language codes for the OCR model.
            max_retries (int): Maximum number of retry attempts.
            retry_delay (int): Delay in seconds between retry attempts.
            clear_cuda_cache (bool): Whether to clear the CUDA cache on out-of-memory errors.
            **kwargs: Additional keyword arguments for the OCR reader.

        Returns:
            dict: {
                "success": bool,
                "message": str,  # success/failure info
                "data": list     # OCR result list (empty if failed)
            }
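
        Example (added for illustration; the image path is a placeholder):
            tool = Text_Detector_Tool()
            result = tool.execute(image="workspace/Figure2.jpg", detail=0)
            if result["success"]:
                print(result["data"])  # list of detected text blocks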
        """
        # Honor the caller-provided languages, defaulting to English
        languages = languages or ["en"]

        # Get the workspace path from the environment
        workspace_path = os.getenv("WORKSPACE_PATH", "workspace")
        if "workspace" not in image:
            image = os.path.join(workspace_path, image)
        else:
            image = os.path.join(workspace_path, image.split("workspace/")[-1])

        # Check if the file exists
        if not os.path.isfile(image):
            return {
                "success": False,
                "message": "Error: Invalid image file path.",
                "data": []
            }

        # Retry up to max_retries times
        for attempt in range(max_retries):
            try:
                reader = self.build_tool(languages)
                if reader is None:
                    return {
                        "success": False,
                        "message": "Error: Failed to build the OCR tool.",
                        "data": []
                    }

                result = reader.readtext(image, **kwargs)
                try:
                    # If detail=1, convert numpy coords to int
                    cleaned_result = [
                        ([[int(coord[0]), int(coord[1])] for coord in item[0]], item[1], round(float(item[2]), 2))
                        for item in result
                    ]
                    return {
                        "success": True,
                        "message": "OCR detection succeeded.",
                        "data": cleaned_result
                    }
                except Exception:
                    # detail=0 or other fallback
                    return {
                        "success": True,
                        "message": "OCR detection succeeded (detail=0).",
                        "data": result
                    }

            except RuntimeError as e:
                if "CUDA out of memory" in str(e):
                    print(f"CUDA out of memory on attempt {attempt+1}.")
                    if clear_cuda_cache:
                        print("Clearing CUDA cache and retrying...")
                        import torch
                        torch.cuda.empty_cache()
                    else:
                        print(f"Retrying in {retry_delay} seconds...")
                        time.sleep(retry_delay)
                    continue
                else:
                    print(f"Runtime error: {e}")
                    return {
                        "success": False,
                        "message": f"Runtime error: {str(e)}",
                        "data": []
                    }
            except Exception as e:
                print(f"Error detecting text: {e}")
                return {
                    "success": False,
                    "message": f"Error detecting text: {str(e)}",
                    "data": []
                }

        # If we exhausted all retries
        print(f"Failed to detect text after {max_retries} attempts.")
        return {
            "success": False,
            "message": f"Failed after {max_retries} attempts.",
            "data": []
        }


    def get_metadata(self):
        """
        Returns the metadata for the Text_Detector_Tool.

        Returns:
            dict: A dictionary containing the tool's metadata.
178 | """ 179 | metadata = super().get_metadata() 180 | return metadata 181 | 182 | 183 | if __name__ == "__main__": 184 | import json 185 | 186 | # Example usage of the Text_Detector_Tool 187 | tool = Text_Detector_Tool() 188 | 189 | # Get tool metadata 190 | metadata = tool.get_metadata() 191 | print(metadata) 192 | 193 | relative_image_path = "workspace/Figure2.jpg" 194 | 195 | # Execute the tool 196 | try: 197 | execution = tool.execute(image=relative_image_path, languages=["en"], detail=0) 198 | print(json.dumps(execution)) 199 | 200 | print("Detected Text:", execution) 201 | except ValueError as e: 202 | print(f"Execution failed: {e}") 203 | 204 | print("Done!") -------------------------------------------------------------------------------- /src/tools/url_text.py: -------------------------------------------------------------------------------- 1 | import os 2 | import requests 3 | from bs4 import BeautifulSoup 4 | 5 | from .base import BaseTool 6 | 7 | class URL_Text_Extractor_Tool(BaseTool): 8 | def __init__(self): 9 | super().__init__( 10 | tool_name="URL_Text_Extractor_Tool", 11 | tool_description="A tool that extracts all text from a given URL.", 12 | tool_version="1.0.0", 13 | input_types={ 14 | "url": "str - The URL from which to extract text.", 15 | }, 16 | output_type="str - The extracted text from the given url and any error messages.", 17 | demo_commands=[ 18 | { 19 | "command": 'execution = tool.execute(url="https://example.com")', 20 | "description": "Extract all text from the example.com website." 21 | }, 22 | { 23 | "command": 'execution = tool.execute(url="https://en.wikipedia.org/wiki/Python_(programming_language)")', 24 | "description": "Extract all text from the Wikipedia page about Python programming language." 25 | }, 26 | ], 27 | user_metadata={ 28 | "limitation": "1. The URL_Text_Extractor_Tool may not accurately extract text from all websites. The extracted text may contain errors or omissions. The text in the images or embedded content may not be extracted. 2. You should not use this tool to download anything or read online document like PDF. Make sure that the url you entered is a website.", 29 | "best_practice": "Use this tool to summarize all the text information from a web page. The extracted text should be used as a reference for understanding the content of the website. Be aware that it may not be exactly complete or accurate." 30 | } 31 | ) 32 | 33 | def extract_text_from_url(self, url): 34 | try: 35 | response = requests.get(url, timeout=10) # optional: set a timeout 36 | response.raise_for_status() 37 | soup = BeautifulSoup(response.content, 'html.parser') 38 | text = soup.get_text(separator='\n', strip=True) 39 | text = text[:10000] # Limit the text to 10000 characters 40 | return { 41 | "success": True, 42 | "message": text 43 | } 44 | except requests.RequestException as e: 45 | return { 46 | "success": False, 47 | "message": f"Error fetching URL: {str(e)}" 48 | } 49 | except Exception as e: 50 | return { 51 | "success": False, 52 | "message": f"Error extracting text: {str(e)}" 53 | } 54 | 55 | def execute(self, url): 56 | """ 57 | Extract text from a given webpage URL, returning a dict: 58 | { "success": bool, "message": str }. 
        """
        return self.extract_text_from_url(url)

    def get_metadata(self):
        metadata = super().get_metadata()
        return metadata


if __name__ == "__main__":
    # Example usage of the URL_Text_Extractor_Tool
    tool = URL_Text_Extractor_Tool()

    # Get tool metadata
    metadata = tool.get_metadata()
    print(metadata)

    # Sample URL for extracting text
    url = "https://weather.metoffice.gov.uk/forecast/wx4g092se"

    import json

    # Execute the tool with the sample URL
    try:
        execution = tool.execute(url=url)
        print("Execution Result:")
        print(execution)
    except ValueError as e:
        print(f"Execution failed: {e}")

    print("Done!")
--------------------------------------------------------------------------------
/src/tools/web_download.py:
--------------------------------------------------------------------------------
import os
import requests
from .base import BaseTool

class Web_Download_Tool(BaseTool):
    def __init__(self):
        super().__init__(
            tool_name="Web_Download_Tool",
            tool_description="A tool that downloads a file from a given URL and saves it to a specified location.",
            tool_version="1.0.0",
            input_types={
                "url": "str - The URL of the file to download.",
                "save_path": "str - The target save file path starting from the workspace, including the filename."
            },
            output_type="str - Success message or error details.",
            demo_commands=[
                {
                    "command": 'execution = tool.execute(url="https://arxiv.org/pdf/paper.pdf", save_path="workspace/paper.pdf")',
                    "description": "Download a PDF file from arXiv and save it to the workspace."
                }
            ],
            user_metadata={
                "limitation": "Cannot download files from restricted or inaccessible URLs. The download may fail if the URL is invalid or the file is too large. Always verify the content type matches the file extension - servers might return HTML error pages even when requesting non-HTML content (e.g., downloading a .zip but getting HTML content with a .zip extension).",
                "best_practice": "Ensure the URL is valid and the save path includes the intended filename. Check the availability of the file after download using Python code or other means."
            },
        )

    def execute(self, url, save_path):
        """
        Download a file from a URL to the workspace.

        Returns:
            dict: {
                "success": bool,
                "message": str  # success info or error message
            }
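
        Example (added for illustration; URL and filename are placeholders):
            tool = Web_Download_Tool()
            result = tool.execute(url="https://example.com/report.pdf",
                                  save_path="workspace/report.pdf")
            print(result["message"])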
        """
        try:
            # Ensure the save path starts from the workspace
            workspace_path = os.getenv("WORKSPACE_PATH", "workspace")
            if not save_path.startswith("workspace/"):
                final_path = os.path.join(workspace_path, save_path)
            else:
                final_path = os.path.join(workspace_path, save_path.split("workspace/")[-1])

            # Create necessary directories
            os.makedirs(os.path.dirname(final_path), exist_ok=True)

            # Download the file (requests is imported at module level)
            response = requests.get(url, stream=True, timeout=10)
            response.raise_for_status()  # Raise error for failed requests

            with open(final_path, "wb") as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)

            return {
                "success": True,
                "message": f"Download successful! The file is saved at {final_path}"
            }

        except requests.exceptions.RequestException as e:
            return {
                "success": False,
                "message": f"Download failed (RequestException): {str(e)}"
            }
        except Exception as e:
            return {
                "success": False,
                "message": f"Error saving file: {str(e)}"
            }

if __name__ == "__main__":
    tool = Web_Download_Tool()
    execution = tool.execute(url="https://arxiv.org/pdf/2502.01600", save_path="workspace/paper.pdf")
    print(execution)
--------------------------------------------------------------------------------
/src/tools/web_search.py:
--------------------------------------------------------------------------------
import os
import json
import http.client
import time

from .base import BaseTool

class Web_Search_Tool(BaseTool):
    require_llm_engine = False

    def __init__(self):
        super().__init__(
            tool_name="Web_Search_Tool",
            tool_description="A tool that performs web searches using an API and returns structured search results.",
            tool_version="1.0.0",
            input_types={
                "query": "str - The search query to retrieve information.",
                "link": "bool - Whether to include links in the output (default: False).",
                "num": "int - Number of search results to return (default: 10)."
            },
            output_type="str - The formatted search results based on the given query.",
            demo_commands=[
                {
                    "command": 'execution = tool.execute(query="Latest AI trends", link=True, num=5)',
                    "description": "Search for the latest AI trends and return up to 5 results with links."
                }
            ],
            user_metadata={
                "limitation": "Limited by API availability and may not always return results. The snippet may be very concise and may not contain all relevant information.",
                "best_practice": "Use this tool for retrieving up-to-date web search results on various topics. Then use the links returned in the results to explore further details with URL_Text_Extractor_Tool."
            }
        )

    def execute(self, query, link=False, num=10):
        """
        Perform a web search via the Google Serper API.

        Returns:
            dict: {
                "success": bool,  # True if search results obtained, False otherwise
                "message": str    # The search results or error info
            }
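
        Example (added for illustration; assumes SERPER_API_KEY is set in the environment):
            tool = Web_Search_Tool()
            result = tool.execute(query="mathematical modeling competition MCM", link=True, num=3)
            if result["success"]:
                print(result["message"])  # numbered titles, links, and snippets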
        """

        # Coerce string arguments (tool calls may pass booleans/numbers as strings)
        if isinstance(link, str):
            link = link.lower() != 'false'
        if isinstance(num, str):
            num = int(num) if num.isdigit() else 10

        api_key = os.getenv("SERPER_API_KEY", None)
        if not api_key:
            return {
                "success": False,
                "message": "Error: Missing SERPER_API_KEY."
            }

        conn = http.client.HTTPSConnection("google.serper.dev")
        headers = {
            'X-API-KEY': api_key,
            'Content-Type': 'application/json'
        }
        payload = json.dumps({
            "q": query,
            # "tbs": "qdr:y"  # optional param for time range
        })

        try_time = 0
        data = {}
        while True:
            try:
                conn.request("POST", "/search", payload, headers)
                res = conn.getresponse()
                raw_data = res.read().decode("utf-8")
                data = json.loads(raw_data)

                if data.get("organic", []):
                    # We got some results, break
                    break

                try_time += 1
                if try_time > 5:
                    return {
                        "success": False,
                        "message": "Search Error: Timeout or no results after 5 attempts."
                    }
                time.sleep(5)

            except Exception as e:
                return {
                    "success": False,
                    "message": f"Search Error while sending request: {str(e)}"
                }

        try:
            output = ""
            index = 1
            answer_box = data.get("answerBox", {})

            # If there's an answerBox
            if answer_box:
                try:
                    current = f"{index}. {answer_box['title']}"
                    if link and 'link' in answer_box:
                        current += f"\n- Link: {answer_box['link']}"
                    if "date" in answer_box:
                        current += f"\n- Date: {answer_box['date']}"
                    current += f"\n- Snippet: {answer_box['snippet']}"
                    output += current + "\n\n"
                    index += 1
                except Exception:
                    pass  # in case something is missing

            # If we've reached the desired number of results
            if index > num:
                return {
                    "success": True,
                    "message": output.strip()
                }

            # Now handle the "organic" array
            for item in data.get("organic", []):
                try:
                    current = f"{index}. {item['title']}"
                    if link and 'link' in item:
                        current += f"\n- Link: {item['link']}"
                    if "date" in item:
                        current += f"\n- Date: {item['date']}"
                    current += f"\n- Snippet: {item['snippet']}"
                    output += current + "\n\n"
                    index += 1
                except Exception:
                    pass

                if index > num:
                    return {
                        "success": True,
                        "message": output.strip()
                    }

            # Return what we have so far
            return {
                "success": True,
                "message": output.strip()
            }

        except Exception as e:
            return {
                "success": False,
                "message": f"Search Error: {str(e)}"
            }

if __name__ == "__main__":
    tool = Web_Search_Tool()
    query = "How's the weather in Beijing"
    execution = tool.execute(query=query, link=True, num=3)
    print("Search Results:")
    print(execution)
    print("Done!")
--------------------------------------------------------------------------------