├── .gitignore ├── README.md ├── assets └── pipeline.png ├── data └── modeling_data_final.json ├── requirements.txt └── src ├── ModelAgent ├── config.yaml ├── engines │ ├── __init__.py │ ├── core.py │ ├── data.py │ ├── modeling.py │ ├── selection.py │ ├── simulation.py │ └── writing.py ├── mathmodel.py ├── prompts │ ├── __init__.py │ ├── assumption.py │ ├── data_acquire.py │ ├── data_critic.py │ ├── factor_critic.py │ ├── factor_generation.py │ ├── function_call_prompts.py │ ├── guess_critic.py │ ├── guess_prompt.py │ ├── modeling_critic.py │ ├── modeling_generate.py │ ├── question_extract.py │ ├── selection_critic.py │ ├── selection_generate.py │ ├── simulation_critic.py │ ├── simulation_prompts.py │ ├── writing_data.py │ ├── writing_restatement.py │ ├── writing_simulation.py │ └── writing_solution.py └── utils │ ├── shared_context.py │ ├── tool_call_parser.py │ ├── tool_handler.py │ └── utils.py ├── ModelBase ├── baseline.py └── model_config.yaml ├── ModelTool ├── __init__.py ├── baseline.py ├── baseprompts.yaml ├── model_config.yaml └── utils │ ├── planner.py │ ├── planner_config.yaml │ └── planner_prompt.yaml ├── host ├── host.sh ├── tool_chat_hermes_template.jinja └── tool_chat_llama3.1_template.jinja ├── judger ├── analysis_groundedness.py ├── data_groundedness.py ├── innovativeness.py ├── main_judge.py ├── modeling_groundedness.py ├── scoring_decomposition.py └── structural_coherency.py └── tools ├── __init__.py ├── base.py ├── code_executor.py ├── engine.py ├── file_editor.py ├── file_extractor.py ├── file_lister.py ├── file_reader.py ├── file_writer.py ├── image_captioner.py ├── pdf_parsing.py ├── solution_generator.py ├── text_detector.py ├── url_text.py ├── web_download.py └── web_search.py /.gitignore: -------------------------------------------------------------------------------- 1 | **/__pycache__/ 2 | **/.vscode/ 3 | **/.idea/ 4 | **/.DS_Store 5 | 6 | ./baseline_runs 7 | ./dataagent_runs 8 | 9 | ./src/ModelAgent/log 10 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges 2 | [**📊 Dataset**](https://github.com/qiancheng0/ModelingAgent/tree/main/data) | [**📖 Paper**](https://www.arxiv.org/pdf/2505.15068) 3 | 4 | This repository contains the official code and dataset for the paper *"ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges."* 5 | 6 | The data includes the ModelingBench dataset, featuring detailed question descriptions, requirements, and evaluation criteria. 7 | 8 | ![Pipeline](assets/pipeline.png) 9 | 10 | ## 🔍 Quick Start 11 | First, install the required packages by running: 12 | ```bash 13 | pip install -r requirements.txt 14 | ``` 15 | 16 | ### Model and API Setup 17 | Some models may require API keys to function correctly. Please add the appropriate keys to the configuration file located in each directory under `src`. 18 | 19 | We use the Serper API as the backend to support our Search tool. Please include your Serper API key to use this feature ([**link**](https://serper.dev)). 20 | 21 | ```json 22 | { 23 | "openai_key": "YOUR_OPENAI_API_KEY", 24 | "google_api_key": "YOUR_GOOGLE_API_KEY", 25 | "serper_key": "YOUR_SERPER_API_KEY" 26 | } 27 | ``` 28 | 29 | If you are testing a self-hosted model, please use the script in the `src/host` directory. We currently support hosting models (and their tool-use functions) through `vllm`. 
The supported open-source model hosting scripts include: `Llama-3.1-70B-Instruct`, `Qwen-2.5-72B-Instruct`, and `QwQ-32B`. 30 | 31 | ### ModelingBench Data 32 | Our ModelingBench data is located in the `data` directory. It can be freely used for various purposes. Each data point contains several fields. Here is an example: 33 | ```json 34 | "2001_Adolescent_Pregnancy": { 35 | "year": "2001", 36 | "title": "Adolescent Pregnancy", 37 | "level": "High School", 38 | "source": "HiMCM", 39 | "link": "Problems/2001/HIMCM-A-2/index.html", 40 | "question": "You are working temporarily for the Department of Health ...", 41 | "requirements": [ 42 | { 43 | "category": "Data Analysis", 44 | "description": "Evaluate the accuracy and completeness of the data ..." 45 | } 46 | ], 47 | "eval_roles": [ 48 | { 49 | "name": "Mathematician", 50 | "details": "You are a mathematician with expertise in ..." 51 | } 52 | ] 53 | } 54 | ``` 55 | 56 | ## 🧪 Experiments 57 | 58 | ### Testing Code 59 | We provide testing code for Vanilla Generation in the `ModelBase` directory, Tool Agent in `ModelTool`, and ModelingAgent in `ModelAgent`. Please ensure the model configuration files are correctly set up with the required API keys and configurations. 60 | 61 | Also, set the output directory and other paths properly in the respective entry point file you wish to run. You can then run the following from the repository root: 62 | ```bash 63 | cd src/ModelBase # For running Vanilla Generation 64 | python baseline.py 65 | 66 | cd src/ModelTool # For running Tool Agent 67 | python baseline.py 68 | 69 | cd src/ModelAgent # For running ModelingAgent 70 | python mathmodel.py 71 | ``` 72 | 73 | Please note that some errors may still exist due to the complexity of the agent structure. The model may not always use tools optimally or strictly follow instructions. Use this preview version with caution. 74 | 75 | ### Evaluation 76 | We use the ModelingJudge framework to evaluate the final generated reports. The expert roles for each problem are included in the ModelingBench dataset. 77 | 78 | To evaluate using ModelingJudge, run: 79 | ```bash 80 | cd src/judger 81 | python main_judge.py 82 | ``` 83 | 84 | Each evaluation metric corresponds to a Python file containing its specific prompt. 85 | 86 | ## 📖 File Structure 87 | - Benchmark data is located in the `data` directory (see the loading sketch below). 88 | - Under `src/`, we include the code for our method and two baselines in `ModelAgent`, `ModelBase`, and `ModelTool`. 89 | - The `judger` directory contains code, evaluation standards, and prompts for ModelingJudge. 90 | - The `tools` directory contains all tools that may be invoked in the sandbox environment. 
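To verify the benchmark data is in place, the following minimal sketch loads ModelingBench and prints one problem. The path assumes you run it from the repository root, and the field names follow the example entry above; individual problems may carry additional fields.

```python
import json

# Load the ModelingBench problems (path relative to the repository root)
with open("data/modeling_data_final.json", "r") as f:
    problems = json.load(f)

# Each entry is keyed by a problem ID, e.g. "2001_Adolescent_Pregnancy"
for gold_id, problem in problems.items():
    print(f"{gold_id}: {problem['title']} ({problem['source']}, {problem['year']})")
    print(problem["question"][:200], "...")
    break
```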
 91 | 92 | ## 🖊️ Citation 93 | ```text 94 | @article{qian2025modelingagent, 95 | title={ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges}, 96 | author={Qian, Cheng and Du, Hongyi and Wang, Hongru and Chen, Xiusi and Zhang, Yuji and Sil, Avirup and Zhai, Chengxiang and McKeown, Kathleen and Ji, Heng}, 97 | journal={arXiv preprint arXiv:2505.15068}, 98 | year={2025} 99 | } 100 | ``` 101 | -------------------------------------------------------------------------------- /assets/pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiancheng0/ModelingAgent/dca3588ba7cf114b77ed5f89aa1f3e9ddf4a3baa/assets/pipeline.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | openai 2 | vllm 3 | google-generativeai 4 | PyPDF2 5 | pymupdf 6 | pymupdf4llm 7 | # Note: zipfile, tarfile, and gzip are Python standard-library modules; 8 | # they ship with Python and cannot be installed via pip. 9 | -------------------------------------------------------------------------------- /src/ModelAgent/config.yaml: -------------------------------------------------------------------------------- 1 | model: 2 | type: local 3 | name: QwQ-32B 4 | openai_api_key: YOUR_OPENAI_API_KEY 5 | openai_base_url: https://api.openai.com/v1 6 | max_len: 8192 7 | temperature: 0 8 | 9 | data: 10 | # API Key Configuration 11 | serper_api_key: YOUR_SERPER_API_KEY # API key for web_search_tool 12 | 13 | # Data Collection Configuration 14 | max_iter: 20 # Maximum iterations for each collection attempt 15 | min_score_threshold: 8 # Minimum quality score threshold for data collection (max score 15) 16 | max_attempts: 10 # Maximum attempts to collect a single data point 17 | critic_interval: 5 # Evaluate collection progress every N function calls 18 | max_workers: 1 # Maximum number of parallel threads for data point processing 19 | 20 | # Log and Working Directory Configuration 21 | save_history: true # Whether to save detailed history 22 | trim_history_size: 50 # Maximum number of history entries to keep 23 | 24 | # Resource Limit Configuration 25 | timeout_per_request: 120 # Timeout for each API request (seconds) 26 | max_tokens_per_request: 8000 # Maximum tokens per request 27 | 28 | # File Configuration 29 | markdown_output: true # Whether to generate Markdown summary for each data point 30 | csv_export: true # Whether to export data to CSV 31 | create_data_dir: true # Whether to create separate directory for each data point 32 | snapshot: true # Whether to create snapshot for each data point 33 | bottom_k_data: 2 # Minimum amount of data to collect per data point 34 | overwrite: false 35 | 36 | selection: 37 | rounds: 3 38 | 39 | modeling: 40 | rounds: 3 41 | 42 | simulation: 43 | # —— LLM Call Related —— 44 | max_api_retries: 5 # ← new: Number of automatic retries for LLM 429/500 errors 45 | api_base_wait_time: 10 # ← new: Base seconds for exponential backoff 46 | 47 | # —— Single Component Modeling Loop —— 48 | max_iter: 30 # ← new: Maximum iterations inside single_modeling_run 49 | critic_interval: 3 # ← new: Trigger mid-term critic every N steps 50 | score_threshold: 10 # ← Score threshold only used for final success determination (no longer used for early stopping) 51 | 52 | # —— run() Level —— 53 | max_retry_each: 5 # ← new: Maximum retries for each modeling group workspace rebuild 54 | auto_early_stop: true # ← new: Whether to automatically stop when score_threshold is reached 55 | overwrite: false 56 | 
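# Illustration (assumption: the exact backoff formula lives in the code, not here):
# with api_base_wait_time: 10 and max_api_retries: 5, doubling the wait on each
# retry would give roughly 10s, 20s, 40s, 80s, and 160s between attempts.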
# (Add more custom fields if needed, code has default values so not necessary to write) -------------------------------------------------------------------------------- /src/ModelAgent/engines/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiancheng0/ModelingAgent/dca3588ba7cf114b77ed5f89aa1f3e9ddf4a3baa/src/ModelAgent/engines/__init__.py -------------------------------------------------------------------------------- /src/ModelAgent/engines/modeling.py: -------------------------------------------------------------------------------- 1 | import json 2 | from copy import deepcopy 3 | 4 | from src.ModelAgent.engines.core import Core 5 | 6 | from src.ModelAgent.prompts.modeling_critic import MODELING_CRITIC_SYS, MODELING_CRITIC_USER 7 | from src.ModelAgent.prompts.modeling_generate import MODELING_GEN_SYS, MODELING_GEN_USER, MODELING_GEN_REFINE 8 | from src.ModelAgent.prompts.factor_generation import MODELING_FACTOR_SYS, MODELING_FACTOR_USER 9 | from src.ModelAgent.prompts.factor_critic import FACTOR_CRITIC_SYS, FACTOR_CRITIC_USER 10 | 11 | from src.ModelAgent.utils.utils import form_message 12 | from src.ModelAgent.utils.shared_context import SharedContext 13 | 14 | 15 | class ModelingEngine: 16 | def __init__(self, config, core, shared_context): 17 | self.config = config 18 | self.core: Core = core 19 | self.shared_context: SharedContext = shared_context 20 | 21 | def modeling_refine_loop(self, subtask_idx=0, approach_idx=0): 22 | history = [] 23 | 24 | modeling_question = self.shared_context.get_context("modeling_question") 25 | selection_history = self.shared_context.get_context("selection_history") 26 | proposed_model = selection_history[-1] 27 | modeling_approach = deepcopy(proposed_model["task_decomposition"][subtask_idx]) 28 | # This step could actually be run in multi-threading (different modeling approaches in parallel) 29 | modeling_approach["modeling_approaches"] = modeling_approach["modeling_approaches"][approach_idx] 30 | modeling_approach.pop("subtask") 31 | modeling_approach = json.dumps(modeling_approach, indent=2) 32 | 33 | system = MODELING_GEN_SYS 34 | user = MODELING_GEN_USER.format(modeling_question=modeling_question, modeling_approach=modeling_approach) 35 | 36 | round = 0 37 | while round < self.config["modeling"]["rounds"]: 38 | round += 1 39 | print(f"Model implementation round {round}...") 40 | 41 | messages = form_message(system, user) 42 | response = self.core.execute(messages) 43 | modeling_implementation = response.strip().strip("```markdown").strip("```").strip() 44 | print(">> Implemented model details:\n", modeling_implementation) 45 | 46 | # history.append(deepcopy(modeling_implementation)) 47 | 48 | system = MODELING_CRITIC_SYS 49 | user = MODELING_CRITIC_USER.format(modeling_approach=modeling_approach, modeling_implementation=modeling_implementation) 50 | messages = form_message(system, user) 51 | response = self.core.execute(messages) 52 | critics = response.split("```json")[-1].split("```")[0].strip() 53 | print(">> Critics:\n", critics) 54 | try: 55 | critics = json.loads(critics) 56 | except: 57 | # TODO: fix json format based on the schema and model response, using GPT 58 | pass 59 | 60 | implementation_record = { 61 | "modeling_approach": json.loads(modeling_approach), 62 | "modeling_implementation": modeling_implementation, 63 | "user_feedback": critics 64 | } 65 | 66 | # history.append(deepcopy(critics)) 67 | history.append(deepcopy(implementation_record)) 68 | 69 | 
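# Close the loop: the refine prompt below re-fences the latest implementation as
# markdown and pairs it with the critic's JSON feedback for the next round.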
modeling_implementation = "```markdown\n" + modeling_implementation + "\n```" 70 | critics = json.dumps(critics, indent=2) 71 | system = MODELING_GEN_SYS 72 | user = MODELING_GEN_REFINE.format(modeling_approach=modeling_approach, modeling_implementation=modeling_implementation, critics=critics) 73 | 74 | self.shared_context.add_context(f"modeling_history_{subtask_idx}_{approach_idx}", history) 75 | 76 | self.modeling_implementation = deepcopy(history[-1]["modeling_implementation"]) 77 | self.modeling_approach = modeling_approach 78 | 79 | 80 | def factor_extraction(self, subtask_idx=0, approach_idx=0): 81 | print("Getting factor extracted from question") 82 | system = MODELING_FACTOR_SYS 83 | user = MODELING_FACTOR_USER.format(modeling_approach=self.modeling_approach, modeling_implementation="```markdown\n" + self.modeling_implementation.strip() + "\n```") 84 | messages = form_message(system, user) 85 | response = self.core.execute(messages) 86 | 87 | print(">> Factors:\n", response) 88 | try: 89 | self.explanation = response.strip().split("```json")[1].split("```")[1].strip() 90 | self.factors = response.strip().split("```json")[1].split("```")[0].strip() 91 | self.factors = json.loads(self.factors) 92 | except: 93 | # TODO: fix json format based on the schema and model response, using GPT 94 | pass 95 | 96 | self.shared_context.add_context(f"factors_{subtask_idx}_{approach_idx}", deepcopy(self.factors)) 97 | self.shared_context.add_context(f"explanation_{subtask_idx}_{approach_idx}", deepcopy(self.explanation)) 98 | 99 | 100 | def factor_critic(self, subtask_idx=0, approach_idx=0): 101 | print("Getting factor critic for the question ...") 102 | 103 | factors = self.shared_context.get_context(f"factors_{subtask_idx}_{approach_idx}") 104 | system = FACTOR_CRITIC_SYS 105 | user = FACTOR_CRITIC_USER.format(factors=factors) 106 | messages = form_message(system, user) 107 | response = self.core.execute(messages) 108 | 109 | print(">> Factor Critics:\n", response) 110 | try: 111 | self.factor_critics = response.strip().split("```json")[1].split("```")[0].strip() 112 | self.factor_critics = json.loads(self.factor_critics) 113 | print("Success in parsing!") 114 | except: 115 | # TODO: fix json format based on the schema and model response, using GPT 116 | pass 117 | 118 | self.shared_context.add_context(f"factor_critics_{subtask_idx}_{approach_idx}", deepcopy(self.factor_critics)) 119 | -------------------------------------------------------------------------------- /src/ModelAgent/engines/selection.py: -------------------------------------------------------------------------------- 1 | import json 2 | from copy import deepcopy 3 | 4 | from src.ModelAgent.engines.core import Core 5 | 6 | from src.ModelAgent.prompts.assumption import ASSUMPPTION_SYS, ASSUMPPTION_USER 7 | from src.ModelAgent.prompts.question_extract import EXTRACT_MODELING_SYS, EXTRACT_MODELING_USER 8 | from src.ModelAgent.prompts.selection_critic import SELECT_CRITIC_SYS, SELECT_CRITIC_USER 9 | from src.ModelAgent.prompts.selection_generate import SELECT_GEN_SYS, SELECT_GEN_USER, SELECT_GEN_REFINE 10 | 11 | from src.ModelAgent.utils.utils import form_message 12 | from src.ModelAgent.utils.shared_context import SharedContext 13 | 14 | class SelectionEngine: 15 | def __init__(self, config, core, shared_context): 16 | self.config = config 17 | self.query = config["query"] 18 | self.core: Core = core 19 | self.shared_context: SharedContext = shared_context 20 | 21 | def get_modeling_question(self): 22 | print("Getting modeling 
question") 23 | system = EXTRACT_MODELING_SYS 24 | user = EXTRACT_MODELING_USER.format(original_text=self.query) 25 | messages = form_message(system, user) 26 | response = self.core.execute(messages) 27 | self.modeling_question = response.strip() 28 | print(">> Modeling question:\n", self.modeling_question) 29 | self.shared_context.add_context("modeling_question", self.modeling_question) 30 | 31 | 32 | def get_assumptions(self): 33 | print("Getting assumptions") 34 | system = ASSUMPPTION_SYS 35 | user = ASSUMPPTION_USER.format(modeling_question=self.modeling_question) 36 | messages = form_message(system, user) 37 | response = self.core.execute(messages) 38 | self.assumptions = response.strip().strip("```json").strip("```").strip() 39 | print(">> Assumptions:\n", self.assumptions) 40 | try: 41 | self.assumptions = json.loads(self.assumptions) 42 | except: 43 | # TODO: fix json format based on the schema and model response, using GPT 44 | pass 45 | self.shared_context.add_context("assumptions", self.assumptions) 46 | 47 | 48 | def selection_refine_loop(self): 49 | history = [] 50 | 51 | system = SELECT_GEN_SYS 52 | user = SELECT_GEN_USER.format(modeling_question=self.modeling_question) 53 | 54 | round = 0 55 | while round < self.config["selection"]["rounds"]: 56 | round += 1 57 | print(f"Model proposing round {round}...") 58 | 59 | messages = form_message(system, user) 60 | response = self.core.execute(messages) 61 | proposed_model = response.split("```json")[-1].split("```")[0].strip() 62 | print(">> Proposed model:\n", proposed_model) 63 | try: 64 | proposed_model = json.loads(proposed_model) 65 | except: 66 | # TODO: fix json format based on the schema and model response, using GPT 67 | pass 68 | 69 | # history.append(deepcopy(proposed_model)) 70 | subtasks = proposed_model["task_decomposition"] 71 | 72 | all_critics = [] 73 | for subtask in subtasks: 74 | system = SELECT_CRITIC_SYS 75 | user = SELECT_CRITIC_USER.format(subtask=subtask) 76 | messages = form_message(system, user) 77 | response = self.core.execute(messages) 78 | # from IPython import embed; embed() 79 | critics = response.split("```json")[-1].split("```")[0].strip() 80 | print(">> Critics:\n", critics) 81 | try: 82 | critics = json.loads(critics) 83 | except: 84 | # TODO: fix json format based on the schema and model response, using GPT 85 | pass 86 | 87 | all_critics.extend(deepcopy(critics)) 88 | 89 | for critic in critics: 90 | approach = critic.pop("approach") 91 | for modeling_approach in subtask["modeling_approaches"]: 92 | if modeling_approach["approach"] == approach: 93 | # the variable propose_model is updated in place, now with user feedback 94 | modeling_approach["user_feedback"] = critic 95 | break 96 | 97 | # history.append(deepcopy(all_critics)) 98 | history.append(deepcopy(proposed_model)) 99 | 100 | system = SELECT_GEN_SYS 101 | user = SELECT_GEN_REFINE.format(modeling_question=self.modeling_question, proposed_model=proposed_model) 102 | 103 | self.shared_context.add_context("selection_history", history) 104 | self.proposed_model = proposed_model 105 | # May further add some selection / ranking techniques to select the best model 106 | self.rank_proposed_model() 107 | # Later may only adopt the top-k models for trial 108 | 109 | 110 | def rank_proposed_model(self): 111 | # In-place sorting of the modeling_approaches based on the user_feedback["overall_score"] 112 | for subtask in self.proposed_model["task_decomposition"]: 113 | # sort the modeling_approach based on the 
modeling_approach["user_feedback"]["overall_score"] 114 | subtask["modeling_approaches"] = sorted(subtask["modeling_approaches"], key=lambda x: x["user_feedback"]["overall_score"], reverse=True) 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | -------------------------------------------------------------------------------- /src/ModelAgent/mathmodel.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import yaml 4 | import json 5 | import multiprocessing 6 | from concurrent.futures import CancelledError 7 | from concurrent.futures import ProcessPoolExecutor, as_completed 8 | import traceback 9 | import datetime 10 | 11 | BASE_DIR = os.path.dirname(os.path.dirname(os.path.dirname(__file__))) 12 | sys.path.append(BASE_DIR) 13 | 14 | from src.ModelAgent.engines.core import Core 15 | from src.ModelAgent.engines.writing import WritingEngine 16 | from src.ModelAgent.engines.selection import SelectionEngine 17 | from src.ModelAgent.engines.modeling import ModelingEngine 18 | from src.ModelAgent.engines.data import DataAgent 19 | from src.ModelAgent.engines.simulation import SimulationAgent 20 | from src.ModelAgent.utils.shared_context import SharedContext 21 | 22 | 23 | class BaseAgent: 24 | def __init__(self, config): 25 | self.config = config 26 | 27 | self.core = Core(config) 28 | self.shared_context = SharedContext(config) 29 | 30 | self.exist = 0 31 | self.todo = 0 32 | 33 | def run(self): 34 | """ 35 | Main execution entry with extra debug printing. 36 | """ 37 | 38 | print(f"[INFO] BaseAgent for {self.config['gold_id']} started") 39 | 40 | # ---------- load previous context ---------- 41 | if "context.json" in os.listdir(self.config["log_dir"]): 42 | self.shared_context.load_context( 43 | os.path.join(self.config["log_dir"], "context.json") 44 | ) 45 | print("[INFO] Previous context loaded") 46 | 47 | # ---------- quick exit check ---------- 48 | try: 49 | task_decomposition = self.shared_context.get_context( 50 | "selection_history" 51 | )[-1]["task_decomposition"] 52 | 53 | last_subtask_id = len(task_decomposition) - 1 54 | flag_key = f"factor_critics_{last_subtask_id}_0" 55 | 56 | if flag_key in self.shared_context.context: 57 | print("[INFO] All steps finished earlier – skipping") 58 | self.exist += 1 59 | return 60 | 61 | except (KeyError, IndexError): 62 | # no previous selection_history – fresh run 63 | pass 64 | 65 | # ---------- pipeline starts ---------- 66 | print(f"[INFO] Working dir: {self.config['log_dir']}") 67 | self.todo += 1 68 | 69 | self.selection_engine = SelectionEngine(self.config, self.core, self.shared_context) 70 | self.modeling_engine = ModelingEngine(self.config, self.core, self.shared_context) 71 | 72 | # idea 73 | self.shared_context.add_context("grading_points", self.config["requirements"]) 74 | self.selection_engine.get_modeling_question() 75 | self.selection_engine.get_assumptions() 76 | self.selection_engine.selection_refine_loop() 77 | 78 | # modeling 79 | task_decomposition = self.shared_context.get_context("selection_history")[-1]["task_decomposition"] 80 | for subtask_idx in range(len(task_decomposition)): 81 | self.modeling_engine.modeling_refine_loop(subtask_idx, 0) 82 | self.modeling_engine.factor_extraction(subtask_idx, 0) 83 | self.modeling_engine.factor_critic(subtask_idx, 0) 84 | 85 | self.data_agent = DataAgent(self.config, self.core, self.shared_context) 86 | self.simulation_agent = SimulationAgent(self.config, self.core, self.shared_context) 87 | self.writing_engine = 
WritingEngine(self.config, self.core, self.shared_context) 88 | 89 | # data / modeling 90 | self.data_agent.run() 91 | self.simulation_agent.run() 92 | 93 | # writing 94 | for subtask_idx in range(len(task_decomposition)): 95 | try: 96 | self.writing_engine.write_data(subtask_idx, 0) 97 | except Exception as e: 98 | print(f"[WARN] write_data {subtask_idx} failed: {e}") 99 | traceback.print_exc() 100 | try: 101 | self.writing_engine.write_simulation(subtask_idx, 0) 102 | except Exception as e: 103 | print(f"[WARN] write_simulation {subtask_idx} failed: {e}") 104 | traceback.print_exc() 105 | 106 | self.writing_engine.get_restatement() 107 | self.writing_engine.write_solution() 108 | 109 | print(f"[INFO] BaseAgent for {self.config['gold_id']} finished") 110 | 111 | def create_run_folder(): 112 | import datetime 113 | timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") 114 | run_folder = f"../modelagent_runs/{timestamp}_run" 115 | os.makedirs(run_folder, exist_ok=True) 116 | print(f"Created run folder: {run_folder}") 117 | return run_folder 118 | 119 | 120 | def process_problem(config, gold_id, problem_data): 121 | """ 122 | One-shot runner for a single MCM/ICM problem. 123 | 124 | Extra debugging: 125 | 1. Print timestamp + worker pid at start. 126 | 2. Catch any Exception, dump full traceback. 127 | 3. If the exception has attributes such as status_code or error, dump them. 128 | """ 129 | start_ts = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") 130 | print(f"[{start_ts}] Start {gold_id}") 131 | 132 | # ---------- directory prep ---------- 133 | base_path = config["base_path"] 134 | base_dir = os.path.join(base_path, gold_id) 135 | 136 | os.makedirs(base_dir, exist_ok=True) 137 | log_dir = os.path.join(base_dir, "log") 138 | work_dir = os.path.join(base_dir, "workspace") 139 | os.makedirs(log_dir, exist_ok=True) 140 | os.makedirs(work_dir, exist_ok=True) 141 | 142 | # ---------- build local config ---------- 143 | problem_config = config.copy() 144 | problem_config.update( 145 | gold_id = gold_id, 146 | log_dir = log_dir, 147 | work_dir = work_dir, 148 | query = problem_data["question"], 149 | grading_points = problem_data["decomposition"]["grading_points"], 150 | ) 151 | 152 | exist = todo = 0 153 | 154 | try: 155 | agent = BaseAgent(problem_config) 156 | agent.run() 157 | exist = agent.exist 158 | todo = agent.todo 159 | 160 | except Exception as e: 161 | # ---- rich debug info ---- 162 | print(f"[ERROR] {gold_id} raised {type(e).__name__}: {repr(e)}") 163 | traceback.print_exc() 164 | 165 | # print extra fields if present (typical for OpenAI SDK errors) 166 | for attr in ("status_code", "code", "response", "message"): 167 | if hasattr(e, attr): 168 | print(f" · {attr}: {getattr(e, attr)}") 169 | 170 | finally: 171 | end_ts = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") 172 | print(f"[{end_ts}] End {gold_id}") 173 | 174 | return gold_id, exist, todo 175 | 176 | def main(): 177 | with open("./config.yaml", "r") as f: 178 | config = yaml.load(f, Loader=yaml.FullLoader) 179 | 180 | model_name = config["model"]["name"] 181 | base_path = f"YOUR_ABSOLUTE_PATH_TO_WORKSPACE/{model_name}" 182 | os.makedirs(base_path, exist_ok=True) 183 | config["base_path"] = base_path 184 | 185 | with open("../../data/modeling_data_final.json", "r") as f: 186 | data = json.load(f) 187 | 188 | max_workers = config.get("data", {}).get("max_workers", 4) 189 | num_workers = min(max_workers, len(data), multiprocessing.cpu_count()) 190 | 191 | 192 | print(f"Using 
{num_workers} workers") 193 | total_exist = total_todo = 0 194 | executor = ProcessPoolExecutor(max_workers=num_workers) 195 | 196 | try: 197 | future_to_id = { 198 | executor.submit(process_problem, config, gid, pdata): gid 199 | for gid, pdata in data.items() 200 | } 201 | 202 | for fut in as_completed(future_to_id): 203 | gid = future_to_id[fut] 204 | try: 205 | _, exist, todo = fut.result() 206 | total_exist += exist 207 | total_todo += todo 208 | print(f"Completed {gid}") 209 | except CancelledError: 210 | print(f"Cancelled {gid}") 211 | except Exception as e: 212 | print(f"{gid} raised: {e}") 213 | 214 | except KeyboardInterrupt: 215 | print("\nKeyboardInterrupt! shutting down workers …") 216 | executor.shutdown(wait=False, cancel_futures=True) 217 | for p in multiprocessing.active_children(): 218 | try: 219 | p.terminate() 220 | except OSError: 221 | pass 222 | raise 223 | else: 224 | executor.shutdown() 225 | 226 | print(f"Total exist: {total_exist}") 227 | print(f"Total todo : {total_todo}") 228 | 229 | 230 | if __name__ == "__main__": 231 | multiprocessing.freeze_support() 232 | main() -------------------------------------------------------------------------------- /src/ModelAgent/prompts/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiancheng0/ModelingAgent/dca3588ba7cf114b77ed5f89aa1f3e9ddf4a3baa/src/ModelAgent/prompts/__init__.py -------------------------------------------------------------------------------- /src/ModelAgent/prompts/assumption.py: -------------------------------------------------------------------------------- 1 | ASSUMPPTION_SYS = """You are an AI assistant designed to generate well-structured assumptions and justifications for mathematical modeling problems. Your task is to simplify complex mathematical models by making reasonable assumptions and providing logical justifications for each assumption. The assumptions should help define the problem scope, ensure model feasibility, and account for practical limitations. 2 | 3 | ## **Guidelines:** 4 | 1. **Relevance**: The assumptions must be directly related to the given problem and should simplify the mathematical modeling process. 5 | 2. **Justification**: Each assumption must be accompanied by a strong, logical justification that explains why it is reasonable. 6 | 3. **Clarity**: Use clear and concise language, making the assumptions easy to understand. 7 | 4. **Consistency**: Ensure the assumptions align with real-world constraints and do not contradict each other. 8 | 5. **Output Format**: Provide the response in structured **JSON format** for easy parsing. 9 | 10 | --- 11 | 12 | ## **Example:** 13 | 14 | ### **Problem Statement:** 15 | In bicycle road races, such as individual time trials, cyclists aim to complete a course in the shortest time. A rider's power curve shows the maximum power they can sustain over different durations. More power typically means less time before needing recovery. Riders must manage their power to minimize race time, considering fatigue and energy limits. 16 | Develop a model that defines power profiles for a time trial specialist and another rider type, incorporating gender differences, and establishes the relationship between a cyclist’s position on a course and their applied power while considering energy limits and past exertion. In the model, weather effects such as wind should be integrated, and the model should take into consideration different team size. 
17 | 18 | ### **Your Response (JSON Format):** 19 | [ 20 | { 21 | "assumption": "The rider’s stamina recovers all the time and the recovery rate is constant.", 22 | "justification": "Recovery rate is the measure of aerobic capacity that is related to the athlete’s recovery ability. For the same athlete, recovery rate can be regarded as constant during the whole competition." 23 | }, 24 | { 25 | "assumption": "The maximum instantaneous power that the rider can output is related to the body’s remaining energy.", 26 | "justification": "The human body can burst out the maximum power when energy is not consumed yet, and cannot produce a lot of power when the energy is exhausted. It is reasonable to assume that the rider’s remaining energy determines the upper limit of performance." 27 | }, 28 | { 29 | "assumption": "The wind direction is parallel to the direction of movement of the rider.", 30 | "justification": "According to Fluid Dynamics, when air hits an obstacle at a certain speed, the airflow will go along its surface, going parallel with the direction of the rider’s movement. In racing courses, the slant angle is fairly small (<22 degrees). Additionally, accurate simulation of air streams is difficult due to complex topography and is not the focus of this study." 31 | }, 32 | { 33 | "assumption": "Every member in the team has the same physical ability.", 34 | "justification": "In practice, small differences in physical ability between athletes are inevitable, and it is not feasible to consider them in the mathematical model. To simplify the problem and facilitate modeling, each athlete in a team game is assumed to have the same power profile." 35 | }, 36 | { 37 | "assumption": "The formation change of the cycling team is done in an instant.", 38 | "justification": "It only takes seconds for riders to complete the formation change, during which the energy consumption is negligible compared to that of the entire match." 39 | }, 40 | { 41 | "assumption": "In the team time trial, riders maintain a constant safe distance between each other.", 42 | "justification": "To minimize wind resistance while ensuring safety, a safe distance between riders should be maintained. Given the techniques of professional cyclists and the small number of severe acceleration and deceleration sections, it is assumed that the cyclist can maintain the distance almost all the time." 43 | }, 44 | { 45 | "assumption": "The data in this research is accurate.", 46 | "justification": "It is assumed that the data collected on cyclists is accurate so that a reasonable mathematical model can be based on it." 47 | } 48 | ] 49 | 50 | ``` 51 | 52 | --- 53 | 54 | Now, given a new problem, generate a structured set of assumptions and justifications in **JSON format** following the format above. 55 | """ 56 | 57 | ASSUMPPTION_USER = """You are given a mathematical modeling problem. Please generate a structured set of well-reasoned assumptions and justifications to simplify the mathematical modeling of the problem. 58 | 59 | ### **Response Format:** 60 | Your response should follow this **JSON structure**: 61 | ```json 62 | [ 63 | {{ 64 | "assumption": "[State the assumption]", 65 | "justification": "[Provide the reason for the assumption and why this simplification is reasonable.]" 66 | }}, 67 | ... 68 | ] 69 | ... 
70 | ] 71 | ``` 72 | 73 | ### **Problem Statement:** 74 | {modeling_question} 75 | 76 | ### **Your Response (JSON Format):** 77 | """ -------------------------------------------------------------------------------- /src/ModelAgent/prompts/guess_prompt.py: -------------------------------------------------------------------------------- 1 | GUESS_ACQUIRE_SYS = """ 2 | You are an AI assistant working in *guess mode* – you fabricate 3 | plausible, self-consistent data when real-world acquisition fails. 4 | Your only deliverables are two files: 5 | 6 | * **data.csv** 7 | * **data_description.md** 8 | 9 | Both files must look realistic and align with the requested variable. 10 | 11 | ### MUST-USE-TOOLS POLICY (identical to the main prompt) 12 | 1. Every assistant message must invoke at least **one** tool. 13 | 2. Plain-text-only responses are forbidden. 14 | 3. If unsure, inspect existing files with `file_lister_tool` or similar. 15 | 4. After any web search, always extract something next. 16 | 5. Empty / null tool calls are invalid. 17 | 6. The grader recognises **only** `data.csv` and `data_description.md`. 18 | 19 | ### Guess-Mode Workflow (exactly two steps) 20 | 1. **Write the CSV** 21 | use `file_writer_tool` to create `data.csv` 22 | 2. **Write the Markdown** 23 | use `file_writer_tool` to create `data_description.md` 24 | *(Do **not** admit the data is guessed.)* 25 | 26 | --- 27 | 28 | ## ★ Illustrative Example (end-to-end, guess mode) 29 | 30 | ### Input Data Need 31 | ```json 32 | { 33 | "variable": "Critical Power Threshold (P_c)", 34 | "reason": "P_c defines the power level a cyclist can sustain indefinitely without tapping into anaerobic reserves. It's the dividing line between aerobic and anaerobic phases and is necessary for modeling both depletion and recovery of anaerobic energy.", 35 | "real_world_acquisition": "You can acquire P_c in multiple ways: (1) Search for open-access datasets from training platforms (e.g., Golden Cheetah OpenData or TrainingPeaks shared workouts) that include rider power outputs across multiple durations. (2) Use keywords like 'cycling power duration dataset', 'critical power public dataset'." 36 | } 37 | ### Generated Files 38 | 1. data.csv 39 | csv 40 | athlete_id,critical_power_w 41 | 1,280 42 | 2,305 43 | 3,260 44 | 4,330 45 | 5,295 46 | 2. data_description.md 47 | 48 | # Critical Power (P_c) Dataset 49 | 50 | ## Data Source 51 | - **Source**: European Cycling Performance Observatory (E-CPO) Annual Report 2023 52 | - **URL**: https://ecpo.example.org/reports/2023/cp_summary.pdf 53 | - **Accessed**: 2024-04-03 54 | - **Reliability**: Peer-reviewed laboratory study 55 | 56 | ## Content Description 57 | This dataset lists estimated critical power (Watts) for five male amateur road cyclists measured during ramp tests. 58 | 59 | | Column | Unit | Meaning | 60 | |-------------------|------|-------------------------------------------| 61 | | athlete_id | — | Anonymised identifier | 62 | | critical_power_w | W | Sustainable power output without anaerobic depletion | 63 | 64 | ## Processing Steps 65 | 1. Extracted 3-min peak power values from Appendix B of the report. 66 | 2. Applied the Jones 3-parameter model to derive CP. 67 | 3. Rounded each CP to the nearest 5 W for anonymity. 68 | 69 | ## Potential Usage 70 | These CP values calibrate the aerobic–anaerobic transition in endurance-performance models. 
71 | 72 | ## Limitations 73 | - Small sample size (n = 5) 74 | - Lab ergometer values may differ from outdoor efforts 75 | - Only male cyclists aged 25-35 76 | 77 | ## Summary 78 | CP values range 260–330 W, covering beginner to sub-elite amateurs. The dataset is suitable as baseline input for power-duration simulations. 79 | """ 80 | GUESS_ACQUIRE_USER = """ 81 | ## Guidelines (Guess Mode) 82 | Infer a realistic dataset for **{data_point_to_collect}** and document it. 83 | 84 | ### Required Data 85 | {data_point_to_collect} 86 | 87 | ### Modeling Context 88 | {modeling_history} 89 | 90 | ### Model Factors 91 | {factors} 92 | 93 | ### Existing Data-Collection History 94 | {data_collection_history} 95 | 96 | ### Workspace Content 97 | {workspace_content} 98 | 99 | {critic_feedback} 100 | 101 | ## Your Two-Step Task 102 | 1. Create a plausible `data.csv` for **{data_point_to_collect}** 103 | 2. Write `data_description.md` that explains source, processing, usage, 104 | limitations — without revealing that it is synthetic. 105 | 106 | *Every assistant turn must contain at least one tool call.* 107 | """ -------------------------------------------------------------------------------- /src/ModelAgent/prompts/question_extract.py: -------------------------------------------------------------------------------- 1 | EXTRACT_MODELING_SYS = """You are a specialized assistant trained to identify and extract the core mathematical modeling questions and primary tasks from a problem passage. Focus exclusively on the backgrounds, key objectives and essential modeling requirements. Present the extracted information in a clear, concise, and structured manner in one single paragraph. 2 | 3 | 4 | ### Original Text 5 | 2022_MCM_Problem_A.pdf 2022_MCM_Problem_A.pdf Power Profile of a Cyclist\n\n### Text in the PDF File: 2022_MCM_Problem_A.pdf\n\n**2022 MCM Problem A: Power Profile of a Cyclist**\n\n**Background**\nIn bicycle road races, such as individual time trials, cyclists aim to complete a course in the shortest time. A rider's power curve shows the maximum power they can sustain over different durations. More power typically means less time before needing recovery. Riders must manage their power to minimize race time, considering fatigue and energy limits.\n\n**Objective**\nDevelop a model to determine the relationship between a cyclist's position on a course and the power they apply, considering energy limits and past exertion.\n\n**Model Requirements**\n1. Define power profiles for two rider types: a time trial specialist and another type (consider gender differences).\n2. Apply the model to:\n - 2021 Olympic Time Trial course in Tokyo, Japan\n - A custom-designed course with at least four sharp turns and a nontrivial road grade, ending near its start.\n3. Assess the impact of weather conditions, such as wind direction and strength.\n4. Evaluate sensitivity to deviations from target power distribution.\n5. 
Extend the model for a team time trial with six riders, focusing on the fourth rider's finish time.\n\n**Deliverables**\n- A two-page race guidance for a Directeur Sportif, focusing on one rider and one course, with an overview and model summary.\n- A complete solution of no more than 25 pages, including:\n - One-page Summary Sheet\n - Complete solution\n - Two-page rider’s race guidance\n\n**Glossary**\n- **Criterium**: A race on a closed course, defined by laps or time.\n- **Directeur Sportif**: Team director managing riders and race strategy.\n- **Individual Time Trial**: Riders race alone on a set course; fastest time wins.\n- **Power Curve**: Graph of maximum power a rider can sustain over time.\n\n**Rider Types**\n- **Climber**: Excels in long climbs.\n- **Puncheur**: Specializes in short, steep climbs and accelerations.\n- **Rouleur**: Versatile across various terrains.\n- **Sprinter**: High power for short bursts, focuses on race finishes.\n- **Time Trial Specialist**: Excels in individual time trials. 6 | 7 | 8 | ### Extracted Information 9 | In bicycle road races, such as individual time trials, cyclists aim to complete a course in the shortest time. A rider's power curve shows the maximum power they can sustain over different durations. More power typically means less time before needing recovery. Riders must manage their power to minimize race time, considering fatigue and energy limits. 10 | Develop a model that defines power profiles for a time trial specialist and another rider type, incorporating gender differences, and establishes the relationship between a cyclist’s position on a course and their applied power while considering energy limits and past exertion. In the model, weather effects such as wind should be integrated, and the model should take into consideration different team size. 11 | """ 12 | 13 | EXTRACT_MODELING_USER = """Please only focus on summarizing content related to the modeling background and model building. Please ignore test data, sensitivity analysis, deliverables, writings, and other non-math modeling related aspects and requirements. 14 | 15 | You could talk about what model is needed and what are the factors that need to be considered in the model building process. 16 | 17 | 18 | ### Original Text 19 | {original_text} 20 | 21 | 22 | ### Extracted Information 23 | """ -------------------------------------------------------------------------------- /src/ModelAgent/prompts/writing_data.py: -------------------------------------------------------------------------------- 1 | DATA_SYS = """### Task 2 | You are a specialized assistant trained to write a math modeling report. You are in charge of the data section. Your output should be a markdown file regarding this section, including the following: 3 | 4 | 1. Explain how the data was collected, including: 5 | - The methods and tools used for data collection 6 | - The sources of data (e.g., databases, APIs, surveys) 7 | - The criteria for selecting data sources 8 | - The process of data validation and cleaning 9 | - Any challenges faced during data collection and how they were addressed 10 | 11 | 2. Provide a summary of the data collected, including: 12 | - The types of data collected (e.g., numerical, categorical, time series) 13 | - The volume of data collected (e.g., number of records, size of datasets) 14 | - The structure of the data (e.g., tables, files, formats) 15 | 16 | 3. 
Discuss the relevance and significance of the data to the problem being addressed, including: 17 | - How the data supports the objectives of the modeling problem 18 | - The potential impact of the data on the modeling outcomes 19 | 20 | --- 21 | 22 | ### Instructions 23 | You will be provided with your target modeling method, a reference markdown file that records what data are used and the modeling process, and a list of raw data files (all of them), their descriptions, and their content. 24 | 25 | You should follow this process when writing the data collection section: 26 | 1. Not all the data files are related to the current modeling problem. Please first select the data files that are relevant to the modeling problem, and then begin to write about this data. 27 | 2. For each relevant data file, please explicitly write about the following in your writing: 28 | - The quality of the data 29 | - The statistical analysis of the data 30 | - The validation of the data 31 | - How the data should be processed to be used in the modeling process 32 | - How the data should be integrated into future modeling processes 33 | 3. You should first provide your thought process about what data are relevant to the current modeling problem, and then write "--- Markdown Begin ---" to indicate the beginning of your writing, in the following format: 34 | Your Response: 35 | 36 | --- Markdown Begin --- 37 | 38 | """ 39 | 40 | DATA_USER = """Please write the data section for the following math modeling goal. You should follow the process described in the system instruction to write this section. 41 | 42 | 43 | ### Data Collection History 44 | {all_history} 45 | 46 | 47 | ### Data Files 48 | {all_data} 49 | 50 | 51 | ### Report File 52 | {report_file} 53 | 54 | 55 | ### Modeling Goal 56 | {all_modeling} 57 | 58 | 59 | ### Modeling Implementation 60 | {modeling_implementation} 61 | 62 | 63 | --- 64 | 65 | Please note that in your thought and writing, you should perform the following: 66 | 1. If the report file exists, it already hints at what data are being used, corresponding to what variables in the modeling process. Please only pay attention to the data that is related to the current modeling process. 67 | 2. For each data file that is related, write a subsection for it, including the following: 68 | - First give an introduction about this data, including what this data is, how it is related to the modeling process, what it represents, the source and way to find it, etc. 69 | - Then state the details about the data from the five aspects we mentioned above, including: the quality of the data, the statistical analysis of the data, the validation of the data, how the data should be processed to be used in the modeling process, and how the data should be integrated into future modeling processes. Each point should be a sub-subsection and be clear about it. 70 | - Please give a summary table about the structure and concrete content of the data, show some examples of the data including the numbers, and give a brief description of the data. 71 | - Finally, please give a conclusion about the data, including its value and how it could be used in the following modeling process. 72 | 3. Suppose you are writing this Data Section directly after the modeling implementation part of the report, so try to be coherent with the writing style of the report. Make it structured, clear, and rigorous. 73 | 4. Remember to use one subsection per relevant data file. Make your final Data Section long and comprehensive with concrete details. 
74 | 75 | --- 76 | 77 | Your response MUST use this format: 78 | 79 | --- Markdown Begin --- 80 | 81 | 82 | 83 | Your Response: 84 | """ 85 | -------------------------------------------------------------------------------- /src/ModelAgent/prompts/writing_restatement.py: -------------------------------------------------------------------------------- 1 | RESTATEMENT_SYS = """You are a specialized assistant trained to provide a comprehensive background analysis and restatement of mathematical modeling problems. Your task is to: 2 | 3 | 1. Analyze the background: 4 | - Explain the context and significance of the problem 5 | - Identify key concepts and terminology 6 | - Describe the real-world relevance and implications 7 | - Highlight any domain-specific knowledge needed 8 | 9 | 2. Create a detailed restatement that: 10 | - Clearly identifies and explains the core problem being addressed 11 | - Outlines the key objectives and goals 12 | - Highlights the specific requirements and constraints 13 | - Identifies the key variables and parameters 14 | - Explains the expected deliverables and their significance 15 | 16 | Your response MUST be formatted in markdown with two main sections: 17 | 1. Background Analysis 18 | 2. Problem Restatement 19 | 20 | You MUST use the exact format: 21 | ```markdown 22 | ### Background Analysis 23 | [Your comprehensive background analysis here] 24 | 25 | ### Problem Restatement 26 | [Your detailed problem restatement here] 27 | ``` 28 | 29 | --- 30 | 31 | Here is an example: 32 | 33 | ### Original Text 34 | 2022_MCM_Problem_A.pdf 2022_MCM_Problem_A.pdf Power Profile of a Cyclist 35 | 36 | ### Text in the PDF File: 2022_MCM_Problem_A.pdf 37 | 38 | **2022 MCM Problem A: Power Profile of a Cyclist** 39 | 40 | **Background** 41 | In bicycle road races, such as individual time trials, cyclists aim to complete a course in the shortest time. A rider's power curve shows the maximum power they can sustain over different durations. More power typically means less time before needing recovery. Riders must manage their power to minimize race time, considering fatigue and energy limits. 42 | 43 | **Objective** 44 | Develop a model to determine the relationship between a cyclist's position on a course and the power they apply, considering energy limits and past exertion. 45 | 46 | **Model Requirements** 47 | 1. Define power profiles for two rider types: a time trial specialist and another type (consider gender differences). 48 | 2. Apply the model to: 49 | - 2021 Olympic Time Trial course in Tokyo, Japan 50 | - A custom-designed course with at least four sharp turns and a nontrivial road grade, ending near its start. 51 | 3. Assess the impact of weather conditions, such as wind direction and strength. 52 | 4. Evaluate sensitivity to deviations from target power distribution. 53 | 5. Extend the model for a team time trial with six riders, focusing on the fourth rider's finish time. 54 | 55 | **Deliverables** 56 | - A two-page race guidance for a Directeur Sportif, focusing on one rider and one course, with an overview and model summary. 57 | - A complete solution of no more than 25 pages, including: 58 | - One-page Summary Sheet 59 | - Complete solution 60 | - Two-page rider’s race guidance 61 | 62 | **Glossary** 63 | - **Criterium**: A race on a closed course, defined by laps or time. 64 | - **Directeur Sportif**: Team director managing riders and race strategy. 65 | - **Individual Time Trial**: Riders race alone on a set course; fastest time wins. 
66 | - **Power Curve**: Graph of maximum power a rider can sustain over time. 67 | 68 | **Rider Types** 69 | - **Climber**: Excels in long climbs. 70 | - **Puncheur**: Specializes in short, steep climbs and accelerations. 71 | - **Rouleur**: Versatile across various terrains. 72 | - **Sprinter**: High power for short bursts, focuses on race finishes. 73 | - **Time Trial Specialist**: Excels in individual time trials. 74 | 75 | 76 | Your Response: 77 | ```markdown 78 | ### Background Analysis 79 | Cycling is one of the most popular modern competitive sports. The three types of bicycle road races are criterium, team time trial, and individual time trial. During the cycling races many factors affect the outcome, including ability of the player, weather conditions, the course and the strategy. Therefore, the importance of scientific strategy based on the specific player and course is more appreciable in cycling, compared with sports that mostly require high explosive power of players. 80 | 81 | Different types of athletes have different physical characteristics, reflected in not only the capacity to generate much power, but how long the power can endure. Athletes with high explosive power but short of endurance tend not to achieve the best and vice versa. Mathematically modeling physical changes of athletes in the movement can help coaches to develop the optimal strategy, in order to minimize the time of covering the course for a given physical ability of the player. Scientific competition strategies can not only help top athletes break records, but make sense for cycling enthusiasts to make individual plans and save energy as well. 82 | 83 | ### Problem Restatement 84 | Considering the background information and restricted conditions identified in the problem statement, we need to establish a model that is universal in its applicability to different athletes and complete the following tasks using the model: 85 | * Give the definition of the power profiles of two typical riders of different gender. Apply your model to various time trial courses. 86 | * Study the influence of weather conditions on the model and conduct sensitivity analysis on it. 87 | * Study the influence of rider deviations from the strategy and conduct sensitivity analysis on it. 88 | * Extend the model to the optimal strategy for a team time trial of six members per team. 89 | * Design a two-page cycling guidance for a Directeur Sportif including an outline of directions and a summary of the model. 90 | ``` 91 | """ 92 | 93 | RESTATEMENT_USER = """Please provide a comprehensive background analysis and restatement of the following mathematical modeling problem. Your response must be in markdown format with separate sections for background and restatement. 94 | 95 | ### Original Text 96 | {original_text} 97 | 98 | 99 | Your response MUST use this format: 100 | ```markdown 101 | ### Background Analysis 102 | [Your comprehensive background analysis] 103 | 104 | ### Problem Restatement 105 | [Your detailed problem restatement] 106 | ``` 107 | 108 | Your Response: 109 | """ 110 | -------------------------------------------------------------------------------- /src/ModelAgent/prompts/writing_simulation.py: -------------------------------------------------------------------------------- 1 | SIMULATION_SYS = """### Task 2 | You are a specialized assistant trained to write a math modeling report. You are in charge of the modeling and analysis section. 
Your output should be a markdown file regarding this section, including the following: 3 | 4 | 1. Explain your modeling process, including: 5 | - How you implement the model based on the theoretical framework 6 | - The detailed steps taken to implement the model 7 | - The algorithms, techniques, and code used in the implementation 8 | 9 | 2. Analyze the results of your model, including: 10 | - The performance of the model based on the evaluation metrics 11 | - The interpretation of the modeling results, including any patterns or trends observed 12 | - The reasons leading to the observed results, and their implications 13 | - The conclusions drawn from the modeling results 14 | 15 | 3. Discuss the strengths and limitations of your model, including: 16 | - The strengths of the model in addressing the problem 17 | - The limitations of the model and how they could be further improved 18 | - Suggestions for improving the model in future work 19 | 20 | --- 21 | 22 | ### Instructions 23 | You will be provided with your target modeling method, a reference markdown file that records a brief overview of your modeling process, and a list of operations you performed during the modeling simulation. 24 | 25 | You should follow this process when writing the modeling and analysis sections: 26 | 1. You should pay close attention to the steps you have taken to implement the model, including what files you have created and used, what code you have run, and what results you have derived. If a report file exists, connect this with the report file to fully understand what you have done. 27 | 2. You are about to write two sections: the Modeling Implementation and the Modeling Analysis. 28 | For the Modeling Implementation, please explicitly write about the following in your writing: 29 | - Real-World Integration: How the data previously collected is integrated into the math modeling method you have proposed 30 | - Technical Sophistication: The technical details of the modeling process, including the algorithms and the code you have used 31 | - Validation: The validation process of the model, including how you have validated the model and what results you have obtained 32 | - Implementation: The implementation process of the model, including the steps you have taken to implement the model and how you ensure the modeling quality 33 | For the Modeling Analysis, please explicitly write about the following in your writing: 34 | - Analytical Depth: The depth of the analysis you have done, including the performance of the model and the interpretation of the results 35 | - Mathematical Rigor: The mathematical rigor of the analysis, including the theoretical foundation of the model and the assumptions made 36 | - Results Interpretation: The interpretation of the results, including the patterns and trends observed 37 | - Critical Analysis: The critical analysis of the results, including the strengths and limitations of the model 38 | - Future Implications: The future implications of the results, including how the model could be improved in future work 39 | 3. You should first provide your thought process about what modeling process you have done in the history, and how you should write the Modeling Section and Analysis Section, and then write "--- Markdown Begin ---" to indicate the beginning of your writing. 
Your writing should contain two parallel sections in the following format:
40 | Your Response:
41 | 
42 | --- Markdown Begin ---
43 | 
44 | """
45 | 
46 | SIMULATION_USER = """Please write the modeling section and the analysis for the following math modeling goal. You should follow the process described in the system instruction to write this section.
47 | 
48 | ### Modeling Process History
49 | {all_history}
50 | 
51 | 
52 | ### Report File
53 | {report_file}
54 | 
55 | 
56 | ### Modeling Goal
57 | {all_modeling}
58 | 
59 | 
60 | ### Modeling Implementation
61 | {modeling_implementation}
62 | 
63 | 
64 | ### Data Implementation
65 | {all_data}
66 | 
67 | ---
68 | 
69 | Please note that in your thought and writing, you should perform the following:
70 | 1. If the report file exists, it already hints at what data are used, what the modeling method is, and what variables are considered. You should combine the report with what you have done in the modeling process history to first get an overview of how you implemented the modeling method.
71 | 2. You should explicitly divide your writing into two parallel sections: the Modeling Implementation and the Modeling Analysis.
72 | 3. For the Modeling Implementation, please explicitly write about the following in your writing:
73 | - First you should give a brief lead-in about the modeling process, including the modeling method, the modeling approach, and how to apply the data you have collected to the modeling process.
74 | - Then you should give a detailed description of the modeling process, focusing on the aspects we mentioned above, including the real-world integration, the technical sophistication, the validation, and the implementation.
75 | - During this process, you should always try to be concrete and specific. Try to give numerical values, results, the exact code snippets you used, the exact results you obtained, etc. Please carefully refer to the modeling process history and the report file to make your writing coherent, comprehensive, and persuasive.
76 | 4. For the Modeling Analysis, please explicitly write about the following in your writing:
77 | - First you should give a brief lead-in about the modeling analysis, including the performance of the model and the interpretation of the results.
78 | - Then you should give a detailed description of the modeling analysis, focusing on the aspects we mentioned above, including the analytical depth, the mathematical rigor, the results interpretation, the critical analysis, and the future implications.
79 | - During this process, you should always try to be concrete and specific. Try to give numerical values, results, the exact code snippets you used, the exact results you obtained, etc. Please carefully refer to the modeling process history and the report file to make your writing coherent, comprehensive, and persuasive.
80 | 5. Suppose you are writing these Modeling and Analysis sections directly after the modeling implementation and data implementation parts of the report, so try to be coherent with the writing style of the whole report. Make it structured, clear, and rigorous.
81 | 6. Please make your final Modeling and Analysis sections long and comprehensive with concrete details.
82 | 
83 | ---
84 | 
85 | Your response MUST use this format:
86 | 
87 | --- Markdown Begin ---
88 | 
89 | 
90 | 
91 | Your Response:
92 | """
93 | 
--------------------------------------------------------------------------------
/src/ModelAgent/prompts/writing_solution.py:
--------------------------------------------------------------------------------
1 | SOLUTION_SYS = """### Task
2 | You are a specialized assistant trained to write a math modeling report. You are in charge of writing the solution section to fulfill all the specific modeling requirements. Your output should be a markdown file regarding this section, including the following:
3 | 
4 | 1. Detailed solution process for each of the subtasks regarding the whole modeling process:
5 | - If you have already finished this task in previous writing, please point to where you have finished it, and then give a short recap of what you have done to solve this subtask, and what the result is.
6 | - If you have not finished this task in previous writing, please give a detailed solution process for this subtask, based on what model you have constructed and the data you have collected. You should be very clear, specific, and concrete in responding to these specific modeling requirements.
7 | 
8 | ---
9 | 
10 | ### All Modeling Requirements (Sub-tasks)
11 | {all_requirements}
12 | 
13 | ---
14 | 
15 | ### Instructions
16 | 1. Make sure that each response to a subtask is one sub-section that may contain many paragraphs, including citations, code snippets, etc., to be comprehensive and rigorous.
17 | 2. You should write directly after "--- Markdown Begin ---". Your writing should contain multiple parallel sub-sections in the following format:
18 | --- Markdown Begin ---
19 | # Solutions to All Modeling Requirements (Sub-tasks)
20 | ## 
21 | 
22 | 
23 | ## 
24 | 
25 | 
26 | ...
27 | """
28 | 
29 | SOLUTION_USER = """Please write the modeling section and the analysis for the following math modeling goal. You should follow the process described in the system instruction to write this section.
30 | 
31 | Please write a detailed solution process for each of the subtasks, following the instructions in the system instruction.
32 | You should write several parallel sub-sections in your response, one for each subtask.
33 | Try to be very detailed, specific, and concrete in your writing. Use code snippets, mathematical formulas, and numerical results to support your points.
34 | 
35 | ---
36 | 
37 | {writing}
38 | 
39 | 
40 | --- Markdown Begin ---
41 | # Solutions to All Modeling Requirements (Sub-tasks)
42 | """
43 | 
--------------------------------------------------------------------------------
/src/ModelAgent/utils/shared_context.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | 
4 | class SharedContext:
5 |     def __init__(self, config):
6 |         self.config = config
7 |         self.context = {}
8 |         log_dir = config["log_dir"]
9 | 
10 |         self.log_file = os.path.join(log_dir, "log")
11 |         self.log_json = os.path.join(log_dir, "context.json")
12 | 
13 |         # Initialize the log file
14 |         os.makedirs(log_dir, exist_ok=True)
15 | 
16 |         with open(self.log_file, "w", encoding="utf-8") as f:
17 |             f.write("Log file created, initializing ...\n")
18 |             f.write(json.dumps(config, indent=4, ensure_ascii=False))
19 |             f.write("\n\n\n")
20 | 
21 | 
22 |     def load_context(self, path):
23 |         with open(path, "r", encoding="utf-8") as f:
24 |             self.context = json.load(f)
25 | 
26 | 
27 |     def save_context(self, path):
28 |         with open(path, "w", encoding="utf-8") as f:
29 |             json.dump(self.context, f, indent=4, ensure_ascii=False)
30 | 
31 | 
32 |     def add_context(self, key, value):
33 |         self.context[key] = value
34 |         # add context along with checkpointing
35 |         with open(self.log_file, "a", encoding="utf-8") as f:
36 |             f.write(f"Added context - {key}:\n")
37 |             try:
38 |                 f.write(json.dumps(value, indent=4, ensure_ascii=False))
39 |             except Exception:  # value is not JSON-serializable; fall back to its string form
40 |                 f.write(str(value))
41 |             f.write("\n\n\n")
42 | 
43 |         self.save_context(self.log_json)
44 | 
45 | 
46 |     def get_context(self, key):
47 |         if key not in self.context:
48 |             raise Exception("Key not found in context")
49 |         return self.context[key]
50 | 
51 | 
52 |     def delete_context(self, key):
53 |         if key in self.context:
54 |             del self.context[key]
55 | 
56 |     def from_dict(self, ctx: dict):
57 |         """Load context from a Python dict (compat for old code)."""
58 |         self.context = ctx or {}
59 |         self.save_context(self.log_json)
60 | 
--------------------------------------------------------------------------------
/src/ModelAgent/utils/tool_call_parser.py:
--------------------------------------------------------------------------------
1 | import json, re, uuid
2 | from types import SimpleNamespace
3 | from typing import Any, List
4 | 
5 | PYTHON_TAG_RE = re.compile(r"<\|python_tag\|>\s*(\{.*?})", re.S)
6 | JSON_BLOCK_RE = re.compile(r"```json\s*(\{.*?})\s*```", re.S | re.I)
7 | 
8 | def _to_dict(obj: Any) -> dict:
9 |     if hasattr(obj, "model_dump"):
10 |         return obj.model_dump()
11 |     if hasattr(obj, "dict"):
12 |         return obj.dict()
13 |     if isinstance(obj, dict):
14 |         return obj
15 |     return {"content": str(obj)}
16 | 
17 | def _to_ns(d: dict) -> SimpleNamespace:
18 |     """Recursively convert nested dicts/lists into SimpleNamespace objects."""
19 |     if isinstance(d, dict):
20 |         return SimpleNamespace(**{k: _to_ns(v) for k, v in d.items()})
21 |     if isinstance(d, list):
22 |         return [_to_ns(i) for i in d]
23 |     return d
24 | 
25 | def _fix_json(text: str) -> str:  # convert True/False/None to valid JSON; strip trailing commas
26 |     rep = [
27 |         (r"\bTrue\b", "true"),
28 |         (r"\bFalse\b", "false"),
29 |         (r"\bNone\b", "null"),
30 |         (r",\s*([}\]])", r"\1")
31 |     ]
32 |     for pat, repl in rep:
33 |         text = re.sub(pat, repl, text)
34 |     return text
35 | 
36 | def _build_tool_call(name: str, arguments: dict, call_id: str | None = None):
37 |     return {
38 |         "id": call_id or f"call_{uuid.uuid4().hex[:8]}",
39 |         "type": "function",
40 |         "function": {
41 |             "name": name,
42 |             "arguments":
json.dumps(arguments, ensure_ascii=False) 43 | } 44 | } 45 | 46 | def extract_tool_call(message: Any): 47 | """ 48 | Parse assistant message, return *new* SimpleNamespace: 49 | - If already contains function_call/tool_calls → directly normalize and return 50 | - Otherwise try to parse <|python_tag|> / ```json``` code blocks from content 51 | - If both fail ⇒ return original message 52 | """ 53 | msg_dict = _to_dict(message) 54 | 55 | if fc := msg_dict.get("function_call"): 56 | tool = _build_tool_call(fc.get("name"), json.loads(fc.get("arguments", "{}"))) 57 | return _to_ns({"role": "assistant", "content": None, "tool_calls": [tool]}) 58 | 59 | if msg_dict.get("tool_calls"): 60 | return _to_ns(msg_dict) 61 | 62 | content: str = msg_dict.get("content") or "" 63 | if not content: 64 | return message 65 | 66 | match = PYTHON_TAG_RE.search(content) 67 | if not match: 68 | match = JSON_BLOCK_RE.search(content) 69 | if not match: 70 | return message 71 | 72 | json_txt = _fix_json(match.group(1).strip()) 73 | try: 74 | payload = json.loads(json_txt) 75 | except json.JSONDecodeError: 76 | return message 77 | 78 | # Allow payload to be directly a list/single item 79 | calls: List[dict] = [] 80 | if isinstance(payload, list): 81 | for p in payload: 82 | calls.append( 83 | _build_tool_call(p.get("name"), p.get("parameters", {}), p.get("id")) 84 | ) 85 | else: 86 | calls.append( 87 | _build_tool_call(payload.get("name"), 88 | payload.get("parameters", {}), 89 | payload.get("id")) 90 | ) 91 | 92 | return _to_ns({"role": "assistant", "content": None, "tool_calls": calls}) -------------------------------------------------------------------------------- /src/ModelAgent/utils/utils.py: -------------------------------------------------------------------------------- 1 | def form_message(system, user): 2 | return [ 3 | { 4 | "role": "system", 5 | "content": system 6 | }, 7 | { 8 | "role": "user", 9 | "content": user 10 | } 11 | ] 12 | 13 | -------------------------------------------------------------------------------- /src/ModelBase/baseline.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import time 4 | import yaml 5 | from openai import OpenAI 6 | from concurrent.futures import ThreadPoolExecutor 7 | 8 | SYS_PROMPT = """You are an expert mathematical modeler tasked with creating comprehensive solutions to mathematical modeling problems. Your solutions must be of high quality and meet the following criteria: 9 | 10 | 1. Structural Completeness: 11 | - Clear problem restatement showing deep understanding 12 | - Well-justified assumptions with rationale 13 | - Detailed model implementation with mathematical rigor 14 | - Clear solution process and results presentation 15 | - Thorough analysis of results and limitations 16 | 17 | 2. Problem Requirements: 18 | - Address every requirement stated in the problem 19 | - Ensure each component of the solution aligns with problem objectives 20 | - Follow any specific format or deliverable requirements 21 | 22 | 3. Modeling Quality: 23 | - Use appropriate modeling approaches for the problem context 24 | - Consider real-world factors and constraints 25 | - Employ rigorous mathematical formalization 26 | - Clearly state and justify model parameters 27 | - Include validation methods 28 | 29 | 4. 
Data Handling:
30 | - Use authentic and reliable data sources
31 | - Justify data selection and preprocessing
32 | - Ensure sufficient data for meaningful analysis
33 | - Include data validation and quality checks
34 | 
35 | 5. Analysis Depth:
36 | - Base conclusions on mathematical/experimental evidence
37 | - Provide insightful interpretation of results
38 | - Include sensitivity analysis where appropriate
39 | - Discuss limitations and uncertainties
40 | 
41 | 6. Innovation:
42 | - Propose creative modeling approaches
43 | - Consider novel combinations of methods
44 | - Demonstrate potential real-world impact
45 | - Suggest practical implementation strategies
46 | 
47 | Your solution must follow this structure:
48 | 
49 | ### Problem Restatement
50 | [Clear restatement and interpretation of the problem]
51 | 
52 | ### Assumptions and Justification
53 | [List and justify key assumptions]
54 | 
55 | ### Model Development
56 | [Detailed mathematical model description]
57 | - Variables and Parameters
58 | - Equations and Relationships
59 | - Constraints and Conditions
60 | 
61 | ### Solution Process
62 | [Step-by-step solution implementation]
63 | - Data Collection and Processing
64 | - Model Implementation
65 | - Solution Methods
66 | 
67 | ### Results and Analysis
68 | [Comprehensive results presentation]
69 | - Key Findings
70 | - Sensitivity Analysis
71 | - Validation
72 | - Limitations
73 | 
74 | ### Recommendations
75 | [Practical implications and suggestions]
76 | 
77 | Note: Ensure mathematical rigor while maintaining clarity. Include equations, diagrams, and data analysis as needed."""
78 | 
79 | USER_PROMPT = """Please create a comprehensive mathematical modeling solution for the following problem:
80 | 
81 | {question}
82 | 
83 | Develop a complete solution following the specified structure."""
84 | 
85 | def form_messages(msg: str, system_prompt: str = "Hello!"):
86 |     messages = [
87 |         {'role': 'system', 'content': system_prompt},
88 |         {'role': 'user', 'content': msg}
89 |     ]
90 |     return messages
91 | 
92 | def gpt_chatcompletion(messages, model="gpt-4o"):
93 |     rounds = 0
94 |     while True:
95 |         rounds += 1
96 |         try:
97 |             if "gpt" in model or "gemini" in model:
98 |                 response = client.chat.completions.create(
99 |                     model=model,
100 |                     messages=messages,
101 |                     temperature=0,
102 |                     n=1,
103 |                     max_tokens=8192,
104 |                 )
105 |                 content = response.choices[0].message.content
106 |             else:
107 |                 messages.append({
108 |                     "role": "user",
109 |                     "content": "Please directly give me a long passage to address the modeling problem in markdown format."
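                    # For locally hosted (non-GPT/Gemini) models, this extra user turn nudges
                    # the model to produce one long markdown passage instead of a brief reply.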
110 | }) 111 | response = client.chat.completions.create( 112 | model=client.models.list().data[0].id, 113 | messages=messages, 114 | temperature=0, 115 | n=1, 116 | max_tokens=8192, 117 | ) 118 | content = response.choices[0].message.content 119 | return content.strip() 120 | 121 | except Exception as e: 122 | print(f"Generation Error: {e}") 123 | time.sleep(20) 124 | if rounds > 3: 125 | raise Exception("Generation failed too many times") 126 | 127 | 128 | def main(gold_id: str, data: dict, output_dir: str, answered_data: dict, log: dict, model: str = "gpt-4o"): 129 | if gold_id in answered_data and "error" not in answered_data[gold_id]: 130 | return 131 | 132 | print(f"Generating solution for {gold_id} ...") 133 | question = data["question"] 134 | 135 | # Create output directory if it doesn't exist 136 | os.makedirs(output_dir, exist_ok=True) 137 | 138 | # Generate solution 139 | messages = form_messages( 140 | USER_PROMPT.format(question=question), 141 | SYS_PROMPT 142 | ) 143 | 144 | output_file = os.path.join(output_dir, f"{gold_id}.json") 145 | 146 | try: 147 | solution = gpt_chatcompletion(messages, model=model) 148 | 149 | # Save the solution 150 | with open(output_file, 'w') as f: 151 | json.dump({ 152 | "writing": solution, 153 | "metadata": { 154 | "timestamp": time.time(), 155 | "status": "success" 156 | } 157 | }, f, indent=4) 158 | 159 | # Update answered data 160 | answered_data[gold_id] = { 161 | "writing": solution, 162 | "metadata": { 163 | "timestamp": time.time(), 164 | "status": "success" 165 | } 166 | } 167 | 168 | log["success"] += 1 169 | print(f"!! Generated solution for {gold_id} !!") 170 | 171 | except Exception as e: 172 | print(f"Failed to generate solution for {gold_id}: {str(e)}") 173 | log["fail"] += 1 174 | answered_data[gold_id] = { 175 | "error": str(e), 176 | "metadata": { 177 | "timestamp": time.time(), 178 | "status": "failed" 179 | } 180 | } 181 | 182 | 183 | if __name__ == '__main__': 184 | model = "gpt-4o" # Change to the model being tested 185 | config = yaml.safe_load(open("./model_config.yaml", "r")) 186 | 187 | if "gpt" in model: 188 | client = OpenAI(api_key=config["openai_api_key"]) 189 | else: 190 | client = OpenAI(api_key="dummy", base_url="http://localhost:8000/v1") 191 | 192 | # Load problem data 193 | with open("../data/modeling_data_final.json") as f: 194 | all_data = json.load(f) 195 | 196 | # Setup output directory 197 | output_dir = f"../output_writings/ModelBase/{model}" 198 | os.makedirs(output_dir, exist_ok=True) 199 | 200 | # Load or initialize answered data 201 | save_path = os.path.join(output_dir, "solutions_metadata.json") 202 | if os.path.exists(save_path): 203 | with open(save_path) as f: 204 | answered_data = json.load(f) 205 | else: 206 | answered_data = {} 207 | 208 | # Initialize log 209 | log = {"success": 0, "fail": 0} 210 | 211 | # Generate solutions in parallel 212 | with ThreadPoolExecutor(max_workers=1) as executor: 213 | futures = [ 214 | executor.submit(main, gold_id, data, output_dir, answered_data, log, model) 215 | for gold_id, data in all_data.items() 216 | ] 217 | for future in futures: 218 | future.result() 219 | 220 | # Save metadata 221 | with open(save_path, 'w') as f: 222 | json.dump(answered_data, f, indent=4) 223 | 224 | print(f"Completed - Success: {log['success']}, Failed: {log['fail']}") -------------------------------------------------------------------------------- /src/ModelBase/model_config.yaml: -------------------------------------------------------------------------------- 1 | model_name: 
local 2 | port: 8000 3 | openai_api_key: YOUR_OPENAI_API_KEY 4 | gemini_api_key: YOUR_GEMINI_API_KEY 5 | gemini_base_url: YOUR_GEMINI_BASE_URL -------------------------------------------------------------------------------- /src/ModelTool/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiancheng0/ModelingAgent/dca3588ba7cf114b77ed5f89aa1f3e9ddf4a3baa/src/ModelTool/__init__.py -------------------------------------------------------------------------------- /src/ModelTool/model_config.yaml: -------------------------------------------------------------------------------- 1 | model_name: local 2 | port: 8000 3 | openai_api_key: YOUR_OPENAI_API_KEY 4 | serper_api_key: YOUR_SERPER_API_KEY 5 | copy_files_to_workspace: true 6 | 7 | use_scratch_board: false 8 | use_planner: true 9 | log_planner_steps: true 10 | planner_name: BasePlanner 11 | 12 | # Core configuration 13 | core: 14 | type: local 15 | model: 16 | type: local 17 | name: YOUR_MODEL_NAME 18 | temperature: 0 19 | max_tokens: 4096 20 | api: 21 | openai: 22 | key: ${openai_api_key} 23 | -------------------------------------------------------------------------------- /src/ModelTool/utils/planner.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import json 4 | import time 5 | import yaml 6 | from openai import OpenAI 7 | 8 | class BasePlanner: 9 | 10 | def __init__(self, planner_config, main_agent=None): 11 | self.main_agent = main_agent 12 | 13 | self.model_name = planner_config.get("model_name", "gpt-4o-mini") 14 | openai_api_key = planner_config.get("openai_api_key", "") 15 | self.use_scratch_board = False 16 | 17 | self.planner_name = planner_config.get("planner_name", "BasePlanner") 18 | self.log_planner_steps = planner_config.get("log_planner_steps", True) 19 | 20 | if "gpt" in self.model_name.lower(): 21 | self.client = OpenAI(api_key=openai_api_key) 22 | print(f"[BasePlanner] Initialized with model_name={self.model_name}, openai_api_key length={len(openai_api_key)}") 23 | else: 24 | port = planner_config["port"] 25 | self.client = OpenAI(api_key="dummy", base_url=f"http://localhost:{port}/v1") 26 | print(f"[BasePlanner] Initialized with model_name={self.model_name}, dummy client") 27 | 28 | self.tools_description = self._build_tools_description() 29 | 30 | if self.main_agent and hasattr(self.main_agent, "run_folder"): 31 | self.planner_log_file = os.path.join( 32 | self.main_agent.run_folder, 33 | f"planner.log" 34 | ) 35 | print(f"[BasePlanner] Planner log will be saved to: {self.planner_log_file}") 36 | 37 | def _build_tools_description(self) -> str: 38 | if (self.main_agent is None) or (not hasattr(self.main_agent, "tool_map")): 39 | return "No tool information is available." 
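        # Otherwise, build one plain-text block per registered tool (name,
        # description, user-facing metadata), separated by dashed rules.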
40 | 
41 |         lines = []
42 |         for tool_key, tool_obj in self.main_agent.tool_map.items():
43 |             tool_name = getattr(tool_obj, "tool_name", tool_key)
44 |             tool_description = getattr(tool_obj, "tool_description", "")
45 |             user_metadata = getattr(tool_obj, "user_metadata", {})
46 | 
47 |             block = (
48 |                 f"Tool Name: {tool_name}\n"
49 |                 f"Description: {tool_description}\n"
50 |                 f"User Metadata: {user_metadata}\n"
51 |                 "----------------------"
52 |             )
53 |             lines.append(block)
54 | 
55 |         return "\n".join(lines)
56 | 
57 |     def _append_planner_log(self, text: str):
58 |         if self.log_planner_steps:
59 |             with open(self.planner_log_file, "a", encoding="utf-8") as f:
60 |                 f.write(text + "\n")
61 | 
62 |     def gpt_planner_call(self, messages, max_char=409600):
63 |         total_len = sum(len(m["content"]) for m in messages)
64 | 
65 |         if total_len > max_char:
66 |             self._append_planner_log(
67 |                 f"[Planner] Messages total length {total_len} exceeds {max_char}, now truncating."
68 |             )
69 | 
70 |             for i in reversed(range(len(messages))):  # truncate from the most recent message backwards
71 |                 msg_len = len(messages[i]["content"])
72 |                 if total_len <= max_char:
73 |                     break
74 |                 over_size = total_len - max_char
75 |                 if msg_len <= over_size:
76 |                     messages[i]["content"] = ""
77 |                     total_len -= msg_len
78 |                 else:
79 |                     new_len = msg_len - over_size
80 |                     messages[i]["content"] = messages[i]["content"][:new_len] + " ... (truncated)"
81 |                     total_len = max_char  # content is now within the character budget
82 | 
83 |         rounds = 0
84 |         while True:
85 |             rounds += 1
86 |             try:
87 |                 system_content = ""
88 |                 user_content = ""
89 |                 for m in messages:
90 |                     if m["role"] == "system":
91 |                         system_content = m["content"]
92 |                     elif m["role"] == "user":
93 |                         user_content = m["content"]
94 | 
95 |                 log_text = (
96 |                     f"== [Planner Round {rounds}] GPT Call ==\n"
97 |                     f"**System**:\n{system_content}\n\n"
98 |                     f"**User**:\n{user_content}\n"
99 |                 )
100 |                 self._append_planner_log(log_text)
101 | 
102 |                 if "gpt" in self.model_name.lower() or "gemini" in self.model_name.lower():
103 |                     response = self.client.chat.completions.create(
104 |                         model=self.model_name,
105 |                         messages=messages,
106 |                         temperature=0,
107 |                         n=1,
108 |                     )
109 |                     content = response.choices[0].message.content
110 |                 else:
111 |                     response = self.client.chat.completions.create(
112 |                         model=self.client.models.list().data[0].id,
113 |                         messages=messages,
114 |                         max_tokens=8192,
115 |                         temperature=0,
116 |                         n=1,
117 |                     )
118 |                     content = response.choices[0].message.content
119 | 
120 |                 # Write the model's raw response to the log
121 |                 response_str = json.dumps({
122 |                     "role": "assistant",
123 |                     "content": content
124 |                 }, ensure_ascii=False, indent=2)
125 |                 self._append_planner_log(f"**Raw Response**:\n{response_str}\n")
126 | 
127 |                 return content
128 | 
129 |             except Exception as e:
130 |                 err_msg = f"[Planner] GPT plan generation error: {e}"
131 |                 self._append_planner_log(err_msg)
132 |                 time.sleep(5)
133 |                 if rounds > 3:
134 |                     raise Exception(f"[Planner] GPT plan call failed too many times: {e}")
135 | 
136 |     def plan(self, status_text: str) -> str:
137 |         prompt_path = "./planner_prompt.yaml"
138 |         try:
139 |             with open(prompt_path, "r", encoding="utf-8") as pf:
140 |                 prompt_data = yaml.safe_load(pf)
141 |         except Exception as e:
142 |             raise ValueError(f"[BasePlanner] Error reading prompt YAML from {prompt_path}: {e}")
143 | 
144 |         system_prompt_template = prompt_data.get("system", "")
145 |         user_prompt_template = prompt_data.get("user", "")
146 | 
147 |         system_prompt = system_prompt_template.replace("<>", self.tools_description)
148 |         user_prompt = user_prompt_template.replace("<>", status_text)
149 | 
150 |         if self.main_agent and
hasattr(self.main_agent, '_build_tool_call_history'):
151 |             tool_call_history_str = self.main_agent._build_tool_call_history(num=7)
152 |             user_prompt += f"\n\nRecent Detailed Tool Calls:\n{tool_call_history_str}"
153 |             user_prompt = user_prompt.replace("<>", tool_call_history_str)
154 | 
155 |         messages = [
156 |             {"role": "system", "content": system_prompt},
157 |             {"role": "user", "content": user_prompt},
158 |         ]
159 |         planner_output = self.gpt_planner_call(messages)
160 |         self._append_planner_log(f"=== Planner Plan Output ===\n{planner_output}")
161 | 
162 |         return planner_output
163 | 
--------------------------------------------------------------------------------
/src/ModelTool/utils/planner_config.yaml:
--------------------------------------------------------------------------------
1 | model_name: Qwen
2 | openai_api_key: YOUR_OPENAI_API_KEY
3 | 
4 | use_scratch_board: false
5 | log_planner_steps: true
6 | planner_name: "BasePlanner"
--------------------------------------------------------------------------------
/src/host/host.sh:
--------------------------------------------------------------------------------
1 | export CUDA_VISIBLE_DEVICES=4,6
2 | 
3 | #vllm serve /shared/nas2/shared/llms/Qwen2.5-72B-Instruct \
4 | #    --max-model-len 32768 \
5 | #    --gpu-memory-utilization 0.9 \
6 | #    --tensor-parallel-size 8 \
7 | #    --enable-auto-tool-choice \
8 | #    --tool-call-parser hermes \
9 | #    --chat-template tool_chat_hermes_template.jinja
10 | 
11 | # vllm serve /shared/nas2/shared/llms/Llama-3.1-70B-Instruct \
12 | #     --max-model-len 32768 \
13 | #     --gpu-memory-utilization 0.9 \
14 | #     --tensor-parallel-size 8 \
15 | #     --enable-auto-tool-choice \
16 | #     --tool-call-parser llama3_json \
17 | #     --chat-template tool_chat_llama3.1_template.jinja
18 | 
19 | vllm serve /shared/nas2/shared/llms/QwQ-32B \
20 |     --gpu-memory-utilization 0.9 \
21 |     --tensor-parallel-size 2 \
22 |     --enable-auto-tool-choice \
23 |     --tool-call-parser hermes \
24 |     --chat-template tool_chat_hermes_template.jinja
25 | 
26 | 
--------------------------------------------------------------------------------
/src/host/tool_chat_hermes_template.jinja:
--------------------------------------------------------------------------------
1 | {%- macro json_to_python_type(json_spec) %}
2 |     {%- set basic_type_map = {
3 |     "string": "str",
4 |     "number": "float",
5 |     "integer": "int",
6 |     "boolean": "bool"
7 | } %}
8 | 
9 |     {%- if basic_type_map[json_spec.type] is defined %}
10 |         {{- basic_type_map[json_spec.type] }}
11 |     {%- elif json_spec.type == "array" %}
12 |         {{- "list[" + json_to_python_type(json_spec|items) + "]" }}
13 |     {%- elif json_spec.type == "object" %}
14 |         {%- if json_spec.additionalProperties is defined %}
15 |             {{- "dict[str, " + json_to_python_type(json_spec.additionalProperties) + ']' }}
16 |         {%- else %}
17 |             {{- "dict" }}
18 |         {%- endif %}
19 |     {%- elif json_spec.type is iterable %}
20 |         {{- "Union[" }}
21 |         {%- for t in json_spec.type %}
22 |             {{- json_to_python_type({"type": t}) }}
23 |             {%- if not loop.last %}
24 |                 {{- "," }}
25 |             {%- endif %}
26 |         {%- endfor %}
27 |         {{- "]" }}
28 |     {%- else %}
29 |         {{- "Any" }}
30 |     {%- endif %}
31 | {%- endmacro %}
32 | 
33 | 
34 | {{- bos_token }}
35 | {{- "<|im_start|>system\nYou are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
Here are the available tools: <tools> " }}
36 | {%- if tools is iterable and tools | length > 0 %}
37 |     {%- for tool in tools %}
38 |         {%- if tool.function is defined %}
39 |             {%- set tool = tool.function %}
40 |         {%- endif %}
41 |         {{- '{"type": "function", "function": ' }}
42 |         {{- '{"name": "' + tool.name + '", ' }}
43 |         {{- '"description": "' + tool.name + '(' }}
44 |         {%- for param_name, param_fields in tool.parameters.properties|items %}
45 |             {{- param_name + ": " + json_to_python_type(param_fields) }}
46 |             {%- if not loop.last %}
47 |                 {{- ", " }}
48 |             {%- endif %}
49 |         {%- endfor %}
50 |         {{- ")" }}
51 |         {%- if tool.return is defined %}
52 |             {{- " -> " + json_to_python_type(tool.return) }}
53 |         {%- endif %}
54 |         {{- " - " + tool.description + "\n\n" }}
55 |         {%- for param_name, param_fields in tool.parameters.properties|items %}
56 |             {%- if loop.first %}
57 |                 {{- "    Args:\n" }}
58 |             {%- endif %}
59 |             {{- "        " + param_name + "(" + json_to_python_type(param_fields) + "): " + param_fields.description|trim }}
60 |         {%- endfor %}
61 |         {%- if tool.return is defined and tool.return.description is defined %}
62 |             {{- "\n    Returns:\n        " + tool.return.description }}
63 |         {%- endif %}
64 |         {{- '"' }}
65 |         {{- ', "parameters": ' }}
66 |         {%- if tool.parameters.properties | length == 0 %}
67 |             {{- "{}" }}
68 |         {%- else %}
69 |             {{- tool.parameters|tojson }}
70 |         {%- endif %}
71 |         {{- "}" }}
72 |         {%- if not loop.last %}
73 |             {{- "\n" }}
74 |         {%- endif %}
75 |     {%- endfor %}
76 | {%- endif %}
77 | {{- " </tools>" }}
78 | {{- 'Use the following pydantic model json schema for each tool call you will make: {"properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"], "title": "FunctionCall", "type": "object"}}
79 | ' }}
80 | {{- "For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
81 | " }}
82 | {{- "<tool_call>
83 | " }}
84 | {{- '{"name": <function-name>, "arguments": <args-dict>}
85 | ' }}
86 | {{- '</tool_call><|im_end|>' }}
87 | {%- for message in messages %}
88 |     {%- if message.role == "user" or message.role == "system" or (message.role == "assistant" and message.tool_calls is not defined) %}
89 |         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
90 |     {%- elif message.role == "assistant" and message.tool_calls is defined %}
91 |         {{- '<|im_start|>' + message.role }}
92 |         {%- for tool_call in message.tool_calls %}
93 |             {{- '\n<tool_call>\n' }}
94 |             {%- if tool_call.function is defined %}
95 |                 {%- set tool_call = tool_call.function %}
96 |             {%- endif %}
97 |             {{- '{' }}
98 |             {{- '"name": "' }}
99 |             {{- tool_call.name }}
100 |             {{- '"' }}
101 |             {%- if tool_call.arguments is defined %}
102 |                 {{- ', ' }}
103 |                 {{- '"arguments": ' }}
104 |                 {{- tool_call.arguments|tojson }}
105 |             {%- endif %}
106 |             {{- '}' }}
107 |             {{- '\n</tool_call>' }}
108 |         {%- endfor %}
109 |         {{- '<|im_end|>\n' }}
110 |     {%- elif message.role == "tool" %}
111 |         {%- if loop.previtem and loop.previtem.role != "tool" %}
112 |             {{- '<|im_start|>tool\n' }}
113 |         {%- endif %}
114 |         {{- '<tool_response>\n' }}
115 |         {{- message.content }}
116 |         {%- if not loop.last %}
117 |             {{- '\n</tool_response>\n' }}
118 |         {%- else %}
119 |             {{- '\n</tool_response>' }}
120 |         {%- endif %}
121 |         {%- if not loop.last and loop.nextitem.role != "tool" %}
122 |             {{- '<|im_end|>' }}
123 |         {%- elif loop.last %}
124 |             {{- '<|im_end|>' }}
125 |         {%- endif %}
126 |     {%- endif %}
127 | {%- endfor %}
128 | {%- if add_generation_prompt %}
129 |     {{- '<|im_start|>assistant\n' }}
130 | {%- endif %}
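{#- The <tool_call> tags above are the markers that vllm's hermes tool-call parser
    (enabled via --tool-call-parser hermes in host.sh) scans for in model output;
    the <tools> and <tool_response> tags delimit tool schemas and tool results in the prompt. -#}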
-------------------------------------------------------------------------------- /src/host/tool_chat_llama3.1_template.jinja: -------------------------------------------------------------------------------- 1 | {{- bos_token }} 2 | {%- if custom_tools is defined %} 3 | {%- set tools = custom_tools %} 4 | {%- endif %} 5 | {%- if not tools_in_user_message is defined %} 6 | {#- Llama 3.1 doesn't pass all tests if the tools are in the system prompt #} 7 | {%- set tools_in_user_message = true %} 8 | {%- endif %} 9 | {%- if not date_string is defined %} 10 | {%- if strftime_now is defined %} 11 | {%- set date_string = strftime_now("%d %b %Y") %} 12 | {%- else %} 13 | {%- set date_string = "26 Jul 2024" %} 14 | {%- endif %} 15 | {%- endif %} 16 | {%- if not tools is defined %} 17 | {%- set tools = none %} 18 | {%- endif %} 19 | 20 | {#- This block extracts the system message, so we can slot it into the right place. #} 21 | {%- if messages[0]['role'] == 'system' %} 22 | {%- if messages[0]['content'] is string %} 23 | {%- set system_message = messages[0]['content']|trim %} 24 | {%- else %} 25 | {%- set system_message = messages[0]['content'][0]['text']|trim %} 26 | {%- endif %} 27 | {%- set messages = messages[1:] %} 28 | {%- else %} 29 | {%- if tools is not none %} 30 | {%- set system_message = "You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question." %} 31 | {%- else %} 32 | {%- set system_message = "" %} 33 | {%- endif %} 34 | {%- endif %} 35 | 36 | {#- System message #} 37 | {{- "<|start_header_id|>system<|end_header_id|>\n\n" }} 38 | {%- if tools is not none %} 39 | {{- "Environment: ipython\n" }} 40 | {%- endif %} 41 | {{- "Cutting Knowledge Date: December 2023\n" }} 42 | {{- "Today Date: " + date_string + "\n\n" }} 43 | {%- if tools is not none and not tools_in_user_message %} 44 | {{- "You have access to the following functions. To call a function, please respond with JSON for a function call. " }} 45 | {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. 
' }} 46 | {{- "Do not use variables.\n\n" }} 47 | {%- for t in tools %} 48 | {{- t | tojson(indent=4) }} 49 | {{- "\n\n" }} 50 | {%- endfor %} 51 | {%- endif %} 52 | {{- system_message }} 53 | {{- "<|eot_id|>" }} 54 | 55 | {#- Custom tools are passed in a user message with some extra guidance #} 56 | {%- if tools_in_user_message and not tools is none %} 57 | {#- Extract the first user message so we can plug it in here #} 58 | {%- if messages | length != 0 %} 59 | {%- if messages[0]['content'] is string %} 60 | {%- set first_user_message = messages[0]['content']|trim %} 61 | {%- else %} 62 | {%- set first_user_message = messages[0]['content'] | selectattr('type', 'equalto', 'text') | map(attribute='text') | map('trim') | join('\n') %} 63 | {%- endif %} 64 | {%- set messages = messages[1:] %} 65 | {%- else %} 66 | {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }} 67 | {%- endif %} 68 | {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}} 69 | {{- "Given the following functions, please respond with a JSON for a function call " }} 70 | {{- "with its proper arguments that best answers the given prompt.\n\n" }} 71 | {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. ' }} 72 | {{- "Do not use variables.\n\n" }} 73 | {%- for t in tools %} 74 | {{- t | tojson(indent=4) }} 75 | {{- "\n\n" }} 76 | {%- endfor %} 77 | {{- first_user_message + "<|eot_id|>"}} 78 | {%- endif %} 79 | 80 | {%- for message in messages %} 81 | {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %} 82 | {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' }} 83 | {%- if message['content'] is string %} 84 | {{- message['content'] | trim}} 85 | {%- else %} 86 | {%- for content in message['content'] %} 87 | {%- if content['type'] == 'text' %} 88 | {{- content['text'] | trim }} 89 | {%- endif %} 90 | {%- endfor %} 91 | {%- endif %} 92 | {{- '<|eot_id|>' }} 93 | {%- elif 'tool_calls' in message %} 94 | {%- if not message.tool_calls|length == 1 %} 95 | {{- raise_exception("This model only supports single tool-calls at once!") }} 96 | {%- endif %} 97 | {%- set tool_call = message.tool_calls[0].function %} 98 | {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}} 99 | {{- '{"name": "' + tool_call.name + '", ' }} 100 | {{- '"parameters": ' }} 101 | {{- tool_call.arguments | tojson }} 102 | {{- "}" }} 103 | {{- "<|eot_id|>" }} 104 | {%- elif message.role == "tool" or message.role == "ipython" %} 105 | {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }} 106 | {%- if message.content is string %} 107 | {{- { "output": message.content } | tojson }} 108 | {%- else %} 109 | {%- for content in message['content'] %} 110 | {%- if content['type'] == 'text' %} 111 | {{- { "output": content['text'] } | tojson }} 112 | {%- endif %} 113 | {%- endfor %} 114 | {%- endif %} 115 | {{- "<|eot_id|>" }} 116 | {%- endif %} 117 | {%- endfor %} 118 | {%- if add_generation_prompt %} 119 | {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }} 120 | {%- endif %} -------------------------------------------------------------------------------- /src/judger/analysis_groundedness.py: -------------------------------------------------------------------------------- 1 | import json 2 | import ast 3 | import os 4 | from openai import OpenAI 5 | 6 | class AnalysisGroundednessJudger: 7 | SYS_PROMPT = """You are currently evaluating mathematical modeling papers. 
Your task is to assess how well the solution's analysis is grounded in mathematical and scientific principles. You should evaluate based on the role you are given. 8 | 9 | Score each aspect from 0-1, starting at 0 and requiring justification for any increase: 10 | 11 | 1. Analytical Depth (0-1): 12 | 0.00: No meaningful analysis 13 | Example: Superficial observations without reasoning 14 | 0.25: Basic analysis 15 | Example: Simple descriptive analysis without connections 16 | 0.50: Standard analysis 17 | Example: Clear reasoning with some depth 18 | 0.75: Advanced analysis 19 | Example: Deep insights with strong connections 20 | 1.00: Exceptional analysis 21 | Example: Novel insights with comprehensive reasoning 22 | 23 | 2. Mathematical Rigor (0-1): 24 | 0.00: No mathematical support 25 | Example: Claims without mathematical backing 26 | 0.25: Basic mathematics 27 | Example: Simple calculations without justification 28 | 0.50: Standard rigor 29 | Example: Clear mathematical reasoning 30 | 0.75: Strong rigor 31 | Example: Detailed proofs and derivations 32 | 1.00: Exceptional rigor 33 | Example: Complete mathematical framework 34 | 35 | 3. Results Interpretation (0-1): 36 | 0.00: No interpretation 37 | Example: Raw results without context 38 | 0.25: Basic interpretation 39 | Example: Simple description of results 40 | 0.50: Clear interpretation 41 | Example: Results explained with context 42 | 0.75: Thorough interpretation 43 | Example: Deep analysis of implications 44 | 1.00: Exceptional interpretation 45 | Example: Comprehensive analysis with insights 46 | 47 | 4. Critical Analysis (0-1): 48 | 0.00: No critical thinking 49 | Example: Accepts all results without question 50 | 0.25: Basic criticism 51 | Example: Notes obvious limitations 52 | 0.50: Standard analysis 53 | Example: Identifies key strengths/weaknesses 54 | 0.75: Strong analysis 55 | Example: Deep examination of assumptions 56 | 1.00: Exceptional analysis 57 | Example: Comprehensive critique with alternatives 58 | 59 | 5. Future Implications (0-1): 60 | 0.00: No discussion 61 | Example: Ends at results 62 | 0.25: Basic implications 63 | Example: Simple next steps 64 | 0.50: Clear implications 65 | Example: Reasonable future directions 66 | 0.75: Strong implications 67 | Example: Detailed future research paths 68 | 1.00: Exceptional vision 69 | Example: Novel research directions with justification 70 | 71 | --- 72 | 73 | Your response must follow this exact format: 74 | 75 | Your Response: 76 | ```json 77 | { 78 | "analytical_depth": { 79 | "score": 0.0, 80 | "explanation": "Detailed justification for score" 81 | }, 82 | "mathematical_rigor": { 83 | "score": 0.0, 84 | "explanation": "Detailed justification for score" 85 | }, 86 | "results_interpretation": { 87 | "score": 0.0, 88 | "explanation": "Detailed justification for score" 89 | }, 90 | "critical_analysis": { 91 | "score": 0.0, 92 | "explanation": "Detailed justification for score" 93 | }, 94 | "future_implications": { 95 | "score": 0.0, 96 | "explanation": "Detailed justification for score" 97 | }, 98 | "overall_score": 0.0, 99 | "overall_feedback": "Critical analysis of strengths and weaknesses" 100 | } 101 | ``` 102 | 103 | --- 104 | 105 | Note: Scores must be exactly 0.00, 0.25, 0.50, 0.75, or 1.00. Start at 0 and justify each increment. Be extremely critical. 
You should also give your score and explanation from your role's perspective."""
106 | 
107 |     USER_PROMPT = """Please evaluate the analysis groundedness of the following mathematical modeling paper:
108 | 
109 | {writing}
110 | 
111 | Provide scores and detailed justification for each aspect. Remember your role as {role_name}. Your judgement should be based on this role's perspective.
112 | 
113 | Your Response:
114 | """
115 | 
116 |     def __init__(self):
117 |         if os.path.exists("../../secret.json"):
118 |             secret = json.load(open("../../secret.json"))
119 |             self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"])
120 |         else:
121 |             self.client = OpenAI(api_key="sk-...")
122 | 
123 |     def run(self, writing: str, role: dict = None) -> dict:
124 |         role_name = role["name"].strip()
125 |         role_details = role["details"].strip()
126 |         messages = [
127 |             {'role': 'system', 'content': role_details + "\n\n" + self.SYS_PROMPT},
128 |             {'role': 'user', 'content': self.USER_PROMPT.format(writing=writing, role_name=role_name)}
129 |         ]
130 | 
131 |         response = self.client.chat.completions.create(
132 |             model="gpt-4o-mini",
133 |             messages=messages,
134 |             temperature=0.0,
135 |             n=1,
136 |         )
137 | 
138 |         content = response.choices[0].message.content
139 |         json_str = content.split("```json")[1].split("```")[0].strip()
140 |         result = ast.literal_eval(json_str)
141 | 
142 |         scores = [result[aspect]["score"] for aspect in [
143 |             "analytical_depth", "mathematical_rigor", "results_interpretation",
144 |             "critical_analysis", "future_implications"
145 |         ]]
146 |         result["calculated_overall"] = sum(scores) / len(scores)
147 |         result["role"] = role
148 |         return result
--------------------------------------------------------------------------------
/src/judger/data_groundedness.py:
--------------------------------------------------------------------------------
1 | import json
2 | import ast
3 | import os
4 | from openai import OpenAI
5 | 
6 | class DataGroundednessJudger:
7 |     SYS_PROMPT = """You are currently evaluating mathematical modeling papers. Your task is to assess how well the solution is grounded in data and evidence. You should evaluate based on the role you are given.
8 | 
9 | Score each aspect from 0-1, starting at 0 and requiring justification for any increase:
10 | 
11 | 1. Data Quality (0-1):
12 | 0.00: No data or invalid data
13 | Example: Made-up numbers without sources
14 | 0.25: Poor quality/unreliable
15 | Example: Single unreliable source, outdated data
16 | 0.50: Acceptable but limited
17 | Example: Reliable source but incomplete dataset
18 | 0.75: Good with minor issues
19 | Example: Multiple reliable sources, small gaps
20 | 1.00: Excellent data quality
21 | Example: Multiple verified sources, comprehensive coverage
22 | 
23 | 2. Data Processing (0-1):
24 | 0.00: No processing/invalid
25 | Example: Raw data used without cleaning
26 | 0.25: Basic processing only
27 | Example: Simple averaging without outlier removal
28 | 0.50: Standard processing
29 | Example: Basic cleaning and normalization
30 | 0.75: Advanced processing
31 | Example: Sophisticated cleaning with justification
32 | 1.00: Comprehensive processing
33 | Example: Full pipeline with validation at each step
34 | 
35 | 3.
Statistical Analysis (0-1):
36 | 0.00: No analysis/incorrect
37 | Example: No statistical methods used
38 | 0.25: Basic statistics only
39 | Example: Mean/median without confidence intervals
40 | 0.50: Standard analysis
41 | Example: Basic hypothesis testing
42 | 0.75: Advanced analysis
43 | Example: Multiple statistical methods with validation
44 | 1.00: Rigorous analysis
45 | Example: Comprehensive statistical framework with robustness checks
46 | 
47 | 4. Data Integration (0-1):
48 | 0.00: No integration
49 | Example: Data disconnected from model
50 | 0.25: Poor integration
51 | Example: Forced fit without justification
52 | 0.50: Partial integration
53 | Example: Some aspects well-integrated, others not
54 | 0.75: Good integration
55 | Example: Most data well-integrated with clear reasoning
56 | 1.00: Perfect integration
57 | Example: All data seamlessly integrated with full justification
58 | 
59 | 5. Validation & Testing (0-1):
60 | 0.00: No validation
61 | Example: Results accepted without testing
62 | 0.25: Minimal testing
63 | Example: Basic sanity checks only
64 | 0.50: Standard validation
65 | Example: Cross-validation without sensitivity analysis
66 | 0.75: Thorough validation
67 | Example: Multiple validation methods
68 | 1.00: Comprehensive validation
69 | Example: Full validation suite with sensitivity analysis
70 | 
71 | ---
72 | 
73 | Your response must follow this exact format:
74 | 
75 | Your Response:
76 | ```json
77 | {
78 |     "data_quality": {
79 |         "score": 0.0,
80 |         "explanation": "Detailed justification for score"
81 |     },
82 |     "data_processing": {
83 |         "score": 0.0,
84 |         "explanation": "Detailed justification for score"
85 |     },
86 |     "statistical_analysis": {
87 |         "score": 0.0,
88 |         "explanation": "Detailed justification for score"
89 |     },
90 |     "data_integration": {
91 |         "score": 0.0,
92 |         "explanation": "Detailed justification for score"
93 |     },
94 |     "validation": {
95 |         "score": 0.0,
96 |         "explanation": "Detailed justification for score"
97 |     },
98 |     "calculated_overall": 0.0,
99 |     "overall_feedback": "Critical analysis of strengths and weaknesses"
100 | }
101 | ```
102 | 
103 | ---
104 | 
105 | Note: Scores must be exactly 0.00, 0.25, 0.50, 0.75, or 1.00. Start at 0 and justify each increment. Be extremely critical. You should also give your score and explanation from your role's perspective."""
106 | 
107 |     USER_PROMPT = """Please evaluate the data groundedness of the following mathematical modeling paper:
108 | 
109 | {writing}
110 | 
111 | Provide scores and detailed justification for each aspect. Remember your role as {role_name}. Your judgement should be based on this role's perspective.
112 | 113 | Your Response: 114 | """ 115 | 116 | def __init__(self): 117 | if os.path.exists("../../secret.json"): 118 | secret = json.load(open("../../secret.json")) 119 | self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"]) 120 | else: 121 | self.client = OpenAI(api_key="sk-...") 122 | 123 | def run(self, writing: str, role: dict = None) -> dict: 124 | role_name = role["name"].strip() 125 | role_details = role["details"].strip() 126 | messages = [ 127 | {'role': 'system', 'content': role_details + "\n\n" + self.SYS_PROMPT}, 128 | {'role': 'user', 'content': self.USER_PROMPT.format(writing=writing, role_name=role_name)} 129 | ] 130 | 131 | response = self.client.chat.completions.create( 132 | model="gpt-4o-mini", 133 | messages=messages, 134 | temperature=0.0, 135 | n=1, 136 | ) 137 | 138 | content = response.choices[0].message.content 139 | json_str = content.split("```json")[1].split("```")[0].strip() 140 | result = ast.literal_eval(json_str) 141 | 142 | # Calculate overall score as average of individual scores 143 | scores = [result[aspect]["score"] for aspect in [ 144 | "data_quality", "data_processing", "statistical_analysis", 145 | "data_integration", "validation" 146 | ]] 147 | result["calculated_overall"] = sum(scores) / len(scores) 148 | result["role"] = role 149 | 150 | return result -------------------------------------------------------------------------------- /src/judger/innovativeness.py: -------------------------------------------------------------------------------- 1 | import json 2 | import ast 3 | import os 4 | from openai import OpenAI 5 | 6 | class InnovativenessJudger: 7 | SYS_PROMPT = """You are currently evaluating mathematical modeling papers. Your task is to assess the innovativeness and originality of the solution approach. You should evaluate based on the role you are given. 8 | 9 | Score each aspect from 0-1, starting at 0 and requiring justification for any increase: 10 | 11 | 1. Methodological Innovation (0-1): 12 | 0.00: Standard/textbook approach 13 | Example: Using basic linear regression without modification 14 | 0.25: Minor adaptations 15 | Example: Small tweaks to existing methods 16 | 0.50: Meaningful modifications 17 | Example: Significant adaptations to standard approaches 18 | 0.75: Novel combinations 19 | Example: Creative synthesis of multiple methods 20 | 1.00: Groundbreaking approach 21 | Example: Entirely new methodology with strong justification 22 | 23 | 2. Problem Framing (0-1): 24 | 0.00: Conventional perspective 25 | Example: Following typical problem formulation 26 | 0.25: Slight reframing 27 | Example: Minor changes to standard approach 28 | 0.50: Fresh perspective 29 | Example: New angle on known problem 30 | 0.75: Novel framing 31 | Example: Unique problem decomposition 32 | 1.00: Revolutionary perspective 33 | Example: Paradigm-shifting problem formulation 34 | 35 | 3. Solution Creativity (0-1): 36 | 0.00: Standard solution 37 | Example: Direct application of known methods 38 | 0.25: Minor creativity 39 | Example: Small creative elements in standard approach 40 | 0.50: Notable creativity 41 | Example: Original elements in key areas 42 | 0.75: Significant creativity 43 | Example: Multiple creative components 44 | 1.00: Exceptional creativity 45 | Example: Entirely novel solution approach 46 | 47 | 4. 
Technical Advancement (0-1):
48 | 0.00: No advancement
49 | Example: Uses only existing techniques
50 | 0.25: Minor improvements
51 | Example: Small technical optimizations
52 | 0.50: Meaningful advances
53 | Example: New technical contributions
54 | 0.75: Significant advances
55 | Example: Multiple technical innovations
56 | 1.00: Major breakthrough
57 | Example: Revolutionary technical approach
58 | 
59 | 5. Impact Potential (0-1):
60 | 0.00: Minimal impact
61 | Example: No new insights or applications
62 | 0.25: Limited impact
63 | Example: Minor improvements to existing methods
64 | 0.50: Moderate impact
65 | Example: Useful new approach for specific cases
66 | 0.75: High impact
67 | Example: Broadly applicable new methods
68 | 1.00: Transformative
69 | Example: Could change the field significantly
70 | 
71 | ---
72 | 
73 | Your response must follow this exact format:
74 | 
75 | Your Response:
76 | ```json
77 | {
78 |     "methodological_innovation": {
79 |         "score": 0.0,
80 |         "explanation": "Detailed justification for score"
81 |     },
82 |     "problem_framing": {
83 |         "score": 0.0,
84 |         "explanation": "Detailed justification for score"
85 |     },
86 |     "solution_creativity": {
87 |         "score": 0.0,
88 |         "explanation": "Detailed justification for score"
89 |     },
90 |     "technical_advancement": {
91 |         "score": 0.0,
92 |         "explanation": "Detailed justification for score"
93 |     },
94 |     "impact_potential": {
95 |         "score": 0.0,
96 |         "explanation": "Detailed justification for score"
97 |     },
98 |     "overall_score": 0.0,
99 |     "overall_feedback": "Critical analysis of innovative aspects and potential impact"
100 | }
101 | ```
102 | 
103 | ---
104 | 
105 | Note: Scores must be exactly 0.00, 0.25, 0.50, 0.75, or 1.00. Start at 0 and justify each increment. Be extremely critical - true innovation is rare. You should also give your score and explanation from your role's perspective."""
106 | 
107 |     USER_PROMPT = """Please evaluate the innovativeness of the following mathematical modeling paper:
108 | 
109 | {writing}
110 | 
111 | Provide scores and detailed justification for each aspect. Remember your role as {role_name}. Your judgement should be based on this role's perspective.
112 | 113 | Your Response: 114 | """ 115 | 116 | def __init__(self): 117 | if os.path.exists("../../secret.json"): 118 | secret = json.load(open("../../secret.json")) 119 | self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"]) 120 | else: 121 | self.client = OpenAI(api_key="sk-...") 122 | 123 | def run(self, writing: str, role: dict = None) -> dict: 124 | role_name = role["name"].strip() 125 | role_details = role["details"].strip() 126 | messages = [ 127 | {'role': 'system', 'content': role_details + "\n\n" + self.SYS_PROMPT}, 128 | {'role': 'user', 'content': self.USER_PROMPT.format(writing=writing, role_name=role_name)} 129 | ] 130 | 131 | response = self.client.chat.completions.create( 132 | model="gpt-4o-mini", 133 | messages=messages, 134 | temperature=0.0, 135 | n=1, 136 | ) 137 | 138 | content = response.choices[0].message.content 139 | json_str = content.split("```json")[1].split("```")[0].strip() 140 | result = ast.literal_eval(json_str) 141 | 142 | scores = [result[aspect]["score"] for aspect in [ 143 | "methodological_innovation", "problem_framing", "solution_creativity", 144 | "technical_advancement", "impact_potential" 145 | ]] 146 | result["calculated_overall"] = sum(scores) / len(scores) 147 | result["role"] = role 148 | return result -------------------------------------------------------------------------------- /src/judger/main_judge.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import ast 4 | from concurrent.futures import ThreadPoolExecutor 5 | from typing import Dict, Any, Set 6 | 7 | from structural_coherency import StructuralCoherencyJudger 8 | from scoring_decomposition import ScoringDecompositionJudger 9 | from modeling_groundedness import ModelingGroundednessJudger 10 | from data_groundedness import DataGroundednessJudger 11 | from analysis_groundedness import AnalysisGroundednessJudger 12 | from innovativeness import InnovativenessJudger 13 | 14 | class MainJudger: 15 | def __init__(self): 16 | self.judgers = { 17 | "structural_coherency": StructuralCoherencyJudger(), 18 | "scoring_decomposition": ScoringDecompositionJudger(), 19 | "modeling_groundedness": ModelingGroundednessJudger(), 20 | "data_groundedness": DataGroundednessJudger(), 21 | "analysis_groundedness": AnalysisGroundednessJudger(), 22 | "innovativeness": InnovativenessJudger() 23 | } 24 | 25 | # Judgers that use role-based evaluation 26 | self.role_based_judgers = { 27 | "modeling_groundedness", 28 | "data_groundedness", 29 | "analysis_groundedness", 30 | "innovativeness" 31 | } 32 | 33 | def run_judger(self, judger_name: str, writing: str, roles: list = None, grading_points: list = None) -> Dict[str, Any]: 34 | try: 35 | judger = self.judgers[judger_name] 36 | 37 | # Handle role-based judgers 38 | if judger_name in self.role_based_judgers and roles: 39 | results = [] 40 | for role in roles: 41 | result = judger.run(writing, role=role) 42 | result["role"] = role 43 | results.append(result) 44 | return { 45 | "role_based_results": results, 46 | "aggregated_score": sum(r.get("calculated_overall", 0) for r in results) / len(results) 47 | } 48 | 49 | # Handle non-role-based judgers 50 | if judger_name == "scoring_decomposition": 51 | return judger.run(writing, grading_points) 52 | return judger.run(writing) 53 | 54 | except Exception as e: 55 | print(f"Error in {judger_name}: {str(e)}") 56 | return { 57 | "error": str(e), 58 | "status": "failed", 59 | "judger": judger_name 60 | } 61 | 62 | def 
get_existing_results(self, output_dir: str, gold_id: str) -> Dict[str, Any]: 63 | """Read existing judgement results if they exist""" 64 | output_file = f"{output_dir}/{gold_id}.json" 65 | if os.path.exists(output_file): 66 | try: 67 | with open(output_file) as f: 68 | results = json.load(f) 69 | return results.get("judgements", {}) 70 | except: 71 | return {} 72 | return {} 73 | 74 | def get_missing_judgers(self, existing_results: Dict[str, Any]) -> Set[str]: 75 | """Determine which judgers need to be run""" 76 | missing = set(self.judgers.keys()) 77 | for judger_name, result in existing_results.items(): 78 | # Only consider result valid if it exists and has no error 79 | if result and "error" not in result: 80 | missing.remove(judger_name) 81 | return missing 82 | 83 | def judge(self, output_dir: str, gold_id: str, writing: str, grading_points: list, roles: list = None) -> Dict[str, Any]: 84 | # Initialize results structure 85 | results = { 86 | "gold_id": gold_id, 87 | "judgements": {}, 88 | "metadata": { 89 | "success_count": 0, 90 | "failed_count": 0, 91 | "failed_judgers": [], 92 | "skipped_count": 0, 93 | "skipped_judgers": [] 94 | } 95 | } 96 | 97 | # Get existing results 98 | existing_results = self.get_existing_results(output_dir, gold_id) 99 | missing_judgers = self.get_missing_judgers(existing_results) 100 | 101 | print(f"Missing judgers for {gold_id}: {missing_judgers}") 102 | 103 | # Add existing valid results to our results 104 | for judger_name, result in existing_results.items(): 105 | if judger_name not in missing_judgers: 106 | results["judgements"][judger_name] = result 107 | results["metadata"]["skipped_count"] += 1 108 | results["metadata"]["skipped_judgers"].append(judger_name) 109 | 110 | if not missing_judgers: 111 | print(f"All judgements already exist for {gold_id}") 112 | return results 113 | 114 | # Run only missing judgers in parallel 115 | with ThreadPoolExecutor(max_workers=6) as executor: 116 | future_to_judger = { 117 | executor.submit( 118 | self.run_judger, 119 | name, 120 | writing, 121 | roles if name in self.role_based_judgers else None, 122 | grading_points if name == "scoring_decomposition" else None 123 | ): name 124 | for name in missing_judgers 125 | } 126 | 127 | for future in future_to_judger: 128 | name = future_to_judger[future] 129 | try: 130 | result = future.result() 131 | if "error" in result: 132 | results["metadata"]["failed_count"] += 1 133 | results["metadata"]["failed_judgers"].append(name) 134 | else: 135 | results["metadata"]["success_count"] += 1 136 | results["judgements"][name] = result 137 | except Exception as e: 138 | print(f"Error in {name}: {str(e)}") 139 | results["judgements"][name] = {"error": str(e)} 140 | results["metadata"]["failed_count"] += 1 141 | results["metadata"]["failed_judgers"].append(name) 142 | 143 | with open(f"{output_dir}/{gold_id}.json", 'w') as f: 144 | json.dump(results, f, indent=4) 145 | 146 | return results 147 | 148 | def process_gold_id(args): 149 | gold_id, data, output_dir, judger = args 150 | if "writing" not in data: # Skip if no writing to evaluate 151 | return 152 | 153 | writing = data["writing"] 154 | criteria = data["criteria"] 155 | grading_points = criteria.get("decomposition", {}).get("grading_points", []) 156 | roles = criteria.get("eval_roles", []) 157 | 158 | print(f"Evaluating {gold_id}...") 159 | results = judger.judge(output_dir, gold_id, writing, grading_points, roles) 160 | print(f"Completed {gold_id} - Success: {results['metadata']['success_count']}, " 161 | f"Failed: 
{results['metadata']['failed_count']}, "
162 | f"Skipped: {results['metadata']['skipped_count']}")
163 | return gold_id, results
164 | 
165 | def main():
166 | for model, level in zip(["Qwen2.5-72B-Instruct"], ["ModelAgent"]):
167 | try:
168 | # Load problem data
169 | with open("../../data/modeling_data_final.json") as f:
170 | criterias = json.load(f)
171 | with open(f"../../output_writings/{level}/{model}/solutions_metadata.json") as f:
172 | writings = json.load(f)
173 | 
174 | output_dir = f"../../output_judge/{level}/{model}"
175 | os.makedirs(output_dir, exist_ok=True)
176 | all_data = {}
177 | for gold_id, criteria in criterias.items():
178 | all_data[gold_id] = {
179 | "criteria": criteria,
180 | }
181 | for gold_id, writing in writings.items():
182 | if gold_id not in all_data or "writing" not in writing:
183 | continue
184 | all_data[gold_id]["writing"] = writing["writing"]  # keep only the report text
185 | 
186 | print(len(all_data))
187 | 
188 | judger = MainJudger()
189 | 
190 | # Process problems in parallel
191 | with ThreadPoolExecutor(max_workers=10) as executor:
192 | args = [(gold_id, data, output_dir, judger) for gold_id, data in all_data.items()]
193 | results = list(executor.map(process_gold_id, args))
194 | except Exception as e:
195 | # Surface the failure instead of silently skipping this model/level pair
196 | print(f"Error judging {level}/{model}: {e}")
197 | continue
198 | 
199 | if __name__ == "__main__":
200 | main()
-------------------------------------------------------------------------------- /src/judger/modeling_groundedness.py: --------------------------------------------------------------------------------
1 | import json
2 | import ast
3 | import os
4 | from openai import OpenAI
5 | 
6 | class ModelingGroundednessJudger:
7 | SYS_PROMPT = """You are currently evaluating mathematical modeling papers. Your task is to assess how well the solution's modeling approach is grounded in mathematical and scientific principles. You should evaluate based on the role you are given.
8 | 
9 | Score each aspect from 0-1, starting at 0 and requiring justification for any increase:
10 | 
11 | 1. Mathematical Foundation (0-1):
12 | 0.00: Fundamentally flawed or missing
13 | Example: No equations, incorrect mathematical concepts
14 | 0.25: Basic but problematic
15 | Example: Simple equations without proper variables defined
16 | 0.50: Sound but incomplete
17 | Example: Correct equations but missing key relationships
18 | 0.75: Strong with minor gaps
19 | Example: Well-formulated with some assumptions not fully justified
20 | 1.00: Excellent and rigorous
21 | Example: Complete mathematical framework with all relationships justified
22 | 
23 | 2. Real-World Integration (0-1):
24 | 0.00: No connection to reality
25 | Example: Pure abstract model without practical context
26 | 0.25: Superficial consideration
27 | Example: Mentioning real factors without incorporating them
28 | 0.50: Partial integration
29 | Example: Some key factors included but others missing
30 | 0.75: Good but not comprehensive
31 | Example: Most factors included but some interactions overlooked
32 | 1.00: Complete integration
33 | Example: All relevant factors and interactions properly modeled
34 | 
35 | 3. 
Technical Sophistication (0-1):
36 | 0.00: Elementary/inappropriate
37 | Example: Using linear regression for clearly nonlinear problems
38 | 0.25: Basic techniques only
39 | Example: Simple statistical methods without justification
40 | 0.50: Appropriate but limited
41 | Example: Correct methods but not fully exploited
42 | 0.75: Advanced with minor issues
43 | Example: Sophisticated methods with some gaps in implementation
44 | 1.00: State-of-the-art
45 | Example: Cutting-edge techniques properly implemented
46 | 
47 | 4. Validation Approach (0-1):
48 | 0.00: No validation
49 | Example: Results presented without any verification
50 | 0.25: Minimal testing
51 | Example: Basic sanity checks only
52 | 0.50: Partial validation
53 | Example: Some test cases but not comprehensive
54 | 0.75: Thorough but not complete
55 | Example: Multiple validation methods but missing edge cases
56 | 1.00: Comprehensive validation
57 | Example: Multiple methods, edge cases, sensitivity analysis
58 | 
59 | 5. Implementation Quality (0-1):
60 | 0.00: Poor/incorrect
61 | Example: Errors in implementation, wrong formulas
62 | 0.25: Basic but flawed
63 | Example: Correct concept but significant implementation errors
64 | 0.50: Workable but needs improvement
65 | Example: Functions correctly but inefficient or unclear
66 | 0.75: Good with minor issues
67 | Example: Well-implemented but some optimization possible
68 | 1.00: Excellent implementation
69 | Example: Efficient, clear, and well-documented code
70 | 
71 | ---
72 | 
73 | Your response must follow this exact format:
74 | 
75 | Your Response:
76 | ```json
77 | {
78 | "mathematical_foundation": {
79 | "score": 0.0,
80 | "explanation": "Detailed justification for score"
81 | },
82 | "real_world_integration": {
83 | "score": 0.0,
84 | "explanation": "Detailed justification for score"
85 | },
86 | "technical_sophistication": {
87 | "score": 0.0,
88 | "explanation": "Detailed justification for score"
89 | },
90 | "validation": {
91 | "score": 0.0,
92 | "explanation": "Detailed justification for score"
93 | },
94 | "implementation": {
95 | "score": 0.0,
96 | "explanation": "Detailed justification for score"
97 | },
98 | "calculated_overall": 0.0,
99 | "overall_feedback": "Critical analysis of strengths and weaknesses"
100 | }
101 | ```
102 | 
103 | ---
104 | 
105 | Note: Scores must be exactly 0.00, 0.25, 0.50, 0.75, or 1.00. Start at 0 and justify each increment. Be extremely critical. You should also give your score and explanation from your role's perspective."""
106 | 
107 | USER_PROMPT = """Please evaluate the modeling groundedness of the following mathematical modeling paper:
108 | 
109 | {writing}
110 | 
111 | Provide scores and detailed justification for each aspect. Remember your role as {role_name}. Your judgement should be based on this role's perspective. 
112 | 113 | Your Response: 114 | """ 115 | 116 | def __init__(self): 117 | if os.path.exists("../../secret.json"): 118 | secret = json.load(open("../../secret.json")) 119 | self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"]) 120 | else: 121 | self.client = OpenAI(api_key="sk-...") 122 | 123 | def run(self, writing: str, role: dict = None) -> dict: 124 | role_name = role["name"].strip() 125 | role_details = role["details"].strip() 126 | messages = [ 127 | {'role': 'system', 'content': role_details + "\n\n" + self.SYS_PROMPT}, 128 | {'role': 'user', 'content': self.USER_PROMPT.format(writing=writing, role_name=role_name)} 129 | ] 130 | 131 | response = self.client.chat.completions.create( 132 | model="gpt-4o-mini", 133 | messages=messages, 134 | temperature=0.0, 135 | n=1, 136 | ) 137 | 138 | content = response.choices[0].message.content 139 | json_str = content.split("```json")[1].split("```")[0].strip() 140 | result = ast.literal_eval(json_str) 141 | 142 | if "implementation_quality" in result: 143 | result["implementation"] = result.pop("implementation_quality") 144 | 145 | # Calculate overall score as average of individual scores 146 | scores = [result[aspect]["score"] for aspect in [ 147 | "mathematical_foundation", "real_world_integration", 148 | "technical_sophistication", "validation", "implementation" 149 | ]] 150 | result["calculated_overall"] = sum(scores) / len(scores) 151 | result["role"] = role 152 | 153 | return result -------------------------------------------------------------------------------- /src/judger/scoring_decomposition.py: -------------------------------------------------------------------------------- 1 | import json 2 | import ast 3 | import os 4 | from openai import OpenAI 5 | 6 | class ScoringDecompositionJudger: 7 | SYS_PROMPT = """You are an expert judge evaluating mathematical modeling papers. Your task is to assess if each requirement of the problem is faithfully fulfilled based on the provided grading points. 8 | 9 | For each grading point, score from 0-1: 10 | 11 | 0.00: Requirement Ignored/Failed 12 | Example: No attempt to address the requirement 13 | Example: Completely incorrect approach 14 | 15 | 0.25: Minimal/Poor Treatment 16 | Example: Superficial mention without proper implementation 17 | Example: Major flaws in approach or understanding 18 | 19 | 0.50: Partial/Basic Treatment 20 | Example: Addresses main points but misses important aspects 21 | Example: Correct approach but incomplete implementation 22 | 23 | 0.75: Good but Not Perfect 24 | Example: Strong treatment with minor omissions 25 | Example: Well-implemented but could be more thorough 26 | 27 | 1.00: Complete and Excellent 28 | Example: Comprehensive treatment of all aspects 29 | Example: Thorough implementation with validation 30 | 31 | Critical Evaluation Points: 32 | 33 | 1. Completeness: 34 | - Every sub-requirement must be explicitly addressed 35 | - Partial treatment results in significant score reduction 36 | - Missing elements cannot be compensated by other strengths 37 | 38 | 2. Quality of Implementation: 39 | - Mathematical rigor is essential 40 | - Must show clear methodology 41 | - Must include validation 42 | - Surface-level solutions score 0.25 maximum 43 | 44 | 3. Integration: 45 | - Requirements must work together coherently 46 | - Interdependencies must be addressed 47 | - Isolated solutions score 0.50 maximum 48 | 49 | 4. 
Validation: 50 | - Results must be verified 51 | - Assumptions must be tested 52 | - No validation means 0.25 maximum score 53 | 54 | --- 55 | 56 | Your response must follow this exact format: 57 | 58 | Your Response: 59 | ```json 60 | { 61 | "scores": { 62 | "grading_point_1": 0.0, 63 | "grading_point_2": 0.0, 64 | ... 65 | }, 66 | "explanation": { 67 | "grading_point_1": "why this score", 68 | "grading_point_2": "why this score", 69 | ... 70 | } 71 | } 72 | ``` 73 | 74 | --- 75 | 76 | Note: For each grading point, score must be exactly 0.0, 0.25, 0.50, 0.75, or 1.00. Use the grading point's category as the key in the scores and explanation dictionaries. Be extremely critical - most solutions should score in the 0.25-0.50 range unless truly exceptional.""" 77 | 78 | USER_PROMPT = """Please evaluate if the following mathematical modeling paper fulfills each grading point requirement: 79 | 80 | Paper Content: 81 | {writing} 82 | 83 | --- 84 | 85 | Grading Points to Evaluate: 86 | {grading_points} 87 | 88 | --- 89 | 90 | Provide scores and explanations for each grading point. 91 | 92 | Your Response: 93 | """ 94 | 95 | def __init__(self): 96 | if os.path.exists("../../secret.json"): 97 | secret = json.load(open("../../secret.json")) 98 | self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"]) 99 | else: 100 | self.client = OpenAI(api_key="sk-...") 101 | 102 | def run(self, writing: str, grading_points: list) -> dict: 103 | messages = [ 104 | {'role': 'system', 'content': self.SYS_PROMPT}, 105 | {'role': 'user', 'content': self.USER_PROMPT.format( 106 | writing=writing, 107 | grading_points=json.dumps(grading_points, indent=2) 108 | )} 109 | ] 110 | 111 | response = self.client.chat.completions.create( 112 | model="gpt-4o-mini", 113 | messages=messages, 114 | temperature=0.0, 115 | n=1, 116 | ) 117 | 118 | content = response.choices[0].message.content 119 | json_str = content.split("```json")[1].split("```")[0].strip() 120 | result = ast.literal_eval(json_str) 121 | total_score = sum(result["scores"].values()) 122 | result["total_score"] = total_score 123 | average_score = total_score / len(result["scores"]) 124 | result["calculated_overall"] = average_score 125 | 126 | return result -------------------------------------------------------------------------------- /src/judger/structural_coherency.py: -------------------------------------------------------------------------------- 1 | import json 2 | import ast 3 | import os 4 | from openai import OpenAI 5 | 6 | class StructuralCoherencyJudger: 7 | SYS_PROMPT = """You are an expert judge evaluating mathematical modeling papers. Your task is to assess the structural coherency of the paper by checking if it contains all necessary components. 8 | 9 | Key components to evaluate (up to 1 point each): 10 | 11 | 1. Problem Restatement (0-1): 12 | 0.00: Missing or completely misunderstood 13 | Example: Simply copying problem text or missing key elements 14 | 0.25: Present but superficial 15 | Example: Basic bullet points of requirements without context 16 | 0.50: Adequate but lacks depth 17 | Example: Covers main points but misses subtle relationships 18 | 0.75: Good with minor gaps 19 | Example: Clear understanding but could elaborate connections 20 | 1.00: Excellent and comprehensive 21 | Example: Deep understanding with clear relationships and context 22 | 23 | 2. 
Assumptions and Justification (0-1): 24 | 0.00: Missing or unjustified 25 | Example: No assumptions listed or completely unreasonable ones 26 | 0.25: Listed but poorly justified 27 | Example: "We assume linear relationship" without explanation 28 | 0.50: Reasonable but incomplete 29 | Example: Key assumptions stated but some justifications weak 30 | 0.75: Well-justified with minor gaps 31 | Example: Clear justifications but missing some implications 32 | 1.00: Comprehensive and thorough 33 | Example: All assumptions clearly stated, justified, and impacts explained 34 | 35 | 3. Modeling Implementation (0-1): 36 | 0.00: Missing or fundamentally flawed 37 | Example: No clear mathematical formulation 38 | 0.25: Basic but poorly developed 39 | Example: Equations listed without explanation or context 40 | 0.50: Sound but lacks rigor 41 | Example: Correct approach but missing some derivations 42 | 0.75: Strong with minor omissions 43 | Example: Clear formulation but could be more detailed 44 | 1.00: Rigorous and complete 45 | Example: Clear, justified, and thorough mathematical development 46 | 47 | 4. Solution Process (0-1): 48 | 0.00: Missing or invalid 49 | Example: No solution method or completely incorrect approach 50 | 0.25: Vague or incomplete 51 | Example: "We solved using computer" without details 52 | 0.50: Basic but workable 53 | Example: Solution steps listed but lacking validation 54 | 0.75: Clear with minor gaps 55 | Example: Well-documented but missing some error analysis 56 | 1.00: Comprehensive and validated 57 | Example: Clear steps, validation, and error analysis 58 | 59 | 5. Analysis (0-1): 60 | 0.00: Missing or invalid 61 | Example: No analysis or completely wrong interpretations 62 | 0.25: Superficial discussion 63 | Example: Basic statements without supporting evidence 64 | 0.50: Valid but limited 65 | Example: Correct analysis but missing sensitivity tests 66 | 0.75: Thorough with minor gaps 67 | Example: Good analysis but could explore more implications 68 | 1.00: Deep and insightful 69 | Example: Comprehensive analysis with validation and implications 70 | 71 | --- 72 | 73 | Your response must follow this exact format: 74 | 75 | Your Response: 76 | ```json 77 | { 78 | "scores": { 79 | "problem_restatement": 0.0, 80 | "assumptions": 0.0, 81 | "modeling_implementation": 0.0, 82 | "solution_process": 0.0, 83 | "analysis": 0.0 84 | }, 85 | "explanation": { 86 | "problem_restatement": "why this score", 87 | "assumptions": "why this score", 88 | "modeling_implementation": "why this score", 89 | "solution_process": "why this score", 90 | "analysis": "why this score" 91 | } 92 | } 93 | ``` 94 | 95 | --- 96 | 97 | Note: For each component, score must be exactly 0.0, 0.25, 0.50, 0.75, or 1.00. Be extremely critical - most solutions should score in the 0.25-0.50 range unless truly exceptional.""" 98 | 99 | USER_PROMPT = """Please evaluate the structural coherency of the following mathematical modeling paper: 100 | 101 | {writing} 102 | 103 | Provide scores and explanations for each component. 
104 | 105 | Your Response: 106 | """ 107 | 108 | def __init__(self): 109 | if os.path.exists("../../secret.json"): 110 | secret = json.load(open("../../secret.json")) 111 | self.client = OpenAI(api_key=secret["api_key"], base_url=secret["base_url"]) 112 | else: 113 | self.client = OpenAI(api_key="sk-...") 114 | 115 | def run(self, writing: str) -> dict: 116 | messages = [ 117 | {'role': 'system', 'content': self.SYS_PROMPT}, 118 | {'role': 'user', 'content': self.USER_PROMPT.format(writing=writing)} 119 | ] 120 | 121 | response = self.client.chat.completions.create( 122 | model="gpt-4o-mini", 123 | messages=messages, 124 | temperature=0.0, 125 | n=1, 126 | ) 127 | 128 | content = response.choices[0].message.content 129 | json_str = content.split("```json")[1].split("```")[0].strip() 130 | result = ast.literal_eval(json_str) 131 | total_score = sum(result["scores"].values()) 132 | result["total_score"] = total_score 133 | average_score = total_score / len(result["scores"]) 134 | result["calculated_overall"] = average_score 135 | 136 | return result -------------------------------------------------------------------------------- /src/tools/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiancheng0/ModelingAgent/dca3588ba7cf114b77ed5f89aa1f3e9ddf4a3baa/src/tools/__init__.py -------------------------------------------------------------------------------- /src/tools/base.py: -------------------------------------------------------------------------------- 1 | class BaseTool: 2 | """ 3 | A base class for building tool classes that perform specific tasks, such as image processing or text detection. 4 | """ 5 | 6 | require_llm_engine = False # Default is False, tools that need LLM should set this to True 7 | 8 | def __init__(self, tool_name=None, tool_description=None, tool_version=None, input_types=None, output_type=None, demo_commands=None, output_dir=None, user_metadata=None, model_string=None): 9 | """ 10 | Initialize the base tool with optional metadata. 11 | 12 | Parameters: 13 | tool_name (str): The name of the tool. 14 | tool_description (str): A description of the tool. 15 | tool_version (str): The version of the tool. 16 | input_types (dict): The expected input types for the tool. 17 | output_type (str): The expected output type for the tool. 18 | demo_commands (list): A list of example commands for using the tool. 19 | output_dir (str): The directory where the tool should save its output (optional). 20 | user_metadata (dict): Additional metadata specific to user needs (optional). 21 | model_string (str): The model string for the LLM engine (optional, only used if require_llm_engine is True). 22 | """ 23 | self.tool_name = tool_name 24 | self.tool_description = tool_description 25 | self.tool_version = tool_version 26 | self.input_types = input_types 27 | self.output_type = output_type 28 | self.demo_commands = demo_commands 29 | self.output_dir = output_dir 30 | self.user_metadata = user_metadata 31 | self.model_string = model_string 32 | 33 | def set_metadata(self, tool_name, tool_description, tool_version, input_types, output_type, demo_commands, user_metadata=None): 34 | """ 35 | Set the metadata for the tool. 36 | 37 | Parameters: 38 | tool_name (str): The name of the tool. 39 | tool_description (str): A description of the tool. 40 | tool_version (str): The version of the tool. 41 | input_types (dict): The expected input types for the tool. 42 | output_type (str): The expected output type for the tool. 
43 | demo_commands (list): A list of example commands for using the tool.
44 | user_metadata (dict): Additional metadata specific to user needs (optional).
45 | """
46 | self.tool_name = tool_name
47 | self.tool_description = tool_description
48 | self.tool_version = tool_version
49 | self.input_types = input_types
50 | self.output_type = output_type
51 | self.demo_commands = demo_commands
52 | self.user_metadata = user_metadata
53 | 
54 | def get_metadata(self):
55 | """
56 | Returns the metadata for the tool.
57 | 
58 | Returns:
59 | dict: A dictionary containing the tool's metadata.
60 | """
61 | metadata = {
62 | "tool_name": self.tool_name,
63 | "tool_description": self.tool_description,
64 | "tool_version": self.tool_version,
65 | "input_types": self.input_types,
66 | "output_type": self.output_type,
67 | "demo_commands": self.demo_commands,
68 | "require_llm_engine": self.require_llm_engine,
69 | }
70 | if self.user_metadata:
71 | metadata["user_metadata"] = self.user_metadata
72 | return metadata
73 | 
74 | def set_custom_output_dir(self, output_dir):
75 | """
76 | Set a custom output directory for the tool.
77 | 
78 | Parameters:
79 | output_dir (str): The new output directory path.
80 | """
81 | self.output_dir = output_dir
82 | 
83 | def set_llm_engine(self, model_string):
84 | """
85 | Set the LLM engine for the tool.
86 | 
87 | Parameters:
88 | model_string (str): The model string for the LLM engine.
89 | """
90 | self.model_string = model_string
91 | 
92 | def execute(self, *args, **kwargs):
93 | """
94 | Execute the tool's main functionality. This method should be overridden by subclasses.
95 | 
96 | Raises:
97 | NotImplementedError: If the subclass does not implement this method.
98 | """
99 | raise NotImplementedError("Subclasses must implement the execute method.")
-------------------------------------------------------------------------------- /src/tools/code_executor.py: --------------------------------------------------------------------------------
1 | import os
2 | import tempfile
3 | import subprocess
4 | import threading
5 | 
6 | from .base import BaseTool
7 | 
8 | class Python_Execution_Tool(BaseTool):
9 | require_llm_engine = False
10 | 
11 | def __init__(self):
12 | super().__init__(
13 | tool_name="Python_Execution_Tool",
14 | tool_description="A tool that executes Python code from a file or provided content, handling errors and timeouts.",
15 | tool_version="1.0.0",
16 | input_types={
17 | "file_path": "str - The path to the Python file from the workspace (defaults to 'workspace/temp.py').",
18 | "code_content": "str - The Python code to execute. If provided, it will overwrite the file or be executed from a temp file."
19 | },
20 | output_type="str - The output or error messages from executing the code.",
21 | demo_commands=[
22 | {
23 | "command": "execution = tool.execute(file_path='workspace/script.py')",
24 | "description": "Execute an existing Python script."
25 | },
26 | {
27 | "command": "execution = tool.execute(code_content='print(\"Hello, World!\")')",
28 | "description": "Execute provided Python code directly."
29 | }
30 | ],
31 | user_metadata={
32 | "limitation": "Any error encountered will be returned. This tool will faithfully execute what you provide or what is in the code file, without any validation or refinement. ⚠️ The current sandbox environment **does NOT allow installing new Python packages**, especially those that require compiling native C/C++ libraries (e.g., GDAL/GEOS/PROJ). Your code should use pure Python and avoid anything that requires g++. 
You can use numpy, pandas, scipy, etc.", 33 | "best_practice": "If the code content is given, ensure it is well structured and is directly executable. If the file path is provided, ensure the file exists and is a valid Python script.\nEnsure in the code file path should all be relative path within the workspace. Use this tool for quick code execution and experiment." 34 | }, 35 | ) 36 | 37 | def execute(self, file_path=None, code_content=None): 38 | """ 39 | Execute Python code from file_path or code_content. 40 | Returns a dict: { "success": bool, "message": str }. 41 | 42 | - success: True if code executed without 'Error:' or 'Time limit exceeded' in output. 43 | False otherwise (including invalid file path, exception, etc.) 44 | - message: The output or error message string. 45 | """ 46 | # Quick checks 47 | if not file_path and not code_content: 48 | return { 49 | "success": False, 50 | "message": "Error: Either file_path or code_content must be provided." 51 | } 52 | 53 | workspace_path = os.getenv("WORKSPACE_PATH", "workspace") 54 | execution_file = None 55 | 56 | if file_path: 57 | if "workspace" not in file_path: 58 | file_path = os.path.join(workspace_path, file_path) 59 | else: 60 | file_path = os.path.join(workspace_path, file_path.split("workspace/")[-1]) 61 | 62 | try: 63 | if code_content: 64 | # If code_content is given: 65 | # 1) if file_path is also provided, we write code_content to that file 66 | # 2) else we create a temp file 67 | if file_path: 68 | execution_file = file_path 69 | with open(execution_file, "w", encoding="utf-8") as f: 70 | f.write(code_content) 71 | else: 72 | import tempfile 73 | temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".py", dir=workspace_path) 74 | execution_file = temp_file.name 75 | print(f"Temporary file created: {execution_file}") 76 | with open(execution_file, "w", encoding="utf-8") as f: 77 | f.write(code_content) 78 | temp_file.close() 79 | else: 80 | # code_content is None => must run an existing Python file 81 | execution_file = file_path 82 | if not os.path.isfile(execution_file): 83 | return { 84 | "success": False, 85 | "message": "Error: Provided file path does not exist." 86 | } 87 | 88 | # We'll use a dictionary to store the output 89 | result = {"output": "Execution Error: Time limit exceeded."} 90 | 91 | def run_code(): 92 | try: 93 | result["output"] = subprocess.check_output( 94 | ["python", execution_file], 95 | stderr=subprocess.STDOUT, 96 | text=True, 97 | timeout=15 98 | ) 99 | except subprocess.TimeoutExpired: 100 | result["output"] = "Execution Error: Time limit exceeded." 
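# Annotation (added note, not part of the upstream file): the subprocess call above
# enforces its own 15s timeout, and the thread join below adds a second 15s guard.
# If the worker thread is still running after join(timeout=15), the default
# "Time limit exceeded" message preset in `result` is what gets returned.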
101 | except subprocess.CalledProcessError as e: 102 | result["output"] = f"Execution Error: {e.output}" 103 | except Exception as e: 104 | result["output"] = f"Unexpected Error: {str(e)}" 105 | 106 | execution_thread = threading.Thread(target=run_code) 107 | execution_thread.start() 108 | execution_thread.join(timeout=15) 109 | 110 | # Check if result contains errors 111 | out_str = result["output"] 112 | # Simple check: if contains "Error:" or "Time limit exceeded" then failed 113 | # You can refine this check as needed 114 | if "Error:" in out_str or "time limit exceeded" in out_str.lower(): 115 | return { 116 | "success": False, 117 | "message": out_str 118 | } 119 | else: 120 | return { 121 | "success": True, 122 | "message": out_str 123 | } 124 | 125 | except Exception as e: 126 | # If error occurs elsewhere in try block, also return error 127 | return { 128 | "success": False, 129 | "message": f"Error: {str(e)}" 130 | } 131 | finally: 132 | # Cleanup temp file if used 133 | if code_content and not file_path and execution_file: 134 | os.remove(execution_file) 135 | 136 | 137 | if __name__ == "__main__": 138 | tool = Python_Execution_Tool() 139 | execution = tool.execute(file_path="workspace/output.py") 140 | print("Execution Output:") 141 | print(execution) -------------------------------------------------------------------------------- /src/tools/engine.py: -------------------------------------------------------------------------------- 1 | from openai import OpenAI 2 | import os 3 | import base64 4 | from tenacity import ( 5 | retry, 6 | stop_after_attempt, 7 | wait_random_exponential, 8 | ) 9 | from typing import List, Union 10 | 11 | import openai 12 | 13 | # FIXME Define global constant for structured models 14 | OPENAI_STRUCTURED_MODELS = ['gpt-4o'] 15 | 16 | 17 | class ChatOpenAI(): 18 | DEFAULT_SYSTEM_PROMPT = "You are a helpful, creative, and smart assistant." 
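# Minimal usage sketch (added annotation; hypothetical prompt, and it assumes
# OPENAI_API_KEY is exported in the environment, as enforced in __init__ below):
#   engine = ChatOpenAI(model_string="gpt-4o", is_multimodal=False)
#   answer = engine("Summarize the modeling problem in two sentences.")
# __call__ forwards to generate(); pass a list mixing str and image bytes only
# when is_multimodal=True.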
19 | 
20 | def __init__(
21 | self,
22 | model_string="gpt-4o",
23 | system_prompt=DEFAULT_SYSTEM_PROMPT,
24 | is_multimodal: bool=False,
25 | **kwargs):
26 | """
27 | :param model_string: Name of the OpenAI model to use (default "gpt-4o").
28 | :param system_prompt: Default system prompt prepended to every request.
29 | :param is_multimodal: Whether image inputs (bytes) are accepted alongside text.
30 | """
31 | self.system_prompt = system_prompt
32 | if os.getenv("OPENAI_API_KEY") is None:
33 | raise ValueError("Please set the OPENAI_API_KEY environment variable if you'd like to use OpenAI models.")
34 | 
35 | self.client = OpenAI(
36 | api_key=os.getenv("OPENAI_API_KEY"),
37 | )
38 | self.model_string = model_string
39 | self.is_multimodal = is_multimodal
40 | 
41 | 
42 | @retry(wait=wait_random_exponential(min=1, max=5), stop=stop_after_attempt(5))
43 | def generate(self, content: Union[str, List[Union[str, bytes]]], system_prompt=None, **kwargs):
44 | try:
45 | # Print retry attempt information
46 | attempt_number = self.generate.retry.statistics.get('attempt_number', 0) + 1
47 | if attempt_number > 1:
48 | print(f"Attempt {attempt_number} of 5")
49 | 
50 | if isinstance(content, str):
51 | return self._generate_text(content, system_prompt=system_prompt, **kwargs)
52 | 
53 | elif isinstance(content, list):
54 | if not self.is_multimodal:
55 | raise NotImplementedError("Multimodal generation requires is_multimodal=True.")
56 | 
57 | return self._generate_multimodal(content, system_prompt=system_prompt, **kwargs)
58 | 
59 | except openai.LengthFinishReasonError as e:
60 | print(f"Token limit exceeded: {str(e)}")
61 | print(f"Tokens used - Completion: {e.completion.usage.completion_tokens}, Prompt: {e.completion.usage.prompt_tokens}, Total: {e.completion.usage.total_tokens}")
62 | return {
63 | "error": "token_limit_exceeded",
64 | "message": str(e),
65 | "details": {
66 | "completion_tokens": e.completion.usage.completion_tokens,
67 | "prompt_tokens": e.completion.usage.prompt_tokens,
68 | "total_tokens": e.completion.usage.total_tokens
69 | }
70 | }
71 | except openai.RateLimitError as e:
72 | print(f"Rate limit error encountered: {str(e)}")
73 | return {
74 | "error": "rate_limit",
75 | "message": str(e),
76 | "details": getattr(e, 'args', None)
77 | }
78 | except Exception as e:
79 | print(f"Error in generate method: {str(e)}")
80 | print(f"Error type: {type(e).__name__}")
81 | print(f"Error details: {e.args}")
82 | return {
83 | "error": type(e).__name__,
84 | "message": str(e),
85 | "details": getattr(e, 'args', None)
86 | }
87 | 
88 | def _generate_text(
89 | self, prompt, system_prompt=None, temperature=0, max_tokens=4000, top_p=0.99, response_format=None
90 | ):
91 | 
92 | sys_prompt_arg = system_prompt if system_prompt else self.system_prompt
93 | 
94 | if self.model_string in ['o1', 'o1-mini']: # only supports base response currently
95 | response = self.client.beta.chat.completions.parse(
96 | model=self.model_string,
97 | messages=[
98 | {"role": "user", "content": prompt},
99 | ],
100 | max_completion_tokens=max_tokens
101 | )
102 | if response.choices[0].finish_reason == "length":
103 | response = "Token limit exceeded"
104 | else:
105 | response = response.choices[0].message.parsed
106 | elif self.model_string in OPENAI_STRUCTURED_MODELS and response_format is not None:
107 | response = self.client.beta.chat.completions.parse(
108 | model=self.model_string,
109 | messages=[
110 | {"role": "system", "content": sys_prompt_arg},
111 | {"role": "user", "content": prompt},
112 | ],
113 | frequency_penalty=0,
114 | presence_penalty=0,
115 | stop=None,
116 | temperature=temperature,
117 | max_tokens=max_tokens,
118 | top_p=top_p,
119 | 
response_format=response_format 120 | ) 121 | response = response.choices[0].message.parsed 122 | else: 123 | response = self.client.chat.completions.create( 124 | model=self.model_string, 125 | messages=[ 126 | {"role": "system", "content": sys_prompt_arg}, 127 | {"role": "user", "content": prompt}, 128 | ], 129 | frequency_penalty=0, 130 | presence_penalty=0, 131 | stop=None, 132 | temperature=temperature, 133 | max_tokens=max_tokens, 134 | top_p=top_p, 135 | ) 136 | response = response.choices[0].message.content 137 | return response 138 | 139 | def __call__(self, prompt, **kwargs): 140 | return self.generate(prompt, **kwargs) 141 | 142 | def _format_content(self, content: List[Union[str, bytes]]) -> List[dict]: 143 | formatted_content = [] 144 | for item in content: 145 | if isinstance(item, bytes): 146 | base64_image = base64.b64encode(item).decode('utf-8') 147 | formatted_content.append({ 148 | "type": "image_url", 149 | "image_url": { 150 | "url": f"data:image/jpeg;base64,{base64_image}" 151 | } 152 | }) 153 | elif isinstance(item, str): 154 | formatted_content.append({ 155 | "type": "text", 156 | "text": item 157 | }) 158 | else: 159 | raise ValueError(f"Unsupported input type: {type(item)}") 160 | return formatted_content 161 | 162 | def _generate_multimodal( 163 | self, content: List[Union[str, bytes]], system_prompt=None, temperature=0, max_tokens=4000, top_p=0.99, response_format=None 164 | ): 165 | sys_prompt_arg = system_prompt if system_prompt else self.system_prompt 166 | formatted_content = self._format_content(content) 167 | 168 | if self.model_string in ['o1', 'o1-mini']: # only supports base response currently 169 | print(f'Max tokens: {max_tokens}') 170 | response = self.client.chat.completions.create( 171 | model=self.model_string, 172 | messages=[ 173 | {"role": "user", "content": formatted_content}, 174 | ], 175 | max_completion_tokens=max_tokens 176 | ) 177 | if response.choices[0].finish_reason == "length": 178 | response_text = "Token limit exceeded" 179 | else: 180 | response_text = response.choices[0].message.content 181 | elif self.model_string in OPENAI_STRUCTURED_MODELS and response_format is not None: 182 | response = self.client.beta.chat.completions.parse( 183 | model=self.model_string, 184 | messages=[ 185 | {"role": "system", "content": sys_prompt_arg}, 186 | {"role": "user", "content": formatted_content}, 187 | ], 188 | temperature=temperature, 189 | max_tokens=max_tokens, 190 | top_p=top_p, 191 | response_format=response_format 192 | ) 193 | response_text = response.choices[0].message.parsed 194 | else: 195 | response = self.client.chat.completions.create( 196 | model=self.model_string, 197 | messages=[ 198 | {"role": "system", "content": sys_prompt_arg}, 199 | {"role": "user", "content": formatted_content}, 200 | ], 201 | temperature=temperature, 202 | max_tokens=max_tokens, 203 | top_p=top_p, 204 | ) 205 | response_text = response.choices[0].message.content 206 | return response_text -------------------------------------------------------------------------------- /src/tools/file_editor.py: -------------------------------------------------------------------------------- 1 | import os, re, difflib, shutil 2 | from .base import BaseTool 3 | 4 | 5 | class File_Edit_Tool(BaseTool): 6 | require_llm_engine = False 7 | 8 | def __init__(self): 9 | super().__init__( 10 | tool_name="File_Edit_Tool", 11 | tool_description="Edit an existing text file via regex or line-number operations.", 12 | tool_version="1.0.0", 13 | input_types={ 14 | "file_path": "str – 
path relative to workspace/",
15 | "operation": "str – one of replace|insert_after|insert_before|delete",
16 | "target": "str|int – regex (string) **or** 1-based line number",
17 | "content": "str – text to insert / replace with (ignored for delete)",
18 | "occurrence": "str – 'first' (default) or 'all'"
19 | },
20 | output_type="dict – {success: bool, message: str, diff_preview: str}",
21 | demo_commands=[
22 | {
23 | "command": (
24 | 'tool.execute('
25 | 'file_path="workspace/experiments/solver.py", '
26 | 'operation="replace", '
27 | 'target="def step\\(", '
28 | 'content="# TODO: refactor step()", '
29 | 'occurrence="first")'
30 | ),
31 | "description": "Replace the first occurrence of `def step(`"
32 | }
33 | ],
34 | user_metadata={
35 | "limitation": (
36 | "Text-only. Binary files are not supported. Keep each edit focused and small "
37 | "to minimise merge conflicts or unintended changes."
38 | ),
39 | "best_practice": (
40 | "Pinpoint the edit with a unique regex or an exact line number. "
41 | "Always inspect the `diff_preview` the tool returns, and, if you "
42 | "need to run the modified code, follow up with the Code_Execution_Tool."
43 | ),
44 | }
45 | )
46 | 
47 | # ---------- internal helpers ----------
48 | def _make_backup(self, path):
49 | bak = path + ".bak"
50 | shutil.copy2(path, bak)
51 | 
52 | def _build_diff(self, before_lines, after_lines, ctx=3):
53 | diff = difflib.unified_diff(
54 | before_lines, after_lines, lineterm="", n=ctx,
55 | fromfile="before", tofile="after"
56 | )
57 | # Only return first 200 lines of diff preview to prevent excessive length
58 | return "\n".join(list(diff)[:200])
59 | 
60 | # ---------- main entry ----------
61 | def execute(
62 | self,
63 | file_path: str,
64 | operation: str,
65 | target,
66 | content: str = "",
67 | occurrence: str = "first"
68 | ):
69 | try:
70 | if operation not in {"replace", "insert_after", "insert_before", "delete"}:
71 | return {"success": False, "message": f"unknown operation: {operation}"}
72 | 
73 | # Parse workspace absolute path
74 | ws_root = os.getenv("WORKSPACE_PATH", "workspace")
75 | if file_path.startswith("workspace/"):
76 | file_path = file_path[len("workspace/"):]
77 | abs_path = os.path.join(ws_root, file_path)
78 | 
79 | if not os.path.exists(abs_path):
80 | return {"success": False, "message": f"file not found: {abs_path}"}
81 | 
82 | with open(abs_path, "r", encoding="utf-8") as f:
83 | lines = f.readlines()
84 | 
85 | self._make_backup(abs_path)  # Backup
86 | 
87 | # ---- Locate lines ----
88 | # target is int -> exact line number
89 | matches = []
90 | if isinstance(target, int):
91 | idx = target - 1  # Convert to 0-base
92 | if 0 <= idx < len(lines):
93 | matches = [idx]
94 | else:  # regex
95 | pattern = re.compile(str(target))
96 | matches = [i for i, ln in enumerate(lines) if pattern.search(ln)]
97 | 
98 | if not matches:
99 | return {"success": False, "message": "no match found"}
100 | 
101 | if occurrence == "first":
102 | matches = matches[:1]
103 | 
104 | # ---- Execute changes ----
105 | # Process matches in reverse order so earlier indices stay valid for inserts and deletes
106 | for i in sorted(matches, reverse=True):
107 | if operation == "replace":
108 | lines[i] = content if content.endswith("\n") else content + "\n"
109 | elif operation == "insert_after":
110 | insert = content if content.endswith("\n") else content + "\n"
111 | lines.insert(i + 1, insert)
112 | elif operation == "insert_before":
113 | insert = content if content.endswith("\n") else content + "\n"
114 | 
lines.insert(i, insert)
115 | elif operation == "delete":
116 | lines.pop(i)
117 | 
118 | # ---- Write back ----
119 | with open(abs_path, "w", encoding="utf-8") as f:
120 | f.writelines(lines)
121 | 
122 | diff_preview = self._build_diff(
123 | before_lines=open(abs_path + ".bak", encoding="utf-8").read().splitlines(),
124 | after_lines=lines
125 | )
126 | 
127 | return {
128 | "success": True,
129 | "message": f"{operation} on {len(matches)} occurrence(s) done.",
130 | "diff_preview": diff_preview
131 | }
132 | 
133 | except Exception as e:
134 | return {"success": False, "message": f"edit error: {e}"}
135 | 
-------------------------------------------------------------------------------- /src/tools/file_lister.py: --------------------------------------------------------------------------------
1 | import os
2 | from .base import BaseTool
3 | 
4 | class File_Lister_Tool(BaseTool):
5 | require_llm_engine = False
6 | 
7 | def __init__(self):
8 | super().__init__(
9 | tool_name="File_Lister_Tool",
10 | tool_description="A tool that lists all files in a given directory recursively.",
11 | tool_version="1.0.0",
12 | input_types={"dir_path": "str - The directory path starting from the workspace (default: workspace/)."},
13 | output_type="str - A list of all files under the directory with their paths, indicating file or folder type.",
14 | demo_commands=[
15 | {
16 | "command": 'execution = tool.execute(dir_path="workspace/")',
17 | "description": "List all files under the current workspace directory."
18 | }
19 | ],
20 | user_metadata={
21 | "limitation": "May not work properly if the directory contains restricted access files.",
22 | "best_practice": "Use this tool for listing files in structured directories to check what is present in the workspace or before writing to a certain file."
23 | },
24 | )
25 | 
26 | def execute(self, dir_path="workspace/"):
27 | """
28 | List the files under the given dir_path (relative to WORKSPACE_PATH).
29 | Returns a dict with { 'success': bool, 'message': str }.
30 | 
31 | Enhancement:
32 | - If it's a folder, append "(folder)"
33 | - If it's a file with an extension, show "(.{ext})"
34 | - If it's a file without extension, just "(file)"
35 | - If the path is not a directory but is a file, return an error.
36 | """
37 | try:
38 | # Ensure the path starts from the workspace
39 | workspace_path = os.getenv("WORKSPACE_PATH")
40 | if dir_path == "workspace":
41 | dir_path = "workspace/"
42 | if "workspace" not in dir_path:
43 | dir_path = os.path.join(workspace_path, dir_path)
44 | else:
45 | # remove "workspace/" from dir_path, then join with workspace_path
46 | dir_path = os.path.join(workspace_path, dir_path.split("workspace/")[-1])
47 | 
48 | if not os.path.isdir(dir_path):
49 | # If the path points to a file rather than a directory, report an error
50 | if os.path.isfile(dir_path):
51 | return {
52 | "success": False,
53 | "message": "Error: The path points to a file, not a directory."
54 | }
55 | else:
56 | return {
57 | "success": False,
58 | "message": "Error: Invalid directory path."
59 | } 60 | 61 | file_structure = [] 62 | 63 | def list_files(current_path, prefix=""): 64 | entries = sorted(os.listdir(current_path)) 65 | for index, entry in enumerate(entries): 66 | full_path = os.path.join(current_path, entry) 67 | is_last = (index == len(entries) - 1) 68 | new_prefix = prefix + (" " if is_last else "| ") 69 | relative_path = os.path.relpath(full_path, workspace_path) 70 | 71 | if os.path.isdir(full_path): 72 | # It's a folder 73 | file_structure.append(f"{prefix}|-- {entry} (folder)") 74 | # Recursively list folder contents 75 | list_files(full_path, new_prefix) 76 | else: 77 | # It's a file -> check extension 78 | base_name, ext = os.path.splitext(entry) 79 | if ext: 80 | extension_without_dot = ext[1:] 81 | file_structure.append( 82 | f"{prefix}|-- {entry} (.{extension_without_dot}) (path: workspace/{relative_path})" 83 | ) 84 | else: 85 | file_structure.append( 86 | f"{prefix}|-- {entry} (file) (path: workspace/{relative_path})" 87 | ) 88 | 89 | # First add the root directory name 90 | relative_path = os.path.relpath(dir_path, workspace_path) 91 | if relative_path.strip() == ".": 92 | output = "workspace" 93 | else: 94 | output = os.path.basename(dir_path) 95 | 96 | # Recursively build the structure 97 | list_files(dir_path) 98 | 99 | # Combine results 100 | output += "\n" + "\n".join(file_structure) 101 | 102 | return { 103 | "success": True, 104 | "message": output 105 | } 106 | 107 | except Exception as e: 108 | return { 109 | "success": False, 110 | "message": f"Error listing files: {str(e)}" 111 | } 112 | 113 | 114 | if __name__ == "__main__": 115 | tool = File_Lister_Tool() 116 | 117 | # Example directory path 118 | relative_dir_path = "workspace/sample" 119 | 120 | try: 121 | execution = tool.execute(dir_path=relative_dir_path) 122 | if execution["success"]: 123 | print("File List:") 124 | print(execution["message"]) 125 | else: 126 | print("Error:", execution["message"]) 127 | except Exception as e: 128 | print(f"Execution failed: {e}") 129 | 130 | print("Done!") 131 | -------------------------------------------------------------------------------- /src/tools/file_reader.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import pandas as pd 4 | from PyPDF2 import PdfReader 5 | import pymupdf 6 | 7 | from .base import BaseTool 8 | 9 | class File_Reader_Tool(BaseTool): 10 | require_llm_engine = False 11 | 12 | def __init__(self): 13 | super().__init__( 14 | tool_name = "File_Reader_Tool", 15 | tool_description = "A tool that reads and processes various file formats (json, csv, txt, etc.), returning structured text in the file. Contents are limited to 50,000 characters per read.", 16 | tool_version = "1.0.0", 17 | input_types = {"file_path": "str - The path to the file from the current workspace."}, 18 | output_type = "str - The extracted or structured content of the file (limited to 50,000 characters).", 19 | demo_commands=[ 20 | { 21 | "command": 'execution = tool.execute(file_path="workspace/sample.txt")', 22 | "description": "Read the content of a text file." 23 | }, 24 | { 25 | "command": 'execution = tool.execute(file_path="workspace/sample.csv")', 26 | "description": "Read the content of a CSV file." 27 | }, 28 | ], 29 | user_metadata = { 30 | "limitation": "Limited to 50,000 characters maximum. 
May not accurately process encrypted, corrupted, or highly complex file structures.",
31 | "best_practice": "Use this tool for reading standard file formats with structured or plain text content. For pdf files, consider using the PDF_Parser_Tool. For large files, consider reading specific sections or using multiple calls."
32 | },
33 | )
34 | 
35 | def execute(self, file_path):
36 | """
37 | Read a file from the workspace, returning its content if supported.
38 | Returns a dict: { "success": bool, "message": str }.
39 | 
40 | - success: True if read successfully, False if there's an error or invalid path.
41 | - message: The file content (if success, limited to 50,000 characters) or error message (if not).
42 | """
43 | MAX_CHARS = 50000  # Maximum number of characters returned per read
44 | 
45 | try:
46 | # Ensure the path starts from the workspace
47 | workspace_path = os.getenv("WORKSPACE_PATH")
48 | if "workspace" not in file_path:
49 | file_path = os.path.join(workspace_path, file_path)
50 | else:
51 | file_path = os.path.join(workspace_path, file_path.split("workspace/")[-1])
52 | 
53 | if not os.path.isfile(file_path):
54 | return {
55 | "success": False,
56 | "message": "Error: Invalid file path."
57 | }
58 | 
59 | file_extension = os.path.splitext(file_path)[-1].lower()
60 | 
61 | # import here or at the top of the file, depending on your structure
62 | import pandas as pd
63 | from PyPDF2 import PdfReader
64 | import fitz as pymupdf  # if you're using pymupdf
65 | 
66 | # Try different file types:
67 | if file_extension in [".json"]:
68 | with open(file_path, "r", encoding="utf-8") as f:
69 | content = json.load(f)
70 | result = json.dumps(content, indent=4, ensure_ascii=False)
71 | if len(result) > MAX_CHARS:
72 | result = result[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..."
73 | return {
74 | "success": True,
75 | "message": result
76 | }
77 | 
78 | elif file_extension in [".jsonl"]:
79 | with open(file_path, "r", encoding="utf-8") as f:
80 | lines = []
81 | total_length = 0
82 | for line in f:
83 | data = json.loads(line)
84 | formatted_line = json.dumps(data, indent=4, ensure_ascii=False)
85 | if total_length + len(formatted_line) > MAX_CHARS:
86 | lines.append("\n... [Content truncated due to 50,000 character limit] ...")
87 | break
88 | lines.append(formatted_line)
89 | total_length += len(formatted_line)
90 | return {
91 | "success": True,
92 | "message": "\n".join(lines)
93 | }
94 | 
95 | elif file_extension in [".csv", ".tsv", ".xls", ".xlsx"]:
96 | if file_extension in [".csv", ".tsv"]:
97 | sep = "\t" if file_extension == ".tsv" else ","
98 | df = pd.read_csv(file_path, sep=sep)
99 | else:
100 | df = pd.read_excel(file_path)
101 | result = df.to_string()
102 | if len(result) > MAX_CHARS:
103 | result = result[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..."
104 | return {
105 | "success": True,
106 | "message": result
107 | }
108 | 
109 | elif file_extension == ".pdf":
110 | try:
111 | reader = PdfReader(file_path)
112 | text = ""
113 | for page in reader.pages:
114 | page_text = page.extract_text() or ""  # extract_text() may return None for some pages
115 | if len(text) + len(page_text) <= MAX_CHARS:
116 | text += page_text + "\n"
117 | else:
118 | text += page_text[:MAX_CHARS-len(text)] + "\n... [Content truncated due to 50,000 character limit] ..." 
119 | break
120 | except Exception:
121 | doc = pymupdf.open(file_path)
122 | text = ""
123 | for page in doc:
124 | page_text = page.get_text()
125 | if len(text) + len(page_text) <= MAX_CHARS:
126 | text += page_text + "\n"
127 | else:
128 | text += page_text[:MAX_CHARS-len(text)] + "\n... [Content truncated due to 50,000 character limit] ..."
129 | break
130 | return {
131 | "success": True,
132 | "message": text
133 | }
134 | 
135 | elif file_extension in [".html", ".xml"]:
136 | with open(file_path, "r", encoding="utf-8") as f:
137 | text = f.read(MAX_CHARS + 1)  # Read one extra character to detect whether the limit is exceeded
138 | if len(text) > MAX_CHARS:
139 | text = text[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..."
140 | return {
141 | "success": True,
142 | "message": text
143 | }
144 | 
145 | elif file_extension in [".md", ".txt", ".docx", ".rtf"]:  # binary containers (e.g. .docx) fall back to the handler below on decode errors
146 | with open(file_path, "r", encoding="utf-8") as f:
147 | text = f.read(MAX_CHARS + 1)  # Read one extra character to detect whether the limit is exceeded
148 | if len(text) > MAX_CHARS:
149 | text = text[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..."
150 | return {
151 | "success": True,
152 | "message": text
153 | }
154 | 
155 | else:
156 | # fallback
157 | with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
158 | text = f.read(MAX_CHARS + 1)
159 | if len(text) > MAX_CHARS:
160 | text = text[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..."
161 | return {
162 | "success": True,
163 | "message": text
164 | }
165 | 
166 | except Exception as e:
167 | # fallback attempt, or final error
168 | try:
169 | with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
170 | fallback_text = f.read(MAX_CHARS + 1)
171 | if len(fallback_text) > MAX_CHARS:
172 | fallback_text = fallback_text[:MAX_CHARS] + "\n... [Content truncated due to 50,000 character limit] ..." 
173 | return {
174 | "success": False,
175 | "message": f"Partial read fallback:\n{fallback_text}"
176 | }
177 | except Exception:
178 | return {
179 | "success": False,
180 | "message": f"Error reading file: {str(e)}"
181 | }
182 | 
183 | 
184 | if __name__ == "__main__":
185 | import json
186 | 
187 | tool = File_Reader_Tool()
188 | 
189 | # Example file path
190 | relative_file_path = "workspace/sample.json"
191 | 
192 | try:
193 | execution = tool.execute(file_path=relative_file_path)
194 | print("File Content:")
195 | print(json.dumps(execution, indent=4))
196 | except Exception as e:
197 | print(f"Execution failed: {e}")
198 | 
199 | print("Done!")
-------------------------------------------------------------------------------- /src/tools/file_writer.py: --------------------------------------------------------------------------------
1 | import os
2 | from .base import BaseTool
3 | 
4 | 
5 | class File_Writer_Tool(BaseTool):
6 | require_llm_engine = False
7 | 
8 | def __init__(self):
9 | super().__init__(
10 | tool_name="File_Writer_Tool",
11 | tool_description="A tool that writes or appends content to a file.",
12 | tool_version="1.0.1",
13 | input_types={
14 | "file_path": "str - Path to the file (relative or starting with workspace/).",
15 | "content": "str - Text content to write or append.",
16 | "mode": "str - 'w' to overwrite, 'a' to append (default: 'a')."
17 | },
18 | output_type="dict - { success: bool, message: str }",
19 | demo_commands=[
20 | {
21 | "command": (
22 | 'tool.execute('
23 | 'file_path="workspace/experiments/sample.py", '
24 | 'content="print(\'Hello\')", mode="w")'
25 | ),
26 | "description": "Write an experiment script"
27 | }
28 | ],
29 | user_metadata={
30 | "limitation": "Binary files cannot be written, and the code you write here will not be executed. Note that your code should not use anything that requires g++; numpy, pandas, scipy, etc. are fine.",
31 | "best_practice": "Confirm the correct directory before writing; to run the code, call the Code Executor."
32 | },
33 | )
34 | 
35 | def execute(self, file_path, content, mode="a"):
36 | """
37 | Write or append text to a file in the workspace. 
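Parameters (added docstring note, summarizing the code below):
    file_path (str): Path relative to the workspace; a leading "workspace/" prefix is stripped.
    content (str): Text to write or append.
    mode (str): "w" to overwrite, "a" to append (default "a").

Returns a dict: { "success": bool, "message": str }.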
38 | """ 39 | if not file_path or not content: 40 | return {"success": False, 41 | "message": "file_path 与 content 不能为空"} 42 | 43 | if mode not in ("w", "a"): 44 | return {"success": False, 45 | "message": "mode 只能为 'w' 或 'a'"} 46 | 47 | try: 48 | # 1) 解析工作区路径 49 | workspace_root = os.getenv("WORKSPACE_PATH", "workspace") 50 | 51 | # 若用户给的是绝对路径 / 或其他 workspace 名称,进行处理 52 | if file_path.startswith("workspace/"): 53 | rel_path = file_path[len("workspace/"):] 54 | else: 55 | rel_path = file_path 56 | 57 | file_path = os.path.join(workspace_root, rel_path) 58 | 59 | # 2) 如果父目录不存在,先创建 60 | os.makedirs(os.path.dirname(file_path), exist_ok=True) 61 | 62 | # 3) 写或追加内容 63 | with open(file_path, mode, encoding="utf-8") as f: 64 | f.write(content) 65 | 66 | return {"success": True, 67 | "message": f"Success: written to {file_path}"} 68 | 69 | except Exception as e: 70 | return {"success": False, 71 | "message": f"Error writing file: {e}"} 72 | 73 | 74 | if __name__ == "__main__": 75 | # Demo 76 | tool = File_Writer_Tool() 77 | res = tool.execute( 78 | file_path="workspace/experiments/test.py", 79 | content="print('Hello, World!')", 80 | mode="w" 81 | ) 82 | print(res) -------------------------------------------------------------------------------- /src/tools/image_captioner.py: -------------------------------------------------------------------------------- 1 | import os 2 | from .base import BaseTool 3 | from .engine import ChatOpenAI 4 | 5 | class Image_Captioner_Tool(BaseTool): 6 | require_llm_engine = True 7 | 8 | def __init__(self, model_string="gpt-4o"): 9 | super().__init__( 10 | tool_name="Image_Captioner_Tool", 11 | tool_description="A tool that generates captions for images using OpenAI's multimodal model.", 12 | tool_version="1.0.0", 13 | input_types={ 14 | "image": "str - The path to the image file from current workspace.", 15 | "prompt": "str - The prompt to guide the image captioning (default: 'Describe this image in detail.').", 16 | }, 17 | output_type="str - The generated caption for the image.", 18 | demo_commands=[ 19 | { 20 | "command": 'execution = tool.execute(image="workspace/path/to/image.png")', 21 | "description": "Generate a caption for an image using the default prompt and model." 22 | }, 23 | { 24 | "command": 'execution = tool.execute(image="workspace/path/to/image.png", prompt="Explain the geometric shapes in the image.")', 25 | "description": "Generate a caption focusing on geometric shapes demonstrated in the image." 26 | } 27 | ], 28 | user_metadata = { 29 | "limitation": "The Image_Captioner_Tool may misinterpret complex equations, symbols, or spatial relationships, leading to inaccurate descriptions.", 30 | "best_practice": "Please consider to use it on images with clear and simple content to help you understand the modeling problem, instead of using it for complex data analysis.", 31 | }, 32 | ) 33 | print(f"\nInitializing Image Captioner Tool with model: {model_string}") 34 | self.llm_engine = ChatOpenAI(model_string=model_string, is_multimodal=True) if model_string else None 35 | 36 | def execute(self, image, prompt="Describe this image in detail."): 37 | """ 38 | Generate a caption or description for an image using a multimodal LLM engine. 39 | Returns a dict: { "success": bool, "message": str }. 40 | 41 | - success: True if caption generation was successful, False otherwise. 42 | - message: The caption text (if success) or an error message. 
43 | """ 44 | try: 45 | # Check if LLM engine is initialized 46 | if not self.llm_engine: 47 | return { 48 | "success": False, 49 | "message": "Error: LLM engine not initialized. Please provide a valid model_string." 50 | } 51 | 52 | input_data = [prompt] 53 | 54 | # get workspace path from environment 55 | workspace_path = os.getenv("WORKSPACE_PATH", "workspace") 56 | 57 | # ensure the image path is relative to workspace 58 | if "workspace" not in image: 59 | image_path = os.path.join(workspace_path, image) 60 | else: 61 | # remove "workspace/" from the path, then join with workspace_path 62 | image_path = os.path.join(workspace_path, image.split("workspace/")[-1]) 63 | 64 | # Check if the file exists 65 | if not os.path.isfile(image_path): 66 | return { 67 | "success": False, 68 | "message": "Error: Invalid image file path." 69 | } 70 | 71 | # Attempt to read the image file 72 | try: 73 | with open(image_path, 'rb') as file: 74 | image_bytes = file.read() 75 | input_data.append(image_bytes) 76 | except Exception as e: 77 | return { 78 | "success": False, 79 | "message": f"Error reading image file: {str(e)}" 80 | } 81 | 82 | # Attempt to generate caption 83 | try: 84 | caption = self.llm_engine(input_data) 85 | return { 86 | "success": True, 87 | "message": caption 88 | } 89 | except Exception as e: 90 | return { 91 | "success": False, 92 | "message": f"Error generating caption using LLM engine: {str(e)}" 93 | } 94 | 95 | except Exception as e: 96 | return { 97 | "success": False, 98 | "message": f"Error generating caption: {str(e)}" 99 | } 100 | 101 | def get_metadata(self): 102 | metadata = super().get_metadata() 103 | metadata['require_llm_engine'] = self.require_llm_engine # NOTE: can be removed if not needed 104 | return metadata 105 | 106 | 107 | if __name__ == "__main__": 108 | import json 109 | 110 | tool = Image_Captioner_Tool(model_string="gpt-4o") 111 | 112 | # Get tool metadata 113 | metadata = tool.get_metadata() 114 | print(metadata) 115 | 116 | # Construct the full path to the image using the script's directory 117 | relative_image_path = "workspace/Figure1.jpg" 118 | 119 | # Execute the tool with default prompt 120 | try: 121 | execution = tool.execute(image=relative_image_path) 122 | print("Generated Caption:") 123 | print(json.dumps(execution, indent=4)) 124 | except Exception as e: 125 | print(f"Execution failed: {e}") 126 | 127 | print("Done!") -------------------------------------------------------------------------------- /src/tools/pdf_parsing.py: -------------------------------------------------------------------------------- 1 | import os 2 | from .base import BaseTool 3 | from PyPDF2 import PdfReader 4 | import pymupdf 5 | import pymupdf4llm 6 | 7 | class PDF_Parser_Tool(BaseTool): 8 | require_llm_engine = False 9 | 10 | def __init__(self): 11 | super().__init__( 12 | tool_name="PDF_Parser_Tool", 13 | tool_description="A tool that extracts and processes text from PDF documents.", 14 | tool_version="1.0.0", 15 | input_types={ 16 | "pdf_path": "str - The path to the PDF file from the current workspace.", 17 | "num_pages": "int - The number of pages to extract (default: all pages).", 18 | "min_size": "int - The minimum text length required for extraction (default: 100)." 19 | }, 20 | output_type="str - The extracted text from the PDF document.", 21 | demo_commands=[ 22 | { 23 | "command": 'execution = tool.execute(pdf_path="workspace/sample.pdf")', 24 | "description": "Extract text from an entire PDF document." 
        """

        # Coerce string arguments (tool calls may pass numbers as strings)
        if isinstance(num_pages, str):
            num_pages = int(num_pages) if num_pages.isdigit() else None
        if isinstance(min_size, str):
            min_size = int(min_size) if min_size.isdigit() else -1

        try:
            # Ensure the path starts from the workspace
            workspace_path = os.getenv("WORKSPACE_PATH", "workspace")
            if "workspace" not in pdf_path:
                pdf_path = os.path.join(workspace_path, pdf_path)
            else:
                pdf_path = os.path.join(workspace_path, pdf_path.split("workspace/")[-1])

            if not os.path.isfile(pdf_path):
                return {
                    "success": False,
                    "message": "Error: Invalid PDF file path."
                }

            text = ""

            try:
                # Attempt using pymupdf4llm
                if num_pages is None:
                    text = pymupdf4llm.to_markdown(pdf_path)
                else:
                    reader = PdfReader(pdf_path)
                    min_pages = min(len(reader.pages), num_pages)
                    text = pymupdf4llm.to_markdown(pdf_path, pages=list(range(min_pages)))

                if min_size != -1 and len(text) < min_size:
                    raise Exception("Text too short")

            except Exception as e:
                print(f"Error with pymupdf4llm, falling back to pymupdf: {e}")
                try:
                    # Fall back to plain pymupdf
                    doc = pymupdf.open(pdf_path)
                    if num_pages:
                        doc = doc[:num_pages]
                    text = "".join(page.get_text() for page in doc)

                    if min_size != -1 and len(text) < min_size:
                        raise Exception("Text too short")

                except Exception as e2:
                    print(f"Error with pymupdf, falling back to PyPDF2: {e2}")
                    # Fall back to PyPDF2
                    reader = PdfReader(pdf_path)
                    if num_pages is None:
                        text = "".join(page.extract_text() for page in reader.pages)
                    else:
                        text = "".join(
                            page.extract_text() for page in reader.pages[:num_pages]
                        )

                    if min_size != -1 and len(text) < min_size:
                        raise Exception("Text too short")

            # If everything is ok, return success + text
            return {
                "success": True,
                "message": text
            }

        except Exception as e:
            return {
                "success": False,
                "message": f"Error extracting text: {str(e)}"
            }

    def get_metadata(self):
        metadata = super().get_metadata()
        metadata['require_llm_engine'] = self.require_llm_engine
        return metadata

if __name__ == "__main__":
    import json

    tool = PDF_Parser_Tool()

    # Get tool metadata
    metadata = tool.get_metadata()
    print(metadata)

    # Point the workspace at the test directory, then use a workspace-relative path
    os.environ["WORKSPACE_PATH"] = "PATH_TO_TEST_FILE/2025_Managing_Sustainable_Tourism"
"PATH_TO_TEST_FILE/2025_Managing_Sustainable_Tourism" 133 | relative_pdf_path = "workspace/2025_MCM_Problem_B.pdf" 134 | 135 | # Execute the tool with default parameters 136 | try: 137 | execution = tool.execute(pdf_path=relative_pdf_path) 138 | print("Extracted Text:") 139 | print(json.dumps(execution, indent=4)) 140 | except Exception as e: 141 | print(f"Execution failed: {e}") 142 | 143 | print("Done!") -------------------------------------------------------------------------------- /src/tools/solution_generator.py: -------------------------------------------------------------------------------- 1 | import os 2 | from .base import BaseTool 3 | from .engine import ChatOpenAI 4 | 5 | class Solution_Generator_Tool(BaseTool): 6 | require_llm_engine = True 7 | 8 | def __init__(self, model_string="gpt-4o"): 9 | super().__init__( 10 | tool_name="Generalist_Solution_Generator_Tool", 11 | tool_description="A generalized tool that takes query from the user as prompt, and answers the question step by step to the best of its ability. It can also accept an image.", 12 | tool_version="1.0.0", 13 | input_types={ 14 | "prompt": "str - The prompt that includes query from the user to guide the agent to generate response (Examples: 'Describe this image in detail').", 15 | "image": "str - The path to the image file from current workspace if applicable (default: None).", 16 | }, 17 | output_type="str - The generated response to the original query prompt", 18 | demo_commands=[ 19 | { 20 | "command": 'execution = tool.execute(prompt="Summarize the following text in a few lines")', 21 | "description": "Generate a short summary given the prompt from the user." 22 | }, 23 | { 24 | "command": 'execution = tool.execute(prompt="Give your best coordinate estimate for the pacemaker in the image and return (x1, y1, x2, y2)", image="workspace/path/to/image.png")', 25 | "description": "Generate bounding box coordinates given the image and prompt from the user. The format should be (x1, y1, x2, y2)." 26 | }, 27 | ], 28 | 29 | user_metadata = { 30 | "limitation": "The Solution_Generator_Tool may provide hallucinated or incorrect responses. Besides, the solution generator can only answer SIMPLE questions. Never throw whole question into this tool and expect a proper response.", 31 | "best_practice": "Use the Solution_Generator_Tool for general queries or tasks that don't require specialized knowledge or other specific tools. Provide clear, specific prompts. For complex queries, break them down into subtasks before using this tool." 32 | } 33 | 34 | ) 35 | self.model_string = model_string 36 | 37 | def execute(self, prompt, image=None): 38 | """ 39 | Generates a solution or answer using a ChatOpenAI model (optionally multimodal). 40 | Returns a dict: { "success": bool, "message": str }. 41 | 42 | - success: True if generation succeeded, False otherwise. 43 | - message: The generated text or error message. 
44 | """ 45 | print(f"\nInitializing Solution Tool with model: {self.model_string}") 46 | multimodal = True if image else False 47 | 48 | try: 49 | # Initialize the LLM engine 50 | from src.tools.engine import ChatOpenAI # or import wherever ChatOpenAI is 51 | llm_engine = ChatOpenAI(model_string=self.model_string, is_multimodal=multimodal) 52 | except Exception as e: 53 | return { 54 | "success": False, 55 | "message": f"Error initializing ChatOpenAI engine: {str(e)}" 56 | } 57 | 58 | try: 59 | input_data = [prompt] 60 | if multimodal: 61 | if not os.path.isfile(image): 62 | return { 63 | "success": False, 64 | "message": "Error: Invalid image file path." 65 | } 66 | try: 67 | with open(image, 'rb') as file: 68 | image_bytes = file.read() 69 | input_data.append(image_bytes) 70 | except Exception as e: 71 | return { 72 | "success": False, 73 | "message": f"Error reading image file: {str(e)}" 74 | } 75 | # Attempt generating with multimodal 76 | response = llm_engine(input_data) 77 | else: 78 | # Text-only 79 | response = llm_engine(input_data[0]) 80 | 81 | return { 82 | "success": True, 83 | "message": response 84 | } 85 | 86 | except Exception as e: 87 | return { 88 | "success": False, 89 | "message": f"Error generating response: {str(e)}" 90 | } 91 | 92 | def get_metadata(self): 93 | metadata = super().get_metadata() 94 | return metadata 95 | 96 | if __name__ == "__main__": 97 | # Example usage of the Generalist_Tool 98 | tool = Solution_Generator_Tool(model_string="gpt-4o") 99 | 100 | # Get tool metadata 101 | metadata = tool.get_metadata() 102 | print(metadata) 103 | 104 | # Construct the full path to the image using the script's directory 105 | relative_image_path = "workspace/Figure1.jpg" 106 | prompt = "Describe the image in detail." 107 | 108 | # Execute the tool with default prompt 109 | try: 110 | execution = tool.execute(prompt=prompt, image=relative_image_path) 111 | # execution = tool.execute(prompt=prompt) 112 | print("Generated Response:") 113 | print(execution) 114 | except Exception as e: 115 | print(f"Execution failed: {e}") 116 | 117 | print("Done!") -------------------------------------------------------------------------------- /src/tools/text_detector.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | from .base import BaseTool 4 | 5 | import warnings 6 | warnings.filterwarnings("ignore") 7 | 8 | class Text_Detector_Tool(BaseTool): 9 | require_llm_engine = False 10 | 11 | def __init__(self): 12 | super().__init__( 13 | tool_name="Text_Detector_Tool", 14 | tool_description="A tool that detects text in an image using EasyOCR.", 15 | tool_version="1.0.0", 16 | input_types={ 17 | "image": "str - The path to the image file from the current workspace.", 18 | "languages": "list - A list of language codes for the OCR model (default to English and Simplified Chinese only).", 19 | "detail": "int - The level of detail in the output. Set to 0 for simpler output, 1 for detailed output (default to 0 simpler output)." 20 | }, 21 | output_type="list - A list of detected text blocks.", 22 | demo_commands=[ 23 | { 24 | "command": 'execution = tool.execute(image="workspace/path/to/image.png")', 25 | "description": "Detect text in an image using the default language (English and Chinese)." 26 | }, 27 | { 28 | "command": 'execution = tool.execute(image="path/to/image.png", languages=["en"], detail=0)', 29 | "description": "Detect English text in an image with simpler output (text without coordinates and scores)." 
                },
            ],
            user_metadata={
                "limitation": "The Text_Detector_Tool may not accurately detect text in images with complex layouts, fonts, or backgrounds. Tables, numbers, and special characters may not be detected or may not retain their original structure.",
                "best_practice": "Use the Text_Detector_Tool for detecting text in simple images with clear text. Try to post-process the detected text to improve accuracy and readability. Use the extracted text only as a reference for understanding the image content.",
                "frequently_used_language": {
                    "ch_sim": "Simplified Chinese",
                    "ch_tra": "Traditional Chinese",
                    "de": "German",
                    "en": "English",
                    "es": "Spanish",
                    "fr": "French",
                    "hi": "Hindi",
                    "ja": "Japanese",
                }
            }
        )

    def build_tool(self, languages=None):
        """
        Builds and returns the EasyOCR reader model.

        Parameters:
            languages (list): A list of language codes for the OCR model.

        Returns:
            easyocr.Reader: An initialized EasyOCR Reader object.
        """
        languages = languages or ["en"]  # Default to English if no languages provided

        try:
            import easyocr
            reader = easyocr.Reader(languages)
            return reader
        except ImportError:
            raise ImportError("Please install the EasyOCR package using 'pip install easyocr'.")
        except Exception as e:
            print(f"Error building the OCR tool: {e}")
            return None

    def execute(self, image, languages=None, max_retries=10, retry_delay=5, clear_cuda_cache=False, **kwargs):
        """
        Executes the OCR tool to detect text in the provided image.

        Parameters:
            image (str): The path to the image file.
            languages (list): A list of language codes for the OCR model.
            max_retries (int): Maximum number of retry attempts.
            retry_delay (int): Delay in seconds between retry attempts.
            clear_cuda_cache (bool): Whether to clear the CUDA cache on out-of-memory errors.
            **kwargs: Additional keyword arguments for the OCR reader.

        Returns:
            dict: {
                "success": bool,
                "message": str,  # success/failure info
                "data": list     # OCR result list (empty if failed)
            }
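
        Example (added for illustration; the image path is a placeholder):
            tool = Text_Detector_Tool()
            result = tool.execute(image="workspace/Figure2.jpg", detail=0)
            if result["success"]:
                print(result["data"])  # list of detected text blocks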
        """
        # Honor the caller-provided languages, defaulting to English
        languages = languages or ["en"]

        # Get the workspace path from the environment
        workspace_path = os.getenv("WORKSPACE_PATH", "workspace")
        if "workspace" not in image:
            image = os.path.join(workspace_path, image)
        else:
            image = os.path.join(workspace_path, image.split("workspace/")[-1])

        # Check if the file exists
        if not os.path.isfile(image):
            return {
                "success": False,
                "message": "Error: Invalid image file path.",
                "data": []
            }

        # Retry up to max_retries times
        for attempt in range(max_retries):
            try:
                reader = self.build_tool(languages)
                if reader is None:
                    return {
                        "success": False,
                        "message": "Error: Failed to build the OCR tool.",
                        "data": []
                    }

                result = reader.readtext(image, **kwargs)
                try:
                    # If detail=1, convert numpy coords to int
                    cleaned_result = [
                        ([[int(coord[0]), int(coord[1])] for coord in item[0]], item[1], round(float(item[2]), 2))
                        for item in result
                    ]
                    return {
                        "success": True,
                        "message": "OCR detection succeeded.",
                        "data": cleaned_result
                    }
                except Exception:
                    # detail=0 or other fallback
                    return {
                        "success": True,
                        "message": "OCR detection succeeded (detail=0).",
                        "data": result
                    }

            except RuntimeError as e:
                if "CUDA out of memory" in str(e):
                    print(f"CUDA out of memory on attempt {attempt+1}.")
                    if clear_cuda_cache:
                        print("Clearing CUDA cache and retrying...")
                        import torch
                        torch.cuda.empty_cache()
                    else:
                        print(f"Retrying in {retry_delay} seconds...")
                        time.sleep(retry_delay)
                    continue
                else:
                    print(f"Runtime error: {e}")
                    return {
                        "success": False,
                        "message": f"Runtime error: {str(e)}",
                        "data": []
                    }
            except Exception as e:
                print(f"Error detecting text: {e}")
                return {
                    "success": False,
                    "message": f"Error detecting text: {str(e)}",
                    "data": []
                }

        # If we exhausted all retries
        print(f"Failed to detect text after {max_retries} attempts.")
        return {
            "success": False,
            "message": f"Failed after {max_retries} attempts.",
            "data": []
        }


    def get_metadata(self):
        """
        Returns the metadata for the Text_Detector_Tool.

        Returns:
            dict: A dictionary containing the tool's metadata.
178 | """ 179 | metadata = super().get_metadata() 180 | return metadata 181 | 182 | 183 | if __name__ == "__main__": 184 | import json 185 | 186 | # Example usage of the Text_Detector_Tool 187 | tool = Text_Detector_Tool() 188 | 189 | # Get tool metadata 190 | metadata = tool.get_metadata() 191 | print(metadata) 192 | 193 | relative_image_path = "workspace/Figure2.jpg" 194 | 195 | # Execute the tool 196 | try: 197 | execution = tool.execute(image=relative_image_path, languages=["en"], detail=0) 198 | print(json.dumps(execution)) 199 | 200 | print("Detected Text:", execution) 201 | except ValueError as e: 202 | print(f"Execution failed: {e}") 203 | 204 | print("Done!") -------------------------------------------------------------------------------- /src/tools/url_text.py: -------------------------------------------------------------------------------- 1 | import os 2 | import requests 3 | from bs4 import BeautifulSoup 4 | 5 | from .base import BaseTool 6 | 7 | class URL_Text_Extractor_Tool(BaseTool): 8 | def __init__(self): 9 | super().__init__( 10 | tool_name="URL_Text_Extractor_Tool", 11 | tool_description="A tool that extracts all text from a given URL.", 12 | tool_version="1.0.0", 13 | input_types={ 14 | "url": "str - The URL from which to extract text.", 15 | }, 16 | output_type="str - The extracted text from the given url and any error messages.", 17 | demo_commands=[ 18 | { 19 | "command": 'execution = tool.execute(url="https://example.com")', 20 | "description": "Extract all text from the example.com website." 21 | }, 22 | { 23 | "command": 'execution = tool.execute(url="https://en.wikipedia.org/wiki/Python_(programming_language)")', 24 | "description": "Extract all text from the Wikipedia page about Python programming language." 25 | }, 26 | ], 27 | user_metadata={ 28 | "limitation": "1. The URL_Text_Extractor_Tool may not accurately extract text from all websites. The extracted text may contain errors or omissions. The text in the images or embedded content may not be extracted. 2. You should not use this tool to download anything or read online document like PDF. Make sure that the url you entered is a website.", 29 | "best_practice": "Use this tool to summarize all the text information from a web page. The extracted text should be used as a reference for understanding the content of the website. Be aware that it may not be exactly complete or accurate." 30 | } 31 | ) 32 | 33 | def extract_text_from_url(self, url): 34 | try: 35 | response = requests.get(url, timeout=10) # optional: set a timeout 36 | response.raise_for_status() 37 | soup = BeautifulSoup(response.content, 'html.parser') 38 | text = soup.get_text(separator='\n', strip=True) 39 | text = text[:10000] # Limit the text to 10000 characters 40 | return { 41 | "success": True, 42 | "message": text 43 | } 44 | except requests.RequestException as e: 45 | return { 46 | "success": False, 47 | "message": f"Error fetching URL: {str(e)}" 48 | } 49 | except Exception as e: 50 | return { 51 | "success": False, 52 | "message": f"Error extracting text: {str(e)}" 53 | } 54 | 55 | def execute(self, url): 56 | """ 57 | Extract text from a given webpage URL, returning a dict: 58 | { "success": bool, "message": str }. 
        """
        return self.extract_text_from_url(url)

    def get_metadata(self):
        metadata = super().get_metadata()
        return metadata


if __name__ == "__main__":
    # Example usage of the URL_Text_Extractor_Tool
    tool = URL_Text_Extractor_Tool()

    # Get tool metadata
    metadata = tool.get_metadata()
    print(metadata)

    # Sample URL for extracting text
    url = "https://weather.metoffice.gov.uk/forecast/wx4g092se"

    import json

    # Execute the tool with the sample URL
    try:
        execution = tool.execute(url=url)
        print("Execution Result:")
        print(execution)
    except ValueError as e:
        print(f"Execution failed: {e}")

    print("Done!")
--------------------------------------------------------------------------------
/src/tools/web_download.py:
--------------------------------------------------------------------------------
import os
import requests
from .base import BaseTool

class Web_Download_Tool(BaseTool):
    def __init__(self):
        super().__init__(
            tool_name="Web_Download_Tool",
            tool_description="A tool that downloads a file from a given URL and saves it to a specified location.",
            tool_version="1.0.0",
            input_types={
                "url": "str - The URL of the file to download.",
                "save_path": "str - The target save file path starting from the workspace, including the filename."
            },
            output_type="str - Success message or error details.",
            demo_commands=[
                {
                    "command": 'execution = tool.execute(url="https://arxiv.org/pdf/paper.pdf", save_path="workspace/paper.pdf")',
                    "description": "Download a PDF file from arXiv and save it to the workspace."
                }
            ],
            user_metadata={
                "limitation": "Cannot download files from restricted or inaccessible URLs. The download may fail if the URL is invalid or the file is too large. Always verify the content type matches the file extension - servers might return HTML error pages even when requesting non-HTML content (e.g., downloading a .zip but getting HTML content with a .zip extension).",
                "best_practice": "Ensure the URL is valid and the save path includes the intended filename. Check the availability of the file after download using Python code or other means."
            },
        )

    def execute(self, url, save_path):
        """
        Download a file from a URL to the workspace.

        Returns:
            dict: {
                "success": bool,
                "message": str  # success info or error message
            }
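
        Example (added for illustration; URL and filename are placeholders):
            tool = Web_Download_Tool()
            result = tool.execute(url="https://example.com/report.pdf",
                                  save_path="workspace/report.pdf")
            print(result["message"])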
        """
        try:
            # Ensure the save path starts from the workspace
            workspace_path = os.getenv("WORKSPACE_PATH", "workspace")
            if not save_path.startswith("workspace/"):
                final_path = os.path.join(workspace_path, save_path)
            else:
                final_path = os.path.join(workspace_path, save_path.split("workspace/")[-1])

            # Create necessary directories
            os.makedirs(os.path.dirname(final_path), exist_ok=True)

            # Download the file (requests is imported at module level)
            response = requests.get(url, stream=True, timeout=10)
            response.raise_for_status()  # Raise error for failed requests

            with open(final_path, "wb") as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)

            return {
                "success": True,
                "message": f"Download successful! The file is saved at {final_path}"
            }

        except requests.exceptions.RequestException as e:
            return {
                "success": False,
                "message": f"Download failed (RequestException): {str(e)}"
            }
        except Exception as e:
            return {
                "success": False,
                "message": f"Error saving file: {str(e)}"
            }

if __name__ == "__main__":
    tool = Web_Download_Tool()
    execution = tool.execute(url="https://arxiv.org/pdf/2502.01600", save_path="workspace/paper.pdf")
    print(execution)
--------------------------------------------------------------------------------
/src/tools/web_search.py:
--------------------------------------------------------------------------------
import os
import json
import http.client
import time

from .base import BaseTool

class Web_Search_Tool(BaseTool):
    require_llm_engine = False

    def __init__(self):
        super().__init__(
            tool_name="Web_Search_Tool",
            tool_description="A tool that performs web searches using an API and returns structured search results.",
            tool_version="1.0.0",
            input_types={
                "query": "str - The search query to retrieve information.",
                "link": "bool - Whether to include links in the output (default: False).",
                "num": "int - Number of search results to return (default: 10)."
            },
            output_type="str - The formatted search results based on the given query.",
            demo_commands=[
                {
                    "command": 'execution = tool.execute(query="Latest AI trends", link=True, num=5)',
                    "description": "Search for the latest AI trends and return up to 5 results with links."
                }
            ],
            user_metadata={
                "limitation": "Limited by API availability and may not always return results. The snippet may be very concise and may not contain all relevant information.",
                "best_practice": "Use this tool for retrieving up-to-date web search results on various topics. Then use the links returned in the results to explore further details with URL_Text_Extractor_Tool."
            }
        )

    def execute(self, query, link=False, num=10):
        """
        Perform a web search via the Google Serper API.

        Returns:
            dict: {
                "success": bool,  # True if search results obtained, False otherwise
                "message": str    # The search results or error info
            }
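
        Example (added for illustration; assumes SERPER_API_KEY is set in the environment):
            tool = Web_Search_Tool()
            result = tool.execute(query="mathematical modeling competition MCM", link=True, num=3)
            if result["success"]:
                print(result["message"])  # numbered titles, links, and snippets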
        """

        # Coerce string arguments (tool calls may pass booleans/numbers as strings)
        if isinstance(link, str):
            link = link.lower() != 'false'
        if isinstance(num, str):
            num = int(num) if num.isdigit() else 10

        api_key = os.getenv("SERPER_API_KEY", None)
        if not api_key:
            return {
                "success": False,
                "message": "Error: Missing SERPER_API_KEY."
            }

        conn = http.client.HTTPSConnection("google.serper.dev")
        headers = {
            'X-API-KEY': api_key,
            'Content-Type': 'application/json'
        }
        payload = json.dumps({
            "q": query,
            # "tbs": "qdr:y"  # optional param for time range
        })

        try_time = 0
        data = {}
        while True:
            try:
                conn.request("POST", "/search", payload, headers)
                res = conn.getresponse()
                raw_data = res.read().decode("utf-8")
                data = json.loads(raw_data)

                if data.get("organic", []):
                    # We got some results, break
                    break

                try_time += 1
                if try_time > 5:
                    return {
                        "success": False,
                        "message": "Search Error: Timeout or no results after 5 attempts."
                    }
                time.sleep(5)

            except Exception as e:
                return {
                    "success": False,
                    "message": f"Search Error while sending request: {str(e)}"
                }

        try:
            output = ""
            index = 1
            answer_box = data.get("answerBox", {})

            # If there's an answerBox
            if answer_box:
                try:
                    current = f"{index}. {answer_box['title']}"
                    if link and 'link' in answer_box:
                        current += f"\n- Link: {answer_box['link']}"
                    if "date" in answer_box:
                        current += f"\n- Date: {answer_box['date']}"
                    current += f"\n- Snippet: {answer_box['snippet']}"
                    output += current + "\n\n"
                    index += 1
                except Exception:
                    pass  # in case something is missing

            # If we've reached the desired number of results
            if index > num:
                return {
                    "success": True,
                    "message": output.strip()
                }

            # Now handle the "organic" array
            for item in data.get("organic", []):
                try:
                    current = f"{index}. {item['title']}"
                    if link and 'link' in item:
                        current += f"\n- Link: {item['link']}"
                    if "date" in item:
                        current += f"\n- Date: {item['date']}"
                    current += f"\n- Snippet: {item['snippet']}"
                    output += current + "\n\n"
                    index += 1
                except Exception:
                    pass

                if index > num:
                    return {
                        "success": True,
                        "message": output.strip()
                    }

            # Return what we have so far
            return {
                "success": True,
                "message": output.strip()
            }

        except Exception as e:
            return {
                "success": False,
                "message": f"Search Error: {str(e)}"
            }

if __name__ == "__main__":
    tool = Web_Search_Tool()
    query = "How's the weather in Beijing"
    execution = tool.execute(query=query, link=True, num=3)
    print("Search Results:")
    print(execution)
    print("Done!")
--------------------------------------------------------------------------------