├── .gitignore ├── LICENSE ├── README.md ├── api.py ├── call_ai.py ├── chain_store.py ├── chat_loop.py ├── engine.py ├── helpers.py ├── main.py ├── mixture.py ├── planner.py ├── requirements.txt ├── successful_chains.json └── tools.py /.gitignore: -------------------------------------------------------------------------------- 1 | venv/ 2 | .env 3 | __pycache__/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 mshumer 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |
2 | 3 | # OpenReasoningEngine 4 | 5 | **While AI labs are quietly building closed reasoning systems, 6 | we can create something more powerful together in the open.** 7 | 8 |
9 | 10 | --- 11 | 12 | This repo serves as a modular, open-source test-time compute engine — anyone in the community with a useful idea to improve model capabilities is encouraged to add their approach to the system. As approaches are added, this system will enable users to compose them to drastically increase capabilities. 13 | 14 | And over time, as users save successful reasoning chains, we will be able to train models designed to take full advantage of this system. 15 | 16 | *Works with any OpenAI-compatible endpoint/model that supports function calling, and serves as a great base for building many types of reasoning systems.* 17 | 18 | > ### ⚠️ Important Note 19 | > **We are going to be very selective about what we add to this system. If an approach doesn't have a clear path to increasing the capabilities of the system, we will not add it.** 20 | 21 | --- 22 | 23 | ## 🚀 Initial System 24 | 25 | ### Core Features 26 | 27 | 🔹 **Step-by-Step Reasoning** 28 |     Executes reasoning one step per turn with integrated tools: 29 | - Python interpreter 30 | - Web search (via SerpAPI) 31 | - Wolfram Alpha integration 32 | - Full webpage reading (via Jina) 33 | 34 | 🔹 **Memory-Based Planning** 35 |     Continually learns and adapts from past experiences 36 | 37 | 🔹 **MoA** 38 |     Implements mixture-of-agents for ensemble decision making — *works but requires further testing* 39 | 40 | 🔹 **Beam Search** 41 |     Sample multiple next reasoning step candidates at each turn, and choose the best (soon to be updated with forking Python interpreters to significantly improve the system) 42 | 43 | 🔹 **Self-Reflection** 44 |     Force the AI to validate reasoning steps as it thinks 45 | 46 | 🔹 **Flexible Model Support** 47 |     Model-agnostic API supporting any OpenAI-compatible provider (OpenAI, Anthropic, etc.) 48 | 49 | 🔹 **Rich Input/Output** 50 |     Handles image input, **function calling**, and multi-turn conversations 51 | 52 | --- 53 | 54 | ## ⚙️ Installation 55 | 56 | ### 1. Clone and Install 57 | ```bash 58 | git clone https://github.com/mshumer/OpenReasoningEngine.git 59 | cd OpenReasoningEngine 60 | pip install -r requirements.txt 61 | ``` 62 | 63 | ### 2. API Setup 64 | Get API keys from: 65 | - [OpenRouter](https://openrouter.ai/) - for model access 66 | - [E2B](https://e2b.dev/) - for Python code execution 67 | - [SerpAPI](https://serpapi.com/) - for web search 68 | - [Jina](https://jina.ai/) (optional) - for webpage content extraction 69 | - [Wolfram Alpha](https://products.wolframalpha.com/api) (optional) - for computations/scientific queries 70 | - [Cohere](https://cohere.ai/) (optional) - for learning from past chains 71 | 72 | Create a `.env` file: 73 | ```env 74 | E2B_API_KEY="your_e2b_key_here" 75 | OPENROUTER_API_KEY="your_openrouter_key_here" 76 | SERPAPI_API_KEY="your_serpapi_key_here" 77 | JINA_API_KEY="your_jina_key_here" # Optional 78 | WOLFRAM_APP_ID="your_wolfram_key_here" # Optional 79 | COHERE_API_KEY="your_cohere_key_here" # Optional 80 | ``` 81 | 82 | ### 3. Load Environment 83 | ```bash 84 | source .env 85 | ``` 86 | 87 | --- 88 | 89 | ## 🛠️ Usage 90 | 91 | ### Running the Engine 92 | Two options available: 93 | - Direct execution: `python main.py` 94 | - API server: `python api.py` (starts a Flask API endpoint) 95 | 96 | ## Config Options 97 | Running the code as-is will work — I've chosen reasonable default settings. If you'd like to customize the way the system reasons, you can adjust the parameters when you run it. 98 | 99 | ### Tool System 100 | 101 | #### 1. 
Internal Tools 102 | - Used during the reasoning process 103 | - Default setup includes: 104 | - Python interpreter (with guidance to steer the LLM to add assertions, prints, etc. to improve performance and catch issues) 105 | - Web search (SerpAPI) 106 | - Webpage content extraction (Jina, optional) 107 | - Wolfram Alpha (optional) 108 | - Customizable based on your needs 109 | 110 | #### 2. Output Tools 111 | - Standard AI API output tools 112 | - Called after reasoning completion 113 | - Configurable based on use-case 114 | 115 | --- 116 | 117 | ## 🧮 Learning System 118 | 119 | ### Memory Management 120 | 121 | A major goal of OpenReasoningEngine is to enable learning from experience. The initial implementation is simple, and will continue to be iterated on as I (and others) come up with smarter approaches. 122 | 123 | #### Steps to Enable Continual Learning: 124 | 125 | 1. Obtain an API key from [Cohere](https://cohere.ai/) 126 | 127 | 2. Save successful reasoning chains: 128 | ```python 129 | chain_store.save_successful_chain( 130 | task=task, 131 | conversation_history=history, 132 | final_response=response, 133 | cohere_api_key=cohere_api_key, 134 | thinking_tools=thinking_tools, 135 | output_tools=output_tools, 136 | metadata={"model": model, "api_url": api_url} 137 | ) 138 | ``` 139 | 140 | The system includes starter chains in `successful_chains.json`. 141 | 142 | Community contributions to this database are welcome, subject to validation. If you'd like to add a chain to the database, please propose it [here](https://github.com/mshumer/OpenReasoningEngine/discussions/categories/proposed-chains). The community will vote on it, and if the results are positive, it will be added to the next version of the database (versioning will allow users to see stable performance over time). 143 | 144 | If you have ideas to make this process more seamless and scalable, please reach out! 145 | 146 | ### 📊 Performance Notes 147 | 148 | - Performance may vary based on the specific chains in your memory store (performance may be dramatically different with different chains) 149 | 150 | --- 151 | 152 | ## 📝 Logging 153 | 154 | ### Verbose Mode 155 | When `verbose=True`, the engine displays: 156 | - 🔄 API interactions 157 | - 🛠️ Tool usage and results 158 | - 📋 Step-by-step reasoning progress 159 | 160 | This makes it easy to see what's going on under the hood and diagnose issues. 161 | 162 | --- 163 | 164 | ## 🧪 Benchmarking 165 | 166 | I've open-sourced a very simple LLM evaluation harness that you can use with this repo to test different setups and understand how well approaches work. I've provided some example eval datasets so you can see how it works. If you want to try different OpenReasoningEngine setups, just drop in your own eval data and play with the reasoning settings until it works well for you! 167 | 168 | [Try it here.](https://github.com/mshumer/MattEval) 169 | 170 | --- 171 | 172 | ## 🤝 Contributing 173 | 174 | Contributions are welcome if they: 175 | - ✨ Demonstrably improve system capabilities 176 | - 📈 Include clear performance metrics 177 | 178 | Quality-of-life improvements are also appreciated. 
179 | 180 | --- 181 | 182 | ## Acknowledgements 183 | Thank you to the following folks who provided advice, feedback, ideas, and helped me implement and test the initial versions of OpenReasoningEngine: 184 | - [Steve Ickman](https://x.com/stevenic) 185 | - [Vasek Mlejnsky](https://x.com/mlejva) 186 | - [Josh Bickett](https://x.com/josh_bickett) 187 | - [Aidan Gomez](https://x.com/aidangomez) 188 | - [Alec Velikanov](https://x.com/alecvxyz) (Alex, imo) 189 | 190 | [Follow me on X](https://x.com/mattshumer_) for updates on this and other AI things I'm working on. 191 | 192 | OpenReasoningEngine is released under the MIT License. See the [LICENSE](https://github.com/mshumer/OpenReasoningEngine/blob/main/LICENSE) file for more details. 193 | -------------------------------------------------------------------------------- /api.py: -------------------------------------------------------------------------------- 1 | from flask import Flask, request, jsonify 2 | from engine import complete_reasoning_task 3 | from mixture import ensemble 4 | import traceback 5 | 6 | app = Flask(__name__) 7 | 8 | @app.route('/reason', methods=['POST']) 9 | def reason(): 10 | """ 11 | Single model reasoning endpoint. 12 | 13 | Expected JSON payload: 14 | { 15 | "task": "The task description", 16 | "api_key": "your-api-key", 17 | "model": "model-name", 18 | "api_url": "api-endpoint", 19 | "temperature": 0.7, # optional 20 | "top_p": 1.0, # optional 21 | "max_tokens": 500, # optional 22 | "verbose": false, # optional 23 | "chain_store_api_key": "key", # optional 24 | "wolfram_app_id": "key", # optional 25 | "max_reasoning_steps": 10, # optional 26 | "image": "image-url or base64" # optional 27 | "output_tools": [ # optional 28 | { 29 | "type": "tool-type", 30 | "name": "tool-name", 31 | "description": "tool-description" 32 | } 33 | ], 34 | "reflection_mode": false, # optional: enable reflection mode 35 | "previous_chains": [ # optional: previous conversation chains 36 | [ 37 | { 38 | "role": "system|user|assistant|tool", 39 | "content": "message content", 40 | "tool_calls": [] # optional 41 | } 42 | ] 43 | ], 44 | "jina_api_key": "jina-api-key" # optional 45 | } 46 | """ 47 | try: 48 | data = request.get_json() 49 | 50 | # Required parameters 51 | task = data.get('task') 52 | api_key = data.get('api_key') 53 | model = data.get('model') 54 | api_url = data.get('api_url') 55 | 56 | if not all([task, api_key, model, api_url]): 57 | return jsonify({ 58 | 'error': 'Missing required parameters. 
Need: task, api_key, model, api_url' 59 | }), 400 60 | 61 | # Optional parameters 62 | temperature = data.get('temperature', 0.7) 63 | top_p = data.get('top_p', 1.0) 64 | max_tokens = data.get('max_tokens', 500) 65 | verbose = data.get('verbose', False) 66 | chain_store_api_key = data.get('chain_store_api_key') 67 | wolfram_app_id = data.get('wolfram_app_id') 68 | max_reasoning_steps = data.get('max_reasoning_steps') 69 | image = data.get('image') 70 | output_tools = data.get('output_tools') 71 | reflection_mode = data.get('reflection_mode', False) 72 | previous_chains = data.get('previous_chains', []) # New parameter 73 | num_candidates = data.get('num_candidates', 1) 74 | beam_search_enabled = data.get('beam_search_enabled', False) 75 | use_planning = data.get('use_planning', False) 76 | use_jeremy_planning = data.get('use_jeremy_planning', False) 77 | jina_api_key = data.get('jina_api_key') 78 | 79 | # Run reasoning 80 | response, history, thinking_tools, output_tools = complete_reasoning_task( 81 | task=task, 82 | api_key=api_key, 83 | model=model, 84 | api_url=api_url, 85 | temperature=temperature, 86 | top_p=top_p, 87 | max_tokens=max_tokens, 88 | verbose=verbose, 89 | chain_store_api_key=chain_store_api_key, 90 | wolfram_app_id=wolfram_app_id, 91 | max_reasoning_steps=max_reasoning_steps, 92 | image=image, 93 | output_tools=output_tools, 94 | reflection_mode=reflection_mode, 95 | previous_chains=previous_chains, 96 | use_planning=use_planning, 97 | beam_search_enabled=beam_search_enabled, 98 | num_candidates=num_candidates, 99 | use_jeremy_planning=use_jeremy_planning, 100 | jina_api_key=jina_api_key 101 | ) 102 | 103 | return jsonify({ 104 | 'response': response, 105 | 'reasoning_chain': history, 106 | 'thinking_tools': thinking_tools, 107 | 'output_tools': output_tools 108 | }) 109 | 110 | except Exception as e: 111 | return jsonify({ 112 | 'error': str(e), 113 | 'traceback': traceback.format_exc() 114 | }), 500 115 | 116 | @app.route('/ensemble', methods=['POST']) 117 | def run_ensemble(): 118 | """ 119 | Ensemble reasoning endpoint. 120 | 121 | Expected JSON payload: 122 | { 123 | "task": "The task description", 124 | "agents": [ 125 | { 126 | "model": "model-name-1", 127 | "api_key": "key-1", 128 | "api_url": "url-1", 129 | "temperature": "temperature-1", 130 | }, 131 | { 132 | "model": "model-name-2", 133 | "api_key": "key-2", 134 | "api_url": "url-2", 135 | "temperature": "temperature-2" 136 | } 137 | ], 138 | "coordinator": { 139 | "model": "model-name", 140 | "api_key": "key", 141 | "api_url": "url", 142 | "temperature": "temperature" 143 | }, 144 | "verbose": false, # optional 145 | "chain_store_api_key": "key", # optional 146 | "max_workers": 3, # optional 147 | "return_reasoning": false, # optional 148 | "max_reasoning_steps": 10, # optional: max steps per agent 149 | "coordinator_max_steps": 5, # optional: max steps for coordinator 150 | "wolfram_app_id": "key", # optional 151 | "temperature": 0.7, # optional 152 | "top_p": 1.0, # optional 153 | "max_tokens": 500 # optional 154 | "reflection_mode": false, # optional: enable reflection mode for all agents 155 | } 156 | """ 157 | try: 158 | data = request.get_json() 159 | 160 | # Required parameters 161 | task = data.get('task') 162 | agents = data.get('agents') 163 | coordinator = data.get('coordinator') 164 | 165 | if not all([task, agents, coordinator]): 166 | return jsonify({ 167 | 'error': 'Missing required parameters. 
Need: task, agents, coordinator' 168 | }), 400 169 | 170 | # Optional parameters 171 | verbose = data.get('verbose', False) 172 | chain_store_api_key = data.get('chain_store_api_key') 173 | max_workers = data.get('max_workers') 174 | return_reasoning = data.get('return_reasoning', False) 175 | max_reasoning_steps = data.get('max_reasoning_steps') 176 | coordinator_max_steps = data.get('coordinator_max_steps') 177 | wolfram_app_id = data.get('wolfram_app_id') 178 | temperature = data.get('temperature', 0.7) 179 | top_p = data.get('top_p', 1.0) 180 | max_tokens = data.get('max_tokens', 500) 181 | image = data.get('image', None) 182 | output_tools = data.get('output_tools') 183 | reflection_mode = data.get('reflection_mode', False) 184 | 185 | # Run ensemble 186 | result = ensemble( 187 | task=task, 188 | agents=agents, 189 | coordinator=coordinator, 190 | verbose=verbose, 191 | chain_store_api_key=chain_store_api_key, 192 | max_workers=max_workers, 193 | return_reasoning=return_reasoning, 194 | max_reasoning_steps=max_reasoning_steps, 195 | coordinator_max_steps=coordinator_max_steps, 196 | wolfram_app_id=wolfram_app_id, 197 | temperature=temperature, 198 | top_p=top_p, 199 | max_tokens=max_tokens, 200 | image=image, 201 | output_tools=output_tools, 202 | reflection_mode=reflection_mode 203 | ) 204 | 205 | if return_reasoning: 206 | coordinator_response, agent_results = result 207 | return jsonify({ 208 | 'response': coordinator_response, 209 | 'agent_results': [ 210 | { 211 | 'model': config['model'], 212 | 'response': response, 213 | 'reasoning_chain': history, 214 | 'thinking_tools': thinking_tools, 215 | 'output_tools': output_tools 216 | } 217 | for config, response, history, thinking_tools, output_tools in agent_results 218 | ] 219 | }) 220 | 221 | return jsonify({ 222 | 'response': result 223 | }) 224 | 225 | except Exception as e: 226 | return jsonify({ 227 | 'error': str(e), 228 | 'traceback': traceback.format_exc() 229 | }), 500 230 | 231 | if __name__ == '__main__': 232 | app.run(host='0.0.0.0', port=5050) -------------------------------------------------------------------------------- /call_ai.py: -------------------------------------------------------------------------------- 1 | from colorama import Fore, Style 2 | import requests 3 | from typing import List, Dict 4 | import concurrent.futures 5 | import os 6 | 7 | 8 | def send_message_to_api( 9 | task: str, 10 | messages: List[Dict], 11 | api_key: str, 12 | tools: List[Dict], 13 | model: str = "gpt-4o-mini", 14 | temperature: float = 0.7, 15 | top_p: float = 1.0, 16 | max_tokens: int = 500, 17 | api_url: str = "https://openrouter.ai/api/v1/chat/completions", 18 | verbose: bool = False, 19 | is_first_step: bool = False, 20 | tool_choice: str = None, 21 | ) -> Dict: 22 | """ 23 | Send a message to the OpenRouter API and return the assistant's response. 24 | Will retry up to 3 times with increasing delay between retries. 
25 | """ 26 | if verbose and is_first_step: 27 | print( 28 | f"\n{Fore.CYAN}╭──────────────────────────────────────────{Style.RESET_ALL}" 29 | ) 30 | print(f"{Fore.CYAN}│ Sending Request to API{Style.RESET_ALL}") 31 | print( 32 | f"{Fore.CYAN}├──────────────────────────────────────────{Style.RESET_ALL}" 33 | ) 34 | print(f"{Fore.CYAN}│ Model: {Style.RESET_ALL}{model}") 35 | print(f"{Fore.CYAN}│ URL: {Style.RESET_ALL}{api_url}") 36 | print(f"{Fore.CYAN}│ Temperature: {Style.RESET_ALL}{temperature}") 37 | print( 38 | f"{Fore.CYAN}╰──────────────────────────────────────────{Style.RESET_ALL}\n" 39 | ) 40 | 41 | retries = 0 42 | max_retries = 3 43 | delay = 1 # Initial delay in seconds 44 | 45 | # Prepare request data for logging 46 | request_data = { 47 | 'model': model, 48 | 'messages': messages, 49 | 'tools': tools if tools else None, 50 | 'max_tokens': max_tokens, 51 | 'temperature': temperature, 52 | 'top_p': top_p, 53 | } 54 | 55 | if tool_choice: 56 | request_data['tool_choice'] = tool_choice 57 | 58 | while retries <= max_retries: 59 | try: 60 | print( 61 | f"\n{Fore.BLUE}Making API Request (Attempt {retries + 1}/{max_retries + 1})...{Style.RESET_ALL}" 62 | ) 63 | response = requests.post( 64 | api_url, 65 | headers={ 66 | "Authorization": f"Bearer {api_key}", 67 | "Content-Type": "application/json", 68 | }, 69 | json=request_data, 70 | timeout=60 71 | ) 72 | print(f"{Fore.GREEN}Response received:{Style.RESET_ALL}") 73 | print(f"{Fore.YELLOW}{response.json()}{Style.RESET_ALL}") 74 | 75 | if verbose: 76 | print( 77 | f"{Fore.YELLOW}Response status: {response.status_code}{Style.RESET_ALL}" 78 | ) 79 | 80 | if response.status_code != 200: 81 | # Log failed request 82 | import datetime 83 | import os 84 | import json 85 | 86 | os.makedirs('api_error_logs', exist_ok=True) 87 | timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S') 88 | log_file = f'api_error_logs/error_{timestamp}.json' 89 | 90 | error_log = { 91 | 'timestamp': timestamp, 92 | 'status_code': response.status_code, 93 | 'error_message': response.text, 94 | 'response_json': response.json(), 95 | 'request_url': api_url, 96 | 'request_data': request_data, 97 | 'retry_attempt': retries + 1 98 | } 99 | 100 | with open(log_file, 'w') as f: 101 | json.dump(error_log, f, indent=2) 102 | 103 | raise Exception( 104 | f"API request failed with status {response.status_code}: {response.text}" 105 | ) 106 | 107 | response_data = response.json() 108 | print(f"{Fore.GREEN}Successfully parsed response data{Style.RESET_ALL}") 109 | return response_data["choices"][0]["message"] 110 | 111 | except Exception as error: 112 | print( 113 | f"{Fore.RED}Error occurred during API call (Attempt {retries + 1})!{Style.RESET_ALL}" 114 | ) 115 | print(f"{Fore.RED}{str(error)}{Style.RESET_ALL}") 116 | 117 | # Log any other errors that occur 118 | import datetime 119 | import os 120 | import json 121 | 122 | os.makedirs('api_error_logs', exist_ok=True) 123 | timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S') 124 | log_file = f'api_error_logs/error_{timestamp}.json' 125 | 126 | error_log = { 127 | 'timestamp': timestamp, 128 | 'error_type': type(error).__name__, 129 | 'error_message': str(error), 130 | 'request_url': api_url, 131 | 'response_json': response.json(), 132 | 'request_data': request_data, 133 | 'retry_attempt': retries + 1 134 | } 135 | 136 | with open(log_file, 'w') as f: 137 | json.dump(error_log, f, indent=2) 138 | 139 | if retries == max_retries: 140 | raise Exception( 141 | f"Error sending message to API after {max_retries + 
1} attempts: {str(error)}" 142 | ) 143 | 144 | import time 145 | 146 | wait_time = delay * (2**retries) # Exponential backoff 147 | print( 148 | f"{Fore.YELLOW}Waiting {wait_time} seconds before retrying...{Style.RESET_ALL}" 149 | ) 150 | time.sleep(wait_time) 151 | retries += 1 152 | 153 | 154 | def generate_multiple_candidates( 155 | task: str, 156 | messages: List[Dict], 157 | api_key: str, 158 | tools: List[Dict], 159 | num_candidates: int = 3, 160 | model: str = "gpt-4o-mini", 161 | temperature: float = 0.7, 162 | top_p: float = 1.0, 163 | max_tokens: int = 500, 164 | api_url: str = "https://openrouter.ai/api/v1/chat/completions", 165 | verbose: bool = False, 166 | is_first_step: bool = False, 167 | ) -> List[Dict]: 168 | """ 169 | Generate multiple candidate responses in parallel using concurrent.futures. 170 | Returns a list of candidate responses. 171 | """ 172 | print( 173 | f"\n{Fore.MAGENTA}╭──────────────────────────────────────────{Style.RESET_ALL}" 174 | ) 175 | print(f"{Fore.MAGENTA}│ Generating {num_candidates} Candidates{Style.RESET_ALL}") 176 | print(f"{Fore.MAGENTA}╰──────────────────────────────────────────{Style.RESET_ALL}") 177 | 178 | def generate_candidate(): 179 | return send_message_to_api( 180 | task=task, 181 | messages=messages, 182 | api_key=api_key, 183 | tools=tools, 184 | model=model, 185 | temperature=temperature, 186 | top_p=top_p, 187 | max_tokens=max_tokens, 188 | api_url=api_url, 189 | verbose=verbose, 190 | is_first_step=is_first_step, 191 | ) 192 | 193 | candidates = [] 194 | with concurrent.futures.ThreadPoolExecutor(max_workers=num_candidates) as executor: 195 | print(f"{Fore.CYAN}Starting parallel candidate generation...{Style.RESET_ALL}") 196 | future_to_candidate = { 197 | executor.submit(generate_candidate): i for i in range(num_candidates) 198 | } 199 | for future in concurrent.futures.as_completed(future_to_candidate): 200 | try: 201 | candidate = future.result() 202 | candidates.append(candidate) 203 | print( 204 | f"{Fore.GREEN}Successfully generated candidate {len(candidates)}/{num_candidates}{Style.RESET_ALL}" 205 | ) 206 | except Exception as e: 207 | print( 208 | f"{Fore.RED}Error generating candidate: {str(e)}{Style.RESET_ALL}" 209 | ) 210 | 211 | print( 212 | f"{Fore.GREEN}Generated {len(candidates)} candidates successfully{Style.RESET_ALL}" 213 | ) 214 | return candidates 215 | 216 | 217 | def generate_best_candidate( 218 | task: str, 219 | messages: List[Dict], 220 | api_key: str, 221 | tools: List[Dict], 222 | num_candidates: int = 3, 223 | model: str = "gpt-4o-mini", 224 | temperature: float = 0.7, 225 | top_p: float = 1.0, 226 | max_tokens: int = 500, 227 | api_url: str = "https://openrouter.ai/api/v1/chat/completions", 228 | verbose: bool = False, 229 | is_first_step: bool = False, 230 | ) -> Dict: 231 | """ 232 | Generate a list of candidate responses and return the best one. 
233 | """ 234 | print(f"\n{Fore.CYAN}╭──────────────────────────────────────────{Style.RESET_ALL}") 235 | print(f"{Fore.CYAN}│ Starting Best Candidate Selection{Style.RESET_ALL}") 236 | print(f"{Fore.CYAN}╰──────────────────────────────────────────{Style.RESET_ALL}") 237 | 238 | candidates = generate_multiple_candidates( 239 | task, 240 | messages, 241 | api_key, 242 | tools, 243 | num_candidates, 244 | model, 245 | temperature, 246 | top_p, 247 | max_tokens, 248 | api_url, 249 | verbose, 250 | is_first_step, 251 | ) 252 | 253 | print(f"\n{Fore.YELLOW}Generated Candidates:{Style.RESET_ALL}") 254 | print(f"{Fore.YELLOW}{candidates}{Style.RESET_ALL}") 255 | 256 | print(f"\n{Fore.MAGENTA}Preparing evaluation prompt...{Style.RESET_ALL}") 257 | evaluation_prompt = "" 258 | 259 | i = 1 260 | for candidate in candidates: 261 | evaluation_prompt += f"Candidate {i}:\n{candidate}\n\n" 262 | i += 1 263 | 264 | SYSTEM_PROMPT = """You are a judge tasked with evaluating the viability of multiple candidate responses to a given task. Your goal is to identify the candidate that is most likely to lead to solving the task properly. 265 | 266 | You will be given a which describes the task at hand, a section which contains the thoughts of the assistant before receiving the candidate responses, and a section which contains the candidate responses to be evaluated. 267 | 268 | Evaluate the viability of each candidate response and output the number of the candidate that is most likely to lead to solving the task properly. 269 | 270 | Do so in the following format: 271 | 272 | Think through the viability of each candidate here. 273 | 274 | 275 | 276 | Number of the best candidate 277 | 278 | """ 279 | 280 | evaluation_prompt += f"""{task} 281 | 282 | 283 | {messages} 284 | 285 | 286 | 287 | {evaluation_prompt} 288 | 289 | 290 | Think it through inside the section, and then output the number of the candidate that is most likely to lead to solving the properly in the section. In the section, only output the number, nothing else. 
Possible numbers are: {', '.join(str(i) for i in range(1, num_candidates + 1))}""" 291 | 292 | print(f"\n{Fore.BLUE}Sending evaluation request to API...{Style.RESET_ALL}") 293 | best_candidate_response = send_message_to_api( 294 | task="", 295 | messages=[ 296 | {"role": "system", "content": SYSTEM_PROMPT}, 297 | {"role": "user", "content": evaluation_prompt}, 298 | ], 299 | api_key=api_key, 300 | tools=tools, 301 | ) 302 | 303 | # Parse the best candidate number from the response 304 | best_candidate_number = int( 305 | best_candidate_response["content"] 306 | .split("")[1] 307 | .split("")[0] 308 | .strip() 309 | ) 310 | 311 | print(f"\n{Fore.GREEN}Selected best candidate:{Style.RESET_ALL}") 312 | print(f"{Fore.GREEN}{best_candidate_number}{Style.RESET_ALL}") 313 | 314 | # Return the best candidate 315 | return candidates[best_candidate_number - 1] 316 | -------------------------------------------------------------------------------- /chain_store.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import requests 4 | import numpy as np 5 | from typing import List, Dict, Optional 6 | from datetime import datetime 7 | 8 | def init_store(store_file: str = "successful_chains.json") -> None: 9 | """Initialize the chain store if it doesn't exist.""" 10 | if not os.path.exists(store_file): 11 | with open(store_file, 'w') as f: 12 | json.dump({"chains": []}, f) 13 | 14 | def get_embedding(text: str, cohere_api_key: str, input_type: str = "search_document") -> Optional[List[float]]: 15 | """Get embeddings from Cohere API.""" 16 | try: 17 | response = requests.post( 18 | "https://api.cohere.ai/v1/embed", 19 | headers={ 20 | "Authorization": f"Bearer {cohere_api_key}", 21 | "Content-Type": "application/json" 22 | }, 23 | json={ 24 | "texts": [text], 25 | "model": "embed-english-v3.0", 26 | "input_type": input_type, 27 | "embedding_type": "float" 28 | } 29 | ) 30 | response.raise_for_status() 31 | return response.json()["embeddings"][0] 32 | except Exception as e: 33 | print(f"Error getting embedding: {e}") 34 | return None 35 | 36 | def cosine_similarity(a: List[float], b: List[float]) -> float: 37 | """Calculate cosine similarity between two vectors.""" 38 | return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)) 39 | 40 | def save_successful_chain( 41 | task: str, 42 | conversation_history: List[Dict], 43 | final_response: str, 44 | cohere_api_key: str, 45 | thinking_tools: List[Dict], 46 | output_tools: List[Dict], 47 | metadata: Dict, 48 | store_file: str = "successful_chains.json" 49 | ) -> bool: 50 | """Save a successful chain to the store.""" 51 | try: 52 | # Get embedding for the task 53 | embedding = get_embedding(task, cohere_api_key) 54 | if not embedding: 55 | return False 56 | 57 | # Initialize store if it doesn't exist 58 | if not os.path.exists(store_file): 59 | store = {"chains": []} 60 | else: 61 | try: 62 | with open(store_file, 'r') as f: 63 | store = json.load(f) 64 | except json.JSONDecodeError: 65 | # If file exists but is invalid JSON, start fresh 66 | store = {"chains": []} 67 | 68 | # Process conversation history to redact long tool responses 69 | processed_history = [] 70 | for msg in conversation_history: 71 | if msg['role'] == 'tool' and len(msg['content']) > 1500: 72 | msg = msg.copy() # Create a copy to avoid modifying the original 73 | msg['content'] = "[redacted for token savings]" 74 | processed_history.append(msg) 75 | 76 | # Add new chain 77 | chain = { 78 | "task": task, 79 | "embedding": 
embedding, 80 | "conversation_history": processed_history, 81 | "final_response": final_response, 82 | "thinking_tools": thinking_tools, 83 | "output_tools": output_tools, 84 | "timestamp": datetime.now().isoformat(), 85 | "metadata": metadata 86 | } 87 | store["chains"].append(chain) 88 | 89 | # Save updated store 90 | with open(store_file, 'w') as f: 91 | json.dump(store, f, indent=2) 92 | 93 | return True 94 | except Exception as e: 95 | print(f"Error saving chain: {str(e)}") # More detailed error message 96 | return False 97 | 98 | def get_similar_chains( 99 | task: str, 100 | cohere_api_key: str, 101 | n: int = 3, 102 | store_file: str = "successful_chains.json" 103 | ) -> List[Dict]: 104 | """Get n most similar chains for a given task.""" 105 | try: 106 | # Get embedding for the query task 107 | query_embedding = get_embedding(task, cohere_api_key, input_type="search_query") 108 | if not query_embedding: 109 | return [] 110 | 111 | # Load chains 112 | with open(store_file, 'r') as f: 113 | store = json.load(f) 114 | 115 | # Calculate similarities 116 | similarities = [] 117 | for chain in store["chains"]: 118 | similarity = cosine_similarity(query_embedding, chain["embedding"]) 119 | similarities.append((similarity, chain)) 120 | 121 | # Sort by similarity and get top n 122 | similarities.sort(reverse=True, key=lambda x: x[0]) 123 | result = [chain for _, chain in similarities[:n]] 124 | return result 125 | 126 | except Exception as e: 127 | return [] 128 | 129 | def prepare_examples_messages(similar_chains: List[Dict], current_tools: List[Dict]) -> List[Dict]: 130 | """ 131 | Prepare example chains as messages for the prompt. 132 | Now includes information about available tools. 133 | """ 134 | if not similar_chains: 135 | return [] 136 | 137 | messages = [] 138 | for chain in similar_chains: 139 | # Get the tool names for both current and historical tools 140 | current_tool_names = {t['function']['name'] for t in current_tools} 141 | historical_tool_names = {t['function']['name'] for t in chain.get('tools', [])} 142 | 143 | # Create tool availability message 144 | tool_message = "Available tools in this example:" 145 | for tool_name in historical_tool_names: 146 | status = "✓" if tool_name in current_tool_names else "✗" 147 | tool_message += f"\n- {tool_name} {status}" 148 | 149 | # Add system message with the example task and tool information 150 | messages.append({ 151 | "role": "system", 152 | "content": ( 153 | "\n" 154 | f"{chain['task']}\n\n" 155 | f"{tool_message}\n\n" 156 | "\n" 157 | "Slow down your thinking by breaking complex questions into multiple reasoning steps.\n" 158 | "Each individual reasoning step should be brief.\n" 159 | "Return after the last step." 
160 | ) 161 | }) 162 | 163 | # Add the conversation history 164 | messages.extend(chain["conversation_history"]) 165 | 166 | # For each message, replace any instance of the substring TASK with EXAMPLE_TASK 167 | for i, msg in enumerate(messages): 168 | if 'TASK' in msg['content']: 169 | messages[i]['content'] = msg['content'].replace('CURRENT_TASK', 'EXAMPLE_TASK') 170 | messages[i]['content'] = msg['content'].replace('TASK', 'EXAMPLE_TASK') 171 | messages[i]['content'] = messages[i]['content'].replace('EXAMPLE_EXAMPLE_TASK', 'EXAMPLE_TASK') 172 | 173 | return messages -------------------------------------------------------------------------------- /chat_loop.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | from typing import List, Dict, Optional 4 | import os 5 | from dotenv import load_dotenv 6 | 7 | # Load environment variables 8 | load_dotenv() 9 | 10 | def call_reason_api( 11 | task: str, 12 | previous_chains: Optional[List[List[Dict]]] = None, 13 | api_key: Optional[str] = None, 14 | model: str = "openai/gpt-4o-mini", 15 | api_url: str = "https://openrouter.ai/api/v1/chat/completions", 16 | verbose: bool = True 17 | ) -> tuple[Dict, List[Dict]]: 18 | """Call the reasoning API and return the response and chain.""" 19 | 20 | url = "http://localhost:5050/reason" 21 | 22 | payload = { 23 | "task": task, 24 | "api_key": "[redacted]", 25 | "model": "openai/gpt-4o-mini", 26 | "api_url": "https://openrouter.ai/api/v1/chat/completions", 27 | "verbose": verbose 28 | } 29 | 30 | if previous_chains: 31 | payload["previous_chains"] = previous_chains 32 | 33 | try: 34 | response = requests.post(url, json=payload) 35 | response.raise_for_status() 36 | data = response.json() 37 | return data["response"], data["reasoning_chain"] 38 | 39 | except requests.exceptions.RequestException as e: 40 | print(f"Error calling API: {e}") 41 | if hasattr(e.response, 'text'): 42 | print(f"Response text: {e.response.text}") 43 | raise 44 | 45 | def main(): 46 | print("Welcome to the reasoning chat loop!") 47 | print("Type 'exit' to quit, 'clear' to start a new conversation.") 48 | print("Enter your message:") 49 | 50 | conversation_chains = [] 51 | 52 | while True: 53 | user_input = input("\n> ").strip() 54 | 55 | if user_input.lower() == 'exit': 56 | break 57 | 58 | if user_input.lower() == 'clear': 59 | conversation_chains = [] 60 | print("\nConversation cleared. 
Starting fresh!") 61 | continue 62 | 63 | try: 64 | # Call API with previous conversation chains 65 | response, chain = call_reason_api( 66 | task=user_input, 67 | previous_chains=conversation_chains 68 | ) 69 | 70 | # Print the response 71 | if isinstance(response, dict): 72 | if response.get('content'): 73 | print("\nAssistant:", response['content']) 74 | if response.get('tool_calls'): 75 | print("\nTool Calls:", json.dumps(response['tool_calls'], indent=2)) 76 | else: 77 | print("\nAssistant:", response) 78 | 79 | # Add this chain to our conversation history 80 | conversation_chains.append(chain) 81 | 82 | # Print conversation stats 83 | print(f"\n(Conversation history: {len(conversation_chains)} chains)") 84 | 85 | except Exception as e: 86 | print(f"\nError: {e}") 87 | 88 | if __name__ == "__main__": 89 | main() -------------------------------------------------------------------------------- /engine.py: -------------------------------------------------------------------------------- 1 | import os 2 | import requests 3 | from e2b_code_interpreter import Sandbox 4 | from typing import List, Dict, Optional, Tuple, Union 5 | from colorama import init, Fore, Style 6 | from tools import execute_tool, clear_interpreter_state 7 | import json 8 | from datetime import datetime 9 | from chain_store import ( 10 | get_similar_chains, 11 | prepare_examples_messages 12 | ) 13 | from planner import generate_plan # Add this import at the top 14 | from call_ai import send_message_to_api, generate_best_candidate 15 | from helpers import validate_conversation 16 | 17 | # Initialize colorama for cross-platform colored output 18 | init() 19 | 20 | def thinking_loop( 21 | task: str, 22 | api_key: str, 23 | tools: List[Dict], 24 | model: str = 'gpt-4o-mini', 25 | temperature: float = 0.7, 26 | top_p: float = 1.0, 27 | max_tokens: int = 500, 28 | api_url: str = 'https://api.openai.com/v1/chat/completions', 29 | verbose: bool = False, 30 | chain_store_api_key: Optional[str] = None, 31 | wolfram_app_id: Optional[str] = None, 32 | max_reasoning_steps: Optional[int] = None, 33 | sandbox: Optional[Sandbox] = None, 34 | image: Optional[str] = None, 35 | reflection_mode: bool = False, 36 | previous_chains: Optional[List[List[Dict]]] = None, 37 | use_planning: bool = True, 38 | beam_search_enabled: bool = False, 39 | num_candidates: int = 1, 40 | use_jeremy_planning: bool = False, 41 | jina_api_key: Optional[str] = None 42 | ) -> List[Dict]: 43 | """ 44 | Execute the thinking loop and return the conversation history. 45 | Uses planning from memory to guide reasoning. 
46 | """ 47 | conversation_history = [] 48 | continue_loop = True 49 | step_count = 1 50 | 51 | if verbose: 52 | print(f"\n{Fore.MAGENTA}╭──────────────────────────────────────────{Style.RESET_ALL}") 53 | print(f"{Fore.MAGENTA}│ Starting Thinking Loop{Style.RESET_ALL}") 54 | if max_reasoning_steps: 55 | print(f"{Fore.MAGENTA}│ Maximum steps: {max_reasoning_steps}{Style.RESET_ALL}") 56 | print(f"{Fore.MAGENTA}╰──────────────────────────────────────────{Style.RESET_ALL}\n") 57 | 58 | # Get similar chains and generate plan 59 | action_plan = "" 60 | if chain_store_api_key and use_planning: 61 | similar_chains = get_similar_chains(task, chain_store_api_key) 62 | if similar_chains: 63 | action_plan = generate_plan( 64 | task=task, 65 | similar_chains=similar_chains, 66 | current_tools=tools, 67 | api_key=api_key, 68 | model=model, 69 | api_url=api_url, 70 | verbose=verbose, 71 | metadata={ 72 | "model": model, 73 | "max_steps": max_reasoning_steps, 74 | "reflection_mode": reflection_mode 75 | } 76 | ) 77 | 78 | # Add previous chains directly to the conversation history 79 | if previous_chains: 80 | for chain in previous_chains: 81 | conversation_history.extend(chain) 82 | 83 | # Create the system message for the current task 84 | tool_list = [] 85 | tool_list.append("find_datapoint_on_web: Search Google using SERPAPI to find factual information. Returns top search results with titles, snippets, and URLs.") 86 | tool_list.append("python: For executing Python code") 87 | 88 | if wolfram_app_id: 89 | tools.append("wolfram: Query Wolfram Alpha for precise mathematical, scientific, and factual computations") 90 | 91 | if jina_api_key: 92 | tools.append("get_webpage_content: Retrieve detailed content from specific webpages using Jina API. Use this when you want to read the full content of a webpage") 93 | 94 | tools_description = "You have access to these tools:\n" + "\n".join(f"{i+1}. 
{tool}" for i, tool in enumerate(tools)) 95 | 96 | # Include the generated plan in the system message 97 | plan_section = "" 98 | if action_plan: 99 | plan_section = f"\n\n{action_plan}\n\n" 100 | 101 | # Update the web search instructions in the system message 102 | web_search_instructions = ( 103 | "\nWhen searching the web:\n" 104 | "- The find_datapoint_on_web tool uses SERPAPI to search Google with enhanced results\n" 105 | "- Results may include knowledge graph data, featured snippets, and detailed summaries\n" 106 | "- Each result contains multiple sections including titles, snippets, and structured data\n" 107 | "- Make queries specific and focused on finding factual information\n" 108 | "- Use keywords rather than full questions for better search results\n" 109 | "- Cross-reference information from multiple sources when possible\n" 110 | "- If initial results don't contain enough detail, try searching with different keywords\n" 111 | "- Always cite sources when providing information from search results\n" 112 | ) 113 | 114 | system_message = { 115 | 'role': 'system', 116 | 'content': ( 117 | f"\n{task}\n\n" 118 | f"{plan_section}" 119 | "\n" 120 | "Slow down your thinking by breaking complex questions into multiple reasoning steps.\n" 121 | "Each individual reasoning step should be brief.\n" 122 | f"{tools_description}\n\n" 123 | "When you need to write or test Python code, use the python tool.\n" 124 | "When you need to search for information, use the find_datapoint_on_web tool.\n" 125 | + ( 126 | "When you need precise mathematical or scientific computations, use the wolfram tool.\n" 127 | if wolfram_app_id else "" 128 | ) + 129 | f"{web_search_instructions}\n" 130 | "\nWhen writing Python code:\n" 131 | "- If your code produces an error, add print statements to debug the issue\n" 132 | "- Use assertions/prints to validate inputs, intermediate results, and outputs\n" 133 | "- Print the state to see what's happening\n" 134 | "- When an error occurs, systematically add checks to identify where the problem is\n" 135 | "- Structure your debugging process step by step\n" 136 | + ( 137 | "\nWhen using Wolfram Alpha:\n" 138 | "- Use for precise mathematical calculations and scientific data\n" 139 | "- Phrase queries clearly and specifically\n" 140 | "- Great for unit conversions, equations, and factual data\n" 141 | if wolfram_app_id else "" 142 | ) + 143 | "\nReturn after the last step." 144 | ) 145 | } 146 | 147 | # Start with system message and previous chains 148 | full_conversation_history = conversation_history + [system_message] 149 | 150 | if image: 151 | full_conversation_history.append({ 152 | 'role': 'user', 153 | 'content': [ 154 | { 155 | 'type': 'text', 156 | 'text': f"Here is the image the user provided:" 157 | }, 158 | { 159 | 'type': 'image_url', 160 | 'image_url': { 161 | 'url': image 162 | } 163 | } 164 | ] 165 | }) 166 | 167 | # Add initial planning step request 168 | if use_jeremy_planning: 169 | initial_planning_message = { 170 | 'role': 'user', 171 | 'content': ( 172 | # 'Before we begin solving the task, let\'s create a detailed plan. Please:\n' 173 | # '1. Break down the task into clear sub-goals\n' 174 | # '2. Identify which tools will be needed for each sub-goal\n' 175 | # '3. Outline potential challenges and how to address them\n' 176 | # '4. Determine verification criteria for each sub-goal\n' 177 | # '5. 
Most importantly, generate a suite of test cases for each sub-goal, as well as test cases for the overall task\n' 178 | # "In this planning step, make it very clear that until each test case is verified, we should not proceed with the actual solution.\n" 179 | # 'Provide a structured plan before we proceed with the actual solution.' 180 | "Before we move on, make a list of wrong assumptions people sometimes make about the concepts included in the question." 181 | ) 182 | } 183 | conversation_history.append(initial_planning_message) 184 | full_conversation_history.append(initial_planning_message) 185 | 186 | # Get planning response 187 | planning_response = send_message_to_api( 188 | task, 189 | full_conversation_history, 190 | api_key, 191 | tools, 192 | model, 193 | temperature, 194 | top_p, 195 | max_tokens, 196 | api_url, 197 | verbose, 198 | tool_choice="none", 199 | ) 200 | 201 | # Add planning response to histories 202 | planning_message = { 203 | 'role': 'assistant', 204 | 'content': planning_response.get('content'), 205 | 'tool_calls': planning_response.get('tool_calls', None) 206 | } 207 | conversation_history.append(planning_message) 208 | full_conversation_history.append(planning_message) 209 | 210 | while continue_loop: 211 | # Check if we've exceeded max steps 212 | if max_reasoning_steps and step_count > max_reasoning_steps: 213 | if verbose: 214 | print(f"\n{Fore.YELLOW}Maximum reasoning steps ({max_reasoning_steps}) reached. Forcing completion.{Style.RESET_ALL}") 215 | 216 | # Add a system message explaining the forced stop 217 | force_stop_message = { 218 | 'role': 'system', 219 | 'content': ( 220 | f"Maximum reasoning steps ({max_reasoning_steps}) reached. " 221 | ) 222 | } 223 | conversation_history.append(force_stop_message) 224 | full_conversation_history.append(force_stop_message) 225 | 226 | # Add a user message requesting the final answer 227 | final_user_message = { 228 | 'role': 'user', 229 | 'content': ( 230 | 'Based on your reasoning so far, provide your final answer to the CURRENT_TASK. ' 231 | 'Make your response complete and self-contained since this will be shown to the user.' 232 | "Please provide your final answer based on what you've learned so far. " 233 | "Do not return , and **you are not allowed to use any tools**. Just respond with your final answer." 
234 | ) 235 | } 236 | conversation_history.append(final_user_message) 237 | full_conversation_history.append(final_user_message) 238 | 239 | # Get final response when hitting max steps 240 | response = send_message_to_api( 241 | task, 242 | full_conversation_history, 243 | api_key, 244 | tools, 245 | model, 246 | temperature, 247 | top_p, 248 | max_tokens, 249 | api_url, 250 | verbose 251 | ) 252 | print('Final response:', response) 253 | 254 | # Add the final response to histories 255 | assistant_message = { 256 | 'role': 'assistant', 257 | 'content': response.get('content'), 258 | 'tool_calls': response.get('tool_calls', None) 259 | } 260 | conversation_history.append(assistant_message) 261 | full_conversation_history.append(assistant_message) 262 | 263 | if verbose and response.get('content'): 264 | print(f"\n{Fore.GREEN}Final Response after max steps:{Style.RESET_ALL}") 265 | print(response.get('content')) 266 | 267 | # Return here to skip the additional final response request 268 | return full_conversation_history 269 | 270 | if verbose: 271 | print(f"\n{Fore.BLUE}Step {step_count}{Style.RESET_ALL}") 272 | print(f"{Fore.BLUE}{'─' * 40}{Style.RESET_ALL}") 273 | 274 | # Determine which message to send based on reflection mode and step count 275 | if reflection_mode and step_count % 2 == 0: 276 | # Even steps in reflection mode are for reflection 277 | user_message = { 278 | 'role': 'user', 279 | 'content': ( 280 | 'Reflect on your last step — check for mistakes. ' 281 | 'Consider:\n' 282 | '1. Are your assumptions valid and well-justified?\n' 283 | '2. Did you make any logical errors or jumps in reasoning?\n' 284 | '3. Is there a more effective or efficient approach?\n' 285 | 'Explain your analysis, whether you find issues or confirm the step was sound.\n' 286 | 'Do not make a snap decision. Think carefully before deciding if the step is free of mistakes.\n' 287 | 'Be brief and to the point.\n' 288 | 'If this is the final step, return .' 289 | ) 290 | } # Note — these reflection steps are often a bit long, which may lead to the non-reflection steps doing more work per step than they should. Figure this out later. 291 | else: 292 | if False: # until we've perfected this, let's not use it (it seems to slightly reduce performance, interestingly) 293 | user_message = { 294 | 'role': 'user', 295 | 'content': ( 296 | 'Think about your first reasoning step to perform the CURRENT_TASK. ' 297 | 'Return just the first step. ' 298 | 'Remember, steps should be very brief. ' 299 | ) 300 | } 301 | else: 302 | # Odd steps or non-reflection mode use the original message 303 | user_message = { 304 | 'role': 'user', 305 | 'content': ( 306 | 'Think about your next reasoning step to perform the CURRENT_TASK. ' 307 | 'Return just the next step. ' 308 | 'Remember, steps should be very brief. ' 309 | 'If this is the final step, return .' 310 | # """Think about your next reasoning step. Consider: 311 | # 1. What did you observe in the previous step's results? 312 | # 2. What needs to be validated or corrected based on those results? 313 | # 3. What's the most logical next step to make progress? 314 | # Return a brief step focused on making concrete progress. 
315 | # If this is the final step, return .""" 316 | ) 317 | } 318 | 319 | # Add to both conversation histories 320 | conversation_history.append(user_message) 321 | full_conversation_history.append(user_message) 322 | 323 | # Get response from AI API 324 | if beam_search_enabled: 325 | response = generate_best_candidate( 326 | task, 327 | full_conversation_history, 328 | api_key, 329 | tools, 330 | num_candidates, 331 | model, 332 | temperature, 333 | top_p, 334 | max_tokens, 335 | api_url, 336 | verbose, 337 | is_first_step=(step_count == 1) 338 | ) 339 | else: 340 | response = send_message_to_api( 341 | task, 342 | full_conversation_history, 343 | api_key, 344 | tools, 345 | model, 346 | temperature, 347 | top_p, 348 | max_tokens, 349 | api_url, 350 | verbose, 351 | is_first_step=(step_count == 1) 352 | ) 353 | 354 | # Add assistant's response to both histories 355 | assistant_message = { 356 | 'role': 'assistant', 357 | 'content': response.get('content'), 358 | 'tool_calls': response.get('tool_calls', None) 359 | } 360 | conversation_history.append(assistant_message) 361 | full_conversation_history.append(assistant_message) 362 | 363 | if verbose and response.get('content'): 364 | print(f"\n{Fore.GREEN}Assistant: {Style.RESET_ALL}{response['content']}") 365 | 366 | # Handle tool calls 367 | if 'tool_calls' in response and response['tool_calls']: 368 | for tool_call in response['tool_calls']: 369 | if verbose: 370 | print(f"\n{Fore.YELLOW}╭──────────────────────────────────────────{Style.RESET_ALL}") 371 | print(f"{Fore.YELLOW}│ Tool Call Detected{Style.RESET_ALL}") 372 | print(f"{Fore.YELLOW}├──────────────────────────────────────────{Style.RESET_ALL}") 373 | 374 | try: 375 | # Execute tool and get result 376 | tool_name = tool_call['function']['name'] 377 | 378 | # Add error handling for argument parsing 379 | try: 380 | if 'arguments' not in tool_call['function'] or not tool_call['function']['arguments']: 381 | error_msg = "No arguments provided in tool call" 382 | if verbose: 383 | print(f"{Fore.RED}{error_msg}{Style.RESET_ALL}") 384 | raise ValueError(error_msg) 385 | 386 | arguments = json.loads(tool_call['function']['arguments']) 387 | 388 | except json.JSONDecodeError as e: 389 | error_msg = f"Invalid JSON in tool arguments: {tool_call['function'].get('arguments', 'NO_ARGS')}" 390 | if verbose: 391 | print(f"{Fore.RED}{error_msg}{Style.RESET_ALL}") 392 | print(f"{Fore.RED}Error: {str(e)}{Style.RESET_ALL}") 393 | raise ValueError(error_msg) 394 | 395 | if verbose: 396 | print(f"{Fore.YELLOW}│ Tool: {Style.RESET_ALL}{tool_name}") 397 | print(f"{Fore.YELLOW}│ Arguments: {Style.RESET_ALL}{json.dumps(arguments, indent=2)}") 398 | 399 | result = execute_tool( 400 | tool_name, 401 | arguments, 402 | task=task, 403 | api_key=api_key, 404 | model=model, 405 | api_url=api_url, 406 | wolfram_app_id=wolfram_app_id, 407 | sandbox=sandbox, 408 | jina_api_key=jina_api_key 409 | ) 410 | 411 | # Add tool result to both histories 412 | tool_message = { 413 | 'role': 'tool', 414 | 'tool_call_id': tool_call['id'], 415 | 'content': str(result) 416 | } 417 | conversation_history.append(tool_message) 418 | full_conversation_history.append(tool_message) 419 | 420 | if verbose: 421 | print(f"{Fore.YELLOW}│ Result: {Style.RESET_ALL}{result}") 422 | print(f"{Fore.YELLOW}╰──────────────────────────────────────────{Style.RESET_ALL}\n") 423 | 424 | except Exception as e: 425 | error_msg = str(e) 426 | if verbose: 427 | print(f"{Fore.RED}Error executing tool: {error_msg}{Style.RESET_ALL}") 428 | 429 | # Add 
error message to conversation history so model can correct its approach 430 | error_message = { 431 | 'role': 'tool', 432 | 'content': ( 433 | f"Error using {tool_name} tool: {error_msg}\n" 434 | "Please correct your approach and try again." 435 | ), 436 | 'tool_call_id': tool_call['id'] 437 | } 438 | conversation_history.append(error_message) 439 | full_conversation_history.append(error_message) 440 | continue 441 | 442 | # Check for termination conditions 443 | if response.get('content'): 444 | termination_phrases = [ 445 | '', 'done', 'there is no next step.', 446 | 'this conversation is complete', 'the conversation has ended.', 447 | 'this conversation is finished.', 'the conversation has concluded.' 448 | ] 449 | 450 | if any(term in response['content'].lower() for term in termination_phrases): 451 | if verbose: 452 | print(f"\n{Fore.MAGENTA}╭──────────────────────────────────────────{Style.RESET_ALL}") 453 | print(f"{Fore.MAGENTA}│ Thinking Loop Complete{Style.RESET_ALL}") 454 | print(f"{Fore.MAGENTA}│ Total Steps: {step_count}{Style.RESET_ALL}") 455 | print(f"{Fore.MAGENTA}╰──────────────────────────────────────────{Style.RESET_ALL}\n") 456 | continue_loop = False 457 | 458 | step_count += 1 459 | 460 | return full_conversation_history 461 | 462 | def complete_reasoning_task( 463 | task: str, 464 | api_key: Optional[str] = None, 465 | model: str = 'gpt-4o-mini', 466 | temperature: float = 0.7, 467 | top_p: float = 1.0, 468 | max_tokens: int = 3000, 469 | api_url: str = 'https://api.openai.com/v1/chat/completions', 470 | verbose: bool = False, 471 | log_conversation: bool = False, 472 | chain_store_api_key: Optional[str] = None, 473 | wolfram_app_id: Optional[str] = None, 474 | max_reasoning_steps: Optional[int] = None, 475 | image: Optional[str] = None, 476 | output_tools: Optional[List[Dict]] = None, 477 | reflection_mode: bool = False, 478 | previous_chains: Optional[List[List[Dict]]] = None, 479 | use_planning: bool = False, 480 | beam_search_enabled: bool = False, 481 | num_candidates: int = 1, 482 | use_jeremy_planning: bool = False, 483 | jina_api_key: Optional[str] = None 484 | ) -> Tuple[Union[str, Dict], List[Dict], List[Dict], List[Dict]]: 485 | """ 486 | Execute the reasoning task and return the final response. 487 | Now supports optional structured output via output_tools, reflection mode, 488 | and previous conversation chains. 489 | """ 490 | sandbox = None 491 | try: 492 | # Clear Python interpreter state for just this task 493 | clear_interpreter_state(task=task) 494 | 495 | if api_key is None: 496 | raise ValueError('API key not provided.') 497 | 498 | if verbose: 499 | print(f"\n{Fore.MAGENTA}╭──────────────────────────────────────────{Style.RESET_ALL}") 500 | print(f"{Fore.MAGENTA}│ Starting Task{Style.RESET_ALL}") 501 | print(f"{Fore.MAGENTA}├──────────────────────────────────────────{Style.RESET_ALL}") 502 | print(f"{Fore.MAGENTA}│ {task}{Style.RESET_ALL}") 503 | if previous_chains: 504 | print(f"{Fore.MAGENTA}│ With {len(previous_chains)} previous conversation chains{Style.RESET_ALL}") 505 | print(f"{Fore.MAGENTA}╰──────────────────────────────────────────{Style.RESET_ALL}\n") 506 | 507 | # Initialize E2B sandbox for Python code execution 508 | timeout = 60 * 15 # 10 minutes 509 | for attempt in range(3): # Try 3 times 510 | try: 511 | sandbox = Sandbox(timeout=timeout) 512 | break # If successful, exit the loop 513 | except Exception as e: 514 | if attempt == 2: # If this was the last attempt 515 | raise Exception(f"Failed to create sandbox after 3 attempts. 
Last error: {e}") 516 | continue 517 | 518 | # Define thinking tools (internal tools that can be used during reasoning) 519 | thinking_tools = [ 520 | { 521 | "type": "function", 522 | "function": { 523 | "name": "python", 524 | "description": "Execute Python code and return the output.", 525 | "parameters": { 526 | "type": "object", 527 | "properties": { 528 | "code": { 529 | "type": "string", 530 | "description": "The Python code to execute" 531 | }, 532 | }, 533 | "required": ["code"] 534 | } 535 | } 536 | }, 537 | { 538 | "type": "function", 539 | "function": { 540 | "name": "find_datapoint_on_web", 541 | "description": "Search Google using SERPAPI to find factual information. Returns top search results with titles, snippets, and URLs.", 542 | "parameters": { 543 | "type": "object", 544 | "properties": { 545 | "query": { 546 | "type": "string", 547 | "description": "The specific query" 548 | } 549 | }, 550 | "required": ["query"] 551 | } 552 | } 553 | } 554 | ] 555 | 556 | # Add Wolfram tool if wolfram_app_id is provided 557 | if wolfram_app_id: 558 | thinking_tools.append({ 559 | "type": "function", 560 | "function": { 561 | "name": "wolfram", 562 | "description": "Query Wolfram Alpha for computations, math, science, and knowledge. Great for mathematical analysis, scientific calculations, data analysis, and fact-checking.", 563 | "parameters": { 564 | "type": "object", 565 | "properties": { 566 | "query": { 567 | "type": "string", 568 | "description": "The query to send to Wolfram Alpha. Be specific and precise." 569 | }, 570 | "include_pods": { 571 | "type": "array", 572 | "items": { 573 | "type": "string" 574 | }, 575 | "description": "Optional list of pod names to include (e.g., ['Result', 'Solution', 'Plot']). Leave empty for all pods.", 576 | "default": None 577 | }, 578 | "max_width": { 579 | "type": "integer", 580 | "description": "Maximum width for plots/images", 581 | "default": 1000 582 | } 583 | }, 584 | "required": ["query"] 585 | } 586 | } 587 | }) 588 | 589 | # Add Jina tool if jina_api_key is provided 590 | if jina_api_key: 591 | thinking_tools.append({ 592 | "type": "function", 593 | "function": { 594 | "name": "get_webpage_content", 595 | "description": "Retrieve the content of a webpage using Jina API. 
Useful for reading detailed content from search results or specific URLs.", 596 | "parameters": { 597 | "type": "object", 598 | "properties": { 599 | "url": { 600 | "type": "string", 601 | "description": "The URL of the webpage to fetch content from" 602 | } 603 | }, 604 | "required": ["url"] 605 | } 606 | } 607 | }) 608 | 609 | # Add output tools description 610 | output_tools_description = "" 611 | if output_tools: 612 | output_tools_description = "\n\nWhen providing your final response, you can use these output functions (but you don't have access to them during reasoning steps):\n" 613 | for tool in output_tools: 614 | output_tools_description += f"- {tool['function']['name']}: {tool['function']['description']}\n" 615 | 616 | # Create initial conversation history with previous chains 617 | conversation_history = [] 618 | if previous_chains: 619 | for chain in previous_chains: 620 | conversation_history.extend(chain) 621 | 622 | # Run thinking loop with thinking tools 623 | conversation_history = thinking_loop( 624 | task, 625 | api_key, 626 | thinking_tools, 627 | model, 628 | temperature, 629 | top_p, 630 | max_tokens, 631 | api_url, 632 | verbose, 633 | chain_store_api_key=chain_store_api_key, 634 | wolfram_app_id=wolfram_app_id, 635 | max_reasoning_steps=max_reasoning_steps, 636 | sandbox=sandbox, 637 | image=image, 638 | reflection_mode=reflection_mode, 639 | previous_chains=previous_chains, 640 | use_planning=use_planning, 641 | beam_search_enabled=beam_search_enabled, 642 | num_candidates=num_candidates, 643 | use_jeremy_planning=use_jeremy_planning, 644 | jina_api_key=jina_api_key 645 | ) 646 | 647 | # Only request final response if we didn't hit max steps 648 | final_response = None 649 | if not max_reasoning_steps or len([m for m in conversation_history if m['role'] == 'system' and 'Maximum reasoning steps' in m.get('content', '')]) == 0: 650 | # Add final completion request 651 | final_user_message = { 652 | 'role': 'user', 653 | 'content': ( 654 | 'Complete the <TASK>. Do not return <DONE>. ' 655 | 'Note that the user will only see what you return here. ' 656 | 'None of the steps you have taken will be shown to the user, so ensure you return the final answer. ' 657 | + ('You can return a text response and/or use one of the available output functions.' if output_tools else '') 658 | ) 659 | } 660 | conversation_history.append(final_user_message) 661 | 662 | if verbose: 663 | print(f"{Fore.CYAN}Requesting final response...{Style.RESET_ALL}\n") 664 | 665 | # Get final response with output tools if provided 666 | 667 | # Wrapping in try/except to catch any errors and try again with validated conversation history — for now... 
just because I'm not 100% sure if the validation is working and I don't want to risk messing up already solid chains 668 | try: 669 | final_response = send_message_to_api( 670 | task, 671 | conversation_history, 672 | api_key, 673 | output_tools if output_tools else thinking_tools, # Use output tools for final response if provided 674 | model, 675 | temperature, 676 | top_p, 677 | max_tokens, 678 | api_url, 679 | verbose 680 | ) 681 | except Exception as e: 682 | print(f"{Fore.RED}Error sending final response: {e}{Style.RESET_ALL}") 683 | print(f"{Fore.YELLOW}Trying again with validated conversation history...{Style.RESET_ALL}") 684 | final_response = send_message_to_api( 685 | task, 686 | validate_conversation(conversation_history), 687 | api_key, 688 | output_tools if output_tools else thinking_tools, 689 | model, 690 | temperature, 691 | top_p, 692 | max_tokens, 693 | api_url, 694 | verbose 695 | ) 696 | 697 | # Add the final response to the conversation history 698 | assistant_message = { 699 | 'role': 'assistant', 700 | 'content': final_response.get('content'), 701 | 'tool_calls': final_response.get('tool_calls', None) 702 | } 703 | conversation_history.append(assistant_message) 704 | else: 705 | # Use the last assistant message as the final response 706 | final_response = next( 707 | (msg for msg in reversed(conversation_history) 708 | if msg['role'] == 'assistant' and msg.get('content')), 709 | {'content': None} 710 | ) 711 | 712 | # Print final response if verbose 713 | if verbose and ('content' in final_response or 'tool_calls' in final_response): 714 | print(f'\n{Fore.GREEN}Final Response:{Style.RESET_ALL}') 715 | if 'content' in final_response and 'tool_calls' in final_response: 716 | print(f"Content: {final_response['content']}") 717 | print(f"Tool Calls: {final_response['tool_calls']}") 718 | elif 'content' in final_response: 719 | print(final_response['content']) 720 | else: 721 | print(final_response['tool_calls']) 722 | 723 | if 'tool_calls' in final_response: 724 | final_response_tool_calls = final_response['tool_calls'] 725 | else: 726 | final_response_tool_calls = None 727 | 728 | if 'content' in final_response: 729 | final_response_content = final_response['content'] 730 | else: 731 | final_response_content = None 732 | 733 | # Log conversation history if logging is enabled 734 | if log_conversation: 735 | # Remove example chains from conversation history by removing everything prior to the bottom-most system message 736 | ### THIS MAY NOT WORK IF WE'RE INJECTING SYSTEM MESSAGES INTO THE CHAIN (I THINK WE'RE DOING THIS, SO IT'S WORTH REVISITING)! 
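# Illustrative worked example of the slice below (hypothetical history, added for clarity, not from the original source):
# with conversation_history = [example_msg, system_msg, user_msg, assistant_msg], the bottom-most
# system message has reversed-index 2, so the slice keeps the last two entries
# [user_msg, assistant_msg], i.e. everything after that system message.
# Edge case to be aware of: if a system message is the very last entry, the index is 0
# and history[-0:] returns the entire list unchanged.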
737 | bottom_system_message_index = next((i for i, msg in enumerate(reversed(conversation_history)) if msg.get('role') == 'system'), None) 738 | if bottom_system_message_index is not None: 739 | conversation_history = conversation_history[-bottom_system_message_index:] 740 | 741 | # Create logs directory if it doesn't exist 742 | os.makedirs('logs', exist_ok=True) 743 | 744 | # Create filename with timestamp 745 | timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') 746 | filename = f'logs/conversation_{timestamp}.json' 747 | 748 | # Prepare log data 749 | log_data = { 750 | 'task': task, 751 | 'model': model, 752 | 'temperature': temperature, 753 | 'top_p': top_p, 754 | 'max_tokens': max_tokens, 755 | 'api_url': api_url, 756 | 'reasoning_chain': conversation_history, 757 | 'final_response': final_response_content, 758 | 'final_response_tool_calls': final_response_tool_calls, 759 | 'thinking_tools': thinking_tools, 760 | 'output_tools': output_tools 761 | } 762 | 763 | # Write to file 764 | try: 765 | with open(filename, 'w', encoding='utf-8') as f: 766 | json.dump(log_data, f, indent=2, ensure_ascii=False) 767 | if verbose: 768 | print(f"\n{Fore.CYAN}Conversation history logged to: {Style.RESET_ALL}{filename}") 769 | except Exception as e: 770 | if verbose: 771 | print(f"\n{Fore.RED}Failed to log conversation history: {Style.RESET_ALL}{str(e)}") 772 | 773 | return {'content': final_response_content, 'tool_calls': final_response_tool_calls}, conversation_history, thinking_tools, output_tools 774 | 775 | finally: 776 | # Clean up sandbox resources 777 | if sandbox: 778 | sandbox.kill() 779 | -------------------------------------------------------------------------------- /helpers.py: -------------------------------------------------------------------------------- 1 | def validate_conversation(history): 2 | """ 3 | Before generating the final response, ensure all tool calls have responses. 4 | If a tool call doesn't have a response, include it in the message content instead. 
5 | """ 6 | tool_call_ids = set() 7 | tool_response_ids = set() 8 | 9 | for message in history: 10 | if message.get("role") == "assistant" and message.get("tool_calls"): 11 | for tool_call in message["tool_calls"]: 12 | tool_call_ids.add(tool_call["id"]) 13 | elif message.get("role") == "tool": 14 | tool_response_ids.add(message["tool_call_id"]) 15 | 16 | # If there are unmatched tool calls, convert them to content 17 | if tool_call_ids != tool_response_ids: 18 | filtered_history = [] 19 | 20 | for message in history: 21 | if message.get("role") == "assistant" and message.get("tool_calls"): 22 | message_copy = message.copy() 23 | content = message_copy.get("content", "") 24 | 25 | # Convert unmatched tool calls to content 26 | for tool_call in message_copy["tool_calls"]: 27 | if tool_call["id"] not in tool_response_ids: 28 | tool_content = f"\nTool Call: {tool_call['function']['name']}({tool_call['function']['arguments']})" 29 | content = (content + tool_content) if content else tool_content 30 | 31 | # Only keep matched tool calls 32 | message_copy["tool_calls"] = [tc for tc in message_copy["tool_calls"] 33 | if tc["id"] in tool_response_ids] 34 | message_copy["content"] = content 35 | 36 | filtered_history.append(message_copy) 37 | else: 38 | filtered_history.append(message) 39 | 40 | return filtered_history 41 | return history -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | # main.py 2 | from dotenv import load_dotenv 3 | import os 4 | from engine import complete_reasoning_task 5 | from mixture import ensemble 6 | import chain_store 7 | 8 | load_dotenv() 9 | 10 | OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY") 11 | JINA_API_KEY = os.environ.get("JINA_API_KEY") 12 | COHERE_API_KEY = os.environ.get("COHERE_API_KEY") 13 | 14 | def save_chain_prompt() -> bool: 15 | """Prompt user if they want to save the chain.""" 16 | while True: 17 | response = input("\nWould you like to save this reasoning chain for future reference? (y/n): ").lower() 18 | if response in ['y', 'yes']: 19 | return True 20 | elif response in ['n', 'no']: 21 | return False 22 | print("Please answer 'y' or 'n'") 23 | 24 | def main(): 25 | # Initialize store 26 | chain_store.init_store() 27 | 28 | task = """ 29 | Create a Python implementation of a Red-Black Tree with the following operations: 30 | 1. Insert a node 31 | 2. Delete a node 32 | 3. Search for a node 33 | 4. Print the tree in-order 34 | 35 | The implementation should maintain all Red-Black Tree properties: 36 | - Every node is either red or black 37 | - The root is black 38 | - All leaves (NIL) are black 39 | - If a node is red, then both its children are black 40 | - Every path from root to leaves contains the same number of black nodes 41 | 42 | Test the implementation by: 43 | 1. Inserting the numbers [7, 3, 18, 10, 22, 8, 11, 26, 2, 6, 13] 44 | 2. Printing the tree structure showing colors 45 | 3. Deleting nodes 18 and 11 46 | 4. Printing the final tree structure 47 | 5. Searching for both present and non-present values 48 | 49 | Use the Python interpreter tool to implement and test this data structure. 
50 | """ 51 | 52 | model = "anthropic/claude-3.5-sonnet" 53 | api_url = "https://openrouter.ai/api/v1/chat/completions" 54 | 55 | # Run the engine 56 | response, conversation_history, thinking_tools, output_tools = complete_reasoning_task( 57 | task=task, 58 | api_key=OPENROUTER_API_KEY, 59 | model=model, 60 | api_url=api_url, 61 | verbose=True, 62 | use_planning=False, 63 | jina_api_key=JINA_API_KEY 64 | ) 65 | 66 | # Check if the run was successful (no errors in response) 67 | if isinstance(response, dict) and not response.get('error'): 68 | # Ask user if they want to save the chain 69 | if save_chain_prompt(): 70 | try: 71 | # Save the chain 72 | chain_store.save_successful_chain( 73 | task=task, 74 | conversation_history=conversation_history, 75 | final_response=response, 76 | cohere_api_key=COHERE_API_KEY, 77 | thinking_tools=thinking_tools, 78 | output_tools=output_tools, 79 | metadata={"model": model, "api_url": api_url} 80 | ) 81 | print("Chain saved successfully!") 82 | except Exception as e: 83 | print(f"Error saving chain: {str(e)}") 84 | else: 85 | print("Run contained errors - skipping chain save prompt") 86 | 87 | return response, conversation_history, thinking_tools, output_tools 88 | 89 | if __name__ == "__main__": 90 | main() -------------------------------------------------------------------------------- /mixture.py: -------------------------------------------------------------------------------- 1 | from typing import List, Dict, Optional, Tuple, Any, Union 2 | from engine import complete_reasoning_task 3 | import json 4 | from colorama import init, Fore, Style 5 | from concurrent.futures import ThreadPoolExecutor, as_completed 6 | 7 | # Initialize colorama for cross-platform colored output 8 | init() 9 | 10 | def run_agent( 11 | task: str, 12 | agent_config: Dict[str, str], 13 | verbose: bool = False, 14 | chain_store_api_key: Optional[str] = None, 15 | max_reasoning_steps: Optional[int] = None, 16 | wolfram_app_id: Optional[str] = None, 17 | default_temperature: float = 0.7, 18 | top_p: float = 1.0, 19 | max_tokens: int = 500, 20 | image: Optional[str] = None, 21 | output_tools: Optional[List[Dict]] = None, 22 | reflection_mode: bool = False 23 | ) -> Tuple[Dict[str, str], str, List[Dict], List[Dict]]: 24 | """ 25 | Run a single agent with the given configuration. 
26 | 27 | Args: 28 | task: The task to complete 29 | agent_config: Dictionary containing 'model', 'api_key', and 'api_url' 30 | verbose: Whether to show detailed output 31 | chain_store_api_key: API key for chain store if using 32 | max_reasoning_steps: Maximum number of reasoning steps for this agent 33 | wolfram_app_id: Wolfram Alpha app ID if using 34 | default_temperature: Default temperature for the model if using 35 | top_p: Top p for the model if using 36 | max_tokens: Maximum number of tokens for the model if using 37 | image: Optional image to pass to the model if using 38 | output_tools: Optional list of output tools for the model if using 39 | reflection_mode: Whether to enable reflection mode for this agent 40 | Returns: 41 | Tuple of (agent_config, final_response, conversation_history, thinking_tools, output_tools) 42 | """ 43 | # Reinitialize colorama for this process 44 | init(autoreset=True) 45 | 46 | if verbose: 47 | print(f"\n{Fore.CYAN}Running agent with model: {Style.RESET_ALL}{agent_config['model']}") 48 | if max_reasoning_steps: 49 | print(f"{Fore.CYAN}Max steps: {Style.RESET_ALL}{max_reasoning_steps}") 50 | print(f"{Fore.CYAN}Temperature: {Style.RESET_ALL}{agent_config.get('temperature', default_temperature)}") 51 | 52 | if verbose and reflection_mode: 53 | print(f"{Fore.CYAN}Reflection mode: {Style.RESET_ALL}Enabled") 54 | 55 | response, history, thinking_tools, output_tools = complete_reasoning_task( 56 | task=task, 57 | api_key=agent_config['api_key'], 58 | model=agent_config['model'], 59 | api_url=agent_config['api_url'], 60 | verbose=verbose, 61 | chain_store_api_key=chain_store_api_key, 62 | max_reasoning_steps=max_reasoning_steps, 63 | wolfram_app_id=wolfram_app_id, 64 | temperature=agent_config.get('temperature', default_temperature), 65 | top_p=top_p, 66 | max_tokens=max_tokens, 67 | image=image, 68 | output_tools=output_tools, 69 | reflection_mode=reflection_mode 70 | ) 71 | 72 | # Remove example chains from conversation history 73 | bottom_system_message_index = next((i for i, msg in enumerate(reversed(history)) if msg.get('role') == 'system'), None) 74 | if bottom_system_message_index is not None: 75 | history = history[-bottom_system_message_index:] 76 | 77 | return agent_config, response, history, thinking_tools, output_tools 78 | 79 | def format_agent_results( 80 | agent_results: List[Tuple[Dict[str, str], Dict, List[Dict], List[Dict], List[Dict]]] 81 | ) -> str: 82 | """Format the results from multiple agents into a prompt for the coordinator.""" 83 | formatted_results = "Here are the responses from different AI models:\n\n" 84 | 85 | for i, (agent_config, response, history, thinking_tools, output_tools) in enumerate(agent_results, 1): 86 | formatted_results += f"Model {i} ({agent_config['model']}):\n" 87 | formatted_results += "Reasoning steps:\n" 88 | 89 | # Extract reasoning steps from history 90 | for msg in history: 91 | if msg['role'] == 'assistant': 92 | if msg.get('content'): 93 | formatted_results += f"- {msg['content']}\n" 94 | elif msg['role'] == 'tool': 95 | formatted_results += f" Tool result: {msg['content']}\n" 96 | 97 | formatted_results += f"\nFinal response:\n{response}\n\n" 98 | formatted_results += "─" * 50 + "\n\n" 99 | 100 | return formatted_results 101 | 102 | def run_agents_parallel( 103 | task: str, 104 | agents: List[Dict[str, str]], 105 | verbose: bool = False, 106 | chain_store_api_key: Optional[str] = None, 107 | max_workers: Optional[int] = None, 108 | max_reasoning_steps: Optional[int] = None, 109 | wolfram_app_id: Optional[str] = 
None, 110 | temperature: float = 0.7, 111 | top_p: float = 1.0, 112 | max_tokens: int = 500, 113 | image: Optional[str] = None, 114 | output_tools: Optional[List[Dict]] = None, 115 | reflection_mode: bool = False 116 | ) -> List[Tuple[Dict[str, str], Dict, List[Dict], List[Dict], List[Dict]]]: 117 | """Run multiple agents in parallel.""" 118 | with ThreadPoolExecutor(max_workers=max_workers) as executor: 119 | # Submit all tasks 120 | future_to_agent = { 121 | executor.submit( 122 | run_agent, 123 | task, 124 | agent, 125 | verbose, 126 | chain_store_api_key, 127 | max_reasoning_steps, 128 | wolfram_app_id, 129 | temperature, 130 | top_p, 131 | max_tokens, 132 | image, 133 | output_tools, 134 | reflection_mode 135 | ): agent for agent in agents 136 | } 137 | 138 | # Collect results as they complete 139 | results = [] 140 | for future in as_completed(future_to_agent): 141 | try: 142 | result = future.result() 143 | results.append(result) 144 | except Exception as e: 145 | if verbose: 146 | agent = future_to_agent[future] 147 | print(f"\n{Fore.RED}Error with model {agent['model']}: {str(e)}{Style.RESET_ALL}") 148 | 149 | return results 150 | 151 | def ensemble( 152 | task: str, 153 | agents: List[Dict[str, str]], 154 | coordinator: Dict[str, str], 155 | verbose: bool = False, 156 | chain_store_api_key: Optional[str] = None, 157 | max_workers: Optional[int] = None, 158 | return_reasoning: bool = False, 159 | max_reasoning_steps: Optional[int] = None, 160 | coordinator_max_steps: Optional[int] = None, 161 | wolfram_app_id: Optional[str] = None, 162 | temperature: float = 0.7, 163 | top_p: float = 1.0, 164 | max_tokens: int = 500, 165 | image: Optional[str] = None, 166 | output_tools: Optional[List[Dict]] = None, 167 | reflection_mode: bool = False 168 | ) -> Union[Dict, Tuple[Dict, List[Tuple[Dict[str, str], Dict, List[Dict], List[Dict], List[Dict]]]]]: 169 | """ 170 | Run multiple agents in parallel and coordinate their responses. 
171 | 172 | Args: 173 | task: The task to complete 174 | agents: List of dictionaries, each containing 'model', 'api_key', and 'api_url' 175 | coordinator: Dictionary containing 'model', 'api_key', and 'api_url' for the coordinating model 176 | verbose: Whether to show detailed output 177 | chain_store_api_key: API key for chain store if using 178 | max_workers: Maximum number of parallel workers 179 | return_reasoning: Whether to return the full reasoning chains 180 | max_reasoning_steps: Maximum steps for each agent 181 | coordinator_max_steps: Maximum steps for the coordinator (can be different from agents) 182 | wolfram_app_id: Wolfram Alpha app ID if using 183 | temperature: Default temperature for the model if using 184 | top_p: Top p for the model if using 185 | max_tokens: Maximum number of tokens for the model if using 186 | image: Optional image to pass to the model if using 187 | output_tools: Optional list of output tools for the model if using 188 | reflection_mode: Whether to enable reflection mode for all agents 189 | """ 190 | # Reinitialize colorama for the main process 191 | init(autoreset=True) 192 | 193 | if verbose: 194 | print(f"\n{Fore.MAGENTA}Starting Ensemble for task:{Style.RESET_ALL}") 195 | print(f"{task}\n") 196 | print(f"{Fore.MAGENTA}Using {len(agents)} agents in parallel{Style.RESET_ALL}") 197 | print(f"{Fore.MAGENTA}Default temperature: {temperature}{Style.RESET_ALL}") 198 | for agent in agents: 199 | if 'temperature' in agent: 200 | print(f"{Fore.MAGENTA}Temperature for {agent['model']}: {agent['temperature']}{Style.RESET_ALL}") 201 | 202 | if verbose and reflection_mode: 203 | print(f"{Fore.MAGENTA}Reflection mode: {Style.RESET_ALL}Enabled for all agents") 204 | 205 | # Run all agents in parallel with max steps 206 | agent_results = run_agents_parallel( 207 | task, 208 | agents, 209 | verbose, 210 | chain_store_api_key, 211 | max_workers, 212 | max_reasoning_steps, 213 | wolfram_app_id, 214 | temperature, 215 | top_p, 216 | max_tokens, 217 | image, 218 | output_tools, 219 | reflection_mode 220 | ) 221 | 222 | # Format results for coordinator 223 | formatted_results = format_agent_results(agent_results) 224 | 225 | # Create coordinator prompt 226 | coordinator_task = f"""You are a coordinator model tasked with analyzing multiple AI responses to the following question: 227 | 228 | Question: {task} 229 | 230 | 231 | {formatted_results} 232 | 233 | 234 | Please analyze all responses and their reasoning steps carefully. Consider: 235 | 1. The logical soundness of each approach 236 | 2. The thoroughness of the reasoning 237 | 3. The correctness of calculations and tool usage 238 | 4. The clarity and completeness of the final response 239 | 240 | Based on your analysis, synthesize these responses into a single, high-quality response to the question. It is crucial to critically evaluate the information provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the given answers but should offer a refined, accurate, and comprehensive reply to the question. Ensure your response is well-structured, coherent, and adheres to the highest standards of accuracy and reliability. 
Also remember that the user is only going to see your final answer, so make sure it's complete and self-contained, and actually answers the question.""" 241 | 242 | # Get coordinator's response 243 | if verbose: 244 | print(f"\n{Fore.CYAN}Running coordinator model: {Style.RESET_ALL}{coordinator['model']}") 245 | 246 | coordinator_response, _, _, _ = complete_reasoning_task( 247 | task=coordinator_task, 248 | api_key=coordinator['api_key'], 249 | model=coordinator['model'], 250 | api_url=coordinator['api_url'], 251 | verbose=verbose, 252 | chain_store_api_key=None, 253 | max_reasoning_steps=coordinator_max_steps, 254 | wolfram_app_id=wolfram_app_id, 255 | temperature=temperature, 256 | top_p=top_p, 257 | max_tokens=max_tokens, 258 | image=image, 259 | output_tools=output_tools, 260 | reflection_mode=reflection_mode 261 | ) 262 | 263 | if return_reasoning: 264 | return coordinator_response, agent_results 265 | return coordinator_response 266 | 267 | # Alias for backward compatibility 268 | run_mixture_of_agents = ensemble -------------------------------------------------------------------------------- /planner.py: -------------------------------------------------------------------------------- 1 | from typing import List, Dict, Optional 2 | from colorama import Fore, Style 3 | import json 4 | 5 | def format_tools_for_context(tools: List[Dict]) -> str: 6 | """Format tools list into a readable string for context.""" 7 | tools_str = "Available Tools:\n" 8 | for tool in tools: 9 | if tool.get('type') == 'function': 10 | func = tool['function'] 11 | tools_str += f"- {func['name']}: {func['description']}\n" 12 | 13 | # Add parameter details if they exist 14 | if 'parameters' in func and 'properties' in func['parameters']: 15 | tools_str += " Parameters:\n" 16 | for param_name, param_details in func['parameters']['properties'].items(): 17 | tools_str += f" - {param_name}: {param_details.get('description', 'No description')}\n" 18 | 19 | return tools_str 20 | 21 | def format_chain_for_planning( 22 | chain: Dict, 23 | include_tool_calls: bool = True 24 | ) -> str: 25 | """ 26 | Format a single chain into a concise summary focusing on key patterns and outcomes. 
27 | """ 28 | formatted = f"\nTask: {chain.get('task', 'Unknown task')}\n" 29 | 30 | # Add metadata if it exists 31 | if 'metadata' in chain: 32 | formatted += "Context:\n" 33 | for key, value in chain['metadata'].items(): 34 | formatted += f"- {key}: {value}\n" 35 | 36 | # Add tools that were available 37 | if 'thinking_tools' in chain: 38 | formatted += "\n" + format_tools_for_context(chain['thinking_tools']) 39 | 40 | formatted += "\nSteps taken:\n" 41 | for msg in chain.get('conversation_history', []): 42 | if msg['role'] == 'assistant': 43 | step = f"- {msg.get('content', '')}" 44 | 45 | # Include tool calls if requested and they exist 46 | if include_tool_calls and msg.get('tool_calls'): 47 | for tool_call in msg['tool_calls']: 48 | if tool_call['type'] == 'function': 49 | func = tool_call['function'] 50 | step += f"\n Tool used: {func['name']}" 51 | try: 52 | args = json.loads(func['arguments']) 53 | step += f"\n Arguments: {json.dumps(args, indent=2)}" 54 | except: 55 | step += f"\n Arguments: {func['arguments']}" 56 | 57 | formatted += step + "\n" 58 | # Include tool responses for context 59 | elif msg['role'] == 'tool': 60 | content = msg.get('content', '') 61 | first_line = content.split('\n')[0] if content else '' 62 | formatted += f" Result: {first_line}...\n" 63 | 64 | return formatted 65 | 66 | def generate_plan( 67 | task: str, 68 | similar_chains: List[Dict], 69 | current_tools: List[Dict], 70 | api_key: str, 71 | model: str, 72 | api_url: str, 73 | verbose: bool = False, 74 | metadata: Optional[Dict] = None 75 | ) -> str: 76 | """ 77 | Generate a plan of action based on similar chains from memory. 78 | Takes into account available tools and other context. 79 | """ 80 | from call_ai import send_message_to_api 81 | 82 | if verbose: 83 | print(f"\n{Fore.CYAN}Extracting patterns from {len(similar_chains)} similar chains...{Style.RESET_ALL}") 84 | # Print the tasks of the similar chains 85 | for i, chain in enumerate(similar_chains, 1): 86 | print(f"Example {i}: {chain.get('task', 'Unknown task')}") 87 | 88 | # Format current context 89 | current_context = f"Current Task: {task}\n" 90 | if metadata: 91 | current_context += "Current Context:\n" 92 | for key, value in metadata.items(): 93 | current_context += f"- {key}: {value}\n" 94 | current_context += "\n" + format_tools_for_context(current_tools) 95 | 96 | # Format similar chains 97 | examples_context = "" 98 | for i, chain in enumerate(similar_chains, 1): 99 | examples_context += f"\nExample {i}:" 100 | examples_context += format_chain_for_planning(chain) 101 | 102 | # Create planning prompt 103 | planning_messages = [ 104 | { 105 | 'role': 'system', 106 | 'content': ( 107 | "You are an expert at breaking down complex tasks into clear steps and leveraging available tools effectively. " 108 | "Focus on providing strategic guidance about HOW to approach problems rather than specific solutions. 
" 109 | "Key aspects to consider:\n" 110 | "- How to break the problem into manageable steps\n" 111 | "- Which tools would be most helpful at each stage\n" 112 | "- How to validate progress and handle potential issues\n" 113 | "- What patterns from past experiences could be applied" 114 | ) 115 | }, 116 | { 117 | 'role': 'user', 118 | 'content': "[REDACTED]" # Make the AI think there was an example input here — we're just trying to teach it how to generate a solid plan 119 | }, 120 | { 121 | 'role': 'assistant', 122 | 'content': """For the current task of designing an generalist AI search agent that uses OpenAI-compatible APIs, we can learn from the example where we built a LLM-based voice chatbot. 123 | 124 | For API integration, we successfully used the OpenRouter endpoint (https://openrouter.ai/api/v1/chat/completions) with these key parameters: 125 | - model: "meta-llama/Meta-Llama-3-8B-Instruct" 126 | - messages: [{"role": "user", "content": "Hello, how are you?"}] 127 | - tools: [] 128 | - max_tokens: 1000 129 | - temperature: 0.7 130 | - top_p: 1.0 131 | 132 | One key learning was about model selection - while the chatbot needed low latency, an agent typically benefits from a more capable model since response time is less critical. 133 | 134 | We also discovered important lessons about prompt engineering. Our experience showed that shorter, precise prompts consistently outperformed longer ones. The initial iterations suffered from vague prompting that led to unfocused responses. 135 | 136 | A particularly effective pattern we uncovered was using function calling to enable tool usage. This approach could be valuable for integrating search capabilities, particularly by combining function calling with SERP APIs for web access. 137 | """ 138 | }, 139 | { 140 | 'role': 'user', 141 | 'content': ( 142 | f"{current_context}\n" 143 | f"Similar Examples:{examples_context}\n\n" 144 | "Based on these examples and the available tools/resources, outline a strategic approach for this task:\n" 145 | "1. How would you break this down into clear steps?\n" 146 | "2. Which tools (and, if applicable, which libraries) would be most valuable at each stage?\n" 147 | "3. What key checkpoints or validation should be included?\n" 148 | "4. What patterns from similar past tasks could guide the approach?\n\n" 149 | "Focus on the process and methodology rather than specific implementation details.\n" 150 | "Keep it concise and super high-level, like you're having a quick chat with a colleague. Maximum 200 words." 151 | ) 152 | } 153 | ] 154 | 155 | if verbose: 156 | print(f"{Fore.CYAN}Analyzing patterns and generating plan...{Style.RESET_ALL}") 157 | 158 | try: 159 | response = send_message_to_api( 160 | task, 161 | planning_messages, 162 | api_key, 163 | [], # No tools needed for planning 164 | model, 165 | temperature=0.7, 166 | top_p=1.0, 167 | max_tokens=1000, # Increased for more detailed plans 168 | api_url=api_url, 169 | verbose=verbose 170 | ) 171 | 172 | plan = response.get('content', '') 173 | 174 | if verbose: 175 | print(f"\n{Fore.GREEN}Generated Plan:{Style.RESET_ALL}") 176 | print(plan) 177 | 178 | return plan 179 | 180 | except Exception as e: 181 | if verbose: 182 | print(f"\n{Fore.RED}Error generating plan: {str(e)}{Style.RESET_ALL}") 183 | return "Failed to generate plan from similar examples." 
-------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | anyio==4.6.2.post1 2 | backports.tarfile==1.2.0 3 | blinker==1.9.0 4 | certifi==2024.8.30 5 | charset-normalizer==3.4.0 6 | click==8.1.7 7 | colorama==0.4.6 8 | contourpy==1.3.0 9 | cycler==0.12.1 10 | e2b_code_interpreter==1.0.1 11 | exceptiongroup==1.2.2 12 | Flask==3.0.3 13 | fonttools==4.54.1 14 | google_search_results==2.4.2 15 | h11==0.14.0 16 | httpcore==1.0.6 17 | httpx==0.27.2 18 | idna==3.10 19 | importlib_metadata==8.5.0 20 | importlib_resources==6.4.5 21 | itsdangerous==2.2.0 22 | jaraco.context==6.0.1 23 | Jinja2==3.1.4 24 | joblib==1.4.2 25 | kiwisolver==1.4.7 26 | MarkupSafe==3.0.2 27 | matplotlib==3.9.2 28 | more-itertools==10.5.0 29 | mpmath==1.3.0 30 | multidict==6.1.0 31 | numpy==2.0.2 32 | packaging==24.2 33 | pandas==2.2.3 34 | pillow==11.0.0 35 | pyparsing==3.2.0 36 | python-dateutil==2.9.0.post0 37 | python-dotenv==1.0.1 38 | pytz==2024.2 39 | requests==2.32.3 40 | scikit-learn==1.5.2 41 | scipy==1.13.1 42 | seaborn==0.13.2 43 | six==1.16.0 44 | sniffio==1.3.1 45 | sympy==1.13.3 46 | threadpoolctl==3.5.0 47 | typing_extensions==4.12.2 48 | tzdata==2024.2 49 | urllib3==2.2.3 50 | Werkzeug==3.1.3 51 | wolframalpha==5.1.3 52 | xmltodict==0.14.2 53 | zipp==3.21.0 54 | -------------------------------------------------------------------------------- /tools.py: -------------------------------------------------------------------------------- 1 | import os 2 | import requests 3 | from typing import Dict, Any, List, Union, Optional 4 | import sys 5 | from io import StringIO 6 | import traceback 7 | from contextlib import redirect_stdout, redirect_stderr 8 | import json 9 | import wolframalpha 10 | import numpy as np 11 | import pandas as pd 12 | import matplotlib.pyplot as plt 13 | import seaborn as sns 14 | import sympy 15 | import scipy 16 | import sklearn 17 | from sympy import symbols, solve, simplify 18 | from scipy import stats 19 | from sklearn import preprocessing 20 | import math 21 | from e2b_code_interpreter import Sandbox 22 | from serpapi import GoogleSearch 23 | from dotenv import load_dotenv 24 | from urllib.parse import quote 25 | load_dotenv() 26 | 27 | serpapi_api_key = os.environ.get("SERPAPI_API_KEY") 28 | 29 | # Dictionary of interpreter states, keyed by task hash 30 | interpreter_states = {} 31 | 32 | def get_task_hash(task: str) -> str: 33 | """Generate a unique hash for a task.""" 34 | import hashlib 35 | return hashlib.md5(task.encode()).hexdigest() 36 | 37 | def clear_interpreter_state(task: str = None): 38 | """ 39 | Clear the interpreter state. 40 | If task is provided, only clear that task's state. 41 | If no task is provided, clear all states. 42 | """ 43 | global interpreter_states 44 | if task: 45 | task_hash = get_task_hash(task) 46 | if task_hash in interpreter_states: 47 | del interpreter_states[task_hash] 48 | else: 49 | interpreter_states = {} 50 | 51 | def python_interpreter(code: str, task: str, timeout: int = 10, sandbox: Optional[Sandbox] = None) -> str: 52 | """ 53 | Safely execute Python code in a restricted environment. 54 | Maintains separate state for each task. 
55 | """ 56 | if sandbox is None: 57 | raise ValueError("E2B Sandbox is required for Python code execution but none was provided.") 58 | 59 | print(f"Executing code:\n{code}") 60 | execution = sandbox.run_code( 61 | code, 62 | # timeout=timeout, # Timeout to wait for the whole request to complete 63 | on_stdout=lambda x: print('[stdout]', x), 64 | on_stderr=lambda x: print('[stderr]', x) 65 | ) 66 | 67 | if execution.error: 68 | e = execution.error 69 | 70 | error_msg = ( 71 | f"Error executing code: {e.value}\n" 72 | f"Error type: {type(e.name)}\n" 73 | f"Traceback:\n{e.traceback}\n" 74 | "\nDebugging Suggestions:\n" 75 | "1. Add print statements to debug the issue\n" 76 | "2. Use assertions to validate inputs and outputs\n" 77 | "3. Check variable types with print(type(var))\n" 78 | "4. For numerical computations, verify inputs are numbers\n" 79 | "5. For symbolic math, ensure variables are properly defined with symbols()\n" 80 | "\nNote: Plotting is currently not supported. Instead of visualizing data, consider:\n" 81 | "1. Printing key numerical results\n" 82 | "2. Showing data statistics\n" 83 | "3. Printing array slices or samples\n" 84 | "\nAvailable packages:\n" 85 | "- numpy (np): Numerical computing\n" 86 | "- pandas (pd): Data manipulation\n" 87 | "- scipy: Scientific computing\n" 88 | "- sklearn: Machine learning" 89 | ) 90 | return error_msg 91 | 92 | result = [] 93 | 94 | # Results are the output of the code execution besides stdout and stderr 95 | # Can be text, PNG, JPG, JSON, html, markdown, etc. 96 | # Results are based on executing code inside the headless Jupyter notebook 97 | # that's running inside the sandbox. 98 | # The same way, you'd get result from a Jupyter notebook cell, you get results here. 99 | # That means any display() calls in the code will be captured as a result, 100 | # and also the last expression in the code, if there is one. 101 | code_exec_results = execution.results 102 | for ce_result in code_exec_results: 103 | print(ce_result.formats()) # Raw data of results 104 | # if 'png' in ce_result.formats: 105 | # Handle PNG images 106 | # if 'json' in ce_result.formats: 107 | # Handle JSON 108 | # ... 109 | # 110 | # Text is always present for every result. 111 | result.append(ce_result.text) 112 | 113 | stdout = execution.logs.stdout 114 | stderr = execution.logs.stderr 115 | if stdout: 116 | result.append(f"Output:\n{''.join(stdout)}") 117 | if stderr: 118 | result.append(f"Errors:\n{''.join(stderr)}") 119 | return "\n\n".join(result) if result else "Code executed successfully with no output." 120 | 121 | def find_datapoint_on_web( 122 | query: str, 123 | api_key: str = None, 124 | ) -> str: 125 | """ 126 | Perform web search using SERPAPI Google Search. 
127 | 128 | Args: 129 | query: The specific search query 130 | api_key: API key for SERPAPI 131 | 132 | 133 | Returns: 134 | str: Search results with citations 135 | """ 136 | try: 137 | # Configure the search 138 | search = GoogleSearch({ 139 | "q": query, 140 | "api_key": api_key, 141 | "num": 5 # Get top 5 results 142 | }) 143 | 144 | # Get the results 145 | results = search.get_dict() 146 | 147 | if "error" in results: 148 | return f"Error performing search: {results['error']}" 149 | 150 | # Format organic results 151 | formatted_results = [] 152 | 153 | if "organic_results" in results: 154 | for result in results["organic_results"]: 155 | title = result.get("title", "No title") 156 | snippet = result.get("snippet", "No description available") 157 | link = result.get("link", "No link available") 158 | formatted_results.append(f"Source: {title}\nSummary: {snippet}\nURL: {link}\n") 159 | 160 | if formatted_results: 161 | return "\n".join(formatted_results) 162 | else: 163 | return "No relevant results found for the query." 164 | 165 | except Exception as e: 166 | return f"Error performing web search: {str(e)}" 167 | 168 | def wolfram( 169 | query: str, 170 | wolfram_app_id: str, 171 | include_pods: Optional[List[str]] = None, # e.g., ["Result", "Solution", "Plot"] 172 | max_width: int = 1000 173 | ) -> str: 174 | """ 175 | Query Wolfram Alpha for computations, math, science, and knowledge. 176 | 177 | Args: 178 | query: The query to send to Wolfram Alpha 179 | wolfram_app_id: Your Wolfram Alpha app ID 180 | include_pods: List of pod names to include in result (None for all) 181 | max_width: Maximum width for plots/images 182 | 183 | Returns: 184 | str: Formatted response from Wolfram Alpha 185 | """ 186 | try: 187 | client = wolframalpha.Client(wolfram_app_id) 188 | res = client.query(query, width=max_width) 189 | 190 | # Format the response 191 | result = [] 192 | for pod in res.pods: 193 | # Skip if we're only interested in specific pods and this isn't one of them 194 | if include_pods and pod.title not in include_pods: 195 | continue 196 | 197 | if pod.title and pod.text: 198 | result.append(f"{pod.title}:\n{pod.text}") 199 | 200 | return "\n\n".join(result) if result else "No results found" 201 | 202 | except Exception as e: 203 | return f"Error querying Wolfram Alpha: {str(e)}" 204 | 205 | def get_webpage_content(url: str, jina_api_key: Optional[str] = None) -> str: 206 | """ 207 | Retrieve webpage content using Jina API. 208 | 209 | Args: 210 | url: The webpage URL to fetch content from 211 | jina_api_key: Jina API key for authentication 212 | 213 | Returns: 214 | str: The webpage content or error message 215 | """ 216 | if not jina_api_key: 217 | return "Error: Jina API key not provided" 218 | 219 | try: 220 | # URL encode the target URL and prepend Jina API endpoint 221 | encoded_url = quote(url, safe='') 222 | jina_url = f'https://r.jina.ai/{encoded_url}' 223 | 224 | headers = { 225 | 'Authorization': f'Bearer {jina_api_key}' 226 | } 227 | 228 | response = requests.get(jina_url, headers=headers, timeout=10) 229 | 230 | if response.status_code == 200: 231 | return response.text 232 | else: 233 | return f"Failed to retrieve content. 
Status code: {response.status_code}" 234 | 235 | except requests.RequestException as e: 236 | return f"Error fetching webpage content: {str(e)}" 237 | 238 | def execute_tool( 239 | tool_name: str, 240 | parameters: Dict[str, Any], 241 | task: str = None, 242 | api_key: str = None, 243 | model: str = None, 244 | api_url: str = None, 245 | wolfram_app_id: str = None, 246 | sandbox: Optional[Sandbox] = None, 247 | jina_api_key: str = None 248 | ) -> Any: 249 | """Execute the specified tool with the given parameters.""" 250 | tools = { 251 | "python": python_interpreter, 252 | "find_datapoint_on_web": find_datapoint_on_web, 253 | "wolfram": wolfram, 254 | } 255 | 256 | # Only add get_webpage_content tool if Jina API key is provided 257 | if jina_api_key: 258 | tools["get_webpage_content"] = get_webpage_content 259 | 260 | if tool_name not in tools: 261 | raise ValueError(f"Unknown tool: {tool_name}") 262 | 263 | tool_func = tools[tool_name] 264 | 265 | # Remove thread_id from parameters if it exists 266 | if 'thread_id' in parameters: 267 | del parameters['thread_id'] 268 | 269 | # Inject appropriate credentials and task 270 | if tool_name == "python": 271 | parameters = {**parameters, "task": task, "sandbox": sandbox} 272 | elif tool_name == "find_datapoint_on_web": 273 | parameters = {**parameters, "api_key": serpapi_api_key} 274 | elif tool_name == "wolfram": 275 | parameters = {**parameters, "wolfram_app_id": wolfram_app_id} 276 | elif tool_name == "get_webpage_content": 277 | parameters = {**parameters, "jina_api_key": jina_api_key} 278 | 279 | return tool_func(**parameters) --------------------------------------------------------------------------------
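# Minimal usage sketch for tools.execute_tool (illustrative only; the engine normally
# drives this function from parsed tool_calls). Requires E2B credentials; the task
# string and code snippet below are hypothetical.
#
#     from e2b_code_interpreter import Sandbox
#     from tools import execute_tool
#
#     sandbox = Sandbox(timeout=120)
#     try:
#         output = execute_tool(
#             "python",
#             {"code": "print(2 ** 10)"},
#             task="demo task",      # used to key per-task interpreter state
#             sandbox=sandbox,
#         )
#         print(output)              # expect the sandbox's stdout, e.g. "Output:\n1024"
#     finally:
#         sandbox.kill()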