├── .gitignore ├── LICENSE ├── README.md ├── api.py ├── call_ai.py ├── chain_store.py ├── chat_loop.py ├── engine.py ├── helpers.py ├── main.py ├── mixture.py ├── planner.py ├── requirements.txt ├── successful_chains.json └── tools.py /.gitignore: -------------------------------------------------------------------------------- 1 | venv/ 2 | .env 3 | __pycache__/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 mshumer 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |
2 | 3 | # OpenReasoningEngine 4 | 5 | **While AI labs are quietly building closed reasoning systems, 6 | we can create something more powerful together in the open.** 7 | 8 |
9 | 10 | --- 11 | 12 | This repo serves as a modular, open-source test-time compute engine — anyone in the community with a useful idea to improve model capabilities is encouraged to add their approach to the system. As approaches are added, this system will enable users to compose them to drastically increase capabilities. 13 | 14 | And over time, as users save successful reasoning chains, we will be able to train models designed to take full advantage of this system. 15 | 16 | *Works with any OpenAI-compatible endpoint/model that supports function calling, and serves as a great base for building many types of reasoning systems.* 17 | 18 | > ### ⚠️ Important Note 19 | > **We are going to be very selective about what we add to this system. If an approach doesn't have a clear path to increasing the capabilities of the system, we will not add it.** 20 | 21 | --- 22 | 23 | ## 🚀 Initial System 24 | 25 | ### Core Features 26 | 27 | 🔹 **Step-by-Step Reasoning** 28 |     Executes reasoning one step per turn with integrated tools: 29 | - Python interpreter 30 | - Web search (via SerpAPI) 31 | - Wolfram Alpha integration 32 | - Full webpage reading (via Jina) 33 | 34 | 🔹 **Memory-Based Planning** 35 |     Continually learns and adapts from past experiences 36 | 37 | 🔹 **MoA** 38 |     Implements mixture-of-agents for ensemble decision making — *works but requires further testing* 39 | 40 | 🔹 **Beam Search** 41 |     Sample multiple next reasoning step candidates at each turn, and choose the best (soon to be updated with forking Python interpreters to significantly improve the system) 42 | 43 | 🔹 **Self-Reflection** 44 |     Force the AI to validate reasoning steps as it thinks 45 | 46 | 🔹 **Flexible Model Support** 47 |     Model-agnostic API supporting any OpenAI-compatible provider (OpenAI, Anthropic, etc.) 48 | 49 | 🔹 **Rich Input/Output** 50 |     Handles image input, **function calling**, and multi-turn conversations 51 | 52 | --- 53 | 54 | ## ⚙️ Installation 55 | 56 | ### 1. Clone and Install 57 | ```bash 58 | git clone https://github.com/mshumer/OpenReasoningEngine.git 59 | cd OpenReasoningEngine 60 | pip install -r requirements.txt 61 | ``` 62 | 63 | ### 2. API Setup 64 | Get API keys from: 65 | - [OpenRouter](https://openrouter.ai/) - for model access 66 | - [E2B](https://e2b.dev/) - for Python code execution 67 | - [SerpAPI](https://serpapi.com/) - for web search 68 | - [Jina](https://jina.ai/) (optional) - for webpage content extraction 69 | - [Wolfram Alpha](https://products.wolframalpha.com/api) (optional) - for computations/scientific queries 70 | - [Cohere](https://cohere.ai/) (optional) - for learning from past chains 71 | 72 | Create a `.env` file: 73 | ```env 74 | E2B_API_KEY="your_e2b_key_here" 75 | OPENROUTER_API_KEY="your_openrouter_key_here" 76 | SERPAPI_API_KEY="your_serpapi_key_here" 77 | JINA_API_KEY="your_jina_key_here" # Optional 78 | WOLFRAM_APP_ID="your_wolfram_key_here" # Optional 79 | COHERE_API_KEY="your_cohere_key_here" # Optional 80 | ``` 81 | 82 | ### 3. Load Environment 83 | ```bash 84 | source .env 85 | ``` 86 | 87 | --- 88 | 89 | ## 🛠️ Usage 90 | 91 | ### Running the Engine 92 | Two options available: 93 | - Direct execution: `python main.py` 94 | - API server: `python api.py` (starts a Flask API endpoint) 95 | 96 | ## Config Options 97 | Running the code as-is will work — I've chosen reasonable default settings. If you'd like to customize the way the system reasons, you can adjust the parameters when you run it. 98 | 99 | ### Tool System 100 | 101 | #### 1. 
Internal Tools 102 | - Used during the reasoning process 103 | - Default setup includes: 104 | - Python interpreter (with guidance to steer the LLM to add assertions, prints, etc. to improve performance and catch issues) 105 | - Web search (SerpAPI) 106 | - Webpage content extraction (Jina, optional) 107 | - Wolfram Alpha (optional) 108 | - Customizable based on your needs 109 | 110 | #### 2. Output Tools 111 | - Standard AI API output tools 112 | - Called after reasoning completion 113 | - Configurable based on use-case 114 | 115 | --- 116 | 117 | ## 🧮 Learning System 118 | 119 | ### Memory Management 120 | 121 | A major goal of OpenReasoningEngine is to enable learning from experience. The initial implementation is simple, and will continue to be iterated on as I (and others) come up with smarter approaches. 122 | 123 | #### Steps to Enable Continual Learning: 124 | 125 | 1. Obtain an API key from [Cohere](https://cohere.ai/) 126 | 127 | 2. Save successful reasoning chains: 128 | ```python 129 | chain_store.save_successful_chain( 130 | task=task, 131 | conversation_history=history, 132 | final_response=response, 133 | cohere_api_key=cohere_api_key, 134 | thinking_tools=thinking_tools, 135 | output_tools=output_tools, 136 | metadata={"model": model, "api_url": api_url} 137 | ) 138 | ``` 139 | 140 | The system includes starter chains in `successful_chains.json`. 141 | 142 | Community contributions to this database are welcome, subject to validation. If you'd like to add a chain to the database, please propose it [here](https://github.com/mshumer/OpenReasoningEngine/discussions/categories/proposed-chains). The community will vote on it, and if the results are positive, it will be added to the next version of the database (versioning will allow users to see stable performance over time). 143 | 144 | If you have ideas to make this process more seamless and scalable, please reach out! 145 | 146 | ### 📊 Performance Notes 147 | 148 | - Performance may vary based on the specific chains in your memory store (performance may be dramatically different with different chains) 149 | 150 | --- 151 | 152 | ## 📝 Logging 153 | 154 | ### Verbose Mode 155 | When `verbose=True`, the engine displays: 156 | - 🔄 API interactions 157 | - 🛠️ Tool usage and results 158 | - 📋 Step-by-step reasoning progress 159 | 160 | This makes it easy to see what's going on under the hood and diagnose issues. 161 | 162 | --- 163 | 164 | ## 🧪 Benchmarking 165 | 166 | I've open-sourced a very simple LLM evaluation harness that you can use with this repo to test different setups and understand how well approaches work. I've provided some example eval datasets so you can see how it works. If you want to try different OpenReasoningEngine setups, just drop in your own eval data and play with the reasoning settings until it works well for you! 167 | 168 | [Try it here.](https://github.com/mshumer/MattEval) 169 | 170 | --- 171 | 172 | ## 🤝 Contributing 173 | 174 | Contributions are welcome if they: 175 | - ✨ Demonstrably improve system capabilities 176 | - 📈 Include clear performance metrics 177 | 178 | Quality-of-life improvements are also appreciated. 
179 | 180 | --- 181 | 182 | ## Acknowledgements 183 | Thank you to the following folks who provided advice, feedback, ideas, and helped me implement and test the initial versions of OpenReasoningEngine: 184 | - [Steve Ickman](https://x.com/stevenic) 185 | - [Vasek Mlejnsky](https://x.com/mlejva) 186 | - [Josh Bickett](https://x.com/josh_bickett) 187 | - [Aidan Gomez](https://x.com/aidangomez) 188 | - [Alec Velikanov](https://x.com/alecvxyz) (Alex, imo) 189 | 190 | [Follow me on X](https://x.com/mattshumer_) for updates on this and other AI things I'm working on. 191 | 192 | OpenReasoningEngine is released under the MIT License. See the [LICENSE](https://github.com/mshumer/OpenReasoningEngine/blob/main/LICENSE) file for more details. 193 | -------------------------------------------------------------------------------- /api.py: -------------------------------------------------------------------------------- 1 | from flask import Flask, request, jsonify 2 | from engine import complete_reasoning_task 3 | from mixture import ensemble 4 | import traceback 5 | 6 | app = Flask(__name__) 7 | 8 | @app.route('/reason', methods=['POST']) 9 | def reason(): 10 | """ 11 | Single model reasoning endpoint. 12 | 13 | Expected JSON payload: 14 | { 15 | "task": "The task description", 16 | "api_key": "your-api-key", 17 | "model": "model-name", 18 | "api_url": "api-endpoint", 19 | "temperature": 0.7, # optional 20 | "top_p": 1.0, # optional 21 | "max_tokens": 500, # optional 22 | "verbose": false, # optional 23 | "chain_store_api_key": "key", # optional 24 | "wolfram_app_id": "key", # optional 25 | "max_reasoning_steps": 10, # optional 26 | "image": "image-url or base64" # optional 27 | "output_tools": [ # optional 28 | { 29 | "type": "tool-type", 30 | "name": "tool-name", 31 | "description": "tool-description" 32 | } 33 | ], 34 | "reflection_mode": false, # optional: enable reflection mode 35 | "previous_chains": [ # optional: previous conversation chains 36 | [ 37 | { 38 | "role": "system|user|assistant|tool", 39 | "content": "message content", 40 | "tool_calls": [] # optional 41 | } 42 | ] 43 | ], 44 | "jina_api_key": "jina-api-key" # optional 45 | } 46 | """ 47 | try: 48 | data = request.get_json() 49 | 50 | # Required parameters 51 | task = data.get('task') 52 | api_key = data.get('api_key') 53 | model = data.get('model') 54 | api_url = data.get('api_url') 55 | 56 | if not all([task, api_key, model, api_url]): 57 | return jsonify({ 58 | 'error': 'Missing required parameters. 
Need: task, api_key, model, api_url' 59 | }), 400 60 | 61 | # Optional parameters 62 | temperature = data.get('temperature', 0.7) 63 | top_p = data.get('top_p', 1.0) 64 | max_tokens = data.get('max_tokens', 500) 65 | verbose = data.get('verbose', False) 66 | chain_store_api_key = data.get('chain_store_api_key') 67 | wolfram_app_id = data.get('wolfram_app_id') 68 | max_reasoning_steps = data.get('max_reasoning_steps') 69 | image = data.get('image') 70 | output_tools = data.get('output_tools') 71 | reflection_mode = data.get('reflection_mode', False) 72 | previous_chains = data.get('previous_chains', []) # New parameter 73 | num_candidates = data.get('num_candidates', 1) 74 | beam_search_enabled = data.get('beam_search_enabled', False) 75 | use_planning = data.get('use_planning', False) 76 | use_jeremy_planning = data.get('use_jeremy_planning', False) 77 | jina_api_key = data.get('jina_api_key') 78 | 79 | # Run reasoning 80 | response, history, thinking_tools, output_tools = complete_reasoning_task( 81 | task=task, 82 | api_key=api_key, 83 | model=model, 84 | api_url=api_url, 85 | temperature=temperature, 86 | top_p=top_p, 87 | max_tokens=max_tokens, 88 | verbose=verbose, 89 | chain_store_api_key=chain_store_api_key, 90 | wolfram_app_id=wolfram_app_id, 91 | max_reasoning_steps=max_reasoning_steps, 92 | image=image, 93 | output_tools=output_tools, 94 | reflection_mode=reflection_mode, 95 | previous_chains=previous_chains, 96 | use_planning=use_planning, 97 | beam_search_enabled=beam_search_enabled, 98 | num_candidates=num_candidates, 99 | use_jeremy_planning=use_jeremy_planning, 100 | jina_api_key=jina_api_key 101 | ) 102 | 103 | return jsonify({ 104 | 'response': response, 105 | 'reasoning_chain': history, 106 | 'thinking_tools': thinking_tools, 107 | 'output_tools': output_tools 108 | }) 109 | 110 | except Exception as e: 111 | return jsonify({ 112 | 'error': str(e), 113 | 'traceback': traceback.format_exc() 114 | }), 500 115 | 116 | @app.route('/ensemble', methods=['POST']) 117 | def run_ensemble(): 118 | """ 119 | Ensemble reasoning endpoint. 120 | 121 | Expected JSON payload: 122 | { 123 | "task": "The task description", 124 | "agents": [ 125 | { 126 | "model": "model-name-1", 127 | "api_key": "key-1", 128 | "api_url": "url-1", 129 | "temperature": "temperature-1", 130 | }, 131 | { 132 | "model": "model-name-2", 133 | "api_key": "key-2", 134 | "api_url": "url-2", 135 | "temperature": "temperature-2" 136 | } 137 | ], 138 | "coordinator": { 139 | "model": "model-name", 140 | "api_key": "key", 141 | "api_url": "url", 142 | "temperature": "temperature" 143 | }, 144 | "verbose": false, # optional 145 | "chain_store_api_key": "key", # optional 146 | "max_workers": 3, # optional 147 | "return_reasoning": false, # optional 148 | "max_reasoning_steps": 10, # optional: max steps per agent 149 | "coordinator_max_steps": 5, # optional: max steps for coordinator 150 | "wolfram_app_id": "key", # optional 151 | "temperature": 0.7, # optional 152 | "top_p": 1.0, # optional 153 | "max_tokens": 500 # optional 154 | "reflection_mode": false, # optional: enable reflection mode for all agents 155 | } 156 | """ 157 | try: 158 | data = request.get_json() 159 | 160 | # Required parameters 161 | task = data.get('task') 162 | agents = data.get('agents') 163 | coordinator = data.get('coordinator') 164 | 165 | if not all([task, agents, coordinator]): 166 | return jsonify({ 167 | 'error': 'Missing required parameters. 
Need: task, agents, coordinator' 168 | }), 400 169 | 170 | # Optional parameters 171 | verbose = data.get('verbose', False) 172 | chain_store_api_key = data.get('chain_store_api_key') 173 | max_workers = data.get('max_workers') 174 | return_reasoning = data.get('return_reasoning', False) 175 | max_reasoning_steps = data.get('max_reasoning_steps') 176 | coordinator_max_steps = data.get('coordinator_max_steps') 177 | wolfram_app_id = data.get('wolfram_app_id') 178 | temperature = data.get('temperature', 0.7) 179 | top_p = data.get('top_p', 1.0) 180 | max_tokens = data.get('max_tokens', 500) 181 | image = data.get('image', None) 182 | output_tools = data.get('output_tools') 183 | reflection_mode = data.get('reflection_mode', False) 184 | 185 | # Run ensemble 186 | result = ensemble( 187 | task=task, 188 | agents=agents, 189 | coordinator=coordinator, 190 | verbose=verbose, 191 | chain_store_api_key=chain_store_api_key, 192 | max_workers=max_workers, 193 | return_reasoning=return_reasoning, 194 | max_reasoning_steps=max_reasoning_steps, 195 | coordinator_max_steps=coordinator_max_steps, 196 | wolfram_app_id=wolfram_app_id, 197 | temperature=temperature, 198 | top_p=top_p, 199 | max_tokens=max_tokens, 200 | image=image, 201 | output_tools=output_tools, 202 | reflection_mode=reflection_mode 203 | ) 204 | 205 | if return_reasoning: 206 | coordinator_response, agent_results = result 207 | return jsonify({ 208 | 'response': coordinator_response, 209 | 'agent_results': [ 210 | { 211 | 'model': config['model'], 212 | 'response': response, 213 | 'reasoning_chain': history, 214 | 'thinking_tools': thinking_tools, 215 | 'output_tools': output_tools 216 | } 217 | for config, response, history, thinking_tools, output_tools in agent_results 218 | ] 219 | }) 220 | 221 | return jsonify({ 222 | 'response': result 223 | }) 224 | 225 | except Exception as e: 226 | return jsonify({ 227 | 'error': str(e), 228 | 'traceback': traceback.format_exc() 229 | }), 500 230 | 231 | if __name__ == '__main__': 232 | app.run(host='0.0.0.0', port=5050) -------------------------------------------------------------------------------- /call_ai.py: -------------------------------------------------------------------------------- 1 | from colorama import Fore, Style 2 | import requests 3 | from typing import List, Dict 4 | import concurrent.futures 5 | import os 6 | 7 | 8 | def send_message_to_api( 9 | task: str, 10 | messages: List[Dict], 11 | api_key: str, 12 | tools: List[Dict], 13 | model: str = "gpt-4o-mini", 14 | temperature: float = 0.7, 15 | top_p: float = 1.0, 16 | max_tokens: int = 500, 17 | api_url: str = "https://openrouter.ai/api/v1/chat/completions", 18 | verbose: bool = False, 19 | is_first_step: bool = False, 20 | tool_choice: str = None, 21 | ) -> Dict: 22 | """ 23 | Send a message to the OpenRouter API and return the assistant's response. 24 | Will retry up to 3 times with increasing delay between retries. 
25 | """ 26 | if verbose and is_first_step: 27 | print( 28 | f"\n{Fore.CYAN}╭──────────────────────────────────────────{Style.RESET_ALL}" 29 | ) 30 | print(f"{Fore.CYAN}│ Sending Request to API{Style.RESET_ALL}") 31 | print( 32 | f"{Fore.CYAN}├──────────────────────────────────────────{Style.RESET_ALL}" 33 | ) 34 | print(f"{Fore.CYAN}│ Model: {Style.RESET_ALL}{model}") 35 | print(f"{Fore.CYAN}│ URL: {Style.RESET_ALL}{api_url}") 36 | print(f"{Fore.CYAN}│ Temperature: {Style.RESET_ALL}{temperature}") 37 | print( 38 | f"{Fore.CYAN}╰──────────────────────────────────────────{Style.RESET_ALL}\n" 39 | ) 40 | 41 | retries = 0 42 | max_retries = 3 43 | delay = 1 # Initial delay in seconds 44 | 45 | # Prepare request data for logging 46 | request_data = { 47 | 'model': model, 48 | 'messages': messages, 49 | 'tools': tools if tools else None, 50 | 'max_tokens': max_tokens, 51 | 'temperature': temperature, 52 | 'top_p': top_p, 53 | } 54 | 55 | if tool_choice: 56 | request_data['tool_choice'] = tool_choice 57 | 58 | while retries <= max_retries: 59 | try: 60 | print( 61 | f"\n{Fore.BLUE}Making API Request (Attempt {retries + 1}/{max_retries + 1})...{Style.RESET_ALL}" 62 | ) 63 | response = requests.post( 64 | api_url, 65 | headers={ 66 | "Authorization": f"Bearer {api_key}", 67 | "Content-Type": "application/json", 68 | }, 69 | json=request_data, 70 | timeout=60 71 | ) 72 | print(f"{Fore.GREEN}Response received:{Style.RESET_ALL}") 73 | print(f"{Fore.YELLOW}{response.json()}{Style.RESET_ALL}") 74 | 75 | if verbose: 76 | print( 77 | f"{Fore.YELLOW}Response status: {response.status_code}{Style.RESET_ALL}" 78 | ) 79 | 80 | if response.status_code != 200: 81 | # Log failed request 82 | import datetime 83 | import os 84 | import json 85 | 86 | os.makedirs('api_error_logs', exist_ok=True) 87 | timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S') 88 | log_file = f'api_error_logs/error_{timestamp}.json' 89 | 90 | error_log = { 91 | 'timestamp': timestamp, 92 | 'status_code': response.status_code, 93 | 'error_message': response.text, 94 | 'response_json': response.json(), 95 | 'request_url': api_url, 96 | 'request_data': request_data, 97 | 'retry_attempt': retries + 1 98 | } 99 | 100 | with open(log_file, 'w') as f: 101 | json.dump(error_log, f, indent=2) 102 | 103 | raise Exception( 104 | f"API request failed with status {response.status_code}: {response.text}" 105 | ) 106 | 107 | response_data = response.json() 108 | print(f"{Fore.GREEN}Successfully parsed response data{Style.RESET_ALL}") 109 | return response_data["choices"][0]["message"] 110 | 111 | except Exception as error: 112 | print( 113 | f"{Fore.RED}Error occurred during API call (Attempt {retries + 1})!{Style.RESET_ALL}" 114 | ) 115 | print(f"{Fore.RED}{str(error)}{Style.RESET_ALL}") 116 | 117 | # Log any other errors that occur 118 | import datetime 119 | import os 120 | import json 121 | 122 | os.makedirs('api_error_logs', exist_ok=True) 123 | timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S') 124 | log_file = f'api_error_logs/error_{timestamp}.json' 125 | 126 | error_log = { 127 | 'timestamp': timestamp, 128 | 'error_type': type(error).__name__, 129 | 'error_message': str(error), 130 | 'request_url': api_url, 131 | 'response_json': response.json(), 132 | 'request_data': request_data, 133 | 'retry_attempt': retries + 1 134 | } 135 | 136 | with open(log_file, 'w') as f: 137 | json.dump(error_log, f, indent=2) 138 | 139 | if retries == max_retries: 140 | raise Exception( 141 | f"Error sending message to API after {max_retries + 
1} attempts: {str(error)}" 142 | ) 143 | 144 | import time 145 | 146 | wait_time = delay * (2**retries) # Exponential backoff 147 | print( 148 | f"{Fore.YELLOW}Waiting {wait_time} seconds before retrying...{Style.RESET_ALL}" 149 | ) 150 | time.sleep(wait_time) 151 | retries += 1 152 | 153 | 154 | def generate_multiple_candidates( 155 | task: str, 156 | messages: List[Dict], 157 | api_key: str, 158 | tools: List[Dict], 159 | num_candidates: int = 3, 160 | model: str = "gpt-4o-mini", 161 | temperature: float = 0.7, 162 | top_p: float = 1.0, 163 | max_tokens: int = 500, 164 | api_url: str = "https://openrouter.ai/api/v1/chat/completions", 165 | verbose: bool = False, 166 | is_first_step: bool = False, 167 | ) -> List[Dict]: 168 | """ 169 | Generate multiple candidate responses in parallel using concurrent.futures. 170 | Returns a list of candidate responses. 171 | """ 172 | print( 173 | f"\n{Fore.MAGENTA}╭──────────────────────────────────────────{Style.RESET_ALL}" 174 | ) 175 | print(f"{Fore.MAGENTA}│ Generating {num_candidates} Candidates{Style.RESET_ALL}") 176 | print(f"{Fore.MAGENTA}╰──────────────────────────────────────────{Style.RESET_ALL}") 177 | 178 | def generate_candidate(): 179 | return send_message_to_api( 180 | task=task, 181 | messages=messages, 182 | api_key=api_key, 183 | tools=tools, 184 | model=model, 185 | temperature=temperature, 186 | top_p=top_p, 187 | max_tokens=max_tokens, 188 | api_url=api_url, 189 | verbose=verbose, 190 | is_first_step=is_first_step, 191 | ) 192 | 193 | candidates = [] 194 | with concurrent.futures.ThreadPoolExecutor(max_workers=num_candidates) as executor: 195 | print(f"{Fore.CYAN}Starting parallel candidate generation...{Style.RESET_ALL}") 196 | future_to_candidate = { 197 | executor.submit(generate_candidate): i for i in range(num_candidates) 198 | } 199 | for future in concurrent.futures.as_completed(future_to_candidate): 200 | try: 201 | candidate = future.result() 202 | candidates.append(candidate) 203 | print( 204 | f"{Fore.GREEN}Successfully generated candidate {len(candidates)}/{num_candidates}{Style.RESET_ALL}" 205 | ) 206 | except Exception as e: 207 | print( 208 | f"{Fore.RED}Error generating candidate: {str(e)}{Style.RESET_ALL}" 209 | ) 210 | 211 | print( 212 | f"{Fore.GREEN}Generated {len(candidates)} candidates successfully{Style.RESET_ALL}" 213 | ) 214 | return candidates 215 | 216 | 217 | def generate_best_candidate( 218 | task: str, 219 | messages: List[Dict], 220 | api_key: str, 221 | tools: List[Dict], 222 | num_candidates: int = 3, 223 | model: str = "gpt-4o-mini", 224 | temperature: float = 0.7, 225 | top_p: float = 1.0, 226 | max_tokens: int = 500, 227 | api_url: str = "https://openrouter.ai/api/v1/chat/completions", 228 | verbose: bool = False, 229 | is_first_step: bool = False, 230 | ) -> Dict: 231 | """ 232 | Generate a list of candidate responses and return the best one. 
233 | """ 234 | print(f"\n{Fore.CYAN}╭──────────────────────────────────────────{Style.RESET_ALL}") 235 | print(f"{Fore.CYAN}│ Starting Best Candidate Selection{Style.RESET_ALL}") 236 | print(f"{Fore.CYAN}╰──────────────────────────────────────────{Style.RESET_ALL}") 237 | 238 | candidates = generate_multiple_candidates( 239 | task, 240 | messages, 241 | api_key, 242 | tools, 243 | num_candidates, 244 | model, 245 | temperature, 246 | top_p, 247 | max_tokens, 248 | api_url, 249 | verbose, 250 | is_first_step, 251 | ) 252 | 253 | print(f"\n{Fore.YELLOW}Generated Candidates:{Style.RESET_ALL}") 254 | print(f"{Fore.YELLOW}{candidates}{Style.RESET_ALL}") 255 | 256 | print(f"\n{Fore.MAGENTA}Preparing evaluation prompt...{Style.RESET_ALL}") 257 | evaluation_prompt = "" 258 | 259 | i = 1 260 | for candidate in candidates: 261 | evaluation_prompt += f"Candidate {i}:\n{candidate}\n\n" 262 | i += 1 263 | 264 | SYSTEM_PROMPT = """You are a judge tasked with evaluating the viability of multiple candidate responses to a given task. Your goal is to identify the candidate that is most likely to lead to solving the task properly. 265 | 266 | You will be given a which describes the task at hand, a section which contains the thoughts of the assistant before receiving the candidate responses, and a section which contains the candidate responses to be evaluated. 267 | 268 | Evaluate the viability of each candidate response and output the number of the candidate that is most likely to lead to solving the task properly. 269 | 270 | Do so in the following format: 271 | 272 | Think through the viability of each candidate here. 273 | 274 | 275 | 276 | Number of the best candidate 277 | 278 | """ 279 | 280 | evaluation_prompt += f"""{task} 281 | 282 | 283 | {messages} 284 | 285 | 286 | 287 | {evaluation_prompt} 288 | 289 | 290 | Think it through inside the section, and then output the number of the candidate that is most likely to lead to solving the properly in the section. In the section, only output the number, nothing else. 
Possible numbers are: {', '.join(str(i) for i in range(1, num_candidates + 1))}""" 291 | 292 | print(f"\n{Fore.BLUE}Sending evaluation request to API...{Style.RESET_ALL}") 293 | best_candidate_response = send_message_to_api( 294 | task="", 295 | messages=[ 296 | {"role": "system", "content": SYSTEM_PROMPT}, 297 | {"role": "user", "content": evaluation_prompt}, 298 | ], 299 | api_key=api_key, 300 | tools=tools, 301 | ) 302 | 303 | # Parse the best candidate number from the response 304 | best_candidate_number = int( 305 | best_candidate_response["content"] 306 | .split("")[1] 307 | .split("")[0] 308 | .strip() 309 | ) 310 | 311 | print(f"\n{Fore.GREEN}Selected best candidate:{Style.RESET_ALL}") 312 | print(f"{Fore.GREEN}{best_candidate_number}{Style.RESET_ALL}") 313 | 314 | # Return the best candidate 315 | return candidates[best_candidate_number - 1] 316 | -------------------------------------------------------------------------------- /chain_store.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import requests 4 | import numpy as np 5 | from typing import List, Dict, Optional 6 | from datetime import datetime 7 | 8 | def init_store(store_file: str = "successful_chains.json") -> None: 9 | """Initialize the chain store if it doesn't exist.""" 10 | if not os.path.exists(store_file): 11 | with open(store_file, 'w') as f: 12 | json.dump({"chains": []}, f) 13 | 14 | def get_embedding(text: str, cohere_api_key: str, input_type: str = "search_document") -> Optional[List[float]]: 15 | """Get embeddings from Cohere API.""" 16 | try: 17 | response = requests.post( 18 | "https://api.cohere.ai/v1/embed", 19 | headers={ 20 | "Authorization": f"Bearer {cohere_api_key}", 21 | "Content-Type": "application/json" 22 | }, 23 | json={ 24 | "texts": [text], 25 | "model": "embed-english-v3.0", 26 | "input_type": input_type, 27 | "embedding_type": "float" 28 | } 29 | ) 30 | response.raise_for_status() 31 | return response.json()["embeddings"][0] 32 | except Exception as e: 33 | print(f"Error getting embedding: {e}") 34 | return None 35 | 36 | def cosine_similarity(a: List[float], b: List[float]) -> float: 37 | """Calculate cosine similarity between two vectors.""" 38 | return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)) 39 | 40 | def save_successful_chain( 41 | task: str, 42 | conversation_history: List[Dict], 43 | final_response: str, 44 | cohere_api_key: str, 45 | thinking_tools: List[Dict], 46 | output_tools: List[Dict], 47 | metadata: Dict, 48 | store_file: str = "successful_chains.json" 49 | ) -> bool: 50 | """Save a successful chain to the store.""" 51 | try: 52 | # Get embedding for the task 53 | embedding = get_embedding(task, cohere_api_key) 54 | if not embedding: 55 | return False 56 | 57 | # Initialize store if it doesn't exist 58 | if not os.path.exists(store_file): 59 | store = {"chains": []} 60 | else: 61 | try: 62 | with open(store_file, 'r') as f: 63 | store = json.load(f) 64 | except json.JSONDecodeError: 65 | # If file exists but is invalid JSON, start fresh 66 | store = {"chains": []} 67 | 68 | # Process conversation history to redact long tool responses 69 | processed_history = [] 70 | for msg in conversation_history: 71 | if msg['role'] == 'tool' and len(msg['content']) > 1500: 72 | msg = msg.copy() # Create a copy to avoid modifying the original 73 | msg['content'] = "[redacted for token savings]" 74 | processed_history.append(msg) 75 | 76 | # Add new chain 77 | chain = { 78 | "task": task, 79 | "embedding": 
embedding, 80 | "conversation_history": processed_history, 81 | "final_response": final_response, 82 | "thinking_tools": thinking_tools, 83 | "output_tools": output_tools, 84 | "timestamp": datetime.now().isoformat(), 85 | "metadata": metadata 86 | } 87 | store["chains"].append(chain) 88 | 89 | # Save updated store 90 | with open(store_file, 'w') as f: 91 | json.dump(store, f, indent=2) 92 | 93 | return True 94 | except Exception as e: 95 | print(f"Error saving chain: {str(e)}") # More detailed error message 96 | return False 97 | 98 | def get_similar_chains( 99 | task: str, 100 | cohere_api_key: str, 101 | n: int = 3, 102 | store_file: str = "successful_chains.json" 103 | ) -> List[Dict]: 104 | """Get n most similar chains for a given task.""" 105 | try: 106 | # Get embedding for the query task 107 | query_embedding = get_embedding(task, cohere_api_key, input_type="search_query") 108 | if not query_embedding: 109 | return [] 110 | 111 | # Load chains 112 | with open(store_file, 'r') as f: 113 | store = json.load(f) 114 | 115 | # Calculate similarities 116 | similarities = [] 117 | for chain in store["chains"]: 118 | similarity = cosine_similarity(query_embedding, chain["embedding"]) 119 | similarities.append((similarity, chain)) 120 | 121 | # Sort by similarity and get top n 122 | similarities.sort(reverse=True, key=lambda x: x[0]) 123 | result = [chain for _, chain in similarities[:n]] 124 | return result 125 | 126 | except Exception as e: 127 | return [] 128 | 129 | def prepare_examples_messages(similar_chains: List[Dict], current_tools: List[Dict]) -> List[Dict]: 130 | """ 131 | Prepare example chains as messages for the prompt. 132 | Now includes information about available tools. 133 | """ 134 | if not similar_chains: 135 | return [] 136 | 137 | messages = [] 138 | for chain in similar_chains: 139 | # Get the tool names for both current and historical tools 140 | current_tool_names = {t['function']['name'] for t in current_tools} 141 | historical_tool_names = {t['function']['name'] for t in chain.get('tools', [])} 142 | 143 | # Create tool availability message 144 | tool_message = "Available tools in this example:" 145 | for tool_name in historical_tool_names: 146 | status = "✓" if tool_name in current_tool_names else "✗" 147 | tool_message += f"\n- {tool_name} {status}" 148 | 149 | # Add system message with the example task and tool information 150 | messages.append({ 151 | "role": "system", 152 | "content": ( 153 | "\n" 154 | f"{chain['task']}\n\n" 155 | f"{tool_message}\n\n" 156 | "\n" 157 | "Slow down your thinking by breaking complex questions into multiple reasoning steps.\n" 158 | "Each individual reasoning step should be brief.\n" 159 | "Return after the last step." 
160 | ) 161 | }) 162 | 163 | # Add the conversation history 164 | messages.extend(chain["conversation_history"]) 165 | 166 | # For each message, replace any instance of the substring TASK with EXAMPLE_TASK 167 | for i, msg in enumerate(messages): 168 | if 'TASK' in msg['content']: 169 | messages[i]['content'] = msg['content'].replace('CURRENT_TASK', 'EXAMPLE_TASK') 170 | messages[i]['content'] = msg['content'].replace('TASK', 'EXAMPLE_TASK') 171 | messages[i]['content'] = messages[i]['content'].replace('EXAMPLE_EXAMPLE_TASK', 'EXAMPLE_TASK') 172 | 173 | return messages -------------------------------------------------------------------------------- /chat_loop.py: -------------------------------------------------------------------------------- 1 | import requests 2 | import json 3 | from typing import List, Dict, Optional 4 | import os 5 | from dotenv import load_dotenv 6 | 7 | # Load environment variables 8 | load_dotenv() 9 | 10 | def call_reason_api( 11 | task: str, 12 | previous_chains: Optional[List[List[Dict]]] = None, 13 | api_key: Optional[str] = None, 14 | model: str = "openai/gpt-4o-mini", 15 | api_url: str = "https://openrouter.ai/api/v1/chat/completions", 16 | verbose: bool = True 17 | ) -> tuple[Dict, List[Dict]]: 18 | """Call the reasoning API and return the response and chain.""" 19 | 20 | url = "http://localhost:5050/reason" 21 | 22 | payload = { 23 | "task": task, 24 | "api_key": "[redacted]", 25 | "model": "openai/gpt-4o-mini", 26 | "api_url": "https://openrouter.ai/api/v1/chat/completions", 27 | "verbose": verbose 28 | } 29 | 30 | if previous_chains: 31 | payload["previous_chains"] = previous_chains 32 | 33 | try: 34 | response = requests.post(url, json=payload) 35 | response.raise_for_status() 36 | data = response.json() 37 | return data["response"], data["reasoning_chain"] 38 | 39 | except requests.exceptions.RequestException as e: 40 | print(f"Error calling API: {e}") 41 | if hasattr(e.response, 'text'): 42 | print(f"Response text: {e.response.text}") 43 | raise 44 | 45 | def main(): 46 | print("Welcome to the reasoning chat loop!") 47 | print("Type 'exit' to quit, 'clear' to start a new conversation.") 48 | print("Enter your message:") 49 | 50 | conversation_chains = [] 51 | 52 | while True: 53 | user_input = input("\n> ").strip() 54 | 55 | if user_input.lower() == 'exit': 56 | break 57 | 58 | if user_input.lower() == 'clear': 59 | conversation_chains = [] 60 | print("\nConversation cleared. 
Starting fresh!") 61 | continue 62 | 63 | try: 64 | # Call API with previous conversation chains 65 | response, chain = call_reason_api( 66 | task=user_input, 67 | previous_chains=conversation_chains 68 | ) 69 | 70 | # Print the response 71 | if isinstance(response, dict): 72 | if response.get('content'): 73 | print("\nAssistant:", response['content']) 74 | if response.get('tool_calls'): 75 | print("\nTool Calls:", json.dumps(response['tool_calls'], indent=2)) 76 | else: 77 | print("\nAssistant:", response) 78 | 79 | # Add this chain to our conversation history 80 | conversation_chains.append(chain) 81 | 82 | # Print conversation stats 83 | print(f"\n(Conversation history: {len(conversation_chains)} chains)") 84 | 85 | except Exception as e: 86 | print(f"\nError: {e}") 87 | 88 | if __name__ == "__main__": 89 | main() -------------------------------------------------------------------------------- /engine.py: -------------------------------------------------------------------------------- 1 | import os 2 | import requests 3 | from e2b_code_interpreter import Sandbox 4 | from typing import List, Dict, Optional, Tuple, Union 5 | from colorama import init, Fore, Style 6 | from tools import execute_tool, clear_interpreter_state 7 | import json 8 | from datetime import datetime 9 | from chain_store import ( 10 | get_similar_chains, 11 | prepare_examples_messages 12 | ) 13 | from planner import generate_plan # Add this import at the top 14 | from call_ai import send_message_to_api, generate_best_candidate 15 | from helpers import validate_conversation 16 | 17 | # Initialize colorama for cross-platform colored output 18 | init() 19 | 20 | def thinking_loop( 21 | task: str, 22 | api_key: str, 23 | tools: List[Dict], 24 | model: str = 'gpt-4o-mini', 25 | temperature: float = 0.7, 26 | top_p: float = 1.0, 27 | max_tokens: int = 500, 28 | api_url: str = 'https://api.openai.com/v1/chat/completions', 29 | verbose: bool = False, 30 | chain_store_api_key: Optional[str] = None, 31 | wolfram_app_id: Optional[str] = None, 32 | max_reasoning_steps: Optional[int] = None, 33 | sandbox: Optional[Sandbox] = None, 34 | image: Optional[str] = None, 35 | reflection_mode: bool = False, 36 | previous_chains: Optional[List[List[Dict]]] = None, 37 | use_planning: bool = True, 38 | beam_search_enabled: bool = False, 39 | num_candidates: int = 1, 40 | use_jeremy_planning: bool = False, 41 | jina_api_key: Optional[str] = None 42 | ) -> List[Dict]: 43 | """ 44 | Execute the thinking loop and return the conversation history. 45 | Uses planning from memory to guide reasoning. 
46 | """ 47 | conversation_history = [] 48 | continue_loop = True 49 | step_count = 1 50 | 51 | if verbose: 52 | print(f"\n{Fore.MAGENTA}╭──────────────────────────────────────────{Style.RESET_ALL}") 53 | print(f"{Fore.MAGENTA}│ Starting Thinking Loop{Style.RESET_ALL}") 54 | if max_reasoning_steps: 55 | print(f"{Fore.MAGENTA}│ Maximum steps: {max_reasoning_steps}{Style.RESET_ALL}") 56 | print(f"{Fore.MAGENTA}╰──────────────────────────────────────────{Style.RESET_ALL}\n") 57 | 58 | # Get similar chains and generate plan 59 | action_plan = "" 60 | if chain_store_api_key and use_planning: 61 | similar_chains = get_similar_chains(task, chain_store_api_key) 62 | if similar_chains: 63 | action_plan = generate_plan( 64 | task=task, 65 | similar_chains=similar_chains, 66 | current_tools=tools, 67 | api_key=api_key, 68 | model=model, 69 | api_url=api_url, 70 | verbose=verbose, 71 | metadata={ 72 | "model": model, 73 | "max_steps": max_reasoning_steps, 74 | "reflection_mode": reflection_mode 75 | } 76 | ) 77 | 78 | # Add previous chains directly to the conversation history 79 | if previous_chains: 80 | for chain in previous_chains: 81 | conversation_history.extend(chain) 82 | 83 | # Create the system message for the current task 84 | tool_list = [] 85 | tool_list.append("find_datapoint_on_web: Search Google using SERPAPI to find factual information. Returns top search results with titles, snippets, and URLs.") 86 | tool_list.append("python: For executing Python code") 87 | 88 | if wolfram_app_id: 89 | tools.append("wolfram: Query Wolfram Alpha for precise mathematical, scientific, and factual computations") 90 | 91 | if jina_api_key: 92 | tools.append("get_webpage_content: Retrieve detailed content from specific webpages using Jina API. Use this when you want to read the full content of a webpage") 93 | 94 | tools_description = "You have access to these tools:\n" + "\n".join(f"{i+1}. 
{tool}" for i, tool in enumerate(tools)) 95 | 96 | # Include the generated plan in the system message 97 | plan_section = "" 98 | if action_plan: 99 | plan_section = f"\n\n{action_plan}\n\n" 100 | 101 | # Update the web search instructions in the system message 102 | web_search_instructions = ( 103 | "\nWhen searching the web:\n" 104 | "- The find_datapoint_on_web tool uses SERPAPI to search Google with enhanced results\n" 105 | "- Results may include knowledge graph data, featured snippets, and detailed summaries\n" 106 | "- Each result contains multiple sections including titles, snippets, and structured data\n" 107 | "- Make queries specific and focused on finding factual information\n" 108 | "- Use keywords rather than full questions for better search results\n" 109 | "- Cross-reference information from multiple sources when possible\n" 110 | "- If initial results don't contain enough detail, try searching with different keywords\n" 111 | "- Always cite sources when providing information from search results\n" 112 | ) 113 | 114 | system_message = { 115 | 'role': 'system', 116 | 'content': ( 117 | f"\n{task}\n\n" 118 | f"{plan_section}" 119 | "\n" 120 | "Slow down your thinking by breaking complex questions into multiple reasoning steps.\n" 121 | "Each individual reasoning step should be brief.\n" 122 | f"{tools_description}\n\n" 123 | "When you need to write or test Python code, use the python tool.\n" 124 | "When you need to search for information, use the find_datapoint_on_web tool.\n" 125 | + ( 126 | "When you need precise mathematical or scientific computations, use the wolfram tool.\n" 127 | if wolfram_app_id else "" 128 | ) + 129 | f"{web_search_instructions}\n" 130 | "\nWhen writing Python code:\n" 131 | "- If your code produces an error, add print statements to debug the issue\n" 132 | "- Use assertions/prints to validate inputs, intermediate results, and outputs\n" 133 | "- Print the state to see what's happening\n" 134 | "- When an error occurs, systematically add checks to identify where the problem is\n" 135 | "- Structure your debugging process step by step\n" 136 | + ( 137 | "\nWhen using Wolfram Alpha:\n" 138 | "- Use for precise mathematical calculations and scientific data\n" 139 | "- Phrase queries clearly and specifically\n" 140 | "- Great for unit conversions, equations, and factual data\n" 141 | if wolfram_app_id else "" 142 | ) + 143 | "\nReturn after the last step." 144 | ) 145 | } 146 | 147 | # Start with system message and previous chains 148 | full_conversation_history = conversation_history + [system_message] 149 | 150 | if image: 151 | full_conversation_history.append({ 152 | 'role': 'user', 153 | 'content': [ 154 | { 155 | 'type': 'text', 156 | 'text': f"Here is the image the user provided:" 157 | }, 158 | { 159 | 'type': 'image_url', 160 | 'image_url': { 161 | 'url': image 162 | } 163 | } 164 | ] 165 | }) 166 | 167 | # Add initial planning step request 168 | if use_jeremy_planning: 169 | initial_planning_message = { 170 | 'role': 'user', 171 | 'content': ( 172 | # 'Before we begin solving the task, let\'s create a detailed plan. Please:\n' 173 | # '1. Break down the task into clear sub-goals\n' 174 | # '2. Identify which tools will be needed for each sub-goal\n' 175 | # '3. Outline potential challenges and how to address them\n' 176 | # '4. Determine verification criteria for each sub-goal\n' 177 | # '5. 
Most importantly, generate a suite of test cases for each sub-goal, as well as test cases for the overall task\n' 178 | # "In this planning step, make it very clear that until each test case is verified, we should not proceed with the actual solution.\n" 179 | # 'Provide a structured plan before we proceed with the actual solution.' 180 | "Before we move on, make a list of wrong assumptions people sometimes make about the concepts included in the question." 181 | ) 182 | } 183 | conversation_history.append(initial_planning_message) 184 | full_conversation_history.append(initial_planning_message) 185 | 186 | # Get planning response 187 | planning_response = send_message_to_api( 188 | task, 189 | full_conversation_history, 190 | api_key, 191 | tools, 192 | model, 193 | temperature, 194 | top_p, 195 | max_tokens, 196 | api_url, 197 | verbose, 198 | tool_choice="none", 199 | ) 200 | 201 | # Add planning response to histories 202 | planning_message = { 203 | 'role': 'assistant', 204 | 'content': planning_response.get('content'), 205 | 'tool_calls': planning_response.get('tool_calls', None) 206 | } 207 | conversation_history.append(planning_message) 208 | full_conversation_history.append(planning_message) 209 | 210 | while continue_loop: 211 | # Check if we've exceeded max steps 212 | if max_reasoning_steps and step_count > max_reasoning_steps: 213 | if verbose: 214 | print(f"\n{Fore.YELLOW}Maximum reasoning steps ({max_reasoning_steps}) reached. Forcing completion.{Style.RESET_ALL}") 215 | 216 | # Add a system message explaining the forced stop 217 | force_stop_message = { 218 | 'role': 'system', 219 | 'content': ( 220 | f"Maximum reasoning steps ({max_reasoning_steps}) reached. " 221 | ) 222 | } 223 | conversation_history.append(force_stop_message) 224 | full_conversation_history.append(force_stop_message) 225 | 226 | # Add a user message requesting the final answer 227 | final_user_message = { 228 | 'role': 'user', 229 | 'content': ( 230 | 'Based on your reasoning so far, provide your final answer to the CURRENT_TASK. ' 231 | 'Make your response complete and self-contained since this will be shown to the user.' 232 | "Please provide your final answer based on what you've learned so far. " 233 | "Do not return , and **you are not allowed to use any tools**. Just respond with your final answer." 
234 | ) 235 | } 236 | conversation_history.append(final_user_message) 237 | full_conversation_history.append(final_user_message) 238 | 239 | # Get final response when hitting max steps 240 | response = send_message_to_api( 241 | task, 242 | full_conversation_history, 243 | api_key, 244 | tools, 245 | model, 246 | temperature, 247 | top_p, 248 | max_tokens, 249 | api_url, 250 | verbose 251 | ) 252 | print('Final response:', response) 253 | 254 | # Add the final response to histories 255 | assistant_message = { 256 | 'role': 'assistant', 257 | 'content': response.get('content'), 258 | 'tool_calls': response.get('tool_calls', None) 259 | } 260 | conversation_history.append(assistant_message) 261 | full_conversation_history.append(assistant_message) 262 | 263 | if verbose and response.get('content'): 264 | print(f"\n{Fore.GREEN}Final Response after max steps:{Style.RESET_ALL}") 265 | print(response.get('content')) 266 | 267 | # Return here to skip the additional final response request 268 | return full_conversation_history 269 | 270 | if verbose: 271 | print(f"\n{Fore.BLUE}Step {step_count}{Style.RESET_ALL}") 272 | print(f"{Fore.BLUE}{'─' * 40}{Style.RESET_ALL}") 273 | 274 | # Determine which message to send based on reflection mode and step count 275 | if reflection_mode and step_count % 2 == 0: 276 | # Even steps in reflection mode are for reflection 277 | user_message = { 278 | 'role': 'user', 279 | 'content': ( 280 | 'Reflect on your last step — check for mistakes. ' 281 | 'Consider:\n' 282 | '1. Are your assumptions valid and well-justified?\n' 283 | '2. Did you make any logical errors or jumps in reasoning?\n' 284 | '3. Is there a more effective or efficient approach?\n' 285 | 'Explain your analysis, whether you find issues or confirm the step was sound.\n' 286 | 'Do not make a snap decision. Think carefully before deciding if the step is free of mistakes.\n' 287 | 'Be brief and to the point.\n' 288 | 'If this is the final step, return .' 289 | ) 290 | } # Note — these reflection steps are often a bit long, which may lead to the non-reflection steps doing more work per step than they should. Figure this out later. 291 | else: 292 | if False: # until we've perfected this, let's not use it (it seems to slightly reduce performance, interestingly) 293 | user_message = { 294 | 'role': 'user', 295 | 'content': ( 296 | 'Think about your first reasoning step to perform the CURRENT_TASK. ' 297 | 'Return just the first step. ' 298 | 'Remember, steps should be very brief. ' 299 | ) 300 | } 301 | else: 302 | # Odd steps or non-reflection mode use the original message 303 | user_message = { 304 | 'role': 'user', 305 | 'content': ( 306 | 'Think about your next reasoning step to perform the CURRENT_TASK. ' 307 | 'Return just the next step. ' 308 | 'Remember, steps should be very brief. ' 309 | 'If this is the final step, return .' 310 | # """Think about your next reasoning step. Consider: 311 | # 1. What did you observe in the previous step's results? 312 | # 2. What needs to be validated or corrected based on those results? 313 | # 3. What's the most logical next step to make progress? 314 | # Return a brief step focused on making concrete progress. 
315 | # If this is the final step, return .""" 316 | ) 317 | } 318 | 319 | # Add to both conversation histories 320 | conversation_history.append(user_message) 321 | full_conversation_history.append(user_message) 322 | 323 | # Get response from AI API 324 | if beam_search_enabled: 325 | response = generate_best_candidate( 326 | task, 327 | full_conversation_history, 328 | api_key, 329 | tools, 330 | num_candidates, 331 | model, 332 | temperature, 333 | top_p, 334 | max_tokens, 335 | api_url, 336 | verbose, 337 | is_first_step=(step_count == 1) 338 | ) 339 | else: 340 | response = send_message_to_api( 341 | task, 342 | full_conversation_history, 343 | api_key, 344 | tools, 345 | model, 346 | temperature, 347 | top_p, 348 | max_tokens, 349 | api_url, 350 | verbose, 351 | is_first_step=(step_count == 1) 352 | ) 353 | 354 | # Add assistant's response to both histories 355 | assistant_message = { 356 | 'role': 'assistant', 357 | 'content': response.get('content'), 358 | 'tool_calls': response.get('tool_calls', None) 359 | } 360 | conversation_history.append(assistant_message) 361 | full_conversation_history.append(assistant_message) 362 | 363 | if verbose and response.get('content'): 364 | print(f"\n{Fore.GREEN}Assistant: {Style.RESET_ALL}{response['content']}") 365 | 366 | # Handle tool calls 367 | if 'tool_calls' in response and response['tool_calls']: 368 | for tool_call in response['tool_calls']: 369 | if verbose: 370 | print(f"\n{Fore.YELLOW}╭──────────────────────────────────────────{Style.RESET_ALL}") 371 | print(f"{Fore.YELLOW}│ Tool Call Detected{Style.RESET_ALL}") 372 | print(f"{Fore.YELLOW}├──────────────────────────────────────────{Style.RESET_ALL}") 373 | 374 | try: 375 | # Execute tool and get result 376 | tool_name = tool_call['function']['name'] 377 | 378 | # Add error handling for argument parsing 379 | try: 380 | if 'arguments' not in tool_call['function'] or not tool_call['function']['arguments']: 381 | error_msg = "No arguments provided in tool call" 382 | if verbose: 383 | print(f"{Fore.RED}{error_msg}{Style.RESET_ALL}") 384 | raise ValueError(error_msg) 385 | 386 | arguments = json.loads(tool_call['function']['arguments']) 387 | 388 | except json.JSONDecodeError as e: 389 | error_msg = f"Invalid JSON in tool arguments: {tool_call['function'].get('arguments', 'NO_ARGS')}" 390 | if verbose: 391 | print(f"{Fore.RED}{error_msg}{Style.RESET_ALL}") 392 | print(f"{Fore.RED}Error: {str(e)}{Style.RESET_ALL}") 393 | raise ValueError(error_msg) 394 | 395 | if verbose: 396 | print(f"{Fore.YELLOW}│ Tool: {Style.RESET_ALL}{tool_name}") 397 | print(f"{Fore.YELLOW}│ Arguments: {Style.RESET_ALL}{json.dumps(arguments, indent=2)}") 398 | 399 | result = execute_tool( 400 | tool_name, 401 | arguments, 402 | task=task, 403 | api_key=api_key, 404 | model=model, 405 | api_url=api_url, 406 | wolfram_app_id=wolfram_app_id, 407 | sandbox=sandbox, 408 | jina_api_key=jina_api_key 409 | ) 410 | 411 | # Add tool result to both histories 412 | tool_message = { 413 | 'role': 'tool', 414 | 'tool_call_id': tool_call['id'], 415 | 'content': str(result) 416 | } 417 | conversation_history.append(tool_message) 418 | full_conversation_history.append(tool_message) 419 | 420 | if verbose: 421 | print(f"{Fore.YELLOW}│ Result: {Style.RESET_ALL}{result}") 422 | print(f"{Fore.YELLOW}╰──────────────────────────────────────────{Style.RESET_ALL}\n") 423 | 424 | except Exception as e: 425 | error_msg = str(e) 426 | if verbose: 427 | print(f"{Fore.RED}Error executing tool: {error_msg}{Style.RESET_ALL}") 428 | 429 | # Add 
error message to conversation history so model can correct its approach 430 | error_message = { 431 | 'role': 'tool', 432 | 'content': ( 433 | f"Error using {tool_name} tool: {error_msg}\n" 434 | "Please correct your approach and try again." 435 | ), 436 | 'tool_call_id': tool_call['id'] 437 | } 438 | conversation_history.append(error_message) 439 | full_conversation_history.append(error_message) 440 | continue 441 | 442 | # Check for termination conditions 443 | if response.get('content'): 444 | termination_phrases = [ 445 | '', 'done', 'there is no next step.', 446 | 'this conversation is complete', 'the conversation has ended.', 447 | 'this conversation is finished.', 'the conversation has concluded.' 448 | ] 449 | 450 | if any(term in response['content'].lower() for term in termination_phrases): 451 | if verbose: 452 | print(f"\n{Fore.MAGENTA}╭──────────────────────────────────────────{Style.RESET_ALL}") 453 | print(f"{Fore.MAGENTA}│ Thinking Loop Complete{Style.RESET_ALL}") 454 | print(f"{Fore.MAGENTA}│ Total Steps: {step_count}{Style.RESET_ALL}") 455 | print(f"{Fore.MAGENTA}╰──────────────────────────────────────────{Style.RESET_ALL}\n") 456 | continue_loop = False 457 | 458 | step_count += 1 459 | 460 | return full_conversation_history 461 | 462 | def complete_reasoning_task( 463 | task: str, 464 | api_key: Optional[str] = None, 465 | model: str = 'gpt-4o-mini', 466 | temperature: float = 0.7, 467 | top_p: float = 1.0, 468 | max_tokens: int = 3000, 469 | api_url: str = 'https://api.openai.com/v1/chat/completions', 470 | verbose: bool = False, 471 | log_conversation: bool = False, 472 | chain_store_api_key: Optional[str] = None, 473 | wolfram_app_id: Optional[str] = None, 474 | max_reasoning_steps: Optional[int] = None, 475 | image: Optional[str] = None, 476 | output_tools: Optional[List[Dict]] = None, 477 | reflection_mode: bool = False, 478 | previous_chains: Optional[List[List[Dict]]] = None, 479 | use_planning: bool = False, 480 | beam_search_enabled: bool = False, 481 | num_candidates: int = 1, 482 | use_jeremy_planning: bool = False, 483 | jina_api_key: Optional[str] = None 484 | ) -> Tuple[Union[str, Dict], List[Dict], List[Dict], List[Dict]]: 485 | """ 486 | Execute the reasoning task and return the final response. 487 | Now supports optional structured output via output_tools, reflection mode, 488 | and previous conversation chains. 489 | """ 490 | sandbox = None 491 | try: 492 | # Clear Python interpreter state for just this task 493 | clear_interpreter_state(task=task) 494 | 495 | if api_key is None: 496 | raise ValueError('API key not provided.') 497 | 498 | if verbose: 499 | print(f"\n{Fore.MAGENTA}╭──────────────────────────────────────────{Style.RESET_ALL}") 500 | print(f"{Fore.MAGENTA}│ Starting Task{Style.RESET_ALL}") 501 | print(f"{Fore.MAGENTA}├──────────────────────────────────────────{Style.RESET_ALL}") 502 | print(f"{Fore.MAGENTA}│ {task}{Style.RESET_ALL}") 503 | if previous_chains: 504 | print(f"{Fore.MAGENTA}│ With {len(previous_chains)} previous conversation chains{Style.RESET_ALL}") 505 | print(f"{Fore.MAGENTA}╰──────────────────────────────────────────{Style.RESET_ALL}\n") 506 | 507 | # Initialize E2B sandbox for Python code execution 508 | timeout = 60 * 15 # 10 minutes 509 | for attempt in range(3): # Try 3 times 510 | try: 511 | sandbox = Sandbox(timeout=timeout) 512 | break # If successful, exit the loop 513 | except Exception as e: 514 | if attempt == 2: # If this was the last attempt 515 | raise Exception(f"Failed to create sandbox after 3 attempts. 
Last error: {e}") 516 | continue 517 | 518 | # Define thinking tools (internal tools that can be used during reasoning) 519 | thinking_tools = [ 520 | { 521 | "type": "function", 522 | "function": { 523 | "name": "python", 524 | "description": "Execute Python code and return the output.", 525 | "parameters": { 526 | "type": "object", 527 | "properties": { 528 | "code": { 529 | "type": "string", 530 | "description": "The Python code to execute" 531 | }, 532 | }, 533 | "required": ["code"] 534 | } 535 | } 536 | }, 537 | { 538 | "type": "function", 539 | "function": { 540 | "name": "find_datapoint_on_web", 541 | "description": "Search Google using SERPAPI to find factual information. Returns top search results with titles, snippets, and URLs.", 542 | "parameters": { 543 | "type": "object", 544 | "properties": { 545 | "query": { 546 | "type": "string", 547 | "description": "The specific query" 548 | } 549 | }, 550 | "required": ["query"] 551 | } 552 | } 553 | } 554 | ] 555 | 556 | # Add Wolfram tool if wolfram_app_id is provided 557 | if wolfram_app_id: 558 | thinking_tools.append({ 559 | "type": "function", 560 | "function": { 561 | "name": "wolfram", 562 | "description": "Query Wolfram Alpha for computations, math, science, and knowledge. Great for mathematical analysis, scientific calculations, data analysis, and fact-checking.", 563 | "parameters": { 564 | "type": "object", 565 | "properties": { 566 | "query": { 567 | "type": "string", 568 | "description": "The query to send to Wolfram Alpha. Be specific and precise." 569 | }, 570 | "include_pods": { 571 | "type": "array", 572 | "items": { 573 | "type": "string" 574 | }, 575 | "description": "Optional list of pod names to include (e.g., ['Result', 'Solution', 'Plot']). Leave empty for all pods.", 576 | "default": None 577 | }, 578 | "max_width": { 579 | "type": "integer", 580 | "description": "Maximum width for plots/images", 581 | "default": 1000 582 | } 583 | }, 584 | "required": ["query"] 585 | } 586 | } 587 | }) 588 | 589 | # Add Jina tool if jina_api_key is provided 590 | if jina_api_key: 591 | thinking_tools.append({ 592 | "type": "function", 593 | "function": { 594 | "name": "get_webpage_content", 595 | "description": "Retrieve the content of a webpage using Jina API. 
Useful for reading detailed content from search results or specific URLs.", 596 | "parameters": { 597 | "type": "object", 598 | "properties": { 599 | "url": { 600 | "type": "string", 601 | "description": "The URL of the webpage to fetch content from" 602 | } 603 | }, 604 | "required": ["url"] 605 | } 606 | } 607 | }) 608 | 609 | # Add output tools description 610 | output_tools_description = "" 611 | if output_tools: 612 | output_tools_description = "\n\nWhen providing your final response, you can use these output functions (but you don't have access to them during reasoning steps):\n" 613 | for tool in output_tools: 614 | output_tools_description += f"- {tool['function']['name']}: {tool['function']['description']}\n" 615 | 616 | # Create initial conversation history with previous chains 617 | conversation_history = [] 618 | if previous_chains: 619 | for chain in previous_chains: 620 | conversation_history.extend(chain) 621 | 622 | # Run thinking loop with thinking tools 623 | conversation_history = thinking_loop( 624 | task, 625 | api_key, 626 | thinking_tools, 627 | model, 628 | temperature, 629 | top_p, 630 | max_tokens, 631 | api_url, 632 | verbose, 633 | chain_store_api_key=chain_store_api_key, 634 | wolfram_app_id=wolfram_app_id, 635 | max_reasoning_steps=max_reasoning_steps, 636 | sandbox=sandbox, 637 | image=image, 638 | reflection_mode=reflection_mode, 639 | previous_chains=previous_chains, 640 | use_planning=use_planning, 641 | beam_search_enabled=beam_search_enabled, 642 | num_candidates=num_candidates, 643 | use_jeremy_planning=use_jeremy_planning, 644 | jina_api_key=jina_api_key 645 | ) 646 | 647 | # Only request final response if we didn't hit max steps 648 | final_response = None 649 | if not max_reasoning_steps or len([m for m in conversation_history if m['role'] == 'system' and 'Maximum reasoning steps' in m.get('content', '')]) == 0: 650 | # Add final completion request 651 | final_user_message = { 652 | 'role': 'user', 653 | 'content': ( 654 | 'Complete the <TASK>. Do not return <DONE>. ' 655 | 'Note that the user will only see what you return here. ' 656 | 'None of the steps you have taken will be shown to the user, so ensure you return the final answer. ' 657 | + ('You can return a text response and/or use one of the available output functions.' if output_tools else '') 658 | ) 659 | } 660 | conversation_history.append(final_user_message) 661 | 662 | if verbose: 663 | print(f"{Fore.CYAN}Requesting final response...{Style.RESET_ALL}\n") 664 | 665 | # Get final response with output tools if provided 666 | 667 | # Wrapping in try/except to catch any errors and try again with validated conversation history — for now... 
just because I'm not 100% sure if the validation is working and I don't want to risk messing up already solid chains 668 | try: 669 | final_response = send_message_to_api( 670 | task, 671 | conversation_history, 672 | api_key, 673 | output_tools if output_tools else thinking_tools, # Use output tools for final response if provided 674 | model, 675 | temperature, 676 | top_p, 677 | max_tokens, 678 | api_url, 679 | verbose 680 | ) 681 | except Exception as e: 682 | print(f"{Fore.RED}Error sending final response: {e}{Style.RESET_ALL}") 683 | print(f"{Fore.YELLOW}Trying again with validated conversation history...{Style.RESET_ALL}") 684 | final_response = send_message_to_api( 685 | task, 686 | validate_conversation(conversation_history), 687 | api_key, 688 | output_tools if output_tools else thinking_tools, 689 | model, 690 | temperature, 691 | top_p, 692 | max_tokens, 693 | api_url, 694 | verbose 695 | ) 696 | 697 | # Add the final response to the conversation history 698 | assistant_message = { 699 | 'role': 'assistant', 700 | 'content': final_response.get('content'), 701 | 'tool_calls': final_response.get('tool_calls', None) 702 | } 703 | conversation_history.append(assistant_message) 704 | else: 705 | # Use the last assistant message as the final response 706 | final_response = next( 707 | (msg for msg in reversed(conversation_history) 708 | if msg['role'] == 'assistant' and msg.get('content')), 709 | {'content': None} 710 | ) 711 | 712 | # Print final response if verbose 713 | if verbose and ('content' in final_response or 'tool_calls' in final_response): 714 | print(f'\n{Fore.GREEN}Final Response:{Style.RESET_ALL}') 715 | if 'content' in final_response and 'tool_calls' in final_response: 716 | print(f"Content: {final_response['content']}") 717 | print(f"Tool Calls: {final_response['tool_calls']}") 718 | elif 'content' in final_response: 719 | print(final_response['content']) 720 | else: 721 | print(final_response['tool_calls']) 722 | 723 | if 'tool_calls' in final_response: 724 | final_response_tool_calls = final_response['tool_calls'] 725 | else: 726 | final_response_tool_calls = None 727 | 728 | if 'content' in final_response: 729 | final_response_content = final_response['content'] 730 | else: 731 | final_response_content = None 732 | 733 | # Log conversation history if logging is enabled 734 | if log_conversation: 735 | # Remove example chains from conversation history by removing everything prior to the bottom-most system message 736 | ### THIS MAY NOT WORK IF WE'RE INJECTING SYSTEM MESSAGES INTO THE CHAIN (I THINK WE'RE DOING THIS, SO IT'S WORTH REVISITING)! 
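# Illustrative worked example of the slice below (hypothetical history, added for clarity, not from the original source):
# with conversation_history = [example_msg, system_msg, user_msg, assistant_msg], the bottom-most
# system message has reversed-index 2, so the slice keeps the last two entries
# [user_msg, assistant_msg], i.e. everything after that system message.
# Edge case to be aware of: if a system message is the very last entry, the index is 0
# and history[-0:] returns the entire list unchanged.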
737 | bottom_system_message_index = next((i for i, msg in enumerate(reversed(conversation_history)) if msg.get('role') == 'system'), None) 738 | if bottom_system_message_index is not None: 739 | conversation_history = conversation_history[-bottom_system_message_index:] 740 | 741 | # Create logs directory if it doesn't exist 742 | os.makedirs('logs', exist_ok=True) 743 | 744 | # Create filename with timestamp 745 | timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') 746 | filename = f'logs/conversation_{timestamp}.json' 747 | 748 | # Prepare log data 749 | log_data = { 750 | 'task': task, 751 | 'model': model, 752 | 'temperature': temperature, 753 | 'top_p': top_p, 754 | 'max_tokens': max_tokens, 755 | 'api_url': api_url, 756 | 'reasoning_chain': conversation_history, 757 | 'final_response': final_response_content, 758 | 'final_response_tool_calls': final_response_tool_calls, 759 | 'thinking_tools': thinking_tools, 760 | 'output_tools': output_tools 761 | } 762 | 763 | # Write to file 764 | try: 765 | with open(filename, 'w', encoding='utf-8') as f: 766 | json.dump(log_data, f, indent=2, ensure_ascii=False) 767 | if verbose: 768 | print(f"\n{Fore.CYAN}Conversation history logged to: {Style.RESET_ALL}{filename}") 769 | except Exception as e: 770 | if verbose: 771 | print(f"\n{Fore.RED}Failed to log conversation history: {Style.RESET_ALL}{str(e)}") 772 | 773 | return {'content': final_response_content, 'tool_calls': final_response_tool_calls}, conversation_history, thinking_tools, output_tools 774 | 775 | finally: 776 | # Clean up sandbox resources 777 | if sandbox: 778 | sandbox.kill() 779 | -------------------------------------------------------------------------------- /helpers.py: -------------------------------------------------------------------------------- 1 | def validate_conversation(history): 2 | """ 3 | Before generating the final response, ensure all tool calls have responses. 4 | If a tool call doesn't have a response, include it in the message content instead. 
5 | """ 6 | tool_call_ids = set() 7 | tool_response_ids = set() 8 | 9 | for message in history: 10 | if message.get("role") == "assistant" and message.get("tool_calls"): 11 | for tool_call in message["tool_calls"]: 12 | tool_call_ids.add(tool_call["id"]) 13 | elif message.get("role") == "tool": 14 | tool_response_ids.add(message["tool_call_id"]) 15 | 16 | # If there are unmatched tool calls, convert them to content 17 | if tool_call_ids != tool_response_ids: 18 | filtered_history = [] 19 | 20 | for message in history: 21 | if message.get("role") == "assistant" and message.get("tool_calls"): 22 | message_copy = message.copy() 23 | content = message_copy.get("content", "") 24 | 25 | # Convert unmatched tool calls to content 26 | for tool_call in message_copy["tool_calls"]: 27 | if tool_call["id"] not in tool_response_ids: 28 | tool_content = f"\nTool Call: {tool_call['function']['name']}({tool_call['function']['arguments']})" 29 | content = (content + tool_content) if content else tool_content 30 | 31 | # Only keep matched tool calls 32 | message_copy["tool_calls"] = [tc for tc in message_copy["tool_calls"] 33 | if tc["id"] in tool_response_ids] 34 | message_copy["content"] = content 35 | 36 | filtered_history.append(message_copy) 37 | else: 38 | filtered_history.append(message) 39 | 40 | return filtered_history 41 | return history -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | # main.py 2 | from dotenv import load_dotenv 3 | import os 4 | from engine import complete_reasoning_task 5 | from mixture import ensemble 6 | import chain_store 7 | 8 | load_dotenv() 9 | 10 | OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY") 11 | JINA_API_KEY = os.environ.get("JINA_API_KEY") 12 | COHERE_API_KEY = os.environ.get("COHERE_API_KEY") 13 | 14 | def save_chain_prompt() -> bool: 15 | """Prompt user if they want to save the chain.""" 16 | while True: 17 | response = input("\nWould you like to save this reasoning chain for future reference? (y/n): ").lower() 18 | if response in ['y', 'yes']: 19 | return True 20 | elif response in ['n', 'no']: 21 | return False 22 | print("Please answer 'y' or 'n'") 23 | 24 | def main(): 25 | # Initialize store 26 | chain_store.init_store() 27 | 28 | task = """ 29 | Create a Python implementation of a Red-Black Tree with the following operations: 30 | 1. Insert a node 31 | 2. Delete a node 32 | 3. Search for a node 33 | 4. Print the tree in-order 34 | 35 | The implementation should maintain all Red-Black Tree properties: 36 | - Every node is either red or black 37 | - The root is black 38 | - All leaves (NIL) are black 39 | - If a node is red, then both its children are black 40 | - Every path from root to leaves contains the same number of black nodes 41 | 42 | Test the implementation by: 43 | 1. Inserting the numbers [7, 3, 18, 10, 22, 8, 11, 26, 2, 6, 13] 44 | 2. Printing the tree structure showing colors 45 | 3. Deleting nodes 18 and 11 46 | 4. Printing the final tree structure 47 | 5. Searching for both present and non-present values 48 | 49 | Use the Python interpreter tool to implement and test this data structure. 
50 | """ 51 | 52 | model = "anthropic/claude-3.5-sonnet" 53 | api_url = "https://openrouter.ai/api/v1/chat/completions" 54 | 55 | # Run the engine 56 | response, conversation_history, thinking_tools, output_tools = complete_reasoning_task( 57 | task=task, 58 | api_key=OPENROUTER_API_KEY, 59 | model=model, 60 | api_url=api_url, 61 | verbose=True, 62 | use_planning=False, 63 | jina_api_key=JINA_API_KEY 64 | ) 65 | 66 | # Check if the run was successful (no errors in response) 67 | if isinstance(response, dict) and not response.get('error'): 68 | # Ask user if they want to save the chain 69 | if save_chain_prompt(): 70 | try: 71 | # Save the chain 72 | chain_store.save_successful_chain( 73 | task=task, 74 | conversation_history=conversation_history, 75 | final_response=response, 76 | cohere_api_key=COHERE_API_KEY, 77 | thinking_tools=thinking_tools, 78 | output_tools=output_tools, 79 | metadata={"model": model, "api_url": api_url} 80 | ) 81 | print("Chain saved successfully!") 82 | except Exception as e: 83 | print(f"Error saving chain: {str(e)}") 84 | else: 85 | print("Run contained errors - skipping chain save prompt") 86 | 87 | return response, conversation_history, thinking_tools, output_tools 88 | 89 | if __name__ == "__main__": 90 | main() -------------------------------------------------------------------------------- /mixture.py: -------------------------------------------------------------------------------- 1 | from typing import List, Dict, Optional, Tuple, Any, Union 2 | from engine import complete_reasoning_task 3 | import json 4 | from colorama import init, Fore, Style 5 | from concurrent.futures import ThreadPoolExecutor, as_completed 6 | 7 | # Initialize colorama for cross-platform colored output 8 | init() 9 | 10 | def run_agent( 11 | task: str, 12 | agent_config: Dict[str, str], 13 | verbose: bool = False, 14 | chain_store_api_key: Optional[str] = None, 15 | max_reasoning_steps: Optional[int] = None, 16 | wolfram_app_id: Optional[str] = None, 17 | default_temperature: float = 0.7, 18 | top_p: float = 1.0, 19 | max_tokens: int = 500, 20 | image: Optional[str] = None, 21 | output_tools: Optional[List[Dict]] = None, 22 | reflection_mode: bool = False 23 | ) -> Tuple[Dict[str, str], str, List[Dict], List[Dict]]: 24 | """ 25 | Run a single agent with the given configuration. 
26 | 27 | Args: 28 | task: The task to complete 29 | agent_config: Dictionary containing 'model', 'api_key', and 'api_url' 30 | verbose: Whether to show detailed output 31 | chain_store_api_key: API key for chain store if using 32 | max_reasoning_steps: Maximum number of reasoning steps for this agent 33 | wolfram_app_id: Wolfram Alpha app ID if using 34 | default_temperature: Default temperature for the model if using 35 | top_p: Top p for the model if using 36 | max_tokens: Maximum number of tokens for the model if using 37 | image: Optional image to pass to the model if using 38 | output_tools: Optional list of output tools for the model if using 39 | reflection_mode: Whether to enable reflection mode for this agent 40 | Returns: 41 | Tuple of (agent_config, final_response, conversation_history, thinking_tools, output_tools) 42 | """ 43 | # Reinitialize colorama for this process 44 | init(autoreset=True) 45 | 46 | if verbose: 47 | print(f"\n{Fore.CYAN}Running agent with model: {Style.RESET_ALL}{agent_config['model']}") 48 | if max_reasoning_steps: 49 | print(f"{Fore.CYAN}Max steps: {Style.RESET_ALL}{max_reasoning_steps}") 50 | print(f"{Fore.CYAN}Temperature: {Style.RESET_ALL}{agent_config.get('temperature', default_temperature)}") 51 | 52 | if verbose and reflection_mode: 53 | print(f"{Fore.CYAN}Reflection mode: {Style.RESET_ALL}Enabled") 54 | 55 | response, history, thinking_tools, output_tools = complete_reasoning_task( 56 | task=task, 57 | api_key=agent_config['api_key'], 58 | model=agent_config['model'], 59 | api_url=agent_config['api_url'], 60 | verbose=verbose, 61 | chain_store_api_key=chain_store_api_key, 62 | max_reasoning_steps=max_reasoning_steps, 63 | wolfram_app_id=wolfram_app_id, 64 | temperature=agent_config.get('temperature', default_temperature), 65 | top_p=top_p, 66 | max_tokens=max_tokens, 67 | image=image, 68 | output_tools=output_tools, 69 | reflection_mode=reflection_mode 70 | ) 71 | 72 | # Remove example chains from conversation history 73 | bottom_system_message_index = next((i for i, msg in enumerate(reversed(history)) if msg.get('role') == 'system'), None) 74 | if bottom_system_message_index is not None: 75 | history = history[-bottom_system_message_index:] 76 | 77 | return agent_config, response, history, thinking_tools, output_tools 78 | 79 | def format_agent_results( 80 | agent_results: List[Tuple[Dict[str, str], Dict, List[Dict], List[Dict], List[Dict]]] 81 | ) -> str: 82 | """Format the results from multiple agents into a prompt for the coordinator.""" 83 | formatted_results = "Here are the responses from different AI models:\n\n" 84 | 85 | for i, (agent_config, response, history, thinking_tools, output_tools) in enumerate(agent_results, 1): 86 | formatted_results += f"Model {i} ({agent_config['model']}):\n" 87 | formatted_results += "Reasoning steps:\n" 88 | 89 | # Extract reasoning steps from history 90 | for msg in history: 91 | if msg['role'] == 'assistant': 92 | if msg.get('content'): 93 | formatted_results += f"- {msg['content']}\n" 94 | elif msg['role'] == 'tool': 95 | formatted_results += f" Tool result: {msg['content']}\n" 96 | 97 | formatted_results += f"\nFinal response:\n{response}\n\n" 98 | formatted_results += "─" * 50 + "\n\n" 99 | 100 | return formatted_results 101 | 102 | def run_agents_parallel( 103 | task: str, 104 | agents: List[Dict[str, str]], 105 | verbose: bool = False, 106 | chain_store_api_key: Optional[str] = None, 107 | max_workers: Optional[int] = None, 108 | max_reasoning_steps: Optional[int] = None, 109 | wolfram_app_id: Optional[str] = 
None, 110 | temperature: float = 0.7, 111 | top_p: float = 1.0, 112 | max_tokens: int = 500, 113 | image: Optional[str] = None, 114 | output_tools: Optional[List[Dict]] = None, 115 | reflection_mode: bool = False 116 | ) -> List[Tuple[Dict[str, str], Dict, List[Dict], List[Dict], List[Dict]]]: 117 | """Run multiple agents in parallel.""" 118 | with ThreadPoolExecutor(max_workers=max_workers) as executor: 119 | # Submit all tasks 120 | future_to_agent = { 121 | executor.submit( 122 | run_agent, 123 | task, 124 | agent, 125 | verbose, 126 | chain_store_api_key, 127 | max_reasoning_steps, 128 | wolfram_app_id, 129 | temperature, 130 | top_p, 131 | max_tokens, 132 | image, 133 | output_tools, 134 | reflection_mode 135 | ): agent for agent in agents 136 | } 137 | 138 | # Collect results as they complete 139 | results = [] 140 | for future in as_completed(future_to_agent): 141 | try: 142 | result = future.result() 143 | results.append(result) 144 | except Exception as e: 145 | if verbose: 146 | agent = future_to_agent[future] 147 | print(f"\n{Fore.RED}Error with model {agent['model']}: {str(e)}{Style.RESET_ALL}") 148 | 149 | return results 150 | 151 | def ensemble( 152 | task: str, 153 | agents: List[Dict[str, str]], 154 | coordinator: Dict[str, str], 155 | verbose: bool = False, 156 | chain_store_api_key: Optional[str] = None, 157 | max_workers: Optional[int] = None, 158 | return_reasoning: bool = False, 159 | max_reasoning_steps: Optional[int] = None, 160 | coordinator_max_steps: Optional[int] = None, 161 | wolfram_app_id: Optional[str] = None, 162 | temperature: float = 0.7, 163 | top_p: float = 1.0, 164 | max_tokens: int = 500, 165 | image: Optional[str] = None, 166 | output_tools: Optional[List[Dict]] = None, 167 | reflection_mode: bool = False 168 | ) -> Union[Dict, Tuple[Dict, List[Tuple[Dict[str, str], Dict, List[Dict], List[Dict], List[Dict]]]]]: 169 | """ 170 | Run multiple agents in parallel and coordinate their responses. 
171 | 172 | Args: 173 | task: The task to complete 174 | agents: List of dictionaries, each containing 'model', 'api_key', and 'api_url' 175 | coordinator: Dictionary containing 'model', 'api_key', and 'api_url' for the coordinating model 176 | verbose: Whether to show detailed output 177 | chain_store_api_key: API key for chain store if using 178 | max_workers: Maximum number of parallel workers 179 | return_reasoning: Whether to return the full reasoning chains 180 | max_reasoning_steps: Maximum steps for each agent 181 | coordinator_max_steps: Maximum steps for the coordinator (can be different from agents) 182 | wolfram_app_id: Wolfram Alpha app ID if using 183 | temperature: Default temperature for the model if using 184 | top_p: Top p for the model if using 185 | max_tokens: Maximum number of tokens for the model if using 186 | image: Optional image to pass to the model if using 187 | output_tools: Optional list of output tools for the model if using 188 | reflection_mode: Whether to enable reflection mode for all agents 189 | """ 190 | # Reinitialize colorama for the main process 191 | init(autoreset=True) 192 | 193 | if verbose: 194 | print(f"\n{Fore.MAGENTA}Starting Ensemble for task:{Style.RESET_ALL}") 195 | print(f"{task}\n") 196 | print(f"{Fore.MAGENTA}Using {len(agents)} agents in parallel{Style.RESET_ALL}") 197 | print(f"{Fore.MAGENTA}Default temperature: {temperature}{Style.RESET_ALL}") 198 | for agent in agents: 199 | if 'temperature' in agent: 200 | print(f"{Fore.MAGENTA}Temperature for {agent['model']}: {agent['temperature']}{Style.RESET_ALL}") 201 | 202 | if verbose and reflection_mode: 203 | print(f"{Fore.MAGENTA}Reflection mode: {Style.RESET_ALL}Enabled for all agents") 204 | 205 | # Run all agents in parallel with max steps 206 | agent_results = run_agents_parallel( 207 | task, 208 | agents, 209 | verbose, 210 | chain_store_api_key, 211 | max_workers, 212 | max_reasoning_steps, 213 | wolfram_app_id, 214 | temperature, 215 | top_p, 216 | max_tokens, 217 | image, 218 | output_tools, 219 | reflection_mode 220 | ) 221 | 222 | # Format results for coordinator 223 | formatted_results = format_agent_results(agent_results) 224 | 225 | # Create coordinator prompt 226 | coordinator_task = f"""You are a coordinator model tasked with analyzing multiple AI responses to the following question: 227 | 228 | Question: {task} 229 | 230 | 231 | {formatted_results} 232 | 233 | 234 | Please analyze all responses and their reasoning steps carefully. Consider: 235 | 1. The logical soundness of each approach 236 | 2. The thoroughness of the reasoning 237 | 3. The correctness of calculations and tool usage 238 | 4. The clarity and completeness of the final response 239 | 240 | Based on your analysis, synthesize these responses into a single, high-quality response to the question. It is crucial to critically evaluate the information provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the given answers but should offer a refined, accurate, and comprehensive reply to the question. Ensure your response is well-structured, coherent, and adheres to the highest standards of accuracy and reliability. 
Also remember that the user is only going to see your final answer, so make sure it's complete and self-contained, and actually answers the question.""" 241 | 242 | # Get coordinator's response 243 | if verbose: 244 | print(f"\n{Fore.CYAN}Running coordinator model: {Style.RESET_ALL}{coordinator['model']}") 245 | 246 | coordinator_response, _, _, _ = complete_reasoning_task( 247 | task=coordinator_task, 248 | api_key=coordinator['api_key'], 249 | model=coordinator['model'], 250 | api_url=coordinator['api_url'], 251 | verbose=verbose, 252 | chain_store_api_key=None, 253 | max_reasoning_steps=coordinator_max_steps, 254 | wolfram_app_id=wolfram_app_id, 255 | temperature=temperature, 256 | top_p=top_p, 257 | max_tokens=max_tokens, 258 | image=image, 259 | output_tools=output_tools, 260 | reflection_mode=reflection_mode 261 | ) 262 | 263 | if return_reasoning: 264 | return coordinator_response, agent_results 265 | return coordinator_response 266 | 267 | # Alias for backward compatibility 268 | run_mixture_of_agents = ensemble -------------------------------------------------------------------------------- /planner.py: -------------------------------------------------------------------------------- 1 | from typing import List, Dict, Optional 2 | from colorama import Fore, Style 3 | import json 4 | 5 | def format_tools_for_context(tools: List[Dict]) -> str: 6 | """Format tools list into a readable string for context.""" 7 | tools_str = "Available Tools:\n" 8 | for tool in tools: 9 | if tool.get('type') == 'function': 10 | func = tool['function'] 11 | tools_str += f"- {func['name']}: {func['description']}\n" 12 | 13 | # Add parameter details if they exist 14 | if 'parameters' in func and 'properties' in func['parameters']: 15 | tools_str += " Parameters:\n" 16 | for param_name, param_details in func['parameters']['properties'].items(): 17 | tools_str += f" - {param_name}: {param_details.get('description', 'No description')}\n" 18 | 19 | return tools_str 20 | 21 | def format_chain_for_planning( 22 | chain: Dict, 23 | include_tool_calls: bool = True 24 | ) -> str: 25 | """ 26 | Format a single chain into a concise summary focusing on key patterns and outcomes. 
27 | """ 28 | formatted = f"\nTask: {chain.get('task', 'Unknown task')}\n" 29 | 30 | # Add metadata if it exists 31 | if 'metadata' in chain: 32 | formatted += "Context:\n" 33 | for key, value in chain['metadata'].items(): 34 | formatted += f"- {key}: {value}\n" 35 | 36 | # Add tools that were available 37 | if 'thinking_tools' in chain: 38 | formatted += "\n" + format_tools_for_context(chain['thinking_tools']) 39 | 40 | formatted += "\nSteps taken:\n" 41 | for msg in chain.get('conversation_history', []): 42 | if msg['role'] == 'assistant': 43 | step = f"- {msg.get('content', '')}" 44 | 45 | # Include tool calls if requested and they exist 46 | if include_tool_calls and msg.get('tool_calls'): 47 | for tool_call in msg['tool_calls']: 48 | if tool_call['type'] == 'function': 49 | func = tool_call['function'] 50 | step += f"\n Tool used: {func['name']}" 51 | try: 52 | args = json.loads(func['arguments']) 53 | step += f"\n Arguments: {json.dumps(args, indent=2)}" 54 | except: 55 | step += f"\n Arguments: {func['arguments']}" 56 | 57 | formatted += step + "\n" 58 | # Include tool responses for context 59 | elif msg['role'] == 'tool': 60 | content = msg.get('content', '') 61 | first_line = content.split('\n')[0] if content else '' 62 | formatted += f" Result: {first_line}...\n" 63 | 64 | return formatted 65 | 66 | def generate_plan( 67 | task: str, 68 | similar_chains: List[Dict], 69 | current_tools: List[Dict], 70 | api_key: str, 71 | model: str, 72 | api_url: str, 73 | verbose: bool = False, 74 | metadata: Optional[Dict] = None 75 | ) -> str: 76 | """ 77 | Generate a plan of action based on similar chains from memory. 78 | Takes into account available tools and other context. 79 | """ 80 | from call_ai import send_message_to_api 81 | 82 | if verbose: 83 | print(f"\n{Fore.CYAN}Extracting patterns from {len(similar_chains)} similar chains...{Style.RESET_ALL}") 84 | # Print the tasks of the similar chains 85 | for i, chain in enumerate(similar_chains, 1): 86 | print(f"Example {i}: {chain.get('task', 'Unknown task')}") 87 | 88 | # Format current context 89 | current_context = f"Current Task: {task}\n" 90 | if metadata: 91 | current_context += "Current Context:\n" 92 | for key, value in metadata.items(): 93 | current_context += f"- {key}: {value}\n" 94 | current_context += "\n" + format_tools_for_context(current_tools) 95 | 96 | # Format similar chains 97 | examples_context = "" 98 | for i, chain in enumerate(similar_chains, 1): 99 | examples_context += f"\nExample {i}:" 100 | examples_context += format_chain_for_planning(chain) 101 | 102 | # Create planning prompt 103 | planning_messages = [ 104 | { 105 | 'role': 'system', 106 | 'content': ( 107 | "You are an expert at breaking down complex tasks into clear steps and leveraging available tools effectively. " 108 | "Focus on providing strategic guidance about HOW to approach problems rather than specific solutions. 
" 109 | "Key aspects to consider:\n" 110 | "- How to break the problem into manageable steps\n" 111 | "- Which tools would be most helpful at each stage\n" 112 | "- How to validate progress and handle potential issues\n" 113 | "- What patterns from past experiences could be applied" 114 | ) 115 | }, 116 | { 117 | 'role': 'user', 118 | 'content': "[REDACTED]" # Make the AI think there was an example input here — we're just trying to teach it how to generate a solid plan 119 | }, 120 | { 121 | 'role': 'assistant', 122 | 'content': """For the current task of designing an generalist AI search agent that uses OpenAI-compatible APIs, we can learn from the example where we built a LLM-based voice chatbot. 123 | 124 | For API integration, we successfully used the OpenRouter endpoint (https://openrouter.ai/api/v1/chat/completions) with these key parameters: 125 | - model: "meta-llama/Meta-Llama-3-8B-Instruct" 126 | - messages: [{"role": "user", "content": "Hello, how are you?"}] 127 | - tools: [] 128 | - max_tokens: 1000 129 | - temperature: 0.7 130 | - top_p: 1.0 131 | 132 | One key learning was about model selection - while the chatbot needed low latency, an agent typically benefits from a more capable model since response time is less critical. 133 | 134 | We also discovered important lessons about prompt engineering. Our experience showed that shorter, precise prompts consistently outperformed longer ones. The initial iterations suffered from vague prompting that led to unfocused responses. 135 | 136 | A particularly effective pattern we uncovered was using function calling to enable tool usage. This approach could be valuable for integrating search capabilities, particularly by combining function calling with SERP APIs for web access. 137 | """ 138 | }, 139 | { 140 | 'role': 'user', 141 | 'content': ( 142 | f"{current_context}\n" 143 | f"Similar Examples:{examples_context}\n\n" 144 | "Based on these examples and the available tools/resources, outline a strategic approach for this task:\n" 145 | "1. How would you break this down into clear steps?\n" 146 | "2. Which tools (and, if applicable, which libraries) would be most valuable at each stage?\n" 147 | "3. What key checkpoints or validation should be included?\n" 148 | "4. What patterns from similar past tasks could guide the approach?\n\n" 149 | "Focus on the process and methodology rather than specific implementation details.\n" 150 | "Keep it concise and super high-level, like you're having a quick chat with a colleague. Maximum 200 words." 151 | ) 152 | } 153 | ] 154 | 155 | if verbose: 156 | print(f"{Fore.CYAN}Analyzing patterns and generating plan...{Style.RESET_ALL}") 157 | 158 | try: 159 | response = send_message_to_api( 160 | task, 161 | planning_messages, 162 | api_key, 163 | [], # No tools needed for planning 164 | model, 165 | temperature=0.7, 166 | top_p=1.0, 167 | max_tokens=1000, # Increased for more detailed plans 168 | api_url=api_url, 169 | verbose=verbose 170 | ) 171 | 172 | plan = response.get('content', '') 173 | 174 | if verbose: 175 | print(f"\n{Fore.GREEN}Generated Plan:{Style.RESET_ALL}") 176 | print(plan) 177 | 178 | return plan 179 | 180 | except Exception as e: 181 | if verbose: 182 | print(f"\n{Fore.RED}Error generating plan: {str(e)}{Style.RESET_ALL}") 183 | return "Failed to generate plan from similar examples." 
-------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | anyio==4.6.2.post1 2 | backports.tarfile==1.2.0 3 | blinker==1.9.0 4 | certifi==2024.8.30 5 | charset-normalizer==3.4.0 6 | click==8.1.7 7 | colorama==0.4.6 8 | contourpy==1.3.0 9 | cycler==0.12.1 10 | e2b_code_interpreter==1.0.1 11 | exceptiongroup==1.2.2 12 | Flask==3.0.3 13 | fonttools==4.54.1 14 | google_search_results==2.4.2 15 | h11==0.14.0 16 | httpcore==1.0.6 17 | httpx==0.27.2 18 | idna==3.10 19 | importlib_metadata==8.5.0 20 | importlib_resources==6.4.5 21 | itsdangerous==2.2.0 22 | jaraco.context==6.0.1 23 | Jinja2==3.1.4 24 | joblib==1.4.2 25 | kiwisolver==1.4.7 26 | MarkupSafe==3.0.2 27 | matplotlib==3.9.2 28 | more-itertools==10.5.0 29 | mpmath==1.3.0 30 | multidict==6.1.0 31 | numpy==2.0.2 32 | packaging==24.2 33 | pandas==2.2.3 34 | pillow==11.0.0 35 | pyparsing==3.2.0 36 | python-dateutil==2.9.0.post0 37 | python-dotenv==1.0.1 38 | pytz==2024.2 39 | requests==2.32.3 40 | scikit-learn==1.5.2 41 | scipy==1.13.1 42 | seaborn==0.13.2 43 | six==1.16.0 44 | sniffio==1.3.1 45 | sympy==1.13.3 46 | threadpoolctl==3.5.0 47 | typing_extensions==4.12.2 48 | tzdata==2024.2 49 | urllib3==2.2.3 50 | Werkzeug==3.1.3 51 | wolframalpha==5.1.3 52 | xmltodict==0.14.2 53 | zipp==3.21.0 54 | -------------------------------------------------------------------------------- /tools.py: -------------------------------------------------------------------------------- 1 | import os 2 | import requests 3 | from typing import Dict, Any, List, Union, Optional 4 | import sys 5 | from io import StringIO 6 | import traceback 7 | from contextlib import redirect_stdout, redirect_stderr 8 | import json 9 | import wolframalpha 10 | import numpy as np 11 | import pandas as pd 12 | import matplotlib.pyplot as plt 13 | import seaborn as sns 14 | import sympy 15 | import scipy 16 | import sklearn 17 | from sympy import symbols, solve, simplify 18 | from scipy import stats 19 | from sklearn import preprocessing 20 | import math 21 | from e2b_code_interpreter import Sandbox 22 | from serpapi import GoogleSearch 23 | from dotenv import load_dotenv 24 | from urllib.parse import quote 25 | load_dotenv() 26 | 27 | serpapi_api_key = os.environ.get("SERPAPI_API_KEY") 28 | 29 | # Dictionary of interpreter states, keyed by task hash 30 | interpreter_states = {} 31 | 32 | def get_task_hash(task: str) -> str: 33 | """Generate a unique hash for a task.""" 34 | import hashlib 35 | return hashlib.md5(task.encode()).hexdigest() 36 | 37 | def clear_interpreter_state(task: str = None): 38 | """ 39 | Clear the interpreter state. 40 | If task is provided, only clear that task's state. 41 | If no task is provided, clear all states. 42 | """ 43 | global interpreter_states 44 | if task: 45 | task_hash = get_task_hash(task) 46 | if task_hash in interpreter_states: 47 | del interpreter_states[task_hash] 48 | else: 49 | interpreter_states = {} 50 | 51 | def python_interpreter(code: str, task: str, timeout: int = 10, sandbox: Optional[Sandbox] = None) -> str: 52 | """ 53 | Safely execute Python code in a restricted environment. 54 | Maintains separate state for each task. 
55 | """ 56 | if sandbox is None: 57 | raise ValueError("E2B Sandbox is required for Python code execution but none was provided.") 58 | 59 | print(f"Executing code:\n{code}") 60 | execution = sandbox.run_code( 61 | code, 62 | # timeout=timeout, # Timeout to wait for the whole request to complete 63 | on_stdout=lambda x: print('[stdout]', x), 64 | on_stderr=lambda x: print('[stderr]', x) 65 | ) 66 | 67 | if execution.error: 68 | e = execution.error 69 | 70 | error_msg = ( 71 | f"Error executing code: {e.value}\n" 72 | f"Error type: {type(e.name)}\n" 73 | f"Traceback:\n{e.traceback}\n" 74 | "\nDebugging Suggestions:\n" 75 | "1. Add print statements to debug the issue\n" 76 | "2. Use assertions to validate inputs and outputs\n" 77 | "3. Check variable types with print(type(var))\n" 78 | "4. For numerical computations, verify inputs are numbers\n" 79 | "5. For symbolic math, ensure variables are properly defined with symbols()\n" 80 | "\nNote: Plotting is currently not supported. Instead of visualizing data, consider:\n" 81 | "1. Printing key numerical results\n" 82 | "2. Showing data statistics\n" 83 | "3. Printing array slices or samples\n" 84 | "\nAvailable packages:\n" 85 | "- numpy (np): Numerical computing\n" 86 | "- pandas (pd): Data manipulation\n" 87 | "- scipy: Scientific computing\n" 88 | "- sklearn: Machine learning" 89 | ) 90 | return error_msg 91 | 92 | result = [] 93 | 94 | # Results are the output of the code execution besides stdout and stderr 95 | # Can be text, PNG, JPG, JSON, html, markdown, etc. 96 | # Results are based on executing code inside the headless Jupyter notebook 97 | # that's running inside the sandbox. 98 | # The same way, you'd get result from a Jupyter notebook cell, you get results here. 99 | # That means any display() calls in the code will be captured as a result, 100 | # and also the last expression in the code, if there is one. 101 | code_exec_results = execution.results 102 | for ce_result in code_exec_results: 103 | print(ce_result.formats()) # Raw data of results 104 | # if 'png' in ce_result.formats: 105 | # Handle PNG images 106 | # if 'json' in ce_result.formats: 107 | # Handle JSON 108 | # ... 109 | # 110 | # Text is always present for every result. 111 | result.append(ce_result.text) 112 | 113 | stdout = execution.logs.stdout 114 | stderr = execution.logs.stderr 115 | if stdout: 116 | result.append(f"Output:\n{''.join(stdout)}") 117 | if stderr: 118 | result.append(f"Errors:\n{''.join(stderr)}") 119 | return "\n\n".join(result) if result else "Code executed successfully with no output." 120 | 121 | def find_datapoint_on_web( 122 | query: str, 123 | api_key: str = None, 124 | ) -> str: 125 | """ 126 | Perform web search using SERPAPI Google Search. 
127 | 128 | Args: 129 | query: The specific search query 130 | api_key: API key for SERPAPI 131 | 132 | 133 | Returns: 134 | str: Search results with citations 135 | """ 136 | try: 137 | # Configure the search 138 | search = GoogleSearch({ 139 | "q": query, 140 | "api_key": api_key, 141 | "num": 5 # Get top 5 results 142 | }) 143 | 144 | # Get the results 145 | results = search.get_dict() 146 | 147 | if "error" in results: 148 | return f"Error performing search: {results['error']}" 149 | 150 | # Format organic results 151 | formatted_results = [] 152 | 153 | if "organic_results" in results: 154 | for result in results["organic_results"]: 155 | title = result.get("title", "No title") 156 | snippet = result.get("snippet", "No description available") 157 | link = result.get("link", "No link available") 158 | formatted_results.append(f"Source: {title}\nSummary: {snippet}\nURL: {link}\n") 159 | 160 | if formatted_results: 161 | return "\n".join(formatted_results) 162 | else: 163 | return "No relevant results found for the query." 164 | 165 | except Exception as e: 166 | return f"Error performing web search: {str(e)}" 167 | 168 | def wolfram( 169 | query: str, 170 | wolfram_app_id: str, 171 | include_pods: Optional[List[str]] = None, # e.g., ["Result", "Solution", "Plot"] 172 | max_width: int = 1000 173 | ) -> str: 174 | """ 175 | Query Wolfram Alpha for computations, math, science, and knowledge. 176 | 177 | Args: 178 | query: The query to send to Wolfram Alpha 179 | wolfram_app_id: Your Wolfram Alpha app ID 180 | include_pods: List of pod names to include in result (None for all) 181 | max_width: Maximum width for plots/images 182 | 183 | Returns: 184 | str: Formatted response from Wolfram Alpha 185 | """ 186 | try: 187 | client = wolframalpha.Client(wolfram_app_id) 188 | res = client.query(query, width=max_width) 189 | 190 | # Format the response 191 | result = [] 192 | for pod in res.pods: 193 | # Skip if we're only interested in specific pods and this isn't one of them 194 | if include_pods and pod.title not in include_pods: 195 | continue 196 | 197 | if pod.title and pod.text: 198 | result.append(f"{pod.title}:\n{pod.text}") 199 | 200 | return "\n\n".join(result) if result else "No results found" 201 | 202 | except Exception as e: 203 | return f"Error querying Wolfram Alpha: {str(e)}" 204 | 205 | def get_webpage_content(url: str, jina_api_key: Optional[str] = None) -> str: 206 | """ 207 | Retrieve webpage content using Jina API. 208 | 209 | Args: 210 | url: The webpage URL to fetch content from 211 | jina_api_key: Jina API key for authentication 212 | 213 | Returns: 214 | str: The webpage content or error message 215 | """ 216 | if not jina_api_key: 217 | return "Error: Jina API key not provided" 218 | 219 | try: 220 | # URL encode the target URL and prepend Jina API endpoint 221 | encoded_url = quote(url, safe='') 222 | jina_url = f'https://r.jina.ai/{encoded_url}' 223 | 224 | headers = { 225 | 'Authorization': f'Bearer {jina_api_key}' 226 | } 227 | 228 | response = requests.get(jina_url, headers=headers, timeout=10) 229 | 230 | if response.status_code == 200: 231 | return response.text 232 | else: 233 | return f"Failed to retrieve content. 
Status code: {response.status_code}" 234 | 235 | except requests.RequestException as e: 236 | return f"Error fetching webpage content: {str(e)}" 237 | 238 | def execute_tool( 239 | tool_name: str, 240 | parameters: Dict[str, Any], 241 | task: str = None, 242 | api_key: str = None, 243 | model: str = None, 244 | api_url: str = None, 245 | wolfram_app_id: str = None, 246 | sandbox: Optional[Sandbox] = None, 247 | jina_api_key: str = None 248 | ) -> Any: 249 | """Execute the specified tool with the given parameters.""" 250 | tools = { 251 | "python": python_interpreter, 252 | "find_datapoint_on_web": find_datapoint_on_web, 253 | "wolfram": wolfram, 254 | } 255 | 256 | # Only add get_webpage_content tool if Jina API key is provided 257 | if jina_api_key: 258 | tools["get_webpage_content"] = get_webpage_content 259 | 260 | if tool_name not in tools: 261 | raise ValueError(f"Unknown tool: {tool_name}") 262 | 263 | tool_func = tools[tool_name] 264 | 265 | # Remove thread_id from parameters if it exists 266 | if 'thread_id' in parameters: 267 | del parameters['thread_id'] 268 | 269 | # Inject appropriate credentials and task 270 | if tool_name == "python": 271 | parameters = {**parameters, "task": task, "sandbox": sandbox} 272 | elif tool_name == "find_datapoint_on_web": 273 | parameters = {**parameters, "api_key": serpapi_api_key} 274 | elif tool_name == "wolfram": 275 | parameters = {**parameters, "wolfram_app_id": wolfram_app_id} 276 | elif tool_name == "get_webpage_content": 277 | parameters = {**parameters, "jina_api_key": jina_api_key} 278 | 279 | return tool_func(**parameters) --------------------------------------------------------------------------------
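# Minimal usage sketch for tools.execute_tool (illustrative only; the engine normally
# drives this function from parsed tool_calls). Requires E2B credentials; the task
# string and code snippet below are hypothetical.
#
#     from e2b_code_interpreter import Sandbox
#     from tools import execute_tool
#
#     sandbox = Sandbox(timeout=120)
#     try:
#         output = execute_tool(
#             "python",
#             {"code": "print(2 ** 10)"},
#             task="demo task",      # used to key per-task interpreter state
#             sandbox=sandbox,
#         )
#         print(output)              # expect the sandbox's stdout, e.g. "Output:\n1024"
#     finally:
#         sandbox.kill()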