├── DeepFlow3-19.pdf ├── LICENSE ├── README.md ├── __init__.py ├── core ├── __init__.py ├── _function_type_hints_utils.py ├── agent_core.py ├── data_types.py └── memory.py ├── interface ├── __init__.py ├── cli.py └── gradio_ui.py ├── models ├── __init__.py └── llm_models.py ├── prompts ├── code_agent.yaml └── toolcalling_agent.yaml ├── runtime ├── __init__.py ├── local_python_executor.py └── remote_executors.py ├── templates └── __init__.py ├── tools ├── __init__.py ├── default_tools.py ├── presets │ └── __init__.py ├── tool_validation.py ├── tools.py └── vision_web_browser.py └── utils ├── __init__.py ├── monitoring.py └── utils.py /DeepFlow3-19.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DeepFlowcc/DeepFlow/92a3ed17ccf54b91216e3f669eaf2ff396eaf68b/DeepFlow3-19.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DeepFlow 2 | 3 | **AI-Powered Multi-Agent Framework for Web3 Development** 4 | 5 | [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE) 6 | [![Python Version](https://img.shields.io/badge/python-3.8%2B-blue)](https://www.python.org/downloads/) 7 | [![Version](https://img.shields.io/badge/version-1.12.0.dev0-blue)](https://github.com/username/deepflow/releases) 8 | 9 | ## Overview 10 | 11 | DeepFlow is a sophisticated AI framework that combines multi-agent systems with Web3 capabilities, enabling intelligent code generation and automation. Built on top of the HuggingFace ecosystem, it provides a comprehensive suite of tools for building, debugging, and deploying AI-powered applications with blockchain integration. 
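Under the hood, DeepFlow describes tools to the model by converting ordinary Python functions — their type hints plus a Google-style docstring — into JSON tool schemas (see `core/_function_type_hints_utils.py` in the tree above). The snippet below is a simplified, self-contained sketch of that idea using only the standard library; it is illustrative, not DeepFlow's actual API, and `simple_json_schema` is a hypothetical name:

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python types to JSON-schema type names
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def simple_json_schema(func):
    """Build a minimal JSON-schema-style tool description from type hints.

    Only handles basic scalar types; the real framework also parses
    per-argument descriptions and enums out of the docstring.
    """
    hints = get_type_hints(func)
    hints.pop("return", None)  # chat templates usually ignore the return type
    signature = inspect.signature(func)

    # One schema property per type-hinted parameter
    properties = {name: {"type": _JSON_TYPES[hint]} for name, hint in hints.items()}

    # Parameters without a default value are required
    required = [
        name for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty
    ]

    return {
        "name": func.__name__,
        "description": (inspect.getdoc(func) or "").strip(),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def multiply(x: float, y: float) -> float:
    """Multiply two numbers."""
    return x * y

schema = simple_json_schema(multiply)
```

DeepFlow's bundled `get_json_schema` utility goes further: it also extracts each argument's description and optional `(choices: ...)` enums from the docstring, and raises an error when a description is missing.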
12 | 13 | ## Core Features 14 | 15 | ### 🤖 Advanced Agent System 16 | - **Multi-Step Agent Architecture** 17 | - Sophisticated planning and execution pipeline 18 | - State management with memory systems 19 | - Dynamic tool integration capabilities 20 | 21 | - **Specialized Agents** 22 | - `ToolCallingAgent`: Expert at utilizing external tools and APIs 23 | - `CodeAgent`: Specialized in code generation and execution 24 | - Support for custom agent implementations 25 | 26 | ### 🛠️ Comprehensive Tooling 27 | - **Built-in Tools** 28 | - Python code execution environment 29 | - Web3 integration tools 30 | - File system operations 31 | - Web search capabilities 32 | 33 | - **Extensible Tool System** 34 | - Custom tool development framework 35 | - Tool validation and safety checks 36 | - Rich type system for tool inputs/outputs 37 | 38 | ### 🧠 AI Model Integration 39 | - **Flexible Model Support** 40 | - Compatible with HuggingFace models 41 | - Support for custom model implementations 42 | - Structured prompt templates 43 | 44 | ### 📊 Memory & State Management 45 | - **Sophisticated Memory System** 46 | - Action tracking and history 47 | - Planning state management 48 | - Task contextualization 49 | 50 | ### 🌐 Web3 Features 51 | - **Blockchain Integration** 52 | - Wallet connectivity 53 | - Smart contract interaction 54 | - Transaction management 55 | 56 | ## Installation 57 | 58 | ### Prerequisites 59 | - Python 3.8 or higher 60 | - Git 61 | 62 | ### Quick Start 63 | 64 | 1. Install via pip: 65 | ```bash 66 | pip install deepflow 67 | ``` 68 | 69 | 2. Or install from source: 70 | ```bash 71 | git clone https://github.com/username/deepflow.git 72 | cd deepflow 73 | pip install -e . 
74 | ``` 75 | 76 | ## Usage Examples 77 | 78 | ### Basic Agent Usage 79 | ```python 80 | from deepflow import MultiStepAgent, Tool 81 | from deepflow.models import get_model 82 | 83 | # Initialize model and tools 84 | model = get_model("gpt-3.5-turbo") 85 | tools = [Tool(...)] # Add your tools 86 | 87 | # Create agent 88 | agent = MultiStepAgent( 89 | tools=tools, 90 | model=model, 91 | max_steps=20 92 | ) 93 | 94 | # Run a task 95 | result = agent.run("Create a simple web application") 96 | ``` 97 | 98 | ### Web3 Integration 99 | ```python 100 | from deepflow.web3 import Web3Agent 101 | from deepflow.tools import BlockchainTool 102 | 103 | # Initialize Web3 agent 104 | agent = Web3Agent( 105 | tools=[BlockchainTool()], 106 | model=model 107 | ) 108 | 109 | # Interact with blockchain 110 | result = agent.run("Deploy a smart contract") 111 | ``` 112 | 113 | ## Project Structure 114 | 115 | ``` 116 | deepflow/ 117 | ├── src/ 118 | │ ├── core/ # Core agent implementation 119 | │ ├── models/ # AI model integrations 120 | │ ├── tools/ # Tool implementations 121 | │ ├── runtime/ # Execution environments 122 | │ ├── interface/ # UI and CLI components 123 | │ ├── utils/ # Utility functions 124 | │ └── web3/ # Blockchain integrations 125 | ├── docs/ # Documentation 126 | └── tests/ # Test suite 127 | ``` 128 | 129 | ## Development 130 | 131 | ### Setting Up Development Environment 132 | 133 | 1. Create a virtual environment: 134 | ```bash 135 | python -m venv venv 136 | source venv/bin/activate # Linux/Mac 137 | # or 138 | venv\Scripts\activate # Windows 139 | ``` 140 | 141 | 2. Install development dependencies: 142 | ```bash 143 | pip install -r requirements-dev.txt 144 | ``` 145 | 146 | ### Running Tests 147 | ```bash 148 | pytest tests/ 149 | ``` 150 | 151 | ## Contributing 152 | 153 | We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details. 154 | 155 | 1. Fork the repository 156 | 2. 
Create your feature branch (`git checkout -b feature/amazing-feature`) 157 | 3. Commit your changes (`git commit -m 'Add amazing feature'`) 158 | 4. Push to the branch (`git push origin feature/amazing-feature`) 159 | 5. Open a Pull Request 160 | 161 | ## Documentation 162 | 163 | - [API Reference](docs/api.md) 164 | - [Architecture Guide](docs/architecture.md) 165 | - [Tool Development Guide](docs/tools.md) 166 | - [Web3 Integration Guide](docs/web3.md) 167 | 168 | ## License 169 | 170 | Licensed under the Apache License, Version 2.0 - see the [LICENSE](LICENSE) file for details. 171 | 172 | ## Contact & Support 173 | 174 | - **Documentation**: [https://deepflow.readthedocs.io](https://deepflow.readthedocs.io) 175 | - **Issues**: [GitHub Issues](https://github.com/username/deepflow/issues) 176 | - **Discord**: [Join our community](https://discord.gg/deepflow) 177 | - **Email**: support@deepflow.dev 178 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | 4 | """ 5 | DeepFlow Core Package 6 | ==================== 7 | 8 | This package provides a comprehensive framework for building intelligent agents that can 9 | perform complex tasks through a combination of planning, reasoning, and tool usage. 
10 | 11 | The package structure follows a modular design where each component handles specific 12 | functionality within the agent ecosystem: 13 | 14 | - agent_types: Defines data types that can be returned by agents 15 | - agents: Implements the core agent classes (MultiStepAgent, ToolCallingAgent, CodeAgent) 16 | - tools: Provides the base Tool class and utility functions for tool creation 17 | - models: Contains model classes for different LLM providers 18 | - memory: Implements memory structures for agent state management 19 | - monitoring: Provides logging and monitoring capabilities 20 | 21 | Version: 1.12.0.dev0 22 | 23 | DeepFlow@2025 24 | """ 25 | 26 | # Define the package version 27 | __version__ = "1.12.0.dev0" 28 | 29 | # Import all modules to make them available via the top-level package 30 | # This allows users to import directly from the package, e.g. from deepflow import MultiStepAgent 31 | from .agent_types import * # Import agent data type classes (AgentImage, AgentText, etc.) 32 | from .agents import * # Import agent implementation classes (MultiStepAgent, ToolCallingAgent, CodeAgent) 33 | from .default_tools import * # Import common pre-built tools (PythonInterpreter, WebSearch, etc.) 
34 | from .gradio_ui import * # Import UI components for Gradio integration 35 | from .local_python_executor import * # Import execution environment for Python code 36 | from .memory import * # Import memory-related classes for state management 37 | from .models import * # Import model interfaces for different providers 38 | from .monitoring import * # Import logging and monitoring utilities 39 | from .remote_executors import * # Import remote execution environments 40 | from .tools import * # Import base tool classes and tool utilities 41 | from .utils import * # Import general utility functions 42 | from .cli import * # Import command-line interface utilities 43 | -------------------------------------------------------------------------------- /core/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DeepFlowcc/DeepFlow/92a3ed17ccf54b91216e3f669eaf2ff396eaf68b/core/__init__.py -------------------------------------------------------------------------------- /core/_function_type_hints_utils.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | 4 | # Copyright 2025 The HuggingFace Inc. team. All rights reserved. 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | """This module contains utilities exclusively taken from `transformers` repository. 
18 | 19 | Since they are not specific to `transformers`, and `transformers` is a heavy dependency, these helpers have 20 | been duplicated here. 21 | 22 | TODO: move them to `huggingface_hub` to avoid code duplication. 23 | """ 24 | 25 | import inspect 26 | import json 27 | import re 28 | import types 29 | from copy import copy 30 | from typing import ( 31 | Any, 32 | Callable, 33 | Dict, 34 | List, 35 | Optional, 36 | Tuple, 37 | Union, 38 | get_args, 39 | get_origin, 40 | get_type_hints, 41 | ) 42 | 43 | from huggingface_hub.utils import is_torch_available 44 | 45 | from .utils import _is_pillow_available 46 | 47 | 48 | def get_imports(code: str) -> List[str]: 49 | """ 50 | Extracts all the libraries (not relative imports) that are imported in a piece of code. 51 | 52 | Args: 53 | code (`str`): Code text to inspect. 54 | 55 | Returns: 56 | `list[str]`: List of all packages required to use the input code. 57 | """ 58 | # Filter out try/except blocks so custom code can guard optional imports 59 | code = re.sub(r"\s*try\s*:.*?except.*?:", "", code, flags=re.DOTALL) 60 | 61 | # Filter out imports guarded by is_flash_attn_2_available to avoid import issues in CPU-only environments 62 | code = re.sub( 63 | r"if is_flash_attn[a-zA-Z0-9_]+available\(\):\s*(from flash_attn\s*.*\s*)+", 64 | "", 65 | code, 66 | flags=re.MULTILINE, 67 | ) 68 | 69 | # Imports of the form `import xxx` or `import xxx as yyy` 70 | imports = re.findall(r"^\s*import\s+(\S+?)(?:\s+as\s+\S+)?\s*$", code, flags=re.MULTILINE) 71 | # Imports of the form `from xxx import yyy` 72 | imports += re.findall(r"^\s*from\s+(\S+)\s+import", code, flags=re.MULTILINE) 73 | # Only keep the top-level module 74 | imports = [imp.split(".")[0] for imp in imports if not imp.startswith(".")] 75 | return list(set(imports)) 76 | 77 | 78 | class TypeHintParsingException(Exception): 79 | """Exception raised for errors in parsing type hints to generate JSON schemas""" 80 | 81 | 82 | class DocstringParsingException(Exception):
83 | """Exception raised for errors in parsing docstrings to generate JSON schemas""" 84 | 85 | 86 | def get_json_schema(func: Callable) -> Dict: 87 | """ 88 | This function generates a JSON schema for a given function, based on its docstring and type hints. This is 89 | mostly used for passing lists of tools to a chat template. The JSON schema contains the name and description of 90 | the function, as well as the names, types and descriptions for each of its arguments. `get_json_schema()` requires 91 | that the function has a docstring, and that each argument has a description in the docstring, in the standard 92 | Google docstring format shown below. It also requires that all the function arguments have a valid Python type hint. 93 | 94 | Although it is not required, a `Returns` block can also be added, which will be included in the schema. This is 95 | optional because most chat templates ignore the return value of the function. 96 | 97 | Args: 98 | func: The function to generate a JSON schema for. 99 | 100 | Returns: 101 | A dictionary containing the JSON schema for the function. 
102 | 103 | Examples: 104 | ```python 105 | >>> def multiply(x: float, y: float): 106 | >>> ''' 107 | >>> A function that multiplies two numbers 108 | >>> 109 | >>> Args: 110 | >>> x: The first number to multiply 111 | >>> y: The second number to multiply 112 | >>> ''' 113 | >>> return x * y 114 | >>> 115 | >>> print(get_json_schema(multiply)) 116 | { 117 | "name": "multiply", 118 | "description": "A function that multiplies two numbers", 119 | "parameters": { 120 | "type": "object", 121 | "properties": { 122 | "x": {"type": "number", "description": "The first number to multiply"}, 123 | "y": {"type": "number", "description": "The second number to multiply"} 124 | }, 125 | "required": ["x", "y"] 126 | } 127 | } 128 | ``` 129 | 130 | These schemas are generally used to generate tool descriptions for chat templates that 131 | support them, like so: 132 | 133 | ```python 134 | >>> from transformers import AutoTokenizer 135 | >>> from transformers.utils import get_json_schema 136 | >>> 137 | >>> def multiply(x: float, y: float): 138 | >>> ''' 139 | >>> A function that multiplies two numbers 140 | >>> 141 | >>> Args: 142 | >>> x: The first number to multiply 143 | >>> y: The second number to multiply 144 | >>> ''' 145 | >>> return x * y 146 | >>> 147 | >>> multiply_schema = get_json_schema(multiply) 148 | >>> tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01") 149 | >>> messages = [{"role": "user", "content": "What is 179 x 4571?"}] 150 | >>> formatted_chat = tokenizer.apply_chat_template( 151 | >>> messages, 152 | >>> tools=[multiply_schema], 153 | >>> chat_template="tool_use", 154 | >>> return_dict=True, 155 | >>> return_tensors="pt", 156 | >>> add_generation_prompt=True 157 | >>> ) 158 | >>> # The formatted chat can now be passed to model.generate() 159 | ``` 160 | 161 | Each argument description can also have an optional `(choices: ...)` block at the end, such as 162 | `(choices: ["tea", "coffee"])`, which will be 
parsed into an `enum` field in the schema. Note that this will 163 | only be parsed correctly if it is at the end of the line: 164 | 165 | ```python 166 | >>> def drink_beverage(beverage: str): 167 | >>> ''' 168 | >>> A function that drinks a beverage 169 | >>> 170 | >>> Args: 171 | >>> beverage: The beverage to drink (choices: ["tea", "coffee"]) 172 | >>> ''' 173 | >>> pass 174 | >>> 175 | >>> print(get_json_schema(drink_beverage)) 176 | ``` 177 | { 178 | 'name': 'drink_beverage', 179 | 'description': 'A function that drinks a beverage', 180 | 'parameters': { 181 | 'type': 'object', 182 | 'properties': { 183 | 'beverage': { 184 | 'type': 'string', 185 | 'enum': ['tea', 'coffee'], 186 | 'description': 'The beverage to drink' 187 | } 188 | }, 189 | 'required': ['beverage'] 190 | } 191 | } 192 | """ 193 | doc = inspect.getdoc(func) 194 | if not doc: 195 | raise DocstringParsingException( 196 | f"Cannot generate JSON schema for {func.__name__} because it has no docstring!" 197 | ) 198 | doc = doc.strip() 199 | main_doc, param_descriptions, return_doc = _parse_google_format_docstring(doc) 200 | 201 | json_schema = _convert_type_hints_to_json_schema(func) 202 | if (return_dict := json_schema["properties"].pop("return", None)) is not None: 203 | if return_doc is not None: # We allow a missing return docstring since most templates ignore it 204 | return_dict["description"] = return_doc 205 | for arg, schema in json_schema["properties"].items(): 206 | if arg not in param_descriptions: 207 | raise DocstringParsingException( 208 | f"Cannot generate JSON schema for {func.__name__} because the docstring has no description for the argument '{arg}'" 209 | ) 210 | desc = param_descriptions[arg] 211 | enum_choices = re.search(r"\(choices:\s*(.*?)\)\s*$", desc, flags=re.IGNORECASE) 212 | if enum_choices: 213 | schema["enum"] = [c.strip() for c in json.loads(enum_choices.group(1))] 214 | desc = enum_choices.string[: enum_choices.start()].strip() 215 | schema["description"] = desc 
216 | 217 | output = {"name": func.__name__, "description": main_doc, "parameters": json_schema} 218 | if return_dict is not None: 219 | output["return"] = return_dict 220 | return {"type": "function", "function": output} 221 | 222 | 223 | # Extracts the initial segment of the docstring, containing the function description 224 | description_re = re.compile(r"^(.*?)[\n\s]*(Args:|Returns:|Raises:|\Z)", re.DOTALL) 225 | # Extracts the Args: block from the docstring 226 | args_re = re.compile(r"\n\s*Args:\n\s*(.*?)[\n\s]*(Returns:|Raises:|\Z)", re.DOTALL) 227 | # Splits the Args: block into individual arguments 228 | args_split_re = re.compile( 229 | r""" 230 | (?:^|\n) # Match the start of the args block, or a newline 231 | \s*(\w+)\s*(?:\([^)]*\))?:\s* # Capture the argument name (ignore the type) and strip spacing 232 | (.*?)\s* # Capture the argument description, which can span multiple lines, and strip trailing spacing 233 | (?=\n\s*\w+:|\Z) # Stop when you hit the next argument or the end of the block 234 | """, 235 | re.DOTALL | re.VERBOSE, 236 | ) 237 | # Extracts the Returns: block from the docstring, if present. Note that most chat templates ignore the return type/doc! 238 | returns_re = re.compile(r"\n\s*Returns:\n\s*(.*?)[\n\s]*(Raises:|\Z)", re.DOTALL) 239 | 240 | 241 | def _parse_google_format_docstring( 242 | docstring: str, 243 | ) -> Tuple[Optional[str], Optional[Dict], Optional[str]]: 244 | """ 245 | Parses a Google-style docstring to extract the function description, 246 | argument descriptions, and return description. 247 | 248 | Args: 249 | docstring (str): The docstring to parse. 250 | 251 | Returns: 252 | The function description, arguments, and return description. 
253 | """ 254 | 255 | # Extract the sections 256 | description_match = description_re.search(docstring) 257 | args_match = args_re.search(docstring) 258 | returns_match = returns_re.search(docstring) 259 | 260 | # Clean and store the sections 261 | description = description_match.group(1).strip() if description_match else None 262 | docstring_args = args_match.group(1).strip() if args_match else None 263 | returns = returns_match.group(1).strip() if returns_match else None 264 | 265 | # Parsing the arguments into a dictionary 266 | if docstring_args is not None: 267 | docstring_args = "\n".join([line for line in docstring_args.split("\n") if line.strip()]) # Remove blank lines 268 | matches = args_split_re.findall(docstring_args) 269 | args_dict = {match[0]: re.sub(r"\s*\n+\s*", " ", match[1].strip()) for match in matches} 270 | else: 271 | args_dict = {} 272 | 273 | return description, args_dict, returns 274 | 275 | 276 | def _convert_type_hints_to_json_schema(func: Callable, error_on_missing_type_hints: bool = True) -> Dict: 277 | type_hints = get_type_hints(func) 278 | signature = inspect.signature(func) 279 | 280 | properties = {} 281 | for param_name, param_type in type_hints.items(): 282 | properties[param_name] = _parse_type_hint(param_type) 283 | 284 | required = [] 285 | for param_name, param in signature.parameters.items(): 286 | if param.annotation == inspect.Parameter.empty and error_on_missing_type_hints: 287 | raise TypeHintParsingException(f"Argument {param.name} is missing a type hint in function {func.__name__}") 288 | if param_name not in properties: 289 | properties[param_name] = {} 290 | 291 | if param.default == inspect.Parameter.empty: 292 | required.append(param_name) 293 | else: 294 | properties[param_name]["nullable"] = True 295 | 296 | schema = {"type": "object", "properties": properties} 297 | if required: 298 | schema["required"] = required 299 | 300 | return schema 301 | 302 | 303 | def _parse_type_hint(hint: str) -> Dict: 304 | origin 
= get_origin(hint) 305 | args = get_args(hint) 306 | 307 | if origin is None: 308 | try: 309 | return _get_json_schema_type(hint) 310 | except KeyError: 311 | raise TypeHintParsingException( 312 | "Couldn't parse this type hint, likely due to a custom class or object: ", 313 | hint, 314 | ) 315 | 316 | elif origin is Union or (hasattr(types, "UnionType") and origin is types.UnionType): 317 | # Recurse into each of the subtypes in the Union, except None, which is handled separately at the end 318 | subtypes = [_parse_type_hint(t) for t in args if t is not type(None)] 319 | if len(subtypes) == 1: 320 | # A single non-null type can be expressed directly 321 | return_dict = subtypes[0] 322 | elif all(isinstance(subtype["type"], str) for subtype in subtypes): 323 | # A union of basic types can be expressed as a list in the schema 324 | return_dict = {"type": sorted([subtype["type"] for subtype in subtypes])} 325 | else: 326 | # A union of more complex types requires "anyOf" 327 | return_dict = {"anyOf": subtypes} 328 | if type(None) in args: 329 | return_dict["nullable"] = True 330 | return return_dict 331 | 332 | elif origin is list: 333 | if not args: 334 | return {"type": "array"} 335 | else: 336 | # Lists can only have a single type argument, so recurse into it 337 | return {"type": "array", "items": _parse_type_hint(args[0])} 338 | 339 | elif origin is tuple: 340 | if not args: 341 | return {"type": "array"} 342 | if len(args) == 1: 343 | raise TypeHintParsingException( 344 | f"The type hint {str(hint).replace('typing.', '')} is a Tuple with a single element, which " 345 | "we do not automatically convert to JSON schema as it is rarely necessary. If this input can contain " 346 | "more than one element, we recommend " 347 | "using a List[] type instead, or if it really is a single element, remove the Tuple[] wrapper and just " 348 | "pass the element directly." 349 | ) 350 | if ... in args: 351 | raise TypeHintParsingException( 352 | "Conversion of '...' 
is not supported in Tuple type hints. " 353 | "Use List[] types for variable-length" 354 | " inputs instead." 355 | ) 356 | return {"type": "array", "prefixItems": [_parse_type_hint(t) for t in args]} 357 | 358 | elif origin is dict: 359 | # The JSON equivalent to a dict is 'object', which mandates that all keys are strings 360 | # However, we can specify the type of the dict values with "additionalProperties" 361 | out = {"type": "object"} 362 | if len(args) == 2: 363 | out["additionalProperties"] = _parse_type_hint(args[1]) 364 | return out 365 | 366 | raise TypeHintParsingException("Couldn't parse this type hint, likely due to a custom class or object: ", hint) 367 | 368 | 369 | _BASE_TYPE_MAPPING = { 370 | int: {"type": "integer"}, 371 | float: {"type": "number"}, 372 | str: {"type": "string"}, 373 | bool: {"type": "boolean"}, 374 | Any: {"type": "any"}, 375 | types.NoneType: {"type": "null"}, 376 | } 377 | 378 | 379 | def _get_json_schema_type(param_type: str) -> Dict[str, str]: 380 | if param_type in _BASE_TYPE_MAPPING: 381 | return copy(_BASE_TYPE_MAPPING[param_type]) 382 | if str(param_type) == "Image" and _is_pillow_available(): 383 | from PIL.Image import Image 384 | 385 | if param_type == Image: 386 | return {"type": "image"} 387 | if str(param_type) == "Tensor" and is_torch_available(): 388 | from torch import Tensor 389 | 390 | if param_type == Tensor: 391 | return {"type": "audio"} 392 | return {"type": "object"} 393 | -------------------------------------------------------------------------------- /core/data_types.py: -------------------------------------------------------------------------------- 1 | """ 2 | Agent Data Types Module 3 | ===================== 4 | 5 | This module defines specialized data types that can be returned by agents. 6 | These types provide consistent interfaces for handling different kinds of data 7 | like text, images, and audio, ensuring proper serialization, deserialization, 8 | and display capabilities. 
9 | 10 | DeepFlow@2025 11 | """ 12 | 13 | # Standard library imports 14 | import logging # For logging errors and information 15 | import os # For filesystem operations 16 | import pathlib # For path manipulation with advanced features 17 | import tempfile # For creating temporary files and directories 18 | import uuid # For generating unique identifiers 19 | from io import BytesIO # For handling binary data in memory 20 | 21 | # Third-party imports 22 | import numpy as np # For numerical operations on arrays 23 | import requests # For HTTP requests 24 | from huggingface_hub.utils import is_torch_available # Check if PyTorch is installed 25 | from PIL import Image # For image processing 26 | from PIL.Image import Image as ImageType # The PIL Image class type 27 | 28 | # Local imports 29 | from .utils import _is_package_available # Utility to check if a package is available 30 | 31 | 32 | # Set up module-level logger 33 | logger = logging.getLogger(__name__) 34 | 35 | 36 | class AgentType: 37 | """ 38 | Abstract base class for agent-returned data types. 39 | 40 | This class defines the interface for all agent data types, enabling them to: 41 | 1. Behave as their native type (string, image, etc.) 42 | 2. Provide string representation via str(obj) 43 | 3. Display correctly in interactive environments like Jupyter notebooks 44 | 45 | All specialized agent data types should inherit from this class. 46 | """ 47 | 48 | def __init__(self, value): 49 | """ 50 | Initialize with a native value. 51 | 52 | Args: 53 | value: The native value to wrap (string, image, etc.) 54 | """ 55 | # Store the original value 56 | self._value = value 57 | 58 | def __str__(self): 59 | """ 60 | Return string representation via the to_string() method. 61 | 62 | This allows using str(obj) on any AgentType object. 63 | """ 64 | # Delegate to the to_string method which subclasses must implement 65 | return self.to_string() 66 | 67 | def to_raw(self): 68 | """ 69 | Return the raw native value. 
70 | 71 | This method should be overridden by subclasses to return their 72 | specific native value type. 73 | 74 | Returns: 75 | The native value in its original form. 76 | """ 77 | # Log an error since this base method shouldn't be called directly 78 | logger.error( 79 | "This is a raw AgentType of unknown type. Display in notebooks and string conversion will be unreliable" 80 | ) 81 | # Return the stored value anyway as fallback 82 | return self._value 83 | 84 | def to_string(self) -> str: 85 | """ 86 | Convert the value to a string representation. 87 | 88 | This method should be overridden by subclasses to provide 89 | type-specific string conversion. 90 | 91 | Returns: 92 | str: String representation of the value. 93 | """ 94 | # Log an error since this base method shouldn't be called directly 95 | logger.error( 96 | "This is a raw AgentType of unknown type. Display in notebooks and string conversion will be unreliable" 97 | ) 98 | # Convert the value to string as fallback 99 | return str(self._value) 100 | 101 | 102 | class AgentText(AgentType, str): 103 | """ 104 | Text type returned by agents. 105 | 106 | Inherits from both AgentType and str to behave like a string 107 | while providing agent-specific capabilities. 108 | """ 109 | 110 | def to_raw(self): 111 | """ 112 | Return the raw string value. 113 | 114 | Returns: 115 | str: The original string value. 116 | """ 117 | # Simply return the stored value 118 | return self._value 119 | 120 | def to_string(self): 121 | """ 122 | Convert to string representation (already a string). 123 | 124 | Returns: 125 | str: The string representation. 126 | """ 127 | # Convert the stored value to string using str() 128 | # This ensures we always return a string, even if _value isn't already a string 129 | return str(self._value) 130 | 131 | 132 | class AgentImage(AgentType, ImageType): 133 | """ 134 | Image type returned by agents. 
135 | 136 | Inherits from both AgentType and PIL.Image to behave like a PIL image 137 | while providing agent-specific capabilities for serialization and display. 138 | """ 139 | 140 | def __init__(self, value): 141 | """ 142 | Initialize with an image value. 143 | 144 | Args: 145 | value: Can be a PIL.Image, AgentImage, file path, bytes, tensor, or array 146 | """ 147 | # Initialize the parent AgentType class 148 | AgentType.__init__(self, value) 149 | # Initialize the PIL.Image class 150 | ImageType.__init__(self) 151 | 152 | # Initialize storage attributes to track different representations of the image 153 | self._image = None # For PIL.Image representation 154 | self._path = None # For file path representation 155 | self._tensor = None # For tensor representation 156 | 157 | # Handle different input types to populate the appropriate representation 158 | if isinstance(value, AgentImage): 159 | # Copy from another AgentImage instance 160 | self._image, self._path, self._tensor = value._image, value._path, value._tensor 161 | elif isinstance(value, ImageType): 162 | # Store PIL.Image directly 163 | self._image = value 164 | elif isinstance(value, bytes): 165 | # Convert bytes to PIL.Image using BytesIO 166 | self._image = Image.open(BytesIO(value)) 167 | elif isinstance(value, (str, pathlib.Path)): 168 | # Store file path as string or Path 169 | self._path = value 170 | elif is_torch_available(): 171 | # Only try to handle torch tensors if PyTorch is available 172 | import torch 173 | # Handle torch tensor 174 | if isinstance(value, torch.Tensor): 175 | self._tensor = value 176 | # Handle numpy array by converting to tensor 177 | if isinstance(value, np.ndarray): 178 | self._tensor = torch.from_numpy(value) 179 | 180 | # Ensure at least one representation is available 181 | if self._path is None and self._image is None and self._tensor is None: 182 | raise TypeError(f"Unsupported type for {self.__class__.__name__}: {type(value)}") 183 | 184 | def 
_ipython_display_(self, include=None, exclude=None): 185 | """ 186 | Display the image in Jupyter notebooks. 187 | 188 | This method is called automatically by IPython display mechanics 189 | when the object is the result of a cell execution. 190 | 191 | Args: 192 | include: Parameters to include (unused, for compatibility) 193 | exclude: Parameters to exclude (unused, for compatibility) 194 | """ 195 | # Import IPython display utilities here to avoid global dependency 196 | from IPython.display import Image, display 197 | 198 | # Display the image using IPython's display function 199 | # Convert to string first, which gives the file path 200 | display(Image(self.to_string())) 201 | 202 | def to_raw(self): 203 | """ 204 | Return the raw PIL.Image representation. 205 | 206 | Converts from any available representation (file path, tensor) 207 | to a PIL.Image if needed. 208 | 209 | Returns: 210 | PIL.Image: The image as a PIL Image object 211 | """ 212 | # If we already have the image in memory, return it 213 | if self._image is not None: 214 | return self._image 215 | 216 | # If we have a file path, load the image from disk 217 | if self._path is not None: 218 | # Load from file path and cache the result 219 | self._image = Image.open(self._path) 220 | return self._image 221 | 222 | # If we have a tensor, convert it to a PIL.Image 223 | if self._tensor is not None: 224 | # Convert tensor to numpy array on CPU 225 | array = self._tensor.cpu().detach().numpy() 226 | # Convert array to PIL.Image with pixel value scaling and type conversion 227 | return Image.fromarray((255 - array * 255).astype(np.uint8)) 228 | 229 | # If no representation is available, this will not be reached due to initialization check 230 | 231 | def to_string(self): 232 | """ 233 | Return a file path to the image. 234 | 235 | If the image is not already saved to a file, saves it to a temporary 236 | location and returns that path. 
237 | 238 | Returns: 239 | str: A file path to the image 240 | """ 241 | # If we already have a file path, return it 242 | if self._path is not None: 243 | return self._path 244 | 245 | # If we have a PIL.Image, save it to a temporary file 246 | if self._image is not None: 247 | # Create a temporary directory 248 | temp_dir = tempfile.mkdtemp() 249 | # Generate a unique filename 250 | self._path = os.path.join(temp_dir, str(uuid.uuid4()) + ".png") 251 | # Save the image to the temporary path 252 | self._image.save(self._path, format="png") 253 | return self._path 254 | 255 | # If we have a tensor, convert it to an image and save 256 | if self._tensor is not None: 257 | # Convert tensor to numpy array on CPU 258 | array = self._tensor.cpu().detach().numpy() 259 | # Convert array to PIL.Image 260 | img = Image.fromarray((255 - array * 255).astype(np.uint8)) 261 | # Create a temporary directory 262 | temp_dir = tempfile.mkdtemp() 263 | # Generate a unique filename 264 | self._path = os.path.join(temp_dir, str(uuid.uuid4()) + ".png") 265 | # Save the image to the temporary path 266 | img.save(self._path, format="png") 267 | return self._path 268 | 269 | # If no representation is available, this will not be reached due to initialization check 270 | 271 | def save(self, output_path, format: str = None, **params): 272 | """ 273 | Save the image to a file. 274 | 275 | Args: 276 | output_path: Path where the image should be saved 277 | format: Image format (e.g. 'png', 'jpg') 278 | **params: Additional parameters for PIL.Image.save 279 | """ 280 | # Get the raw PIL.Image representation 281 | img = self.to_raw() 282 | # Save the image to the specified path with given format and parameters 283 | img.save(output_path, format=format, **params) 284 | 285 | 286 | class AgentAudio(AgentType, str): 287 | """ 288 | Audio type returned by agents. 289 | 290 | Provides functionality for handling audio data with consistent 291 | interfaces for serialization and playback. 
292 | """ 293 | 294 | def __init__(self, value, sample_rate=16_000): 295 | """ 296 | Initialize with audio value. 297 | 298 | Args: 299 | value: Can be an AgentAudio, file path, bytes, or numpy array 300 | sample_rate: Sample rate of the audio in Hz (default: 16000) 301 | """ 302 | # Initialize the parent AgentType class 303 | AgentType.__init__(self, value) 304 | 305 | # Initialize storage attributes for different representations 306 | self._audio = None # For numpy array data 307 | self._path = None # For file path 308 | self._sample_rate = sample_rate # Store the sample rate 309 | 310 | # Handle different input types 311 | if isinstance(value, AgentAudio): 312 | # Copy from another AgentAudio instance 313 | self._audio, self._path, self._sample_rate = value._audio, value._path, value._sample_rate 314 | elif isinstance(value, (str, pathlib.Path)): 315 | # Store file path as string 316 | self._path = str(value) 317 | elif isinstance(value, bytes): 318 | # Save bytes to a temporary file 319 | temp_dir = tempfile.mkdtemp() 320 | self._path = os.path.join(temp_dir, str(uuid.uuid4()) + ".wav") 321 | # Write the bytes to the file 322 | with open(self._path, "wb") as f: 323 | f.write(value) 324 | elif _is_package_available("numpy") and isinstance(value, np.ndarray): 325 | # Store numpy array directly 326 | self._audio = value 327 | else: 328 | # Raise an error for unsupported types 329 | raise TypeError(f"Unsupported type for {self.__class__.__name__}: {type(value)}") 330 | 331 | def _ipython_display_(self, include=None, exclude=None): 332 | """ 333 | Display the audio in Jupyter notebooks with playback controls. 334 | 335 | This method is called automatically by IPython display mechanics 336 | when the object is the result of a cell execution. 
337 | 338 | Args: 339 | include: Parameters to include (unused, for compatibility) 340 | exclude: Parameters to exclude (unused, for compatibility) 341 | """ 342 | # Import IPython display utilities here to avoid global dependency 343 | from IPython.display import Audio, display 344 | 345 | # Display the audio with the appropriate sample rate 346 | display(Audio(self.to_string(), rate=self._sample_rate)) 347 | 348 | def to_raw(self): 349 | """ 350 | Return the raw numpy array representation of the audio. 351 | 352 | Converts from any available representation (file path) to 353 | a numpy array if needed. 354 | 355 | Returns: 356 | numpy.ndarray: The audio data as a numpy array 357 | 358 | Raises: 359 | ValueError: If audio cannot be converted to raw format 360 | """ 361 | # If we already have the audio array in memory, return it 362 | if self._audio is not None: 363 | return self._audio 364 | 365 | # If we have a file path, load the audio from disk 366 | if self._path is not None: 367 | # Load the audio file using soundfile 368 | import soundfile as sf 369 | 370 | # Read the audio file and get both data and sample rate 371 | self._audio, self._sample_rate = sf.read(self._path) 372 | return self._audio 373 | 374 | # If no valid representation is available 375 | raise ValueError("Could not convert audio to raw format") 376 | 377 | def to_string(self): 378 | """ 379 | Return a file path to the audio. 380 | 381 | If the audio is not already saved to a file, saves it to a temporary 382 | location and returns that path. 
383 | 384 | Returns: 385 | str: A file path to the audio 386 | 387 | Raises: 388 | ValueError: If audio cannot be converted to string format 389 | """ 390 | # If we already have a file path, return it 391 | if self._path is not None: 392 | return self._path 393 | 394 | # If we have a numpy array, save it to a temporary file 395 | if self._audio is not None: 396 | # Check if soundfile is available 397 | if not _is_package_available("soundfile"): 398 | raise ImportError( 399 | "The soundfile package is required to save audio files. Please install it with `pip install soundfile`." 400 | ) 401 | 402 | # Import soundfile for audio file handling 403 | import soundfile as sf 404 | 405 | # Create a temporary directory and file 406 | temp_dir = tempfile.mkdtemp() 407 | self._path = os.path.join(temp_dir, str(uuid.uuid4()) + ".wav") 408 | 409 | # Write the audio data to the file 410 | sf.write(self._path, self._audio, self._sample_rate) 411 | return self._path 412 | 413 | # If no valid representation is available 414 | raise ValueError("Could not convert audio to string format") 415 | 416 | 417 | # Mapping of type names to agent type classes for type conversion 418 | _AGENT_TYPE_MAPPING = {"string": AgentText, "image": AgentImage, "audio": AgentAudio} 419 | 420 | 421 | def handle_agent_input_types(*args, **kwargs): 422 | """ 423 | Normalize agent inputs to their raw formats. 424 | 425 | This function prepares input arguments for processing by tools or models 426 | by converting agent types to their raw native values. 
427 | 
428 |     Args:
429 |         *args: Positional arguments to normalize
430 |         **kwargs: Keyword arguments to normalize
431 | 
432 |     Returns:
433 |         tuple: A tuple containing (args, kwargs) with normalized values
434 |     """
435 |     # Currently a pass-through function - the raw types are used directly
436 |     # This is a placeholder for potential future pre-processing
437 |     return args, kwargs
438 | 
439 | 
440 | def handle_agent_output_types(output, output_type=None):
441 |     """
442 |     Normalize outputs to expected agent types.
443 | 
444 |     This function ensures that outputs from tools or models are properly
445 |     wrapped in appropriate agent types based on the expected output_type.
446 | 
447 |     Args:
448 |         output: The raw output value to process
449 |         output_type: Expected output type string ('image', 'audio', 'string')
450 | 
451 |     Returns:
452 |         An agent type wrapped value or the original output if no wrapping is needed
453 |     """
454 |     # If output_type is specified, wrap the output in the appropriate agent type
455 |     if output_type == "image" and not isinstance(output, AgentImage):
456 |         # Wrap output in AgentImage if expected type is 'image'
457 |         return AgentImage(output)
458 |     elif output_type == "audio" and not isinstance(output, AgentAudio):
459 |         # Wrap output in AgentAudio if expected type is 'audio'
460 |         return AgentAudio(output)
461 |     elif output_type == "string" and not isinstance(output, AgentText):
462 |         # Wrap output in AgentText if expected type is 'string'
463 |         return AgentText(output)
464 | 
465 |     # Return the original output if no wrapping is needed or type is unknown
466 |     return output
467 | 
468 | 
469 | # List of classes to expose in the public API
470 | __all__ = ["AgentType", "AgentImage", "AgentText", "AgentAudio"]
471 | 
--------------------------------------------------------------------------------
/core/memory.py:
-------------------------------------------------------------------------------- 1 | """ 2 | Agent Memory Module 3 | ================= 4 | 5 | This module implements the memory system for agents, providing structured storage for 6 | different types of execution steps, actions, plans, and observations. The memory system 7 | enables agents to track their reasoning process, store intermediate results, and 8 | maintain context across multi-step executions. 9 | 10 | The module defines various types of memory steps (TaskStep, ActionStep, PlanningStep) 11 | and a centralized AgentMemory class that manages these steps and provides methods 12 | for querying and manipulating agent memory. 13 | 14 | DeepFlow@2025 15 | """ 16 | 17 | from dataclasses import asdict, dataclass 18 | from logging import getLogger 19 | from typing import TYPE_CHECKING, Any, Dict, List, TypedDict, Union 20 | 21 | from smolagents.models import ChatMessage, MessageRole 22 | from smolagents.monitoring import AgentLogger, LogLevel 23 | from smolagents.utils import AgentError, make_json_serializable 24 | 25 | 26 | if TYPE_CHECKING: 27 | import PIL.Image 28 | 29 | from smolagents.models import ChatMessage 30 | from smolagents.monitoring import AgentLogger 31 | 32 | 33 | logger = getLogger(__name__) 34 | 35 | 36 | class Message(TypedDict): 37 | """ 38 | Typed dictionary for representing message data. 39 | 40 | This structure is used for standardized communication between 41 | the agent and its memory system. 42 | 43 | Attributes: 44 | role: The role of the message sender (user, assistant, system, etc.) 45 | content: The content of the message (text or structured content) 46 | """ 47 | role: MessageRole 48 | content: str | list[dict] 49 | 50 | 51 | @dataclass 52 | class ToolCall: 53 | """ 54 | Represents a single call to a tool by the agent. 55 | 56 | This dataclass stores information about a specific tool invocation, 57 | including the tool name, arguments passed, and a unique ID for tracking. 
58 | 59 | Attributes: 60 | name: The name of the tool being called 61 | arguments: The arguments passed to the tool 62 | id: A unique identifier for this specific tool call 63 | """ 64 | name: str 65 | arguments: Any 66 | id: str 67 | 68 | def to_dict(self): 69 | """ 70 | Convert the tool call to a dictionary representation. 71 | 72 | Returns: 73 | dict: A serialized representation of the tool call 74 | """ 75 | return { 76 | "id": self.id, 77 | "type": "function", 78 | "function": { 79 | "name": self.name, 80 | "arguments": make_json_serializable(self.arguments), 81 | }, 82 | } 83 | 84 | 85 | @dataclass 86 | class MemoryStep: 87 | """ 88 | Base class for all agent memory steps. 89 | 90 | This abstract class defines the interface that all memory step types 91 | must implement, providing serialization and message conversion capabilities. 92 | """ 93 | 94 | def to_dict(self): 95 | """ 96 | Convert the memory step to a dictionary representation. 97 | 98 | Returns: 99 | dict: A serialized representation of the memory step 100 | """ 101 | return asdict(self) 102 | 103 | def to_messages(self, **kwargs) -> List[Dict[str, Any]]: 104 | """ 105 | Convert the memory step to a list of messages. 106 | 107 | This abstract method should be implemented by subclasses to convert 108 | their specific data into standardized messages. 109 | 110 | Args: 111 | **kwargs: Additional arguments for conversion process 112 | 113 | Returns: 114 | List[Message]: A list of messages representing this memory step 115 | """ 116 | raise NotImplementedError 117 | 118 | 119 | @dataclass 120 | class ActionStep(MemoryStep): 121 | """ 122 | Memory step representing an agent action. 123 | 124 | This step records the execution of a single action by the agent, 125 | including inputs, outputs, observations, and any errors encountered. 
126 | 
127 |     Attributes:
128 |         model_input_messages: Messages provided to the model
129 |         tool_calls: Tools called during this step
130 |         start_time: When the step began execution
131 |         end_time: When the step completed execution
132 |         step_number: Sequential number of this step
133 |         error: Any error encountered during execution
134 |         duration: Time taken to execute the step
135 |         model_output_message: The full message output by the model
136 |         model_output: The text output by the model
137 |         observations: Text observations from tool execution
138 |         observations_images: Image observations from tool execution
139 |         action_output: Final output from the action
140 |     """
141 |     model_input_messages: List[Message] | None = None
142 |     tool_calls: List[ToolCall] | None = None
143 |     start_time: float | None = None
144 |     end_time: float | None = None
145 |     step_number: int | None = None
146 |     error: AgentError | None = None
147 |     duration: float | None = None
148 |     model_output_message: ChatMessage | None = None
149 |     model_output: str | None = None
150 |     observations: str | None = None
151 |     observations_images: List["PIL.Image.Image"] | None = None
152 |     action_output: Any = None
153 | 
154 |     def to_dict(self):
155 |         """
156 |         Convert the action step to a dictionary representation.
157 | 158 | Returns: 159 | dict: A serialized representation of the action step 160 | """ 161 | return { 162 | "model_input_messages": self.model_input_messages, 163 | "tool_calls": [tool_call.to_dict() for tool_call in self.tool_calls] if self.tool_calls else [], 164 | "start_time": self.start_time, 165 | "end_time": self.end_time, 166 | "step": self.step_number, 167 | "error": self.error.to_dict() if self.error else None, 168 | "duration": self.duration, 169 | "model_output_message": self.model_output_message, 170 | "model_output": self.model_output, 171 | "observations": self.observations, 172 | "action_output": make_json_serializable(self.action_output), 173 | } 174 | 175 | def to_messages(self, summary_mode: bool = False, show_model_input_messages: bool = False) -> List[Message]: 176 | """ 177 | Convert the action step to a list of messages. 178 | 179 | Args: 180 | summary_mode: Whether to generate a summarized version 181 | show_model_input_messages: Whether to include input messages 182 | 183 | Returns: 184 | List[Message]: Messages representing this action step 185 | """ 186 | messages = [] 187 | # Include input messages if requested 188 | if self.model_input_messages is not None and show_model_input_messages: 189 | messages.append(Message(role=MessageRole.SYSTEM, content=self.model_input_messages)) 190 | 191 | # Include model output unless in summary mode 192 | if self.model_output is not None and not summary_mode: 193 | messages.append( 194 | Message(role=MessageRole.ASSISTANT, content=[{"type": "text", "text": self.model_output.strip()}]) 195 | ) 196 | 197 | # Include tool calls if present 198 | if self.tool_calls is not None: 199 | messages.append( 200 | Message( 201 | role=MessageRole.TOOL_CALL, 202 | content=[ 203 | { 204 | "type": "text", 205 | "text": "Calling tools:\n" + str([tc.to_dict() for tc in self.tool_calls]), 206 | } 207 | ], 208 | ) 209 | ) 210 | 211 | # Include observations if present 212 | if self.observations is not None: 213 | 
messages.append( 214 | Message( 215 | role=MessageRole.TOOL_RESPONSE, 216 | content=[ 217 | { 218 | "type": "text", 219 | "text": (f"Call id: {self.tool_calls[0].id}\n" if self.tool_calls else "") 220 | + f"Observation:\n{self.observations}", 221 | } 222 | ], 223 | ) 224 | ) 225 | 226 | # Include error information if present 227 | if self.error is not None: 228 | error_message = ( 229 | "Error:\n" 230 | + str(self.error) 231 | + "\nNow let's retry: take care not to repeat previous errors! If you have retried several times, try a completely different approach.\n" 232 | ) 233 | message_content = f"Call id: {self.tool_calls[0].id}\n" if self.tool_calls else "" 234 | message_content += error_message 235 | messages.append( 236 | Message(role=MessageRole.TOOL_RESPONSE, content=[{"type": "text", "text": message_content}]) 237 | ) 238 | 239 | # Include image observations if present 240 | if self.observations_images: 241 | messages.append( 242 | Message( 243 | role=MessageRole.USER, 244 | content=[{"type": "text", "text": "Here are the observed images:"}] 245 | + [ 246 | { 247 | "type": "image", 248 | "image": image, 249 | } 250 | for image in self.observations_images 251 | ], 252 | ) 253 | ) 254 | return messages 255 | 256 | 257 | @dataclass 258 | class PlanningStep(MemoryStep): 259 | """ 260 | Memory step representing a planning operation. 261 | 262 | This step records the input, output, and results of a planning operation, 263 | where the agent analyzes the current state and formulates a plan. 
264 | 265 | Attributes: 266 | model_input_messages: Messages provided to the model 267 | model_output_message_facts: The model's output message containing facts 268 | facts: The gathered facts as text 269 | model_output_message_plan: The model's output message containing the plan 270 | plan: The formulated plan as text 271 | """ 272 | model_input_messages: List[Message] 273 | model_output_message_facts: ChatMessage 274 | facts: str 275 | model_output_message_plan: ChatMessage 276 | plan: str 277 | 278 | def to_messages(self, summary_mode: bool, **kwargs) -> List[Message]: 279 | """ 280 | Convert the planning step to a list of messages. 281 | 282 | Args: 283 | summary_mode: Whether to generate a summarized version 284 | **kwargs: Additional conversion arguments 285 | 286 | Returns: 287 | List[Message]: Messages representing this planning step 288 | """ 289 | messages = [] 290 | # Always include facts 291 | messages.append( 292 | Message( 293 | role=MessageRole.ASSISTANT, content=[{"type": "text", "text": f"[FACTS LIST]:\n{self.facts.strip()}"}] 294 | ) 295 | ) 296 | 297 | # Include plan unless in summary mode 298 | if not summary_mode: 299 | messages.append( 300 | Message( 301 | role=MessageRole.ASSISTANT, content=[{"type": "text", "text": f"[PLAN]:\n{self.plan.strip()}"}] 302 | ) 303 | ) 304 | return messages 305 | 306 | 307 | @dataclass 308 | class TaskStep(MemoryStep): 309 | """ 310 | Memory step representing a task assignment. 311 | 312 | This step records a task that has been assigned to the agent, 313 | including any associated images. 314 | 315 | Attributes: 316 | task: The task description as text 317 | task_images: Optional list of images associated with the task 318 | """ 319 | task: str 320 | task_images: List["PIL.Image.Image"] | None = None 321 | 322 | def to_messages(self, summary_mode: bool = False, **kwargs) -> List[Message]: 323 | """ 324 | Convert the task step to a list of messages. 
325 | 326 | Args: 327 | summary_mode: Whether to generate a summarized version 328 | **kwargs: Additional conversion arguments 329 | 330 | Returns: 331 | List[Message]: Messages representing this task step 332 | """ 333 | # Create content with task text 334 | content = [{"type": "text", "text": f"New task:\n{self.task}"}] 335 | 336 | # Add images if present 337 | if self.task_images: 338 | for image in self.task_images: 339 | content.append({"type": "image", "image": image}) 340 | 341 | return [Message(role=MessageRole.USER, content=content)] 342 | 343 | 344 | @dataclass 345 | class SystemPromptStep(MemoryStep): 346 | """ 347 | Memory step representing a system prompt. 348 | 349 | This step records a system prompt that provides context and 350 | instructions to the agent. 351 | 352 | Attributes: 353 | system_prompt: The system prompt text 354 | """ 355 | system_prompt: str 356 | 357 | def to_messages(self, summary_mode: bool = False, **kwargs) -> List[Message]: 358 | """ 359 | Convert the system prompt step to a list of messages. 360 | 361 | Args: 362 | summary_mode: Whether to generate a summarized version 363 | **kwargs: Additional conversion arguments 364 | 365 | Returns: 366 | List[Message]: Messages representing this system prompt step 367 | """ 368 | # Skip in summary mode 369 | if summary_mode: 370 | return [] 371 | return [Message(role=MessageRole.SYSTEM, content=[{"type": "text", "text": self.system_prompt}])] 372 | 373 | 374 | class AgentMemory: 375 | """ 376 | Central management system for agent memory. 377 | 378 | This class stores, organizes, and provides access to all memory steps 379 | that an agent accumulates during execution, enabling context retention 380 | and analysis of the agent's execution history. 381 | """ 382 | 383 | def __init__(self, system_prompt: str = ""): 384 | """ 385 | Initialize an agent memory instance. 
386 | 387 | Args: 388 | system_prompt: Optional system prompt to initialize with 389 | """ 390 | self.steps = [] 391 | if system_prompt: 392 | self.add_system_prompt(system_prompt) 393 | 394 | def reset(self): 395 | """ 396 | Clear all stored memory steps. 397 | """ 398 | self.steps = [] 399 | 400 | def get_succinct_steps(self) -> list[dict]: 401 | """ 402 | Get a concise representation of all memory steps. 403 | 404 | Returns a list of dictionaries without the model input messages, 405 | which can be verbose and less relevant for succinct histories. 406 | 407 | Returns: 408 | list[dict]: Concise representation of all memory steps 409 | """ 410 | return [ 411 | {key: value for key, value in step.to_dict().items() if key != "model_input_messages"} for step in self.steps 412 | ] 413 | 414 | def get_full_steps(self) -> list[dict]: 415 | """ 416 | Get a complete representation of all memory steps. 417 | 418 | Returns: 419 | list[dict]: Complete representation of all memory steps 420 | """ 421 | return [step.to_dict() for step in self.steps] 422 | 423 | def replay(self, logger: AgentLogger, detailed: bool = False): 424 | """ 425 | Replay the agent's execution history through the logger. 426 | 427 | This method outputs all steps in the agent's memory to the provided 428 | logger for analysis or debugging. 
429 | 430 | Args: 431 | logger: The logger to output the replay 432 | detailed: Whether to include detailed information 433 | """ 434 | for i, step in enumerate(self.steps): 435 | if isinstance(step, ActionStep): 436 | # Log action step components 437 | if step.model_output is not None: 438 | logger.log(f"Step {i}: Model output:\n{step.model_output}", LogLevel.DEBUG if detailed else LogLevel.INFO) 439 | if step.tool_calls is not None: 440 | tool_calls_str = ", ".join([f"{tc.name}({tc.arguments})" for tc in step.tool_calls]) 441 | logger.log(f"Step {i}: Tool calls: {tool_calls_str}", LogLevel.DEBUG if detailed else LogLevel.INFO) 442 | if step.observations is not None: 443 | logger.log(f"Step {i}: Observations: {step.observations}", LogLevel.DEBUG if detailed else LogLevel.INFO) 444 | if step.error is not None: 445 | logger.log(f"Step {i}: Error: {step.error}", LogLevel.DEBUG if detailed else LogLevel.INFO) 446 | if step.action_output is not None: 447 | logger.log(f"Step {i}: Action output: {step.action_output}", LogLevel.DEBUG if detailed else LogLevel.INFO) 448 | elif isinstance(step, PlanningStep): 449 | # Log planning step components 450 | logger.log(f"Step {i}: Planning step", LogLevel.DEBUG if detailed else LogLevel.INFO) 451 | logger.log(f"Facts: {step.facts}", LogLevel.DEBUG) 452 | logger.log(f"Plan: {step.plan}", LogLevel.DEBUG) 453 | elif isinstance(step, TaskStep): 454 | # Log task step 455 | logger.log(f"Step {i}: Task: {step.task}", LogLevel.DEBUG if detailed else LogLevel.INFO) 456 | 457 | def add_action_step(self, action_step: ActionStep): 458 | """ 459 | Add an action step to memory. 460 | 461 | Args: 462 | action_step: The action step to add 463 | """ 464 | self.steps.append(action_step) 465 | 466 | def add_planning_step(self, planning_step: PlanningStep): 467 | """ 468 | Add a planning step to memory. 
469 | 470 | Args: 471 | planning_step: The planning step to add 472 | """ 473 | self.steps.append(planning_step) 474 | 475 | def add_task(self, task: str, task_images: List["PIL.Image.Image"] | None = None): 476 | """ 477 | Add a task step to memory. 478 | 479 | Args: 480 | task: The task description 481 | task_images: Optional images associated with the task 482 | """ 483 | self.steps.append(TaskStep(task=task, task_images=task_images)) 484 | 485 | def add_system_prompt(self, system_prompt: str): 486 | """ 487 | Add a system prompt step to memory. 488 | 489 | Args: 490 | system_prompt: The system prompt text 491 | """ 492 | self.steps.append(SystemPromptStep(system_prompt=system_prompt)) 493 | 494 | def get_messages(self, summary_mode: bool = False, show_model_input_messages: bool = False) -> List[Message]: 495 | """ 496 | Convert all memory steps to a consolidated list of messages. 497 | 498 | This method aggregates messages from all memory steps to provide 499 | a complete conversation history. 
500 | 501 | Args: 502 | summary_mode: Whether to generate summarized messages 503 | show_model_input_messages: Whether to include input messages 504 | 505 | Returns: 506 | List[Message]: Consolidated list of messages from all steps 507 | """ 508 | messages = [] 509 | for step in self.steps: 510 | messages.extend(step.to_messages(summary_mode=summary_mode, show_model_input_messages=show_model_input_messages)) 511 | return messages 512 | 513 | 514 | __all__ = ["AgentMemory"] 515 | -------------------------------------------------------------------------------- /interface/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DeepFlowcc/DeepFlow/92a3ed17ccf54b91216e3f669eaf2ff396eaf68b/interface/__init__.py -------------------------------------------------------------------------------- /interface/cli.py: -------------------------------------------------------------------------------- 1 | """ 2 | Command Line Interface Module 3 | =========================== 4 | 5 | This module provides a command-line interface for running agents with different models 6 | and configurations. It enables users to interact with the agent framework directly 7 | from the terminal, specifying parameters like model type, model ID, authorized imports, 8 | and tools the agent can use. 9 | 10 | The CLI is designed to be flexible and user-friendly, allowing quick experimentation 11 | with different agent configurations without writing code. 12 | 13 | DeepFlow@2025 14 | """ 15 | 16 | #!/usr/bin/env python 17 | # coding=utf-8 18 | 19 | # Copyright 2025 The HuggingFace Inc. team. All rights reserved. 20 | # 21 | # Licensed under the Apache License, Version 2.0 (the "License"); 22 | # you may not use this file except in compliance with the License. 
23 | # You may obtain a copy of the License at 24 | # 25 | # http://www.apache.org/licenses/LICENSE-2.0 26 | # 27 | # Unless required by applicable law or agreed to in writing, software 28 | # distributed under the License is distributed on an "AS IS" BASIS, 29 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 30 | # See the License for the specific language governing permissions and 31 | # limitations under the License. 32 | 33 | # Standard library imports 34 | import argparse # For command-line argument parsing 35 | import os # For environment variable access and file path handling 36 | 37 | # Third-party imports 38 | from dotenv import load_dotenv # For loading environment variables from .env files 39 | 40 | # Local imports from the agent framework 41 | from smolagents import CodeAgent, HfApiModel, LiteLLMModel, Model, OpenAIServerModel, Tool, TransformersModel 42 | from smolagents.default_tools import TOOL_MAPPING # Dictionary of available built-in tools 43 | 44 | 45 | # Default prompt used when no prompt is provided by the user 46 | DEFAULT_PROMPT = "How many seconds would it take for a leopard at full speed to run through Pont des Arts?" 47 | 48 | 49 | def parse_arguments(): 50 | """ 51 | Parse command line arguments. 52 | 53 | This function defines all the command-line arguments available for configuring 54 | the agent, including model type, model ID, authorized imports, tools, and API options. 
55 | 56 | Returns: 57 | argparse.Namespace: Parsed command line arguments 58 | """ 59 | # Create a new argument parser with a description 60 | parser = argparse.ArgumentParser(description="Run a CodeAgent with all specified parameters") 61 | 62 | # Main arguments 63 | # The prompt argument is positional and optional (with a default value) 64 | parser.add_argument( 65 | "prompt", # The name of the argument 66 | type=str, # The argument should be interpreted as a string 67 | nargs="?", # Makes the argument optional 68 | default=DEFAULT_PROMPT, # Default value if not provided 69 | help="The prompt to run with the agent", # Help text for the argument 70 | ) 71 | 72 | # The model type argument (e.g., HfApiModel, OpenAIServerModel) 73 | parser.add_argument( 74 | "--model-type", # The flag for this argument 75 | type=str, # The argument should be interpreted as a string 76 | default="HfApiModel", # Default value if not provided 77 | help="The model type to use (e.g., HfApiModel, OpenAIServerModel, LiteLLMModel, TransformersModel)", 78 | ) 79 | 80 | # The model ID argument (e.g., specific model name/path) 81 | parser.add_argument( 82 | "--model-id", 83 | type=str, 84 | default="Qwen/Qwen2.5-Coder-32B-Instruct", 85 | help="The model ID to use for the specified model type", 86 | ) 87 | 88 | # Additional Python packages to authorize for import in the agent 89 | parser.add_argument( 90 | "--imports", 91 | nargs="*", # Accept zero or more values 92 | default=[], # Default to an empty list 93 | help="Space-separated list of imports to authorize (e.g., 'numpy pandas')", 94 | ) 95 | 96 | # Tools that the agent can use 97 | parser.add_argument( 98 | "--tools", 99 | nargs="*", # Accept zero or more values 100 | default=["web_search"], # Default to web_search tool 101 | help="Space-separated list of tools that the agent can use (e.g., 'tool1 tool2 tool3')", 102 | ) 103 | 104 | # Verbosity level for the agent's output 105 | parser.add_argument( 106 | "--verbosity-level", 107 | 
type=int, 108 | default=1, 109 | help="The verbosity level, as an int in [0, 1, 2].", 110 | ) 111 | 112 | # API-specific options in a separate group for better organization 113 | api_group = parser.add_argument_group("api options", "Options for API-based model types") 114 | 115 | # Base URL for the API 116 | api_group.add_argument( 117 | "--api-base", 118 | type=str, 119 | help="The base URL for the model", 120 | ) 121 | 122 | # API key for authentication 123 | api_group.add_argument( 124 | "--api-key", 125 | type=str, 126 | help="The API key for the model", 127 | ) 128 | 129 | # Parse the arguments from the command line and return them 130 | return parser.parse_args() 131 | 132 | 133 | def create_model(model_type: str, model_id: str, api_base: str | None = None, api_key: str | None = None) -> Model: 134 | """ 135 | Create a model instance based on model type and parameters. 136 | 137 | This function instantiates the appropriate model class based on the specified 138 | model type and configures it with the provided parameters. 
139 | 140 | Args: 141 | model_type: Type of model to create (e.g., "OpenAIServerModel", "HfApiModel") 142 | model_id: ID of the model to use 143 | api_base: Optional base URL for API-based models 144 | api_key: Optional API key for API-based models 145 | 146 | Returns: 147 | Model: Instantiated model 148 | 149 | Raises: 150 | ValueError: If an unsupported model type is specified 151 | """ 152 | # Handle OpenAI-compatible APIs (like Fireworks) 153 | if model_type == "OpenAIServerModel": 154 | return OpenAIServerModel( 155 | api_key=api_key or os.getenv("FIREWORKS_API_KEY"), # Use provided key or get from environment 156 | api_base=api_base or "https://api.fireworks.ai/inference/v1", # Use provided base or default 157 | model_id=model_id, 158 | ) 159 | # Handle LiteLLM models (a library that provides access to various LLM APIs) 160 | elif model_type == "LiteLLMModel": 161 | return LiteLLMModel( 162 | model_id=model_id, 163 | api_key=api_key, 164 | api_base=api_base, 165 | ) 166 | # Handle locally-run Transformers models 167 | elif model_type == "TransformersModel": 168 | return TransformersModel(model_id=model_id, device_map="auto") # Auto device placement 169 | # Handle Hugging Face API models 170 | elif model_type == "HfApiModel": 171 | return HfApiModel( 172 | model_id=model_id, 173 | token=api_key or os.getenv("HF_API_KEY"), # Use provided key or get from environment 174 | ) 175 | # Raise an error for unsupported model types 176 | else: 177 | raise ValueError(f"Unsupported model type: {model_type}") 178 | 179 | 180 | def main( 181 | prompt: str, 182 | tools: list[str], 183 | model_type: str, 184 | model_id: str, 185 | api_base: str | None = None, 186 | api_key: str | None = None, 187 | imports: list[str] | None = None, 188 | ) -> None: 189 | """ 190 | Main function to run the agent with specified parameters. 191 | 192 | This function sets up the model, loads the specified tools, creates the agent, 193 | and runs it with the provided prompt. 
194 | 195 | Args: 196 | prompt: The prompt to send to the agent 197 | tools: List of tool names to enable for the agent 198 | model_type: Type of model to use 199 | model_id: ID of the model to use 200 | api_base: Optional base URL for API-based models 201 | api_key: Optional API key for API-based models 202 | imports: Optional list of Python modules the agent is allowed to import 203 | """ 204 | # Load environment variables from .env file if present 205 | load_dotenv() 206 | 207 | # Create the model based on specified parameters 208 | model = create_model(model_type, model_id, api_base=api_base, api_key=api_key) 209 | 210 | # Load and initialize the specified tools 211 | agent_tools = [] 212 | for tool_name in tools: 213 | if "/" in tool_name: 214 | # Tool is a HuggingFace Space (identified by slash in the name) 215 | agent_tools.append(Tool.from_space(tool_name)) 216 | else: 217 | # Tool is a default tool from TOOL_MAPPING 218 | if tool_name in TOOL_MAPPING: 219 | agent_tools.append(TOOL_MAPPING[tool_name]()) # Instantiate the tool 220 | else: 221 | raise ValueError(f"Tool {tool_name} is not recognized either as a default tool or a Space.") 222 | 223 | # Display tools being used for user feedback 224 | print(f"Running agent with these tools: {tools}") 225 | 226 | # Create and run the agent with the specified configuration 227 | agent = CodeAgent(tools=agent_tools, model=model, additional_authorized_imports=imports) 228 | agent.run(prompt) # Execute the agent with the given prompt 229 | 230 | 231 | # Entry point for command-line execution 232 | if __name__ == "__main__": 233 | # Parse command line arguments and run main function 234 | args = parse_arguments() 235 | 236 | # Call the main function with the parsed arguments 237 | main( 238 | args.prompt, 239 | args.tools, 240 | args.model_type, 241 | args.model_id, 242 | api_base=args.api_base, 243 | api_key=args.api_key, 244 | imports=args.imports, 245 | ) 246 | 
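The if/elif chain in `create_model` above can also be expressed as a registry lookup, which keeps the CLI easy to extend with new back-ends. A minimal, self-contained sketch of that dispatch pattern — the `Stub*` classes and `MODEL_REGISTRY` are illustrative placeholders, not DeepFlow's real model classes:

```python
# Minimal sketch of the registry-dispatch pattern that create_model() above
# implements with if/elif branches. The Stub* classes are hypothetical
# stand-ins for the real model classes.

class StubOpenAIServerModel:
    def __init__(self, model_id: str, **kwargs):
        self.model_id = model_id

class StubTransformersModel:
    def __init__(self, model_id: str, **kwargs):
        self.model_id = model_id

MODEL_REGISTRY = {
    "OpenAIServerModel": StubOpenAIServerModel,
    "TransformersModel": StubTransformersModel,
}

def create_model(model_type: str, model_id: str, **kwargs):
    """Look the class up in a registry; unknown types still fail loudly."""
    try:
        cls = MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"Unsupported model type: {model_type}") from None
    return cls(model_id, **kwargs)

model = create_model("TransformersModel", "some/model-id")
print(type(model).__name__)  # → StubTransformersModel
```

Adding a new back-end then becomes a one-line registry entry instead of another elif branch, and the `ValueError` behavior of the original function is preserved.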
-------------------------------------------------------------------------------- /interface/gradio_ui.py: -------------------------------------------------------------------------------- 1 | """ 2 | Gradio UI Module 3 | ============== 4 | 5 | This module provides a user interface for the agent framework using Gradio. 6 | It allows users to interact with agents through a web-based interface, 7 | configure agent parameters, monitor agent execution, and visualize results. 8 | 9 | The UI is designed to be intuitive and flexible, supporting different types of agents 10 | and providing real-time feedback on agent actions and reasoning processes. 11 | 12 | DeepFlow@2025 13 | """ 14 | 15 | #!/usr/bin/env python 16 | # coding=utf-8 17 | # Copyright 2024 The HuggingFace Inc. team. All rights reserved. 18 | # 19 | # Licensed under the Apache License, Version 2.0 (the "License"); 20 | # you may not use this file except in compliance with the License. 21 | # You may obtain a copy of the License at 22 | # 23 | # http://www.apache.org/licenses/LICENSE-2.0 24 | # 25 | # Unless required by applicable law or agreed to in writing, software 26 | # distributed under the License is distributed on an "AS IS" BASIS, 27 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 28 | # See the License for the specific language governing permissions and 29 | # limitations under the License. 
30 | 31 | # Standard library imports 32 | import os # For file system operations like path manipulation and directory creation 33 | import re # For regular expression operations used in text processing 34 | import shutil # For high-level file operations like copying 35 | from typing import Optional # For type hinting with optional parameters 36 | 37 | # Import project-specific modules and types 38 | from smolagents.agent_types import AgentAudio, AgentImage, AgentText, handle_agent_output_types # Agent data types 39 | from smolagents.agents import ActionStep, MultiStepAgent # Agent implementation components 40 | from smolagents.memory import MemoryStep # For tracking agent memory and steps 41 | from smolagents.utils import _is_package_available # Utility to check for dependencies 42 | 43 | 44 | def pull_messages_from_step( 45 | step_log: MemoryStep, 46 | ): 47 | """ 48 | Extract ChatMessage objects from agent steps with proper nesting. 49 | 50 | This function converts agent step logs into Gradio ChatMessage objects for display in the UI. 51 | It handles various types of content including model outputs, tool calls, errors, and execution logs. 
52 | 53 | Args: 54 | step_log (MemoryStep): The memory step log containing agent actions and observations 55 | 56 | Yields: 57 | gradio.ChatMessage: Properly formatted chat messages for display in the Gradio UI 58 | """ 59 | # Import gradio within the function to avoid requiring it unless this function is called 60 | import gradio as gr 61 | 62 | if isinstance(step_log, ActionStep): 63 | # Output the step number as a header message 64 | step_number = f"Step {step_log.step_number}" if step_log.step_number is not None else "" 65 | yield gr.ChatMessage(role="assistant", content=f"**{step_number}**") 66 | 67 | # First yield the thought/reasoning from the LLM if available 68 | if hasattr(step_log, "model_output") and step_log.model_output is not None: 69 | # Clean up the LLM output by removing whitespace 70 | model_output = step_log.model_output.strip() 71 | 72 | # Remove any trailing <end_code> tags and extra backticks, handling multiple possible formats 73 | # This ensures code blocks are properly formatted for display 74 | model_output = re.sub(r"```\s*<end_code>", "```", model_output) # handles ```<end_code> 75 | model_output = re.sub(r"<end_code>\s*```", "```", model_output) # handles <end_code>``` 76 | model_output = re.sub(r"```\s*\n\s*<end_code>", "```", model_output) # handles ```\n<end_code> 77 | model_output = model_output.strip() 78 | 79 | # Create a chat message with the model's reasoning/thought process 80 | yield gr.ChatMessage(role="assistant", content=model_output) 81 | 82 | # Handle tool calls by creating nested messages 83 | if hasattr(step_log, "tool_calls") and step_log.tool_calls is not None: 84 | # Get the first tool call to determine how to format the display 85 | first_tool_call = step_log.tool_calls[0] 86 | # Check if the tool call is a Python code execution 87 | used_code = first_tool_call.name == "python_interpreter" 88 | # Create a unique ID for the parent message for nesting child messages 89 | parent_id = f"call_{len(step_log.tool_calls)}" 90 | 91 | # Process the tool call arguments based on their type 92 | args = 
first_tool_call.arguments 93 | if isinstance(args, dict): 94 | # If arguments are a dictionary, prefer the 'answer' field or stringify the dict 95 | content = str(args.get("answer", str(args))) 96 | else: 97 | # Otherwise, convert arguments to string 98 | content = str(args).strip() 99 | 100 | # Special formatting for Python code 101 | if used_code: 102 | # Clean up the content by removing any <end_code> tags and format as Python code block 103 | content = re.sub(r"```.*?\n", "", content) # Remove existing code blocks 104 | content = re.sub(r"\s*<end_code>\s*", "", content) # Remove <end_code> tags 105 | content = content.strip() 106 | # Ensure code is wrapped in Python code block markdown 107 | if not content.startswith("```python"): 108 | content = f"```python\n{content}\n```" 109 | 110 | # Create the parent message for the tool call 111 | parent_message_tool = gr.ChatMessage( 112 | role="assistant", 113 | content=content, 114 | metadata={ 115 | "title": f"🛠️ Used tool {first_tool_call.name}", # Tool name as title 116 | "id": parent_id, # Unique ID for nesting 117 | "status": "pending", # Initial status is pending 118 | }, 119 | ) 120 | yield parent_message_tool 121 | 122 | # Add execution logs as a nested message under the tool call if available 123 | if hasattr(step_log, "observations") and ( 124 | step_log.observations is not None and step_log.observations.strip() 125 | ): # Only yield execution logs if there's actual content 126 | log_content = step_log.observations.strip() 127 | if log_content: 128 | # Remove prefix and format as a bash code block 129 | log_content = re.sub(r"^Execution logs:\s*", "", log_content) 130 | yield gr.ChatMessage( 131 | role="assistant", 132 | content=f"```bash\n{log_content}\n```", 133 | metadata={"title": "📝 Execution Logs", "parent_id": parent_id, "status": "done"}, 134 | ) 135 | 136 | # Add errors as a nested message under the tool call if available 137 | if hasattr(step_log, "error") and step_log.error is not None: 138 | yield gr.ChatMessage( 
139 | role="assistant", 140 | content=str(step_log.error), 141 | metadata={"title": "💥 Error", "parent_id": parent_id, "status": "done"}, 142 | ) 143 | 144 | # Update parent message metadata to done status 145 | parent_message_tool.metadata["status"] = "done" 146 | 147 | # Handle standalone errors (not from tool calls) 148 | elif hasattr(step_log, "error") and step_log.error is not None: 149 | yield gr.ChatMessage(role="assistant", content=str(step_log.error), metadata={"title": "💥 Error"}) 150 | 151 | # Add a footnote with step number, token counts, and duration information 152 | step_footnote = f"{step_number}" 153 | # Add token information if available 154 | if hasattr(step_log, "input_token_count") and hasattr(step_log, "output_token_count"): 155 | token_str = ( 156 | f" | Input-tokens:{step_log.input_token_count:,} | Output-tokens:{step_log.output_token_count:,}" 157 | ) 158 | step_footnote += token_str 159 | # Add duration information if available 160 | if hasattr(step_log, "duration"): 161 | step_duration = f" | Duration: {round(float(step_log.duration), 2)}" if step_log.duration else "" 162 | step_footnote += step_duration 163 | # Format the footnote as small, light gray text 164 | step_footnote = f"""<span style="color: #bbbbc2; font-size: 12px;">{step_footnote}</span>""" 165 | yield gr.ChatMessage(role="assistant", content=f"{step_footnote}") 166 | # Add a separator line between steps 167 | yield gr.ChatMessage(role="assistant", content="-----", metadata={"status": "done"}) 168 | 169 | 170 | def stream_to_gradio( 171 | agent, 172 | task: str, 173 | reset_agent_memory: bool = False, 174 | additional_args: Optional[dict] = None, 175 | ): 176 | """ 177 | Runs an agent with the given task and streams the messages from the agent as Gradio ChatMessages. 178 | 179 | This function serves as a bridge between the agent execution and the Gradio UI, 180 | converting agent outputs into UI-friendly message formats in real-time. 
181 | 182 | Args: 183 | agent: The agent to run 184 | task (str): The user's task or query 185 | reset_agent_memory (bool): Whether to reset the agent's memory before running 186 | additional_args (dict, optional): Additional arguments to pass to the agent 187 | 188 | Yields: 189 | gradio.ChatMessage: Chat messages for the Gradio UI representing agent actions and outputs 190 | """ 191 | # Check if gradio is installed 192 | if not _is_package_available("gradio"): 193 | raise ModuleNotFoundError( 194 | "Please install 'gradio' extra to use the GradioUI: `pip install 'smolagents[gradio]'`" 195 | ) 196 | import gradio as gr 197 | 198 | # Initialize token counters 199 | total_input_tokens = 0 200 | total_output_tokens = 0 201 | 202 | # Run the agent with streaming enabled 203 | for step_log in agent.run(task, stream=True, reset=reset_agent_memory, additional_args=additional_args): 204 | # Track token usage if the model provides token counts 205 | if getattr(agent.model, "last_input_token_count", None) is not None: 206 | total_input_tokens += agent.model.last_input_token_count 207 | total_output_tokens += agent.model.last_output_token_count 208 | # Store token counts in the step log for display 209 | if isinstance(step_log, ActionStep): 210 | step_log.input_token_count = agent.model.last_input_token_count 211 | step_log.output_token_count = agent.model.last_output_token_count 212 | 213 | # Convert each step log to gradio chat messages and yield them 214 | for message in pull_messages_from_step( 215 | step_log, 216 | ): 217 | yield message 218 | 219 | # Process the final answer (the last step log) 220 | final_answer = step_log # Last log is the run's final_answer 221 | final_answer = handle_agent_output_types(final_answer) # Ensure proper type wrapping 222 | 223 | # Format the final answer based on its type 224 | if isinstance(final_answer, AgentText): 225 | # Text answers are displayed as markdown 226 | yield gr.ChatMessage( 227 | role="assistant", 228 | 
content=f"**Final answer:**\n{final_answer.to_string()}\n", 229 | ) 230 | elif isinstance(final_answer, AgentImage): 231 | # Image answers are displayed as images with the proper MIME type 232 | yield gr.ChatMessage( 233 | role="assistant", 234 | content={"path": final_answer.to_string(), "mime_type": "image/png"}, 235 | ) 236 | elif isinstance(final_answer, AgentAudio): 237 | # Audio answers are displayed as audio players with the proper MIME type 238 | yield gr.ChatMessage( 239 | role="assistant", 240 | content={"path": final_answer.to_string(), "mime_type": "audio/wav"}, 241 | ) 242 | else: 243 | # Other types are displayed as stringified values 244 | yield gr.ChatMessage(role="assistant", content=f"**Final answer:** {str(final_answer)}") 245 | 246 | 247 | class GradioUI: 248 | """ 249 | A one-line interface to launch your agent in Gradio. 250 | 251 | This class provides an easy way to create a web-based user interface 252 | for interacting with agents, handling file uploads, and displaying results. 253 | """ 254 | 255 | def __init__(self, agent: MultiStepAgent, file_upload_folder: str | None = None): 256 | """ 257 | Initialize the Gradio UI with an agent and optional file upload support. 
258 | 259 | Args: 260 | agent (MultiStepAgent): The agent to interact with through the UI 261 | file_upload_folder (str, optional): Path to store uploaded files 262 | """ 263 | # Check if gradio is installed 264 | if not _is_package_available("gradio"): 265 | raise ModuleNotFoundError( 266 | "Please install 'gradio' extra to use the GradioUI: `pip install 'smolagents[gradio]'`" 267 | ) 268 | 269 | # Store the agent 270 | self.agent = agent 271 | 272 | # Set up file upload folder if provided 273 | self.file_upload_folder = file_upload_folder 274 | 275 | # Get agent metadata for display 276 | self.name = getattr(agent, "name") or "Agent interface" 277 | self.description = getattr(agent, "description", None) 278 | 279 | # Create the file upload folder if it doesn't exist 280 | if self.file_upload_folder is not None: 281 | if not os.path.exists(file_upload_folder): 282 | os.mkdir(file_upload_folder) 283 | 284 | def interact_with_agent(self, prompt, messages, session_state): 285 | """ 286 | Handle user interaction with the agent. 287 | 288 | This function processes user input, runs the agent, and streams responses 289 | to the Gradio chat interface. 
290 | 291 | Args: 292 | prompt (str): The user's input prompt 293 | messages (list): The current message history 294 | session_state (dict): State object to store session data 295 | 296 | Yields: 297 | list: Updated message history with agent responses 298 | """ 299 | import gradio as gr 300 | 301 | # Initialize agent in session state if not already present 302 | if "agent" not in session_state: 303 | session_state["agent"] = self.agent 304 | 305 | try: 306 | # Add user message to chat history 307 | messages.append(gr.ChatMessage(role="user", content=prompt)) 308 | yield messages 309 | 310 | # Stream agent responses and add to chat history 311 | for msg in stream_to_gradio(session_state["agent"], task=prompt, reset_agent_memory=False): 312 | messages.append(msg) 313 | yield messages 314 | 315 | yield messages 316 | except Exception as e: 317 | # Handle errors by displaying them in the chat 318 | print(f"Error in interaction: {str(e)}") 319 | messages.append(gr.ChatMessage(role="assistant", content=f"Error: {str(e)}")) 320 | yield messages 321 | 322 | def upload_file(self, file, file_uploads_log, allowed_file_types=None): 323 | """ 324 | Handle file uploads and validate file types. 
325 | 326 | Args: 327 | file: The uploaded file object 328 | file_uploads_log (list): Log of previously uploaded files 329 | allowed_file_types (list, optional): List of allowed file extensions 330 | 331 | Returns: 332 | tuple: Status textbox and updated file upload log 333 | """ 334 | import gradio as gr 335 | 336 | # Handle case when no file is uploaded 337 | if file is None: 338 | return gr.Textbox(value="No file uploaded", visible=True), file_uploads_log 339 | 340 | # Set default allowed file types if none provided 341 | if allowed_file_types is None: 342 | allowed_file_types = [".pdf", ".docx", ".txt"] 343 | 344 | # Check if file extension is allowed 345 | file_ext = os.path.splitext(file.name)[1].lower() 346 | if file_ext not in allowed_file_types: 347 | return gr.Textbox("File type disallowed", visible=True), file_uploads_log 348 | 349 | # Sanitize file name to ensure it's safe for the file system 350 | original_name = os.path.basename(file.name) 351 | sanitized_name = re.sub( 352 | r"[^\w\-.]", "_", original_name 353 | ) # Replace any non-alphanumeric, non-dash, or non-dot characters with underscores 354 | 355 | # Save the uploaded file to the specified folder 356 | file_path = os.path.join(self.file_upload_folder, os.path.basename(sanitized_name)) 357 | shutil.copy(file.name, file_path) 358 | 359 | # Return status and updated file log 360 | return gr.Textbox(f"File uploaded: {file_path}", visible=True), file_uploads_log + [file_path] 361 | 362 | def log_user_message(self, text_input, file_uploads_log): 363 | """ 364 | Process user input and append uploaded file information if any. 
365 | 366 | Args: 367 | text_input (str): The user's text input 368 | file_uploads_log (list): List of uploaded file paths 369 | 370 | Returns: 371 | tuple: Modified text input, empty string, and button with updated state 372 | """ 373 | import gradio as gr 374 | 375 | # Append file information to the user's input if files have been uploaded 376 | return ( 377 | text_input 378 | + ( 379 | f"\nYou have been provided with these files, which might be helpful or not: {file_uploads_log}" 380 | if len(file_uploads_log) > 0 381 | else "" 382 | ), 383 | "", # Clear the input box 384 | gr.Button(interactive=False), # Disable the button during processing 385 | ) 386 | 387 | def launch(self, share: bool = True, **kwargs): 388 | """ 389 | Launch the Gradio UI for the agent. 390 | 391 | Args: 392 | share (bool): Whether to create a public link for the UI 393 | **kwargs: Additional arguments to pass to gradio.Blocks.launch() 394 | """ 395 | import gradio as gr 396 | 397 | # Create the Gradio Blocks interface 398 | with gr.Blocks(theme="ocean", fill_height=True) as demo: 399 | # Initialize session state and storage 400 | session_state = gr.State({}) # Stores session-specific data 401 | stored_messages = gr.State([]) # Stores message history 402 | file_uploads_log = gr.State([]) # Tracks uploaded files 403 | 404 | # Create the sidebar with agent information 405 | with gr.Sidebar(): 406 | # Display agent name and description 407 | gr.Markdown( 408 | f"# {self.name.replace('_', ' ').capitalize()}" 409 | "\n> This web ui allows you to interact with a `smolagents` agent that can use tools and execute steps to complete tasks." 
410 | + (f"\n\n**Agent description:**\n{self.description}" if self.description else "") 411 | ) 412 | 413 | # Create input components 414 | with gr.Group(): 415 | gr.Markdown("**Your request**", container=True) 416 | # Text input for user prompts 417 | text_input = gr.Textbox( 418 | lines=3, 419 | label="Chat Message", 420 | container=False, 421 | placeholder="Enter your prompt here and press Shift+Enter or press the button", 422 | ) 423 | # Submit button 424 | submit_btn = gr.Button("Submit", variant="primary") 425 | 426 | # Add file upload components if enabled 427 | if self.file_upload_folder is not None: 428 | upload_file = gr.File(label="Upload a file") 429 | upload_status = gr.Textbox(label="Upload Status", interactive=False, visible=False) 430 | # Set up file upload handling 431 | upload_file.change( 432 | self.upload_file, 433 | [upload_file, file_uploads_log], 434 | [upload_status, file_uploads_log], 435 | ) 436 | 437 | # Add attribution footer 438 | gr.HTML("
<br><br><h4><center>Powered by:</center></h4>") 439 | with gr.Row(): 440 | gr.HTML("""<div style="display: flex; align-items: center; gap: 8px; font-family: system-ui, -apple-system, sans-serif;"> 441 | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolagents/mascot_smol.png" style="width: 32px; height: 32px; object-fit: contain;" alt="logo"> 442 | <a target="_blank" href="https://github.com/huggingface/smolagents"><b>huggingface/smolagents</b></a> 443 | </div>
""") 444 | 445 | # Main chat interface 446 | chatbot = gr.Chatbot( 447 | label="Agent", 448 | type="messages", 449 | avatar_images=( 450 | None, # User avatar (none) 451 | "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolagents/mascot_smol.png", # Agent avatar 452 | ), 453 | resizeable=True, 454 | scale=1, 455 | ) 456 | 457 | # Set up event handlers for text input submission 458 | text_input.submit( 459 | self.log_user_message, # First log the user message 460 | [text_input, file_uploads_log], 461 | [stored_messages, text_input, submit_btn], 462 | ).then( 463 | self.interact_with_agent, # Then interact with the agent 464 | [stored_messages, chatbot, session_state], 465 | [chatbot] 466 | ).then( 467 | # Finally re-enable the input components 468 | lambda: ( 469 | gr.Textbox( 470 | interactive=True, placeholder="Enter your prompt here and press Shift+Enter or the button" 471 | ), 472 | gr.Button(interactive=True), 473 | ), 474 | None, 475 | [text_input, submit_btn], 476 | ) 477 | 478 | # Set up event handlers for button click (same flow as text submission) 479 | submit_btn.click( 480 | self.log_user_message, 481 | [text_input, file_uploads_log], 482 | [stored_messages, text_input, submit_btn], 483 | ).then( 484 | self.interact_with_agent, 485 | [stored_messages, chatbot, session_state], 486 | [chatbot] 487 | ).then( 488 | lambda: ( 489 | gr.Textbox( 490 | interactive=True, placeholder="Enter your prompt here and press Shift+Enter or the button" 491 | ), 492 | gr.Button(interactive=True), 493 | ), 494 | None, 495 | [text_input, submit_btn], 496 | ) 497 | 498 | # Launch the Gradio interface 499 | demo.launch(debug=True, share=share, **kwargs) 500 | 501 | 502 | # Export only the public API 503 | __all__ = ["stream_to_gradio", "GradioUI"] 504 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DeepFlowcc/DeepFlow/92a3ed17ccf54b91216e3f669eaf2ff396eaf68b/models/__init__.py -------------------------------------------------------------------------------- /prompts/code_agent.yaml: -------------------------------------------------------------------------------- 1 | # CodeAgent Prompt Template 2 | # 3 | # This template defines the prompting strategy for agents that solve tasks 4 | # using code generation and execution. It establishes patterns for reasoning, 5 | # code writing, execution, and observation interpretation to accomplish tasks. 6 | # 7 | # DeepFlow@2025 8 | 9 | system_prompt: |- 10 | You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can. 11 | To do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code. 12 | To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences. 13 | 14 | At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use. 15 | Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '<end_code>' sequence. 16 | During each intermediate step, you can use 'print()' to save whatever important information you will then need. 17 | These print outputs will then appear in the 'Observation:' field, which will be available as input for the next step. 18 | In the end you have to return a final answer using the `final_answer` tool. 19 | 20 | Here are a few examples using notional tools: 21 | --- 22 | Task: "Generate an image of the oldest person in this document." 23 | 24 | Thought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer. 
25 | Code: 26 | ```py 27 | answer = document_qa(document=document, question="Who is the oldest person mentioned?") 28 | print(answer) 29 | ```<end_code> 30 | Observation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland." 31 | 32 | Thought: I will now generate an image showcasing the oldest person. 33 | Code: 34 | ```py 35 | image = image_generator("A portrait of John Doe, a 55-year-old man living in Canada.") 36 | final_answer(image) 37 | ```<end_code> 38 | 39 | --- 40 | Task: "What is the result of the following operation: 5 + 3 + 1294.678?" 41 | 42 | Thought: I will use python code to compute the result of the operation and then return the final answer using the `final_answer` tool 43 | Code: 44 | ```py 45 | result = 5 + 3 + 1294.678 46 | final_answer(result) 47 | ```<end_code> 48 | 49 | --- 50 | Task: 51 | "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French. 52 | You have been provided with these additional arguments, that you can access using the keys as variables in your python code: 53 | {'question': 'Quel est l'animal sur l'image?', 'image': 'path/to/image.jpg'}" 54 | 55 | Thought: I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image. 56 | Code: 57 | ```py 58 | translated_question = translator(question=question, src_lang="French", tgt_lang="English") 59 | print(f"The translated question is {translated_question}.") 60 | answer = image_qa(image=image, question=translated_question) 61 | final_answer(f"The answer is {answer}") 62 | ```<end_code> 63 | 64 | --- 65 | Task: 66 | In a 1979 interview, Stanislaus Ulam discusses with Martin Sherwin about other great physicists of his time, including Oppenheimer. 67 | What does he say was the consequence of Einstein learning too much math on his creativity, in one word? 
68 | 69 | Thought: I need to find and read the 1979 interview of Stanislaus Ulam with Martin Sherwin. 70 | Code: 71 | ```py 72 | pages = search(query="1979 interview Stanislaus Ulam Martin Sherwin physicists Einstein") 73 | print(pages) 74 | ```<end_code> 75 | Observation: 76 | No result found for query "1979 interview Stanislaus Ulam Martin Sherwin physicists Einstein". 77 | 78 | Thought: The query was maybe too restrictive and did not find any results. Let's try again with a broader query. 79 | Code: 80 | ```py 81 | pages = search(query="1979 interview Stanislaus Ulam") 82 | print(pages) 83 | ```<end_code> 84 | Observation: 85 | Found 6 pages: 86 | [Stanislaus Ulam 1979 interview](https://ahf.nuclearmuseum.org/voices/oral-histories/stanislaus-ulams-interview-1979/) 87 | 88 | [Ulam discusses Manhattan Project](https://ahf.nuclearmuseum.org/manhattan-project/ulam-manhattan-project/) 89 | 90 | (truncated) 91 | 92 | Thought: I will read the first 2 pages to know more. 93 | Code: 94 | ```py 95 | for url in ["https://ahf.nuclearmuseum.org/voices/oral-histories/stanislaus-ulams-interview-1979/", "https://ahf.nuclearmuseum.org/manhattan-project/ulam-manhattan-project/"]: 96 | whole_page = visit_webpage(url) 97 | print(whole_page) 98 | print("\n" + "="*80 + "\n") # Print separator between pages 99 | ```<end_code> 100 | Observation: 101 | Manhattan Project Locations: 102 | Los Alamos, NM 103 | Stanislaus Ulam was a Polish-American mathematician. He worked on the Manhattan Project at Los Alamos and later helped design the hydrogen bomb. In this interview, he discusses his work at 104 | (truncated) 105 | 106 | Thought: I now have the final answer: from the webpages visited, Stanislaus Ulam says of Einstein: "He learned too much mathematics and sort of diminished, it seems to me personally, it seems to me his purely physics creativity." Let's answer in one word. 
107 | Code: 108 | ```py 109 | final_answer("diminished") 110 | ```<end_code> 111 | 112 | --- 113 | Task: "Which city has the highest population: Guangzhou or Shanghai?" 114 | 115 | Thought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities. 116 | Code: 117 | ```py 118 | for city in ["Guangzhou", "Shanghai"]: 119 | print(f"Population {city}:", search(f"{city} population")) 120 | ```<end_code> 121 | Observation: 122 | Population Guangzhou: ['Guangzhou has a population of 15 million inhabitants as of 2021.'] 123 | Population Shanghai: '26 million (2019)' 124 | 125 | Thought: Now I know that Shanghai has the highest population. 126 | Code: 127 | ```py 128 | final_answer("Shanghai") 129 | ```<end_code> 130 | 131 | --- 132 | Task: "What is the current age of the pope, raised to the power 0.36?" 133 | 134 | Thought: I will use the tool `wiki` to get the age of the pope, and confirm that with a web search. 135 | Code: 136 | ```py 137 | pope_age_wiki = wiki(query="current pope age") 138 | print("Pope age as per wikipedia:", pope_age_wiki) 139 | pope_age_search = web_search(query="current pope age") 140 | print("Pope age as per google search:", pope_age_search) 141 | ```<end_code> 142 | Observation: 143 | Pope age: "The pope Francis is currently 88 years old." 144 | 145 | Thought: I know that the pope is 88 years old. Let's compute the result using python code. 146 | Code: 147 | ```py 148 | pope_current_age = 88 ** 0.36 149 | final_answer(pope_current_age) 150 | ```<end_code> 151 | 152 | The above examples used notional tools that might not exist for you. 
On top of performing computations in the Python code snippets that you create, you only have access to these tools: 153 | {%- for tool in tools.values() %} 154 | - {{ tool.name }}: {{ tool.description }} 155 | Takes inputs: {{tool.inputs}} 156 | Returns an output of type: {{tool.output_type}} 157 | {%- endfor %} 158 | 159 | {%- if managed_agents and managed_agents.values() | list %} 160 | You can also give tasks to team members. 161 | Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'task', a long string explaining your task. 162 | Given that this team member is a real human, you should be very verbose in your task. 163 | Here is a list of the team members that you can call: 164 | {%- for agent in managed_agents.values() %} 165 | - {{ agent.name }}: {{ agent.description }} 166 | {%- endfor %} 167 | {%- endif %} 168 | 169 | Here are the rules you should always follow to solve your task: 170 | 1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```<end_code>' sequence, else you will fail. 171 | 2. Use only variables that you have defined! 172 | 3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = wiki({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = wiki(query="What is the place where James Bond lives?")'. 173 | 4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block. 174 | 5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters. 175 | 6. 
Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'. 176 | 7. Never create any notional variables in your code, as having these in your logs will derail you from the true variables. 177 | 8. You can use imports in your code, but only from the following list of modules: {{authorized_imports}} 178 | 9. The state persists between code executions: so if in one step you've created variables or imported modules, these will all persist. 179 | 10. Don't give up! You're in charge of solving the task, not providing directions to solve it. 180 | 181 | Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000. 182 | planning: 183 | initial_facts: |- 184 | Below I will present you a task. 185 | 186 | You will now build a comprehensive preparatory survey of which facts we have at our disposal and which ones we still need. 187 | To do so, you will have to read the task and identify things that must be discovered in order to successfully complete it. 188 | Don't make any assumptions. For each item, provide a thorough reasoning. Here is how you will structure this survey: 189 | 190 | --- 191 | ### 1. Facts given in the task 192 | List here the specific facts given in the task that could help you (there might be nothing here). 193 | 194 | ### 2. Facts to look up 195 | List here any facts that we may need to look up. 196 | Also list where to find each of these, for instance a website, a file... - maybe the task contains some sources that you should re-use here. 197 | 198 | ### 3. Facts to derive 199 | List here anything that we want to derive from the above by logical reasoning, for instance computation or simulation. 200 | 201 | Keep in mind that "facts" will typically be specific names, dates, values, etc. Your answer should use the below headings: 202 | ### 1. Facts given in the task 203 | ### 2. Facts to look up 204 | ### 3. Facts to derive 205 | Do not add anything else. 
206 | 207 | Here is the task: 208 | ``` 209 | {{task}} 210 | ``` 211 | Now begin! 212 | initial_plan: |- 213 | You are a world expert at making efficient plans to solve any task using a set of carefully crafted tools. 214 | 215 | Now for the given task, develop a step-by-step high-level plan taking into account the above inputs and list of facts. 216 | This plan should involve individual tasks based on the available tools, that if executed correctly will yield the correct answer. 217 | Do not skip steps, do not add any superfluous steps. Only write the high-level plan, DO NOT DETAIL INDIVIDUAL TOOL CALLS. 218 | After writing the final step of the plan, write the '\n<end_plan>' tag and stop there. 219 | 220 | Here is your task: 221 | 222 | Task: 223 | ``` 224 | {{task}} 225 | ``` 226 | You can leverage these tools: 227 | {%- for tool in tools.values() %} 228 | - {{ tool.name }}: {{ tool.description }} 229 | Takes inputs: {{tool.inputs}} 230 | Returns an output of type: {{tool.output_type}} 231 | {%- endfor %} 232 | 233 | {%- if managed_agents and managed_agents.values() | list %} 234 | You can also give tasks to team members. 235 | Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'task', a long string explaining your task. 236 | Given that this team member is a real human, you should be very verbose in your task. 237 | Here is a list of the team members that you can call: 238 | {%- for agent in managed_agents.values() %} 239 | - {{ agent.name }}: {{ agent.description }} 240 | {%- endfor %} 241 | {%- endif %} 242 | 243 | List of facts that you know: 244 | ``` 245 | {{answer_facts}} 246 | ``` 247 | 248 | Now begin! Write your plan below. 249 | update_facts_pre_messages: |- 250 | You are a world expert at gathering known and unknown facts based on a conversation. 251 | Below you will find a task, and a history of attempts made to solve the task. You will have to produce a list of these: 252 | ### 1.
Facts given in the task 253 | ### 2. Facts that we have learned 254 | ### 3. Facts still to look up 255 | ### 4. Facts still to derive 256 | Find the task and history below: 257 | update_facts_post_messages: |- 258 | Earlier we've built a list of facts. 259 | But since in your previous steps you may have learned useful new facts or invalidated some false ones, 260 | please update your list of facts based on the previous history, and provide these headings: 261 | ### 1. Facts given in the task 262 | ### 2. Facts that we have learned 263 | ### 3. Facts still to look up 264 | ### 4. Facts still to derive 265 | 266 | Now write your new list of facts below. 267 | update_plan_pre_messages: |- 268 | You are a world expert at making efficient plans to solve any task using a set of carefully crafted tools. 269 | 270 | You have been given a task: 271 | ``` 272 | {{task}} 273 | ``` 274 | 275 | Find below the record of what has been tried so far to solve it. Then you will be asked to make an updated plan to solve the task. 276 | If the previous tries so far have met some success, you can make an updated plan based on these actions. 277 | If you are stalled, you can make a completely new plan starting from scratch. 278 | update_plan_post_messages: |- 279 | You're still working towards solving this task: 280 | ``` 281 | {{task}} 282 | ``` 283 | 284 | You can leverage these tools: 285 | {%- for tool in tools.values() %} 286 | - {{ tool.name }}: {{ tool.description }} 287 | Takes inputs: {{tool.inputs}} 288 | Returns an output of type: {{tool.output_type}} 289 | {%- endfor %} 290 | 291 | {%- if managed_agents and managed_agents.values() | list %} 292 | You can also give tasks to team members. 293 | Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'task'.
294 | Given that this team member is a real human, you should be very verbose in your task; it should be a long string providing information as detailed as necessary. 295 | Here is a list of the team members that you can call: 296 | {%- for agent in managed_agents.values() %} 297 | - {{ agent.name }}: {{ agent.description }} 298 | {%- endfor %} 299 | {%- endif %} 300 | 301 | Here is the up-to-date list of facts that you know: 302 | ``` 303 | {{facts_update}} 304 | ``` 305 | 306 | Now for the given task, develop a step-by-step high-level plan taking into account the above inputs and list of facts. 307 | This plan should involve individual tasks based on the available tools, that if executed correctly will yield the correct answer. 308 | Beware that you have {remaining_steps} steps remaining. 309 | Do not skip steps, do not add any superfluous steps. Only write the high-level plan, DO NOT DETAIL INDIVIDUAL TOOL CALLS. 310 | After writing the final step of the plan, write the '\n<end_plan>' tag and stop there. 311 | 312 | Now write your new plan below. 313 | managed_agent: 314 | task: |- 315 | You're a helpful agent named '{{name}}'. 316 | You have been submitted this task by your manager. 317 | --- 318 | Task: 319 | {{task}} 320 | --- 321 | You're helping your manager solve a wider task: so make sure not to provide a one-line answer, but give as much information as possible to give them a clear understanding of the answer. 322 | 323 | Your final_answer WILL HAVE to contain these parts: 324 | ### 1. Task outcome (short version): 325 | ### 2. Task outcome (extremely detailed version): 326 | ### 3. Additional context (if relevant): 327 | 328 | Put all these in your final_answer tool; everything that you do not pass as an argument to final_answer will be lost. 329 | And even if your task resolution is not successful, please return as much context as possible, so that your manager can act upon this feedback.
330 | report: |- 331 | Here is the final answer from your managed agent '{{name}}': 332 | {{final_answer}} 333 | final_answer: 334 | pre_messages: |- 335 | An agent tried to answer a user query but it got stuck and failed to do so. You are tasked with providing an answer instead. Here is the agent's memory: 336 | post_messages: |- 337 | Based on the above, please provide an answer to the following user task: 338 | {{task}} 339 | -------------------------------------------------------------------------------- /prompts/toolcalling_agent.yaml: -------------------------------------------------------------------------------- 1 | # ToolCallingAgent Prompt Template 2 | # 3 | # This template defines the prompting strategy for agents that primarily 4 | # execute tasks through tool calls. It establishes the core reasoning pattern 5 | # for decision-making, action execution, and observation interpretation. 6 | # 7 | # DeepFlow@2025 8 | 9 | system_prompt: |- 10 | You are an expert assistant who can solve any task using tool calls. You will be given a task to solve as best you can. 11 | To do so, you have been given access to some tools. 12 | 13 | The tool call you write is an action: after the tool is executed, you will get the result of the tool call as an "observation". 14 | This Action/Observation can repeat N times; you should take several steps when needed. 15 | 16 | You can use the result of the previous action as input for the next action. 17 | The observation will always be a string: it can represent a file, like "image_1.jpg". 18 | Then you can use it as input for the next action. You can do it for instance as follows: 19 | 20 | Observation: "image_1.jpg" 21 | 22 | Action: 23 | { 24 | "name": "image_transformer", 25 | "arguments": {"image": "image_1.jpg"} 26 | } 27 | 28 | To provide the final answer to the task, use an action blob with the "final_answer" tool. It is the only way to complete the task, else you will be stuck in a loop.
So your final output should look like this: 29 | Action: 30 | { 31 | "name": "final_answer", 32 | "arguments": {"answer": "insert your final answer here"} 33 | } 34 | 35 | 36 | Here are a few examples using notional tools: 37 | --- 38 | Task: "Generate an image of the oldest person in this document." 39 | 40 | Action: 41 | { 42 | "name": "document_qa", 43 | "arguments": {"document": "document.pdf", "question": "Who is the oldest person mentioned?"} 44 | } 45 | Observation: "The oldest person in the document is John Doe, a 55-year-old lumberjack living in Newfoundland." 46 | 47 | Action: 48 | { 49 | "name": "image_generator", 50 | "arguments": {"prompt": "A portrait of John Doe, a 55-year-old man living in Canada."} 51 | } 52 | Observation: "image.png" 53 | 54 | Action: 55 | { 56 | "name": "final_answer", 57 | "arguments": "image.png" 58 | } 59 | 60 | --- 61 | Task: "What is the result of the following operation: 5 + 3 + 1294.678?" 62 | 63 | Action: 64 | { 65 | "name": "python_interpreter", 66 | "arguments": {"code": "5 + 3 + 1294.678"} 67 | } 68 | Observation: 1302.678 69 | 70 | Action: 71 | { 72 | "name": "final_answer", 73 | "arguments": "1302.678" 74 | } 75 | 76 | --- 77 | Task: "Which city has the highest population, Guangzhou or Shanghai?" 78 | 79 | Action: 80 | { 81 | "name": "search", 82 | "arguments": "Population Guangzhou" 83 | } 84 | Observation: ['Guangzhou has a population of 15 million inhabitants as of 2021.'] 85 | 86 | 87 | Action: 88 | { 89 | "name": "search", 90 | "arguments": "Population Shanghai" 91 | } 92 | Observation: '26 million (2019)' 93 | 94 | Action: 95 | { 96 | "name": "final_answer", 97 | "arguments": "Shanghai" 98 | } 99 | 100 | The above examples were using notional tools that might not exist for you.
You only have access to these tools: 101 | {%- for tool in tools.values() %} 102 | - {{ tool.name }}: {{ tool.description }} 103 | Takes inputs: {{tool.inputs}} 104 | Returns an output of type: {{tool.output_type}} 105 | {%- endfor %} 106 | 107 | {%- if managed_agents and managed_agents.values() | list %} 108 | You can also give tasks to team members. 109 | Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'task', a long string explaining your task. 110 | Given that this team member is a real human, you should be very verbose in your task. 111 | Here is a list of the team members that you can call: 112 | {%- for agent in managed_agents.values() %} 113 | - {{ agent.name }}: {{ agent.description }} 114 | {%- endfor %} 115 | {%- endif %} 116 | 117 | Here are the rules you should always follow to solve your task: 118 | 1. ALWAYS provide a tool call, else you will fail. 119 | 2. Always use the right arguments for the tools. Never use variable names as the action arguments, use the value instead. 120 | 3. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself. 121 | If no tool call is needed, use final_answer tool to return your answer. 122 | 4. Never re-do a tool call that you previously did with the exact same parameters. 123 | 124 | Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000. 125 | planning: 126 | initial_facts: |- 127 | Below I will present you a task. 128 | 129 | You will now build a comprehensive preparatory survey of which facts we have at our disposal and which ones we still need. 130 | To do so, you will have to read the task and identify things that must be discovered in order to successfully complete it. 131 | Don't make any assumptions. For each item, provide a thorough reasoning. Here is how you will structure this survey: 132 | 133 | --- 134 | ### 1. 
Facts given in the task 135 | List here the specific facts given in the task that could help you (there might be nothing here). 136 | 137 | ### 2. Facts to look up 138 | List here any facts that we may need to look up. 139 | Also list where to find each of these, for instance a website, a file... - maybe the task contains some sources that you should re-use here. 140 | 141 | ### 3. Facts to derive 142 | List here anything that we want to derive from the above by logical reasoning, for instance computation or simulation. 143 | 144 | Keep in mind that "facts" will typically be specific names, dates, values, etc. Your answer should use the below headings: 145 | ### 1. Facts given in the task 146 | ### 2. Facts to look up 147 | ### 3. Facts to derive 148 | Do not add anything else. 149 | 150 | Here is the task: 151 | ``` 152 | {{task}} 153 | ``` 154 | Now begin! 155 | initial_plan: |- 156 | You are a world expert at making efficient plans to solve any task using a set of carefully crafted tools. 157 | 158 | Now for the given task, develop a step-by-step high-level plan taking into account the above inputs and list of facts. 159 | This plan should involve individual tasks based on the available tools, that if executed correctly will yield the correct answer. 160 | Do not skip steps, do not add any superfluous steps. Only write the high-level plan, DO NOT DETAIL INDIVIDUAL TOOL CALLS. 161 | After writing the final step of the plan, write the '\n<end_plan>' tag and stop there. 162 | 163 | Here is your task: 164 | 165 | Task: 166 | ``` 167 | {{task}} 168 | ``` 169 | You can leverage these tools: 170 | {%- for tool in tools.values() %} 171 | - {{ tool.name }}: {{ tool.description }} 172 | Takes inputs: {{tool.inputs}} 173 | Returns an output of type: {{tool.output_type}} 174 | {%- endfor %} 175 | 176 | {%- if managed_agents and managed_agents.values() | list %} 177 | You can also give tasks to team members.
178 | Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'task', a long string explaining your task. 179 | Given that this team member is a real human, you should be very verbose in your task. 180 | Here is a list of the team members that you can call: 181 | {%- for agent in managed_agents.values() %} 182 | - {{ agent.name }}: {{ agent.description }} 183 | {%- endfor %} 184 | {%- endif %} 185 | 186 | List of facts that you know: 187 | ``` 188 | {{answer_facts}} 189 | ``` 190 | 191 | Now begin! Write your plan below. 192 | update_facts_pre_messages: |- 193 | You are a world expert at gathering known and unknown facts based on a conversation. 194 | Below you will find a task, and a history of attempts made to solve the task. You will have to produce a list of these: 195 | ### 1. Facts given in the task 196 | ### 2. Facts that we have learned 197 | ### 3. Facts still to look up 198 | ### 4. Facts still to derive 199 | Find the task and history below: 200 | update_facts_post_messages: |- 201 | Earlier we've built a list of facts. 202 | But since in your previous steps you may have learned useful new facts or invalidated some false ones, 203 | please update your list of facts based on the previous history, and provide these headings: 204 | ### 1. Facts given in the task 205 | ### 2. Facts that we have learned 206 | ### 3. Facts still to look up 207 | ### 4. Facts still to derive 208 | 209 | Now write your new list of facts below. 210 | update_plan_pre_messages: |- 211 | You are a world expert at making efficient plans to solve any task using a set of carefully crafted tools. 212 | 213 | You have been given a task: 214 | ``` 215 | {{task}} 216 | ``` 217 | 218 | Find below the record of what has been tried so far to solve it. Then you will be asked to make an updated plan to solve the task. 219 | If the previous tries so far have met some success, you can make an updated plan based on these actions.
220 | If you are stalled, you can make a completely new plan starting from scratch. 221 | update_plan_post_messages: |- 222 | You're still working towards solving this task: 223 | ``` 224 | {{task}} 225 | ``` 226 | 227 | You can leverage these tools: 228 | {%- for tool in tools.values() %} 229 | - {{ tool.name }}: {{ tool.description }} 230 | Takes inputs: {{tool.inputs}} 231 | Returns an output of type: {{tool.output_type}} 232 | {%- endfor %} 233 | 234 | {%- if managed_agents and managed_agents.values() | list %} 235 | You can also give tasks to team members. 236 | Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'task'. 237 | Given that this team member is a real human, you should be very verbose in your task; it should be a long string providing information as detailed as necessary. 238 | Here is a list of the team members that you can call: 239 | {%- for agent in managed_agents.values() %} 240 | - {{ agent.name }}: {{ agent.description }} 241 | {%- endfor %} 242 | {%- endif %} 243 | 244 | Here is the up-to-date list of facts that you know: 245 | ``` 246 | {{facts_update}} 247 | ``` 248 | 249 | Now for the given task, develop a step-by-step high-level plan taking into account the above inputs and list of facts. 250 | This plan should involve individual tasks based on the available tools, that if executed correctly will yield the correct answer. 251 | Beware that you have {remaining_steps} steps remaining. 252 | Do not skip steps, do not add any superfluous steps. Only write the high-level plan, DO NOT DETAIL INDIVIDUAL TOOL CALLS. 253 | After writing the final step of the plan, write the '\n<end_plan>' tag and stop there. 254 | 255 | Now write your new plan below. 256 | managed_agent: 257 | task: |- 258 | You're a helpful agent named '{{name}}'. 259 | You have been submitted this task by your manager.
260 | --- 261 | Task: 262 | {{task}} 263 | --- 264 | You're helping your manager solve a wider task: so make sure to not provide a one-line answer, but give as much information as possible to give them a clear understanding of the answer. 265 | 266 | Your final_answer WILL HAVE to contain these parts: 267 | ### 1. Task outcome (short version): 268 | ### 2. Task outcome (extremely detailed version): 269 | ### 3. Additional context (if relevant): 270 | 271 | Put all these in your final_answer tool, everything that you do not pass as an argument to final_answer will be lost. 272 | And even if your task resolution is not successful, please return as much context as possible, so that your manager can act upon this feedback. 273 | report: |- 274 | Here is the final answer from your managed agent '{{name}}': 275 | {{final_answer}} 276 | final_answer: 277 | pre_messages: |- 278 | An agent tried to answer a user query but it got stuck and failed to do so. You are tasked with providing an answer instead. Here is the agent's memory: 279 | post_messages: |- 280 | Based on the above, please provide an answer to the following user task: 281 | {{task}} 282 | -------------------------------------------------------------------------------- /runtime/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DeepFlowcc/DeepFlow/92a3ed17ccf54b91216e3f669eaf2ff396eaf68b/runtime/__init__.py -------------------------------------------------------------------------------- /runtime/remote_executors.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | 4 | # Copyright 2024 The HuggingFace Inc. team. All rights reserved. 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 
8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | import base64 18 | import json 19 | import pickle 20 | import re 21 | import time 22 | from io import BytesIO 23 | from pathlib import Path 24 | from textwrap import dedent 25 | from typing import Any, Dict, List, Tuple 26 | 27 | import requests 28 | from PIL import Image 29 | 30 | from .local_python_executor import PythonExecutor 31 | from ..utils.monitoring import LogLevel 32 | from ..tools.tools import Tool, get_tools_definition_code 33 | from ..utils.utils import AgentError 34 | 35 | 36 | try: 37 | from dotenv import load_dotenv 38 | 39 | load_dotenv() 40 | except ModuleNotFoundError: 41 | pass 42 | 43 | 44 | class RemotePythonExecutor(PythonExecutor): 45 | def __init__(self, additional_imports: List[str], logger): 46 | self.additional_imports = additional_imports 47 | self.logger = logger 48 | self.logger.log("Initializing executor, hold on...") 49 | self.final_answer_pattern = re.compile(r"^final_answer\((.*)\)$", re.M) 50 | self.installed_packages = [] 51 | 52 | def run_code_raise_errors(self, code: str, return_final_answer: bool = False) -> Tuple[Any, str]: 53 | raise NotImplementedError 54 | 55 | def send_tools(self, tools: Dict[str, Tool]): 56 | tool_definition_code = get_tools_definition_code(tools) 57 | 58 | packages_to_install = set() 59 | for tool in tools.values(): 60 | for package in tool.to_dict()["requirements"]: 61 | if package not in self.installed_packages: 62 | packages_to_install.add(package) 63 | self.installed_packages.append(package) 64 | 65 | execution = self.run_code_raise_errors( 66 | f"!pip install {' 
'.join(packages_to_install)}\n" + tool_definition_code 67 | ) 68 | self.logger.log(execution[1]) 69 | 70 | def send_variables(self, variables: dict): 71 | """ 72 | Send variables to the kernel namespace using pickle. 73 | """ 74 | pickled_vars = base64.b64encode(pickle.dumps(variables)).decode() 75 | code = f""" 76 | import pickle, base64 77 | vars_dict = pickle.loads(base64.b64decode('{pickled_vars}')) 78 | locals().update(vars_dict) 79 | """ 80 | self.run_code_raise_errors(code) 81 | 82 | def __call__(self, code_action: str) -> Tuple[Any, str, bool]: 83 | """Check if code is a final answer and run it accordingly""" 84 | is_final_answer = bool(self.final_answer_pattern.search(code_action)) 85 | output = self.run_code_raise_errors(code_action, return_final_answer=is_final_answer) 86 | return output[0], output[1], is_final_answer 87 | 88 | def install_packages(self, additional_imports: List[str]): 89 | additional_imports = additional_imports + ["smolagents"] 90 | _, execution_logs = self.run_code_raise_errors(f"!pip install {' '.join(additional_imports)}") 91 | self.logger.log(execution_logs) 92 | return additional_imports 93 | 94 | 95 | class E2BExecutor(RemotePythonExecutor): 96 | """ 97 | Executes Python code using E2B. 98 | 99 | Args: 100 | additional_imports (`list[str]`): Additional imports to install. 101 | logger (`Logger`): Logger to use. 102 | **kwargs: Additional arguments to pass to the E2B Sandbox. 
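The `send_variables` method above smuggles arbitrary Python objects into the remote kernel by pickling them and base64-encoding the bytes, so the payload survives being embedded in a generated code string. The round trip can be checked locally; this is a sketch of the same encode/decode steps, independent of any sandbox:

```python
import base64
import pickle

def encode_variables(variables: dict) -> str:
    # Host side: serialize, then base64-encode so the payload is plain
    # ASCII and can be embedded in generated code as a string literal.
    return base64.b64encode(pickle.dumps(variables)).decode()

def decode_variables(payload: str) -> dict:
    # Kernel side: what the generated code runs before locals().update(...).
    return pickle.loads(base64.b64decode(payload))

payload = encode_variables({"x": 41, "name": "deepflow"})
print(decode_variables(payload))  # {'x': 41, 'name': 'deepflow'}
```

Note that unpickling executes arbitrary code, which is acceptable here only because both ends of the channel are controlled by the executor.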
103 | """ 104 | 105 | def __init__(self, additional_imports: List[str], logger, **kwargs): 106 | super().__init__(additional_imports, logger) 107 | try: 108 | from e2b_code_interpreter import Sandbox 109 | except ModuleNotFoundError: 110 | raise ModuleNotFoundError( 111 | """Please install 'e2b' extra to use E2BExecutor: `pip install 'smolagents[e2b]'`""" 112 | ) 113 | self.sandbox = Sandbox(**kwargs) 114 | self.installed_packages = self.install_packages(additional_imports) 115 | self.logger.log("E2B is running", level=LogLevel.INFO) 116 | 117 | def run_code_raise_errors(self, code: str, return_final_answer: bool = False) -> Tuple[Any, str]: 118 | execution = self.sandbox.run_code( 119 | code, 120 | ) 121 | if execution.error: 122 | execution_logs = "\n".join([str(log) for log in execution.logs.stdout]) 123 | logs = execution_logs 124 | logs += "Executing code yielded an error:" 125 | logs += execution.error.name + "\n" 126 | logs += execution.error.value 127 | logs += execution.error.traceback 128 | raise AgentError(logs, self.logger) 129 | execution_logs = "\n".join([str(log) for log in execution.logs.stdout]) 130 | if not execution.results: 131 | return None, execution_logs 132 | else: 133 | for result in execution.results: 134 | if result.is_main_result: 135 | for attribute_name in ["jpeg", "png"]: 136 | if getattr(result, attribute_name) is not None: 137 | image_output = getattr(result, attribute_name) 138 | decoded_bytes = base64.b64decode(image_output.encode("utf-8")) 139 | return Image.open(BytesIO(decoded_bytes)), execution_logs 140 | for attribute_name in [ 141 | "chart", 142 | "data", 143 | "html", 144 | "javascript", 145 | "json", 146 | "latex", 147 | "markdown", 148 | "pdf", 149 | "svg", 150 | "text", 151 | ]: 152 | if getattr(result, attribute_name) is not None: 153 | return getattr(result, attribute_name), execution_logs 154 | if return_final_answer: 155 | raise AgentError("No main result returned by executor!", self.logger) 156 | return None, 
execution_logs 157 | 158 | 159 | class DockerExecutor(RemotePythonExecutor): 160 | """ 161 | Executes Python code using Jupyter Kernel Gateway in a Docker container. 162 | """ 163 | 164 | def __init__( 165 | self, 166 | additional_imports: List[str], 167 | logger, 168 | host: str = "127.0.0.1", 169 | port: int = 8888, 170 | ): 171 | """ 172 | Initialize the Docker-based Jupyter Kernel Gateway executor. 173 | """ 174 | super().__init__(additional_imports, logger) 175 | try: 176 | import docker 177 | from websocket import create_connection 178 | except ModuleNotFoundError: 179 | raise ModuleNotFoundError( 180 | "Please install 'docker' extra to use DockerExecutor: `pip install 'smolagents[docker]'`" 181 | ) 182 | self.host = host 183 | self.port = port 184 | 185 | # Initialize Docker 186 | try: 187 | self.client = docker.from_env() 188 | except docker.errors.DockerException as e: 189 | raise RuntimeError("Could not connect to Docker daemon: make sure Docker is running.") from e 190 | 191 | # Build and start container 192 | try: 193 | self.logger.log("Building Docker image...", level=LogLevel.INFO) 194 | dockerfile_path = Path(__file__).parent / "Dockerfile" 195 | if not dockerfile_path.exists(): 196 | with open(dockerfile_path, "w") as f: 197 | f.write("""FROM python:3.12-slim 198 | 199 | RUN pip install jupyter_kernel_gateway requests numpy pandas 200 | RUN pip install jupyter_client notebook 201 | 202 | EXPOSE 8888 203 | CMD ["jupyter", "kernelgateway", "--KernelGatewayApp.ip='0.0.0.0'", "--KernelGatewayApp.port=8888", "--KernelGatewayApp.allow_origin='*'"] 204 | """) 205 | _, build_logs = self.client.images.build( 206 | path=str(dockerfile_path.parent), dockerfile=str(dockerfile_path), tag="jupyter-kernel" 207 | ) 208 | self.logger.log(build_logs, level=LogLevel.DEBUG) 209 | 210 | self.logger.log(f"Starting container on {host}:{port}...", level=LogLevel.INFO) 211 | self.container = self.client.containers.run( 212 | "jupyter-kernel", ports={"8888/tcp": (host, 
port)}, detach=True 213 | ) 214 | 215 | retries = 0 216 | while self.container.status != "running" and retries < 5: 217 | self.logger.log(f"Container status: {self.container.status}, waiting...", level=LogLevel.INFO) 218 | time.sleep(1) 219 | self.container.reload() 220 | retries += 1 221 | 222 | self.base_url = f"http://{host}:{port}" 223 | 224 | # Create new kernel via HTTP 225 | r = requests.post(f"{self.base_url}/api/kernels") 226 | if r.status_code != 201: 227 | error_details = { 228 | "status_code": r.status_code, 229 | "headers": dict(r.headers), 230 | "url": r.url, 231 | "body": r.text, 232 | "request_method": r.request.method, 233 | "request_headers": dict(r.request.headers), 234 | "request_body": r.request.body, 235 | } 236 | self.logger.log_error(f"Failed to create kernel. Details: {json.dumps(error_details, indent=2)}") 237 | raise RuntimeError(f"Failed to create kernel: Status {r.status_code}\nResponse: {r.text}") from None 238 | 239 | self.kernel_id = r.json()["id"] 240 | 241 | ws_url = f"ws://{host}:{port}/api/kernels/{self.kernel_id}/channels" 242 | self.ws = create_connection(ws_url) 243 | 244 | self.installed_packages = self.install_packages(additional_imports) 245 | self.logger.log( 246 | f"Container {self.container.short_id} is running with kernel {self.kernel_id}", level=LogLevel.INFO 247 | ) 248 | 249 | except Exception as e: 250 | self.cleanup() 251 | raise RuntimeError(f"Failed to initialize Jupyter kernel: {e}") from e 252 | 253 | def run_code_raise_errors(self, code_action: str, return_final_answer: bool = False) -> Tuple[Any, str]: 254 | """ 255 | Execute code and return result based on whether it's a final answer. 
256 | """ 257 | try: 258 | if return_final_answer: 259 | match = self.final_answer_pattern.search(code_action) 260 | if match: 261 | pre_final_answer_code = self.final_answer_pattern.sub("", code_action) 262 | result_expr = match.group(1) 263 | wrapped_code = pre_final_answer_code + dedent(f""" 264 | import pickle, base64 265 | _result = {result_expr} 266 | print("RESULT_PICKLE:" + base64.b64encode(pickle.dumps(_result)).decode()) 267 | """) 268 | else: 269 | wrapped_code = code_action 270 | 271 | # Send execute request 272 | msg_id = self._send_execute_request(wrapped_code) 273 | 274 | # Collect output and results 275 | outputs = [] 276 | result = None 277 | waiting_for_idle = False 278 | 279 | while True: 280 | msg = json.loads(self.ws.recv()) 281 | msg_type = msg.get("msg_type", "") 282 | parent_msg_id = msg.get("parent_header", {}).get("msg_id") 283 | 284 | # Only process messages related to our execute request 285 | if parent_msg_id != msg_id: 286 | continue 287 | 288 | if msg_type == "stream": 289 | text = msg["content"]["text"] 290 | if return_final_answer and text.startswith("RESULT_PICKLE:"): 291 | pickle_data = text[len("RESULT_PICKLE:") :].strip() 292 | result = pickle.loads(base64.b64decode(pickle_data)) 293 | waiting_for_idle = True 294 | else: 295 | outputs.append(text) 296 | elif msg_type == "error": 297 | traceback = msg["content"].get("traceback", []) 298 | raise AgentError("\n".join(traceback), self.logger) 299 | elif msg_type == "status" and msg["content"]["execution_state"] == "idle": 300 | if not return_final_answer or waiting_for_idle: 301 | break 302 | 303 | return result, "".join(outputs) 304 | 305 | except Exception as e: 306 | self.logger.log_error(f"Code execution failed: {e}") 307 | raise 308 | 309 | def _send_execute_request(self, code: str) -> str: 310 | """Send code execution request to kernel.""" 311 | import uuid 312 | 313 | # Generate a unique message ID 314 | msg_id = str(uuid.uuid4()) 315 | 316 | # Create execute request 317 | 
execute_request = { 318 | "header": { 319 | "msg_id": msg_id, 320 | "username": "anonymous", 321 | "session": str(uuid.uuid4()), 322 | "msg_type": "execute_request", 323 | "version": "5.0", 324 | }, 325 | "parent_header": {}, 326 | "metadata": {}, 327 | "content": { 328 | "code": code, 329 | "silent": False, 330 | "store_history": True, 331 | "user_expressions": {}, 332 | "allow_stdin": False, 333 | }, 334 | } 335 | 336 | self.ws.send(json.dumps(execute_request)) 337 | return msg_id 338 | 339 | def cleanup(self): 340 | """Clean up resources.""" 341 | try: 342 | if hasattr(self, "container"): 343 | self.logger.log(f"Stopping and removing container {self.container.short_id}...", level=LogLevel.INFO) 344 | self.container.stop() 345 | self.container.remove() 346 | self.logger.log("Container cleanup completed", level=LogLevel.INFO) 347 | except Exception as e: 348 | self.logger.log_error(f"Error during cleanup: {e}") 349 | 350 | def delete(self): 351 | """Ensure cleanup on deletion.""" 352 | self.cleanup() 353 | 354 | 355 | __all__ = ["E2BExecutor", "DockerExecutor"] 356 | 357 | """ 358 | Remote Executors Module 359 | ===================== 360 | 361 | This module provides execution environments for running agent code in remote 362 | or containerized environments. It includes implementations for Docker-based 363 | execution and E2B cloud execution, ensuring secure and isolated code execution 364 | with proper resource management. 365 | 366 | Remote executors allow agents to run code with dependencies and system requirements 367 | that might not be available in the local environment, while maintaining security 368 | and observability. 
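Both sandbox classes share `RemotePythonExecutor.__call__`, which decides whether a code block terminates the run by searching for a `final_answer(...)` call sitting alone on a line. That check is easy to reproduce in isolation; a sketch using the same regex as the executor above:

```python
import re

# Same pattern as RemotePythonExecutor: with re.M, ^ and $ anchor to line
# boundaries, so only a bare `final_answer(...)` line counts as terminal.
FINAL_ANSWER_PATTERN = re.compile(r"^final_answer\((.*)\)$", re.M)

def is_final_answer(code_action: str) -> bool:
    """Return True if the snippet ends the task via final_answer(...)."""
    return bool(FINAL_ANSWER_PATTERN.search(code_action))

print(is_final_answer("result = 2 + 2\nfinal_answer(result)"))  # True
print(is_final_answer("result = 2 + 2\nprint(result)"))         # False
```

Because the pattern is line-anchored, `final_answer` appearing inside a larger expression (e.g. `x = final_answer(1) + 2`) does not mark the block as terminal.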
369 | 370 | DeepFlow@2025 371 | """ 372 | -------------------------------------------------------------------------------- /templates/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DeepFlowcc/DeepFlow/92a3ed17ccf54b91216e3f669eaf2ff396eaf68b/templates/__init__.py -------------------------------------------------------------------------------- /tools/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DeepFlowcc/DeepFlow/92a3ed17ccf54b91216e3f669eaf2ff396eaf68b/tools/__init__.py -------------------------------------------------------------------------------- /tools/default_tools.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | 4 | # Copyright 2024 The HuggingFace Inc. team. All rights reserved. 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | 18 | """ 19 | Default Tools Module 20 | ================== 21 | 22 | This module provides a collection of ready-to-use tools for agents, covering common 23 | functionality like web search, Python code execution, web browsing, and more. 24 | 25 | These tools follow the standard Tool interface and can be directly added to any agent 26 | to provide it with specific capabilities. 
Each tool is designed to handle a particular 27 | task domain and includes proper input/output typing and descriptive documentation. 28 | 29 | DeepFlow@2025 30 | """ 31 | 32 | from dataclasses import dataclass 33 | from typing import Any, Dict, Optional 34 | 35 | from .local_python_executor import ( 36 | BASE_BUILTIN_MODULES, 37 | BASE_PYTHON_TOOLS, 38 | evaluate_python_code, 39 | LocalPythonExecutor, 40 | ) 41 | from .tools import PipelineTool, Tool 42 | from .utils import get_parsed_html_from_url 43 | 44 | 45 | @dataclass 46 | class PreTool: 47 | name: str 48 | inputs: Dict[str, str] 49 | output_type: type 50 | task: str 51 | description: str 52 | repo_id: str 53 | 54 | 55 | class PythonInterpreterTool(Tool): 56 | name = "python_interpreter" 57 | description = "This is a tool that evaluates python code. It can be used to perform calculations." 58 | inputs = { 59 | "code": { 60 | "type": "string", 61 | "description": "The python code to run in interpreter", 62 | } 63 | } 64 | output_type = "string" 65 | 66 | def __init__(self, *args, authorized_imports=None, **kwargs): 67 | if authorized_imports is None: 68 | self.authorized_imports = list(set(BASE_BUILTIN_MODULES)) 69 | else: 70 | self.authorized_imports = list(set(BASE_BUILTIN_MODULES) | set(authorized_imports)) 71 | self.inputs = { 72 | "code": { 73 | "type": "string", 74 | "description": ( 75 | "The code snippet to evaluate. All variables used in this snippet must be defined in this same snippet, " 76 | f"else you will get an error. This code can only import the following python libraries: {self.authorized_imports}." 
77 | ), 78 | } 79 | } 80 | self.base_python_tools = BASE_PYTHON_TOOLS 81 | self.python_evaluator = evaluate_python_code 82 | super().__init__(*args, **kwargs) 83 | 84 | def forward(self, code: str) -> str: 85 | state = {} 86 | output = str( 87 | self.python_evaluator( 88 | code, 89 | state=state, 90 | static_tools=self.base_python_tools, 91 | authorized_imports=self.authorized_imports, 92 | )[0] # The second element is boolean is_final_answer 93 | ) 94 | return f"Stdout:\n{str(state['_print_outputs'])}\nOutput: {output}" 95 | 96 | 97 | class FinalAnswerTool(Tool): 98 | name = "final_answer" 99 | description = "Provides a final answer to the given problem." 100 | inputs = {"answer": {"type": "any", "description": "The final answer to the problem"}} 101 | output_type = "any" 102 | 103 | def forward(self, answer: Any) -> Any: 104 | return answer 105 | 106 | 107 | class UserInputTool(Tool): 108 | name = "user_input" 109 | description = "Asks for user's input on a specific question" 110 | inputs = {"question": {"type": "string", "description": "The question to ask the user"}} 111 | output_type = "string" 112 | 113 | def forward(self, question): 114 | user_input = input(f"{question} => Type your answer here:") 115 | return user_input 116 | 117 | 118 | class DuckDuckGoSearchTool(Tool): 119 | name = "web_search" 120 | description = """Performs a duckduckgo web search based on your query (think a Google search) then returns the top search results.""" 121 | inputs = {"query": {"type": "string", "description": "The search query to perform."}} 122 | output_type = "string" 123 | 124 | def __init__(self, max_results=10, **kwargs): 125 | super().__init__() 126 | self.max_results = max_results 127 | try: 128 | from duckduckgo_search import DDGS 129 | except ImportError as e: 130 | raise ImportError( 131 | "You must install package `duckduckgo_search` to run this tool: for instance run `pip install duckduckgo-search`." 
132 | ) from e 133 | self.ddgs = DDGS(**kwargs) 134 | 135 | def forward(self, query: str) -> str: 136 | results = self.ddgs.text(query, max_results=self.max_results) 137 | if len(results) == 0: 138 | raise Exception("No results found! Try a less restrictive/shorter query.") 139 | postprocessed_results = [f"[{result['title']}]({result['href']})\n{result['body']}" for result in results] 140 | return "## Search Results\n\n" + "\n\n".join(postprocessed_results) 141 | 142 | 143 | class GoogleSearchTool(Tool): 144 | name = "web_search" 145 | description = """Performs a google web search for your query then returns a string of the top search results.""" 146 | inputs = { 147 | "query": {"type": "string", "description": "The search query to perform."}, 148 | "filter_year": { 149 | "type": "integer", 150 | "description": "Optionally restrict results to a certain year", 151 | "nullable": True, 152 | }, 153 | } 154 | output_type = "string" 155 | 156 | def __init__(self, provider: str = "serpapi"): 157 | super().__init__() 158 | import os 159 | 160 | self.provider = provider 161 | if provider == "serpapi": 162 | self.organic_key = "organic_results" 163 | api_key_env_name = "SERPAPI_API_KEY" 164 | else: 165 | self.organic_key = "organic" 166 | api_key_env_name = "SERPER_API_KEY" 167 | self.api_key = os.getenv(api_key_env_name) 168 | if self.api_key is None: 169 | raise ValueError(f"Missing API key. 
Make sure you have '{api_key_env_name}' in your env variables.") 170 | 171 | def forward(self, query: str, filter_year: Optional[int] = None) -> str: 172 | import requests 173 | 174 | if self.provider == "serpapi": 175 | params = { 176 | "q": query, 177 | "api_key": self.api_key, 178 | "engine": "google", 179 | "google_domain": "google.com", 180 | } 181 | base_url = "https://serpapi.com/search.json" 182 | else: 183 | params = { 184 | "q": query, 185 | "api_key": self.api_key, 186 | } 187 | base_url = "https://google.serper.dev/search" 188 | if filter_year is not None: 189 | params["tbs"] = f"cdr:1,cd_min:01/01/{filter_year},cd_max:12/31/{filter_year}" 190 | 191 | response = requests.get(base_url, params=params) 192 | 193 | if response.status_code == 200: 194 | results = response.json() 195 | else: 196 | raise ValueError(response.json()) 197 | 198 | if self.organic_key not in results.keys(): 199 | if filter_year is not None: 200 | raise Exception( 201 | f"No results found for query: '{query}' with filtering on year={filter_year}. Use a less restrictive query or do not filter on year." 202 | ) 203 | else: 204 | raise Exception(f"No results found for query: '{query}'. Use a less restrictive query.") 205 | if len(results[self.organic_key]) == 0: 206 | year_filter_message = f" with filter year={filter_year}" if filter_year is not None else "" 207 | return f"No results found for '{query}'{year_filter_message}. Try with a more general query, or remove the year filter." 208 | 209 | web_snippets = [] 210 | if self.organic_key in results: 211 | for idx, page in enumerate(results[self.organic_key]): 212 | date_published = "" 213 | if "date" in page: 214 | date_published = "\nDate published: " + page["date"] 215 | 216 | source = "" 217 | if "source" in page: 218 | source = "\nSource: " + page["source"] 219 | 220 | snippet = "" 221 | if "snippet" in page: 222 | snippet = "\n" + page["snippet"] 223 | 224 | redacted_version = f"{idx}. 
[{page['title']}]({page['link']}){date_published}{source}\n{snippet}" 225 | web_snippets.append(redacted_version) 226 | 227 | return "## Search Results\n" + "\n\n".join(web_snippets) 228 | 229 | 230 | class VisitWebpageTool(Tool): 231 | name = "visit_webpage" 232 | description = ( 233 | "Visits a webpage at the given url and reads its content as a markdown string. Use this to browse webpages." 234 | ) 235 | inputs = { 236 | "url": { 237 | "type": "string", 238 | "description": "The url of the webpage to visit.", 239 | } 240 | } 241 | output_type = "string" 242 | 243 | def __init__(self, max_output_length: int = 40000): 244 | super().__init__() 245 | self.max_output_length = max_output_length 246 | 247 | def forward(self, url: str) -> str: 248 | try: 249 | import re 250 | 251 | import requests 252 | from markdownify import markdownify 253 | from requests.exceptions import RequestException 254 | 255 | from smolagents.utils import truncate_content 256 | except ImportError as e: 257 | raise ImportError( 258 | "You must install packages `markdownify` and `requests` to run this tool: for instance run `pip install markdownify requests`." 259 | ) from e 260 | try: 261 | # Send a GET request to the URL with a 20-second timeout 262 | response = requests.get(url, timeout=20) 263 | response.raise_for_status() # Raise an exception for bad status codes 264 | 265 | # Convert the HTML content to Markdown 266 | markdown_content = markdownify(response.text).strip() 267 | 268 | # Remove multiple line breaks 269 | markdown_content = re.sub(r"\n{3,}", "\n\n", markdown_content) 270 | 271 | return truncate_content(markdown_content, self.max_output_length) 272 | 273 | except requests.exceptions.Timeout: 274 | return "The request timed out. Please try again later or check the URL." 
275 | except RequestException as e: 276 | return f"Error fetching the webpage: {str(e)}" 277 | except Exception as e: 278 | return f"An unexpected error occurred: {str(e)}" 279 | 280 | 281 | class SpeechToTextTool(PipelineTool): 282 | default_checkpoint = "openai/whisper-large-v3-turbo" 283 | description = "This is a tool that transcribes an audio into text. It returns the transcribed text." 284 | name = "transcriber" 285 | inputs = { 286 | "audio": { 287 | "type": "audio", 288 | "description": "The audio to transcribe. Can be a local path, an url, or a tensor.", 289 | } 290 | } 291 | output_type = "string" 292 | 293 | def __new__(cls, *args, **kwargs): 294 | from transformers.models.whisper import ( 295 | WhisperForConditionalGeneration, 296 | WhisperProcessor, 297 | ) 298 | 299 | cls.pre_processor_class = WhisperProcessor 300 | cls.model_class = WhisperForConditionalGeneration 301 | return super().__new__(cls, *args, **kwargs) 302 | 303 | def encode(self, audio): 304 | from .agent_types import AgentAudio 305 | 306 | audio = AgentAudio(audio).to_raw() 307 | return self.pre_processor(audio, return_tensors="pt") 308 | 309 | def forward(self, inputs): 310 | return self.model.generate(inputs["input_features"]) 311 | 312 | def decode(self, outputs): 313 | return self.pre_processor.batch_decode(outputs, skip_special_tokens=True)[0] 314 | 315 | 316 | TOOL_MAPPING = { 317 | tool_class.name: tool_class 318 | for tool_class in [ 319 | PythonInterpreterTool, 320 | DuckDuckGoSearchTool, 321 | VisitWebpageTool, 322 | ] 323 | } 324 | 325 | __all__ = [ 326 | "PythonInterpreterTool", 327 | "FinalAnswerTool", 328 | "UserInputTool", 329 | "DuckDuckGoSearchTool", 330 | "GoogleSearchTool", 331 | "VisitWebpageTool", 332 | "SpeechToTextTool", 333 | ] 334 | -------------------------------------------------------------------------------- /tools/presets/__init__.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DeepFlowcc/DeepFlow/92a3ed17ccf54b91216e3f669eaf2ff396eaf68b/tools/presets/__init__.py -------------------------------------------------------------------------------- /tools/tool_validation.py: -------------------------------------------------------------------------------- 1 | import ast 2 | import builtins 3 | from itertools import zip_longest 4 | from typing import Set 5 | 6 | from .utils import BASE_BUILTIN_MODULES, get_source 7 | 8 | 9 | _BUILTIN_NAMES = set(vars(builtins)) 10 | 11 | 12 | class MethodChecker(ast.NodeVisitor): 13 | """ 14 | Checks that a method 15 | - only uses defined names 16 | - contains no local imports (e.g. numpy is ok but local_script is not) 17 | """ 18 | 19 | def __init__(self, class_attributes: Set[str], check_imports: bool = True): 20 | self.undefined_names = set() 21 | self.imports = {} 22 | self.from_imports = {} 23 | self.assigned_names = set() 24 | self.arg_names = set() 25 | self.class_attributes = class_attributes 26 | self.errors = [] 27 | self.check_imports = check_imports 28 | self.typing_names = {"Any"} 29 | 30 | def visit_arguments(self, node): 31 | """Collect function arguments""" 32 | self.arg_names = {arg.arg for arg in node.args} 33 | if node.kwarg: 34 | self.arg_names.add(node.kwarg.arg) 35 | if node.vararg: 36 | self.arg_names.add(node.vararg.arg) 37 | 38 | def visit_Import(self, node): 39 | for name in node.names: 40 | actual_name = name.asname or name.name 41 | self.imports[actual_name] = name.name 42 | 43 | def visit_ImportFrom(self, node): 44 | module = node.module or "" 45 | for name in node.names: 46 | actual_name = name.asname or name.name 47 | self.from_imports[actual_name] = (module, name.name) 48 | 49 | def visit_Assign(self, node): 50 | for target in node.targets: 51 | if isinstance(target, ast.Name): 52 | self.assigned_names.add(target.id) 53 | self.visit(node.value) 54 | 55 | def visit_With(self, node): 56 | """Track aliases in 'with' statements (the 'y' in 'with X as 
y')""" 57 | for item in node.items: 58 | if item.optional_vars: # This is the 'y' in 'with X as y' 59 | if isinstance(item.optional_vars, ast.Name): 60 | self.assigned_names.add(item.optional_vars.id) 61 | self.generic_visit(node) 62 | 63 | def visit_ExceptHandler(self, node): 64 | """Track exception aliases (the 'e' in 'except Exception as e')""" 65 | if node.name: # This is the 'e' in 'except Exception as e' 66 | self.assigned_names.add(node.name) 67 | self.generic_visit(node) 68 | 69 | def visit_AnnAssign(self, node): 70 | """Track annotated assignments.""" 71 | if isinstance(node.target, ast.Name): 72 | self.assigned_names.add(node.target.id) 73 | if node.value: 74 | self.visit(node.value) 75 | 76 | def visit_For(self, node): 77 | target = node.target 78 | if isinstance(target, ast.Name): 79 | self.assigned_names.add(target.id) 80 | elif isinstance(target, ast.Tuple): 81 | for elt in target.elts: 82 | if isinstance(elt, ast.Name): 83 | self.assigned_names.add(elt.id) 84 | self.generic_visit(node) 85 | 86 | def _handle_comprehension_generators(self, generators): 87 | """Helper method to handle generators in all types of comprehensions""" 88 | for generator in generators: 89 | if isinstance(generator.target, ast.Name): 90 | self.assigned_names.add(generator.target.id) 91 | elif isinstance(generator.target, ast.Tuple): 92 | for elt in generator.target.elts: 93 | if isinstance(elt, ast.Name): 94 | self.assigned_names.add(elt.id) 95 | 96 | def visit_ListComp(self, node): 97 | """Track variables in list comprehensions""" 98 | self._handle_comprehension_generators(node.generators) 99 | self.generic_visit(node) 100 | 101 | def visit_DictComp(self, node): 102 | """Track variables in dictionary comprehensions""" 103 | self._handle_comprehension_generators(node.generators) 104 | self.generic_visit(node) 105 | 106 | def visit_SetComp(self, node): 107 | """Track variables in set comprehensions""" 108 | self._handle_comprehension_generators(node.generators) 109 | 
self.generic_visit(node) 110 | 111 | def visit_Attribute(self, node): 112 | if not (isinstance(node.value, ast.Name) and node.value.id == "self"): 113 | self.generic_visit(node) 114 | 115 | def visit_Name(self, node): 116 | if isinstance(node.ctx, ast.Load): 117 | if not ( 118 | node.id in _BUILTIN_NAMES 119 | or node.id in BASE_BUILTIN_MODULES 120 | or node.id in self.arg_names 121 | or node.id == "self" 122 | or node.id in self.class_attributes 123 | or node.id in self.imports 124 | or node.id in self.from_imports 125 | or node.id in self.assigned_names 126 | or node.id in self.typing_names 127 | ): 128 | self.errors.append(f"Name '{node.id}' is undefined.") 129 | 130 | def visit_Call(self, node): 131 | if isinstance(node.func, ast.Name): 132 | if not ( 133 | node.func.id in _BUILTIN_NAMES 134 | or node.func.id in BASE_BUILTIN_MODULES 135 | or node.func.id in self.arg_names 136 | or node.func.id == "self" 137 | or node.func.id in self.class_attributes 138 | or node.func.id in self.imports 139 | or node.func.id in self.from_imports 140 | or node.func.id in self.assigned_names 141 | ): 142 | self.errors.append(f"Name '{node.func.id}' is undefined.") 143 | self.generic_visit(node) 144 | 145 | 146 | def validate_tool_attributes(cls, check_imports: bool = True) -> None: 147 | """ 148 | Validates that a Tool class follows the proper patterns: 149 | 0. Any argument of __init__ should have a default. 150 | Args chosen at init are not traceable, so we cannot rebuild the source code for them; thus any important arg should be defined as a class attribute. 151 | 1. About the class: 152 | - Class attributes should only be strings or dicts 153 | - Class attributes cannot be complex attributes 154 | 2. About all class methods: 155 | - Imports must be from packages, not local files 156 | - All methods must be self-contained 157 | 158 | Raises a ValueError listing all errors encountered; returns None if no errors are found. 
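The name-resolution rules that MethodChecker enforces can be illustrated with a much smaller sketch (the `undefined_names` helper is illustrative only; the real checker also accounts for class attributes, comprehension targets, and `with`/`except` aliases):

```python
import ast
import builtins


def undefined_names(func_source: str) -> set:
    # Rough sketch of the idea behind MethodChecker: walk a function's AST
    # and report loaded names that were never bound by an argument,
    # assignment, or import, and are not builtins.
    func = ast.parse(func_source).body[0]
    defined = {arg.arg for arg in func.args.args}
    used = set()
    for node in ast.walk(func):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            defined |= {alias.asname or alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return used - defined - set(vars(builtins))


src = "def f(x):\n    y = x + 1\n    return y + z\n"
missing = undefined_names(src)  # 'z' is loaded but never defined
```

This is exactly why validation can reject a tool method before it ever runs: an unresolvable name means the method is not self-contained.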
159 | """ 160 | 161 | class ClassLevelChecker(ast.NodeVisitor): 162 | def __init__(self): 163 | self.imported_names = set() 164 | self.complex_attributes = set() 165 | self.class_attributes = set() 166 | self.non_defaults = set() 167 | self.non_literal_defaults = set() 168 | self.in_method = False 169 | 170 | def visit_FunctionDef(self, node): 171 | if node.name == "__init__": 172 | self._check_init_function_parameters(node) 173 | old_context = self.in_method 174 | self.in_method = True 175 | self.generic_visit(node) 176 | self.in_method = old_context 177 | 178 | def visit_Assign(self, node): 179 | if self.in_method: 180 | return 181 | # Track class attributes 182 | for target in node.targets: 183 | if isinstance(target, ast.Name): 184 | self.class_attributes.add(target.id) 185 | 186 | # Check if the assignment is more complex than simple literals 187 | if not all( 188 | isinstance(val, (ast.Str, ast.Num, ast.Constant, ast.Dict, ast.List, ast.Set)) 189 | for val in ast.walk(node.value) 190 | ): 191 | for target in node.targets: 192 | if isinstance(target, ast.Name): 193 | self.complex_attributes.add(target.id) 194 | 195 | def _check_init_function_parameters(self, node): 196 | # Check defaults in parameters 197 | for arg, default in reversed(list(zip_longest(reversed(node.args.args), reversed(node.args.defaults)))): 198 | if default is None: 199 | if arg.arg != "self": 200 | self.non_defaults.add(arg.arg) 201 | elif not isinstance(default, (ast.Str, ast.Num, ast.Constant, ast.Dict, ast.List, ast.Set)): 202 | self.non_literal_defaults.add(arg.arg) 203 | 204 | class_level_checker = ClassLevelChecker() 205 | source = get_source(cls) 206 | tree = ast.parse(source) 207 | class_node = tree.body[0] 208 | if not isinstance(class_node, ast.ClassDef): 209 | raise ValueError("Source code must define a class") 210 | class_level_checker.visit(class_node) 211 | 212 | errors = [] 213 | if class_level_checker.complex_attributes: 214 | errors.append( 215 | f"Complex attributes 
should be defined in __init__, not as class attributes: " 216 | f"{', '.join(class_level_checker.complex_attributes)}" 217 | ) 218 | if class_level_checker.non_defaults: 219 | errors.append( 220 | f"Parameters in __init__ must have default values, found required parameters: " 221 | f"{', '.join(class_level_checker.non_defaults)}" 222 | ) 223 | if class_level_checker.non_literal_defaults: 224 | errors.append( 225 | f"Parameters in __init__ must have literal default values, found non-literal defaults: " 226 | f"{', '.join(class_level_checker.non_literal_defaults)}" 227 | ) 228 | 229 | # Run checks on all methods 230 | for node in class_node.body: 231 | if isinstance(node, ast.FunctionDef): 232 | method_checker = MethodChecker(class_level_checker.class_attributes, check_imports=check_imports) 233 | method_checker.visit(node) 234 | errors += [f"- {node.name}: {error}" for error in method_checker.errors] 235 | 236 | if errors: 237 | raise ValueError(f"Tool validation failed for {cls.__name__}:\n" + "\n".join(errors)) 238 | return 239 | -------------------------------------------------------------------------------- /tools/vision_web_browser.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from io import BytesIO 3 | from time import sleep 4 | 5 | import helium 6 | from dotenv import load_dotenv 7 | from PIL import Image 8 | from selenium import webdriver 9 | from selenium.webdriver.common.by import By 10 | from selenium.webdriver.common.keys import Keys 11 | 12 | from smolagents import CodeAgent, DuckDuckGoSearchTool, tool 13 | from smolagents.agents import ActionStep 14 | from smolagents.cli import load_model 15 | 16 | 17 | github_request = """ 18 | I'm trying to find how hard I have to work to get a repo in github.com/trending. 19 | Can you navigate to the profile for the top author of the top trending repo, and give me their total number of commits over the last year? 
20 | """ # The agent is able to achieve this request only when powered by GPT-4o or Claude-3.5-sonnet. 21 | 22 | search_request = """ 23 | Please navigate to https://en.wikipedia.org/wiki/Chicago and give me a sentence containing the word "1992" that mentions a construction accident. 24 | """ 25 | 26 | 27 | def parse_arguments(): 28 | parser = argparse.ArgumentParser(description="Run a web browser automation script with a specified model.") 29 | parser.add_argument( 30 | "prompt", 31 | type=str, 32 | nargs="?", # Makes it optional 33 | default=search_request, 34 | help="The prompt to run with the agent", 35 | ) 36 | parser.add_argument( 37 | "--model-type", 38 | type=str, 39 | default="LiteLLMModel", 40 | help="The model type to use (e.g., OpenAIServerModel, LiteLLMModel, TransformersModel, HfApiModel)", 41 | ) 42 | parser.add_argument( 43 | "--model-id", 44 | type=str, 45 | default="gpt-4o", 46 | help="The model ID to use for the specified model type", 47 | ) 48 | return parser.parse_args() 49 | 50 | 51 | def save_screenshot(memory_step: ActionStep, agent: CodeAgent) -> None: 52 | sleep(1.0) # Let JavaScript animations happen before taking the screenshot 53 | driver = helium.get_driver() 54 | current_step = memory_step.step_number 55 | if driver is not None: 56 | for previous_memory_step in agent.memory.steps: # Remove previous screenshots from logs for lean processing 57 | if isinstance(previous_memory_step, ActionStep) and previous_memory_step.step_number <= current_step - 2: 58 | previous_memory_step.observations_images = None 59 | png_bytes = driver.get_screenshot_as_png() 60 | image = Image.open(BytesIO(png_bytes)) 61 | print(f"Captured a browser screenshot: {image.size} pixels") 62 | memory_step.observations_images = [image.copy()] # Create a copy to ensure it persists, important! 
63 | 64 | # Update observations with current URL 65 | url_info = f"Current url: {driver.current_url}" 66 | memory_step.observations = ( 67 | url_info if memory_step.observations is None else memory_step.observations + "\n" + url_info 68 | ) 69 | return 70 | 71 | 72 | @tool 73 | def search_item_ctrl_f(text: str, nth_result: int = 1) -> str: 74 | """ 75 | Searches for text on the current page via Ctrl + F and jumps to the nth occurrence. 76 | Args: 77 | text: The text to search for 78 | nth_result: Which occurrence to jump to (default: 1) 79 | """ 80 | elements = driver.find_elements(By.XPATH, f"//*[contains(text(), '{text}')]") 81 | if nth_result > len(elements): 82 | raise Exception(f"Match n°{nth_result} not found (only {len(elements)} matches found)") 83 | result = f"Found {len(elements)} matches for '{text}'." 84 | elem = elements[nth_result - 1] 85 | driver.execute_script("arguments[0].scrollIntoView(true);", elem) 86 | result += f"Focused on element {nth_result} of {len(elements)}" 87 | return result 88 | 89 | 90 | @tool 91 | def go_back() -> None: 92 | """Goes back to previous page.""" 93 | driver.back() 94 | 95 | 96 | @tool 97 | def close_popups() -> str: 98 | """ 99 | Closes any visible modal or pop-up on the page. Use this to dismiss pop-up windows! This does not work on cookie consent banners. 
100 | """ 101 | webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform() 102 | 103 | 104 | def initialize_driver(): 105 | """Initialize the Selenium WebDriver.""" 106 | chrome_options = webdriver.ChromeOptions() 107 | chrome_options.add_argument("--force-device-scale-factor=1") 108 | chrome_options.add_argument("--window-size=1000,1350") 109 | chrome_options.add_argument("--disable-pdf-viewer") 110 | chrome_options.add_argument("--window-position=0,0") 111 | return helium.start_chrome(headless=False, options=chrome_options) 112 | 113 | 114 | def initialize_agent(model): 115 | """Initialize the CodeAgent with the specified model.""" 116 | return CodeAgent( 117 | tools=[DuckDuckGoSearchTool(), go_back, close_popups, search_item_ctrl_f], 118 | model=model, 119 | additional_authorized_imports=["helium"], 120 | step_callbacks=[save_screenshot], 121 | max_steps=20, 122 | verbosity_level=2, 123 | ) 124 | 125 | 126 | helium_instructions = """ 127 | Use your web_search tool when you want to get Google search results. 128 | Then you can use helium to access websites. Don't use helium for Google search, only for navigating websites! 129 | Don't worry about the helium driver, it's already managed. 130 | We've already run "from helium import *" 131 | Then you can go to pages! 132 | Code: 133 | ```py 134 | go_to('github.com/trending') 135 | ``` 136 | 137 | You can directly click clickable elements by inputting the text that appears on them. 138 | Code: 139 | ```py 140 | click("Top products") 141 | ``` 142 | 143 | If it's a link: 144 | Code: 145 | ```py 146 | click(Link("Top products")) 147 | ``` 148 | 149 | If you try to interact with an element and it's not found, you'll get a LookupError. 150 | In general stop your action after each button click to see what happens on your screenshot. 151 | Never try to log in to a page. 152 | 153 | To scroll up or down, use scroll_down or scroll_up with the number of pixels to scroll as an argument. 
154 | Code: 155 | ```py 156 | scroll_down(num_pixels=1200) # This will scroll one viewport down 157 | ``` 158 | 159 | When you have pop-ups with a cross icon to close, don't try to click the close icon by finding its element or targeting an 'X' element (this most often fails). 160 | Just use your built-in tool `close_popups` to close them: 161 | Code: 162 | ```py 163 | close_popups() 164 | ``` 165 | 166 | You can use .exists() to check for the existence of an element. For example: 167 | Code: 168 | ```py 169 | if Text('Accept cookies?').exists(): 170 | click('I accept') 171 | ``` 172 | 173 | Proceed in several steps rather than trying to solve the task in one shot. 174 | And at the end, only when you have your answer, return your final answer. 175 | Code: 176 | ```py 177 | final_answer("YOUR_ANSWER_HERE") 178 | ``` 179 | 180 | If pages seem stuck on loading, you might have to wait, for instance `import time` and run `time.sleep(5.0)`. But don't overuse this! 181 | To list elements on page, DO NOT try code-based element searches like 'contributors = find_all(S("ol > li"))': just look at the latest screenshot you have and read it visually, or use your tool search_item_ctrl_f. 182 | Of course, you can act on buttons like a user would do when navigating. 183 | After each code blob you write, you will be automatically provided with an updated screenshot of the browser and the current browser url. 184 | But beware that the screenshot will only be taken at the end of the whole action, it won't see intermediate states. 185 | Don't kill the browser. 186 | When you have modals or cookie banners on screen, you should get rid of them before you can click anything else. 
187 | """ 188 | 189 | 190 | def main(prompt: str, model_type: str, model_id: str) -> None: 191 | # Load environment variables 192 | load_dotenv() 193 | 194 | # Initialize the model based on the provided arguments 195 | model = load_model(model_type, model_id) 196 | 197 | global driver 198 | driver = initialize_driver() 199 | agent = initialize_agent(model) 200 | 201 | # Run the agent with the provided prompt 202 | agent.python_executor("from helium import *") 203 | agent.run(prompt + helium_instructions) 204 | 205 | 206 | if __name__ == "__main__": 207 | # Parse command line arguments 208 | args = parse_arguments() 209 | 210 | main(args.prompt, args.model_type, args.model_id) 211 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DeepFlowcc/DeepFlow/92a3ed17ccf54b91216e3f669eaf2ff396eaf68b/utils/__init__.py -------------------------------------------------------------------------------- /utils/monitoring.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | 4 | # Copyright 2024 The HuggingFace Inc. team. All rights reserved. 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 
17 | 18 | """ 19 | Monitoring and Logging Module 20 | ============================= 21 | 22 | This module provides comprehensive logging and monitoring capabilities for agents. 23 | It includes classes for tracking metrics, logging agent activities at different 24 | verbosity levels, and visualizing agent structures. 25 | 26 | The monitoring system helps track performance metrics like token usage and execution times, 27 | while the logging system provides formatted output for debugging and user feedback. 28 | 29 | DeepFlow@2025 30 | """ 31 | 32 | import json 33 | from enum import IntEnum 34 | from typing import List, Optional 35 | 36 | from rich import box 37 | from rich.console import Console, Group 38 | from rich.panel import Panel 39 | from rich.rule import Rule 40 | from rich.syntax import Syntax 41 | from rich.table import Table 42 | from rich.text import Text 43 | from rich.tree import Tree 44 | 45 | from smolagents.utils import escape_code_brackets 46 | 47 | 48 | __all__ = ["AgentLogger", "LogLevel", "Monitor"] 49 | 50 | 51 | class Monitor: 52 | """ 53 | Tracks and records metrics for agent performance. 54 | 55 | This class monitors performance metrics such as execution times and 56 | token counts, and provides methods to query accumulated statistics. 57 | """ 58 | 59 | def __init__(self, tracked_model, logger): 60 | """ 61 | Initialize a monitor instance. 62 | 63 | Args: 64 | tracked_model: The model to track metrics for 65 | logger: The logger to output monitoring information 66 | """ 67 | self.step_durations = [] 68 | self.tracked_model = tracked_model 69 | self.logger = logger 70 | # Always initialize the counters so reset() and get_total_token_counts() cannot raise AttributeError 71 | self.total_input_token_count = 0 72 | self.total_output_token_count = 0 73 | 74 | def get_total_token_counts(self): 75 | """ 76 | Get the accumulated token counts for both input and output. 
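The token-accounting pattern used by `Monitor` can be sketched standalone; `FakeModel` below is a hypothetical stand-in for a model that exposes `last_input_token_count` / `last_output_token_count` after each call:

```python
class FakeModel:
    # Hypothetical stand-in for a model reporting per-call token counts.
    last_input_token_count = 0
    last_output_token_count = 0


class MiniMonitor:
    """Minimal sketch of the Monitor accumulation logic."""

    def __init__(self, tracked_model):
        self.tracked_model = tracked_model
        self.step_durations = []
        self.total_input_token_count = 0
        self.total_output_token_count = 0

    def update_metrics(self, duration: float) -> None:
        # After each agent step, fold the step's duration and the model's
        # latest token counts into the running totals.
        self.step_durations.append(duration)
        self.total_input_token_count += self.tracked_model.last_input_token_count
        self.total_output_token_count += self.tracked_model.last_output_token_count


model = FakeModel()
monitor = MiniMonitor(model)
model.last_input_token_count, model.last_output_token_count = 120, 45
monitor.update_metrics(0.8)
model.last_input_token_count, model.last_output_token_count = 200, 60
monitor.update_metrics(1.1)
totals = {"input": monitor.total_input_token_count, "output": monitor.total_output_token_count}
```

The real class additionally guards on `hasattr(model, "last_input_token_count")` so that models without token reporting still get duration tracking.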
77 | 78 | Returns: 79 | dict: A dictionary with 'input' and 'output' token counts 80 | """ 81 | return { 82 | "input": self.total_input_token_count, 83 | "output": self.total_output_token_count, 84 | } 85 | 86 | def reset(self): 87 | """ 88 | Reset all accumulated metrics to their initial state. 89 | """ 90 | self.step_durations = [] 91 | self.total_input_token_count = 0 92 | self.total_output_token_count = 0 93 | 94 | def update_metrics(self, step_log): 95 | """ 96 | Update metrics based on the latest step execution. 97 | 98 | This method records execution time and token counts from a step, 99 | updating the accumulated statistics. 100 | 101 | Args: 102 | step_log: A MemoryStep containing execution data 103 | """ 104 | step_duration = step_log.duration 105 | self.step_durations.append(step_duration) 106 | metrics_message = f"[Step {len(self.step_durations)}: Duration {step_duration:.2f} seconds" 107 | 108 | if hasattr(self.tracked_model, "last_input_token_count"): 109 | self.total_input_token_count += self.tracked_model.last_input_token_count 110 | self.total_output_token_count += self.tracked_model.last_output_token_count 111 | metrics_message += ( 112 | f" | Input tokens: {self.total_input_token_count:,} | Output tokens: {self.total_output_token_count:,}" 113 | ) 114 | metrics_message += "]" 115 | self.logger.log(Text(metrics_message, style="dim"), level=LogLevel.INFO) 116 | 117 | 118 | class LogLevel(IntEnum): 119 | """ 120 | Enumeration of logging verbosity levels. 
121 | 122 | These levels control how much information is output during agent execution: 123 | - OFF: No output 124 | - ERROR: Only error messages 125 | - INFO: Standard information (default) 126 | - DEBUG: Detailed information for debugging 127 | """ 128 | OFF = -1 # No output 129 | ERROR = 0 # Only errors 130 | INFO = 1 # Normal output (default) 131 | DEBUG = 2 # Detailed output 132 | 133 | 134 | # Color constant used for highlighting in log output 135 | YELLOW_HEX = "#d4b702" 136 | 137 | 138 | class AgentLogger: 139 | """ 140 | Handles logging for agent activities with rich formatting. 141 | 142 | This class provides various methods for outputting different types of 143 | information (text, code, markdown) with consistent formatting and 144 | respects the configured verbosity level. 145 | """ 146 | 147 | def __init__(self, level: LogLevel = LogLevel.INFO): 148 | """ 149 | Initialize a logger with the specified verbosity level. 150 | 151 | Args: 152 | level: The verbosity level (default: INFO) 153 | """ 154 | self.level = level 155 | self.console = Console() 156 | 157 | def log(self, *args, level: str | LogLevel = LogLevel.INFO, **kwargs) -> None: 158 | """ 159 | Log a message if the current verbosity level allows it. 160 | 161 | Args: 162 | *args: Content to log 163 | level: Minimum level required to display this message 164 | **kwargs: Additional arguments for rich.console.print 165 | """ 166 | if isinstance(level, str): 167 | level = LogLevel[level.upper()] 168 | if level <= self.level: 169 | self.console.print(*args, **kwargs) 170 | 171 | def log_error(self, error_message: str) -> None: 172 | """ 173 | Log an error message with appropriate styling. 
174 | 175 | Args: 176 | error_message: The error message to display 177 | """ 178 | self.log(escape_code_brackets(error_message), style="bold red", level=LogLevel.ERROR) 179 | 180 | def log_markdown(self, content: str, title: Optional[str] = None, level=LogLevel.INFO, style=YELLOW_HEX) -> None: 181 | """ 182 | Log content as markdown with optional title. 183 | 184 | Args: 185 | content: Markdown content to display 186 | title: Optional title to display above content 187 | level: Minimum level required to display this message 188 | style: Color style for the title 189 | """ 190 | markdown_content = Syntax( 191 | content, 192 | lexer="markdown", 193 | theme="github-dark", 194 | word_wrap=True, 195 | ) 196 | if title: 197 | self.log( 198 | Group( 199 | Rule( 200 | "[bold italic]" + title, 201 | align="left", 202 | style=style, 203 | ), 204 | markdown_content, 205 | ), 206 | level=level, 207 | ) 208 | else: 209 | self.log(markdown_content, level=level) 210 | 211 | def log_code(self, title: str, content: str, level: int = LogLevel.INFO) -> None: 212 | """ 213 | Log content as syntax-highlighted code with a title. 214 | 215 | Args: 216 | title: Title to display above the code 217 | content: Code content to display with syntax highlighting 218 | level: Minimum level required to display this message 219 | """ 220 | self.log( 221 | Panel( 222 | Syntax( 223 | content, 224 | lexer="python", 225 | theme="monokai", 226 | word_wrap=True, 227 | ), 228 | title="[bold]" + title, 229 | title_align="left", 230 | box=box.HORIZONTALS, 231 | ), 232 | level=level, 233 | ) 234 | 235 | def log_rule(self, title: str, level: int = LogLevel.INFO) -> None: 236 | """ 237 | Log a horizontal rule with a title. 
238 | 239 | Args: 240 | title: Title to display on the horizontal rule 241 | level: Minimum level required to display this message 242 | """ 243 | self.log( 244 | Rule( 245 | "[bold]" + title, 246 | characters="━", 247 | style=YELLOW_HEX, 248 | ), 249 | level=level, 250 | ) 251 | 252 | def log_task(self, content: str, subtitle: str, title: Optional[str] = None, level: int = LogLevel.INFO) -> None: 253 | """ 254 | Log task information in a panel with title and subtitle. 255 | 256 | Args: 257 | content: Task content to display 258 | subtitle: Subtitle for the panel 259 | title: Optional title for the panel (defaults to "Task") 260 | level: Minimum level required to display this message 261 | """ 262 | title = title or "Task" 263 | task_panel = Panel( 264 | Text(content), 265 | title=f"[bold]{title}", 266 | subtitle=subtitle, 267 | subtitle_align="right", 268 | box=box.HORIZONTALS, 269 | ) 270 | self.log(task_panel, level=level) 271 | 272 | def log_messages(self, messages: List, level: int = LogLevel.DEBUG) -> None: 273 | """ 274 | Log a list of messages with role-based styling. 275 | 276 | Args: 277 | messages: List of message dictionaries with 'role' and 'content' 278 | level: Minimum level required to display these messages 279 | """ 280 | for message in messages: 281 | if message["role"] == "system": 282 | self.log(message["content"], style="bold green", level=level) 283 | elif message["role"] == "user": 284 | self.log(message["content"], style="bold", level=level) 285 | else: 286 | self.log(message["content"], level=level) 287 | 288 | def visualize_agent_tree(self, agent): 289 | """ 290 | Visualize the structure of an agent as a hierarchical tree. 291 | 292 | This displays the agent's tools and any managed sub-agents 293 | in a tree structure. 
294 | 295 | Args: 296 | agent: The agent to visualize 297 | """ 298 | def create_tools_section(tools_dict): 299 | """Create a tree section for the agent's tools.""" 300 | tools_tree = Tree("Tools") 301 | for tool_name, tool in tools_dict.items(): 302 | tool_node = tools_tree.add(f"[bold]{tool_name}[/bold]") 303 | if hasattr(tool, "description"): 304 | tool_node.add(Text(tool.description, style="italic")) 305 | return tools_tree 306 | 307 | def get_agent_headline(agent, name: Optional[str] = None): 308 | """Generate the headline for an agent node.""" 309 | return f"[bold]{name or agent.__class__.__name__}[/bold]" 310 | 311 | def build_agent_tree(parent_tree, agent_obj): 312 | """Recursively build the agent tree structure.""" 313 | tools_tree = create_tools_section(agent_obj.tools_dict) 314 | parent_tree.add(tools_tree) 315 | 316 | if hasattr(agent_obj, "managed_agents") and agent_obj.managed_agents: 317 | managed_agents_tree = Tree("Managed Agents") 318 | for agent_name, managed_agent in agent_obj.managed_agents.items(): 319 | agent_node = managed_agents_tree.add(get_agent_headline(managed_agent, agent_name)) 320 | build_agent_tree(agent_node, managed_agent) 321 | parent_tree.add(managed_agents_tree) 322 | 323 | # Create the main agent tree and display it 324 | agent_tree = Tree(get_agent_headline(agent)) 325 | build_agent_tree(agent_tree, agent) 326 | self.console.print(agent_tree) 327 | -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | Utilities Module 3 | ============= 4 | 5 | This module provides various utility functions used throughout the codebase, 6 | including helpers for file handling, text processing, error management, 7 | JSON serialization, and working with web content. 

The utilities in this module are designed to be general-purpose and reusable,
supporting the core functionality of the agent framework without being tied
to specific implementation details.

DeepFlow@2025
"""

#!/usr/bin/env python
# coding=utf-8

# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import ast
import base64
import importlib.metadata
import importlib.util
import inspect
import json
import os
import re
import types
from functools import lru_cache
from io import BytesIO
from textwrap import dedent
from typing import TYPE_CHECKING, Any, Dict, Tuple


if TYPE_CHECKING:
    from smolagents.memory import AgentLogger


__all__ = ["AgentError"]


@lru_cache
def _is_package_available(package_name: str) -> bool:
    """Return True if the given distribution is installed."""
    try:
        importlib.metadata.version(package_name)
        return True
    except importlib.metadata.PackageNotFoundError:
        return False


@lru_cache
def _is_pillow_available():
    """Return True if Pillow (PIL) can be imported."""
    return importlib.util.find_spec("PIL") is not None


BASE_BUILTIN_MODULES = [
    "collections",
    "datetime",
    "itertools",
    "math",
    "queue",
    "random",
    "re",
    "stat",
    "statistics",
    "time",
    "unicodedata",
]

def escape_code_brackets(text: str) -> str:
    """Escape square brackets in code segments while preserving Rich styling tags."""

    def replace_bracketed_content(match):
        content = match.group(1)
        # Strip Rich style keywords, whitespace, and hex colors; if anything
        # remains, the bracketed content is real code and must be escaped.
        cleaned = re.sub(
            r"bold|red|green|blue|yellow|magenta|cyan|white|black|italic|dim|\s|#[0-9a-fA-F]{6}", "", content
        )
        return f"\\[{content}\\]" if cleaned.strip() else f"[{content}]"

    return re.sub(r"\[([^\]]*)\]", replace_bracketed_content, text)


class AgentError(Exception):
    """Base class for other agent-related exceptions"""

    def __init__(self, message, logger: "AgentLogger"):
        super().__init__(message)
        self.message = message
        logger.log_error(message)

    def dict(self) -> Dict[str, str]:
        return {"type": self.__class__.__name__, "message": str(self.message)}


class AgentParsingError(AgentError):
    """Exception raised for errors in parsing in the agent"""


class AgentExecutionError(AgentError):
    """Exception raised for errors in execution in the agent"""


class AgentMaxStepsError(AgentError):
    """Exception raised when the agent exceeds its maximum number of steps"""


class AgentGenerationError(AgentError):
    """Exception raised for errors in generation in the agent"""


def make_json_serializable(obj: Any) -> Any:
    """Recursively convert an object into a JSON-serializable structure."""
    if obj is None:
        return None
    elif isinstance(obj, (str, int, float, bool)):
        # Try to parse a string as JSON if it looks like a JSON object/array
        if isinstance(obj, str):
            try:
                if (obj.startswith("{") and obj.endswith("}")) or (obj.startswith("[") and obj.endswith("]")):
                    parsed = json.loads(obj)
                    return make_json_serializable(parsed)
            except json.JSONDecodeError:
                pass
        return obj
    elif isinstance(obj, (list, tuple)):
        return [make_json_serializable(item) for item in obj]
    elif isinstance(obj, dict):
        return {str(k): make_json_serializable(v) for k, v in obj.items()}
    elif hasattr(obj, "__dict__"):
        # For custom objects, convert their __dict__ to a serializable format
        return {"_type": obj.__class__.__name__, **{k: make_json_serializable(v) for k, v in obj.__dict__.items()}}
    else:
        # For any other type, fall back to the string representation
        return str(obj)


def parse_json_blob(json_blob: str) -> Tuple[Dict[str, str], str]:
    """Extract the first JSON blob from the input and return the parsed data plus the text preceding it."""
    try:
        first_brace_index = json_blob.find("{")
        last_brace_index = [a.start() for a in re.finditer("}", json_blob)][-1]
        json_data = json_blob[first_brace_index : last_brace_index + 1].replace('\\"', "'")
        json_data = json.loads(json_data, strict=False)
        return json_data, json_blob[:first_brace_index]
    except json.JSONDecodeError as e:
        place = e.pos
        if json_blob[place - 1 : place + 2] == "},\n":
            raise ValueError(
                "JSON is invalid: you probably tried to provide multiple tool calls in one action. PROVIDE ONLY ONE TOOL CALL."
            )
        raise ValueError(
            f"The JSON blob you used is invalid due to the following error: {e}.\n"
            f"JSON blob was: {json_blob}, decoding failed on that specific part of the blob:\n"
            f"'{json_blob[place - 4 : place + 5]}'."
        )


def parse_code_blobs(text: str) -> str:
    """Extract code blocks from the LLM's output.

    If a valid code block is passed directly, it is returned as-is.

    Args:
        text (`str`): LLM output text to parse.

    Returns:
        `str`: Extracted code block(s), joined by blank lines.

    Raises:
        ValueError: If no valid code block is found in the text.
    """
    pattern = r"```(?:py|python)?\n(.*?)\n```"
    matches = re.findall(pattern, text, re.DOTALL)
    if matches:
        return "\n\n".join(match.strip() for match in matches)
    # Maybe the LLM output a code blob directly
    try:
        ast.parse(text)
        return text
    except SyntaxError:
        pass

    if "final" in text and "answer" in text:
        raise ValueError(
            dedent(
                f"""
                Your code snippet is invalid, because the regex pattern {pattern} was not found in it.
                Here is your code snippet:
                {text}
                It seems like you're trying to return the final answer; you can do it as follows:
                Code:
                ```py
                final_answer("YOUR FINAL ANSWER HERE")
                ```
                """
            ).strip()
        )
    raise ValueError(
        dedent(
            f"""
            Your code snippet is invalid, because the regex pattern {pattern} was not found in it.
            Here is your code snippet:
            {text}
            Make sure to include code with the correct pattern, for instance:
            Thoughts: Your thoughts
            Code:
            ```py
            # Your python code here
            ```
            """
        ).strip()
    )


MAX_LENGTH_TRUNCATE_CONTENT = 20000


def truncate_content(content: str, max_length: int = MAX_LENGTH_TRUNCATE_CONTENT) -> str:
    """Truncate content to at most max_length characters, keeping the head and tail."""
    if len(content) <= max_length:
        return content
    return (
        content[: max_length // 2]
        + f"\n..._This content has been truncated to stay below {max_length} characters_...\n"
        + content[-max_length // 2 :]
    )


class ImportFinder(ast.NodeVisitor):
    """AST visitor that collects the top-level package names imported by a module."""

    def __init__(self):
        self.packages = set()

    def visit_Import(self, node):
        for alias in node.names:
            # Get the base package name (before any dots)
            base_package = alias.name.split(".")[0]
            self.packages.add(base_package)

    def visit_ImportFrom(self, node):
        if node.module:  # for "from x import y" statements
            # Get the base package name (before any dots)
            base_package = node.module.split(".")[0]
            self.packages.add(base_package)


def get_method_source(method):
    """Get source code for a method, including bound methods."""
    if isinstance(method, types.MethodType):
        method = method.__func__
    return get_source(method)


def is_same_method(method1, method2):
    """Compare two methods by their source code."""
    try:
        source1 = get_method_source(method1)
        source2 = get_method_source(method2)

        # Remove method decorators, if any
        source1 = "\n".join(line for line in source1.split("\n") if not line.strip().startswith("@"))
        source2 = "\n".join(line for line in source2.split("\n") if not line.strip().startswith("@"))

        return source1 == source2
    except (TypeError, OSError):
        return False


def is_same_item(item1, item2):
    """Compare two class items (methods or attributes) for equality."""
    if callable(item1) and callable(item2):
        return is_same_method(item1, item2)
    return item1 == item2


def instance_to_source(instance, base_cls=None):
    """Convert an instance to its class source code representation."""
    cls = instance.__class__
    class_name = cls.__name__

    # Start building the class definition
    class_lines = []
    if base_cls:
        class_lines.append(f"class {class_name}({base_cls.__name__}):")
    else:
        class_lines.append(f"class {class_name}:")

    # Add the docstring if it exists and differs from the base class's
    if cls.__doc__ and (not base_cls or cls.__doc__ != base_cls.__doc__):
        class_lines.append(f'    """{cls.__doc__}"""')

    # Add class-level attributes
    class_attrs = {
        name: value
        for name, value in cls.__dict__.items()
        if not name.startswith("__")
        and not callable(value)
        and not (base_cls and hasattr(base_cls, name) and getattr(base_cls, name) == value)
    }

    for name, value in class_attrs.items():
        if isinstance(value, str):
            # Multiline string values need triple quotes
            if "\n" in value:
                escaped_value = value.replace('"""', r"\"\"\"")  # Escape triple quotes
                class_lines.append(f'    {name} = """{escaped_value}"""')
            else:
                class_lines.append(f"    {name} = {json.dumps(value)}")
        else:
            class_lines.append(f"    {name} = {repr(value)}")

    if class_attrs:
        class_lines.append("")

    # Add methods that are new or overridden relative to the base class
    methods = {
        name: func
        for name, func in cls.__dict__.items()
        if callable(func)
        and (
            not base_cls
            or not hasattr(base_cls, name)
            or (
                isinstance(func, (staticmethod, classmethod))
                or (getattr(base_cls, name).__code__.co_code != func.__code__.co_code)
            )
        )
    }

    for name, method in methods.items():
        method_source = get_source(method)
        # Normalize the indentation to one level inside the class body
        method_lines = method_source.split("\n")
        first_line = method_lines[0]
        indent = len(first_line) - len(first_line.lstrip())
        method_lines = [line[indent:] for line in method_lines]
        method_source = "\n".join(["    " + line if line.strip() else line for line in method_lines])
        class_lines.append(method_source)
        class_lines.append("")

    # Find required imports using ImportFinder
    import_finder = ImportFinder()
    import_finder.visit(ast.parse("\n".join(class_lines)))
    required_imports = import_finder.packages

    # Build the final code with imports
    final_lines = []

    # Add the base class import if needed
    if base_cls:
        final_lines.append(f"from {base_cls.__module__} import {base_cls.__name__}")

    # Add the discovered imports
    for package in required_imports:
        final_lines.append(f"import {package}")

    if final_lines:  # Add an empty line after the imports
        final_lines.append("")

    # Add the class code
    final_lines.extend(class_lines)

    return "\n".join(final_lines)


def get_source(obj) -> str:
    """Get the source code of a class or callable object (e.g. function, method).

    First attempts to get the source code using `inspect.getsource`.
    In a dynamic environment (e.g. Jupyter, IPython), if this fails,
    falls back to retrieving the source code from the current interactive shell session.

    Args:
        obj: A class or callable object (e.g. function, method)

    Returns:
        str: The source code of the object, dedented and stripped

    Raises:
        TypeError: If the object is not a class or callable
        OSError: If the source code cannot be retrieved from any source
        ValueError: If the source cannot be found in the IPython history

    Note:
        TODO: handle the standard Python REPL
    """
    if not (isinstance(obj, type) or callable(obj)):
        raise TypeError(f"Expected class or callable, got {type(obj)}")

    inspect_error = None
    try:
        # Handle dynamically created classes
        source = obj.__source if hasattr(obj, "__source") else inspect.getsource(obj)
        return dedent(source).strip()
    except OSError as e:
        # Keep track of the exception to raise it if all further methods fail
        inspect_error = e
    try:
        import IPython

        shell = IPython.get_ipython()
        if not shell:
            raise ImportError("No active IPython shell found")
        all_cells = "\n".join(shell.user_ns.get("In", [])).strip()
        if not all_cells:
            raise ValueError("No code cells found in IPython session")

        tree = ast.parse(all_cells)
        for node in ast.walk(tree):
            if isinstance(node, (ast.ClassDef, ast.FunctionDef)) and node.name == obj.__name__:
                return dedent("\n".join(all_cells.split("\n")[node.lineno - 1 : node.end_lineno])).strip()
        raise ValueError(f"Could not find source code for {obj.__name__} in IPython history")
    except ImportError:
        # IPython is not available; raise the original inspect error
        raise inspect_error
    except ValueError as e:
        # IPython is available but the source code was not found; surface the error
        raise e from inspect_error


def encode_image_base64(image):
    """Encode a PIL image as a base64 PNG string."""
    buffered = BytesIO()
    image.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")


def make_image_url(base64_image):
    """Build a data URL from a base64-encoded PNG image."""
    return f"data:image/png;base64,{base64_image}"


def make_init_file(folder: str):
    """Create the folder (if needed) and an empty __init__.py inside it."""
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "__init__.py"), "w"):
        pass
--------------------------------------------------------------------------------
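To make the bracket-escaping rule concrete, here is a self-contained sketch that reproduces `escape_code_brackets` from this module and shows which brackets survive: bracketed content consisting only of Rich style keywords, whitespace, or hex colors is kept as a style tag, while anything else (for example, list indexing) is escaped so Rich does not treat it as markup:

```python
import re


def escape_code_brackets(text: str) -> str:
    """Escape square brackets in code while preserving Rich styling tags."""

    def replace_bracketed_content(match):
        content = match.group(1)
        # Remove style keywords, whitespace, and hex colors; if anything is
        # left over, the brackets are real code rather than a style tag.
        cleaned = re.sub(
            r"bold|red|green|blue|yellow|magenta|cyan|white|black|italic|dim|\s|#[0-9a-fA-F]{6}", "", content
        )
        return f"\\[{content}\\]" if cleaned.strip() else f"[{content}]"

    return re.sub(r"\[([^\]]*)\]", replace_bracketed_content, text)


# A style tag survives, while an indexing expression is escaped:
print(escape_code_brackets("[bold red]error in arr[0]"))
# -> [bold red]error in arr\[0\]
```

This is why `AgentLogger.log_error` in `utils/monitoring.py` routes every error message through this function before printing it with Rich.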