├── .gitignore
├── README.MD
├── eval-agents
│   ├── build-langfuse-dataset.py
│   ├── translation-agent-eval.py
│   └── translation-agent-local-eval.py
├── multi-agents-design-patterns
│   ├── evaluator-optimizer.py
│   ├── orchestrator-workers.py
│   ├── parallelization.py
│   ├── prompt-chaining.py
│   └── workflow-routing.py
├── requirements.txt
├── simple-agent.py
└── simple-react-agent.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Environment variables
2 | .env
3 |
4 | # Python
5 | __pycache__/
6 | *.py[cod]
7 | *$py.class
8 | *.so
9 | .Python
10 | env/
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 |
27 | # Virtual Environment
28 | venv/
29 | ENV/
--------------------------------------------------------------------------------
/README.MD:
--------------------------------------------------------------------------------
1 | # Multi-Agent Design Patterns and Evaluation Process
2 |
3 | ## Installation
4 |
5 | 1. Clone the repository:
6 | ```bash
7 | git clone [repository-url]
8 | cd multi-agents
9 | ```
10 |
11 | 2. Create and activate a virtual environment (recommended):
12 | ```bash
13 | python -m venv venv
14 | source venv/bin/activate  # On Windows: venv\Scripts\activate
15 | ```
16 |
17 | 3. Install dependencies:
18 | ```bash
19 | pip install -r requirements.txt
20 | ```
21 |
22 | 4. Set up environment variables:
23 | ```bash
24 | # Create .env file
25 | cat << EOF > .env
26 | OPENAI_API_KEY=your_openai_key
27 | LANGFUSE_PUBLIC_KEY=your_key
28 | LANGFUSE_SECRET_KEY=your_secret
29 | LANGFUSE_HOST=https://cloud.langfuse.com
30 | EOF
31 | ```
32 |
33 | ## Requirements.txt
34 | ```text
35 | langfuse
36 | nltk
37 | python-dotenv
38 | pydantic
39 | openai-agents
40 | datasets
41 | pandas
42 | ```
43 | ## Design Patterns
44 |
45 | ### 1. Prompt Chaining
46 | **Name**: Sequential Task Processing
47 | **Description**: A workflow that breaks down complex tasks into sequential steps where each agent processes the output of the previous one. This pattern is implemented in our project through the Marketing → Validation → Translation chain.
48 |
49 | ```bash
50 | python multi-agents-design-patterns/prompt-chaining.py
51 | ```
52 |
53 |
54 | ### 2. Routing
55 | **Name**: Input Classification and Direction
56 | **Description**: A pattern that classifies inputs and directs them to specialized handlers. While not directly implemented in our current project, this could be useful for handling different types of marketing content or language pairs.
57 |
58 |
59 | ```bash
60 | python multi-agents-design-patterns/workflow-routing.py
61 | ```
62 |
63 | ### 3. Parallelization
64 | **Name**: Concurrent Task Processing
65 | **Description**: Enables simultaneous execution of related tasks through either sectioning (breaking a job into subtasks) or voting (multiple attempts at the same task). Our project could be extended to use this for parallel validation checks or multiple translation attempts.
66 |
67 | ```bash
68 | python multi-agents-design-patterns/parallelization.py
69 | ```
70 |
71 | ### 4. Orchestrator-Workers
72 | **Name**: Dynamic Task Management
73 | **Description**: Uses a central agent to coordinate and delegate tasks to specialized worker agents. Our project demonstrates this through the main workflow coordination, though in a simplified linear chain.
74 |
75 | ```bash
76 | python multi-agents-design-patterns/orchestrator-workers.py
77 | ```
78 |
79 | ### 5. Evaluator-Optimizer
80 | **Name**: Iterative Improvement Loop
81 | **Description**: Implements a feedback loop where one agent generates content and another evaluates it and suggests improvements. Our validation agent partially implements this pattern by checking marketing copy quality.
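The generate → evaluate → regenerate loop described here can be sketched without any agent framework; `generate` and `evaluate` below are hypothetical stand-ins for the two agents:

```python
def evaluator_optimizer(generate, evaluate, max_rounds=5):
    """Regenerate until the evaluator passes the draft (or rounds run out)."""
    draft, feedback = None, None
    for _ in range(max_rounds):
        draft = generate(feedback)          # generator agent stand-in
        score, feedback = evaluate(draft)   # evaluator agent stand-in
        if score == "pass":
            break
    return draft

# Toy run: the evaluator rejects the first draft, mirroring the
# "never give it a pass on the first try" instruction in evaluator-optimizer.py.
drafts = iter(["rough outline", "polished outline"])
best = evaluator_optimizer(
    generate=lambda feedback: next(drafts),
    evaluate=lambda d: ("pass", "") if "polished" in d else ("needs_improvement", "add detail"),
)
print(best)  # polished outline
```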
82 |
83 | ```bash
84 | python multi-agents-design-patterns/evaluator-optimizer.py
85 | ```
86 |
87 | ## Part 2: Evaluation Framework
88 |
89 | ### Overview
90 | The `eval-agents` folder contains a comprehensive evaluation system for testing and benchmarking translation agents. The system includes parallel translation processing with BLEU score calculation, Langfuse integration for dataset management, and both interactive and batch evaluation modes. The evaluation framework supports English-Spanish translation assessment through multiple agents and uses a voting mechanism to select the best translation among multiple candidates.
91 |
92 | ### Components
93 | - `translation-agent-eval.py`: Main evaluation script with Langfuse integration
94 | - `translation-agent-local-eval.py`: Local version of the evaluation system
95 | - `build-langfuse-dataset.py`: Tool for creating translation benchmark datasets
96 |
97 | ### How to Run
98 |
99 | 1. **Setup**:
100 | ```bash
101 |
102 | # Set up environment variables in .env file
103 | LANGFUSE_PUBLIC_KEY=your_key
104 | LANGFUSE_SECRET_KEY=your_secret
105 | LANGFUSE_HOST=https://cloud.langfuse.com
106 | ```
107 |
108 | 2. **Running Evaluation**:
109 | ```bash
110 | python eval-agents/translation-agent-eval.py
111 | ```
112 | Then choose:
113 | - Option 1: Interactive mode for single translations
114 | - Option 2: Dataset evaluation mode for batch testing
115 |
116 | The system will output translation candidates, BLEU scores, and performance metrics for comprehensive agent evaluation.
117 |
118 | 3. **Running Evaluation without Langfuse**:
119 |
120 | ```bash
121 | python eval-agents/translation-agent-local-eval.py
122 | ```
123 |
--------------------------------------------------------------------------------
/eval-agents/build-langfuse-dataset.py:
--------------------------------------------------------------------------------
1 | import os
2 | import pandas as pd
3 | from datasets import load_dataset
4 | from langfuse import Langfuse
5 | from dotenv import load_dotenv
6 |
7 | # Load Langfuse credentials from the environment (.env); never hardcode secrets
8 | load_dotenv()
9 | os.environ.setdefault("LANGFUSE_HOST", "https://cloud.langfuse.com")
10 |
11 | # Initialize Langfuse client
12 | langfuse = Langfuse()
13 |
14 | # Load English-Spanish translation dataset from Hugging Face
15 | print("Loading dataset from Hugging Face...")
16 | dataset = load_dataset("opus_books", "en-es", split="train[:50]")
17 | df = pd.DataFrame(dataset)
18 |
19 | # Display sample data
20 | print("\nFirst few rows of the dataset:")
21 | for i, row in df.head().iterrows():
22 |     print(f"English: {row['translation']['en']}")
23 |     print(f"Spanish: {row['translation']['es']}")
24 |     print("-" * 50)
25 |
26 | # Define Langfuse dataset name
27 | langfuse_dataset_name = "en-es-translation-benchmark"
28 |
29 | # Create a dataset in Langfuse
30 | print(f"\nCreating dataset '{langfuse_dataset_name}' in Langfuse...")
31 | try:
32 |     langfuse.create_dataset(
33 |         name=langfuse_dataset_name,
34 |         description="English-Spanish translation dataset from opus_books",
35 |         metadata={
36 |             "source": "opus_books",
37 |             "language_pair": "en-es",
38 |             "type": "benchmark"
39 |         }
40 |     )
41 |     print("Dataset created successfully.")
42 | except Exception as e:
43 |     print(f"Note: {e}")
44 |     print("Dataset might already exist, continuing with item creation...")
45 |
46 | # Add dataset items from Hugging Face dataset to Langfuse
47 | print("\nAdding items to Langfuse dataset...")
48 | 
count = 0 49 | for idx, row in df.iterrows(): 50 | # Create dataset item with English as input and Spanish as expected output 51 | langfuse.create_dataset_item( 52 | dataset_name=langfuse_dataset_name, 53 | input={"text": row["translation"]["en"]}, 54 | expected_output={"text": row["translation"]["es"]} 55 | ) 56 | count += 1 57 | 58 | # Progress indicator 59 | if count % 10 == 0: 60 | print(f"Added {count} items...") 61 | 62 | if idx >= 49: # For this example, we upload only the first 50 items 63 | break 64 | 65 | # Ensure all data is sent to Langfuse 66 | langfuse.flush() 67 | print(f"\nSuccessfully uploaded {count} translation examples to Langfuse dataset '{langfuse_dataset_name}'") 68 | print("You can now use this dataset for evaluating your translation agent.") -------------------------------------------------------------------------------- /eval-agents/translation-agent-eval.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import os 3 | from dotenv import load_dotenv 4 | from agents import Agent, ItemHelpers, Runner, trace 5 | from langfuse import Langfuse 6 | import nltk 7 | from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction 8 | 9 | """ 10 | Enhanced version that includes BLEU score calculation for translation evaluation. 
11 | """
12 |
13 | def load_env():
14 |     """Load environment variables from .env file"""
15 |     load_dotenv()
16 |
17 |     # Langfuse credentials are read from .env; fail fast if any are missing
18 |     for key in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST"):
19 |         if not os.getenv(key):
20 |             raise RuntimeError(f"Missing required environment variable: {key}")
21 |
22 | # Define translation agents
23 | spanish_agent = Agent(
24 |     name="spanish_agent",
25 |     instructions="You translate the user's message to Spanish with perfect grammar and natural phrasing.",
26 | )
27 |
28 | translation_picker = Agent(
29 |     name="translation_picker",
30 |     instructions="""You are an expert Spanish translator. Pick the best Spanish translation from the given options.
31 |
32 | Evaluate each translation on:
33 | 1. Accuracy - How well it preserves the original meaning
34 | 2. Grammar - Correctness of grammar
35 | 3. Naturalness - How native and idiomatic it sounds
36 |
37 | Your response must ONLY contain the text of the best translation, with no explanations or prefixes, so it can be compared directly against the reference.""",
38 | )
39 |
40 | async def run_translation(msg):
41 |     """Run the parallel translation process on a single message"""
42 |     # Ensure the entire workflow is a single trace
43 |     with trace("Parallel translation"):
44 |         # Run translations in parallel
45 |         res_1, res_2, res_3 = await asyncio.gather(
46 |             Runner.run(spanish_agent, msg),
47 |             Runner.run(spanish_agent, msg),
48 |             Runner.run(spanish_agent, msg),
49 |         )
50 |
51 |         outputs = [
52 |             ItemHelpers.text_message_outputs(res_1.new_items),
53 |             ItemHelpers.text_message_outputs(res_2.new_items),
54 |             ItemHelpers.text_message_outputs(res_3.new_items),
55 |         ]
56 |
57 |         translations = "\n\n".join(outputs)
58 |
59 |         # Get the best translation
60 |         best_translation = await Runner.run(
61 |             translation_picker,
62 |             f"Input: {msg}\n\nTranslations:\n{translations}",
63 |         )
64 |
65 |         return {
66 |             "input": msg,
67 |             "translations": outputs,
68 |             "best": best_translation.final_output
69 |         }
70 | 71 | def calculate_bleu_score(reference, candidate): 72 | """ 73 | Calculate BLEU score for a translation compared to a reference. 74 | 75 | Args: 76 | reference (str): The reference/expected translation 77 | candidate (str): The candidate translation to evaluate 78 | 79 | Returns: 80 | float: BLEU score between 0 and 100 81 | """ 82 | # Tokenize the sentences into words 83 | reference_tokens = reference.lower().split() 84 | candidate_tokens = candidate.lower().split() 85 | 86 | # Use smoothing to avoid 0 score when there are no n-gram matches 87 | smoothing = SmoothingFunction().method1 88 | 89 | # Calculate BLEU score with weights [0.25, 0.25, 0.25, 0.25] for 1-gram to 4-gram 90 | # and handle cases where reference might be too short for higher n-grams 91 | try: 92 | bleu_score = sentence_bleu([reference_tokens], candidate_tokens, 93 | weights=(0.25, 0.25, 0.25, 0.25), 94 | smoothing_function=smoothing) 95 | except Exception as e: 96 | print(f"Error calculating BLEU score: {e}") 97 | # Fall back to unigram BLEU if there's an issue 98 | bleu_score = sentence_bleu([reference_tokens], candidate_tokens, 99 | weights=(1, 0, 0, 0), 100 | smoothing_function=smoothing) 101 | 102 | # Convert from 0-1 scale to 0-100 scale 103 | return bleu_score * 100 104 | 105 | async def evaluate_on_dataset(dataset_name, num_examples): 106 | """Evaluate the agent on examples from the dataset""" 107 | langfuse = Langfuse() 108 | 109 | print(f"Evaluating on dataset '{dataset_name}'...") 110 | 111 | # Get the dataset from Langfuse 112 | try: 113 | dataset = langfuse.get_dataset(dataset_name) 114 | print(f"Found dataset with {len(dataset.items)} items") 115 | except Exception as e: 116 | print(f"Error retrieving dataset: {e}") 117 | return [] 118 | 119 | results = [] 120 | 121 | # Limit the number of examples to process 122 | items_to_process = min(num_examples, len(dataset.items)) 123 | print(f"Processing {items_to_process} examples...") 124 | 125 | for i in range(items_to_process): 
126 | item = dataset.items[i] 127 | print(f"\n===== Processing example {i+1}/{items_to_process} =====") 128 | 129 | # Extract the input from the dataset item 130 | try: 131 | # Print the entire item to debug 132 | print(f"Item data: {item}") 133 | 134 | # Try to get the input field 135 | if hasattr(item, 'input'): 136 | english_text = item.input.get('text', f"Default example {i+1}") 137 | else: 138 | # If direct access fails, try using the to_dict method if available 139 | item_dict = item.to_dict() if hasattr(item, 'to_dict') else {} 140 | english_text = item_dict.get('input', {}).get('text', f"Default example {i+1}") 141 | 142 | # Run the agent on this example 143 | print(f"English text: {english_text}") 144 | result = await run_translation(english_text) 145 | 146 | # print("\nTranslation candidates:") 147 | # for j, translation in enumerate(result["translations"]): 148 | # print(f"{j+1}: {translation}") 149 | 150 | print(f"\nBest translation: {result['best']}") 151 | 152 | # Get the expected translation 153 | expected_translation = None 154 | if hasattr(item, 'expected_output'): 155 | print("---item.expected_output", item.expected_output) 156 | expected_translation = item.expected_output.get('text') 157 | else: 158 | # If direct access fails, try using the to_dict method if available 159 | item_dict = item.to_dict() if hasattr(item, 'to_dict') else {} 160 | expected_translation = item_dict.get('expected_output', {}).get('text') 161 | 162 | 163 | # Compare the best translation with the expected translation 164 | if expected_translation: 165 | print(f"\nExpected translation: {expected_translation}") 166 | 167 | # Add comparison result to the result dictionary 168 | result['expected'] = expected_translation 169 | 170 | # Calculate BLEU score 171 | bleu = calculate_bleu_score(expected_translation, result['best']) 172 | print(f"BLEU score: {bleu:.2f}/100") 173 | result['bleu_score'] = bleu 174 | 175 | # Simple match check (could be enhanced with more sophisticated 
metrics) 176 | exact_match = result['best'] == expected_translation 177 | print(f"Exact match: {exact_match}") 178 | result['exact_match'] = exact_match 179 | 180 | else: 181 | print("\nNo expected translation found for comparison") 182 | result['expected'] = None 183 | result['exact_match'] = None 184 | result['bleu_score'] = None 185 | 186 | results.append(result) 187 | 188 | except Exception as e: 189 | print(f"Error processing item {i+1}: {e}") 190 | continue 191 | 192 | # Calculate and print overall performance metrics 193 | successful_comparisons = [r for r in results if r['exact_match'] is not None] 194 | if successful_comparisons: 195 | exact_matches = sum(1 for r in successful_comparisons if r['exact_match']) 196 | match_rate = exact_matches / len(successful_comparisons) 197 | 198 | # Calculate average BLEU score 199 | bleu_scores = [r['bleu_score'] for r in successful_comparisons if r['bleu_score'] is not None] 200 | avg_bleu = sum(bleu_scores) / len(bleu_scores) if bleu_scores else 0 201 | 202 | print(f"\nOverall performance:") 203 | print(f"Exact matches: {exact_matches}/{len(successful_comparisons)} ({match_rate:.2%})") 204 | print(f"Average BLEU score: {avg_bleu:.2f}/100") 205 | 206 | return results 207 | 208 | async def main(): 209 | # Load environment variables 210 | load_env() 211 | 212 | # Ensure NLTK packages needed for BLEU calculation are downloaded 213 | try: 214 | nltk.data.find('tokenizers/punkt') 215 | except LookupError: 216 | print("Downloading NLTK punkt tokenizer...") 217 | nltk.download('punkt') 218 | 219 | # Define the Langfuse dataset name 220 | dataset_name = "en-es-translation-benchmark" 221 | 222 | # Choose mode 223 | print("1. Interactive mode") 224 | print("2. Dataset evaluation mode") 225 | choice = input("Enter choice (1/2): ") 226 | 227 | if choice == "2": 228 | # Dataset evaluation mode 229 | num_examples = int(input("How many examples to evaluate? 
(1-20): ")) 230 | num_examples = min(20, max(1, num_examples)) 231 | 232 | results = await evaluate_on_dataset(dataset_name, num_examples) 233 | 234 | print("\n===== EVALUATION SUMMARY =====") 235 | for i, result in enumerate(results): 236 | print(f"Example {i+1}:") 237 | print(f"Input: {result['input']}") 238 | print(f"Best translation: {result['best']}") 239 | if result['expected']: 240 | print(f"Expected translation: {result['expected']}") 241 | print(f"Exact match: {result['exact_match']}") 242 | print(f"BLEU score: {result['bleu_score']:.2f}/100") 243 | print("-" * 40) 244 | else: 245 | # Interactive mode 246 | msg = input("Enter a message to translate to Spanish: ") 247 | result = await run_translation(msg) 248 | 249 | print("\nTranslation candidates:") 250 | for i, translation in enumerate(result["translations"]): 251 | print(f"{i+1}: {translation}") 252 | 253 | print(f"\nBest translation: {result['best']}") 254 | 255 | if __name__ == "__main__": 256 | asyncio.run(main()) -------------------------------------------------------------------------------- /eval-agents/translation-agent-local-eval.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import os 3 | from dotenv import load_dotenv 4 | from agents import Agent, ItemHelpers, Runner, trace 5 | import nltk 6 | from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction 7 | from datasets import load_dataset 8 | import pandas as pd 9 | 10 | """ 11 | Enhanced version that includes BLEU score calculation for translation evaluation 12 | using HuggingFace datasets instead of Langfuse. 
13 | """ 14 | 15 | def load_env(): 16 | """Load environment variables from .env file""" 17 | load_dotenv() 18 | 19 | # Define translation agents 20 | spanish_agent = Agent( 21 | name="spanish_agent", 22 | instructions="You translate the user's message to Spanish with perfect grammar and natural phrasing.", 23 | ) 24 | 25 | translation_picker = Agent( 26 | name="translation_picker", 27 | instructions="""You are an expert Spanish translator. Pick the best Spanish translation from the given options. 28 | 29 | Evaluate each translation on: 30 | 1. Accuracy - How well it preserves the original meaning 31 | 2. Grammar - Correctness of grammar 32 | 3. Naturalness - How native and idiomatic it sounds 33 | 34 | Your response must ONLY contain the text of the best translation, without any explanations, justifications, or prefixes like "Best translation:". Just return the raw translated text.""", 35 | ) 36 | 37 | async def run_translation(msg): 38 | """Run the parallel translation process on a single message""" 39 | # Ensure the entire workflow is a single trace 40 | with trace("Parallel translation"): 41 | # Run translations in parallel 42 | res_1, res_2, res_3 = await asyncio.gather( 43 | Runner.run(spanish_agent, msg), 44 | Runner.run(spanish_agent, msg), 45 | Runner.run(spanish_agent, msg), 46 | ) 47 | 48 | outputs = [ 49 | ItemHelpers.text_message_outputs(res_1.new_items), 50 | ItemHelpers.text_message_outputs(res_2.new_items), 51 | ItemHelpers.text_message_outputs(res_3.new_items), 52 | ] 53 | 54 | translations = "\n\n".join(outputs) 55 | 56 | # Get the best translation 57 | best_translation = await Runner.run( 58 | translation_picker, 59 | f"Input: {msg}\n\nTranslations:\n{translations}", 60 | ) 61 | 62 | return { 63 | "input": msg, 64 | "translations": outputs, 65 | "best": best_translation.final_output 66 | } 67 | 68 | def calculate_bleu_score(reference, candidate): 69 | """ 70 | Calculate BLEU score for a translation compared to a reference. 
71 | 72 | Args: 73 | reference (str): The reference/expected translation 74 | candidate (str): The candidate translation to evaluate 75 | 76 | Returns: 77 | float: BLEU score between 0 and 100 78 | """ 79 | # Tokenize the sentences into words 80 | reference_tokens = reference.lower().split() 81 | candidate_tokens = candidate.lower().split() 82 | 83 | # Use smoothing to avoid 0 score when there are no n-gram matches 84 | smoothing = SmoothingFunction().method1 85 | 86 | # Calculate BLEU score with weights [0.25, 0.25, 0.25, 0.25] for 1-gram to 4-gram 87 | # and handle cases where reference might be too short for higher n-grams 88 | try: 89 | bleu_score = sentence_bleu([reference_tokens], candidate_tokens, 90 | weights=(0.25, 0.25, 0.25, 0.25), 91 | smoothing_function=smoothing) 92 | except Exception as e: 93 | print(f"Error calculating BLEU score: {e}") 94 | # Fall back to unigram BLEU if there's an issue 95 | bleu_score = sentence_bleu([reference_tokens], candidate_tokens, 96 | weights=(1, 0, 0, 0), 97 | smoothing_function=smoothing) 98 | 99 | # Convert from 0-1 scale to 0-100 scale 100 | return bleu_score * 100 101 | 102 | async def evaluate_on_dataset(dataset_name, num_examples): 103 | """Evaluate the agent on examples from the HuggingFace dataset""" 104 | print(f"Evaluating on HuggingFace dataset '{dataset_name}'...") 105 | 106 | # Load dataset from HuggingFace 107 | try: 108 | # Load the English-Spanish dataset from HuggingFace 109 | dataset = load_dataset("opus_books", "en-es", split=f"train[:{num_examples}]") 110 | df = pd.DataFrame(dataset) 111 | print(f"Found dataset with {len(df)} items") 112 | except Exception as e: 113 | print(f"Error retrieving dataset: {e}") 114 | return [] 115 | 116 | results = [] 117 | 118 | # Limit the number of examples to process 119 | items_to_process = min(num_examples, len(df)) 120 | print(f"Processing {items_to_process} examples...") 121 | 122 | for i, row in df.iterrows(): 123 | print(f"\n===== Processing example 
{i+1}/{items_to_process} =====") 124 | 125 | try: 126 | # Extract the English text from the dataset 127 | english_text = row["translation"]["en"] 128 | 129 | # Run the agent on this example 130 | print(f"English text: {english_text}") 131 | result = await run_translation(english_text) 132 | 133 | print(f"\nBest translation: {result['best']}") 134 | 135 | # Get the expected translation (Spanish text) 136 | expected_translation = row["translation"]["es"] 137 | 138 | # Compare the best translation with the expected translation 139 | if expected_translation: 140 | print(f"\nExpected translation: {expected_translation}") 141 | 142 | # Add comparison result to the result dictionary 143 | result['expected'] = expected_translation 144 | 145 | # Calculate BLEU score 146 | bleu = calculate_bleu_score(expected_translation, result['best']) 147 | print(f"BLEU score: {bleu:.2f}/100") 148 | result['bleu_score'] = bleu 149 | 150 | # Simple match check (could be enhanced with more sophisticated metrics) 151 | exact_match = result['best'] == expected_translation 152 | print(f"Exact match: {exact_match}") 153 | result['exact_match'] = exact_match 154 | 155 | else: 156 | print("\nNo expected translation found for comparison") 157 | result['expected'] = None 158 | result['exact_match'] = None 159 | result['bleu_score'] = None 160 | 161 | results.append(result) 162 | 163 | except Exception as e: 164 | print(f"Error processing item {i+1}: {e}") 165 | continue 166 | 167 | # Calculate and print overall performance metrics 168 | successful_comparisons = [r for r in results if r['exact_match'] is not None] 169 | if successful_comparisons: 170 | exact_matches = sum(1 for r in successful_comparisons if r['exact_match']) 171 | match_rate = exact_matches / len(successful_comparisons) 172 | 173 | # Calculate average BLEU score 174 | bleu_scores = [r['bleu_score'] for r in successful_comparisons if r['bleu_score'] is not None] 175 | avg_bleu = sum(bleu_scores) / len(bleu_scores) if bleu_scores 
else 0 176 | 177 | print(f"\nOverall performance:") 178 | print(f"Exact matches: {exact_matches}/{len(successful_comparisons)} ({match_rate:.2%})") 179 | print(f"Average BLEU score: {avg_bleu:.2f}/100") 180 | 181 | return results 182 | 183 | async def main(): 184 | # Load environment variables 185 | load_env() 186 | 187 | # Ensure NLTK packages needed for BLEU calculation are downloaded 188 | try: 189 | nltk.data.find('tokenizers/punkt') 190 | except LookupError: 191 | print("Downloading NLTK punkt tokenizer...") 192 | nltk.download('punkt') 193 | 194 | # Define the HuggingFace dataset name 195 | dataset_name = "opus_books" 196 | 197 | # Choose mode 198 | print("1. Interactive mode") 199 | print("2. Dataset evaluation mode") 200 | choice = input("Enter choice (1/2): ") 201 | 202 | if choice == "2": 203 | # Dataset evaluation mode 204 | num_examples = int(input("How many examples to evaluate? (1-50): ")) 205 | num_examples = min(50, max(1, num_examples)) 206 | 207 | results = await evaluate_on_dataset(dataset_name, num_examples) 208 | 209 | print("\n===== EVALUATION SUMMARY =====") 210 | for i, result in enumerate(results): 211 | print(f"Example {i+1}:") 212 | print(f"Input: {result['input']}") 213 | print(f"Best translation: {result['best']}") 214 | if result['expected']: 215 | print(f"Expected translation: {result['expected']}") 216 | print(f"Exact match: {result['exact_match']}") 217 | print(f"BLEU score: {result['bleu_score']:.2f}/100") 218 | print("-" * 40) 219 | else: 220 | # Interactive mode 221 | msg = input("Enter a message to translate to Spanish: ") 222 | result = await run_translation(msg) 223 | 224 | print("\nTranslation candidates:") 225 | for i, translation in enumerate(result["translations"]): 226 | print(f"{i+1}: {translation}") 227 | 228 | print(f"\nBest translation: {result['best']}") 229 | 230 | if __name__ == "__main__": 231 | asyncio.run(main()) -------------------------------------------------------------------------------- 
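Both evaluation scripts score candidates with NLTK's `sentence_bleu`. As a reference point, here is a dependency-free sketch (illustrative only; the scripts themselves keep using NLTK) of the clipped n-gram precision that BLEU averages over n = 1..4:

```python
from collections import Counter

def modified_ngram_precision(reference_tokens, candidate_tokens, n=1):
    """Fraction of candidate n-grams found in the reference, with counts
    clipped so repeating one matching word cannot inflate the score."""
    ref_counts = Counter(
        tuple(reference_tokens[i:i + n]) for i in range(len(reference_tokens) - n + 1)
    )
    cand_counts = Counter(
        tuple(candidate_tokens[i:i + n]) for i in range(len(candidate_tokens) - n + 1)
    )
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matches = sum(min(count, ref_counts[ngram]) for ngram, count in cand_counts.items())
    return matches / total

reference = "el gato está en la alfombra".split()
candidate = "el el gato en la alfombra".split()
print(modified_ngram_precision(reference, candidate, n=1))  # 5/6: the repeated 'el' is clipped
print(modified_ngram_precision(reference, candidate, n=2))  # 3/5
```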
/multi-agents-design-patterns/evaluator-optimizer.py: -------------------------------------------------------------------------------- 1 | from __future__ import annotations 2 | 3 | import asyncio 4 | from dataclasses import dataclass 5 | from typing import Literal 6 | 7 | from agents import Agent, ItemHelpers, Runner, TResponseInputItem, trace 8 | from dotenv import load_dotenv 9 | 10 | load_dotenv() 11 | 12 | """ 13 | This example shows the LLM as a judge pattern. The first agent generates an outline for a story. 14 | The second agent judges the outline and provides feedback. We loop until the judge is satisfied 15 | with the outline. 16 | """ 17 | 18 | story_outline_generator = Agent( 19 | name="story_outline_generator", 20 | instructions=( 21 | "You generate a very short story outline based on the user's input." 22 | "If there is any feedback provided, use it to improve the outline." 23 | ), 24 | ) 25 | 26 | 27 | @dataclass 28 | class EvaluationFeedback: 29 | feedback: str 30 | score: Literal["pass", "needs_improvement", "fail"] 31 | 32 | 33 | evaluator = Agent[None]( 34 | name="evaluator", 35 | instructions=( 36 | "You evaluate a story outline and decide if it's good enough." 37 | "If it's not good enough, you provide feedback on what needs to be improved." 38 | "Never give it a pass on the first try." 39 | ), 40 | output_type=EvaluationFeedback, 41 | ) 42 | 43 | 44 | async def main() -> None: 45 | msg = input("What kind of story would you like to hear? 
")
46 |     input_items: list[TResponseInputItem] = [{"content": msg, "role": "user"}]
47 |
48 |     latest_outline: str | None = None
49 |
50 |     # We'll run the entire workflow in a single trace
51 |     with trace("LLM as a judge"):
52 |         while True:
53 |             story_outline_result = await Runner.run(
54 |                 story_outline_generator,
55 |                 input_items,
56 |             )
57 |
58 |             input_items = story_outline_result.to_input_list()
59 |             latest_outline = ItemHelpers.text_message_outputs(story_outline_result.new_items)
60 |             print("Story outline generated")
61 |
62 |             evaluator_result = await Runner.run(evaluator, input_items)
63 |             result: EvaluationFeedback = evaluator_result.final_output
64 |
65 |             print(f"Evaluator score: {result.score}")
66 |
67 |             if result.score == "pass":
68 |                 print("Story outline is good enough, exiting.")
69 |                 break
70 |
71 |             print("Re-running with feedback")
72 |
73 |             input_items.append({"content": f"Feedback: {result.feedback}", "role": "user"})
74 |
75 |     print(f"Final story outline: {latest_outline}")
76 |
77 |
78 | if __name__ == "__main__":
79 |     asyncio.run(main())
--------------------------------------------------------------------------------
/multi-agents-design-patterns/orchestrator-workers.py:
--------------------------------------------------------------------------------
1 | from __future__ import annotations
2 |
3 | import asyncio
4 | from enum import Enum
5 | from typing import Any, Dict, List, Optional
6 | from pydantic import BaseModel, Field
7 |
8 | from agents import Agent, AgentHooks, RunContextWrapper, Runner, Tool, function_tool
9 | from dotenv import load_dotenv
10 |
11 | load_dotenv()
12 | class TaskType(str, Enum):
13 |     BACKEND = "backend"
14 |     FRONTEND = "frontend"
15 |     ANALYSIS = "analysis"
16 |     INTEGRATION = "integration"
17 |
18 |
19 | class Task(BaseModel):
20 |     """Represents a task that needs to be performed"""
21 |     task_id: str
22 |     description: str
23 |     type: TaskType
24 |     dependencies: List[str] = Field(default_factory=list)
25 | 
status: str = "pending" 26 | context: Dict[str, Any] = Field(default_factory=dict) # Additional context for workers 27 | file_paths: List[str] = Field(default_factory=list) # Files that need modification 28 | 29 | 30 | class TaskResult(BaseModel): 31 | """Result of a task execution""" 32 | task_id: str 33 | success: bool 34 | output: str 35 | error: Optional[str] = None 36 | 37 | 38 | class ProjectAnalysis(BaseModel): 39 | """Analysis of the project and required tasks""" 40 | tasks: List[Task] 41 | execution_order: List[str] 42 | 43 | 44 | class OrchestratorResult(BaseModel): 45 | """Result of the orchestration process""" 46 | all_tasks_completed: bool 47 | results: List[TaskResult] 48 | summary: str 49 | 50 | 51 | @function_tool 52 | async def analyze_project_requirements(description: str) -> ProjectAnalysis: 53 | """ 54 | Use LLM to dynamically analyze project requirements and break them down into tasks. 55 | The LLM determines necessary files to modify and creates appropriate subtasks. 56 | """ 57 | # This would be an actual LLM call to analyze the project 58 | prompt = f""" 59 | Analyze the following project requirements and break them down into specific tasks: 60 | {description} 61 | 62 | For each task: 63 | 1. Identify the type of work needed (backend/frontend/analysis/integration) 64 | 2. Determine which files need to be modified 65 | 3. Specify any dependencies between tasks 66 | 4. 
Provide detailed context for workers 67 | """ 68 | 69 | # Mock LLM response - in reality, this would be dynamic 70 | tasks = [ 71 | Task( 72 | task_id="analyze_auth_flow", 73 | description="Analyze current authentication flow and identify necessary changes", 74 | type=TaskType.ANALYSIS, 75 | file_paths=["auth/*", "models/user.py"], 76 | context={"focus_areas": ["security", "user flow"]} 77 | ), 78 | Task( 79 | task_id="update_api", 80 | description="Update API endpoints for new authentication flow", 81 | type=TaskType.BACKEND, 82 | dependencies=["analyze_auth_flow"], 83 | file_paths=["api/auth.py", "api/endpoints.py"], 84 | context={"api_version": "v2"} 85 | ), 86 | # ... more tasks ... 87 | ] 88 | return ProjectAnalysis(tasks=tasks, execution_order=["analyze_auth_flow", "update_api"]) 89 | 90 | 91 | @function_tool 92 | async def execute_backend_task(task: Dict[str, Any]) -> TaskResult: 93 | """Execute a backend-related task""" 94 | try: 95 | return TaskResult( 96 | task_id=task["description"].split(":")[0], # Extract task_id from description 97 | success=True, 98 | output=f"Completed backend task: {task['description']}", 99 | error=None 100 | ) 101 | except Exception as e: 102 | return TaskResult( 103 | task_id=task.get("task_id", "unknown"), 104 | success=False, 105 | output="", 106 | error=str(e) 107 | ) 108 | 109 | 110 | @function_tool 111 | async def execute_frontend_task(task: Dict[str, Any]) -> TaskResult: 112 | """Execute a frontend-related task""" 113 | try: 114 | return TaskResult( 115 | task_id=task["description"].split(":")[0], # Extract task_id from description 116 | success=True, 117 | output=f"Completed frontend task: {task['description']}", 118 | error=None 119 | ) 120 | except Exception as e: 121 | return TaskResult( 122 | task_id=task.get("task_id", "unknown"), 123 | success=False, 124 | output="", 125 | error=str(e) 126 | ) 127 | 128 | 129 | @function_tool 130 | async def synthesize_results(results: List[TaskResult]) -> str: 131 | """ 132 | Use 
LLM to analyze and synthesize results from multiple workers into a coherent summary 133 | """ 134 | # This would be an actual LLM call to synthesize results 135 | prompt = f""" 136 | Review the following task results and provide a coherent summary: 137 | {results} 138 | 139 | Include: 140 | 1. Overall success/failure analysis 141 | 2. Key changes made 142 | 3. Integration points to verify 143 | 4. Next steps or recommendations 144 | """ 145 | return "Synthesized summary of all work completed..." 146 | 147 | 148 | class Orchestrator: 149 | def __init__(self): 150 | self.workers = { 151 | TaskType.BACKEND: backend_worker, 152 | TaskType.FRONTEND: frontend_worker, 153 | TaskType.ANALYSIS: analysis_worker, 154 | TaskType.INTEGRATION: integration_worker 155 | } 156 | 157 | async def adjust_tasks(self, analysis: ProjectAnalysis, result: TaskResult) -> ProjectAnalysis: 158 | """ 159 | Dynamically adjust remaining tasks based on a task result 160 | """ 161 | prompt = f""" 162 | Based on the task result: 163 | {result} 164 | 165 | Analyze the current project state and determine if any tasks need to be: 166 | 1. Added 167 | 2. Modified 168 | 3. 
Removed
169 | 
170 |         Current tasks:
171 |         {analysis.tasks}
172 |         """
173 | 
174 |         # This would be an actual LLM call
175 |         # For now, return the original analysis
176 |         return analysis
177 | 
178 |     async def execute_task(self, task: Task) -> TaskResult:
179 |         """Execute a single task using the appropriate worker"""
180 |         worker = self.workers.get(task.type)
181 |         if not worker:
182 |             raise ValueError(f"No worker found for task type: {task.type}")
183 | 
184 |         try:
185 |             result = await Runner.run(
186 |                 worker,
187 |                 input={
188 |                     "description": task.description,
189 |                     "context": task.context,
190 |                     "file_paths": task.file_paths
191 |                 }
192 |             )
193 | 
194 |             # Runner.run returns a RunResult; the worker's declared
195 |             # output_type guarantees its final_output is a TaskResult
196 |             task_result: TaskResult = result.final_output
197 | 
198 |             return task_result
199 |         except Exception as e:
200 |             return TaskResult(
201 |                 task_id=task.task_id,
202 |                 success=False,
203 |                 output="",
204 |                 error=str(e)
205 |             )
206 | 
207 |     async def orchestrate(self, project_description: str) -> OrchestratorResult:
208 |         """Main orchestration logic"""
209 |         # Analyze project and break down into tasks
210 |         analysis = await analyze_project_requirements(project_description)
211 | 
212 |         # Execute tasks in order, respecting dependencies
213 |         results = []
214 |         completed_tasks = set()
215 | 
216 |         for task_id in analysis.execution_order:
217 |             task = next(t for t in analysis.tasks if t.task_id == task_id)
218 | 
219 |             # Check dependencies
220 |             if not all(dep in completed_tasks for dep in task.dependencies):
221 |                 continue
222 | 
223 |             # Execute task
224 |             result = await self.execute_task(task)
225 |             results.append(result)
226 |             completed_tasks.add(task_id)
227 | 
228 |             # Dynamically adjust remaining tasks based on result if needed
229 |             if result.success and getattr(result, "requires_task_adjustment", False):  # optional flag; TaskResult does not define it
230 |                 new_analysis = await self.adjust_tasks(analysis, result)
231 |                 analysis = new_analysis
232 | 
233 |         # Synthesize final results
234 | 
summary = await synthesize_results(results) 235 | 236 | return OrchestratorResult( 237 | all_tasks_completed=len(completed_tasks) == len(analysis.tasks), 238 | results=results, 239 | summary=summary 240 | ) 241 | 242 | 243 | # Create worker agents for different types of tasks 244 | backend_worker = Agent( 245 | name="Backend Worker", 246 | instructions="Execute backend-related tasks in the codebase", 247 | tools=[execute_backend_task], 248 | output_type=TaskResult 249 | ) 250 | 251 | frontend_worker = Agent( 252 | name="Frontend Worker", 253 | instructions="Execute frontend-related tasks in the codebase", 254 | tools=[execute_frontend_task], 255 | output_type=TaskResult 256 | ) 257 | 258 | # Create additional worker agents 259 | analysis_worker = Agent( 260 | name="Analysis Worker", 261 | instructions="Execute analysis-related tasks in the codebase", 262 | tools=[], # Add appropriate tools 263 | output_type=TaskResult 264 | ) 265 | 266 | integration_worker = Agent( 267 | name="Integration Worker", 268 | instructions="Execute integration-related tasks in the codebase", 269 | tools=[], # Add appropriate tools 270 | output_type=TaskResult 271 | ) 272 | 273 | # Create orchestrator agent 274 | orchestrator = Agent( 275 | name="Orchestrator", 276 | instructions=""" 277 | Analyze the project requirements and coordinate task execution: 278 | 1. Break down the project into tasks 279 | 2. Determine task dependencies and execution order 280 | 3. Delegate tasks to appropriate worker agents 281 | 4. Monitor task completion and handle failures 282 | 5. 
Provide a final summary of all completed work
283 |     """,
284 |     tools=[analyze_project_requirements],
285 |     output_type=OrchestratorResult,
286 |     handoffs=[backend_worker, frontend_worker]
287 | )
288 | 
289 | 
290 | async def execute_tasks(tasks: List[Task], execution_order: List[str]) -> List[TaskResult]:
291 |     """Execute tasks in the specified order"""
292 |     results = []
293 |     completed_tasks = set()
294 | 
295 |     for task_id in execution_order:
296 |         task = next(task for task in tasks if task.task_id == task_id)
297 | 
298 |         # Check if dependencies are met
299 |         if task.dependencies:
300 |             if not all(dep in completed_tasks for dep in task.dependencies):
301 |                 continue
302 | 
303 |         # Select appropriate worker based on task type (compare against the enum, not a bare string)
304 |         worker = backend_worker if task.type == TaskType.BACKEND else frontend_worker
305 | 
306 |         # Execute task
307 |         result = await Runner.run(
308 |             worker,
309 |             input=f"Execute task: {task.description}"
310 |         )
311 | 
312 |         results.append(result.final_output)  # unwrap the worker's TaskResult from the RunResult
313 |         completed_tasks.add(task_id)
314 | 
315 |     return results
316 | 
317 | 
318 | async def main():
319 |     try:
320 |         project_description = """
321 |         Update the user authentication system:
322 |         - Modify API endpoints to support new authentication flow
323 |         - Update data models to include new user properties
324 |         - Update UI components to show new user information
325 |         """
326 | 
327 |         orchestrator = Orchestrator()
328 |         result = await orchestrator.orchestrate(project_description)
329 | 
330 |         print("\nProject Execution Summary:")
331 |         print("-" * 50)
332 |         print(f"All tasks completed: {result.all_tasks_completed}")
333 |         print("\nTask Results:")
334 |         for task_result in result.results:
335 |             print(f"\nTask: {task_result.task_id}")
336 |             print(f"Success: {task_result.success}")
337 |             print(f"Output: {task_result.output}")
338 |             if task_result.error:
339 |                 print(f"Error: {task_result.error}")
340 | 
341 |         print("\nFinal Summary:")
342 |         print(result.summary)
343 | 
344 |     except Exception as e:
345 | 
print(f"Error in main execution: {str(e)}") 346 | raise 347 | 348 | 349 | if __name__ == "__main__": 350 | asyncio.run(main()) -------------------------------------------------------------------------------- /multi-agents-design-patterns/parallelization.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | from dotenv import load_dotenv 3 | 4 | from agents import Agent, ItemHelpers, Runner, trace 5 | 6 | """ 7 | This example shows the parallelization pattern. We run the agent three times in parallel, and pick 8 | the best result. 9 | """ 10 | 11 | def load_env(): 12 | """Load environment variables from .env file""" 13 | load_dotenv() 14 | 15 | spanish_agent = Agent( 16 | name="spanish_agent", 17 | instructions="You translate the user's message to Spanish", 18 | ) 19 | 20 | translation_picker = Agent( 21 | name="translation_picker", 22 | instructions="You pick the best Spanish translation from the given options.", 23 | ) 24 | 25 | 26 | async def main(): 27 | msg = input("Hi! 
Enter a message, and we'll translate it to Spanish.\n\n") 28 | 29 | # Ensure the entire workflow is a single trace 30 | with trace("Parallel translation"): 31 | res_1, res_2, res_3 = await asyncio.gather( 32 | Runner.run( 33 | spanish_agent, 34 | msg, 35 | ), 36 | Runner.run( 37 | spanish_agent, 38 | msg, 39 | ), 40 | Runner.run( 41 | spanish_agent, 42 | msg, 43 | ), 44 | ) 45 | 46 | outputs = [ 47 | ItemHelpers.text_message_outputs(res_1.new_items), 48 | ItemHelpers.text_message_outputs(res_2.new_items), 49 | ItemHelpers.text_message_outputs(res_3.new_items), 50 | ] 51 | 52 | translations = "\n\n".join(outputs) 53 | print(f"\n\nTranslations:\n\n{translations}") 54 | 55 | best_translation = await Runner.run( 56 | translation_picker, 57 | f"Input: {msg}\n\nTranslations:\n{translations}", 58 | ) 59 | 60 | print("\n\n-----") 61 | 62 | print(f"Best translation: {best_translation.final_output}") 63 | 64 | 65 | if __name__ == "__main__": 66 | load_env() 67 | asyncio.run(main()) -------------------------------------------------------------------------------- /multi-agents-design-patterns/prompt-chaining.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | from typing import Any, Optional 3 | from pydantic import BaseModel 4 | 5 | from agents import Agent, AgentHooks, RunContextWrapper, Runner, Tool, function_tool 6 | from dotenv import load_dotenv 7 | 8 | load_dotenv() 9 | 10 | class CustomAgentHooks(AgentHooks): 11 | def __init__(self, display_name: str): 12 | self.event_counter = 0 13 | self.display_name = display_name 14 | 15 | async def on_start(self, context: RunContextWrapper, agent: Agent) -> None: 16 | self.event_counter += 1 17 | print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started") 18 | 19 | async def on_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None: 20 | self.event_counter += 1 21 | print(f"### ({self.display_name}) {self.event_counter}: Agent 
{agent.name} ended with output {output}") 22 | 23 | async def on_handoff(self, context: RunContextWrapper, agent: Agent, source: Agent) -> None: 24 | self.event_counter += 1 25 | print(f"### ({self.display_name}) {self.event_counter}: Agent {source.name} handed off to {agent.name}") 26 | 27 | async def on_tool_start(self, context: RunContextWrapper, agent: Agent, tool: Tool) -> None: 28 | self.event_counter += 1 29 | print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started tool {tool.name}") 30 | 31 | async def on_tool_end(self, context: RunContextWrapper, agent: Agent, tool: Tool, result: str) -> None: 32 | self.event_counter += 1 33 | print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended tool {tool.name} with result {result}") 34 | 35 | 36 | class MarketingCopy(BaseModel): 37 | headline: str 38 | body: str 39 | call_to_action: str 40 | 41 | 42 | class TranslatedCopy(BaseModel): 43 | original: MarketingCopy 44 | translated_headline: str 45 | translated_body: str 46 | translated_call_to_action: str 47 | 48 | 49 | class ValidationResult(BaseModel): 50 | is_valid: bool 51 | feedback: str 52 | 53 | 54 | @function_tool 55 | def validate_marketing_copy(copy: MarketingCopy) -> ValidationResult: 56 | """ 57 | Validate marketing copy against common criteria. 
58 | """ 59 | feedback = [] 60 | is_valid = True 61 | 62 | # Check headline length 63 | if len(copy.headline) > 60: 64 | feedback.append("Headline is too long (should be under 60 characters)") 65 | is_valid = False 66 | 67 | # Check body length 68 | if len(copy.body) < 100: 69 | feedback.append("Body is too short (should be at least 100 characters)") 70 | is_valid = False 71 | 72 | # Check call to action 73 | if not any(cta in copy.call_to_action.lower() for cta in ["buy", "learn", "try", "get", "start"]): 74 | feedback.append("Call to action should include action words like 'buy', 'learn', 'try', etc.") 75 | is_valid = False 76 | 77 | return ValidationResult( 78 | is_valid=is_valid, 79 | feedback="\n".join(feedback) if feedback else "All checks passed!" 80 | ) 81 | 82 | 83 | @function_tool 84 | def translate_text(text: str, target_language: str) -> str: 85 | """ 86 | Translate text to target language. 87 | """ 88 | # This is a mock translation function 89 | # In a real implementation, you would use a translation API 90 | translations = { 91 | "es": { 92 | "Buy now": "Comprar ahora", 93 | "Learn more": "Más información", 94 | "Try it free": "Pruébalo gratis" 95 | } 96 | } 97 | return translations.get(target_language, {}).get(text, f"[Translated to {target_language}] {text}") 98 | 99 | 100 | # Create the agents for each step in the chain 101 | validation_agent = Agent( 102 | name="Validation Agent", 103 | instructions="Validate the marketing copy against common criteria and provide feedback.", 104 | tools=[validate_marketing_copy], 105 | output_type=ValidationResult, 106 | hooks=CustomAgentHooks(display_name="Validation Agent"), 107 | ) 108 | 109 | translation_agent = Agent( 110 | name="Translation Agent", 111 | instructions="Translate the validated marketing copy to the target language.", 112 | tools=[translate_text], 113 | output_type=TranslatedCopy, 114 | hooks=CustomAgentHooks(display_name="Translation Agent"), 115 | ) 116 | 117 | marketing_agent = Agent( 118 | 
name="Marketing Agent",
119 |     instructions="Create compelling marketing copy with a headline, body, and call to action.",
120 |     output_type=MarketingCopy,
121 |     handoffs=[validation_agent],
122 |     hooks=CustomAgentHooks(display_name="Marketing Agent"),
123 | )
124 | 
125 | 
126 | async def main() -> None:
127 |     # Example workflow: Create marketing copy -> Validate -> Translate
128 |     product_description = input("Enter product description: ")
129 |     target_language = input("Enter target language (e.g., 'es' for Spanish): ")
130 | 
131 |     # Start the chain with the marketing agent
132 |     result = await Runner.run(
133 |         marketing_agent,
134 |         input=f"Create marketing copy for: {product_description}",
135 |     )
136 | 
137 |     # Check validation result (pass the structured output, not the whole RunResult)
138 |     validation_result = await Runner.run(
139 |         validation_agent,
140 |         input=f"Validate this marketing copy: {result.final_output}",
141 |     )
142 | 
143 |     # Access the final_output property of RunResult
144 |     if not validation_result.final_output.is_valid:
145 |         print(f"Validation failed:\n{validation_result.final_output.feedback}")
146 |         return
147 | 
148 |     # If validation passes, proceed with translation
149 |     translation_result = await Runner.run(
150 |         translation_agent,
151 |         input=f"Translate this marketing copy to {target_language}: {result.final_output}",
152 |     )
153 | 
154 |     print("\nFinal Results:")
155 |     print(f"Original Headline: {translation_result.final_output.original.headline}")
156 |     print(f"Translated Headline: {translation_result.final_output.translated_headline}")
157 |     print(f"\nOriginal Body: {translation_result.final_output.original.body}")
158 |     print(f"Translated Body: {translation_result.final_output.translated_body}")
159 |     print(f"\nOriginal CTA: {translation_result.final_output.original.call_to_action}")
160 |     print(f"Translated CTA: {translation_result.final_output.translated_call_to_action}")
161 | 
162 | 
163 | if __name__ == "__main__":
164 |     asyncio.run(main())
-------------------------------------------------------------------------------- /multi-agents-design-patterns/workflow-routing.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import uuid 3 | 4 | from openai.types.responses import ResponseContentPartDoneEvent, ResponseTextDeltaEvent 5 | 6 | from agents import Agent, RawResponsesStreamEvent, Runner, TResponseInputItem, trace 7 | 8 | from dotenv import load_dotenv 9 | 10 | 11 | """ 12 | This example shows the handoffs/routing pattern. The triage agent receives the first message, and 13 | then hands off to the appropriate agent based on the language of the request. Responses are 14 | streamed to the user. 15 | """ 16 | 17 | french_agent = Agent( 18 | name="french_agent", 19 | instructions="You only speak French", 20 | ) 21 | 22 | spanish_agent = Agent( 23 | name="spanish_agent", 24 | instructions="You only speak Spanish", 25 | ) 26 | 27 | english_agent = Agent( 28 | name="english_agent", 29 | instructions="You only speak English", 30 | ) 31 | 32 | triage_agent = Agent( 33 | name="triage_agent", 34 | instructions="Handoff to the appropriate agent based on the language of the request.", 35 | handoffs=[french_agent, spanish_agent, english_agent], 36 | ) 37 | 38 | 39 | async def main(): 40 | # We'll create an ID for this conversation, so we can link each trace 41 | conversation_id = str(uuid.uuid4().hex[:16]) 42 | 43 | msg = input("Hi! We speak French, Spanish and English. How can I help? ") 44 | agent = triage_agent 45 | inputs: list[TResponseInputItem] = [{"content": msg, "role": "user"}] 46 | 47 | while True: 48 | # Each conversation turn is a single trace. 
Normally, each input from the user would be an 49 | # API request to your app, and you can wrap the request in a trace() 50 | with trace("Routing example", group_id=conversation_id): 51 | result = Runner.run_streamed( 52 | agent, 53 | input=inputs, 54 | ) 55 | async for event in result.stream_events(): 56 | if not isinstance(event, RawResponsesStreamEvent): 57 | continue 58 | data = event.data 59 | if isinstance(data, ResponseTextDeltaEvent): 60 | print(data.delta, end="", flush=True) 61 | elif isinstance(data, ResponseContentPartDoneEvent): 62 | print("\n") 63 | 64 | inputs = result.to_input_list() 65 | print("\n") 66 | 67 | user_msg = input("Enter a message: ") 68 | inputs.append({"content": user_msg, "role": "user"}) 69 | agent = result.current_agent 70 | 71 | 72 | if __name__ == "__main__": 73 | asyncio.run(main()) -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | openai>=1.12.0 2 | pydantic>=2.0.0 3 | python-dotenv>=1.0.0 4 | typing-extensions>=4.5.0 5 | openai-agents 6 | opentelemetry-sdk 7 | opentelemetry-exporter-otlp-proto-http 8 | langfuse 9 | nest_asyncio 10 | pydantic-ai[logfire] 11 | datasets 12 | nltk>=3.8.1 -------------------------------------------------------------------------------- /simple-agent.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import random 3 | from typing import Any 4 | from dotenv import load_dotenv 5 | import os 6 | 7 | from pydantic import BaseModel 8 | 9 | from agents import Agent, AgentHooks, RunContextWrapper, Runner, Tool, function_tool 10 | 11 | 12 | def load_env(): 13 | """Load environment variables from .env file.""" 14 | load_dotenv() 15 | # You can add validation here if needed 16 | required_vars = ['OPENAI_API_KEY'] 17 | for var in required_vars: 18 | if not os.getenv(var): 19 | raise ValueError(f"Missing required 
environment variable: {var}") 20 | 21 | 22 | class CustomAgentHooks(AgentHooks): 23 | def __init__(self, display_name: str): 24 | self.event_counter = 0 25 | self.display_name = display_name 26 | 27 | async def on_start(self, context: RunContextWrapper, agent: Agent) -> None: 28 | self.event_counter += 1 29 | print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started") 30 | 31 | async def on_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None: 32 | self.event_counter += 1 33 | print( 34 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended with output {output}" 35 | ) 36 | 37 | async def on_handoff(self, context: RunContextWrapper, agent: Agent, source: Agent) -> None: 38 | self.event_counter += 1 39 | print( 40 | f"### ({self.display_name}) {self.event_counter}: Agent {source.name} handed off to {agent.name}" 41 | ) 42 | 43 | async def on_tool_start(self, context: RunContextWrapper, agent: Agent, tool: Tool) -> None: 44 | self.event_counter += 1 45 | print( 46 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started tool {tool.name}" 47 | ) 48 | 49 | async def on_tool_end( 50 | self, context: RunContextWrapper, agent: Agent, tool: Tool, result: str 51 | ) -> None: 52 | self.event_counter += 1 53 | print( 54 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended tool {tool.name} with result {result}" 55 | ) 56 | 57 | 58 | ### 59 | 60 | 61 | @function_tool 62 | def random_number(max: int) -> int: 63 | """ 64 | Generate a random number up to the provided maximum. 
65 |     """
66 |     return random.randint(0, max)
67 | 
68 | 
69 | @function_tool
70 | def multiply_by_two(x: int) -> int:
71 |     """Simple multiplication by two."""
72 |     return x * 2
73 | 
74 | 
75 | class FinalResult(BaseModel):
76 |     number: int
77 | 
78 | 
79 | multiply_agent = Agent(
80 |     name="Multiply Agent",
81 |     instructions="Multiply the number by 2 and then return the final result.",
82 |     tools=[multiply_by_two],
83 |     output_type=FinalResult,
84 |     hooks=CustomAgentHooks(display_name="Multiply Agent"),
85 | )
86 | 
87 | start_agent = Agent(
88 |     name="Start Agent",
89 |     instructions="Generate a random number. If it's even, stop. If it's odd, hand off to the multiply agent.",
90 |     tools=[random_number],
91 |     output_type=FinalResult,
92 |     handoffs=[multiply_agent],
93 |     hooks=CustomAgentHooks(display_name="Start Agent"),
94 | )
95 | 
96 | 
97 | async def main() -> None:
98 |     load_env()  # Load environment variables
99 |     user_input = input("Enter a max number: ")
100 |     await Runner.run(
101 |         start_agent,
102 |         input=f"Generate a random number between 0 and {user_input}.",
103 |     )
104 | 
105 |     print("Done!")
106 | 
107 | 
108 | if __name__ == "__main__":
109 |     asyncio.run(main())
110 | """
111 | $ python simple-agent.py
112 | 
113 | Enter a max number: 250
114 | ### (Start Agent) 1: Agent Start Agent started
115 | ### (Start Agent) 2: Agent Start Agent started tool random_number
116 | ### (Start Agent) 3: Agent Start Agent ended tool random_number with result 37
117 | ### (Start Agent) 4: Agent Start Agent started
118 | ### (Start Agent) 5: Agent Start Agent handed off to Multiply Agent
119 | ### (Multiply Agent) 1: Agent Multiply Agent started
120 | ### (Multiply Agent) 2: Agent Multiply Agent started tool multiply_by_two
121 | ### (Multiply Agent) 3: Agent Multiply Agent ended tool multiply_by_two with result 74
122 | ### (Multiply Agent) 4: Agent Multiply Agent started
123 | ### (Multiply Agent) 5: Agent Multiply Agent ended with output number=74
124 | 
Done! 125 | """ -------------------------------------------------------------------------------- /simple-react-agent.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import random 3 | from typing import Any 4 | from dotenv import load_dotenv 5 | import os 6 | from pydantic import BaseModel 7 | from agents import Agent, AgentHooks, RunContextWrapper, Runner, Tool, function_tool 8 | 9 | 10 | def load_env(): 11 | """Load environment variables from .env file.""" 12 | load_dotenv() 13 | # You can add validation here if needed 14 | required_vars = ['OPENAI_API_KEY'] 15 | for var in required_vars: 16 | if not os.getenv(var): 17 | raise ValueError(f"Missing required environment variable: {var}") 18 | 19 | 20 | class CustomAgentHooks(AgentHooks): 21 | def __init__(self, display_name: str): 22 | self.event_counter = 0 23 | self.display_name = display_name 24 | 25 | async def on_start(self, context: RunContextWrapper, agent: Agent) -> None: 26 | self.event_counter += 1 27 | print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started") 28 | 29 | async def on_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None: 30 | self.event_counter += 1 31 | print( 32 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended with output {output}" 33 | ) 34 | 35 | async def on_handoff(self, context: RunContextWrapper, agent: Agent, source: Agent) -> None: 36 | self.event_counter += 1 37 | print( 38 | f"### ({self.display_name}) {self.event_counter}: Agent {source.name} handed off to {agent.name}" 39 | ) 40 | 41 | async def on_tool_start(self, context: RunContextWrapper, agent: Agent, tool: Tool) -> None: 42 | self.event_counter += 1 43 | print( 44 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started tool {tool.name}" 45 | ) 46 | 47 | async def on_tool_end( 48 | self, context: RunContextWrapper, agent: Agent, tool: Tool, result: str 49 | ) -> None: 
50 | self.event_counter += 1 51 | print( 52 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended tool {tool.name} with result {result}" 53 | ) 54 | 55 | 56 | ### 57 | 58 | 59 | @function_tool 60 | def random_number(max: int) -> int: 61 | """ 62 | Generate a random number up to the provided maximum. 63 | """ 64 | return random.randint(0, max) 65 | 66 | 67 | @function_tool 68 | def multiply_by_two(x: int) -> int: 69 | """Simple multiplication by two.""" 70 | return x * 2 71 | 72 | 73 | @function_tool 74 | def is_even(x: int) -> bool: 75 | """Check if a number is even.""" 76 | return x % 2 == 0 77 | 78 | 79 | class FinalResult(BaseModel): 80 | original_number: int 81 | multiplied_number: int 82 | 83 | 84 | multiply_agent = Agent( 85 | name="Multiply Agent", 86 | instructions=""" 87 | You have received an even number from the Find Even Agent. 88 | 1. Take the even number provided. 89 | 2. Multiply it by 2 using the multiply_by_two tool. 90 | 3. Return both the original even number and the multiplied result. 91 | """, 92 | tools=[multiply_by_two], 93 | output_type=FinalResult, 94 | hooks=CustomAgentHooks(display_name="Multiply Agent"), 95 | ) 96 | 97 | find_even_agent = Agent( 98 | name="Find Even Agent", 99 | instructions=""" 100 | Your task is to find an even number and then hand it off to the Multiply Agent. 101 | 102 | 1. Generate a random number using the random_number tool. 103 | 2. Check if the number is even using the is_even tool. 104 | 3. If the number is odd, generate another random number and repeat until you find an even number. 105 | 4. Once you find an even number, hand off to the Multiply Agent. 106 | 107 | Important: Be efficient and don't waste turns. If you get an odd number, immediately try again without unnecessary reasoning. 
108 |     """,
109 |     tools=[random_number, is_even],
110 |     output_type=FinalResult,
111 |     handoffs=[multiply_agent],
112 |     hooks=CustomAgentHooks(display_name="Find Even Agent"),
113 | )
114 | 
115 | 
116 | async def main() -> None:
117 |     load_env()  # Load environment variables
118 |     user_input = input("Enter a max number: ")
119 |     try:
120 |         result = await Runner.run(
121 |             find_even_agent,
122 |             input=f"Generate random numbers between 0 and {user_input} until you find an even one, then multiply it by 2.",
123 |             max_turns=20,  # Increased max turns
124 |         )
125 |         print(f"Final result: {result.final_output}")  # print the structured FinalResult, not the RunResult wrapper
126 |     except Exception as e:
127 |         print(f"Error occurred: {str(e)}")
128 |         print("Try running again with a different input number.")
129 |     finally:
130 |         print("Done!")
131 | 
132 | 
133 | if __name__ == "__main__":
134 |     asyncio.run(main())
--------------------------------------------------------------------------------