├── .gitignore
├── README.MD
├── eval-agents
│   ├── build-langfuse-dataset.py
│   ├── translation-agent-eval.py
│   └── translation-agent-local-eval.py
├── multi-agents-design-patterns
│   ├── evaluator-optimizer.py
│   ├── orchestrator-workers.py
│   ├── parallelization.py
│   ├── prompt-chaining.py
│   └── workflow-routing.py
├── requirements.txt
├── simple-agent.py
└── simple-react-agent.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Environment variables
2 | .env
3 |
4 | # Python
5 | __pycache__/
6 | *.py[cod]
7 | *$py.class
8 | *.so
9 | .Python
10 | env/
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 |
27 | # Virtual Environment
28 | venv/
29 | ENV/
--------------------------------------------------------------------------------
/README.MD:
--------------------------------------------------------------------------------
1 | # Multi-Agent Design Patterns and Evaluation Process
2 |
3 | ## Installation
4 |
5 | 1. Clone the repository:
6 | ```bash
7 | git clone [repository-url]
8 | cd multi-agents
9 | ```
10 |
11 | 2. Create and activate a virtual environment (recommended):
12 | ```bash
13 | python -m venv venv
14 | source venv/bin/activate  # On Windows: venv\Scripts\activate
15 | ```
16 |
17 | 3. Install dependencies:
18 | ```bash
19 | pip install -r requirements.txt
20 | ```
21 |
22 | 4. Set up environment variables:
23 | ```bash
24 | # Create .env file
25 | cat << EOF > .env
26 | OPENAI_API_KEY=your_openai_key
27 | LANGFUSE_PUBLIC_KEY=your_key
28 | LANGFUSE_SECRET_KEY=your_secret
29 | LANGFUSE_HOST=https://cloud.langfuse.com
30 | EOF
31 | ```
32 |
33 | ## Requirements.txt
34 | ```text
35 | langfuse
36 | nltk
37 | python-dotenv
38 | pydantic
39 | openai-agents
40 | datasets
41 | pandas
42 | ```
43 | ## Design Patterns
44 |
45 | ### 1. Prompt Chaining
46 | **Name**: Sequential Task Processing
47 | **Description**: A workflow that breaks down complex tasks into sequential steps where each agent processes the output of the previous one. This pattern is implemented in our project through the Marketing → Validation → Translation chain.
48 |
49 | ```bash
50 | python multi-agents-design-patterns/prompt-chaining.py
51 | ```
52 |
53 |
54 | ### 2. Routing
55 | **Name**: Input Classification and Direction
56 | **Description**: A pattern that classifies inputs and directs them to specialized handlers. While not directly implemented in our current project, this could be useful for handling different types of marketing content or language pairs.
57 |
58 |
59 | ```bash
60 | python multi-agents-design-patterns/workflow-routing.py
61 | ```
62 |
63 | ### 3. Parallelization
64 | **Name**: Concurrent Task Processing
65 | **Description**: Enables simultaneous execution of related tasks through either sectioning (breaking a job into subtasks) or voting (multiple attempts at the same task). Our project could be extended to use this for parallel validation checks or multiple translation attempts.
66 |
67 | ```bash
68 | python multi-agents-design-patterns/parallelization.py
69 | ```
70 |
71 | ### 4. Orchestrator-Workers
72 | **Name**: Dynamic Task Management
73 | **Description**: Uses a central agent to coordinate and delegate tasks to specialized worker agents. Our project demonstrates this through the main workflow coordination, though in a simplified linear chain.
74 |
75 | ```bash
76 | python multi-agents-design-patterns/orchestrator-workers.py
77 | ```
78 |
79 | ### 5. Evaluator-Optimizer
80 | **Name**: Iterative Improvement Loop
81 | **Description**: Implements a feedback loop where one agent generates content and another evaluates it and suggests improvements. Our validation agent partially implements this pattern by checking marketing copy quality.
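The generate → evaluate → regenerate loop described here can be sketched without any agent framework; `generate` and `evaluate` below are hypothetical stand-ins for the two agents:

```python
def evaluator_optimizer(generate, evaluate, max_rounds=5):
    """Regenerate until the evaluator passes the draft (or rounds run out)."""
    draft, feedback = None, None
    for _ in range(max_rounds):
        draft = generate(feedback)          # generator agent stand-in
        score, feedback = evaluate(draft)   # evaluator agent stand-in
        if score == "pass":
            break
    return draft

# Toy run: the evaluator rejects the first draft, mirroring the
# "never give it a pass on the first try" instruction in evaluator-optimizer.py.
drafts = iter(["rough outline", "polished outline"])
best = evaluator_optimizer(
    generate=lambda feedback: next(drafts),
    evaluate=lambda d: ("pass", "") if "polished" in d else ("needs_improvement", "add detail"),
)
print(best)  # polished outline
```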
82 |
83 | ```bash
84 | python multi-agents-design-patterns/evaluator-optimizer.py
85 | ```
86 |
87 | ## Part 2: Evaluation Framework
88 |
89 | ### Overview
90 | The `eval-agents` folder contains a comprehensive evaluation system for testing and benchmarking translation agents. The system includes parallel translation processing with BLEU score calculation, Langfuse integration for dataset management, and both interactive and batch evaluation modes. The evaluation framework supports English-Spanish translation assessment through multiple agents and uses a voting mechanism to select the best translation among multiple candidates.
91 |
92 | ### Components
93 | - `translation-agent-eval.py`: Main evaluation script with Langfuse integration
94 | - `translation-agent-local-eval.py`: Local version of the evaluation system
95 | - `build-langfuse-dataset.py`: Tool for creating translation benchmark datasets
96 |
97 | ### How to Run
98 |
99 | 1. **Setup**:
100 | ```bash
101 |
102 | # Set up environment variables in .env file
103 | LANGFUSE_PUBLIC_KEY=your_key
104 | LANGFUSE_SECRET_KEY=your_secret
105 | LANGFUSE_HOST=https://cloud.langfuse.com
106 | ```
107 |
108 | 2. **Running Evaluation**:
109 | ```bash
110 | python eval-agents/translation-agent-eval.py
111 | ```
112 | Then choose:
113 | - Option 1: Interactive mode for single translations
114 | - Option 2: Dataset evaluation mode for batch testing
115 |
116 | The system will output translation candidates, BLEU scores, and performance metrics for comprehensive agent evaluation.
117 |
118 | 3. **Running Evaluation without Langfuse**:
119 |
120 | ```bash
121 | python eval-agents/translation-agent-local-eval.py
122 | ```
123 |
--------------------------------------------------------------------------------
/eval-agents/build-langfuse-dataset.py:
--------------------------------------------------------------------------------
1 | import os
2 | import pandas as pd
3 | from datasets import load_dataset
4 | from langfuse import Langfuse
5 | from dotenv import load_dotenv
6 |
7 | # Load Langfuse credentials from the environment (.env); never hardcode secrets
8 | load_dotenv()
9 | os.environ.setdefault("LANGFUSE_HOST", "https://cloud.langfuse.com")
10 |
11 | # Initialize Langfuse client
12 | langfuse = Langfuse()
13 |
14 | # Load English-Spanish translation dataset from Hugging Face
15 | print("Loading dataset from Hugging Face...")
16 | dataset = load_dataset("opus_books", "en-es", split="train[:50]")
17 | df = pd.DataFrame(dataset)
18 |
19 | # Display sample data
20 | print("\nFirst few rows of the dataset:")
21 | for i, row in df.head().iterrows():
22 |     print(f"English: {row['translation']['en']}")
23 |     print(f"Spanish: {row['translation']['es']}")
24 |     print("-" * 50)
25 |
26 | # Define Langfuse dataset name
27 | langfuse_dataset_name = "en-es-translation-benchmark"
28 |
29 | # Create a dataset in Langfuse
30 | print(f"\nCreating dataset '{langfuse_dataset_name}' in Langfuse...")
31 | try:
32 |     langfuse.create_dataset(
33 |         name=langfuse_dataset_name,
34 |         description="English-Spanish translation dataset from opus_books",
35 |         metadata={
36 |             "source": "opus_books",
37 |             "language_pair": "en-es",
38 |             "type": "benchmark"
39 |         }
40 |     )
41 |     print("Dataset created successfully.")
42 | except Exception as e:
43 |     print(f"Note: {e}")
44 |     print("Dataset might already exist, continuing with item creation...")
45 |
46 | # Add dataset items from Hugging Face dataset to Langfuse
47 | print("\nAdding items to Langfuse dataset...")
48 | 
count = 0 49 | for idx, row in df.iterrows(): 50 | # Create dataset item with English as input and Spanish as expected output 51 | langfuse.create_dataset_item( 52 | dataset_name=langfuse_dataset_name, 53 | input={"text": row["translation"]["en"]}, 54 | expected_output={"text": row["translation"]["es"]} 55 | ) 56 | count += 1 57 | 58 | # Progress indicator 59 | if count % 10 == 0: 60 | print(f"Added {count} items...") 61 | 62 | if idx >= 49: # For this example, we upload only the first 50 items 63 | break 64 | 65 | # Ensure all data is sent to Langfuse 66 | langfuse.flush() 67 | print(f"\nSuccessfully uploaded {count} translation examples to Langfuse dataset '{langfuse_dataset_name}'") 68 | print("You can now use this dataset for evaluating your translation agent.") -------------------------------------------------------------------------------- /eval-agents/translation-agent-eval.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import os 3 | from dotenv import load_dotenv 4 | from agents import Agent, ItemHelpers, Runner, trace 5 | from langfuse import Langfuse 6 | import nltk 7 | from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction 8 | 9 | """ 10 | Enhanced version that includes BLEU score calculation for translation evaluation. 
11 | """
12 |
13 | def load_env():
14 |     """Load environment variables from .env file"""
15 |     load_dotenv()
16 |
17 |     # Langfuse credentials are read from .env; fail fast if any are missing
18 |     for key in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST"):
19 |         if not os.getenv(key):
20 |             raise RuntimeError(f"Missing required environment variable: {key}")
21 |
22 | # Define translation agents
23 | spanish_agent = Agent(
24 |     name="spanish_agent",
25 |     instructions="You translate the user's message to Spanish with perfect grammar and natural phrasing.",
26 | )
27 |
28 | translation_picker = Agent(
29 |     name="translation_picker",
30 |     instructions="""You are an expert Spanish translator. Pick the best Spanish translation from the given options.
31 |
32 | Evaluate each translation on:
33 | 1. Accuracy - How well it preserves the original meaning
34 | 2. Grammar - Correctness of grammar
35 | 3. Naturalness - How native and idiomatic it sounds
36 |
37 | Your response must ONLY contain the text of the best translation, with no explanations or prefixes, so it can be compared directly against the reference.""",
38 | )
39 |
40 | async def run_translation(msg):
41 |     """Run the parallel translation process on a single message"""
42 |     # Ensure the entire workflow is a single trace
43 |     with trace("Parallel translation"):
44 |         # Run translations in parallel
45 |         res_1, res_2, res_3 = await asyncio.gather(
46 |             Runner.run(spanish_agent, msg),
47 |             Runner.run(spanish_agent, msg),
48 |             Runner.run(spanish_agent, msg),
49 |         )
50 |
51 |         outputs = [
52 |             ItemHelpers.text_message_outputs(res_1.new_items),
53 |             ItemHelpers.text_message_outputs(res_2.new_items),
54 |             ItemHelpers.text_message_outputs(res_3.new_items),
55 |         ]
56 |
57 |         translations = "\n\n".join(outputs)
58 |
59 |         # Get the best translation
60 |         best_translation = await Runner.run(
61 |             translation_picker,
62 |             f"Input: {msg}\n\nTranslations:\n{translations}",
63 |         )
64 |
65 |         return {
66 |             "input": msg,
67 |             "translations": outputs,
68 |             "best": best_translation.final_output
69 |         }
70 | 71 | def calculate_bleu_score(reference, candidate): 72 | """ 73 | Calculate BLEU score for a translation compared to a reference. 74 | 75 | Args: 76 | reference (str): The reference/expected translation 77 | candidate (str): The candidate translation to evaluate 78 | 79 | Returns: 80 | float: BLEU score between 0 and 100 81 | """ 82 | # Tokenize the sentences into words 83 | reference_tokens = reference.lower().split() 84 | candidate_tokens = candidate.lower().split() 85 | 86 | # Use smoothing to avoid 0 score when there are no n-gram matches 87 | smoothing = SmoothingFunction().method1 88 | 89 | # Calculate BLEU score with weights [0.25, 0.25, 0.25, 0.25] for 1-gram to 4-gram 90 | # and handle cases where reference might be too short for higher n-grams 91 | try: 92 | bleu_score = sentence_bleu([reference_tokens], candidate_tokens, 93 | weights=(0.25, 0.25, 0.25, 0.25), 94 | smoothing_function=smoothing) 95 | except Exception as e: 96 | print(f"Error calculating BLEU score: {e}") 97 | # Fall back to unigram BLEU if there's an issue 98 | bleu_score = sentence_bleu([reference_tokens], candidate_tokens, 99 | weights=(1, 0, 0, 0), 100 | smoothing_function=smoothing) 101 | 102 | # Convert from 0-1 scale to 0-100 scale 103 | return bleu_score * 100 104 | 105 | async def evaluate_on_dataset(dataset_name, num_examples): 106 | """Evaluate the agent on examples from the dataset""" 107 | langfuse = Langfuse() 108 | 109 | print(f"Evaluating on dataset '{dataset_name}'...") 110 | 111 | # Get the dataset from Langfuse 112 | try: 113 | dataset = langfuse.get_dataset(dataset_name) 114 | print(f"Found dataset with {len(dataset.items)} items") 115 | except Exception as e: 116 | print(f"Error retrieving dataset: {e}") 117 | return [] 118 | 119 | results = [] 120 | 121 | # Limit the number of examples to process 122 | items_to_process = min(num_examples, len(dataset.items)) 123 | print(f"Processing {items_to_process} examples...") 124 | 125 | for i in range(items_to_process): 
126 | item = dataset.items[i] 127 | print(f"\n===== Processing example {i+1}/{items_to_process} =====") 128 | 129 | # Extract the input from the dataset item 130 | try: 131 | # Print the entire item to debug 132 | print(f"Item data: {item}") 133 | 134 | # Try to get the input field 135 | if hasattr(item, 'input'): 136 | english_text = item.input.get('text', f"Default example {i+1}") 137 | else: 138 | # If direct access fails, try using the to_dict method if available 139 | item_dict = item.to_dict() if hasattr(item, 'to_dict') else {} 140 | english_text = item_dict.get('input', {}).get('text', f"Default example {i+1}") 141 | 142 | # Run the agent on this example 143 | print(f"English text: {english_text}") 144 | result = await run_translation(english_text) 145 | 146 | # print("\nTranslation candidates:") 147 | # for j, translation in enumerate(result["translations"]): 148 | # print(f"{j+1}: {translation}") 149 | 150 | print(f"\nBest translation: {result['best']}") 151 | 152 | # Get the expected translation 153 | expected_translation = None 154 | if hasattr(item, 'expected_output'): 155 | print("---item.expected_output", item.expected_output) 156 | expected_translation = item.expected_output.get('text') 157 | else: 158 | # If direct access fails, try using the to_dict method if available 159 | item_dict = item.to_dict() if hasattr(item, 'to_dict') else {} 160 | expected_translation = item_dict.get('expected_output', {}).get('text') 161 | 162 | 163 | # Compare the best translation with the expected translation 164 | if expected_translation: 165 | print(f"\nExpected translation: {expected_translation}") 166 | 167 | # Add comparison result to the result dictionary 168 | result['expected'] = expected_translation 169 | 170 | # Calculate BLEU score 171 | bleu = calculate_bleu_score(expected_translation, result['best']) 172 | print(f"BLEU score: {bleu:.2f}/100") 173 | result['bleu_score'] = bleu 174 | 175 | # Simple match check (could be enhanced with more sophisticated 
metrics) 176 | exact_match = result['best'] == expected_translation 177 | print(f"Exact match: {exact_match}") 178 | result['exact_match'] = exact_match 179 | 180 | else: 181 | print("\nNo expected translation found for comparison") 182 | result['expected'] = None 183 | result['exact_match'] = None 184 | result['bleu_score'] = None 185 | 186 | results.append(result) 187 | 188 | except Exception as e: 189 | print(f"Error processing item {i+1}: {e}") 190 | continue 191 | 192 | # Calculate and print overall performance metrics 193 | successful_comparisons = [r for r in results if r['exact_match'] is not None] 194 | if successful_comparisons: 195 | exact_matches = sum(1 for r in successful_comparisons if r['exact_match']) 196 | match_rate = exact_matches / len(successful_comparisons) 197 | 198 | # Calculate average BLEU score 199 | bleu_scores = [r['bleu_score'] for r in successful_comparisons if r['bleu_score'] is not None] 200 | avg_bleu = sum(bleu_scores) / len(bleu_scores) if bleu_scores else 0 201 | 202 | print(f"\nOverall performance:") 203 | print(f"Exact matches: {exact_matches}/{len(successful_comparisons)} ({match_rate:.2%})") 204 | print(f"Average BLEU score: {avg_bleu:.2f}/100") 205 | 206 | return results 207 | 208 | async def main(): 209 | # Load environment variables 210 | load_env() 211 | 212 | # Ensure NLTK packages needed for BLEU calculation are downloaded 213 | try: 214 | nltk.data.find('tokenizers/punkt') 215 | except LookupError: 216 | print("Downloading NLTK punkt tokenizer...") 217 | nltk.download('punkt') 218 | 219 | # Define the Langfuse dataset name 220 | dataset_name = "en-es-translation-benchmark" 221 | 222 | # Choose mode 223 | print("1. Interactive mode") 224 | print("2. Dataset evaluation mode") 225 | choice = input("Enter choice (1/2): ") 226 | 227 | if choice == "2": 228 | # Dataset evaluation mode 229 | num_examples = int(input("How many examples to evaluate? 
(1-20): ")) 230 | num_examples = min(20, max(1, num_examples)) 231 | 232 | results = await evaluate_on_dataset(dataset_name, num_examples) 233 | 234 | print("\n===== EVALUATION SUMMARY =====") 235 | for i, result in enumerate(results): 236 | print(f"Example {i+1}:") 237 | print(f"Input: {result['input']}") 238 | print(f"Best translation: {result['best']}") 239 | if result['expected']: 240 | print(f"Expected translation: {result['expected']}") 241 | print(f"Exact match: {result['exact_match']}") 242 | print(f"BLEU score: {result['bleu_score']:.2f}/100") 243 | print("-" * 40) 244 | else: 245 | # Interactive mode 246 | msg = input("Enter a message to translate to Spanish: ") 247 | result = await run_translation(msg) 248 | 249 | print("\nTranslation candidates:") 250 | for i, translation in enumerate(result["translations"]): 251 | print(f"{i+1}: {translation}") 252 | 253 | print(f"\nBest translation: {result['best']}") 254 | 255 | if __name__ == "__main__": 256 | asyncio.run(main()) -------------------------------------------------------------------------------- /eval-agents/translation-agent-local-eval.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import os 3 | from dotenv import load_dotenv 4 | from agents import Agent, ItemHelpers, Runner, trace 5 | import nltk 6 | from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction 7 | from datasets import load_dataset 8 | import pandas as pd 9 | 10 | """ 11 | Enhanced version that includes BLEU score calculation for translation evaluation 12 | using HuggingFace datasets instead of Langfuse. 
13 | """ 14 | 15 | def load_env(): 16 | """Load environment variables from .env file""" 17 | load_dotenv() 18 | 19 | # Define translation agents 20 | spanish_agent = Agent( 21 | name="spanish_agent", 22 | instructions="You translate the user's message to Spanish with perfect grammar and natural phrasing.", 23 | ) 24 | 25 | translation_picker = Agent( 26 | name="translation_picker", 27 | instructions="""You are an expert Spanish translator. Pick the best Spanish translation from the given options. 28 | 29 | Evaluate each translation on: 30 | 1. Accuracy - How well it preserves the original meaning 31 | 2. Grammar - Correctness of grammar 32 | 3. Naturalness - How native and idiomatic it sounds 33 | 34 | Your response must ONLY contain the text of the best translation, without any explanations, justifications, or prefixes like "Best translation:". Just return the raw translated text.""", 35 | ) 36 | 37 | async def run_translation(msg): 38 | """Run the parallel translation process on a single message""" 39 | # Ensure the entire workflow is a single trace 40 | with trace("Parallel translation"): 41 | # Run translations in parallel 42 | res_1, res_2, res_3 = await asyncio.gather( 43 | Runner.run(spanish_agent, msg), 44 | Runner.run(spanish_agent, msg), 45 | Runner.run(spanish_agent, msg), 46 | ) 47 | 48 | outputs = [ 49 | ItemHelpers.text_message_outputs(res_1.new_items), 50 | ItemHelpers.text_message_outputs(res_2.new_items), 51 | ItemHelpers.text_message_outputs(res_3.new_items), 52 | ] 53 | 54 | translations = "\n\n".join(outputs) 55 | 56 | # Get the best translation 57 | best_translation = await Runner.run( 58 | translation_picker, 59 | f"Input: {msg}\n\nTranslations:\n{translations}", 60 | ) 61 | 62 | return { 63 | "input": msg, 64 | "translations": outputs, 65 | "best": best_translation.final_output 66 | } 67 | 68 | def calculate_bleu_score(reference, candidate): 69 | """ 70 | Calculate BLEU score for a translation compared to a reference. 
71 | 72 | Args: 73 | reference (str): The reference/expected translation 74 | candidate (str): The candidate translation to evaluate 75 | 76 | Returns: 77 | float: BLEU score between 0 and 100 78 | """ 79 | # Tokenize the sentences into words 80 | reference_tokens = reference.lower().split() 81 | candidate_tokens = candidate.lower().split() 82 | 83 | # Use smoothing to avoid 0 score when there are no n-gram matches 84 | smoothing = SmoothingFunction().method1 85 | 86 | # Calculate BLEU score with weights [0.25, 0.25, 0.25, 0.25] for 1-gram to 4-gram 87 | # and handle cases where reference might be too short for higher n-grams 88 | try: 89 | bleu_score = sentence_bleu([reference_tokens], candidate_tokens, 90 | weights=(0.25, 0.25, 0.25, 0.25), 91 | smoothing_function=smoothing) 92 | except Exception as e: 93 | print(f"Error calculating BLEU score: {e}") 94 | # Fall back to unigram BLEU if there's an issue 95 | bleu_score = sentence_bleu([reference_tokens], candidate_tokens, 96 | weights=(1, 0, 0, 0), 97 | smoothing_function=smoothing) 98 | 99 | # Convert from 0-1 scale to 0-100 scale 100 | return bleu_score * 100 101 | 102 | async def evaluate_on_dataset(dataset_name, num_examples): 103 | """Evaluate the agent on examples from the HuggingFace dataset""" 104 | print(f"Evaluating on HuggingFace dataset '{dataset_name}'...") 105 | 106 | # Load dataset from HuggingFace 107 | try: 108 | # Load the English-Spanish dataset from HuggingFace 109 | dataset = load_dataset("opus_books", "en-es", split=f"train[:{num_examples}]") 110 | df = pd.DataFrame(dataset) 111 | print(f"Found dataset with {len(df)} items") 112 | except Exception as e: 113 | print(f"Error retrieving dataset: {e}") 114 | return [] 115 | 116 | results = [] 117 | 118 | # Limit the number of examples to process 119 | items_to_process = min(num_examples, len(df)) 120 | print(f"Processing {items_to_process} examples...") 121 | 122 | for i, row in df.iterrows(): 123 | print(f"\n===== Processing example 
{i+1}/{items_to_process} =====") 124 | 125 | try: 126 | # Extract the English text from the dataset 127 | english_text = row["translation"]["en"] 128 | 129 | # Run the agent on this example 130 | print(f"English text: {english_text}") 131 | result = await run_translation(english_text) 132 | 133 | print(f"\nBest translation: {result['best']}") 134 | 135 | # Get the expected translation (Spanish text) 136 | expected_translation = row["translation"]["es"] 137 | 138 | # Compare the best translation with the expected translation 139 | if expected_translation: 140 | print(f"\nExpected translation: {expected_translation}") 141 | 142 | # Add comparison result to the result dictionary 143 | result['expected'] = expected_translation 144 | 145 | # Calculate BLEU score 146 | bleu = calculate_bleu_score(expected_translation, result['best']) 147 | print(f"BLEU score: {bleu:.2f}/100") 148 | result['bleu_score'] = bleu 149 | 150 | # Simple match check (could be enhanced with more sophisticated metrics) 151 | exact_match = result['best'] == expected_translation 152 | print(f"Exact match: {exact_match}") 153 | result['exact_match'] = exact_match 154 | 155 | else: 156 | print("\nNo expected translation found for comparison") 157 | result['expected'] = None 158 | result['exact_match'] = None 159 | result['bleu_score'] = None 160 | 161 | results.append(result) 162 | 163 | except Exception as e: 164 | print(f"Error processing item {i+1}: {e}") 165 | continue 166 | 167 | # Calculate and print overall performance metrics 168 | successful_comparisons = [r for r in results if r['exact_match'] is not None] 169 | if successful_comparisons: 170 | exact_matches = sum(1 for r in successful_comparisons if r['exact_match']) 171 | match_rate = exact_matches / len(successful_comparisons) 172 | 173 | # Calculate average BLEU score 174 | bleu_scores = [r['bleu_score'] for r in successful_comparisons if r['bleu_score'] is not None] 175 | avg_bleu = sum(bleu_scores) / len(bleu_scores) if bleu_scores 
else 0 176 | 177 | print(f"\nOverall performance:") 178 | print(f"Exact matches: {exact_matches}/{len(successful_comparisons)} ({match_rate:.2%})") 179 | print(f"Average BLEU score: {avg_bleu:.2f}/100") 180 | 181 | return results 182 | 183 | async def main(): 184 | # Load environment variables 185 | load_env() 186 | 187 | # Ensure NLTK packages needed for BLEU calculation are downloaded 188 | try: 189 | nltk.data.find('tokenizers/punkt') 190 | except LookupError: 191 | print("Downloading NLTK punkt tokenizer...") 192 | nltk.download('punkt') 193 | 194 | # Define the HuggingFace dataset name 195 | dataset_name = "opus_books" 196 | 197 | # Choose mode 198 | print("1. Interactive mode") 199 | print("2. Dataset evaluation mode") 200 | choice = input("Enter choice (1/2): ") 201 | 202 | if choice == "2": 203 | # Dataset evaluation mode 204 | num_examples = int(input("How many examples to evaluate? (1-50): ")) 205 | num_examples = min(50, max(1, num_examples)) 206 | 207 | results = await evaluate_on_dataset(dataset_name, num_examples) 208 | 209 | print("\n===== EVALUATION SUMMARY =====") 210 | for i, result in enumerate(results): 211 | print(f"Example {i+1}:") 212 | print(f"Input: {result['input']}") 213 | print(f"Best translation: {result['best']}") 214 | if result['expected']: 215 | print(f"Expected translation: {result['expected']}") 216 | print(f"Exact match: {result['exact_match']}") 217 | print(f"BLEU score: {result['bleu_score']:.2f}/100") 218 | print("-" * 40) 219 | else: 220 | # Interactive mode 221 | msg = input("Enter a message to translate to Spanish: ") 222 | result = await run_translation(msg) 223 | 224 | print("\nTranslation candidates:") 225 | for i, translation in enumerate(result["translations"]): 226 | print(f"{i+1}: {translation}") 227 | 228 | print(f"\nBest translation: {result['best']}") 229 | 230 | if __name__ == "__main__": 231 | asyncio.run(main()) -------------------------------------------------------------------------------- 
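Both evaluation scripts score candidates with NLTK's `sentence_bleu`. As a reference point, here is a dependency-free sketch (illustrative only; the scripts themselves keep using NLTK) of the clipped n-gram precision that BLEU averages over n = 1..4:

```python
from collections import Counter

def modified_ngram_precision(reference_tokens, candidate_tokens, n=1):
    """Fraction of candidate n-grams found in the reference, with counts
    clipped so repeating one matching word cannot inflate the score."""
    ref_counts = Counter(
        tuple(reference_tokens[i:i + n]) for i in range(len(reference_tokens) - n + 1)
    )
    cand_counts = Counter(
        tuple(candidate_tokens[i:i + n]) for i in range(len(candidate_tokens) - n + 1)
    )
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matches = sum(min(count, ref_counts[ngram]) for ngram, count in cand_counts.items())
    return matches / total

reference = "el gato está en la alfombra".split()
candidate = "el el gato en la alfombra".split()
print(modified_ngram_precision(reference, candidate, n=1))  # 5/6: the repeated 'el' is clipped
print(modified_ngram_precision(reference, candidate, n=2))  # 3/5
```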
/multi-agents-design-patterns/evaluator-optimizer.py: -------------------------------------------------------------------------------- 1 | from __future__ import annotations 2 | 3 | import asyncio 4 | from dataclasses import dataclass 5 | from typing import Literal 6 | 7 | from agents import Agent, ItemHelpers, Runner, TResponseInputItem, trace 8 | from dotenv import load_dotenv 9 | 10 | load_dotenv() 11 | 12 | """ 13 | This example shows the LLM as a judge pattern. The first agent generates an outline for a story. 14 | The second agent judges the outline and provides feedback. We loop until the judge is satisfied 15 | with the outline. 16 | """ 17 | 18 | story_outline_generator = Agent( 19 | name="story_outline_generator", 20 | instructions=( 21 | "You generate a very short story outline based on the user's input." 22 | "If there is any feedback provided, use it to improve the outline." 23 | ), 24 | ) 25 | 26 | 27 | @dataclass 28 | class EvaluationFeedback: 29 | feedback: str 30 | score: Literal["pass", "needs_improvement", "fail"] 31 | 32 | 33 | evaluator = Agent[None]( 34 | name="evaluator", 35 | instructions=( 36 | "You evaluate a story outline and decide if it's good enough." 37 | "If it's not good enough, you provide feedback on what needs to be improved." 38 | "Never give it a pass on the first try." 39 | ), 40 | output_type=EvaluationFeedback, 41 | ) 42 | 43 | 44 | async def main() -> None: 45 | msg = input("What kind of story would you like to hear? 
")
46 |     input_items: list[TResponseInputItem] = [{"content": msg, "role": "user"}]
47 |
48 |     latest_outline: str | None = None
49 |
50 |     # We'll run the entire workflow in a single trace
51 |     with trace("LLM as a judge"):
52 |         while True:
53 |             story_outline_result = await Runner.run(
54 |                 story_outline_generator,
55 |                 input_items,
56 |             )
57 |
58 |             input_items = story_outline_result.to_input_list()
59 |             latest_outline = ItemHelpers.text_message_outputs(story_outline_result.new_items)
60 |             print("Story outline generated")
61 |
62 |             evaluator_result = await Runner.run(evaluator, input_items)
63 |             result: EvaluationFeedback = evaluator_result.final_output
64 |
65 |             print(f"Evaluator score: {result.score}")
66 |
67 |             if result.score == "pass":
68 |                 print("Story outline is good enough, exiting.")
69 |                 break
70 |
71 |             print("Re-running with feedback")
72 |
73 |             input_items.append({"content": f"Feedback: {result.feedback}", "role": "user"})
74 |
75 |     print(f"Final story outline: {latest_outline}")
76 |
77 |
78 | if __name__ == "__main__":
79 |     asyncio.run(main())
--------------------------------------------------------------------------------
/multi-agents-design-patterns/orchestrator-workers.py:
--------------------------------------------------------------------------------
1 | from __future__ import annotations
2 |
3 | import asyncio
4 | from enum import Enum
5 | from typing import Any, Dict, List, Optional
6 | from pydantic import BaseModel, Field
7 |
8 | from agents import Agent, AgentHooks, RunContextWrapper, Runner, Tool, function_tool
9 | from dotenv import load_dotenv
10 |
11 | load_dotenv()
12 | class TaskType(str, Enum):
13 |     BACKEND = "backend"
14 |     FRONTEND = "frontend"
15 |     ANALYSIS = "analysis"
16 |     INTEGRATION = "integration"
17 |
18 |
19 | class Task(BaseModel):
20 |     """Represents a task that needs to be performed"""
21 |     task_id: str
22 |     description: str
23 |     type: TaskType
24 |     dependencies: List[str] = Field(default_factory=list)
25 | 
status: str = "pending" 26 | context: Dict[str, Any] = Field(default_factory=dict) # Additional context for workers 27 | file_paths: List[str] = Field(default_factory=list) # Files that need modification 28 | 29 | 30 | class TaskResult(BaseModel): 31 | """Result of a task execution""" 32 | task_id: str 33 | success: bool 34 | output: str 35 | error: Optional[str] = None 36 | 37 | 38 | class ProjectAnalysis(BaseModel): 39 | """Analysis of the project and required tasks""" 40 | tasks: List[Task] 41 | execution_order: List[str] 42 | 43 | 44 | class OrchestratorResult(BaseModel): 45 | """Result of the orchestration process""" 46 | all_tasks_completed: bool 47 | results: List[TaskResult] 48 | summary: str 49 | 50 | 51 | @function_tool 52 | async def analyze_project_requirements(description: str) -> ProjectAnalysis: 53 | """ 54 | Use LLM to dynamically analyze project requirements and break them down into tasks. 55 | The LLM determines necessary files to modify and creates appropriate subtasks. 56 | """ 57 | # This would be an actual LLM call to analyze the project 58 | prompt = f""" 59 | Analyze the following project requirements and break them down into specific tasks: 60 | {description} 61 | 62 | For each task: 63 | 1. Identify the type of work needed (backend/frontend/analysis/integration) 64 | 2. Determine which files need to be modified 65 | 3. Specify any dependencies between tasks 66 | 4. 
Provide detailed context for workers 67 | """ 68 | 69 | # Mock LLM response - in reality, this would be dynamic 70 | tasks = [ 71 | Task( 72 | task_id="analyze_auth_flow", 73 | description="Analyze current authentication flow and identify necessary changes", 74 | type=TaskType.ANALYSIS, 75 | file_paths=["auth/*", "models/user.py"], 76 | context={"focus_areas": ["security", "user flow"]} 77 | ), 78 | Task( 79 | task_id="update_api", 80 | description="Update API endpoints for new authentication flow", 81 | type=TaskType.BACKEND, 82 | dependencies=["analyze_auth_flow"], 83 | file_paths=["api/auth.py", "api/endpoints.py"], 84 | context={"api_version": "v2"} 85 | ), 86 | # ... more tasks ... 87 | ] 88 | return ProjectAnalysis(tasks=tasks, execution_order=["analyze_auth_flow", "update_api"]) 89 | 90 | 91 | @function_tool 92 | async def execute_backend_task(task: Dict[str, Any]) -> TaskResult: 93 | """Execute a backend-related task""" 94 | try: 95 | return TaskResult( 96 | task_id=task["description"].split(":")[0], # Extract task_id from description 97 | success=True, 98 | output=f"Completed backend task: {task['description']}", 99 | error=None 100 | ) 101 | except Exception as e: 102 | return TaskResult( 103 | task_id=task.get("task_id", "unknown"), 104 | success=False, 105 | output="", 106 | error=str(e) 107 | ) 108 | 109 | 110 | @function_tool 111 | async def execute_frontend_task(task: Dict[str, Any]) -> TaskResult: 112 | """Execute a frontend-related task""" 113 | try: 114 | return TaskResult( 115 | task_id=task["description"].split(":")[0], # Extract task_id from description 116 | success=True, 117 | output=f"Completed frontend task: {task['description']}", 118 | error=None 119 | ) 120 | except Exception as e: 121 | return TaskResult( 122 | task_id=task.get("task_id", "unknown"), 123 | success=False, 124 | output="", 125 | error=str(e) 126 | ) 127 | 128 | 129 | @function_tool 130 | async def synthesize_results(results: List[TaskResult]) -> str: 131 | """ 132 | Use 
LLM to analyze and synthesize results from multiple workers into a coherent summary 133 | """ 134 | # This would be an actual LLM call to synthesize results 135 | prompt = f""" 136 | Review the following task results and provide a coherent summary: 137 | {results} 138 | 139 | Include: 140 | 1. Overall success/failure analysis 141 | 2. Key changes made 142 | 3. Integration points to verify 143 | 4. Next steps or recommendations 144 | """ 145 | return "Synthesized summary of all work completed..." 146 | 147 | 148 | class Orchestrator: 149 | def __init__(self): 150 | self.workers = { 151 | TaskType.BACKEND: backend_worker, 152 | TaskType.FRONTEND: frontend_worker, 153 | TaskType.ANALYSIS: analysis_worker, 154 | TaskType.INTEGRATION: integration_worker 155 | } 156 | 157 | async def adjust_tasks(self, analysis: ProjectAnalysis, result: TaskResult) -> ProjectAnalysis: 158 | """ 159 | Dynamically adjust remaining tasks based on a task result 160 | """ 161 | prompt = f""" 162 | Based on the task result: 163 | {result} 164 | 165 | Analyze the current project state and determine if any tasks need to be: 166 | 1. Added 167 | 2. Modified 168 | 3. 
Removed
169 | 
170 |         Current tasks:
171 |         {analysis.tasks}
172 |         """
173 | 
174 |         # This would be an actual LLM call
175 |         # For now, return the original analysis
176 |         return analysis
177 | 
178 |     async def execute_task(self, task: Task) -> TaskResult:
179 |         """Execute a single task using the appropriate worker"""
180 |         worker = self.workers.get(task.type)
181 |         if not worker:
182 |             raise ValueError(f"No worker found for task type: {task.type}")
183 | 
184 |         try:
185 |             result = await Runner.run(
186 |                 worker,
187 |                 input={
188 |                     "description": task.description,
189 |                     "context": task.context,
190 |                     "file_paths": task.file_paths
191 |                 }
192 |             )
193 | 
194 |             # Runner.run returns a RunResult; the worker's declared
195 |             # output_type guarantees its final_output is a TaskResult
196 |             task_result: TaskResult = result.final_output
197 | 
198 |             return task_result
199 |         except Exception as e:
200 |             return TaskResult(
201 |                 task_id=task.task_id,
202 |                 success=False,
203 |                 output="",
204 |                 error=str(e)
205 |             )
206 | 
207 |     async def orchestrate(self, project_description: str) -> OrchestratorResult:
208 |         """Main orchestration logic"""
209 |         # Analyze project and break down into tasks
210 |         analysis = await analyze_project_requirements(project_description)
211 | 
212 |         # Execute tasks in order, respecting dependencies
213 |         results = []
214 |         completed_tasks = set()
215 | 
216 |         for task_id in analysis.execution_order:
217 |             task = next(t for t in analysis.tasks if t.task_id == task_id)
218 | 
219 |             # Check dependencies
220 |             if not all(dep in completed_tasks for dep in task.dependencies):
221 |                 continue
222 | 
223 |             # Execute task
224 |             result = await self.execute_task(task)
225 |             results.append(result)
226 |             completed_tasks.add(task_id)
227 | 
228 |             # Dynamically adjust remaining tasks based on result if needed
229 |             if result.success and getattr(result, "requires_task_adjustment", False):  # optional flag; TaskResult does not define it
230 |                 new_analysis = await self.adjust_tasks(analysis, result)
231 |                 analysis = new_analysis
232 | 
233 |         # Synthesize final results
234 | 
summary = await synthesize_results(results) 235 | 236 | return OrchestratorResult( 237 | all_tasks_completed=len(completed_tasks) == len(analysis.tasks), 238 | results=results, 239 | summary=summary 240 | ) 241 | 242 | 243 | # Create worker agents for different types of tasks 244 | backend_worker = Agent( 245 | name="Backend Worker", 246 | instructions="Execute backend-related tasks in the codebase", 247 | tools=[execute_backend_task], 248 | output_type=TaskResult 249 | ) 250 | 251 | frontend_worker = Agent( 252 | name="Frontend Worker", 253 | instructions="Execute frontend-related tasks in the codebase", 254 | tools=[execute_frontend_task], 255 | output_type=TaskResult 256 | ) 257 | 258 | # Create additional worker agents 259 | analysis_worker = Agent( 260 | name="Analysis Worker", 261 | instructions="Execute analysis-related tasks in the codebase", 262 | tools=[], # Add appropriate tools 263 | output_type=TaskResult 264 | ) 265 | 266 | integration_worker = Agent( 267 | name="Integration Worker", 268 | instructions="Execute integration-related tasks in the codebase", 269 | tools=[], # Add appropriate tools 270 | output_type=TaskResult 271 | ) 272 | 273 | # Create orchestrator agent 274 | orchestrator = Agent( 275 | name="Orchestrator", 276 | instructions=""" 277 | Analyze the project requirements and coordinate task execution: 278 | 1. Break down the project into tasks 279 | 2. Determine task dependencies and execution order 280 | 3. Delegate tasks to appropriate worker agents 281 | 4. Monitor task completion and handle failures 282 | 5. 
Provide a final summary of all completed work
283 |     """,
284 |     tools=[analyze_project_requirements],
285 |     output_type=OrchestratorResult,
286 |     handoffs=[backend_worker, frontend_worker]
287 | )
288 | 
289 | 
290 | async def execute_tasks(tasks: List[Task], execution_order: List[str]) -> List[TaskResult]:
291 |     """Execute tasks in the specified order"""
292 |     results = []
293 |     completed_tasks = set()
294 | 
295 |     for task_id in execution_order:
296 |         task = next(task for task in tasks if task.task_id == task_id)
297 | 
298 |         # Check if dependencies are met
299 |         if task.dependencies:
300 |             if not all(dep in completed_tasks for dep in task.dependencies):
301 |                 continue
302 | 
303 |         # Select appropriate worker based on task type (compare against the enum, not a bare string)
304 |         worker = backend_worker if task.type == TaskType.BACKEND else frontend_worker
305 | 
306 |         # Execute task
307 |         result = await Runner.run(
308 |             worker,
309 |             input=f"Execute task: {task.description}"
310 |         )
311 | 
312 |         results.append(result.final_output)  # unwrap the worker's TaskResult from the RunResult
313 |         completed_tasks.add(task_id)
314 | 
315 |     return results
316 | 
317 | 
318 | async def main():
319 |     try:
320 |         project_description = """
321 |         Update the user authentication system:
322 |         - Modify API endpoints to support new authentication flow
323 |         - Update data models to include new user properties
324 |         - Update UI components to show new user information
325 |         """
326 | 
327 |         orchestrator = Orchestrator()
328 |         result = await orchestrator.orchestrate(project_description)
329 | 
330 |         print("\nProject Execution Summary:")
331 |         print("-" * 50)
332 |         print(f"All tasks completed: {result.all_tasks_completed}")
333 |         print("\nTask Results:")
334 |         for task_result in result.results:
335 |             print(f"\nTask: {task_result.task_id}")
336 |             print(f"Success: {task_result.success}")
337 |             print(f"Output: {task_result.output}")
338 |             if task_result.error:
339 |                 print(f"Error: {task_result.error}")
340 | 
341 |         print("\nFinal Summary:")
342 |         print(result.summary)
343 | 
344 |     except Exception as e:
345 | 
print(f"Error in main execution: {str(e)}") 346 | raise 347 | 348 | 349 | if __name__ == "__main__": 350 | asyncio.run(main()) -------------------------------------------------------------------------------- /multi-agents-design-patterns/parallelization.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | from dotenv import load_dotenv 3 | 4 | from agents import Agent, ItemHelpers, Runner, trace 5 | 6 | """ 7 | This example shows the parallelization pattern. We run the agent three times in parallel, and pick 8 | the best result. 9 | """ 10 | 11 | def load_env(): 12 | """Load environment variables from .env file""" 13 | load_dotenv() 14 | 15 | spanish_agent = Agent( 16 | name="spanish_agent", 17 | instructions="You translate the user's message to Spanish", 18 | ) 19 | 20 | translation_picker = Agent( 21 | name="translation_picker", 22 | instructions="You pick the best Spanish translation from the given options.", 23 | ) 24 | 25 | 26 | async def main(): 27 | msg = input("Hi! 
Enter a message, and we'll translate it to Spanish.\n\n") 28 | 29 | # Ensure the entire workflow is a single trace 30 | with trace("Parallel translation"): 31 | res_1, res_2, res_3 = await asyncio.gather( 32 | Runner.run( 33 | spanish_agent, 34 | msg, 35 | ), 36 | Runner.run( 37 | spanish_agent, 38 | msg, 39 | ), 40 | Runner.run( 41 | spanish_agent, 42 | msg, 43 | ), 44 | ) 45 | 46 | outputs = [ 47 | ItemHelpers.text_message_outputs(res_1.new_items), 48 | ItemHelpers.text_message_outputs(res_2.new_items), 49 | ItemHelpers.text_message_outputs(res_3.new_items), 50 | ] 51 | 52 | translations = "\n\n".join(outputs) 53 | print(f"\n\nTranslations:\n\n{translations}") 54 | 55 | best_translation = await Runner.run( 56 | translation_picker, 57 | f"Input: {msg}\n\nTranslations:\n{translations}", 58 | ) 59 | 60 | print("\n\n-----") 61 | 62 | print(f"Best translation: {best_translation.final_output}") 63 | 64 | 65 | if __name__ == "__main__": 66 | load_env() 67 | asyncio.run(main()) -------------------------------------------------------------------------------- /multi-agents-design-patterns/prompt-chaining.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | from typing import Any, Optional 3 | from pydantic import BaseModel 4 | 5 | from agents import Agent, AgentHooks, RunContextWrapper, Runner, Tool, function_tool 6 | from dotenv import load_dotenv 7 | 8 | load_dotenv() 9 | 10 | class CustomAgentHooks(AgentHooks): 11 | def __init__(self, display_name: str): 12 | self.event_counter = 0 13 | self.display_name = display_name 14 | 15 | async def on_start(self, context: RunContextWrapper, agent: Agent) -> None: 16 | self.event_counter += 1 17 | print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started") 18 | 19 | async def on_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None: 20 | self.event_counter += 1 21 | print(f"### ({self.display_name}) {self.event_counter}: Agent 
{agent.name} ended with output {output}") 22 | 23 | async def on_handoff(self, context: RunContextWrapper, agent: Agent, source: Agent) -> None: 24 | self.event_counter += 1 25 | print(f"### ({self.display_name}) {self.event_counter}: Agent {source.name} handed off to {agent.name}") 26 | 27 | async def on_tool_start(self, context: RunContextWrapper, agent: Agent, tool: Tool) -> None: 28 | self.event_counter += 1 29 | print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started tool {tool.name}") 30 | 31 | async def on_tool_end(self, context: RunContextWrapper, agent: Agent, tool: Tool, result: str) -> None: 32 | self.event_counter += 1 33 | print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended tool {tool.name} with result {result}") 34 | 35 | 36 | class MarketingCopy(BaseModel): 37 | headline: str 38 | body: str 39 | call_to_action: str 40 | 41 | 42 | class TranslatedCopy(BaseModel): 43 | original: MarketingCopy 44 | translated_headline: str 45 | translated_body: str 46 | translated_call_to_action: str 47 | 48 | 49 | class ValidationResult(BaseModel): 50 | is_valid: bool 51 | feedback: str 52 | 53 | 54 | @function_tool 55 | def validate_marketing_copy(copy: MarketingCopy) -> ValidationResult: 56 | """ 57 | Validate marketing copy against common criteria. 
58 | """ 59 | feedback = [] 60 | is_valid = True 61 | 62 | # Check headline length 63 | if len(copy.headline) > 60: 64 | feedback.append("Headline is too long (should be under 60 characters)") 65 | is_valid = False 66 | 67 | # Check body length 68 | if len(copy.body) < 100: 69 | feedback.append("Body is too short (should be at least 100 characters)") 70 | is_valid = False 71 | 72 | # Check call to action 73 | if not any(cta in copy.call_to_action.lower() for cta in ["buy", "learn", "try", "get", "start"]): 74 | feedback.append("Call to action should include action words like 'buy', 'learn', 'try', etc.") 75 | is_valid = False 76 | 77 | return ValidationResult( 78 | is_valid=is_valid, 79 | feedback="\n".join(feedback) if feedback else "All checks passed!" 80 | ) 81 | 82 | 83 | @function_tool 84 | def translate_text(text: str, target_language: str) -> str: 85 | """ 86 | Translate text to target language. 87 | """ 88 | # This is a mock translation function 89 | # In a real implementation, you would use a translation API 90 | translations = { 91 | "es": { 92 | "Buy now": "Comprar ahora", 93 | "Learn more": "Más información", 94 | "Try it free": "Pruébalo gratis" 95 | } 96 | } 97 | return translations.get(target_language, {}).get(text, f"[Translated to {target_language}] {text}") 98 | 99 | 100 | # Create the agents for each step in the chain 101 | validation_agent = Agent( 102 | name="Validation Agent", 103 | instructions="Validate the marketing copy against common criteria and provide feedback.", 104 | tools=[validate_marketing_copy], 105 | output_type=ValidationResult, 106 | hooks=CustomAgentHooks(display_name="Validation Agent"), 107 | ) 108 | 109 | translation_agent = Agent( 110 | name="Translation Agent", 111 | instructions="Translate the validated marketing copy to the target language.", 112 | tools=[translate_text], 113 | output_type=TranslatedCopy, 114 | hooks=CustomAgentHooks(display_name="Translation Agent"), 115 | ) 116 | 117 | marketing_agent = Agent( 118 | 
name="Marketing Agent",
119 |     instructions="Create compelling marketing copy with a headline, body, and call to action.",
120 |     output_type=MarketingCopy,
121 |     handoffs=[validation_agent],
122 |     hooks=CustomAgentHooks(display_name="Marketing Agent"),
123 | )
124 | 
125 | 
126 | async def main() -> None:
127 |     # Example workflow: Create marketing copy -> Validate -> Translate
128 |     product_description = input("Enter product description: ")
129 |     target_language = input("Enter target language (e.g., 'es' for Spanish): ")
130 | 
131 |     # Start the chain with the marketing agent
132 |     result = await Runner.run(
133 |         marketing_agent,
134 |         input=f"Create marketing copy for: {product_description}",
135 |     )
136 | 
137 |     # Check validation result (pass the structured output, not the whole RunResult)
138 |     validation_result = await Runner.run(
139 |         validation_agent,
140 |         input=f"Validate this marketing copy: {result.final_output}",
141 |     )
142 | 
143 |     # Access the final_output property of RunResult
144 |     if not validation_result.final_output.is_valid:
145 |         print(f"Validation failed:\n{validation_result.final_output.feedback}")
146 |         return
147 | 
148 |     # If validation passes, proceed with translation
149 |     translation_result = await Runner.run(
150 |         translation_agent,
151 |         input=f"Translate this marketing copy to {target_language}: {result.final_output}",
152 |     )
153 | 
154 |     print("\nFinal Results:")
155 |     print(f"Original Headline: {translation_result.final_output.original.headline}")
156 |     print(f"Translated Headline: {translation_result.final_output.translated_headline}")
157 |     print(f"\nOriginal Body: {translation_result.final_output.original.body}")
158 |     print(f"Translated Body: {translation_result.final_output.translated_body}")
159 |     print(f"\nOriginal CTA: {translation_result.final_output.original.call_to_action}")
160 |     print(f"Translated CTA: {translation_result.final_output.translated_call_to_action}")
161 | 
162 | 
163 | if __name__ == "__main__":
164 |     asyncio.run(main())
-------------------------------------------------------------------------------- /multi-agents-design-patterns/workflow-routing.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import uuid 3 | 4 | from openai.types.responses import ResponseContentPartDoneEvent, ResponseTextDeltaEvent 5 | 6 | from agents import Agent, RawResponsesStreamEvent, Runner, TResponseInputItem, trace 7 | 8 | from dotenv import load_dotenv 9 | 10 | 11 | """ 12 | This example shows the handoffs/routing pattern. The triage agent receives the first message, and 13 | then hands off to the appropriate agent based on the language of the request. Responses are 14 | streamed to the user. 15 | """ 16 | 17 | french_agent = Agent( 18 | name="french_agent", 19 | instructions="You only speak French", 20 | ) 21 | 22 | spanish_agent = Agent( 23 | name="spanish_agent", 24 | instructions="You only speak Spanish", 25 | ) 26 | 27 | english_agent = Agent( 28 | name="english_agent", 29 | instructions="You only speak English", 30 | ) 31 | 32 | triage_agent = Agent( 33 | name="triage_agent", 34 | instructions="Handoff to the appropriate agent based on the language of the request.", 35 | handoffs=[french_agent, spanish_agent, english_agent], 36 | ) 37 | 38 | 39 | async def main(): 40 | # We'll create an ID for this conversation, so we can link each trace 41 | conversation_id = str(uuid.uuid4().hex[:16]) 42 | 43 | msg = input("Hi! We speak French, Spanish and English. How can I help? ") 44 | agent = triage_agent 45 | inputs: list[TResponseInputItem] = [{"content": msg, "role": "user"}] 46 | 47 | while True: 48 | # Each conversation turn is a single trace. 
Normally, each input from the user would be an 49 | # API request to your app, and you can wrap the request in a trace() 50 | with trace("Routing example", group_id=conversation_id): 51 | result = Runner.run_streamed( 52 | agent, 53 | input=inputs, 54 | ) 55 | async for event in result.stream_events(): 56 | if not isinstance(event, RawResponsesStreamEvent): 57 | continue 58 | data = event.data 59 | if isinstance(data, ResponseTextDeltaEvent): 60 | print(data.delta, end="", flush=True) 61 | elif isinstance(data, ResponseContentPartDoneEvent): 62 | print("\n") 63 | 64 | inputs = result.to_input_list() 65 | print("\n") 66 | 67 | user_msg = input("Enter a message: ") 68 | inputs.append({"content": user_msg, "role": "user"}) 69 | agent = result.current_agent 70 | 71 | 72 | if __name__ == "__main__": 73 | asyncio.run(main()) -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | openai>=1.12.0 2 | pydantic>=2.0.0 3 | python-dotenv>=1.0.0 4 | typing-extensions>=4.5.0 5 | openai-agents 6 | opentelemetry-sdk 7 | opentelemetry-exporter-otlp-proto-http 8 | langfuse 9 | nest_asyncio 10 | pydantic-ai[logfire] 11 | datasets 12 | nltk>=3.8.1 -------------------------------------------------------------------------------- /simple-agent.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import random 3 | from typing import Any 4 | from dotenv import load_dotenv 5 | import os 6 | 7 | from pydantic import BaseModel 8 | 9 | from agents import Agent, AgentHooks, RunContextWrapper, Runner, Tool, function_tool 10 | 11 | 12 | def load_env(): 13 | """Load environment variables from .env file.""" 14 | load_dotenv() 15 | # You can add validation here if needed 16 | required_vars = ['OPENAI_API_KEY'] 17 | for var in required_vars: 18 | if not os.getenv(var): 19 | raise ValueError(f"Missing required 
environment variable: {var}") 20 | 21 | 22 | class CustomAgentHooks(AgentHooks): 23 | def __init__(self, display_name: str): 24 | self.event_counter = 0 25 | self.display_name = display_name 26 | 27 | async def on_start(self, context: RunContextWrapper, agent: Agent) -> None: 28 | self.event_counter += 1 29 | print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started") 30 | 31 | async def on_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None: 32 | self.event_counter += 1 33 | print( 34 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended with output {output}" 35 | ) 36 | 37 | async def on_handoff(self, context: RunContextWrapper, agent: Agent, source: Agent) -> None: 38 | self.event_counter += 1 39 | print( 40 | f"### ({self.display_name}) {self.event_counter}: Agent {source.name} handed off to {agent.name}" 41 | ) 42 | 43 | async def on_tool_start(self, context: RunContextWrapper, agent: Agent, tool: Tool) -> None: 44 | self.event_counter += 1 45 | print( 46 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started tool {tool.name}" 47 | ) 48 | 49 | async def on_tool_end( 50 | self, context: RunContextWrapper, agent: Agent, tool: Tool, result: str 51 | ) -> None: 52 | self.event_counter += 1 53 | print( 54 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended tool {tool.name} with result {result}" 55 | ) 56 | 57 | 58 | ### 59 | 60 | 61 | @function_tool 62 | def random_number(max: int) -> int: 63 | """ 64 | Generate a random number up to the provided maximum. 
65 |     """
66 |     return random.randint(0, max)
67 | 
68 | 
69 | @function_tool
70 | def multiply_by_two(x: int) -> int:
71 |     """Simple multiplication by two."""
72 |     return x * 2
73 | 
74 | 
75 | class FinalResult(BaseModel):
76 |     number: int
77 | 
78 | 
79 | multiply_agent = Agent(
80 |     name="Multiply Agent",
81 |     instructions="Multiply the number by 2 and then return the final result.",
82 |     tools=[multiply_by_two],
83 |     output_type=FinalResult,
84 |     hooks=CustomAgentHooks(display_name="Multiply Agent"),
85 | )
86 | 
87 | start_agent = Agent(
88 |     name="Start Agent",
89 |     instructions="Generate a random number. If it's even, stop. If it's odd, hand off to the multiply agent.",
90 |     tools=[random_number],
91 |     output_type=FinalResult,
92 |     handoffs=[multiply_agent],
93 |     hooks=CustomAgentHooks(display_name="Start Agent"),
94 | )
95 | 
96 | 
97 | async def main() -> None:
98 |     load_env()  # Load environment variables
99 |     user_input = input("Enter a max number: ")
100 |     await Runner.run(
101 |         start_agent,
102 |         input=f"Generate a random number between 0 and {user_input}.",
103 |     )
104 | 
105 |     print("Done!")
106 | 
107 | 
108 | if __name__ == "__main__":
109 |     asyncio.run(main())
110 | """
111 | $ python simple-agent.py
112 | 
113 | Enter a max number: 250
114 | ### (Start Agent) 1: Agent Start Agent started
115 | ### (Start Agent) 2: Agent Start Agent started tool random_number
116 | ### (Start Agent) 3: Agent Start Agent ended tool random_number with result 37
117 | ### (Start Agent) 4: Agent Start Agent started
118 | ### (Start Agent) 5: Agent Start Agent handed off to Multiply Agent
119 | ### (Multiply Agent) 1: Agent Multiply Agent started
120 | ### (Multiply Agent) 2: Agent Multiply Agent started tool multiply_by_two
121 | ### (Multiply Agent) 3: Agent Multiply Agent ended tool multiply_by_two with result 74
122 | ### (Multiply Agent) 4: Agent Multiply Agent started
123 | ### (Multiply Agent) 5: Agent Multiply Agent ended with output number=74
124 | 
Done! 125 | """ -------------------------------------------------------------------------------- /simple-react-agent.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import random 3 | from typing import Any 4 | from dotenv import load_dotenv 5 | import os 6 | from pydantic import BaseModel 7 | from agents import Agent, AgentHooks, RunContextWrapper, Runner, Tool, function_tool 8 | 9 | 10 | def load_env(): 11 | """Load environment variables from .env file.""" 12 | load_dotenv() 13 | # You can add validation here if needed 14 | required_vars = ['OPENAI_API_KEY'] 15 | for var in required_vars: 16 | if not os.getenv(var): 17 | raise ValueError(f"Missing required environment variable: {var}") 18 | 19 | 20 | class CustomAgentHooks(AgentHooks): 21 | def __init__(self, display_name: str): 22 | self.event_counter = 0 23 | self.display_name = display_name 24 | 25 | async def on_start(self, context: RunContextWrapper, agent: Agent) -> None: 26 | self.event_counter += 1 27 | print(f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started") 28 | 29 | async def on_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None: 30 | self.event_counter += 1 31 | print( 32 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended with output {output}" 33 | ) 34 | 35 | async def on_handoff(self, context: RunContextWrapper, agent: Agent, source: Agent) -> None: 36 | self.event_counter += 1 37 | print( 38 | f"### ({self.display_name}) {self.event_counter}: Agent {source.name} handed off to {agent.name}" 39 | ) 40 | 41 | async def on_tool_start(self, context: RunContextWrapper, agent: Agent, tool: Tool) -> None: 42 | self.event_counter += 1 43 | print( 44 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} started tool {tool.name}" 45 | ) 46 | 47 | async def on_tool_end( 48 | self, context: RunContextWrapper, agent: Agent, tool: Tool, result: str 49 | ) -> None: 
50 | self.event_counter += 1 51 | print( 52 | f"### ({self.display_name}) {self.event_counter}: Agent {agent.name} ended tool {tool.name} with result {result}" 53 | ) 54 | 55 | 56 | ### 57 | 58 | 59 | @function_tool 60 | def random_number(max: int) -> int: 61 | """ 62 | Generate a random number up to the provided maximum. 63 | """ 64 | return random.randint(0, max) 65 | 66 | 67 | @function_tool 68 | def multiply_by_two(x: int) -> int: 69 | """Simple multiplication by two.""" 70 | return x * 2 71 | 72 | 73 | @function_tool 74 | def is_even(x: int) -> bool: 75 | """Check if a number is even.""" 76 | return x % 2 == 0 77 | 78 | 79 | class FinalResult(BaseModel): 80 | original_number: int 81 | multiplied_number: int 82 | 83 | 84 | multiply_agent = Agent( 85 | name="Multiply Agent", 86 | instructions=""" 87 | You have received an even number from the Find Even Agent. 88 | 1. Take the even number provided. 89 | 2. Multiply it by 2 using the multiply_by_two tool. 90 | 3. Return both the original even number and the multiplied result. 91 | """, 92 | tools=[multiply_by_two], 93 | output_type=FinalResult, 94 | hooks=CustomAgentHooks(display_name="Multiply Agent"), 95 | ) 96 | 97 | find_even_agent = Agent( 98 | name="Find Even Agent", 99 | instructions=""" 100 | Your task is to find an even number and then hand it off to the Multiply Agent. 101 | 102 | 1. Generate a random number using the random_number tool. 103 | 2. Check if the number is even using the is_even tool. 104 | 3. If the number is odd, generate another random number and repeat until you find an even number. 105 | 4. Once you find an even number, hand off to the Multiply Agent. 106 | 107 | Important: Be efficient and don't waste turns. If you get an odd number, immediately try again without unnecessary reasoning. 
108 |     """,
109 |     tools=[random_number, is_even],
110 |     output_type=FinalResult,
111 |     handoffs=[multiply_agent],
112 |     hooks=CustomAgentHooks(display_name="Find Even Agent"),
113 | )
114 | 
115 | 
116 | async def main() -> None:
117 |     load_env()  # Load environment variables
118 |     user_input = input("Enter a max number: ")
119 |     try:
120 |         result = await Runner.run(
121 |             find_even_agent,
122 |             input=f"Generate random numbers between 0 and {user_input} until you find an even one, then multiply it by 2.",
123 |             max_turns=20,  # Increased max turns
124 |         )
125 |         print(f"Final result: {result.final_output}")  # print the structured FinalResult, not the RunResult wrapper
126 |     except Exception as e:
127 |         print(f"Error occurred: {str(e)}")
128 |         print("Try running again with a different input number.")
129 |     finally:
130 |         print("Done!")
131 | 
132 | 
133 | if __name__ == "__main__":
134 |     asyncio.run(main())
--------------------------------------------------------------------------------