├── .env
├── .gitignore
├── LICENSE
├── README.md
├── gen_answers.py
├── requirements.txt
├── run_evol.py
├── src
    ├── __init__.py
    ├── analyzers
    │   ├── __init__.py
    │   ├── base_analyzer.py
    │   └── trajectory_analyzer.py
    ├── autoevol.py
    ├── evaluator
    │   ├── __init__.py
    │   ├── base_evaluator.py
    │   ├── failure_detector_evaluator.py
    │   └── reward_model_evaluator.py
    ├── evolvers
    │   ├── __init__.py
    │   ├── base_evolver.py
    │   └── recurrent_evolver.py
    ├── generators
    │   ├── __init__.py
    │   ├── base_generator.py
    │   ├── openai.py
    │   ├── openrouter.py
    │   └── vllm.py
    ├── optimizers
    │   ├── __init__.py
    │   ├── base_optimizer.py
    │   └── evol_optimizer.py
    └── utils.py
├── tests
    ├── analyzers
    │   └── test_trajectory.py
    ├── evolvers
    │   └── test_recurrent.py
    ├── generators
    │   ├── test_openai.py
    │   └── test_openrouter.py
    ├── optimizers
    │   └── test_wizard.py
    └── test_full_flow.py
└── visualize_evol_instruct.ipynb

/.env:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arcee-ai/EvolKit/aa85caae55b4464c7bc1f8dca0c55e7dbdcbd6ee/.env

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*__pycache__*
*.json
.pytest_cache
.DS_Store

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------

The MIT License (MIT)

Copyright (c) 2024 Arcee.ai

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# EvolKit

EvolKit is a framework for automatically enhancing the complexity of instructions used in fine-tuning Large Language Models (LLMs). The project runs the evolution process on open-source LLMs rather than closed-source alternatives.

## Key Features

- Automatic instruction complexity enhancement
- Integration with open-source LLMs
- Streamlined fine-tuning process
- Support for various datasets from Hugging Face
- Flexible configuration options for optimization

## Installation

To set up EvolKit, follow these steps:

1. Clone the repository:

   ```
   git clone https://github.com/arcee-ai/EvolKit.git
   cd EvolKit
   ```

2. Install the required dependencies:

   ```
   pip install -r requirements.txt
   ```

## Usage

To run the AutoEvol script, use the following command structure:

```
python run_evol.py --dataset [options]
```

### Required Parameters:

- `--dataset`: The name of the dataset on Hugging Face to use.
- `--model`: Model to use for evolving instructions.
- `--generator`: Type of generator to use ('openrouter' or 'vllm').
- `--batch_size`: Number of instructions to process in each batch.
- `--num_methods`: Number of evolution methods to use.
- `--max_concurrent_batches`: Maximum number of batches to process concurrently (in our experiment, a cluster of 8xH100 hosting Qwen2-72B-Instruct-GPTQ-Int8 can handle a batch size of 50 concurrently).
- `--evolve_epoch`: Maximum number of epochs for evolving each instruction.
- `--output_file`: Name of the output file to save results.

### Optional Parameters:

- `--dev_set_size`: Number of samples to use in the development set. Use -1 for no dev set. Default is -1. (We do not recommend using a dev set, since each round then takes much longer to finish.)
- `--use_reward_model`: Flag to use a reward model for evaluation. No value required.

### Models

We found 2 models that work very well with this pipeline:
- Qwen2-72B-Instruct and DeepSeek-V2.5 (GPTQ and AWQ versions are fine too).
- Other models might work, but they have to be very good at generating structured content, because the pipeline parses the model output.

### VLLM Support

To use VLLM as the backend, set the `VLLM_BACKEND` environment variable:

```
export VLLM_BACKEND=http://your-vllm-backend-url:port/v1
```

If not set, it will default to 'http://localhost:8000/v1'.

### Example Usage:

To run AutoEvol on the 'small_tomb' dataset with custom parameters:

```
python run_evol.py --dataset qnguyen3/small_tomb --model Qwen/Qwen2-72B-Instruct-GPTQ-Int8 --generator vllm --batch_size 100 --num_methods 3 --max_concurrent_batches 10 --evolve_epoch 3 --output_file the_tomb_evolved-3e-batch100.json --dev_set_size 5 --use_reward_model
```

This command will:
1. Load the 'qnguyen3/small_tomb' dataset from Hugging Face.
2. Use the Qwen2-72B-Instruct model with VLLM as the generator.
3. Process samples in batches of 100.
4. Apply 3 evolution methods for each instruction.
5. Process 10 batches concurrently.
6. Evolve each instruction for up to 3 epochs.
7. Use 5 samples for the development set.
8. Use the reward model for evaluation.
9. Output the final evolved instructions to the_tomb_evolved-3e-batch100.json.

After evolving the instructions, you can generate answers using:

```
python gen_answers.py --model Qwen/Qwen2-72B-Instruct-GPTQ-Int8 --generator vllm --data_path the_tomb_evolved-3e-batch100.json --batch_size 50 --output completed_evol_data.json
```

The final dataset will be saved to completed_evol_data.json in ShareGPT format.

## Components

EvolKit consists of several key components:

- **Generator**: Uses an LLM for generating initial instructions (OpenRouter or VLLM).
- **Evolver**: Employs a recurrent evolution strategy.
- **Analyzer**: Utilizes trajectory analysis.
- **Evaluator**: Offers two options:
  - Reward Model Evaluator
  - Failure Detector Evaluator (originally from WizardLM's paper)
- **Optimizer**: Optimizes the evolution method for the next round.

## Output

The script saves the results in JSON format to the specified output file. Each entry in the JSON file represents an evolved instruction along with relevant metadata.

Find a 20k subset of a dataset generated using EvolKit [here](https://huggingface.co/datasets/arcee-ai/EvolKit-20k).

## Acknowledgement
- Microsoft's WizardLM team for the inspiration from the [AutoEvol paper](https://arxiv.org/pdf/2406.00770).
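
As an illustration of the Output section, the sketch below wraps evolved instructions in one-turn ShareGPT-style records, the `conversations` shape that `gen_answers.py` reads. It assumes each result entry carries the `original_instruction` and `final_instruction` fields written by `AutoEvol.process_instruction` in this repository; the helper name `to_sharegpt_prompts` is hypothetical and not part of EvolKit.

```python
import json
from typing import Any, Dict, List


def to_sharegpt_prompts(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap each evolved instruction in a one-turn ShareGPT-style record.

    Hypothetical helper: the field names are assumptions taken from the
    result dictionaries built in src/autoevol.py.
    """
    records = []
    for entry in results:
        # Fall back to the original instruction when no evolved one was stored.
        instruction = entry.get("final_instruction") or entry["original_instruction"]
        records.append({"conversations": [{"from": "human", "value": instruction}]})
    return records


if __name__ == "__main__":
    # Made-up sample entry, shaped like one element of the run_evol.py output.
    sample = [
        {
            "original_instruction": "Add 2 and 3.",
            "stages": [],
            "final_instruction": "Add 2 and 3, then explain each step.",
        }
    ]
    print(json.dumps(to_sharegpt_prompts(sample), indent=2))
```

In practice the input list would come from `json.load` on the file passed as `--output_file`, and the returned records could be written to a new JSON file for `--data_path`.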

--------------------------------------------------------------------------------
/gen_answers.py:
--------------------------------------------------------------------------------
import asyncio
import json
import argparse
from typing import List, Dict
from src.generators import OpenRouterGenerator, VLLMGenerator, BaseGenerator
from datasets import load_dataset
from tqdm import tqdm
from os import getenv

async def process_batch(generator: BaseGenerator, batch: List[str], system_prompt: str) -> List[Dict]:
    # Build one coroutine per instruction. Errors only surface when the
    # coroutines are awaited, so they are collected by gather() below.
    tasks = [generator.agenerate(item, system_prompt, temperature=0.5) for item in batch]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    processed_items = []
    for item, result in zip(batch, results):
        # A failed generation is recorded with the placeholder answer 'error'.
        if isinstance(result, BaseException):
            result = 'error'
        processed_item = {
            'conversations': [
                {"from": "human", "value": item},
                {"from": "gpt", "value": result}
            ]
        }
        processed_items.append(processed_item)

    return processed_items

async def process_data(model: str, generator_str: str, file_path: str, batch_size: int, output_file: str):
    # Initialize the generator (VLLM or OpenRouter)
    generator = (
        VLLMGenerator(model=model, base_url=getenv('VLLM_BACKEND') or 'http://localhost:8000/v1')
        if generator_str == 'vllm'
        else OpenRouterGenerator(model=model))

    # Load data
    if '.json' not in file_path:
        data = load_dataset(file_path)['train']  # Assuming the main split is named 'train'
    else:
        with open(file_path, 'r') as file:
            data = json.load(file)

    start_idx = 0

    # Calculate total number of batches
    total_batches = (len(data) + batch_size - start_idx - 1) // batch_size

    # Take the first human turn of each conversation as the instruction
    instructions = []
    for sample in data:
        convo = sample['conversations']
        if convo[0]['from'] == 'human':
            user = convo[0]['value']
        else:
            user = convo[1]['value']

        instructions.append(user)

    # Process in batches
    all_results = []
    with tqdm(total=total_batches, desc="Processing batches") as pbar:
        for i in range(0, len(data), batch_size):
            batch = instructions[i+start_idx:i+batch_size+start_idx]
            processed_batch = await process_batch(generator, batch, "You are a helpful assistant. Answer the question from the user. Give full solution and explanation.")

            # Extend results and save
            all_results.extend(processed_batch)

            # Save and overwrite for each batch
            with open(output_file, 'w') as f:
                json.dump(all_results, f, indent=2, ensure_ascii=False)

            pbar.update(1)
            # Pause between batches without blocking the event loop
            await asyncio.sleep(5)

def main():
    parser = argparse.ArgumentParser(description="Generate answers for evolved instructions")
    parser.add_argument("--model", type=str, required=True, help="Model used to generate answers.")
    parser.add_argument("--generator", type=str, required=True, choices=['openrouter', 'vllm'], help="Type of generator to use.")
    parser.add_argument("--data_path", required=True, help="Path to the JSON file or Hugging Face dataset repo")
    parser.add_argument("--batch_size", type=int, default=10, help="Batch size for processing")
    parser.add_argument("--output", default="final_evolved_data.json", help="Output file path")

    args = parser.parse_args()

    asyncio.run(process_data(args.model, args.generator, args.data_path, args.batch_size, args.output))

if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
asyncio
accelerate
transformers
datasets
torch
protobuf
pytest
pytest-asyncio
openai
sentencepiece
einops

--------------------------------------------------------------------------------
/run_evol.py:
--------------------------------------------------------------------------------
import time
import json
import asyncio
import argparse
from datasets import load_dataset
from src.generators import OpenRouterGenerator, VLLMGenerator
from src.evolvers import RecurrentEvolver
from src.analyzers import TrajectoryAnalyzer
from src.evaluator import FailureDetectorEvaluator, RewardModelEvaluator
from src.optimizers.evol_optimizer import EvolOptimizer
from src import AutoEvol
from os import getenv

def load_and_process_dataset(dataset_name, dev_set_size=5):
    # Load the dataset from Hugging Face
    dataset = load_dataset(dataset_name)

    # Assuming the dataset has a 'train' split. Adjust if needed.
    if 'train' not in dataset:
        raise ValueError(f"The dataset {dataset_name} does not have a 'train' split.")

    full_dataset = dataset['train']

    # Shuffle the dataset
    full_dataset = full_dataset.shuffle(seed=42)

    # Ensure dev_set_size is not larger than the dataset
    if dev_set_size > len(full_dataset):
        raise ValueError(f"Specified dev set size ({dev_set_size}) is larger than the dataset size ({len(full_dataset)})")

    # Keep the first human turn of each conversation; conversations whose
    # first turn is a system prompt are skipped.
    full_filtered = []
    for sample in full_dataset['conversations']:
        for turn in sample:
            if turn['from'] == 'system':
                break
            if turn['from'] == 'human':
                full_filtered.append(turn['value'])
                break

    if dev_set_size != -1:
        # Split the dataset
        train_instructions = full_filtered[dev_set_size:]
        dev_instructions = full_filtered[:dev_set_size]

        return train_instructions, dev_instructions
    else:
        return full_filtered, []

async def save_results(results, output_file):
    with open(output_file, 'w') as f:
        json.dump(results, f, indent=2, ensure_ascii=False)

async def main():
    parser = argparse.ArgumentParser(description="Run AutoEvol with specified parameters")
    parser.add_argument("--dataset", required=True, help="Name of the dataset on Hugging Face")
    parser.add_argument("--model", type=str, required=True, help="Model used to evolve instructions.")
    parser.add_argument("--generator", type=str, required=True, choices=['openrouter', 'vllm'], help="Type of generator to use.")
    parser.add_argument("--batch_size", type=int, required=True, help="Batch size for processing")
    parser.add_argument("--num_methods", type=int, required=True, help="Number of methods to use")
    parser.add_argument("--max_concurrent_batches", type=int, required=True, help="Maximum number of concurrent batches")
    parser.add_argument("--evolve_epoch", type=int, required=True, help="Maximum number of epochs for each instruction")
    parser.add_argument("--output_file", type=str, required=True, help="Name of output file")

    # Optional arguments
    parser.add_argument("--dev_set_size", type=int, default=-1, help="Maximum samples for dev set. Use -1 for no dev set.")
    parser.add_argument("--use_reward_model", action="store_true", help="Use reward model for evaluation")

    args = parser.parse_args()

    # Load and process the dataset
    train_set, dev_set = load_and_process_dataset(args.dataset, args.dev_set_size)

    generator = (
        VLLMGenerator(model=args.model, base_url=getenv('VLLM_BACKEND') or 'http://localhost:8000/v1')
        if args.generator == 'vllm'
        else OpenRouterGenerator(model=args.model)
    )

    components = {
        'generator': generator,
        'evolver': RecurrentEvolver(generator),
        'analyzer': TrajectoryAnalyzer(generator),
        'evaluator': RewardModelEvaluator() if args.use_reward_model else FailureDetectorEvaluator(),
        'dev_set': dev_set
    }
    components['optimizer'] = EvolOptimizer(generator, components['evaluator'])

    auto_evol = AutoEvol(components)

    print(f"Dataset: {args.dataset}")
    if args.dev_set_size != -1:
        print(f"Train set size: {len(train_set)}, Dev set size: {len(dev_set)}")
    else:
        print(f"Train set size: {len(train_set)}")
    print(f"Batch size: {args.batch_size}")
    print(f"Number of methods: {args.num_methods}")
    print(f"Max concurrent batches: {args.max_concurrent_batches}")

    start_time = time.time()

    output_file = args.output_file
    all_results = []

    total_batches = (len(train_set) + args.batch_size - 1) // args.batch_size  # Calculate total number of batches

    for i in range(0, len(train_set), args.batch_size):
        batch = train_set[i:i+args.batch_size]
        batch_results = await auto_evol.run(batch, batch_size=args.batch_size, num_methods=args.num_methods, max_concurrent_batches=args.max_concurrent_batches, evolve_epoch=args.evolve_epoch)
        all_results.extend(batch_results)

        current_batch = i // args.batch_size + 1
        print(f"Done batch {current_batch}/{total_batches}")
        print(f"Batch {current_batch} completed. Saving results...")
        await save_results(all_results, output_file)

    end_time = time.time()
    total_time = end_time - start_time

    print(f"Total execution time: {total_time:.2f} seconds")
    print(f"Final results saved to {output_file}")

if __name__ == "__main__":
    asyncio.run(main())

--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
from .generators import *
from .autoevol import AutoEvol

--------------------------------------------------------------------------------
/src/analyzers/__init__.py:
--------------------------------------------------------------------------------
from .base_analyzer import BaseAnalyzer
from .trajectory_analyzer import TrajectoryAnalyzer

--------------------------------------------------------------------------------
/src/analyzers/base_analyzer.py:
--------------------------------------------------------------------------------
from abc import ABC, abstractmethod

class BaseAnalyzer(ABC):
    @abstractmethod
    def analyze(self, current_method, feedback):
        pass

--------------------------------------------------------------------------------
/src/analyzers/trajectory_analyzer.py:
--------------------------------------------------------------------------------
import asyncio
from typing import List

from .base_analyzer import BaseAnalyzer
from src.generators import BaseGenerator

TRAJECTORY_ANALYZER_SYSTEM_PROMPT = """
You are an expert at analyzing the evolution of a given instruction. You will look at the trajectory of the evolution from an initial instruction and give feedback based on how the complexity is being increased in each stage.
"""

TRAJECTORY_ANALYZER_PROMPT = """
The following list shows cases where an Instruction evolves into a more complex version of an Instruction.
For each case, stage 0 represents the Instruction in its initial state, and stage 1 requires an increase in complexity based on the previous stage.

Please identify cases that failed to evolve, and provide the reason why each one failed.

Please strictly output using the following format, do not add anything else to the response:

***FORMAT INSTRUCTION***
Choose one of the two options:
Option 1 - If all cases are evolving correctly, please strictly output:
### PASSED

Option 2 - If you identify cases that did not evolve correctly, please strictly output:
### FAILED - Reason: [reason_of_fail]
and so on...
***END OF FORMAT INSTRUCTION***

Evolution Trajectory:
{{evol_trajectory}}
"""

class TrajectoryAnalyzer(BaseAnalyzer):
    def __init__(self, generator: BaseGenerator) -> None:
        self.generator = generator

    async def analyze_async(self, init_instruction: str, evolved_instructions: List[str]) -> List[str]:
        async def generate_single(evolved_instruction):
            trajectory_str = f"""
            Stage 0: {init_instruction}
            Stage 1: {evolved_instruction}
            """
            trajectory_prompt = TRAJECTORY_ANALYZER_PROMPT.replace('{{evol_trajectory}}', trajectory_str)
            feedback = await self.generator.agenerate(prompt=trajectory_prompt, system_prompt=TRAJECTORY_ANALYZER_SYSTEM_PROMPT, temperature=0.2)

            return feedback

        # One analysis request per evolved candidate, run concurrently
        tasks = [generate_single(evolved_instruction) for evolved_instruction in evolved_instructions]
        results = await asyncio.gather(*tasks)

        return results

    def analyze(self, init_instruction: str, evolved_instructions: List[str]) -> List[str]:
        return asyncio.run(self.analyze_async(init_instruction, evolved_instructions))

--------------------------------------------------------------------------------
/src/autoevol.py:
--------------------------------------------------------------------------------
import asyncio
import time
from typing import List, Dict, Any
from src.evolvers.recurrent_evolver import INITIAL_EVOLVE_METHOD
from .utils import parse_steps
from tqdm import tqdm

class AutoEvol:
    def __init__(self, components: Dict[str, Any]):
        self.components = components

    async def process_instruction(self, instruction: str, num_methods: int, evolve_epoch: int = 2) -> Dict[str, Any]:
        start_time = time.time()
        instruction_stages = [instruction]
        methods = [INITIAL_EVOLVE_METHOD.replace("{{instruction}}", instruction_stages[0])]
        current_method = methods[0]

        result = {
            "original_instruction": instruction,
            "stages": []
        }

        for i in range(evolve_epoch):
            stage_start_time = time.time()
            stage_result = {
                "stage": i + 1,
                "input_instruction": instruction_stages[-1],
                "method": current_method,
                "evolved_instructions": [],
                "feedbacks": [],
                "optimized_method": "",
                "final_evolved_instruction": ""
            }

            evolved_instructions = await self.components['evolver'].evolve_async(instruction_stages[-1], current_method, n=num_methods)

            feedbacks = await self.components['analyzer'].analyze_async(instruction_stages[-1], evolved_instructions)

            stage_result["evolved_instructions"] = evolved_instructions
            stage_result["feedbacks"] = feedbacks

            # Without a dev set, the current instruction serves as a one-item dev set
            optimized_method, _ = await self.components['optimizer'].optimize(
                current_method,
                feedback=feedbacks,
                evolver=self.components['evolver'],
                development_set=self.components['dev_set'] if len(self.components['dev_set']) > 0 else [instruction_stages[-1]]
            )

            optimized_method_steps = parse_steps(optimized_method)
            optimized_method = self.components['evolver'].build_new_method(optimized_method_steps, instruction_stages[-1])

            stage_result["optimized_method"] = optimized_method

            evolved_instruction = await self.components['generator'].agenerate(prompt=optimized_method, temperature=0.5)
            evolved_instruction_steps = parse_steps(evolved_instruction)

            # If the last parsed step has a different name, the raw model output
            # is kept as the evolved instruction; the fallback below only runs
            # when the parsed steps are empty or malformed.
            try:
                if evolved_instruction_steps[-1]['step_name'] == 'Finally Rewritten Instruction':
                    evolved_instruction = evolved_instruction_steps[-1]['step_instruction']
            except Exception:
                print('Error: Unexpected step name in evolved instruction')
                evolved_instruction = instruction_stages[-1]  # Append the same instruction as before

            instruction_stages.append(evolved_instruction)
            methods.append(optimized_method)
            current_method = self.components['evolver'].build_new_method(optimized_method_steps, evolved_instruction)

            stage_result["final_evolved_instruction"] = evolved_instruction
            stage_end_time = time.time()
            stage_result["stage_time"] = stage_end_time - stage_start_time
            result["stages"].append(stage_result)

        result["final_instruction"] = instruction_stages[-1]
        end_time = time.time()
        result["total_time"] = end_time - start_time
        return result

    async def process_batch(self, batch: List[str], num_methods: int, evolve_epoch: int, pbar: tqdm) -> List[Dict[str, Any]]:
        batch_results = await asyncio.gather(*[self.process_instruction(instruction, num_methods, evolve_epoch) for instruction in batch])
        pbar.update(len(batch))
        return batch_results

    async def run(self, dataset: List[str], batch_size: int = 10, num_methods: int = 5, max_concurrent_batches: int = 2, evolve_epoch: int = 2) -> List[Dict[str, Any]]:
        print(f"Starting dataset processing. Dataset size: {len(dataset)}, Max concurrent batches: {max_concurrent_batches}")
        start_time = time.time()

        batches = [dataset[i:i+batch_size] for i in range(0, len(dataset), batch_size)]
        pbar = tqdm(total=len(dataset), desc="Processing instructions")

        # The semaphore caps how many batches are in flight at once
        semaphore = asyncio.Semaphore(max_concurrent_batches)

        async def process_batch_with_semaphore(batch):
            async with semaphore:
                return await self.process_batch(batch, num_methods, evolve_epoch, pbar)

        results = await asyncio.gather(*[process_batch_with_semaphore(batch) for batch in batches])

        pbar.close()

        end_time = time.time()
        total_time = end_time - start_time
        print(f"\nDataset processing complete. Total time: {total_time:.2f} seconds")
        return [item for sublist in results for item in sublist]  # Flatten the list of batch results

--------------------------------------------------------------------------------
/src/evaluator/__init__.py:
--------------------------------------------------------------------------------
from .base_evaluator import BaseEvaluator
from .failure_detector_evaluator import FailureDetectorEvaluator
from .reward_model_evaluator import RewardModelEvaluator

--------------------------------------------------------------------------------
/src/evaluator/base_evaluator.py:
--------------------------------------------------------------------------------
from abc import ABC, abstractmethod
from typing import List, Optional

class BaseEvaluator(ABC):
    @abstractmethod
    def evaluate(self, instructions: List[str], responses: List[str]) -> float:
        pass

    @abstractmethod
    def select_best_method(self, methods: List[str], instructions: List[str], responses: List[List[str]]) -> tuple:
        pass

--------------------------------------------------------------------------------
/src/evaluator/failure_detector_evaluator.py:
--------------------------------------------------------------------------------
from .base_evaluator import BaseEvaluator
from typing import List, Tuple
import re
from concurrent.futures import ThreadPoolExecutor

class FailureDetectorEvaluator(BaseEvaluator):
    def __init__(self, max_workers: int = 4):
        self.stagnant_pattern = re.compile(r'\b(understood|thank you|noted|got it|okay|alright)\b.*\?$', re.IGNORECASE)
        self.insufficient_pattern = re.compile(r'\b(sure|certainly|of course|happy to help)\b.*\?$|what do you mean|could you explain', re.IGNORECASE)
        self.loss_pattern = re.compile(r'please provide|need more information|could you clarify|what exactly', re.IGNORECASE)
        # The pool lives as long as the evaluator. It is never entered with
        # `with`, because leaving a `with` block shuts the pool down and any
        # later submit() would raise RuntimeError.
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def is_failure(self, response: str) -> bool:
        return (
            bool(self.stagnant_pattern.search(response)) or
            bool(self.insufficient_pattern.search(response)) or
            bool(self.loss_pattern.search(response))
        )

    def evaluate(self, instructions: List[str], responses: List[str]) -> float:
        # Fraction of responses matching any failure pattern (lower is better)
        failures = sum(self.executor.map(self.is_failure, responses))
        return failures / len(responses)

    async def select_best_method(self, methods: List[str], instructions: List[str], responses: List[List[str]]) -> Tuple[str, float]:
        # Methods are evaluated one after another: submitting evaluate() to the
        # same pool that evaluate() itself uses could leave every worker
        # waiting on tasks that can no longer be scheduled.
        evaluation_results = [
            (method, self.evaluate(instructions, method_responses))
            for method, method_responses in zip(methods, responses)
        ]

        best_method, lowest_failure_rate = min(evaluation_results, key=lambda x: x[1])
        return best_method, lowest_failure_rate
--------------------------------------------------------------------------------
/src/evaluator/reward_model_evaluator.py:
--------------------------------------------------------------------------------
from .base_evaluator import BaseEvaluator
from typing import List, Optional
import torch
from transformers import AutoModel, AutoTokenizer
from concurrent.futures import ThreadPoolExecutor
import asyncio


class RewardModelEvaluator(BaseEvaluator):
    def __init__(self, model: str = "internlm/internlm2-1_8b-reward"):
        self.model = AutoModel.from_pretrained(
            model,
            device_map="cuda",
            torch_dtype=torch.float16,
            trust_remote_code=True,
        )
        self.tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)
        self.executor = ThreadPoolExecutor(max_workers=4)  # Adjust based on your GPU

    async def get_score(self, instruction: str, response: str) -> float:
        chat = [
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": response}
        ]
        # Run the blocking model call in the thread pool so the event loop stays free
        loop = asyncio.get_running_loop()
        score = await loop.run_in_executor(self.executor, self.model.get_score, self.tokenizer, chat)
        torch.cuda.empty_cache()
        return score

    async def evaluate(self, instructions: List[str], responses: List[str]) -> float:
        # Mean reward over all (instruction, response) pairs (higher is better)
        scores = await asyncio.gather(*[self.get_score(instruction, response)
                                        for instruction, response in zip(instructions, responses)])
        torch.cuda.empty_cache()
        return sum(scores) / len(scores)

    async def select_best_method(self, methods: List[str], instructions: List[str], responses: List[List[str]]) -> tuple:
        evaluation_tasks = [self.evaluate(instructions, method_responses)
                            for method_responses in responses]
        scores = await asyncio.gather(*evaluation_tasks)
        torch.cuda.empty_cache()

        best_index = max(range(len(scores)), key=scores.__getitem__)
        return methods[best_index], scores[best_index]
-------------------------------------------------------------------------------- /src/evolvers/__init__.py: -------------------------------------------------------------------------------- 1 | from .recurrent_evolver import RecurrentEvolver 2 | from .base_evolver import BaseEvolver -------------------------------------------------------------------------------- /src/evolvers/base_evolver.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | 3 | class BaseEvolver(ABC): 4 | @abstractmethod 5 | def evolve(self, instruction, evolving_method): 6 | pass -------------------------------------------------------------------------------- /src/evolvers/recurrent_evolver.py: -------------------------------------------------------------------------------- 1 | from .base_evolver import BaseEvolver 2 | from src.generators.base_generator import BaseGenerator 3 | 4 | import asyncio 5 | from typing import List 6 | 7 | INITIAL_EVOLVE_METHOD = """ 8 | You are an Instruction Rewriter that rewrites the given #Instruction# into a more complex version. 9 | Please follow the steps below to rewrite the given "#Instruction#" into a more complex version. 10 | 11 | Step 1: Please read the "#Instruction#" below carefully and list all the possible methods to make this instruction more complex (to make it a bit harder for well-known AI assistants such as ChatGPT and GPT4 to handle). Please do not provide methods to change the language of the instruction! 12 | 13 | Step 2: Please create a comprehensive plan based on the #Methods List# generated in Step 1 to make the #Instruction# more complex. The plan should include several methods from the #Methods List#. 14 | 15 | Step 3: Please execute the plan step by step and provide the #Rewritten Instruction#. #Rewritten Instruction# can only add 10 to 20 words into the "#Instruction#". 
16 | 17 | Step 4: Please carefully review the #Rewritten Instruction# and identify any unreasonable parts. Ensure that the #Rewritten Instruction# is only a more complex version of the #Instruction#, make sure that it only adds 10 to 20 words into the "#Instruction#". Just provide the #Finally Rewritten Instruction# without any explanation. 18 | 19 | #Instruction#: {instruction} 20 | 21 | REMEMBER that you are generating a more complex version of the instruction (or question), NOT answering #Instruction#. The #Finally Rewritten Instruction# should only add 10 to 20 words to the #Instruction# below. 22 | 23 | **Output Instructions** 24 | Please generate the optimized instruction strictly using ONLY the given below format, do not add anything else: 25 | 26 | ```Optimized Instruction 27 | Step 1: 28 | #Methods List# 29 | 30 | Step 2: 31 | #Plan# 32 | 33 | Step 3: 34 | #Rewritten Instruction# 35 | 36 | Step 4: 37 | #Finally Rewritten Instruction# 38 | ``` 39 | """ 40 | 41 | INTERATIVE_EVOLVE_METHOD = """ 42 | You are an Instruction Rewriter that rewrites the given #Instruction# into a more complex version. 43 | Please follow the steps below to rewrite the given "#Instruction#" into a more complex version. 44 | 45 | {steps} 46 | #Instruction#: {instruction} 47 | 48 | REMEMBER that you are generating a more complex version of the instruction (or question), NOT answering #Instruction#. The #Finally Rewritten Instruction# should only add 10 to 20 words to the #Instruction# below. 
49 | 50 | **Output Instructions** 51 | Please generate the optimized instruction strictly using ONLY the given below format, do not add anything else: 52 | 53 | ```Optimized Instruction 54 | {format_steps} 55 | ``` 56 | """ 57 | 58 | class RecurrentEvolver(BaseEvolver): 59 | def __init__(self, generator: BaseGenerator) -> None: 60 | self.generator = generator 61 | 62 | async def evolve_async(self, instruction: str, evolving_method: str = None, n: int = 1) -> List[str]: 63 | evol_method = evolving_method if evolving_method else INITIAL_EVOLVE_METHOD.format(instruction=instruction) 64 | 65 | async def generate_single(): 66 | return await self.generator.agenerate(evol_method) 67 | 68 | tasks = [generate_single() for _ in range(n)] 69 | results = await asyncio.gather(*tasks) 70 | 71 | return results 72 | 73 | def evolve(self, instruction: str, evolving_method: str = None, n: int = 1) -> List[str]: 74 | return asyncio.run(self.evolve_async(instruction, evolving_method, n)) 75 | 76 | def build_new_method(self, steps, instruction): 77 | step_details = "" 78 | format_steps = "" 79 | 80 | for i, step in enumerate(steps, start=1): 81 | step_name = step['step_name'] 82 | step_instruction = step['step_instruction'] 83 | 84 | step_details += f"Step {i}: {step_instruction}\n\n" 85 | format_steps += f"Step {i}:\n#{step_name}#\n\n" 86 | 87 | new_method = INTERATIVE_EVOLVE_METHOD.format(steps=step_details.strip(), instruction=instruction, format_steps=format_steps.strip()) 88 | return new_method 89 | -------------------------------------------------------------------------------- /src/generators/__init__.py: -------------------------------------------------------------------------------- 1 | from .base_generator import BaseGenerator 2 | from .openai import OpenAIGenerator 3 | from .openrouter import OpenRouterGenerator 4 | from .vllm import VLLMGenerator -------------------------------------------------------------------------------- /src/generators/base_generator.py: 
-------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | from typing import Optional 3 | 4 | class BaseGenerator(ABC): 5 | @abstractmethod 6 | def generate(self, prompt: str, system_prompt: Optional[str] = "You are a helpful AI assistant.", temperature: Optional[float] = 0.5) -> str: 7 | pass 8 | 9 | async def agenerate(self, prompt: str, system_prompt: Optional[str] = "You are a helpful AI assistant.", temperature: Optional[float] = 0.5): 10 | pass -------------------------------------------------------------------------------- /src/generators/openai.py: -------------------------------------------------------------------------------- 1 | from typing import Optional 2 | from os import getenv 3 | 4 | from openai import OpenAI 5 | 6 | from .base_generator import BaseGenerator 7 | 8 | class OpenAIGenerator(BaseGenerator): 9 | def __init__(self, model: str = "gpt-4", api_key: Optional[str] = None) -> None: 10 | self.model = model 11 | self.api_key = api_key if api_key else getenv("OPENAI_API_KEY") 12 | self.client = OpenAI(api_key=self.api_key) # pass the resolved key so an explicit api_key argument is honored 13 | 14 | def generate(self, prompt: str, system_prompt: str = "You are a helpful AI assistant.", temperature: float = 0.7): 15 | response = self.client.chat.completions.create( 16 | model=self.model, 17 | messages=[ 18 | {"role": "system", "content": system_prompt}, 19 | {"role": "user", "content": prompt} 20 | ], 21 | temperature=temperature,) 22 | # print(response.choices[0].message.content) # For debugging 23 | return response.choices[0].message.content -------------------------------------------------------------------------------- /src/generators/openrouter.py: -------------------------------------------------------------------------------- 1 | from typing import Optional 2 | from os import getenv 3 | 4 | from openai import OpenAI, AsyncOpenAI 5 | from .openai import OpenAIGenerator 6 | 7 | class OpenRouterGenerator(OpenAIGenerator): 8 | def __init__(self, model: str = 
"deepseek/deepseek-chat", api_key: Optional[str] = None) -> None: 9 | self.model = model 10 | self.api_key = api_key if api_key else getenv("OPENROUTER_API_KEY") 11 | self.client = OpenAI(base_url="https://openrouter.ai/api/v1", 12 | api_key=self.api_key,) 13 | 14 | self.aclient = AsyncOpenAI(base_url="https://openrouter.ai/api/v1", 15 | api_key=self.api_key,) 16 | 17 | def generate(self, prompt: str, system_prompt: str = "You are a helpful AI assistant.", temperature: float = 0.5): 18 | return super().generate(prompt, system_prompt, temperature) 19 | 20 | async def agenerate(self, prompt: str, system_prompt: str = "You are a helpful AI assistant.", temperature: float = 0.2): 21 | try: 22 | response = await self.aclient.chat.completions.create( 23 | model=self.model, 24 | messages=[ 25 | {"role": "system", "content": system_prompt}, 26 | {"role": "user", "content": prompt} 27 | ], 28 | temperature=temperature,) 29 | # print(response.choices[0].message.content) # For debugging 30 | return response.choices[0].message.content 31 | except Exception: # do not swallow asyncio.CancelledError or KeyboardInterrupt 32 | return 'error' -------------------------------------------------------------------------------- /src/generators/vllm.py: -------------------------------------------------------------------------------- 1 | from typing import Optional 2 | from os import getenv 3 | 4 | from openai import OpenAI, AsyncOpenAI 5 | from .openai import OpenAIGenerator 6 | 7 | class VLLMGenerator(OpenAIGenerator): 8 | def __init__(self, model: str = "deepseek/deepseek-chat", base_url: str = 'http://localhost:8000/v1') -> None: 9 | self.model = model 10 | # self.api_key = api_key if api_key else 'test-abc1' 11 | self.client = OpenAI(base_url=base_url, api_key='test-abc1') 12 | 13 | self.aclient = AsyncOpenAI(base_url=base_url, api_key='test-abc1') 14 | 15 | def generate(self, prompt: str, system_prompt: str = "You are a helpful AI assistant.", temperature: float = 0.5): 16 | return super().generate(prompt, system_prompt, temperature) 17 | 18 | async def 
agenerate(self, prompt: str, system_prompt: str = "You are a helpful AI assistant.", temperature: float = 0.5): 19 | response = await self.aclient.chat.completions.create( 20 | model=self.model, 21 | messages=[ 22 | {"role": "system", "content": system_prompt}, 23 | {"role": "user", "content": prompt} 24 | ], 25 | temperature=temperature,) 26 | # print(response.choices[0].message.content) # For debugging 27 | return response.choices[0].message.content -------------------------------------------------------------------------------- /src/optimizers/__init__.py: -------------------------------------------------------------------------------- 1 | from .base_optimizer import BaseOptimizer 2 | from .evol_optimizer import EvolOptimizer -------------------------------------------------------------------------------- /src/optimizers/base_optimizer.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | 3 | class BaseOptimizer(ABC): 4 | @abstractmethod 5 | def optimize(self, current_method, feedback): 6 | pass -------------------------------------------------------------------------------- /src/optimizers/evol_optimizer.py: -------------------------------------------------------------------------------- 1 | from .base_optimizer import BaseOptimizer 2 | from src.evolvers import RecurrentEvolver 3 | from src.evaluator import BaseEvaluator 4 | from src.generators import BaseGenerator 5 | from src.utils import parse_steps 6 | 7 | from typing import List, Optional 8 | import asyncio 9 | 10 | METHOD_EVOL_PROMPT = """ 11 | Feedback: {feedback} 12 | You are an Instruction Method Optimizer. Based on the feedback from the evolution failure case, optimize the method below to create a more effective instruction rewriting process without negatively impacting performance on other cases. Ensure that the complexity of the optimized method is not lower than the previous method. 
13 | If the feedback is "### PASSED", then come up with a better method than the current one to create a more complex and effective instruction rewriting process. Remember that the new method should not be very similar to the current method, be creative with new steps for the new method. 14 | 15 | Current Method: 16 | {current_method} 17 | 18 | **Output Instructions** 19 | Add more steps to achieve the most refined method if needed, however, REMEMBER that the final step in your output has to be "#Finally Rewritten Instruction#" no matter how many steps are added. 20 | Please generate the optimized method strictly using ONLY the given below format, do not add anything else: 21 | 22 | ```Optimized Method 23 | Step 1: 24 | #Methods List# 25 | Describe how to generate a list of methods to make instructions more complex, incorporating the feedback 26 | 27 | Step 2: 28 | #Plan# 29 | Explain how to create a comprehensive plan based on the Methods List 30 | 31 | [Note]Add more steps here as you want to achieve the best method. The steps should align with the instruction domain/topic, and should not involve any tools or visualization, it should be text-only methods. The last step should always be #Finally Rewritten Instruction#. 32 | 33 | Step N-1: 34 | #Rewritten Instruction# 35 | Do not generate new Instruction here, but please provide a detailed process of executing the plan to rewrite the instruction. You are generating a guide to write a better instruction, NOT THE INSTRUCTION ITSELF. 36 | 37 | Step N: 38 | #Finally Rewritten Instruction# 39 | Do not generate new Instruction here, but please provide the process to write the final rewritten instruction. You are generating a guide to write a better instruction, NOT THE INSTRUCTION ITSELF. 
40 | ``` 41 | """ 42 | 43 | class EvolOptimizer(BaseOptimizer): 44 | def __init__(self, generator: BaseGenerator, evaluator: BaseEvaluator) -> None: 45 | self.generator = generator 46 | self.evaluator = evaluator 47 | 48 | async def optimize(self, current_method: str, feedback: List[str], evolver: RecurrentEvolver, development_set: Optional[List] = None): 49 | async def generate_and_evaluate(feedback_item): 50 | optimized_prompt = METHOD_EVOL_PROMPT.format(current_method=current_method, feedback=feedback_item) 51 | evolved_method = await self.generator.agenerate(optimized_prompt, temperature=0.5) 52 | 53 | async def process_instruction(instruction): 54 | async def generate_with_timeout(prompt, temperature): 55 | try: 56 | return await asyncio.wait_for( 57 | self.generator.agenerate(prompt=prompt, temperature=temperature), 58 | timeout=60.0 # 60 seconds timeout 59 | ) 60 | except asyncio.TimeoutError: 61 | return None 62 | try: 63 | parsed_steps = parse_steps(evolved_method) 64 | new_method = evolver.build_new_method(parsed_steps, instruction) 65 | 66 | evolved_instruction = await generate_with_timeout(new_method, 0.2) 67 | if evolved_instruction is None: 68 | # print('bad') 69 | return instruction, "error response" 70 | 71 | try: 72 | parsed_evolved_instruction = parse_steps(evolved_instruction)[-1]['step_instruction'] 73 | except Exception: # no parsable steps: answer the original instruction instead 74 | fallback_response = await generate_with_timeout(instruction, 0.5) 75 | if fallback_response is None: 76 | # print('bad') 77 | return instruction, "error response" 78 | return instruction, fallback_response 79 | response = await generate_with_timeout(parsed_evolved_instruction, 0.5) 80 | if response is None: 81 | # print('bad') 82 | return instruction, "error response" 83 | 84 | # print('good') 85 | return parsed_evolved_instruction, response 86 | except Exception: # let asyncio.CancelledError propagate so task cancellation still works 87 | # print('bad') 88 | return instruction, "error response" 89 | 90 | 91 | results = await asyncio.gather(*[process_instruction(instruction) for instruction in development_set]) 
92 | evolved_instructions, responses = zip(*results) 93 | 94 | return evolved_method, list(evolved_instructions), list(responses) 95 | 96 | results = await asyncio.gather(*[generate_and_evaluate(item) for item in feedback]) 97 | evolved_methods, all_evolved_instructions, all_responses = zip(*results) 98 | 99 | best_method, best_score = await self.evaluator.select_best_method( 100 | evolved_methods, 101 | [instr for method_instructions in all_evolved_instructions for instr in method_instructions], 102 | [resp for method_responses in all_responses for resp in method_responses] 103 | ) 104 | 105 | return best_method, list(evolved_methods) -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | import re 2 | 3 | def parse_sections(string_example): 4 | # Use regular expressions to find sections 5 | pattern = re.compile(r"#.*?#:") 6 | matches = list(pattern.finditer(string_example)) 7 | sections = [] 8 | for i in range(len(matches)): 9 | start = matches[i].end() 10 | end = matches[i+1].start() if i+1 < len(matches) else len(string_example) 11 | section = string_example[start:end].strip() 12 | 13 | # Cut off text before the next "/nStep" if it exists 14 | step_cut = re.search(r'\nStep', section) 15 | if step_cut: 16 | section = section[:step_cut.start()] 17 | 18 | sections.append(section.strip()) 19 | 20 | return sections 21 | 22 | # def parse_steps(example_string): 23 | # # Regular expression to match step instructions 24 | # step_regex = re.compile(r"Step \d+: #([^#]+)#\n([^\n]+(?:\n-(?!Step)[^\n]+)*)", re.MULTILINE) 25 | 26 | # steps_list = [] 27 | # for match in step_regex.finditer(example_string): 28 | # step_dict = { 29 | # "step_name": match.group(1).strip(), 30 | # "step_instruction": match.group(2).strip() 31 | # } 32 | # steps_list.append(step_dict) 33 | 34 | # return steps_list 35 | 36 | def parse_steps(example_string): 37 | # Extract 
content inside the first pair of triple backticks 38 | content_match = re.search(r'```(.*)```', example_string, re.DOTALL) 39 | if content_match: 40 | example_string = content_match.group(1).strip() 41 | 42 | # Regular expression to match step instructions 43 | step_regex = re.compile(r"Step (\d+):\s*(?:#([^#]+)#)?\s*(.*?)(?=Step \d+:|$)", re.DOTALL) 44 | 45 | steps_list = [] 46 | for match in step_regex.finditer(example_string): 47 | step_number = int(match.group(1)) 48 | step_name = match.group(2).strip() if match.group(2) else "" 49 | step_instruction = match.group(3).strip() 50 | 51 | step_dict = { 52 | "step_number": step_number, 53 | "step_name": step_name, 54 | "step_instruction": step_instruction 55 | } 56 | steps_list.append(step_dict) 57 | 58 | return steps_list -------------------------------------------------------------------------------- /tests/analyzers/test_trajectory.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | from src.generators.openai import OpenAIGenerator 3 | from src.generators.openrouter import OpenRouterGenerator 4 | from src.analyzers.trajectory_analyzer import TrajectoryAnalyzer 5 | 6 | # class TestTrajectoryAnalyzer(unittest.TestCase): 7 | def test_analyzer(): 8 | generator = OpenRouterGenerator(model='deepseek/deepseek-chat') 9 | analyzer = TrajectoryAnalyzer(generator=generator) 10 | result = analyzer.analyze("x + 2 = 12, what is x?", ["What is x in the case of 40x^2 - 5 = 40?"]) 11 | assert result is not None 12 | assert len(result) > 0 -------------------------------------------------------------------------------- /tests/evolvers/test_recurrent.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | from src.generators import OpenAIGenerator, OpenRouterGenerator 3 | from src.evolvers.recurrent_evolver import RecurrentEvolver 4 | from src.utils import parse_steps 5 | 6 | def test_recurrent_evolver(): 7 | generator = 
OpenRouterGenerator(model='deepseek/deepseek-chat') 8 | evolver = RecurrentEvolver(generator=generator) 9 | results = evolver.evolve(instruction="What is the sum of 2 and 2, and please provide a detailed explanation of how you arrived at the answer in the form of a mathematical equation? Furthermore, imagine you have 2 apples and receive 2 more, how many apples do you have now?", n=2) 10 | parsed_results = [] 11 | for result in results: 12 | parsed_results.append(parse_steps(result)) 13 | 14 | # print(parsed_results) 15 | 16 | assert len(parsed_results) == 2 17 | assert parsed_results[0][-1]['step_name'] == "Finally Rewritten Instruction", f"Unexpected final step name for result 1: {parsed_results[0][-1]['step_name']}" 18 | assert parsed_results[1][-1]['step_name'] == "Finally Rewritten Instruction", f"Unexpected final step name for result 2: {parsed_results[1][-1]['step_name']}" 19 | -------------------------------------------------------------------------------- /tests/generators/test_openai.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | from src.generators.openai import OpenAIGenerator 3 | 4 | def test_generate(): 5 | generator = OpenAIGenerator() 6 | result = generator.generate("Test prompt") 7 | assert result is not None 8 | assert len(result) > 0 -------------------------------------------------------------------------------- /tests/generators/test_openrouter.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | import pytest 3 | from src.generators.openrouter import OpenRouterGenerator 4 | 5 | def test_generate(): 6 | generator = OpenRouterGenerator() 7 | result = generator.generate("Test prompt") 8 | assert result is not None 9 | assert len(result) > 0 -------------------------------------------------------------------------------- /tests/optimizers/test_wizard.py: -------------------------------------------------------------------------------- 
1 | import pytest 2 | from src.generators import OpenRouterGenerator 3 | from src.optimizers.evol_optimizer import EvolOptimizer 4 | from src.analyzers.trajectory_analyzer import TrajectoryAnalyzer 5 | from src.evolvers.recurrent_evolver import RecurrentEvolver, INITIAL_EVOLVE_METHOD 6 | from src.evaluator.failure_detector_evaluator import FailureDetectorEvaluator 7 | from src.utils import parse_steps 8 | 9 | dev_set = ['Write a python function to perform bubble sort', 'Write a letter to my headmaster asking for a day off.'] 10 | 11 | @pytest.mark.asyncio 12 | async def test_wizard_optimizer(): 13 | generator = OpenRouterGenerator(model='openai/chatgpt-4o-latest') 14 | evolver = RecurrentEvolver(generator) 15 | analyzer = TrajectoryAnalyzer(generator) 16 | detector = FailureDetectorEvaluator() 17 | optimizer = EvolOptimizer(generator, detector) 18 | 19 | init_instruction = "What is the sum of 2 and 2, and please provide a detailed explanation of how you arrived at the answer in the form of a mathematical equation? Furthermore, imagine you have 2 apples and receive 2 more, how many apples do you have now?" 
20 | 21 | # Assuming evolver.evolve and analyzer.analyze are also async 22 | evolved_instructions = evolver.evolve(init_instruction, INITIAL_EVOLVE_METHOD, n=5) 23 | 24 | 25 | feedbacks = analyzer.analyze(INITIAL_EVOLVE_METHOD, evolved_instructions) 26 | 27 | optimized_method, methods = await optimizer.optimize(INITIAL_EVOLVE_METHOD.format(instruction=init_instruction), feedback=feedbacks, evolver=evolver, development_set=dev_set) -------------------------------------------------------------------------------- /tests/test_full_flow.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | from typing import List 3 | from src.generators import OpenRouterGenerator 4 | from src.optimizers.evol_optimizer import EvolOptimizer 5 | from src.analyzers.trajectory_analyzer import TrajectoryAnalyzer 6 | from src.evolvers.recurrent_evolver import RecurrentEvolver, INITIAL_EVOLVE_METHOD 7 | from src.evaluator.failure_detector_evaluator import FailureDetectorEvaluator 8 | from src.utils import parse_steps 9 | 10 | async def process_instruction(instruction: str, components: dict) -> List[str]: 11 | instruction_stages = [instruction] 12 | current_method = INITIAL_EVOLVE_METHOD.format(instruction=instruction_stages[0]) 13 | 14 | for i in range(2): 15 | evolved_instructions = components['evolver'].evolve(instruction_stages[-1], current_method, n=5) 16 | feedbacks = components['analyzer'].analyze(instruction_stages[-1], evolved_instructions) 17 | 18 | optimized_method, _ = await components['optimizer'].optimize( 19 | current_method, 20 | feedback=feedbacks, 21 | evolver=components['evolver'], 22 | development_set=components['dev_set'] 23 | ) 24 | 25 | optimized_method_steps = parse_steps(optimized_method) 26 | optimized_method = components['evolver'].build_new_method(optimized_method_steps, instruction_stages[-1]) 27 | 28 | evolved_instruction = await components['generator'].agenerate(prompt=optimized_method, temperature=0.2) 29 | 
evolved_instruction_steps = parse_steps(evolved_instruction) 30 | 31 | if evolved_instruction_steps[-1]['step_name'] == 'Finally Rewritten Instruction': 32 | evolved_instruction = evolved_instruction_steps[-1]['step_instruction'] 33 | instruction_stages.append(evolved_instruction) 34 | current_method = components['evolver'].build_new_method(optimized_method_steps, evolved_instruction) 35 | else: 36 | print(evolved_instruction) 37 | print('Error: Unexpected step name in evolved instruction') 38 | 39 | return instruction_stages 40 | 41 | async def process_dataset(dataset: List[str], dev_set: List[str]) -> List[List[str]]: 42 | print(f"Starting dataset processing. Dataset size: {len(dataset)}") 43 | 44 | components = { 45 | 'generator': OpenRouterGenerator(model='openai/gpt-4o'), 46 | 'evolver': RecurrentEvolver(OpenRouterGenerator(model='openai/gpt-4o')), 47 | 'analyzer': TrajectoryAnalyzer(OpenRouterGenerator(model='openai/gpt-4o')), 48 | 'detector': FailureDetectorEvaluator(), 49 | 'dev_set': dev_set 50 | } 51 | components['optimizer'] = EvolOptimizer(components['generator'], components['detector']) 52 | 53 | print("Initialized all components") 54 | 55 | async def process_batch(batch: List[str]) -> List[List[str]]: 56 | return await asyncio.gather(*[process_instruction(instruction, components) for instruction in batch]) 57 | 58 | batch_size = 10 59 | results = [] 60 | for i in range(0, len(dataset), batch_size): 61 | batch = dataset[i:i+batch_size] 62 | print(f"Processing batch {i//batch_size + 1}") 63 | results.extend(await process_batch(batch)) 64 | 65 | print("Dataset processing complete") 66 | return results 67 | 68 | import pytest 69 | 70 | @pytest.mark.asyncio 71 | async def test_process_dataset(): 72 | print("Starting test_process_dataset") 73 | dataset = [ 74 | "Write a function to calculate the factorial of a number", 75 | "Explain the concept of recursion in programming", 76 | "Design a simple to-do list application", 77 | "Describe the process of 
photosynthesis", 78 | "Write a short story about a time traveler", 79 | "Explain the theory of relativity", 80 | "Create a recipe for a three-course meal", 81 | "Write a short story about a time traveler", 82 | "Explain the theory of relativity", 83 | "Create a recipe for a three-course meal" 84 | ] 85 | dev_set = [ 86 | 'Write a python function to perform bubble sort', 87 | 'Write a letter to my headmaster asking for a day off.', 88 | 'Write a python function to perform bubble sort', 89 | 'Write a letter to my headmaster asking for a day off.', 90 | 'Write a python function to perform bubble sort', 91 | ] 92 | 93 | print(f"Dataset size: {len(dataset)}, Dev set size: {len(dev_set)}") 94 | 95 | evolved_dataset = await process_dataset(dataset, dev_set) 96 | 97 | print("Dataset processing complete. Running assertions.") 98 | 99 | assert len(evolved_dataset) == len(dataset) 100 | for i, evolved_instructions in enumerate(evolved_dataset): 101 | print(f"Checking evolved instructions for dataset item {i+1}") 102 | assert len(evolved_instructions) == 3 # Original + 2 evolutions 103 | assert evolved_instructions[0] in dataset 104 | 105 | for j in range(1, len(evolved_instructions)): 106 | assert evolved_instructions[j] != evolved_instructions[j-1] 107 | print(f"All checks passed for dataset item {i+1}") 108 | 109 | for i in range(len(evolved_dataset)): 110 | for j in range(i+1, len(evolved_dataset)): 111 | assert evolved_dataset[i] != evolved_dataset[j] 112 | 113 | print("All assertions passed. 
Test complete.") 114 | 115 | if __name__ == "__main__": 116 | pytest.main([__file__]) -------------------------------------------------------------------------------- /visualize_evol_instruct.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 3, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import json\n", 10 | "import pandas as pd\n", 11 | "import matplotlib.pyplot as plt\n", 12 | "from IPython.display import display, HTML\n", 13 | "\n", 14 | "def analyze_single_sample(sample):\n", 15 | " # Display instruction evolution\n", 16 | " def display_instruction_evolution(instruction_data):\n", 17 | " html = f\"

Original Instruction: {instruction_data['original_instruction']}

\"\n", 18 | " html += \"\"\n", 19 | " \n", 20 | " for stage in instruction_data['stages']:\n", 21 | " html += f\"\"\n", 22 | " \n", 23 | " html += f\"\"\n", 24 | " html += \"
StageInput InstructionFinal Evolved Instruction
{stage['stage']}{stage['input_instruction']}{stage['final_evolved_instruction']}
Final{instruction_data['final_instruction']}
\"\n", 25 | " \n", 26 | " display(HTML(html))\n", 27 | " \n", 28 | " display_instruction_evolution(sample)\n", 29 | " \n", 30 | " # Word count analysis\n", 31 | " def word_count(text):\n", 32 | " return len(text.split())\n", 33 | " \n", 34 | " stage_word_counts = {0: word_count(sample['original_instruction'])}\n", 35 | " for i, stage in enumerate(sample['stages']):\n", 36 | " stage_word_counts[i+1] = word_count(stage['final_evolved_instruction'])\n", 37 | " # stage_word_counts[len(sample['stages'])] = word_count(sample['final_instruction'])\n", 38 | " \n", 39 | " plt.figure(figsize=(10, 6))\n", 40 | " plt.plot(list(stage_word_counts.keys()), list(stage_word_counts.values()), marker='o')\n", 41 | " plt.title('Word Count of Instructions by Stage')\n", 42 | " plt.xlabel('Stage (0: Original, 0 to N-1: Intermediate, N: Final)')\n", 43 | " plt.ylabel('Word Count')\n", 44 | " plt.xticks(range(0, len(sample['stages']) + 1))\n", 45 | " plt.grid(True)\n", 46 | " plt.show()\n", 47 | " \n", 48 | " # Display methods used in each stage\n", 49 | " def display_methods(instruction_data):\n", 50 | " html = \"

Methods Used in Each Stage

\"\n", 51 | " html += \"\"\n", 52 | " \n", 53 | " for stage in instruction_data['stages']:\n", 54 | " html += f\"\"\n", 55 | " \n", 56 | " html += \"
StageMethod
{stage['stage']}{stage['optimized_method'][:500]}...
\"\n", 57 | " \n", 58 | " display(HTML(html))\n", 59 | " \n", 60 | " display_methods(sample)\n", 61 | " \n", 62 | " # Display statistics\n", 63 | " num_stages = len(sample['stages'])\n", 64 | " print(f\"Number of stages: {num_stages}\")\n", 65 | " \n", 66 | " # Display evolved instructions and feedbacks\n", 67 | " for i, stage in enumerate(sample['stages']):\n", 68 | " print(f\"\\nStage {i + 1}\")\n", 69 | " print(\"Evolved Instructions:\")\n", 70 | " for j, instruction in enumerate(stage['evolved_instructions']):\n", 71 | " print(f\" {j + 1}. {instruction}\")\n", 72 | " print(\"\\nFeedbacks:\")\n", 73 | " for j, feedback in enumerate(stage['feedbacks']):\n", 74 | " print(f\" {j + 1}. {feedback}\")\n", 75 | "\n", 76 | " # Display final instruction\n", 77 | " print(f\"\\nFinal Instruction: {sample['final_instruction']}\")" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 4, 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [ 86 | "# change your path\n", 87 | "\n", 88 | "evolved_instruction_path = 'the_tomb_evolved-3e_batch1.json'\n", 89 | "\n", 90 | "with open(evolved_instruction_path, 'r') as f:\n", 91 | " data = json.load(f)" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 7, 97 | "metadata": {}, 98 | "outputs": [ 99 | { 100 | "data": { 101 | "text/html": [ 102 | "

Original Instruction: Please verify my approach to solving the following problem: In triangle ABC, there are 3 distinct points on side AB, 4 distinct points on side BC, and 5 distinct points on side AC. How many different quadrilaterals can be formed by selecting four of these points? I considered two cases: (1) two collinear points on one side and two points on different sides, and (2) two points on one side and two points on another side. My calculation is as follows:\n", 103 | "\n", 104 | "$$\n", 105 | "3 \\cdot 4 \\binom{5}{2} + 3 \\cdot 5 \\binom{4}{2} + 4 \\cdot 5 \\binom{3}{2} + \\binom{3}{2} \\binom{4}{2} + \\binom{3}{2} \\binom{5}{2} + \\binom{4}{2} \\binom{5}{2}\n", 106 | "$$\n", 107 | "\n", 108 | "Is this method and the resulting expression accurate?

StageInput InstructionFinal Evolved Instruction
1Please verify my approach to solving the following problem: In triangle ABC, there are 3 distinct points on side AB, 4 distinct points on side BC, and 5 distinct points on side AC. How many different quadrilaterals can be formed by selecting four of these points? I considered two cases: (1) two collinear points on one side and two points on different sides, and (2) two points on one side and two points on another side. My calculation is as follows:\n", 109 | "\n", 110 | "$$\n", 111 | "3 \\cdot 4 \\binom{5}{2} + 3 \\cdot 5 \\binom{4}{2} + 4 \\cdot 5 \\binom{3}{2} + \\binom{3}{2} \\binom{4}{2} + \\binom{3}{2} \\binom{5}{2} + \\binom{4}{2} \\binom{5}{2}\n", 112 | "$$\n", 113 | "\n", 114 | "Is this method and the resulting expression accurate?```Optimized Instruction\n", 115 | "Step 1:\n", 116 | "#Methods List# Incorporate hypothetical scenarios involving different triangle configurations; ask for evaluation of alternative calculation methods; require explanation of each step in the solution, including assumptions and reasoning; identify potential errors in the given approach.\n", 117 | "\n", 118 | "Step 2:\n", 119 | "#Plan# Rewrite the instruction to consider a hypothetical scenario where the triangle's sides have different numbers of points; ask for an evaluation of the given solution compared to alternative methods; require a detailed explanation of the reasoning behind each step, including any assumptions made; identify and justify any potential errors in the calculation.\n", 120 | "\n", 121 | "Step 3:\n", 122 | "#Rewritten Instruction Process# In a hypothetical scenario where a triangle XYZ has 4 distinct points on side XY, 5 distinct points on side YZ, and 6 distinct points on side ZX, how would you calculate the number of different quadrilaterals that can be formed by selecting four points? 
Evaluate the provided solution against alternative methods, explain the rationale behind each step, and identify any potential errors.\n", 123 | "\n", 124 | "Step 4:\n", 125 | "#Review Process# Ensure the rewritten instruction includes the hypothetical scenario, evaluation of alternative methods, detailed explanation of reasoning, and identification of potential errors. Adjust as necessary to maintain clarity and answerability while increasing complexity.\n", 126 | "\n", 127 | "Step 5:\n", 128 | "#Finally Rewritten Instruction Process# In a hypothetical scenario where a triangle XYZ has 4 distinct points on side XY, 5 distinct points on side YZ, and 6 distinct points on side ZX, how would you calculate the number of different quadrilaterals that can be formed by selecting four points? Evaluate the provided solution against alternative methods, explain the rationale behind each step, and identify any potential errors in the calculation. Is the given method and resulting expression accurate, considering the hypothetical scenario and alternative approaches?\n", 129 | "```
2```Optimized Instruction\n", 130 | "Step 1:\n", 131 | "#Methods List# Incorporate hypothetical scenarios involving different triangle configurations; ask for evaluation of alternative calculation methods; require explanation of each step in the solution, including assumptions and reasoning; identify potential errors in the given approach.\n", 132 | "\n", 133 | "Step 2:\n", 134 | "#Plan# Rewrite the instruction to consider a hypothetical scenario where the triangle's sides have different numbers of points; ask for an evaluation of the given solution compared to alternative methods; require a detailed explanation of the reasoning behind each step, including any assumptions made; identify and justify any potential errors in the calculation.\n", 135 | "\n", 136 | "Step 3:\n", 137 | "#Rewritten Instruction Process# In a hypothetical scenario where a triangle XYZ has 4 distinct points on side XY, 5 distinct points on side YZ, and 6 distinct points on side ZX, how would you calculate the number of different quadrilaterals that can be formed by selecting four points? Evaluate the provided solution against alternative methods, explain the rationale behind each step, and identify any potential errors.\n", 138 | "\n", 139 | "Step 4:\n", 140 | "#Review Process# Ensure the rewritten instruction includes the hypothetical scenario, evaluation of alternative methods, detailed explanation of reasoning, and identification of potential errors. Adjust as necessary to maintain clarity and answerability while increasing complexity.\n", 141 | "\n", 142 | "Step 5:\n", 143 | "#Finally Rewritten Instruction Process# In a hypothetical scenario where a triangle XYZ has 4 distinct points on side XY, 5 distinct points on side YZ, and 6 distinct points on side ZX, how would you calculate the number of different quadrilaterals that can be formed by selecting four points? 
Evaluate the provided solution against alternative methods, explain the rationale behind each step, and identify any potential errors in the calculation. Is the given method and resulting expression accurate, considering the hypothetical scenario and alternative approaches?\n", 144 | "```In a multi-level hypothetical scenario where a triangle XYZ has 4 distinct points on side XY, 5 distinct points on side YZ, and 6 distinct points on side ZX, with points that can move along the sides, how would you calculate the number of different quadrilaterals that can be formed by selecting four points? Evaluate the provided solution and alternative methods, including their advantages and disadvantages, explain the rationale behind each step, and identify any potential errors in the calculation. Reflect on the decision-making process and predict possible challenges in the calculation, proposing solutions to overcome them. Consider the implications of moving points and the robustness of your solution against potential errors.
3 | In a multi-level hypothetical scenario where a triangle XYZ has 4 distinct points on side XY, 5 distinct points on side YZ, and 6 distinct points on side ZX, with points that can move along the sides, how would you calculate the number of different quadrilaterals that can be formed by selecting four points? Evaluate the provided solution and alternative methods, including their advantages and disadvantages, explain the rationale behind each step, and identify any potential errors in the calculation. Reflect on the decision-making process and predict possible challenges in the calculation, proposing solutions to overcome them. Consider the implications of moving points and the robustness of your solution against potential errors. | In a multi-level hypothetical scenario where a triangle XYZ has 4 distinct points on side XY, 5 distinct points on side YZ, and 6 distinct points on side ZX, with points that can move along the sides, how would you calculate the number of different quadrilaterals that can be formed by selecting four points? Evaluate the provided solution and alternative methods, including their advantages and disadvantages, explain the rationale behind each step, and identify any potential errors in the calculation. Reflect on the decision-making process and predict possible challenges in the calculation, proposing solutions to overcome them. Consider the implications of moving points and the robustness of your solution against potential errors, while also contemplating the adaptability of your strategy based on previous errors in similar problems.
Final | In a multi-level hypothetical scenario where a triangle XYZ has 4 distinct points on side XY, 5 distinct points on side YZ, and 6 distinct points on side ZX, with points that can move along the sides, how would you calculate the number of different quadrilaterals that can be formed by selecting four points? Evaluate the provided solution and alternative methods, including their advantages and disadvantages, explain the rationale behind each step, and identify any potential errors in the calculation. Reflect on the decision-making process and predict possible challenges in the calculation, proposing solutions to overcome them. Consider the implications of moving points and the robustness of your solution against potential errors, while also contemplating the adaptability of your strategy based on previous errors in similar problems.
" 145 | ], 146 | "text/plain": [ 147 | "" 148 | ] 149 | }, 150 | "metadata": {}, 151 | "output_type": "display_data" 152 | }, 153 | { 154 | "data": { 155 | "image/png": "", 156 | "text/plain": [ 157 | "
" 158 | ] 159 | }, 160 | "metadata": {}, 161 | "output_type": "display_data" 162 | }, 163 | { 164 | "data": { 165 | "text/html": [ 166 | "

Methods Used in Each Stage

Stage | Method
1\n", 167 | "You are an Instruction Rewriter that rewrites the given #Instruction# into a more complex version.\n", 168 | "Please follow the steps below to rewrite the given \"#Instruction#\" into a more complex version.\n", 169 | "\n", 170 | "Step 1: To generate a list of methods to make instructions more complex, consider incorporating elements that challenge AI's understanding by introducing hypothetical scenarios, asking for the evaluation of multiple solutions, and requiring the explanation of steps taken. Additionally, include methods ...
2\n", 171 | "You are an Instruction Rewriter that rewrites the given #Instruction# into a more complex version.\n", 172 | "Please follow the steps below to rewrite the given \"#Instruction#\" into a more complex version.\n", 173 | "\n", 174 | "Step 1: To generate a list of methods to make instructions more complex, consider incorporating elements such as: \n", 175 | "1.1. Introducing multi-level hypothetical scenarios that require the AI to consider various possible outcomes and their implications.\n", 176 | "1.2. Asking for the evaluation of multiple solutions, ...
3\n", 177 | "You are an Instruction Rewriter that rewrites the given #Instruction# into a more complex version.\n", 178 | "Please follow the steps below to rewrite the given \"#Instruction#\" into a more complex version.\n", 179 | "\n", 180 | "Step 1: To create a list of methods for enhancing instruction complexity, consider integrating:\n", 181 | "1.1. Nested hypothetical scenarios that explore the implications of various outcomes on multiple levels.\n", 182 | "1.2. Comparative analysis of solutions, requiring the evaluation of their relative effectiveness in di...
" 183 | ], 184 | "text/plain": [ 185 | "" 186 | ] 187 | }, 188 | "metadata": {}, 189 | "output_type": "display_data" 190 | }, 191 | { 192 | "name": "stdout", 193 | "output_type": "stream", 194 | "text": [ 195 | "Number of stages: 3\n", 196 | "\n", 197 | "Stage 1\n", 198 | "Evolved Instructions:\n", 199 | " 1. ```Optimized Instruction\n", 200 | "Step 1:\n", 201 | "#Methods List# - Incorporate mathematical notations for clarity. - Request a detailed step-by-step explanation of the verification process. - Ask for confirmation on the correct usage of combinatorial formulas. - Inquire about potential alternative approaches to solving the problem.\n", 202 | "\n", 203 | "Step 2:\n", 204 | "#Plan# - Add mathematical notations to clarify the combinatorial expressions. - Request a detailed, sequential verification of the solution's logic and calculations. - Seek confirmation on the proper application of combinatorial formulas. - Encourage consideration of alternative problem-solving strategies.\n", 205 | "\n", 206 | "Step 3:\n", 207 | "#Rewritten Instruction# - Please meticulously verify my approach to solving the following problem, employing combinatorial formulas: In triangle \\(ABC\\), there are 3 distinct points on side \\(AB\\), 4 distinct points on side \\(BC\\), and 5 distinct points on side \\(AC\\). How many different quadrilaterals can be formed by selecting four of these points? I considered two cases: (1) two collinear points on one side and two points on different sides, and (2) two points on one side and two points on another side. My calculation is as follows:\n", 208 | "\\[\n", 209 | "3 \\cdot 4 \\binom{5}{2} + 3 \\cdot 5 \\binom{4}{2} + 4 \\cdot 5 \\binom{3}{2} + \\binom{3}{2} \\binom{4}{2} + \\binom{3}{2} \\binom{5}{2} + \\binom{4}{2} \\binom{5}{2}\n", 210 | "\\]\n", 211 | "- Is this method and the resulting expression accurate? - Could you provide a detailed, step-by-step verification of the logic and calculations? 
- Could you confirm the correct usage of combinatorial formulas? - Are there alternative approaches to solving this problem worth considering?\n", 212 | "\n", 213 | "Step 4:\n", 214 | "#Finally Rewritten Instruction# Please meticulously verify my approach to solving the problem using combinatorial formulas: In triangle \\(ABC\\), how many quadrilaterals can be formed by selecting four points from 3 on side \\(AB\\), 4 on side \\(BC\\), and 5 on side \\(AC\\)? I considered cases (1) and (2), and calculated:\n", 215 | "\\[\n", 216 | "3 \\cdot 4 \\binom{5}{2} + 3 \\cdot 5 \\binom{4}{2} + 4 \\cdot 5 \\binom{3}{2} + \\binom{3}{2} \\binom{4}{2} + \\binom{3}{2} \\binom{5}{2} + \\binom{4}{2} \\binom{5}{2}\n", 217 | "\\]\n", 218 | "Is this method and expression accurate? Could you confirm the correct usage of combinatorial formulas and consider alternative problem-solving strategies?\n", 219 | "```\n", 220 | " 2. ```Optimized Instruction\n", 221 | "Step 1:\n", 222 | "#Methods List#\n", 223 | "1. Incorporate technical terms related to combinatorics and geometry.\n", 224 | "2. Request for a detailed explanation of each step in the solution.\n", 225 | "3. Ask for a generalization of the problem to n points on each side.\n", 226 | "4. Suggest an alternative method to verify the solution, such as a graphical representation.\n", 227 | "5. Inquire about the conditions under which the solution would not hold.\n", 228 | "\n", 229 | "Step 2:\n", 230 | "#Plan#\n", 231 | "1. Add a request for the use of combinatorial identities in the explanation.\n", 232 | "2. Ask for a graphical representation of the quadrilaterals formed.\n", 233 | "3. Generalize the problem to n points on each side and ask for a formula.\n", 234 | "\n", 235 | "Step 3:\n", 236 | "#Rewritten Instruction#\n", 237 | "Please verify my approach to solving the following problem: In triangle ABC, there are 3 distinct points on side AB, 4 distinct points on side BC, and 5 distinct points on side AC. 
How many different quadrilaterals can be formed by selecting four of these points? I considered two cases: (1) two collinear points on one side and two points on different sides, and (2) two points on one side and two points on another side. My calculation is as follows:\n", 238 | "$$\n", 239 | "3 \\cdot 4 \\binom{5}{2} + 3 \\cdot 5 \\binom{4}{2} + 4 \\cdot 5 \\binom{3}{2} + \\binom{3}{2} \\binom{4}{2} + \\binom{3}{2} \\binom{5}{2} + \\binom{4}{2} \\binom{5}{2}\n", 240 | "$$\n", 241 | "Is this method and the resulting expression accurate? Please provide a detailed explanation using combinatorial identities, and also include a graphical representation of the quadrilaterals formed. Furthermore, could you generalize this problem to n points on each side and provide a formula for the number of quadrilaterals that can be formed?\n", 242 | "\n", 243 | "Step 4:\n", 244 | "#Finally Rewritten Instruction#\n", 245 | "Please rigorously verify my approach to solving the following problem: In triangle ABC, there are 3 distinct points on side AB, 4 distinct points on side BC, and 5 distinct points on side AC. How many different quadrilaterals can be formed by selecting four of these points? I considered two cases: (1) two collinear points on one side and two points on different sides, and (2) two points on one side and two points on another side. My calculation is as follows:\n", 246 | "$$\n", 247 | "3 \\cdot 4 \\binom{5}{2} + 3 \\cdot 5 \\binom{4}{2} + 4 \\cdot 5 \\binom{3}{2} + \\binom{3}{2} \\binom{4}{2} + \\binom{3}{2} \\binom{5}{2} + \\binom{4}{2} \\binom{5}{2}\n", 248 | "$$\n", 249 | "Is this method and the resulting expression accurate? Please provide a detailed explanation using combinatorial identities, and also include a graphical representation of the quadrilaterals formed. Additionally, could you generalize this problem to n points on each side?\n", 250 | "```\n", 251 | " 3. ```Optimized Instruction\n", 252 | "Step 1:\n", 253 | "#Methods List#\n", 254 | "1. 
Introduce additional constraints or conditions.\n", 255 | "2. Require the verification of the method's validity under certain conditions.\n", 256 | "3. Request a detailed explanation for each step of the method.\n", 257 | "4. Ask for alternative methods or approaches to solve the problem.\n", 258 | "5. Include a request for a proof or justification of the final answer.\n", 259 | "\n", 260 | "Step 2:\n", 261 | "#Plan#\n", 262 | "1. Add a condition that the quadrilaterals must not have any sides parallel to the triangle's sides.\n", 263 | "2. Request the verification of the method's validity when considering the additional condition.\n", 264 | "3. Ask for a detailed explanation of how the combination formula is applied in each term of the expression.\n", 265 | "4. Inquire about possible alternative methods to solve the problem, considering the new condition.\n", 266 | "5. Require a proof or justification that the final answer meets the given conditions and is mathematically sound.\n", 267 | "\n", 268 | "Step 3:\n", 269 | "#Rewritten Instruction#\n", 270 | "Please verify my approach to solving the following problem under the condition that no sides of the quadrilaterals are parallel to the triangle's sides: In triangle ABC, there are 3 distinct points on side AB, 4 distinct points on side BC, and 5 distinct points on side AC. How many different quadrilaterals can be formed by selecting four of these points? I considered two cases: (1) two collinear points on one side and two points on different sides, and (2) two points on one side and two points on another side. My calculation is as follows:\n", 271 | "\n", 272 | "$$\n", 273 | "3 \\cdot 4 \\binom{5}{2} + 3 \\cdot 5 \\binom{4}{2} + 4 \\cdot 5 \\binom{3}{2} + \\binom{3}{2} \\binom{4}{2} + \\binom{3}{2} \\binom{5}{2} + \\binom{4}{2} \\binom{5}{2}\n", 274 | "$$\n", 275 | "\n", 276 | "Is this method and the resulting expression accurate when considering the additional condition? 
Please provide a detailed explanation for each step of the method and suggest alternative methods if applicable. Additionally, please provide a proof or justification that the final answer meets the given conditions and is mathematically sound.\n", 277 | "\n", 278 | "Step 4:\n", 279 | "#Finally Rewritten Instruction#\n", 280 | "Please verify my approach to solving the following problem under the condition that no sides of the quadrilaterals are parallel to the triangle's sides: In triangle ABC, there are 3 distinct points on side AB, 4 distinct points on side BC, and 5 distinct points on side AC. How many different quadrilaterals can be formed by selecting four of these points, ensuring no sides are parallel to the triangle's sides? I considered two cases: (1) two collinear points on one side and two points on different sides, and (2) two points on one side and two points on another side. My calculation is as follows:\n", 281 | "\n", 282 | "$$\n", 283 | "3 \\cdot 4 \\binom{5}{2} + 3 \\cdot 5 \\binom{4}{2} + 4 \\cdot 5 \\binom{3}{2} + \\binom{3}{2} \\binom{4}{2} + \\binom{3}{2} \\binom{5}{2} + \\binom{4}{2} \\binom{5}{2}\n", 284 | "$$\n", 285 | "\n", 286 | "Is this method and the resulting expression accurate when considering the additional condition? Please provide a detailed explanation for each step of the method and suggest alternative methods if applicable. Additionally, please provide a proof or justification that the final answer meets the given conditions and is mathematically sound.\n", 287 | "```\n", 288 | "\n", 289 | "Feedbacks:\n", 290 | " 1. ### PASSED\n", 291 | " 2. ### PASSED\n", 292 | " 3. ### PASSED\n", 293 | "\n", 294 | "Stage 2\n", 295 | "Evolved Instructions:\n", 296 | " 1. 
```Optimized Instruction\n", 297 | "Step 1:\n", 298 | "#Methods List# Incorporate a scenario with a complex geometric figure; request evaluation of multiple calculation techniques; demand detailed reasoning for each step; seek identification of possible miscalculations.\n", 299 | "\n", 300 | "Step 2:\n", 301 | "#Plan# Design an instruction focusing on a scenario with a non-standard geometric shape; ask for comparison of the given solution with alternative calculation methods; require a thorough explanation of the logic behind each step; prompt for spotting and justifying potential calculation errors.\n", 302 | "\n", 303 | "Step 3:\n", 304 | "#Rewritten Instruction Process# Given a complex geometric shape with irregularly distributed points, how would you calculate the number of unique triangles that can be formed? Compare the provided solution to other calculation methods, explain the rationale for each step taken, and identify any potential miscalculations.\n", 305 | "\n", 306 | "Step 4:\n", 307 | "#Review Process# Ensure the instruction includes the complex scenario, comparison of methods, detailed rationale, and error identification. Adjust if necessary for clarity and answerability while increasing complexity.\n", 308 | "\n", 309 | "Step 5:\n", 310 | "#Finally Rewritten Instruction Process# Considering a complex, irregular geometric shape with unevenly distributed points, evaluate the method for calculating the number of unique triangles that can be formed. Compare this method to alternatives, thoroughly explain the logic for each step, and critically assess any potential sources of miscalculation. Is the chosen method the most efficient and accurate given the complexity of the shape and the distribution of points?\n", 311 | "```\n", 312 | " 2. 
```Optimized Instruction\n", 313 | "Step 1:\n", 314 | "#Methods List# Incorporate a scenario with a complex geometric figure; ask for comparison of multiple calculation methods; require detailed explanation of each calculation step; identify possible errors in provided solutions.\n", 315 | "\n", 316 | "Step 2:\n", 317 | "#Plan# Rewrite the instruction to include a scenario with a complex geometric figure; ask for a comparison of the given solution with alternative calculation methods; require a detailed explanation of each calculation step; identify and justify possible errors in the provided solutions.\n", 318 | "\n", 319 | "Step 3:\n", 320 | "#Rewritten Instruction Process# Consider a scenario with a complex geometric figure composed of multiple shapes. How would you calculate the total area of this figure using the given solution? Compare this method with alternative calculation approaches, explain each step in detail, and identify any possible errors in the provided solution.\n", 321 | "\n", 322 | "Step 4:\n", 323 | "#Review Process# Ensure the rewritten instruction includes the scenario, comparison of methods, detailed explanation of steps, and identification of errors. Adjust if necessary to maintain clarity and answerability while increasing complexity.\n", 324 | "\n", 325 | "Step 5:\n", 326 | "#Finally Rewritten Instruction Process# In a scenario involving a complex geometric figure composed of multiple shapes, calculate the total area using the given solution. Compare this method with alternative approaches, explain each step in detail, and identify any possible errors in the provided solution. Is the given method efficient and accurate for the scenario, considering alternative calculation approaches?\n", 327 | "```\n", 328 | " 3. 
```Optimized Instruction\n", 329 | "Step 1:\n", 330 | "#Methods List# Incorporate hypothetical scenarios involving different triangle configurations; ask for evaluation of alternative calculation methods; require explanation of each step in the solution, including assumptions and reasoning; identify potential errors in the given approach.\n", 331 | "\n", 332 | "Step 2:\n", 333 | "#Plan# Rewrite the instruction to consider a hypothetical scenario where the triangle's sides have varying numbers of points; ask for an evaluation of the given solution against alternative methods; require a detailed explanation of the reasoning behind each step, including any assumptions made; identify and justify any potential errors in the calculation.\n", 334 | "\n", 335 | "Step 3:\n", 336 | "#Rewritten Instruction Process# In a scenario where a triangle ABC has 3 distinct points on side AB, 4 distinct points on side BC, and 5 distinct points on side CA, calculate the number of different triangles that can be formed by selecting three points. Evaluate the provided solution against alternative methods, explain the rationale behind each step, and identify any potential errors.\n", 337 | "\n", 338 | "Step 4:\n", 339 | "#Review Process# Ensure the rewritten instruction includes the scenario, evaluation of methods, detailed explanation, and error identification. Adjust as necessary to maintain clarity and answerability while increasing complexity.\n", 340 | "\n", 341 | "Step 5:\n", 342 | "#Finally Rewritten Instruction Process# In a scenario where a triangle ABC has 3 distinct points on side AB, 4 distinct points on side BC, and 5 distinct points on side CA, calculate the number of different triangles that can be formed by selecting three points. Evaluate the provided solution against alternative methods, explain the rationale behind each step, and identify any potential errors. 
Consider the accuracy of the method and the validity of the assumptions made in this complex scenario.\n", 343 | "```\n", 344 | "\n", 345 | "Feedbacks:\n", 346 | " 1. ### PASSED\n", 347 | " 2. ### PASSED\n", 348 | " 3. ### FAILED - Reason: The complexity did not increase from stage 0 to stage 1. In stage 0, the instruction involves calculating the number of different quadrilaterals that can be formed, while in stage 1, it is reduced to calculating the number of different triangles, which is less complex.\n", 349 | "\n", 350 | "Stage 3\n", 351 | "Evolved Instructions:\n", 352 | " 1. ```Optimized Instruction\n", 353 | "Step 1:\n", 354 | "#Methods List#\n", 355 | "1.1. Introduce multi-level hypothetical scenarios.\n", 356 | "1.2. Evaluate multiple solutions and their pros and cons.\n", 357 | "1.3. Explain steps and reasoning, identifying potential errors.\n", 358 | "1.4. Incorporate meta-cognitive questions.\n", 359 | "1.5. Predict challenges and propose solutions.\n", 360 | "\n", 361 | "Step 2:\n", 362 | "#Plan#\n", 363 | "2.1. Consider alternative calculation methods.\n", 364 | "2.2. Explain reasoning and identify errors.\n", 365 | "2.3. Reflect on decision-making process.\n", 366 | "2.4. Predict challenges in calculation.\n", 367 | "2.5. Ensure rewritten instruction is more complex.\n", 368 | "\n", 369 | "Step 3:\n", 370 | "#Elaborate on Methods#\n", 371 | "3.1. Design scenarios that challenge AI's outcome analysis.\n", 372 | "3.2. Provide detailed analysis of solution advantages and disadvantages.\n", 373 | "3.3. Explain reasoning and implications of potential errors.\n", 374 | "3.4. Encourage reflection on thought process and decision-making.\n", 375 | "3.5. 
Propose solutions to identified challenges.\n", 376 | "\n", 377 | "Step 4:\n", 378 | "#Execute the Plan#\n", 379 | "Rewrite instruction with multi-level scenarios, evaluate methods, explain steps, reflect, and predict challenges.\n", 380 | "\n", 381 | "Step 5:\n", 382 | "#Review the Rewritten Instruction#\n", 383 | "Ensure the rewritten instruction is significantly more complex, clear, and answerable.\n", 384 | "\n", 385 | "Step 6:\n", 386 | "#Finally Rewritten Instruction#\n", 387 | "In a multi-level hypothetical scenario where a triangle XYZ has 4 distinct points on side XY, 5 distinct points on side YZ, and 6 distinct points on side ZX, with points that can move along the sides, devise and evaluate multiple methods to calculate the number of different quadrilaterals that can be formed by selecting four points. Explain the rationale behind each step, identify potential errors, and reflect on the decision-making process. Predict possible challenges in the calculation due to moving points and propose robust solutions to overcome them, considering the implications on the solution's robustness.\n", 388 | "```\n", 389 | " 2. ```Optimized Instruction\n", 390 | "Step 1:\n", 391 | "#Methods List#\n", 392 | "1.1. Introducing multi-level hypothetical scenarios.\n", 393 | "1.2. Evaluating multiple solutions and their pros and cons.\n", 394 | "1.3. Explaining steps and identifying potential errors.\n", 395 | "1.4. Incorporating meta-cognitive reflection.\n", 396 | "1.5. Predicting challenges and proposing solutions.\n", 397 | "\n", 398 | "Step 2:\n", 399 | "#Plan#\n", 400 | "2.1. Challenge the AI to consider moving points and their implications.\n", 401 | "2.2. Ask for the evaluation of the provided solution and alternatives.\n", 402 | "2.3. Require detailed explanations and error identification.\n", 403 | "2.4. Encourage reflection on the decision-making process.\n", 404 | "2.5. 
Request prediction of challenges and solutions.\n", 405 | "\n", 406 | "Step 3:\n", 407 | "#Elaborate on Methods#\n", 408 | "3.1. Design scenarios that test understanding of moving points.\n", 409 | "3.2. Conduct a detailed analysis of solution effectiveness.\n", 410 | "3.3. Explain each decision and identify possible calculation errors.\n", 411 | "3.4. Reflect on the reasoning behind each choice.\n", 412 | "3.5. Propose solutions for potential calculation challenges.\n", 413 | "\n", 414 | "Step 4:\n", 415 | "#Execute the Plan#\n", 416 | "Rewrite the instruction to include multi-level scenarios, evaluate solutions, explain steps, reflect on decision-making, and predict challenges.\n", 417 | "\n", 418 | "Step 5:\n", 419 | "#Review the Rewritten Instruction#\n", 420 | "Ensure the rewritten instruction is complex, clear, and answerable, with all elements from the plan incorporated.\n", 421 | "\n", 422 | "Step 6:\n", 423 | "#Finally Rewritten Instruction#\n", 424 | "In a multi-level scenario with a moving-point triangle XYZ, calculate the number of possible quadrilaterals formed by selecting four points, considering robustness against errors. Evaluate the provided solution and alternatives, explaining each step and identifying potential errors. Reflect on your decision-making and predict challenges, proposing solutions. Consider the implications of point movement on the solution's robustness.\n", 425 | "```\n", 426 | " 3. ```Optimized Instruction\n", 427 | "Step 1:\n", 428 | "#Methods List#\n", 429 | "1.1. Introduce multi-level hypothetical scenarios involving moving points and side lengths.\n", 430 | "1.2. Evaluate multiple solutions for calculating quadrilaterals, considering their robustness and efficiency.\n", 431 | "1.3. Explain each step in detail, including the reasoning behind decisions and potential errors.\n", 432 | "1.4. Reflect on the decision-making process and thought process.\n", 433 | "1.5. 
Predict challenges such as point movement and propose error-checking mechanisms.\n", 434 | "\n", 435 | "Step 2:\n", 436 | "#Plan#\n", 437 | "2.1. Create a scenario where points can move dynamically on the triangle's sides.\n", 438 | "2.2. Evaluate three methods for calculating quadrilaterals, discussing their advantages and disadvantages.\n", 439 | "2.3. Explain each step, identifying potential errors and their implications.\n", 440 | "2.4. Incorporate meta-cognitive questions about the decision-making process.\n", 441 | "2.5. Predict possible challenges and propose solutions to ensure accuracy.\n", 442 | "\n", 443 | "Step 3:\n", 444 | "#Elaborate on Methods#\n", 445 | "3.1. Design a scenario that challenges the AI to consider the implications of point movement on quadrilateral formation.\n", 446 | "3.2. Evaluate methods in terms of computational efficiency and robustness against errors.\n", 447 | "3.3. Explain each decision, identifying potential errors and their impact on the final result.\n", 448 | "3.4. Ask reflective questions about the decision-making process and the thought process behind each step.\n", 449 | "3.5. Predict challenges related to point movement and propose mechanisms to check for errors and ensure accuracy.\n", 450 | "\n", 451 | "Step 4:\n", 452 | "#Execute the Plan#\n", 453 | "Rewrite the instruction to include a multi-level scenario with moving points. Evaluate three methods for calculating quadrilaterals, discussing their efficiency and robustness. Explain each step, including the reasoning and potential errors. Incorporate reflective questions about the decision process. Predict challenges and propose solutions to ensure accuracy.\n", 454 | "\n", 455 | "Step 5:\n", 456 | "#Review the Rewritten Instruction#\n", 457 | "Ensure the rewritten instruction is significantly more complex, includes all elements of the plan, and is clear and answerable. 
Adjust to maintain clarity and complexity.\n", 458 | "\n", 459 | "Step 6:\n", 460 | "#Finally Rewritten Instruction#\n", 461 | "In a multi-level scenario where points on the sides of triangle XYZ can dynamically move, calculate the number of different quadrilaterals that can be formed. Evaluate three methods for calculation, discussing their efficiency and robustness. Explain each step, identifying potential errors and their implications. Reflect on the decision-making process and thought process. Predict challenges related to point movement and propose error-checking mechanisms to ensure the accuracy of your solution.\n", 462 | "```\n", 463 | "\n", 464 | "Feedbacks:\n", 465 | " 1. ### PASSED\n", 466 | " 2. ### PASSED\n", 467 | " 3. ### PASSED\n", 468 | "\n", 469 | "Final Instruction: In a multi-level hypothetical scenario where a triangle XYZ has 4 distinct points on side XY, 5 distinct points on side YZ, and 6 distinct points on side ZX, with points that can move along the sides, how would you calculate the number of different quadrilaterals that can be formed by selecting four points? Evaluate the provided solution and alternative methods, including their advantages and disadvantages, explain the rationale behind each step, and identify any potential errors in the calculation. Reflect on the decision-making process and predict possible challenges in the calculation, proposing solutions to overcome them. 
Consider the implications of moving points and the robustness of your solution against potential errors, while also contemplating the adaptability of your strategy based on previous errors in similar problems.\n" 470 | ] 471 | } 472 | ], 473 | "source": [ 474 | "analyze_single_sample(data[600])" 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": null, 480 | "metadata": {}, 481 | "outputs": [], 482 | "source": [] 483 | } 484 | ], 485 | "metadata": { 486 | "kernelspec": { 487 | "display_name": "base", 488 | "language": "python", 489 | "name": "python3" 490 | }, 491 | "language_info": { 492 | "codemirror_mode": { 493 | "name": "ipython", 494 | "version": 3 495 | }, 496 | "file_extension": ".py", 497 | "mimetype": "text/x-python", 498 | "name": "python", 499 | "nbconvert_exporter": "python", 500 | "pygments_lexer": "ipython3", 501 | "version": "3.11.9" 502 | } 503 | }, 504 | "nbformat": 4, 505 | "nbformat_minor": 2 506 | } 507 | --------------------------------------------------------------------------------