├── .gitignore ├── README.md ├── cifar ├── pipeline.py ├── train.py └── vit.py ├── gpt4_prompt2 ├── losses.pkl ├── messages.pkl └── trajectories.pkl ├── llm_hpo.py ├── load_toy_trajectories.ipynb ├── run_toy_fns.ipynb └── utils └── toy_utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # pycache 2 | __pycache__/ 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Large Language Models for Hyperparameter Optimization 3 | 4 | Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba 5 | 6 | We explore the use of large language models (LLMs) in hyperparameter optimization (HPO). By prompting LLMs with dataset and model descriptions, we develop a methodology where LLMs suggest hyperparameter configurations, which are iteratively refined based on model performance. Our empirical evaluations on standard benchmarks reveal that LLMs, within constrained search budgets, can match or outperform traditional HPO methods like Bayesian optimization across different models. 7 | 8 | [arXiv paper](https://arxiv.org/abs/2312.04528) 9 | 10 | 11 | ## Reproducing experiments on CIFAR 12 | 13 | `cifar/train.py` implements our method on CIFAR-10 and can be used to reproduce the results in our paper. The provided code demonstrates how to conduct our experiments with Vision Transformers and a small ResNet. The same prompt is used for both models in our experiments to make the tuning task more difficult. 14 | 15 | ### Setup 16 | Set up your OpenAI API credentials. You can also modify `LLMHyperparameterTuner` to call a different LLM. 17 | 18 | ``` 19 | # Set up OpenAI API credentials: https://platform.openai.com/api-keys 20 | export OPENAI_API_KEY=(YOUR OPENAI API KEY) 21 | ``` 22 | 23 | Install packages. 
24 | ``` 25 | pip install -r requirements.txt 26 | ``` 27 | 28 | ### Usage 29 | 30 | Run the script with desired arguments: 31 | ``` 32 | python train.py [-h] [--random_hparams | --llm] [--rounds ROUNDS] [--search_space SEARCH_SPACE] 33 | [--dataset_dir DATASET_DIR] [--arch ARCH] [--save_dir SAVE_DIR] 34 | [--train_batch_size TRAIN_BATCH_SIZE] [--eval_batch_size EVAL_BATCH_SIZE] 35 | [--learning_rate LEARNING_RATE] [--weight_decay WEIGHT_DECAY] [--label_smoothing LABEL_SMOOTHING] 36 | [--optimizer OPTIMIZER] [--num_train_epochs NUM_TRAIN_EPOCHS] [--seed SEED] 37 | ``` 38 | 39 | To run the LLM-based hyperparameter search with a single seed: 40 | ``` 41 | python train.py --llm --seed SEED 42 | ``` 43 | By default, this tunes the optimizer, learning rate, batch size, weight decay, and label smoothing. GPT-4 generated some ranges which we found reasonable. To reproduce our results, run the command with five different random seeds. 44 | 45 | To run random hyperparameter search: 46 | ``` 47 | python train.py --random_hparams --seed SEED 48 | ``` 49 | 50 | This samples randomly from the same default configuration space. To reproduce our results, run the command with 100 different random seeds and then bootstrap to estimate the best error so far. 51 | 52 | If neither --llm nor --random_hparams is specified, a training run is performed with hyperparameters specified by the other arguments. 53 | 54 | ### Trying different architectures 55 | 56 | To try different architectures, modify the --arch argument when running the script. The currently supported architectures are: 57 | 58 | resnet9: A small ResNet architecture with 9 layers 59 | 60 | vit: A Vision Transformer (ViT) architecture 61 | 62 | Example usage: 63 | ``` 64 | python train.py --arch resnet9 65 | python train.py --arch vit 66 | ``` 67 | 68 | ### Results 69 | 70 | The results of the hyperparameter search will be saved in the specified save_dir as JSON files. 
The files will contain the hyperparameters and corresponding evaluation metrics (loss and accuracy) for each trial. 71 | 72 | ### Adapting to other Datasets and Architectures 73 | To provide more specific details on the dataset and architecture to potentially improve tuning performance, you can modify the initial_prompt in the main function of train.py. 74 | 75 | For example, you can add information about the CIFAR-10 dataset, such as the number of classes, image dimensions, and any data augmentation techniques used. Additionally, you can provide more details about the chosen architecture, such as the number of layers, hidden dimensions, or attention heads for the ViT. 76 | 77 | By providing more context about the dataset and architecture in the prompt, the LLM may be able to generate more informed hyperparameter suggestions. 78 | 79 | ## Toy experiments 80 | 81 | You can find code we use to run 2D experiments in `run_toy_fns.ipynb`. 82 | 83 | ## Cite 84 | If you find our work useful, please cite as: 85 | ``` 86 | @article{zhang2023using, 87 | title={Using Large Language Models for Hyperparameter Optimization}, 88 | author={Zhang, Michael R and Desai, Nishkrit and Bae, Juhan and Lorraine, Jonathan and Ba, Jimmy}, 89 | journal={arXiv preprint arXiv:2312.04528}, 90 | year={2023} 91 | } 92 | ``` -------------------------------------------------------------------------------- /cifar/pipeline.py: -------------------------------------------------------------------------------- 1 | 2 | import copy 3 | import math 4 | from typing import List, Optional 5 | 6 | import datasets 7 | import numpy as np 8 | import torch 9 | import torchvision 10 | from torch import nn 11 | 12 | 13 | class Mul(nn.Module): 14 | def __init__(self, weight: float) -> None: 15 | super().__init__() 16 | self.weight = weight 17 | 18 | def forward(self, x: torch.Tensor) -> torch.Tensor: 19 | return x * self.weight 20 | 21 | 22 | class Flatten(nn.Module): 23 | def forward(self, x: torch.Tensor) -> 
torch.Tensor: 24 | return x.view(x.size(0), -1) 25 | 26 | 27 | class Residual(nn.Module): 28 | def __init__(self, module: nn.Module) -> None: 29 | super().__init__() 30 | self.module = module 31 | 32 | def forward(self, x: torch.Tensor) -> torch.Tensor: 33 | return x + self.module(x) 34 | 35 | 36 | def construct_resnet9() -> nn.Module: 37 | # ResNet-9 architecture from https://github.com/MadryLab/trak/blob/main/examples/cifar_quickstart.ipynb. 38 | def conv_bn( 39 | channels_in: int, 40 | channels_out: int, 41 | kernel_size: int = 3, 42 | stride: int = 1, 43 | padding: int = 1, 44 | groups: int = 1, 45 | ) -> nn.Module: 46 | return torch.nn.Sequential( 47 | torch.nn.Conv2d( 48 | channels_in, 49 | channels_out, 50 | kernel_size=kernel_size, 51 | stride=stride, 52 | padding=padding, 53 | groups=groups, 54 | bias=False, 55 | ), 56 | torch.nn.BatchNorm2d(channels_out), 57 | torch.nn.ReLU(), 58 | ) 59 | 60 | model = torch.nn.Sequential( 61 | conv_bn(3, 64, kernel_size=3, stride=1, padding=1), 62 | conv_bn(64, 128, kernel_size=5, stride=2, padding=2), 63 | Residual(torch.nn.Sequential(conv_bn(128, 128), conv_bn(128, 128))), 64 | conv_bn(128, 256, kernel_size=3, stride=1, padding=1), 65 | torch.nn.MaxPool2d(2), 66 | Residual(torch.nn.Sequential(conv_bn(256, 256), conv_bn(256, 256))), 67 | conv_bn(256, 128, kernel_size=3, stride=1, padding=0), 68 | torch.nn.AdaptiveMaxPool2d((1, 1)), 69 | Flatten(), 70 | torch.nn.Linear(128, 10, bias=False), 71 | Mul(0.2), 72 | ) 73 | return model 74 | 75 | 76 | def get_cifar10_dataset( 77 | split: str, 78 | indices: List[int] = None, 79 | corrupt_percentage: Optional[float] = None, 80 | dataset_dir: str = "data/", 81 | ) -> datasets.Dataset: 82 | assert split in ["train", "eval_train", "valid"] 83 | 84 | normalize = torchvision.transforms.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.247, 0.243, 0.261)) 85 | if split == "train": 86 | transform_config = torchvision.transforms.Compose( 87 | [ 88 | torchvision.transforms.RandomCrop(32, 
padding=4), 89 | torchvision.transforms.RandomHorizontalFlip(), 90 | torchvision.transforms.ToTensor(), 91 | normalize, 92 | ] 93 | ) 94 | else: 95 | transform_config = torchvision.transforms.Compose( 96 | [ 97 | torchvision.transforms.ToTensor(), 98 | normalize, 99 | ] 100 | ) 101 | 102 | dataset = torchvision.datasets.CIFAR10( 103 | root=dataset_dir, 104 | download=True, 105 | train=split in ["train", "eval_train"], 106 | transform=transform_config, 107 | ) 108 | 109 | if corrupt_percentage is not None: 110 | if split == "valid": 111 | raise NotImplementedError("Performing corruption on the validation dataset is not supported.") 112 | assert 0.0 < corrupt_percentage <= 1.0 113 | num_corrupt = math.ceil(len(dataset) * corrupt_percentage) 114 | original_targets = np.array(copy.deepcopy(dataset.targets[:num_corrupt])) 115 | new_targets = torch.randint( 116 | 0, 117 | 10, 118 | size=original_targets[:num_corrupt].shape, 119 | generator=torch.Generator().manual_seed(0), 120 | ).numpy() 121 | offsets = torch.randint( 122 | 1, 123 | 9, 124 | size=new_targets[new_targets == original_targets].shape, 125 | generator=torch.Generator().manual_seed(0), 126 | ).numpy() 127 | new_targets[new_targets == original_targets] = (new_targets[new_targets == original_targets] + offsets) % 10 128 | assert (new_targets == original_targets).sum() == 0 129 | dataset.targets[:num_corrupt] = list(new_targets) 130 | 131 | if indices is not None: 132 | dataset = torch.utils.data.Subset(dataset, indices) 133 | 134 | 135 | return dataset -------------------------------------------------------------------------------- /cifar/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | import time 5 | from typing import Tuple 6 | 7 | import numpy as np 8 | import torch 9 | import torch.nn.functional as F 10 | from torch import nn 11 | from torch.optim import lr_scheduler 12 | from torch.utils import data 13 | 
import json 14 | from openai import OpenAI 15 | 16 | from cifar.pipeline import construct_resnet9, get_cifar10_dataset 17 | from cifar.vit import ViT 18 | 19 | DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu") 20 | 21 | class LLMHyperparameterTuner: 22 | """A class for tuning hyperparameters using a large language model (LLM). 23 | 24 | This implementation uses the OpenAI API to interact with the LLM and uses 25 | the compressed format prompting approach from "Using Large Language Models 26 | for Hyperparameter Optimization". 27 | """ 28 | def __init__( 29 | self, 30 | initial_prompt: str, 31 | model: str = 'gpt-4-1106-preview', 32 | temperature: float = 0.0, 33 | max_tokens: int = 1000, 34 | frequency_penalty: float = 0.0, 35 | seed: int = 0, 36 | round_digits: int = 4, 37 | ): 38 | """Initialize the LLM hyperparameter tuner. 39 | 40 | Args: 41 | initial_prompt: The initial prompt to use, describing the hyperparameter search space. 42 | model: The LLM model to use for tuning hyperparameters. 43 | temperature: The temperature parameter for sampling from the LLM. 44 | max_tokens: The maximum number of tokens to generate in each response. 45 | frequency_penalty: The frequency penalty parameter for sampling from the LLM. 46 | seed: The seed to use for sampling from the LLM. 47 | round_digits: The number of decimal digits used when formatting error rates and losses (in scientific notation) in the prompt. 
48 | """ 49 | self.initial_prompt = initial_prompt 50 | self.model = model 51 | self.temperature = temperature 52 | self.max_tokens = max_tokens 53 | self.frequency_penalty = frequency_penalty 54 | self.seed = seed 55 | self.round_digits = round_digits 56 | self.configs = [] # Store history of configurations and their outcomes 57 | self.reset_messages() 58 | 59 | def reset_messages(self): 60 | """Reset the messages to the initial system message.""" 61 | self.messages = [{"role": "system", "content": "You are a machine learning expert."}] 62 | 63 | def add_message(self, role, content): 64 | """Add a message to the conversation log.""" 65 | self.messages.append({"role": role, "content": content}) 66 | 67 | def generate_prompt(self, error=None): 68 | """Generate the prompt for the next interaction with the LLM.""" 69 | if len(self.configs) > 0: 70 | prompt = self.initial_prompt + "\nThis is what has been done so far:\n" 71 | else: 72 | prompt = self.initial_prompt + "\n" 73 | for i, (config, error_rate, loss) in enumerate(self.configs): 74 | prompt += f"Config {i+1}: {config} Error Rate: {error_rate:.{self.round_digits}e}, Loss: {loss:.{self.round_digits}e}\n" 75 | if error: 76 | prompt += f"We got the following error message with the previous proposal: {error}\n" 77 | prompt += "Provide a config in the same JSON format." 
78 | return prompt 79 | 80 | def parse_response(self, response): 81 | hyperparameters_text = response.choices[0].message.content.strip() 82 | try: 83 | hyperparameters = json.loads(hyperparameters_text) 84 | return hyperparameters 85 | except json.JSONDecodeError as e: 86 | raise ValueError(f"Failed to parse JSON response: {e}") 87 | 88 | def suggest_hyperparameters(self, max_retries=2, training_exception=None): 89 | for attempt in range(max_retries): 90 | self.reset_messages() 91 | prompt = self.generate_prompt(error=training_exception) 92 | self.add_message("user", prompt) # Log the user prompt 93 | try: 94 | client = OpenAI() 95 | print('sending messages:', len(self.messages)) 96 | print(self.messages[-1]['content']) 97 | response = client.chat.completions.create( 98 | model=self.model, 99 | messages=self.messages, 100 | temperature=self.temperature, 101 | max_tokens=self.max_tokens, 102 | frequency_penalty=self.frequency_penalty, 103 | seed=self.seed, 104 | response_format={"type": "json_object"}, 105 | ) 106 | hyperparameters = self.parse_response(response) 107 | self.add_message("assistant", response.choices[0].message.content) # Log the parsed response 108 | return hyperparameters 109 | except Exception as e: 110 | print(e) 111 | training_exception = str(e) 112 | if attempt < max_retries - 1: 113 | print(f"Attempt {attempt + 1} failed: {e}. 
Retrying...") 114 | time.sleep(30) 115 | else: 116 | raise Exception(f"Failed to call LLM after {max_retries} attempts: {e}") 117 | 118 | def update_configs(self, config, validation_error, validation_loss): 119 | self.configs.append((config, validation_error, validation_loss)) 120 | 121 | 122 | def llm_hyperparameter_search(tuner, rounds=10): 123 | """Perform hyperparameter search using the LLM tuner.""" 124 | results = [] 125 | training_exception = None 126 | for i in range(rounds): 127 | config = None  # reset each round so the except branch never reads an undefined or stale config 128 | try: 129 | config = tuner.suggest_hyperparameters(training_exception=training_exception) 130 | validation_loss, validation_error = train_and_evaluate(config) 131 | tuner.update_configs(config, validation_error, validation_loss) 132 | # save and print results 133 | results.append({ 134 | "config": config, 135 | "accuracy": 1 - validation_error, 136 | "loss": validation_loss 137 | }) 138 | print(f"Round {i+1}: Validation Error = {validation_error}, Validation Loss = {validation_loss}") 139 | print(f"Suggested Config: {config}") 140 | training_exception = None 141 | except Exception as e: 142 | print(f"Error during training: {e}") 143 | print("Continuing to the next round...") 144 | # When an error occurs, append None for accuracy and loss 145 | results.append({ 146 | "config": config, 147 | "accuracy": None, 148 | "loss": None 149 | }) 150 | training_exception = str(e) 151 | return results 152 | 153 | 154 | def parse_args(): 155 | parser = argparse.ArgumentParser(description="Train a ResNet-9 or ViT model on the CIFAR-10 dataset, optionally with random or LLM-based hyperparameter search.") 156 | tuner_group = parser.add_mutually_exclusive_group() 157 | # method (random, LLM) for tuning hyperparameters. If none specified, runs a single training run with what is given from argparse. 
158 | tuner_group.add_argument('--random_hparams', action='store_true', help='Randomly sample hyperparameters', default=False) 159 | tuner_group.add_argument('--llm', action='store_true', help='Use LLM for hyperparameter search', default=False) 160 | # llm tuning specific hyperparams 161 | parser.add_argument('--rounds', type=int, default=10, help='Number of times we interact with the LLM to get hyperparameters') 162 | parser.add_argument('--search_space', type=str, default='constrained', help='Search space for LLM tuning') 163 | # general hparams 164 | parser.add_argument("--dataset_dir", type=str, default="./data", help="A folder to download or load CIFAR-10 dataset.") 165 | parser.add_argument("--arch", type=str, default="resnet9", help="Architecture to use for training.") 166 | parser.add_argument("--save_dir", type=str, default="./out", help="A folder to save the results") 167 | parser.add_argument("--train_batch_size", type=int, default=512, help="Batch size for the training dataloader.") 168 | parser.add_argument("--eval_batch_size", type=int, default=1024, help="Batch size for the evaluation dataloader.") 169 | parser.add_argument("--learning_rate", type=float, default=0.01, help="Initial learning rate to train the model.") 170 | parser.add_argument("--weight_decay", type=float, default=0.0001, help="Weight decay to train the model.") 171 | parser.add_argument("--label_smoothing", type=float, default=0, help="Label smoothing to train the model.") 172 | parser.add_argument("--optimizer", type=str, default="SGD", help="Optimizer to use for training") 173 | parser.add_argument("--num_train_epochs", type=int, default=20, help="Total number of epochs to train the model.") 174 | parser.add_argument("--seed", type=int, default=1004, help="A seed for reproducible training pipeline.") 175 | args = parser.parse_args() 176 | return args 177 | 178 | def train( 179 | dataset: data.Dataset, 180 | batch_size: int, 181 | num_train_epochs: int, 182 | learning_rate: float, 
183 | weight_decay: float, 184 | label_smoothing: float, 185 | optimizer: str = "SGD", 186 | arch: str = "resnet9", 187 | hyps: dict = None, 188 | ) -> nn.Module: 189 | """Train a model on the given dataset with the specified hyperparameters.""" 190 | train_dataloader = data.DataLoader( 191 | dataset=dataset, 192 | batch_size=batch_size, 193 | shuffle=True, 194 | drop_last=True, 195 | ) 196 | if arch == "resnet9": 197 | model = construct_resnet9() 198 | print('Using ResNet-9 model') 199 | elif arch == "vit": 200 | # can optionally specify hyperparameters for the ViT model, otherwise defaults are used 201 | if hyps is None or 'patch_size' not in hyps: 202 | model = ViT( 203 | image_size = 32, 204 | patch_size = 4, 205 | num_classes = 10, 206 | dim = 512, 207 | depth = 6, 208 | heads = 8, 209 | mlp_dim = 512, 210 | dropout = 0.1, 211 | emb_dropout = 0.1 212 | ) 213 | else: 214 | model = ViT( 215 | image_size = 32, 216 | patch_size = hyps['patch_size'], 217 | num_classes = 10, 218 | dim = hyps['dim'], 219 | depth = hyps['depth'], 220 | heads = hyps['heads'], 221 | mlp_dim = hyps['mlp_dim'], 222 | dropout = hyps['dropout'], 223 | emb_dropout = hyps['emb_dropout'], 224 | ) 225 | print('Using ViT model') 226 | model = model.to(DEVICE) 227 | if optimizer.lower() == "sgd": 228 | optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay) 229 | elif optimizer.lower() == "adam": 230 | optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay) 231 | else: 232 | raise ValueError("Invalid optimizer") 233 | 234 | iters_per_epoch = len(train_dataloader) 235 | lr_peak_epoch = num_train_epochs // 5 236 | # Linearly increase learning rate to peak at lr_peak_epoch, then decrease to 0 237 | lr_schedule = np.interp( 238 | np.arange((num_train_epochs + 1) * iters_per_epoch), 239 | [0, lr_peak_epoch * iters_per_epoch, num_train_epochs * iters_per_epoch], 240 | [0, 1, 0], 241 | ) 242 | scheduler = 
lr_scheduler.LambdaLR(optimizer, lr_schedule.__getitem__) 243 | 244 | start_time = time.time() 245 | model.train() 246 | for epoch in range(num_train_epochs): 247 | total_loss = 0.0 248 | for batch in train_dataloader: 249 | model.zero_grad() 250 | inputs, labels = batch 251 | inputs, labels = inputs.to(DEVICE), labels.to(DEVICE) 252 | outputs = model(inputs) 253 | loss = F.cross_entropy(outputs, labels, label_smoothing=label_smoothing) 254 | loss.backward() 255 | optimizer.step() 256 | scheduler.step() 257 | total_loss += loss.detach().float() 258 | logging.info(f"Epoch {epoch + 1} - Averaged Loss: {total_loss / len(train_dataloader)}") 259 | end_time = time.time() 260 | elapsed_time = end_time - start_time 261 | logging.info(f"Completed training in {elapsed_time:.2f} seconds.") 262 | return model 263 | 264 | 265 | def evaluate(model: nn.Module, dataset: data.Dataset, batch_size: int) -> Tuple[float, float]: 266 | dataloader = data.DataLoader( 267 | dataset=dataset, 268 | batch_size=batch_size, 269 | shuffle=False, 270 | drop_last=False, 271 | ) 272 | 273 | model.eval() 274 | total_loss, total_correct = 0.0, 0 275 | for batch in dataloader: 276 | with torch.no_grad(): 277 | inputs, labels = batch 278 | inputs, labels = inputs.to(DEVICE), labels.to(DEVICE) 279 | outputs = model(inputs) 280 | loss = F.cross_entropy(outputs, labels, reduction="sum") 281 | total_loss += loss.detach().float() 282 | total_correct += outputs.detach().argmax(1).eq(labels).sum() 283 | 284 | return total_loss.item() / len(dataloader.dataset), total_correct.item() / len(dataloader.dataset) 285 | 286 | def sample_hyperparameters(model, arch_params=False): 287 | hyp = {} 288 | # Sampling learning_rate and weight_decay in log space 289 | learning_rate = 10**np.random.uniform(-4, -1) 290 | weight_decay = 10**np.random.uniform(-5, -1) 291 | 292 | batch_sizes = [32, 64, 128, 256, 512] 293 | batch_size = np.random.choice(batch_sizes) 294 | batch_size = int(batch_size) 295 | 296 | # Sampling label smoothing 
linearly 297 | label_smoothing = np.random.uniform(0, 0.2) 298 | optimizer = np.random.choice(['SGD', 'Adam']) 299 | hyp.update({ 300 | "learning_rate": learning_rate, 301 | "weight_decay": weight_decay, 302 | "train_batch_size": batch_size, 303 | "label_smoothing": label_smoothing, 304 | "optimizer": optimizer 305 | }) 306 | if model == 'vit' and arch_params: 307 | hyp['patch_size'] = np.random.choice([2, 4, 8]) 308 | hyp['dim'] = np.random.choice([128, 256, 512]) 309 | hyp['depth'] = np.random.choice([4, 6, 8, 12]) 310 | hyp['heads'] = np.random.choice([4, 8, 16]) 311 | hyp['mlp_dim'] = np.random.choice([256, 512, 1024]) 312 | hyp['dropout'] = np.random.uniform(0, 0.5) 313 | hyp['emb_dropout'] = np.random.uniform(0, 0.5) 314 | return hyp 315 | 316 | def save_results(hyperparams, results, trial_number, filename): 317 | """Save hyperparameters and results to a JSON file.""" 318 | if not os.path.exists(filename): 319 | os.makedirs(os.path.dirname(filename), exist_ok=True) 320 | data = {'hyperparameters': hyperparams, 'results': results} 321 | with open(filename, 'w') as f: 322 | json.dump(data, f, indent=4) 323 | 324 | def main(args): 325 | # performs training and evaluation with given hyperparameters 326 | if args.seed is not None: 327 | torch.manual_seed(args.seed) 328 | np.random.seed(args.seed) 329 | 330 | save_filename = os.path.join(args.save_dir, f"results_trial_{args.seed}.json") 331 | print("Checking if results already saved in ", save_filename) 332 | if args.random_hparams: 333 | print('Randomly sampling hyperparameters') 334 | # check if results already saved 335 | if os.path.exists(save_filename): 336 | print('results already saved') 337 | return 338 | # randomly sample hyperparameters 339 | hyps = sample_hyperparameters(args.arch) 340 | else: 341 | lr = args.learning_rate 342 | wd = args.weight_decay 343 | bs = args.train_batch_size 344 | ls = args.label_smoothing 345 | optimizer = args.optimizer 346 | hyps = { 347 | "learning_rate": lr, 348 | 
"weight_decay": wd, 349 | "train_batch_size": bs, 350 | "label_smoothing": ls, 351 | "optimizer": optimizer 352 | } 353 | print("Hyperparameters:") 354 | for k, v in hyps.items(): 355 | print(f"{k}: {v}") 356 | logging.basicConfig(level=logging.INFO) 357 | logger = logging.getLogger() 358 | 359 | train_dataset = get_cifar10_dataset( 360 | split="train", dataset_dir=args.dataset_dir, 361 | ) 362 | model = train( 363 | dataset=train_dataset, 364 | batch_size=hyps['train_batch_size'], 365 | num_train_epochs=args.num_train_epochs, 366 | learning_rate=hyps['learning_rate'], 367 | weight_decay=hyps['weight_decay'], 368 | label_smoothing=hyps['label_smoothing'], 369 | optimizer=hyps['optimizer'], 370 | arch=args.arch, 371 | hyps=hyps, 372 | ) 373 | 374 | eval_train_dataset = get_cifar10_dataset(split="eval_train", dataset_dir=args.dataset_dir) 375 | train_loss, train_acc = evaluate(model=model, dataset=eval_train_dataset, batch_size=args.eval_batch_size) 376 | logger.info(f"Train loss: {train_loss}, Train Accuracy: {train_acc}") 377 | 378 | eval_dataset = get_cifar10_dataset(split="valid", dataset_dir=args.dataset_dir) 379 | eval_loss, eval_acc = evaluate(model=model, dataset=eval_dataset, batch_size=args.eval_batch_size) 380 | logger.info(f"Evaluation loss: {eval_loss}, Evaluation Accuracy: {eval_acc}") 381 | 382 | if args.random_hparams: 383 | # save hyperparameters and results 384 | save_results(hyps, {'train_loss': train_loss, 'train_acc': train_acc, 'eval_loss': eval_loss, 'eval_acc': eval_acc}, args.seed, save_filename) 385 | 386 | return train_loss, train_acc, eval_loss, eval_acc 387 | 388 | def train_and_evaluate(hyperparameters): 389 | args = parse_args() 390 | args.learning_rate = hyperparameters["learning_rate"] 391 | args.weight_decay = hyperparameters["weight_decay"] 392 | args.train_batch_size = hyperparameters["train_batch_size"] 393 | args.label_smoothing = hyperparameters["label_smoothing"] 394 | args.optimizer = hyperparameters["optimizer"] 395 | 
train_loss, train_acc, eval_loss, eval_acc = main(args) 396 | eval_error = 1 - eval_acc 397 | return eval_loss, eval_error 398 | 399 | prompt_end = """You will get the validation error rate and loss before you need to specify the next configuration. The goal is to find the configuration that minimizes the error rate with the given budget, so you should explore different parts of the search space if the loss is not changing. Provide a config in JSON format. Do not put new lines or any extra characters in the response, only provide the config. Example config: 400 | { 401 | "optimizer": a 402 | "learning_rate": b 403 | "training batch size": c 404 | "weight_decay": d 405 | "label_smoothing": e 406 | } 407 | """ 408 | 409 | initial_prompt = """You are helping tune hyperparameters for a neural network. This is our hyperparameter search space: 410 | { 411 | "optimizer": must be ["adam", "sgd"] 412 | "learning_rate": positive float 413 | "train_batch_size": positive integer 414 | "weight_decay": nonnegative float 415 | "label_smoothing": nonnegative float 416 | }""" + prompt_end 417 | 418 | # same constraints as random search 419 | initial_prompt_constrained = """You are helping tune hyperparameters for a neural network. 
This is our hyperparameter search space: 420 | { 421 | "optimizer": must be ["adam", "sgd"] 422 | "learning_rate": between 1e-4 and 1e-1 423 | "train_batch_size": 32, 64, 128, 256, 512 424 | "weight_decay": between 1e-5 and 1e-1 425 | "label_smoothing": between 0 and 0.2 426 | }""" + prompt_end 427 | 428 | 429 | if __name__ == "__main__": 430 | args = parse_args() 431 | np.random.seed(args.seed) 432 | torch.manual_seed(args.seed) 433 | 434 | # performs hyperparam search with GPT-4, check if already run 435 | if args.llm: 436 | if not os.path.exists(args.save_dir): 437 | os.makedirs(args.save_dir) 438 | save_filename = os.path.join(args.save_dir, f"llmtuner_trial_{args.seed}.json") 439 | if os.path.exists(save_filename): 440 | print('results already saved') 441 | else: 442 | if args.search_space == 'constrained': 443 | tuner = LLMHyperparameterTuner(initial_prompt_constrained) 444 | elif args.search_space == 'unconstrained': 445 | tuner = LLMHyperparameterTuner(initial_prompt) 446 | results = llm_hyperparameter_search(tuner, rounds=args.rounds) 447 | results_dict = { 448 | "results": results, 449 | "search_space": args.search_space, 450 | "rounds": args.rounds 451 | } 452 | with open(save_filename, 'w') as f: 453 | json.dump(results_dict, f, indent=4) 454 | # performs a single training run with given hyperparameters 455 | else: 456 | main(args) 457 | -------------------------------------------------------------------------------- /cifar/vit.py: -------------------------------------------------------------------------------- 1 | 2 | # https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py 3 | 4 | import torch 5 | from torch import nn 6 | 7 | from einops import rearrange, repeat 8 | from einops.layers.torch import Rearrange 9 | 10 | # helpers 11 | 12 | def pair(t): 13 | return t if isinstance(t, tuple) else (t, t) 14 | 15 | # classes 16 | 17 | class PreNorm(nn.Module): 18 | def __init__(self, dim, fn): 19 | super().__init__() 20 | self.norm = 
nn.LayerNorm(dim) 21 | self.fn = fn 22 | def forward(self, x, **kwargs): 23 | return self.fn(self.norm(x), **kwargs) 24 | 25 | class FeedForward(nn.Module): 26 | def __init__(self, dim, hidden_dim, dropout = 0.): 27 | super().__init__() 28 | self.net = nn.Sequential( 29 | nn.Linear(dim, hidden_dim), 30 | nn.GELU(), 31 | nn.Dropout(dropout), 32 | nn.Linear(hidden_dim, dim), 33 | nn.Dropout(dropout) 34 | ) 35 | def forward(self, x): 36 | return self.net(x) 37 | 38 | class Attention(nn.Module): 39 | def __init__(self, dim, heads = 8, dim_head = 64, dropout = 0.): 40 | super().__init__() 41 | inner_dim = dim_head * heads 42 | project_out = not (heads == 1 and dim_head == dim) 43 | 44 | self.heads = heads 45 | self.scale = dim_head ** -0.5 46 | 47 | self.attend = nn.Softmax(dim = -1) 48 | self.to_qkv = nn.Linear(dim, inner_dim * 3, bias = False) 49 | 50 | self.to_out = nn.Sequential( 51 | nn.Linear(inner_dim, dim), 52 | nn.Dropout(dropout) 53 | ) if project_out else nn.Identity() 54 | 55 | def forward(self, x): 56 | qkv = self.to_qkv(x).chunk(3, dim = -1) 57 | q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = self.heads), qkv) 58 | 59 | dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale 60 | 61 | attn = self.attend(dots) 62 | 63 | out = torch.matmul(attn, v) 64 | out = rearrange(out, 'b h n d -> b n (h d)') 65 | return self.to_out(out) 66 | 67 | class Transformer(nn.Module): 68 | def __init__(self, dim, depth, heads, dim_head, mlp_dim, dropout = 0.): 69 | super().__init__() 70 | self.layers = nn.ModuleList([]) 71 | for _ in range(depth): 72 | self.layers.append(nn.ModuleList([ 73 | PreNorm(dim, Attention(dim, heads = heads, dim_head = dim_head, dropout = dropout)), 74 | PreNorm(dim, FeedForward(dim, mlp_dim, dropout = dropout)) 75 | ])) 76 | def forward(self, x): 77 | for attn, ff in self.layers: 78 | x = attn(x) + x 79 | x = ff(x) + x 80 | return x 81 | 82 | class ViT(nn.Module): 83 | def __init__(self, *, image_size, patch_size, num_classes, dim, 
depth, heads, mlp_dim, pool = 'cls', channels = 3, dim_head = 64, dropout = 0., emb_dropout = 0.): 84 | super().__init__() 85 | image_height, image_width = pair(image_size) 86 | patch_height, patch_width = pair(patch_size) 87 | 88 | assert image_height % patch_height == 0 and image_width % patch_width == 0, 'Image dimensions must be divisible by the patch size.' 89 | 90 | num_patches = (image_height // patch_height) * (image_width // patch_width) 91 | patch_dim = channels * patch_height * patch_width 92 | assert pool in {'cls', 'mean'}, 'pool type must be either cls (cls token) or mean (mean pooling)' 93 | 94 | self.to_patch_embedding = nn.Sequential( 95 | Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1 = patch_height, p2 = patch_width), 96 | nn.Linear(patch_dim, dim), 97 | ) 98 | 99 | self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, dim)) 100 | self.cls_token = nn.Parameter(torch.randn(1, 1, dim)) 101 | self.dropout = nn.Dropout(emb_dropout) 102 | 103 | self.transformer = Transformer(dim, depth, heads, dim_head, mlp_dim, dropout) 104 | 105 | self.pool = pool 106 | self.to_latent = nn.Identity() 107 | 108 | self.mlp_head = nn.Sequential( 109 | nn.LayerNorm(dim), 110 | nn.Linear(dim, num_classes) 111 | ) 112 | 113 | def forward(self, img): 114 | x = self.to_patch_embedding(img) 115 | b, n, _ = x.shape 116 | 117 | cls_tokens = repeat(self.cls_token, '() n d -> b n d', b = b) 118 | x = torch.cat((cls_tokens, x), dim=1) 119 | x += self.pos_embedding[:, :(n + 1)] 120 | x = self.dropout(x) 121 | 122 | x = self.transformer(x) 123 | 124 | x = x.mean(dim = 1) if self.pool == 'mean' else x[:, 0] 125 | 126 | x = self.to_latent(x) 127 | return self.mlp_head(x) 128 | 129 | -------------------------------------------------------------------------------- /gpt4_prompt2/losses.pkl: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/michaelrzhang/LLM-HyperOpt/c8dd022089f0f7e7c3da7a75bd53e29bddfea438/gpt4_prompt2/losses.pkl -------------------------------------------------------------------------------- /gpt4_prompt2/messages.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/michaelrzhang/LLM-HyperOpt/c8dd022089f0f7e7c3da7a75bd53e29bddfea438/gpt4_prompt2/messages.pkl -------------------------------------------------------------------------------- /gpt4_prompt2/trajectories.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/michaelrzhang/LLM-HyperOpt/c8dd022089f0f7e7c3da7a75bd53e29bddfea438/gpt4_prompt2/trajectories.pkl -------------------------------------------------------------------------------- /llm_hpo.py: -------------------------------------------------------------------------------- 1 | 2 | import json 3 | import openai 4 | import re 5 | 6 | class LLMOptimizer: 7 | def __init__(self, is_expert="generic", model="gpt-3.5-turbo", 8 | temperature = 0.0, 9 | max_tokens = 600, 10 | frequency_penalty = 0, 11 | use_cot = False, 12 | ): 13 | self.model = model 14 | self.is_expert = is_expert 15 | self.temperature = temperature 16 | self.max_tokens = max_tokens 17 | self.frequency_penalty = frequency_penalty 18 | self.use_cot = use_cot 19 | self.messages = [] 20 | 21 | # Initialize the conversation with the given system prompt 22 | self.initial_config() 23 | 24 | def initial_config(self): 25 | """This expert prompt does not seem to make too much of a difference (see Appendix in paper), but conditioning on good performance is generally a good idea.""" 26 | if self.is_expert == "generic": 27 | message = {"role":"system", "content": "You are a machine learning expert."} 28 | self.messages.append(message) 29 | 30 | def call_llm(self, max_retries=2): 31 | tries = 0 32 | while tries < max_retries: 33 | try: 34 | response = 
openai.ChatCompletion.create( 35 | model=self.model, 36 | messages=self.messages, 37 | temperature=self.temperature, 38 | max_tokens=self.max_tokens, 39 | frequency_penalty=self.frequency_penalty 40 | ) 41 | # make sure we have the right version 42 | if self.model == "gpt-4": 43 | assert response.model == 'gpt-4-0613' 44 | elif self.model == "gpt-3.5-turbo": 45 | assert response.model == "gpt-3.5-turbo-0613" 46 | config = response["choices"][0]["message"]["content"] 47 | self.messages.append({"role":"assistant", "content": config}) 48 | return config 49 | except Exception as e: 50 | tries += 1 51 | print(f"LLM call failed (attempt {tries} of {max_retries}): {e}") 52 | import time; time.sleep(30) # back off before retrying 53 | # `response` is not printed here: it is unbound when the API call itself raised 54 | raise Exception("Failed to call LLM, max retries exceeded") 55 | 56 | def _parse_raw_message(self, raw_message): 57 | # Parse a ```json fenced block in the raw message into a dict with keys "x1" and "x2" 58 | json_match = re.search(r'```json\n(.*)\n```', raw_message, re.DOTALL) 59 | if json_match is None: 60 | raise Exception("Failed to parse raw message") 61 | params = json.loads(json_match.group(1).strip()) 62 | assert isinstance(params, dict) 63 | assert "x1" in params and "x2" in params 64 | 65 | return params 66 | 67 | def parse_message(self, raw_message): 68 | if "Output: " in raw_message: 69 | raw_message = raw_message.split("Output: ")[1] 70 | try: 71 | params = json.loads(raw_message) 72 | params = params["x"] 73 | except Exception: 74 | print("***Raising exception...") 75 | print(raw_message) 76 | raise Exception("Failed to parse message") 77 | return params 78 | 79 | 80 | def ask(self, prompt): 81 | self.messages.append({"role":"user", "content": prompt}) 82 | raw_message = self.call_llm() 83 | params = self.parse_message(raw_message) 84 | return params 85 | 86 | def get_current_messages(self): 87 | return self.messages -------------------------------------------------------------------------------- /load_toy_trajectories.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 7, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import os\n", 10 | "import pickle\n", 11 | "import numpy as np\n", 12 | "import matplotlib.pyplot as plt" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 9, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "def load_results(load_path):\n", 22 | " print(\"Loading results from\", load_path)\n", 23 | " trajectory_path = os.path.join(load_path, \"trajectories.pkl\")\n", 24 | " losses_path = os.path.join(load_path, \"losses.pkl\")\n", 25 | " messages_path = os.path.join(load_path, \"messages.pkl\")\n", 26 | " with open(trajectory_path, \"rb\") as f:\n", 27 | " trajectories = pickle.load(f)\n", 28 | " with open(losses_path, \"rb\") as f:\n", 29 | " losses = pickle.load(f)\n", 30 | " with open(messages_path, \"rb\") as f:\n", 31 | " messages = pickle.load(f)\n", 32 | " print(\"successfully loaded\")\n", 33 | " for t, results in losses.items():\n", 34 | " results = np.array(results)\n", 35 | " assert results.shape[0] == 3\n", 36 | " min_losses = np.min(results, axis=1)\n", 37 | " print(f\"t={t}, min_losses={min_losses}\")\n", 38 | " return trajectories, losses, messages" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 11, 44 | "metadata": {}, 45 | "outputs": [ 46 | { 47 | "name": "stdout", 48 | "output_type": "stream", 49 | "text": [ 50 | "Loading results from gpt4_prompt2\n", 51 | "successfully loaded\n", 52 | "t=rosenbrock, min_losses=[0. 1. 0.]\n", 53 | "t=branin, min_losses=[1.9431494 3.91125913 0.49791071]\n", 54 | "t=himmelblau, min_losses=[8.125 8.125 8.125]\n", 55 | "t=ackley, min_losses=[0. 0. 
0.]\n", 56 | "t=quadratic2d, min_losses=[ 1.445 0.365 28.145]\n", 57 | "t=quadratic2d_10, min_losses=[51.4475 51.14046875 0.6475 ]\n", 58 | "t=shifted_rosenbrock, min_losses=[ 0.528301 163.859581 15.36746506]\n", 59 | "t=shifted_branin, min_losses=[3.25633331 5.58170296 9.89968188]\n", 60 | "t=shifted_himmelblau, min_losses=[16.99008562 47.06711762 47.06711762]\n", 61 | "t=shifted_ackley, min_losses=[1.77061516 1.86148837 4.47514355]\n", 62 | "dict_keys(['rosenbrock', 'branin', 'himmelblau', 'ackley', 'quadratic2d', 'quadratic2d_10', 'shifted_rosenbrock', 'shifted_branin', 'shifted_himmelblau', 'shifted_ackley'])\n" 63 | ] 64 | } 65 | ], 66 | "source": [ 67 | "trajectories, losses, messages = load_results(\"gpt4_prompt2\")\n", 68 | "print(trajectories.keys())" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 13, 74 | "metadata": {}, 75 | "outputs": [ 76 | { 77 | "data": { 78 | "text/plain": [ 79 | "[[0, 0],\n", 80 | " [-5, 15],\n", 81 | " [-5, 15],\n", 82 | " [0, 10],\n", 83 | " [-5, 15],\n", 84 | " [-5, 15],\n", 85 | " [10, 0],\n", 86 | " [10, 5],\n", 87 | " [10, 10],\n", 88 | " [10, 3]]" 89 | ] 90 | }, 91 | "execution_count": 13, 92 | "metadata": {}, 93 | "output_type": "execute_result" 94 | } 95 | ], 96 | "source": [ 97 | "trajectories[\"branin\"][0]" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 14, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "data": { 107 | "text/plain": [ 108 | "[{'role': 'system', 'content': 'You are a machine learning expert.'},\n", 109 | " {'role': 'user',\n", 110 | " 'content': 'You are helping tune hyperparameters to minimize loss. x1 must be in range (-5, 10). x2 must be in range (0, 15). I want you to predict values that minimize the loss of the function; I will tell you the value of the function before you try again. Do not put new lines or extra characters in your response. We have a total of 10 evaluations. 
Format your output with json as follows: {\"x\": [x1, x2]}\\n'},\n", 111 | " {'role': 'assistant', 'content': '{\"x\": [0, 0]}'},\n", 112 | " {'role': 'user',\n", 113 | " 'content': 'Loss: 5.560e+01. Format your next output as before.'},\n", 114 | " {'role': 'assistant', 'content': '{\"x\": [-5, 15]}'},\n", 115 | " {'role': 'user',\n", 116 | " 'content': 'Loss: 1.751e+01. Write two lines as follows:\\nAnalysis:(up to a few sentences describing what worked so far and what to choose next)\\nOutput:{json dict}'},\n", 117 | " {'role': 'assistant',\n", 118 | " 'content': 'Analysis: The loss decreased significantly when x1 was at its minimum and x2 was at its maximum. This suggests that the function may be minimized when x1 is less and x2 is more. Let\\'s try to decrease x1 further and increase x2 further within their respective ranges.\\n\\nOutput: {\"x\": [-5, 15]}'},\n", 119 | " {'role': 'user',\n", 120 | " 'content': 'Loss: 1.751e+01. Write two lines as follows:\\nAnalysis:(up to a few sentences describing what worked so far and what to choose next)\\nOutput:{json dict}'},\n", 121 | " {'role': 'assistant',\n", 122 | " 'content': 'Analysis: The loss remained the same, indicating that we might have reached the optimal values for x1 and x2. However, to confirm this, let\\'s try a different combination by increasing x1 and decreasing x2.\\n\\nOutput: {\"x\": [0, 10]}'},\n", 123 | " {'role': 'user',\n", 124 | " 'content': 'Loss: 3.560e+01. Write two lines as follows:\\nAnalysis:(up to a few sentences describing what worked so far and what to choose next)\\nOutput:{json dict}'},\n", 125 | " {'role': 'assistant',\n", 126 | " 'content': 'Analysis: The loss increased when we increased x1 and decreased x2. This suggests that the function is indeed minimized when x1 is less and x2 is more. Let\\'s revert back to the previous values to confirm this.\\n\\nOutput: {\"x\": [-5, 15]}'},\n", 127 | " {'role': 'user',\n", 128 | " 'content': 'Loss: 1.751e+01. 
Write two lines as follows:\\nAnalysis:(up to a few sentences describing what worked so far and what to choose next)\\nOutput:{json dict}'},\n", 129 | " {'role': 'assistant',\n", 130 | " 'content': 'Analysis: The loss remained the same, confirming our hypothesis that the function is minimized when x1 is at its minimum and x2 is at its maximum. Let\\'s keep these values as they seem to provide the lowest loss.\\n\\nOutput: {\"x\": [-5, 15]}'},\n", 131 | " {'role': 'user',\n", 132 | " 'content': 'Loss: 1.751e+01. Write two lines as follows:\\nAnalysis:(up to a few sentences describing what worked so far and what to choose next)\\nOutput:{json dict}'},\n", 133 | " {'role': 'assistant',\n", 134 | " 'content': 'Analysis: The loss is consistent at 1.751e+01 when x1 is -5 and x2 is 15. This suggests that we have found the optimal values for x1 and x2. However, to ensure we are not stuck in a local minimum, let\\'s try a different combination.\\n\\nOutput: {\"x\": [10, 0]}'},\n", 135 | " {'role': 'user',\n", 136 | " 'content': 'Loss: 1.096e+01. Write two lines as follows:\\nAnalysis:(up to a few sentences describing what worked so far and what to choose next)\\nOutput:{json dict}'},\n", 137 | " {'role': 'assistant',\n", 138 | " 'content': 'Analysis: Surprisingly, the loss decreased when we set x1 to its maximum and x2 to its minimum. This suggests that the function might have multiple minima. Let\\'s continue exploring this direction by keeping x1 at its maximum and increasing x2 slightly.\\n\\nOutput: {\"x\": [10, 5]}'},\n", 139 | " {'role': 'user',\n", 140 | " 'content': 'Loss: 5.931e+00. Write two lines as follows:\\nAnalysis:(up to a few sentences describing what worked so far and what to choose next)\\nOutput:{json dict}'},\n", 141 | " {'role': 'assistant',\n", 142 | " 'content': 'Analysis: The loss decreased further when we increased x2 while keeping x1 at its maximum. 
This suggests that the function might be minimized when x1 is high and x2 is somewhere in the middle of its range. Let\\'s continue to increase x2.\\n\\nOutput: {\"x\": [10, 10]}'},\n", 143 | " {'role': 'user',\n", 144 | " 'content': 'Loss: 5.090e+01. Write two lines as follows:\\nAnalysis:(up to a few sentences describing what worked so far and what to choose next)\\nOutput:{json dict}'},\n", 145 | " {'role': 'assistant',\n", 146 | " 'content': 'Analysis: The loss increased significantly when we increased x2 further while keeping x1 at its maximum. This suggests that the function might be minimized when x1 is high and x2 is lower. Let\\'s decrease x2 slightly.\\n\\nOutput: {\"x\": [10, 3]}'}]" 147 | ] 148 | }, 149 | "execution_count": 14, 150 | "metadata": {}, 151 | "output_type": "execute_result" 152 | } 153 | ], 154 | "source": [ 155 | "messages[\"branin\"][0]" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [] 164 | } 165 | ], 166 | "metadata": { 167 | "kernelspec": { 168 | "display_name": "llm_hpo", 169 | "language": "python", 170 | "name": "llm-hpo" 171 | }, 172 | "language_info": { 173 | "codemirror_mode": { 174 | "name": "ipython", 175 | "version": 3 176 | }, 177 | "file_extension": ".py", 178 | "mimetype": "text/x-python", 179 | "name": "python", 180 | "nbconvert_exporter": "python", 181 | "pygments_lexer": "ipython3", 182 | "version": "3.9.12" 183 | } 184 | }, 185 | "nbformat": 4, 186 | "nbformat_minor": 2 187 | } 188 | -------------------------------------------------------------------------------- /run_toy_fns.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 2, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import openai\n", 10 | "from llm_hpo import LLMOptimizer\n", 11 | "from utils.toy_utils import *\n", 12 | "\n", 13 | "%load_ext 
autoreload\n", 14 | "%autoreload 2" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 3, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "# os.getenv returns None (it does not raise) when the variable is unset\n", 24 | "openai.api_key = os.getenv(\"OPENAI_API_KEY\")\n", 25 | "if not openai.api_key:\n", 26 | " #insert key here \n", 27 | " print(\"API key not found in environment variables\")\n", 28 | " openai.api_key = \"\"\n", 29 | " " 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 4, 35 | "metadata": {}, 36 | "outputs": [ 37 | { 38 | "name": "stdout", 39 | "output_type": "stream", 40 | "text": [ 41 | "dict_keys(['rosenbrock', 'branin', 'himmelblau', 'ackley', 'quadratic2d', 'quadratic2d_10', 'shifted_rosenbrock', 'shifted_branin', 'shifted_himmelblau', 'shifted_ackley'])\n" 42 | ] 43 | } 44 | ], 45 | "source": [ 46 | "print(named_toy_functions.keys())" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 5, 52 | "metadata": {}, 53 | "outputs": [ 54 | { 55 | "name": "stdout", 56 | "output_type": "stream", 57 | "text": [ 58 | "Iteration 0 out of 5\n", 59 | "Iteration 1 out of 5\n", 60 | "Iteration 2 out of 5\n", 61 | "Iteration 3 out of 5\n", 62 | "Iteration 4 out of 5\n" 63 | ] 64 | } 65 | ], 66 | "source": [ 67 | "# optimize a toy function (any key of named_toy_functions)\n", 68 | "name = \"quadratic2d\"\n", 69 | "t = named_toy_functions[name]\n", 70 | "search_budget = 5\n", 71 | "llm_model = \"gpt-3.5-turbo\"\n", 72 | "use_cot = True\n", 73 | "temperature = 0\n", 74 | "prompt_number = 2 \n", 75 | "trajectory, losses, messages = optimize_with_llm(t, search_budget=search_budget, llm_model=llm_model, cot=use_cot, temperature=temperature, prompt_num=prompt_number, verbose=False)" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 6, 81 | "metadata": {}, 82 | "outputs": [ 83 | { 84 | "data": { 85 | "image/png": "", 86 | "text/plain": [ 87 | "
" 88 | ] 89 | }, 90 | "metadata": {}, 91 | "output_type": "display_data" 92 | } 93 | ], 94 | "source": [ 95 | "plot_type = \"quadratic\" if \"quadratic\" in name else \"standard\"\n", 96 | "fig = plot_trajectory(t, trajectory, log_scale=False, plot_type=plot_type, func_name=name)" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [] 105 | } 106 | ], 107 | "metadata": { 108 | "kernelspec": { 109 | "display_name": "llm_hpo", 110 | "language": "python", 111 | "name": "llm-hpo" 112 | }, 113 | "language_info": { 114 | "codemirror_mode": { 115 | "name": "ipython", 116 | "version": 3 117 | }, 118 | "file_extension": ".py", 119 | "mimetype": "text/x-python", 120 | "name": "python", 121 | "nbconvert_exporter": "python", 122 | "pygments_lexer": "ipython3", 123 | "version": "3.9.12" 124 | } 125 | }, 126 | "nbformat": 4, 127 | "nbformat_minor": 2 128 | } 129 | -------------------------------------------------------------------------------- /utils/toy_utils.py: -------------------------------------------------------------------------------- 1 | import math 2 | import os 3 | from collections import namedtuple 4 | 5 | import matplotlib.pyplot as plt 6 | import numpy as np 7 | import openai 8 | 9 | from llm_hpo import LLMOptimizer 10 | 11 | 12 | def optimize_with_llm( 13 | toy_func, search_budget, llm_model, cot, temperature, prompt_num, verbose=False 14 | ): 15 | """Optimize a toy function using LLM.""" 16 | llm = LLMOptimizer(model=llm_model, temperature=temperature) 17 | trajectory = [] 18 | losses = [] 19 | for iteration in range(search_budget): 20 | print(f"Iteration {iteration} out of {search_budget}") 21 | if iteration == 0: 22 | prompt = BlackBoxPrompts.make_inital_prompt( 23 | toy_func.search_space, prompt_num=prompt_num, budget=search_budget 24 | ) 25 | else: 26 | prompt = BlackBoxPrompts.make_feedback_prompt( 27 | loss, iter=iteration, use_cot=cot, prompt_num=prompt_num 28 | ) 29 | 
params = llm.ask(prompt) 30 | if isinstance(toy_func, QuadraticFunction): 31 | loss = toy_func(params) 32 | else: 33 | loss = toy_func(params[0], params[1]) 34 | if verbose: 35 | print(f"----\nPrompt: {prompt}\nparams: {params}\n Loss: {loss}") 36 | trajectory.append(params) 37 | losses.append(loss) 38 | messages = llm.get_current_messages() 39 | return trajectory, losses, messages 40 | 41 | 42 | # define toy functions 43 | class ToyFunction: 44 | search_space: dict[str, tuple[float]] 45 | optima: dict[str, float] 46 | 47 | def __init__(self, search_space=None, optima=None) -> None: 48 | self.search_space = search_space 49 | self.optima = optima 50 | 51 | def __call__(self, *args): 52 | raise NotImplementedError 53 | 54 | 55 | class QuadraticFunction: 56 | """Axis-aligned quadratic function""" 57 | 58 | def __init__(self, num_dimensions=5, eigenspectrum="uniform", optima_min=5): 59 | # Define optimum c within the range [-min_range, min_range] 60 | self.num_dimensions = num_dimensions 61 | self.min_range = optima_min # dist from origin to optima 62 | 63 | if eigenspectrum == "uniform": 64 | self.eigenvalues = np.ones(self.num_dimensions) 65 | elif eigenspectrum == "random": 66 | self.eigenvalues = np.random.uniform(1, num_dimensions, self.num_dimensions) 67 | else: 68 | raise NotImplementedError 69 | 70 | self.A = np.diag(self.eigenvalues) 71 | self.c = np.random.uniform(-self.min_range, self.min_range, self.num_dimensions) 72 | self.search_space = { 73 | f"x{i+1}": (-self.min_range, self.min_range) 74 | for i in range(self.num_dimensions) 75 | } 76 | 77 | def __call__(self, x): 78 | x_minus_c = x - self.c 79 | return np.dot(x_minus_c.T, np.dot(self.A, x_minus_c)) 80 | 81 | def set_A(self, A): 82 | self.A = A 83 | 84 | def set_c(self, c): 85 | self.c = c 86 | 87 | def evaluate_on_grid(self, X1, X2): 88 | Y = np.zeros_like(X1) 89 | for i in range(X1.shape[0]): 90 | for j in range(X1.shape[1]): 91 | x = np.array([X1[i, j], X2[i, j]]) 92 | Y[i, j] = self(x) 93 | return 
Y 94 | 95 | def load_dict(self, d): 96 | self.num_dimensions = d["num_dimensions"] 97 | self.eigenvalues = d["eigenvalues"] 98 | self.c = d["c"] 99 | self.A = np.diag(self.eigenvalues) 100 | 101 | def get_dict(self): 102 | # returns dictionary that can be used to reconstruct identical function 103 | return { 104 | "num_dimensions": self.num_dimensions, 105 | "eigenvalues": self.eigenvalues, 106 | "c": self.c, 107 | } 108 | 109 | def get_info(self): 110 | return self.A, self.c 111 | 112 | def __repr__(self) -> str: 113 | return f"QuadraticFunction(eigenspectrum={self.eigenvalues}, min={self.c})" 114 | 115 | 116 | class ShiftedToyFunction: 117 | # shifts everything to mitigate overfitting 118 | def __init__(self, toy_function_instance, shift_value): 119 | self.toy_function_instance = toy_function_instance 120 | self.shift_value = shift_value 121 | self.search_space = self.toy_function_instance.search_space 122 | self.optima = dict(self.toy_function_instance.optima) # copy so the wrapped instance's dict is not mutated 123 | self.optima["x1"] += self.shift_value[0] 124 | self.optima["x2"] += self.shift_value[1] 125 | # check if instance has optimas 126 | if getattr(self.toy_function_instance, "optimas", None) is not None: 127 | self.optimas = [] 128 | for optima in self.toy_function_instance.optimas: 129 | self.optimas.append( 130 | { 131 | "x1": optima["x1"] + self.shift_value[0], 132 | "x2": optima["x2"] + self.shift_value[1], 133 | } 134 | ) 135 | 136 | def __call__(self, x1, x2) -> float: 137 | return self.toy_function_instance( 138 | x1 - self.shift_value[0], x2 - self.shift_value[1] 139 | ) 140 | 141 | 142 | class RosenbrockFunction(ToyFunction): 143 | def __init__(self): 144 | search_space = { 145 | "x1": (-5, 10), 146 | "x2": (-5, 10), # TODO: check if this is correct 147 | } 148 | optima = { 149 | "x1": 1.0, 150 | "x2": 1.0, 151 | } 152 | super().__init__(search_space=search_space, optima=optima) 153 | 154 | def __call__(self, x1: float, x2: float) -> float: 155 | # See 
https://en.wikipedia.org/wiki/Rosenbrock_function. 156 | # Search space: [-5, 10] x [-5, 10] (matches search_space above) 157 | return (1 - x1) ** 2.0 + 100 * (x2 - x1**2.0) ** 2.0 158 | 159 | 160 | class BraninFunction(ToyFunction): 161 | def __init__(self): 162 | search_space = { 163 | "x1": (-5, 10), 164 | "x2": (0, 15), 165 | } 166 | optima = { 167 | "x1": -math.pi, 168 | "x2": 12.275, 169 | } 170 | optimas = [ 171 | {"x1": -math.pi, "x2": 12.275}, 172 | {"x1": math.pi, "x2": 2.275}, 173 | {"x1": 9.42478, "x2": 2.475}, 174 | ] 175 | self.optimas = optimas 176 | super().__init__(search_space=search_space, optima=optima) 177 | 178 | def __call__(self, x1: float, x2: float) -> float: 179 | # Reference code: 180 | # https://github.com/automl/SMAC3/blob/main/benchmark/src/models/branin.py. 181 | # Search space: [-5, 10] x [0, 15]. 182 | pi = math.pi 183 | a = 1 184 | b = 5.1 / ((2 * pi) ** 2) 185 | c = 5 / pi 186 | r = 6 187 | s = 10 188 | t = 1 / (8 * pi) 189 | return a * (x2 - b * x1**2 + c * x1 - r) ** 2 + s * (1 - t) * np.cos(x1) + s 190 | 191 | 192 | class HimmelblauFunction(ToyFunction): 193 | def __init__(self): 194 | search_space = { 195 | "x1": (-5, 5), 196 | "x2": (-5, 5), 197 | } 198 | optima = { 199 | "x1": 3.0, 200 | "x2": 2.0, 201 | } 202 | optimas = [ 203 | {"x1": 3, "x2": 2}, 204 | {"x1": -2.805118, "x2": 3.131312}, 205 | {"x1": -3.779310, "x2": -3.283186}, 206 | {"x1": 3.584428, "x2": -1.848126}, 207 | ] 208 | self.optimas = optimas 209 | super().__init__(search_space=search_space, optima=optima) 210 | 211 | def __call__(self, x1: float, x2: float) -> float: 212 | # Reference code: 213 | # https://github.com/automl/SMAC3/blob/main/benchmark/src/models/himmelblau.py. 
214 | # Search space: [-5, 5] x [-5, 5] 215 | return (x1**2 + x2 - 11) ** 2 + (x1 + x2**2 - 7) ** 2 216 | 217 | 218 | class AckleyFunction(ToyFunction): 219 | def __init__(self): 220 | search_space = { 221 | "x1": (-5, 5), 222 | "x2": (-5, 5), 223 | } 224 | optima = { 225 | "x1": 0.0, 226 | "x2": 0.0, 227 | } 228 | super().__init__(search_space=search_space, optima=optima) 229 | 230 | def __call__(self, x1: float, x2: float) -> float: 231 | # Reference code: 232 | return ( 233 | -20 * np.exp(-0.2 * np.sqrt(0.5 * (x1**2 + x2**2))) 234 | - np.exp(0.5 * (np.cos(2 * math.pi * x1) + np.cos(2 * math.pi * x2))) 235 | + math.e 236 | + 20 237 | ) 238 | 239 | 240 | named_toy_functions = { 241 | "rosenbrock": RosenbrockFunction(), 242 | "branin": BraninFunction(), 243 | "himmelblau": HimmelblauFunction(), 244 | "ackley": AckleyFunction(), 245 | } 246 | 247 | function_names = tuple(named_toy_functions.keys()) 248 | toy_functions = namedtuple("_toy_funcs", named_toy_functions.keys())( 249 | **named_toy_functions 250 | ) 251 | 252 | quadratic_uniform_2d = QuadraticFunction( 253 | num_dimensions=2, eigenspectrum="uniform", optima_min=5 254 | ) 255 | quad_optima_x = -4.15 256 | quad_optima_y = 3.35 257 | quadratic_uniform_2d.set_c(np.array([quad_optima_x, quad_optima_y])) 258 | 259 | quadratic_badcondition_2d = QuadraticFunction( 260 | num_dimensions=2, eigenspectrum="uniform", optima_min=5 261 | ) 262 | quadratic_badcondition_2d.set_c(np.array([quad_optima_x, quad_optima_y])) 263 | quadratic_badcondition_2d.set_A(np.array([[1, 0], [0, 10]])) 264 | 265 | named_toy_functions.update( 266 | { 267 | "quadratic2d": quadratic_uniform_2d, 268 | "quadratic2d_10": quadratic_badcondition_2d, 269 | } 270 | ) 271 | 272 | shift_value = (0.93, 0.59) # we use these throughout experiments 273 | named_toy_functions.update( 274 | { 275 | "shifted_rosenbrock": ShiftedToyFunction(RosenbrockFunction(), shift_value), 276 | "shifted_branin": ShiftedToyFunction(BraninFunction(), shift_value), 277 | 
"shifted_himmelblau": ShiftedToyFunction(HimmelblauFunction(), shift_value), 278 | "shifted_ackley": ShiftedToyFunction(AckleyFunction(), shift_value), 279 | } 280 | ) 281 | 282 | 283 | def get_function_name(function: ToyFunction) -> str: 284 | return repr(function).split("Function")[0] 285 | 286 | 287 | def get_function(name: str) -> ToyFunction: 288 | return named_toy_functions[name] 289 | 290 | 291 | # Prompts for 2D Toy functions 292 | class BlackBoxPrompts: 293 | def make_inital_prompt(search_space, prompt_num, budget=10): 294 | if prompt_num == 0: 295 | prompt = f"""You are optimizing a function with two inputs. x1 must be in range {search_space['x1']}. x2 must be in range {search_space['x2']}. I want you to predict values that minimize the loss of the function; I will tell you the value of the function before you try again. Do not put new lines or extra characters in your response. Format your output with json as follows: {{"x": [x1, x2]}}\n""" 296 | elif prompt_num == 1: 297 | prompt = f"""You are optimizing a function with two inputs. x1 must be in range {search_space['x1']}. x2 must be in range {search_space['x2']}. I want you to predict values that minimize the loss of the function; I will tell you the value of the function before you try again. Do not put new lines or extra characters in your response. We have a total of {budget} evaluations. Format your output with json as follows: {{"x": [x1, x2]}}\n""" 298 | elif prompt_num == 2: 299 | prompt = f"""You are helping tune hyperparameters to minimize loss. x1 must be in range {search_space['x1']}. x2 must be in range {search_space['x2']}. I want you to predict values that minimize the loss of the function; I will tell you the value of the function before you try again. Do not put new lines or extra characters in your response. We have a total of {budget} evaluations. 
Format your output with json as follows: {{"x": [x1, x2]}}\n""" 300 | elif prompt_num == 3: 301 | prompt = f"""You are helping tune hyperparameters to minimize loss. x1 must be in range {search_space['x1']}. x2 must be in range {search_space['x2']}. The training process is deterministic and yields a nonnegative loss. I want you to predict values that minimize the loss of the function; I will tell you the value of the function before you try again. Do not put new lines or extra characters in your response. We have a total of {budget} evaluations. Format your output with json as follows: {{"x": [x1, x2]}}\n""" 302 | else: 303 | raise NotImplementedError 304 | return prompt 305 | 306 | def make_feedback_prompt(loss, iter=1, use_cot=True, prompt_num=1): 307 | # prompt_num currently unused 308 | if iter > 0 and use_cot: 309 | return f"""Loss: {loss:.3e}. Write two lines as follows:\nAnalysis:(up to a few sentences describing what worked so far and what to choose next)\nOutput:{{json dict}}""" 310 | else: 311 | return f"""Loss: {loss:.3e}. 
Format your next output as before.""" 312 | 313 | 314 | def plot_trajectory( 315 | func: ToyFunction, 316 | trajectory: list[tuple[float]], 317 | log_scale=False, 318 | plot_type="standard", 319 | func_name="example", 320 | ): 321 | fig, ax = plt.subplots() 322 | # ax.set_title(f"Trajectory on {func_name}") 323 | ax.set_xlabel("x1") 324 | ax.set_ylabel("x2") 325 | if plot_type == "standard": 326 | x1_lower, x1_upper = func.search_space["x1"] 327 | x1_lower -= 0.5 328 | x1_upper += 0.5 329 | x2_lower, x2_upper = func.search_space["x2"] 330 | x2_lower -= 0.5 331 | x2_upper += 0.5 332 | else: 333 | x1_lower, x1_upper = -6, 6 334 | x2_lower, x2_upper = -6, 6 335 | ax.set_xlim(x1_lower, x1_upper) 336 | ax.set_ylim(x2_lower, x2_upper) 337 | 338 | from matplotlib.colors import LogNorm # set up log scale if needed; LogNorm is not imported at module level 339 | norm = LogNorm() if log_scale else None 340 | 341 | # Plot the function as a color map 342 | x1 = np.linspace(x1_lower, x1_upper, 150) 343 | x2 = np.linspace(x2_lower, x2_upper, 150) 344 | X1, X2 = np.meshgrid(x1, x2) 345 | if plot_type == "standard": 346 | Y = func(X1, X2) 347 | else: 348 | Y = func.evaluate_on_grid(X1, X2) 349 | c = ax.pcolormesh(X1, X2, Y, shading="auto", cmap="inferno", norm=norm) 350 | fig.colorbar(c, ax=ax) 351 | 352 | # Add contours 353 | contour = ax.contour(X1, X2, Y, levels=20, colors="salmon", norm=norm) 354 | ax.clabel(contour, inline=1, fontsize=8) 355 | 356 | # Plot the trajectory with arrows 357 | x1_vals, x2_vals = zip(*trajectory) 358 | ax.plot(x1_vals, x2_vals, marker="o", color="royalblue", linewidth=1, markersize=1) 359 | for i in range(1, len(x1_vals)): 360 | ax.quiver( 361 | x1_vals[i - 1], 362 | x2_vals[i - 1], 363 | x1_vals[i] - x1_vals[i - 1], 364 | x2_vals[i] - x2_vals[i - 1], 365 | angles="xy", 366 | scale_units="xy", 367 | scale=1, 368 | color="royalblue", 369 | ) 370 | 371 | # Indicate start and end of the trajectory 372 | ax.plot(x1_vals[0], x2_vals[0], marker="o", color="mediumseagreen", label="Start") 373 | ax.plot(x1_vals[-1], x2_vals[-1], 
marker="s", color="dodgerblue", label="End") 374 | 375 | # Mark the optima 376 | if plot_type == "standard": 377 | # check if func has optimas 378 | if getattr(func, "optimas", None) is not None: 379 | print("multiple optimas") 380 | # iterate through optimas 381 | for i, optima in enumerate(func.optimas): 382 | x1_opt, x2_opt = optima["x1"], optima["x2"] 383 | if i == 0: 384 | ax.plot( 385 | x1_opt, 386 | x2_opt, 387 | marker="*", 388 | markersize=12, 389 | color="limegreen", 390 | label="Optima", 391 | ) 392 | else: 393 | ax.plot( 394 | x1_opt, x2_opt, marker="*", markersize=12, color="limegreen" 395 | ) 396 | else: 397 | print("single optima") 398 | x1_opt, x2_opt = func.optima["x1"], func.optima["x2"] 399 | ax.plot( 400 | x1_opt, 401 | x2_opt, 402 | marker="*", 403 | markersize=12, 404 | color="limegreen", 405 | label="Optima", 406 | ) 407 | else: 408 | x1_opt, x2_opt = quad_optima_x, quad_optima_y 409 | ax.plot( 410 | x1_opt, x2_opt, marker="*", markersize=12, color="limegreen", label="Optima" 411 | ) 412 | 413 | ax.legend() 414 | plt.show() 415 | return fig 416 | --------------------------------------------------------------------------------
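
The logged Branin conversation in `load_toy_trajectories.ipynb` can be replayed offline, without calling the OpenAI API, to check that the reported losses follow from the suggested points. The sketch below is a minimal illustration and not part of the repository: `branin`, `extract_point`, and `best_so_far` are hypothetical helper names, the Branin constants are copied from `BraninFunction`, and the replies are abridged from the log (the last one drops the space after `Output:` on purpose, a case that splitting on `"Output: "` would not handle).

```python
import json
import math
import re


def branin(x1: float, x2: float) -> float:
    """Branin function with the same constants as BraninFunction above."""
    b = 5.1 / ((2 * math.pi) ** 2)
    c = 5 / math.pi
    t = 1 / (8 * math.pi)
    return (x2 - b * x1**2 + c * x1 - 6) ** 2 + 10 * (1 - t) * math.cos(x1) + 10


def extract_point(reply: str) -> list:
    """Hypothetical helper: return the last {"x": [...]} point found in a reply.

    Accepts a bare JSON object, "Output: {...}", and "Output:{...}".
    """
    matches = re.findall(r'\{\s*"x"\s*:\s*\[[^\]]*\]\s*\}', reply)
    if not matches:
        raise ValueError(f"no JSON point found in reply: {reply!r}")
    return json.loads(matches[-1])["x"]


def best_so_far(losses):
    """Running minimum of a sequence of losses."""
    best, running = math.inf, []
    for loss in losses:
        best = min(best, loss)
        running.append(best)
    return running


# Assistant replies abridged from the logged Branin conversation
# (illustrative subset, not the full ten-step trajectory).
replies = [
    '{"x": [0, 0]}',
    '{"x": [-5, 15]}',
    'Analysis: try a different combination.\n\nOutput: {"x": [10, 0]}',
    'Analysis: increase x2 slightly.\n\nOutput: {"x": [10, 5]}',
    'Analysis: decrease x2 slightly.\n\nOutput:{"x": [10, 3]}',
]

points = [extract_point(reply) for reply in replies]
losses = [branin(x1, x2) for x1, x2 in points]
for point, loss, best in zip(points, losses, best_so_far(losses)):
    print(f"x={point}  loss={loss:.3e}  best so far={best:.3e}")
```

The first four printed losses should agree with the `Loss:` values in the logged user messages for the same points, and the loss at `[10, 3]` with the first Branin entry of `min_losses`. A regex-based extractor is only one option; tightening the prompt so that the reply is always bare JSON would be another.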