├── .gitignore
├── README.md
├── configs
│   ├── README.md
│   ├── args_guide.yaml
│   ├── default_training_configs
│   │   ├── default_full_ft.yaml
│   │   ├── default_lora.yaml
│   │   └── default_qlora.yaml
│   ├── sweep_configs
│   │   ├── full_ft_sweep.yaml
│   │   ├── lora_sweep.yaml
│   │   └── qlora_sweep.yaml
│   └── test
│       └── qlora_experiment.yaml
├── installation.sh
├── requirements.txt
├── setup.py
└── sweep.py

/.gitignore:
--------------------------------------------------------------------------------
1 | axolotl/
2 | *.arrow
3 | *wandb/
4 | *last_run_prepared/
5 | *qlora-out/
6 | *sweep_id.txt
7 | *.idea/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Comparing QLoRA, LoRA, and Full Fine-Tuning
2 | A comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tuning.
3 | 
4 | 
5 | ## Installation
6 | ### 1. Install Python 3.9.x or 3.10.x
7 | ### 2. Install stable PyTorch:
8 | >https://pytorch.org/get-started/locally/
9 | >
10 | >If this step fails, either at install time or with an error during training, run `pip uninstall torch` and try simply `pip install torch`.
11 | >
12 | ### 3. Install axolotl and dependencies
13 | ```
14 | git clone https://github.com/AblateIt/axolotl.git
15 | pip install -e axolotl/.
16 | pip install -U git+https://github.com/huggingface/peft.git
17 | ```
18 | There is a `requirements.txt` file in this repo; depending on what you are missing, you may need to install some packages from it.
19 | 
20 | ## For contributors running sweeps and training
21 | ### 1. Request access to the AblateIt WandB and HuggingFace teams
22 | ### 2. Log into WandB and HuggingFace through the CLI
23 | wandb login (log in with the account added to the WandB org)
24 | huggingface-cli login (log in with the account added to the HF org)
25 | 
26 | ### How to start a sweep (you most likely will never do this)
27 | 1. Activate the correct environment.
28 | 2. Set the default location for new projects to `ablateit`. This is required to create the sweep but not to run fine-tuning.
29 | 3. `python sweep.py --sweep_config <sweep_config_path> --project <project_name> --default_training_args <default_training_args_path>`
30 | 
31 | For example, to run the QLoRA sweep:
32 | `python sweep.py --sweep_config configs/sweep_configs/qlora_sweep.yaml --project test-qlora_sweep --default_training_args configs/default_training_configs/default_qlora.yaml`
33 | 
34 | ### How to fine-tune configurations from a sweep
35 | 1. Check whether you have a default accelerate config, and if so, delete it. By default it lives in your HuggingFace cache folder at `~/.cache/huggingface/accelerate/default_config.yaml`; if that `default_config.yaml` file exists, delete it.
36 | 2. Test your setup by running `CUDA_VISIBLE_DEVICES=0 accelerate launch axolotl/scripts/finetune.py configs/test/qlora_experiment.yaml --main_process_port 0`, which should start a QLoRA run on your GPU 0. If it doesn't, fix the error before joining a sweep; otherwise you will pull configurations from the sweep that crash, and no one else will be able to run them either.
37 | 
38 | 3. You will need a `sweep_id` and the `project` name from one of the contributors who started a sweep in order to run fine-tuning experiments.
39 | 
40 | `python sweep.py --sweep_id <sweep_id> --project <project_name> --gpu <device_ids>`
41 | 
42 | For example, this command runs fine-tuning on GPU 0:
43 | `python sweep.py --sweep_id usevjjyj --gpu 0`
44 | 
45 | 
46 | ## FAQs
47 | #### 1. Accelerate running experiments on multiple GPUs, or other accelerate issues
48 | Go to your HuggingFace cache folder and delete the `default_config.yaml` file. By default, this file is located at `~/.cache/huggingface/accelerate/default_config.yaml`.
49 | 
50 | When running fine-tuning, if you are **NOT** seeing a message like the one below, then you have a default accelerate config saved in your cache that needs to be **DELETED**.
51 | ```python
52 | The following values were not passed to `accelerate launch` and had defaults used instead:
53 | 	`--num_processes` was set to a value of `1`
54 | 	`--num_machines` was set to a value of `1`
55 | 	`--mixed_precision` was set to a value of `'no'`
56 | 	`--dynamo_backend` was set to a value of `'no'`
57 | To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
58 | ```
59 | 
60 | ## Links
61 | - [Discord](https://discord.gg/HfNctSTJ)
62 | - [HuggingFace](https://huggingface.co/AblateIt)
63 | - [WandB](https://wandb.ai/ablateit)
--------------------------------------------------------------------------------
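For orientation before the configs: the decentralized flow that `sweep.py` implements boils down to two W&B calls — `wandb.sweep` registers the sweep once, and `wandb.agent` lets any contributor pull and run configurations from it. The sketch below is illustrative rather than the repo's exact code; `train_one_run` is a hypothetical stand-in for the axolotl launch that `sweep.py` performs.

```python
import wandb
import yaml

# One-time, by the sweep owner: register the sweep on W&B and share the id.
sweep_config = yaml.safe_load(open("configs/sweep_configs/qlora_sweep.yaml"))["wandb_args"]
sweep_id = wandb.sweep(sweep_config, project="test-qlora_sweep")
print(sweep_id)

def train_one_run():
    # Hypothetical stand-in: the real script merges wandb.config into the
    # default training YAML and shells out to axolotl's finetune.py.
    run = wandb.init(entity="ablateit")
    print(dict(run.config))

# On each contributor's machine: keep pulling configs until the sweep is done.
wandb.agent(sweep_id, train_one_run, project="test-qlora_sweep", entity="ablateit")
```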
/configs/README.md:
--------------------------------------------------------------------------------
1 | ## Structure
2 | ### sweep_configs
3 | Contains the configuration file for each sweep:
4 | - "full_ft_sweep.yaml" - Full Fine-Tuning Sweep
5 | - "lora_sweep.yaml" - LoRA Sweep
6 | - "qlora_sweep.yaml" - QLoRA Sweep
7 | 
8 | ### default_training_configs
9 | Contains the default training arguments for each fine-tuning method, taken directly from axolotl:
10 | - "default_full_ft.yaml" - Full Fine-Tuning Default Training Arguments
11 | - "default_lora.yaml" - LoRA Default Training Arguments
12 | - "default_qlora.yaml" - QLoRA Default Training Arguments
13 | 
14 | ### test
15 | Contains the Puffin Llama 2 7B configuration, used mainly for testing:
16 | - "qlora_experiment.yaml"
--------------------------------------------------------------------------------
/configs/args_guide.yaml:
--------------------------------------------------------------------------------
1 | # this is the huggingface model that contains *.pt, *.safetensors, or *.bin files
2 | # this can also be a relative path to a model on disk
3 | base_model: ./llama-7b-hf
4 | # you can specify an ignore pattern if the model repo contains more than 1 model type (*.pt, etc)
5 | base_model_ignore_patterns:
6 | # if the base_model repo on hf hub doesn't include configuration .json files,
7 | # you can set that here, or leave this empty to default to base_model
8 | base_model_config: ./llama-7b-hf
9 | # you can specify to choose a specific model revision from huggingface hub
10 | model_revision:
11 | # Optional tokenizer configuration override in case you want to use a different tokenizer
12 | # than the one defined in the base model
13 | tokenizer_config:
14 | # If you want to specify the type of model to load, AutoModelForCausalLM is a good choice too
15 | model_type: AutoModelForCausalLM
16 | # Corresponding tokenizer for the model; AutoTokenizer is a good choice
17 | tokenizer_type: AutoTokenizer
18 | # Trust remote code for untrusted sources
19 | trust_remote_code:
20 | # use_fast option for tokenizer loading from_pretrained, defaults to True
21 | tokenizer_use_fast:
22 | # resize the model embeddings when new tokens are added to multiples of 32
23 | # this is reported to improve training speed on some models
24 | resize_token_embeddings_to_32x:
25 | 
26 | # whether you are training a 4-bit GPTQ quantized model
27 | gptq: true
28 | gptq_groupsize: 128 # group size
29 | gptq_model_v1: false # v1 or v2
30 | 
31 | # this will attempt to quantize the model down to 8 bits and use the adam 8-bit optimizer
32 | load_in_8bit: true
33 | # use bitsandbytes 4 bit
34 | load_in_4bit:
35 | 
36 | # Use CUDA bf16
37 | bf16: true # bool or 'full' for `bf16_full_eval`. requires >=ampere
38 | # Use CUDA fp16
39 | fp16: true
40 | # Use CUDA tf32
41 | tf32: true # requires >=ampere
42 | 
43 | # a list of one or more datasets to finetune the model with
44 | datasets:
45 |   # hf dataset repo | "json" for local dataset, make sure to fill data_files
46 |   - path: vicgalle/alpaca-gpt4
47 |     # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
48 |     type: alpaca # format | format:<prompt_style> (chat/instruct) | <prompt_style>.load_<load_fn>
49 |     data_files: # path to source data files
50 |     shards: # number of shards to split data into
51 |     name: # name of dataset configuration to load
52 | 
53 | # axolotl attempts to save the dataset as an arrow after packing the data together so
54 | # subsequent training attempts load faster, relative path
55 | dataset_prepared_path: data/last_run_prepared
56 | # push prepared dataset to hub
57 | push_dataset_to_hub: # repo path
58 | # push checkpoints to hub
59 | hub_model_id: # repo path to push finetuned model
60 | # whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
61 | # required to be true when used in combination with `push_dataset_to_hub`
62 | hf_use_auth_token: # boolean
63 | # How much of the dataset to set aside as evaluation. 1 = 100%, 0.50 = 50%, etc
64 | val_set_size: 0.04
65 | # Num shards for whole dataset
66 | dataset_shard_num:
67 | # Index of shard to use for whole dataset
68 | dataset_shard_idx:
69 | 
70 | # the maximum length of an input to train with, this should typically be less than 2048
71 | # as most models have a token/context limit of 2048
72 | sequence_len: 2048
73 | # max sequence length to concatenate training samples together up to
74 | # inspired by StackLLaMA. see https://huggingface.co/blog/stackllama#supervised-fine-tuning
75 | max_packed_sequence_len: 1024
76 | 
77 | # use 'lora' or 'qlora', or leave blank to train all parameters in the original model
78 | adapter: lora
79 | # if you already have a lora model trained that you want to load, put that here
80 | # lora hyperparameters
81 | lora_model_dir:
82 | lora_r: 8
83 | lora_alpha: 16
84 | lora_dropout: 0.05
85 | lora_target_modules:
86 |   - q_proj
87 |   - v_proj
88 | #  - k_proj
89 | #  - o_proj
90 | #  - gate_proj
91 | #  - down_proj
92 | #  - up_proj
93 | lora_target_linear: # if true, will target all linear layers
94 | lora_modules_to_save:
95 | #  - embed_tokens
96 | #  - lm_head
97 | lora_out_dir:
98 | lora_fan_in_fan_out: false
99 | 
100 | # wandb configuration if you're using it
101 | wandb_mode:
102 | wandb_project:
103 | wandb_watch:
104 | wandb_run_id:
105 | wandb_log_model: # 'checkpoint'
106 | 
107 | # where to save the finished model to
108 | output_dir: ./completed-model
109 | 
110 | # training hyperparameters
111 | gradient_accumulation_steps: 1
112 | micro_batch_size: 2
113 | eval_batch_size: 2
114 | num_epochs: 3
115 | warmup_steps: 100
116 | learning_rate: 0.00003
117 | logging_steps:
118 | save_steps:
119 | eval_steps:
120 | 
121 | # save model as safetensors (requires the safetensors package)
122 | save_safetensors:
123 | 
124 | # whether to mask out or include the human's prompt from the training labels
125 | train_on_inputs: false
126 | # group similarly sized data to minimize padding
127 | # may be slower to start, as it must download and sort the entire dataset
128 | # note that training loss may have an oscillating pattern with this enabled
129 | group_by_length: false
130 | 
131 | # Whether to use gradient checkpointing https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-checkpointing
132 | gradient_checkpointing: false
133 | 
134 | # stop training after this many evaluation losses have increased in a row
135 | # https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
136 | early_stopping_patience: 3
137 | 
138 | # specify a scheduler and kwargs to use with the optimizer
139 | lr_scheduler: # 'one_cycle' | 'log_sweep' | empty for cosine
140 | lr_scheduler_kwargs:
141 | 
142 | # for one_cycle optim
143 | lr_div_factor: # learning rate div factor
144 | 
145 | # for log_sweep optim
146 | log_sweep_min_lr:
147 | log_sweep_max_lr:
148 | 
149 | # specify optimizer
150 | optimizer:
151 | # specify weight decay
152 | weight_decay:
153 | # adamw hyperparams
154 | adam_beta1:
155 | adam_beta2:
156 | adam_epsilon:
157 | # Gradient clipping max norm
158 | max_grad_norm:
159 | 
160 | # whether to use BetterTransformers
161 | flash_optimum:
162 | # whether to use the xformers attention patch https://github.com/facebookresearch/xformers
163 | xformers_attention:
164 | # whether to use the flash attention patch https://github.com/HazyResearch/flash-attention
165 | flash_attention: # requires an A100 for llama
166 | # whether to use scaled-dot-product attention
167 | # https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
168 | sdp_attention:
169 | # Landmark attention (llama only)
170 | landmark_attention:
171 | # xpos RoPE, see https://github.com/kaiokendev/cutoff-len-is-context-len/blob/main/util/xpos_rope_llama_monkey_patch.py
172 | # llama only
173 | xpos_rope:
174 | 
175 | # resume from a specific checkpoint dir
176 | resume_from_checkpoint:
177 | # if resume_from_checkpoint isn't set and you simply want it to start where it left off
178 | # be careful with this being turned on between different models
179 | auto_resume_from_checkpoints: false
180 | 
181 | # don't mess with this, it's here for accelerate and torchrun
182 | local_rank:
183 | 
184 | # add or change special tokens
185 | special_tokens:
186 | #  bos_token: "<s>"
187 | #  eos_token: "</s>"
188 | #  unk_token: "<unk>"
189 | # add extra tokens
190 | tokens:
191 | 
192 | # FSDP
193 | fsdp:
194 | fsdp_config:
195 | 
196 | # Deepspeed
197 | deepspeed:
198 | 
199 | # Path to torch distx for optim 'adamw_anyprecision'
200 | torchdistx_path:
201 | 
202 | # Set padding for data collator to 'longest'
203 | collator_pad_to_longest:
204 | 
205 | # Set to HF dataset for type: 'completion' for streaming instead of pre-tokenize
206 | pretraining_dataset:
207 | 
208 | # Debug mode
209 | debug:
210 | 
211 | # Seed
212 | seed:
213 | 
214 | # Allow overwriting the yml config from the cli
215 | strict:
--------------------------------------------------------------------------------
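A quick worked example of how the batch-size knobs in the guide above combine, using the guide's example values. This is the standard HF Trainer arithmetic that axolotl inherits, not anything repo-specific; the GPU count is an assumption for illustration.

```python
# Standard effective-batch-size arithmetic for the hyperparameters above.
micro_batch_size = 2             # per-device batch size, from the guide
gradient_accumulation_steps = 1  # optimizer step every N micro-batches
num_gpus = 1                     # assumption for this example

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 2 examples per optimizer step

# warmup_steps counts optimizer steps, so warmup_steps: 100 here means
# 100 * 2 = 200 examples are seen while the learning rate ramps up.
print(100 * effective_batch_size)  # 200
```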
/configs/default_training_configs/default_full_ft.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AblateIt/finetune-study/63dada3020e78d6a7da290fba67d4850331c6dd0/configs/default_training_configs/default_full_ft.yaml
--------------------------------------------------------------------------------
/configs/default_training_configs/default_lora.yaml:
--------------------------------------------------------------------------------
1 | #TODO: change defaults to work with our needs
2 | base_model: meta-llama/Llama-2-7b-hf
3 | base_model_config: meta-llama/Llama-2-7b-hf
4 | model_type: LlamaForCausalLM
5 | tokenizer_type: LlamaTokenizer
6 | 
7 | load_in_8bit: true
8 | load_in_4bit: false
9 | strict: false
10 | 
11 | datasets:
12 |   - path: mhenrichsen/alpaca_2k_test
13 |     type: alpaca
14 | dataset_prepared_path: last_run_prepared
15 | val_set_size: 0.01
16 | output_dir: ./lora-out
17 | 
18 | sequence_len: 4096
19 | max_packed_sequence_len:
20 | 
21 | adapter: lora
22 | lora_model_dir:
23 | lora_r: 32
24 | lora_alpha: 16
25 | lora_dropout: 0.05
26 | lora_target_linear: true
27 | lora_fan_in_fan_out:
28 | 
29 | wandb_project:
30 | wandb_watch:
31 | wandb_run_id:
32 | wandb_log_model:
33 | 
34 | gradient_accumulation_steps: 4
35 | micro_batch_size: 1
36 | num_epochs: 10
37 | optimizer: adamw_bnb_8bit
38 | lr_scheduler: constant_with_warmup
39 | learning_rate: 0.0002
40 | 
41 | train_on_inputs: false
42 | group_by_length: false
43 | bf16: true
44 | fp16: false
45 | tf32: false
46 | 
47 | gradient_checkpointing: true
48 | early_stopping_patience:
49 | resume_from_checkpoint:
50 | local_rank:
51 | logging_steps: 1
52 | xformers_attention: false
53 | flash_attention: true
54 | 
55 | warmup_steps: 10
56 | eval_steps: 20
57 | save_steps:
58 | debug:
59 | deepspeed:
60 | weight_decay: 0.0
61 | fsdp:
62 | fsdp_config:
63 | special_tokens:
64 |   bos_token: "<s>"
65 |   eos_token: "</s>"
66 |   unk_token: "<unk>"
--------------------------------------------------------------------------------
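For readers mapping these YAML fields onto raw library calls, here is a minimal sketch of what `load_in_4bit`, `lora_r`, `lora_alpha`, `lora_dropout`, and `lora_target_modules` (as used in the QLoRA defaults below) correspond to in plain `transformers` + `peft`. It is an illustration under those assumptions, not the loading code axolotl actually runs.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# load_in_4bit: true  ->  QLoRA-style 4-bit base weights via bitsandbytes
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-hf", quantization_config=bnb_config, device_map="auto"
)

# lora_r / lora_alpha / lora_dropout / lora_target_modules from the YAML
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require grad
```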
/configs/default_training_configs/default_qlora.yaml:
--------------------------------------------------------------------------------
1 | base_model: NousResearch/Llama-2-7b-hf
2 | base_model_config: NousResearch/Llama-2-7b-hf
3 | model_type: LlamaForCausalLM
4 | tokenizer_type: LlamaTokenizer
5 | 
6 | load_in_8bit: false
7 | load_in_4bit: true
8 | strict: false
9 | 
10 | datasets:
11 |   - path: LDJnr/Puffin
12 |     type: sharegpt:chat
13 | val_set_size: 0.05
14 | dataset_prepared_path: last_run_prepared
15 | output_dir: ./qlora-out
16 | 
17 | adapter: qlora
18 | lora_model_dir:
19 | 
20 | sequence_len: 4096
21 | max_packed_sequence_len:
22 | lora_r: 32
23 | lora_alpha: 16
24 | lora_dropout: 0.00
25 | lora_target_modules:
26 |   - gate_proj
27 |   - down_proj
28 |   - up_proj
29 |   - q_proj
30 |   - v_proj
31 |   - k_proj
32 |   - o_proj
33 | lora_target_linear: true
34 | lora_fan_in_fan_out:
35 | 
36 | wandb_project:
37 | wandb_watch:
38 | wandb_log_model:
39 | 
40 | data_seed: 42
41 | seed: 42
42 | 
43 | gradient_accumulation_steps: 4
44 | micro_batch_size: 1
45 | num_epochs: 7
46 | optimizer: adamw_bnb_8bit
47 | learning_rate: 0.00002
48 | lr_scheduler: constant_with_warmup
49 | 
50 | train_on_inputs: false
51 | group_by_length: false
52 | bf16: true
53 | fp16: false
54 | tf32: false
55 | 
56 | gradient_checkpointing: true
57 | early_stopping_patience: 5
58 | resume_from_checkpoint:
59 | local_rank:
60 | logging_steps: 1
61 | xformers_attention: false
62 | flash_attention: true
63 | 
64 | save_strategy: epoch
65 | eval_strategy: epoch
66 | eval_steps: 0.2
67 | save_steps: 0.2
68 | save_total_limit: 5
69 | load_best_model_at_end: true
70 | 
71 | bench_dataset: pharaouk/dharma-1/dharma_1_full.json
72 | do_bench_eval: true
73 | greater_is_better: true
74 | metric_for_best_model: eval_bench_total_accuracy
75 | 
76 | debug:
77 | deepspeed:
78 | weight_decay: 0.0
79 | fsdp:
80 | fsdp_config:
81 | special_tokens:
82 |   bos_token: "<s>"
83 |   eos_token: "</s>"
84 |   unk_token: "<unk>"
--------------------------------------------------------------------------------
/configs/sweep_configs/full_ft_sweep.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AblateIt/finetune-study/63dada3020e78d6a7da290fba67d4850331c6dd0/configs/sweep_configs/full_ft_sweep.yaml
--------------------------------------------------------------------------------
/configs/sweep_configs/lora_sweep.yaml:
--------------------------------------------------------------------------------
1 | wandb_args:
2 |   name: lora
3 |   # early_terminate: # Uncomment to enable early termination once the two TODOs below are filled in
4 |   #   max_iter: 10 #TODO: Fill in the number of max iterations
5 |   #   s: 3 #TODO: Specify the total number of brackets. The number of brackets corresponds to the number of times you log the metric you are optimizing.
6 |   #   # or use min_iter
7 |   method: grid #TODO: Select between grid, random, and bayes
8 |   metric:
9 |     name: train_loss #TODO: Change to the name axolotl uses | or use train_loss first | add a moving average of eval_loss to axolotl
10 |     goal: minimize
11 | 
12 |   parameters: #TODO: Fill in the parameters you want to sweep over; everything else is taken from the default training config passed via --default_training_args (e.g. configs/default_training_configs/default_lora.yaml)
13 |     # Examples:
14 |     learning_rate: {"values": [0.00002, 0.00003]}
15 |     num_epochs: {"value": 15}
16 | 
17 | 
--------------------------------------------------------------------------------
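To make the `early_terminate` TODOs in `lora_sweep.yaml` concrete: W&B's hyperband early termination also needs a `type` key alongside `max_iter`/`s` (or `min_iter`). Below is a hypothetical filled-in sweep, shown as the dict that `sweep.py` would pass to `wandb.sweep`; the values are placeholders for illustration, not project decisions.

```python
# Hypothetical values, shaped like the "wandb_args" block above.
sweep_config = {
    "name": "lora",
    "method": "grid",
    "metric": {"name": "train_loss", "goal": "minimize"},
    # W&B hyperband early termination: `type` is required in addition to
    # the max_iter/s (or min_iter) fields the TODOs mention.
    "early_terminate": {"type": "hyperband", "max_iter": 10, "s": 3},
    "parameters": {
        "learning_rate": {"values": [0.00002, 0.00003]},
        "num_epochs": {"value": 15},
    },
}
```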
/configs/sweep_configs/qlora_sweep.yaml:
--------------------------------------------------------------------------------
1 | wandb_args:
2 |   name: qlora_puffin_sweep_4
3 |   method: grid
4 |   metric:
5 |     name: "eval/bench_total_accuracy"
6 |     goal: maximize
7 | 
8 |   parameters:
9 |     lora_r: { "values": [ 8, 32, 64, 128 ] }
10 |     learning_rate: { "values": [ 1e-4, 2e-5, 1e-6 ] }
11 |     gradient_accumulation_steps: { "values": [ 1, 8, 16 ] }
12 |     lora_dropout: { "values": [ 0, 0.1 ] }
13 |     warmup_steps_factor_of_epoch: {"value": 0.2}
14 |     sweep_name: { "value": "qlora_puffin_sweep_4" }
15 |     ft_type: { "value": "qlora" }
16 |     weight_decay: { "values": [ 0., 0.1 ] }
--------------------------------------------------------------------------------
/configs/test/qlora_experiment.yaml:
--------------------------------------------------------------------------------
1 | base_model: NousResearch/Llama-2-7b-hf
2 | base_model_config: NousResearch/Llama-2-7b-hf
3 | model_type: LlamaForCausalLM
4 | tokenizer_type: LlamaTokenizer
5 | 
6 | load_in_8bit: false
7 | load_in_4bit: true
8 | strict: false
9 | 
10 | datasets:
11 |   - path: LDJnr/Puffin
12 |     type: sharegpt:chat
13 | dataset_prepared_path: last_run_prepared
14 | val_set_size: 0.05
15 | output_dir: ./qlora-out
16 | 
17 | adapter: qlora
18 | lora_model_dir:
19 | 
20 | sequence_len: 4096
21 | max_packed_sequence_len:
22 | lora_r: 32
23 | lora_alpha: 16
24 | lora_dropout: 0.00
25 | lora_target_modules:
26 |   - gate_proj
27 |   - down_proj
28 |   - up_proj
29 |   - q_proj
30 |   - v_proj
31 |   - k_proj
32 |   - o_proj
33 | lora_target_linear: true
34 | lora_fan_in_fan_out:
35 | 
36 | wandb_project:
37 | wandb_watch:
38 | wandb_log_model:
39 | 
40 | data_seed: 42
41 | seed: 42
42 | 
43 | gradient_accumulation_steps: 4
44 | micro_batch_size: 1
45 | num_epochs: 10
46 | optimizer: adamw_bnb_8bit
47 | learning_rate: 0.00002
48 | lr_scheduler: constant_with_warmup
49 | 
50 | train_on_inputs: false
51 | group_by_length: false
52 | bf16: true
53 | fp16: false
54 | tf32: false
55 | 
56 | gradient_checkpointing: true
57 | early_stopping_patience: 5
58 | resume_from_checkpoint:
59 | local_rank:
60 | logging_steps: 1
61 | xformers_attention: false
62 | flash_attention: true
63 | 
64 | save_strategy: epoch
65 | eval_strategy: epoch
66 | eval_steps: 0.2
67 | save_steps: 0.2
68 | save_total_limit: 5
69 | load_best_model_at_end: true
70 | greater_is_better: false
71 | metric_for_best_model: eval_loss
72 | 
73 | debug:
74 | deepspeed:
75 | weight_decay: 0.0
76 | fsdp:
77 | fsdp_config:
78 | special_tokens:
79 |   bos_token: "<s>"
80 |   eos_token: "</s>"
81 |   unk_token: "<unk>"
--------------------------------------------------------------------------------
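Note that the test config above pairs `eval_strategy: epoch` with fractional `eval_steps`/`save_steps`; `sweep.py` converts those fractions into absolute step counts before launching training. Worked through with the Puffin dataset size hard-coded in `sweep.py` and the values from this config:

```python
# Worked example of sweep.py's epoch -> steps conversion.
dataset_size = 3000               # DATASET_SIZES["Puffin"] in sweep.py
val_set_size = 0.05               # from the config above
gradient_accumulation_steps = 4
micro_batch_size = 1

epoch_train_steps = int(
    (dataset_size * (1 - val_set_size)) / (gradient_accumulation_steps * micro_batch_size)
)
print(epoch_train_steps)  # 712 optimizer steps per epoch

# eval_steps: 0.2 of an epoch becomes an absolute step interval
print(int(epoch_train_steps * 0.2))  # 142
```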
/installation.sh:
--------------------------------------------------------------------------------
1 | python3.9 -m venv finetune-study-venv
2 | source finetune-study-venv/bin/activate
3 | git clone https://github.com/AblateIt/axolotl.git
4 | pip3 install -e axolotl/.
5 | pip3 install -r requirements.txt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | absl-py==1.4.0
2 | accelerate @ git+https://github.com/huggingface/accelerate@2a289f6108e77a77a4efffb3f6316bc98538413b
3 | addict==2.4.0
4 | aiohttp==3.8.5
5 | aiosignal==1.3.1
6 | appdirs==1.4.4
7 | async-timeout==4.0.3
8 | attrs==23.1.0
9 | bert-score==0.3.13
10 | bitsandbytes==0.41.1
11 | certifi==2023.7.22
12 | charset-normalizer==3.2.0
13 | click==8.1.6
14 | cmake==3.27.1
15 | coloredlogs==15.0.1
16 | contourpy==1.1.0
17 | cycler==0.11.0
18 | datasets==2.14.4
19 | dill==0.3.7
20 | docker-pycreds==0.4.0
21 | einops==0.6.1
22 | evaluate==0.4.0
23 | filelock==3.12.2
24 | fire==0.5.0
25 | fonttools==4.42.0
26 | frozenlist==1.4.0
27 | fsspec==2023.6.0
28 | gitdb==4.0.10
29 | GitPython==3.1.32
30 | hf-transfer==0.1.3
31 | huggingface-hub==0.16.4
32 | humanfriendly==10.0
33 | idna==3.4
34 | importlib-resources==6.0.1
35 | Jinja2==3.1.2
36 | joblib==1.3.2
37 | kiwisolver==1.4.4
38 | lit==16.0.6
39 | MarkupSafe==2.1.3
40 | matplotlib==3.7.2
41 | mpmath==1.3.0
42 | multidict==6.0.4
43 | multiprocess==0.70.15
44 | mypy-extensions==1.0.0
45 | networkx==3.1
46 | nltk==3.8.1
47 | numpy==1.25.2
48 | optimum==1.11.1
49 | packaging==23.1
50 | pandas==2.0.3
51 | pathtools==0.1.2
52 | peft @ git+https://github.com/huggingface/peft.git@a916465ad0970944f3241305071d9b79fae55b59
53 | Pillow==10.0.0
54 | protobuf==4.24.0
55 | psutil==5.9.5
56 | pyarrow==12.0.1
57 | pynvml==11.5.0
58 | pyparsing==3.0.9
59 | pyre-extensions==0.0.29
60 | python-dateutil==2.8.2
61 | pytz==2023.3
62 | PyYAML==6.0
63 | regex==2023.8.8
64 | requests==2.31.0
65 | responses==0.18.0
66 | rouge-score==0.1.2
67 | safetensors==0.3.2
68 | scikit-learn==1.2.2
69 | scipy==1.11.1
70 | sentencepiece==0.1.99
71 | sentry-sdk==1.29.2
72 | setproctitle==1.3.2
73 | six==1.16.0
74 | smmap==5.0.0
75 | sympy==1.12
76 | termcolor==2.3.0
77 | threadpoolctl==3.2.0
78 | tokenizers==0.13.3
79 | tqdm==4.66.1
80 | transformers @ git+https://github.com/huggingface/transformers.git@fe3c8ab1af558b95f67f5fafc0c55f09fd2b09db
81 | triton==2.0.0
82 | typing-extensions==4.7.1
83 | typing-inspect==0.9.0
84 | tzdata==2023.3
85 | urllib3==2.0.4
86 | wandb==0.15.8
87 | xformers==0.0.20
88 | xxhash==3.3.0
89 | yarl==1.9.2
90 | zipp==3.16.2
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | """setup.py for axolotl"""
2 | 
3 | from setuptools import find_packages, setup
4 | 
5 | install_requires = []
6 | with open("./requirements.txt", encoding="utf-8") as requirements_file:
7 |     # don't include peft yet until we check the int4
8 |     # need to manually install peft for now...
9 |     reqs = [r.strip() for r in requirements_file.readlines() if "peft" not in r]
10 |     reqs = [r for r in reqs if r and r[0] != "#"]
11 |     for r in reqs:
12 |         install_requires.append(r)
13 | 
14 | setup(
15 |     name="axolotl",
16 |     version="0.1",
17 |     description="You know you're going to axolotl questions",
18 |     package_dir={"": "src"},
19 |     packages=find_packages(),
20 |     install_requires=install_requires,
21 |     extras_require={
22 |         "gptq": [
23 |             "alpaca_lora_4bit @ git+https://github.com/winglian/alpaca_lora_4bit.git@setup_pip",
24 |         ],
25 |         "gptq_triton": [
26 |             "alpaca_lora_4bit[triton] @ git+https://github.com/winglian/alpaca_lora_4bit.git@setup_pip",
27 |         ],
28 |         "extras": [
29 |             "flash-attn",
30 |             "deepspeed",
31 |         ],
32 |     },
33 | )
--------------------------------------------------------------------------------
/sweep.py:
--------------------------------------------------------------------------------
1 | import wandb
2 | import argparse
3 | import yaml
4 | import os
5 | from subprocess import call
6 | 
7 | wandb.login()
8 | 
9 | 
10 | def get_args():
11 |     parser = argparse.ArgumentParser()
12 |     parser.add_argument(
13 |         "--sweep_id",
14 |         type=str,
15 |         default=None,
16 |         help="Wandb sweep id for decentralized sweeping. If not provided, a new sweep will be created.",
17 |     )
18 | 
19 |     parser.add_argument(
20 |         "--gpu",
21 |         # argparse's `type=list` would split the argument into single characters,
22 |         # so take a comma-separated string instead, e.g. "0,1".
23 |         type=str,
24 |         default=None,
25 |         help="Comma-separated CUDA device ids to use for training. If not provided, all available GPUs will be used.",
26 |     )
27 | 
28 |     parser.add_argument(
29 |         "--sweep_config",
30 |         type=str,
31 |         default="configs/sweep_configs/qlora_sweep.yaml",
32 |         help="Path to sweep config yaml file. Ignored if sweep_id is provided.",
33 |     )
34 | 
35 |     parser.add_argument(
36 |         "--project",
37 |         type=str,
38 |         default="AblateIt-Sweeps",
39 |         help="Wandb project name. Do not change.",
40 |     )
41 | 
42 |     parser.add_argument(
43 |         "--default_training_args",
44 |         type=str,
45 |         default="configs/default_training_configs/default_qlora.yaml",
46 |         help="Path to default training args yaml file. Ignored if sweep_id is provided.",
47 |     )
48 | 
49 |     parser.add_argument(
50 |         "--entity",
51 |         type=str,
52 |         default="ablateit",
53 |         help="Wandb entity name. Do not change unless testing.",
54 |     )
55 | 
56 |     parser.add_argument(
57 |         "--push_to_hub",
58 |         # argparse's `type=bool` would treat any non-empty string as True,
59 |         # so parse the flag value explicitly.
60 |         type=lambda s: str(s).lower() not in ("false", "0", "no"),
61 |         default=True,
62 |         help="Whether to push the models to the hub during training.",
63 |     )
64 | 
65 |     parser.add_argument(
66 |         "--max_num_runs",
67 |         type=int,
68 |         default=99999,
69 |         help="Maximum number of runs for the agent to start.",
70 |     )
71 | 
72 |     # parser.add_argument('--dataset', type=str, default='LDJnr/Puffin',
73 |     #                     help='Dataset to use for training. Currently only supports Puffin.')
74 | 
75 |     return parser.parse_args()
76 | 
77 | 
78 | DATASET_SIZES = {"Puffin": 3000}
79 | 
80 | 
81 | def create_name(config_dict):
82 |     short = {
83 |         "gradient_accumulation_steps": "graccsteps",
84 |         "learning_rate": "lr",
85 |         "lora_r": "lora_r",
86 |         "lora_dropout": "drop",
87 |     }
88 |     name = ""
89 |     for hyperparam, value in config_dict.items():
90 |         name += short.get(hyperparam, hyperparam) + str(value).replace(".", "_") + "-"
91 |     return name[:-1]
92 | 
93 | 
94 | def sweep():
95 |     args = get_args()
96 | 
97 |     sweep_id = args.sweep_id
98 | 
99 |     if not sweep_id:
100 |         sweep_config = yaml.safe_load(open(args.sweep_config))["wandb_args"]
101 |         sweep_id = wandb.sweep(sweep_config, project=args.project)
102 |         print(sweep_id)
103 |         with open("sweep_id.txt", "w") as file:
104 |             file.write(sweep_id)
105 | 
106 |     def run_sweep():
107 |         wandb.init(entity=args.entity)
108 |         config = dict(wandb.config)
109 | 
110 |         warmup_factor = config.pop("warmup_steps_factor_of_epoch", None)
111 |         finetune_type = config.pop("ft_type")
112 |         sweep_name = config.pop("sweep_name")
113 | 
114 |         run_name = (
115 |             args.project + "-" + sweep_name + "-" + finetune_type + "-" + create_name(config)
116 |         )
117 | 
118 |         wandb.run.name = run_name
119 |         with open(args.default_training_args, "r") as file:
120 |             run_config = yaml.safe_load(file)
121 | 
122 |         for hyperparameter, value in config.items():
123 |             run_config[hyperparameter] = value
124 | 
125 |         epoch_train_steps = int(
126 |             (DATASET_SIZES["Puffin"] * (1 - run_config["val_set_size"]))
127 |             / (run_config["gradient_accumulation_steps"] * run_config["micro_batch_size"])
128 |         )
129 | 
130 |         if warmup_factor:
131 |             run_config["warmup_steps"] = int(epoch_train_steps * warmup_factor)
132 | 
133 |         if run_config["eval_strategy"] == "epoch" and isinstance(run_config["eval_steps"], float):
134 |             run_config["eval_steps"] = int(epoch_train_steps * run_config["eval_steps"])
135 |             run_config["eval_strategy"] = "steps"
136 | 
137 |         if run_config["save_strategy"] == "epoch" and isinstance(run_config["save_steps"], float):
138 |             run_config["save_steps"] = int(epoch_train_steps * run_config["save_steps"])
139 |             run_config["save_strategy"] = "steps"
140 | 
141 |         if args.push_to_hub:
142 |             run_config["hub_model_id"] = "AblateIt/" + run_name
143 |             run_config["push_to_hub"] = True
144 |             run_config["hub_strategy"] = "all_checkpoints"
145 |             print(run_config["hub_model_id"])
146 | 
147 |         run_config["wandb_project"] = args.project
148 |         run_config["wandb_entity"] = args.entity
149 |         run_config["wandb_run_name"] = run_name
150 |         run_config["output_dir"] = run_config["output_dir"] + "/" + run_name + "/"
151 | 
152 |         run_config_path = run_config["output_dir"] + "config.yaml"
153 | 
154 |         if not os.path.exists(run_config["output_dir"]):
155 |             os.makedirs(run_config["output_dir"])
156 | 
157 |         with open(run_config_path, "w") as file:
158 |             yaml.dump(run_config, file)
159 |         print(run_config)
160 | 
161 |         # Run the training command with the generated config file
162 |         cuda_device_declaration = (
163 |             "export CUDA_VISIBLE_DEVICES=" + args.gpu + "; " if args.gpu else ""
164 |         )
165 |         cmd = (
166 |             cuda_device_declaration
167 |             + f"accelerate launch axolotl/scripts/finetune.py {run_config_path} --main_process_port 0"
168 |         )
169 |         print(cmd)
170 |         call(cmd, shell=True)
171 | 
172 |     if args.sweep_id is not None:
173 |         # Run the sweep
174 |         wandb.agent(
175 |             sweep_id,
176 |             run_sweep,
177 |             project=args.project,
178 |             entity=args.entity,
179 |             count=args.max_num_runs,
180 |         )
181 | 
182 | 
183 | if __name__ == "__main__":
184 |     sweep()
--------------------------------------------------------------------------------
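As a closing sanity check on run naming, here is what `create_name` from `sweep.py` produces for a typical swept configuration; the function is copied verbatim from the script, and the sample values are chosen only for illustration.

```python
# Reproduces sweep.py's create_name on a sample config.
def create_name(config_dict):
    short = {
        "gradient_accumulation_steps": "graccsteps",
        "learning_rate": "lr",
        "lora_r": "lora_r",
        "lora_dropout": "drop",
    }
    name = ""
    for hyperparam, value in config_dict.items():
        name += short.get(hyperparam, hyperparam) + str(value).replace(".", "_") + "-"
    return name[:-1]

print(create_name({"learning_rate": 2e-05, "lora_r": 32, "lora_dropout": 0.1}))
# -> lr2e-05-lora_r32-drop0_1
```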