├── .gitignore
├── README.md
├── configs
│   ├── README.md
│   ├── args_guide.yaml
│   ├── default_training_configs
│   │   ├── default_full_ft.yaml
│   │   ├── default_lora.yaml
│   │   └── default_qlora.yaml
│   ├── sweep_configs
│   │   ├── full_ft_sweep.yaml
│   │   ├── lora_sweep.yaml
│   │   └── qlora_sweep.yaml
│   └── test
│       └── qlora_experiment.yaml
├── installation.sh
├── requirements.txt
├── setup.py
└── sweep.py

/.gitignore:
--------------------------------------------------------------------------------
1 | axolotl/
2 | *.arrow
3 | *wandb/
4 | *last_run_prepared/
5 | *qlora-out/
6 | *sweep_id.txt
7 | *.idea/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Comparing QLoRA, LoRA, and Full Fine-Tuning
2 | A comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tuning.
3 | 
4 | 
5 | ## Installation
6 | ### 1. Install Python 3.9.x or 3.10.x
7 | ### 2. Install stable PyTorch:
8 | >https://pytorch.org/get-started/locally/
9 | >
10 | >If this step fails, either at install time or with an error during training, run `pip uninstall torch` and try simply `pip install torch`.
11 | >
12 | ### 3. Install axolotl and dependencies
13 | ```
14 | git clone https://github.com/AblateIt/axolotl.git
15 | pip install -e axolotl/.
16 | pip install -U git+https://github.com/huggingface/peft.git
17 | ```
18 | There is a `requirements.txt` file in this repo; depending on what you are missing, you may need to install some packages from it.
19 | 
20 | ## For contributors running sweeps and training
21 | ### 1. Request access to the AblateIt WandB and HuggingFace teams
22 | ### 2. Log into WandB and HuggingFace through the CLI
23 | wandb login (log in with the account added to the WandB org)
24 | huggingface-cli login (log in with the account added to the HF org)
25 | 
26 | ### How to start a sweep (you most likely will never do this)
27 | 1. Activate the correct environment.
28 | 2. Set the default location for new projects to `ablateit`. This is required to create the sweep but not to run fine-tuning.
29 | 3. `python sweep.py --sweep_config <sweep_config_path> --project <project_name> --default_training_args <default_training_args_path>`
30 | 
31 | For example, to run the QLoRA sweep:
32 | `python sweep.py --sweep_config configs/sweep_configs/qlora_sweep.yaml --project test-qlora_sweep --default_training_args configs/default_training_configs/default_qlora.yaml`
33 | 
34 | ### How to fine-tune configurations from a sweep
35 | 1. Check whether you have a default accelerate config, and if so, delete it. By default it lives in your HuggingFace cache folder at `~/.cache/huggingface/accelerate/default_config.yaml`; if that `default_config.yaml` file exists, delete it.
36 | 2. Test your setup by running `CUDA_VISIBLE_DEVICES=0 accelerate launch axolotl/scripts/finetune.py configs/test/qlora_experiment.yaml --main_process_port 0`, which should start a QLoRA run on your GPU 0. If it doesn't, fix the error before joining a sweep; otherwise you will pull configurations from the sweep that crash, and no one else will be able to run them either.
37 | 
38 | 3. You will need a `sweep_id` and the `project` name from one of the contributors who started a sweep in order to run fine-tuning experiments.
39 | 
40 | `python sweep.py --sweep_id <sweep_id> --project <project_name> --gpu <device_ids>`
41 | 
42 | For example, this command runs fine-tuning on GPU 0:
43 | `python sweep.py --sweep_id usevjjyj --gpu 0`
44 | 
45 | 
46 | ## FAQs
47 | #### 1. Accelerate running experiments on multiple GPUs, or other accelerate issues
48 | Go to your HuggingFace cache folder and delete the `default_config.yaml` file. By default, this file is located at `~/.cache/huggingface/accelerate/default_config.yaml`.
49 | 
50 | When running fine-tuning, if you are **NOT** seeing a message like the one below, then you have a default accelerate config saved in your cache that needs to be **DELETED**.
51 | ```python
52 | The following values were not passed to `accelerate launch` and had defaults used instead:
53 | 	`--num_processes` was set to a value of `1`
54 | 	`--num_machines` was set to a value of `1`
55 | 	`--mixed_precision` was set to a value of `'no'`
56 | 	`--dynamo_backend` was set to a value of `'no'`
57 | To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
58 | ```
59 | 
60 | ## Links
61 | - [Discord](https://discord.gg/HfNctSTJ)
62 | - [HuggingFace](https://huggingface.co/AblateIt)
63 | - [WandB](https://wandb.ai/ablateit)
--------------------------------------------------------------------------------
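For orientation before the configs: the decentralized flow that `sweep.py` implements boils down to two W&B calls — `wandb.sweep` registers the sweep once, and `wandb.agent` lets any contributor pull and run configurations from it. The sketch below is illustrative rather than the repo's exact code; `train_one_run` is a hypothetical stand-in for the axolotl launch that `sweep.py` performs.

```python
import wandb
import yaml

# One-time, by the sweep owner: register the sweep on W&B and share the id.
sweep_config = yaml.safe_load(open("configs/sweep_configs/qlora_sweep.yaml"))["wandb_args"]
sweep_id = wandb.sweep(sweep_config, project="test-qlora_sweep")
print(sweep_id)

def train_one_run():
    # Hypothetical stand-in: the real script merges wandb.config into the
    # default training YAML and shells out to axolotl's finetune.py.
    run = wandb.init(entity="ablateit")
    print(dict(run.config))

# On each contributor's machine: keep pulling configs until the sweep is done.
wandb.agent(sweep_id, train_one_run, project="test-qlora_sweep", entity="ablateit")
```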
/configs/README.md:
--------------------------------------------------------------------------------
1 | ## Structure
2 | ### sweep_configs
3 | Contains the configuration file for each sweep:
4 | - "full_ft_sweep.yaml" - Full Fine-Tuning Sweep
5 | - "lora_sweep.yaml" - LoRA Sweep
6 | - "qlora_sweep.yaml" - QLoRA Sweep
7 | 
8 | ### default_training_configs
9 | Contains the default training arguments for each fine-tuning method, taken directly from axolotl:
10 | - "default_full_ft.yaml" - Full Fine-Tuning Default Training Arguments
11 | - "default_lora.yaml" - LoRA Default Training Arguments
12 | - "default_qlora.yaml" - QLoRA Default Training Arguments
13 | 
14 | ### test
15 | Contains the Puffin Llama 2 7B configuration, used mainly for testing:
16 | - "qlora_experiment.yaml"
--------------------------------------------------------------------------------
/configs/args_guide.yaml:
--------------------------------------------------------------------------------
1 | # this is the huggingface model that contains *.pt, *.safetensors, or *.bin files
2 | # this can also be a relative path to a model on disk
3 | base_model: ./llama-7b-hf
4 | # you can specify an ignore pattern if the model repo contains more than 1 model type (*.pt, etc)
5 | base_model_ignore_patterns:
6 | # if the base_model repo on hf hub doesn't include configuration .json files,
7 | # you can set that here, or leave this empty to default to base_model
8 | base_model_config: ./llama-7b-hf
9 | # you can specify to choose a specific model revision from huggingface hub
10 | model_revision:
11 | # Optional tokenizer configuration override in case you want to use a different tokenizer
12 | # than the one defined in the base model
13 | tokenizer_config:
14 | # If you want to specify the type of model to load, AutoModelForCausalLM is a good choice too
15 | model_type: AutoModelForCausalLM
16 | # Corresponding tokenizer for the model; AutoTokenizer is a good choice
17 | tokenizer_type: AutoTokenizer
18 | # Trust remote code for untrusted sources
19 | trust_remote_code:
20 | # use_fast option for tokenizer loading from_pretrained, defaults to True
21 | tokenizer_use_fast:
22 | # resize the model embeddings when new tokens are added to multiples of 32
23 | # this is reported to improve training speed on some models
24 | resize_token_embeddings_to_32x:
25 | 
26 | # whether you are training a 4-bit GPTQ quantized model
27 | gptq: true
28 | gptq_groupsize: 128 # group size
29 | gptq_model_v1: false # v1 or v2
30 | 
31 | # this will attempt to quantize the model down to 8 bits and use the adam 8-bit optimizer
32 | load_in_8bit: true
33 | # use bitsandbytes 4 bit
34 | load_in_4bit:
35 | 
36 | # Use CUDA bf16
37 | bf16: true # bool or 'full' for `bf16_full_eval`. requires >=ampere
38 | # Use CUDA fp16
39 | fp16: true
40 | # Use CUDA tf32
41 | tf32: true # requires >=ampere
42 | 
43 | # a list of one or more datasets to finetune the model with
44 | datasets:
45 |   # hf dataset repo | "json" for local dataset, make sure to fill data_files
46 |   - path: vicgalle/alpaca-gpt4
47 |     # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
48 |     type: alpaca # format | format:<prompt_style> (chat/instruct) | <prompt_style>.load_<load_fn>
49 |     data_files: # path to source data files
50 |     shards: # number of shards to split data into
51 |     name: # name of dataset configuration to load
52 | 
53 | # axolotl attempts to save the dataset as an arrow after packing the data together so
54 | # subsequent training attempts load faster, relative path
55 | dataset_prepared_path: data/last_run_prepared
56 | # push prepared dataset to hub
57 | push_dataset_to_hub: # repo path
58 | # push checkpoints to hub
59 | hub_model_id: # repo path to push finetuned model
60 | # whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
61 | # required to be true when used in combination with `push_dataset_to_hub`
62 | hf_use_auth_token: # boolean
63 | # How much of the dataset to set aside as evaluation. 1 = 100%, 0.50 = 50%, etc
64 | val_set_size: 0.04
65 | # Num shards for whole dataset
66 | dataset_shard_num:
67 | # Index of shard to use for whole dataset
68 | dataset_shard_idx:
69 | 
70 | # the maximum length of an input to train with, this should typically be less than 2048
71 | # as most models have a token/context limit of 2048
72 | sequence_len: 2048
73 | # max sequence length to concatenate training samples together up to
74 | # inspired by StackLLaMA. see https://huggingface.co/blog/stackllama#supervised-fine-tuning
75 | max_packed_sequence_len: 1024
76 | 
77 | # use 'lora' or 'qlora', or leave blank to train all parameters in the original model
78 | adapter: lora
79 | # if you already have a lora model trained that you want to load, put that here
80 | # lora hyperparameters
81 | lora_model_dir:
82 | lora_r: 8
83 | lora_alpha: 16
84 | lora_dropout: 0.05
85 | lora_target_modules:
86 |   - q_proj
87 |   - v_proj
88 | #  - k_proj
89 | #  - o_proj
90 | #  - gate_proj
91 | #  - down_proj
92 | #  - up_proj
93 | lora_target_linear: # if true, will target all linear layers
94 | lora_modules_to_save:
95 | #  - embed_tokens
96 | #  - lm_head
97 | lora_out_dir:
98 | lora_fan_in_fan_out: false
99 | 
100 | # wandb configuration if you're using it
101 | wandb_mode:
102 | wandb_project:
103 | wandb_watch:
104 | wandb_run_id:
105 | wandb_log_model: # 'checkpoint'
106 | 
107 | # where to save the finished model to
108 | output_dir: ./completed-model
109 | 
110 | # training hyperparameters
111 | gradient_accumulation_steps: 1
112 | micro_batch_size: 2
113 | eval_batch_size: 2
114 | num_epochs: 3
115 | warmup_steps: 100
116 | learning_rate: 0.00003
117 | logging_steps:
118 | save_steps:
119 | eval_steps:
120 | 
121 | # save model as safetensors (requires the safetensors package)
122 | save_safetensors:
123 | 
124 | # whether to mask out or include the human's prompt from the training labels
125 | train_on_inputs: false
126 | # group similarly sized data to minimize padding
127 | # may be slower to start, as it must download and sort the entire dataset
128 | # note that training loss may have an oscillating pattern with this enabled
129 | group_by_length: false
130 | 
131 | # Whether to use gradient checkpointing https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-checkpointing
132 | gradient_checkpointing: false
133 | 
134 | # stop training after this many evaluation losses have increased in a row
135 | # https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
136 | early_stopping_patience: 3
137 | 
138 | # specify a scheduler and kwargs to use with the optimizer
139 | lr_scheduler: # 'one_cycle' | 'log_sweep' | empty for cosine
140 | lr_scheduler_kwargs:
141 | 
142 | # for one_cycle optim
143 | lr_div_factor: # learning rate div factor
144 | 
145 | # for log_sweep optim
146 | log_sweep_min_lr:
147 | log_sweep_max_lr:
148 | 
149 | # specify optimizer
150 | optimizer:
151 | # specify weight decay
152 | weight_decay:
153 | # adamw hyperparams
154 | adam_beta1:
155 | adam_beta2:
156 | adam_epsilon:
157 | # Gradient clipping max norm
158 | max_grad_norm:
159 | 
160 | # whether to use BetterTransformers
161 | flash_optimum:
162 | # whether to use the xformers attention patch https://github.com/facebookresearch/xformers
163 | xformers_attention:
164 | # whether to use the flash attention patch https://github.com/HazyResearch/flash-attention
165 | flash_attention: # requires an A100 for llama
166 | # whether to use scaled-dot-product attention
167 | # https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
168 | sdp_attention:
169 | # Landmark attention (llama only)
170 | landmark_attention:
171 | # xpos RoPE, see https://github.com/kaiokendev/cutoff-len-is-context-len/blob/main/util/xpos_rope_llama_monkey_patch.py
172 | # llama only
173 | xpos_rope:
174 | 
175 | # resume from a specific checkpoint dir
176 | resume_from_checkpoint:
177 | # if resume_from_checkpoint isn't set and you simply want it to start where it left off
178 | # be careful with this being turned on between different models
179 | auto_resume_from_checkpoints: false
180 | 
181 | # don't mess with this, it's here for accelerate and torchrun
182 | local_rank:
183 | 
184 | # add or change special tokens
185 | special_tokens:
186 | #  bos_token: "<s>"
187 | #  eos_token: "</s>"
188 | #  unk_token: "<unk>"
189 | # add extra tokens
190 | tokens:
191 | 
192 | # FSDP
193 | fsdp:
194 | fsdp_config:
195 | 
196 | # Deepspeed
197 | deepspeed:
198 | 
199 | # Path to torch distx for optim 'adamw_anyprecision'
200 | torchdistx_path:
201 | 
202 | # Set padding for data collator to 'longest'
203 | collator_pad_to_longest:
204 | 
205 | # Set to HF dataset for type: 'completion' for streaming instead of pre-tokenize
206 | pretraining_dataset:
207 | 
208 | # Debug mode
209 | debug:
210 | 
211 | # Seed
212 | seed:
213 | 
214 | # Allow overwriting the yml config from the cli
215 | strict:
--------------------------------------------------------------------------------
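A quick worked example of how the batch-size knobs in the guide above combine, using the guide's example values. This is the standard HF Trainer arithmetic that axolotl inherits, not anything repo-specific; the GPU count is an assumption for illustration.

```python
# Standard effective-batch-size arithmetic for the hyperparameters above.
micro_batch_size = 2             # per-device batch size, from the guide
gradient_accumulation_steps = 1  # optimizer step every N micro-batches
num_gpus = 1                     # assumption for this example

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 2 examples per optimizer step

# warmup_steps counts optimizer steps, so warmup_steps: 100 here means
# 100 * 2 = 200 examples are seen while the learning rate ramps up.
print(100 * effective_batch_size)  # 200
```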
/configs/default_training_configs/default_full_ft.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AblateIt/finetune-study/63dada3020e78d6a7da290fba67d4850331c6dd0/configs/default_training_configs/default_full_ft.yaml
--------------------------------------------------------------------------------
/configs/default_training_configs/default_lora.yaml:
--------------------------------------------------------------------------------
1 | #TODO: change defaults to work with our needs
2 | base_model: meta-llama/Llama-2-7b-hf
3 | base_model_config: meta-llama/Llama-2-7b-hf
4 | model_type: LlamaForCausalLM
5 | tokenizer_type: LlamaTokenizer
6 | 
7 | load_in_8bit: true
8 | load_in_4bit: false
9 | strict: false
10 | 
11 | datasets:
12 |   - path: mhenrichsen/alpaca_2k_test
13 |     type: alpaca
14 | dataset_prepared_path: last_run_prepared
15 | val_set_size: 0.01
16 | output_dir: ./lora-out
17 | 
18 | sequence_len: 4096
19 | max_packed_sequence_len:
20 | 
21 | adapter: lora
22 | lora_model_dir:
23 | lora_r: 32
24 | lora_alpha: 16
25 | lora_dropout: 0.05
26 | lora_target_linear: true
27 | lora_fan_in_fan_out:
28 | 
29 | wandb_project:
30 | wandb_watch:
31 | wandb_run_id:
32 | wandb_log_model:
33 | 
34 | gradient_accumulation_steps: 4
35 | micro_batch_size: 1
36 | num_epochs: 10
37 | optimizer: adamw_bnb_8bit
38 | lr_scheduler: constant_with_warmup
39 | learning_rate: 0.0002
40 | 
41 | train_on_inputs: false
42 | group_by_length: false
43 | bf16: true
44 | fp16: false
45 | tf32: false
46 | 
47 | gradient_checkpointing: true
48 | early_stopping_patience:
49 | resume_from_checkpoint:
50 | local_rank:
51 | logging_steps: 1
52 | xformers_attention: false
53 | flash_attention: true
54 | 
55 | warmup_steps: 10
56 | eval_steps: 20
57 | save_steps:
58 | debug:
59 | deepspeed:
60 | weight_decay: 0.0
61 | fsdp:
62 | fsdp_config:
63 | special_tokens:
64 |   bos_token: "<s>"
65 |   eos_token: "</s>"
66 |   unk_token: "<unk>"
--------------------------------------------------------------------------------
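For readers mapping these YAML fields onto raw library calls, here is a minimal sketch of what `load_in_4bit`, `lora_r`, `lora_alpha`, `lora_dropout`, and `lora_target_modules` (as used in the QLoRA defaults below) correspond to in plain `transformers` + `peft`. It is an illustration under those assumptions, not the loading code axolotl actually runs.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# load_in_4bit: true  ->  QLoRA-style 4-bit base weights via bitsandbytes
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-hf", quantization_config=bnb_config, device_map="auto"
)

# lora_r / lora_alpha / lora_dropout / lora_target_modules from the YAML
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require grad
```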
/configs/default_training_configs/default_qlora.yaml:
--------------------------------------------------------------------------------
1 | base_model: NousResearch/Llama-2-7b-hf
2 | base_model_config: NousResearch/Llama-2-7b-hf
3 | model_type: LlamaForCausalLM
4 | tokenizer_type: LlamaTokenizer
5 | 
6 | load_in_8bit: false
7 | load_in_4bit: true
8 | strict: false
9 | 
10 | datasets:
11 |   - path: LDJnr/Puffin
12 |     type: sharegpt:chat
13 | val_set_size: 0.05
14 | dataset_prepared_path: last_run_prepared
15 | output_dir: ./qlora-out
16 | 
17 | adapter: qlora
18 | lora_model_dir:
19 | 
20 | sequence_len: 4096
21 | max_packed_sequence_len:
22 | lora_r: 32
23 | lora_alpha: 16
24 | lora_dropout: 0.00
25 | lora_target_modules:
26 |   - gate_proj
27 |   - down_proj
28 |   - up_proj
29 |   - q_proj
30 |   - v_proj
31 |   - k_proj
32 |   - o_proj
33 | lora_target_linear: true
34 | lora_fan_in_fan_out:
35 | 
36 | wandb_project:
37 | wandb_watch:
38 | wandb_log_model:
39 | 
40 | data_seed: 42
41 | seed: 42
42 | 
43 | gradient_accumulation_steps: 4
44 | micro_batch_size: 1
45 | num_epochs: 7
46 | optimizer: adamw_bnb_8bit
47 | learning_rate: 0.00002
48 | lr_scheduler: constant_with_warmup
49 | 
50 | train_on_inputs: false
51 | group_by_length: false
52 | bf16: true
53 | fp16: false
54 | tf32: false
55 | 
56 | gradient_checkpointing: true
57 | early_stopping_patience: 5
58 | resume_from_checkpoint:
59 | local_rank:
60 | logging_steps: 1
61 | xformers_attention: false
62 | flash_attention: true
63 | 
64 | save_strategy: epoch
65 | eval_strategy: epoch
66 | eval_steps: 0.2
67 | save_steps: 0.2
68 | save_total_limit: 5
69 | load_best_model_at_end: true
70 | 
71 | bench_dataset: pharaouk/dharma-1/dharma_1_full.json
72 | do_bench_eval: true
73 | greater_is_better: true
74 | metric_for_best_model: eval_bench_total_accuracy
75 | 
76 | debug:
77 | deepspeed:
78 | weight_decay: 0.0
79 | fsdp:
80 | fsdp_config:
81 | special_tokens:
82 |   bos_token: "<s>"
83 |   eos_token: "</s>"
84 |   unk_token: "<unk>"
--------------------------------------------------------------------------------
/configs/sweep_configs/full_ft_sweep.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AblateIt/finetune-study/63dada3020e78d6a7da290fba67d4850331c6dd0/configs/sweep_configs/full_ft_sweep.yaml
--------------------------------------------------------------------------------
/configs/sweep_configs/lora_sweep.yaml:
--------------------------------------------------------------------------------
1 | wandb_args:
2 |   name: lora
3 |   # early_terminate: # Uncomment to enable early termination once the two TODOs below are filled in
4 |   #   max_iter: 10 #TODO: Fill in the number of max iterations
5 |   #   s: 3 #TODO: Specify the total number of brackets. The number of brackets corresponds to the number of times you log the metric you are optimizing.
6 |   #   # or use min_iter
7 |   method: grid #TODO: Select between grid, random, and bayes
8 |   metric:
9 |     name: train_loss #TODO: Change to the name axolotl uses | or use train_loss first | add a moving average of eval_loss to axolotl
10 |     goal: minimize
11 | 
12 |   parameters: #TODO: Fill in the parameters you want to sweep over; everything else is taken from the default training config passed via --default_training_args (e.g. configs/default_training_configs/default_lora.yaml)
13 |     # Examples:
14 |     learning_rate: {"values": [0.00002, 0.00003]}
15 |     num_epochs: {"value": 15}
16 | 
17 | 
--------------------------------------------------------------------------------
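To make the `early_terminate` TODOs in `lora_sweep.yaml` concrete: W&B's hyperband early termination also needs a `type` key alongside `max_iter`/`s` (or `min_iter`). Below is a hypothetical filled-in sweep, shown as the dict that `sweep.py` would pass to `wandb.sweep`; the values are placeholders for illustration, not project decisions.

```python
# Hypothetical values, shaped like the "wandb_args" block above.
sweep_config = {
    "name": "lora",
    "method": "grid",
    "metric": {"name": "train_loss", "goal": "minimize"},
    # W&B hyperband early termination: `type` is required in addition to
    # the max_iter/s (or min_iter) fields the TODOs mention.
    "early_terminate": {"type": "hyperband", "max_iter": 10, "s": 3},
    "parameters": {
        "learning_rate": {"values": [0.00002, 0.00003]},
        "num_epochs": {"value": 15},
    },
}
```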
/configs/sweep_configs/qlora_sweep.yaml:
--------------------------------------------------------------------------------
1 | wandb_args:
2 |   name: qlora_puffin_sweep_4
3 |   method: grid
4 |   metric:
5 |     name: "eval/bench_total_accuracy"
6 |     goal: maximize
7 | 
8 |   parameters:
9 |     lora_r: { "values": [ 8, 32, 64, 128 ] }
10 |     learning_rate: { "values": [ 1e-4, 2e-5, 1e-6 ] }
11 |     gradient_accumulation_steps: { "values": [ 1, 8, 16 ] }
12 |     lora_dropout: { "values": [ 0, 0.1 ] }
13 |     warmup_steps_factor_of_epoch: {"value": 0.2}
14 |     sweep_name: { "value": "qlora_puffin_sweep_4" }
15 |     ft_type: { "value": "qlora" }
16 |     weight_decay: { "values": [ 0., 0.1 ] }
--------------------------------------------------------------------------------
/configs/test/qlora_experiment.yaml:
--------------------------------------------------------------------------------
1 | base_model: NousResearch/Llama-2-7b-hf
2 | base_model_config: NousResearch/Llama-2-7b-hf
3 | model_type: LlamaForCausalLM
4 | tokenizer_type: LlamaTokenizer
5 | 
6 | load_in_8bit: false
7 | load_in_4bit: true
8 | strict: false
9 | 
10 | datasets:
11 |   - path: LDJnr/Puffin
12 |     type: sharegpt:chat
13 | dataset_prepared_path: last_run_prepared
14 | val_set_size: 0.05
15 | output_dir: ./qlora-out
16 | 
17 | adapter: qlora
18 | lora_model_dir:
19 | 
20 | sequence_len: 4096
21 | max_packed_sequence_len:
22 | lora_r: 32
23 | lora_alpha: 16
24 | lora_dropout: 0.00
25 | lora_target_modules:
26 |   - gate_proj
27 |   - down_proj
28 |   - up_proj
29 |   - q_proj
30 |   - v_proj
31 |   - k_proj
32 |   - o_proj
33 | lora_target_linear: true
34 | lora_fan_in_fan_out:
35 | 
36 | wandb_project:
37 | wandb_watch:
38 | wandb_log_model:
39 | 
40 | data_seed: 42
41 | seed: 42
42 | 
43 | gradient_accumulation_steps: 4
44 | micro_batch_size: 1
45 | num_epochs: 10
46 | optimizer: adamw_bnb_8bit
47 | learning_rate: 0.00002
48 | lr_scheduler: constant_with_warmup
49 | 
50 | train_on_inputs: false
51 | group_by_length: false
52 | bf16: true
53 | fp16: false
54 | tf32: false
55 | 
56 | gradient_checkpointing: true
57 | early_stopping_patience: 5
58 | resume_from_checkpoint:
59 | local_rank:
60 | logging_steps: 1
61 | xformers_attention: false
62 | flash_attention: true
63 | 
64 | save_strategy: epoch
65 | eval_strategy: epoch
66 | eval_steps: 0.2
67 | save_steps: 0.2
68 | save_total_limit: 5
69 | load_best_model_at_end: true
70 | greater_is_better: false
71 | metric_for_best_model: eval_loss
72 | 
73 | debug:
74 | deepspeed:
75 | weight_decay: 0.0
76 | fsdp:
77 | fsdp_config:
78 | special_tokens:
79 |   bos_token: "<s>"
80 |   eos_token: "</s>"
81 |   unk_token: "<unk>"
--------------------------------------------------------------------------------
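Note that the test config above pairs `eval_strategy: epoch` with fractional `eval_steps`/`save_steps`; `sweep.py` converts those fractions into absolute step counts before launching training. Worked through with the Puffin dataset size hard-coded in `sweep.py` and the values from this config:

```python
# Worked example of sweep.py's epoch -> steps conversion.
dataset_size = 3000               # DATASET_SIZES["Puffin"] in sweep.py
val_set_size = 0.05               # from the config above
gradient_accumulation_steps = 4
micro_batch_size = 1

epoch_train_steps = int(
    (dataset_size * (1 - val_set_size)) / (gradient_accumulation_steps * micro_batch_size)
)
print(epoch_train_steps)  # 712 optimizer steps per epoch

# eval_steps: 0.2 of an epoch becomes an absolute step interval
print(int(epoch_train_steps * 0.2))  # 142
```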
/installation.sh:
--------------------------------------------------------------------------------
1 | python3.9 -m venv finetune-study-venv
2 | source finetune-study-venv/bin/activate
3 | git clone https://github.com/AblateIt/axolotl.git
4 | pip3 install -e axolotl/.
5 | pip3 install -r requirements.txt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | absl-py==1.4.0
2 | accelerate @ git+https://github.com/huggingface/accelerate@2a289f6108e77a77a4efffb3f6316bc98538413b
3 | addict==2.4.0
4 | aiohttp==3.8.5
5 | aiosignal==1.3.1
6 | appdirs==1.4.4
7 | async-timeout==4.0.3
8 | attrs==23.1.0
9 | bert-score==0.3.13
10 | bitsandbytes==0.41.1
11 | certifi==2023.7.22
12 | charset-normalizer==3.2.0
13 | click==8.1.6
14 | cmake==3.27.1
15 | coloredlogs==15.0.1
16 | contourpy==1.1.0
17 | cycler==0.11.0
18 | datasets==2.14.4
19 | dill==0.3.7
20 | docker-pycreds==0.4.0
21 | einops==0.6.1
22 | evaluate==0.4.0
23 | filelock==3.12.2
24 | fire==0.5.0
25 | fonttools==4.42.0
26 | frozenlist==1.4.0
27 | fsspec==2023.6.0
28 | gitdb==4.0.10
29 | GitPython==3.1.32
30 | hf-transfer==0.1.3
31 | huggingface-hub==0.16.4
32 | humanfriendly==10.0
33 | idna==3.4
34 | importlib-resources==6.0.1
35 | Jinja2==3.1.2
36 | joblib==1.3.2
37 | kiwisolver==1.4.4
38 | lit==16.0.6
39 | MarkupSafe==2.1.3
40 | matplotlib==3.7.2
41 | mpmath==1.3.0
42 | multidict==6.0.4
43 | multiprocess==0.70.15
44 | mypy-extensions==1.0.0
45 | networkx==3.1
46 | nltk==3.8.1
47 | numpy==1.25.2
48 | optimum==1.11.1
49 | packaging==23.1
50 | pandas==2.0.3
51 | pathtools==0.1.2
52 | peft @ git+https://github.com/huggingface/peft.git@a916465ad0970944f3241305071d9b79fae55b59
53 | Pillow==10.0.0
54 | protobuf==4.24.0
55 | psutil==5.9.5
56 | pyarrow==12.0.1
57 | pynvml==11.5.0
58 | pyparsing==3.0.9
59 | pyre-extensions==0.0.29
60 | python-dateutil==2.8.2
61 | pytz==2023.3
62 | PyYAML==6.0
63 | regex==2023.8.8
64 | requests==2.31.0
65 | responses==0.18.0
66 | rouge-score==0.1.2
67 | safetensors==0.3.2
68 | scikit-learn==1.2.2
69 | scipy==1.11.1
70 | sentencepiece==0.1.99
71 | sentry-sdk==1.29.2
72 | setproctitle==1.3.2
73 | six==1.16.0
74 | smmap==5.0.0
75 | sympy==1.12
76 | termcolor==2.3.0
77 | threadpoolctl==3.2.0
78 | tokenizers==0.13.3
79 | tqdm==4.66.1
80 | transformers @ git+https://github.com/huggingface/transformers.git@fe3c8ab1af558b95f67f5fafc0c55f09fd2b09db
81 | triton==2.0.0
82 | typing-extensions==4.7.1
83 | typing-inspect==0.9.0
84 | tzdata==2023.3
85 | urllib3==2.0.4
86 | wandb==0.15.8
87 | xformers==0.0.20
88 | xxhash==3.3.0
89 | yarl==1.9.2
90 | zipp==3.16.2
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | """setup.py for axolotl"""
2 | 
3 | from setuptools import find_packages, setup
4 | 
5 | install_requires = []
6 | with open("./requirements.txt", encoding="utf-8") as requirements_file:
7 |     # don't include peft yet until we check the int4
8 |     # need to manually install peft for now...
9 |     reqs = [r.strip() for r in requirements_file.readlines() if "peft" not in r]
10 |     reqs = [r for r in reqs if r and r[0] != "#"]
11 |     for r in reqs:
12 |         install_requires.append(r)
13 | 
14 | setup(
15 |     name="axolotl",
16 |     version="0.1",
17 |     description="You know you're going to axolotl questions",
18 |     package_dir={"": "src"},
19 |     packages=find_packages(),
20 |     install_requires=install_requires,
21 |     extras_require={
22 |         "gptq": [
23 |             "alpaca_lora_4bit @ git+https://github.com/winglian/alpaca_lora_4bit.git@setup_pip",
24 |         ],
25 |         "gptq_triton": [
26 |             "alpaca_lora_4bit[triton] @ git+https://github.com/winglian/alpaca_lora_4bit.git@setup_pip",
27 |         ],
28 |         "extras": [
29 |             "flash-attn",
30 |             "deepspeed",
31 |         ],
32 |     },
33 | )
--------------------------------------------------------------------------------
/sweep.py:
--------------------------------------------------------------------------------
1 | import wandb
2 | import argparse
3 | import yaml
4 | import os
5 | from subprocess import call
6 | 
7 | wandb.login()
8 | 
9 | 
10 | def get_args():
11 |     parser = argparse.ArgumentParser()
12 |     parser.add_argument(
13 |         "--sweep_id",
14 |         type=str,
15 |         default=None,
16 |         help="Wandb sweep id for decentralized sweeping. If not provided, a new sweep will be created.",
17 |     )
18 | 
19 |     parser.add_argument(
20 |         "--gpu",
21 |         # argparse's `type=list` would split the argument into single characters,
22 |         # so take a comma-separated string instead, e.g. "0,1".
23 |         type=str,
24 |         default=None,
25 |         help="Comma-separated CUDA device ids to use for training. If not provided, all available GPUs will be used.",
26 |     )
27 | 
28 |     parser.add_argument(
29 |         "--sweep_config",
30 |         type=str,
31 |         default="configs/sweep_configs/qlora_sweep.yaml",
32 |         help="Path to sweep config yaml file. Ignored if sweep_id is provided.",
33 |     )
34 | 
35 |     parser.add_argument(
36 |         "--project",
37 |         type=str,
38 |         default="AblateIt-Sweeps",
39 |         help="Wandb project name. Do not change.",
40 |     )
41 | 
42 |     parser.add_argument(
43 |         "--default_training_args",
44 |         type=str,
45 |         default="configs/default_training_configs/default_qlora.yaml",
46 |         help="Path to default training args yaml file. Ignored if sweep_id is provided.",
47 |     )
48 | 
49 |     parser.add_argument(
50 |         "--entity",
51 |         type=str,
52 |         default="ablateit",
53 |         help="Wandb entity name. Do not change unless testing.",
54 |     )
55 | 
56 |     parser.add_argument(
57 |         "--push_to_hub",
58 |         # argparse's `type=bool` would treat any non-empty string as True,
59 |         # so parse the flag value explicitly.
60 |         type=lambda s: str(s).lower() not in ("false", "0", "no"),
61 |         default=True,
62 |         help="Whether to push the models to the hub during training.",
63 |     )
64 | 
65 |     parser.add_argument(
66 |         "--max_num_runs",
67 |         type=int,
68 |         default=99999,
69 |         help="Maximum number of runs for the agent to start.",
70 |     )
71 | 
72 |     # parser.add_argument('--dataset', type=str, default='LDJnr/Puffin',
73 |     #                     help='Dataset to use for training. Currently only supports Puffin.')
74 | 
75 |     return parser.parse_args()
76 | 
77 | 
78 | DATASET_SIZES = {"Puffin": 3000}
79 | 
80 | 
81 | def create_name(config_dict):
82 |     short = {
83 |         "gradient_accumulation_steps": "graccsteps",
84 |         "learning_rate": "lr",
85 |         "lora_r": "lora_r",
86 |         "lora_dropout": "drop",
87 |     }
88 |     name = ""
89 |     for hyperparam, value in config_dict.items():
90 |         name += short.get(hyperparam, hyperparam) + str(value).replace(".", "_") + "-"
91 |     return name[:-1]
92 | 
93 | 
94 | def sweep():
95 |     args = get_args()
96 | 
97 |     sweep_id = args.sweep_id
98 | 
99 |     if not sweep_id:
100 |         sweep_config = yaml.safe_load(open(args.sweep_config))["wandb_args"]
101 |         sweep_id = wandb.sweep(sweep_config, project=args.project)
102 |         print(sweep_id)
103 |         with open("sweep_id.txt", "w") as file:
104 |             file.write(sweep_id)
105 | 
106 |     def run_sweep():
107 |         wandb.init(entity=args.entity)
108 |         config = dict(wandb.config)
109 | 
110 |         warmup_factor = config.pop("warmup_steps_factor_of_epoch", None)
111 |         finetune_type = config.pop("ft_type")
112 |         sweep_name = config.pop("sweep_name")
113 | 
114 |         run_name = (
115 |             args.project + "-" + sweep_name + "-" + finetune_type + "-" + create_name(config)
116 |         )
117 | 
118 |         wandb.run.name = run_name
119 |         with open(args.default_training_args, "r") as file:
120 |             run_config = yaml.safe_load(file)
121 | 
122 |         for hyperparameter, value in config.items():
123 |             run_config[hyperparameter] = value
124 | 
125 |         epoch_train_steps = int(
126 |             (DATASET_SIZES["Puffin"] * (1 - run_config["val_set_size"]))
127 |             / (run_config["gradient_accumulation_steps"] * run_config["micro_batch_size"])
128 |         )
129 | 
130 |         if warmup_factor:
131 |             run_config["warmup_steps"] = int(epoch_train_steps * warmup_factor)
132 | 
133 |         if run_config["eval_strategy"] == "epoch" and isinstance(run_config["eval_steps"], float):
134 |             run_config["eval_steps"] = int(epoch_train_steps * run_config["eval_steps"])
135 |             run_config["eval_strategy"] = "steps"
136 | 
137 |         if run_config["save_strategy"] == "epoch" and isinstance(run_config["save_steps"], float):
138 |             run_config["save_steps"] = int(epoch_train_steps * run_config["save_steps"])
139 |             run_config["save_strategy"] = "steps"
140 | 
141 |         if args.push_to_hub:
142 |             run_config["hub_model_id"] = "AblateIt/" + run_name
143 |             run_config["push_to_hub"] = True
144 |             run_config["hub_strategy"] = "all_checkpoints"
145 |             print(run_config["hub_model_id"])
146 | 
147 |         run_config["wandb_project"] = args.project
148 |         run_config["wandb_entity"] = args.entity
149 |         run_config["wandb_run_name"] = run_name
150 |         run_config["output_dir"] = run_config["output_dir"] + "/" + run_name + "/"
151 | 
152 |         run_config_path = run_config["output_dir"] + "config.yaml"
153 | 
154 |         if not os.path.exists(run_config["output_dir"]):
155 |             os.makedirs(run_config["output_dir"])
156 | 
157 |         with open(run_config_path, "w") as file:
158 |             yaml.dump(run_config, file)
159 |         print(run_config)
160 | 
161 |         # Run the training command with the generated config file
162 |         cuda_device_declaration = (
163 |             "export CUDA_VISIBLE_DEVICES=" + args.gpu + "; " if args.gpu else ""
164 |         )
165 |         cmd = (
166 |             cuda_device_declaration
167 |             + f"accelerate launch axolotl/scripts/finetune.py {run_config_path} --main_process_port 0"
168 |         )
169 |         print(cmd)
170 |         call(cmd, shell=True)
171 | 
172 |     if args.sweep_id is not None:
173 |         # Run the sweep
174 |         wandb.agent(
175 |             sweep_id,
176 |             run_sweep,
177 |             project=args.project,
178 |             entity=args.entity,
179 |             count=args.max_num_runs,
180 |         )
181 | 
182 | 
183 | if __name__ == "__main__":
184 |     sweep()
--------------------------------------------------------------------------------
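As a closing sanity check on run naming, here is what `create_name` from `sweep.py` produces for a typical swept configuration; the function is copied verbatim from the script, and the sample values are chosen only for illustration.

```python
# Reproduces sweep.py's create_name on a sample config.
def create_name(config_dict):
    short = {
        "gradient_accumulation_steps": "graccsteps",
        "learning_rate": "lr",
        "lora_r": "lora_r",
        "lora_dropout": "drop",
    }
    name = ""
    for hyperparam, value in config_dict.items():
        name += short.get(hyperparam, hyperparam) + str(value).replace(".", "_") + "-"
    return name[:-1]

print(create_name({"learning_rate": 2e-05, "lora_r": 32, "lora_dropout": 0.1}))
# -> lr2e-05-lora_r32-drop0_1
```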