├── CITATION.cff
├── LICENSE
├── README.md
├── configs
    ├── llama_100m.json
    ├── llama_130m.json
    ├── llama_1b.json
    ├── llama_20m.json
    ├── llama_250m.json
    ├── llama_350m.json
    ├── llama_35m.json
    ├── llama_3b.json
    ├── llama_40m.json
    ├── llama_60m.json
    ├── llama_71m.json
    ├── llama_7b.json
    └── llama_9m.json
├── exp_requirements.txt
├── galore_torch
    ├── __init__.py
    ├── adafactor.py
    ├── adamw.py
    ├── adamw8bit.py
    ├── galore_projector.py
    └── galore_projector_tensor.py
├── imgs
    ├── galore_code_box.png
    └── subspace_learning.png
├── peft_pretraining
    ├── args_utils.py
    ├── dataloader.py
    ├── modeling_llama.py
    └── training_utils.py
├── requirements.txt
├── run_glue.py
├── scripts
    ├── benchmark_c4
    │   ├── llama_130m.sh
    │   ├── llama_1b.sh
    │   ├── llama_350m.sh
    │   ├── llama_60m.sh
    │   └── llama_7b.sh
    ├── single_gpu
    │   ├── llama_7b.sh
    │   └── llama_7b_checkpointing.sh
    └── tensor_test
    │   └── neural_operator.py
├── setup.py
└── torchrun_main.py

/CITATION.cff:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/CITATION.cff
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/README.md
--------------------------------------------------------------------------------
/configs/llama_100m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_100m.json
--------------------------------------------------------------------------------
/configs/llama_130m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_130m.json
--------------------------------------------------------------------------------
/configs/llama_1b.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_1b.json
--------------------------------------------------------------------------------
/configs/llama_20m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_20m.json
--------------------------------------------------------------------------------
/configs/llama_250m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_250m.json
--------------------------------------------------------------------------------
/configs/llama_350m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_350m.json
--------------------------------------------------------------------------------
/configs/llama_35m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_35m.json
--------------------------------------------------------------------------------
/configs/llama_3b.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_3b.json
--------------------------------------------------------------------------------
/configs/llama_40m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_40m.json
--------------------------------------------------------------------------------
/configs/llama_60m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_60m.json
--------------------------------------------------------------------------------
/configs/llama_71m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_71m.json
--------------------------------------------------------------------------------
/configs/llama_7b.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_7b.json
--------------------------------------------------------------------------------
/configs/llama_9m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/configs/llama_9m.json
--------------------------------------------------------------------------------
/exp_requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/exp_requirements.txt
--------------------------------------------------------------------------------
/galore_torch/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/galore_torch/__init__.py
--------------------------------------------------------------------------------
/galore_torch/adafactor.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/galore_torch/adafactor.py
--------------------------------------------------------------------------------
/galore_torch/adamw.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/galore_torch/adamw.py
--------------------------------------------------------------------------------
/galore_torch/adamw8bit.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/galore_torch/adamw8bit.py
--------------------------------------------------------------------------------
/galore_torch/galore_projector.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/galore_torch/galore_projector.py
--------------------------------------------------------------------------------
/galore_torch/galore_projector_tensor.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/galore_torch/galore_projector_tensor.py
--------------------------------------------------------------------------------
/imgs/galore_code_box.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/imgs/galore_code_box.png
--------------------------------------------------------------------------------
/imgs/subspace_learning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/imgs/subspace_learning.png
--------------------------------------------------------------------------------
/peft_pretraining/args_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/peft_pretraining/args_utils.py
--------------------------------------------------------------------------------
/peft_pretraining/dataloader.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/peft_pretraining/dataloader.py
--------------------------------------------------------------------------------
/peft_pretraining/modeling_llama.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/peft_pretraining/modeling_llama.py
--------------------------------------------------------------------------------
/peft_pretraining/training_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/peft_pretraining/training_utils.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
torch
transformers
bitsandbytes
tensorly
--------------------------------------------------------------------------------
/run_glue.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/run_glue.py
--------------------------------------------------------------------------------
/scripts/benchmark_c4/llama_130m.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/scripts/benchmark_c4/llama_130m.sh
--------------------------------------------------------------------------------
/scripts/benchmark_c4/llama_1b.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/scripts/benchmark_c4/llama_1b.sh
--------------------------------------------------------------------------------
/scripts/benchmark_c4/llama_350m.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/scripts/benchmark_c4/llama_350m.sh
--------------------------------------------------------------------------------
/scripts/benchmark_c4/llama_60m.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/scripts/benchmark_c4/llama_60m.sh
--------------------------------------------------------------------------------
/scripts/benchmark_c4/llama_7b.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/scripts/benchmark_c4/llama_7b.sh
--------------------------------------------------------------------------------
/scripts/single_gpu/llama_7b.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/scripts/single_gpu/llama_7b.sh
--------------------------------------------------------------------------------
/scripts/single_gpu/llama_7b_checkpointing.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/scripts/single_gpu/llama_7b_checkpointing.sh
--------------------------------------------------------------------------------
/scripts/tensor_test/neural_operator.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/scripts/tensor_test/neural_operator.py
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/setup.py
--------------------------------------------------------------------------------
/torchrun_main.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jiaweizzhao/GaLore/HEAD/torchrun_main.py
--------------------------------------------------------------------------------