├── CITATION.cff
├── LICENSE
├── README.md
├── configs
│   ├── llama_100m.json
│   ├── llama_130m.json
│   ├── llama_1b.json
│   ├── llama_20m.json
│   ├── llama_250m.json
│   ├── llama_350m.json
│   ├── llama_35m.json
│   ├── llama_3b.json
│   ├── llama_40m.json
│   ├── llama_60m.json
│   ├── llama_71m.json
│   ├── llama_7b.json
│   └── llama_9m.json
├── galore_scripts
│   ├── benchmark_c4
│   │   ├── llama_130m.sh
│   │   ├── llama_1b.sh
│   │   ├── llama_350m.sh
│   │   ├── llama_60m.sh
│   │   └── llama_7b.sh
│   └── single_gpu
│       ├── llama_7b.sh
│       └── llama_7b_checkpointing.sh
├── galore_torch
│   ├── __init__.py
│   ├── adafactor.py
│   ├── adamw.py
│   ├── adamw8bit.py
│   ├── galore_lion.py
│   ├── galore_projector.py
│   └── lion.py
├── imgs
│   ├── online_subspace_descent_code_box.png
│   ├── perplexity_vs_length.png
│   └── system_benchmark.png
├── peft_pretraining
│   ├── args_utils.py
│   ├── dataloader.py
│   ├── modeling_llama.py
│   └── training_utils.py
├── requirements.txt
├── run_glue.py
├── setup.py
└── torchrun_main.py

/CITATION.cff:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/CITATION.cff
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/LICENSE
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/README.md
--------------------------------------------------------------------------------

/configs/llama_100m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_100m.json
--------------------------------------------------------------------------------

/configs/llama_130m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_130m.json
--------------------------------------------------------------------------------

/configs/llama_1b.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_1b.json
--------------------------------------------------------------------------------

/configs/llama_20m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_20m.json
--------------------------------------------------------------------------------

/configs/llama_250m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_250m.json
--------------------------------------------------------------------------------

/configs/llama_350m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_350m.json
--------------------------------------------------------------------------------

/configs/llama_35m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_35m.json
--------------------------------------------------------------------------------

/configs/llama_3b.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_3b.json
--------------------------------------------------------------------------------

/configs/llama_40m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_40m.json
--------------------------------------------------------------------------------

/configs/llama_60m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_60m.json
--------------------------------------------------------------------------------

/configs/llama_71m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_71m.json
--------------------------------------------------------------------------------

/configs/llama_7b.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_7b.json
--------------------------------------------------------------------------------

/configs/llama_9m.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/configs/llama_9m.json
--------------------------------------------------------------------------------

/galore_scripts/benchmark_c4/llama_130m.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_scripts/benchmark_c4/llama_130m.sh
--------------------------------------------------------------------------------

/galore_scripts/benchmark_c4/llama_1b.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_scripts/benchmark_c4/llama_1b.sh
--------------------------------------------------------------------------------

/galore_scripts/benchmark_c4/llama_350m.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_scripts/benchmark_c4/llama_350m.sh
--------------------------------------------------------------------------------

/galore_scripts/benchmark_c4/llama_60m.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_scripts/benchmark_c4/llama_60m.sh
--------------------------------------------------------------------------------

/galore_scripts/benchmark_c4/llama_7b.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_scripts/benchmark_c4/llama_7b.sh
--------------------------------------------------------------------------------

/galore_scripts/single_gpu/llama_7b.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_scripts/single_gpu/llama_7b.sh
--------------------------------------------------------------------------------

/galore_scripts/single_gpu/llama_7b_checkpointing.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_scripts/single_gpu/llama_7b_checkpointing.sh
--------------------------------------------------------------------------------

/galore_torch/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_torch/__init__.py
--------------------------------------------------------------------------------

/galore_torch/adafactor.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_torch/adafactor.py
--------------------------------------------------------------------------------

/galore_torch/adamw.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_torch/adamw.py
--------------------------------------------------------------------------------

/galore_torch/adamw8bit.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_torch/adamw8bit.py
--------------------------------------------------------------------------------

/galore_torch/galore_lion.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_torch/galore_lion.py
--------------------------------------------------------------------------------

/galore_torch/galore_projector.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_torch/galore_projector.py
--------------------------------------------------------------------------------

/galore_torch/lion.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/galore_torch/lion.py
--------------------------------------------------------------------------------

/imgs/online_subspace_descent_code_box.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/imgs/online_subspace_descent_code_box.png
--------------------------------------------------------------------------------

/imgs/perplexity_vs_length.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/imgs/perplexity_vs_length.png
--------------------------------------------------------------------------------

/imgs/system_benchmark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/imgs/system_benchmark.png
--------------------------------------------------------------------------------

/peft_pretraining/args_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/peft_pretraining/args_utils.py
--------------------------------------------------------------------------------

/peft_pretraining/dataloader.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/peft_pretraining/dataloader.py
--------------------------------------------------------------------------------

/peft_pretraining/modeling_llama.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/peft_pretraining/modeling_llama.py
--------------------------------------------------------------------------------

/peft_pretraining/training_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/peft_pretraining/training_utils.py
--------------------------------------------------------------------------------

/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/requirements.txt
--------------------------------------------------------------------------------

/run_glue.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/run_glue.py
--------------------------------------------------------------------------------

/setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/setup.py
--------------------------------------------------------------------------------

/torchrun_main.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kyleliang919/Online-Subspace-Descent/HEAD/torchrun_main.py
--------------------------------------------------------------------------------