├── .amltignore
├── .gitignore
├── CONTRIBUTING.md
├── Dockerfile
├── LICENSE
├── README.md
├── SECURITY.md
├── assets
    ├── sambay poster.pdf
    ├── sambay_arch.png
    ├── sambay_poster.pdf
    ├── scaling_d8-d24.png
    ├── scaling_data_1B_mup_abl_tie.png
    └── shooting_stars.png
├── eval.py
├── eval_phonebook.py
├── eval_reason.py
├── eval_reason.sh
├── lit_gpt
    ├── __init__.py
    ├── config.py
    ├── diff_attn.py
    ├── fused_cross_entropy.py
    ├── fused_linear_cross_entropy.py
    ├── fused_rotary_embedding.py
    ├── gated_deltanet.py
    ├── gated_memory_unit.py
    ├── mamba2.py
    ├── mamba_simple.py
    ├── model.py
    ├── optim.py
    ├── packed_dataset.py
    ├── rmsnorm.py
    ├── rotary.py
    ├── selective_scan_interface.py
    ├── speed_monitor.py
    ├── tokenizer.py
    ├── triton_sequential_scan.py
    └── utils.py
├── plot_data_scaling.py
├── plot_flops_scaling.py
├── pretrain.py
└── scripts
    ├── convert_hf_checkpoint.py
    ├── convert_lit_checkpoint.py
    ├── download.py
    ├── prepare_redpajama.py
    ├── prepare_slimpajama.py
    └── prepare_starcoder.py


/.amltignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/.amltignore
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/.gitignore
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/CONTRIBUTING.md
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/Dockerfile
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/README.md
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/SECURITY.md
--------------------------------------------------------------------------------
/assets/sambay poster.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/assets/sambay poster.pdf
--------------------------------------------------------------------------------
/assets/sambay_arch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/assets/sambay_arch.png
--------------------------------------------------------------------------------
/assets/sambay_poster.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/assets/sambay_poster.pdf
--------------------------------------------------------------------------------
/assets/scaling_d8-d24.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/assets/scaling_d8-d24.png
--------------------------------------------------------------------------------
/assets/scaling_data_1B_mup_abl_tie.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/assets/scaling_data_1B_mup_abl_tie.png
--------------------------------------------------------------------------------
/assets/shooting_stars.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/assets/shooting_stars.png
--------------------------------------------------------------------------------
/eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/eval.py
--------------------------------------------------------------------------------
/eval_phonebook.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/eval_phonebook.py
--------------------------------------------------------------------------------
/eval_reason.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/eval_reason.py
--------------------------------------------------------------------------------
/eval_reason.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/eval_reason.sh
--------------------------------------------------------------------------------
/lit_gpt/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/__init__.py
--------------------------------------------------------------------------------
/lit_gpt/config.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/config.py
--------------------------------------------------------------------------------
/lit_gpt/diff_attn.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/diff_attn.py
--------------------------------------------------------------------------------
/lit_gpt/fused_cross_entropy.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/fused_cross_entropy.py
--------------------------------------------------------------------------------
/lit_gpt/fused_linear_cross_entropy.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/fused_linear_cross_entropy.py
--------------------------------------------------------------------------------
/lit_gpt/fused_rotary_embedding.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/fused_rotary_embedding.py
--------------------------------------------------------------------------------
/lit_gpt/gated_deltanet.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/gated_deltanet.py
--------------------------------------------------------------------------------
/lit_gpt/gated_memory_unit.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/gated_memory_unit.py
--------------------------------------------------------------------------------
/lit_gpt/mamba2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/mamba2.py
--------------------------------------------------------------------------------
/lit_gpt/mamba_simple.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/mamba_simple.py
--------------------------------------------------------------------------------
/lit_gpt/model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/model.py
--------------------------------------------------------------------------------
/lit_gpt/optim.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/optim.py
--------------------------------------------------------------------------------
/lit_gpt/packed_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/packed_dataset.py
--------------------------------------------------------------------------------
/lit_gpt/rmsnorm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/rmsnorm.py
--------------------------------------------------------------------------------
/lit_gpt/rotary.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/rotary.py
--------------------------------------------------------------------------------
/lit_gpt/selective_scan_interface.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/selective_scan_interface.py
--------------------------------------------------------------------------------
/lit_gpt/speed_monitor.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/speed_monitor.py
--------------------------------------------------------------------------------
/lit_gpt/tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/tokenizer.py
--------------------------------------------------------------------------------
/lit_gpt/triton_sequential_scan.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/triton_sequential_scan.py
--------------------------------------------------------------------------------
/lit_gpt/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/lit_gpt/utils.py
--------------------------------------------------------------------------------
/plot_data_scaling.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/plot_data_scaling.py
--------------------------------------------------------------------------------
/plot_flops_scaling.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/plot_flops_scaling.py
--------------------------------------------------------------------------------
/pretrain.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/pretrain.py
--------------------------------------------------------------------------------
/scripts/convert_hf_checkpoint.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/scripts/convert_hf_checkpoint.py
--------------------------------------------------------------------------------
/scripts/convert_lit_checkpoint.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/scripts/convert_lit_checkpoint.py
--------------------------------------------------------------------------------
/scripts/download.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/scripts/download.py
--------------------------------------------------------------------------------
/scripts/prepare_redpajama.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/scripts/prepare_redpajama.py
--------------------------------------------------------------------------------
/scripts/prepare_slimpajama.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/scripts/prepare_slimpajama.py
--------------------------------------------------------------------------------
/scripts/prepare_starcoder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/ArchScale/HEAD/scripts/prepare_starcoder.py
--------------------------------------------------------------------------------