├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── architecture
    └── README.md
├── assets
    └── wikipedia_precision.png
├── debug
    ├── NicerTrace.py
    ├── README.md
    ├── printflock.py
    └── torch-distributed-gpu-test.py
├── hparams
    └── README.md
├── instabilities
    └── README.md
├── parallelism
    └── README.md
├── resources
    └── README.md
└── throughput
    ├── README.md
    └── all_reduce_bench.py

/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/CODE_OF_CONDUCT.md
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/CONTRIBUTING.md
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/README.md
--------------------------------------------------------------------------------
/architecture/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/architecture/README.md
--------------------------------------------------------------------------------
/assets/wikipedia_precision.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/assets/wikipedia_precision.png
--------------------------------------------------------------------------------
/debug/NicerTrace.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/debug/NicerTrace.py
--------------------------------------------------------------------------------
/debug/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/debug/README.md
--------------------------------------------------------------------------------
/debug/printflock.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/debug/printflock.py
--------------------------------------------------------------------------------
/debug/torch-distributed-gpu-test.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/debug/torch-distributed-gpu-test.py
--------------------------------------------------------------------------------
/hparams/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/hparams/README.md
--------------------------------------------------------------------------------
/instabilities/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/instabilities/README.md
--------------------------------------------------------------------------------
/parallelism/README.md:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/resources/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/resources/README.md
--------------------------------------------------------------------------------
/throughput/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/throughput/README.md
--------------------------------------------------------------------------------
/throughput/all_reduce_bench.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/huggingface/large_language_model_training_playbook/HEAD/throughput/all_reduce_bench.py
--------------------------------------------------------------------------------