Component | Code |
---|---|
Model Code | Literal code for building every layer of the model, and a notebook used to figure some of this out.<br><pre>└── src<br>    ├── model.py<br>    ├── global_cache.py<br>    ├── transformer_block.py<br>    ├── attention.py<br>    ├── rope.py<br>    ├── ffn.py<br>    └── notebooks<br>        └── notes.ipynb</pre> |
Model Config | Config data type, actual and debug configs, and helper notebook used to figure out config values.<br><pre>└── src<br>    ├── params.py<br>    ├── model_configs<br>        └── my_llm_config.py<br>    └── notebooks<br>        └── parameters_tuning.ipynb</pre> |
Model Training | Code to pretrain the model and compute validation loss, plus a scheduled AdamW wrapper, weight initialization scheme, and distributed training handler.<br><pre>└── src<br>    ├── pretrain.py<br>    ├── model_assessment<br>        └── validation.py<br>    ├── model_utils<br>        ├── adamw_opt.py<br>        └── weight_init.py<br>    └── utils<br>        └── handle_ddp.py</pre> |
Training Data | Code to download, tokenize, and shard 10 Billion Tokens (BT) worth of training data, and to randomly shuffle and load shards into memory while running distributed training.<br><pre>└── src<br>    ├── data_processing<br>        ├── data_downloader.py<br>        └── training_data_loader.py<br>    └── utils<br>        └── rand_idx_seq_gen.py</pre> |
Model Assessment | Code to autoregressively sample from the LF_LLM-269M model and run HellaSwag evaluation.<br><pre>└── src<br>    └── model_assessment<br>        ├── sampling.py<br>        └── hellaswag.py</pre> |
Training Helpers | Helper code to log, manage files, checkpoint, and graph metrics.<br><pre>└── src<br>    ├── utils<br>        ├── logger.py<br>        └── root.py<br>    ├── model_utils<br>        ├── debugging.py<br>        └── checkpoint_utils.py<br>    └── notebooks<br>        └── metric_graphs.ipynb</pre> |
Temp Storage | Temp storage where datasets, logs, and checkpoints are kept for easy access while the model is being trained or evaluated.<br><pre>└── temp_data/*</pre> |
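
To illustrate the Training Data row, here is a minimal sketch of one way shard order could be shuffled so that every distributed process agrees on the same sequence. It is not the contents of `rand_idx_seq_gen.py`: the function names `shuffled_shard_order` and `shards_for_rank`, their parameters, and the strided per-rank split are all assumptions made for illustration.

```python
import random


def shuffled_shard_order(num_shards: int, epoch: int, seed: int = 0) -> list[int]:
    """Return a permutation of shard indices that is identical on every process.

    Hypothetical helper: seeding from (seed, epoch) means each process can
    compute the same order locally, without any communication.
    """
    rng = random.Random(seed * 1_000_003 + epoch)
    order = list(range(num_shards))
    rng.shuffle(order)
    return order


def shards_for_rank(order: list[int], rank: int, world_size: int) -> list[int]:
    """Give each rank a disjoint, strided slice of the shared shard order.

    Assumption: ranks read different shards. A loader could instead have all
    ranks read the same shard at different offsets.
    """
    return order[rank::world_size]


if __name__ == "__main__":
    order = shuffled_shard_order(num_shards=8, epoch=0)
    for rank in range(2):
        print(rank, shards_for_rank(order, rank, world_size=2))
```

Deriving the order from a seed and the epoch number keeps the shuffle reproducible across restarts, which matters when resuming from a checkpoint.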