├── LICENSE
├── README.md
├── analyze.py
├── gather.py
├── prepare_jsonl_chunks.py
├── prepare_pile_of_law_chunks.py
├── prepare_redpajama_chunks.py
├── prepare_starcoder_chunks.py
├── shuffle.py
├── starcoder_fim_main.py
└── tokenize_datasets.py


/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/README.md
--------------------------------------------------------------------------------
/analyze.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/analyze.py
--------------------------------------------------------------------------------
/gather.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/gather.py
--------------------------------------------------------------------------------
/prepare_jsonl_chunks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/prepare_jsonl_chunks.py
--------------------------------------------------------------------------------
/prepare_pile_of_law_chunks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/prepare_pile_of_law_chunks.py
--------------------------------------------------------------------------------
/prepare_redpajama_chunks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/prepare_redpajama_chunks.py
--------------------------------------------------------------------------------
/prepare_starcoder_chunks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/prepare_starcoder_chunks.py
--------------------------------------------------------------------------------
/shuffle.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/shuffle.py
--------------------------------------------------------------------------------
/starcoder_fim_main.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/starcoder_fim_main.py
--------------------------------------------------------------------------------
/tokenize_datasets.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/k2-data-prep/HEAD/tokenize_datasets.py
--------------------------------------------------------------------------------