├── LICENSE
├── README.md
├── bad_sample.jsonl
├── convert_dataset_hf_refinedpajama_json.py
├── good_sample.jsonl
├── mix_and_split.py
├── redpajama_v1
    ├── download.sh
    └── urls.txt
├── starcoder-lang.list
└── validate_json.py

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/amber-data-prep/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/amber-data-prep/HEAD/README.md
--------------------------------------------------------------------------------
/bad_sample.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/amber-data-prep/HEAD/bad_sample.jsonl
--------------------------------------------------------------------------------
/convert_dataset_hf_refinedpajama_json.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/amber-data-prep/HEAD/convert_dataset_hf_refinedpajama_json.py
--------------------------------------------------------------------------------
/good_sample.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/amber-data-prep/HEAD/good_sample.jsonl
--------------------------------------------------------------------------------
/mix_and_split.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/amber-data-prep/HEAD/mix_and_split.py
--------------------------------------------------------------------------------
/redpajama_v1/download.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/amber-data-prep/HEAD/redpajama_v1/download.sh
--------------------------------------------------------------------------------
/redpajama_v1/urls.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/amber-data-prep/HEAD/redpajama_v1/urls.txt
--------------------------------------------------------------------------------
/starcoder-lang.list:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/amber-data-prep/HEAD/starcoder-lang.list
--------------------------------------------------------------------------------
/validate_json.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LLM360/amber-data-prep/HEAD/validate_json.py
--------------------------------------------------------------------------------