├── README.md
├── count_training_tokens.py
├── prepare_jsonl_shards.py
├── requirements.txt
├── setup.py
└── train_unigram.py


/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sayakpaul/count-tokens-hf-datasets/HEAD/README.md
--------------------------------------------------------------------------------


/count_training_tokens.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sayakpaul/count-tokens-hf-datasets/HEAD/count_training_tokens.py
--------------------------------------------------------------------------------


/prepare_jsonl_shards.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sayakpaul/count-tokens-hf-datasets/HEAD/prepare_jsonl_shards.py
--------------------------------------------------------------------------------


/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sayakpaul/count-tokens-hf-datasets/HEAD/requirements.txt
--------------------------------------------------------------------------------


/setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sayakpaul/count-tokens-hf-datasets/HEAD/setup.py
--------------------------------------------------------------------------------


/train_unigram.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sayakpaul/count-tokens-hf-datasets/HEAD/train_unigram.py
--------------------------------------------------------------------------------