├── .gitattributes
├── .gitignore
├── LICENSE
├── README.md
├── compress.py
├── config
    ├── global
    │   └── demo.yaml
    └── vocabulary
    │   └── demo.txt
├── data
    ├── demo.txt
    ├── demo_decode_out.txt
    ├── demo_encode_out.bin
    ├── demo_encode_out.txt
    └── simple_wikipedia.txt
├── data_loader.py
├── decompress.py
├── file_io.py
├── model.py
├── modules.py
├── notebooks
    ├── mistral_model_block_demo.ipynb
    ├── pytorch_topk_demo.ipynb
    └── transformer_attntion_demo.ipynb
├── requirements.txt
├── tests
    ├── __init__.py
    └── test_file_io.py
├── tokenizer.py
├── trainer.py
└── utils.py

/.gitattributes:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/.gitattributes
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/.gitignore
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/README.md
--------------------------------------------------------------------------------
/compress.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/compress.py
--------------------------------------------------------------------------------
/config/global/demo.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/config/global/demo.yaml
--------------------------------------------------------------------------------
/config/vocabulary/demo.txt:
--------------------------------------------------------------------------------
A
B
C
D
E
--------------------------------------------------------------------------------
/data/demo.txt:
--------------------------------------------------------------------------------
sTpRpVpQpUpe m
--------------------------------------------------------------------------------
/data/demo_decode_out.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/data/demo_decode_out.txt
--------------------------------------------------------------------------------
/data/demo_encode_out.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/data/demo_encode_out.bin
--------------------------------------------------------------------------------
/data/demo_encode_out.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/data/demo_encode_out.txt
--------------------------------------------------------------------------------
/data/simple_wikipedia.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/data/simple_wikipedia.txt
--------------------------------------------------------------------------------
/data_loader.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/data_loader.py
--------------------------------------------------------------------------------
/decompress.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/decompress.py
--------------------------------------------------------------------------------
/file_io.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/file_io.py
--------------------------------------------------------------------------------
/model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/model.py
--------------------------------------------------------------------------------
/modules.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/modules.py
--------------------------------------------------------------------------------
/notebooks/mistral_model_block_demo.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/notebooks/mistral_model_block_demo.ipynb
--------------------------------------------------------------------------------
/notebooks/pytorch_topk_demo.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/notebooks/pytorch_topk_demo.ipynb
--------------------------------------------------------------------------------
/notebooks/transformer_attntion_demo.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/notebooks/transformer_attntion_demo.ipynb
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
torch >= 2.2.0
tqdm
simple-parsing
munch
pyyaml
spacy
--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/tests/test_file_io.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/tests/test_file_io.py
--------------------------------------------------------------------------------
/tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/tokenizer.py
--------------------------------------------------------------------------------
/trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/trainer.py
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Shawn-Guo-CN/Lossless_Text_Compression_with_Transformer/HEAD/utils.py
--------------------------------------------------------------------------------