├── .gitignore
├── README.md
├── data
    ├── test.txt
    ├── train.txt
    ├── train
    │   └── data.txt
    └── valid
    │   └── data.txt
├── pics
    ├── case.gif
    └── eval.png
├── requirements.txt
├── src
    ├── arguments.py
    ├── config
    │   └── opd.json
    ├── data
    │   ├── __init__.py
    │   ├── dataset.py
    │   ├── distributed_indexed.py
    │   └── indexed.py
    ├── generation.py
    ├── layer
    │   ├── __init__.py
    │   ├── attention.py
    │   ├── blocks.py
    │   ├── embedding.py
    │   ├── feedforward.py
    │   ├── layernorm.py
    │   ├── linear.py
    │   ├── position_embedding.py
    │   └── transformer.py
    ├── model
    │   ├── __init__.py
    │   ├── config.py
    │   └── opd.py
    ├── opd_inference_static.py
    ├── opd_interactive.py
    ├── opd_train.py
    ├── scripts
    │   ├── encode_data.sh
    │   ├── inference_static.sh
    │   ├── interactive.sh
    │   ├── prepare_data.sh
    │   └── train.sh
    ├── tokenizer
    │   ├── __init__.py
    │   └── opd_tokenizer.py
    └── tools
    │   ├── encode_data.py
    │   ├── indexed_dataset.py
    │   ├── merge_checkpoint.py
    │   └── prepare_data.py
└── vocab
    └── vocab.txt

/.gitignore:
--------------------------------------------------------------------------------
results
**/__pycache__
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/README.md
--------------------------------------------------------------------------------
/data/test.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/data/test.txt
--------------------------------------------------------------------------------
/data/train.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/data/train.txt
--------------------------------------------------------------------------------
/data/train/data.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/data/train/data.txt
--------------------------------------------------------------------------------
/data/valid/data.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/data/valid/data.txt
--------------------------------------------------------------------------------
/pics/case.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/pics/case.gif
--------------------------------------------------------------------------------
/pics/eval.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/pics/eval.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
torch>=1.10
bmtrain
transformers
jieba
tensorboard
--------------------------------------------------------------------------------
/src/arguments.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/arguments.py
--------------------------------------------------------------------------------
/src/config/opd.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/config/opd.json
--------------------------------------------------------------------------------
/src/data/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/data/__init__.py
--------------------------------------------------------------------------------
/src/data/dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/data/dataset.py
--------------------------------------------------------------------------------
/src/data/distributed_indexed.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/data/distributed_indexed.py
--------------------------------------------------------------------------------
/src/data/indexed.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/data/indexed.py
--------------------------------------------------------------------------------
/src/generation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/generation.py
--------------------------------------------------------------------------------
/src/layer/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/layer/__init__.py
--------------------------------------------------------------------------------
/src/layer/attention.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/layer/attention.py
--------------------------------------------------------------------------------
/src/layer/blocks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/layer/blocks.py
--------------------------------------------------------------------------------
/src/layer/embedding.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/layer/embedding.py
--------------------------------------------------------------------------------
/src/layer/feedforward.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/layer/feedforward.py
--------------------------------------------------------------------------------
/src/layer/layernorm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/layer/layernorm.py
--------------------------------------------------------------------------------
/src/layer/linear.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/layer/linear.py
--------------------------------------------------------------------------------
/src/layer/position_embedding.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/layer/position_embedding.py
--------------------------------------------------------------------------------
/src/layer/transformer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/layer/transformer.py
--------------------------------------------------------------------------------
/src/model/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/model/__init__.py
--------------------------------------------------------------------------------
/src/model/config.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/model/config.py
--------------------------------------------------------------------------------
/src/model/opd.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/model/opd.py
--------------------------------------------------------------------------------
/src/opd_inference_static.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/opd_inference_static.py
--------------------------------------------------------------------------------
/src/opd_interactive.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/opd_interactive.py
--------------------------------------------------------------------------------
/src/opd_train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/opd_train.py
--------------------------------------------------------------------------------
/src/scripts/encode_data.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/scripts/encode_data.sh
--------------------------------------------------------------------------------
/src/scripts/inference_static.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/scripts/inference_static.sh
--------------------------------------------------------------------------------
/src/scripts/interactive.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/scripts/interactive.sh
--------------------------------------------------------------------------------
/src/scripts/prepare_data.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/scripts/prepare_data.sh
--------------------------------------------------------------------------------
/src/scripts/train.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/scripts/train.sh
--------------------------------------------------------------------------------
/src/tokenizer/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/tokenizer/__init__.py
--------------------------------------------------------------------------------
/src/tokenizer/opd_tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/tokenizer/opd_tokenizer.py
--------------------------------------------------------------------------------
/src/tools/encode_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/tools/encode_data.py
--------------------------------------------------------------------------------
/src/tools/indexed_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/tools/indexed_dataset.py
--------------------------------------------------------------------------------
/src/tools/merge_checkpoint.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/tools/merge_checkpoint.py
--------------------------------------------------------------------------------
/src/tools/prepare_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/src/tools/prepare_data.py
--------------------------------------------------------------------------------
/vocab/vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thu-coai/OPD/HEAD/vocab/vocab.txt
--------------------------------------------------------------------------------