├── .gitignore
├── LICENSE
├── readme.md
└── src
    ├── pretrain
        ├── __init__.py
        ├── jieba_tokenizer
        │   ├── jieba_t.py
        │   ├── special_tokens.txt
        │   └── train_vocab.py
        └── pretrain.py
    ├── requirements.txt
    └── wwm_pretrain
        ├── dataset
            └── __Init__.py
        ├── jieba_wwm_dataset.py
        └── pretrain.py


/.gitignore:
--------------------------------------------------------------------------------
*.pyc
__pycache__
.vscode


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LowinLi/chinese-bigbird/HEAD/LICENSE


--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LowinLi/chinese-bigbird/HEAD/readme.md


--------------------------------------------------------------------------------
/src/pretrain/__init__.py:
--------------------------------------------------------------------------------
__


--------------------------------------------------------------------------------
/src/pretrain/jieba_tokenizer/jieba_t.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LowinLi/chinese-bigbird/HEAD/src/pretrain/jieba_tokenizer/jieba_t.py


--------------------------------------------------------------------------------
/src/pretrain/jieba_tokenizer/special_tokens.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LowinLi/chinese-bigbird/HEAD/src/pretrain/jieba_tokenizer/special_tokens.txt


--------------------------------------------------------------------------------
/src/pretrain/jieba_tokenizer/train_vocab.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LowinLi/chinese-bigbird/HEAD/src/pretrain/jieba_tokenizer/train_vocab.py


--------------------------------------------------------------------------------
/src/pretrain/pretrain.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LowinLi/chinese-bigbird/HEAD/src/pretrain/pretrain.py


--------------------------------------------------------------------------------
/src/requirements.txt:
--------------------------------------------------------------------------------
transformers==4.6.1
tensorflow==2.4.0
jieba-fast==0.53


--------------------------------------------------------------------------------
/src/wwm_pretrain/dataset/__Init__.py:
--------------------------------------------------------------------------------



--------------------------------------------------------------------------------
/src/wwm_pretrain/jieba_wwm_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LowinLi/chinese-bigbird/HEAD/src/wwm_pretrain/jieba_wwm_dataset.py


--------------------------------------------------------------------------------
/src/wwm_pretrain/pretrain.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/LowinLi/chinese-bigbird/HEAD/src/wwm_pretrain/pretrain.py


--------------------------------------------------------------------------------