├── README.md
├── ac.py
├── config.py
├── data
│   ├── cached_dev_RoBERTa_wwm_ext_150
│   ├── cached_train_RoBERTa_wwm_ext_150
│   ├── dev.txt
│   ├── dev_original.txt
│   ├── new_vocab.txt
│   ├── test.txt
│   ├── test_original.txt
│   ├── train.txt
│   └── train_original.txt
├── dataset.py
├── dicts
│   ├── illegal.txt
│   ├── illegal_char_split.txt
│   ├── new.txt
│   ├── suspected_illegal.txt
│   └── trad2simp.txt
├── figure
│   └── system.png
├── generate_data.py
├── illegal_check.py
├── models
│   ├── fasttext_model.py
│   └── roberta_model.py
├── preprocess.py
├── requirements.txt
├── train_fasttext.py
└── train_roberta.py

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/README.md
--------------------------------------------------------------------------------
/ac.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/ac.py
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/config.py
--------------------------------------------------------------------------------
/data/cached_dev_RoBERTa_wwm_ext_150:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/data/cached_dev_RoBERTa_wwm_ext_150
--------------------------------------------------------------------------------
/data/cached_train_RoBERTa_wwm_ext_150:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/data/cached_train_RoBERTa_wwm_ext_150
--------------------------------------------------------------------------------
/data/dev.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/data/dev.txt
--------------------------------------------------------------------------------
/data/dev_original.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/data/dev_original.txt
--------------------------------------------------------------------------------
/data/new_vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/data/new_vocab.txt
--------------------------------------------------------------------------------
/data/test.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/data/test.txt
--------------------------------------------------------------------------------
/data/test_original.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/data/test_original.txt
--------------------------------------------------------------------------------
/data/train.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/data/train.txt
--------------------------------------------------------------------------------
/data/train_original.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/data/train_original.txt
--------------------------------------------------------------------------------
/dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/dataset.py
--------------------------------------------------------------------------------
/dicts/illegal.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/dicts/illegal.txt
--------------------------------------------------------------------------------
/dicts/illegal_char_split.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/dicts/illegal_char_split.txt
--------------------------------------------------------------------------------
/dicts/new.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/dicts/new.txt
--------------------------------------------------------------------------------
/dicts/suspected_illegal.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/dicts/suspected_illegal.txt
--------------------------------------------------------------------------------
/dicts/trad2simp.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/dicts/trad2simp.txt
--------------------------------------------------------------------------------
/figure/system.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/figure/system.png
--------------------------------------------------------------------------------
/generate_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/generate_data.py
--------------------------------------------------------------------------------
/illegal_check.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/illegal_check.py
--------------------------------------------------------------------------------
/models/fasttext_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/models/fasttext_model.py
--------------------------------------------------------------------------------
/models/roberta_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/models/roberta_model.py
--------------------------------------------------------------------------------
/preprocess.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/preprocess.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/requirements.txt
--------------------------------------------------------------------------------
/train_fasttext.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/train_fasttext.py
--------------------------------------------------------------------------------
/train_roberta.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wjx-git/IllegalTextDetection/HEAD/train_roberta.py
--------------------------------------------------------------------------------