├── .gitignore
├── LICENSE
├── README.md
├── StrQ2B
│   ├── README.md
│   ├── run.sh
│   └── strQ2B.py
├── chinese_t2s
│   ├── Data
│   │   ├── wiki_zh_10.txt
│   │   └── wiki_zh_10_sim.txt
│   ├── README.md
│   └── chinese_t2s.py
├── clean
│   ├── Data
│   │   ├── zhwiki.txt
│   │   └── zhwiki_remove.txt
│   ├── README.md
│   └── clean_corpus.py
├── cn_to_arabic
│   ├── README.md
│   └── cn_to_arabic.py
├── extract_zh_char_stoke
│   ├── Data
│   │   └── giga_small.txt
│   ├── README.md
│   ├── Stoke
│   │   ├── __init__.py
│   │   ├── character_stoke.py
│   │   ├── character_stoke_handian.py
│   │   ├── default_stoke.txt
│   │   └── handian.py
│   ├── extract_zh_char_stoke.py
│   ├── merge_file.py
│   └── run.sh
├── python-2to3
│   ├── 2to3.py
│   └── README.md
├── split_zh_bichar
│   ├── Data
│   │   ├── giga_small.txt
│   │   └── giga_small_out.txt
│   ├── README.md
│   ├── run.sh
│   └── split_zh_bichar.py
├── split_zh_char
│   ├── Data
│   │   ├── giga_small.txt
│   │   └── giga_small_out.txt
│   ├── README.md
│   ├── run.sh
│   └── split_zh_char.py
├── tag_for_NER
│   ├── convert_tag.py
│   └── run.sh
└── wikidata_process
    ├── README.md
    └── wiki_process.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/.gitignore

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/LICENSE

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/README.md

--------------------------------------------------------------------------------
/StrQ2B/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/StrQ2B/README.md

--------------------------------------------------------------------------------
/StrQ2B/run.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/StrQ2B/run.sh

--------------------------------------------------------------------------------
/StrQ2B/strQ2B.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/StrQ2B/strQ2B.py

--------------------------------------------------------------------------------
/chinese_t2s/Data/wiki_zh_10.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/chinese_t2s/Data/wiki_zh_10.txt

--------------------------------------------------------------------------------
/chinese_t2s/Data/wiki_zh_10_sim.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/chinese_t2s/Data/wiki_zh_10_sim.txt

--------------------------------------------------------------------------------
/chinese_t2s/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/chinese_t2s/README.md

--------------------------------------------------------------------------------
/chinese_t2s/chinese_t2s.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/chinese_t2s/chinese_t2s.py

--------------------------------------------------------------------------------
/clean/Data/zhwiki.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/clean/Data/zhwiki.txt

--------------------------------------------------------------------------------
/clean/Data/zhwiki_remove.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/clean/Data/zhwiki_remove.txt

--------------------------------------------------------------------------------
/clean/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/clean/README.md

--------------------------------------------------------------------------------
/clean/clean_corpus.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/clean/clean_corpus.py

--------------------------------------------------------------------------------
/cn_to_arabic/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/cn_to_arabic/README.md

--------------------------------------------------------------------------------
/cn_to_arabic/cn_to_arabic.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/cn_to_arabic/cn_to_arabic.py

--------------------------------------------------------------------------------
/extract_zh_char_stoke/Data/giga_small.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/extract_zh_char_stoke/Data/giga_small.txt

--------------------------------------------------------------------------------
/extract_zh_char_stoke/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/extract_zh_char_stoke/README.md

--------------------------------------------------------------------------------
/extract_zh_char_stoke/Stoke/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/extract_zh_char_stoke/Stoke/__init__.py

--------------------------------------------------------------------------------
/extract_zh_char_stoke/Stoke/character_stoke.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/extract_zh_char_stoke/Stoke/character_stoke.py

--------------------------------------------------------------------------------
/extract_zh_char_stoke/Stoke/character_stoke_handian.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/extract_zh_char_stoke/Stoke/character_stoke_handian.py

--------------------------------------------------------------------------------
/extract_zh_char_stoke/Stoke/default_stoke.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/extract_zh_char_stoke/Stoke/default_stoke.txt

--------------------------------------------------------------------------------
/extract_zh_char_stoke/Stoke/handian.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/extract_zh_char_stoke/Stoke/handian.py

--------------------------------------------------------------------------------
/extract_zh_char_stoke/extract_zh_char_stoke.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/extract_zh_char_stoke/extract_zh_char_stoke.py

--------------------------------------------------------------------------------
/extract_zh_char_stoke/merge_file.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/extract_zh_char_stoke/merge_file.py

--------------------------------------------------------------------------------
/extract_zh_char_stoke/run.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/extract_zh_char_stoke/run.sh

--------------------------------------------------------------------------------
/python-2to3/2to3.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/python-2to3/2to3.py

--------------------------------------------------------------------------------
/python-2to3/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/python-2to3/README.md

--------------------------------------------------------------------------------
/split_zh_bichar/Data/giga_small.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/split_zh_bichar/Data/giga_small.txt

--------------------------------------------------------------------------------
/split_zh_bichar/Data/giga_small_out.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/split_zh_bichar/Data/giga_small_out.txt

--------------------------------------------------------------------------------
/split_zh_bichar/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/split_zh_bichar/README.md

--------------------------------------------------------------------------------
/split_zh_bichar/run.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/split_zh_bichar/run.sh

--------------------------------------------------------------------------------
/split_zh_bichar/split_zh_bichar.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/split_zh_bichar/split_zh_bichar.py

--------------------------------------------------------------------------------
/split_zh_char/Data/giga_small.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/split_zh_char/Data/giga_small.txt

--------------------------------------------------------------------------------
/split_zh_char/Data/giga_small_out.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/split_zh_char/Data/giga_small_out.txt

--------------------------------------------------------------------------------
/split_zh_char/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/split_zh_char/README.md

--------------------------------------------------------------------------------
/split_zh_char/run.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/split_zh_char/run.sh

--------------------------------------------------------------------------------
/split_zh_char/split_zh_char.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/split_zh_char/split_zh_char.py

--------------------------------------------------------------------------------
/tag_for_NER/convert_tag.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/tag_for_NER/convert_tag.py

--------------------------------------------------------------------------------
/tag_for_NER/run.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/tag_for_NER/run.sh

--------------------------------------------------------------------------------
/wikidata_process/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/wikidata_process/README.md

--------------------------------------------------------------------------------
/wikidata_process/wiki_process.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dalinvip/corpus_process_script/HEAD/wikidata_process/wiki_process.py
--------------------------------------------------------------------------------