├── .gitignore
├── .gitmodules
├── CT_BERT_Huggingface_(GPU_training).ipynb
├── Finetune_COVID_Twitter_BERT.ipynb
├── LICENSE
├── README.md
├── README_pretrain.md
├── config.py
├── configs
│   ├── bert_config_covid_twitter_bert.json
│   ├── bert_config_large_uncased.json
│   ├── bert_config_large_uncased_wwm.json
│   └── bert_config_multi_cased.json
├── convert_tf2_to_pytorch
│   ├── .gitignore
│   ├── convert_tf2_to_pytorch.py
│   ├── convert_tf2_to_pytorch_classifier.py
│   ├── convert_tf2_to_pytorch_pretrain.py
│   ├── test_classifier.py
│   └── test_converted_models.py
├── data
│   └── .gitignore
├── datasets
│   └── covid_category
│       ├── README.md
│       └── covid_category.csv
├── images
│   ├── .gitkeep
│   ├── COVID-Twitter-BERT-graph.jpeg
│   ├── COVID-Twitter-BERT-medium.png
│   ├── COVID-Twitter-BERT.png
│   └── COVID-Twitter-BERT_small.png
├── logs
│   └── .gitignore
├── playground
│   └── test_tokenize.py
├── preprocess
│   ├── create_finetune_data.py
│   ├── create_predict_data.py
│   ├── create_pretrain_data.py
│   ├── prepare_pretrain_data.py
│   └── pretrain_helpers.py
├── report
│   └── v1
│       ├── arxiv.sty
│       ├── fig1.pdf
│       ├── fig2.pdf
│       ├── main.pdf
│       ├── main.tex
│       └── refs.bib
├── requirements.txt
├── run_finetune.py
├── run_predict.py
├── run_pretrain.py
├── scripts
│   ├── convert_checkpoint_v1_to_v2.py
│   ├── download_vocab_files.py
│   ├── run_finetune.sh
│   └── run_pretrain.sh
├── sync_bucket_data.py
├── utils
│   ├── analysis.py
│   ├── finetune_helpers.py
│   ├── misc.py
│   ├── model_training_utils.py
│   ├── optimizer.py
│   └── preprocess.py
└── vocabs
    ├── bert-base-cased-vocab.txt
    ├── bert-base-multilingual-cased-vocab.txt
    ├── bert-base-multilingual-uncased-vocab.txt
    ├── bert-base-uncased-vocab.txt
    ├── bert-large-cased-vocab.txt
    ├── bert-large-cased-whole-word-masking-vocab.txt
    ├── bert-large-uncased-vocab.txt
    ├── bert-large-uncased-whole-word-masking-vocab.txt
    └── bert-multi-cased-vocab.txt

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/.gitignore
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/.gitmodules
--------------------------------------------------------------------------------
/CT_BERT_Huggingface_(GPU_training).ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/CT_BERT_Huggingface_(GPU_training).ipynb
--------------------------------------------------------------------------------
/Finetune_COVID_Twitter_BERT.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/Finetune_COVID_Twitter_BERT.ipynb
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/README.md
--------------------------------------------------------------------------------
/README_pretrain.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/README_pretrain.md
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/config.py
--------------------------------------------------------------------------------
/configs/bert_config_covid_twitter_bert.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/configs/bert_config_covid_twitter_bert.json
--------------------------------------------------------------------------------
/configs/bert_config_large_uncased.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/configs/bert_config_large_uncased.json
--------------------------------------------------------------------------------
/configs/bert_config_large_uncased_wwm.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/configs/bert_config_large_uncased_wwm.json
--------------------------------------------------------------------------------
/configs/bert_config_multi_cased.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/configs/bert_config_multi_cased.json
--------------------------------------------------------------------------------
/convert_tf2_to_pytorch/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/convert_tf2_to_pytorch/.gitignore
--------------------------------------------------------------------------------
/convert_tf2_to_pytorch/convert_tf2_to_pytorch.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/convert_tf2_to_pytorch/convert_tf2_to_pytorch.py
--------------------------------------------------------------------------------
/convert_tf2_to_pytorch/convert_tf2_to_pytorch_classifier.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/convert_tf2_to_pytorch/convert_tf2_to_pytorch_classifier.py
--------------------------------------------------------------------------------
/convert_tf2_to_pytorch/convert_tf2_to_pytorch_pretrain.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/convert_tf2_to_pytorch/convert_tf2_to_pytorch_pretrain.py
--------------------------------------------------------------------------------
/convert_tf2_to_pytorch/test_classifier.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/convert_tf2_to_pytorch/test_classifier.py
--------------------------------------------------------------------------------
/convert_tf2_to_pytorch/test_converted_models.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/convert_tf2_to_pytorch/test_converted_models.py
--------------------------------------------------------------------------------
/data/.gitignore:
--------------------------------------------------------------------------------
*
!.gitignore
--------------------------------------------------------------------------------
/datasets/covid_category/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/datasets/covid_category/README.md
--------------------------------------------------------------------------------
/datasets/covid_category/covid_category.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/datasets/covid_category/covid_category.csv
--------------------------------------------------------------------------------
/images/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/images/COVID-Twitter-BERT-graph.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/images/COVID-Twitter-BERT-graph.jpeg
--------------------------------------------------------------------------------
/images/COVID-Twitter-BERT-medium.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/images/COVID-Twitter-BERT-medium.png
--------------------------------------------------------------------------------
/images/COVID-Twitter-BERT.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/images/COVID-Twitter-BERT.png
--------------------------------------------------------------------------------
/images/COVID-Twitter-BERT_small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/images/COVID-Twitter-BERT_small.png
--------------------------------------------------------------------------------
/logs/.gitignore:
--------------------------------------------------------------------------------
*
!.gitignore
--------------------------------------------------------------------------------
/playground/test_tokenize.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/playground/test_tokenize.py
--------------------------------------------------------------------------------
/preprocess/create_finetune_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/preprocess/create_finetune_data.py
--------------------------------------------------------------------------------
/preprocess/create_predict_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/preprocess/create_predict_data.py
--------------------------------------------------------------------------------
/preprocess/create_pretrain_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/preprocess/create_pretrain_data.py
--------------------------------------------------------------------------------
/preprocess/prepare_pretrain_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/preprocess/prepare_pretrain_data.py
--------------------------------------------------------------------------------
/preprocess/pretrain_helpers.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/preprocess/pretrain_helpers.py
--------------------------------------------------------------------------------
/report/v1/arxiv.sty:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/report/v1/arxiv.sty
--------------------------------------------------------------------------------
/report/v1/fig1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/report/v1/fig1.pdf
--------------------------------------------------------------------------------
/report/v1/fig2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/report/v1/fig2.pdf
--------------------------------------------------------------------------------
/report/v1/main.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/report/v1/main.pdf
--------------------------------------------------------------------------------
/report/v1/main.tex:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/report/v1/main.tex
--------------------------------------------------------------------------------
/report/v1/refs.bib:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/report/v1/refs.bib
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/requirements.txt
--------------------------------------------------------------------------------
/run_finetune.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/run_finetune.py
--------------------------------------------------------------------------------
/run_predict.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/run_predict.py
--------------------------------------------------------------------------------
/run_pretrain.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/run_pretrain.py
--------------------------------------------------------------------------------
/scripts/convert_checkpoint_v1_to_v2.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/scripts/convert_checkpoint_v1_to_v2.py
--------------------------------------------------------------------------------
/scripts/download_vocab_files.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/scripts/download_vocab_files.py
--------------------------------------------------------------------------------
/scripts/run_finetune.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/scripts/run_finetune.sh
--------------------------------------------------------------------------------
/scripts/run_pretrain.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/scripts/run_pretrain.sh
--------------------------------------------------------------------------------
/sync_bucket_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/sync_bucket_data.py
--------------------------------------------------------------------------------
/utils/analysis.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/utils/analysis.py
--------------------------------------------------------------------------------
/utils/finetune_helpers.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/utils/finetune_helpers.py
--------------------------------------------------------------------------------
/utils/misc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/utils/misc.py
--------------------------------------------------------------------------------
/utils/model_training_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/utils/model_training_utils.py
--------------------------------------------------------------------------------
/utils/optimizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/utils/optimizer.py
--------------------------------------------------------------------------------
/utils/preprocess.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/utils/preprocess.py
--------------------------------------------------------------------------------
/vocabs/bert-base-cased-vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/vocabs/bert-base-cased-vocab.txt
--------------------------------------------------------------------------------
/vocabs/bert-base-multilingual-cased-vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/vocabs/bert-base-multilingual-cased-vocab.txt
--------------------------------------------------------------------------------
/vocabs/bert-base-multilingual-uncased-vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/vocabs/bert-base-multilingual-uncased-vocab.txt
--------------------------------------------------------------------------------
/vocabs/bert-base-uncased-vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/vocabs/bert-base-uncased-vocab.txt
--------------------------------------------------------------------------------
/vocabs/bert-large-cased-vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/vocabs/bert-large-cased-vocab.txt
--------------------------------------------------------------------------------
/vocabs/bert-large-cased-whole-word-masking-vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/vocabs/bert-large-cased-whole-word-masking-vocab.txt
--------------------------------------------------------------------------------
/vocabs/bert-large-uncased-vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/vocabs/bert-large-uncased-vocab.txt
--------------------------------------------------------------------------------
/vocabs/bert-large-uncased-whole-word-masking-vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/vocabs/bert-large-uncased-whole-word-masking-vocab.txt
--------------------------------------------------------------------------------
/vocabs/bert-multi-cased-vocab.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/HEAD/vocabs/bert-multi-cased-vocab.txt
--------------------------------------------------------------------------------