├── .gitignore
├── README.md
├── check_sentences.py
├── check_sentences_run.sh
├── corpus_counts.md
├── dict.txt
├── fix_tokenizer.py
├── preprocess.sh
├── preprocess_bigfile.sh
├── save_to_huggingface.py
├── sentencepiece_encoder.py
├── sentencepiece_trainer.py
├── shard_txt_file.sh
├── singularity_pytorch_bart.def
├── test_bart_checkpoint.py
├── tokenizers_encoder.py
├── tokenizers_encoder_run.sh
├── tokenizers_trainer.py
├── train_bart.sh
└── train_bart_args.sh

/.gitignore:
--------------------------------------------------------------------------------
/data
/config
/checkpoints
/fairseq
/apex
/multirun
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/README.md
--------------------------------------------------------------------------------
/check_sentences.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/check_sentences.py
--------------------------------------------------------------------------------
/check_sentences_run.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/check_sentences_run.sh
--------------------------------------------------------------------------------
/corpus_counts.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/corpus_counts.md
--------------------------------------------------------------------------------
/dict.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/dict.txt
--------------------------------------------------------------------------------
/fix_tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/fix_tokenizer.py
--------------------------------------------------------------------------------
/preprocess.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/preprocess.sh
--------------------------------------------------------------------------------
/preprocess_bigfile.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/preprocess_bigfile.sh
--------------------------------------------------------------------------------
/save_to_huggingface.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/save_to_huggingface.py
--------------------------------------------------------------------------------
/sentencepiece_encoder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/sentencepiece_encoder.py
--------------------------------------------------------------------------------
/sentencepiece_trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/sentencepiece_trainer.py
--------------------------------------------------------------------------------
/shard_txt_file.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/shard_txt_file.sh
--------------------------------------------------------------------------------
/singularity_pytorch_bart.def:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/singularity_pytorch_bart.def
--------------------------------------------------------------------------------
/test_bart_checkpoint.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/test_bart_checkpoint.py
--------------------------------------------------------------------------------
/tokenizers_encoder.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/tokenizers_encoder.py
--------------------------------------------------------------------------------
/tokenizers_encoder_run.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/tokenizers_encoder_run.sh
--------------------------------------------------------------------------------
/tokenizers_trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/tokenizers_trainer.py
--------------------------------------------------------------------------------
/train_bart.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/train_bart.sh
--------------------------------------------------------------------------------
/train_bart_args.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kb-labb/kb_bart/HEAD/train_bart_args.sh
--------------------------------------------------------------------------------