├── .gitignore
├── LICENSE
├── README.md
├── analysis
    ├── count_emojis.py
    ├── count_keywords.py
    └── uk_us_ratio.py
├── config
    ├── data.en.config
    ├── data.fr.config
    ├── data.ja.config
    ├── en_reddit_anonymized.yaml
    ├── example_config.yaml
    ├── fr_reddit_anonymized.yaml
    └── ja_reddit_anonymized.yaml
├── recipes
    ├── README.md
    ├── config.en-fr.eval.yaml
    ├── config.en-fr.tune.yaml
    ├── config.en-fr.yaml
    ├── config.en-ja.eval.yaml
    ├── config.en-ja.tune.yaml
    ├── config.en-ja.yaml
    ├── config.fr-en.eval.yaml
    ├── config.fr-en.tune.yaml
    ├── config.fr-en.yaml
    ├── config.ja-en.eval.yaml
    ├── config.ja-en.tune.yaml
    ├── config.ja-en.yaml
    └── config.wmt15.en-fr.1M.eval.yaml
├── resources
    ├── informal_pronouns.ja
    ├── profanities.en
    ├── profanities.fr
    ├── profanities.ja
    └── pronouns.ja
├── scripts
    ├── bleu_ja.sh
    ├── build_dic.py
    ├── download_en.sh
    ├── download_fr.sh
    ├── download_ja.sh
    ├── eval_kenlm.py
    ├── paired_bootstrap_resampling.py
    ├── post_processing.sh
    ├── prepare_fr-en.sh
    ├── prepare_model.sh
    ├── print_stats.py
    ├── remove-outliers.py
    ├── remove-tabs.py
    ├── start_scraper.sh
    ├── tokenize_sentencepiece.py
    ├── train_ngram_lm.sh
    └── train_sentencepiece.py
└── src
    ├── __init__.py
    ├── noise.py
    ├── normalize_punctuation.py
    ├── run_scraper.py
    ├── scraper.py
    ├── text.py
    └── util.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/.gitignore
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/README.md
--------------------------------------------------------------------------------
/analysis/count_emojis.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/analysis/count_emojis.py
--------------------------------------------------------------------------------
/analysis/count_keywords.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/analysis/count_keywords.py
--------------------------------------------------------------------------------
/analysis/uk_us_ratio.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/analysis/uk_us_ratio.py
--------------------------------------------------------------------------------
/config/data.en.config:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/config/data.en.config
--------------------------------------------------------------------------------
/config/data.fr.config:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/config/data.fr.config
--------------------------------------------------------------------------------
/config/data.ja.config:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/config/data.ja.config
--------------------------------------------------------------------------------
/config/en_reddit_anonymized.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/config/en_reddit_anonymized.yaml
--------------------------------------------------------------------------------
/config/example_config.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/config/example_config.yaml
--------------------------------------------------------------------------------
/config/fr_reddit_anonymized.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/config/fr_reddit_anonymized.yaml
--------------------------------------------------------------------------------
/config/ja_reddit_anonymized.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/config/ja_reddit_anonymized.yaml
--------------------------------------------------------------------------------
/recipes/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/README.md
--------------------------------------------------------------------------------
/recipes/config.en-fr.eval.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.en-fr.eval.yaml
--------------------------------------------------------------------------------
/recipes/config.en-fr.tune.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.en-fr.tune.yaml
--------------------------------------------------------------------------------
/recipes/config.en-fr.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.en-fr.yaml
--------------------------------------------------------------------------------
/recipes/config.en-ja.eval.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.en-ja.eval.yaml
--------------------------------------------------------------------------------
/recipes/config.en-ja.tune.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.en-ja.tune.yaml
--------------------------------------------------------------------------------
/recipes/config.en-ja.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.en-ja.yaml
--------------------------------------------------------------------------------
/recipes/config.fr-en.eval.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.fr-en.eval.yaml
--------------------------------------------------------------------------------
/recipes/config.fr-en.tune.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.fr-en.tune.yaml
--------------------------------------------------------------------------------
/recipes/config.fr-en.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.fr-en.yaml
--------------------------------------------------------------------------------
/recipes/config.ja-en.eval.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.ja-en.eval.yaml
--------------------------------------------------------------------------------
/recipes/config.ja-en.tune.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.ja-en.tune.yaml
--------------------------------------------------------------------------------
/recipes/config.ja-en.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.ja-en.yaml
--------------------------------------------------------------------------------
/recipes/config.wmt15.en-fr.1M.eval.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/recipes/config.wmt15.en-fr.1M.eval.yaml
--------------------------------------------------------------------------------
/resources/informal_pronouns.ja:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/resources/informal_pronouns.ja
--------------------------------------------------------------------------------
/resources/profanities.en:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/resources/profanities.en
--------------------------------------------------------------------------------
/resources/profanities.fr:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/resources/profanities.fr
--------------------------------------------------------------------------------
/resources/profanities.ja:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/resources/profanities.ja
--------------------------------------------------------------------------------
/resources/pronouns.ja:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/resources/pronouns.ja
--------------------------------------------------------------------------------
/scripts/bleu_ja.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/bleu_ja.sh
--------------------------------------------------------------------------------
/scripts/build_dic.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/build_dic.py
--------------------------------------------------------------------------------
/scripts/download_en.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/download_en.sh
--------------------------------------------------------------------------------
/scripts/download_fr.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/download_fr.sh
--------------------------------------------------------------------------------
/scripts/download_ja.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/download_ja.sh
--------------------------------------------------------------------------------
/scripts/eval_kenlm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/eval_kenlm.py
--------------------------------------------------------------------------------
/scripts/paired_bootstrap_resampling.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/paired_bootstrap_resampling.py
--------------------------------------------------------------------------------
/scripts/post_processing.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/post_processing.sh
--------------------------------------------------------------------------------
/scripts/prepare_fr-en.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/prepare_fr-en.sh
--------------------------------------------------------------------------------
/scripts/prepare_model.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/prepare_model.sh
--------------------------------------------------------------------------------
/scripts/print_stats.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/print_stats.py
--------------------------------------------------------------------------------
/scripts/remove-outliers.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/remove-outliers.py
--------------------------------------------------------------------------------
/scripts/remove-tabs.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/remove-tabs.py
--------------------------------------------------------------------------------
/scripts/start_scraper.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/start_scraper.sh
--------------------------------------------------------------------------------
/scripts/tokenize_sentencepiece.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/tokenize_sentencepiece.py
--------------------------------------------------------------------------------
/scripts/train_ngram_lm.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/train_ngram_lm.sh
--------------------------------------------------------------------------------
/scripts/train_sentencepiece.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/scripts/train_sentencepiece.py
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/src/noise.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/src/noise.py
--------------------------------------------------------------------------------
/src/normalize_punctuation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/src/normalize_punctuation.py
--------------------------------------------------------------------------------
/src/run_scraper.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/src/run_scraper.py
--------------------------------------------------------------------------------
/src/scraper.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/src/scraper.py
--------------------------------------------------------------------------------
/src/text.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/src/text.py
--------------------------------------------------------------------------------
/src/util.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pmichel31415/mtnt/HEAD/src/util.py
--------------------------------------------------------------------------------