├── .gitignore
├── ANCE
    ├── data.py
    ├── gen_embeddings.py
    ├── gen_embeds_clueweb.sh
    ├── gen_embeds_webqa.sh
    ├── msmarco_eval.py
    ├── multi_model.py
    ├── retrieval.py
    ├── retrieval_clueweb.sh
    ├── retrieval_webqa.sh
    ├── train.py
    ├── train_clueweb.sh
    ├── train_webqa.sh
    ├── utils.py
    └── visual.py
├── ClueWeb22-MM
    ├── pretrain
    │   ├── clip_filter.py
    │   ├── filter_similarity.py
    │   └── merge_split_datasets.py
    ├── readme.md
    └── retrieval_benchmark
    │   ├── construct_new_clueweb_data.py
    │   ├── filter_by_ance.py
    │   ├── first_filter.py
    │   ├── gen_anchor.py
    │   ├── get_collection_no_duplicate.py
    │   ├── get_qrel.py
    │   ├── get_raw_data.py
    │   ├── merge_raw_data.py
    │   ├── new_image.py
    │   ├── remove_datasets_duplicate.py
    │   ├── remove_error_image.py
    │   ├── sample_one_label.py
    │   ├── save_top_data.py
    │   ├── second_filter.py
    │   └── split_newdata.py
├── DPR
    ├── data.py
    ├── gen_embeddings.py
    ├── gen_embeds_clueweb.sh
    ├── gen_embeds_webqa.sh
    ├── get_hard_negs_all.py
    ├── get_hn_clueweb.sh
    ├── get_hn_webqa.sh
    ├── msmarco_eval.py
    ├── multi_model.py
    ├── retrieval.py
    ├── retrieval_clueweb.sh
    ├── retrieval_webqa.sh
    ├── train.py
    ├── train_clueweb.sh
    ├── train_webqa.sh
    ├── utils.py
    └── visual.py
├── LICENSE
├── README.md
├── data
    └── README.md
├── image
    └── marvel.gif
├── pretrain
    ├── data.py
    ├── multi_model.py
    ├── train.py
    ├── train.sh
    └── utils.py
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
--------------------------------------------------------------------------------
/ANCE/data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/data.py
--------------------------------------------------------------------------------
/ANCE/gen_embeddings.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/gen_embeddings.py
--------------------------------------------------------------------------------
/ANCE/gen_embeds_clueweb.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/gen_embeds_clueweb.sh
--------------------------------------------------------------------------------
/ANCE/gen_embeds_webqa.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/gen_embeds_webqa.sh
--------------------------------------------------------------------------------
/ANCE/msmarco_eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/msmarco_eval.py
--------------------------------------------------------------------------------
/ANCE/multi_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/multi_model.py
--------------------------------------------------------------------------------
/ANCE/retrieval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/retrieval.py
--------------------------------------------------------------------------------
/ANCE/retrieval_clueweb.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/retrieval_clueweb.sh
--------------------------------------------------------------------------------
/ANCE/retrieval_webqa.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/retrieval_webqa.sh
--------------------------------------------------------------------------------
/ANCE/train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/train.py
--------------------------------------------------------------------------------
/ANCE/train_clueweb.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/train_clueweb.sh
--------------------------------------------------------------------------------
/ANCE/train_webqa.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/train_webqa.sh
--------------------------------------------------------------------------------
/ANCE/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/utils.py
--------------------------------------------------------------------------------
/ANCE/visual.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ANCE/visual.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/pretrain/clip_filter.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/pretrain/clip_filter.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/pretrain/filter_similarity.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/pretrain/filter_similarity.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/pretrain/merge_split_datasets.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/pretrain/merge_split_datasets.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/readme.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/readme.md
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/construct_new_clueweb_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/construct_new_clueweb_data.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/filter_by_ance.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/filter_by_ance.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/first_filter.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/first_filter.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/gen_anchor.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/gen_anchor.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/get_collection_no_duplicate.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/get_collection_no_duplicate.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/get_qrel.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/get_qrel.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/get_raw_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/get_raw_data.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/merge_raw_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/merge_raw_data.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/new_image.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/new_image.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/remove_datasets_duplicate.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/remove_datasets_duplicate.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/remove_error_image.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/remove_error_image.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/sample_one_label.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/sample_one_label.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/save_top_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/save_top_data.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/second_filter.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/second_filter.py
--------------------------------------------------------------------------------
/ClueWeb22-MM/retrieval_benchmark/split_newdata.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/ClueWeb22-MM/retrieval_benchmark/split_newdata.py
--------------------------------------------------------------------------------
/DPR/data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/data.py
--------------------------------------------------------------------------------
/DPR/gen_embeddings.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/gen_embeddings.py
--------------------------------------------------------------------------------
/DPR/gen_embeds_clueweb.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/gen_embeds_clueweb.sh
--------------------------------------------------------------------------------
/DPR/gen_embeds_webqa.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/gen_embeds_webqa.sh
--------------------------------------------------------------------------------
/DPR/get_hard_negs_all.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/get_hard_negs_all.py
--------------------------------------------------------------------------------
/DPR/get_hn_clueweb.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/get_hn_clueweb.sh
--------------------------------------------------------------------------------
/DPR/get_hn_webqa.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/get_hn_webqa.sh
--------------------------------------------------------------------------------
/DPR/msmarco_eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/msmarco_eval.py
--------------------------------------------------------------------------------
/DPR/multi_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/multi_model.py
--------------------------------------------------------------------------------
/DPR/retrieval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/retrieval.py
--------------------------------------------------------------------------------
/DPR/retrieval_clueweb.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/retrieval_clueweb.sh
--------------------------------------------------------------------------------
/DPR/retrieval_webqa.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/retrieval_webqa.sh
--------------------------------------------------------------------------------
/DPR/train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/train.py
--------------------------------------------------------------------------------
/DPR/train_clueweb.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/train_clueweb.sh
--------------------------------------------------------------------------------
/DPR/train_webqa.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/train_webqa.sh
--------------------------------------------------------------------------------
/DPR/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/utils.py
--------------------------------------------------------------------------------
/DPR/visual.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/DPR/visual.py
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/README.md
--------------------------------------------------------------------------------
/data/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/data/README.md
--------------------------------------------------------------------------------
/image/marvel.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/image/marvel.gif
--------------------------------------------------------------------------------
/pretrain/data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/pretrain/data.py
--------------------------------------------------------------------------------
/pretrain/multi_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/pretrain/multi_model.py
--------------------------------------------------------------------------------
/pretrain/train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/pretrain/train.py
--------------------------------------------------------------------------------
/pretrain/train.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/pretrain/train.sh
--------------------------------------------------------------------------------
/pretrain/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/pretrain/utils.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenMatch/MARVEL/HEAD/requirements.txt
--------------------------------------------------------------------------------