├── .gitignore ├── Images └── ucreat_pipeline.png ├── Models ├── BM25 │ ├── .gitignore │ ├── README.md │ ├── config_files │ │ ├── README.md │ │ ├── configs_coliee21 │ │ │ ├── config_1.json │ │ │ ├── config_10.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ ├── config_5.json │ │ │ ├── config_6.json │ │ │ ├── config_7.json │ │ │ ├── config_8.json │ │ │ └── config_9.json │ │ ├── configs_coliee21_test_RR │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_coliee21_test_atomic │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_coliee21_test_events │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_coliee21_test_iouf │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_coliee21_train_RR │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_coliee21_train_atomic │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_coliee21_train_events │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_coliee21_train_iouf │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_ik_test │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_ik_test_RR │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_ik_test_atomic │ │ │ ├── config_1.json 
│ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_ik_test_events │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_ik_test_iouf │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_ik_train │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_ik_train_RR │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_ik_train_atomic │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_ik_train_events │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── configs_ik_train_iouf │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ ├── flip_configs.py │ │ ├── replicate_configs.py │ │ └── sentence_removed │ │ │ ├── configs_ik_test │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ │ ├── configs_ik_test_RR │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ │ ├── configs_ik_test_atomic │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ │ ├── configs_ik_test_events │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ │ ├── configs_ik_test_iouf │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ │ ├── configs_ik_train 
│ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ │ ├── configs_ik_train_RR │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ │ ├── configs_ik_train_atomic │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ │ ├── configs_ik_train_events │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ │ │ └── configs_ik_train_iouf │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ ├── config_4.json │ │ │ └── config_5.json │ ├── data │ │ ├── .gitkeep │ │ ├── README.md │ │ ├── auxiliary_scripts │ │ │ ├── convert_labels_file.ipynb │ │ │ ├── make_RR_corpus.ipynb │ │ │ ├── make_events_corpus.ipynb │ │ │ └── remove_citation_sentences.py │ │ ├── corpus │ │ │ ├── .gitkeep │ │ │ ├── COLIEE2021 │ │ │ │ ├── .gitkeep │ │ │ │ ├── test │ │ │ │ │ ├── .gitkeep │ │ │ │ │ ├── candidate │ │ │ │ │ │ └── .gitkeep │ │ │ │ │ └── query │ │ │ │ │ │ └── .gitkeep │ │ │ │ └── train │ │ │ │ │ ├── .gitkeep │ │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── COLIEE2021_RR_test │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── COLIEE2021_RR_train │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── COLIEE2021_test_atomic │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── COLIEE2021_test_events │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── COLIEE2021_test_iou_filtered │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── 
COLIEE2021_train_atomic │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── COLIEE2021_train_events │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── COLIEE2021_train_iou_filtered │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── ik_test │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── ik_test_RR │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── ik_test_atomic │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── ik_test_events │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── ik_test_iouf │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── ik_train │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── ik_train_RR │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── ik_train_atomic │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── ik_train_events │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ ├── ik_train_iouf │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ └── sentence_removed │ │ │ │ ├── .gitkeep │ │ │ │ ├── ik_test │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ │ ├── ik_test_RR │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query 
│ │ │ │ │ └── .gitkeep │ │ │ │ ├── ik_test_atomic │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ │ ├── ik_test_events │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ │ ├── ik_test_iouf │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ │ ├── ik_train │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ │ ├── ik_train_RR │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ │ ├── ik_train_atomic │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ │ ├── ik_train_events │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ │ └── .gitkeep │ │ │ │ └── ik_train_iouf │ │ │ │ ├── .gitkeep │ │ │ │ ├── candidate │ │ │ │ └── .gitkeep │ │ │ │ └── query │ │ │ │ └── .gitkeep │ │ └── segment_dictionaries │ │ │ ├── .gitkeep │ │ │ ├── RR │ │ │ ├── .gitkeep │ │ │ └── Output_Data │ │ │ │ ├── .gitkeep │ │ │ │ └── RR_Output_Data │ │ │ │ └── .gitkeep │ │ │ ├── coliee21_processed │ │ │ ├── .gitkeep │ │ │ └── coliee21 │ │ │ │ ├── .gitkeep │ │ │ │ ├── test │ │ │ │ └── .gitkeep │ │ │ │ └── train │ │ │ │ └── .gitkeep │ │ │ ├── ilpcr_processed │ │ │ ├── .gitkeep │ │ │ └── ilpcr │ │ │ │ ├── .gitkeep │ │ │ │ ├── test │ │ │ │ └── .gitkeep │ │ │ │ └── train │ │ │ │ └── .gitkeep │ │ │ └── ilpcr_woc_processed │ │ │ ├── .gitkeep │ │ │ └── ilpcr_woc │ │ │ ├── .gitkeep │ │ │ ├── test │ │ │ └── .gitkeep │ │ │ └── train │ │ │ └── .gitkeep │ ├── evaluate_at_K.py │ ├── exp_results │ │ ├── .gitkeep │ │ ├── Images │ │ │ └── ILPCR_results.png │ │ └── README.md │ ├── get_exp_results.py │ ├── requirements.txt │ ├── run_script.ipynb │ ├── run_script.py │ └── spawn_processes.sh ├── Events-Extraction │ ├── 
Events_Extraction │ │ ├── extract_events.ipynb │ │ ├── extract_events.py │ │ └── input_details.json │ ├── IOU │ │ ├── Event_IOU_Sim.ipynb │ │ ├── Event_IOU_Sim.py │ │ ├── Event_IOU_Sim_input_details.json │ │ ├── IOU_filtered_input_details.json │ │ ├── IOU_filtered_text_dict.ipynb │ │ └── IOU_filtered_text_dict.py │ ├── README.md │ ├── RR │ │ └── Note.txt │ ├── SBERT │ │ ├── SBERT_Sim.ipynb │ │ ├── SBERT_Sim.py │ │ └── SBERT_input_details.json │ └── data │ │ └── Note.txt └── Transformer-Embeddings │ ├── .gitignore │ ├── README.md │ ├── config_files │ ├── configs_COLIEE │ │ ├── InCaseLawBERT │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── InLegalBERT │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── bert │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── bert_finetuned │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── distilbert │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ └── distilbert_finetuned │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ ├── configs_ILPCR │ │ ├── InCaseLawBERT │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── InLegalBERT │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── bert │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── bert_finetuned │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── distilbert │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ └── distilbert_finetuned │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json 
│ ├── configs_ILPCR_citation_removed │ │ ├── InCaseLawBERT │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── InLegalBERT │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── bert │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── bert_finetuned │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ ├── distilbert │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ │ └── distilbert_finetuned │ │ │ ├── config_1.json │ │ │ ├── config_2.json │ │ │ ├── config_3.json │ │ │ └── config_4.json │ └── replicate_configs.py │ ├── data │ ├── .gitkeep │ ├── README.md │ └── corpus │ │ ├── .gitkeep │ │ ├── COLIEE2021 │ │ ├── .gitkeep │ │ ├── test │ │ │ ├── .gitkeep │ │ │ ├── candidate │ │ │ │ └── .gitkeep │ │ │ └── query │ │ │ │ └── .gitkeep │ │ └── train │ │ │ ├── .gitkeep │ │ │ ├── candidate │ │ │ └── .gitkeep │ │ │ └── query │ │ │ └── .gitkeep │ │ ├── ik_test │ │ ├── .gitkeep │ │ ├── candidate │ │ │ └── .gitkeep │ │ └── query │ │ │ └── .gitkeep │ │ └── ik_train │ │ ├── .gitkeep │ │ ├── candidate │ │ └── .gitkeep │ │ └── query │ │ └── .gitkeep │ ├── evaluate_at_K.py │ ├── exp_results │ ├── Images │ │ └── ILPCR_results.png │ └── README.md │ ├── get_exp_results.py │ ├── requirements.txt │ ├── spawn_transformers.sh │ ├── transformer_score.ipynb │ └── transformer_score.py └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | 
share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | .DS_Store 131 | -------------------------------------------------------------------------------- /Images/ucreat_pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Images/ucreat_pipeline.png -------------------------------------------------------------------------------- /Models/BM25/.gitignore: -------------------------------------------------------------------------------- 1 | # large/temporary folders 2 | exp_results/* 3 | __pycache__/ 4 | logs/ 5 | plots/ 6 | 7 | # segment dictionary for corpuses 8 | data/corpus/*/segment_dictionary.sav 9 | 10 | data/* 11 | 12 | !data/README.md 13 | !data/auxiliary_scripts 14 | !exp_results/Images/ 15 | !exp_results/README.md 16 | -------------------------------------------------------------------------------- /Models/BM25/config_files/README.md: -------------------------------------------------------------------------------- 1 | ## Directory Overview 2 | 3 | This folder contains several sample input config files for `run_script.py`. Config files are included for the COLIEE21, ILPCR and citation-sentence removed ILPCR corpus. 
The suffixes `events`, `atomic`, `iouf` and `RR` indicate config files for the experiments `non-atomic events`, `atomic events`, `events filtered docs` and `RR filtered docs` in the paper, respectively. 4 | Each config file must contain the following entries for the run script to complete the experiment: 5 | 1. `path_prior_cases` : Path to the folder containing prior/candidate cases. The cases must be present as `.txt` files inside the folder. 6 | 2. `path_current_cases` : Path to the folder containing current/query cases. The cases must be present as `.txt` files inside the folder. 7 | 3. `true_labels_json` : Path to the label.json file containing the ground truth labels, i.e. the candidate cases considered relevant for each query case. This is required to compute F1@K and related evaluation metrics. The general format for this JSON file is 8 | ```json 9 | { 10 | "Query Set": [ // contains a separate dictionary for each query case 11 | { 12 | "id" : , 13 | "query_name": 14 | "relevant candidates": [ // contains ids of relevant cases 15 | ... 16 | ] 17 | }, ... 18 | ], 19 | "Candidate Set": [ 20 | { 21 | "id": 22 | }, ... 23 | ] 24 | } 25 | ``` 26 | See the label.json file included with the dataset for a concrete example. 27 | 28 | 4. `n_gram` : The n-gram size for which to run BM25 (1 for unigram, 2 for bigram, etc.). 29 | 30 | ## Auxiliary scripts 31 | 1. `replicate_configs.py` : Each config directory contains information for experiments conducted on the same corpus but with different `n_gram` values. Therefore, it is useful to take a single config file and programmatically create the remaining config files by changing the `n_gram` values. Example usage is 32 | ```bash 33 | python3 ./replicate_configs.py path/to/config_folder 34 | ``` 35 | 36 | 2. `flip_configs.py` : Our test and train corpus names differ only in the test/train identifier, e.g. `ik_test_iouf`, `ik_train_iouf`.
Hence, a config directory for the test corpus can be converted to the config directory for the train corpus. Example usage is 37 | ```bash 38 | python3 ./flip_configs.py path/to/config_folder 39 | ``` 40 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21/config_10.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram" : 5 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram" : 2 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json" : 
"./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram" : 3 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram" : 4 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram" : 5 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21/config_6.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21/config_7.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram" : 2 6 | } 7 | -------------------------------------------------------------------------------- 
/Models/BM25/config_files/configs_coliee21/config_8.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram" : 3 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21/config_9.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram" : 4 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_RR/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021_RR_test/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021_RR_test/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_RR/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_RR_test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_RR_test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_RR/config_3.json: -------------------------------------------------------------------------------- 1 | 
{ 2 | "path_prior_cases": "./data/corpus/COLIEE2021_RR_test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_RR_test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_RR/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_RR_test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_RR_test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_RR/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_RR_test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_RR_test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_atomic/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021_test_atomic/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021_test_atomic/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_atomic/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_atomic/candidate", 3 | "path_current_cases": 
"./data/corpus/COLIEE2021_test_atomic/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_atomic/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_atomic/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_atomic/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_atomic/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_atomic/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_atomic/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_atomic/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_atomic/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_atomic/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_events/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021_test_events/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021_test_events/query", 4 | "true_labels_json" : 
"./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_events/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_events/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_events/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_events/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_events/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_events/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_events/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_events/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_events/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_events/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_events/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_events/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 5 6 | } 
-------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_iouf/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/COLIEE2021_test_iou_filtered/candidate", 3 | "path_current_cases" : "./data/corpus/COLIEE2021_test_iou_filtered/query", 4 | "true_labels_json" : "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_iouf/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_iou_filtered/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_iou_filtered/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_iouf/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_iou_filtered/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_iou_filtered/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_iouf/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_iou_filtered/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_iou_filtered/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 4 6 | } 
-------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_test_iouf/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_test_iou_filtered/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_test_iou_filtered/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_RR/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_RR_train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_RR_train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 1 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_RR/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_RR_train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_RR_train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_RR/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_RR_train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_RR_train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- 
/Models/BM25/config_files/configs_coliee21_train_RR/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_RR_train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_RR_train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_RR/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_RR_train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_RR_train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_atomic/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_atomic/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 1 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_atomic/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_atomic/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_atomic/config_3.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_atomic/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_atomic/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_atomic/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_atomic/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_atomic/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_events/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_events/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_events/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 1 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_events/config_2.json: -------------------------------------------------------------------------------- 1 | { 
2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_events/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_events/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_events/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_events/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_events/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_events/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_events/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_events/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_events/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_events/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_events/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_iouf/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_iou_filtered/candidate", 3 | 
"path_current_cases": "./data/corpus/COLIEE2021_train_iou_filtered/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 1 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_iouf/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_iou_filtered/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_iou_filtered/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_iouf/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_iou_filtered/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_iou_filtered/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_iouf/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_iou_filtered/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021_train_iou_filtered/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_coliee21_train_iouf/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021_train_iou_filtered/candidate", 3 | "path_current_cases": 
"./data/corpus/COLIEE2021_train_iou_filtered/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_test/candidate", 3 | "path_current_cases" : "./data/corpus/ik_test/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_test/candidate", 3 | "path_current_cases" : "./data/corpus/ik_test/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 2 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_test/candidate", 3 | "path_current_cases" : "./data/corpus/ik_test/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 3 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_test/candidate", 3 | "path_current_cases" : "./data/corpus/ik_test/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 4 6 | } 7 | -------------------------------------------------------------------------------- 
/Models/BM25/config_files/configs_ik_test/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_test/candidate", 3 | "path_current_cases" : "./data/corpus/ik_test/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 5 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_RR/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_test_RR/candidate", 3 | "path_current_cases" : "./data/corpus/ik_test_RR/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_RR/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_RR/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_RR/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_RR/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_RR/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_RR/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_RR/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_RR/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_RR/query", 
4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_RR/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_RR/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_RR/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_atomic/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_test_atomic/candidate", 3 | "path_current_cases" : "./data/corpus/ik_test_atomic/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_atomic/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_atomic/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_atomic/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_atomic/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_atomic/config_4.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_atomic/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_atomic/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_atomic/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_events/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_current_cases" : "./data/corpus/ik_test_events/query", 3 | "path_prior_cases" : "./data/corpus/ik_test_events/candidate", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_events/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_current_cases": "./data/corpus/ik_test_events/query", 3 | "path_prior_cases": "./data/corpus/ik_test_events/candidate", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_events/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_current_cases": "./data/corpus/ik_test_events/query", 3 | "path_prior_cases": "./data/corpus/ik_test_events/candidate", 4 
| "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_events/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_current_cases": "./data/corpus/ik_test_events/query", 3 | "path_prior_cases": "./data/corpus/ik_test_events/candidate", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_events/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_current_cases": "./data/corpus/ik_test_events/query", 3 | "path_prior_cases": "./data/corpus/ik_test_events/candidate", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_iouf/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_test_iouf/candidate", 3 | "path_current_cases" : "./data/corpus/ik_test_iouf/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_iouf/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_iouf/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_iouf/config_3.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_iouf/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_iouf/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_iouf/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_test_iouf/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test_iouf/candidate", 3 | "path_current_cases": "./data/corpus/ik_test_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_train/candidate", 3 | "path_current_cases" : "./data/corpus/ik_train/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_train/candidate", 3 | "path_current_cases" : "./data/corpus/ik_train/query", 4 | "true_labels_json" : 
"./data/corpus/ik_train/train.json", 5 | "n_gram" : 2 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_train/candidate", 3 | "path_current_cases" : "./data/corpus/ik_train/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 3 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_train/candidate", 3 | "path_current_cases" : "./data/corpus/ik_train/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 4 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_train/candidate", 3 | "path_current_cases" : "./data/corpus/ik_train/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 5 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_RR/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_RR/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_RR/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 1 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_RR/config_2.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_RR/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_RR/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_RR/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_RR/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_RR/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_RR/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_RR/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_RR/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_RR/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_RR/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_RR/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_atomic/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_train_atomic/candidate", 3 | "path_current_cases" : "./data/corpus/ik_train_atomic/query", 4 | "true_labels_json" : 
"./data/corpus/ik_train/train.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_atomic/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_atomic/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_atomic/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_atomic/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- 
/Models/BM25/config_files/configs_ik_train_events/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/ik_train_events/candidate", 3 | "path_current_cases" : "./data/corpus/ik_train_events/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_events/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_events/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_events/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_events/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_events/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_events/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_events/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_events/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_events/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_events/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": 
"./data/corpus/ik_train_events/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_events/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_iouf/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_iouf/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 1 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_iouf/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_iouf/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_iouf/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_iouf/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_iouf/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_iouf/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 4 6 | } 
-------------------------------------------------------------------------------- /Models/BM25/config_files/configs_ik_train_iouf/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train_iouf/candidate", 3 | "path_current_cases": "./data/corpus/ik_train_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/flip_configs.py: -------------------------------------------------------------------------------- 1 | import os, sys, json 2 | 3 | if __name__ == '__main__': 4 | assert(len(sys.argv) > 1) 5 | folder_name = sys.argv[1] 6 | assert 'test' in folder_name.lower() and 'train' not in folder_name.lower(), f"Name : {folder_name} is wrong. Must contain just the keyword \"test\"" 7 | 8 | # make the train folder 9 | train_folder = folder_name.replace('test', 'train') 10 | os.makedirs(train_folder, exist_ok=False) 11 | 12 | # for each config in test folder convert to train config and save 13 | for config_path in os.listdir(folder_name): 14 | with open(folder_name + f'/{config_path}', 'r') as f: 15 | config = json.load(f) 16 | 17 | for key, value in config.items(): 18 | if type(value) == str: 19 | config[key] = value.replace('test', 'train') 20 | 21 | out_path = train_folder + f'/{config_path}' 22 | with open(out_path, 'w+') as f: 23 | json.dump(config, f, indent = 4) 24 | -------------------------------------------------------------------------------- /Models/BM25/config_files/replicate_configs.py: -------------------------------------------------------------------------------- 1 | import os, sys, json 2 | 3 | if __name__ == '__main__': 4 | assert(len(sys.argv) > 1) 5 | folder_name = sys.argv[1] 6 | assert(len(os.listdir(folder_name)) == 1) 7 | with open(f'./{folder_name}/config_1.json', 'r') as f: 8 | config = json.load(f) 9 | 
assert(config["n_gram"] == 1) 10 | for i in range(2,6): 11 | temp_config = config 12 | temp_config["n_gram"] = i 13 | with open(f'./{folder_name}/config_{i}.json', 'w') as f: 14 | json.dump(temp_config, f, indent = 4) 15 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 2 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 3 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_test/candidate", 3 | 
"path_current_cases" : "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 4 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 5 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_RR/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_test_RR/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_test_RR/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_RR/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_RR/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_RR/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_RR/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_RR/candidate", 3 | "path_current_cases": 
"./data/corpus/sentence_removed/ik_test_RR/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_RR/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_RR/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_RR/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_RR/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_RR/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_RR/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_atomic/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_test_atomic/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_test_atomic/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_atomic/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_atomic/candidate", 3 | "path_current_cases": 
"./data/corpus/sentence_removed/ik_test_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_atomic/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_atomic/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_atomic/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_atomic/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_atomic/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_atomic/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_events/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_test_events/candidate", 3 | "path_current_cases" : 
"./data/corpus/sentence_removed/ik_test_events/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_events/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_events/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_events/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_events/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_events/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_events/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_events/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_events/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_events/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_events/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_events/candidate", 3 | "path_current_cases": 
"./data/corpus/sentence_removed/ik_test_events/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_iouf/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_test_iouf/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_test_iouf/query", 4 | "true_labels_json" : "./data/corpus/ik_test/test.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_iouf/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_iouf/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_iouf/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_iouf/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_iouf/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_iouf/candidate", 3 | "path_current_cases": 
"./data/corpus/sentence_removed/ik_test_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_test_iouf/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test_iouf/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 2 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_train/query", 4 | 
"true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 3 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 4 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 5 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_RR/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_RR/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_RR/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 1 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_RR/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_RR/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_RR/query", 4 | "true_labels_json": 
"./data/corpus/ik_train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_RR/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_RR/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_RR/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_RR/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_RR/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_RR/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_RR/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_RR/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_RR/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_atomic/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_train_atomic/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_train_atomic/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 
5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_atomic/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_atomic/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_atomic/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_atomic/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_atomic/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_atomic/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_atomic/query", 4 | "true_labels_json": 
"./data/corpus/ik_train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_events/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases" : "./data/corpus/sentence_removed/ik_train_events/candidate", 3 | "path_current_cases" : "./data/corpus/sentence_removed/ik_train_events/query", 4 | "true_labels_json" : "./data/corpus/ik_train/train.json", 5 | "n_gram" : 1 6 | } 7 | -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_events/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_events/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_events/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_events/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_events/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_events/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_events/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_events/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_events/query", 4 | 
"true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_events/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_events/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_events/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_iouf/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_iouf/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 1 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_iouf/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_iouf/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 2 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_iouf/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_iouf/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_iouf/query", 4 | "true_labels_json": 
"./data/corpus/ik_train/train.json", 5 | "n_gram": 3 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_iouf/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_iouf/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 4 6 | } -------------------------------------------------------------------------------- /Models/BM25/config_files/sentence_removed/configs_ik_train_iouf/config_5.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train_iouf/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train_iouf/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "n_gram": 5 6 | } -------------------------------------------------------------------------------- /Models/BM25/data/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/README.md: -------------------------------------------------------------------------------- 1 | ## Directory Overview 2 | 3 | This folder must contain the legal corpus in the `corpus/` subdirectory when running `run_script.py`. You can obtain the dataset via request, as mentioned in the project [README.md](../../../README.md). 4 | 5 | A valid corpus contains a `query/`, `candidate/` directory containing current and prior cases respectively. 
A `label.json` file is also required; it contains the ground truth labels used to evaluate F1 scores. 6 | 7 | Additionally, the `data` folder contains the following auxiliary scripts: 8 | 1. `remove_citation_sentences.py`: Takes a corpus in the standard dataset format (query/, candidate/, labels.json) and removes sentences containing the `` keyword (please see the paper for details). Used to create the citation-sentence-removed IL-PCR corpus. 9 | 2. `make_events_corpus.ipynb`: Creates the events, atomic and iouf corpora from the segment dictionaries in `segment_dictionaries`. A segment dictionary contains the corpus events obtained from the event extraction pipeline. Please see `` for details. Note for first-time users: the segment dictionaries must be produced by the event extraction pipeline, and the results must be present in the `segment_dictionaries` directory. 10 | 3. `make_RR_corpus.ipynb`: Creates the RR corpus from the segment dictionaries present in the `segment_dictionaries/RR` directory. 11 | 4. `convert_labels_file.ipynb`: Converts `.csv` ground truth files from the COLIEE dataset into the more convenient `.json` format that we use. 12 | -------------------------------------------------------------------------------- /Models/BM25/data/auxiliary_scripts/convert_labels_file.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "attachments": {}, 5 | "cell_type": "markdown", 6 | "metadata": {}, 7 | "source": [ 8 | "1. This file converts the ground truth label csv for the COLIEE2021/22 datasets into a lighter json format.\n", 9 | "2. The experiments for the paper U-CREAT work with the json labels file.\n", 10 | "3. If working with COLIEE datasets you must convert the .csv label files into the required .json format." 
11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 2, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "import json\n", 20 | "import pandas as pd, numpy as np\n", 21 | "import re, os, sys" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 5, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "label_file_path = '../corpus/ik_val/val.csv' \n", 31 | "\n", 32 | "query_dir_path = label_file_path[:label_file_path.rfind(\"/\")] + f'/query'\n", 33 | "candidate_dir_path = label_file_path[:label_file_path.rfind(\"/\")] + f'/candidate'\n", 34 | "save_path = label_file_path[:label_file_path.rfind(\".\")] + f'.json' # json file of the same name\n", 35 | "\n", 36 | "query_names = os.listdir(query_dir_path)\n", 37 | "candidate_names = os.listdir(candidate_dir_path)\n", 38 | "\n", 39 | "df = pd.read_csv(label_file_path)\n", 40 | "\n", 41 | "golden, golden_citations = {}, {}\n", 42 | "output_json = {\n", 43 | " \"Query Set\" : [], \n", 44 | " \"Candidate Set\" : [], \n", 45 | "}\n", 46 | "\n", 47 | "for i in range(df.shape[0]):\n", 48 | " query_case = df.iloc[i]['Query Case']\n", 49 | " query_case = query_case.zfill(14) # 6 for the coliee dataset\n", 50 | " assert query_case in query_names, f\"{query_case} not in query_names!\"\n", 51 | "\n", 52 | " candidate_cases = df.iloc[i]['Cited Cases']\n", 53 | " candidate_cases = re.findall(r'\\d+\\.txt', candidate_cases) # escape the dot so only a literal '.txt' suffix matches\n", 54 | "\n", 55 | " for candidate in candidate_cases:\n", 56 | " assert candidate in candidate_names, f\"{candidate} not in candidate_names!\"\n", 57 | "\n", 58 | " # golden[query_case] = len(candidate_cases)\n", 59 | " # golden_citations[query_case] = candidate_cases\n", 60 | " output_json[\"Query Set\"].append({\n", 61 | " \"id\" : query_case, \n", 62 | " \"query_name\" : query_case, \n", 63 | " \"relevant candidates\" : candidate_cases,\n", 64 | " })\n", 65 | "\n", 66 | "output_json[\"Candidate Set\"] = [{\"id\" : i} for i in 
candidate_names]\n",
    "with open(save_path, 'w+') as f:\n",
    "    json.dump(output_json, f, indent = 4)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

--------------------------------------------------------------------------------
/Models/BM25/data/auxiliary_scripts/make_RR_corpus.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d32c4370",
   "metadata": {},
   "source": [
    "## Get all rhetorical roles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "3aaef14e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os, sys, re\n",
    "from tqdm.notebook import tqdm_notebook"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "22bd10d7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Fact',\n",
       " 'Dissent',\n",
       " 'RatioOfTheDecision',\n",
       " 'Statute',\n",
       " 'Precedent',\n",
       " 'Argument',\n",
       " 'RulingByLowerCourt',\n",
       " 'RulingByPresentCourt']"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import json\n",
    "# Input RR predictions, output corpus folder and query file names; edit these for other splits or datasets\n",
    "path_RR_file = f\"../segment_dictionaries/RR/Output_Data/RR_Output_Data/sent_data_test_coliee21_candidate_rr_results.json\"\n",
    "out_folder_path = f\"../corpus/COLIEE2021_RR_test/\"\n",
    "query_numbers = os.listdir('../corpus/COLIEE2021/test/query/') # get query numbers\n",
    "\n",
    "query_numbers = [int(re.findall(r'\\d+', i)[0]) for i in query_numbers]\n",
    "\n",
    "with open(path_RR_file, 'r') as f:\n",
    "    content = json.load(f)\n",
    "\n",
    "roles = set()\n",
    "for i,val in content.items():\n",
    "    for sentence in val:\n",
    "        roles.add(sentence[1])\n",
    "roles = list(roles)\n",
    "roles"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "08e2b09b",
   "metadata": {},
   "source": [
    "### Make corpus with custom RR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61969068",
   "metadata": {},
   "outputs": [],
   "source": [
    "r_query = ['Fact', 'Argument', 'RatioOfTheDecision'] # roles selected for the query documents\n",
    "r_candidate = ['Fact', 'Argument', 'RatioOfTheDecision', 'RulingByPresentCourt'] # roles selected for the candidate documents\n",
    "\n",
    "query_path = out_folder_path + f'/query/'\n",
    "candidate_path = out_folder_path + f'/candidate/'\n",
    "os.makedirs(out_folder_path, exist_ok=True)\n",
    "os.makedirs(query_path, exist_ok=True)\n",
    "os.makedirs(candidate_path, exist_ok=True)\n",
    "\n",
    "query_corpus = []\n",
    "for num, doc in tqdm_notebook(content.items()):\n",
    "    num = int(num)\n",
    "    if(num in query_numbers):\n",
    "        r1_content = [i[0] for i in doc if i[1] in r_query]\n",
    "        file = query_path + f'{num:010d}.txt'\n",
    "        with open(file, 'w') as f:\n",
    "            f.write(\". \".join(r1_content))\n",
    "\n",
    "# get candidate\n",
    "for num, doc in tqdm_notebook(content.items()):\n",
    "    num = int(num)\n",
    "    r2_content = [i[0] for i in doc if i[1] in r_candidate]\n",
    "    file = candidate_path + f'{num:010d}.txt'\n",
    "    with open(file, 'w') as f:\n",
    "        f.write(\". \".join(r2_content))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv_dataset",
   "language": "python",
   "name": "venv_dataset"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

--------------------------------------------------------------------------------
/Models/BM25/data/auxiliary_scripts/remove_citation_sentences.py:
--------------------------------------------------------------------------------
'''
Removes sentences containing the CITATION tag.
'''

import numpy as np, pandas as pd, os, sys, json

folder_path = f'../corpus/ik_test'
destination_path = f'../corpus/sentence_removed/ik_test'

assert(os.path.isdir(folder_path))
os.makedirs(destination_path + f'/query/', exist_ok=True)
os.makedirs(destination_path + f'/candidate/', exist_ok=True)

# taken from stackoverflow.com/questions/4576077/how-can-i-split-a-text-into-sentences
# Added Rs.
# to prefixes

import re
alphabets = "([A-Za-z])"
prefixes = "(Mr|St|Mrs|Ms|Dr|Rs)[.]"
suffixes = "(Inc|Ltd|Jr|Sr|Co)"
starters = r"(Mr|Mrs|Ms|Dr|He\s|She\s|It\s|They\s|Their\s|Our\s|We\s|But\s|However\s|That\s|This\s|Wherever)"
acronyms = "([A-Z][.][A-Z][.](?:[A-Z][.])?)"
websites = "[.](com|net|org|io|gov)"
digits = "([0-9])"

def split_into_sentences(text):
    # <prd> temporarily protects periods that do not end a sentence;
    # <stop> marks sentence boundaries and is used for the final split.
    text = " " + text + "  "
    text = text.replace("\n"," ")
    text = re.sub(prefixes,"\\1<prd>",text)
    text = re.sub(websites,"<prd>\\1",text)
    text = re.sub(digits + "[.]" + digits,"\\1<prd>\\2",text)
    if "..." in text: text = text.replace("...","<prd><prd><prd>")
    if "Ph.D" in text: text = text.replace("Ph.D.","Ph<prd>D<prd>")
    text = re.sub(r"\s" + alphabets + "[.] "," \\1<prd> ",text)
    text = re.sub(acronyms+" "+starters,"\\1<stop> \\2",text)
    text = re.sub(alphabets + "[.]" + alphabets + "[.]" + alphabets + "[.]","\\1<prd>\\2<prd>\\3<prd>",text)
    text = re.sub(alphabets + "[.]" + alphabets + "[.]","\\1<prd>\\2<prd>",text)
    text = re.sub(" "+suffixes+"[.] "+starters," \\1<stop> \\2",text)
    text = re.sub(" "+suffixes+"[.]"," \\1<prd>",text)
    text = re.sub(" " + alphabets + "[.]"," \\1<prd>",text)
    if "”" in text: text = text.replace(".”","”.")
    if "\"" in text: text = text.replace(".\"","\".")
    if "!" in text: text = text.replace("!\"","\"!")
    # if "?" in text: text = text.replace("?\"","\"?")
    text = text.replace(".",".<stop>")
    text = text.replace("?","?<stop>")
    text = text.replace("!","!<stop>")
    text = text.replace("<prd>",".")
    sentences = text.split("<stop>")
    sentences = sentences[:-1]  # drop the trailing remainder after the last boundary
    sentences = [s.strip() for s in sentences]
    return sentences

# do on query cases
for f in (os.listdir(folder_path + f'/query/')):
    path = folder_path + f'/query/' + f
    outfile_path = destination_path + f'/query/' + f
    with open(path, 'r') as f_:
        content = f_.read()
    content = split_into_sentences(content)
    content = [i for i in content if not ('CITATION' in i)] # remove tag containing sentences
    content = ' '.join(content)

    with open(outfile_path, 'w+') as f_out:
        f_out.write(content)

# do on candidate cases
for f in (os.listdir(folder_path + f'/candidate/')):
    path = folder_path + f'/candidate/' + f
    outfile_path = destination_path + f'/candidate/' + f
    with open(path, 'r') as f_:
        content = f_.read()
    content = split_into_sentences(content)
    content = [i for i in content if not ('CITATION' in i)] # remove tag containing sentences
    content = ' '.join(content)

    with open(outfile_path, 'w+') as f_out:
        f_out.write(content)

--------------------------------------------------------------------------------
/Models/BM25/data/corpus/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/.gitkeep
--------------------------------------------------------------------------------
/Models/BM25/data/corpus/COLIEE2021/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021/.gitkeep
-------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021/test/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021/test/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021/test/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021/test/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021/test/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021/test/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021/train/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021/train/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021/train/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021/train/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021/train/query/.gitkeep: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021/train/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_RR_test/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_RR_test/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_RR_test/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_RR_test/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_RR_test/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_RR_test/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_RR_train/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_RR_train/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_RR_train/candidate/.gitkeep: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_RR_train/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_RR_train/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_RR_train/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_test_atomic/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_test_atomic/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_test_atomic/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_test_atomic/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_test_atomic/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_test_atomic/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_test_events/.gitkeep: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_test_events/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_test_events/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_test_events/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_test_events/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_test_events/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_test_iou_filtered/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_test_iou_filtered/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_test_iou_filtered/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_test_iou_filtered/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_test_iou_filtered/query/.gitkeep: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_test_iou_filtered/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_train_atomic/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_train_atomic/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_train_atomic/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_train_atomic/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_train_atomic/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_train_atomic/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_train_events/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_train_events/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_train_events/candidate/.gitkeep: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_train_events/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_train_events/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_train_events/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_train_iou_filtered/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_train_iou_filtered/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_train_iou_filtered/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_train_iou_filtered/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/COLIEE2021_train_iou_filtered/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/COLIEE2021_train_iou_filtered/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test/.gitkeep: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_RR/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_RR/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_RR/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_RR/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_RR/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_RR/query/.gitkeep 
-------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_atomic/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_atomic/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_atomic/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_atomic/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_atomic/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_atomic/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_events/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_events/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_events/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_events/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_events/query/.gitkeep: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_events/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_iouf/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_iouf/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_iouf/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_iouf/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_test_iouf/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_test_iouf/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train/candidate/.gitkeep: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_RR/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_RR/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_RR/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_RR/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_RR/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_RR/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_atomic/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_atomic/.gitkeep 
-------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_atomic/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_atomic/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_atomic/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_atomic/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_events/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_events/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_events/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_events/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_events/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_events/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_iouf/.gitkeep: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_iouf/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_iouf/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_iouf/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/ik_train_iouf/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/ik_train_iouf/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test/candidate/.gitkeep: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_RR/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_RR/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_RR/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_RR/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_RR/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_RR/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_atomic/.gitkeep: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_atomic/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_atomic/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_atomic/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_atomic/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_atomic/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_events/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_events/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_events/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_events/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_events/query/.gitkeep: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_events/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_iouf/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_iouf/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_iouf/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_iouf/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_test_iouf/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_test_iouf/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train/candidate/.gitkeep: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_RR/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_RR/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_RR/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_RR/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_RR/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_RR/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_atomic/.gitkeep: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_atomic/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_atomic/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_atomic/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_atomic/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_atomic/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_events/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_events/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_events/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_events/candidate/.gitkeep -------------------------------------------------------------------------------- 
/Models/BM25/data/corpus/sentence_removed/ik_train_events/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_events/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_iouf/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_iouf/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_iouf/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_iouf/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/corpus/sentence_removed/ik_train_iouf/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/corpus/sentence_removed/ik_train_iouf/query/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/.gitkeep -------------------------------------------------------------------------------- 
/Models/BM25/data/segment_dictionaries/RR/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/RR/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/RR/Output_Data/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/RR/Output_Data/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/RR/Output_Data/RR_Output_Data/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/RR/Output_Data/RR_Output_Data/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/coliee21_processed/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/coliee21_processed/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/coliee21_processed/coliee21/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/coliee21_processed/coliee21/.gitkeep -------------------------------------------------------------------------------- 
/Models/BM25/data/segment_dictionaries/coliee21_processed/coliee21/test/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/coliee21_processed/coliee21/test/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/coliee21_processed/coliee21/train/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/coliee21_processed/coliee21/train/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/ilpcr_processed/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/ilpcr_processed/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/ilpcr_processed/ilpcr/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/ilpcr_processed/ilpcr/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/ilpcr_processed/ilpcr/test/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/ilpcr_processed/ilpcr/test/.gitkeep 
-------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/ilpcr_processed/ilpcr/train/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/ilpcr_processed/ilpcr/train/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/ilpcr_woc_processed/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/ilpcr_woc_processed/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/ilpcr_woc_processed/ilpcr_woc/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/ilpcr_woc_processed/ilpcr_woc/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/ilpcr_woc_processed/ilpcr_woc/test/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/ilpcr_woc_processed/ilpcr_woc/test/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/data/segment_dictionaries/ilpcr_woc_processed/ilpcr_woc/train/.gitkeep: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/data/segment_dictionaries/ilpcr_woc_processed/ilpcr_woc/train/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/evaluate_at_K.py: -------------------------------------------------------------------------------- 1 | import os, sys, re 2 | import argparse 3 | 4 | import pandas as pd 5 | import numpy as np 6 | import matplotlib.pyplot as plt 7 | 8 | from tqdm import tqdm 9 | 10 | def get_micro_scores_at_K(actual, predicted, k): 11 | act_set = set(actual) 12 | pred_set = set(predicted[:k]) 13 | 14 | number_of_correctly_retrieved = len(act_set & pred_set) 15 | number_of_relevant_cases = len(act_set) 16 | number_of_retrieved_cases = k 17 | 18 | return number_of_correctly_retrieved, number_of_relevant_cases, number_of_retrieved_cases 19 | 20 | def get_f1_vs_K(gold_labels, similarity_df): 21 | precision_vs_K = [] 22 | recall_vs_K = [] 23 | f1_vs_K = [] 24 | for k in tqdm(range(1, 21)): 25 | precision_scores = [] 26 | recall_scores = [] 27 | f1_scores = [] 28 | number_of_correctly_retrieved_all = [] 29 | number_of_relevant_cases_all = [] 30 | number_of_retrieved_cases_all = [] 31 | for query_case_id in similarity_df.query_case_id.values: 32 | if query_case_id not in [ 33 | 1864396, 34 | 1508893 35 | ] : 36 | gold = gold_labels[ 37 | gold_labels["query_case_id"].values == query_case_id 38 | ].values[0][1:] 39 | actual = np.asarray(list(gold_labels.columns)[1:])[ 40 | np.logical_or(gold == 1, gold == -2) 41 | ] 42 | actual = [str(i) for i in actual] 43 | 44 | # candidate_docs = list(similarity_df.columns)[1:] 45 | candidate_docs = [int(i) for i in gold_labels.columns.values[1:]] 46 | column_name = 'query_case_id' if 'query_case_id' in similarity_df.columns else 'Unnamed: 0' 47 | similarity_scores = similarity_df[ 48 | similarity_df[column_name].values == query_case_id 49 | ].values[0][1:] 50 | 
assert(len(similarity_scores) == len(candidate_docs)) 51 | 52 | sorted_candidates = [ 53 | x 54 | for _, x in sorted( 55 | zip(similarity_scores, candidate_docs), 56 | key=lambda pair: float(pair[0]), 57 | reverse=True, 58 | ) 59 | ] 60 | sorted_candidates.remove((query_case_id)) 61 | sorted_candidates = [str(i) for i in sorted_candidates] 62 | 63 | number_of_correctly_retrieved, number_of_relevant_cases, number_of_retrieved_cases = get_micro_scores_at_K(actual=actual, predicted=sorted_candidates, k=k) 64 | number_of_correctly_retrieved_all.append(number_of_correctly_retrieved) 65 | number_of_relevant_cases_all.append(number_of_relevant_cases) 66 | number_of_retrieved_cases_all.append(number_of_retrieved_cases) 67 | 68 | recall_scores = np.sum(number_of_correctly_retrieved_all)/np.sum(number_of_relevant_cases_all) 69 | precision_scores = np.sum(number_of_correctly_retrieved_all)/np.sum(number_of_retrieved_cases_all) 70 | if recall_scores == 0 or precision_scores == 0: 71 | f1_scores = 0 72 | else : 73 | f1_scores = (2*precision_scores*recall_scores)/(precision_scores+recall_scores) 74 | 75 | recall_vs_K.append(recall_scores) 76 | precision_vs_K.append(precision_scores) 77 | f1_vs_K.append(f1_scores) 78 | 79 | return { 80 | "recall_vs_K" : recall_vs_K, 81 | "precision_vs_K" : precision_vs_K, 82 | "f1_vs_K" : f1_vs_K 83 | } 84 | 85 | if __name__ == '__main__': 86 | import json 87 | 88 | parser = argparse.ArgumentParser(description="evaluate_at_K.py") 89 | parser.add_argument( 90 | "--ground-truth", 91 | type=str, 92 | required=True, 93 | help="Path for json file containing ground truth labels against which the similarity file is to be compared.", 94 | ) 95 | parser.add_argument( 96 | "--sim-csv", 97 | type=str, 98 | required=True, 99 | help="Path for predicted similarity scores csv file.", 100 | ) 101 | 102 | args = parser.parse_args() 103 | true_labels_json = args.ground_truth 104 | sim_csv_path = args.sim_csv 105 | 106 | def obtain_sim_df_from_labels(labels): 107 | 
query_numbers = [int(re.findall(r'\d+', i["id"])[0]) for i in labels["Query Set"]] 108 | relevant_cases = [i["relevant candidates"] for i in labels["Query Set"]] 109 | relevant_cases = [[int(re.findall(r'\d+', j)[0]) for j in i] for i in relevant_cases] 110 | relevant_cases = {i:j for i,j in zip(query_numbers, relevant_cases)} 111 | 112 | candidate_numbers = [int(re.findall(r'\d+', i["id"])[0]) for i in labels["Candidate Set"]] 113 | candidate_numbers.sort() 114 | 115 | row_wise_dataframe = {} 116 | for query_number in sorted(list(relevant_cases.keys())): 117 | relevance_dict = {} # contains 0 for not relevant, 1 for relevant, -1 for self-relevance/citation 118 | for candidate in candidate_numbers: 119 | if candidate == query_number: 120 | relevance_dict[candidate] = -1 121 | elif candidate in relevant_cases[query_number]: 122 | relevance_dict[candidate] = 1 123 | else : 124 | relevance_dict[candidate] = 0 125 | 126 | row_wise_dataframe[query_number] = relevance_dict 127 | 128 | df = pd.DataFrame(row_wise_dataframe) 129 | df = df.T 130 | df.insert(loc=0, column='query_case_id', value=row_wise_dataframe.keys()) 131 | df = df.reset_index(drop=True) 132 | return df 133 | 134 | with open(true_labels_json, 'r') as f: 135 | true_labels = json.load(f) # get the gold labels file 136 | gold_labels_df = obtain_sim_df_from_labels(true_labels) 137 | sim_df = pd.read_csv(sim_csv_path) 138 | 139 | print(get_f1_vs_K(gold_labels_df, sim_df)) 140 | -------------------------------------------------------------------------------- /Models/BM25/exp_results/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/exp_results/.gitkeep -------------------------------------------------------------------------------- /Models/BM25/exp_results/Images/ILPCR_results.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/BM25/exp_results/Images/ILPCR_results.png -------------------------------------------------------------------------------- /Models/BM25/exp_results/README.md: -------------------------------------------------------------------------------- 1 | ## `exp_results` 2 | 3 | Results of each experiment run are stored here. Each experiment stores: 4 | 5 | 1. `config_file.json`: The config file for the experiment. It records details such as the path for the candidate/query cases, the labels file, and the `n_gram` value. See the primary BM25 [README.md](../README.md) for details. 6 | 7 | 2. `bm25_results.sav`: Dict containing the BM25 similarity score for each (query, candidate) pair. 8 | 9 | 3. `filled_similarity_matrix.csv`: The BM25 similarity score for each (query, candidate) pair, saved as a CSV from a `pd.DataFrame`. 10 | 11 | 4. `output.json`: The evaluation metrics (`precision_vs_K`, `recall_vs_K`, `f1_vs_K`) for the experiment. For each query, the algorithm ranks the candidate pool and returns the top-`K` candidates as relevant. The precision/recall/F1 scores are obtained by comparing the candidates marked relevant against the ground-truth labels. 12 | 13 | ### Results Summary 14 | 15 | ![Images/ILPCR_results.png](Images/ILPCR_results.png) 16 | 17 | -------------------------------------------------------------------------------- /Models/BM25/get_exp_results.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Print the results of all the experiments in the `exp_results/` folder. 3 | Fetches the best (k) from the train experiment and reports the test-set F1 at that (k). 
4 | ''' 5 | 6 | import os, sys, glob 7 | import json 8 | 9 | def get_train_counterpart(exp_name): 10 | assert(('test' in exp_name) and ('train' not in exp_name)) # for the test experiments 11 | return exp_name.replace("test", "train") 12 | 13 | def get_exp_result(exp_name): 14 | ''' 15 | Fetches the experiment folder matching the experiment_name and appends it to the global (EXP_RESULTS) list. 16 | Example usage : get_exp_result("ik_train_atomic"), get_exp_result("ik_test"), 17 | ''' 18 | path = f'./data/corpus/{exp_name}/' 19 | output_files = {} # contains all the folders (at all n_grams) of the queried experiment 20 | for config_file_path in glob.glob('./exp_results/*/config_file.json'): 21 | with open(config_file_path, 'r') as f: 22 | config = json.load(f) 23 | if config['path_prior_cases'].startswith(path): 24 | output_dict_path = config_file_path[:config_file_path.rfind('/')] + f'/output.json' 25 | 26 | assert os.path.isfile(output_dict_path), f"exp name \"{exp_name}\" exists but does not contain an output file. Experiment is either incomplete or corrupt." 
27 | 28 | with open(output_dict_path, 'r') as f2: 29 | output_dict = json.load(f2) 30 | output_files[config['n_gram']] = output_dict # contains results corresponding to each (n_gram) 31 | 32 | if 'train' in exp_name: 33 | for n_gram, res in output_files.items(): 34 | best_k = res['f1_vs_K'].index(max(res['f1_vs_K'])) 35 | return_dict = { 36 | 'exp_name' : exp_name, 37 | 'n_gram' : n_gram, 38 | 'best_k' : best_k+1, 39 | 'precision_vs_K' : res['precision_vs_K'][best_k], 40 | 'recall_vs_K' : res['recall_vs_K'][best_k], 41 | 'f1_v_k' : res['f1_vs_K'][best_k], 42 | } 43 | EXP_RESULTS.append(return_dict) 44 | 45 | 46 | elif 'test' in exp_name: 47 | # test experiments reuse the (best_k) value obtained from the train counterpart of the same experiment 48 | counter_trainname = get_train_counterpart(exp_name) 49 | for n_gram, res in output_files.items(): 50 | train_entry = [i for i in EXP_RESULTS if ((i['exp_name'] == counter_trainname) and (i['n_gram'] == n_gram))] 51 | if(len(train_entry) == 0): 52 | print(f'Train counterpart experiment for {exp_name} : {counter_trainname}, n_gram : {n_gram} not found!\nRerunning the train counterpart experiment should do the trick.') 53 | raise RuntimeError 54 | assert len(train_entry) == 1, f"experiment {exp_name} has multiple counterparts : {train_entry}. Please delete redundant folders." 
# only 1 counterpart experiment 55 | best_k_train = train_entry[0]['best_k'] 56 | 57 | return_dict = { 58 | 'exp_name' : exp_name, 59 | 'n_gram' : n_gram, 60 | 'best_k_train' : best_k_train, 61 | 'precision_vs_K' : res['precision_vs_K'][best_k_train-1], 62 | 'recall_vs_K' : res['recall_vs_K'][best_k_train-1], # as the (best_K) value is saved starting from index = 1 63 | 'f1_v_k' : res['f1_vs_K'][best_k_train-1], 64 | } 65 | EXP_RESULTS.append(return_dict) 66 | 67 | return 68 | 69 | def show(i): 70 | i['precision_vs_K'] = f"{round(i['precision_vs_K']*100, 2)}%" 71 | i['recall_vs_K'] = f"{round(i['recall_vs_K']*100, 2)}%" 72 | 73 | i['f1_v_k'] = round(i['f1_v_k']*100, 2) 74 | i['f1_v_k'] = f"{i['f1_v_k']}%" 75 | 76 | if SHOW_MODE == 'TRAIN' and 'best_k' in i: 77 | print(i) 78 | 79 | if SHOW_MODE == 'TEST' and 'best_k_train' in i: 80 | print(i) 81 | 82 | if __name__ == '__main__': 83 | 84 | # list of experiments whose results are to be fetched 85 | exp_ILPCR = [ 86 | 'ik_train', 'ik_test', 87 | 'ik_train_atomic', 'ik_test_atomic', 88 | 'ik_train_events', 'ik_test_events', 89 | 'ik_train_iouf', 'ik_test_iouf', 90 | 'ik_train_RR', 'ik_test_RR', 91 | ] 92 | 93 | # for the citation-sentence removed ILPCR 94 | exp_ILPCR_sentence_removed = [f"sentence_removed/" + i for i in exp_ILPCR] 95 | 96 | exp_coliee = [ 97 | 'COLIEE2021/train', 98 | 'COLIEE2021/test', 99 | "COLIEE2021_train_events", 100 | "COLIEE2021_test_events", 101 | "COLIEE2021_train_atomic", 102 | "COLIEE2021_test_atomic", 103 | "COLIEE2021_train_iou_filtered", 104 | "COLIEE2021_test_iou_filtered", 105 | "COLIEE2021_RR_train", 106 | "COLIEE2021_RR_test", 107 | ] 108 | 109 | EXP_RESULTS = [] 110 | 111 | # for i in exp_ILPCR + exp_ILPCR_sentence_removed + exp_coliee: 112 | for i in exp_ILPCR : 113 | _ = get_exp_result(i) 114 | 115 | # SHOW_MODE = 'TRAIN' # print the train/test set results in particular 116 | SHOW_MODE = 'TEST' 117 | 118 | for i in EXP_RESULTS: 119 | show(i) 120 | 
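The evaluation flow in `evaluate_at_K.py` and `get_exp_results.py` can be summarised in a small sketch: micro-averaged precision/recall/F1 at each K, then the K with the best train F1 is reused on the test split. The function names (`micro_prf_at_k`, `best_k_from_f1`) and the toy inputs below are hypothetical and are not part of the repository; only the general procedure follows the scripts above.

```python
def micro_prf_at_k(relevant_by_query, ranking_by_query, k):
    """Micro-averaged precision, recall and F1 when the top-k candidates are returned.

    Counts are summed over all queries before dividing, mirroring the
    micro-averaging in evaluate_at_K.py. Every query is charged k retrieved
    cases, even if its ranking is shorter than k.
    """
    hits = relevant = retrieved = 0
    for query_id, ranking in ranking_by_query.items():
        gold = set(relevant_by_query[query_id])
        hits += len(gold & set(ranking[:k]))
        relevant += len(gold)
        retrieved += k
    precision = hits / retrieved if retrieved else 0.0
    recall = hits / relevant if relevant else 0.0
    if precision == 0 or recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def best_k_from_f1(f1_vs_k):
    """Return the 1-based K with the highest F1 (the first maximum wins)."""
    return f1_vs_k.index(max(f1_vs_k)) + 1


# Toy data (illustrative only): two queries, three ranked candidates each.
relevant = {1: ["a", "b"], 2: ["c"]}
rankings = {1: ["a", "x", "b"], 2: ["y", "c", "z"]}

train_f1_vs_k = [micro_prf_at_k(relevant, rankings, k)[2] for k in range(1, 4)]
chosen_k = best_k_from_f1(train_f1_vs_k)
```

On a real run, `chosen_k` would come from the train experiment's `f1_vs_K` list and then index into the test experiment's lists at position `chosen_k - 1`, since K is stored 1-based.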
-------------------------------------------------------------------------------- /Models/BM25/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | pandas 3 | tqdm 4 | 5 | scipy 6 | matplotlib 7 | scikit-learn 8 | -------------------------------------------------------------------------------- /Models/BM25/spawn_processes.sh: -------------------------------------------------------------------------------- 1 | run_experiment() { 2 | exp_name=$1 3 | for input in "${@:2}"; do 4 | config_name=./config_files/$exp_name/config_$input.json # fill your config pattern here 5 | log_path=./logs/${exp_name}_$input.txt 6 | if [[ -f $config_name ]]; then 7 | python3 -u ./run_script.py "$config_name" 1>"$log_path" 2>&1 & 8 | else 9 | echo "config file does not exist : $config_name" 10 | fi 11 | done 12 | } 13 | 14 | mkdir -p logs 15 | mkdir -p logs/sentence_removed 16 | 17 | # for the ILPCR corpus 18 | run_experiment configs_ik_train 1 2 19 | run_experiment configs_ik_test 1 2 20 | run_experiment configs_ik_train_atomic 1 2 3 21 | run_experiment configs_ik_test_atomic 1 2 3 22 | run_experiment configs_ik_train_events 1 2 3 4 5 23 | run_experiment configs_ik_test_events 1 2 3 4 5 24 | run_experiment configs_ik_train_iouf 1 2 3 4 5 25 | run_experiment configs_ik_test_iouf 1 2 3 4 5 26 | run_experiment configs_ik_train_RR 1 2 3 4 5 27 | run_experiment configs_ik_test_RR 1 2 3 4 5 28 | 29 | # for the citation-sentence-removed ILPCR corpus 30 | run_experiment sentence_removed/configs_ik_train 1 2 31 | run_experiment sentence_removed/configs_ik_test 1 2 32 | run_experiment sentence_removed/configs_ik_train_atomic 1 2 3 33 | run_experiment sentence_removed/configs_ik_test_atomic 1 2 3 34 | run_experiment sentence_removed/configs_ik_train_events 1 2 3 4 5 35 | run_experiment sentence_removed/configs_ik_test_events 1 2 3 4 5 36 | run_experiment sentence_removed/configs_ik_train_iouf 1 2 3 4 5 37 | run_experiment 
sentence_removed/configs_ik_test_iouf 1 2 3 4 5 38 | run_experiment sentence_removed/configs_ik_train_RR 1 2 3 4 5 39 | run_experiment sentence_removed/configs_ik_test_RR 1 2 3 4 5 40 | 41 | # for COLIEE21 42 | run_experiment configs_coliee21 1 2 3 4 5 6 7 8 9 10 43 | run_experiment configs_coliee21_train_atomic 1 2 3 44 | run_experiment configs_coliee21_test_atomic 1 2 3 45 | run_experiment configs_coliee21_train_events 1 2 3 4 5 46 | run_experiment configs_coliee21_test_events 1 2 3 4 5 47 | run_experiment configs_coliee21_train_iouf 1 2 3 4 5 48 | run_experiment configs_coliee21_test_iouf 1 2 3 4 5 49 | run_experiment configs_coliee21_train_RR 1 2 3 4 5 50 | run_experiment configs_coliee21_test_RR 1 2 3 4 5 51 | -------------------------------------------------------------------------------- /Models/Events-Extraction/Events_Extraction/input_details.json: -------------------------------------------------------------------------------- 1 | { 2 | "input_root": "../data", 3 | "dataset": "ilpcr", 4 | "split_type": "test", 5 | "files_type": "query", 6 | "output_root": "../data/ilpcr_processed" 7 | } -------------------------------------------------------------------------------- /Models/Events-Extraction/IOU/Event_IOU_Sim.ipynb: -------------------------------------------------------------------------------- 1 | {"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["%%"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import pandas as pd\n", "import pickle as pkl\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from tqdm import tqdm\n", "import json\n", "import os"]}, {"cell_type": "markdown", "metadata": {}, "source": ["load Event_IOU_Sim_input_details.json"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["with open('Event_IOU_Sim_input_details.json') as f:\n", " input_details = json.load(f)\n", "dataset = input_details['dataset']\n", "split_type = 
input_details['split_type'] # test,train\n", "path_sim = \"./Sim_CSVs/\"+dataset+\"/\"\n", "os.makedirs(os.path.dirname(path_sim), exist_ok=True)\n", "path_sim_csv = path_sim +\"/\"+dataset+\"_\"+split_type+'_IOU_sim.csv'\n", "query_seg = input_details['query_segment_dictionary_path']\n", "candidate_seg = input_details['candidate_segment_dictionary_path']\n", "print(\"dataset:\",dataset)\n", "print(\"split_type:\",split_type)\n", "print(\"query_seg_path:\",query_seg)\n", "print(\"candidate_seg_path:\",candidate_seg)\n", "# %%\n", "# open query_seg and candidate_seg\n", "with open(query_seg, 'rb') as f:\n", " query_segment_dict = pkl.load(f)\n", " f.close()\n", "with open(candidate_seg, 'rb') as f:\n", " candidate_segment_dict = pkl.load(f)\n", " f.close()\n", "segment_dict = {\"dict_query\": query_segment_dict['dict_query'],\n", " \"dict_candidate\": candidate_segment_dict['dict_candidate']}"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Jaccard Similarity"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["def Jaccard_Similarity(doc1, doc2):\n", " # List the unique words in a document\n", " words_doc1 = set(doc1)\n", " words_doc2 = set(doc2)\n\n", " # Find the intersection of words list of doc1 & doc2\n", " intersection = words_doc1.intersection(words_doc2)\n\n", " # Find the union of words list of doc1 & doc2\n", " union = words_doc1.union(words_doc2)\n\n", " # Calculate Jaccard similarity score\n", " # using length of intersection set divided by length of union set\n", " if len(intersection) == 0:\n", " return 0\n", " return float(len(intersection)) / len(union)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["%%"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["tot_jaccard_dict = dict()\n", "tot_jaccard_query_case_rank = dict()\n", "for query in tqdm(segment_dict['dict_query'].keys(), desc=\"Jacc Sim:\"):\n", " jaccard = dict()\n", " for citation in 
segment_dict['dict_candidate'].keys():\n", " # if citation != query:\n", " doc1 = segment_dict['dict_query'][query]\n", " doc2 = segment_dict['dict_candidate'][citation]\n", " res = Jaccard_Similarity(doc1, doc2)\n", " jaccard[citation] = res\n", " t = list(jaccard.items())\n", " jaccard_sorted = sorted(t, key=lambda x: x[1], reverse=True)\n", " jaccard_case_rank = []\n", " for tup in jaccard_sorted:\n", " jaccard_case_rank.append(tup[0])\n", " tot_jaccard_dict[query] = jaccard\n", " tot_jaccard_query_case_rank[query] = jaccard_case_rank\n", " # break"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["all_queries = sorted(list(segment_dict[\"dict_query\"].keys()))\n", "all_candidates = sorted(list(segment_dict[\"dict_candidate\"].keys()))\n", "print(\"Length of queries:\", len(all_queries))\n", "print(\"Length of candidates:\", len(all_candidates))\n", "# %%\n", "IOU_sim = dict()\n", "for query in tqdm(all_queries):\n", " jacc_score_dict = tot_jaccard_dict[query]\n", " # print(query)\n", " IOU_sim[query] = list()\n", " for candidate in all_candidates:\n", " IOU_sim[query].append(jacc_score_dict[candidate])"]}, {"cell_type": "markdown", "metadata": {}, "source": ["%%"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["pd.DataFrame.from_dict(data=IOU_sim, orient='index').to_csv(\n", " path_sim_csv, header=all_candidates)\n", "df = pd.read_csv(path_sim_csv)\n", "df.columns = ['query_case_id'] + list(df.columns[1:])\n", "df.to_csv(path_sim_csv, index=False)"]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4"}}, "nbformat": 4, "nbformat_minor": 2} 
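The notebook above scores each (query, candidate) pair by the Jaccard similarity (intersection over union) of their event sets and ranks candidates by that score. A minimal standalone sketch of the same idea follows; the names `jaccard_similarity` and `rank_candidates` and the toy event tuples are assumptions made for illustration, not identifiers or data from the repository.

```python
def jaccard_similarity(events_a, events_b):
    """Intersection-over-union of two collections, compared as sets."""
    set_a, set_b = set(events_a), set(events_b)
    union = set_a | set_b
    if not union:
        # Two empty documents share nothing; return 0 rather than divide by zero.
        return 0.0
    return len(set_a & set_b) / len(union)


def rank_candidates(query_events, candidate_events_by_id):
    """Return candidate ids sorted by descending Jaccard similarity to the query."""
    scores = {
        candidate_id: jaccard_similarity(query_events, events)
        for candidate_id, events in candidate_events_by_id.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy events (illustrative only), written as (predicate, argument) tuples.
query = [("filed", "appeal"), ("dismissed", "petition"), ("granted", "bail")]
candidates = {
    "c1": [("filed", "appeal"), ("granted", "bail")],
    "c2": [("sold", "land")],
    "c3": [("filed", "appeal"), ("dismissed", "petition"), ("granted", "bail"), ("paid", "fine")],
}
ranking = rank_candidates(query, candidates)
```

Because duplicates collapse when the lists are converted to sets, an event repeated many times in one document counts only once; that matches the set-based computation in the notebook but is worth keeping in mind when interpreting the scores.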
-------------------------------------------------------------------------------- /Models/Events-Extraction/IOU/Event_IOU_Sim.py: -------------------------------------------------------------------------------- 1 | # %% 2 | import pandas as pd 3 | import pickle as pkl 4 | import numpy as np 5 | import matplotlib.pyplot as plt 6 | from tqdm import tqdm 7 | import json 8 | import os 9 | 10 | # load Event_IOU_Sim_input_details.json 11 | with open('Event_IOU_Sim_input_details.json') as f: 12 | input_details = json.load(f) 13 | dataset = input_details['dataset'] 14 | split_type = input_details['split_type'] # test,train 15 | path_sim = "./Sim_CSVs/"+dataset+"/" 16 | os.makedirs(os.path.dirname(path_sim), exist_ok=True) 17 | path_sim_csv = path_sim +"/"+dataset+"_"+split_type+'_IOU_sim.csv' 18 | query_seg = input_details['query_segment_dictionary_path'] 19 | candidate_seg = input_details['candidate_segment_dictionary_path'] 20 | print("dataset:",dataset) 21 | print("split_type:",split_type) 22 | print("query_seg_path:",query_seg) 23 | print("candidate_seg_path:",candidate_seg) 24 | # %% 25 | # open query_seg and candidate_seg 26 | with open(query_seg, 'rb') as f: 27 | query_segment_dict = pkl.load(f) 28 | f.close() 29 | with open(candidate_seg, 'rb') as f: 30 | candidate_segment_dict = pkl.load(f) 31 | f.close() 32 | segment_dict = {"dict_query": query_segment_dict['dict_query'], 33 | "dict_candidate": candidate_segment_dict['dict_candidate']} 34 | 35 | # Jaccard Similarity 36 | 37 | 38 | def Jaccard_Similarity(doc1, doc2): 39 | # List the unique words in a document 40 | words_doc1 = set(doc1) 41 | words_doc2 = set(doc2) 42 | 43 | # Find the intersection of words list of doc1 & doc2 44 | intersection = words_doc1.intersection(words_doc2) 45 | 46 | # Find the union of words list of doc1 & doc2 47 | union = words_doc1.union(words_doc2) 48 | 49 | # Calculate Jaccard similarity score 50 | # using length of intersection set divided by length of union set 51 | if 
len(intersection) == 0: 52 | return 0 53 | return float(len(intersection)) / len(union) 54 | 55 | 56 | # %% 57 | tot_jaccard_dict = dict() 58 | tot_jaccard_query_case_rank = dict() 59 | for query in tqdm(segment_dict['dict_query'].keys(), desc="Jacc Sim:"): 60 | jaccard = dict() 61 | for citation in segment_dict['dict_candidate'].keys(): 62 | # if citation != query: 63 | doc1 = segment_dict['dict_query'][query] 64 | doc2 = segment_dict['dict_candidate'][citation] 65 | res = Jaccard_Similarity(doc1, doc2) 66 | jaccard[citation] = res 67 | t = list(jaccard.items()) 68 | jaccard_sorted = sorted(t, key=lambda x: x[1], reverse=True) 69 | jaccard_case_rank = [] 70 | for tup in jaccard_sorted: 71 | jaccard_case_rank.append(tup[0]) 72 | tot_jaccard_dict[query] = jaccard 73 | tot_jaccard_query_case_rank[query] = jaccard_case_rank 74 | # break 75 | 76 | all_queries = sorted(list(segment_dict["dict_query"].keys())) 77 | all_candidates = sorted(list(segment_dict["dict_candidate"].keys())) 78 | print("Length of queries:", len(all_queries)) 79 | print("Length of candidates:", len(all_candidates)) 80 | # %% 81 | IOU_sim = dict() 82 | for query in tqdm(all_queries): 83 | jacc_score_dict = tot_jaccard_dict[query] 84 | # print(query) 85 | IOU_sim[query] = list() 86 | for candidate in all_candidates: 87 | IOU_sim[query].append(jacc_score_dict[candidate]) 88 | 89 | # %% 90 | pd.DataFrame.from_dict(data=IOU_sim, orient='index').to_csv( 91 | path_sim_csv, header=all_candidates) 92 | df = pd.read_csv(path_sim_csv) 93 | df.columns = ['query_case_id'] + list(df.columns[1:]) 94 | df.to_csv(path_sim_csv, index=False) 95 | -------------------------------------------------------------------------------- /Models/Events-Extraction/IOU/Event_IOU_Sim_input_details.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": "ilpcr", 3 | "split_type": "test", 4 | "query_segment_dictionary_path": 
"../data/ilpcr_processed/ilpcr/test/segment_dictionary_test_ilpcr_query.sav", 5 | "candidate_segment_dictionary_path": "../data/ilpcr_processed/ilpcr/test/segment_dictionary_test_ilpcr_candidate.sav" 6 | } -------------------------------------------------------------------------------- /Models/Events-Extraction/IOU/IOU_filtered_input_details.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": "ilpcr", 3 | "split_type": "test", 4 | "output_dir": "../data/ilpcr_processed", 5 | "seg_data_dir_path": "../data/ilpcr_processed", 6 | "event_doc_line_text_dir_path": "../data/ilpcr_processed" 7 | } -------------------------------------------------------------------------------- /Models/Events-Extraction/IOU/IOU_filtered_text_dict.py: -------------------------------------------------------------------------------- 1 | # %% 2 | import pandas as pd 3 | import pickle as pkl 4 | import numpy as np 5 | import matplotlib.pyplot as plt 6 | from tqdm import tqdm 7 | import os 8 | import json 9 | 10 | # %% 11 | # load IOU_filtered_input_details.json 12 | with open(r"./IOU_filtered_input_details.json", 'rb') as f: 13 | input_details = json.load(f) 14 | # load segment dictionary 15 | dataset = input_details['dataset'] # indiankanoon,COLIEE etc 16 | split_type = input_details['split_type'] # train,dev,test 17 | 18 | # load segment dictionary 19 | seg_data_dir_path = input_details['seg_data_dir_path'] 20 | # load segment dictionary for cadidate 21 | seg_path_candi = seg_data_dir_path+"/"+dataset+"/"+split_type + \ 22 | "/segment_dictionary_"+split_type+"_"+dataset+"_candidate.sav" 23 | with open(seg_path_candi, 'rb') as f: 24 | candi_seg_dict = pkl.load(f) 25 | # load segment dictionary for query 26 | seg_path_query = seg_data_dir_path+"/"+dataset+"/"+split_type + \ 27 | "/segment_dictionary_"+split_type+"_"+dataset+"_query.sav" 28 | with open(seg_path_query, 'rb') as f: 29 | query_seg_dict = pkl.load(f) 30 | 31 | # load event doc 
line text 32 | event_doc_line_text_dir_path = input_details['event_doc_line_text_dir_path'] 33 | # load event doc line text for candidate 34 | event_doc_line_text_path_candi = event_doc_line_text_dir_path+"/"+dataset + \ 35 | "/"+split_type + "/event_doc_line_text_"+split_type+"_"+dataset+"_candidate.pkl" 36 | with open(event_doc_line_text_path_candi, 'rb') as f: 37 | candidate_event_doc_txt = pkl.load(f) 38 | # load event doc line text for query 39 | event_doc_line_text_path_query = event_doc_line_text_dir_path+"/"+dataset + \ 40 | "/"+split_type + "/event_doc_line_text_"+split_type+"_"+dataset+"_query.pkl" 41 | with open(event_doc_line_text_path_query, 'rb') as f: 42 | query_event_doc_txt = pkl.load(f) 43 | 44 | print("Loaded seg_path_query:", seg_path_query) 45 | print("Loaded seg_path_candi:", seg_path_candi) 46 | print("Loaded event_doc_line_text_path_query:", event_doc_line_text_path_query) 47 | print("Loaded event_doc_line_text_path_candi:", event_doc_line_text_path_candi) 48 | 49 | # %% 50 | # sorted list of all queries and candidates 51 | all_queries = sorted(list(query_seg_dict['dict_query'].keys())) 52 | all_candidates = sorted(list(candi_seg_dict['dict_candidate'].keys())) 53 | print(len(all_queries)) 54 | print(len(all_candidates)) 55 | 56 | # %% 57 | # build a matrix of the events common to each query/candidate pair 58 | qc_mat = list() 59 | for q in tqdm(all_queries,desc="Matrix Calculation:"): 60 | # print(q) 61 | qc_events = list() 62 | # q_events = {tuple(e) for e in query_seg_dict['dict_query'][q]} 63 | q_events = set(query_seg_dict['dict_query'][q]) 64 | for c in all_candidates: 65 | # print(c) 66 | if q != c: 67 | # c_events = {tuple(e) for e in candi_seg_dict['dict_candidate'][c]} 68 | c_events = set(candi_seg_dict['dict_candidate'][c]) 69 | qc_events.append(q_events.intersection(c_events)) 70 | else: 71 | # print("same",q,c) 72 | qc_events.append(set()) 73 | # break 74 | qc_mat.append(qc_events) 75 | 76 | # %% 77 | # check length of qc_mat = 
number of queries 78 | print(len(qc_mat)) 79 | # check length of all lists qc_mat = number of candidates 80 | ct = set() 81 | for ql in qc_mat: 82 | ct.add(len(ql)) 83 | print(ct) 84 | 85 | # %% 86 | # unique events in a query 87 | query_events = {} 88 | for i in range(len(all_queries)): 89 | q_id = all_queries[i] 90 | # print(q_id) 91 | qc_events = qc_mat[i] 92 | qc_common = set().union(*qc_events) 93 | query_events[q_id] = qc_common 94 | # break 95 | # check length of query_events = number of queries 96 | print(len(query_events)) 97 | 98 | # %% 99 | # unique events in a candidate 100 | candidate_events = {} 101 | for i in range(len(all_candidates)): 102 | c_id = all_candidates[i] 103 | # print(c_id) 104 | cq_events = list() 105 | for row in qc_mat: 106 | cq_events.append(row[i]) 107 | cq_common = set().union(*cq_events) 108 | candidate_events[c_id] = cq_common 109 | # break 110 | 111 | # check length of candidate_events = number of candidates 112 | print(len(candidate_events)) 113 | 114 | # %% 115 | # convert common query events into the source sentences 116 | query_text = {} 117 | for q, eves in query_events.items(): 118 | q_txt = {} 119 | for eve in list(eves): 120 | if eve in query_event_doc_txt: 121 | if q in query_event_doc_txt[eve]: 122 | q_txt.update(query_event_doc_txt[eve][q]) 123 | q_lst = [] 124 | for key in sorted(list(q_txt.keys())): 125 | # print(key,"::",q_txt[key]) 126 | q_lst.append(q_txt[key]) 127 | query_text[q] = q_lst 128 | # break 129 | 130 | # %% 131 | # convert common candidate events into the source sentences 132 | candidate_text = {} 133 | for c, eves in candidate_events.items(): 134 | c_txt = {} 135 | for eve in list(eves): 136 | if eve in candidate_event_doc_txt: 137 | if c in candidate_event_doc_txt[eve]: 138 | c_txt.update(candidate_event_doc_txt[eve][c]) 139 | c_lst = [] 140 | for key in sorted(list(c_txt.keys())): 141 | # print(key,"::",c_txt[key]) 142 | c_lst.append(c_txt[key]) 143 | candidate_text[c] = c_lst 144 | 145 | # %% 146 
| # save all text in dictionary 147 | event_text_dict = dict() 148 | event_text_dict['dict_query'] = query_text 149 | event_text_dict['dict_candidate'] = candidate_text 150 | 151 | 152 | # %% 153 | # save file to location 154 | output_path = input_details['output_dir']+"/"+dataset+"/"+split_type 155 | output_file_path = output_path+"/IOU_filtered_text_dict_"+dataset+"_"+split_type+".sav" 156 | os.makedirs(output_path, exist_ok=True) 157 | with open(output_file_path, 'wb') as f: 158 | try: 159 | pkl.dump(event_text_dict, f) 160 | except Exception as err: 161 | print("Couldn't save:", err) 162 | raise # do not report success after a failed dump; the with-block closes the file 163 | print("Saved IOU_filtered_text_dict_"+dataset + 164 | "_"+split_type+".sav at:", output_file_path) 165 | -------------------------------------------------------------------------------- /Models/Events-Extraction/RR/Note.txt: -------------------------------------------------------------------------------- 1 | - Refer to the GitHub repository: https://github.com/Exploration-Lab/CJPE 2 | - Replace this folder with the RR folder provided in that repository. 
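The filtering step in `IOU_filtered_text_dict.py` (listed before the RR note) keeps, for each query, only the events it shares with at least one other document. A minimal sketch of that idea on toy data follows; the function name `shared_events` and the event labels `e1`..`e4` are hypothetical, and the real script additionally maps the kept events back to their source sentences.

```python
def shared_events(query_events, candidate_events):
    """For each query, keep the events shared with at least one other document."""
    kept = {}
    for q_id, q_ev in query_events.items():
        common = set()
        for c_id, c_ev in candidate_events.items():
            if c_id != q_id:  # a document is never compared with itself
                common |= set(q_ev) & set(c_ev)
        kept[q_id] = common
    return kept


# Hypothetical event labels, for illustration only
queries = {"q1": ["e1", "e2", "e3"]}
candidates = {"q1": ["e1", "e2", "e3"], "c1": ["e2"], "c2": ["e3", "e4"]}

result = shared_events(queries, candidates)
print(sorted(result["q1"]))
```

Here `e1` is dropped because it appears only in the query's own copy in the candidate pool, which mirrors the `q != c` check in the script.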
-------------------------------------------------------------------------------- /Models/Events-Extraction/SBERT/SBERT_input_details.json: -------------------------------------------------------------------------------- 1 | { 2 | "dataset": "ilpcr", 3 | "split_type": "train", 4 | "data_dir_path": "../data", 5 | "query_segment_dictionary_path": "../data/ilpcr_processed/ilpcr/train/sent_data_train_ilpcr_query.sav", 6 | "candidate_segment_dictionary_path": "../data/ilpcr_processed/ilpcr/train/sent_data_train_ilpcr_candidate.sav", 7 | "segment_dictionary_path": "../data/segment_dictionary_train_line.sav", 8 | "model_type": "bert", 9 | "model_id": 156960, 10 | "model_name": "bert-base-uncased", 11 | "model_path": "../output/ilpcr/bert_base/156960/" 12 | } -------------------------------------------------------------------------------- /Models/Events-Extraction/data/Note.txt: -------------------------------------------------------------------------------- 1 | ########## Empty folders are created inside this data folder to make the folder structure easier to understand ############# 2 | 3 | Please refer to the README when running the Python scripts or notebooks. 
-------------------------------------------------------------------------------- /Models/Transformer-Embeddings/.gitignore: -------------------------------------------------------------------------------- 1 | # temporary folders and logs 2 | .vscode/ 3 | __pycache__/ 4 | logs/ 5 | 6 | # model checkpoints 7 | model_checkpoints/ 8 | 9 | # experiment results 10 | exp_results/* 11 | 12 | data/* 13 | 14 | !data/README.md 15 | !exp_results/Images/ 16 | !exp_results/README.md 17 | -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/README.md: -------------------------------------------------------------------------------- 1 | # U-CREAT: Unsupervised Case Retrieval using Events extrAcTion 2 | 3 | This folder contains the official code for all word-based transformer experiments conducted in the paper `U-CREAT: Unsupervised Case Retrieval using Events extrAcTion`. The experiments include `Segment-Doc Transformer(full document)` and `Transformer (top 512 tokens)`. 4 | 5 | ## Project Overview 6 | 1. `transformer_score.py` 7 | 8 | The main Python script, which runs the transformer experiments. It takes as input a config file containing experiment details and saves the experiment output/metrics in the `exp_results` folder with a timestamp of when the experiment was started. The formats of the input (config files) and the experiment output are described below. 9 | 10 | 2. `config_files/` 11 | 12 | This folder contains several sample input config files for `transformer_score.py`. Config files are included for the models `bert-base-uncased`, `bert-finetuned`, `distilbert-base-uncased`, `distilbert-finetuned`, `InLegalBERT`, and `InCaseLawBERT`, evaluated on the corpora `COLIEE21`, `IL-PCR`, and `citation-sentence removed IL-PCR`. 13 | Each config file must contain the following entries for the run script to complete the experiment: 14 | 15 | 1. `path_prior_cases` : Path to the folder containing prior/candidate cases. 
The cases must be present as `.txt` files inside the folder. 16 | 2. `path_current_cases` : Path to the folder containing current/query cases. The cases must be present as `.txt` files inside the folder. 17 | 3. `true_labels_json` : Path to the label.json file containing the ground truth labels for candidate cases considered relevant per query case. This is required to compute F1@K and related evaluation metrics. The general format for this json file is 18 | ```json 19 | { 20 | "Query Set": [ // contains separate dictionary for each query case 21 | { 22 | "id" : , 23 | "query_name": 24 | "relevant candidates": [ // contains ids of relevant cases 25 | ... 26 | ] 27 | }, ... 28 | ], 29 | "Candidate Set": [ 30 | { 31 | "id": 32 | }, ... 33 | ] 34 | } 35 | ``` 36 | Please see the label.json file included with the dataset for a concrete example. 37 | 38 | 4. `checkpoint` : Checkpoint of the model to be used to produce the vector embeddings of documents. 39 | 40 | 5. `top512` : Indicates whether to use the top 512 tokens of a document to obtain its vector representation. If set to `True`, only the top 512 tokens are used. If set to `False`, the document is divided into chunks of 512 tokens and the vector representations of the chunks are concatenated to produce the final vector embedding. Please see the paper for details. 41 | 42 | 3. `spawn_transformers.sh` 43 | 44 | This is a helper script which spawns experiments with different config files at once. Our config files are arranged folder-wise and are named `config_1, config_2, ...` and so on. This allows you to automate running the experiments in the paper. 45 | 46 | 4. `data` 47 | 48 | This folder must contain the legal corpus in the `corpus/` subdirectory when running `transformer_score.py`. A valid corpus contains `query/` and `candidate/` directories holding the current and prior cases respectively. A `label.json` file containing the ground truth labels is also required for evaluating F1 scores. 49 | 50 | 5. 
`evaluate_at_K.py` 51 | 52 | Evaluates the micro F1 score between a ground truth file (`label.json`) and a similarity score CSV produced by `transformer_score.py`. The similarity score CSV contains a relevance score for each query X candidate pair. Please check the paper for more details on how the results are evaluated. 53 | 54 | 6. `get_exp_results.py` 55 | 56 | This script fetches the experiment results (recall, precision and F1 at K) as reported in the paper. The value of $K$ is determined using the results for the train set of ILPCR (please see the paper for details). Hence, obtaining results on any test corpus requires first running `transformer_score.py` on the corresponding train corpus. Once both results are available, the script fetches the best $K$ obtained from the train corpus and reports the appropriate F1@K value on the test set. 57 | 58 | 7. `model_checkpoints/` 59 | 60 | Contains model checkpoints for finetuned bert and distilbert for the experiments in the paper. If you require these models, please contact the authors. 61 | 62 | 8. `transformer_score.ipynb` 63 | 64 | A notebook (`.ipynb`) version of `transformer_score.py`, useful for visualization, interactive coding and educational purposes. 65 | 66 | ## Installation 67 | 1. pip requirements are listed in the `requirements.txt` file. Install using 68 | ```bash 69 | python3 -m pip install -r requirements.txt 70 | ``` 71 | 72 | ## Usage 73 | 1. `transformer_score.py` : Requires a config file to run and a dataset to be present at `data/corpus/` in the standard dataset format. Example usage is 74 | ```bash 75 | python3 ./transformer_score.py path/to/config_file.json 76 | ``` 77 | 78 | 2. `spawn_transformers.sh` : Multiple experiments can be run simultaneously by running the Python script in the background and redirecting its standard output/error streams. 
An example is: 79 | ```bash 80 | python3 -u ./transformer_score.py config_1_path 1>./logs/log1 2>&1 & 81 | python3 -u ./transformer_score.py config_2_path 1>./logs/log2 2>&1 & 82 | ``` 83 | Our `config_files/` folder contains the configs for each word-based experiment in the paper, arranged by corpus and model. The results in the paper can be recreated by uncommenting the line for the required experiment in the script and running 84 | ```bash 85 | ./spawn_transformers.sh 86 | ``` 87 | 88 | 3. `get_exp_results.py` : Obtains the recall, precision and F1 @ K values by fetching experiment statistics saved in `exp_results/`. The results displayed can be changed only by editing the file itself; there is no command-line option for this. Example usage is 89 | ```bash 90 | python3 ./get_exp_results.py 91 | ``` 92 | -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/InCaseLawBERT/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/InCaseLawBERT/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/InCaseLawBERT/config_3.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/InCaseLawBERT/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/InLegalBERT/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/InLegalBERT/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "True" 7 | } 
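The sample configs above store `top512` as the string `"True"` or `"False"` rather than as a JSON boolean. How `transformer_score.py` interprets that string is not shown in this listing, so the following is only a sketch of one way to load such a config defensively; `parse_config` and `REQUIRED_KEYS` are hypothetical names, and the sample text simply mirrors the fields of the configs listed here.

```python
import json

# Entries the README lists as mandatory for every config file
REQUIRED_KEYS = (
    "path_prior_cases",
    "path_current_cases",
    "true_labels_json",
    "checkpoint",
    "top512",
)


def parse_config(text):
    """Parse a config string, check required keys, and normalise `top512` to a bool."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing config entries: {missing}")
    # The flag is stored as the string "True"/"False"; compare text, not truthiness.
    config["top512"] = str(config["top512"]).strip().lower() == "true"
    return config


sample = """
{
    "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate",
    "path_current_cases": "./data/corpus/COLIEE2021/test/query",
    "true_labels_json": "./data/corpus/COLIEE2021/test/test.json",
    "checkpoint": "law-ai/InLegalBERT",
    "top512": "False"
}
"""

config = parse_config(sample)
print(config["checkpoint"], config["top512"])
```

The explicit string comparison matters because `bool("False")` is `True` in Python: any non-empty string is truthy, so casting the raw value would silently enable the top-512 mode.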
-------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/InLegalBERT/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/InLegalBERT/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/bert/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/bert/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": 
"./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/bert/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/bert/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/bert_finetuned/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/bert_finetuned/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": 
"./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/bert_finetuned/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/bert_finetuned/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/distilbert/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- 
/Models/Transformer-Embeddings/config_files/configs_COLIEE/distilbert/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/distilbert/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/distilbert/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/distilbert_finetuned/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": 
"./model_checkpoints/distilbert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/distilbert_finetuned/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/test/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/test/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/test/test.json", 5 | "checkpoint": "./model_checkpoints/distilbert_finetuned/", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/distilbert_finetuned/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "./model_checkpoints/distilbert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_COLIEE/distilbert_finetuned/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/COLIEE2021/train/candidate", 3 | "path_current_cases": "./data/corpus/COLIEE2021/train/query", 4 | "true_labels_json": "./data/corpus/COLIEE2021/train/train.json", 5 | "checkpoint": "./model_checkpoints/distilbert_finetuned/", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/InCaseLawBERT/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | 
"path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/InCaseLawBERT/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/InCaseLawBERT/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/InCaseLawBERT/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/InLegalBERT/config_1.json: -------------------------------------------------------------------------------- 1 | { 
2 | "path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/InLegalBERT/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/InLegalBERT/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/InLegalBERT/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/bert/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | 
"path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/bert/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/bert/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/bert/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/bert_finetuned/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": 
"./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/bert_finetuned/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/bert_finetuned/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/bert_finetuned/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/distilbert/config_1.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/distilbert/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/distilbert/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/distilbert/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- 
/Models/Transformer-Embeddings/config_files/configs_ILPCR/distilbert_finetuned/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "./model_checkpoints/distilbert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/distilbert_finetuned/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "./model_checkpoints/distilbert_finetuned/", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/distilbert_finetuned/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "./model_checkpoints/distilbert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR/distilbert_finetuned/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": 
"./model_checkpoints/distilbert_finetuned/", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/InCaseLawBERT/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/InCaseLawBERT/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/InCaseLawBERT/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/InCaseLawBERT/config_4.json: -------------------------------------------------------------------------------- 1 | { 
2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "law-ai/InCaseLawBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/InLegalBERT/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/InLegalBERT/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/InLegalBERT/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "False" 7 | } 
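Every config file in this listing carries the same five keys, and `top512` is stored as the string `"True"` or `"False"` rather than as a JSON boolean. The following is a minimal sketch of how such a file could be read safely; `parse_config`, `REQUIRED_KEYS` and `EXAMPLE` are hypothetical names introduced for illustration and are not part of the repository.

```python
import json

# Keys that every config file in this listing carries.
REQUIRED_KEYS = (
    "path_prior_cases",
    "path_current_cases",
    "true_labels_json",
    "checkpoint",
    "top512",
)


def parse_config(text):
    """Parse a config JSON string and turn the "top512" string flag into a bool.

    Hypothetical helper for illustration; not part of the repository.
    """
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    if config["top512"] not in ("True", "False"):
        raise ValueError(f"unexpected top512 value: {config['top512']!r}")
    # bool("False") is True in Python, so compare against the literal string.
    config["top512"] = config["top512"] == "True"
    return config


# Example config text with the same shape as the files listed here.
EXAMPLE = """
{
    "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate",
    "path_current_cases": "./data/corpus/sentence_removed/ik_train/query",
    "true_labels_json": "./data/corpus/ik_train/train.json",
    "checkpoint": "law-ai/InLegalBERT",
    "top512": "False"
}
"""

if __name__ == "__main__":
    cfg = parse_config(EXAMPLE)
    print(cfg["checkpoint"], cfg["top512"])
```

The explicit string comparison matters because any non-empty string, including `"False"`, is truthy in Python. Code that compares the raw value against `"True"` or `"False"` is unaffected by this.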
-------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/InLegalBERT/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "law-ai/InLegalBERT", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/bert/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/bert/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/bert/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases": 
"./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/bert/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "bert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/bert_finetuned/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/bert_finetuned/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- 
/Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/bert_finetuned/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/bert_finetuned/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "./model_checkpoints/bert_finetuned/", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/distilbert/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/distilbert/config_2.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": 
"./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/distilbert/config_3.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/distilbert/config_4.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_train/query", 4 | "true_labels_json": "./data/corpus/ik_train/train.json", 5 | "checkpoint": "distilbert-base-uncased", 6 | "top512": "True" 7 | } -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/distilbert_finetuned/config_1.json: -------------------------------------------------------------------------------- 1 | { 2 | "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate", 3 | "path_current_cases": "./data/corpus/sentence_removed/ik_test/query", 4 | "true_labels_json": "./data/corpus/ik_test/test.json", 5 | "checkpoint": "./model_checkpoints/distilbert_finetuned/", 6 | "top512": "False" 7 | } -------------------------------------------------------------------------------- 
/Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/distilbert_finetuned/config_2.json:
--------------------------------------------------------------------------------
{
    "path_prior_cases": "./data/corpus/sentence_removed/ik_test/candidate",
    "path_current_cases": "./data/corpus/sentence_removed/ik_test/query",
    "true_labels_json": "./data/corpus/ik_test/test.json",
    "checkpoint": "./model_checkpoints/distilbert_finetuned/",
    "top512": "True"
}

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/distilbert_finetuned/config_3.json:
--------------------------------------------------------------------------------
{
    "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate",
    "path_current_cases": "./data/corpus/sentence_removed/ik_train/query",
    "true_labels_json": "./data/corpus/ik_train/train.json",
    "checkpoint": "./model_checkpoints/distilbert_finetuned/",
    "top512": "False"
}

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/config_files/configs_ILPCR_citation_removed/distilbert_finetuned/config_4.json:
--------------------------------------------------------------------------------
{
    "path_prior_cases": "./data/corpus/sentence_removed/ik_train/candidate",
    "path_current_cases": "./data/corpus/sentence_removed/ik_train/query",
    "true_labels_json": "./data/corpus/ik_train/train.json",
    "checkpoint": "./model_checkpoints/distilbert_finetuned/",
    "top512": "True"
}

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/config_files/replicate_configs.py:
--------------------------------------------------------------------------------
import os, sys, json

if __name__ == '__main__':
    # Expects exactly one argument: the folder holding the seed config.
    assert len(sys.argv) == 2
    folder_name = sys.argv[1]
    # The folder must contain only the seed file, config_1.json.
    assert len(os.listdir(folder_name)) == 1

    with open(f'./{folder_name}/config_1.json', 'r') as f:
        config = json.load(f)
    # The seed config must describe the test split with top512 set to "False".
    assert 'test' in config["path_prior_cases"]
    assert config['top512'] == "False"

    # config_2: test split,  top512 "True"
    # config_3: train split, top512 "False"
    # config_4: train split, top512 "True"
    for i in range(2, 5):
        # Copy the seed config so that each variant is derived from it independently.
        temp_config = dict(config)
        temp_config['top512'] = "True" if i % 2 == 0 else "False"

        if i >= 3:
            temp_config['path_prior_cases'] = temp_config['path_prior_cases'].replace('test', 'train')
            temp_config['path_current_cases'] = temp_config['path_current_cases'].replace('test', 'train')
            temp_config['true_labels_json'] = temp_config['true_labels_json'].replace('test', 'train')

        with open(f'./{folder_name}/config_{i}.json', 'w') as f:
            json.dump(temp_config, f, indent=4)

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/data/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/.gitkeep

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/data/README.md:
--------------------------------------------------------------------------------
## Directory Overview

This folder must contain the legal corpus in the `corpus/` subdirectory when running `transformer_score.py`. You can obtain the dataset on request, as described in the project [README.md](../../../README.md).

A valid corpus contains a `query/` directory and a `candidate/` directory, holding the current and prior cases respectively. A `label.json` file containing the ground-truth labels is also required for evaluating F1 scores. `ik_train/` and `ik_test/` contain the ILPCR train and test sets in this standard dataset format (`query/`, `candidate/` and a `label.json` file). The label file is expected to have the following format:

```json
{
    "Query Set": [ // contains a separate dictionary for each query case
        {
            "id" : ,
            "query_name": ,
            "relevant candidates": [ // contains ids of relevant cases
                ...
            ]
        }, ...
    ],
    "Candidate Set": [
        {
            "id":
        }, ...
    ]
}
```

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/data/corpus/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/.gitkeep

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/data/corpus/COLIEE2021/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/COLIEE2021/.gitkeep

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/data/corpus/COLIEE2021/test/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/COLIEE2021/test/.gitkeep

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/data/corpus/COLIEE2021/test/candidate/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/COLIEE2021/test/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/data/corpus/COLIEE2021/test/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/COLIEE2021/test/query/.gitkeep -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/data/corpus/COLIEE2021/train/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/COLIEE2021/train/.gitkeep -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/data/corpus/COLIEE2021/train/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/COLIEE2021/train/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/data/corpus/COLIEE2021/train/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/COLIEE2021/train/query/.gitkeep -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/data/corpus/ik_test/.gitkeep: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/ik_test/.gitkeep -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/data/corpus/ik_test/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/ik_test/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/data/corpus/ik_test/query/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/ik_test/query/.gitkeep -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/data/corpus/ik_train/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/ik_train/.gitkeep -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/data/corpus/ik_train/candidate/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/ik_train/candidate/.gitkeep -------------------------------------------------------------------------------- /Models/Transformer-Embeddings/data/corpus/ik_train/query/.gitkeep: 
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/data/corpus/ik_train/query/.gitkeep

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/evaluate_at_K.py:
--------------------------------------------------------------------------------
import os
import argparse

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from tqdm import tqdm

# Query cases that are skipped during evaluation.
EXCLUDED_QUERY_IDS = {1864396, 1508893}

def get_micro_scores_at_K(actual, predicted, k):
    """Return the counts needed for micro-averaged precision/recall at cut-off k."""
    act_set = set(actual)
    pred_set = set(predicted[:k])

    number_of_correctly_retrieved = len(act_set & pred_set)
    number_of_relevant_cases = len(act_set)
    number_of_retrieved_cases = k

    return number_of_correctly_retrieved, number_of_relevant_cases, number_of_retrieved_cases

def get_f1_vs_K(gold_labels, similarity_df):
    """Compute micro-averaged precision, recall and F1 for K = 1..20."""
    # The query id column is named 'query_case_id', or 'Unnamed: 0' when the
    # CSV was saved with an unnamed index column.
    column_name = 'query_case_id' if 'query_case_id' in similarity_df.columns else 'Unnamed: 0'

    precision_vs_K = []
    recall_vs_K = []
    f1_vs_K = []
    for k in tqdm(range(1, 21)):
        number_of_correctly_retrieved_all = []
        number_of_relevant_cases_all = []
        number_of_retrieved_cases_all = []
        for query_case_id in similarity_df[column_name].values:
            if query_case_id in EXCLUDED_QUERY_IDS:
                continue

            # Row of gold labels for this query, without the id column.
            gold = gold_labels[
                gold_labels["query_case_id"].values == query_case_id
            ].values[0][1:]
            # Candidates labelled 1 or -2 are both treated as relevant.
            actual = np.asarray(list(gold_labels.columns)[1:])[
                np.logical_or(gold == 1, gold == -2)
            ]
            actual = [str(i) for i in actual]

            # Candidate ids are taken from the gold label columns.
            candidate_docs = [int(i) for i in gold_labels.columns.values[1:]]
            similarity_scores = similarity_df[
                similarity_df[column_name].values == query_case_id
            ].values[0][1:]
            assert len(similarity_scores) == len(candidate_docs)

            # Rank candidates by decreasing similarity score.
            sorted_candidates = [
                x
                for _, x in sorted(
                    zip(similarity_scores, candidate_docs),
                    key=lambda pair: float(pair[0]),
                    reverse=True,
                )
            ]
            # The query itself is part of the candidate pool; drop it from its own ranking.
            sorted_candidates.remove(query_case_id)
            sorted_candidates = [str(i) for i in sorted_candidates]

            number_of_correctly_retrieved, number_of_relevant_cases, number_of_retrieved_cases = get_micro_scores_at_K(actual=actual, predicted=sorted_candidates, k=k)
            number_of_correctly_retrieved_all.append(number_of_correctly_retrieved)
            number_of_relevant_cases_all.append(number_of_relevant_cases)
            number_of_retrieved_cases_all.append(number_of_retrieved_cases)

        # Micro-averaging: pool the counts over all queries before dividing.
        recall = np.sum(number_of_correctly_retrieved_all)/np.sum(number_of_relevant_cases_all)
        precision = np.sum(number_of_correctly_retrieved_all)/np.sum(number_of_retrieved_cases_all)
        if recall == 0 or precision == 0:
            f1 = 0
        else:
            f1 = (2*precision*recall)/(precision+recall)

        recall_vs_K.append(recall)
        precision_vs_K.append(precision)
        f1_vs_K.append(f1)

    return {
        "recall_vs_K" : recall_vs_K,
        "precision_vs_K" : precision_vs_K,
        "f1_vs_K" : f1_vs_K
    }

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/exp_results/Images/ILPCR_results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Exploration-Lab/IL-PCR/8c111ff30631cbe0f7a3068a0c67536c6aac523c/Models/Transformer-Embeddings/exp_results/Images/ILPCR_results.png

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/exp_results/README.md:
--------------------------------------------------------------------------------
## `exp_results`

Results of each experiment run are stored here. Each experiment stores:

1. `config_file.json` : The config file for the experiment. It contains details such as the model `checkpoint`, the paths to the candidate/query cases, the labels file and the `top512` flag. Please see the primary [README.md](../README.md) for details.

2. `scores.json` : JSON file containing the similarity score for each (query, candidate) pair.

3. `filled_similarity_matrix.csv` : CSV dump of a `pd.DataFrame` containing the similarity score for each (query, candidate) pair.

4. `output.json` : The evaluation metrics (`precision_vs_K`, `recall_vs_K`, `f1_vs_K`) for the experiment. For each query, the algorithm ranks the candidate pool and returns the top-`K` candidates as relevant. The precision/recall/F1 scores are obtained by comparing the candidates marked relevant against the ground-truth labels.

### Results Summary

![Images/ILPCR_results.png](Images/ILPCR_results.png)

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/get_exp_results.py:
--------------------------------------------------------------------------------
'''
Experiments are identified by (corpus, checkpoint, top512).
'''
import itertools
import json
import os


def find_folder_and_output(model, corpus, top512):
    '''
    Scans ./exp_results/ for the run whose config matches (model, corpus, top512)
    and returns its parsed output.json.
    '''
    for i in os.listdir('./exp_results/'):
        path = f'./exp_results/{i}'
        config_path = path + '/config_file.json'
        # Ignore README.md and folders that are not experiment runs (e.g. Images/).
        if not os.path.isdir(path) or not os.path.isfile(config_path):
            continue
        with open(config_path, 'r') as f:
            config = json.load(f)

        # top512 is compared as-is, so it must have the same type as the value stored in the config.
        if config["checkpoint"] == model and f'data/corpus/{corpus}' in config["path_prior_cases"] and config["top512"] == top512:
            with open(path + '/output.json', 'r') as f:
                output = json.load(f)
            return output
    raise RuntimeError(f"exp result for {model, corpus, top512} not found")


def get_train_exp(model, corpus, top512):
    '''
    Fetches the counterpart train experiment to determine the best (k).
    Relies on the train experiment already being present in the global list EXP_RESULTS.
    '''
    train_corpus = corpus.replace('test', 'train')
    for i in EXP_RESULTS:
        if i["model"] == model and i["corpus"] == train_corpus and i["top512"] == top512:
            return i
    raise RuntimeError(f"Can't find counter-part train experiment for config {model, corpus, top512}")


def resolve_exp(model, corpus, top512):
    '''
    Fetches results corresponding to the experiment (model, corpus, top512) and returns
    them as a dict; the caller appends it to the global list EXP_RESULTS.
    Example usage : resolve_exp("bert-base-uncased", "ik_train", "True")
    '''
    output = find_folder_and_output(model, corpus, top512)
    if 'train' in corpus:
        # Pick the K that maximises F1 on the train split.
        best_k = output['f1_vs_K'].index(max(output['f1_vs_K']))
        return_dict = {
            'model' : model,
            'corpus' : corpus,
            'top512' : top512,
            'best_k' : best_k+1,
            'recall_vs_K' : output['recall_vs_K'][best_k],
            'precision_vs_K' : output['precision_vs_K'][best_k],
            'f1_v_k' : output['f1_vs_K'][best_k],
        }
        return return_dict

    elif 'test' in corpus:
        # Report test scores at the K selected on the counterpart train split.
        train_exp = get_train_exp(model, corpus, top512)
        best_k = train_exp['best_k']
        return_dict = {
            'model' : model,
            'corpus' : corpus,
            'top512' : top512,
            'best_k_train' : best_k,
            'recall_vs_K' : output['recall_vs_K'][best_k-1], # as best_k is saved starting from index 1
            'precision_vs_K' : output['precision_vs_K'][best_k-1],
            'f1_v_k' : output['f1_vs_K'][best_k-1],
        }
        return return_dict

    else:
        raise RuntimeError(f"corpus name '{corpus}' does not contain either test or train")


def show(i):
    # Format the scores as percentages (modifies the dict in place).
    i['precision_vs_K'] = f"{round(i['precision_vs_K']*100, 2)}%"
    i['recall_vs_K'] = f"{round(i['recall_vs_K']*100, 2)}%"
    i['f1_v_k'] = f"{round(i['f1_v_k']*100, 2)}%"

    if SHOW_MODE == 'TRAIN':
        if 'best_k' in i:
            print(i)
    if SHOW_MODE == 'TEST':
        if 'best_k_train' in i:
            print(i)


if __name__ == '__main__':
    model_names = ['bert-base-uncased', './model_checkpoints/bert_finetuned/', 'distilbert-base-uncased', './model_checkpoints/distilbert_finetuned/', 'law-ai/InCaseLawBERT', 'law-ai/InLegalBERT']
    # model_names = ['bert-base-uncased', 'distilbert-base-uncased', 'law-ai/InLegalBERT', 'law-ai/InCaseLawBERT']
    # corpus_names = ['ik_train', 'ik_test', 'sentence_removed/ik_train', 'sentence_removed/ik_test'] # for ILPCR dataset
    # Train corpora must be listed before their test counterparts: get_train_exp() looks them up in EXP_RESULTS.
    corpus_names = ['ik_train', 'ik_test'] # for ILPCR dataset
    # corpus_names = ['sentence_removed/ik_train', 'sentence_removed/ik_test']
    # corpus_names = ['COLIEE2021/train', 'COLIEE2021/test'] # for coliee results
    # top512_values = ["True", "False"]
    top512_values = ["True"]

    EXP_RESULTS = []
    for model, corpus, top512 in itertools.product(*[model_names, corpus_names, top512_values]):
        EXP_RESULTS.append(resolve_exp(model, corpus, top512))

    # SHOW_MODE = 'TRAIN'
    SHOW_MODE = 'TEST'
    for i in EXP_RESULTS:
        show(i)

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/requirements.txt:
--------------------------------------------------------------------------------
matplotlib
spacy

transformers

--------------------------------------------------------------------------------
/Models/Transformer-Embeddings/spawn_transformers.sh:
--------------------------------------------------------------------------------
# Running all the experiments at once is not recommended: the GPU is likely to run out of memory.

run_experiment() {
    exp_name=$1
    for input in "${@:2}"; do
        export CUDA_VISIBLE_DEVICES=$(( (input - 1) % 4 )) # running on one of 4 gpus, change this for your system
        config_name="./config_files/$exp_name/config_$input.json" # fill your config pattern here
        # Flatten the experiment name into a single log file name,
        # e.g. configs_ILPCR/bert with input 1 -> ./logs/configs_ILPCR_bert_1.txt
        log_path="./logs/${exp_name//\//_}_${input}.txt"
        if [[ -f "$config_name" ]]; then
            python3 -u ./transformer_score.py "$config_name" >"$log_path" 2>&1 &
        else
            echo "config file does not exist: $config_name"
        fi
    done
}

mkdir -p logs

# ILPCR experiments
run_experiment configs_ILPCR/bert 1 2 3 4
run_experiment configs_ILPCR/bert_finetuned 1 2 3 4
run_experiment configs_ILPCR/distilbert 1 2 3 4
run_experiment configs_ILPCR/distilbert_finetuned 1 2 3 4
run_experiment configs_ILPCR/InCaseLawBERT 1 2 3 4
run_experiment configs_ILPCR/InLegalBERT 1 2 3 4

# citation-removed sentences ILPCR experiments
# run_experiment configs_ILPCR_citation_removed/bert 1 2 3 4
# run_experiment configs_ILPCR_citation_removed/bert_finetuned 1 2 3 4
# run_experiment configs_ILPCR_citation_removed/distilbert 1 2 3 4
# run_experiment configs_ILPCR_citation_removed/distilbert_finetuned 1 2 3 4
# run_experiment configs_ILPCR_citation_removed/InCaseLawBERT 1 2 3 4
# run_experiment configs_ILPCR_citation_removed/InLegalBERT 1 2 3 4

# COLIEE experiments
# run_experiment configs_COLIEE/bert 1 2 3 4
# run_experiment configs_COLIEE/bert_finetuned 1 2 3 4
# run_experiment configs_COLIEE/distilbert 1 2 3 4
# run_experiment configs_COLIEE/distilbert_finetuned 1 2 3 4
# run_experiment configs_COLIEE/InCaseLawBERT 1 2 3 4
# run_experiment configs_COLIEE/InLegalBERT 1 2 3 4

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# U-CREAT: Unsupervised Case Retrieval using Events extrAcTion

This repository contains the full codebase, experiments and results of the ACL 2023 paper **"U-CREAT: Unsupervised Case Retrieval using Events extrAcTion"**.

![/Images/ucreat_pipeline.png](/Images/ucreat_pipeline.png)

## Contributions
We make the following contributions:

1. Considering the lack of available benchmarks for the Indian legal setting, we create a new benchmark for Prior Case Retrieval focused on the Indian legal system (IL-PCR) and provide a detailed analysis of the created benchmark. Due to the large size of the corpus, the created benchmark could serve as a helpful resource for building information retrieval systems for legal documents. We release the corpus and model code for research use.
2. We propose a new framework for legal document retrieval: U-CREAT (Unsupervised Case Retrieval using Events Extraction), based on the events extracted from documents. We propose different event-based models for the PCR task. We show that these perform better than existing state-of-the-art methods in terms of both retrieval efficiency and inference time.
3. We show that the proposed event-based framework and models generalize well across different legal systems (Indian and Canadian systems) without any law/demography-specific tuning of models.

## Models and Data
1. `Model Checkpoints` : The checkpoints for the fine-tuned BERT and DistilBERT models mentioned in the paper are available for download [here](https://1drv.ms/f/s!Ao1lGmmnesu6l7cSBRElaJIhVNzwxw?e=QqQy2S).
2. `Data` : The dataset is ONLY for research use and NOT for any commercial use. The dataset and leaderboard are available on [Hugging Face](https://huggingface.co/spaces/Exploration-Lab/IL-TUR-Leaderboard). For the COLIEE'21 dataset, please refer to [COLIEE-2021](https://sites.ualberta.ca/~rabelo/COLIEE2021/).
3. Each algorithm in the `Models/` subdirectory uses the corpus stored in its local `data/` subdirectory. Please see the algorithm-specific READMEs for an explanation of how to prepare the corpus for running with the codebase.

## License

[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)

The ILPCR dataset and UCREAT software follow the [CC-BY-NC](https://creativecommons.org/licenses/by-nc/4.0/) license. Thus, users can share and adapt our dataset if they give credit to us and do not use our dataset for any commercial purposes.

## Citation

```
@inproceedings{joshi-etal-2023-ucreat,
    title = "{U-CREAT}: Unsupervised Case Retrieval using Events extrAcTion",
    author = "Joshi, Abhinav and
      Sharma, Akshat and
      Tanikella, Sai Kiran and
      Modi, Ashutosh",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = july,
    year = "2023",
    publisher = "Association for Computational Linguistics",
    abstract = "The task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases in a given query case. To further promote research in PCR, in this paper, we propose a new large benchmark (in English) for the PCR task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the long size of legal documents, BM25 remains a strong baseline for ranking the cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems. Our proposed system is generic, we show that it generalizes across two different legal systems (Indian and Canadian), and it shows state-of-the-art performance on the benchmarks for both the legal systems (IL-PCR and COLIEE corpora).",
}
```

## Contact
In case of any queries, please contact , , ,

--------------------------------------------------------------------------------