├── .gitignore
├── README.md
├── batch_scripts
│   ├── extract.sh
│   ├── judge.sh
│   ├── precision.sh
│   ├── recall.sh
│   ├── reflection_quality.sh
│   ├── relevance_rate.sh
│   └── run_all.py
├── dataset.py
├── direct_eval.py
├── figs
│   ├── effiency.jpg
│   ├── quality.jpg
│   ├── radar.jpg
│   ├── robustness.jpg
│   └── teaser.jpg
├── file_utils.py
├── final_score
│   ├── dummy_reflection_quality.json
│   ├── efficiency.py
│   ├── precision.py
│   ├── quality.py
│   ├── recall.py
│   ├── reflection_quality.py
│   ├── relevance_rate.py
│   └── robustness.py
├── main.py
├── prompt
│   ├── prompt_extract.txt
│   ├── prompt_judge.txt
│   ├── prompt_precision.txt
│   ├── prompt_recall.txt
│   ├── prompt_reflection_quality.txt
│   └── prompt_relevance_rate.txt
├── requirements.txt
├── results
│   ├── json
│   │   ├── example_cot.json
│   │   └── example_dir.json
│   └── xlsx
│       ├── MODELNAME_MME_CoT_TEST_cot.xlsx
│       └── MODELNAME_MME_CoT_TEST_dir.xlsx
├── scripts
│   ├── extract.sh
│   ├── judge.sh
│   ├── precision.sh
│   ├── recall.sh
│   ├── reflection_quality.sh
│   └── relevance_rate.sh
└── tools
    ├── read_extract_cache.py
    └── update_lmmseval_json.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/.gitignore
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/README.md
--------------------------------------------------------------------------------
/batch_scripts/extract.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/batch_scripts/extract.sh
--------------------------------------------------------------------------------
/batch_scripts/judge.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/batch_scripts/judge.sh
--------------------------------------------------------------------------------
/batch_scripts/precision.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/batch_scripts/precision.sh
--------------------------------------------------------------------------------
/batch_scripts/recall.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/batch_scripts/recall.sh
--------------------------------------------------------------------------------
/batch_scripts/reflection_quality.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/batch_scripts/reflection_quality.sh
--------------------------------------------------------------------------------
/batch_scripts/relevance_rate.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/batch_scripts/relevance_rate.sh
--------------------------------------------------------------------------------
/batch_scripts/run_all.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/batch_scripts/run_all.py
--------------------------------------------------------------------------------
/dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/dataset.py
--------------------------------------------------------------------------------
/direct_eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/direct_eval.py
--------------------------------------------------------------------------------
/figs/effiency.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/figs/effiency.jpg
--------------------------------------------------------------------------------
/figs/quality.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/figs/quality.jpg
--------------------------------------------------------------------------------
/figs/radar.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/figs/radar.jpg
--------------------------------------------------------------------------------
/figs/robustness.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/figs/robustness.jpg
--------------------------------------------------------------------------------
/figs/teaser.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/figs/teaser.jpg
--------------------------------------------------------------------------------
/file_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/file_utils.py
--------------------------------------------------------------------------------
/final_score/dummy_reflection_quality.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/final_score/dummy_reflection_quality.json
--------------------------------------------------------------------------------
/final_score/efficiency.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/final_score/efficiency.py
--------------------------------------------------------------------------------
/final_score/precision.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/final_score/precision.py
--------------------------------------------------------------------------------
/final_score/quality.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/final_score/quality.py
--------------------------------------------------------------------------------
/final_score/recall.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/final_score/recall.py
--------------------------------------------------------------------------------
/final_score/reflection_quality.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/final_score/reflection_quality.py
--------------------------------------------------------------------------------
/final_score/relevance_rate.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/final_score/relevance_rate.py
--------------------------------------------------------------------------------
/final_score/robustness.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/final_score/robustness.py
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/main.py
--------------------------------------------------------------------------------
/prompt/prompt_extract.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/prompt/prompt_extract.txt
--------------------------------------------------------------------------------
/prompt/prompt_judge.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/prompt/prompt_judge.txt
--------------------------------------------------------------------------------
/prompt/prompt_precision.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/prompt/prompt_precision.txt
--------------------------------------------------------------------------------
/prompt/prompt_recall.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/prompt/prompt_recall.txt
--------------------------------------------------------------------------------
/prompt/prompt_reflection_quality.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/prompt/prompt_reflection_quality.txt
--------------------------------------------------------------------------------
/prompt/prompt_relevance_rate.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/prompt/prompt_relevance_rate.txt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
datasets
openai
json_repair
tqdm
--------------------------------------------------------------------------------
/results/json/example_cot.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/results/json/example_cot.json
--------------------------------------------------------------------------------
/results/json/example_dir.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/results/json/example_dir.json
--------------------------------------------------------------------------------
/results/xlsx/MODELNAME_MME_CoT_TEST_cot.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/results/xlsx/MODELNAME_MME_CoT_TEST_cot.xlsx
--------------------------------------------------------------------------------
/results/xlsx/MODELNAME_MME_CoT_TEST_dir.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/results/xlsx/MODELNAME_MME_CoT_TEST_dir.xlsx
--------------------------------------------------------------------------------
/scripts/extract.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/scripts/extract.sh
--------------------------------------------------------------------------------
/scripts/judge.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/scripts/judge.sh
--------------------------------------------------------------------------------
/scripts/precision.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/scripts/precision.sh
--------------------------------------------------------------------------------
/scripts/recall.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/scripts/recall.sh
--------------------------------------------------------------------------------
/scripts/reflection_quality.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/scripts/reflection_quality.sh
--------------------------------------------------------------------------------
/scripts/relevance_rate.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/scripts/relevance_rate.sh
--------------------------------------------------------------------------------
/tools/read_extract_cache.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/tools/read_extract_cache.py
--------------------------------------------------------------------------------
/tools/update_lmmseval_json.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MME-Benchmarks/MME-CoT/HEAD/tools/update_lmmseval_json.py
--------------------------------------------------------------------------------