├── .DS_Store
├── LICENSE
├── README.md
├── code
│   ├── gc
│   │   ├── eval
│   │   │   └── eval_tf_batch_pair_3acc.py
│   │   └── test
│   │       ├── run_gc_full_round.bash
│   │       └── test_script_example
│   │           └── test_qwen2p5_7B_img_qa_gc.py
│   ├── oc
│   │   ├── eval
│   │   │   ├── eval_cnt
│   │   │   │   └── acc_with_cnt_score.py
│   │   │   ├── eval_cpr
│   │   │   │   └── eval_comp_tf_pair_batch_3acc.py
│   │   │   └── eval_grp
│   │   │       └── eval_acc_batch_classify_mcq.py
│   │   └── test
│   │       ├── run_oc_full_round.bash
│   │       └── test_script_example
│   │           └── test_qwen2p5_7B_img_qa_oc.py
│   └── pc
│       ├── image
│       │   ├── eval
│       │   │   ├── eval_cnt
│       │   │   │   └── acc_with_cnt_score.py
│       │   │   ├── eval_cpr
│       │   │   │   └── eval_comp_tf_pair_batch_3acc.py
│       │   │   └── eval_grp
│       │   │       └── eval_acc_batch_classify_mcq.py
│       │   └── test
│       │       ├── run_pc-i_full_round.bash
│       │       ├── test_res
│       │       │   └── .DS_Store
│       │       └── test_script_example
│       │           └── test_qwen2p5_7B_img_qa_pc.py
│       └── video
│           ├── eval
│           │   └── eval_pc-v_open-ended.py
│           └── test
│               ├── run_pc-v_full_round.bash
│               └── test_script_example
│                   └── test_qwen2p5_7B_vid_qa_pc-v.py
├── figs
│   ├── .DS_Store
│   ├── hf-logo.png
│   ├── pwc-icon.png
│   ├── vlm2-bench-icon_final.png
│   ├── vlm2-bench_eval_results.png
│   ├── vlm2-bench_overview.png
│   └── vlm2-bench_statistics.png
├── requirements.txt
└── vlm2bench_evaluator.ipynb

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/.DS_Store
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/README.md
--------------------------------------------------------------------------------
/code/gc/eval/eval_tf_batch_pair_3acc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/gc/eval/eval_tf_batch_pair_3acc.py
--------------------------------------------------------------------------------
/code/gc/test/run_gc_full_round.bash:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/gc/test/run_gc_full_round.bash
--------------------------------------------------------------------------------
/code/gc/test/test_script_example/test_qwen2p5_7B_img_qa_gc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/gc/test/test_script_example/test_qwen2p5_7B_img_qa_gc.py
--------------------------------------------------------------------------------
/code/oc/eval/eval_cnt/acc_with_cnt_score.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/oc/eval/eval_cnt/acc_with_cnt_score.py
--------------------------------------------------------------------------------
/code/oc/eval/eval_cpr/eval_comp_tf_pair_batch_3acc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/oc/eval/eval_cpr/eval_comp_tf_pair_batch_3acc.py
--------------------------------------------------------------------------------
/code/oc/eval/eval_grp/eval_acc_batch_classify_mcq.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/oc/eval/eval_grp/eval_acc_batch_classify_mcq.py
--------------------------------------------------------------------------------
/code/oc/test/run_oc_full_round.bash:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/oc/test/run_oc_full_round.bash
--------------------------------------------------------------------------------
/code/oc/test/test_script_example/test_qwen2p5_7B_img_qa_oc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/oc/test/test_script_example/test_qwen2p5_7B_img_qa_oc.py
--------------------------------------------------------------------------------
/code/pc/image/eval/eval_cnt/acc_with_cnt_score.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/pc/image/eval/eval_cnt/acc_with_cnt_score.py
--------------------------------------------------------------------------------
/code/pc/image/eval/eval_cpr/eval_comp_tf_pair_batch_3acc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/pc/image/eval/eval_cpr/eval_comp_tf_pair_batch_3acc.py
--------------------------------------------------------------------------------
/code/pc/image/eval/eval_grp/eval_acc_batch_classify_mcq.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/pc/image/eval/eval_grp/eval_acc_batch_classify_mcq.py
--------------------------------------------------------------------------------
/code/pc/image/test/run_pc-i_full_round.bash:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/pc/image/test/run_pc-i_full_round.bash
--------------------------------------------------------------------------------
/code/pc/image/test/test_res/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/pc/image/test/test_res/.DS_Store
--------------------------------------------------------------------------------
/code/pc/image/test/test_script_example/test_qwen2p5_7B_img_qa_pc.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/pc/image/test/test_script_example/test_qwen2p5_7B_img_qa_pc.py
--------------------------------------------------------------------------------
/code/pc/video/eval/eval_pc-v_open-ended.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/pc/video/eval/eval_pc-v_open-ended.py
--------------------------------------------------------------------------------
/code/pc/video/test/run_pc-v_full_round.bash:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/pc/video/test/run_pc-v_full_round.bash
--------------------------------------------------------------------------------
/code/pc/video/test/test_script_example/test_qwen2p5_7B_vid_qa_pc-v.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/code/pc/video/test/test_script_example/test_qwen2p5_7B_vid_qa_pc-v.py
--------------------------------------------------------------------------------
/figs/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/figs/.DS_Store
--------------------------------------------------------------------------------
/figs/hf-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/figs/hf-logo.png
--------------------------------------------------------------------------------
/figs/pwc-icon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/figs/pwc-icon.png
--------------------------------------------------------------------------------
/figs/vlm2-bench-icon_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/figs/vlm2-bench-icon_final.png
--------------------------------------------------------------------------------
/figs/vlm2-bench_eval_results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/figs/vlm2-bench_eval_results.png
--------------------------------------------------------------------------------
/figs/vlm2-bench_overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/figs/vlm2-bench_overview.png
--------------------------------------------------------------------------------
/figs/vlm2-bench_statistics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/figs/vlm2-bench_statistics.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
tqdm>=4.65.0
argparse>=1.4.0
json>=2.0.9

--------------------------------------------------------------------------------
/vlm2bench_evaluator.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vlm2-bench/VLM2-Bench/HEAD/vlm2bench_evaluator.ipynb
--------------------------------------------------------------------------------