├── vlmeval ├── vlm │ ├── ovis │ │ ├── utils │ │ │ └── __init__.py │ │ └── __init__.py │ ├── ola │ │ ├── ola │ │ │ ├── datasets │ │ │ │ └── __init__.py │ │ │ ├── model │ │ │ │ ├── speech_encoder │ │ │ │ │ ├── beats │ │ │ │ │ │ └── __init__.py │ │ │ │ │ └── builder.py │ │ │ │ ├── __init__.py │ │ │ │ ├── speech_projector │ │ │ │ │ ├── builder.py │ │ │ │ │ └── speech_projector.py │ │ │ │ ├── multimodal_encoder │ │ │ │ │ └── builder.py │ │ │ │ ├── multimodal_resampler │ │ │ │ │ └── builder.py │ │ │ │ └── multimodal_projector │ │ │ │ │ └── pooler_projector.py │ │ │ ├── constants.py │ │ │ └── arguments.py │ │ └── __init__.py │ ├── ursa │ │ ├── __init__.py │ │ └── ursa_model │ │ │ ├── __init__.py │ │ │ └── processing_ursa.py │ ├── valley │ │ ├── __init__.py │ │ ├── valley_eagle │ │ │ ├── util │ │ │ │ ├── config.py │ │ │ │ └── vision_encoder_config.py │ │ │ ├── constants.py │ │ │ └── model │ │ │ │ ├── multimodal_encoder │ │ │ │ └── builder.py │ │ │ │ └── token_compressor │ │ │ │ ├── avgpool.py │ │ │ │ ├── builder.py │ │ │ │ └── roipool.py │ │ └── requirements_valley.txt │ ├── internvl │ │ └── __init__.py │ ├── qwen2_vl │ │ └── __init__.py │ ├── llava │ │ └── __init__.py │ ├── xcomposer │ │ └── __init__.py │ ├── video_llm │ │ ├── __init__.py │ │ ├── configs │ │ │ ├── llama_vid │ │ │ │ └── processor │ │ │ │ │ └── clip-patch14-224 │ │ │ │ │ └── preprocessor_config.json │ │ │ └── videochat2_hd.json │ │ └── video_chatgpt.py │ ├── misc │ │ ├── minigptv2_eval.yaml │ │ ├── minigpt4_13b_eval.yaml │ │ ├── minigpt4_7b_eval.yaml │ │ ├── blip2_instruct_vicuna7b.yaml │ │ └── blip2_instruct_vicuna13b.yaml │ ├── visualglm.py │ ├── falcon_vlm.py │ ├── chameleon.py │ ├── mixsense.py │ ├── phi4_multimodal.py │ ├── instructblip.py │ ├── pandagpt.py │ ├── pixtral.py │ ├── clip.py │ ├── wemm.py │ ├── qh_360vl.py │ ├── slime.py │ └── __init__.py ├── dataset │ ├── Omnidocbench │ │ ├── __init__.py │ │ └── requirements.txt │ ├── utils │ │ ├── megabench │ │ │ ├── __init__.py │ │ │ ├── parsing │ │ │ │ ├── dummy_parse.py │ │ │ │ └── json_parse.py │ │ │ ├── aggregation │ │ │ │ ├── unsupported_agg.py │ │ │ │ ├── min_agg.py │ │ │ │ └── mean_agg.py │ │ │ ├── scoring │ │ │ │ ├── unsupported_scoring.py │ │ │ │ ├── exact_str_match_case_insensitive.py │ │ │ │ ├── set_precision.py │ │ │ │ ├── normalized_similarity_damerau_levenshtein.py │ │ │ │ ├── longest_common_list_prefix_ratio.py │ │ │ │ ├── gleu.py │ │ │ │ ├── sacrebleu_bleu.py │ │ │ │ ├── nli_entailment.py │ │ │ │ ├── number_rel_diff_ratio.py │ │ │ │ ├── chess_jaccard.py │ │ │ │ ├── near_str_match.py │ │ │ │ ├── multi_ref_phrase.py │ │ │ │ ├── dict_exact_match_agg_recall.py │ │ │ │ ├── dict_nbbox_iou_tuple_agg_jaccard.py │ │ │ │ ├── simple_str_match.py │ │ │ │ ├── dict_set_equality_agg_jaccard.py │ │ │ │ ├── dict_jaccard_agg_jaccard.py │ │ │ │ ├── positive_int_match.py │ │ │ │ ├── xml_nbbox_iou.py │ │ │ │ ├── dict_equality.py │ │ │ │ ├── xml_norm_point_in_bbox.py │ │ │ │ ├── xml_norm_point_distance.py │ │ │ │ ├── exact_str_match.py │ │ │ │ ├── mse.py │ │ │ │ ├── sequence_equality.py │ │ │ │ ├── coordinate_sequence_match.py │ │ │ │ ├── jaccard.py │ │ │ │ └── set_equality.py │ │ │ ├── requirements.txt │ │ │ ├── aggregation_type.py │ │ │ ├── response_parse_type.py │ │ │ └── utils.py │ │ ├── __init__.py │ │ ├── ccocr_evaluator │ │ │ └── __init__.py │ │ ├── crpe.py │ │ ├── hrbench.py │ │ ├── judge_util.py │ │ ├── qbench_video.py │ │ └── longvideobench.py │ ├── autolaporo_maneuver_classification.py │ ├── jigsaws_gesture_classification.py │ ├── heichole_helpers.py │ ├── 
cholec80_phase_recognition.py │ ├── jigsaws_skill_assessment.py │ ├── endoscapes_cvs_assessment.py │ ├── cholec80_tool_recognition.py │ ├── dresden_anatomy_presence.py │ ├── emma.py │ ├── image_caption.py │ ├── avos_action_recognition.py │ └── mmgenbench.py ├── smp │ ├── __init__.py │ └── log.py ├── utils │ ├── __init__.py │ ├── matching_util.py │ └── mp_util.py ├── __init__.py └── api │ ├── __init__.py │ ├── reka.py │ ├── glm_vision.py │ └── qwen_api.py ├── config ├── model │ ├── CLIP.yaml │ ├── OpenCLIP.yaml │ ├── SurgVLP.yaml │ ├── InternVL2.yaml │ ├── Phi-3.5-Vision.yaml │ ├── PaliGemma.yaml │ ├── Qwen2-VL.yaml │ └── llava_next_vicuna_7b.yaml ├── task │ ├── cholect45_triplet_recognition.yaml │ ├── endoscapes_cvs_assessment_fewshot.yaml │ ├── heichole_action_recognition_fewshot.yaml │ ├── endoscapes_cvs_assessment.yaml │ ├── heichole_action_recognition.yaml │ ├── avos_action_recognition.yaml │ ├── cholec80_tool_recognition.yaml │ ├── heichole_tool_recognition.yaml │ ├── heichole_tool_recognition_fewshot.yaml │ ├── cholec80_phase_recognition.yaml │ ├── heichole_phase_recognition.yaml │ ├── heichole_phase_recognition_fewshot.yaml │ ├── dresden_anatomy_presence.yaml │ └── multibypass140_phase_recognition.yaml └── config.yaml ├── docs ├── en │ ├── docutils.conf │ ├── _static │ │ ├── js │ │ │ └── custom.js │ │ ├── css │ │ │ └── readthedocs.css │ │ └── image │ │ │ └── logo_icon.svg │ ├── _templates │ │ ├── autosummary │ │ │ └── class.rst │ │ ├── callable.rst │ │ └── 404.html │ ├── .readthedocs.yaml │ ├── Makefile │ ├── index.rst │ ├── EvalByLMDeploy.md │ └── Contributors.md └── zh-CN │ ├── docutils.conf │ ├── _static │ ├── js │ │ └── custom.js │ ├── css │ │ └── readthedocs.css │ └── image │ │ └── logo_icon.svg │ ├── cp_origin_docs.sh │ ├── _templates │ ├── autosummary │ │ └── class.rst │ ├── callable.rst │ └── 404.html │ ├── .readthedocs.yaml │ ├── Makefile │ ├── EvalByLMDeploy.md │ └── index.rst ├── assets └── apple.jpg ├── scripts ├── run.sh ├── cover.sh ├── srun.sh ├── auto_run.py └── apires_scan.py ├── requirements └── docs.txt ├── .github ├── workflows │ ├── lint.yml │ └── pr-run-test.yml └── scripts │ └── assert_score.py ├── requirements.txt ├── .pre-commit-config.yaml └── eval.py /vlmeval/vlm/ovis/utils/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /vlmeval/dataset/Omnidocbench/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /vlmeval/vlm/ola/ola/datasets/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /vlmeval/vlm/ola/__init__.py: -------------------------------------------------------------------------------- 1 | from .ola_model import Ola 2 | -------------------------------------------------------------------------------- /vlmeval/vlm/ola/ola/model/speech_encoder/beats/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /config/model/CLIP.yaml: -------------------------------------------------------------------------------- 1 | name: CLIP 2 | contrastive: True 3 | -------------------------------------------------------------------------------- 
/vlmeval/vlm/ursa/__init__.py: -------------------------------------------------------------------------------- 1 | from .ursa_chat import UrsaChat -------------------------------------------------------------------------------- /config/model/OpenCLIP.yaml: -------------------------------------------------------------------------------- 1 | name: OpenCLIP 2 | contrastive: True 3 | -------------------------------------------------------------------------------- /config/model/SurgVLP.yaml: -------------------------------------------------------------------------------- 1 | name: SurgVLP 2 | contrastive: True 3 | -------------------------------------------------------------------------------- /config/model/InternVL2.yaml: -------------------------------------------------------------------------------- 1 | name: InternVL2-8B 2 | contrastive: False 3 | -------------------------------------------------------------------------------- /docs/en/docutils.conf: -------------------------------------------------------------------------------- 1 | [html writers] 2 | table_style: colwidths-auto 3 | -------------------------------------------------------------------------------- /config/model/Phi-3.5-Vision.yaml: -------------------------------------------------------------------------------- 1 | name: Phi-3.5-Vision 2 | contrastive: False -------------------------------------------------------------------------------- /docs/zh-CN/docutils.conf: -------------------------------------------------------------------------------- 1 | [html writers] 2 | table_style: colwidths-auto 3 | -------------------------------------------------------------------------------- /config/model/PaliGemma.yaml: -------------------------------------------------------------------------------- 1 | name: paligemma-3b-mix-448 2 | contrastive: False 3 | -------------------------------------------------------------------------------- /config/model/Qwen2-VL.yaml: -------------------------------------------------------------------------------- 1 | name: Qwen2-VL-7B-Instruct 2 | contrastive: False 3 | -------------------------------------------------------------------------------- /vlmeval/vlm/valley/__init__.py: -------------------------------------------------------------------------------- 1 | from .valley_eagle_chat import ValleyEagleChat 2 | -------------------------------------------------------------------------------- /assets/apple.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/anitarau/SurgBenchKit/HEAD/assets/apple.jpg -------------------------------------------------------------------------------- /config/model/llava_next_vicuna_7b.yaml: -------------------------------------------------------------------------------- 1 | name: llava_next_vicuna_7b 2 | contrastive: False 3 | -------------------------------------------------------------------------------- /vlmeval/vlm/ola/ola/model/__init__.py: -------------------------------------------------------------------------------- 1 | from .language_model.ola_qwen import OlaQwenForCausalLM, OlaConfigQwen -------------------------------------------------------------------------------- /vlmeval/vlm/internvl/__init__.py: -------------------------------------------------------------------------------- 1 | from .internvl_chat import InternVLChat 2 | 3 | __all__ = ['InternVLChat'] 4 | -------------------------------------------------------------------------------- /vlmeval/vlm/qwen2_vl/__init__.py: 
-------------------------------------------------------------------------------- 1 | from .model import Qwen2VLChat 2 | from .prompt import Qwen2VLPromptMixin 3 | -------------------------------------------------------------------------------- /vlmeval/smp/__init__.py: -------------------------------------------------------------------------------- 1 | from .file import * 2 | from .vlm import * 3 | from .misc import * 4 | from .log import * 5 | -------------------------------------------------------------------------------- /scripts/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | set -x 3 | export GPU=$(nvidia-smi --list-gpus | wc -l) 4 | torchrun --nproc-per-node=$GPU run.py ${@:1} -------------------------------------------------------------------------------- /vlmeval/vlm/ovis/__init__.py: -------------------------------------------------------------------------------- 1 | from .ovis import Ovis, Ovis1_6, Ovis1_6_Plus, Ovis2 2 | 3 | __all__ = ['Ovis', 'Ovis1_6', 'Ovis1_6_Plus', 'Ovis2'] -------------------------------------------------------------------------------- /scripts/cover.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) 3 | cp $DIR/../config.py $DIR/../vlmeval/ 4 | cp $DIR/../misc/* $DIR/../vlmeval/vlm/misc/ -------------------------------------------------------------------------------- /scripts/srun.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | set -x 3 | srun -n1 --ntasks-per-node=1 --partition $1 --gres=gpu:8 --quotatype=reserved --job-name vlmeval --cpus-per-task=64 torchrun --nproc-per-node=8 run.py ${@:2} -------------------------------------------------------------------------------- /vlmeval/dataset/Omnidocbench/requirements.txt: -------------------------------------------------------------------------------- 1 | torchvision 2 | Levenshtein 3 | BeautifulSoup4 4 | pylatexenc 5 | scipy 6 | evaluate 7 | apted 8 | lxml 9 | func_timeout 10 | accelerate>=0.26.0 11 | jmespath 12 | qwen_vl_utils 13 | nltk -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/__init__.py: -------------------------------------------------------------------------------- 1 | from .aggregation_type import AggregationType 2 | from .metric_type import MetricType 3 | from .response_parse_type import ResponseParseType 4 | 5 | __all__ = [AggregationType, MetricType, ResponseParseType] 6 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/parsing/dummy_parse.py: -------------------------------------------------------------------------------- 1 | class DummyParse: 2 | 3 | @staticmethod 4 | def parse(response: str, *args, **kwargs) -> dict: 5 | """return the raw string without doing anything""" 6 | return response.strip() 7 | -------------------------------------------------------------------------------- /vlmeval/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .matching_util import can_infer, can_infer_option, can_infer_text 2 | from .mp_util import track_progress_rich 3 | 4 | 5 | __all__ = [ 6 | 'can_infer', 'can_infer_option', 'can_infer_text', 'track_progress_rich', 7 | ] 8 | -------------------------------------------------------------------------------- 
/config/task/cholect45_triplet_recognition.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: cholect45_triplet_recognition 3 | data: Cholect45Triplet 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/CholecT45/ 7 | clip_eval_mode: 'sigmoid' # multi-label binary classification 8 | -------------------------------------------------------------------------------- /config/task/endoscapes_cvs_assessment_fewshot.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: endoscapes_cvs_assessment_fewshot 3 | data: EndoscapesCVSAssessment 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/endoscapes/ 7 | label_names: ['C1', 'C2', 'C3'] 8 | shots: five 9 | -------------------------------------------------------------------------------- /docs/en/_static/js/custom.js: -------------------------------------------------------------------------------- 1 | var collapsedSections = []; 2 | 3 | $(document).ready(function () { 4 | $('.model-summary').DataTable({ 5 | "stateSave": false, 6 | "lengthChange": false, 7 | "pageLength": 20, 8 | "order": [] 9 | }); 10 | }); 11 | -------------------------------------------------------------------------------- /config/task/heichole_action_recognition_fewshot.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: heichole_action_recognition_fewshot 3 | data: HeiCholeDataloader 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/heichole 7 | label_names: ['grasp', 'hold', 'cut', 'clip'] 8 | shots: one 9 | -------------------------------------------------------------------------------- /docs/zh-CN/_static/js/custom.js: -------------------------------------------------------------------------------- 1 | var collapsedSections = []; 2 | 3 | $(document).ready(function () { 4 | $('.model-summary').DataTable({ 5 | "stateSave": false, 6 | "lengthChange": false, 7 | "pageLength": 20, 8 | "order": [] 9 | }); 10 | }); 11 | -------------------------------------------------------------------------------- /vlmeval/vlm/llava/__init__.py: -------------------------------------------------------------------------------- 1 | from .llava import LLaVA, LLaVA_Next, LLaVA_Next2, LLaVA_OneVision, LLaVA_OneVision_HF 2 | from .llava_xtuner import LLaVA_XTuner 3 | 4 | __all__ = ['LLaVA', 'LLaVA_Next', 'LLaVA_XTuner', 'LLaVA_Next2', 'LLaVA_OneVision', 'LLaVA_OneVision_HF'] 5 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/aggregation/unsupported_agg.py: -------------------------------------------------------------------------------- 1 | from numbers import Number 2 | from typing import Dict 3 | 4 | 5 | class UnsupportedAggregation: 6 | @staticmethod 7 | def aggregate(scores: Dict[str, Number], weights: Dict[str, Number]) -> Number: 8 | return -1 9 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/scoring/unsupported_scoring.py: -------------------------------------------------------------------------------- 1 | class UnsupportedScoring: 2 | """Unsupported scoring.""" 3 | 4 | @staticmethod 5 | def match(response: str, correct_answer: str) -> int: 6 | """Default response for unimplemented metrics.""" 7 | return -1 8 | -------------------------------------------------------------------------------- /docs/zh-CN/cp_origin_docs.sh: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # Copy *.md files from docs/ if it doesn't have a Chinese translation 4 | 5 | for filename in $(find ../en/ -name '*.md' -printf "%P\n"); 6 | do 7 | mkdir -p $(dirname $filename) 8 | cp -n ../en/$filename ./$filename 9 | done 10 | -------------------------------------------------------------------------------- /requirements/docs.txt: -------------------------------------------------------------------------------- 1 | docutils==0.18.1 2 | modelindex 3 | myst-parser 4 | -e git+https://github.com/open-compass/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme 5 | sphinx==6.1.3 6 | sphinx-copybutton 7 | sphinx-design 8 | sphinx-notfound-page 9 | sphinx-tabs 10 | sphinxcontrib-jquery 11 | tabulate 12 | -------------------------------------------------------------------------------- /config/task/endoscapes_cvs_assessment.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: endoscapes_cvs_assessment 3 | data: EndoscapesCVSAssessment 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/endoscapes/ 7 | label_names: ['C1', 'C2', 'C3'] 8 | clip_eval_mode: 'sigmoid' # multi-label binary classification 9 | shots: zero 10 | -------------------------------------------------------------------------------- /vlmeval/__init__.py: -------------------------------------------------------------------------------- 1 | try: 2 | import torch 3 | except ImportError: 4 | pass 5 | 6 | from .smp import * 7 | from .api import * 8 | from .dataset import * 9 | from .utils import * 10 | from .vlm import * 11 | from .config import * 12 | from .tools import cli 13 | 14 | load_env() 15 | 16 | __version__ = '0.2rc1' 17 | -------------------------------------------------------------------------------- /config/task/heichole_action_recognition.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: heichole_action_recognition 3 | data: HeiCholeDataloader 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/heichole 7 | label_names: ['grasp', 'hold', 'cut', 'clip'] 8 | clip_eval_mode: 'sigmoid' # multi-label binary classification 9 | shots: zero 10 | -------------------------------------------------------------------------------- /vlmeval/vlm/xcomposer/__init__.py: -------------------------------------------------------------------------------- 1 | from .sharecaptioner import ShareCaptioner 2 | from .xcomposer import XComposer 3 | from .xcomposer2 import XComposer2 4 | from .xcomposer2_4KHD import XComposer2_4KHD 5 | from .xcomposer2d5 import XComposer2d5 6 | 7 | __all__ = ['ShareCaptioner', 'XComposer', 'XComposer2', 'XComposer2_4KHD', 'XComposer2d5'] 8 | -------------------------------------------------------------------------------- /docs/en/_templates/autosummary/class.rst: -------------------------------------------------------------------------------- 1 | .. role:: hidden 2 | :class: hidden-section 3 | .. currentmodule:: {{ module }} 4 | 5 | 6 | {{ name | underline}} 7 | 8 | .. autoclass:: {{ name }} 9 | :members: 10 | 11 | .. 12 | autogenerated from _templates/autosummary/class.rst 13 | note it does not have :inherited-members: 14 | -------------------------------------------------------------------------------- /docs/zh-CN/_templates/autosummary/class.rst: -------------------------------------------------------------------------------- 1 | .. role:: hidden 2 | :class: hidden-section 3 | .. 
currentmodule:: {{ module }} 4 | 5 | 6 | {{ name | underline}} 7 | 8 | .. autoclass:: {{ name }} 9 | :members: 10 | 11 | .. 12 | autogenerated from _templates/autosummary/class.rst 13 | note it does not have :inherited-members: 14 | -------------------------------------------------------------------------------- /config/task/avos_action_recognition.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: avos_action_recognition 3 | data: AVOSActionRecognition 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/AVOS 7 | label_names: ['cutting', 'tying', 'suturing', 'background'] 8 | clip_eval_mode: 'singlelabel' # single label multi-class classification 9 | shots: zero 10 | -------------------------------------------------------------------------------- /docs/en/.readthedocs.yaml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | # Set the version of Python and other tools you might need 4 | build: 5 | os: ubuntu-22.04 6 | tools: 7 | python: "3.8" 8 | 9 | formats: 10 | - epub 11 | 12 | sphinx: 13 | configuration: docs/en/conf.py 14 | 15 | python: 16 | install: 17 | - requirements: requirements/docs.txt 18 | -------------------------------------------------------------------------------- /docs/en/_templates/callable.rst: -------------------------------------------------------------------------------- 1 | .. role:: hidden 2 | :class: hidden-section 3 | .. currentmodule:: {{ module }} 4 | 5 | 6 | {{ name | underline}} 7 | 8 | .. autoclass:: {{ name }} 9 | :members: 10 | :special-members: __call__ 11 | 12 | .. 13 | autogenerated from _templates/callable.rst 14 | note it does not have :inherited-members: 15 | -------------------------------------------------------------------------------- /docs/zh-CN/.readthedocs.yaml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | # Set the version of Python and other tools you might need 4 | build: 5 | os: ubuntu-22.04 6 | tools: 7 | python: "3.8" 8 | 9 | formats: 10 | - epub 11 | 12 | sphinx: 13 | configuration: docs/zh-CN/conf.py 14 | 15 | python: 16 | install: 17 | - requirements: requirements/docs.txt 18 | -------------------------------------------------------------------------------- /docs/zh-CN/_templates/callable.rst: -------------------------------------------------------------------------------- 1 | .. role:: hidden 2 | :class: hidden-section 3 | .. currentmodule:: {{ module }} 4 | 5 | 6 | {{ name | underline}} 7 | 8 | .. autoclass:: {{ name }} 9 | :members: 10 | :special-members: __call__ 11 | 12 | .. 
13 | autogenerated from _templates/callable.rst 14 | note it does not have :inherited-members: 15 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .judge_util import build_judge, DEBUG_MESSAGE 2 | from .multiple_choice import extract_answer_from_item, prefetch_answer 3 | from .vqa_eval import levenshtein_distance 4 | 5 | 6 | __all__ = [ 7 | 'build_judge', 'extract_answer_from_item', 'prefetch_answer', 8 | 'levenshtein_distance', 'DEBUG_MESSAGE', 9 | ] 10 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/requirements.txt: -------------------------------------------------------------------------------- 1 | antlr4-python3-runtime==4.11.0 2 | filelock==3.16.1 3 | geopy==2.4.1 4 | jieba==0.42.1 5 | nltk==3.9.1 6 | numpy==1.26.4 7 | pronouncing==0.2.0 8 | rapidfuzz==3.9.5 9 | regex==2024.7.24 10 | requests==2.32.3 11 | requests_cache==1.2.1 12 | sacrebleu==2.4.3 13 | sympy==1.13.2 14 | tqdm==4.66.4 15 | Unidecode==1.3.8 16 | -------------------------------------------------------------------------------- /config/task/cholec80_tool_recognition.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: cholec80_tool_recognition 3 | data: Cholec80ToolRecognition 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/cholec80 7 | label_names: ['Grasper', 'Bipolar', 'Hook', 'Scissors', 'Clipper', 'Irrigator', 'SpecimenBag'] 8 | clip_eval_mode: 'sigmoid' # multi-label binary classification 9 | shots: zero 10 | -------------------------------------------------------------------------------- /vlmeval/vlm/video_llm/__init__.py: -------------------------------------------------------------------------------- 1 | from .video_llava import VideoLLaVA, VideoLLaVA_HF 2 | from .videochat2 import VideoChat2_HD 3 | from .chat_uni_vi import Chatunivi 4 | from .video_chatgpt import VideoChatGPT 5 | from .llama_vid import LLaMAVID 6 | from .pllava import PLLaVA 7 | 8 | __all__ = ['VideoLLaVA', 'VideoLLaVA_HF', 'Chatunivi', 'VideoChatGPT', 'LLaMAVID', 'VideoChat2_HD', 'PLLaVA'] 9 | -------------------------------------------------------------------------------- /config/task/heichole_tool_recognition.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: heichole_tool_recognition 3 | data: HeiCholeDataloader 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/heichole 7 | label_names: ['Grasper', 'Clipper', 'Coagulation instruments', 'Scissors', 'Suction-irrigation', 'Specimen bag', 'Stapler'] 8 | clip_eval_mode: 'sigmoid' # multi-label binary classification 9 | shots: zero 10 | -------------------------------------------------------------------------------- /vlmeval/vlm/ola/ola/model/speech_projector/builder.py: -------------------------------------------------------------------------------- 1 | from .speech_projector import EncoderProjectorConcat 2 | 3 | 4 | def build_speech_projector(config): 5 | projector_type = getattr(config, 'speech_projector_type', 'linear') 6 | if projector_type == 'linear': 7 | return EncoderProjectorConcat(config) 8 | 9 | raise ValueError(f'Unknown projector type: {projector_type}') 10 | -------------------------------------------------------------------------------- /config/task/heichole_tool_recognition_fewshot.yaml: 
-------------------------------------------------------------------------------- 1 | # Task 2 | name: heichole_tool_recognition_fewshot 3 | data: HeiCholeDataloader 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/heichole 7 | label_names: ['Grasper', 'Clipper', 'Coagulation instruments', 'Scissors', 'Suction-irrigation', 'Specimen bag', 'Stapler'] 8 | clip_eval_mode: 'sigmoid' # multi-label binary classification 9 | shots: one 10 | -------------------------------------------------------------------------------- /config/task/cholec80_phase_recognition.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: cholec80_phase_recognition 3 | data: Cholec80PhaseRecognition 4 | data_config: 5 | data_dir: /path/to/cholec80 6 | label_names: ['Preparation', 'CalotTriangleDissection', 'ClippingCutting', 'GallbladderDissection', 'GallbladderPackaging', 'CleaningCoagulation', 'GallbladderRetraction'] 7 | clip_eval_mode: 'singlelabel' # single label multi-class classification 8 | shots: zero -------------------------------------------------------------------------------- /vlmeval/vlm/ola/ola/constants.py: -------------------------------------------------------------------------------- 1 | CONTROLLER_HEART_BEAT_EXPIRATION = 30 2 | WORKER_HEART_BEAT_INTERVAL = 15 3 | 4 | LOGDIR = "." 5 | 6 | # Model Constants 7 | IGNORE_INDEX = -100 8 | SPEECH_TOKEN_INDEX = -200 9 | DEFAULT_SPEECH_TOKEN = "" 10 | IMAGE_TOKEN_INDEX= -300 11 | DEFAULT_IMAGE_TOKEN = "" 12 | DEFAULT_IMAGE_PATCH_TOKEN = "" 13 | DEFAULT_IM_START_TOKEN = "" 14 | DEFAULT_IM_END_TOKEN = "" -------------------------------------------------------------------------------- /config/task/heichole_phase_recognition.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: heichole_phase_recognition 3 | data: HeiCholeDataloader 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/heichole 7 | label_names: ['Preparation', 'CalotTriangleDissection', 'ClippingCutting', 'GallbladderDissection', 'GallbladderPackaging', 'CleaningCoagulation', 'GallbladderRetraction'] 8 | clip_eval_mode: 'singlelabel' # single label multi-class classification 9 | shots: zero 10 | -------------------------------------------------------------------------------- /config/task/heichole_phase_recognition_fewshot.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: heichole_phase_recognition_fewshot 3 | data: HeiCholeDataloader 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/heichole 7 | label_names: ['Preparation', 'CalotTriangleDissection', 'ClippingCutting', 'GallbladderDissection', 'GallbladderPackaging', 'CleaningCoagulation', 'GallbladderRetraction'] 8 | clip_eval_mode: 'singlelabel' # single label multi-class classification 9 | shots: one 10 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/ccocr_evaluator/__init__.py: -------------------------------------------------------------------------------- 1 | from .kie_evaluator import KieEvaluator 2 | from .doc_parsing_evaluator import ParsingEvaluator 3 | from .ocr_evaluator import OcrEvaluator 4 | from .common import summary 5 | 6 | 7 | evaluator_map_info = { 8 | "kie": KieEvaluator("kie"), 9 | "doc_parsing": ParsingEvaluator("doc_parsing"), 10 | "multi_lan_ocr": OcrEvaluator("multi_lan_ocr"), 11 | "multi_scene_ocr": OcrEvaluator("multi_scene_ocr") 12 | } 13 | 
-------------------------------------------------------------------------------- /config/task/dresden_anatomy_presence.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: dresden_anatomy_presence 3 | data: DresdenAnatomyPresence 4 | data_config: 5 | transform: None 6 | data_dir: /path/to//DresdenSurgicalAnatomy 7 | label_names: ['abdominal wall', 'colon', 'inferior mesenteric artery', 'intestinal veins', 'liver', 'pancreas', 8 | 'small intestine', 'spleen', 'stomach', 'ureter', 'null', 'vesicular glands'] 9 | clip_eval_mode: 'sigmoid' # multi-label binary classification 10 | -------------------------------------------------------------------------------- /vlmeval/vlm/video_llm/configs/llama_vid/processor/clip-patch14-224/preprocessor_config.json: -------------------------------------------------------------------------------- 1 | { 2 | "crop_size": 224, 3 | "do_center_crop": true, 4 | "do_normalize": true, 5 | "do_resize": true, 6 | "feature_extractor_type": "CLIPFeatureExtractor", 7 | "image_mean": [ 8 | 0.48145466, 9 | 0.4578275, 10 | 0.40821073 11 | ], 12 | "image_std": [ 13 | 0.26862954, 14 | 0.26130258, 15 | 0.27577711 16 | ], 17 | "resample": 3, 18 | "size": 224 19 | } 20 | -------------------------------------------------------------------------------- /config/config.yaml: -------------------------------------------------------------------------------- 1 | defaults: 2 | - _self_ 3 | - model: GeminiPro1-5 4 | - task: dresden_anatomy_presence 5 | - override hydra/hydra_logging: disabled 6 | - override hydra/job_logging: disabled 7 | 8 | # Set model and task 9 | workdir: /pasteur/u/arau/projects/surg_bench/check_for_pub/ 10 | exp_name: deleteme_for_pub 11 | eval_mode: infer_data 12 | 13 | override_outputs: False 14 | 15 | #prevent hydra outputs 16 | hydra: 17 | output_subdir: null 18 | run: 19 | dir: . 
-------------------------------------------------------------------------------- /vlmeval/dataset/utils/crpe.py: -------------------------------------------------------------------------------- 1 | import json 2 | import argparse 3 | from collections import defaultdict 4 | 5 | 6 | def is_correct(predict, answer): 7 | # predict is the ground-truth answer; answer is the prediction 8 | if len(answer) == 1: 9 | return answer[0] == predict[0] 10 | elif len(answer) != 1 and answer[0] in ['A', 'B', 'C', 'D']: 11 | return answer[0] == predict[0] 12 | elif len(answer) != 1 and answer[0] not in ['A', 'B', 'C', 'D']: 13 | return predict[4:].lower() in answer.lower() 14 | -------------------------------------------------------------------------------- /vlmeval/vlm/ola/ola/model/speech_encoder/builder.py: -------------------------------------------------------------------------------- 1 | from .speech_encoder import WhisperWrappedEncoder, DualWrappedEncoder 2 | 3 | 4 | def build_speech_encoder(config): 5 | speech_encoder_type = getattr(config, 'speech_encoder_type', None) 6 | if "whisper" in speech_encoder_type.lower(): 7 | return WhisperWrappedEncoder.load(config) 8 | elif "dual" in speech_encoder_type.lower(): 9 | return DualWrappedEncoder(config) 10 | 11 | raise ValueError(f'Unknown speech encoder: {speech_encoder_type}') 12 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/aggregation/min_agg.py: -------------------------------------------------------------------------------- 1 | from numbers import Number 2 | from typing import Dict 3 | 4 | 5 | class MinAggregation: 6 | """Take the minimum of all valid scores.""" 7 | 8 | @staticmethod 9 | def aggregate(scores: Dict[str, Number], weights: Dict[str, Number]) -> Number: 10 | """Exact match between targets and responses.""" 11 | filtered_scores = [s for s in scores.values() if s >= 0] 12 | if not filtered_scores: 13 | return -1 14 | return min(filtered_scores) 15 | -------------------------------------------------------------------------------- /vlmeval/vlm/ola/ola/model/multimodal_encoder/builder.py: -------------------------------------------------------------------------------- 1 | import os 2 | from .oryx_vit import SigLIPViTAnysizeWrapper 3 | 4 | def build_vision_tower(vision_tower_cfg, **kwargs): 5 | vision_tower = getattr(vision_tower_cfg, 'vision_tower', getattr(vision_tower_cfg, 'mm_vision_tower', None)) 6 | is_absolute_path_exists = os.path.exists(vision_tower) 7 | print(f"Building OryxViTWrapper from {vision_tower}...") 8 | # path = vision_tower.split(":")[1] 9 | return SigLIPViTAnysizeWrapper(vision_tower, path=vision_tower, args=vision_tower_cfg, **kwargs) -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/scoring/exact_str_match_case_insensitive.py: -------------------------------------------------------------------------------- 1 | from .exact_str_match import ExactStrMatch 2 | 3 | 4 | class ExactStrMatchCaseInsensitive: 5 | """Case-insensitive exact string matching.""" 6 | 7 | @staticmethod 8 | def match(response, correct_answer) -> int: 9 | """Case-insensitive exact match between targets and responses.""" 10 | if not isinstance(response, str) and isinstance(correct_answer, str): 11 | return 0 12 | return ExactStrMatch.match(response.lower(), correct_answer.lower()) 13 | -------------------------------------------------------------------------------- /docs/en/_templates/404.html: 
-------------------------------------------------------------------------------- 1 | {% extends "layout.html" %} 2 | 3 | {% block body %} 4 | 5 | Page Not Found 6 | 7 | The page you are looking for cannot be found. 8 | 9 | 10 | If you just switched documentation versions, it is likely that the page you were on is moved. You can look for it in 11 | the content table left, or go to the homepage. 12 |
13 | 17 | 18 | {% endblock %} 19 | -------------------------------------------------------------------------------- /docs/zh-CN/_templates/404.html: -------------------------------------------------------------------------------- 1 | {% extends "layout.html" %} 2 | 3 | {% block body %} 4 | 5 | Page Not Found 6 | 7 | The page you are looking for cannot be found. 8 | 9 | 10 | If you just switched documentation versions, it is likely that the page you were on is moved. You can look for it in 11 | the content table left, or go to the homepage. 12 | 13 | 17 | 18 | {% endblock %} 19 | -------------------------------------------------------------------------------- /config/task/multibypass140_phase_recognition.yaml: -------------------------------------------------------------------------------- 1 | # Task 2 | name: multibypass140_phase_recognition 3 | data: MultiBypass140PhaseRecognition 4 | data_config: 5 | transform: None 6 | data_dir: /path/to/MultiBypass140/ 7 | label_names: ['Preparation', 'Gastric pouch creation', 'Omentum division', 'Gastrojejunal anastomosis', 'Anastomosis test', 'Jejunal separation', 'Petersen space closure', 'Jejunojejunal anastomosis', 'Mesenteric defect closure', 'Cleaning & Coagulation', 'Disassembling', 'Other intervention'] 8 | clip_eval_mode: 'singlelabel' # single label multi-class classification 9 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/scoring/set_precision.py: -------------------------------------------------------------------------------- 1 | from .common.conversions import cast_to_set 2 | from .common.metrics import set_precision 3 | 4 | 5 | class SetPrecision: 6 | """Calculates the set precision for iterables.""" 7 | 8 | @classmethod 9 | def match(cls, responses, targets) -> float: 10 | """Exact match between targets and responses.""" 11 | if responses is None: 12 | return 0 13 | responses = cast_to_set(responses) 14 | targets = cast_to_set(targets) 15 | 16 | return set_precision(responses, targets) 17 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/scoring/normalized_similarity_damerau_levenshtein.py: -------------------------------------------------------------------------------- 1 | import rapidfuzz 2 | 3 | 4 | class NormalizedSimilarityDamerauLevenshtein: 5 | """Normalized Damerau-Levenshtein Similarity.""" 6 | 7 | @staticmethod 8 | def match(response, correct_answer) -> int: 9 | """Normalized indel similarity between targets and responses.""" 10 | if not isinstance(response, str) and isinstance(correct_answer, str): 11 | return 0 12 | return rapidfuzz.distance.DamerauLevenshtein.normalized_similarity( 13 | response, correct_answer 14 | ) 15 | -------------------------------------------------------------------------------- /.github/workflows/lint.yml: -------------------------------------------------------------------------------- 1 | name: lint 2 | 3 | on: [push, pull_request] 4 | 5 | concurrency: 6 | group: ${{ github.workflow }}-${{ github.ref }} 7 | cancel-in-progress: true 8 | 9 | jobs: 10 | lint: 11 | runs-on: ubuntu-latest 12 | steps: 13 | - uses: actions/checkout@v2 14 | - name: Set up Python 3.10 15 | uses: actions/setup-python@v2 16 | with: 17 | python-version: 3.10.15 18 | - name: Install pre-commit hook 19 | run: | 20 | pip install pre-commit 21 | pre-commit install 22 | - name: Linting 23 | run: pre-commit run --all-files 24 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/scoring/longest_common_list_prefix_ratio.py: -------------------------------------------------------------------------------- 1 | from .common.conversions import str_to_list 2 | from .common.metrics import longest_common_prefix 3 | 4 | 5 | class LongestCommonListPrefixRatio: 6 | """Determines how much of the first part of the list 7 | was predicted correctly.
8 | """ 9 | 10 | @classmethod 11 | def match(cls, responses, targets) -> int: 12 | """Exact match between targets and responses.""" 13 | responses = str_to_list(responses) 14 | targets = str_to_list(targets) 15 | return len(longest_common_prefix(responses, targets)) / len(targets) 16 | -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/parsing/json_parse.py: -------------------------------------------------------------------------------- 1 | from .common.parsers import parse_json 2 | from .common.utils import evaluate_as_string 3 | 4 | 5 | class JsonParse: 6 | """Load the response as a JSON object.""" 7 | 8 | @staticmethod 9 | def parse(response: str): 10 | """Parse the JSON object, including nested JSON strings.""" 11 | parsed_res = parse_json(response) 12 | # Drop the potentially duplicated string quotes 13 | if isinstance(parsed_res, dict): 14 | for key, val in parsed_res.items(): 15 | parsed_res[key] = evaluate_as_string(val) 16 | 17 | return parsed_res 18 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | decord; platform_machine != 'arm64' 2 | eva-decord; platform_machine == 'arm64' 3 | gradio 4 | huggingface_hub 5 | imageio 6 | matplotlib 7 | numpy==1.26.4 8 | omegaconf 9 | openai 10 | opencv-python>=4.4.0.46 11 | openpyxl 12 | pandas 13 | pillow 14 | portalocker 15 | protobuf 16 | python-dotenv 17 | requests 18 | rich 19 | sentencepiece 20 | setuptools 21 | sty 22 | tabulate 23 | tiktoken 24 | timeout-decorator 25 | torch 26 | torchvision 27 | tqdm 28 | transformers 29 | typing_extensions 30 | validators 31 | xlsxwriter 32 | ftfy 33 | regex 34 | pandas 35 | decord 36 | scikit-learn 37 | hydra-core 38 | google-generativeai 39 | flash-attn==2.6.3 40 | qwen-vl-utils 41 | open_clip_torch 42 | clip -------------------------------------------------------------------------------- /vlmeval/dataset/utils/megabench/scoring/gleu.py: -------------------------------------------------------------------------------- 1 | from numbers import Number 2 | import jieba 3 | from nltk.translate.gleu_score import sentence_gleu 4 | 5 | 6 | class GLEUChinese: 7 | """Compute GLEU score for Chinese text.""" 8 | 9 | @staticmethod 10 | def match(response, correct_answer) -> Number: 11 | """Compute the BLEU scores between two strings.""" 12 | if isinstance(response, str) and isinstance(correct_answer, str): 13 | reference_tokens = list(jieba.cut_for_search(response)) 14 | translation_tokens = list(jieba.cut_for_search(correct_answer)) 15 | else: 16 | return 0 17 | return sentence_gleu([reference_tokens], translation_tokens) 18 | -------------------------------------------------------------------------------- /docs/en/Makefile: -------------------------------------------------------------------------------- 1 | # Minimal makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line, and also 5 | # from the environment for the first two. 6 | SPHINXOPTS ?= 7 | SPHINXBUILD ?= sphinx-build 8 | SOURCEDIR = . 9 | BUILDDIR = _build 10 | 11 | # Put it first so that "make" without argument is like "make help". 12 | help: 13 | @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 14 | 15 | .PHONY: help Makefile 16 | 17 | # Catch-all target: route all unknown targets to Sphinx using the new 18 | # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 
19 | %: Makefile 20 | @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 21 | -------------------------------------------------------------------------------- /docs/zh-CN/Makefile: -------------------------------------------------------------------------------- 1 | # Minimal makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line, and also 5 | # from the environment for the first two. 6 | SPHINXOPTS ?= 7 | SPHINXBUILD ?= sphinx-build 8 | SOURCEDIR = . 9 | BUILDDIR = _build 10 | 11 | # Put it first so that "make" without argument is like "make help". 12 | help: 13 | @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 14 | 15 | .PHONY: help Makefile 16 | 17 | # Catch-all target: route all unknown targets to Sphinx using the new 18 | # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 19 | %: Makefile 20 | @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) 21 | -------------------------------------------------------------------------------- /vlmeval/vlm/valley/valley_eagle/util/config.py: -------------------------------------------------------------------------------- 1 | IGNORE_INDEX = -100 2 | IMAGE_TOKEN_INDEX = -200 3 | GANDALF_TOKEN_INDEX = -300 4 | DEFAULT_PAD_TOKEN = "[PAD]" 5 | DEFAULT_EOS_TOKEN = "" 6 | DEFAULT_BOS_TOKEN = "" 7 | DEFAULT_UNK_TOKEN = "" 8 | DEFAULT_IMAGE_TOKEN = "" 9 | DEFAULT_IMAGE_PATCH_TOKEN = "" 10 | DEFAULT_IM_START_TOKEN = "" 11 | DEFAULT_IM_END_TOKEN = "" 12 | 13 | DEFAULT_VIDEO_TOKEN = "