10 |
11 |
12 |
13 |
14 | ## 👀 MMR-V Overview
15 | > The sequential structure of videos poses a challenge to the ability of multimodal large language models (MLLMs) to 🕵️ locate multi-frame evidence and conduct multimodal reasoning. However, existing video benchmarks mainly focus on understanding tasks, which only require models to match the frames mentioned in the question and perceive a few adjacent frames. To address this gap, we propose **MMR-V: A Benchmark for Multimodal Deep Reasoning in Videos**. Models like o3 and o4-mini have achieved impressive results on **"Think with Images"** tasks, which require models to 🕵️ mine evidence in images. Similarly, tasks in MMR-V require models to perform in-depth reasoning over visual information from different frames of a video, challenging their ability to 🕵️ mine evidence across long-range, multiple frames (**"Think with Video"**).
16 |
17 | ### 🌟 Highlights
18 | * *Long-range, multi-frame reasoning*: Models are required to infer and analyze evidence frames that may be far from the question frame.
19 |
20 | * *Beyond perception*: Questions cannot be answered through direct perception alone but require reasoning over hidden information.
21 |
22 | * *Reliability*: All tasks are **manually annotated**, informed by extensive real-world user interpretations to align with common human perception.
23 |
24 | * *Confusability*: Distractors are annotated with carefully designed strategies to reduce model shortcuts.
25 |
26 | MMR-V consists of **317** videos and **1,257** tasks. All videos and tasks have been manually reviewed to ensure quality and diversity, aiming to closely reflect real-world scenarios.
27 |
28 | ## 🎬 MMR-V Task Examples
29 |
30 |
31 |
32 |
33 |
34 | ---
35 |
36 | ## 🚀 Quick Start
37 |
38 | 1. Load the MMR-V Benchmark
39 |
40 | ```shell
41 | huggingface-cli download JokerJan/MMR-VBench --repo-type dataset --local-dir MMR-V --local-dir-use-symlinks False
42 | ```
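Alternatively, the same download can be done from Python with `huggingface_hub` (a minimal sketch; the repo id and target directory are taken from the CLI command above):

```python
# Sketch: programmatic download of the MMR-V dataset files.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="JokerJan/MMR-VBench",
    repo_type="dataset",
    local_dir="MMR-V",
)
```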
43 | 2. Extract videos from the `.tar` files:
44 |
45 | ```shell
46 | cat videos.tar.part.* > videos.tar
47 | tar -xvf videos.tar
48 | ```
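If shell tools are unavailable, the reassembly and extraction can also be sketched in Python (assuming the part files follow the `videos.tar.part.*` naming above):

```python
# Sketch: concatenate the tar parts and extract the archive.
import glob
import shutil
import tarfile

with open("videos.tar", "wb") as out:
    for part in sorted(glob.glob("videos.tar.part.*")):
        with open(part, "rb") as src:
            shutil.copyfileobj(src, out)

with tarfile.open("videos.tar") as tar:
    tar.extractall()
```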
49 |
50 | 3. Data Format
51 |
52 | All data in **MMR-V** are standardized to the following format:
53 | ```json
54 | {
55 | "video": "Level 1 to 100 Magic Tricks Anyone Can Do.mp4",
56 | "videoType": "TV",
57 | "question": "How does the man at the beginning of the video pick up and casually control the flame on the lighter?",
58 | "options": [
59 | "(A) He used a holographic projector to simulate the flame.",
60 | "(B) He used a special flame-retardant chemical on his hand to create the illusion.",
61 | "(C) He possessed an innate immunity to fire.",
62 | "(D) He practiced yoga meditation to withstand any flame heat.",
63 | "(E) A quick extinguishing spray was applied that halted the flame.",
64 | "(F) He surrounded the flame with an invisible film.",
65 | "(G) He mastered the art of fire manipulation.",
66 | "(H) The flame was made of non-flammable gas.",
67 | "(I) He applied a hidden cooling technology under his sleeve.",
68 | "(J) The flame was actually an LED light.",
69 | "(K) A hidden lighter in his hand, a sleight of hand trick."
70 | ],
71 | "correctAnswer": "(K)",
72 | "abilityType_L2": "Counterintuitive Reasoning",
73 | "abilityType_L3": "Magic Deconstruction",
74 | "question_idx": 20
75 | }
76 | ```
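For a quick sanity check, the annotation file can be loaded with plain `json`. This is a minimal sketch; the file name below is an assumption, so use whichever annotation JSON ships with the dataset:

```python
# Sketch: load MMR-V tasks and inspect one sample (field names as documented above).
import json

with open("MMR-V/MMR-V.json", encoding="utf-8") as f:  # hypothetical file name
    tasks = json.load(f)

sample = tasks[0]
print(sample["question"])
for option in sample["options"]:
    print(option)
print("answer:", sample["correctAnswer"])
```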
77 |
78 | 4. Evaluation Settings:
79 |
80 | Please place the extracted video files under `MMR-V/videos`.
81 |
82 | Other model inference details and implementation can be found in `utils/video_utils.py`.
84 |
85 | 5. Evaluation with script:
86 |
87 | ```shell
88 | python evaluation/server_evaluation_on_MMR.py \
89 | --model_name gemini-2.5-flash-preview-04-17 \
90 | --api_url https://XXX/v1/chat/completions \
91 | --api_key sk-XXX \
92 | --with_cot \
93 | --frame_count 32
94 | ```
95 | Please provide valid API information via the `--api_url` and `--api_key` arguments. For open-source models served by a local `vllm` server, set `--api_url` to the local server address and leave `--api_key` empty. If the `--with_cot` flag is specified, the evaluation uses *Chain-of-Thought (CoT) prompting*; otherwise, the model defaults to *directly* outputting the final answer.
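Once a run finishes, accuracy can be computed from the saved results. The sketch below assumes the output layout used by the evaluation scripts in this repo (a per-sample `correctAnswer` such as "(K)" and a per-model `final_answer` letter); the result path is illustrative:

```python
# Sketch: score a result file produced by the evaluation scripts.
import json

model_name = "gemini-2.5-flash-preview-04-17"        # example model name
result_file = f"results/{model_name}_on_MMR_V.json"  # assumed output path

with open(result_file, encoding="utf-8") as f:
    results = json.load(f)

correct = 0
for sample in results:
    pred = sample.get(f"{model_name}_response", {}).get("final_answer")
    gold = sample["correctAnswer"].strip("()")       # "(K)" -> "K"
    correct += int(pred == gold)

print(f"Accuracy: {correct / len(results):.3f} over {len(results)} tasks")
```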
96 |
97 | ---
98 | ## 📊 Leaderboard
99 | | Rank | Model | Overall | Implicit | Explicit | Art | Life | TV | Film | Film | Phi. |
100 | |---|---|---|---|---|---|---|---|---|---|---|
101 | | 🥇 | Human | 86.0 | 80.6 | 91.2 | 57.7 | 92.3 | 90.6 | 92.3 | 90.7 | 70.0 |
102 | | 🥈 | o4-mini | 52.5 | 54.6 | 46.0 | 40.1 | 54.0 | 54.0 | 51.7 | 65.3 | 27.9 |
103 | | 🥉 | Gemini-2.5-Flash | 51.2 | 52.9 | 46.9 | 45.3 | 39.5 | 50.3 | 47.9 | 65.6 | 34.9 |
104 |
105 | *Full leaderboard on [our homepage](https://mmr-v.github.io/).*
106 |
107 | *📢 The leaderboard is continuously updated as we welcome new submissions!*
108 |
109 | ---
110 |
111 |
112 |
113 | ## 🎯 Experiment Results
114 |
115 | ### Performance across Different Tasks
116 |
117 |
118 |
119 |
120 |
121 | ### Impact of Audio Input
122 |
123 |
124 |
125 |
126 |
127 |
128 | ### Error Analysis
129 |
130 |
131 |
132 |
133 | ---
134 |
135 | ## 🧠 Model Response Examples
136 |
137 | The figure below presents example responses with Multimodal Chain-of-Thought (MCoT) from two reasoning models to a sample task from MMR-V. (Gemini's response omits part of the option analysis.) In the visualization, *yellow tokens represent reasoning and analysis based on textual information (e.g., the question and answer options), while green tokens indicate the model’s analysis of visual content from the video (including the question frame and evidence frames)*. It can be observed that **o4-mini** engages in deeper reasoning and analysis of the **video content**, ultimately arriving at the correct answer. In contrast, Gemini exhibits a more text-dominated reasoning strategy. This example highlights how MMR-V places greater emphasis on a model’s ability to incorporate visual information into the reasoning process and to mine multimodal cues effectively.
138 |
139 |
140 |
141 | The full video corresponding to this example can be found here: https://www.youtube.com/watch?v=g1NuAfkQ-Hw.
142 |
143 | ## 📜 Citation
144 |
145 | If **MMR-V** provides any inspiration or assistance to your research, please consider citing the following article and giving us a star ⭐.
146 |
147 | ```bibtex
148 | @misc{zhu2025mmrvwhatsleftunsaid,
149 | title={MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos},
150 | author={Kejian Zhu and Zhuoran Jin and Hongbang Yuan and Jiachun Li and Shangqing Tu and Pengfei Cao and Yubo Chen and Kang Liu and Jun Zhao},
151 | year={2025},
152 | eprint={2506.04141},
153 | archivePrefix={arXiv},
154 | primaryClass={cs.CV},
155 | url={https://arxiv.org/abs/2506.04141},
156 | }
157 | ```
158 |
159 | ---
160 |
--------------------------------------------------------------------------------
/annotation/To_gen/videos_w_qa_3_12.json:
--------------------------------------------------------------------------------
1 | [
2 |
3 | ]
--------------------------------------------------------------------------------
/annotation/merge_videos.py:
--------------------------------------------------------------------------------
1 | import json
2 | from collections import defaultdict
3 |
4 | def merge_json(input_file, output_file):
5 | with open(input_file, 'r', encoding='utf-8') as f:
6 | data = json.load(f)
7 |
8 | merged_data = defaultdict(lambda: {"video": "", "videoType": "", "remark": "", "questions": []})
9 |
10 | for item in data:
11 | video = item["video"]
12 | if not merged_data[video]["video"]:
13 | merged_data[video]["video"] = video
14 | merged_data[video]["videoType"] = item["videoType"]
15 | merged_data[video]["remark"] = item["remark"]
16 |
17 | question_entry = {
18 | "question": item["question"],
19 | "options": item["options"],
20 | "correctAnswer": item["correctAnswer"],
21 | "abilityType_L2": item["abilityType_L2"],
22 | "abilityType_L3": item["abilityType_L3"]
23 | }
24 | merged_data[video]["questions"].append(question_entry)
25 |
26 | result = list(merged_data.values())
27 |
28 | with open(output_file, 'w', encoding='utf-8') as f:
29 | json.dump(result, f, indent=4, ensure_ascii=False)
30 |
31 | print(f"Merged data saved to {output_file}")
32 |
33 | # Call the function; replace 'input.json' and 'output.json' with your actual file paths
34 | merge_json('/netdisk/zhukejian/implicit_video_anonotations/annotation/annotation_part2.json', '/netdisk/zhukejian/implicit_video_anonotations/annotation/annotation_part2_output.json')
35 |
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | from flask import Flask, send_from_directory, render_template, request, jsonify
2 | import json
3 | import os
4 | import requests
5 | import subprocess
6 |
7 |
8 | app = Flask(__name__, static_folder='./static')
9 | VIDEO_FOLDER = './static/videos'
10 | ANNOTATION_FILE = './annotation.json'
11 |
12 | @app.route('/videos/')
13 | def serve_video(filename):
14 | return send_from_directory(app.static_folder + '/videos', filename, mimetype='video/mp4')
15 |
16 |
17 |
18 | def load_annotations():
19 | """加载现有的标注数据"""
20 | if os.path.exists(ANNOTATION_FILE):
21 | try:
22 | with open(ANNOTATION_FILE, 'r', encoding='utf-8') as file:
23 | content = file.read().strip()
24 | if not content: # 文件为空
25 | return []
26 | return json.loads(content)
27 | except (json.JSONDecodeError, IOError) as e:
28 | print(f"Error loading annotations: {e}")
29 | return []
30 | return []
31 |
32 |
33 | def save_annotations(data):
34 | """保存标注数据到文件"""
35 | try:
36 | with open(ANNOTATION_FILE, 'w', encoding='utf-8') as file:
37 | json.dump(data, file, ensure_ascii=False, indent=4)
38 | except IOError as e:
39 | print(f"Error saving annotations: {e}")
40 |
41 |
42 | @app.route('/')
43 | def index():
44 | all_files = os.listdir(VIDEO_FOLDER)
45 |     # Keep only .mp4 files
46 |     mp4_files = [f for f in all_files if f.endswith('.mp4')]
47 |     annotations = load_annotations()  # load existing annotations
48 | return render_template('index.html', videos=mp4_files, annotations=annotations)
49 |
58 |
59 |
60 | @app.route('/save', methods=['POST'])
61 | def save():
62 | try:
63 |         # Get the data submitted by the front end
64 | data = request.get_json()
65 | video = data.get('video')
66 | remark = data.get('remark', '')
67 | video_type = data.get('videoType', 'other')
68 | questions = data.get('questions', [])
69 |
70 |         # Validate the input
71 |         if not video:
72 |             return jsonify({'message': 'Video name must not be empty!'}), 400
73 |
74 |         if not isinstance(questions, list):
75 |             return jsonify({'message': 'Invalid question list format!'}), 400
76 |
77 |         # Load existing annotations
78 |         annotations = load_annotations()
79 |
80 |         # Update an existing entry or append a new one
81 | updated = False
82 | for entry in annotations:
83 | if entry['video'] == video:
84 | entry['remark'] = remark
85 | entry['videoType'] = video_type
86 | entry['questions'] = questions
87 | updated = True
88 | break
89 |
90 | if not updated:
91 | annotations.append({
92 | 'video': video,
93 | 'remark': remark,
94 | 'videoType': video_type,
95 | 'questions': questions
96 | })
97 |
98 |         # Persist the annotations to file
99 |         save_annotations(annotations)
100 |         return jsonify({'message': 'Annotation saved successfully!'}), 200
101 |
102 |     except Exception as e:
103 |         print(f"Error saving annotation: {e}")
104 |         return jsonify({'message': f'Save failed: {str(e)}'}), 500
105 |
106 |
107 | def download_video(url, output_folder):
108 | """
109 |     Download a video and save it as MP4.
110 | """
111 | if not os.path.exists(output_folder):
112 | os.makedirs(output_folder)
113 |
114 | output_template = os.path.join(output_folder, "%(title)s.%(ext)s")
115 | command = [
116 | "yt-dlp",
117 | "-f", "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]",
118 | "--merge-output-format", "mp4",
119 | "-o", output_template,
120 | url
121 | ]
122 |
123 | try:
124 | subprocess.run(command, check=True)
125 |         return True, "Video downloaded successfully."
126 |     except subprocess.CalledProcessError as e:
127 |         return False, f"Download failed: {e}"
128 |
129 | @app.route('/download_video', methods=['POST'])
130 | def download_video_route():
131 | data = request.json
132 | video_url = data.get('url')
133 |
134 | if not video_url:
135 |         return jsonify({"message": "Please provide a video URL."}), 400
136 |
137 | success, message = download_video(video_url, VIDEO_FOLDER)
138 |
139 | if success:
140 |         return jsonify({"message": "Video downloaded successfully!"})
141 | else:
142 | return jsonify({"message": message}), 500
143 |
144 |
145 | if __name__ == '__main__':
146 | app.run(debug=True, host='0.0.0.0', port=18888)
147 |
--------------------------------------------------------------------------------
/crawler.py:
--------------------------------------------------------------------------------
1 | import os
2 | import subprocess
3 |
4 | def download_video(url, output_folder):
5 | """
6 |     Download a video and save it as MP4.
7 |
8 |     :param url: Video URL (YouTube, Bilibili, etc. are supported).
9 |     :param output_folder: Destination folder for the downloaded video.
10 | """
11 |     # Ensure the destination folder exists
12 | if not os.path.exists(output_folder):
13 | os.makedirs(output_folder)
14 |
15 |     # Output file name template
16 | output_template = os.path.join(output_folder, "%(title)s.%(ext)s")
17 |
18 |     # Build the download command
19 |     command = [
20 |         "yt-dlp",  # yt-dlp replaces youtube-dl
21 |         "-f", "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]",  # best-quality MP4
22 |         "--merge-output-format", "mp4",  # merge audio and video into MP4
23 |         "-o", output_template,  # output file path
24 | url
25 | ]
26 |
27 |     try:
28 |         # Run the download from the command line
29 |         subprocess.run(command, check=True)
30 |         print(f"Video downloaded to: {output_folder}")
31 |     except subprocess.CalledProcessError as e:
32 |         print(f"Download failed: {e}")
33 |
34 | if __name__ == "__main__":
35 |     # Example: download a YouTube or Bilibili video
36 | video_url = "https://www.youtube.com/watch?v=4KvAoF1wcBo"
37 | save_folder = './static/videos/'
38 |
39 | download_video(video_url, save_folder)
40 | # video_page_url = "https://www.bilibili.com/video/BV12T411g7KA/?spm_id_from=888.80997.embed_other.whitelist&t=28.943664&bvid=BV12T411g7KA&vd_source=e2638f46408a99009fc4299e944cf139"
41 | # "https://www.youtube.com/watch?v=8AsZCKw53lI&list=PL68gfsJwBv3d8k3Bw6B8Qb8bQY0zIFrMW&index=6"
--------------------------------------------------------------------------------
/crop_video.py:
--------------------------------------------------------------------------------
1 | import os
2 | from moviepy.editor import VideoFileClip
3 |
4 | def crop_video(input_video_path, start_time, end_time):
5 |     # Split the path into directory and file name
6 |     dir_name, file_name = os.path.split(input_video_path)
7 |     file_base, file_ext = os.path.splitext(file_name)
8 |
9 |     # Temporary path used to rename the original video
10 |     original_video_renamed = os.path.join(dir_name, f"{file_base}123{file_ext}")
11 |
12 |     # Path for the trimmed output video
13 |     output_video_path = os.path.join(dir_name, f"{file_base}{file_ext}")
14 |
15 |     # Rename the original video out of the way
16 | os.rename(input_video_path, original_video_renamed)
17 |
18 |     # Load the video file
19 |     video = VideoFileClip(original_video_renamed)
20 |
21 |     # Trim the clip to [start_time, end_time]
22 |     cropped_video = video.subclip(start_time, end_time)
23 |
24 |     # Write the trimmed video
25 |     cropped_video.write_videofile(output_video_path, codec="libx264")
26 |
27 |     # Delete the renamed original video
28 |     os.remove(original_video_renamed)
29 |
30 |     print(f"Original video renamed and removed: {original_video_renamed}")
31 |     print(f"Trimmed video saved as: {output_video_path}")
32 |
33 | # Example usage
34 | input_video_path = "/netdisk/zhukejian/implicit_video_anonotations/3_11_downloads/Dinner for few | Animated short film by Nassos Vakalis.mp4"
35 | start_time = 27  # start time (seconds)
36 | end_time = 609  # end time (seconds)
37 |
38 | crop_video(input_video_path, start_time, end_time)
39 |
--------------------------------------------------------------------------------
/cut_video.py:
--------------------------------------------------------------------------------
1 | import ffmpeg
2 |
3 | def crop_video(input_path, output_path):
4 |     # Probe the video metadata
5 | probe = ffmpeg.probe(input_path)
6 | video_stream = next((stream for stream in probe['streams'] if stream['codec_type'] == 'video'), None)
7 | if video_stream is None:
8 | raise ValueError("No video stream found in input file")
9 |
10 | width = int(video_stream['width'])
11 | height = int(video_stream['height'])
12 |
13 |     # Compute the crop region: keep the middle 60% of the frame height
14 |     new_height = int(height * 0.6)  # 80% - 20% = 60%
15 |     y_offset = int(height * 0.2)  # start at 20% from the top
16 |
17 |     # Crop with ffmpeg
18 | ffmpeg.input(input_path).crop(x=0, y=y_offset, width=width, height=new_height).output(output_path).run()
19 |
20 | print(f"裁剪完成,输出文件: {output_path}")
21 |
22 | # Example invocation (fill in the paths)
23 | input_video = ""
24 | output_video = ""
25 | crop_video(input_video, output_video)
--------------------------------------------------------------------------------
/dataset/load_MMR_V.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import os
3 | from utils import read_json
4 | import random
5 |
6 | def load_MMR_V_4o_error():
7 |
8 | file_paths = [
9 | "/netdisk/zhukejian/MMR_V/MMR-V - 4o - wrong.json",
10 | ]
11 |
12 | samples = None
13 |
14 | for path in file_paths:
15 | if os.path.exists(path):
16 | samples = read_json(path)
17 | print(f"Read data from {path}")
18 |             break  # stop at the first valid path
19 |
20 |     # Raise an error if no valid path was found
21 |     if samples is None:
22 |         raise FileNotFoundError("None of the provided file paths are valid.")
23 |
24 |     # breakpoint()
25 |     print(f"Loaded {len(samples)} samples of 4o errors on MMR-V.")
26 | return samples
27 |
28 | def load_MMR_V():
29 | file_paths = [
30 | # "/mnt/userdata/MMR_V/MMR-V - video -llava.json"
31 | #"/netdisk/zhukejian/MMR_V/MMR-V - split.json",
32 | #"/mnt/userdata/MMR_V/MMR-V - split.json"
33 | ]
34 |
35 | samples = None
36 |
37 | for path in file_paths:
38 | if os.path.exists(path):
39 | samples = read_json(path)
40 | print(f"Read data from {path}")
41 |             break  # stop at the first valid path
42 |
43 |     # Raise an error if no valid path was found
44 | if samples is None:
45 | raise FileNotFoundError("None of the provided file paths are valid.")
46 |
47 | # breakpoint()
48 | print(f"Load {len(samples)} samples for MMR-V.")
49 | return samples
50 |
51 |
52 |
53 |
54 |
55 |
56 | if __name__ == '__main__':
57 |     load_MMR_V()  # simple smoke test
--------------------------------------------------------------------------------
/downloader.py:
--------------------------------------------------------------------------------
1 | import subprocess
2 | import sys
3 | import os
4 | import json
5 |
6 | # Ensure yt-dlp is installed or up to date
7 | def install_or_update_yt_dlp():
8 | try:
9 | subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", "yt-dlp"], check=True)
10 | print("✅ yt-dlp 已安装/更新成功!")
11 | except subprocess.CalledProcessError:
12 | print("❌ 安装/更新 yt-dlp 失败,请手动安装!")
13 | sys.exit(1)
14 |
15 | # Check that ffmpeg is installed
16 | def check_ffmpeg():
17 | try:
18 | subprocess.run(["ffmpeg", "-version"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
19 | print("✅ ffmpeg 已安装")
20 | except FileNotFoundError:
21 | print("❌ 未找到 ffmpeg,请先安装!")
22 | sys.exit(1)
23 |
24 | # Download a YouTube video
25 | def download_youtube_video(url, output_folder="./3_13_downloads", cookies_file="cookies.txt"):
26 | os.makedirs(output_folder, exist_ok=True)
27 |
28 | cmd = [
29 | "yt-dlp",
30 | "-f", "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]",
31 | "--merge-output-format", "mp4",
32 | "-o", f"{output_folder}/%(title)s.%(ext)s",
33 | url
34 | ]
35 |
36 |     # Use cookies for authentication if a cookies file exists
37 | if cookies_file and os.path.exists(cookies_file):
38 | cmd += ["--cookies", cookies_file]
39 |
40 |     try:
41 |         subprocess.run(cmd, check=True)
42 |         print(f"✅ Download finished: {url}")
43 |     except subprocess.CalledProcessError:
44 |         print(f"❌ Download failed: {url}")
45 |
46 | if __name__ == "__main__":
47 | # install_or_update_yt_dlp()
48 | # check_ffmpeg()
49 |
50 | json_file = "videos.json"
51 |
52 | if not os.path.exists(json_file):
53 | print(f"❌ 未找到 JSON 文件: {json_file}")
54 | sys.exit(1)
55 |
56 | with open(json_file, "r", encoding="utf-8") as f:
57 | data = json.load(f)
58 |
59 | video_urls = data.get("videos", [])
60 |
61 | if not video_urls:
62 | print("❌ JSON 文件中未找到有效的 YouTube 视频 URL!")
63 | sys.exit(1)
64 |
65 | for url in video_urls:
66 | download_youtube_video(url)
67 |
68 | print("🎉 所有视频下载任务完成!")
69 |
--------------------------------------------------------------------------------
/evaluation/InternVL3-8B_on_MMR.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv
2 | import sys
3 | import os
4 | import json
5 | import argparse
6 | import re
7 | # Load environment variables from a .env file
8 | # load_dotenv()
9 | # Read the API key from environment variables
10 | from loguru import logger as eval_logger
11 | from utils.video_utils import OpenAI,VIDEO_TOKEN
12 | from utils import write_to_json, read_json
13 | from dataset.load_MMR_V import load_MMR_V
14 |
15 | prompt_template = """
16 | [[INSTRUCTIONS]]
17 | Please select the best answer to the following multiple-choice question based on the video.
18 | Only one option is the most accurate answer in relation to the question and the video.
19 |
20 | What is the correct answer to this question [[QUESTION]]
21 | Options:
22 | [[OPTIONS]]
23 | [[END OF INSTRUCTIONS]]
24 | [[QUESTION]]
25 | {question}
26 | [[END OF QUESTION]]
27 | [[OPTIONS]]
28 | {options}
29 | [[END OF OPTIONS]]
30 | [[OUTPUT FORMAT]]
31 | Format your answer as follows:
32 | If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
33 | directly give the final correct option letter in the following format: "[[X]]"
34 | [[END OF OUTPUT FORMAT]]
35 | """
36 |
37 | def extract_last_option(text):
38 | """从文本中倒序查找最后一个出现的A-D选项"""
39 | matches = re.findall(r'\b([A-L])\b', text.upper())
40 | return matches[-1] if matches else None
41 |
42 | def get_unique_id(elem):
43 | return elem["question"]
44 |
45 | if __name__ == '__main__':
46 | print("Hello World")
47 | parser = argparse.ArgumentParser()
48 | parser.add_argument(
49 | "--api_url",
50 | type=str,
51 | default="https://api.gpt.ge/v1/chat/completions",
52 | help="URL for the API endpoint."
53 | )
54 | parser.add_argument(
55 | "--api_key",
56 | type=str,
57 | help="API key for authentication."
58 | )
59 | parser.add_argument(
60 | "--continue_eval",
61 | action="store_true",
62 | default=True,
63 | help="continue evaluation from existing result file"
64 | )
65 | parser.add_argument(
66 | "--overwrite",
67 | action="store_true",
68 | default=False,
69 | help="overwrite the existing result file"
70 | )
71 | args = parser.parse_args()
72 | samples = load_MMR_V()
73 | model_name = 'InternVL3-8B'
74 | # save_file = f'/netdisk/zhukejian/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
75 | # visual_path = '/netdisk/zhukejian/implicit_video_anonotations/static/videos'
76 |
77 | file_paths = [
78 | # "/mnt/userdata/implicit_video_anonotations/MMR-V - video -llava.json"
79 | "/netdisk/zhukejian",
80 | "/mnt/userdata"
81 | ]
82 |
83 | for path in file_paths:
84 | if os.path.exists(f"{path}/implicit_video_anonotations"):
85 | save_file = f'{path}/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
86 | visual_path = f'{path}/implicit_video_anonotations/static/videos'
87 |             break  # stop at the first valid path
88 |
89 | results = []
90 | id_set = set()
91 | id2sample = {}
92 | # breakpoint()
93 | if args.continue_eval:
94 | if os.path.isfile(save_file):
95 | print(f"Continue eval from file {save_file}")
96 | results = read_json(save_file)
97 | results = [elem for elem in results if elem[f"{model_name}_raw_response"] is not None and elem[f"{model_name}_raw_response"] != ""]
98 | print(f"Load {len(results)} results...")
99 | id_set = set([get_unique_id(elem) for elem in results])
100 | id2sample = {get_unique_id(elem): elem for elem in results}
101 | else:
102 | print(f"File {save_file} does not exists! Ignore the continue_eval parameter.")
103 | elif args.overwrite:
104 | if os.path.isfile(save_file):
105 | print(f"Choose to overwrite existing file {save_file}")
106 | else:
107 | print(f"File {save_file} does not exists! Ignore the overwrite parameter.")
108 | else:
109 | if os.path.isfile(save_file):
110 | raise ValueError(f"Save file {save_file} already exists! Please use --continue_eval or --overwrite.")
111 |
112 | client = OpenAI(
113 | model_version=model_name,
114 | api_type='openai',
115 | api_key="",
116 | api_url="http://210.75.240.156:52578/v1/chat/completions",
117 | default_headers={"x-foo": "true"},
118 | max_num_frames=8,
119 | )
120 | # breakpoint()
121 |
122 | for idx,sample in enumerate(samples[:]):
123 |
124 | curr_id = get_unique_id(sample)
125 | if curr_id in id_set and id2sample[curr_id][f"{model_name}_raw_response"] is not None and id2sample[curr_id][f"{model_name}_raw_response"] != "":
126 | continue
127 |
128 | print(f"******** idx={idx} **********")
129 |
130 | video_path = os.path.join(visual_path,sample["video"])
131 | question = sample["question"]
132 | options = sample["options"]
133 | full_prompt = prompt_template.format(
134 | question=question,
135 | options=options,
136 | )
137 |
138 | response = client.generate(
139 | visuals=video_path,
140 | contexts=f'{full_prompt} {VIDEO_TOKEN}'
141 | )
142 | print(response)
143 | sample[f"{model_name}_raw_response"] = response
144 |
145 | if isinstance(response, str):
146 |         # First try the [[X]] pattern
147 | json_regex = r'\[\[([A-L])\]\]'
148 | match = re.search(json_regex, response)
149 | if match:
150 | final_answer = match.group(1)
151 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
152 | print(f"Extracted answer: {final_answer}")
153 | else:
154 |             # Fall back to the \boxed{X} pattern
155 | box_regex = r'\\boxed\{([A-L])\}'
156 | box_match = re.search(box_regex, response)
157 | if box_match:
158 | final_answer = box_match.group(1)
159 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
160 | print(f"Extracted answer from boxed pattern: {final_answer}")
161 | else:
162 | option = extract_last_option(response)
163 | if option:
164 | sample[f"{model_name}_response"] = {"final_answer": option}
165 | else:
166 | print("No matching answer found in response.")
167 |                     # Still store the raw response for inspection
168 | sample[f"{model_name}_raw_response"] = response
169 | else:
170 | print("Invalid response type received.")
171 | sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
172 |
173 | results.append(sample)
174 | # Write the results to the output file
175 | write_to_json(results, save_file, indent=4)
176 |
177 | eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
178 | eval_logger.info("Finished Running!")
179 |
--------------------------------------------------------------------------------
/evaluation/Phi-4-multimodal-instruct_on_MMR.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv
2 | import sys
3 | import os
4 | sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
5 | sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
6 | import json
7 | import argparse
8 | import re
9 | # Load environment variables from a .env file
10 | # load_dotenv()
11 | # Read the API key from environment variables
12 | from loguru import logger as eval_logger
13 | from utils.video_utils import OpenAI,VIDEO_TOKEN
14 | from utils import write_to_json, read_json
15 | from dataset.load_MMR_V import load_MMR_V
16 |
17 | prompt_template = """
18 | [[INSTRUCTIONS]]
19 | Please select the best answer to the following multiple-choice question based on the video.
20 | Only one option is the most accurate answer in relation to the question and the video.
21 |
22 | What is the correct answer to this question [[QUESTION]]
23 | Options:
24 | [[OPTIONS]]
25 | [[END OF INSTRUCTIONS]]
26 | [[QUESTION]]
27 | {question}
28 | [[END OF QUESTION]]
29 | [[OPTIONS]]
30 | {options}
31 | [[END OF OPTIONS]]
32 | [[OUTPUT FORMAT]]
33 | Format your answer as follows:
34 | If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
35 | directly give the final correct option letter in the following format: "[[X]]"
36 | [[END OF OUTPUT FORMAT]]
37 | """
38 |
39 | def extract_last_option(text):
40 | """从文本中倒序查找最后一个出现的A-D选项"""
41 | matches = re.findall(r'\b([A-L])\b', text.upper())
42 | return matches[-1] if matches else None
43 |
44 | def get_unique_id(elem):
45 | return elem["question"]
46 |
47 | if __name__ == '__main__':
48 | print("Hello World")
49 | parser = argparse.ArgumentParser()
50 | parser.add_argument(
51 | "--api_url",
52 | type=str,
53 | default="https://api.gpt.ge/v1/chat/completions",
54 | help="URL for the API endpoint."
55 | )
56 | parser.add_argument(
57 | "--api_key",
58 | type=str,
59 | help="API key for authentication."
60 | )
61 | parser.add_argument(
62 | "--continue_eval",
63 | action="store_true",
64 | default=True,
65 | help="continue evaluation from existing result file"
66 | )
67 | parser.add_argument(
68 | "--overwrite",
69 | action="store_true",
70 | default=False,
71 | help="overwrite the existing result file"
72 | )
73 | args = parser.parse_args()
74 | samples = load_MMR_V()
75 | model_name = 'Phi-4-multimodal-instruct'
76 | # save_file = f'/netdisk/zhukejian/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
77 | # visual_path = '/netdisk/zhukejian/implicit_video_anonotations/static/videos'
78 |
79 | file_paths = [
80 | # "/mnt/userdata/implicit_video_anonotations/MMR-V - video -llava.json"
81 | "/netdisk/zhukejian",
82 | "/mnt/userdata"
83 | ]
84 |
85 | for path in file_paths:
86 | if os.path.exists(f"{path}/implicit_video_anonotations"):
87 | save_file = f'{path}/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
88 | visual_path = f'{path}/implicit_video_anonotations/static/videos'
89 |             break  # stop at the first valid path
90 |
91 | results = []
92 | id_set = set()
93 | id2sample = {}
94 | # breakpoint()
95 | if args.continue_eval:
96 | if os.path.isfile(save_file):
97 | print(f"Continue eval from file {save_file}")
98 | results = read_json(save_file)
99 | results = [elem for elem in results if elem[f"{model_name}_raw_response"] is not None and elem[f"{model_name}_raw_response"] != '']
100 | print(f"Load {len(results)} results...")
101 | id_set = set([get_unique_id(elem) for elem in results])
102 | id2sample = {get_unique_id(elem): elem for elem in results}
103 | else:
104 | print(f"File {save_file} does not exists! Ignore the continue_eval parameter.")
105 | elif args.overwrite:
106 | if os.path.isfile(save_file):
107 | print(f"Choose to overwrite existing file {save_file}")
108 | else:
109 | print(f"File {save_file} does not exists! Ignore the overwrite parameter.")
110 | else:
111 | if os.path.isfile(save_file):
112 | raise ValueError(f"Save file {save_file} already exists! Please use --continue_eval or --overwrite.")
113 |
114 | client = OpenAI(
115 | model_version='/mnt/usercache/zhaosuifeng/model/Phi-4-multimodal-instruct/',
116 | api_type='openai',
117 | api_key="",
118 | api_url="http://210.75.240.155:22345/v1/chat/completions",
119 | default_headers={"x-foo": "true"},
120 | max_num_frames=8,
121 | )
122 | # breakpoint()
123 |
124 | for idx,sample in enumerate(samples[:]):
125 |
126 | curr_id = get_unique_id(sample)
127 | if curr_id in id_set and id2sample[curr_id][f"{model_name}_raw_response"] is not None and id2sample[curr_id][f"{model_name}_raw_response"] != '':
128 | continue
129 |
130 | print(f"******** idx={idx} **********")
131 |
132 | video_path = os.path.join(visual_path,sample["video"])
133 | question = sample["question"]
134 | options = sample["options"]
135 | full_prompt = prompt_template.format(
136 | question=question,
137 | options=options,
138 | )
139 |
140 | response = client.generate(
141 | visuals=video_path,
142 | contexts=f'{full_prompt} {VIDEO_TOKEN}'
143 | )
144 | print(response)
145 | sample[f"{model_name}_raw_response"] = response
146 |
147 | if isinstance(response, str):
148 |         # First try the [[X]] pattern
149 | json_regex = r'\[\[([A-L])\]\]'
150 | match = re.search(json_regex, response)
151 | if match:
152 | final_answer = match.group(1)
153 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
154 | print(f"Extracted answer: {final_answer}")
155 | else:
156 |             # Fall back to the \boxed{X} pattern
157 | box_regex = r'\\boxed\{([A-L])\}'
158 | box_match = re.search(box_regex, response)
159 | if box_match:
160 | final_answer = box_match.group(1)
161 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
162 | print(f"Extracted answer from boxed pattern: {final_answer}")
163 | else:
164 | option = extract_last_option(response)
165 | if option:
166 | sample[f"{model_name}_response"] = {"final_answer": option}
167 | else:
168 | print("No matching answer found in response.")
169 |                     # Still store the raw response for inspection
170 | sample[f"{model_name}_raw_response"] = response
171 | else:
172 | print("Invalid response type received.")
173 | sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
174 |
175 | results.append(sample)
176 | # Write the results to the output file
177 | write_to_json(results, save_file, indent=4)
178 |
179 | eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
180 | eval_logger.info("Finished Running!")
181 |
--------------------------------------------------------------------------------
/evaluation/Phi-4-multimodal-instruct_on_MMR_cot.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv
2 | import sys
3 | import os
4 | sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
5 | sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
6 | import json
7 | import argparse
8 | import re
9 | # Load environment variables from a .env file
10 | # load_dotenv()
11 | # Read the API key from environment variables
12 | from loguru import logger as eval_logger
13 | from utils.video_utils import OpenAI,VIDEO_TOKEN
14 | from utils import write_to_json, read_json
15 | from dataset.load_MMR_V import load_MMR_V
16 |
17 | prompt_template = """
18 | [[INSTRUCTIONS]]
19 | Please select the best answer to the following multiple-choice question based on the video.
20 | Only one option is the most accurate answer in relation to the question and the video.
21 |
22 | What is the correct answer to this question [[QUESTION]]
23 | Options:
24 | [[OPTIONS]]
25 |
26 | [[END OF INSTRUCTIONS]]
27 | [[QUESTION]]
28 | {question}
29 | [[END OF QUESTION]]
30 | [[OPTIONS]]
31 | {options}
32 | [[END OF OPTIONS]]
33 | [[OUTPUT FORMAT]]
34 | Format your answer as follows:
35 | Your thinking process.
36 | If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
37 | give the final correct option letter in the following format: "[[X]]"
38 | [[END OF OUTPUT FORMAT]]
39 | """
40 |
41 | def extract_last_option(text):
42 | """从文本中倒序查找最后一个出现的A-D选项"""
43 | matches = re.findall(r'\b([A-L])\b', text.upper())
44 | return matches[-1] if matches else None
45 |
46 | def get_unique_id(elem):
47 | return elem["question"]
48 |
49 | if __name__ == '__main__':
50 | print("Hello World")
51 | parser = argparse.ArgumentParser()
52 | parser.add_argument(
53 | "--api_url",
54 | type=str,
55 | default="https://api.gpt.ge/v1/chat/completions",
56 | help="URL for the API endpoint."
57 | )
58 | parser.add_argument(
59 | "--api_key",
60 | type=str,
61 | help="API key for authentication."
62 | )
63 | parser.add_argument(
64 | "--continue_eval",
65 | action="store_true",
66 | default=True,
67 | help="continue evaluation from existing result file"
68 | )
69 | parser.add_argument(
70 | "--overwrite",
71 | action="store_true",
72 | default=False,
73 | help="overwrite the existing result file"
74 | )
75 | args = parser.parse_args()
76 | samples = load_MMR_V()
77 | model_name = 'Phi-4-multimodal-instruct'
78 | # save_file = f'/netdisk/zhukejian/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
79 | # visual_path = '/netdisk/zhukejian/implicit_video_anonotations/static/videos'
80 |
81 | file_paths = [
82 | # "/mnt/userdata/implicit_video_anonotations/MMR-V - video -llava.json"
83 | "/netdisk/zhukejian",
84 | "/mnt/userdata"
85 | ]
86 |
87 | for path in file_paths:
88 | if os.path.exists(f"{path}/implicit_video_anonotations"):
89 | save_file = f'{path}/implicit_video_anonotations/results/{model_name}_on_MMR_V_cot.json'
90 | visual_path = f'{path}/implicit_video_anonotations/static/videos'
91 |             break  # stop at the first valid path
92 |
93 | results = []
94 | id_set = set()
95 | id2sample = {}
96 | # breakpoint()
97 | if args.continue_eval:
98 | if os.path.isfile(save_file):
99 | print(f"Continue eval from file {save_file}")
100 | results = read_json(save_file)
101 | results = [elem for elem in results if elem[f"{model_name}_raw_response"] is not None and elem[f"{model_name}_raw_response"] != '']
102 | print(f"Load {len(results)} results...")
103 | id_set = set([get_unique_id(elem) for elem in results])
104 | id2sample = {get_unique_id(elem): elem for elem in results}
105 | else:
106 | print(f"File {save_file} does not exists! Ignore the continue_eval parameter.")
107 | elif args.overwrite:
108 | if os.path.isfile(save_file):
109 | print(f"Choose to overwrite existing file {save_file}")
110 | else:
111 | print(f"File {save_file} does not exists! Ignore the overwrite parameter.")
112 | else:
113 | if os.path.isfile(save_file):
114 | raise ValueError(f"Save file {save_file} already exists! Please use --continue_eval or --overwrite.")
115 |
116 | client = OpenAI(
117 | model_version='/mnt/usercache/zhaosuifeng/model/Phi-4-multimodal-instruct/',
118 | api_type='openai',
119 | api_key="",
120 | api_url="http://210.75.240.155:22345/v1/chat/completions",
121 | default_headers={"x-foo": "true"},
122 | max_num_frames=8,
123 | )
124 | # breakpoint()
125 |
126 | for idx,sample in enumerate(samples[:]):
127 |
128 | curr_id = get_unique_id(sample)
129 | if curr_id in id_set and id2sample[curr_id][f"{model_name}_raw_response"] is not None and id2sample[curr_id][f"{model_name}_raw_response"] != '':
130 | continue
131 |
132 | print(f"******** idx={idx} **********")
133 |
134 | video_path = os.path.join(visual_path,sample["video"])
135 | question = sample["question"]
136 | options = sample["options"]
137 | full_prompt = prompt_template.format(
138 | question=question,
139 | options=options,
140 | )
141 |
142 | response = client.generate(
143 | visuals=video_path,
144 | contexts=f'{full_prompt} {VIDEO_TOKEN}'
145 | )
146 | print(response)
147 | sample[f"{model_name}_raw_response"] = response
148 |
149 | if isinstance(response, str):
150 |         # Collect all [[X]] matches and keep the last one
151 | json_regex = r'\[\[([A-L])\]\]'
152 | all_answers = re.findall(json_regex, response)
153 | if all_answers:
154 | final_answer = all_answers[-1]
155 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
156 | print(f"Extracted last answer: {final_answer}")
157 | else:
158 |             # Fall back to \boxed{X}
159 | box_regex = r'\\boxed\{([A-L])\}'
160 | all_boxed = re.findall(box_regex, response)
161 | if all_boxed:
162 | final_answer = all_boxed[-1]
163 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
164 | print(f"Extracted last boxed answer: {final_answer}")
165 | else:
166 | option = extract_last_option(response)
167 | if option:
168 | sample[f"{model_name}_response"] = {"final_answer": option}
169 | else:
170 | print("No matching answer found in response.")
171 |                     # Still store the raw response for inspection
172 | sample[f"{model_name}_raw_response"] = response
173 | else:
174 | print("Invalid response type received.")
175 | sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
176 |
177 | results.append(sample)
178 | # Write the results to the output file
179 | write_to_json(results, save_file, indent=4)
180 |
181 | eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
182 | eval_logger.info("Finished Running!")
183 |
--------------------------------------------------------------------------------
/evaluation/claude-3-5-sonnet-20241022_on_MMR.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv
2 | import sys
3 | import os
4 | sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
5 | sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
6 | import json
7 | import re
8 | # Load environment variables from a .env file
9 | load_dotenv()
10 | api_key = os.getenv("API_KEY", "")  # env var name is an assumption; set it to your key
11 | from loguru import logger as eval_logger
12 | from utils.video_utils import OpenAI,VIDEO_TOKEN
13 | from utils import write_to_json
14 | from dataset.load_MMR_V import load_MMR_V
15 |
16 |
17 | prompt_template = """
18 | [[INSTRUCTIONS]]
19 | Please select the best answer to the following multiple-choice question based on the video.
20 | Only one option is the most accurate answer in relation to the question and the video.
21 |
22 | What is the correct answer to this question [[QUESTION]]
23 | Options:
24 | [[OPTIONS]]
25 |
26 | [[END OF INSTRUCTIONS]]
27 | [[QUESTION]]
28 | {question}
29 | [[END OF QUESTION]]
30 | [[OPTIONS]]
31 | {options}
32 | [[END OF OPTIONS]]
33 | [[OUTPUT FORMAT]]
34 | Format your answer as follows:
35 | Please directly output the answer letter without any reasoning or explanation.
36 | If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
37 | give the final correct option letter in the following format: \"[[X]]\"
38 | [[END OF OUTPUT FORMAT]]
39 | """
40 |
41 | if __name__ == '__main__':
42 | print("Hello World")
43 |
44 | samples = load_MMR_V()
45 | model_name = 'claude-3-5-sonnet-20241022'
46 | # save_file = f'/netdisk/zhukejian/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
47 | # visual_path = '/netdisk/zhukejian/implicit_video_anonotations/static/videos'
48 |
49 | save_file = f'/mnt/userdata/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
50 | visual_path = '/mnt/userdata/implicit_video_anonotations/static/videos'
51 |
52 | client = OpenAI(
53 | model_version=model_name,
54 | api_type='openai',
55 | api_key=api_key,
56 | api_url="https://api.gpt.ge/v1/chat/completions",
57 | default_headers={"x-foo": "true"},
58 | max_num_frames=32,
59 | )
60 | # breakpoint()
61 | results = []
62 | for idx,sample in enumerate(samples[:]):
63 | print(f"******** idx={idx} **********")
64 | if idx<1192:
65 | continue
66 | # breakpoint()
67 | # if idx>=10:
68 | # break
69 | video_path = os.path.join(visual_path,sample["video"])
70 | question = sample["question"]
71 | options = sample["options"]
72 | full_prompt = prompt_template.format(
73 | question=question,
74 | options=options,
75 | )
76 |
77 | response = client.generate(
78 | visuals=video_path,
79 | contexts=f'{full_prompt} {VIDEO_TOKEN}'
80 | )
81 | print(response)
82 | sample[f"{model_name}_raw_response"] = response
83 | # breakpoint()
84 | # json_regex = r'JSON Output:\s*===\s*(?:```json\s*)?(\{.*?\})\s*(?:```)?\s*===\s*'
85 |
86 | # Use findall to match all possible JSON blocks
87 | # matches = re.findall(json_regex, response, re.DOTALL)
88 |
89 | if isinstance(response, str):
90 | json_regex = r'\[\[([ABCDEFGHIJKL])\]\]'
91 | match = re.search(json_regex, response)
92 |
93 | if match:
94 | final_answer = match.group(1)
95 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
96 | print(f"Extracted answer: {final_answer}")
97 | else:
98 | print("No matching answer found in response.")
99 | sample[f"{model_name}_raw_response"] = response # 仍然存储原始响应以便检查
100 | else:
101 | print("Invalid response type received.")
102 | sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
103 | results.append(sample)
104 | # Write the results to the output file
105 | write_to_json(results, save_file, indent=4)
106 | eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
107 | eval_logger.info("Finished Running!")
--------------------------------------------------------------------------------
/evaluation/claude-3-5-sonnet-20241022_on_MMR_cot.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv
2 | import sys
3 | import os
4 | sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
5 | sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
6 | import json
7 | import re
8 | # Load environment variables from a .env file
9 | load_dotenv()
10 | api_key = os.getenv("API_KEY", "")  # env var name is an assumption; set it to your key
11 | from loguru import logger as eval_logger
12 | from utils.video_utils import OpenAI,VIDEO_TOKEN
13 | from utils import write_to_json
14 | from dataset.load_MMR_V import load_MMR_V
15 |
16 |
17 | prompt_template = """
18 | [[INSTRUCTIONS]]
19 | Please select the best answer to the following multiple-choice question based on the video.
20 | Only one option is the most accurate answer in relation to the question and the video.
21 |
22 | What is the correct answer to this question [[QUESTION]]
23 | Options:
24 | [[OPTIONS]]
25 |
26 | Let's think step by step.
27 |
28 | [[END OF INSTRUCTIONS]]
29 | [[QUESTION]]
30 | {question}
31 | [[END OF QUESTION]]
32 | [[OPTIONS]]
33 | {options}
34 | [[END OF OPTIONS]]
35 | [[OUTPUT FORMAT]]
36 | Format your answer as follows:
37 | Your thinking process.
38 | If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
39 | give the final correct option letter in the following format: \"[[X]]\"
40 | [[END OF OUTPUT FORMAT]]
41 | """
42 |
43 | if __name__ == '__main__':
44 | print("Hello World")
45 |
46 | samples = load_MMR_V()
47 | model_name = 'claude-3-5-sonnet-20241022'
48 | # save_file = f'/netdisk/zhukejian/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
49 | # visual_path = '/netdisk/zhukejian/implicit_video_anonotations/static/videos'
50 |
51 | save_file = f'/mnt/userdata/implicit_video_anonotations/results/{model_name}_on_MMR_V_cot.json'
52 | visual_path = '/mnt/userdata/implicit_video_anonotations/static/videos'
53 |
54 | client = OpenAI(
55 | model_version=model_name,
56 | api_type='openai',
57 | api_key=api_key,
58 | api_url="https://api.gpt.ge/v1/chat/completions",
59 | default_headers={"x-foo": "true"},
60 | max_num_frames=32,
61 | )
62 | # breakpoint()
63 | results = []
64 | for idx,sample in enumerate(samples[:]):
65 | print(f"******** idx={idx} **********")
66 | if idx<969:
67 | continue
68 | # breakpoint()
69 | # if idx>=10:
70 | # break
71 | video_path = os.path.join(visual_path,sample["video"])
72 | question = sample["question"]
73 | options = sample["options"]
74 | full_prompt = prompt_template.format(
75 | question=question,
76 | options=options,
77 | )
78 |
79 | response = client.generate(
80 | visuals=video_path,
81 | contexts=f'{full_prompt} {VIDEO_TOKEN}'
82 | )
83 | print(response)
84 | sample[f"{model_name}_raw_response"] = response
85 | # breakpoint()
86 | # json_regex = r'JSON Output:\s*===\s*(?:```json\s*)?(\{.*?\})\s*(?:```)?\s*===\s*'
87 |
88 | # Use findall to match all possible JSON blocks
89 | # matches = re.findall(json_regex, response, re.DOTALL)
90 |
91 | if isinstance(response, str):
92 | json_regex = r'\[\[([ABCDEFGHIJKL])\]\]'
93 | match = re.search(json_regex, response)
94 |
95 | if match:
96 | final_answer = match.group(1)
97 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
98 | print(f"Extracted answer: {final_answer}")
99 | else:
100 | print("No matching answer found in response.")
101 | sample[f"{model_name}_raw_response"] = response # 仍然存储原始响应以便检查
102 | else:
103 | print("Invalid response type received.")
104 | sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
105 | results.append(sample)
106 | # Write the results to the output file
107 | write_to_json(results, save_file, indent=4)
108 | eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
109 | eval_logger.info("Finished Running!")
--------------------------------------------------------------------------------
/evaluation/cli-cog-vlm.py:
--------------------------------------------------------------------------------
1 | import io
2 | import numpy as np
3 | import torch
4 | from decord import cpu, VideoReader, bridge
5 | from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
6 | import argparse
7 |
8 | MODEL_PATH = "/mnt/userdata/MODELS/THUDM/cogvlm2-video-llama3-chat"
9 | DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
10 | TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[
11 | 0] >= 8 else torch.float16
12 |
13 | parser = argparse.ArgumentParser(description="CogVLM2-Video CLI Demo")
14 | parser.add_argument('--quant', type=int, choices=[4, 8], help='Enable 4-bit or 8-bit precision loading', default=0)
15 | args = parser.parse_args()
16 |
17 | if 'int4' in MODEL_PATH:
18 | args.quant = 4
19 |
20 |
21 | def load_video(video_path, strategy='chat'):
22 | bridge.set_bridge('torch')
23 | with open(video_path, 'rb') as f:
24 | mp4_stream = f.read()
25 | num_frames = 24
26 |
27 | if mp4_stream is not None:
28 | decord_vr = VideoReader(io.BytesIO(mp4_stream), ctx=cpu(0))
29 | else:
30 | decord_vr = VideoReader(video_path, ctx=cpu(0))
31 | frame_id_list = None
32 | total_frames = len(decord_vr)
33 | if strategy == 'base':
34 | clip_end_sec = 60
35 | clip_start_sec = 0
36 | start_frame = int(clip_start_sec * decord_vr.get_avg_fps())
37 | end_frame = min(total_frames,
38 | int(clip_end_sec * decord_vr.get_avg_fps())) if clip_end_sec is not None else total_frames
39 | frame_id_list = np.linspace(start_frame, end_frame - 1, num_frames, dtype=int)
40 | elif strategy == 'chat':
41 | timestamps = decord_vr.get_frame_timestamp(np.arange(total_frames))
42 | timestamps = [i[0] for i in timestamps]
43 | max_second = round(max(timestamps)) + 1
44 | frame_id_list = []
45 | for second in range(max_second):
46 | closest_num = min(timestamps, key=lambda x: abs(x - second))
47 | index = timestamps.index(closest_num)
48 | frame_id_list.append(index)
49 | if len(frame_id_list) >= num_frames:
50 | break
51 | video_data = decord_vr.get_batch(frame_id_list)
52 | video_data = video_data.permute(3, 0, 1, 2)
53 | return video_data
54 |
55 |
56 | tokenizer = AutoTokenizer.from_pretrained(
57 | MODEL_PATH,
58 | trust_remote_code=True,
59 | # padding_side="left"
60 | )
61 |
62 | if torch.cuda.is_available() and torch.cuda.get_device_properties(0).total_memory < 48 * 1024 ** 3 and not args.quant:
63 | print("GPU memory is less than 48GB. Please use cli_demo_multi_gpus.py or pass `--quant 4` or `--quant 8`.")
64 | exit()
65 |
66 | # Load the model
67 | if args.quant == 4:
68 | model = AutoModelForCausalLM.from_pretrained(
69 | MODEL_PATH,
70 | torch_dtype=TORCH_TYPE,
71 | trust_remote_code=True,
72 | quantization_config=BitsAndBytesConfig(
73 | load_in_4bit=True,
74 | bnb_4bit_compute_dtype=TORCH_TYPE,
75 | ),
76 | low_cpu_mem_usage=True
77 | ).eval()
78 | elif args.quant == 8:
79 | model = AutoModelForCausalLM.from_pretrained(
80 | MODEL_PATH,
81 | torch_dtype=TORCH_TYPE,
82 | trust_remote_code=True,
83 | quantization_config=BitsAndBytesConfig(
84 | load_in_8bit=True,
85 | bnb_4bit_compute_dtype=TORCH_TYPE,
86 | ),
87 | low_cpu_mem_usage=True
88 | ).eval()
89 | else:
90 | model = AutoModelForCausalLM.from_pretrained(
91 | MODEL_PATH,
92 | torch_dtype=TORCH_TYPE,
93 | trust_remote_code=True
94 | ).eval().to(DEVICE)
95 |
96 | while True:
97 | strategy = 'base' if 'cogvlm2-video-llama3-base' in MODEL_PATH else 'chat'
98 | print(f"using with {strategy} model")
99 | video_path = input("video path >>>>> ")
100 | if video_path == '':
101 | print('You did not enter video path, the following will be a plain text conversation.')
102 | video = None
103 | else:
104 | video = load_video(video_path, strategy=strategy)
105 |
106 | history = []
107 | while True:
108 | query = input("Human:")
109 | if query == "clear":
110 | break
111 |
112 | inputs = model.build_conversation_input_ids(
113 | tokenizer=tokenizer,
114 | query=query,
115 | images=[video],
116 | history=history,
117 | template_version=strategy
118 | )
119 |
120 | inputs = {
121 | 'input_ids': inputs['input_ids'].unsqueeze(0).to(DEVICE),
122 | 'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to(DEVICE),
123 | 'attention_mask': inputs['attention_mask'].unsqueeze(0).to(DEVICE),
124 | 'images': [[inputs['images'][0].to('cuda').to(TORCH_TYPE)]],
125 | }
126 | gen_kwargs = {
127 | "max_new_tokens": 2048,
128 | "pad_token_id": 128002,
129 | "top_k": 1,
130 | "do_sample": True,
131 | "top_p": 0.1,
132 | "temperature": 0.1,
133 | }
134 | with torch.no_grad():
135 | outputs = model.generate(**inputs, **gen_kwargs)
136 | outputs = outputs[:, inputs['input_ids'].shape[1]:]
137 | response = tokenizer.decode(outputs[0], skip_special_tokens=True)
138 | print("\nCogVLM2-Video:", response)
139 | history.append((query, response))
--------------------------------------------------------------------------------
/evaluation/gemini-2.0-flash-thinking_on_MMR.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv
2 | import sys
3 | import os
4 | sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
5 | import json
6 | import re
7 | # Load environment variables from a .env file
8 | load_dotenv()
9 | api_key = os.getenv("API_KEY", "")  # env var name is an assumption; set it to your key
10 | from loguru import logger as eval_logger
11 | from utils.video_utils import OpenAI,VIDEO_TOKEN
12 | from utils import write_to_json
13 | from dataset.load_MMR_V import load_MMR_V
14 |
15 |
16 | prompt_template = """
17 | [[INSTRUCTIONS]]
18 | Please select the best answer to the following multiple-choice question based on the video.
19 | Only one option is the most accurate answer in relation to the question and the video.
20 |
21 | What is the correct answer to this question [[QUESTION]]
22 | Options:
23 | [[OPTIONS]]
24 |
25 | [[END OF INSTRUCTIONS]]
26 | [[QUESTION]]
27 | {question}
28 | [[END OF QUESTION]]
29 | [[OPTIONS]]
30 | {options}
31 | [[END OF OPTIONS]]
32 | [[OUTPUT FORMAT]]
33 | Format your answer as follows:
34 | If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
35 | give the final correct option letter in the following format: \"[[X]]\"
36 | [[END OF OUTPUT FORMAT]]
37 | """
38 |
39 | if __name__ == '__main__':
40 | print("Hello World")
41 |
42 | samples = load_MMR_V()
43 | model_name = 'gemini-2.0-flash-thinking-exp'
44 | save_file = f'/netdisk/zhukejian/implicit_video_anonotations/results/{model_name}_on_MMR_V_frame32.json'
45 | visual_path = '/netdisk/zhukejian/implicit_video_anonotations/static/videos'
46 |
47 | client = OpenAI(
48 | model_version=model_name,
49 | api_type='openai',
50 | api_key=api_key,
51 | api_url="https://api.gpt.ge/v1/chat/completions",
52 | default_headers={"x-foo": "true"},
53 | max_num_frames=16,
54 | )
55 | # breakpoint()
56 | results = []
57 | for idx,sample in enumerate(samples[:]):
58 | print(f"******** idx={idx} **********")
59 | # if idx<848:
60 | # continue
61 | # breakpoint()
62 | if idx>=3:
63 | break
64 | video_path = os.path.join(visual_path,sample["video"])
65 | question = sample["question"]
66 | options = sample["options"]
67 | full_prompt = prompt_template.format(
68 | question=question,
69 | options=options,
70 | )
71 |
72 | response = client.generate(
73 | visuals=video_path,
74 | contexts=f'{full_prompt} {VIDEO_TOKEN}'
75 | )
76 | print(response)
77 | sample[f"{model_name}_raw_response"] = response
78 | # breakpoint()
79 | # json_regex = r'JSON Output:\s*===\s*(?:```json\s*)?(\{.*?\})\s*(?:```)?\s*===\s*'
80 |
81 | # Use findall to match all possible JSON blocks
82 | # matches = re.findall(json_regex, response, re.DOTALL)
83 |
84 | if isinstance(response, str):
85 | json_regex = r'\[\[([ABCDEFGHIJKL])\]\]'
86 | match = re.search(json_regex, response)
87 |
88 | if match:
89 | final_answer = match.group(1)
90 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
91 | print(f"Extracted answer: {final_answer}")
92 | else:
93 | print("No matching answer found in response.")
94 | sample[f"{model_name}_raw_response"] = response # 仍然存储原始响应以便检查
95 | else:
96 | print("Invalid response type received.")
97 | sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
98 | results.append(sample)
99 | # Write the results to the output file
100 | write_to_json(results, save_file, indent=4)
101 | eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
102 | eval_logger.info("Finished Running!")
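These scripts expect the API key to come from the environment rather than being hard-coded. A minimal sketch of that setup with python-dotenv (the variable name `OPENAI_API_KEY` is an assumption, not fixed by the repo):

```python
# .env  (keep this file out of version control)
# OPENAI_API_KEY=sk-...

from dotenv import load_dotenv
import os

load_dotenv()                          # read .env into the process environment
api_key = os.getenv("OPENAI_API_KEY")  # then fetch the key by name
assert api_key, "OPENAI_API_KEY is not set"
```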
--------------------------------------------------------------------------------
/evaluation/gemini-2.0-flash-thinking_on_MMR_cot.py:
--------------------------------------------------------------------------------
from dotenv import load_dotenv
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
import json
import re

# Load environment variables from the .env file and read the API key from them.
# NOTE: the environment variable name is an assumption; match it to your .env file.
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json
from dataset.load_MMR_V import load_MMR_V


prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]

Let's think step by step.

[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
Your thinking process.
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
give the final correct option letter in the following format: \"[[X]]\"
[[END OF OUTPUT FORMAT]]
"""

if __name__ == '__main__':
    samples = load_MMR_V()
    model_name = 'gemini-2.0-flash-thinking-exp-01-21'

    save_file = f'/mnt/userdata/implicit_video_anonotations/results/{model_name}_on_MMR_V_cot.json'
    visual_path = '/mnt/userdata/implicit_video_anonotations/static/videos'

    client = OpenAI(
        model_version=model_name,
        api_type='openai',
        api_key=api_key,
        api_url="https://api.gpt.ge/v1/chat/completions",
        default_headers={"x-foo": "true"},
        max_num_frames=16,
    )
    results = []
    for idx, sample in enumerate(samples):
        print(f"******** idx={idx} **********")
        # Resume offset left over from a previous partial run; uncomment to skip ahead.
        # if idx < 1081:
        #     continue
        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # Extract the final answer emitted in the required "[[X]]" format.
            json_regex = r'\[\[([ABCDEFGHIJKL])\]\]'
            match = re.search(json_regex, response)

            if match:
                final_answer = match.group(1)
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted answer: {final_answer}")
            else:
                print("No matching answer found in response.")
                sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)
    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
--------------------------------------------------------------------------------
/evaluation/gemini-2.0-flash_on_MMR.py:
--------------------------------------------------------------------------------
from dotenv import load_dotenv
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
import json
import re

# Load environment variables from the .env file and read the API key from them.
# NOTE: the environment variable name is an assumption; match it to your .env file.
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json
from dataset.load_MMR_V import load_MMR_V


prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]

[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
give the final correct option letter in the following format: \"[[X]]\"
[[END OF OUTPUT FORMAT]]
"""

if __name__ == '__main__':
    samples = load_MMR_V()
    model_name = 'gemini-2.0-flash'
    save_file = f'/netdisk/zhukejian/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
    visual_path = '/netdisk/zhukejian/implicit_video_anonotations/static/videos'

    client = OpenAI(
        model_version=model_name,
        api_type='openai',
        api_key=api_key,
        api_url="https://api.gpt.ge/v1/chat/completions",
        default_headers={"x-foo": "true"},
        max_num_frames=512,
    )
    results = []
    for idx, sample in enumerate(samples):
        print(f"******** idx={idx} **********")
        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # Extract the final answer emitted in the required "[[X]]" format.
            json_regex = r'\[\[([ABCDEFGHIJKL])\]\]'
            match = re.search(json_regex, response)

            if match:
                final_answer = match.group(1)
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted answer: {final_answer}")
            else:
                print("No matching answer found in response.")
                sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)
    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
--------------------------------------------------------------------------------
/evaluation/gemini-2.0-flash_on_MMR_frame16.py:
--------------------------------------------------------------------------------
from dotenv import load_dotenv
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
import json
import re

# Load environment variables from the .env file and read the API key from them.
# NOTE: the environment variable name is an assumption; match it to your .env file.
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json
from dataset.load_MMR_V import load_MMR_V


prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]

[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
give the final correct option letter in the following format: \"[[X]]\"
[[END OF OUTPUT FORMAT]]
"""

if __name__ == '__main__':
    samples = load_MMR_V()
    model_name = 'gemini-2.0-flash'
    save_file = f'/mnt/userdata/implicit_video_anonotations/results/{model_name}_on_MMR_V_frame16.json'
    visual_path = '/mnt/userdata/implicit_video_anonotations/static/videos'

    client = OpenAI(
        model_version=model_name,
        api_type='openai',
        api_key=api_key,
        api_url="https://api.gpt.ge/v1/chat/completions",
        default_headers={"x-foo": "true"},
        max_num_frames=16,
    )
    results = []
    for idx, sample in enumerate(samples):
        print(f"******** idx={idx} **********")
        # Resume offset left over from a previous partial run; uncomment to skip ahead.
        # if idx < 66:
        #     continue
        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # Extract the final answer emitted in the required "[[X]]" format.
            json_regex = r'\[\[([ABCDEFGHIJKL])\]\]'
            match = re.search(json_regex, response)

            if match:
                final_answer = match.group(1)
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted answer: {final_answer}")
            else:
                print("No matching answer found in response.")
                sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)
    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
--------------------------------------------------------------------------------
/evaluation/gemini-2.5-flash_on_MMR.py:
--------------------------------------------------------------------------------
from dotenv import load_dotenv
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
import json
import re

# Load environment variables from the .env file and read the API key from them.
# NOTE: the environment variable name is an assumption; match it to your .env file.
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json
from dataset.load_MMR_V import load_MMR_V

prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]

[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
give the final correct option letter in the following format: "[[X]]"
[[END OF OUTPUT FORMAT]]
"""

if __name__ == '__main__':
    samples = load_MMR_V()
    model_name = 'gemini-2.5-flash-preview-04-17'
    save_file = f'/netdisk/zhukejian/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
    visual_path = '/netdisk/zhukejian/implicit_video_anonotations/static/videos'

    client = OpenAI(
        model_version=model_name,
        api_type='openai',
        api_key=api_key,
        api_url="https://us.vveai.com/v1/chat/completions",
        default_headers={"x-foo": "true"},
        max_num_frames=32,
    )
    results = []
    for idx, sample in enumerate(samples):
        print(f"******** idx={idx} **********")
        # Resume offset left over from a previous partial run; uncomment to skip ahead.
        # if idx < 3:
        #     continue
        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # First try the expected [[X]] pattern.
            json_regex = r'\[\[([A-L])\]\]'
            match = re.search(json_regex, response)
            if match:
                final_answer = match.group(1)
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted answer: {final_answer}")
            else:
                # Fall back to the \boxed{X} format.
                box_regex = r'\\boxed\{([A-L])\}'
                box_match = re.search(box_regex, response)
                if box_match:
                    final_answer = box_match.group(1)
                    sample[f"{model_name}_response"] = {"final_answer": final_answer}
                    print(f"Extracted answer from boxed pattern: {final_answer}")
                else:
                    print("No matching answer found in response.")
                    sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"

        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)

    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
--------------------------------------------------------------------------------
/evaluation/gemini-2.5-flash_on_MMR_cot.py:
--------------------------------------------------------------------------------
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
import json
import re
import argparse

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json, read_json
from dataset.load_MMR_V import load_MMR_V

prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]

[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
Your thinking process.
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
give the final correct option letter in the following format: "[[X]]"
[[END OF OUTPUT FORMAT]]
"""

def get_unique_id(elem):
    return elem["question"]

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api_url",
        type=str,
        default="https://api.v3.cm/v1/chat/completions",
        help="URL for the API endpoint."
    )
    parser.add_argument(
        "--api_key",
        type=str,
        help="API key for authentication."
    )
    parser.add_argument(
        "--continue_eval",
        action="store_true",
        default=True,
        help="continue evaluation from existing result file"
    )
    parser.add_argument(
        "--overwrite",
        action="store_true",
        default=False,
        help="overwrite the existing result file"
    )
    args = parser.parse_args()

    samples = load_MMR_V()
    model_name = 'gemini-2.5-flash-preview-04-17'

    # Candidate roots; the save and video paths are derived from whichever exists.
    file_paths = [
        "/netdisk/zhukejian",
        "/mnt/userdata"
    ]

    for path in file_paths:
        if os.path.exists(f"{path}/implicit_video_anonotations"):
            save_file = f'{path}/implicit_video_anonotations/results/{model_name}_on_MMR_V_cot.json'
            visual_path = f'{path}/implicit_video_anonotations/static/videos'
            break  # stop at the first root that exists
    else:
        raise FileNotFoundError("No implicit_video_anonotations root found.")

    results = []
    id_set = set()
    id2sample = {}
    if args.continue_eval:
        if os.path.isfile(save_file):
            print(f"Continue eval from file {save_file}")
            results = read_json(save_file)
            results = [elem for elem in results if elem[f"{model_name}_raw_response"] is not None and elem[f"{model_name}_raw_response"] != ""]
            print(f"Load {len(results)} results...")
            id_set = set([get_unique_id(elem) for elem in results])
            id2sample = {get_unique_id(elem): elem for elem in results}
        else:
            print(f"File {save_file} does not exist! Ignoring the continue_eval parameter.")
    elif args.overwrite:
        if os.path.isfile(save_file):
            print(f"Overwriting existing file {save_file}")
        else:
            print(f"File {save_file} does not exist! Ignoring the overwrite parameter.")
    else:
        if os.path.isfile(save_file):
            raise ValueError(f"Save file {save_file} already exists! Please use --continue_eval or --overwrite.")

    client = OpenAI(
        model_version=model_name,
        api_type='openai',
        api_key=args.api_key,
        api_url="https://us.vveai.com/v1/chat/completions",
        default_headers={"x-foo": "true"},
        max_num_frames=32,
    )
    for idx, sample in enumerate(samples):
        # Skip samples that already have a non-empty response from a previous run.
        curr_id = get_unique_id(sample)
        if curr_id in id_set and id2sample[curr_id][f"{model_name}_raw_response"] is not None and id2sample[curr_id][f"{model_name}_raw_response"] != "":
            continue

        print(f"******** idx={idx} **********")
        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # Collect all [[X]] matches and keep the last one.
            json_regex = r'\[\[([A-L])\]\]'
            all_answers = re.findall(json_regex, response)
            if all_answers:
                final_answer = all_answers[-1]
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted last answer: {final_answer}")
            else:
                # Fall back to the \boxed{X} format.
                box_regex = r'\\boxed\{([A-L])\}'
                all_boxed = re.findall(box_regex, response)
                if all_boxed:
                    final_answer = all_boxed[-1]
                    sample[f"{model_name}_response"] = {"final_answer": final_answer}
                    print(f"Extracted last boxed answer: {final_answer}")
                else:
                    print("No matching answer found in response.")
                    sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"

        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)

    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
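The script above supports resuming: it reloads its own results file, skips every sample that already has a non-empty response, and rewrites the file after each sample. A typical invocation might look like the following (the key is a placeholder, and `--continue_eval` already defaults to on):

```shell
python evaluation/gemini-2.5-flash_on_MMR_cot.py \
    --api_key YOUR_API_KEY \
    --continue_eval
```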
--------------------------------------------------------------------------------
/evaluation/gemma-3-12b-it_on_MMR.py:
--------------------------------------------------------------------------------
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
import json
import argparse
import re

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json, read_json
from dataset.load_MMR_V import load_MMR_V

prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]
[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
Directly give the final correct option letter in the following format: "[[X]]"
[[END OF OUTPUT FORMAT]]
"""

def extract_last_option(text):
    """Return the last standalone option letter (A-L) that appears in the text."""
    matches = re.findall(r'\b([A-L])\b', text.upper())
    return matches[-1] if matches else None

def get_unique_id(elem):
    return elem["question"]

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api_url",
        type=str,
        default="https://api.gpt.ge/v1/chat/completions",
        help="URL for the API endpoint."
    )
    parser.add_argument(
        "--api_key",
        type=str,
        help="API key for authentication."
    )
    parser.add_argument(
        "--continue_eval",
        action="store_true",
        default=True,
        help="continue evaluation from existing result file"
    )
    parser.add_argument(
        "--overwrite",
        action="store_true",
        default=False,
        help="overwrite the existing result file"
    )
    args = parser.parse_args()
    samples = load_MMR_V()
    model_name = 'gemma-3-12b-it'

    # Candidate roots; the save and video paths are derived from whichever exists.
    file_paths = [
        "/netdisk/zhukejian",
        "/mnt/userdata"
    ]

    for path in file_paths:
        if os.path.exists(f"{path}/implicit_video_anonotations"):
            save_file = f'{path}/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
            visual_path = f'{path}/implicit_video_anonotations/static/videos'
            break  # stop at the first root that exists
    else:
        raise FileNotFoundError("No implicit_video_anonotations root found.")

    results = []
    id_set = set()
    id2sample = {}
    if args.continue_eval:
        if os.path.isfile(save_file):
            print(f"Continue eval from file {save_file}")
            results = read_json(save_file)
            results = [elem for elem in results if elem[f"{model_name}_raw_response"] is not None]
            print(f"Load {len(results)} results...")
            id_set = set([get_unique_id(elem) for elem in results])
            id2sample = {get_unique_id(elem): elem for elem in results}
        else:
            print(f"File {save_file} does not exist! Ignoring the continue_eval parameter.")
    elif args.overwrite:
        if os.path.isfile(save_file):
            print(f"Overwriting existing file {save_file}")
        else:
            print(f"File {save_file} does not exist! Ignoring the overwrite parameter.")
    else:
        if os.path.isfile(save_file):
            raise ValueError(f"Save file {save_file} already exists! Please use --continue_eval or --overwrite.")

    # The gemma model is served locally, so no API key is required.
    client = OpenAI(
        model_version='/mnt/usercache/zhaosuifeng/model/gemma-3-12b-it/',
        api_type='openai',
        api_key="",
        api_url="http://210.75.240.155:25712/v1/chat/completions",
        default_headers={"x-foo": "true"},
        max_num_frames=16,
    )

    for idx, sample in enumerate(samples):
        # Skip samples that already have a response from a previous run.
        curr_id = get_unique_id(sample)
        if curr_id in id_set and id2sample[curr_id][f"{model_name}_raw_response"] is not None:
            continue

        print(f"******** idx={idx} **********")

        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # First try the expected [[X]] pattern.
            json_regex = r'\[\[([A-L])\]\]'
            match = re.search(json_regex, response)
            if match:
                final_answer = match.group(1)
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted answer: {final_answer}")
            else:
                # Fall back to the \boxed{X} format.
                box_regex = r'\\boxed\{([A-L])\}'
                box_match = re.search(box_regex, response)
                if box_match:
                    final_answer = box_match.group(1)
                    sample[f"{model_name}_response"] = {"final_answer": final_answer}
                    print(f"Extracted answer from boxed pattern: {final_answer}")
                else:
                    # Last resort: take the final standalone option letter in the text.
                    option = extract_last_option(response)
                    if option:
                        sample[f"{model_name}_response"] = {"final_answer": option}
                    else:
                        print("No matching answer found in response.")
                        sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"

        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)

    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
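The gemma scripts layer three extraction strategies: the required `[[X]]` marker, a `\boxed{X}` fallback, and finally the last standalone option letter in the reply. A consolidated sketch of that fallback chain (the helper name is illustrative, not part of the repo):

```python
import re

def extract_final_answer(response: str):
    """Fallback chain used by the scripts above: [[X]] -> \\boxed{X} -> last letter."""
    match = re.search(r'\[\[([A-L])\]\]', response)
    if match:
        return match.group(1)
    match = re.search(r'\\boxed\{([A-L])\}', response)
    if match:
        return match.group(1)
    letters = re.findall(r'\b([A-L])\b', response.upper())
    return letters[-1] if letters else None
```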
--------------------------------------------------------------------------------
/evaluation/gemma-3-12b-it_on_MMR_cot.py:
--------------------------------------------------------------------------------
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
import json
import argparse
import re

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json, read_json
from dataset.load_MMR_V import load_MMR_V

prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]

[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
Your thinking process.
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
give the final correct option letter in the following format: "[[X]]"
The final correct option letter MUST be put inside the "[[]]"
[[END OF OUTPUT FORMAT]]
"""

def extract_last_option(text):
    """Return the last standalone option letter (A-L) that appears in the text."""
    matches = re.findall(r'\b([A-L])\b', text.upper())
    return matches[-1] if matches else None

def get_unique_id(elem):
    return elem["question"]

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api_url",
        type=str,
        default="https://api.gpt.ge/v1/chat/completions",
        help="URL for the API endpoint."
    )
    parser.add_argument(
        "--api_key",
        type=str,
        help="API key for authentication."
    )
    parser.add_argument(
        "--continue_eval",
        action="store_true",
        default=True,
        help="continue evaluation from existing result file"
    )
    parser.add_argument(
        "--overwrite",
        action="store_true",
        default=False,
        help="overwrite the existing result file"
    )
    args = parser.parse_args()
    samples = load_MMR_V()
    model_name = 'gemma-3-12b-it'

    # Candidate roots; the save and video paths are derived from whichever exists.
    file_paths = [
        "/netdisk/zhukejian",
        "/mnt/userdata"
    ]

    for path in file_paths:
        if os.path.exists(f"{path}/implicit_video_anonotations"):
            save_file = f'{path}/implicit_video_anonotations/results/{model_name}_on_MMR_V_cot.json'
            visual_path = f'{path}/implicit_video_anonotations/static/videos'
            break  # stop at the first root that exists
    else:
        raise FileNotFoundError("No implicit_video_anonotations root found.")

    results = []
    id_set = set()
    id2sample = {}
    if args.continue_eval:
        if os.path.isfile(save_file):
            print(f"Continue eval from file {save_file}")
            results = read_json(save_file)
            results = [elem for elem in results if elem[f"{model_name}_raw_response"] is not None]
            print(f"Load {len(results)} results...")
            id_set = set([get_unique_id(elem) for elem in results])
            id2sample = {get_unique_id(elem): elem for elem in results}
        else:
            print(f"File {save_file} does not exist! Ignoring the continue_eval parameter.")
    elif args.overwrite:
        if os.path.isfile(save_file):
            print(f"Overwriting existing file {save_file}")
        else:
            print(f"File {save_file} does not exist! Ignoring the overwrite parameter.")
    else:
        if os.path.isfile(save_file):
            raise ValueError(f"Save file {save_file} already exists! Please use --continue_eval or --overwrite.")

    # The gemma model is served locally, so no API key is required.
    client = OpenAI(
        model_version='/mnt/usercache/zhaosuifeng/model/gemma-3-12b-it/',
        api_type='openai',
        api_key="",
        api_url="http://210.75.240.155:25712/v1/chat/completions",
        default_headers={"x-foo": "true"},
        max_num_frames=16,
    )

    for idx, sample in enumerate(samples):
        # Skip samples that already have a response from a previous run.
        curr_id = get_unique_id(sample)
        if curr_id in id_set and id2sample[curr_id][f"{model_name}_raw_response"] is not None:
            continue

        print(f"******** idx={idx} **********")

        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # First try the expected [[X]] pattern.
            json_regex = r'\[\[([A-L])\]\]'
            match = re.search(json_regex, response)
            if match:
                final_answer = match.group(1)
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted answer: {final_answer}")
            else:
                # Fall back to the \boxed{X} format.
                box_regex = r'\\boxed\{([A-L])\}'
                box_match = re.search(box_regex, response)
                if box_match:
                    final_answer = box_match.group(1)
                    sample[f"{model_name}_response"] = {"final_answer": final_answer}
                    print(f"Extracted answer from boxed pattern: {final_answer}")
                else:
                    # Last resort: take the final standalone option letter in the text.
                    option = extract_last_option(response)
                    if option:
                        sample[f"{model_name}_response"] = {"final_answer": option}
                    else:
                        print("No matching answer found in response.")
                        sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"

        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)

    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
--------------------------------------------------------------------------------
/evaluation/gemma-3-27b-it_on_MMR.py:
--------------------------------------------------------------------------------
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
import json
import argparse
import re

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json, read_json
from dataset.load_MMR_V import load_MMR_V

prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]
[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
Directly give the final correct option letter in the following format: "[[X]]"
[[END OF OUTPUT FORMAT]]
"""

def extract_last_option(text):
    """Return the last standalone option letter (A-L) that appears in the text."""
    matches = re.findall(r'\b([A-L])\b', text.upper())
    return matches[-1] if matches else None

def get_unique_id(elem):
    return elem["question"]

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api_url",
        type=str,
        default="https://api.gpt.ge/v1/chat/completions",
        help="URL for the API endpoint."
    )
    parser.add_argument(
        "--api_key",
        type=str,
        help="API key for authentication."
    )
    parser.add_argument(
        "--continue_eval",
        action="store_true",
        default=True,
        help="continue evaluation from existing result file"
    )
    parser.add_argument(
        "--overwrite",
        action="store_true",
        default=False,
        help="overwrite the existing result file"
    )
    args = parser.parse_args()
    samples = load_MMR_V()
    model_name = 'gemma-3-27b-it'

    # Candidate roots; the save and video paths are derived from whichever exists.
    file_paths = [
        "/netdisk/zhukejian",
        "/mnt/userdata"
    ]

    for path in file_paths:
        if os.path.exists(f"{path}/implicit_video_anonotations"):
            save_file = f'{path}/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
            visual_path = f'{path}/implicit_video_anonotations/static/videos'
            break  # stop at the first root that exists
    else:
        raise FileNotFoundError("No implicit_video_anonotations root found.")

    results = []
    id_set = set()
    id2sample = {}
    if args.continue_eval:
        if os.path.isfile(save_file):
            print(f"Continue eval from file {save_file}")
            results = read_json(save_file)
            results = [elem for elem in results if elem[f"{model_name}_raw_response"] is not None]
            print(f"Load {len(results)} results...")
            id_set = set([get_unique_id(elem) for elem in results])
            id2sample = {get_unique_id(elem): elem for elem in results}
        else:
            print(f"File {save_file} does not exist! Ignoring the continue_eval parameter.")
    elif args.overwrite:
        if os.path.isfile(save_file):
            print(f"Overwriting existing file {save_file}")
        else:
            print(f"File {save_file} does not exist! Ignoring the overwrite parameter.")
    else:
        if os.path.isfile(save_file):
            raise ValueError(f"Save file {save_file} already exists! Please use --continue_eval or --overwrite.")

    # The gemma model is served locally, so no API key is required.
    client = OpenAI(
        model_version='/mnt/usercache/zhaosuifeng/model/gemma-3-27b-it/',
        api_type='openai',
        api_key="",
        api_url="http://210.75.240.154:25712/v1/chat/completions",
        default_headers={"x-foo": "true"},
        max_num_frames=16,
    )

    for idx, sample in enumerate(samples):
        # Skip samples that already have a response from a previous run.
        curr_id = get_unique_id(sample)
        if curr_id in id_set and id2sample[curr_id][f"{model_name}_raw_response"] is not None:
            continue

        print(f"******** idx={idx} **********")

        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # First try the expected [[X]] pattern.
            json_regex = r'\[\[([A-L])\]\]'
            match = re.search(json_regex, response)
            if match:
                final_answer = match.group(1)
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted answer: {final_answer}")
            else:
                # Fall back to the \boxed{X} format.
                box_regex = r'\\boxed\{([A-L])\}'
                box_match = re.search(box_regex, response)
                if box_match:
                    final_answer = box_match.group(1)
                    sample[f"{model_name}_response"] = {"final_answer": final_answer}
                    print(f"Extracted answer from boxed pattern: {final_answer}")
                else:
                    # Last resort: take the final standalone option letter in the text.
                    option = extract_last_option(response)
                    if option:
                        sample[f"{model_name}_response"] = {"final_answer": option}
                    else:
                        print("No matching answer found in response.")
                        sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"

        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)

    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
--------------------------------------------------------------------------------
/evaluation/gemma-3-27b-it_on_MMR_cot.py:
--------------------------------------------------------------------------------
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
import json
import argparse
import re

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json, read_json
from dataset.load_MMR_V import load_MMR_V

prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]

[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
Your thinking process.
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
give the final correct option letter in the following format: "[[X]]"
The final correct option letter MUST be put inside the "[[]]"
[[END OF OUTPUT FORMAT]]
"""

def extract_last_option(text):
    """Return the last standalone option letter (A-L) that appears in the text."""
    matches = re.findall(r'\b([A-L])\b', text.upper())
    return matches[-1] if matches else None

def get_unique_id(elem):
    return elem["question"]

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api_url",
        type=str,
        default="https://api.gpt.ge/v1/chat/completions",
        help="URL for the API endpoint."
    )
    parser.add_argument(
        "--api_key",
        type=str,
        help="API key for authentication."
    )
    parser.add_argument(
        "--continue_eval",
        action="store_true",
        default=True,
        help="continue evaluation from existing result file"
    )
    parser.add_argument(
        "--overwrite",
        action="store_true",
        default=False,
        help="overwrite the existing result file"
    )
    args = parser.parse_args()
    samples = load_MMR_V()
    model_name = 'gemma-3-27b-it'

    # Candidate roots; the save and video paths are derived from whichever exists.
    file_paths = [
        "/netdisk/zhukejian",
        "/mnt/userdata"
    ]

    for path in file_paths:
        if os.path.exists(f"{path}/implicit_video_anonotations"):
            save_file = f'{path}/implicit_video_anonotations/results/{model_name}_on_MMR_V_cot.json'
            visual_path = f'{path}/implicit_video_anonotations/static/videos'
            break  # stop at the first root that exists
    else:
        raise FileNotFoundError("No implicit_video_anonotations root found.")

    results = []
    id_set = set()
    id2sample = {}
    if args.continue_eval:
        if os.path.isfile(save_file):
            print(f"Continue eval from file {save_file}")
            results = read_json(save_file)
            results = [elem for elem in results if elem[f"{model_name}_raw_response"] is not None]
            print(f"Load {len(results)} results...")
            id_set = set([get_unique_id(elem) for elem in results])
            id2sample = {get_unique_id(elem): elem for elem in results}
        else:
            print(f"File {save_file} does not exist! Ignoring the continue_eval parameter.")
    elif args.overwrite:
        if os.path.isfile(save_file):
            print(f"Overwriting existing file {save_file}")
        else:
            print(f"File {save_file} does not exist! Ignoring the overwrite parameter.")
    else:
        if os.path.isfile(save_file):
            raise ValueError(f"Save file {save_file} already exists! Please use --continue_eval or --overwrite.")

    # The gemma model is served locally, so no API key is required.
    client = OpenAI(
        model_version='/mnt/usercache/zhaosuifeng/model/gemma-3-27b-it/',
        api_type='openai',
        api_key="",
        api_url="http://210.75.240.155:25712/v1/chat/completions",
        default_headers={"x-foo": "true"},
        max_num_frames=16,
    )

    for idx, sample in enumerate(samples):
        # Skip samples that already have a response from a previous run.
        curr_id = get_unique_id(sample)
        if curr_id in id_set and id2sample[curr_id][f"{model_name}_raw_response"] is not None:
            continue

        print(f"******** idx={idx} **********")

        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # First try the expected [[X]] pattern.
            json_regex = r'\[\[([A-L])\]\]'
            match = re.search(json_regex, response)
            if match:
                final_answer = match.group(1)
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted answer: {final_answer}")
            else:
                # Fall back to the \boxed{X} format.
                box_regex = r'\\boxed\{([A-L])\}'
                box_match = re.search(box_regex, response)
                if box_match:
                    final_answer = box_match.group(1)
                    sample[f"{model_name}_response"] = {"final_answer": final_answer}
                    print(f"Extracted answer from boxed pattern: {final_answer}")
                else:
                    # Last resort: take the final standalone option letter in the text.
                    option = extract_last_option(response)
                    if option:
                        sample[f"{model_name}_response"] = {"final_answer": option}
                    else:
                        print("No matching answer found in response.")
                        sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"

        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)

    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
--------------------------------------------------------------------------------
/evaluation/gpt-4.1_on_MMR.py:
--------------------------------------------------------------------------------
from dotenv import load_dotenv
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
import json
import re

# Load environment variables from the .env file and read the API key from them.
# NOTE: the environment variable name is an assumption; match it to your .env file.
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json
from dataset.load_MMR_V import load_MMR_V_4o_error, load_MMR_V


prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]

[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
give the final correct option letter in the following format: \"[[X]]\"
[[END OF OUTPUT FORMAT]]
"""

if __name__ == '__main__':
    samples = load_MMR_V()
    model_name = 'gpt-4.1-2025-04-14'
    save_file = f'/netdisk/zhukejian/implicit_video_anonotations/results/{model_name}_on_MMR_V.json'
    visual_path = '/netdisk/zhukejian/implicit_video_anonotations/static/videos'

    client = OpenAI(
        model_version=model_name,
        api_type='openai',
        api_key=api_key,
        api_url="https://api.gpt.ge/v1/chat/completions",
        default_headers={"x-foo": "true"},
    )
    results = []
    for idx, sample in enumerate(samples):
        print(f"******** idx={idx} **********")
        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # Extract the final answer emitted in the required "[[X]]" format.
            json_regex = r'\[\[([ABCDEFGHIJKL])\]\]'
            match = re.search(json_regex, response)

            if match:
                final_answer = match.group(1)
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted answer: {final_answer}")
            else:
                print("No matching answer found in response.")
                sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)
    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
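Each run leaves a JSON results file in which every sample keeps its gold `correctAnswer` (formatted like `"(K)"`) alongside the extracted `final_answer`. A rough offline scoring sketch under those assumptions (the helper and the example path are illustrative):

```python
import json

def accuracy(result_file: str, model_name: str) -> float:
    # Hypothetical helper: assumes the result format written by the scripts above.
    with open(result_file) as f:
        results = json.load(f)
    scored = [r for r in results if f"{model_name}_response" in r]
    hits = sum(
        r[f"{model_name}_response"]["final_answer"] == r["correctAnswer"].strip("()")
        for r in scored
    )
    return hits / len(scored) if scored else 0.0

print(accuracy("results/gpt-4.1-2025-04-14_on_MMR_V.json", "gpt-4.1-2025-04-14"))
```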
--------------------------------------------------------------------------------
/evaluation/gpt-4.1_on_MMR_cot.py:
--------------------------------------------------------------------------------
from dotenv import load_dotenv
import sys
import os
sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
import json
import re

# Load environment variables from the .env file and read the API key from them.
# NOTE: the environment variable name is an assumption; match it to your .env file.
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

from loguru import logger as eval_logger
from utils.video_utils import OpenAI, VIDEO_TOKEN
from utils import write_to_json
from dataset.load_MMR_V import load_MMR_V_4o_error, load_MMR_V


prompt_template = """
[[INSTRUCTIONS]]
Please select the best answer to the following multiple-choice question based on the video.
Only one option is the most accurate answer in relation to the question and the video.

What is the correct answer to this question [[QUESTION]]
Options:
[[OPTIONS]]

Let's think step by step.

[[END OF INSTRUCTIONS]]
[[QUESTION]]
{question}
[[END OF QUESTION]]
[[OPTIONS]]
{options}
[[END OF OPTIONS]]
[[OUTPUT FORMAT]]
Format your answer as follows:
Your thinking process.
If the correct option letter (A, B, C, D...) for the multiple-choice question is X,
give the final correct option letter in the following format: \"[[X]]\"
[[END OF OUTPUT FORMAT]]
"""

if __name__ == '__main__':
    samples = load_MMR_V()
    model_name = 'gpt-4.1-2025-04-14'
    save_file = f'/netdisk/zhukejian/implicit_video_anonotations/results/{model_name}_on_MMR_V_cot.json'
    visual_path = '/netdisk/zhukejian/implicit_video_anonotations/static/videos'

    client = OpenAI(
        model_version=model_name,
        api_type='openai',
        api_key=api_key,
        api_url="https://api.gpt.ge/v1/chat/completions",
        default_headers={"x-foo": "true"},
    )
    results = []
    for idx, sample in enumerate(samples):
        print(f"******** idx={idx} **********")
        video_path = os.path.join(visual_path, sample["video"])
        question = sample["question"]
        options = sample["options"]
        full_prompt = prompt_template.format(
            question=question,
            options=options,
        )

        response = client.generate(
            visuals=video_path,
            contexts=f'{full_prompt} {VIDEO_TOKEN}'
        )
        print(response)
        sample[f"{model_name}_raw_response"] = response

        if isinstance(response, str):
            # Extract the final answer emitted in the required "[[X]]" format.
            json_regex = r'\[\[([ABCDEFGHIJKL])\]\]'
            match = re.search(json_regex, response)

            if match:
                final_answer = match.group(1)
                sample[f"{model_name}_response"] = {"final_answer": final_answer}
                print(f"Extracted answer: {final_answer}")
            else:
                print("No matching answer found in response.")
                sample[f"{model_name}_raw_response"] = response  # still keep the raw response for inspection
        else:
            print("Invalid response type received.")
            sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
        results.append(sample)
        # Write the results to the output file
        write_to_json(results, save_file, indent=4)
    eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
    eval_logger.info("Finished Running!")
--------------------------------------------------------------------------------
/evaluation/o4-mini_on_MMR.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv
2 | import sys
3 | import os
4 | sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
5 | sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
6 | import json
7 | import re
8 | # 加载 .env 文件中的环境变量
9 | # load_dotenv()
10 | # 从环境变量中获取 API 密钥
11 | from loguru import logger as eval_logger
12 | from utils.video_utils import OpenAI,VIDEO_TOKEN
13 | from utils import write_to_json
14 | from dataset.load_MMR_V import load_MMR_V
15 |
16 | prompt_template = """
17 | [[INSTRUCTIONS]]
18 | Please select the best answer to the following multiple-choice question based on the video.
19 | Only one option is the most accurate answer in relation to the question and the video.
20 |
21 | What is the correct answer to this question [[QUESTION]]
22 | Options:
23 | [[OPTIONS]]
24 | [[END OF INSTRUCTIONS]]
25 | [[QUESTION]]
26 | {question}
27 | [[END OF QUESTION]]
28 | [[OPTIONS]]
29 | {options}
30 | [[END OF OPTIONS]]
31 | [[OUTPUT FORMAT]]
32 | Format your answer as follows:
33 | If the correct option letter (A, B, C, D, ...) for the multiple-choice question is X,
34 | directly give the final correct option letter in the following format: "[[X]]"
35 | [[END OF OUTPUT FORMAT]]
36 | """
37 |
38 | if __name__ == '__main__':
40 |
41 | samples = load_MMR_V()
42 | model_name = 'o4-mini-2025-04-16'
45 |
46 |     file_paths = [
48 |         "/netdisk/zhukejian",
49 |         "/mnt/userdata"
50 |     ]
51 |
52 |     for path in file_paths:
53 |         if os.path.exists(f"{path}/implicit_video_anonotations"):
54 |             save_file = f'{path}/implicit_video_anonotations/results/{model_name}_on_MMR_V_part2.json'
55 |             visual_path = f'{path}/implicit_video_anonotations/static/videos'
56 |             break  # stop once a valid root is found
57 |
58 | client = OpenAI(
59 | model_version=model_name,
60 | api_type='openai',
61 | api_key=api_key,
62 | api_url="https://us.vveai.com/v1/chat/completions",
63 | default_headers={"x-foo": "true"},
64 | max_num_frames=32,
65 | )
67 | results = []
68 |     for idx, sample in enumerate(samples):
69 | print(f"******** idx={idx} **********")
70 |         if idx < 497:  # resume from an earlier interrupted run
71 |             continue
72 |         video_path = os.path.join(visual_path, sample["video"])
73 | question = sample["question"]
74 | options = sample["options"]
75 | full_prompt = prompt_template.format(
76 | question=question,
77 | options=options,
78 | )
79 |
80 | response = client.generate(
81 | visuals=video_path,
82 | contexts=f'{full_prompt} {VIDEO_TOKEN}'
83 | )
84 | print(response)
85 | sample[f"{model_name}_raw_response"] = response
86 |
87 | if isinstance(response, str):
88 |             # First try the [[X]] pattern requested in the prompt
89 | json_regex = r'\[\[([A-L])\]\]'
90 | match = re.search(json_regex, response)
91 | if match:
92 | final_answer = match.group(1)
93 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
94 | print(f"Extracted answer: {final_answer}")
95 | else:
96 |                 # Fall back to the \boxed{X} pattern some models emit
97 | box_regex = r'\\boxed\{([A-L])\}'
98 | box_match = re.search(box_regex, response)
99 | if box_match:
100 | final_answer = box_match.group(1)
101 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
102 | print(f"Extracted answer from boxed pattern: {final_answer}")
103 | else:
104 | print("No matching answer found in response.")
105 |                     # still keep the raw response for inspection
106 | sample[f"{model_name}_raw_response"] = response
107 | else:
108 | print("Invalid response type received.")
109 | sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
110 |
111 | results.append(sample)
112 | # Write the results to the output file
113 | write_to_json(results, save_file, indent=4)
114 |
115 | eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
116 | eval_logger.info("Finished Running!")
117 |
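The hard-coded `if idx < 497: continue` resumes an interrupted run at a fixed offset. A more robust variant, as a sketch (assuming the results file is the JSON list written by `write_to_json`), derives the offset from what has already been saved:

```python
import json
import os

def resume_index(save_file: str) -> int:
    """Return how many samples were already saved, so a rerun can skip them."""
    if not os.path.exists(save_file):
        return 0
    with open(save_file, 'r', encoding='utf-8') as f:
        return len(json.load(f))

# Inside the main loop, replacing the fixed offset:
#     start = resume_index(save_file)
#     for idx, sample in enumerate(samples):
#         if idx < start:
#             continue
```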
--------------------------------------------------------------------------------
/evaluation/o4-mini_on_MMR_cot.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv
2 | import sys
3 | import os
4 | sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
5 | sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
6 | import json
7 | import re
8 | # Load environment variables from the .env file
9 | load_dotenv()
10 | api_key = os.getenv("OPENAI_API_KEY")  # fix: `api_key` was used below but never defined; the variable name is an assumption
11 | from loguru import logger as eval_logger
12 | from utils.video_utils import OpenAI,VIDEO_TOKEN
13 | from utils import write_to_json
14 | from dataset.load_MMR_V import load_MMR_V
15 |
16 | prompt_template = """
17 | [[INSTRUCTIONS]]
18 | Please select the best answer to the following multiple-choice question based on the video.
19 | Only one option is the most accurate answer in relation to the question and the video.
20 |
21 | What is the correct answer to this question [[QUESTION]]
22 | Options:
23 | [[OPTIONS]]
24 | Let's think step by step.
25 | [[END OF INSTRUCTIONS]]
26 | [[QUESTION]]
27 | {question}
28 | [[END OF QUESTION]]
29 | [[OPTIONS]]
30 | {options}
31 | [[END OF OPTIONS]]
32 | [[OUTPUT FORMAT]]
33 | Format your answer as follows:
34 | Your thinking process.
35 | If the correct option letter (A, B, C, D, ...) for the multiple-choice question is X,
36 | give the final correct option letter in the following format: "[[X]]"
37 | [[END OF OUTPUT FORMAT]]
38 | """
39 |
40 | if __name__ == '__main__':
42 |
43 | samples = load_MMR_V()
44 | model_name = 'o4-mini-2025-04-16'
47 |     file_paths = [
49 |         "/netdisk/zhukejian",
50 |         "/mnt/userdata"
51 |     ]
52 |
53 |     for path in file_paths:
54 |         if os.path.exists(f"{path}/implicit_video_anonotations"):
55 |             save_file = f'{path}/implicit_video_anonotations/results/{model_name}_on_MMR_V_cot_part2.json'
56 |             visual_path = f'{path}/implicit_video_anonotations/static/videos'
57 |             break  # stop once a valid root is found
58 |
59 |
60 | client = OpenAI(
61 | model_version=model_name,
62 | api_type='openai',
63 | api_key=api_key,
64 | api_url="https://api.gpt.ge/v1/chat/completions",
65 | default_headers={"x-foo": "true"},
66 | max_num_frames=32,
67 | )
69 | results = []
70 |     for idx, sample in enumerate(samples):
71 | print(f"******** idx={idx} **********")
72 |         if idx < 925:  # resume from an earlier interrupted run
73 |             continue
75 |         video_path = os.path.join(visual_path, sample["video"])
76 | question = sample["question"]
77 | options = sample["options"]
78 | full_prompt = prompt_template.format(
79 | question=question,
80 | options=options,
81 | )
82 |
83 | response = client.generate(
84 | visuals=video_path,
85 |             contexts=f'{full_prompt} {VIDEO_TOKEN}'
86 | )
87 | print(response)
88 | sample[f"{model_name}_raw_response"] = response
89 |
90 | if isinstance(response, str):
91 |             # First try the [[X]] pattern requested in the prompt
92 | json_regex = r'\[\[([A-L])\]\]'
93 | match = re.search(json_regex, response)
94 | if match:
95 | final_answer = match.group(1)
96 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
97 | print(f"Extracted answer: {final_answer}")
98 | else:
99 |                 # Fall back to the \boxed{X} pattern some models emit
100 | box_regex = r'\\boxed\{([A-L])\}'
101 | box_match = re.search(box_regex, response)
102 | if box_match:
103 | final_answer = box_match.group(1)
104 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
105 | print(f"Extracted answer from boxed pattern: {final_answer}")
106 | else:
107 | print("No matching answer found in response.")
108 |                     # still keep the raw response for inspection
109 | sample[f"{model_name}_raw_response"] = response
110 | else:
111 | print("Invalid response type received.")
112 | sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
113 |
114 | results.append(sample)
115 | # Write the results to the output file
116 | write_to_json(results, save_file, indent=4)
117 |
118 | eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
119 | eval_logger.info("Finished Running!")
120 |
--------------------------------------------------------------------------------
/evaluation/qwen2.5-VL-72B_on_MMR.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv
2 | import os
3 | # Load environment variables from the .env file
4 | # load_dotenv()
5 | import sys
6 | import re
7 | sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
8 | sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
10 | from loguru import logger as eval_logger
12 | # Set the API key (placeholder value; replace it with your own)
13 | os.environ['DASHSCOPE_API_KEY'] = 'your_api_key_here'
14 |
15 | prompt_template = """
16 | [[INSTRUCTIONS]]
17 | Please select the best answer to the following multiple-choice question based on the video.
18 | Only one option is the most accurate answer in relation to the question and the video.
19 |
20 | What is the correct answer to this question [[QUESTION]]
21 | Options:
22 | [[OPTIONS]]
23 |
24 | [[END OF INSTRUCTIONS]]
25 | [[QUESTION]]
26 | {question}
27 | [[END OF QUESTION]]
28 | [[OPTIONS]]
29 | {options}
30 | [[END OF OPTIONS]]
31 | [[OUTPUT FORMAT]]
32 | Format your answer as follows:
33 |
34 | Give the final correct option letter in the following format: \"[[A]]\" or \"[[B]]\" or \"[[C]]\" or \"[[D]]\" ...
35 | [[END OF OUTPUT FORMAT]]
36 | """
37 |
38 |
39 | api_key = os.getenv('DASHSCOPE_API_KEY')
40 |
42 | from utils.video_utils import OpenAI,VIDEO_TOKEN
43 | from utils import write_to_json
44 | from dataset.load_MMR_V import load_MMR_V
45 | if __name__ == '__main__':
47 |     samples = load_MMR_V()
49 | model_name = 'Qwen2.5-VL-72B-Instruct'
50 |     save_file = f'/mnt/userdata/implicit_video_anonotations/results/{model_name}_on_MMR.json'
51 |     visual_path = '/mnt/userdata/implicit_video_anonotations/static/videos'
52 | results = []
53 | client = OpenAI(
54 |         model_version='/mnt/usercache/zhuoran/rl/Qwen2.5-VL-72B-Instruct',
55 | api_type='openai',
56 | api_key=api_key,
57 | api_url="http://210.75.240.153:22277/v1/chat/completions",
58 | )
59 |
60 |     # Process one sample at a time (no batching)
61 | for idx, sample in enumerate(samples):
62 | print(f"******** idx={idx} **********")
63 |         if idx < 595:  # resume from an earlier interrupted run
64 |             continue
68 | video_path = os.path.join(visual_path, sample["video"])
69 | question = sample["question"]
70 | options = sample["options"]
71 | full_prompt = prompt_template.format(
72 | question=question,
73 | options=options,
74 | )
75 |
76 | response = client.generate(
77 | visuals=video_path,
78 | contexts=f'{full_prompt} {VIDEO_TOKEN}'
79 | )
80 | print(response)
81 |
82 | sample[f"{model_name}_raw_response"] = response
83 |
84 | if isinstance(response, str):
85 | json_regex = r'\[\[([ABCDEFGHIJKL])\]\]'
86 | match = re.search(json_regex, response)
87 | if match:
88 | final_answer = match.group(1)
89 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
90 | print(f"Extracted answer: {final_answer}")
91 | else:
92 | print("No matching answer found in response.")
93 | sample[f"{model_name}_raw_response"] = response
94 | else:
95 | print("Invalid response type received.")
96 | sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
97 | results.append(sample)
99 |         # Write the accumulated results after every sample so progress survives an interruption
100 | write_to_json(results, save_file, indent=4)
101 |
102 | eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
103 | eval_logger.info("Finished Running!")
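
Once a results file has been written, accuracy can be computed offline. A sketch is given below; the helper name `score_results` is not part of the repo, and it assumes each sample carries its gold answer in `correctAnswer` (e.g. `"(K)"`) and counts unparseable responses as wrong:

```python
import json

def score_results(result_file: str, model_name: str) -> float:
    """Accuracy of one model over a saved MMR-V results file."""
    with open(result_file, 'r', encoding='utf-8') as f:
        results = json.load(f)
    correct = 0
    for sample in results:
        pred = sample.get(f"{model_name}_response", {}).get("final_answer")
        # Compare "(K)"-style gold answers against the extracted letter.
        if pred is not None and f"({pred})" == sample["correctAnswer"]:
            correct += 1
    return correct / len(results) if results else 0.0
```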
--------------------------------------------------------------------------------
/evaluation/qwen2.5-VL-7B_on_MMR_cot.py:
--------------------------------------------------------------------------------
1 | from dotenv import load_dotenv
2 | import os
3 | # Load environment variables from the .env file
4 | # load_dotenv()
5 | import sys
6 | import re
7 | sys.path.append(os.path.abspath("/netdisk/zhukejian/implicit_video_anonotations"))
8 | sys.path.append(os.path.abspath("/mnt/userdata/implicit_video_anonotations"))
10 |
12 | # Set the API key (placeholder value; replace it with your own)
13 | os.environ['DASHSCOPE_API_KEY'] = 'your_api_key_here'
14 |
15 | prompt_template = """
16 | [[INSTRUCTIONS]]
17 | Please select the best answer to the following multiple-choice question based on the video.
18 | Only one option is the most accurate answer in relation to the question and the video.
19 |
20 | What is the correct answer to this question [[QUESTION]]
21 | Options:
22 | [[OPTIONS]]
23 |
24 | Let's think step by step.
25 | [[END OF INSTRUCTIONS]]
26 | [[QUESTION]]
27 | {question}
28 | [[END OF QUESTION]]
29 | [[OPTIONS]]
30 | {options}
31 | [[END OF OPTIONS]]
32 | [[OUTPUT FORMAT]]
33 | Format your answer as follows:
34 | [Analysis of the best option for the question.]
35 | [Justification for your final choice based on the thinking process.]
36 |
37 | Give the final correct option letter in the following format: \"[[A]]\" or \"[[B]]\" or \"[[C]]\" or \"[[D]]\" ...
38 | [[END OF OUTPUT FORMAT]]
39 | """
40 |
41 |
42 | api_key = os.getenv('DASHSCOPE_API_KEY')
43 |
45 | from utils.video_utils import OpenAI,VIDEO_TOKEN
46 | from utils import write_to_json
47 | from dataset.load_MMR_V import load_MMR_V
48 | if __name__ == '__main__':
50 |     samples = load_MMR_V()
52 | model_name = 'Qwen2.5-VL-7B-Instruct'
53 | save_file = f'/mnt/userdata/implicit_video_anonotations/results/{model_name}_on_MMR_cot.json'
54 | visual_path = '/mnt/userdata/implicit_video_anonotations/static/videos'
55 | results = []
56 | client = OpenAI(
57 | model_version='/mnt/usercache/zhuoran/rl/Qwen2.5-VL-7B-Instruct',
58 | api_type='openai',
59 | api_key=api_key,
60 | api_url="http://210.75.240.153:22345/v1/chat/completions",
61 | max_num_frames=8,
62 | )
63 |
64 |     # Process one sample at a time (no batching)
65 | for idx, sample in enumerate(samples):
66 | print(f"******** idx={idx} **********")
67 | video_path = os.path.join(visual_path, sample["video"])
68 | question = sample["question"]
69 | options = sample["options"]
70 | full_prompt = prompt_template.format(
71 | question=question,
72 | options=options,
73 | )
74 |
75 | response = client.generate(
76 | visuals=video_path,
77 | contexts=f'{full_prompt} {VIDEO_TOKEN}'
78 | )
79 | print(response)
80 |
81 | sample[f"{model_name}_raw_response"] = response
82 |
83 | if isinstance(response, str):
84 | json_regex = r'\[\[([ABCDEFGHIJKL])\]\]'
85 | match = re.search(json_regex, response)
86 | if match:
87 | final_answer = match.group(1)
88 | sample[f"{model_name}_response"] = {"final_answer": final_answer}
89 | print(f"Extracted answer: {final_answer}")
90 | else:
91 | print("No matching answer found in response.")
92 | sample[f"{model_name}_raw_response"] = response
93 | else:
94 | print("Invalid response type received.")
95 | sample[f"{model_name}_raw_response"] = "Error: Invalid response type"
96 | results.append(sample)
97 |
98 |     # Write all results to the output file
99 | write_to_json(results, save_file, indent=4)
100 |
101 | eval_logger.info(f"Successfully wrote {len(results)} results to {save_file}!")
102 | eval_logger.info("Finished Running!")
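
The clients above cap sampling via `max_num_frames` (8 for this 7B run, 32 for the o4-mini runs), which bounds how much long-range visual evidence the model can see. A minimal sketch of uniform frame sampling with OpenCV, assuming that is roughly what the cap implies inside `utils/video_utils.py`:

```python
import cv2

def sample_frames(video_path: str, max_num_frames: int = 8):
    """Uniformly sample up to max_num_frames frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    num = min(max_num_frames, total)
    frames = []
    for i in range(num):
        # Jump to evenly spaced frame indices across the whole video.
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / num))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```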
--------------------------------------------------------------------------------
/figs/LOGO_v3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/LOGO_v3.png
--------------------------------------------------------------------------------
/figs/ability_type.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/ability_type.pdf
--------------------------------------------------------------------------------
/figs/accuracy_vs_frames_00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/accuracy_vs_frames_00.png
--------------------------------------------------------------------------------
/figs/audio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/audio.png
--------------------------------------------------------------------------------
/figs/construction_pipeline_00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/construction_pipeline_00.png
--------------------------------------------------------------------------------
/figs/data_example_intro_v4_5_16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/data_example_intro_v4_5_16.png
--------------------------------------------------------------------------------
/figs/enhanced_video_categories_fixed.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/enhanced_video_categories_fixed.pdf
--------------------------------------------------------------------------------
/figs/error analysis_00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/error analysis_00.png
--------------------------------------------------------------------------------
/figs/main.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/main.png
--------------------------------------------------------------------------------
/figs/main_results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/main_results.png
--------------------------------------------------------------------------------
/figs/o4-compare.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/o4-compare.pdf
--------------------------------------------------------------------------------
/figs/o4-compare_00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/o4-compare_00.png
--------------------------------------------------------------------------------
/figs/task_analysis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/task_analysis.pdf
--------------------------------------------------------------------------------
/figs/task_analysis_00.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/task_analysis_00.png
--------------------------------------------------------------------------------
/figs/task_analysis_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/task_analysis_final.png
--------------------------------------------------------------------------------
/figs/video_type.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GaryStack/MMR-V/87ac4e5309c411b090c956ef73b02d2ffe7080b5/figs/video_type.pdf
--------------------------------------------------------------------------------
/human_exp/app.py:
--------------------------------------------------------------------------------
1 | from flask import Flask, render_template, jsonify, send_from_directory
2 | import json
3 | import os
4 |
5 | app = Flask(__name__)
6 |
7 | # Home page: question list
8 | @app.route('/')
9 | def index():
10 | return render_template("index.html")
11 |
12 | # Single-question answering page
13 | @app.route('/question')
14 | def question():
15 | return render_template("question.html")
16 |
17 | # Submission results page
18 | @app.route('/result')
19 | def result():
20 | return render_template("result.html")
21 |
22 | # JSON data endpoint
23 | @app.route('/questions.json')
24 | def get_questions():
25 | try:
26 | with open("questions.json", "r", encoding="utf-8") as f:
27 | questions = json.load(f)
28 | return jsonify(questions)
29 | except Exception as e:
30 |         return jsonify({"error": "Failed to load question data", "details": str(e)}), 500
31 |
32 | # Serve video files (assumes videos live under the directory below)
33 | @app.route('/netdisk/zhukejian/implicit_video_anonotations/static/videos/<path:filename>')
34 | def serve_video(filename):
35 |     # Note: make sure Flask has read access to this directory in your environment.
36 | video_directory = "/netdisk/zhukejian/implicit_video_anonotations/static/videos"
37 | if os.path.exists(os.path.join(video_directory, filename)):
38 | return send_from_directory(video_directory, filename)
39 | else:
40 |         return f"Video file {filename} does not exist!", 404
41 |
42 | if __name__ == "__main__":
43 |     # Start the Flask development server
44 | app.run(debug=True)
45 |
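To try the human-study server locally, run `python human_exp/app.py` and query the JSON endpoint. A quick smoke test, assuming Flask's default port 5000 and a `questions.json` next to `app.py`:

```python
import json
import urllib.request

# Fetch the question list served by the /questions.json endpoint.
with urllib.request.urlopen("http://127.0.0.1:5000/questions.json") as resp:
    questions = json.loads(resp.read().decode("utf-8"))
print(f"Loaded {len(questions)} questions")
```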
--------------------------------------------------------------------------------
/human_exp/format.py:
--------------------------------------------------------------------------------
1 | import json
2 | import random
3 |
4 | # Configure the file path and key names
5 | file_path = "/netdisk/zhukejian/implicit_video_anonotations/human_exp/questions.json"
6 | keys_to_remove = ['error_info', 'gpt-4o_raw_response', 'gpt-4o_response', 'correctAnswer']  # keys to delete (adjust as needed)
7 | keys_to_add = ['human_answer', 'cost_time', 'explanation']  # keys to add (adjust as needed)
8 |
9 | try:
10 |     with open(file_path, 'r', encoding='utf-8') as f:
11 |         data = json.load(f)
12 | except Exception as e:
13 |     print(f"Failed to read file: {e}")
14 |     exit()
15 |
16 | # Deletion logic: check key existence explicitly first
17 | for sample in data:
18 | for key in keys_to_remove:
19 |         if key in sample:  # explicit existence check
20 |             sample.pop(key)
21 |             print(f"Removed key: {key}")
22 |         else:
23 |             print(f"Key not present, skipping removal: {key}")
25 | # Add new keys (with empty-string values)
26 | for sample in data:
27 | for key in keys_to_add:
28 |         if key not in sample:  # optional: avoid overwriting existing keys
29 |             sample[key] = ""
30 |             print(f"Added key: {key}")
31 |         else:
32 |             print(f"Key already exists, skipping: {key}")
33 |
34 | random.shuffle(data)
35 |
36 | # Write the result back to a new file
37 | output_file_path = "/netdisk/zhukejian/implicit_video_anonotations/human_exp/human_exp_questions.json"
38 | try:
39 |     with open(output_file_path, 'w', encoding='utf-8') as f:
40 |         json.dump(data, f, indent=4, ensure_ascii=False)
41 |     print("File updated successfully!")
42 | except Exception as e:
43 |     print(f"Failed to write file: {e}")
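
Since this script strips the gold answers before the human study, a quick sanity check that the published file leaks none of the removed keys might look like the sketch below (the path is the output file written above):

```python
import json

REMOVED = {'error_info', 'gpt-4o_raw_response', 'gpt-4o_response', 'correctAnswer'}

with open("/netdisk/zhukejian/implicit_video_anonotations/human_exp/human_exp_questions.json",
          'r', encoding='utf-8') as f:
    data = json.load(f)

# Collect any removed key that still appears in a sample.
leaked = sorted({k for sample in data for k in REMOVED if k in sample})
assert not leaked, f"Leaked keys: {leaked}"
print(f"OK: {len(data)} questions, no leaked keys")
```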
--------------------------------------------------------------------------------
/human_exp/index.html:
--------------------------------------------------------------------------------
(HTML markup stripped during extraction; only the page title survives: "评测系统 - 主页", i.e. "Evaluation System - Home".)
--------------------------------------------------------------------------------
/human_exp/question.html:
--------------------------------------------------------------------------------
(HTML markup stripped during extraction; only the page title survives: "Question".)