├── .gitattributes
├── .gitignore
├── CONTRIBUTING.md
├── README.md
├── automation
│   ├── README.md
│   ├── data.csv
│   ├── requirements.txt
│   └── script.py
└── experiments
    ├── automated-voiceover-of-nba-game
    │   ├── README.md
    │   └── notebook.ipynb
    ├── gpt4v-classification
    │   ├── README.md
    │   ├── app.py
    │   ├── fish.jpg
    │   └── requirements.txt
    ├── gpt4v-grounding-dino-detection
    │   ├── README.md
    │   ├── app.py
    │   ├── mercedes.jpeg
    │   └── requirements.txt
    ├── gpt4v-narration
    │   ├── README.md
    │   ├── app.py
    │   ├── requirements.txt
    │   └── wearable.mp4
    ├── gpt4v-vs-clip
    │   ├── README.md
    │   ├── app.py
    │   ├── camry.jpeg
    │   ├── deep-dish.jpg
    │   ├── images.jpeg
    │   └── requirements.txt
    ├── hot-dog-not-hot-dog
    │   ├── README.md
    │   ├── app.py
    │   └── requirements.txt
    └── webcam-gpt
        ├── README.md
        ├── app.py
        └── requirements.txt
/.gitattributes:
--------------------------------------------------------------------------------
1 | *.ipynb linguist-vendored
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/
2 | venv/
3 | data/
4 | .DS_Store
5 | __pycache__/
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | ## 🦸 Contributing to awesome-openai-vision-api-experiments
2 |
3 | We love your input! We want to make contributing to awesome-openai-vision-api-experiments as easy and transparent as possible, whether it's:
4 |
5 | - Reporting a bug
6 | - Discussing the current state of the code
7 | - Submitting a fix
8 |
9 | ## 🧪️ Adding a new experiment
10 |
11 | - **We only accept experiments where the code is open-sourced.**
12 | - Add a new subdirectory to the `experiments` directory.
13 | - Add a new entry to the `automation/data.csv` file.
14 | - Run `automation/script.py`. The experiments table in `README.md` will update
15 |   automatically.
16 | - Commit your changes to a feature branch and open a PR (an example workflow is sketched below).
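17 | 
18 | A minimal sketch of that workflow, assuming a hypothetical experiment named
19 | `my-experiment` (all names below are placeholders):
20 | 
21 | ```bash
22 | # create a feature branch and a new experiment directory
23 | git checkout -b add-my-experiment
24 | mkdir experiments/my-experiment
25 | # add README.md, app.py, requirements.txt, etc. to the new directory,
26 | # then append a row describing the experiment to automation/data.csv
27 | 
28 | # regenerate the experiments table in README.md
29 | python automation/script.py
30 | 
31 | git add experiments/my-experiment automation/data.csv README.md
32 | git commit -m "add my-experiment"
33 | ```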
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # OpenAI Vision API Experiments 🧪
2 | 
3 | ## 👋 Hello
4 |
5 | The must-have resource for anyone who wants to experiment with and build on the [OpenAI
6 | Vision API](https://platform.openai.com/docs/guides/vision). This repository serves as
7 | a hub for innovative experiments, showcasing a variety of applications ranging from
8 | simple image classifications to advanced zero-shot learning models. It's a space for
9 | both beginners and experts to explore the capabilities of the Vision API, share their
10 | findings, and collaborate on pushing the boundaries of visual AI.
11 |
12 | Experimenting with the OpenAI API requires an API 🔑. You can get one
13 | [here](https://platform.openai.com/api-keys).
14 |
15 | ## ⚠️ Limitations
16 |
17 | - A limit of 100 API requests per API key per day.
18 | - The API can't be used directly for object detection or image segmentation. You can work around this by combining GPT-4V with foundation models like GroundingDINO or Segment Anything (SAM). Please take a look at the [example](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-grounding-dino-detection) and read our [blog post](https://blog.roboflow.com/dino-gpt-4v).
19 |
20 | ## 🧪 Experiments
21 |
22 |
23 |
27 | | **experiment** | **complementary materials** | **authors** |
28 | |:--------------:|:---------------------------:|:-----------:|
29 | | WebcamGPT - chat with video stream | [GitHub](https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/webcam-gpt) [Hugging Face Space](https://huggingface.co/spaces/Roboflow/webcamGPT) | @SkalskiP |
30 | | HotDogGPT - simple image classification application | [GitHub](https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/hot-dog-not-hot-dog) [Hugging Face Space](https://huggingface.co/spaces/Roboflow/HotDogGPT) | @SkalskiP |
31 | | zero-shot image classifier with GPT-4V | [GitHub](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-classification) | @capjamesg |
32 | | zero-shot object detection with GroundingDINO + GPT-4V | [GitHub](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-grounding-dino-detection) [Hugging Face Space](https://huggingface.co/spaces/Roboflow/DINO-GPT4V) | @capjamesg |
33 | | GPT-4V vs. CLIP | [GitHub](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-vs-clip) | @capjamesg |
34 | | GPT-4V with Set-of-Mark (SoM) | [GitHub](https://github.com/microsoft/SoM) | Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao |
35 | | GPT-4V on Web | [GitHub](https://github.com/Jiayi-Pan/GPT-V-on-Web) | @Jiayi-Pan |
36 | | automated voiceover of NBA game | [GitHub](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/automated-voiceover-of-nba-game) [Colab](https://colab.research.google.com/github/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/automated-voiceover-of-nba-game/notebook.ipynb) | @SkalskiP |
37 | | screenshot-to-code | [GitHub](https://github.com/abi/screenshot-to-code) | @abi |
38 | | GPT with Vision Checkup | [GitHub](https://github.com/roboflow/gpt-checkup) | Roboflow team |
39 |
40 | https://github.com/roboflow/awesome-openai-vision-api-experiments/assets/26109316/c63fa3c0-4564-49ee-8982-a9e6a23dae9b
41 |
42 | ## 🗞️ Must Read Papers
43 |
44 | - [Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V](https://arxiv.org/abs/2310.11441)
45 | by Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao
46 | - [The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)](https://arxiv.org/abs/2309.17421)
47 | by Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang
48 | - [GPT-4 System Card](https://cdn.openai.com/papers/gpt-4-system-card.pdf) by OpenAI
49 |
50 | ## 🖊️ Blogs
51 |
52 | - [How CLIP and GPT-4V Compare for Classification](https://blog.roboflow.com/clip-vs-gpt-4v/)
53 | - [Experiments with GPT-4V for Object Detection](https://blog.roboflow.com/gpt-4v-object-detection/)
54 | - [Distilling GPT-4 for Classification with an API](https://blog.roboflow.com/gpt-4-image-classification/)
55 | - [DINO-GPT4-V: Use GPT-4V in a Two-Stage Detection Model](https://blog.roboflow.com/dino-gpt-4v/)
56 | - [First Impressions with GPT-4V(ision)](https://blog.roboflow.com/gpt-4-vision/)
57 |
58 | ## 🦸 Contribution
59 |
60 | We would love your help in making this repository even better! Whether you want to
61 | add a new experiment or have any suggestions for improvement,
62 | feel free to open an [issue](https://github.com/roboflow/awesome-openai-vision-api-experiments/issues)
63 | or [pull request](https://github.com/roboflow/awesome-openai-vision-api-experiments/pulls).
64 |
65 | If you are up to the task and want to add a new experiment, please look at our [contribution guide](https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/CONTRIBUTING.md). There you can find all the information you need.
66 |
--------------------------------------------------------------------------------
/automation/README.md:
--------------------------------------------------------------------------------
1 | ## Install
2 |
3 | ```bash
4 | # create and activate virtual environment
5 | python3 -m venv venv
6 | source venv/bin/activate
7 |
8 | pip install -r automation/requirements.txt
9 | ```
10 |
11 | ## Generate
12 |
13 | ```bash
14 | python automation/script.py
15 | ```
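16 | 
17 | `script.py` also accepts optional flags when the files live somewhere else (the
18 | values below are the defaults):
19 | 
20 | ```bash
21 | python automation/script.py --data_path automation/data.csv --readme_path README.md
22 | ```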
--------------------------------------------------------------------------------
/automation/data.csv:
--------------------------------------------------------------------------------
1 | title, code, huggingface, colab, authors
2 | "WebcamGPT - chat with video stream","https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/webcam-gpt","https://huggingface.co/spaces/Roboflow/webcamGPT","",@SkalskiP
3 | "HotDogGPT - simple image classification application","https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/hot-dog-not-hot-dog","https://huggingface.co/spaces/Roboflow/HotDogGPT","",@SkalskiP
4 | "zero-shot image classifier with GPT-4V","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-classification","","",@capjamesg
5 | "zero-shot object detection with GroundingDINO + GPT-4V","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-grounding-dino-detection","https://huggingface.co/spaces/Roboflow/DINO-GPT4V","",@capjamesg
6 | "GPT-4V vs. CLIP","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-vs-clip","","",@capjamesg
7 | "GPT-4V with Set-of-Mark (SoM)","https://github.com/microsoft/SoM","","","Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao"
8 | "GPT-4V audio narration","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-narration","","",@etown
9 | "GPT-4V on Web","https://github.com/Jiayi-Pan/GPT-V-on-Web","","",@Jiayi-Pan
10 | "automated voiceover of NBA game","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/automated-voiceover-of-nba-game","","https://colab.research.google.com/github/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/automated-voiceover-of-nba-game/notebook.ipynb",@SkalskiP
11 | "GPT with Vision Checkup","https://github.com/roboflow/gpt-checkup","","","Roboflow team"
12 |
--------------------------------------------------------------------------------
/automation/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas
--------------------------------------------------------------------------------
/automation/script.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | from typing import List
3 |
4 | import pandas as pd
5 |
6 | from pandas import Series
7 |
8 | TITLE_COLUMN_NAME = "title"
9 | CODE_COLUMN_NAME = "code"
10 | HUGGINGFACE_COLUMN_NAME = "huggingface"
11 | COLAB_COLUMN_NAME = "colab"
12 | AUTHORS_COLUMN_NAME = "authors"
13 |
14 | AUTOGENERATED_EXPERIMENTS_LIST_TOKEN = "<!--- AUTOGENERATED-EXPERIMENTS-LIST -->"  # assumed marker text; README.md must contain it twice, around the table
15 | 
16 | WARNING_HEADER = [
17 |     "<!---",
18 |     "   WARNING: do not edit this table manually. It is autogenerated.",
19 |     "   Head over to CONTRIBUTING.md to learn how to add a new experiment.",
20 |     "-->"
21 | ]
22 | 
23 | GITHUB_BADGE_PATTERN = "[GitHub]({})"
24 | HUGGINGFACE_BADGE_PATTERN = "[Hugging Face Space]({})"
25 | COLAB_BADGE_PATTERN = "[Colab]({})"
26 |
27 |
28 | TABLE_HEADER = [
29 | "| **experiment** | **complementary materials** | **authors** |",
30 | "|:--------------:|:---------------------------:|:-----------:|"
31 | ]
32 |
33 |
34 | def read_lines_from_file(path: str) -> List[str]:
35 | """
36 | Reads lines from file and strips trailing whitespaces.
37 | """
38 | with open(path) as file:
39 | return [line.rstrip() for line in file]
40 |
41 |
42 | def save_lines_to_file(path: str, lines: List[str]) -> None:
43 | """
44 | Saves lines to file.
45 | """
46 | with open(path, "w") as f:
47 | for line in lines:
48 | f.write("%s\n" % line)
49 |
50 |
51 | def format_entry(entry: Series) -> str:
52 | title = entry.loc[TITLE_COLUMN_NAME]
53 | code_url = entry.loc[CODE_COLUMN_NAME]
54 | huggingface_url = entry.loc[HUGGINGFACE_COLUMN_NAME]
55 | colab_url = entry.loc[COLAB_COLUMN_NAME]
56 | authors = entry.loc[AUTHORS_COLUMN_NAME]
57 | code_badge = GITHUB_BADGE_PATTERN.format(
58 | code_url) if code_url else ""
59 | huggingface_badge = HUGGINGFACE_BADGE_PATTERN.format(
60 | huggingface_url) if huggingface_url else ""
61 | colab_badge = COLAB_BADGE_PATTERN.format(
62 | colab_url) if colab_url else ""
63 | complementary_materials = " ".join([code_badge, huggingface_badge, colab_badge])
64 | return "| {} | {} | {} |".format(title, complementary_materials, authors)
65 |
66 |
67 | def load_table_entries(path: str) -> List[str]:
68 | """
69 | Loads table entries from csv file.
70 | """
71 | df = pd.read_csv(path, quotechar='"', dtype=str)
72 | df.columns = df.columns.str.strip()
73 | df = df.fillna("")
74 | return [
75 | format_entry(row)
76 | for _, row
77 | in df.iterrows()
78 | ]
79 |
80 |
81 | def search_lines_with_token(lines: List[str], token: str) -> List[int]:
82 | result = []
83 | for line_index, line in enumerate(lines):
84 | if token in line:
85 | result.append(line_index)
86 | return result
87 |
88 |
89 | def inject_markdown_table_into_readme(
90 | readme_lines: List[str],
91 | table_lines: List[str]
92 | ) -> List[str]:
93 | lines_with_token_indexes = search_lines_with_token(
94 | lines=readme_lines,
95 | token=AUTOGENERATED_EXPERIMENTS_LIST_TOKEN)
96 | if len(lines_with_token_indexes) != 2:
97 | raise Exception(f"Please inject two {AUTOGENERATED_EXPERIMENTS_LIST_TOKEN} "
98 | f"tokens to signal start and end of autogenerated table.")
99 |
100 | [table_start_line_index, table_end_line_index] = lines_with_token_indexes
101 | return (
102 | readme_lines[:table_start_line_index + 1] +
103 | table_lines +
104 | readme_lines[table_end_line_index:]
105 | )
106 |
107 |
108 | if __name__ == "__main__":
109 | parser = argparse.ArgumentParser()
110 | parser.add_argument('-d', '--data_path', default='automation/data.csv')
111 | parser.add_argument('-r', '--readme_path', default='README.md')
112 | args = parser.parse_args()
113 |
114 | table_lines = load_table_entries(path=args.data_path)
115 | table_lines = WARNING_HEADER + TABLE_HEADER + table_lines
116 | readme_lines = read_lines_from_file(path=args.readme_path)
117 | readme_lines = inject_markdown_table_into_readme(
118 | readme_lines=readme_lines,
119 | table_lines=table_lines)
120 | save_lines_to_file(path=args.readme_path, lines=readme_lines)
121 |
--------------------------------------------------------------------------------
/experiments/automated-voiceover-of-nba-game/README.md:
--------------------------------------------------------------------------------
1 | ## Automated voiceover of NBA game 🏀
--------------------------------------------------------------------------------
/experiments/gpt4v-classification/README.md:
--------------------------------------------------------------------------------
1 | # GPT-4V Classification
2 |
3 | ## 💻 Install
4 |
5 | ```bash
6 | # create and activate virtual environment
7 | python3 -m venv venv
8 | source venv/bin/activate
9 |
10 | # install dependencies
11 | pip install -r requirements.txt
12 | ```
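13 | 
14 | ## 🚀 Run
15 | 
16 | `app.py` classifies `fish.jpg` against the ontology defined in the script and reads
17 | your key from the `OPENAI_API_KEY` environment variable. A minimal run, assuming a
18 | bash shell (the key below is a placeholder):
19 | 
20 | ```bash
21 | export OPENAI_API_KEY="sk-..."  # placeholder, use your own key
22 | python app.py
23 | ```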
--------------------------------------------------------------------------------
/experiments/gpt4v-classification/app.py:
--------------------------------------------------------------------------------
1 | from autodistill_gpt_4v import GPT4V
2 | from autodistill.detection import CaptionOntology
3 | import os
4 |
5 | base_model = GPT4V(
6 | ontology=CaptionOntology(
7 | {
8 | "salmon": "salmon",
9 | "carp": "carp"
10 | }
11 | ),
12 | api_key=os.environ["OPENAI_API_KEY"]
13 | )
14 |
15 | result = base_model.predict("fish.jpg", base_model.ontology.prompts())
16 |
17 | class_result = base_model.ontology.prompts()[result.get_top_k(1).class_id]
18 | print(class_result)
--------------------------------------------------------------------------------
/experiments/gpt4v-classification/fish.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-classification/fish.jpg
--------------------------------------------------------------------------------
/experiments/gpt4v-classification/requirements.txt:
--------------------------------------------------------------------------------
1 | autodistill-gpt-4v
2 | autodistill
--------------------------------------------------------------------------------
/experiments/gpt4v-grounding-dino-detection/README.md:
--------------------------------------------------------------------------------
1 | # GPT-4V GroundingDINO Object Detection
2 |
3 | ## 💻 Install
4 |
5 | ```bash
6 | # create and activate virtual environment
7 | python3 -m venv venv
8 | source venv/bin/activate
9 |
10 | # install dependencies
11 | pip install -r requirements.txt
12 | ```
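13 | 
14 | ## 🚀 Run
15 | 
16 | `app.py` runs GroundingDINO to find cars in `mercedes.jpeg` and GPT-4V to classify
17 | each detection; the GPT-4V call reads your key from the `OPENAI_API_KEY`
18 | environment variable. A minimal run, assuming a bash shell (placeholder key):
19 | 
20 | ```bash
21 | export OPENAI_API_KEY="sk-..."  # placeholder, use your own key
22 | python app.py
23 | ```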
--------------------------------------------------------------------------------
/experiments/gpt4v-grounding-dino-detection/app.py:
--------------------------------------------------------------------------------
1 | from autodistill_gpt_4v import GPT4V
2 | from autodistill.detection import CaptionOntology
3 | from autodistill_grounding_dino import GroundingDINO
4 | from autodistill.utils import plot
5 |
6 | from autodistill.core.custom_detection_model import CustomDetectionModel
7 | import cv2
8 | import os
9 |
10 | classes = ["mercedes", "toyota"]
11 |
12 |
13 | DINOGPT = CustomDetectionModel(
14 | detection_model=GroundingDINO(
15 | CaptionOntology({"car": "car"})
16 | ),
17 | classification_model=GPT4V(
18 | CaptionOntology({k: k for k in classes}),
19 | api_key=os.environ["OPENAI_API_KEY"]
20 | )
21 | )
22 |
23 | IMAGE = "mercedes.jpeg"
24 |
25 | results = DINOGPT.predict(IMAGE)
26 |
27 | plot(
28 | image=cv2.imread(IMAGE),
29 | detections=results,
30 | classes=["mercedes", "toyota", "car"]
31 | )
32 |
--------------------------------------------------------------------------------
/experiments/gpt4v-grounding-dino-detection/mercedes.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-grounding-dino-detection/mercedes.jpeg
--------------------------------------------------------------------------------
/experiments/gpt4v-grounding-dino-detection/requirements.txt:
--------------------------------------------------------------------------------
1 | autodistill_grounding_dino
2 | autodistill-gpt-4v
3 | autodistill
4 |
--------------------------------------------------------------------------------
/experiments/gpt4v-narration/README.md:
--------------------------------------------------------------------------------
1 | # Life Narration
2 |
3 | https://github.com/etown/LifeNarration/assets/357244/3e9a39a0-7f90-42d8-97ec-4b9825918e56
4 |
5 | ## 💻 Install
6 |
7 | ```bash
8 | # create and activate virtual environment
9 | python3 -m venv venv
10 | source venv/bin/activate
11 |
12 | # install dependencies
13 | pip install -r requirements.txt
14 | ```
15 |
16 | ## 🚀 Run
17 |
18 | ```bash
19 | python app.py
20 | ```
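21 | 
22 | `app.py` reads both API keys from environment variables and shells out to the
23 | `ffmpeg` binary, which must be installed separately. A setup sketch, assuming a
24 | bash shell (the keys below are placeholders):
25 | 
26 | ```bash
27 | export OPENAI_API_KEY="sk-..."
28 | export ELEVENLABS_API_KEY="..."
29 | python app.py  # writes output_with_audio.mp4 next to wearable.mp4
30 | ```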
--------------------------------------------------------------------------------
/experiments/gpt4v-narration/app.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import base64
3 | import subprocess
4 | import tempfile
5 | import os
6 | from elevenlabs import generate, set_api_key
7 | from openai import OpenAI
8 |
9 | # API Keys and File Paths
10 | OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
11 | ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
12 | VIDEO_FILE_PATH = 'wearable.mp4'
13 | OUTPUT_FILE_PATH = 'output_with_audio.mp4'
14 |
15 | # Set API keys
16 | set_api_key(ELEVENLABS_API_KEY)
17 | client = OpenAI(
18 | api_key=OPENAI_API_KEY,
19 | )
20 |
21 | def read_video_frames(video_path, skip_frames=10):
22 | video = cv2.VideoCapture(video_path)
23 | base64_frames = []
24 | frame_count = 0
25 | while video.isOpened():
26 | success, frame = video.read()
27 | if not success:
28 | break
29 | if frame_count % skip_frames == 0:
30 | _, buffer = cv2.imencode(".jpg", frame)
31 | base64_frames.append(base64.b64encode(buffer).decode("utf-8"))
32 | frame_count += 1
33 | video.release()
34 | return base64_frames
35 |
36 | def generate_script(frames):
37 | prompt_messages = [
38 | {
39 | "role": "user",
40 | "content": [
41 | "These are frames of a video recorded from a person's point of view going through mundane life tasks. Create a short voiceover script in the style of a super excited sports narrator...",
42 | *map(lambda x: {"image": x, "resize": 768}, frames[0::10]),
43 | ],
44 | },
45 | ]
46 | result = client.chat.completions.create(
47 | model="gpt-4-vision-preview",
48 | messages=prompt_messages,
49 | max_tokens=500,
50 | )
51 | return result.choices[0].message.content
52 |
53 | def shorten_script(script):
54 | prompt_messages = [
55 | {
56 | "role": "user",
57 | "content": f"Shorten this script so it can be read in about 30 seconds: {script}",
58 | }
59 | ]
60 | result = client.chat.completions.create(
61 | model="gpt-4",
62 | messages=prompt_messages,
63 | max_tokens=500,
64 | )
65 | return result.choices[0].message.content
66 |
67 | def generate_audio(text):
68 | return generate(
69 | text=text,
70 | voice="Oliver",
71 | model='eleven_multilingual_v2'
72 | )
73 |
74 | def merge_audio_with_video(audio, video_path, output_path):
75 | with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as audio_file:
76 | audio_file.write(audio)
77 | audio_filename = audio_file.name
78 |
79 | ffmpeg_command = [
80 | 'ffmpeg', '-y', '-i', video_path, '-i', audio_filename,
81 | '-c:v', 'copy', '-c:a', 'aac', '-strict', 'experimental', output_path
82 | ]
83 |
84 | subprocess.run(ffmpeg_command)
85 |
86 | # Main Process
87 | frames = read_video_frames(VIDEO_FILE_PATH)
88 | script = generate_script(frames)
89 | short_script = shorten_script(script)
90 | audio = generate_audio(short_script)
91 | merge_audio_with_video(audio, VIDEO_FILE_PATH, OUTPUT_FILE_PATH)
--------------------------------------------------------------------------------
/experiments/gpt4v-narration/requirements.txt:
--------------------------------------------------------------------------------
1 | opencv-python-headless==4.5.5.64
2 | openai==1.2.2
3 | elevenlabs==0.2.24
4 | ffmpeg-python==0.2.0
5 |
--------------------------------------------------------------------------------
/experiments/gpt4v-narration/wearable.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-narration/wearable.mp4
--------------------------------------------------------------------------------
/experiments/gpt4v-vs-clip/README.md:
--------------------------------------------------------------------------------
1 | # GPT-4V vs. CLIP
2 |
3 | ## 💻 Install
4 |
5 | ```bash
6 | # create and activate virtual environment
7 | python3 -m venv venv
8 | source venv/bin/activate
9 |
10 | # install dependencies
11 | pip install -r requirements.txt
12 | ```
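13 | 
14 | ## 🚀 Run
15 | 
16 | `app.py` compares CLIP and GPT-4V predictions on `deep-dish.jpg`; the GPT-4V call
17 | reads your key from the `OPENAI_API_KEY` environment variable. A minimal run,
18 | assuming a bash shell (the key below is a placeholder):
19 | 
20 | ```bash
21 | export OPENAI_API_KEY="sk-..."  # placeholder, use your own key
22 | python app.py
23 | ```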
--------------------------------------------------------------------------------
/experiments/gpt4v-vs-clip/app.py:
--------------------------------------------------------------------------------
1 | from autodistill_gpt_4v import GPT4V
2 | from autodistill.detection import CaptionOntology
3 | from autodistill_clip import CLIP
4 | import os
5 |
6 | prompts = ["chicago deep dish pizza", "pizza"]
7 |
8 | ontology = CaptionOntology(
9 | {k: k for k in prompts}
10 | )
11 |
12 | clip_model = CLIP(ontology=ontology)
13 |
14 | clip_result = clip_model.predict("deep-dish.jpg")
15 |
16 | class_result = prompts[clip_result.class_id[0]]
17 |
18 | print("CLIP result: ", class_result)
19 |
20 | gpt_4v_model = GPT4V(ontology=ontology, api_key=os.environ["OPENAI_API_KEY"])
21 |
22 | gpt_result = gpt_4v_model.predict("deep-dish.jpg", gpt_4v_model.ontology.prompts())
23 |
24 | class_result = prompts[gpt_result.class_id[0]]
25 | print("GPT-4-V result: ", class_result)
26 |
--------------------------------------------------------------------------------
/experiments/gpt4v-vs-clip/camry.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-vs-clip/camry.jpeg
--------------------------------------------------------------------------------
/experiments/gpt4v-vs-clip/deep-dish.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-vs-clip/deep-dish.jpg
--------------------------------------------------------------------------------
/experiments/gpt4v-vs-clip/images.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-vs-clip/images.jpeg
--------------------------------------------------------------------------------
/experiments/gpt4v-vs-clip/requirements.txt:
--------------------------------------------------------------------------------
1 | autodistill-gpt-4v
2 | autodistill-clip
3 | autodistill
--------------------------------------------------------------------------------
/experiments/hot-dog-not-hot-dog/README.md:
--------------------------------------------------------------------------------
1 | # HotDogGPT 💬 + 🌭
2 |
3 | ## 💻 Install
4 |
5 | ```bash
6 | # create and activate virtual environment
7 | python3 -m venv venv
8 | source venv/bin/activate
9 |
10 | # install dependencies
11 | pip install -r requirements.txt
12 | ```
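13 | 
14 | ## 🚀 Run
15 | 
16 | The app is a local Gradio demo; it asks for your OpenAI API key through a password
17 | field in the UI, so no environment variable is needed:
18 | 
19 | ```bash
20 | python app.py
21 | ```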
--------------------------------------------------------------------------------
/experiments/hot-dog-not-hot-dog/app.py:
--------------------------------------------------------------------------------
1 | import base64
2 |
3 | import cv2
4 | import gradio as gr
5 | import numpy as np
6 | import requests
7 |
8 | MARKDOWN = """
9 | # HotDogGPT 💬 + 🌭
10 |
11 | HotDogGPT is an OpenAI Vision API experiment reproducing the famous
12 | [Hot Dog, Not Hot Dog](https://www.youtube.com/watch?v=ACmydtFDTGs) app from Silicon
13 | Valley.
14 |
15 |
16 |
17 |
18 |
19 | Visit [awesome-openai-vision-api-experiments](https://github.com/roboflow/awesome-openai-vision-api-experiments)
20 | repository to find more OpenAI Vision API experiments or contribute your own.
21 | """
22 | API_URL = "https://api.openai.com/v1/chat/completions"
23 | CLASSES = ["🌭 Hot Dog", "❌ Not Hot Dog"]
24 |
25 |
26 | def preprocess_image(image: np.ndarray) -> np.ndarray:
27 | image = np.fliplr(image)
28 | return cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
29 |
30 |
31 | def encode_image_to_base64(image: np.ndarray) -> str:
32 | success, buffer = cv2.imencode('.jpg', image)
33 | if not success:
34 | raise ValueError("Could not encode image to JPEG format.")
35 |
36 | encoded_image = base64.b64encode(buffer).decode('utf-8')
37 | return encoded_image
38 |
39 |
40 | def compose_payload(image: np.ndarray, prompt: str) -> dict:
41 | base64_image = encode_image_to_base64(image)
42 | return {
43 | "model": "gpt-4-vision-preview",
44 | "messages": [
45 | {
46 | "role": "user",
47 | "content": [
48 | {
49 | "type": "text",
50 | "text": prompt
51 | },
52 | {
53 | "type": "image_url",
54 | "image_url": {
55 | "url": f"data:image/jpeg;base64,{base64_image}"
56 | }
57 | }
58 | ]
59 | }
60 | ],
61 | "max_tokens": 300
62 | }
63 |
64 |
65 | def compose_classification_prompt(classes: list) -> str:
66 | return (f"What is in the image? Return the class of the object in the image. Here "
67 | f"are the classes: {', '.join(classes)}. You can only return one class "
68 | f"from that list.")
69 |
70 |
71 | def compose_headers(api_key: str) -> dict:
72 | return {
73 | "Content-Type": "application/json",
74 | "Authorization": f"Bearer {api_key}"
75 | }
76 |
77 |
78 | def prompt_image(api_key: str, image: np.ndarray, prompt: str) -> str:
79 | headers = compose_headers(api_key=api_key)
80 | payload = compose_payload(image=image, prompt=prompt)
81 | response = requests.post(url=API_URL, headers=headers, json=payload).json()
82 |
83 | if 'error' in response:
84 | raise ValueError(response['error']['message'])
85 | return response['choices'][0]['message']['content']
86 |
87 |
88 | def classify_image(api_key: str, image: np.ndarray) -> str:
89 | if not api_key:
90 | raise ValueError(
91 | "API_KEY is not set. "
92 | "Please follow the instructions in the README to set it up.")
93 | image = preprocess_image(image=image)
94 | prompt = compose_classification_prompt(classes=CLASSES)
95 | response = prompt_image(api_key=api_key, image=image, prompt=prompt)
96 | return response
97 |
98 |
99 | with gr.Blocks() as demo:
100 | gr.Markdown(MARKDOWN)
101 | api_key_textbox = gr.Textbox(
102 | label="🔑 OpenAI API", type="password")
103 |
104 | with gr.TabItem("Basic"):
105 | with gr.Column():
106 | input_image = gr.Image(
107 | image_mode='RGB', type='numpy', height=500)
108 | output_text = gr.Textbox(
109 | label="Output")
110 | submit_button = gr.Button("Submit")
111 |
112 | submit_button.click(
113 | fn=classify_image,
114 | inputs=[api_key_textbox, input_image],
115 | outputs=output_text)
116 |
117 | # with gr.TabItem("Advanced"):
118 | # with gr.Column():
119 | # advanced_clss_list = gr.Textbox(
120 | # label="Comma-separated list of classes")
121 | # advanced_input_image = gr.Image(
122 | # image_mode='RGB', type='numpy', height=500)
123 | # advanced_output_text = gr.Textbox(
124 | # label="Output")
125 | # advanced_submit_button = gr.Button("Submit")
126 |
127 | demo.launch(debug=False, show_error=True)
128 |
--------------------------------------------------------------------------------
/experiments/hot-dog-not-hot-dog/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | opencv-python
3 | requests
4 | supervision
5 | gradio==3.50.2
6 |
--------------------------------------------------------------------------------
/experiments/webcam-gpt/README.md:
--------------------------------------------------------------------------------
1 | # WebcamGPT 💬 + 📸
2 |
3 | https://github.com/roboflow/awesome-openai-vision-api-experiments/assets/26109316/c63fa3c0-4564-49ee-8982-a9e6a23dae9b
4 |
5 | ## 💻 Install
6 |
7 | ```bash
8 | # create and activate virtual environment
9 | python3 -m venv venv
10 | source venv/bin/activate
11 |
12 | # install dependencies
13 | pip install -r requirements.txt
14 | ```
15 |
16 | ## 🚀 Run
17 |
18 | ```bash
19 | python app.py
20 | ```
21 |
--------------------------------------------------------------------------------
/experiments/webcam-gpt/app.py:
--------------------------------------------------------------------------------
1 | import base64
2 | import os
3 | import uuid
4 |
5 | import cv2
6 | import gradio as gr
7 | import numpy as np
8 | import requests
9 |
10 | MARKDOWN = """
11 | # WebcamGPT 💬 + 📸
12 |
13 | WebcamGPT is a tool that lets you chat with a video stream using the OpenAI Vision API.
14 |
15 | Visit [awesome-openai-vision-api-experiments](https://github.com/roboflow/awesome-openai-vision-api-experiments)
16 | repository to find more OpenAI Vision API experiments or contribute your own.
17 | """
18 | AVATARS = (
19 | "https://media.roboflow.com/spaces/roboflow_raccoon_full.png",
20 | "https://media.roboflow.com/spaces/openai-white-logomark.png"
21 | )
22 | IMAGE_CACHE_DIRECTORY = "data"
23 | API_URL = "https://api.openai.com/v1/chat/completions"
24 |
25 |
26 | def preprocess_image(image: np.ndarray) -> np.ndarray:
27 | image = np.fliplr(image)
28 | return cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
29 |
30 |
31 | def encode_image_to_base64(image: np.ndarray) -> str:
32 | success, buffer = cv2.imencode('.jpg', image)
33 | if not success:
34 | raise ValueError("Could not encode image to JPEG format.")
35 |
36 | encoded_image = base64.b64encode(buffer).decode('utf-8')
37 | return encoded_image
38 |
39 |
40 | def compose_payload(image: np.ndarray, prompt: str) -> dict:
41 | base64_image = encode_image_to_base64(image)
42 | return {
43 | "model": "gpt-4-vision-preview",
44 | "messages": [
45 | {
46 | "role": "user",
47 | "content": [
48 | {
49 | "type": "text",
50 | "text": prompt
51 | },
52 | {
53 | "type": "image_url",
54 | "image_url": {
55 | "url": f"data:image/jpeg;base64,{base64_image}"
56 | }
57 | }
58 | ]
59 | }
60 | ],
61 | "max_tokens": 300
62 | }
63 |
64 |
65 | def compose_headers(api_key: str) -> dict:
66 | return {
67 | "Content-Type": "application/json",
68 | "Authorization": f"Bearer {api_key}"
69 | }
70 |
71 |
72 | def prompt_image(api_key: str, image: np.ndarray, prompt: str) -> str:
73 | headers = compose_headers(api_key=api_key)
74 | payload = compose_payload(image=image, prompt=prompt)
75 | response = requests.post(url=API_URL, headers=headers, json=payload).json()
76 |
77 | if 'error' in response:
78 | raise ValueError(response['error']['message'])
79 | return response['choices'][0]['message']['content']
80 |
81 |
82 | def cache_image(image: np.ndarray) -> str:
83 | image_filename = f"{uuid.uuid4()}.jpeg"
84 | os.makedirs(IMAGE_CACHE_DIRECTORY, exist_ok=True)
85 | image_path = os.path.join(IMAGE_CACHE_DIRECTORY, image_filename)
86 | cv2.imwrite(image_path, image)
87 | return image_path
88 |
89 |
90 | def respond(api_key: str, image: np.ndarray, prompt: str, chat_history):
91 | if not api_key:
92 | raise ValueError(
93 | "API_KEY is not set. "
94 | "Please follow the instructions in the README to set it up.")
95 |
96 | image = preprocess_image(image=image)
97 | cached_image_path = cache_image(image)
98 | response = prompt_image(api_key=api_key, image=image, prompt=prompt)
99 | chat_history.append(((cached_image_path,), None))
100 | chat_history.append((prompt, response))
101 | return "", chat_history
102 |
103 |
104 | with gr.Blocks() as demo:
105 | gr.Markdown(MARKDOWN)
106 | with gr.Row():
107 | webcam = gr.Image(source="webcam", streaming=True)
108 | with gr.Column():
109 | api_key_textbox = gr.Textbox(
110 | label="OpenAI API KEY", type="password")
111 | chatbot = gr.Chatbot(
112 | height=500, bubble_full_width=False, avatar_images=AVATARS)
113 | message_textbox = gr.Textbox()
114 | clear_button = gr.ClearButton([message_textbox, chatbot])
115 |
116 | message_textbox.submit(
117 | fn=respond,
118 | inputs=[api_key_textbox, webcam, message_textbox, chatbot],
119 | outputs=[message_textbox, chatbot]
120 | )
121 |
122 | demo.launch(debug=False, show_error=True)
123 |
--------------------------------------------------------------------------------
/experiments/webcam-gpt/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | opencv-python
3 | requests
4 | supervision
5 | gradio==3.50.2
6 |
--------------------------------------------------------------------------------