├── .gitattributes ├── .gitignore ├── CONTRIBUTING.md ├── README.md ├── automation ├── README.md ├── data.csv ├── requirements.txt └── script.py └── experiments ├── automated-voiceover-of-nba-game ├── README.md └── notebook.ipynb ├── gpt4v-classification ├── README.md ├── app.py ├── fish.jpg └── requirements.txt ├── gpt4v-grounding-dino-detection ├── README.md ├── app.py ├── mercedes.jpeg └── requirements.txt ├── gpt4v-narration ├── README.md ├── app.py ├── requirements.txt └── wearable.mp4 ├── gpt4v-vs-clip ├── README.md ├── app.py ├── camry.jpeg ├── deep-dish.jpg ├── images.jpeg └── requirements.txt ├── hot-dog-not-hot-dog ├── README.md ├── app.py └── requirements.txt └── webcam-gpt ├── README.md ├── app.py └── requirements.txt /.gitattributes: -------------------------------------------------------------------------------- 1 | *.ipynb linguist-vendored -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/ 2 | venv/ 3 | data/ 4 | .DS_Store 5 | __pycache__/ -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | ## 🦸 Contributing to awesome-openai-vision-api-experiments 2 | 3 | We love your input! We want to make contributing to awesome-openai-vision-api-experiments as easy and transparent as possible, whether it's: 4 | 5 | - Reporting a bug 6 | - Discussing the current state of the code 7 | - Submitting a fix 8 | 9 | ## 🧪️ Adding a new experiment 10 | 11 | - **We only accept experiments where the code was open-sourced.** 12 | - Add new subdirectory to `experiments` directory. 13 | - Add new entry to `automation/data.csv` file. 14 | - Run `automation/script.py`. Experiments table in `README.md` will update 15 | automatically. 16 | - Commit changes to feature branch. Create PR. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

# openai vision api experiments 🧪
2 | 3 | ## 👋 Hello 4 | 5 | The must-have resource for anyone who wants to experiment with and build on the [OpenAI 6 | Vision API](https://platform.openai.com/docs/guides/vision). This repository serves as 7 | a hub for innovative experiments, showcasing a variety of applications ranging from 8 | simple image classifications to advanced zero-shot learning models. It's a space for 9 | both beginners and experts to explore the capabilities of the Vision API, share their 10 | findings, and collaborate on pushing the boundaries of visual AI. 11 | 12 | Experimenting with the OpenAI API requires an API 🔑. You can get one 13 | [here](https://platform.openai.com/api-keys). 14 | 15 | ## ⚠️ Limitations 16 | 17 | - 100 API requests per single API key per day. 18 | - Can't be used for object detection or image segmentation. We can solve this problem by combining GPT-4V with foundational models like GroundingDINO or Segment Anything (SAM). Please take a look at the [example](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-grounding-dino-detection) and read our [blog post](https://blog.roboflow.com/dino-gpt-4v). 19 | 20 | ## 🧪 Experiments 21 | 22 | 23 | 27 | | **experiment** | **complementary materials** | **authors** | 28 | |:--------------:|:---------------------------:|:-----------:| 29 | | WebcamGPT - chat with video stream | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/webcam-gpt) [![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Roboflow/webcamGPT) | @SkalskiP | 30 | | HotDogGPT - simple image classification application | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/hot-dog-not-hot-dog) [![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Roboflow/HotDogGPT) | @SkalskiP | 31 | | zero-shot image classifier with GPT-4V | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-classification) | @capjamesg | 32 | | zero-shot object detection with GroundingDINO + GPT-4V | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-grounding-dino-detection) [![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Roboflow/DINO-GPT4V) | @capjamesg | 33 | | GPT-4V vs. 
CLIP | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-vs-clip) | @capjamesg | 34 | | GPT-4V with Set-of-Mark (SoM) | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/microsoft/SoM) | Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao | 35 | | GPT-4V on Web | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/Jiayi-Pan/GPT-V-on-Web) | @Jiayi-Pan | 36 | | automated voiceover of NBA game | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/automated-voiceover-of-nba-game) [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/automated-voiceover-of-nba-game/notebook.ipynb) | @SkalskiP | 37 | | screenshot-to-code | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/abi/screenshot-to-code) | @abi | 38 | | GPT with Vision Checkup | [![GitHub](https://badges.aleen42.com/src/github.svg)]( https://github.com/roboflow/gpt-checkup) | Roboflow team | 39 | 40 | https://github.com/roboflow/awesome-openai-vision-api-experiments/assets/26109316/c63fa3c0-4564-49ee-8982-a9e6a23dae9b 41 | 42 | ## 🗞️ Must Read Papers 43 | 44 | - [Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V](https://arxiv.org/abs/2310.11441) 45 | by Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao 46 | - [The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)](https://arxiv.org/abs/2309.17421) 47 | by Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang 48 | - [GPT-4 System Card](https://cdn.openai.com/papers/gpt-4-system-card.pdf) by OpenAI 49 | 50 | ## 🖊️ Blogs 51 | 52 | - [How CLIP and GPT-4V Compare for Classification](https://blog.roboflow.com/clip-vs-gpt-4v/) 53 | - [Experiments with GPT-4V for Object Detection](https://blog.roboflow.com/gpt-4v-object-detection/) 54 | - [Distilling GPT-4 for Classification with an API](https://blog.roboflow.com/gpt-4-image-classification/) 55 | - [DINO-GPT4-V: Use GPT-4V in a Two-Stage Detection Model](https://blog.roboflow.com/dino-gpt-4v/) 56 | - [First Impressions with GPT-4V(ision)](https://blog.roboflow.com/gpt-4-vision/) 57 | 58 | ## 🦸 Contribution 59 | 60 | We would love your help in making this repository even better! Whether you want to 61 | add a new experiment or have any suggestions for improvement, 62 | feel free to open an [issue](https://github.com/roboflow/awesome-openai-vision-api-experiments/issues) 63 | or [pull request](https://github.com/roboflow/awesome-openai-vision-api-experiments/pulls). 64 | 65 | If you are up to the task and want to add a new experiment, please look at our [contribution guide](https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/CONTRIBUTING.md). There you can find all the information you need. 
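## 🧰 Minimal API call

Most experiments in this repository talk to the same chat completions endpoint. The sketch below distills that shared pattern — it mirrors the `compose_payload` / `compose_headers` helpers from the `webcam-gpt` and `hot-dog-not-hot-dog` apps. Treat it as a starting point rather than a reference implementation: the image path (`image.jpg`) and the prompt are placeholders, `requests` must be installed, and `OPENAI_API_KEY` must be exported in your shell.

```python
import base64
import os

import requests

API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["OPENAI_API_KEY"]

# placeholder image; point this at any local JPEG
with open("image.jpg", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    # model name used by the experiments in this repository
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

response = requests.post(API_URL, headers=headers, json=payload).json()
if "error" in response:
    raise ValueError(response["error"]["message"])
print(response["choices"][0]["message"]["content"])
```

Most of the Gradio demos here call the REST endpoint directly with `requests`, while the `gpt4v-narration` experiment uses the official `openai` client instead — both approaches work.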
66 | -------------------------------------------------------------------------------- /automation/README.md: -------------------------------------------------------------------------------- 1 | ## Install 2 | 3 | ```bash 4 | # setup and activate python environment 5 | python3 -m venv venv 6 | source venv/bin/activate 7 | 8 | pip install -r automation/requirements.txt 9 | ``` 10 | 11 | ## Generate 12 | 13 | ```bash 14 | python automation/script.py 15 | ``` -------------------------------------------------------------------------------- /automation/data.csv: -------------------------------------------------------------------------------- 1 | title, code, huggingface, colab, authors 2 | "WebcamGPT - chat with video stream","https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/webcam-gpt","https://huggingface.co/spaces/Roboflow/webcamGPT","",@SkalskiP 3 | "HotDogGPT - simple image classification application","https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/hot-dog-not-hot-dog","https://huggingface.co/spaces/Roboflow/HotDogGPT","",@SkalskiP 4 | "zero-shot image classifier with GPT-4V","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-classification","","",@capjamesg 5 | "zero-shot object detection with GroundingDINO + GPT-4V","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-grounding-dino-detection","https://huggingface.co/spaces/Roboflow/DINO-GPT4V","",@capjamesg 6 | "GPT-4V vs. CLIP","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-vs-clip","","",@capjamesg 7 | "GPT-4V with Set-of-Mark (SoM)","https://github.com/microsoft/SoM","","","Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao" 8 | "GPT-4V audio narration","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-narration","","",@etown 9 | "GPT-4V on Web","https://github.com/Jiayi-Pan/GPT-V-on-Web","","",@Jiayi-Pan 10 | "automated voiceover of NBA game","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/automated-voiceover-of-nba-game","","https://colab.research.google.com/github/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/automated-voiceover-of-nba-game/notebook.ipynb",@SkalskiP 11 | "GPT with Vision Checkup", https://github.com/roboflow/gpt-checkup,,, Roboflow team 12 | -------------------------------------------------------------------------------- /automation/requirements.txt: -------------------------------------------------------------------------------- 1 | pandas -------------------------------------------------------------------------------- /automation/script.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from typing import List 3 | 4 | import pandas as pd 5 | 6 | from pandas.core.series import Series 7 | 8 | TITLE_COLUMN_NAME = "title" 9 | CODE_COLUMN_NAME = "code" 10 | HUGGINGFACE_COLUMN_NAME = "huggingface" 11 | COLAB_COLUMN_NAME = "colab" 12 | AUTHORS_COLUMN_NAME = "authors" 13 | 14 | AUTOGENERATED_EXPERIMENTS_LIST_TOKEN = "" 15 | 16 | WARNING_HEADER = [ 17 | "" 21 | ] 22 | 23 | GITHUB_BADGE_PATTERN = "[![GitHub](https://badges.aleen42.com/src/github.svg)]({})" 24 | HUGGINGFACE_BADGE_PATTERN = "[![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)]({})" 25 | COLAB_BADGE_PATTERN = 
"[![Colab](https://colab.research.google.com/assets/colab-badge.svg)]({})" 26 | 27 | 28 | TABLE_HEADER = [ 29 | "| **experiment** | **complementary materials** | **authors** |", 30 | "|:--------------:|:---------------------------:|:-----------:|" 31 | ] 32 | 33 | 34 | def read_lines_from_file(path: str) -> List[str]: 35 | """ 36 | Reads lines from file and strips trailing whitespaces. 37 | """ 38 | with open(path) as file: 39 | return [line.rstrip() for line in file] 40 | 41 | 42 | def save_lines_to_file(path: str, lines: List[str]) -> None: 43 | """ 44 | Saves lines to file. 45 | """ 46 | with open(path, "w") as f: 47 | for line in lines: 48 | f.write("%s\n" % line) 49 | 50 | 51 | def format_entry(entry: Series) -> str: 52 | title = entry.loc[TITLE_COLUMN_NAME] 53 | code_url = entry.loc[CODE_COLUMN_NAME] 54 | huggingface_url = entry.loc[HUGGINGFACE_COLUMN_NAME] 55 | colab_url = entry.loc[COLAB_COLUMN_NAME] 56 | authors = entry.loc[AUTHORS_COLUMN_NAME] 57 | code_badge = GITHUB_BADGE_PATTERN.format( 58 | code_url) if code_url else "" 59 | huggingface_badge = HUGGINGFACE_BADGE_PATTERN.format( 60 | huggingface_url) if huggingface_url else "" 61 | colab_badge = COLAB_BADGE_PATTERN.format( 62 | colab_url) if colab_url else "" 63 | complementary_materials = " ".join([code_badge, huggingface_badge, colab_badge]) 64 | return "| {} | {} | {} |".format(title, complementary_materials, authors) 65 | 66 | 67 | def load_table_entries(path: str) -> List[str]: 68 | """ 69 | Loads table entries from csv file. 70 | """ 71 | df = pd.read_csv(path, quotechar='"', dtype=str) 72 | df.columns = df.columns.str.strip() 73 | df = df.fillna("") 74 | return [ 75 | format_entry(row) 76 | for _, row 77 | in df.iterrows() 78 | ] 79 | 80 | 81 | def search_lines_with_token(lines: List[str], token: str) -> List[int]: 82 | result = [] 83 | for line_index, line in enumerate(lines): 84 | if token in line: 85 | result.append(line_index) 86 | return result 87 | 88 | 89 | def inject_markdown_table_into_readme( 90 | readme_lines: List[str], 91 | table_lines: List[str] 92 | ) -> List[str]: 93 | lines_with_token_indexes = search_lines_with_token( 94 | lines=readme_lines, 95 | token=AUTOGENERATED_EXPERIMENTS_LIST_TOKEN) 96 | if len(lines_with_token_indexes) != 2: 97 | raise Exception(f"Please inject two {AUTOGENERATED_EXPERIMENTS_LIST_TOKEN} " 98 | f"tokens to signal start and end of autogenerated table.") 99 | 100 | [table_start_line_index, table_end_line_index] = lines_with_token_indexes 101 | return ( 102 | readme_lines[:table_start_line_index + 1] + 103 | table_lines + 104 | readme_lines[table_end_line_index:] 105 | ) 106 | 107 | 108 | if __name__ == "__main__": 109 | parser = argparse.ArgumentParser() 110 | parser.add_argument('-d', '--data_path', default='automation/data.csv') 111 | parser.add_argument('-r', '--readme_path', default='README.md') 112 | args = parser.parse_args() 113 | 114 | table_lines = load_table_entries(path=args.data_path) 115 | table_lines = WARNING_HEADER + TABLE_HEADER + table_lines 116 | readme_lines = read_lines_from_file(path=args.readme_path) 117 | readme_lines = inject_markdown_table_into_readme( 118 | readme_lines=readme_lines, 119 | table_lines=table_lines) 120 | save_lines_to_file(path=args.readme_path, lines=readme_lines) 121 | -------------------------------------------------------------------------------- /experiments/automated-voiceover-of-nba-game/README.md: -------------------------------------------------------------------------------- 1 | ## Automated voiceover of NBA game 🏀 
-------------------------------------------------------------------------------- /experiments/gpt4v-classification/README.md: -------------------------------------------------------------------------------- 1 | # GPT-4V Classification 2 | 3 | ## 💻 Install 4 | 5 | ```bash 6 | # create and activate virtual environment 7 | python3 -m venv venv 8 | source venv/bin/activate 9 | 10 | # install dependencies 11 | pip install -r requirements.txt 12 | ``` -------------------------------------------------------------------------------- /experiments/gpt4v-classification/app.py: -------------------------------------------------------------------------------- 1 | from autodistill_gpt_4v import GPT4V 2 | from autodistill.detection import CaptionOntology 3 | import os 4 | 5 | base_model = GPT4V( 6 | ontology=CaptionOntology( 7 | { 8 | "salmon": "salmon", 9 | "carp": "carp" 10 | } 11 | ), 12 | api_key=os.environ["OPENAI_API_KEY"] 13 | ) 14 | 15 | result = base_model.predict("fish.jpg", base_model.ontology.prompts()) 16 | 17 | class_result = base_model.ontology.prompts()[result.get_top_k(1).class_id] 18 | print(class_result) -------------------------------------------------------------------------------- /experiments/gpt4v-classification/fish.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-classification/fish.jpg -------------------------------------------------------------------------------- /experiments/gpt4v-classification/requirements.txt: -------------------------------------------------------------------------------- 1 | autodistill_gpt4v 2 | autodistill -------------------------------------------------------------------------------- /experiments/gpt4v-grounding-dino-detection/README.md: -------------------------------------------------------------------------------- 1 | # GPT-4V GroundingDINO Object Detection 2 | 3 | ## 💻 Install 4 | 5 | ```bash 6 | # create and activate virtual environment 7 | python3 -m venv venv 8 | source venv/bin/activate 9 | 10 | # install dependencies 11 | pip install -r requirements.txt 12 | ``` -------------------------------------------------------------------------------- /experiments/gpt4v-grounding-dino-detection/app.py: -------------------------------------------------------------------------------- 1 | from autodistill_gpt_4v import GPT4V 2 | from autodistill.detection import CaptionOntology 3 | from autodistill_grounding_dino import GroundingDINO 4 | from autodistill.utils import plot 5 | 6 | from autodistill.core.custom_detection_model import CustomDetectionModel 7 | import cv2 8 | import os 9 | 10 | classes = ["mercedes", "toyota"] 11 | 12 | 13 | DINOGPT = CustomDetectionModel( 14 | detection_model=GroundingDINO( 15 | CaptionOntology({"car": "car"}) 16 | ), 17 | classification_model=GPT4V( 18 | CaptionOntology({k: k for k in classes}), 19 | api_key=os.environ["OPENAI_API_KEY"] 20 | ) 21 | ) 22 | 23 | IMAGE = "mercedes.jpeg" 24 | 25 | results = DINOGPT.predict(IMAGE) 26 | 27 | plot( 28 | image=cv2.imread(IMAGE), 29 | detections=results, 30 | classes=["mercedes", "toyota", "car"] 31 | ) 32 | -------------------------------------------------------------------------------- /experiments/gpt4v-grounding-dino-detection/mercedes.jpeg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-grounding-dino-detection/mercedes.jpeg -------------------------------------------------------------------------------- /experiments/gpt4v-grounding-dino-detection/requirements.txt: -------------------------------------------------------------------------------- 1 | autodistill_grounding_dino 2 | autodistill-gpt-4v 3 | autodistill 4 | -------------------------------------------------------------------------------- /experiments/gpt4v-narration/README.md: -------------------------------------------------------------------------------- 1 | # Life Narration 2 | 3 | https://github.com/etown/LifeNarration/assets/357244/3e9a39a0-7f90-42d8-97ec-4b9825918e56 4 | 5 | ## 💻 Install 6 | 7 | ```bash 8 | # create and activate virtual environment 9 | python3 -m venv venv 10 | source venv/bin/activate 11 | 12 | # install dependencies 13 | pip install -r requirements.txt 14 | ``` 15 | 16 | ## 🚀 Run 17 | 18 | ```bash 19 | python app.py 20 | ``` -------------------------------------------------------------------------------- /experiments/gpt4v-narration/app.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import base64 3 | import subprocess 4 | import tempfile 5 | import os 6 | from elevenlabs import generate, set_api_key 7 | from openai import OpenAI 8 | 9 | # API Keys and File Paths 10 | OPENAI_API_KEY = os.environ["OPENAI_API_KEY"] 11 | ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"] 12 | VIDEO_FILE_PATH = 'wearable.mp4' 13 | OUTPUT_FILE_PATH = 'output_with_audio.mp4' 14 | 15 | # Set API keys 16 | set_api_key(ELEVENLABS_API_KEY) 17 | client = OpenAI( 18 | api_key=OPENAI_API_KEY, 19 | ) 20 | 21 | def read_video_frames(video_path, skip_frames=10): 22 | video = cv2.VideoCapture(video_path) 23 | base64_frames = [] 24 | frame_count = 0 25 | while video.isOpened(): 26 | success, frame = video.read() 27 | if not success: 28 | break 29 | if frame_count % skip_frames == 0: 30 | _, buffer = cv2.imencode(".jpg", frame) 31 | base64_frames.append(base64.b64encode(buffer).decode("utf-8")) 32 | frame_count += 1 33 | video.release() 34 | return base64_frames 35 | 36 | def generate_script(frames): 37 | prompt_messages = [ 38 | { 39 | "role": "user", 40 | "content": [ 41 | "These are frames of a video recorded from a person's point of view going through mundane life tasks. 
Create a short voiceover script in the style of a super excited sports narrator...", 42 | *map(lambda x: {"image": x, "resize": 768}, frames[0::10]), 43 | ], 44 | }, 45 | ] 46 | result = client.chat.completions.create( 47 | model="gpt-4-vision-preview", 48 | messages=prompt_messages, 49 | max_tokens=500, 50 | ) 51 | return result.choices[0].message.content 52 | 53 | def shorten_script(script): 54 | prompt_messages = [ 55 | { 56 | "role": "user", 57 | "content": f"Shorten this script so it can be read in about 30 seconds: {script}", 58 | } 59 | ] 60 | result = client.chat.completions.create( 61 | model="gpt-4", 62 | messages=prompt_messages, 63 | max_tokens=500, 64 | ) 65 | return result.choices[0].message.content 66 | 67 | def generate_audio(text): 68 | return generate( 69 | text=text, 70 | voice="Oliver", 71 | model='eleven_multilingual_v2' 72 | ) 73 | 74 | def merge_audio_with_video(audio, video_path, output_path): 75 | with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as audio_file: 76 | audio_file.write(audio) 77 | audio_filename = audio_file.name 78 | 79 | ffmpeg_command = [ 80 | 'ffmpeg', '-y', '-i', video_path, '-i', audio_filename, 81 | '-c:v', 'copy', '-c:a', 'aac', '-strict', 'experimental', output_path 82 | ] 83 | 84 | subprocess.run(ffmpeg_command) 85 | 86 | # Main Process 87 | frames = read_video_frames(VIDEO_FILE_PATH) 88 | script = generate_script(frames) 89 | short_script = shorten_script(script) 90 | audio = generate_audio(short_script) 91 | merge_audio_with_video(audio, VIDEO_FILE_PATH, OUTPUT_FILE_PATH) -------------------------------------------------------------------------------- /experiments/gpt4v-narration/requirements.txt: -------------------------------------------------------------------------------- 1 | opencv-python-headless==4.5.5.64 2 | openai==1.2.2 3 | elevenlabs==0.2.24 4 | ffmpeg-python==0.2.0 5 | -------------------------------------------------------------------------------- /experiments/gpt4v-narration/wearable.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-narration/wearable.mp4 -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/README.md: -------------------------------------------------------------------------------- 1 | # GPT-4V vs. 
CLIP 2 | 3 | ## 💻 Install 4 | 5 | ```bash 6 | # create and activate virtual environment 7 | python3 -m venv venv 8 | source venv/bin/activate 9 | 10 | # install dependencies 11 | pip install -r requirements.txt 12 | ``` -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/app.py: -------------------------------------------------------------------------------- 1 | from autodistill_gpt_4v import GPT4V 2 | from autodistill.detection import CaptionOntology 3 | from autodistill_clip import CLIP 4 | import os 5 | 6 | prompts = ["chicago deep dish pizza", "pizza"] 7 | 8 | ontology = CaptionOntology( 9 | {k: k for k in prompts} 10 | ) 11 | 12 | clip_model = CLIP(ontology=ontology) 13 | 14 | clip_result = clip_model.predict("deep-dish.jpg") 15 | 16 | class_result = prompts[clip_result.class_id[0]] 17 | 18 | print("CLIP result: ", class_result) 19 | 20 | gpt_4v_model = GPT4V(ontology=ontology, api_key=os.environ["OPENAI_API_KEY"]) 21 | 22 | gpt_result = gpt_4v_model.predict("deep-dish.jpg", gpt_4v_model.ontology.prompts()) 23 | 24 | class_result = prompts[gpt_result.class_id[0]] 25 | print("GPT-4-V result: ", class_result) 26 | -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/camry.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-vs-clip/camry.jpeg -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/deep-dish.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-vs-clip/deep-dish.jpg -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/images.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-vs-clip/images.jpeg -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/requirements.txt: -------------------------------------------------------------------------------- 1 | autodistill_gpt-4v 2 | autodistill_clip 3 | autodistill -------------------------------------------------------------------------------- /experiments/hot-dog-not-hot-dog/README.md: -------------------------------------------------------------------------------- 1 | # HotDogGPT 💬 + 🌭 2 | 3 | ## 💻 Install 4 | 5 | ```bash 6 | # create and activate virtual environment 7 | python3 -m venv venv 8 | source venv/bin/activate 9 | 10 | # install dependencies 11 | pip install -r requirements.txt 12 | ``` -------------------------------------------------------------------------------- /experiments/hot-dog-not-hot-dog/app.py: -------------------------------------------------------------------------------- 1 | import base64 2 | 3 | import cv2 4 | import gradio as gr 5 | import numpy as np 6 | import requests 7 | 8 | MARKDOWN = """ 9 | # HotDogGPT 💬 + 🌭 10 | 11 | HotDogGPT is OpenAI Vision API experiment reproducing the famous 12 | [Hot Dog, Not Hot Dog](https://www.youtube.com/watch?v=ACmydtFDTGs) app from Silicon 13 | Valley. 14 | 15 |

16 |

18 | 19 | Visit [awesome-openai-vision-api-experiments](https://github.com/roboflow/awesome-openai-vision-api-experiments) 20 | repository to find more OpenAI Vision API experiments or contribute your own. 21 | """ 22 | API_URL = "https://api.openai.com/v1/chat/completions" 23 | CLASSES = ["🌭 Hot Dog", "❌ Not Hot Dog"] 24 | 25 | 26 | def preprocess_image(image: np.ndarray) -> np.ndarray: 27 | image = np.fliplr(image) 28 | return cv2.cvtColor(image, cv2.COLOR_RGB2BGR) 29 | 30 | 31 | def encode_image_to_base64(image: np.ndarray) -> str: 32 | success, buffer = cv2.imencode('.jpg', image) 33 | if not success: 34 | raise ValueError("Could not encode image to JPEG format.") 35 | 36 | encoded_image = base64.b64encode(buffer).decode('utf-8') 37 | return encoded_image 38 | 39 | 40 | def compose_payload(image: np.ndarray, prompt: str) -> dict: 41 | base64_image = encode_image_to_base64(image) 42 | return { 43 | "model": "gpt-4-vision-preview", 44 | "messages": [ 45 | { 46 | "role": "user", 47 | "content": [ 48 | { 49 | "type": "text", 50 | "text": prompt 51 | }, 52 | { 53 | "type": "image_url", 54 | "image_url": { 55 | "url": f"data:image/jpeg;base64,{base64_image}" 56 | } 57 | } 58 | ] 59 | } 60 | ], 61 | "max_tokens": 300 62 | } 63 | 64 | 65 | def compose_classification_prompt(classes: list) -> str: 66 | return (f"What is in the image? Return the class of the object in the image. Here " 67 | f"are the classes: {', '.join(classes)}. You can only return one class " 68 | f"from that list.") 69 | 70 | 71 | def compose_headers(api_key: str) -> dict: 72 | return { 73 | "Content-Type": "application/json", 74 | "Authorization": f"Bearer {api_key}" 75 | } 76 | 77 | 78 | def prompt_image(api_key: str, image: np.ndarray, prompt: str) -> str: 79 | headers = compose_headers(api_key=api_key) 80 | payload = compose_payload(image=image, prompt=prompt) 81 | response = requests.post(url=API_URL, headers=headers, json=payload).json() 82 | 83 | if 'error' in response: 84 | raise ValueError(response['error']['message']) 85 | return response['choices'][0]['message']['content'] 86 | 87 | 88 | def classify_image(api_key: str, image: np.ndarray) -> str: 89 | if not api_key: 90 | raise ValueError( 91 | "API_KEY is not set. 
" 92 | "Please follow the instructions in the README to set it up.") 93 | image = preprocess_image(image=image) 94 | prompt = compose_classification_prompt(classes=CLASSES) 95 | response = prompt_image(api_key=api_key, image=image, prompt=prompt) 96 | return response 97 | 98 | 99 | with gr.Blocks() as demo: 100 | gr.Markdown(MARKDOWN) 101 | api_key_textbox = gr.Textbox( 102 | label="🔑 OpenAI API", type="password") 103 | 104 | with gr.TabItem("Basic"): 105 | with gr.Column(): 106 | input_image = gr.Image( 107 | image_mode='RGB', type='numpy', height=500) 108 | output_text = gr.Textbox( 109 | label="Output") 110 | submit_button = gr.Button("Submit") 111 | 112 | submit_button.click( 113 | fn=classify_image, 114 | inputs=[api_key_textbox, input_image], 115 | outputs=output_text) 116 | 117 | # with gr.TabItem("Advanced"): 118 | # with gr.Column(): 119 | # advanced_clss_list = gr.Textbox( 120 | # label="Comma-separated list of classes") 121 | # advanced_input_image = gr.Image( 122 | # image_mode='RGB', type='numpy', height=500) 123 | # advanced_output_text = gr.Textbox( 124 | # label="Output") 125 | # advanced_submit_button = gr.Button("Submit") 126 | 127 | demo.launch(debug=False, show_error=True) 128 | -------------------------------------------------------------------------------- /experiments/hot-dog-not-hot-dog/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | opencv-python 3 | requests 4 | supervision 5 | gradio==3.50.2 6 | -------------------------------------------------------------------------------- /experiments/webcam-gpt/README.md: -------------------------------------------------------------------------------- 1 | # WebcamGPT 💬 + 📸 2 | 3 | https://github.com/roboflow/awesome-openai-vision-api-experiments/assets/26109316/c63fa3c0-4564-49ee-8982-a9e6a23dae9b 4 | 5 | ## 💻 Install 6 | 7 | ```bash 8 | # create and activate virtual environment 9 | python3 -m venv venv 10 | source venv/bin/activate 11 | 12 | # install dependencies 13 | pip install -r requirements.txt 14 | ``` 15 | 16 | ## 🚀 Run 17 | 18 | ```bash 19 | python app.py 20 | ``` 21 | -------------------------------------------------------------------------------- /experiments/webcam-gpt/app.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import os 3 | import uuid 4 | 5 | import cv2 6 | import gradio as gr 7 | import numpy as np 8 | import requests 9 | 10 | MARKDOWN = """ 11 | # WebcamGPT 💬 + 📸 12 | 13 | webcamGPT is a tool that allows you to chat with video using OpenAI Vision API. 14 | 15 | Visit [awesome-openai-vision-api-experiments](https://github.com/roboflow/awesome-openai-vision-api-experiments) 16 | repository to find more OpenAI Vision API experiments or contribute your own. 
17 | """ 18 | AVATARS = ( 19 | "https://media.roboflow.com/spaces/roboflow_raccoon_full.png", 20 | "https://media.roboflow.com/spaces/openai-white-logomark.png" 21 | ) 22 | IMAGE_CACHE_DIRECTORY = "data" 23 | API_URL = "https://api.openai.com/v1/chat/completions" 24 | 25 | 26 | def preprocess_image(image: np.ndarray) -> np.ndarray: 27 | image = np.fliplr(image) 28 | return cv2.cvtColor(image, cv2.COLOR_RGB2BGR) 29 | 30 | 31 | def encode_image_to_base64(image: np.ndarray) -> str: 32 | success, buffer = cv2.imencode('.jpg', image) 33 | if not success: 34 | raise ValueError("Could not encode image to JPEG format.") 35 | 36 | encoded_image = base64.b64encode(buffer).decode('utf-8') 37 | return encoded_image 38 | 39 | 40 | def compose_payload(image: np.ndarray, prompt: str) -> dict: 41 | base64_image = encode_image_to_base64(image) 42 | return { 43 | "model": "gpt-4-vision-preview", 44 | "messages": [ 45 | { 46 | "role": "user", 47 | "content": [ 48 | { 49 | "type": "text", 50 | "text": prompt 51 | }, 52 | { 53 | "type": "image_url", 54 | "image_url": { 55 | "url": f"data:image/jpeg;base64,{base64_image}" 56 | } 57 | } 58 | ] 59 | } 60 | ], 61 | "max_tokens": 300 62 | } 63 | 64 | 65 | def compose_headers(api_key: str) -> dict: 66 | return { 67 | "Content-Type": "application/json", 68 | "Authorization": f"Bearer {api_key}" 69 | } 70 | 71 | 72 | def prompt_image(api_key: str, image: np.ndarray, prompt: str) -> str: 73 | headers = compose_headers(api_key=api_key) 74 | payload = compose_payload(image=image, prompt=prompt) 75 | response = requests.post(url=API_URL, headers=headers, json=payload).json() 76 | 77 | if 'error' in response: 78 | raise ValueError(response['error']['message']) 79 | return response['choices'][0]['message']['content'] 80 | 81 | 82 | def cache_image(image: np.ndarray) -> str: 83 | image_filename = f"{uuid.uuid4()}.jpeg" 84 | os.makedirs(IMAGE_CACHE_DIRECTORY, exist_ok=True) 85 | image_path = os.path.join(IMAGE_CACHE_DIRECTORY, image_filename) 86 | cv2.imwrite(image_path, image) 87 | return image_path 88 | 89 | 90 | def respond(api_key: str, image: np.ndarray, prompt: str, chat_history): 91 | if not api_key: 92 | raise ValueError( 93 | "API_KEY is not set. 
" 94 | "Please follow the instructions in the README to set it up.") 95 | 96 | image = preprocess_image(image=image) 97 | cached_image_path = cache_image(image) 98 | response = prompt_image(api_key=api_key, image=image, prompt=prompt) 99 | chat_history.append(((cached_image_path,), None)) 100 | chat_history.append((prompt, response)) 101 | return "", chat_history 102 | 103 | 104 | with gr.Blocks() as demo: 105 | gr.Markdown(MARKDOWN) 106 | with gr.Row(): 107 | webcam = gr.Image(source="webcam", streaming=True) 108 | with gr.Column(): 109 | api_key_textbox = gr.Textbox( 110 | label="OpenAI API KEY", type="password") 111 | chatbot = gr.Chatbot( 112 | height=500, bubble_full_width=False, avatar_images=AVATARS) 113 | message_textbox = gr.Textbox() 114 | clear_button = gr.ClearButton([message_textbox, chatbot]) 115 | 116 | message_textbox.submit( 117 | fn=respond, 118 | inputs=[api_key_textbox, webcam, message_textbox, chatbot], 119 | outputs=[message_textbox, chatbot] 120 | ) 121 | 122 | demo.launch(debug=False, show_error=True) 123 | -------------------------------------------------------------------------------- /experiments/webcam-gpt/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | opencv-python 3 | requests 4 | supervision 5 | gradio==3.50.2 6 | --------------------------------------------------------------------------------