├── .gitattributes ├── .gitignore ├── CONTRIBUTING.md ├── README.md ├── automation ├── README.md ├── data.csv ├── requirements.txt └── script.py └── experiments ├── automated-voiceover-of-nba-game ├── README.md └── notebook.ipynb ├── gpt4v-classification ├── README.md ├── app.py ├── fish.jpg └── requirements.txt ├── gpt4v-grounding-dino-detection ├── README.md ├── app.py ├── mercedes.jpeg └── requirements.txt ├── gpt4v-narration ├── README.md ├── app.py ├── requirements.txt └── wearable.mp4 ├── gpt4v-vs-clip ├── README.md ├── app.py ├── camry.jpeg ├── deep-dish.jpg ├── images.jpeg └── requirements.txt ├── hot-dog-not-hot-dog ├── README.md ├── app.py └── requirements.txt └── webcam-gpt ├── README.md ├── app.py └── requirements.txt /.gitattributes: -------------------------------------------------------------------------------- 1 | *.ipynb linguist-vendored -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/ 2 | venv/ 3 | data/ 4 | .DS_Store 5 | __pycache__/ -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | ## 🦸 Contributing to awesome-openai-vision-api-experiments 2 | 3 | We love your input! We want to make contributing to awesome-openai-vision-api-experiments as easy and transparent as possible, whether it's: 4 | 5 | - Reporting a bug 6 | - Discussing the current state of the code 7 | - Submitting a fix 8 | 9 | ## 🧪️ Adding a new experiment 10 | 11 | - **We only accept experiments where the code was open-sourced.** 12 | - Add new subdirectory to `experiments` directory. 13 | - Add new entry to `automation/data.csv` file. 14 | - Run `automation/script.py`. Experiments table in `README.md` will update 15 | automatically. 16 | - Commit changes to feature branch. Create PR. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

# openai vision api experiments 🧪
2 | 3 | ## 👋 Hello 4 | 5 | The must-have resource for anyone who wants to experiment with and build on the [OpenAI 6 | Vision API](https://platform.openai.com/docs/guides/vision). This repository serves as 7 | a hub for innovative experiments, showcasing a variety of applications ranging from 8 | simple image classifications to advanced zero-shot learning models. It's a space for 9 | both beginners and experts to explore the capabilities of the Vision API, share their 10 | findings, and collaborate on pushing the boundaries of visual AI. 11 | 12 | Experimenting with the OpenAI API requires an API 🔑. You can get one 13 | [here](https://platform.openai.com/api-keys). 14 | 15 | ## ⚠️ Limitations 16 | 17 | - 100 API requests per single API key per day. 18 | - Can't be used for object detection or image segmentation. We can solve this problem by combining GPT-4V with foundational models like GroundingDINO or Segment Anything (SAM). Please take a look at the [example](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-grounding-dino-detection) and read our [blog post](https://blog.roboflow.com/dino-gpt-4v). 19 | 20 | ## 🧪 Experiments 21 | 22 | 23 | 27 | | **experiment** | **complementary materials** | **authors** | 28 | |:--------------:|:---------------------------:|:-----------:| 29 | | WebcamGPT - chat with video stream | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/webcam-gpt) [![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Roboflow/webcamGPT) | @SkalskiP | 30 | | HotDogGPT - simple image classification application | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/hot-dog-not-hot-dog) [![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Roboflow/HotDogGPT) | @SkalskiP | 31 | | zero-shot image classifier with GPT-4V | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-classification) | @capjamesg | 32 | | zero-shot object detection with GroundingDINO + GPT-4V | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-grounding-dino-detection) [![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Roboflow/DINO-GPT4V) | @capjamesg | 33 | | GPT-4V vs. 
CLIP | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-vs-clip) | @capjamesg | 34 | | GPT-4V with Set-of-Mark (SoM) | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/microsoft/SoM) | Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao | 35 | | GPT-4V on Web | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/Jiayi-Pan/GPT-V-on-Web) | @Jiayi-Pan | 36 | | automated voiceover of NBA game | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/automated-voiceover-of-nba-game) [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/automated-voiceover-of-nba-game/notebook.ipynb) | @SkalskiP | 37 | | screenshot-to-code | [![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/abi/screenshot-to-code) | @abi | 38 | | GPT with Vision Checkup | [![GitHub](https://badges.aleen42.com/src/github.svg)]( https://github.com/roboflow/gpt-checkup) | Roboflow team | 39 | 40 | https://github.com/roboflow/awesome-openai-vision-api-experiments/assets/26109316/c63fa3c0-4564-49ee-8982-a9e6a23dae9b 41 | 42 | ## 🗞️ Must Read Papers 43 | 44 | - [Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V](https://arxiv.org/abs/2310.11441) 45 | by Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao 46 | - [The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)](https://arxiv.org/abs/2309.17421) 47 | by Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang 48 | - [GPT-4 System Card](https://cdn.openai.com/papers/gpt-4-system-card.pdf) by OpenAI 49 | 50 | ## 🖊️ Blogs 51 | 52 | - [How CLIP and GPT-4V Compare for Classification](https://blog.roboflow.com/clip-vs-gpt-4v/) 53 | - [Experiments with GPT-4V for Object Detection](https://blog.roboflow.com/gpt-4v-object-detection/) 54 | - [Distilling GPT-4 for Classification with an API](https://blog.roboflow.com/gpt-4-image-classification/) 55 | - [DINO-GPT4-V: Use GPT-4V in a Two-Stage Detection Model](https://blog.roboflow.com/dino-gpt-4v/) 56 | - [First Impressions with GPT-4V(ision)](https://blog.roboflow.com/gpt-4-vision/) 57 | 58 | ## 🦸 Contribution 59 | 60 | We would love your help in making this repository even better! Whether you want to 61 | add a new experiment or have any suggestions for improvement, 62 | feel free to open an [issue](https://github.com/roboflow/awesome-openai-vision-api-experiments/issues) 63 | or [pull request](https://github.com/roboflow/awesome-openai-vision-api-experiments/pulls). 64 | 65 | If you are up to the task and want to add a new experiment, please look at our [contribution guide](https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/CONTRIBUTING.md). There you can find all the information you need. 
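## 🧰 Minimal API call

Most experiments in this repository talk to the same chat completions endpoint. The sketch below distills that shared pattern — it mirrors the `compose_payload` / `compose_headers` helpers from the `webcam-gpt` and `hot-dog-not-hot-dog` apps. Treat it as a starting point rather than a reference implementation: the image path (`image.jpg`) and the prompt are placeholders, `requests` must be installed, and `OPENAI_API_KEY` must be exported in your shell.

```python
import base64
import os

import requests

API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["OPENAI_API_KEY"]

# placeholder image; point this at any local JPEG
with open("image.jpg", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    # model name used by the experiments in this repository
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

response = requests.post(API_URL, headers=headers, json=payload).json()
if "error" in response:
    raise ValueError(response["error"]["message"])
print(response["choices"][0]["message"]["content"])
```

Most of the Gradio demos here call the REST endpoint directly with `requests`, while the `gpt4v-narration` experiment uses the official `openai` client instead — both approaches work.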
66 | -------------------------------------------------------------------------------- /automation/README.md: -------------------------------------------------------------------------------- 1 | ## Install 2 | 3 | ```bash 4 | # setup and activate python environment 5 | python3 -m venv venv 6 | source venv/bin/activate 7 | 8 | pip install -r automation/requirements.txt 9 | ``` 10 | 11 | ## Generate 12 | 13 | ```bash 14 | python automation/script.py 15 | ``` -------------------------------------------------------------------------------- /automation/data.csv: -------------------------------------------------------------------------------- 1 | title, code, huggingface, colab, authors 2 | "WebcamGPT - chat with video stream","https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/webcam-gpt","https://huggingface.co/spaces/Roboflow/webcamGPT","",@SkalskiP 3 | "HotDogGPT - simple image classification application","https://github.com/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/hot-dog-not-hot-dog","https://huggingface.co/spaces/Roboflow/HotDogGPT","",@SkalskiP 4 | "zero-shot image classifier with GPT-4V","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-classification","","",@capjamesg 5 | "zero-shot object detection with GroundingDINO + GPT-4V","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-grounding-dino-detection","https://huggingface.co/spaces/Roboflow/DINO-GPT4V","",@capjamesg 6 | "GPT-4V vs. CLIP","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-vs-clip","","",@capjamesg 7 | "GPT-4V with Set-of-Mark (SoM)","https://github.com/microsoft/SoM","","","Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao" 8 | "GPT-4V audio narration","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/gpt4v-narration","","",@etown 9 | "GPT-4V on Web","https://github.com/Jiayi-Pan/GPT-V-on-Web","","",@Jiayi-Pan 10 | "automated voiceover of NBA game","https://github.com/roboflow/awesome-openai-vision-api-experiments/tree/main/experiments/automated-voiceover-of-nba-game","","https://colab.research.google.com/github/roboflow/awesome-openai-vision-api-experiments/blob/main/experiments/automated-voiceover-of-nba-game/notebook.ipynb",@SkalskiP 11 | "GPT with Vision Checkup", https://github.com/roboflow/gpt-checkup,,, Roboflow team 12 | -------------------------------------------------------------------------------- /automation/requirements.txt: -------------------------------------------------------------------------------- 1 | pandas -------------------------------------------------------------------------------- /automation/script.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from typing import List 3 | 4 | import pandas as pd 5 | 6 | from pandas.core.series import Series 7 | 8 | TITLE_COLUMN_NAME = "title" 9 | CODE_COLUMN_NAME = "code" 10 | HUGGINGFACE_COLUMN_NAME = "huggingface" 11 | COLAB_COLUMN_NAME = "colab" 12 | AUTHORS_COLUMN_NAME = "authors" 13 | 14 | AUTOGENERATED_EXPERIMENTS_LIST_TOKEN = "" 15 | 16 | WARNING_HEADER = [ 17 | "" 21 | ] 22 | 23 | GITHUB_BADGE_PATTERN = "[![GitHub](https://badges.aleen42.com/src/github.svg)]({})" 24 | HUGGINGFACE_BADGE_PATTERN = "[![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)]({})" 25 | COLAB_BADGE_PATTERN = 
"[![Colab](https://colab.research.google.com/assets/colab-badge.svg)]({})" 26 | 27 | 28 | TABLE_HEADER = [ 29 | "| **experiment** | **complementary materials** | **authors** |", 30 | "|:--------------:|:---------------------------:|:-----------:|" 31 | ] 32 | 33 | 34 | def read_lines_from_file(path: str) -> List[str]: 35 | """ 36 | Reads lines from file and strips trailing whitespaces. 37 | """ 38 | with open(path) as file: 39 | return [line.rstrip() for line in file] 40 | 41 | 42 | def save_lines_to_file(path: str, lines: List[str]) -> None: 43 | """ 44 | Saves lines to file. 45 | """ 46 | with open(path, "w") as f: 47 | for line in lines: 48 | f.write("%s\n" % line) 49 | 50 | 51 | def format_entry(entry: Series) -> str: 52 | title = entry.loc[TITLE_COLUMN_NAME] 53 | code_url = entry.loc[CODE_COLUMN_NAME] 54 | huggingface_url = entry.loc[HUGGINGFACE_COLUMN_NAME] 55 | colab_url = entry.loc[COLAB_COLUMN_NAME] 56 | authors = entry.loc[AUTHORS_COLUMN_NAME] 57 | code_badge = GITHUB_BADGE_PATTERN.format( 58 | code_url) if code_url else "" 59 | huggingface_badge = HUGGINGFACE_BADGE_PATTERN.format( 60 | huggingface_url) if huggingface_url else "" 61 | colab_badge = COLAB_BADGE_PATTERN.format( 62 | colab_url) if colab_url else "" 63 | complementary_materials = " ".join([code_badge, huggingface_badge, colab_badge]) 64 | return "| {} | {} | {} |".format(title, complementary_materials, authors) 65 | 66 | 67 | def load_table_entries(path: str) -> List[str]: 68 | """ 69 | Loads table entries from csv file. 70 | """ 71 | df = pd.read_csv(path, quotechar='"', dtype=str) 72 | df.columns = df.columns.str.strip() 73 | df = df.fillna("") 74 | return [ 75 | format_entry(row) 76 | for _, row 77 | in df.iterrows() 78 | ] 79 | 80 | 81 | def search_lines_with_token(lines: List[str], token: str) -> List[int]: 82 | result = [] 83 | for line_index, line in enumerate(lines): 84 | if token in line: 85 | result.append(line_index) 86 | return result 87 | 88 | 89 | def inject_markdown_table_into_readme( 90 | readme_lines: List[str], 91 | table_lines: List[str] 92 | ) -> List[str]: 93 | lines_with_token_indexes = search_lines_with_token( 94 | lines=readme_lines, 95 | token=AUTOGENERATED_EXPERIMENTS_LIST_TOKEN) 96 | if len(lines_with_token_indexes) != 2: 97 | raise Exception(f"Please inject two {AUTOGENERATED_EXPERIMENTS_LIST_TOKEN} " 98 | f"tokens to signal start and end of autogenerated table.") 99 | 100 | [table_start_line_index, table_end_line_index] = lines_with_token_indexes 101 | return ( 102 | readme_lines[:table_start_line_index + 1] + 103 | table_lines + 104 | readme_lines[table_end_line_index:] 105 | ) 106 | 107 | 108 | if __name__ == "__main__": 109 | parser = argparse.ArgumentParser() 110 | parser.add_argument('-d', '--data_path', default='automation/data.csv') 111 | parser.add_argument('-r', '--readme_path', default='README.md') 112 | args = parser.parse_args() 113 | 114 | table_lines = load_table_entries(path=args.data_path) 115 | table_lines = WARNING_HEADER + TABLE_HEADER + table_lines 116 | readme_lines = read_lines_from_file(path=args.readme_path) 117 | readme_lines = inject_markdown_table_into_readme( 118 | readme_lines=readme_lines, 119 | table_lines=table_lines) 120 | save_lines_to_file(path=args.readme_path, lines=readme_lines) 121 | -------------------------------------------------------------------------------- /experiments/automated-voiceover-of-nba-game/README.md: -------------------------------------------------------------------------------- 1 | ## Automated voiceover of NBA game 🏀 
-------------------------------------------------------------------------------- /experiments/gpt4v-classification/README.md: -------------------------------------------------------------------------------- 1 | # GPT-4V Classification 2 | 3 | ## 💻 Install 4 | 5 | ```bash 6 | # create and activate virtual environment 7 | python3 -m venv venv 8 | source venv/bin/activate 9 | 10 | # install dependencies 11 | pip install -r requirements.txt 12 | ``` -------------------------------------------------------------------------------- /experiments/gpt4v-classification/app.py: -------------------------------------------------------------------------------- 1 | from autodistill_gpt_4v import GPT4V 2 | from autodistill.detection import CaptionOntology 3 | import os 4 | 5 | base_model = GPT4V( 6 | ontology=CaptionOntology( 7 | { 8 | "salmon": "salmon", 9 | "carp": "carp" 10 | } 11 | ), 12 | api_key=os.environ["OPENAI_API_KEY"] 13 | ) 14 | 15 | result = base_model.predict("fish.jpg", base_model.ontology.prompts()) 16 | 17 | class_result = base_model.ontology.prompts()[result.get_top_k(1).class_id] 18 | print(class_result) -------------------------------------------------------------------------------- /experiments/gpt4v-classification/fish.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-classification/fish.jpg -------------------------------------------------------------------------------- /experiments/gpt4v-classification/requirements.txt: -------------------------------------------------------------------------------- 1 | autodistill_gpt4v 2 | autodistill -------------------------------------------------------------------------------- /experiments/gpt4v-grounding-dino-detection/README.md: -------------------------------------------------------------------------------- 1 | # GPT-4V GroundingDINO Object Detection 2 | 3 | ## 💻 Install 4 | 5 | ```bash 6 | # create and activate virtual environment 7 | python3 -m venv venv 8 | source venv/bin/activate 9 | 10 | # install dependencies 11 | pip install -r requirements.txt 12 | ``` -------------------------------------------------------------------------------- /experiments/gpt4v-grounding-dino-detection/app.py: -------------------------------------------------------------------------------- 1 | from autodistill_gpt_4v import GPT4V 2 | from autodistill.detection import CaptionOntology 3 | from autodistill_grounding_dino import GroundingDINO 4 | from autodistill.utils import plot 5 | 6 | from autodistill.core.custom_detection_model import CustomDetectionModel 7 | import cv2 8 | import os 9 | 10 | classes = ["mercedes", "toyota"] 11 | 12 | 13 | DINOGPT = CustomDetectionModel( 14 | detection_model=GroundingDINO( 15 | CaptionOntology({"car": "car"}) 16 | ), 17 | classification_model=GPT4V( 18 | CaptionOntology({k: k for k in classes}), 19 | api_key=os.environ["OPENAI_API_KEY"] 20 | ) 21 | ) 22 | 23 | IMAGE = "mercedes.jpeg" 24 | 25 | results = DINOGPT.predict(IMAGE) 26 | 27 | plot( 28 | image=cv2.imread(IMAGE), 29 | detections=results, 30 | classes=["mercedes", "toyota", "car"] 31 | ) 32 | -------------------------------------------------------------------------------- /experiments/gpt4v-grounding-dino-detection/mercedes.jpeg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-grounding-dino-detection/mercedes.jpeg -------------------------------------------------------------------------------- /experiments/gpt4v-grounding-dino-detection/requirements.txt: -------------------------------------------------------------------------------- 1 | autodistill_grounding_dino 2 | autodistill-gpt-4v 3 | autodistill 4 | -------------------------------------------------------------------------------- /experiments/gpt4v-narration/README.md: -------------------------------------------------------------------------------- 1 | # Life Narration 2 | 3 | https://github.com/etown/LifeNarration/assets/357244/3e9a39a0-7f90-42d8-97ec-4b9825918e56 4 | 5 | ## 💻 Install 6 | 7 | ```bash 8 | # create and activate virtual environment 9 | python3 -m venv venv 10 | source venv/bin/activate 11 | 12 | # install dependencies 13 | pip install -r requirements.txt 14 | ``` 15 | 16 | ## 🚀 Run 17 | 18 | ```bash 19 | python app.py 20 | ``` -------------------------------------------------------------------------------- /experiments/gpt4v-narration/app.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import base64 3 | import subprocess 4 | import tempfile 5 | import os 6 | from elevenlabs import generate, set_api_key 7 | from openai import OpenAI 8 | 9 | # API Keys and File Paths 10 | OPENAI_API_KEY = os.environ["OPENAI_API_KEY"] 11 | ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"] 12 | VIDEO_FILE_PATH = 'wearable.mp4' 13 | OUTPUT_FILE_PATH = 'output_with_audio.mp4' 14 | 15 | # Set API keys 16 | set_api_key(ELEVENLABS_API_KEY) 17 | client = OpenAI( 18 | api_key=OPENAI_API_KEY, 19 | ) 20 | 21 | def read_video_frames(video_path, skip_frames=10): 22 | video = cv2.VideoCapture(video_path) 23 | base64_frames = [] 24 | frame_count = 0 25 | while video.isOpened(): 26 | success, frame = video.read() 27 | if not success: 28 | break 29 | if frame_count % skip_frames == 0: 30 | _, buffer = cv2.imencode(".jpg", frame) 31 | base64_frames.append(base64.b64encode(buffer).decode("utf-8")) 32 | frame_count += 1 33 | video.release() 34 | return base64_frames 35 | 36 | def generate_script(frames): 37 | prompt_messages = [ 38 | { 39 | "role": "user", 40 | "content": [ 41 | "These are frames of a video recorded from a person's point of view going through mundane life tasks. 
Create a short voiceover script in the style of a super excited sports narrator...", 42 | *map(lambda x: {"image": x, "resize": 768}, frames[0::10]), 43 | ], 44 | }, 45 | ] 46 | result = client.chat.completions.create( 47 | model="gpt-4-vision-preview", 48 | messages=prompt_messages, 49 | max_tokens=500, 50 | ) 51 | return result.choices[0].message.content 52 | 53 | def shorten_script(script): 54 | prompt_messages = [ 55 | { 56 | "role": "user", 57 | "content": f"Shorten this script so it can be read in about 30 seconds: {script}", 58 | } 59 | ] 60 | result = client.chat.completions.create( 61 | model="gpt-4", 62 | messages=prompt_messages, 63 | max_tokens=500, 64 | ) 65 | return result.choices[0].message.content 66 | 67 | def generate_audio(text): 68 | return generate( 69 | text=text, 70 | voice="Oliver", 71 | model='eleven_multilingual_v2' 72 | ) 73 | 74 | def merge_audio_with_video(audio, video_path, output_path): 75 | with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as audio_file: 76 | audio_file.write(audio) 77 | audio_filename = audio_file.name 78 | 79 | ffmpeg_command = [ 80 | 'ffmpeg', '-y', '-i', video_path, '-i', audio_filename, 81 | '-c:v', 'copy', '-c:a', 'aac', '-strict', 'experimental', output_path 82 | ] 83 | 84 | subprocess.run(ffmpeg_command) 85 | 86 | # Main Process 87 | frames = read_video_frames(VIDEO_FILE_PATH) 88 | script = generate_script(frames) 89 | short_script = shorten_script(script) 90 | audio = generate_audio(short_script) 91 | merge_audio_with_video(audio, VIDEO_FILE_PATH, OUTPUT_FILE_PATH) -------------------------------------------------------------------------------- /experiments/gpt4v-narration/requirements.txt: -------------------------------------------------------------------------------- 1 | opencv-python-headless==4.5.5.64 2 | openai==1.2.2 3 | elevenlabs==0.2.24 4 | ffmpeg-python==0.2.0 5 | -------------------------------------------------------------------------------- /experiments/gpt4v-narration/wearable.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-narration/wearable.mp4 -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/README.md: -------------------------------------------------------------------------------- 1 | # GPT-4V vs. 
CLIP 2 | 3 | ## 💻 Install 4 | 5 | ```bash 6 | # create and activate virtual environment 7 | python3 -m venv venv 8 | source venv/bin/activate 9 | 10 | # install dependencies 11 | pip install -r requirements.txt 12 | ``` -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/app.py: -------------------------------------------------------------------------------- 1 | from autodistill_gpt_4v import GPT4V 2 | from autodistill.detection import CaptionOntology 3 | from autodistill_clip import CLIP 4 | import os 5 | 6 | prompts = ["chicago deep dish pizza", "pizza"] 7 | 8 | ontology = CaptionOntology( 9 | {k: k for k in prompts} 10 | ) 11 | 12 | clip_model = CLIP(ontology=ontology) 13 | 14 | clip_result = clip_model.predict("deep-dish.jpg") 15 | 16 | class_result = prompts[clip_result.class_id[0]] 17 | 18 | print("CLIP result: ", class_result) 19 | 20 | gpt_4v_model = GPT4V(ontology=ontology, api_key=os.environ["OPENAI_API_KEY"]) 21 | 22 | gpt_result = gpt_4v_model.predict("deep-dish.jpg", gpt_4v_model.ontology.prompts()) 23 | 24 | class_result = prompts[gpt_result.class_id[0]] 25 | print("GPT-4-V result: ", class_result) 26 | -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/camry.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-vs-clip/camry.jpeg -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/deep-dish.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-vs-clip/deep-dish.jpg -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/images.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/roboflow/awesome-openai-vision-api-experiments/7adff72c2de03e104f1029dfab3988cf4dca83be/experiments/gpt4v-vs-clip/images.jpeg -------------------------------------------------------------------------------- /experiments/gpt4v-vs-clip/requirements.txt: -------------------------------------------------------------------------------- 1 | autodistill_gpt-4v 2 | autodistill_clip 3 | autodistill -------------------------------------------------------------------------------- /experiments/hot-dog-not-hot-dog/README.md: -------------------------------------------------------------------------------- 1 | # HotDogGPT 💬 + 🌭 2 | 3 | ## 💻 Install 4 | 5 | ```bash 6 | # create and activate virtual environment 7 | python3 -m venv venv 8 | source venv/bin/activate 9 | 10 | # install dependencies 11 | pip install -r requirements.txt 12 | ``` -------------------------------------------------------------------------------- /experiments/hot-dog-not-hot-dog/app.py: -------------------------------------------------------------------------------- 1 | import base64 2 | 3 | import cv2 4 | import gradio as gr 5 | import numpy as np 6 | import requests 7 | 8 | MARKDOWN = """ 9 | # HotDogGPT 💬 + 🌭 10 | 11 | HotDogGPT is OpenAI Vision API experiment reproducing the famous 12 | [Hot Dog, Not Hot Dog](https://www.youtube.com/watch?v=ACmydtFDTGs) app from Silicon 13 | Valley. 14 | 15 |

16 |

18 | 19 | Visit [awesome-openai-vision-api-experiments](https://github.com/roboflow/awesome-openai-vision-api-experiments) 20 | repository to find more OpenAI Vision API experiments or contribute your own. 21 | """ 22 | API_URL = "https://api.openai.com/v1/chat/completions" 23 | CLASSES = ["🌭 Hot Dog", "❌ Not Hot Dog"] 24 | 25 | 26 | def preprocess_image(image: np.ndarray) -> np.ndarray: 27 | image = np.fliplr(image) 28 | return cv2.cvtColor(image, cv2.COLOR_RGB2BGR) 29 | 30 | 31 | def encode_image_to_base64(image: np.ndarray) -> str: 32 | success, buffer = cv2.imencode('.jpg', image) 33 | if not success: 34 | raise ValueError("Could not encode image to JPEG format.") 35 | 36 | encoded_image = base64.b64encode(buffer).decode('utf-8') 37 | return encoded_image 38 | 39 | 40 | def compose_payload(image: np.ndarray, prompt: str) -> dict: 41 | base64_image = encode_image_to_base64(image) 42 | return { 43 | "model": "gpt-4-vision-preview", 44 | "messages": [ 45 | { 46 | "role": "user", 47 | "content": [ 48 | { 49 | "type": "text", 50 | "text": prompt 51 | }, 52 | { 53 | "type": "image_url", 54 | "image_url": { 55 | "url": f"data:image/jpeg;base64,{base64_image}" 56 | } 57 | } 58 | ] 59 | } 60 | ], 61 | "max_tokens": 300 62 | } 63 | 64 | 65 | def compose_classification_prompt(classes: list) -> str: 66 | return (f"What is in the image? Return the class of the object in the image. Here " 67 | f"are the classes: {', '.join(classes)}. You can only return one class " 68 | f"from that list.") 69 | 70 | 71 | def compose_headers(api_key: str) -> dict: 72 | return { 73 | "Content-Type": "application/json", 74 | "Authorization": f"Bearer {api_key}" 75 | } 76 | 77 | 78 | def prompt_image(api_key: str, image: np.ndarray, prompt: str) -> str: 79 | headers = compose_headers(api_key=api_key) 80 | payload = compose_payload(image=image, prompt=prompt) 81 | response = requests.post(url=API_URL, headers=headers, json=payload).json() 82 | 83 | if 'error' in response: 84 | raise ValueError(response['error']['message']) 85 | return response['choices'][0]['message']['content'] 86 | 87 | 88 | def classify_image(api_key: str, image: np.ndarray) -> str: 89 | if not api_key: 90 | raise ValueError( 91 | "API_KEY is not set. 
" 92 | "Please follow the instructions in the README to set it up.") 93 | image = preprocess_image(image=image) 94 | prompt = compose_classification_prompt(classes=CLASSES) 95 | response = prompt_image(api_key=api_key, image=image, prompt=prompt) 96 | return response 97 | 98 | 99 | with gr.Blocks() as demo: 100 | gr.Markdown(MARKDOWN) 101 | api_key_textbox = gr.Textbox( 102 | label="🔑 OpenAI API", type="password") 103 | 104 | with gr.TabItem("Basic"): 105 | with gr.Column(): 106 | input_image = gr.Image( 107 | image_mode='RGB', type='numpy', height=500) 108 | output_text = gr.Textbox( 109 | label="Output") 110 | submit_button = gr.Button("Submit") 111 | 112 | submit_button.click( 113 | fn=classify_image, 114 | inputs=[api_key_textbox, input_image], 115 | outputs=output_text) 116 | 117 | # with gr.TabItem("Advanced"): 118 | # with gr.Column(): 119 | # advanced_clss_list = gr.Textbox( 120 | # label="Comma-separated list of classes") 121 | # advanced_input_image = gr.Image( 122 | # image_mode='RGB', type='numpy', height=500) 123 | # advanced_output_text = gr.Textbox( 124 | # label="Output") 125 | # advanced_submit_button = gr.Button("Submit") 126 | 127 | demo.launch(debug=False, show_error=True) 128 | -------------------------------------------------------------------------------- /experiments/hot-dog-not-hot-dog/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | opencv-python 3 | requests 4 | supervision 5 | gradio==3.50.2 6 | -------------------------------------------------------------------------------- /experiments/webcam-gpt/README.md: -------------------------------------------------------------------------------- 1 | # WebcamGPT 💬 + 📸 2 | 3 | https://github.com/roboflow/awesome-openai-vision-api-experiments/assets/26109316/c63fa3c0-4564-49ee-8982-a9e6a23dae9b 4 | 5 | ## 💻 Install 6 | 7 | ```bash 8 | # create and activate virtual environment 9 | python3 -m venv venv 10 | source venv/bin/activate 11 | 12 | # install dependencies 13 | pip install -r requirements.txt 14 | ``` 15 | 16 | ## 🚀 Run 17 | 18 | ```bash 19 | python app.py 20 | ``` 21 | -------------------------------------------------------------------------------- /experiments/webcam-gpt/app.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import os 3 | import uuid 4 | 5 | import cv2 6 | import gradio as gr 7 | import numpy as np 8 | import requests 9 | 10 | MARKDOWN = """ 11 | # WebcamGPT 💬 + 📸 12 | 13 | webcamGPT is a tool that allows you to chat with video using OpenAI Vision API. 14 | 15 | Visit [awesome-openai-vision-api-experiments](https://github.com/roboflow/awesome-openai-vision-api-experiments) 16 | repository to find more OpenAI Vision API experiments or contribute your own. 
17 | """ 18 | AVATARS = ( 19 | "https://media.roboflow.com/spaces/roboflow_raccoon_full.png", 20 | "https://media.roboflow.com/spaces/openai-white-logomark.png" 21 | ) 22 | IMAGE_CACHE_DIRECTORY = "data" 23 | API_URL = "https://api.openai.com/v1/chat/completions" 24 | 25 | 26 | def preprocess_image(image: np.ndarray) -> np.ndarray: 27 | image = np.fliplr(image) 28 | return cv2.cvtColor(image, cv2.COLOR_RGB2BGR) 29 | 30 | 31 | def encode_image_to_base64(image: np.ndarray) -> str: 32 | success, buffer = cv2.imencode('.jpg', image) 33 | if not success: 34 | raise ValueError("Could not encode image to JPEG format.") 35 | 36 | encoded_image = base64.b64encode(buffer).decode('utf-8') 37 | return encoded_image 38 | 39 | 40 | def compose_payload(image: np.ndarray, prompt: str) -> dict: 41 | base64_image = encode_image_to_base64(image) 42 | return { 43 | "model": "gpt-4-vision-preview", 44 | "messages": [ 45 | { 46 | "role": "user", 47 | "content": [ 48 | { 49 | "type": "text", 50 | "text": prompt 51 | }, 52 | { 53 | "type": "image_url", 54 | "image_url": { 55 | "url": f"data:image/jpeg;base64,{base64_image}" 56 | } 57 | } 58 | ] 59 | } 60 | ], 61 | "max_tokens": 300 62 | } 63 | 64 | 65 | def compose_headers(api_key: str) -> dict: 66 | return { 67 | "Content-Type": "application/json", 68 | "Authorization": f"Bearer {api_key}" 69 | } 70 | 71 | 72 | def prompt_image(api_key: str, image: np.ndarray, prompt: str) -> str: 73 | headers = compose_headers(api_key=api_key) 74 | payload = compose_payload(image=image, prompt=prompt) 75 | response = requests.post(url=API_URL, headers=headers, json=payload).json() 76 | 77 | if 'error' in response: 78 | raise ValueError(response['error']['message']) 79 | return response['choices'][0]['message']['content'] 80 | 81 | 82 | def cache_image(image: np.ndarray) -> str: 83 | image_filename = f"{uuid.uuid4()}.jpeg" 84 | os.makedirs(IMAGE_CACHE_DIRECTORY, exist_ok=True) 85 | image_path = os.path.join(IMAGE_CACHE_DIRECTORY, image_filename) 86 | cv2.imwrite(image_path, image) 87 | return image_path 88 | 89 | 90 | def respond(api_key: str, image: np.ndarray, prompt: str, chat_history): 91 | if not api_key: 92 | raise ValueError( 93 | "API_KEY is not set. 
" 94 | "Please follow the instructions in the README to set it up.") 95 | 96 | image = preprocess_image(image=image) 97 | cached_image_path = cache_image(image) 98 | response = prompt_image(api_key=api_key, image=image, prompt=prompt) 99 | chat_history.append(((cached_image_path,), None)) 100 | chat_history.append((prompt, response)) 101 | return "", chat_history 102 | 103 | 104 | with gr.Blocks() as demo: 105 | gr.Markdown(MARKDOWN) 106 | with gr.Row(): 107 | webcam = gr.Image(source="webcam", streaming=True) 108 | with gr.Column(): 109 | api_key_textbox = gr.Textbox( 110 | label="OpenAI API KEY", type="password") 111 | chatbot = gr.Chatbot( 112 | height=500, bubble_full_width=False, avatar_images=AVATARS) 113 | message_textbox = gr.Textbox() 114 | clear_button = gr.ClearButton([message_textbox, chatbot]) 115 | 116 | message_textbox.submit( 117 | fn=respond, 118 | inputs=[api_key_textbox, webcam, message_textbox, chatbot], 119 | outputs=[message_textbox, chatbot] 120 | ) 121 | 122 | demo.launch(debug=False, show_error=True) 123 | -------------------------------------------------------------------------------- /experiments/webcam-gpt/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | opencv-python 3 | requests 4 | supervision 5 | gradio==3.50.2 6 | --------------------------------------------------------------------------------