├── .streamlit └── config.toml ├── .gitignore ├── test_images ├── alice.jpg ├── soup_can.jpg ├── turkey.jpg ├── strawberries.jpg └── pre-generated_hoodie.png ├── test_results └── alice.mp3 ├── __init__.py ├── parsers.py ├── LICENSE ├── utils.py ├── .devcontainer └── devcontainer.json ├── 🏠_Home.py ├── components.py ├── pages ├── 2_🧾_OCR.py ├── 0_📷_Camera.py ├── 1_👕_Product_Descriptions.py ├── 3_📋_Quality_Control.py └── 4_🗣️_Speech.py └── README.md /.streamlit/config.toml: -------------------------------------------------------------------------------- 1 | [theme] 2 | base = "light" -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | secrets.toml 2 | old_pages 3 | **/__pycache__ -------------------------------------------------------------------------------- /test_images/alice.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_images/alice.jpg -------------------------------------------------------------------------------- /test_images/soup_can.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_images/soup_can.jpg -------------------------------------------------------------------------------- /test_images/turkey.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_images/turkey.jpg -------------------------------------------------------------------------------- /test_results/alice.mp3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_results/alice.mp3 -------------------------------------------------------------------------------- 
/test_images/strawberries.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_images/strawberries.jpg -------------------------------------------------------------------------------- /test_images/pre-generated_hoodie.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_images/pre-generated_hoodie.png -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Streamlit Inc. (2018-2022) Snowflake Inc. (2022) 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | -------------------------------------------------------------------------------- /parsers.py: -------------------------------------------------------------------------------- 1 | import json 2 | from json import JSONDecodeError 3 | 4 | 5 | def extract_json(string): 6 | """ 7 | This function extracts the first valid JSON object from a given string. 8 | 9 | Parameters: 10 | string (str): The string from which to extract the JSON object. 11 | 12 | Returns: 13 | str: The first valid JSON object found in the string, re-serialized 14 | as indented JSON text. If no valid JSON object is found, the 15 | string "{}" is returned; this function does not raise on 16 | unparseable input. 
17 | """ 18 | start_positions = [pos for pos, char in enumerate(string) if char == "{"] 19 | end_positions = [pos for pos, char in enumerate(string) if char == "}"] 20 | 21 | for start in start_positions: 22 | for end in reversed(end_positions): 23 | if start < end: 24 | try: 25 | obj = json.loads(string[start : end + 1]) 26 | return json.dumps(obj, indent=4, ensure_ascii=False) 27 | except JSONDecodeError: 28 | continue 29 | 30 | return "{}" 31 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Adam B. Strock 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Streamlit Inc. 
(2018-2022) Snowflake Inc. (2022) 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | import inspect 16 | import textwrap 17 | 18 | import streamlit as st 19 | 20 | 21 | def show_code(code): 22 | """Showing the code of the demo.""" 23 | show_code = st.sidebar.checkbox("Show code", False) 24 | if show_code: 25 | st.markdown("## Code") 26 | for function in code: 27 | # Showing the code of the demo. 28 | sourcelines, _ = inspect.getsourcelines(function) 29 | st.code(textwrap.dedent("".join(sourcelines[0:]))) 30 | -------------------------------------------------------------------------------- /.devcontainer/devcontainer.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "Python 3", 3 | // Or use a Dockerfile or Docker Compose file. 
More info: https://containers.dev/guide/dockerfile 4 | "image": "mcr.microsoft.com/devcontainers/python:1-3.11-bullseye", 5 | "customizations": { 6 | "codespaces": { 7 | "openFiles": [ 8 | "README.md", 9 | "🏠_Home.py" 10 | ] 11 | }, 12 | "vscode": { 13 | "settings": {}, 14 | "extensions": [ 15 | "ms-python.python", 16 | "ms-python.vscode-pylance" 17 | ] 18 | } 19 | }, 20 | "updateContentCommand": "[ -f packages.txt ] && sudo apt update && sudo apt upgrade -y && sudo xargs apt install -y 63 | [data-testid="stSidebarNavItems"] { 64 | max-height: 60vh; 65 | } 66 | """, 67 | unsafe_allow_html=True, 68 | ) 69 | 70 | 71 | def toggle_balloons(): 72 | st.session_state.balloons = st.sidebar.checkbox("Show balloons", True) 73 | -------------------------------------------------------------------------------- /pages/2_🧾_OCR.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import base64 3 | import requests 4 | import json 5 | import components 6 | from utils import show_code 7 | 8 | 9 | def submit(image, api_key): 10 | headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"} 11 | 12 | base64_image = base64.b64encode(image).decode("utf-8") 13 | 14 | payload = { 15 | "model": "gpt-4-vision-preview", 16 | "messages": [ 17 | { 18 | "role": "system", 19 | "content": "You are trained to extract text from images.", 20 | }, 21 | { 22 | "role": "user", 23 | "content": [ 24 | { 25 | "type": "text", 26 | "text": "Extract all of the text visible in this image.", 27 | }, 28 | { 29 | "type": "image_url", 30 | "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}, 31 | }, 32 | ], 33 | }, 34 | ], 35 | "max_tokens": 1024, 36 | } 37 | 38 | try: 39 | response = requests.post( 40 | "https://api.openai.com/v1/chat/completions", headers=headers, json=payload 41 | ) 42 | response.raise_for_status() 43 | 44 | text = response.json()["choices"][0]["message"]["content"] 45 | st.session_state.ocr_text = 
text 46 | 47 | if "balloons" in st.session_state and st.session_state.balloons: 48 | st.balloons() 49 | except requests.exceptions.HTTPError as err: 50 | st.toast(f":red[HTTP error: {err}]") 51 | except Exception as err: 52 | st.toast(f":red[Error: {err}]") 53 | 54 | 55 | def run(): 56 | image = components.image_uploader() 57 | 58 | api_key = components.api_key_with_warning() 59 | 60 | components.submit_button(image, api_key, submit) 61 | 62 | if "ocr_text" in st.session_state: 63 | st.text_area( 64 | "Extracted Text", 65 | st.session_state.ocr_text, 66 | height=400, 67 | ) 68 | 69 | 70 | st.set_page_config(page_title="GPT-4V OCR", page_icon="🧾") 71 | components.inc_sidebar_nav_height() 72 | st.write("# 🧾 OCR") 73 | st.write("Extract the text from an image.") 74 | st.info( 75 | "This is a test of the OpenAI GPT-4V preview and is not intended for production use." 76 | ) 77 | st.write("\n") 78 | 79 | run() 80 | 81 | components.toggle_balloons() 82 | show_code([submit, run, components]) 83 | -------------------------------------------------------------------------------- /pages/0_📷_Camera.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import base64 3 | import requests 4 | import components 5 | from utils import show_code 6 | 7 | 8 | def submit(image, api_key): 9 | headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"} 10 | 11 | base64_image = base64.b64encode(image).decode("utf-8") 12 | 13 | payload = { 14 | "model": "gpt-4-vision-preview", 15 | "messages": [ 16 | { 17 | "role": "system", 18 | "content": "You are a friendly assistant.", 19 | }, 20 | { 21 | "role": "user", 22 | "content": [ 23 | { 24 | "type": "text", 25 | "text": "Write your best four-sentence caption for this image, highlighting the most " 26 | "interesting aspects without making assumptions.", 27 | }, 28 | { 29 | "type": "image_url", 30 | "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}, 
31 | }, 32 | ], 33 | }, 34 | ], 35 | "max_tokens": 300, 36 | } 37 | 38 | try: 39 | response = requests.post( 40 | "https://api.openai.com/v1/chat/completions", headers=headers, json=payload 41 | ) 42 | response.raise_for_status() 43 | 44 | camera_caption = response.json()["choices"][0]["message"]["content"] 45 | st.session_state.camera_caption = camera_caption 46 | 47 | if "balloons" in st.session_state and st.session_state.balloons: 48 | st.balloons() 49 | except requests.exceptions.HTTPError as err: 50 | st.toast(f":red[HTTP error: {err}]") 51 | except Exception as err: 52 | st.toast(f":red[Error: {err}]") 53 | 54 | 55 | def run(): 56 | selected_option = st.radio( 57 | "Image Input", 58 | ["Camera", "Image File"], 59 | horizontal=True, 60 | label_visibility="collapsed", 61 | ) 62 | 63 | if selected_option == "Camera": 64 | image = components.camera_uploader() 65 | else: 66 | image = components.image_uploader() 67 | 68 | api_key = components.api_key_with_warning() 69 | 70 | components.submit_button(image, api_key, submit) 71 | 72 | if "camera_caption" in st.session_state: 73 | st.text_area( 74 | "Caption", 75 | st.session_state.camera_caption, 76 | height=300, 77 | ) 78 | 79 | 80 | st.set_page_config(page_title="GPT-4V Camera", page_icon="📷") 81 | components.inc_sidebar_nav_height() 82 | st.write("# 📷 Camera") 83 | st.write("Take a photo with your device's camera and generate a caption.") 84 | st.info( 85 | "This is a test of the OpenAI GPT-4V preview and is not intended for production use." 
86 | ) 87 | 88 | run() 89 | 90 | components.toggle_balloons() 91 | show_code([submit, run, components]) 92 | -------------------------------------------------------------------------------- /pages/1_👕_Product_Descriptions.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import base64 3 | import requests 4 | import json 5 | import components 6 | from utils import show_code 7 | 8 | 9 | def submit(image, api_key, product): 10 | headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"} 11 | base64_image = base64.b64encode(image).decode("utf-8") 12 | 13 | payload = { 14 | "model": "gpt-4-vision-preview", 15 | "messages": [ 16 | { 17 | "role": "system", 18 | "content": "You are an expert copywriter for leading brands.", 19 | }, 20 | { 21 | "role": "user", 22 | "content": [ 23 | { 24 | "type": "text", 25 | "text": "Write your best single-paragraph product description for this image. " 26 | "You are encouraged to incorporate the product attributes provided below. " 27 | "Do not infer sizing, product name, product brand, or specific materials unless " 28 | "provided in the product attributes. 
Make sure to write about the colors and " 29 | "other visible features of the product.\n\n" 30 | f"{product}", 31 | }, 32 | { 33 | "type": "image_url", 34 | "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}, 35 | }, 36 | ], 37 | }, 38 | ], 39 | "max_tokens": 300, 40 | } 41 | 42 | try: 43 | response = requests.post( 44 | "https://api.openai.com/v1/chat/completions", headers=headers, json=payload 45 | ) 46 | response.raise_for_status() 47 | 48 | description = response.json()["choices"][0]["message"]["content"] 49 | product = json.loads(product) 50 | description = ( 51 | description.replace(" '", " ") 52 | .replace("' ", " ") 53 | .replace(' "', " ") 54 | .replace('" ', " ") 55 | ) 56 | product["product_attributes"]["description"] = description 57 | st.session_state.product = json.dumps(product, indent=4, ensure_ascii=False) 58 | 59 | if "balloons" in st.session_state and st.session_state.balloons: 60 | st.balloons() 61 | except requests.exceptions.HTTPError as err: 62 | st.toast(f":red[HTTP error: {err}]") 63 | except Exception as err: 64 | st.toast(f":red[Error: {err}]") 65 | 66 | 67 | def run(): 68 | image = components.image_uploader() 69 | 70 | product = st.text_area( 71 | "Product Attributes", 72 | value=json.dumps( 73 | { 74 | "product_attributes": { 75 | "brand_name": "", 76 | "product_name": "", 77 | "materials": "", 78 | } 79 | }, 80 | indent=4, 81 | ), 82 | height=200, 83 | ) 84 | 85 | st.caption("Attributes are optional. 
Feel free to try your own!") 86 | 87 | api_key = components.api_key_with_warning() 88 | 89 | components.submit_button(image, api_key, submit, product) 90 | 91 | if "product" in st.session_state: 92 | st.text_area( 93 | "Product Attributes With Description", 94 | st.session_state.product, 95 | height=400, 96 | ) 97 | 98 | 99 | st.set_page_config(page_title="GPT-4V Product Descriptions", page_icon="👕") 100 | components.inc_sidebar_nav_height() 101 | st.write("# 👕 Product Descriptions") 102 | st.write("Generate a product description for an image.") 103 | st.info( 104 | "This is a test of the OpenAI GPT-4V preview and is not intended for production use." 105 | ) 106 | st.write("\n") 107 | 108 | run() 109 | 110 | components.toggle_balloons() 111 | show_code([submit, run, components]) 112 | -------------------------------------------------------------------------------- /pages/3_📋_Quality_Control.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import base64 3 | import requests 4 | import json 5 | import components 6 | from utils import show_code 7 | from parsers import extract_json 8 | 9 | 10 | def submit(image, api_key, issue_attributes): 11 | headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"} 12 | 13 | base64_image = base64.b64encode(image).decode("utf-8") 14 | 15 | payload = { 16 | "model": "gpt-4-vision-preview", 17 | "messages": [ 18 | { 19 | "role": "system", 20 | "content": "You are an expert quality control inspector for leading manufacturers.", 21 | }, 22 | { 23 | "role": "user", 24 | "content": [ 25 | { 26 | "type": "text", 27 | "text": ( 28 | "Inspect this image and write a report in the following format:\n\n" 29 | "```json\n" 30 | "{\n" 31 | ' "issues": [\n' 32 | " {\n" 33 | f"{issue_attributes}\n" 34 | " }\n" 35 | " ]\n" 36 | "}\n" 37 | "```\n\n" 38 | "If you see any signs of quality deterioration of any kind, such as corrosion, " 39 | "physical damage, 
decay, or contamination, add them as separate issues in the " 40 | "`issues` array. If there are no issues, the `issues` array should be empty. " 41 | "Your response should contain only valid JSON." 42 | ), 43 | }, 44 | { 45 | "type": "image_url", 46 | "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}, 47 | }, 48 | ], 49 | }, 50 | ], 51 | "max_tokens": 1024, 52 | "temperature": 0.1, 53 | # Response format not yet supported by GPT-4V 54 | # "response_format": {"type": "json_object"}, 55 | } 56 | 57 | try: 58 | response = requests.post( 59 | "https://api.openai.com/v1/chat/completions", headers=headers, json=payload 60 | ) 61 | response.raise_for_status() 62 | 63 | text = extract_json(response.json()["choices"][0]["message"]["content"]) 64 | st.session_state.response_text = text 65 | 66 | if "balloons" in st.session_state and st.session_state.balloons: 67 | st.balloons() 68 | except requests.exceptions.HTTPError as err: 69 | st.toast(f":red[HTTP error: {err}]") 70 | except Exception as err: 71 | st.toast(f":red[Error: {err}]") 72 | 73 | 74 | def run(): 75 | image = components.image_uploader() 76 | 77 | issue_attributes = st.text_area( 78 | "Quality Issue Attributes", 79 | value='"issue_critical": true if inedible,\n' 80 | '"issue_category": string,\n' 81 | '"issue_description": single-paragraph string', 82 | height=120, 83 | ) 84 | 85 | api_key = components.api_key_with_warning() 86 | 87 | components.submit_button(image, api_key, submit, issue_attributes) 88 | 89 | if "response_text" in st.session_state: 90 | st.text_area( 91 | "QC Report", 92 | st.session_state.response_text, 93 | height=400, 94 | ) 95 | 96 | 97 | st.set_page_config(page_title="GPT-4V Quality Control", page_icon="📋") 98 | components.inc_sidebar_nav_height() 99 | st.write("# 📋 Quality Control") 100 | st.write("Generate a QC report for an image.") 101 | st.info( 102 | "This is a test of the OpenAI GPT-4V preview and is not intended for production use." 
103 | ) 104 | st.write("\n") 105 | 106 | run() 107 | 108 | components.toggle_balloons() 109 | show_code([submit, run, components]) 110 | -------------------------------------------------------------------------------- /pages/4_🗣️_Speech.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import base64 3 | import requests 4 | import json 5 | import components 6 | from utils import show_code 7 | 8 | 9 | def submit(image, api_key, voice, hd): 10 | headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"} 11 | 12 | base64_image = base64.b64encode(image).decode("utf-8") 13 | 14 | payload = { 15 | "model": "gpt-4-vision-preview", 16 | "messages": [ 17 | { 18 | "role": "system", 19 | "content": "You are trained to extract text from images.", 20 | }, 21 | { 22 | "role": "user", 23 | "content": [ 24 | { 25 | "type": "text", 26 | "text": "Extract all of the text visible in this image, ignoring extraneous " 27 | "pieces of text like titles and page numbers, and converting Roman numerals " 28 | "to Arabic numerals. 
Your response should contain only the headers and content.", 29 | }, 30 | { 31 | "type": "image_url", 32 | "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}, 33 | }, 34 | ], 35 | }, 36 | ], 37 | "max_tokens": 2048, 38 | } 39 | 40 | try: 41 | response = requests.post( 42 | "https://api.openai.com/v1/chat/completions", headers=headers, json=payload 43 | ) 44 | response.raise_for_status() 45 | 46 | text = response.json()["choices"][0]["message"]["content"] 47 | st.session_state.extracted_text = text 48 | 49 | tts_payload = { 50 | "model": "tts-1-hd" if hd else "tts-1", 51 | "voice": voice, 52 | "input": text, 53 | } 54 | 55 | tts_response = requests.post( 56 | "https://api.openai.com/v1/audio/speech", headers=headers, json=tts_payload 57 | ) 58 | tts_response.raise_for_status() 59 | 60 | st.audio(tts_response.content, format="audio/mpeg") 61 | 62 | st.download_button( 63 | label="📥 Save Audio", 64 | data=tts_response.content, 65 | file_name=f'audio_{tts_response.headers["x-request-id"]}.mp3', 66 | mime="audio/mpeg", 67 | ) 68 | 69 | if "balloons" in st.session_state and st.session_state.balloons: 70 | st.balloons() 71 | except requests.exceptions.HTTPError as err: 72 | st.toast(f":red[HTTP error: {err}]") 73 | except Exception as err: 74 | st.toast(f":red[Error: {err}]") 75 | 76 | 77 | def run(): 78 | selected_option = st.radio( 79 | "Image Input", 80 | ["Camera", "Image File"], 81 | horizontal=True, 82 | label_visibility="collapsed", 83 | ) 84 | 85 | if selected_option == "Camera": 86 | image = components.camera_uploader() 87 | else: 88 | image = components.image_uploader() 89 | 90 | api_key = components.api_key_with_warning() 91 | 92 | voice = st.selectbox( 93 | "AI Voice", 94 | ("echo", "alloy", "fable", "onyx", "nova", "shimmer"), 95 | ) 96 | 97 | hd = st.checkbox( 98 | "HD", 99 | value=True, 100 | ) 101 | 102 | components.submit_button(image, api_key, submit, voice, hd) 103 | 104 | if "extracted_text" in st.session_state: 105 | st.text_area( 106 | 
"Extracted Text", 107 | st.session_state.extracted_text, 108 | height=400, 109 | ) 110 | 111 | 112 | st.set_page_config(page_title="GPT-4V Speech", page_icon="🗣️") 113 | components.inc_sidebar_nav_height() 114 | st.write("# 🗣️ Speech") 115 | st.write("Generate audio from an image using GPT-4V + OpenAI TTS.") 116 | st.info( 117 | "This is a test of the OpenAI GPT-4V preview and is not intended for production use." 118 | ) 119 | st.write("\n") 120 | 121 | run() 122 | 123 | components.toggle_balloons() 124 | show_code([submit, run, components]) 125 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # GPT-4V Demos 2 | 3 | Python 3.8+ 4 | Streamlit App 5 | Open in GitHub Codespaces 6 | 7 | This mobile-friendly web app provides some basic demos to test the vision capabilities of GPT-4V. 8 | 9 | [Streamlit](https://streamlit.io) was selected as a framework for this project to enable rapid prototyping of new ideas. 10 | 11 | ## Examples 12 | 13 | 14 | 15 | 16 | 20 | 21 | 22 | 23 | 30 | 31 | 32 | 33 | 45 | 46 | 47 | 48 | 55 | 56 | 57 | 58 | 62 | 63 |
📷 Camera: Take a photo with your device's camera and generate a caption.
17 | Turkey 18 | An unexpected traveler struts confidently across the asphalt, its iridescent feathers gleaming in the sunlight. This wild turkey, with its distinctive fan of tail feathers on full display, appears unfazed by the nearby human presence. A dash of wilderness encounters suburbia as the bird navigates between nature and civilization. The scene is a gentle reminder of the ever-present connection between human and animal territories. 19 |
👕 Product Descriptions: Generate a product description for an image.
24 | Pre-generated Hoodie 25 | Additional Input
26 | { "product_attributes": { "brand_name": "Logical Root", "product_name": "Pre-generated Hoodie", "materials": "100% digital cotton" } }
27 | Output
28 | { "description": "Embrace the fusion of art and comfort with the Logical Root Pre-generated Hoodie, a masterpiece crafted from 100% digital cotton to provide unparalleled softness and durability. The hoodie comes in a classic, versatile shade of black, boasting a bold graphic print at its core that captures a whirlwind of vibrant colors in an abstract design, promising to turn heads and spark conversations. With a spacious front pocket to keep your essentials close and a snug hoodie with adjustable drawstrings for those extra chilly days, this piece is the epitome of functional fashion. The ribbed cuffs and hem ensure a perfect fit while adding to the overall sleek silhouette, making it a must-have addition to your wardrobe whether you're aiming for a casual day out or a statement-making ensemble." } 29 |
🧾 OCR: Extract the text from an image.
34 | Soup Can 35 | The text on the can reads:
36 | - Campbell's®
37 | - CONDENSED
38 | - 90 CALORIES PER 1/2 CUP
39 | - Tomato
40 | - SOUP
41 | - NET WT. 10 3/4 OZ. (305g)
42 | There is also text within a gold seal that reads:
43 | - "PARIS INTERNATIONAL EXPOSITION 1900"
44 |
📋 Quality Control: Generate a QC report for an image.
49 | Strawberries 50 | Additional Input
51 | "issue_critical": true if inedible, "issue_category": string, "issue_description": single-paragraph string
52 | Output
53 | { "issues": [ { "issue_critical": true, "issue_category": "Contamination", "issue_description": "There is visible mold on one of the strawberries in the bottom left corner of the image, indicating spoilage and potential health risk if consumed." }, { "issue_critical": false, "issue_category": "Physical Damage", "issue_description": "Several strawberries appear to have minor physical damage, such as dents and bruises, which may affect their shelf life and aesthetic appeal but are not necessarily a health hazard." } ] } 54 |
🗣️ Speech: Generate audio from an image using GPT-4V + OpenAI TTS.
59 | Alice 60 | Download audio | Play audio on CodePen 61 |
64 | 65 | ## Prerequisites 66 | 67 | - Python 3.8+ 68 | - OpenAI API key 69 | > [How can I access GPT-4?](https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4) 70 | 71 | ## Local setup 72 | 73 | Here's how you can get started. 74 | 75 | 1. Clone this repository. 76 | ``` 77 | git clone https://github.com/logicalroot/gpt-4v-demos.git 78 | cd gpt-4v-demos 79 | ``` 80 | 2. Install the necessary packages: 81 | ``` 82 | pip install streamlit 83 | ``` 84 | 3. Run the application: 85 | ``` 86 | streamlit run 🏠_Home.py 87 | ``` 88 | 4. To remove the missing secrets warning, create a blank `secrets.toml` file in your `.streamlit` folder. 89 | 90 | > [!TIP] 91 | > To avoid inputting your OpenAI API key every run, you can add it to `secrets.toml` with the following line. Paste your key between the double quotes. 92 | > ``` 93 | > OPENAI_API_KEY = "YOUR KEY" 94 | > ``` 95 | > For safety, ensure `secrets.toml` is in your `.gitignore` file. 96 | 97 | ## Limitations 98 | 99 | To use the camera input on iOS devices, Streamlit must be configured to use SSL. See [Streamlit docs](https://docs.streamlit.io/library/advanced-features/https-support). 100 | 101 | ## Contributing 102 | 103 | Feel free to experiment and share new demos using the code! 104 | 105 | ## About GPT-4V 106 | 107 | - [OpenAI announcement](https://openai.com/blog/new-models-and-developer-products-announced-at-devday) 108 | - [OpenAI research](https://openai.com/research/gpt-4v-system-card) 109 | - [OpenAI docs](https://platform.openai.com/docs/guides/vision) 110 | 111 | ## License 112 | 113 | This project is licensed under the terms of the MIT license. --------------------------------------------------------------------------------
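The Quality Control demo asks the model for JSON only, but because the `response_format` option is commented out as unsupported, the reply may still arrive wrapped in a Markdown fence or surrounding prose. `parsers.py` handles this by scanning for the first `{...}` span that parses. The sketch below illustrates that brace-scanning idea in a self-contained form. It is not the repository's function: the name `first_json_object`, the `None` fallback, and the sample `reply` are assumptions made for illustration (the repository's `extract_json` returns a formatted string and falls back to `"{}"`).

```python
import json
from json import JSONDecodeError


def first_json_object(text):
    """Return the first parsable {...} span in text as a dict, or None.

    Hypothetical helper for illustration; mirrors the scanning strategy
    used in parsers.py but returns a dict instead of a formatted string.
    """
    starts = [i for i, ch in enumerate(text) if ch == "{"]
    ends = [i for i, ch in enumerate(text) if ch == "}"]
    for start in starts:
        # Try the widest candidate first so nested objects stay intact.
        for end in reversed(ends):
            if start < end:
                try:
                    return json.loads(text[start : end + 1])
                except JSONDecodeError:
                    continue
    return None


# Made-up model reply: prose plus a fenced JSON report.
reply = 'Here is the report:\n```json\n{"issues": [{"issue_critical": true}]}\n```'
report = first_json_object(reply)
print(report)

# Input without any braces yields the fallback value.
print(first_json_object("no JSON here"))
```

Trying the widest span first keeps nested objects such as the `issues` array together. The cost is a quadratic number of parse attempts in the worst case, which is acceptable for short model replies but would be slow on large inputs.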