├── .streamlit
    └── config.toml
├── .gitignore
├── test_images
    ├── alice.jpg
    ├── soup_can.jpg
    ├── turkey.jpg
    ├── strawberries.jpg
    └── pre-generated_hoodie.png
├── test_results
    └── alice.mp3
├── __init__.py
├── parsers.py
├── LICENSE
├── utils.py
├── .devcontainer
    └── devcontainer.json
├── 🏠_Home.py
├── components.py
├── pages
    ├── 2_🧾_OCR.py
    ├── 0_📷_Camera.py
    ├── 1_👕_Product_Descriptions.py
    ├── 3_📋_Quality_Control.py
    └── 4_🗣️_Speech.py
└── README.md


/.streamlit/config.toml:
--------------------------------------------------------------------------------
1 | [theme]
2 | base = "light"


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | secrets.toml
2 | old_pages
3 | **/__pycache__


--------------------------------------------------------------------------------
/test_images/alice.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_images/alice.jpg


--------------------------------------------------------------------------------
/test_images/soup_can.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_images/soup_can.jpg


--------------------------------------------------------------------------------
/test_images/turkey.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_images/turkey.jpg


--------------------------------------------------------------------------------
/test_results/alice.mp3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_results/alice.mp3


--------------------------------------------------------------------------------
/test_images/strawberries.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_images/strawberries.jpg


--------------------------------------------------------------------------------
/test_images/pre-generated_hoodie.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/logicalroot/gpt-4v-demos/HEAD/test_images/pre-generated_hoodie.png


--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Streamlit Inc. (2018-2022) Snowflake Inc. (2022)
 2 | #
 3 | # Licensed under the Apache License, Version 2.0 (the "License");
 4 | # you may not use this file except in compliance with the License.
 5 | # You may obtain a copy of the License at
 6 | #
 7 | #     http://www.apache.org/licenses/LICENSE-2.0
 8 | #
 9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | 


--------------------------------------------------------------------------------
/parsers.py:
--------------------------------------------------------------------------------
 1 | import json
 2 | from json import JSONDecodeError
 3 | 
 4 | 
 5 | def extract_json(string):
 6 |     """
 7 |     This function extracts the first valid JSON object from a given string.
 8 | 
 9 |     Parameters:
10 |     string (str): The string from which to extract the JSON object.
11 | 
12 |     Returns:
13 |     str: The first valid JSON object found in the string, re-serialized
14 |         with 4-space indentation and non-ASCII characters preserved.
15 |         If no valid JSON object is found, the string "{}" is returned;
16 |         no exception is raised.
17 |     """
18 |     start_positions = [pos for pos, char in enumerate(string) if char == "{"]
19 |     end_positions = [pos for pos, char in enumerate(string) if char == "}"]
20 | 
21 |     for start in start_positions:
22 |         for end in reversed(end_positions):
23 |             if start < end:
24 |                 try:
25 |                     obj = json.loads(string[start : end + 1])
26 |                     return json.dumps(obj, indent=4, ensure_ascii=False)
27 |                 except JSONDecodeError:
28 |                     continue
29 | 
30 |     return "{}"
31 | 
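
A minimal usage sketch for `extract_json`, assuming a model reply that wraps the JSON in prose. The function body is repeated here only so the snippet runs on its own; the `reply` string is a made-up example, not real model output.

```python
import json
from json import JSONDecodeError


def extract_json(string):
    # Same logic as parsers.extract_json, repeated so this sketch runs standalone.
    starts = [i for i, ch in enumerate(string) if ch == "{"]
    ends = [i for i, ch in enumerate(string) if ch == "}"]
    for start in starts:
        for end in reversed(ends):
            if start < end:
                try:
                    obj = json.loads(string[start : end + 1])
                    return json.dumps(obj, indent=4, ensure_ascii=False)
                except JSONDecodeError:
                    continue
    return "{}"


# Hypothetical model reply: prose wrapped around the JSON payload.
reply = 'Here is the report: {"issues": [{"issue_critical": true}]} Hope that helps!'
report = extract_json(reply)

# The result is a string, so parse it again before reading fields.
parsed = json.loads(report)

# With no braces at all, the function falls back to an empty object.
fallback = extract_json("no braces here")
```

Because the return value is always a string, callers that need the data itself have to call `json.loads` on it, as the sketch does.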


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2023 Adam B. Strock
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.


--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) Streamlit Inc. (2018-2022) Snowflake Inc. (2022)
 2 | #
 3 | # Licensed under the Apache License, Version 2.0 (the "License");
 4 | # you may not use this file except in compliance with the License.
 5 | # You may obtain a copy of the License at
 6 | #
 7 | #     http://www.apache.org/licenses/LICENSE-2.0
 8 | #
 9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | 
15 | import inspect
16 | import textwrap
17 | 
18 | import streamlit as st
19 | 
20 | 
21 | def show_code(code):
22 |     """Render the source of each object in `code` when the sidebar box is checked."""
23 |     show = st.sidebar.checkbox("Show code", False)
24 |     if show:
25 |         st.markdown("## Code")
26 |         for function in code:
27 |             # Fetch the object's source lines and display them dedented.
28 |             sourcelines, _ = inspect.getsourcelines(function)
29 |             st.code(textwrap.dedent("".join(sourcelines)))
30 | 


--------------------------------------------------------------------------------
/.devcontainer/devcontainer.json:
--------------------------------------------------------------------------------
 1 | {
 2 |   "name": "Python 3",
 3 |   // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
 4 |   "image": "mcr.microsoft.com/devcontainers/python:1-3.11-bullseye",
 5 |   "customizations": {
 6 |     "codespaces": {
 7 |       "openFiles": [
 8 |         "README.md",
 9 |         "🏠_Home.py"
10 |       ]
11 |     },
12 |     "vscode": {
13 |       "settings": {},
14 |       "extensions": [
15 |         "ms-python.python",
16 |         "ms-python.vscode-pylance"
17 |       ]
18 |     }
19 |   },
20 |   "updateContentCommand": "[ -f packages.txt ] && sudo apt update && sudo apt upgrade -y && sudo xargs apt install -y <packages.txt; [ -f requirements.txt ] && pip3 install --user -r requirements.txt; pip3 install --user streamlit; echo '✅ Packages installed and Requirements met'",
21 |   "postAttachCommand": {
22 |     "server": "streamlit run 🏠_Home.py --server.enableCORS false --server.enableXsrfProtection false"
23 |   },
24 |   "portsAttributes": {
25 |     "8501": {
26 |       "label": "Application",
27 |       "onAutoForward": "openPreview"
28 |     }
29 |   },
30 |   "forwardPorts": [
31 |     8501
32 |   ]
33 | }
34 | 


--------------------------------------------------------------------------------
/🏠_Home.py:
--------------------------------------------------------------------------------
 1 | import streamlit as st
 2 | 
 3 | 
 4 | def run():
 5 |     st.set_page_config(
 6 |         page_title="GPT-4V Demos",
 7 |         page_icon="🤖",
 8 |         initial_sidebar_state="expanded",
 9 |     )
10 | 
11 |     try:
12 |         # Import OpenAI API key if it exists in project secrets
13 |         st.session_state.api_key = st.secrets.OPENAI_API_KEY
14 |     except Exception:
15 |         # Otherwise, prompt for OpenAI API key
16 |         st.session_state.api_key = st.sidebar.text_input(
17 |             "OpenAI API key",
18 |             st.session_state.api_key if "api_key" in st.session_state else "",
19 |             type="password",
20 |         )
21 | 
22 |     if st.session_state.api_key == "":
23 |         st.sidebar.caption(":red[An OpenAI API key is required to run the tests.]")
24 | 
25 |     st.write("# GPT-4V Demos")
26 |     st.write("\n")
27 |     st.info(
28 |         "This mobile-friendly web app provides some basic demos to test the vision capabilities of GPT-4V."
29 |     )
30 |     st.info("Open them from the sidebar!", icon="↖️")
31 |     st.caption(
32 |         """This project is licensed under the terms of the MIT license.
33 |         [View the source code](https://github.com/logicalroot/gpt-4v-demos)."""
34 |     )
35 |     st.write("\n")
36 |     st.markdown(
37 |         """
38 |         ### About GPT-4V\n
39 |         [OpenAI announcement](https://openai.com/blog/new-models-and-developer-products-announced-at-devday)\n
40 |         [OpenAI research](https://openai.com/research/gpt-4v-system-card)\n
41 |         [OpenAI docs](https://platform.openai.com/docs/guides/vision)\n
42 |         """
43 |     )
44 | 
45 | 
46 | if __name__ == "__main__":
47 |     run()
48 | 


--------------------------------------------------------------------------------
/components.py:
--------------------------------------------------------------------------------
 1 | import streamlit as st
 2 | 
 3 | 
 4 | def api_key_with_warning():
 5 |     api_key = (
 6 |         st.session_state.api_key
 7 |         if "api_key" in st.session_state and st.session_state.api_key != ""
 8 |         else None
 9 |     )
10 | 
11 |     if api_key is None:
12 |         st.warning(
13 |             "Input your OpenAI API key in the sidebar on the Home page.", icon="⚠️"
14 |         )
15 | 
16 |     return api_key
17 | 
18 | 
19 | def uploader(file, download=False):
20 |     bytes_data = None
21 | 
22 |     if file is not None:
23 |         bytes_data = file.getvalue()
24 |         st.image(bytes_data, caption=file.name, width=200)
25 |         if download:
26 |             st.download_button(
27 |                 label="📥 Save File",
28 |                 data=bytes_data,
29 |                 file_name=file.name,
30 |                 mime=file.type,
31 |             )
32 | 
33 |     return bytes_data
34 | 
35 | 
36 | def image_uploader():
37 |     return uploader(st.file_uploader("Image file:", label_visibility="collapsed"))
38 | 
39 | 
40 | def camera_uploader():
41 |     return uploader(st.camera_input("Take a photo", label_visibility="collapsed"), True)
42 | 
43 | 
44 | def submit_button(image, api_key, callback, *optional_parameters):
45 |     button = st.button(
46 |         "Submit",
47 |         disabled=image is None or api_key is None,
48 |         key="submit",
49 |         type="primary",
50 |     )
51 | 
52 |     if button:
53 |         with st.spinner("Submitting..."):
54 |             if optional_parameters:
55 |                 callback(image, api_key, *optional_parameters)
56 |             else:
57 |                 callback(image, api_key)
58 | 
59 | 
60 | def inc_sidebar_nav_height():
61 |     st.markdown(
62 |         """<style>
63 |         [data-testid="stSidebarNavItems"] {
64 |             max-height: 60vh;
65 |         }
66 |         </style>""",
67 |         unsafe_allow_html=True,
68 |     )
69 | 
70 | 
71 | def toggle_balloons():
72 |     st.session_state.balloons = st.sidebar.checkbox("Show balloons", True)
73 | 


--------------------------------------------------------------------------------
/pages/2_🧾_OCR.py:
--------------------------------------------------------------------------------
 1 | import streamlit as st
 2 | import base64
 3 | import requests
 4 | import json
 5 | import components
 6 | from utils import show_code
 7 | 
 8 | 
 9 | def submit(image, api_key):
10 |     headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
11 | 
12 |     base64_image = base64.b64encode(image).decode("utf-8")
13 | 
14 |     payload = {
15 |         "model": "gpt-4-vision-preview",
16 |         "messages": [
17 |             {
18 |                 "role": "system",
19 |                 "content": "You are trained to extract text from images.",
20 |             },
21 |             {
22 |                 "role": "user",
23 |                 "content": [
24 |                     {
25 |                         "type": "text",
26 |                         "text": "Extract all of the text visible in this image.",
27 |                     },
28 |                     {
29 |                         "type": "image_url",
30 |                         "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
31 |                     },
32 |                 ],
33 |             },
34 |         ],
35 |         "max_tokens": 1024,
36 |     }
37 | 
38 |     try:
39 |         response = requests.post(
40 |             "https://api.openai.com/v1/chat/completions", headers=headers, json=payload
41 |         )
42 |         response.raise_for_status()
43 | 
44 |         text = response.json()["choices"][0]["message"]["content"]
45 |         st.session_state.ocr_text = text
46 | 
47 |         if "balloons" in st.session_state and st.session_state.balloons:
48 |             st.balloons()
49 |     except requests.exceptions.HTTPError as err:
50 |         st.toast(f":red[HTTP error: {err}]")
51 |     except Exception as err:
52 |         st.toast(f":red[Error: {err}]")
53 | 
54 | 
55 | def run():
56 |     image = components.image_uploader()
57 | 
58 |     api_key = components.api_key_with_warning()
59 | 
60 |     components.submit_button(image, api_key, submit)
61 | 
62 |     if "ocr_text" in st.session_state:
63 |         st.text_area(
64 |             "Extracted Text",
65 |             st.session_state.ocr_text,
66 |             height=400,
67 |         )
68 | 
69 | 
70 | st.set_page_config(page_title="GPT-4V OCR", page_icon="🧾")
71 | components.inc_sidebar_nav_height()
72 | st.write("# 🧾 OCR")
73 | st.write("Extract the text from an image.")
74 | st.info(
75 |     "This is a test of the OpenAI GPT-4V preview and is not intended for production use."
76 | )
77 | st.write("\n")
78 | 
79 | run()
80 | 
81 | components.toggle_balloons()
82 | show_code([submit, run, components])
83 | 
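
The pages label every upload as `image/jpeg` in the data URL, although the uploader accepts any file and `test_images` includes a PNG. The sketch below shows one way to pick the label from the file's leading bytes instead. `sniff_image_mime` and `to_data_url` are hypothetical helpers, not part of this repo, and whether the API tolerates a mismatched label is not something the source states.

```python
import base64


def sniff_image_mime(image_bytes):
    """Guess a MIME type from the leading magic bytes (hypothetical helper)."""
    if image_bytes.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if image_bytes.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if image_bytes[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if image_bytes[:4] == b"RIFF" and image_bytes[8:12] == b"WEBP":
        return "image/webp"
    # Fall back to the type the pages already assume.
    return "image/jpeg"


def to_data_url(image_bytes):
    """Build the data URL used in the `image_url` content part."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{sniff_image_mime(image_bytes)};base64,{encoded}"


# A PNG signature followed by padding stands in for a real upload.
png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 4
url = to_data_url(png_header)
```

In `submit`, the result of `to_data_url(image)` could replace the hardcoded f-string passed as `"url"`.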


--------------------------------------------------------------------------------
/pages/0_📷_Camera.py:
--------------------------------------------------------------------------------
 1 | import streamlit as st
 2 | import base64
 3 | import requests
 4 | import components
 5 | from utils import show_code
 6 | 
 7 | 
 8 | def submit(image, api_key):
 9 |     headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
10 | 
11 |     base64_image = base64.b64encode(image).decode("utf-8")
12 | 
13 |     payload = {
14 |         "model": "gpt-4-vision-preview",
15 |         "messages": [
16 |             {
17 |                 "role": "system",
18 |                 "content": "You are a friendly assistant.",
19 |             },
20 |             {
21 |                 "role": "user",
22 |                 "content": [
23 |                     {
24 |                         "type": "text",
25 |                         "text": "Write your best four-sentence caption for this image, highlighting the most "
26 |                         "interesting aspects without making assumptions.",
27 |                     },
28 |                     {
29 |                         "type": "image_url",
30 |                         "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
31 |                     },
32 |                 ],
33 |             },
34 |         ],
35 |         "max_tokens": 300,
36 |     }
37 | 
38 |     try:
39 |         response = requests.post(
40 |             "https://api.openai.com/v1/chat/completions", headers=headers, json=payload
41 |         )
42 |         response.raise_for_status()
43 | 
44 |         camera_caption = response.json()["choices"][0]["message"]["content"]
45 |         st.session_state.camera_caption = camera_caption
46 | 
47 |         if "balloons" in st.session_state and st.session_state.balloons:
48 |             st.balloons()
49 |     except requests.exceptions.HTTPError as err:
50 |         st.toast(f":red[HTTP error: {err}]")
51 |     except Exception as err:
52 |         st.toast(f":red[Error: {err}]")
53 | 
54 | 
55 | def run():
56 |     selected_option = st.radio(
57 |         "Image Input",
58 |         ["Camera", "Image File"],
59 |         horizontal=True,
60 |         label_visibility="collapsed",
61 |     )
62 | 
63 |     if selected_option == "Camera":
64 |         image = components.camera_uploader()
65 |     else:
66 |         image = components.image_uploader()
67 | 
68 |     api_key = components.api_key_with_warning()
69 | 
70 |     components.submit_button(image, api_key, submit)
71 | 
72 |     if "camera_caption" in st.session_state:
73 |         st.text_area(
74 |             "Caption",
75 |             st.session_state.camera_caption,
76 |             height=300,
77 |         )
78 | 
79 | 
80 | st.set_page_config(page_title="GPT-4V Camera", page_icon="📷")
81 | components.inc_sidebar_nav_height()
82 | st.write("# 📷 Camera")
83 | st.write("Take a photo with your device's camera and generate a caption.")
84 | st.info(
85 |     "This is a test of the OpenAI GPT-4V preview and is not intended for production use."
86 | )
87 | 
88 | run()
89 | 
90 | components.toggle_balloons()
91 | show_code([submit, run, components])
92 | 


--------------------------------------------------------------------------------
/pages/1_👕_Product_Descriptions.py:
--------------------------------------------------------------------------------
  1 | import streamlit as st
  2 | import base64
  3 | import requests
  4 | import json
  5 | import components
  6 | from utils import show_code
  7 | 
  8 | 
  9 | def submit(image, api_key, product):
 10 |     headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
 11 |     base64_image = base64.b64encode(image).decode("utf-8")
 12 | 
 13 |     payload = {
 14 |         "model": "gpt-4-vision-preview",
 15 |         "messages": [
 16 |             {
 17 |                 "role": "system",
 18 |                 "content": "You are an expert copywriter for leading brands.",
 19 |             },
 20 |             {
 21 |                 "role": "user",
 22 |                 "content": [
 23 |                     {
 24 |                         "type": "text",
 25 |                         "text": "Write your best single-paragraph product description for this image. "
 26 |                         "You are encouraged to incorporate the product attributes provided below. "
 27 |                         "Do not infer sizing, product name, product brand, or specific materials unless "
 28 |                         "provided in the product attributes. Make sure to write about the colors and "
 29 |                         "other visible features of the product.\n\n"
 30 |                         f"{product}",
 31 |                     },
 32 |                     {
 33 |                         "type": "image_url",
 34 |                         "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
 35 |                     },
 36 |                 ],
 37 |             },
 38 |         ],
 39 |         "max_tokens": 300,
 40 |     }
 41 | 
 42 |     try:
 43 |         response = requests.post(
 44 |             "https://api.openai.com/v1/chat/completions", headers=headers, json=payload
 45 |         )
 46 |         response.raise_for_status()
 47 | 
 48 |         description = response.json()["choices"][0]["message"]["content"]
 49 |         product = json.loads(product)
 50 |         description = (
 51 |             description.replace(" '", " ")
 52 |             .replace("' ", " ")
 53 |             .replace(' "', " ")
 54 |             .replace('" ', " ")
 55 |         )
 56 |         product["product_attributes"]["description"] = description
 57 |         st.session_state.product = json.dumps(product, indent=4, ensure_ascii=False)
 58 | 
 59 |         if "balloons" in st.session_state and st.session_state.balloons:
 60 |             st.balloons()
 61 |     except requests.exceptions.HTTPError as err:
 62 |         st.toast(f":red[HTTP error: {err}]")
 63 |     except Exception as err:
 64 |         st.toast(f":red[Error: {err}]")
 65 | 
 66 | 
 67 | def run():
 68 |     image = components.image_uploader()
 69 | 
 70 |     product = st.text_area(
 71 |         "Product Attributes",
 72 |         value=json.dumps(
 73 |             {
 74 |                 "product_attributes": {
 75 |                     "brand_name": "",
 76 |                     "product_name": "",
 77 |                     "materials": "",
 78 |                 }
 79 |             },
 80 |             indent=4,
 81 |         ),
 82 |         height=200,
 83 |     )
 84 | 
 85 |     st.caption("Attributes are optional. Feel free to try your own!")
 86 | 
 87 |     api_key = components.api_key_with_warning()
 88 | 
 89 |     components.submit_button(image, api_key, submit, product)
 90 | 
 91 |     if "product" in st.session_state:
 92 |         st.text_area(
 93 |             "Product Attributes With Description",
 94 |             st.session_state.product,
 95 |             height=400,
 96 |         )
 97 | 
 98 | 
 99 | st.set_page_config(page_title="GPT-4V Product Descriptions", page_icon="👕")
100 | components.inc_sidebar_nav_height()
101 | st.write("# 👕 Product Descriptions")
102 | st.write("Generate a product description for an image.")
103 | st.info(
104 |     "This is a test of the OpenAI GPT-4V preview and is not intended for production use."
105 | )
106 | st.write("\n")
107 | 
108 | run()
109 | 
110 | components.toggle_balloons()
111 | show_code([submit, run, components])
112 | 


--------------------------------------------------------------------------------
/pages/3_📋_Quality_Control.py:
--------------------------------------------------------------------------------
  1 | import streamlit as st
  2 | import base64
  3 | import requests
  4 | import json
  5 | import components
  6 | from utils import show_code
  7 | from parsers import extract_json
  8 | 
  9 | 
 10 | def submit(image, api_key, issue_attributes):
 11 |     headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
 12 | 
 13 |     base64_image = base64.b64encode(image).decode("utf-8")
 14 | 
 15 |     payload = {
 16 |         "model": "gpt-4-vision-preview",
 17 |         "messages": [
 18 |             {
 19 |                 "role": "system",
 20 |                 "content": "You are an expert quality control inspector for leading manufacturers.",
 21 |             },
 22 |             {
 23 |                 "role": "user",
 24 |                 "content": [
 25 |                     {
 26 |                         "type": "text",
 27 |                         "text": (
 28 |                             "Inspect this image and write a report in the following format:\n\n"
 29 |                             "```json\n"
 30 |                             "{\n"
 31 |                             '  "issues": [\n'
 32 |                             "    {\n"
 33 |                             f"{issue_attributes}\n"
 34 |                             "    }\n"
 35 |                             "  ]\n"
 36 |                             "}\n"
 37 |                             "```\n\n"
 38 |                             "If you see any signs of quality deterioration of any kind, such as corrosion, "
 39 |                             "physical damage, decay, or contamination, add them as separate issues in the "
 40 |                             "`issues` array. If there are no issues, the `issues` array should be empty. "
 41 |                             "Your response should contain only valid JSON."
 42 |                         ),
 43 |                     },
 44 |                     {
 45 |                         "type": "image_url",
 46 |                         "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
 47 |                     },
 48 |                 ],
 49 |             },
 50 |         ],
 51 |         "max_tokens": 1024,
 52 |         "temperature": 0.1,
 53 |         # Response format not yet supported by GPT-4V
 54 |         # "response_format": {"type": "json_object"},
 55 |     }
 56 | 
 57 |     try:
 58 |         response = requests.post(
 59 |             "https://api.openai.com/v1/chat/completions", headers=headers, json=payload
 60 |         )
 61 |         response.raise_for_status()
 62 | 
 63 |         text = extract_json(response.json()["choices"][0]["message"]["content"])
 64 |         st.session_state.response_text = text
 65 | 
 66 |         if "balloons" in st.session_state and st.session_state.balloons:
 67 |             st.balloons()
 68 |     except requests.exceptions.HTTPError as err:
 69 |         st.toast(f":red[HTTP error: {err}]")
 70 |     except Exception as err:
 71 |         st.toast(f":red[Error: {err}]")
 72 | 
 73 | 
 74 | def run():
 75 |     image = components.image_uploader()
 76 | 
 77 |     issue_attributes = st.text_area(
 78 |         "Quality Issue Attributes",
 79 |         value='"issue_critical": true if inedible,\n'
 80 |         '"issue_category": string,\n'
 81 |         '"issue_description": single-paragraph string',
 82 |         height=120,
 83 |     )
 84 | 
 85 |     api_key = components.api_key_with_warning()
 86 | 
 87 |     components.submit_button(image, api_key, submit, issue_attributes)
 88 | 
 89 |     if "response_text" in st.session_state:
 90 |         st.text_area(
 91 |             "QC Report",
 92 |             st.session_state.response_text,
 93 |             height=400,
 94 |         )
 95 | 
 96 | 
 97 | st.set_page_config(page_title="GPT-4V Quality Control", page_icon="📋")
 98 | components.inc_sidebar_nav_height()
 99 | st.write("# 📋 Quality Control")
100 | st.write("Generate a QC report for an image.")
101 | st.info(
102 |     "This is a test of the OpenAI GPT-4V preview and is not intended for production use."
103 | )
104 | st.write("\n")
105 | 
106 | run()
107 | 
108 | components.toggle_balloons()
109 | show_code([submit, run, components])
110 | 


--------------------------------------------------------------------------------
/pages/4_🗣️_Speech.py:
--------------------------------------------------------------------------------
  1 | import streamlit as st
  2 | import base64
  3 | import requests
  4 | import json
  5 | import components
  6 | from utils import show_code
  7 | 
  8 | 
  9 | def submit(image, api_key, voice, hd):
 10 |     headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
 11 | 
 12 |     base64_image = base64.b64encode(image).decode("utf-8")
 13 | 
 14 |     payload = {
 15 |         "model": "gpt-4-vision-preview",
 16 |         "messages": [
 17 |             {
 18 |                 "role": "system",
 19 |                 "content": "You are trained to extract text from images.",
 20 |             },
 21 |             {
 22 |                 "role": "user",
 23 |                 "content": [
 24 |                     {
 25 |                         "type": "text",
 26 |                         "text": "Extract all of the text visible in this image, ignoring extraneous "
 27 |                         "pieces of text like titles and page numbers, and converting Roman numerals "
 28 |                         "to Arabic numerals. Your response should contain only the headers and content.",
 29 |                     },
 30 |                     {
 31 |                         "type": "image_url",
 32 |                         "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
 33 |                     },
 34 |                 ],
 35 |             },
 36 |         ],
 37 |         "max_tokens": 2048,
 38 |     }
 39 | 
 40 |     try:
 41 |         response = requests.post(
 42 |             "https://api.openai.com/v1/chat/completions", headers=headers, json=payload
 43 |         )
 44 |         response.raise_for_status()
 45 | 
 46 |         text = response.json()["choices"][0]["message"]["content"]
 47 |         st.session_state.extracted_text = text
 48 | 
 49 |         tts_payload = {
 50 |             "model": "tts-1-hd" if hd else "tts-1",
 51 |             "voice": voice,
 52 |             "input": text,
 53 |         }
 54 | 
 55 |         tts_response = requests.post(
 56 |             "https://api.openai.com/v1/audio/speech", headers=headers, json=tts_payload
 57 |         )
 58 |         tts_response.raise_for_status()
 59 | 
 60 |         st.audio(tts_response.content, format="audio/mpeg")
 61 | 
 62 |         st.download_button(
 63 |             label="📥 Save Audio",
 64 |             data=tts_response.content,
 65 |             file_name=f'audio_{tts_response.headers["x-request-id"]}.mp3',
 66 |             mime="audio/mpeg",
 67 |         )
 68 | 
 69 |         if "balloons" in st.session_state and st.session_state.balloons:
 70 |             st.balloons()
 71 |     except requests.exceptions.HTTPError as err:
 72 |         st.toast(f":red[HTTP error: {err}]")
 73 |     except Exception as err:
 74 |         st.toast(f":red[Error: {err}]")
 75 | 
 76 | 
 77 | def run():
 78 |     selected_option = st.radio(
 79 |         "Image Input",
 80 |         ["Camera", "Image File"],
 81 |         horizontal=True,
 82 |         label_visibility="collapsed",
 83 |     )
 84 | 
 85 |     if selected_option == "Camera":
 86 |         image = components.camera_uploader()
 87 |     else:
 88 |         image = components.image_uploader()
 89 | 
 90 |     api_key = components.api_key_with_warning()
 91 | 
 92 |     voice = st.selectbox(
 93 |         "AI Voice",
 94 |         ("echo", "alloy", "fable", "onyx", "nova", "shimmer"),
 95 |     )
 96 | 
 97 |     hd = st.checkbox(
 98 |         "HD",
 99 |         value=True,
100 |     )
101 | 
102 |     components.submit_button(image, api_key, submit, voice, hd)
103 | 
104 |     if "extracted_text" in st.session_state:
105 |         st.text_area(
106 |             "Extracted Text",
107 |             st.session_state.extracted_text,
108 |             height=400,
109 |         )
110 | 
111 | 
112 | st.set_page_config(page_title="GPT-4V Speech", page_icon="🗣️")
113 | components.inc_sidebar_nav_height()
114 | st.write("# 🗣️ Speech")
115 | st.write("Generate audio from an image using GPT-4V + OpenAI TTS.")
116 | st.info(
117 |     "This is a test of the OpenAI GPT-4V preview and is not intended for production use."
118 | )
119 | st.write("\n")
120 | 
121 | run()
122 | 
123 | components.toggle_balloons()
124 | show_code([submit, run, components])
125 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # GPT-4V Demos
  2 | 
  3 | <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/Python%20-3.8%2B-orange" alt="Python 3.8+" height="20"></a>
  4 | <a href="https://gpt-4v-test.streamlit.app/"><img src="https://static.streamlit.io/badges/streamlit_badge_black_white.svg" alt="Streamlit App" height="20"></a>
  5 | <a href="https://codespaces.new/logicalroot/gpt-4v-demos?quickstart=1"><img src="https://github.com/codespaces/badge.svg" alt="Open in GitHub Codespaces" height="20"></a>
  6 | 
  7 | This mobile-friendly web app provides some basic demos to test the vision capabilities of GPT-4V.
  8 | 
  9 | [Streamlit](https://streamlit.io) was selected as a framework for this project to enable rapid prototyping of new ideas.
 10 | 
 11 | ## Examples
 12 | 
 13 | <table>
 14 |   <tr><td>📷 Camera</td><td>Take a photo with your device's camera and generate a caption.</td></tr>
 15 |   <tr>
 16 |     <td colspan="2">
 17 |       <img align="left" src="test_images/turkey.jpg" alt="Turkey" width="34%" />
 18 |       <code>An unexpected traveler struts confidently across the asphalt, its iridescent feathers gleaming in the sunlight. This wild turkey, with its distinctive fan of tail feathers on full display, appears unfazed by the nearby human presence. A dash of wilderness encounters suburbia as the bird navigates between nature and civilization. The scene is a gentle reminder of the ever-present connection between human and animal territories.</code>
 19 |     </td>
 20 |   </tr>
 21 |   <tr><td>👕 Product Descriptions</td><td>Generate a product description for an image.</td></tr>
 22 |   <tr>
 23 |     <td colspan="2">
 24 |       <img align="left" src="test_images/pre-generated_hoodie.png" alt="Pre-generated Hoodie" width="34%" />
 25 |       Additional Input<br />
 26 |       <code>{ "product_attributes": { "brand_name": "Logical Root", "product_name": "Pre-generated Hoodie", "materials": "100% digital cotton" } }</code><br />
 27 |       Output<br />
 28 |       <code>{ "description": "Embrace the fusion of art and comfort with the Logical Root Pre-generated Hoodie, a masterpiece crafted from 100% digital cotton to provide unparalleled softness and durability. The hoodie comes in a classic, versatile shade of black, boasting a bold graphic print at its core that captures a whirlwind of vibrant colors in an abstract design, promising to turn heads and spark conversations. With a spacious front pocket to keep your essentials close and a snug hoodie with adjustable drawstrings for those extra chilly days, this piece is the epitome of functional fashion. The ribbed cuffs and hem ensure a perfect fit while adding to the overall sleek silhouette, making it a must-have addition to your wardrobe whether you're aiming for a casual day out or a statement-making ensemble." }</code>
 29 |     </td>
 30 |   </tr>
 31 |   <tr><td>🧾 OCR</td><td>Extract the text from an image.</td></tr>
 32 |   <tr>
 33 |     <td colspan="2">
 34 |       <img align="left" src="test_images/soup_can.jpg" alt="Soup Can" width="34%" />
 35 |       <code>The text on the can reads:</code><br />
 36 |       <code>- Campbell's®</code><br />
 37 |       <code>- CONDENSED</code><br />
 38 |       <code>- 90 CALORIES PER 1/2 CUP</code><br />
 39 |       <code>- Tomato</code><br />
 40 |       <code>- SOUP</code><br />
 41 |       <code>- NET WT. 10 3/4 OZ. (305g)</code><br />
 42 |       <code>There is also text within a gold seal that reads:</code><br />
 43 |       <code>- "PARIS INTERNATIONAL EXPOSITION 1900"</code><br />
 44 |     </td>
 45 |   </tr>
 46 |   <tr><td>📋 Quality Control</td><td>Generate a QC report for an image.</td></tr>
 47 |   <tr>
 48 |     <td colspan="2">
 49 |       <img align="left" src="test_images/strawberries.jpg" alt="Strawberries" width="34%" />
 50 |       Additional Input<br />
 51 |       <code>"issue_critical": true if inedible, "issue_category": string, "issue_description": single-paragraph string</code><br />
 52 |       Output<br />
 53 |       <code>{ "issues": [ { "issue_critical": true, "issue_category": "Contamination", "issue_description": "There is visible mold on one of the strawberries in the bottom left corner of the image, indicating spoilage and potential health risk if consumed." }, { "issue_critical": false, "issue_category": "Physical Damage", "issue_description": "Several strawberries appear to have minor physical damage, such as dents and bruises, which may affect their shelf life and aesthetic appeal but are not necessarily a health hazard." } ] }</code>
 54 |     </td>
 55 |   </tr>
 56 |   <tr><td>🗣️ Speech</td><td>Generate audio from an image using GPT-4V + OpenAI TTS.</td></tr>
 57 |   <tr>
 58 |     <td colspan="2">
 59 |       <img align="left" src="test_images/alice.jpg" alt="Alice" width="34%" />
 60 |       <a href="https://github.com/logicalroot/gpt-4v-demos/raw/main/test_results/alice.mp3">Download audio</a> | <a href="https://codepen.io/logicalroot/full/gOqQzKE">Play audio on CodePen</a>
 61 |     </td>
 62 |   </tr>
 63 | </table>
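
The Quality Control output above is plain JSON, so downstream code can act on it directly. The following is a minimal sketch, assuming the model returns valid JSON with the requested fields. The `critical_issues` helper is hypothetical and not part of this repo, and the descriptions are abbreviated from the example output.

```python
import json
from typing import Dict, List

# Abbreviated copy of the Quality Control example output shown above.
qc_output = """
{ "issues": [
  { "issue_critical": true, "issue_category": "Contamination",
    "issue_description": "There is visible mold on one of the strawberries..." },
  { "issue_critical": false, "issue_category": "Physical Damage",
    "issue_description": "Several strawberries appear to have minor physical damage..." }
] }
"""


def critical_issues(report_json: str) -> List[Dict]:
    """Return only the issues flagged as critical (hypothetical helper)."""
    report = json.loads(report_json)
    return [issue for issue in report.get("issues", []) if issue.get("issue_critical")]


for issue in critical_issues(qc_output):
    print(f'{issue["issue_category"]}: {issue["issue_description"]}')
```

Because the field names come from the "Additional Input" you supply, a different prompt would need a matching filter.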
 64 | 
 65 | ## Prerequisites
 66 | 
 67 | - Python 3.8+
 68 | - OpenAI API key
 69 | > [How can I access GPT-4?](https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4)
 70 | 
 71 | ## Local setup
 72 | 
 73 | Here's how you can get started.
 74 | 
 75 | 1. Clone this repository:
 76 | ```
 77 | git clone https://github.com/logicalroot/gpt-4v-demos.git
 78 | cd gpt-4v-demos
 79 | ```
 80 | 2. Install the necessary packages:
 81 | ```
 82 | pip install streamlit
 83 | ```
 84 | 3. Run the application:
 85 | ```
 86 | streamlit run 🏠_Home.py
 87 | ```
 88 | 4. To remove the missing secrets warning, create a blank `secrets.toml` file in your `.streamlit` folder.
 89 | 
 90 | > [!TIP]
 91 | > To avoid entering your OpenAI API key on every run, add it to `secrets.toml` with the following line, pasting your key between the double quotes.
 92 | > ```
 93 | > OPENAI_API_KEY = "YOUR KEY"
 94 | > ```
 95 | > For safety, ensure `secrets.toml` is in your `.gitignore` file.
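
As a rough sketch of one possible lookup order, assuming a key typed into the UI should take precedence over the one stored in `secrets.toml`: the `resolve_api_key` helper below is hypothetical and not taken from this repo. In a Streamlit app, the mapping passed in would typically be `st.secrets`.

```python
from typing import Mapping, Optional


def resolve_api_key(
    secrets: Mapping[str, str], typed_key: Optional[str] = None
) -> Optional[str]:
    """Prefer a key typed into the UI; otherwise fall back to the secrets mapping.

    Hypothetical helper: the precedence order is an assumption, not the app's
    documented behavior.
    """
    if typed_key and typed_key.strip():
        return typed_key.strip()
    stored = secrets.get("OPENAI_API_KEY", "")
    return stored.strip() or None


# A plain dict stands in for st.secrets here.
print(resolve_api_key({"OPENAI_API_KEY": "YOUR KEY"}))
```

Returning `None` when no key is available lets the caller decide whether to show a warning or disable the submit button.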
 96 | 
 97 | ## Limitations
 98 | 
 99 | To use the camera input on iOS devices, Streamlit must be configured to use SSL. See [Streamlit docs](https://docs.streamlit.io/library/advanced-features/https-support).
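
Streamlit exposes the `server.sslCertFile` and `server.sslKeyFile` options for this. The following is a minimal sketch that builds the launch command, assuming you already have a certificate and private key. The file paths and the `https_run_command` helper are placeholders, not part of this repo.

```python
from typing import List


def https_run_command(script: str, cert_file: str, key_file: str) -> List[str]:
    """Build a `streamlit run` command that serves the app over HTTPS."""
    return [
        "streamlit", "run", script,
        "--server.sslCertFile", cert_file,
        "--server.sslKeyFile", key_file,
    ]


# Placeholder paths; pass the list to subprocess.run() to launch the app.
command = https_run_command("🏠_Home.py", "certs/cert.pem", "certs/key.pem")
print(" ".join(command))
```

The same two options can instead be set under `[server]` in `.streamlit/config.toml`. For anything beyond local testing, terminating SSL at a reverse proxy is likely the safer choice.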
100 | 
101 | ## Contributing
102 | 
103 | Feel free to experiment and share new demos using the code!
104 | 
105 | ## About GPT-4V
106 | 
107 | - [OpenAI announcement](https://openai.com/blog/new-models-and-developer-products-announced-at-devday)
108 | - [OpenAI research](https://openai.com/research/gpt-4v-system-card)
109 | - [OpenAI docs](https://platform.openai.com/docs/guides/vision)
110 | 
111 | ## License
112 | 
113 | This project is licensed under the terms of the MIT license.


--------------------------------------------------------------------------------