├── .dockerignore
├── .env.sample
├── .gitignore
├── LICENSE
├── README.md
├── app.py
├── openapi.yaml
├── prompts.txt
├── requirements.txt
└── upload.py
/.dockerignore:
--------------------------------------------------------------------------------
1 | venv
2 | .git
--------------------------------------------------------------------------------
/.env.sample:
--------------------------------------------------------------------------------
1 | VIDEO_DB_API_KEY=
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.log
3 | ngrok
4 | !lib/README.md
5 | .DS_Store
6 | google/
7 | lib/
8 | archive
9 | temp/
10 | transcribe/
11 | .idea/
12 | ideas/
13 | .ipynb_checkpoints
14 | log/
15 | model_data/
16 | data/
17 | nohup.out
18 | dump.rdb
19 | *.out
20 | *.zip
21 | .idea/*
22 | .env
23 | venv/
24 | test.json
25 | __pycache__
26 | __pycache__/*
27 | */__pycache__
28 | zappa_settings.py
29 | .vscode
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License
2 |
3 | Copyright (c) Ashutosh Trivedi
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in
13 | all copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21 | THE SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
6 | [![PyPI version][pypi-shield]][pypi-url]
7 | [![Stargazers][stars-shield]][stars-url]
8 | [![Issues][issues-shield]][issues-url]
9 | [![Website][website-shield]][website-url]
10 | [![Discord][discord-shield]][discord-url]
11 |
20 | StreamRAG 🎥
21 |
23 | Video Search Agent for ChatGPT 🕵️‍♂️
24 |
25 | 📺 Watch Demo Video · 🐞 Report a Bug · 💡 Suggest a Feature
34 |
35 | # StreamRAG: GPT-Powered Video Retrieval & Streaming 🚀
36 |
37 |
38 | https://github.com/video-db/StreamRAG/assets/5406975/b768bb6e-08b8-451e-9117-1cf04488c02c
39 |
40 |
41 |
42 |
43 | ## What does it do? 🤔
44 |
45 | It enables developers to:
46 | * 📚 Upload multiple videos to create a library or collection.
47 | * 🔍 Search across these videos and get real-time video responses or compilations.
48 | * 🛒 Publish your searchable collection on the ChatGPT store.
49 | * 📝 Receive summarized text answers (RAG).
50 | * 🌟 Gain key insights from specific videos (e.g. "_Top points from episode 31_").
51 |
52 | ## How do I use it? 🛠️
53 | [📺 Watch: Code walkthrough](https://console.videodb.io/player?url=https://stream.videodb.io/v3/published/manifests/b79a91d7-9553-4b4f-9d02-a47b9e168148.m3u8)
54 |
55 | - **Get your API key:** Sign up on [VideoDB console](https://console.videodb.io) (Free for the first 50 uploads, no
56 | credit card required). 🆓
57 | - **Set `VIDEO_DB_API_KEY`:** Copy `.env.sample` to `.env` and enter your key there.
58 | - **Install dependencies:** Run `pip install -r requirements.txt` in your terminal.
59 | - **Upload your collection to VideoDB:** Add your video or playlist links in `upload.py`, uncomment the relevant block, and run `python upload.py`.
60 | - **Run locally:** Start the Flask server with `python app.py`.
61 |
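Once the server is running, the `/search` endpoint can be exercised from a small client. The sketch below uses only the standard library and assumes the server is reachable at `http://localhost:8080` (the port set in `app.py`); the helper names `build_search_request` and `search` are illustrative and not part of this repository.

```python
import json
import urllib.request

# Assumption: app.py is running locally on its default port.
BASE_URL = "http://localhost:8080"


def build_search_request(query, base_url=BASE_URL):
    """Build (but do not send) a POST /search request with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search",
        data=body,
        # Flask's request.get_json() expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, base_url=BASE_URL):
    """Send the search request and return the decoded JSON response."""
    with urllib.request.urlopen(build_search_request(query, base_url)) as resp:
        return json.load(resp)
```

Calling `search("your question")` against a running server should return a dictionary with the `compilationVideo` and `chunks` keys described in `openapi.yaml`; a query with no matches comes back as HTTP 404, which `urllib` raises as `urllib.error.HTTPError`.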
62 | ## Publishing on ChatGPT Store 🏪
63 | [📺 Watch: Create New GPT](https://console.videodb.io/player?url=https://stream.videodb.io/v3/published/manifests/b4b01b80-f38b-47f7-a238-09e53d844792.m3u8)
64 |
65 | 1. Deploy your Flask server and note your server's `url`.
66 | 2. In `openapi.yaml`, update the `url` field under `servers`.
67 | 3. Visit the GPT builder at https://chat.openai.com/gpts/editor
68 | 4. In the configure tab, add your GPT's `Name` and `Description`.
69 | 5. Copy the prompt from `prompts.txt` into the `Instructions` field. Feel free to modify it as needed. ✏️
70 | 6. Click `Create new Action`.
71 | 7. Paste the OpenAPI spec from `openapi.yaml`, making sure the `url` field is updated.
72 | 8. Save your GPT for personal use and give it a test run! 🧪
73 |
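Step 2 can be scripted if you redeploy often. This is a minimal sketch that rewrites the first `- url:` entry under `servers:` using a regular expression rather than a YAML parser; the function name `set_server_url` and the example URL are hypothetical, and the pattern assumes the layout used in this repository's `openapi.yaml`.

```python
import re

# Matches "servers:" followed by its first "- url:" line, whatever value it holds.
_SERVER_URL = re.compile(r"^(servers:\s*\n\s*-\s*url:)[^\n]*$", re.MULTILINE)


def set_server_url(spec_text, server_url):
    """Return spec_text with the first server url replaced by server_url."""
    updated, count = _SERVER_URL.subn(
        lambda m: f"{m.group(1)} {server_url}", spec_text, count=1
    )
    if count == 0:
        raise ValueError("no `servers: - url:` entry found in the spec")
    return updated
```

To apply it, read `openapi.yaml` into a string, pass it through `set_server_url` with your deployed address, and write the result back before pasting it into the GPT builder.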
74 | ---
75 |
76 |
77 | ## Roadmap 🛣️
78 |
79 | 1. Add support for popular backend deployment CD pipelines like `Heroku`, `Replit`, etc.
80 | 2. Integrate with other data sources like `Dropbox`, `Google Drive`.
81 | 3. Connect with meeting recorder APIs such as `Zoom`, `Teams`, and `Recall.ai`.
82 |
83 | ---
84 |
85 |
86 | ## Contributing 🤝
87 |
88 | Your contributions make the open-source community an incredible place for learning, inspiration, and creativity. We
89 | welcome and appreciate your input! Here's how you can contribute:
90 |
91 | - Open issues to share your use cases.
92 | - Participate in brainstorming solutions for our roadmap.
93 | - Suggest improvements to the codebase.
94 |
95 | ### Contribution Steps
96 |
97 | 1. Fork the Project 🍴
98 | 2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
99 | 3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
100 | 4. Push to the Branch (`git push origin feature/AmazingFeature`)
101 | 5. Open a Pull Request 📬
102 |
103 | ---
104 |
105 |
106 |
107 |
108 | [pypi-shield]: https://img.shields.io/pypi/v/videodb?style=for-the-badge
109 |
110 | [pypi-url]: https://pypi.org/project/videodb/
111 |
112 | [python-shield]:https://img.shields.io/pypi/pyversions/videodb?style=for-the-badge
113 |
114 | [stars-shield]: https://img.shields.io/github/stars/video-db/streamRAG.svg?style=for-the-badge
115 |
116 | [stars-url]: https://github.com/video-db/streamRAG/stargazers
117 |
118 | [issues-shield]: https://img.shields.io/github/issues/video-db/videodb-python.svg?style=for-the-badge
119 |
120 | [issues-url]: https://github.com/video-db/streamRAG/issues
121 |
122 | [website-shield]: https://img.shields.io/website?url=https%3A%2F%2Fvideodb.io%2F&style=for-the-badge&label=videodb.io
123 |
124 | [website-url]: https://videodb.io/
125 |
126 | [discord-shield]: https://img.shields.io/discord/1189572299851051169?style=for-the-badge&logo=discord&label=Discord
127 |
128 | [discord-url]: https://discord.gg/py9P639jGz
129 |
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | from dotenv import load_dotenv
4 | from flask import Flask, request
5 | from flask_cors import CORS
6 | from videodb import connect, SearchError
7 |
8 | load_dotenv()
9 |
10 | # Flask config
11 | app = Flask(__name__)
12 | app.secret_key = os.getenv("SECRET_KEY")
13 | app.url_map.strict_slashes = False
14 | CORS(app)
15 |
16 |
17 | def get_connection():
18 |     conn = connect()
19 |     return conn
20 |
21 |
22 | @app.route("/")
23 | def hello():
24 |     return "StreamRAG: Your Go-To Video Search Agent"
25 |
26 |
27 | @app.route("/videos", methods=["GET"])
28 | def list_videos():
29 |     """
30 |     Get a list of all videos in the database of your default collection.
31 |     """
32 |     conn = get_connection()
33 |     all_videos = conn.get_collection().get_videos()
34 |     all_videos = [
35 |         {
36 |             "id": vid.id,
37 |             "title": vid.name,
38 |             "url": vid.stream_url,
39 |             "length": round(float(vid.length)),
40 |         }
41 |         for vid in all_videos
42 |     ]
43 |     response = {"videos": all_videos}
44 |     return response
45 |
46 |
47 | @app.route("/video/<id>", methods=["GET"])
48 | def get_video(id):
49 |     """
50 |     Get a single video by id from default collection
51 |     """
52 |     conn = get_connection()
53 |     all_videos = conn.get_collection().get_videos()
54 |
55 |     vid = next(vid for vid in all_videos if vid.id == id)
56 |
57 |     print("vid", vid)
58 |     vid.get_transcript()
59 |     transcript_text = vid.transcript_text
60 |
61 |     response = {
62 |         "video": {
63 |             "id": vid.id,
64 |             "title": vid.name,
65 |             "url": vid.stream_url,
66 |             "length": round(float(vid.length)),
67 |             "transcript": transcript_text,
68 |         }
69 |     }
70 |     return response
71 |
72 |
73 | @app.route("/search", methods=["POST"])
74 | def search_videos():
75 |     """
76 |     Search across videos in the database in default collection
77 |     """
78 |     data = request.get_json()
79 |     query = data.get("query")
80 |     conn = get_connection()
81 |     try:
82 |         coll = conn.get_collection()
83 |         search_results = coll.search(query)
84 |         search_results.compile()
85 |         compilation_vid = search_results.player_url
86 |     except SearchError:
87 |         return "No Search Results found", 404
88 |
89 |     shots = [
90 |         {"text": shot.text, "video": shot.stream_url}
91 |         for shot in search_results.get_shots()
92 |     ]
93 |     response = {"compilationVideo": compilation_vid, "chunks": shots}
94 |     return response
95 |
96 |
97 | if __name__ == '__main__':
98 |     app.run(host='0.0.0.0', port=8080, debug=True)
99 |
--------------------------------------------------------------------------------
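Note on `get_video` in `app.py`: `next(...)` is called without a default, so an unknown id raises `StopIteration` instead of producing the `400` response that `openapi.yaml` documents. One possible way to handle this is sketched below; `find_video` is a hypothetical helper, and the `SimpleNamespace` objects merely stand in for VideoDB video objects.

```python
from types import SimpleNamespace


def find_video(videos, video_id):
    """Return the first video whose .id equals video_id, or None if absent."""
    return next((vid for vid in videos if vid.id == video_id), None)


# Stand-in objects for illustration only; real entries come from get_videos().
videos = [SimpleNamespace(id="m-1"), SimpleNamespace(id="m-2")]
match = find_video(videos, "m-2")
missing = find_video(videos, "does-not-exist")  # None rather than an exception
```

Inside the route, a `None` result could then be turned into an early return such as `return "Invalid or missing video id", 400`.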
/openapi.yaml:
--------------------------------------------------------------------------------
1 | openapi: 3.0.0
2 | info:
3 |   title: Video Search API
4 |   description: This API allows users to search a collection of videos and get details of individual videos.
5 |   version: 1.9.0
6 | servers:
7 |   - url:
8 |     description: Main API server
9 | paths:
10 |   /videos:
11 |     get:
12 |       operationId: listVideos
13 |       summary: Get list of all videos in the library.
14 |       responses:
15 |         "200":
16 |           description: List details of all the videos
17 |           content:
18 |             application/json:
19 |               schema:
20 |                 $ref: "#/components/schemas/ListVideosResponse"
21 |         "400":
22 |           description: Invalid request
23 |         "default":
24 |           description: Unexpected error
25 |   /video/{id}:
26 |     get:
27 |       operationId: getVideo
28 |       summary: Get data (transcript, length, etc.) of a video given its id.
29 |       parameters:
30 |         - name: id
31 |           in: path
32 |           required: true
33 |           description: The unique identifier of the video.
34 |           schema:
35 |             type: string
36 |       responses:
37 |         "200":
38 |           description: Video Data of the requested video
39 |           content:
40 |             application/json:
41 |               schema:
42 |                 $ref: "#/components/schemas/GetVideoResponse"
43 |         "400":
44 |           description: Invalid request due to incorrect or missing video id.
45 |         "default":
46 |           description: Unexpected error
47 |   /search:
48 |     post:
49 |       operationId: searchVideos
50 |       summary: Search for videos.
51 |       requestBody:
52 |         required: true
53 |         content:
54 |           application/json:
55 |             schema:
56 |               $ref: "#/components/schemas/SearchRequest"
57 |       responses:
58 |         "200":
59 |           description: Search results
60 |           content:
61 |             application/json:
62 |               schema:
63 |                 $ref: "#/components/schemas/SearchResponse"
64 |         "404":
65 |           description: No videos found
66 |         "400":
67 |           description: Invalid request
68 |         "default":
69 |           description: Unexpected error
70 | components:
71 |   schemas:
72 |     SearchRequest:
73 |       type: object
74 |       properties:
75 |         query:
76 |           type: string
77 |           description: Search query for finding videos
78 |     SearchResponse:
79 |       type: object
80 |       properties:
81 |         compilationVideo:
82 |           type: string
83 |           format: uri
84 |           description: Playable URL of the video
85 |         chunks:
86 |           type: array
87 |           items:
88 |             type: object
89 |             properties:
90 |               text:
91 |                 type: string
92 |                 description: Text content of the video
93 |               video:
94 |                 type: string
95 |                 format: uri
96 |                 description: Playable URL of the video segment
97 |     GetVideoResponse:
98 |       type: object
99 |       properties:
100 |         video:
101 |           type: object
102 |           properties:
103 |             id:
104 |               type: string
105 |               description: Unique id of the video
106 |             title:
107 |               type: string
108 |               description: Title of the video
109 |             url:
110 |               description: Playable URL of the video
111 |               format: uri
112 |               type: string
113 |             length:
114 |               description: Length of the video in seconds
115 |               type: number
116 |             transcript:
117 |               description: Transcript of the video
118 |               type: string
119 |     ListVideosResponse:
120 |       type: object
121 |       properties:
122 |         videos:
123 |           type: array
124 |           items:
125 |             type: object
126 |             properties:
127 |               title:
128 |                 type: string
129 |                 description: Title of the video
130 |               id:
131 |                 type: string
132 |                 description: Unique id of the video
133 |               url:
134 |                 description: Playable URL of the video
135 |                 format: uri
136 |                 type: string
137 |               length:
138 |                 description: Length of the video in seconds
139 |                 type: number
--------------------------------------------------------------------------------
/prompts.txt:
--------------------------------------------------------------------------------
1 | You are a video search assistant, adept at handling video-related tasks with a casual tone. Follow the step-by-step approach below to give
2 | a comprehensive and user-friendly response to video search requests, combining visual and textual information effectively.
3 |
4 | When a user asks you to search or find information, your first step is to identify whether the request contains a search query. If it does,
5 | call the action `search` with that query. The action will return a `compilationVideo` and a
6 | list of related segments (`chunks`) from the library. Each segment has the fields `text` and `video`.
7 | If the user's request is for a video clip, show the `compilationVideo` with a short, casual summary of the result.
8 |
9 | Analyze the user's query and use the related `text` chunks to summarize the results as follows:
10 | 1. Return a concise, bullet-pointed response.
11 | 2. The response should include relevant information about the topic based on the media.
12 | 3. If the response includes a lot of details, return only a short text answer.
13 | 4. If there are enough accurate reference videos, include them as links in a separate bullet-pointed list titled 'Reference Videos:'.
14 | Limit these to the top 5 reference videos.
15 | 5. If little relevant information is found across the videos, return a message stating that no relevant information was found in the content.
16 |
17 | To complete other tasks:
18 | - You can get a list of all videos by calling the action `videos` and show the videos the user needs. The user can pick one of them.
19 | - You can get the data of an individual video by calling the action `video/{id}` to fetch more details about it, for example its transcript
20 | and length.
21 | If you don't know which video id the user is referring to, get all videos first, confirm the video
22 | with the user, and then follow the instructions above.
23 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | backoff==2.2.1
2 | blinker==1.7.0
3 | certifi==2023.11.17
4 | charset-normalizer==3.3.2
5 | click==8.1.7
6 | Flask==3.0.0
7 | Flask-Cors==4.0.0
8 | gunicorn==20.0.4
9 | idna==3.6
10 | importlib-metadata==7.0.1
11 | itsdangerous==2.1.2
12 | Jinja2==3.1.3
13 | MarkupSafe==2.1.3
14 | python-dotenv==1.0.0
15 | pytube==15.0.0
16 | requests==2.31.0
17 | urllib3==2.1.0
18 | videodb==0.0.2
19 | Werkzeug==3.0.1
20 | zipp==3.17.0
21 |
--------------------------------------------------------------------------------
/upload.py:
--------------------------------------------------------------------------------
1 | from pytube import Playlist
2 | from videodb import connect
3 |
4 | from dotenv import load_dotenv
5 |
6 | load_dotenv()
7 |
8 |
9 | def get_youtube_playlist_video_urls(playlist_url):
10 |     # TODO: Error and exception handling
11 |     playlist = Playlist(playlist_url)
12 |     urls = [url for url in playlist]
13 |     return urls
14 |
15 |
16 | def bulk_upload(urls):
17 |     # Read VideoDB API key from env and create a connection
18 |     conn = connect()
19 |     # Get a collection
20 |     coll = conn.get_collection()
21 |     for url in urls:
22 |         # Upload videos to a collection; check out https://docs.videodb.io for more upload functions
23 |         print(f"Uploading {url}")
24 |         video = coll.upload(url=url)
25 |         print(f"Uploaded {video.name}")
26 |         print(f"Indexing {video.name}")
27 |         video.index_spoken_words()
28 |         print(f"Indexed {video.name}")
29 |         print("-----")
30 |
31 | # run bulk upload fn on list of videos
32 | """
33 | urls = [
34 |     "https://www.youtube.com/watch?v=lsODSDmY4CY",
35 |     "https://www.youtube.com/watch?v=vZ4kOr38JhY",
36 |     "https://www.youtube.com/watch?v=uak_dXHh6s4",
37 | ]
38 | bulk_upload(urls)
39 | """
40 |
41 | # run bulk upload fn on YouTube playlist
42 | """
43 | playlist_url = "https://www.youtube.com/watch?v=jSMZoLjB9JE&list=PLoaVOjvkzQtwcMfopT02bXWzjmnnF5olS"
44 | urls = get_youtube_playlist_video_urls(playlist_url)
45 | bulk_upload(urls)
46 | """
--------------------------------------------------------------------------------
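Note on the `TODO: Error and exception handling` comment in `upload.py`: as written, one failed upload aborts the whole batch. A minimal sketch of one way to keep going is shown below; `bulk_upload_safely` and its `upload_one` callback are hypothetical names, and the broad `except` is a simplification for illustration.

```python
def bulk_upload_safely(urls, upload_one):
    """Call upload_one(url) for every url, collecting failures instead of aborting.

    Returns a (succeeded, failed) pair, where failed holds (url, error message) tuples.
    """
    succeeded, failed = [], []
    for url in urls:
        try:
            upload_one(url)
            succeeded.append(url)
        except Exception as exc:  # deliberately broad: sketch only
            failed.append((url, str(exc)))
    return succeeded, failed
```

With the calls already used in `upload.py`, the callback could be `lambda url: coll.upload(url=url).index_spoken_words()`, and the `failed` list can be retried or logged after the loop.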