├── .dockerignore
├── .env.sample
├── .gitignore
├── LICENSE
├── README.md
├── app.py
├── openapi.yaml
├── prompts.txt
├── requirements.txt
└── upload.py

/.dockerignore:
--------------------------------------------------------------------------------
venv
.git

--------------------------------------------------------------------------------
/.env.sample:
--------------------------------------------------------------------------------
VIDEO_DB_API_KEY=

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.pyc
*.log
ngrok
!lib/README.md
.DS_Store
google/
lib/
archive
temp/
transcribe/
.idea/
ideas/
.ipynb_checkpoints
log/
model_data/
data/
nohup.out
dump.rdb
*.out
*.zip
.idea/*
.env
venv/
test.json
__pycache__
__pycache__/*
*/__pycache__
zappa_settings.py
.vscode

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License

Copyright (c) Ashutosh Trivedi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
[![PyPI version][pypi-shield]][pypi-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![Website][website-shield]][website-url]
[![Discord][discord-shield]][discord-url]

StreamRAG 🎥

Video Search Agent for ChatGPT 🕵️‍♂️

# StreamRAG: GPT-Powered Video Retrieval & Streaming 🚀

https://github.com/video-db/StreamRAG/assets/5406975/b768bb6e-08b8-451e-9117-1cf04488c02c

## What does it do? 🤔

It enables developers to:
* 📚 Upload multiple videos to create a library or collection.
* 🔍 Search across these videos and get real-time video responses or compilations.
* 🛒 Publish your searchable collection on the ChatGPT store.
* 📝 Receive summarized text answers (RAG).
* 🌟 Gain key insights from specific videos (e.g. "_Top points from episode 31_").

## How do I use it? 🛠️
[📺 Watch: Code walkthrough](https://console.videodb.io/player?url=https://stream.videodb.io/v3/published/manifests/b79a91d7-9553-4b4f-9d02-a47b9e168148.m3u8)

- **Get your API key:** Sign up on [VideoDB console](https://console.videodb.io) (Free for the first 50 uploads, no
  credit card required). 🆓
- **Set `VIDEO_DB_API_KEY`:** Enter your key in the `.env` file (see `.env.sample` for the expected variable).
- **Install dependencies:** Run `pip install -r requirements.txt` in your terminal.
- **Upload your collection to VideoDB:** Add your links in `upload.py`, uncomment the matching block, and run `python upload.py`.
- **Run locally:** Start the Flask server with `python app.py`.

## Publishing on ChatGPT Store 🏪
[📺 Watch: Create New GPT](https://console.videodb.io/player?url=https://stream.videodb.io/v3/published/manifests/b4b01b80-f38b-47f7-a238-09e53d844792.m3u8)

1. Deploy your Flask server and note your server's `url`.
2. In `openapi.yaml`, update the `url` field under `servers`.
3. Visit the GPT builder at https://chat.openai.com/gpts/editor
4. In the Configure tab, add your GPT's `Name` and `Description`.
5. Copy the prompt from `prompts.txt` into the `Instructions` field. Feel free to modify it as needed. ✏️
6. Click on `Create new Action`.
7. Copy the OpenAPI details from `openapi.yaml`. Don't forget to update the `url` field.
8. Save your GPT for personal use and give it a test run! 🧪

---

## Roadmap 🛣️

1. Add support for popular backend deployment CD pipelines like `Heroku`, `Replit`, etc.
2. Integrate with other data sources like `Dropbox`, `Google Drive`.
3. Connect with meeting recorder APIs such as `Zoom`, `Teams`, and `Recall.ai`.

---

## Contributing 🤝

Your contributions make the open-source community an incredible place for learning, inspiration, and creativity. We
welcome and appreciate your input! Here's how you can contribute:

- Open issues to share your use cases.
- Participate in brainstorming solutions for our roadmap.
- Suggest improvements to the codebase.

### Contribution Steps

1. Fork the Project 🍴
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request 📬

---

[pypi-shield]: https://img.shields.io/pypi/v/videodb?style=for-the-badge

[pypi-url]: https://pypi.org/project/videodb/

[python-shield]:https://img.shields.io/pypi/pyversions/videodb?style=for-the-badge

[stars-shield]: https://img.shields.io/github/stars/video-db/streamRAG.svg?style=for-the-badge

[stars-url]: https://github.com/video-db/streamRAG/stargazers

[issues-shield]: https://img.shields.io/github/issues/video-db/videodb-python.svg?style=for-the-badge

[issues-url]: https://github.com/video-db/streamRAG/issues

[website-shield]: https://img.shields.io/website?url=https%3A%2F%2Fvideodb.io%2F&style=for-the-badge&label=videodb.io

[website-url]: https://videodb.io/

[discord-shield]: https://img.shields.io/discord/1189572299851051169?style=for-the-badge&logo=discord&label=Discord

[discord-url]: https://discord.gg/py9P639jGz

--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
import os

from dotenv import load_dotenv
from flask import Flask, request
from flask_cors import CORS
from videodb import connect, SearchError

load_dotenv()

# Flask config
app = Flask(__name__)
app.secret_key = os.getenv("SECRET_KEY")
app.url_map.strict_slashes = False
CORS(app)


def get_connection():
    # Reads the VideoDB API key from the environment and creates a connection
    conn = connect()
    return conn


@app.route("/")
def hello():
    return "StreamRAG: Your Go-To Video Search Agent"


@app.route("/videos", methods=["GET"])
def list_videos():
    """
    Get a list of all videos in the database of your default collection.
    """
    conn = get_connection()
    all_videos = conn.get_collection().get_videos()
    all_videos = [
        {
            "id": vid.id,
            "title": vid.name,
            "url": vid.stream_url,
            "length": round(float(vid.length)),
        }
        for vid in all_videos
    ]
    response = {"videos": all_videos}
    return response


@app.route("/video/<id>", methods=["GET"])
def get_video(id):
    """
    Get a single video by id from default collection
    """
    conn = get_connection()
    all_videos = conn.get_collection().get_videos()

    # Fall back to None so an unknown id yields a 400 (as documented in
    # openapi.yaml) instead of an unhandled StopIteration
    vid = next((vid for vid in all_videos if vid.id == id), None)
    if vid is None:
        return "Invalid or missing video id", 400

    print("vid", vid)
    vid.get_transcript()
    transcript_text = vid.transcript_text

    response = {
        "video": {
            "id": vid.id,
            "title": vid.name,
            "url": vid.stream_url,
            "length": round(float(vid.length)),
            "transcript": transcript_text,
        }
    }
    return response


@app.route("/search", methods=["POST"])
def search_videos():
    """
    Search across videos in the database in default collection
    """
    data = request.get_json()
    query = data.get("query")
    conn = get_connection()
    try:
        coll = conn.get_collection()
        search_results = coll.search(query)
        search_results.compile()
        compilation_vid = search_results.player_url
    except SearchError:
        return "No Search Results found", 404

    shots = [
        {"text": shot.text, "video": shot.stream_url}
        for shot in search_results.get_shots()
    ]
    response = {"compilationVideo": compilation_vid, "chunks": shots}
    return response


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080, debug=True)

--------------------------------------------------------------------------------
/openapi.yaml:
--------------------------------------------------------------------------------
openapi: 3.0.0
info:
  title: Video Search API
  description: This API allows users to search a collection of videos and get details of individual videos.
  version: 1.9.0
servers:
  - url:
    description: Main API server
paths:
  /videos:
    get:
      operationId: listVideos
      summary: Get list of all videos in the library.
      responses:
        "200":
          description: List details of all the videos
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ListVideosResponse"
        "400":
          description: Invalid request
        "default":
          description: Unexpected error
  /video/{id}:
    get:
      operationId: getVideo
      summary: Get data (transcript, length, etc.) of a video given its id.
      parameters:
        - name: id
          in: path
          required: true
          description: The unique identifier of the video.
          schema:
            type: string
      responses:
        "200":
          description: Video Data of the requested video
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/GetVideoResponse"
        "400":
          description: Invalid request due to incorrect or missing video id.
        "default":
          description: Unexpected error
  /search:
    post:
      operationId: searchVideos
      summary: Search for videos.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/SearchRequest"
      responses:
        "200":
          description: Search results
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/SearchResponse"
        "404":
          description: No videos found
        "400":
          description: Invalid request
        "default":
          description: Unexpected error
components:
  schemas:
    SearchRequest:
      type: object
      properties:
        query:
          type: string
          description: Search query for finding videos
    SearchResponse:
      type: object
      properties:
        compilationVideo:
          type: string
          format: uri
          description: Playable URL of the video
        chunks:
          type: array
          items:
            type: object
            properties:
              text:
                type: string
                description: Text content of the video
              video:
                type: string
                format: uri
                description: Playable URL of the video segment
    GetVideoResponse:
      type: object
      properties:
        video:
          type: object
          properties:
            id:
              type: string
              description: Unique id of the video
            title:
              type: string
              description: Title of the video
            url:
              description: Playable URL of the video
              format: uri
              type: string
            length:
              description: Length of the video in seconds
              type: number
            transcript:
              description: Transcript of the video
              type: string
    ListVideosResponse:
      type: object
      properties:
        videos:
          type: array
          items:
            type: object
            properties:
              title:
                type: string
                description: Title of the video
              id:
                type: string
                description: Unique id of the video
              url:
                description: Playable URL of the video
                format: uri
                type: string
              length:
                description: Length of the video in seconds
                type: number
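
Note (not part of `openapi.yaml`): the `/search` contract described by `SearchRequest` and `SearchResponse` can be exercised from a small client. The following is a minimal sketch using only the Python standard library. The helper names `build_search_request` and `parse_search_response` are hypothetical and do not exist in this repository; only the JSON field names (`query`, `compilationVideo`, `chunks`, `text`, `video`) come from the schema above.

```python
import json
from typing import List, Tuple


def build_search_request(query: str) -> bytes:
    """Encode a SearchRequest body ({"query": ...}) as UTF-8 JSON."""
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    return json.dumps({"query": query.strip()}).encode("utf-8")


def parse_search_response(payload: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (compilationVideo, [(text, video), ...]) from a SearchResponse."""
    data = json.loads(payload)
    # Each chunk carries the matched transcript text and a playable segment URL
    chunks = [(chunk["text"], chunk["video"]) for chunk in data.get("chunks", [])]
    return data["compilationVideo"], chunks
```

The request bytes would be sent as the body of a `POST` to `/search` with a `Content-Type: application/json` header; a 404 from the server means no results were found.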

--------------------------------------------------------------------------------
/prompts.txt:
--------------------------------------------------------------------------------
You are a video search assistant, adept at handling video-related tasks with a casual tone. The step-by-step approach below ensures
a comprehensive and user-friendly response to video search requests, combining visual and textual information effectively.

When a user asks you to search or find information, your first step is to identify if the request has a search query. If you can identify
the search query, call the action `search` with the provided query. The action will return a `compilationVideo` and a
list of related segments (`chunks`) from the library. Each segment has the fields `text` and `video`.
If the user's request is for a video clip, show the `compilationVideo` with a short, casual summary of the result.

Analyze the user's query and use the related `text` chunks to summarize the results as follows:
1. Return a concise, bullet-pointed response.
2. The response should include relevant information about the topic based on the media.
3. If the response includes a lot of details, return only a short text answer.
4. If there are enough accurate reference videos, include them as links in a separate bullet-pointed list titled 'Reference Videos:'.
Limit these to the top 5 reference videos.
5. If not much relevant information is found across the videos, return a message stating that no relevant information was found in the content.

To complete other tasks:
- You can get a list of all videos by calling the action `videos` and show the videos the user needs. The user can pick one of the videos.
- You can get data for an individual video by calling the action `video/{id}` to fetch more details about it, for example its transcript
and length.
If you don't know which video id the user is referring to, get all videos first, confirm the video
with the user, and then follow the instructions above.

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
backoff==2.2.1
blinker==1.7.0
certifi==2023.11.17
charset-normalizer==3.3.2
click==8.1.7
Flask==3.0.0
Flask-Cors==4.0.0
gunicorn==20.0.4
idna==3.6
importlib-metadata==7.0.1
itsdangerous==2.1.2
Jinja2==3.1.3
MarkupSafe==2.1.3
python-dotenv==1.0.0
pytube==15.0.0
requests==2.31.0
urllib3==2.1.0
videodb==0.0.2
Werkzeug==3.0.1
zipp==3.17.0

--------------------------------------------------------------------------------
/upload.py:
--------------------------------------------------------------------------------
from pytube import Playlist
from videodb import connect

from dotenv import load_dotenv

load_dotenv()


def get_youtube_playlist_video_urls(playlist_url):
    # TODO: Error and exception handling
    playlist = Playlist(playlist_url)
    urls = list(playlist)
    return urls


def bulk_upload(urls):
    # Read VideoDB API key from env and create a connection
    conn = connect()
    # Get a collection
    coll = conn.get_collection()
    for url in urls:
        # Upload videos to a collection; check out https://docs.videodb.io for more upload functions
        print(f"Uploading {url}")
        video = coll.upload(url=url)
        print(f"Uploaded {video.name}")
        print(f"Indexing {video.name}")
        video.index_spoken_words()
        print(f"Indexed {video.name}")
        print("-----")


# run bulk upload fn on list of videos
"""
urls = [
    "https://www.youtube.com/watch?v=lsODSDmY4CY",
    "https://www.youtube.com/watch?v=vZ4kOr38JhY",
    "https://www.youtube.com/watch?v=uak_dXHh6s4",
]
bulk_upload(urls)
"""

# run bulk upload fn on YouTube playlist
"""
playlist_url = "https://www.youtube.com/watch?v=jSMZoLjB9JE&list=PLoaVOjvkzQtwcMfopT02bXWzjmnnF5olS"
urls = get_youtube_playlist_video_urls(playlist_url)
bulk_upload(urls)
"""
--------------------------------------------------------------------------------
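
Note (not part of `upload.py`): the script leaves error handling as a TODO, so a single failing URL would stop the whole batch. The following is a minimal sketch of one way to keep going past failures. `bulk_upload_safe` and its `upload_one` parameter are hypothetical names introduced for illustration; `upload_one` stands in for whatever performs the actual upload, so the sketch runs without a VideoDB account.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def bulk_upload_safe(
    urls: Iterable[str],
    upload_one: Callable[[str], str],
) -> Tuple[List[str], Dict[str, str]]:
    """Upload each URL in turn, continuing past failures.

    Returns (names of uploaded videos, {failed url: error message}).
    """
    uploaded: List[str] = []
    failed: Dict[str, str] = {}
    for url in urls:
        try:
            uploaded.append(upload_one(url))
        except Exception as exc:
            # Broad on purpose: one bad URL should not abort the batch
            failed[url] = str(exc)
    return uploaded, failed
```

With VideoDB, `upload_one` could wrap the calls already used in `bulk_upload`: `coll.upload(url=url)`, then `video.index_spoken_words()`, returning `video.name`. The failed URLs can then be retried or reported separately.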