├── .gitignore ├── README.md ├── assets └── whisper-benchmark.png ├── benchmark.sh ├── examples ├── demo_hermes.py ├── demo_hermes.sh └── hermes.ipynb ├── hermes.sh ├── hermes ├── __init__.py ├── cli.py ├── cli │ └── __main__.py ├── config.py ├── core.py ├── strategies │ ├── __init__.py │ ├── provider │ │ ├── __init__.py │ │ ├── base.py │ │ ├── groq.py │ │ ├── mlx.py │ │ └── openai.py │ └── source │ │ ├── __init__.py │ │ ├── auto.py │ │ ├── base.py │ │ ├── clipboard.py │ │ ├── file.py │ │ ├── microphone.py │ │ ├── web.py │ │ └── youtube.py └── utils │ ├── __init__.py │ ├── audio.py │ ├── cache.py │ └── llm.py ├── requirements.txt ├── setup.py └── tests ├── __init__.py ├── test_cache.py ├── test_cli.py ├── test_config.py ├── test_core.py ├── test_integration.py ├── test_llm_processor.py ├── test_provider_strategies.py └── test_source_strategies.py /.gitignore: -------------------------------------------------------------------------------- 1 | # macOS 2 | ._* 3 | .DS_Store 4 | .LSOverride 5 | # Icon must end with two carriage returns (\r) 6 | Icon 7 | 8 | # Windows 9 | [Dd]esktop.ini 10 | ehthumbs.db 11 | ehthumbs_vista.db 12 | [Tt]humbs.db 13 | 14 | # Backups 15 | *.bak 16 | 17 | # Logs 18 | *.log 19 | log/ 20 | logs/ 21 | 22 | # Temporary 23 | *~ 24 | *.tmp 25 | *.temp 26 | tmp/ 27 | temp/ 28 | 29 | # Vim 30 | .netrwhist 31 | [._]*.s[a-v][a-z] 32 | [._]*.sw[a-p] 33 | [._]*.un~ 34 | [._]s[a-rt-v][a-z] 35 | [._]ss[a-gi-z] 36 | [._]sw[a-p] 37 | 38 | input.mp4 39 | 40 | ydl.sh 41 | hermes0.sh 42 | todo.md 43 | create.sh 44 | requirements1.txt -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Hermes v0.1.0: Lightning-Fast Video Transcription 🎥➡️📝 2 | 3 | ![Hermes Benchmark Results](https://raw.githubusercontent.com/unclecode/hermes/main/assets/whisper-benchmark.png) 4 | 5 | 6 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1taJvfZKgTxOtScaMR3qofj-ev_8nwn9P?usp=sharing) 7 | 8 | Hermes, the messenger of the gods, now brings you ultra-fast video transcription powered by cutting-edge AI! This Python library and CLI tool harnesses the speed of Groq and the flexibility of multiple providers to convert your videos into text with unprecedented efficiency. 9 | 10 | ## 🚀 Features 11 | 12 | - **Blazing Fast**: Transcribe a 393-second video in just 1 second with Groq's distil-whisper model! 13 | - **Multi-Provider Support**: Choose from Groq (default), MLX Whisper, or OpenAI for transcription 14 | - **YouTube Support**: Easily transcribe YouTube videos by simply passing the URL 15 | - **Flexible**: Support for various models and output formats 16 | - **Python Library & CLI**: Use Hermes in your Python projects or directly from the command line 17 | - **LLM Processing**: Process the transcription with an LLM for further analysis 18 | 19 | ## 📦 Installation 20 | 21 | ### Prerequisites for Colab or Ubuntu-like Systems 22 | 23 | If you're using Google Colab or a Linux system like Ubuntu, you need to install some additional dependencies first. Run the following command: 24 | 25 | ``` 26 | !apt install libasound2-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg 27 | ``` 28 | 29 | ### Installing Hermes 30 | 31 | You can install Hermes directly from GitHub using pip. 
There are two installation options: 32 | 33 | #### Standard Installation (without MLX support) 34 | 35 | For most users, the standard installation without MLX support is recommended: 36 | 37 | ``` 38 | pip install git+https://github.com/unclecode/hermes.git@main 39 | ``` 40 | 41 | This installation includes all core features but excludes MLX-specific functionality. 42 | 43 | #### Installation with MLX Support 44 | 45 | If you're using a Mac or an MPS system and want to use MLX Whisper for local transcription, install Hermes with MLX support: 46 | 47 | ``` 48 | pip install git+https://github.com/unclecode/hermes.git@main#egg=hermes[mlx] 49 | ``` 50 | 51 | This installation includes all core features plus MLX Whisper support for local transcription. 52 | 53 | **Note:** MLX support is currently only available for Mac or MPS systems. If you're unsure which version to install, start with the standard installation. 54 | 55 | ## ⚙️ Configuration 56 | 57 | Hermes uses a configuration file to manage its settings. On first run, Hermes will automatically create a `.hermes` folder in your home directory and populate it with a default `config.yml` file. 58 | 59 | You can customize Hermes' behavior by editing this file. Here's an example of what the `config.yml` might look like: 60 | 61 | ```yaml 62 | # LLM (Language Model) settings 63 | llm: 64 | provider: groq 65 | model: llama-3.1-8b-instant 66 | api_key: your_groq_api_key_here 67 | 68 | # Transcription settings 69 | transcription: 70 | provider: groq 71 | model: distil-whisper-large-v3-en 72 | api_key: your_groq_api_key_here 73 | 74 | # Cache settings 75 | cache: 76 | enabled: true 77 | directory: ~/.hermes/cache 78 | 79 | # Source type for input (auto-detect by default) 80 | source_type: auto 81 | ``` 82 | 83 | The configuration file is located at `~/.hermes/config.yml`. You can edit this file to change providers, models, API keys, and other settings. 84 | 85 | **Note:** If you don't specify API keys in the config file, Hermes will look for them in your environment variables. For example, it will look for `GROQ_API_KEY` if you're using Groq as a provider. 86 | 87 | To override the configuration temporarily, you can also use command-line arguments when running Hermes. These will take precedence over the settings in the config file. 88 | 89 | ## 🛠️ Usage 90 | 91 | ### Python Library 92 | 93 | 1. Basic transcription: 94 | 95 | ```python 96 | from hermes import transcribe 97 | 98 | result = transcribe('path/to/your/video.mp4', provider='groq') 99 | print(result['transcription']) 100 | ``` 101 | 102 | 2. Transcribe a YouTube video: 103 | 104 | ```python 105 | result = transcribe('https://www.youtube.com/watch?v=PNulbFECY-I', provider='groq') 106 | print(result['transcription']) 107 | ``` 108 | 109 | 3. Use a different model: 110 | 111 | ```python 112 | result = transcribe('path/to/your/video.mp4', provider='groq', model='whisper-large-v3') 113 | print(result['transcription']) 114 | ``` 115 | 116 | 4. Get JSON output: 117 | 118 | ```python 119 | result = transcribe('path/to/your/video.mp4', provider='groq', response_format='json') 120 | print(result['transcription']) 121 | ``` 122 | 123 | 5. Process with LLM: 124 | 125 | ```python 126 | result = transcribe('path/to/your/video.mp4', provider='groq', llm_prompt="Summarize this transcription in 3 bullet points") 127 | print(result['llm_processed']) 128 | ``` 129 | 130 | ### Command Line Interface 131 | 132 | 1.
Basic usage: 133 | 134 | ``` 135 | hermes path/to/your/video.mp4 -p groq 136 | ``` 137 | 138 | 2. Transcribe a YouTube video: 139 | 140 | ``` 141 | hermes https://www.youtube.com/watch?v=PNulbFECY-I -p groq 142 | ``` 143 | 144 | 3. Use a different model: 145 | 146 | ``` 147 | hermes path/to/your/video.mp4 -p groq -m whisper-large-v3 148 | ``` 149 | 150 | 4. Get JSON output: 151 | 152 | ``` 153 | hermes path/to/your/video.mp4 -p groq --response_format json 154 | ``` 155 | 156 | 5. Process with LLM: 157 | 158 | ``` 159 | hermes path/to/your/video.mp4 -p groq --llm_prompt "Summarize this transcription in 3 bullet points" 160 | ``` 161 | 162 | ## 🏎️ Performance Comparison 163 | 164 | ![Hermes Benchmark Results](https://raw.githubusercontent.com/unclecode/hermes/main/assets/whisper-benchmark.png) 165 | 166 | For a 393-second video: 167 | 168 | | Provider | Model | Time (seconds) | 169 | |----------|-------|----------------| 170 | | Groq | distil-whisper-large-v3-en | 1 | 171 | | Groq | whisper-large-v3 | 2 | 172 | | MLX Whisper | distil-whisper-large-v3 | 11 | 173 | | OpenAI | whisper-1 | 21 | 174 | 175 | ## 📊 Running Benchmarks 176 | 177 | Test Hermes performance with different providers and models: 178 | 179 | ``` 180 | python -m hermes.benchmark path/to/your/video.mp4 181 | ``` 182 | 183 | or 184 | 185 | ``` 186 | python -m hermes.benchmark https://www.youtube.com/watch?v=PNulbFECY-I 187 | ``` 188 | 189 | This will generate a performance report for all supported providers and models. 190 | 191 | ## 🌟 Why Hermes? 192 | 193 | - **Unmatched Speed**: Groq's distil-whisper model transcribes 393 seconds of audio in just 1 second! 194 | - **Flexibility**: Choose the provider that best suits your needs 195 | - **Easy Integration**: Use as a Python library or CLI tool 196 | - **YouTube Support**: Transcribe YouTube videos without manual downloads 197 | - **Local Option**: Use MLX Whisper for fast, local transcription on Mac or MPS systems 198 | - **Cloud Power**: Leverage Groq's LPU for the fastest cloud-based transcription 199 | 200 | ## 🙏 Acknowledgements 201 | 202 | Huge shoutout to the @GroqInc team for their incredible distil-whisper model, making ultra-fast transcription a reality! 203 | 204 | ## 🎉 Final Thoughts 205 | 206 | We're living in amazing times! Whether you need the lightning speed of Groq, the convenience of OpenAI, or the local power of MLX Whisper, Hermes has got you covered. Happy transcribing!
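As a closing example, here is a minimal end-to-end sketch that ties the pieces above together: it transcribes a YouTube video with Groq, asks the LLM for a short summary, and writes both outputs to disk. It uses only the `transcribe` arguments and result keys documented above; the output file names (`transcript.txt`, `summary.txt`) are arbitrary, and `GROQ_API_KEY` is assumed to be set in your environment or in `~/.hermes/config.yml`.

```python
from pathlib import Path

from hermes import transcribe

# Transcribe a YouTube video and post-process the result with an LLM.
# Assumes GROQ_API_KEY is available (environment variable or config file).
result = transcribe(
    'https://www.youtube.com/watch?v=PNulbFECY-I',
    provider='groq',
    llm_prompt='Summarize this transcription in 3 bullet points',
)

# The result is a dictionary: 'transcription' holds the raw transcript,
# and 'llm_processed' holds the LLM output when llm_prompt is provided.
Path('transcript.txt').write_text(result['transcription'], encoding='utf-8')
Path('summary.txt').write_text(result['llm_processed'], encoding='utf-8')

print(result['llm_processed'])
```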
-------------------------------------------------------------------------------- /assets/whisper-benchmark.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/unclecode/hermes/806e1833167bc7735d470d3ab37f1f566b6aa463/assets/whisper-benchmark.png -------------------------------------------------------------------------------- /benchmark.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Define colors 4 | GREEN='\033[0;32m' 5 | CYAN='\033[1;36m' 6 | YELLOW='\033[1;33m' 7 | RED='\033[0;31m' 8 | NC='\033[0m' # No Color 9 | 10 | # Function to get video duration 11 | get_video_duration() { 12 | local video_file="$1" 13 | ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$video_file" 14 | } 15 | 16 | # Function to run the benchmark 17 | run_benchmark() { 18 | local video_file="$1" 19 | local report_file="benchmark_report.txt" 20 | 21 | # Get video duration 22 | local duration=$(get_video_duration "$video_file") 23 | echo -e "${YELLOW}Video duration: ${duration} seconds${NC}" 24 | 25 | # Initialize the report 26 | echo -e "${YELLOW}Benchmarking started for $video_file...${NC}" 27 | echo "Benchmark Report for $video_file" > "$report_file" 28 | echo "==================================" >> "$report_file" 29 | echo "Video duration: ${duration} seconds" >> "$report_file" 30 | 31 | # MLX Benchmark 32 | echo -e "${CYAN}Running MLX Whisper benchmark...${NC}" 33 | mlx_output=$(./hermes.sh "$video_file" mlx) 34 | mlx_time=$(echo "$mlx_output" | grep 'TRANSCRIPTION_TIME' | cut -d '=' -f 2) 35 | echo -e "${GREEN}MLX completed in $mlx_time seconds.${NC}" 36 | echo "MLX Whisper (distil-whisper-large-v3): $mlx_time seconds" >> "$report_file" 37 | 38 | # Groq Benchmark (distil-whisper-large-v3) 39 | echo -e "${CYAN}Running Groq benchmark (distil-whisper-large-v3-en)...${NC}" 40 | groq_output=$(./hermes.sh "$video_file" groq) 41 | groq_distil_time=$(echo "$groq_output" | grep 'TRANSCRIPTION_TIME' | cut -d '=' -f 2) 42 | echo -e "${GREEN}Groq (distil-whisper-large-v3-en) completed in $groq_distil_time seconds.${NC}" 43 | echo "Groq (distil-whisper-large-v3-en): $groq_distil_time seconds" >> "$report_file" 44 | 45 | # Groq Benchmark (whisper-large-v3) 46 | echo -e "${CYAN}Running Groq benchmark (whisper-large-v3)...${NC}" 47 | groq_large_output=$(./hermes.sh "$video_file" groq --model whisper-large-v3) 48 | groq_large_time=$(echo "$groq_large_output" | grep 'TRANSCRIPTION_TIME' | cut -d '=' -f 2) 49 | echo -e "${GREEN}Groq (whisper-large-v3) completed in $groq_large_time seconds.${NC}" 50 | echo "Groq (whisper-large-v3): $groq_large_time seconds" >> "$report_file" 51 | 52 | # OpenAI Benchmark 53 | echo -e "${CYAN}Running OpenAI Whisper benchmark...${NC}" 54 | openai_output=$(./hermes.sh "$video_file" openai --model whisper-1) 55 | openai_time=$(echo "$openai_output" | grep 'TRANSCRIPTION_TIME' | cut -d '=' -f 2) 56 | echo -e "${GREEN}OpenAI completed in $openai_time seconds.${NC}" 57 | echo "OpenAI Whisper (whisper-1): $openai_time seconds" >> "$report_file" 58 | 59 | # Final Report 60 | echo -e "${YELLOW}Benchmarking completed. 
Generating report...${NC}" 61 | # echo -e "\nBenchmark Results:" >> "$report_file" 62 | # echo "==================================" >> "$report_file" 63 | # echo "MLX Whisper (distil-whisper-large-v3): $mlx_time seconds" >> "$report_file" 64 | # echo "Groq (distil-whisper-large-v3-en): $groq_distil_time seconds" >> "$report_file" 65 | # echo "Groq (whisper-large-v3): $groq_large_time seconds" >> "$report_file" 66 | # echo "OpenAI Whisper (whisper-1): $openai_time seconds" >> "$report_file" 67 | 68 | echo -e "${GREEN}Report generated in $report_file${NC}" 69 | 70 | # Display the content of the report file 71 | echo -e "\n${YELLOW}Benchmark Report:${NC}" 72 | cat "$report_file" 73 | } 74 | 75 | # Check if video file is provided 76 | if [ -z "$1" ]; then 77 | echo -e "${RED}Usage: $0 <video_file>${NC}" 78 | exit 1 79 | fi 80 | 81 | # Run the benchmark 82 | run_benchmark "$1" 83 | -------------------------------------------------------------------------------- /examples/demo_hermes.py: -------------------------------------------------------------------------------- 1 | import os, sys 2 | # append the parent directory to the sys.path 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | from hermes.core import transcribe 5 | from hermes.config import CONFIG 6 | 7 | # Ensure GROQ_API_KEY is set 8 | if not CONFIG['llm']['api_key']: 9 | raise ValueError("Please set the GROQ_API_KEY environment variable") 10 | 11 | # Example 1: Basic transcription of a local file 12 | print("Example 1: Basic transcription of a local file") 13 | result = transcribe('examples/assets/input.mp4', provider='groq') 14 | print(f"Transcription: {result['transcription'][:100]}...\n") 15 | 16 | # Example 2: Transcription of a YouTube video 17 | print("Example 2: Transcription of a YouTube video") 18 | result = transcribe('https://www.youtube.com/watch?v=PNulbFECY-I', provider='groq') 19 | print(f"Transcription: {result['transcription'][:100]}...\n") 20 | 21 | # Example 3: Transcription with a different model 22 | print("Example 3: Transcription with a different model") 23 | result = transcribe('examples/assets/input.mp4', provider='groq', model='whisper-large-v3') 24 | print(f"Transcription: {result['transcription'][:100]}...\n") 25 | 26 | # Example 4: Transcription with a different response format 27 | print("Example 4: Transcription with a different response format") 28 | result = transcribe('examples/assets/input.mp4', provider='groq', response_format='json') 29 | print(f"JSON Response: {result['transcription']}\n") 30 | 31 | # Example 5: Transcription with LLM processing 32 | print("Example 5: Transcription with LLM processing") 33 | result = transcribe('examples/assets/input.mp4', provider='groq', llm_prompt='Summarize this transcription in 3 bullet points') 34 | print(f"Transcription: {result['transcription'][:100]}...") 35 | print(f"LLM Summary: {result['llm_processed']}\n") 36 | 37 | # Example 6: Forced transcription (bypassing cache) 38 | print("Example 6: Forced transcription (bypassing cache)") 39 | result = transcribe('examples/assets/input.mp4', provider='groq', force=True) 40 | print(f"Transcription: {result['transcription'][:100]}...\n") 41 | 42 | print("All examples completed successfully!") -------------------------------------------------------------------------------- /examples/demo_hermes.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Ensure GROQ_API_KEY is set 4 | if [ -z "$GROQ_API_KEY" ]; then 5 | echo "Please set
the GROQ_API_KEY environment variable" 6 | exit 1 7 | fi 8 | 9 | # Example 1: Basic transcription of a local file 10 | echo "Example 1: Basic transcription of a local file" 11 | python -m hermes.cli examples/assets/input.mp4 -p groq 12 | 13 | # Example 2: Transcription of a YouTube video 14 | echo -e "\nExample 2: Transcription of a YouTube video" 15 | python -m hermes.cli https://www.youtube.com/watch?v=PNulbFECY-I -p groq 16 | 17 | # Example 3: Transcription with a different model 18 | echo -e "\nExample 3: Transcription with a different model" 19 | python -m hermes.cli examples/assets/input.mp4 -p groq -m whisper-large-v3 20 | 21 | # Example 4: Transcription with a different response format 22 | echo -e "\nExample 4: Transcription with a different response format" 23 | python -m hermes.cli examples/assets/input.mp4 -p groq --response_format json 24 | 25 | # Example 5: Transcription with LLM processing 26 | echo -e "\nExample 5: Transcription with LLM processing" 27 | python -m hermes.cli examples/assets/input.mp4 -p groq --llm_prompt "Summarize this transcription in 3 bullet points" 28 | 29 | # Example 6: Forced transcription (bypassing cache) 30 | echo -e "\nExample 6: Forced transcription (bypassing cache)" 31 | python -m hermes.cli examples/assets/input.mp4 -p groq -f 32 | 33 | echo -e "\nAll examples completed successfully!" -------------------------------------------------------------------------------- /examples/hermes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "e3-WuuG6CIFy" 7 | }, 8 | "source": [ 9 | "# Hermes: Lightning-Fast Video Transcription Tutorial\n", 10 | "\n", 11 | "## Introduction\n", 12 | "\n", 13 | "Welcome to this tutorial on Hermes, a powerful Python library and CLI tool for lightning-fast video transcription! Developed by [@unclecode](https://twitter.com/unclecode) and powered by cutting-edge AI, Hermes leverages the speed of Groq and the flexibility of multiple providers (Groq, MLX Whisper, and OpenAI) to convert your videos into text.\n", 14 | "\n", 15 | "Before we dive in, head over to the GitHub repo and show your support:\n", 16 | "\n", 17 | "- **Star the repo:** https://github.com/unclecode/hermes\n", 18 | "- **Follow me on X:** [@unclecode](https://twitter.com/unclecode)" 19 | ], 20 | "id": "e3-WuuG6CIFy" 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": { 25 | "id": "AIrtWBTDCIF0" 26 | }, 27 | "source": [ 28 | "## Installation\n", 29 | "\n", 30 | "Let's get Hermes installed! Use pip to install directly from GitHub:" 31 | ], 32 | "id": "AIrtWBTDCIF0" 33 | }, 34 | { 35 | "cell_type": "code", 36 | "source": [ 37 | "!apt install libasound2-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg" 38 | ], 39 | "metadata": { 40 | "colab": { 41 | "base_uri": "https://localhost:8080/" 42 | }, 43 | "collapsed": true, 44 | "id": "R0B4cL2MG5iL", 45 | "outputId": "ad2515b8-2ee9-423e-8394-03f62f6671b6" 46 | }, 47 | "id": "R0B4cL2MG5iL", 48 | "execution_count": 4, 49 | "outputs": [ 50 | { 51 | "output_type": "stream", 52 | "name": "stdout", 53 | "text": [ 54 | "Reading package lists... Done\n", 55 | "Building dependency tree... Done\n", 56 | "Reading state information... 
Done\n", 57 | "libasound2-dev is already the newest version (1.2.6.1-1ubuntu1).\n", 58 | "ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).\n", 59 | "Suggested packages:\n", 60 | " portaudio19-doc\n", 61 | "The following NEW packages will be installed:\n", 62 | " libportaudio2 libportaudiocpp0 portaudio19-dev\n", 63 | "0 upgraded, 3 newly installed, 0 to remove and 45 not upgraded.\n", 64 | "Need to get 188 kB of archives.\n", 65 | "After this operation, 927 kB of additional disk space will be used.\n", 66 | "Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libportaudio2 amd64 19.6.0-1.1 [65.3 kB]\n", 67 | "Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libportaudiocpp0 amd64 19.6.0-1.1 [16.1 kB]\n", 68 | "Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 portaudio19-dev amd64 19.6.0-1.1 [106 kB]\n", 69 | "Fetched 188 kB in 0s (578 kB/s)\n", 70 | "Selecting previously unselected package libportaudio2:amd64.\n", 71 | "(Reading database ... 123595 files and directories currently installed.)\n", 72 | "Preparing to unpack .../libportaudio2_19.6.0-1.1_amd64.deb ...\n", 73 | "Unpacking libportaudio2:amd64 (19.6.0-1.1) ...\n", 74 | "Selecting previously unselected package libportaudiocpp0:amd64.\n", 75 | "Preparing to unpack .../libportaudiocpp0_19.6.0-1.1_amd64.deb ...\n", 76 | "Unpacking libportaudiocpp0:amd64 (19.6.0-1.1) ...\n", 77 | "Selecting previously unselected package portaudio19-dev:amd64.\n", 78 | "Preparing to unpack .../portaudio19-dev_19.6.0-1.1_amd64.deb ...\n", 79 | "Unpacking portaudio19-dev:amd64 (19.6.0-1.1) ...\n", 80 | "Setting up libportaudio2:amd64 (19.6.0-1.1) ...\n", 81 | "Setting up libportaudiocpp0:amd64 (19.6.0-1.1) ...\n", 82 | "Setting up portaudio19-dev:amd64 (19.6.0-1.1) ...\n", 83 | "Processing triggers for libc-bin (2.35-0ubuntu3.4) ...\n", 84 | "/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link\n", 85 | "\n", 86 | "/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link\n", 87 | "\n", 88 | "/sbin/ldconfig.real: /usr/local/lib/libur_loader.so.0 is not a symbolic link\n", 89 | "\n", 90 | "/sbin/ldconfig.real: /usr/local/lib/libur_adapter_opencl.so.0 is not a symbolic link\n", 91 | "\n", 92 | "/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a symbolic link\n", 93 | "\n", 94 | "/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link\n", 95 | "\n", 96 | "/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_0.so.3 is not a symbolic link\n", 97 | "\n", 98 | "/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link\n", 99 | "\n", 100 | "/sbin/ldconfig.real: /usr/local/lib/libur_adapter_level_zero.so.0 is not a symbolic link\n", 101 | "\n" 102 | ] 103 | } 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 1, 109 | "metadata": { 110 | "colab": { 111 | "base_uri": "https://localhost:8080/" 112 | }, 113 | "collapsed": true, 114 | "id": "xoAvO9jRCIF0", 115 | "outputId": "1fe87c08-620c-478f-aaa5-fb003c930197" 116 | }, 117 | "outputs": [ 118 | { 119 | "output_type": "stream", 120 | "name": "stdout", 121 | "text": [ 122 | "Collecting git+https://github.com/unclecode/hermes.git@main\n", 123 | " Cloning https://github.com/unclecode/hermes.git (to revision main) to /tmp/pip-req-build-xqallhm5\n", 124 | " Running command git clone --filter=blob:none --quiet https://github.com/unclecode/hermes.git /tmp/pip-req-build-xqallhm5\n", 125 | " Resolved https://github.com/unclecode/hermes.git to commit 
1dde137d1f7b0c1eefab8d76353c68e1fe36b31b\n", 126 | " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", 127 | "Collecting yt-dlp>=2024.8.6 (from hermes==0.1.0)\n", 128 | " Downloading yt_dlp-2024.8.6-py3-none-any.whl.metadata (170 kB)\n", 129 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m170.1/170.1 kB\u001b[0m \u001b[31m1.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 130 | "\u001b[?25hCollecting ffmpeg-python>=0.2.0 (from hermes==0.1.0)\n", 131 | " Downloading ffmpeg_python-0.2.0-py3-none-any.whl.metadata (1.7 kB)\n", 132 | "Collecting openai>=1.42.0 (from hermes==0.1.0)\n", 133 | " Downloading openai-1.42.0-py3-none-any.whl.metadata (22 kB)\n", 134 | "Collecting groq>=0.9.0 (from hermes==0.1.0)\n", 135 | " Downloading groq-0.9.0-py3-none-any.whl.metadata (13 kB)\n", 136 | "Collecting pydub>=0.25.1 (from hermes==0.1.0)\n", 137 | " Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)\n", 138 | "Requirement already satisfied: pyperclip>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from hermes==0.1.0) (1.9.0)\n", 139 | "Collecting sounddevice>=0.5.0 (from hermes==0.1.0)\n", 140 | " Downloading sounddevice-0.5.0-py3-none-any.whl.metadata (1.4 kB)\n", 141 | "Collecting numpy>=2.0.1 (from hermes==0.1.0)\n", 142 | " Downloading numpy-2.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)\n", 143 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m60.9/60.9 kB\u001b[0m \u001b[31m1.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 144 | "\u001b[?25hRequirement already satisfied: requests>=2.32.3 in /usr/local/lib/python3.10/dist-packages (from hermes==0.1.0) (2.32.3)\n", 145 | "Requirement already satisfied: PyYAML>=6.0.2 in /usr/local/lib/python3.10/dist-packages (from hermes==0.1.0) (6.0.2)\n", 146 | "Collecting litellm>=1.44.5 (from hermes==0.1.0)\n", 147 | " Downloading litellm-1.44.5-py3-none-any.whl.metadata (32 kB)\n", 148 | "Requirement already satisfied: future in /usr/local/lib/python3.10/dist-packages (from ffmpeg-python>=0.2.0->hermes==0.1.0) (1.0.0)\n", 149 | "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from groq>=0.9.0->hermes==0.1.0) (3.7.1)\n", 150 | "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from groq>=0.9.0->hermes==0.1.0) (1.7.0)\n", 151 | "Collecting httpx<1,>=0.23.0 (from groq>=0.9.0->hermes==0.1.0)\n", 152 | " Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)\n", 153 | "Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from groq>=0.9.0->hermes==0.1.0) (2.8.2)\n", 154 | "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from groq>=0.9.0->hermes==0.1.0) (1.3.1)\n", 155 | "Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from groq>=0.9.0->hermes==0.1.0) (4.12.2)\n", 156 | "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from litellm>=1.44.5->hermes==0.1.0) (3.10.5)\n", 157 | "Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from litellm>=1.44.5->hermes==0.1.0) (8.1.7)\n", 158 | "Requirement already satisfied: importlib-metadata>=6.8.0 in /usr/local/lib/python3.10/dist-packages (from litellm>=1.44.5->hermes==0.1.0) (8.4.0)\n", 159 | "Requirement already satisfied: jinja2<4.0.0,>=3.1.2 in /usr/local/lib/python3.10/dist-packages (from 
litellm>=1.44.5->hermes==0.1.0) (3.1.4)\n", 160 | "Requirement already satisfied: jsonschema<5.0.0,>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from litellm>=1.44.5->hermes==0.1.0) (4.23.0)\n", 161 | "Collecting python-dotenv>=0.2.0 (from litellm>=1.44.5->hermes==0.1.0)\n", 162 | " Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)\n", 163 | "Collecting tiktoken>=0.7.0 (from litellm>=1.44.5->hermes==0.1.0)\n", 164 | " Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)\n", 165 | "Requirement already satisfied: tokenizers in /usr/local/lib/python3.10/dist-packages (from litellm>=1.44.5->hermes==0.1.0) (0.19.1)\n", 166 | "Collecting jiter<1,>=0.4.0 (from openai>=1.42.0->hermes==0.1.0)\n", 167 | " Downloading jiter-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)\n", 168 | "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.10/dist-packages (from openai>=1.42.0->hermes==0.1.0) (4.66.5)\n", 169 | "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.3->hermes==0.1.0) (3.3.2)\n", 170 | "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.3->hermes==0.1.0) (3.7)\n", 171 | "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.3->hermes==0.1.0) (2.0.7)\n", 172 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.3->hermes==0.1.0) (2024.7.4)\n", 173 | "Requirement already satisfied: CFFI>=1.0 in /usr/local/lib/python3.10/dist-packages (from sounddevice>=0.5.0->hermes==0.1.0) (1.17.0)\n", 174 | "Collecting brotli (from yt-dlp>=2024.8.6->hermes==0.1.0)\n", 175 | " Downloading Brotli-1.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.5 kB)\n", 176 | "Collecting mutagen (from yt-dlp>=2024.8.6->hermes==0.1.0)\n", 177 | " Downloading mutagen-1.47.0-py3-none-any.whl.metadata (1.7 kB)\n", 178 | "Collecting pycryptodomex (from yt-dlp>=2024.8.6->hermes==0.1.0)\n", 179 | " Downloading pycryptodomex-3.20.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)\n", 180 | "Collecting websockets>=12.0 (from yt-dlp>=2024.8.6->hermes==0.1.0)\n", 181 | " Downloading websockets-13.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)\n", 182 | "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->groq>=0.9.0->hermes==0.1.0) (1.2.2)\n", 183 | "Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from CFFI>=1.0->sounddevice>=0.5.0->hermes==0.1.0) (2.22)\n", 184 | "Collecting httpcore==1.* (from httpx<1,>=0.23.0->groq>=0.9.0->hermes==0.1.0)\n", 185 | " Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)\n", 186 | "Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->groq>=0.9.0->hermes==0.1.0)\n", 187 | " Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)\n", 188 | "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata>=6.8.0->litellm>=1.44.5->hermes==0.1.0) (3.20.0)\n", 189 | "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2<4.0.0,>=3.1.2->litellm>=1.44.5->hermes==0.1.0) (2.1.5)\n", 
190 | "Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.22.0->litellm>=1.44.5->hermes==0.1.0) (24.2.0)\n", 191 | "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.22.0->litellm>=1.44.5->hermes==0.1.0) (2023.12.1)\n", 192 | "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.22.0->litellm>=1.44.5->hermes==0.1.0) (0.35.1)\n", 193 | "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema<5.0.0,>=4.22.0->litellm>=1.44.5->hermes==0.1.0) (0.20.0)\n", 194 | "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->groq>=0.9.0->hermes==0.1.0) (0.7.0)\n", 195 | "Requirement already satisfied: pydantic-core==2.20.1 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->groq>=0.9.0->hermes==0.1.0) (2.20.1)\n", 196 | "Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken>=0.7.0->litellm>=1.44.5->hermes==0.1.0) (2024.5.15)\n", 197 | "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->litellm>=1.44.5->hermes==0.1.0) (2.4.0)\n", 198 | "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->litellm>=1.44.5->hermes==0.1.0) (1.3.1)\n", 199 | "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->litellm>=1.44.5->hermes==0.1.0) (1.4.1)\n", 200 | "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->litellm>=1.44.5->hermes==0.1.0) (6.0.5)\n", 201 | "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->litellm>=1.44.5->hermes==0.1.0) (1.9.4)\n", 202 | "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->litellm>=1.44.5->hermes==0.1.0) (4.0.3)\n", 203 | "Requirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /usr/local/lib/python3.10/dist-packages (from tokenizers->litellm>=1.44.5->hermes==0.1.0) (0.23.5)\n", 204 | "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.16.4->tokenizers->litellm>=1.44.5->hermes==0.1.0) (3.15.4)\n", 205 | "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.16.4->tokenizers->litellm>=1.44.5->hermes==0.1.0) (2024.6.1)\n", 206 | "Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.16.4->tokenizers->litellm>=1.44.5->hermes==0.1.0) (24.1)\n", 207 | "Downloading ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)\n", 208 | "Downloading groq-0.9.0-py3-none-any.whl (103 kB)\n", 209 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m103.5/103.5 kB\u001b[0m \u001b[31m3.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 210 | "\u001b[?25hDownloading litellm-1.44.5-py3-none-any.whl (8.5 MB)\n", 211 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m8.5/8.5 MB\u001b[0m \u001b[31m4.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 212 | "\u001b[?25hDownloading 
numpy-2.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.3 MB)\n", 213 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m16.3/16.3 MB\u001b[0m \u001b[31m4.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 214 | "\u001b[?25hDownloading openai-1.42.0-py3-none-any.whl (362 kB)\n", 215 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m362.9/362.9 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 216 | "\u001b[?25hDownloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n", 217 | "Downloading sounddevice-0.5.0-py3-none-any.whl (32 kB)\n", 218 | "Downloading yt_dlp-2024.8.6-py3-none-any.whl (3.1 MB)\n", 219 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 220 | "\u001b[?25hDownloading httpx-0.27.0-py3-none-any.whl (75 kB)\n", 221 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m3.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 222 | "\u001b[?25hDownloading httpcore-1.0.5-py3-none-any.whl (77 kB)\n", 223 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m3.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 224 | "\u001b[?25hDownloading jiter-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (318 kB)\n", 225 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m318.9/318.9 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 226 | "\u001b[?25hDownloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)\n", 227 | "Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n", 228 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m5.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 229 | "\u001b[?25hDownloading websockets-13.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (157 kB)\n", 230 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m157.2/157.2 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 231 | "\u001b[?25hDownloading Brotli-1.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.0 MB)\n", 232 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.0/3.0 MB\u001b[0m \u001b[31m5.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 233 | "\u001b[?25hDownloading mutagen-1.47.0-py3-none-any.whl (194 kB)\n", 234 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.4/194.4 kB\u001b[0m \u001b[31m6.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 235 | "\u001b[?25hDownloading pycryptodomex-3.20.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)\n", 236 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.1/2.1 MB\u001b[0m \u001b[31m6.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 237 | "\u001b[?25hDownloading h11-0.14.0-py3-none-any.whl (58 kB)\n", 238 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 239 | "\u001b[?25hBuilding wheels for collected packages: hermes\n", 240 | " Building wheel for hermes 
(setup.py) ... \u001b[?25l\u001b[?25hdone\n", 241 | " Created wheel for hermes: filename=hermes-0.1.0-py3-none-any.whl size=21951 sha256=15f35509dabd77bc8aa8460fe4f2f5ea20ed6a7c983a9bf2d5ca301b3d1eff0f\n", 242 | " Stored in directory: /tmp/pip-ephem-wheel-cache-xuakz5yd/wheels/00/0d/06/7e37737e241322e19a0a573c61b2a89008e212837bd3c730a1\n", 243 | "Successfully built hermes\n", 244 | "Installing collected packages: pydub, brotli, websockets, python-dotenv, pycryptodomex, numpy, mutagen, jiter, h11, ffmpeg-python, yt-dlp, tiktoken, sounddevice, httpcore, httpx, openai, groq, litellm, hermes\n", 245 | " Attempting uninstall: numpy\n", 246 | " Found existing installation: numpy 1.26.4\n", 247 | " Uninstalling numpy-1.26.4:\n", 248 | " Successfully uninstalled numpy-1.26.4\n", 249 | "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", 250 | "xgboost 2.1.1 requires nvidia-nccl-cu12; platform_system == \"Linux\" and platform_machine != \"aarch64\", which is not installed.\n", 251 | "accelerate 0.32.1 requires numpy<2.0.0,>=1.17, but you have numpy 2.1.0 which is incompatible.\n", 252 | "albucore 0.0.13 requires numpy<2,>=1.24.4, but you have numpy 2.1.0 which is incompatible.\n", 253 | "arviz 0.18.0 requires numpy<2.0,>=1.23.0, but you have numpy 2.1.0 which is incompatible.\n", 254 | "cudf-cu12 24.4.1 requires numpy<2.0a0,>=1.23, but you have numpy 2.1.0 which is incompatible.\n", 255 | "cupy-cuda12x 12.2.0 requires numpy<1.27,>=1.20, but you have numpy 2.1.0 which is incompatible.\n", 256 | "gensim 4.3.3 requires numpy<2.0,>=1.18.5, but you have numpy 2.1.0 which is incompatible.\n", 257 | "ibis-framework 8.0.0 requires numpy<2,>=1, but you have numpy 2.1.0 which is incompatible.\n", 258 | "numba 0.60.0 requires numpy<2.1,>=1.22, but you have numpy 2.1.0 which is incompatible.\n", 259 | "pandas 2.1.4 requires numpy<2,>=1.22.4; python_version < \"3.11\", but you have numpy 2.1.0 which is incompatible.\n", 260 | "rmm-cu12 24.4.0 requires numpy<2.0a0,>=1.23, but you have numpy 2.1.0 which is incompatible.\n", 261 | "scikit-learn 1.3.2 requires numpy<2.0,>=1.17.3, but you have numpy 2.1.0 which is incompatible.\n", 262 | "tensorflow 2.17.0 requires numpy<2.0.0,>=1.23.5; python_version <= \"3.11\", but you have numpy 2.1.0 which is incompatible.\n", 263 | "thinc 8.2.5 requires numpy<2.0.0,>=1.19.0; python_version >= \"3.9\", but you have numpy 2.1.0 which is incompatible.\n", 264 | "transformers 4.42.4 requires numpy<2.0,>=1.17, but you have numpy 2.1.0 which is incompatible.\u001b[0m\u001b[31m\n", 265 | "\u001b[0mSuccessfully installed brotli-1.1.0 ffmpeg-python-0.2.0 groq-0.9.0 h11-0.14.0 hermes-0.1.0 httpcore-1.0.5 httpx-0.27.0 jiter-0.5.0 litellm-1.44.5 mutagen-1.47.0 numpy-2.1.0 openai-1.42.0 pycryptodomex-3.20.0 pydub-0.25.1 python-dotenv-1.0.1 sounddevice-0.5.0 tiktoken-0.7.0 websockets-13.0 yt-dlp-2024.8.6\n" 266 | ] 267 | } 268 | ], 269 | "source": [ 270 | "!pip install git+https://github.com/unclecode/hermes.git@main" 271 | ], 272 | "id": "xoAvO9jRCIF0" 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": { 277 | "id": "8BNFpK6ZCIF1" 278 | }, 279 | "source": [ 280 | "## Python Library" 281 | ], 282 | "id": "8BNFpK6ZCIF1" 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": { 287 | "id": "F_u-xdquCIF1" 288 | }, 289 | "source": [ 290 | "### Basic Transcription\n", 291 | "\n", 292 | "Here's how to transcribe a local video file using 
the `transcribe` function with Groq as the provider:" 293 | ], 294 | "id": "F_u-xdquCIF1" 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 1, 299 | "metadata": { 300 | "colab": { 301 | "base_uri": "https://localhost:8080/" 302 | }, 303 | "id": "D4OM8q8RCIF1", 304 | "outputId": "4cd04ee5-43f0-4a84-df6c-54a85826d4c0" 305 | }, 306 | "outputs": [ 307 | { 308 | "output_type": "stream", 309 | "name": "stdout", 310 | "text": [ 311 | " Hello, your beautiful people. This is Uncle Code and today I'm going to review quickly the Q1-2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with the really good stuff with 72 billion models and you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions. And that is great because this is a large language model sounds like all of us and it's cool and go and play around with it so what I'm what I'm trying to do is trying to challenge a little bit the function calling like similar the thing that I did for other models like the mistral so first they have this nice library the coin agents that led you to create agentic software and applications and also speed up the work with the large language model what you can just install it quickly and then there are different ways that you can work with the model you can maybe use Olama or Lama CCP and then you download it and install it or I just choose open rotor because they do have this carbon to 72 billion in Strug model and I took it from re-increting account so you can have an API tokens is not difficult anyway I just the code is here from their own GitHub if you want to use by Ollama then then okay then if if you You know what? When you call this models, in a typical way is you call a model, you pass the user query, and then the model returns you back a message that it has function called key in it, and content is empty or is not. And then you take the name of the function and then go to your functions and pass the detected, extract your arguments, get the result back, send it back to the model, and then the model produced the response for you. This process, when you do have multiple calls, a user asking you questions that then when we get the first response then we have another call and again another call so this is going to be in kind of iteration and then you have to continue doing that so what if down I created a helper function which get the LM instance in this example for the coin and then messages in the list of functions you know JSON schema And it goes through them and then there a loop and if the last message of the response from the LLM has function called, and get it out, they call the functions, and create the function message and send it back to the message in each story, and then iterate, till the point that there's no any kind of function call and return it back. So it's a great helper because you can see, you're gonna see it, because one of the thing that I want to try is this multiple call back to each other's. So that's one thing. And also I created some dummy functions. Get current weather, just slip a little bit, and then return back in JSON in always Celsius. This is get current temperature, which is very similar to this one. Doesn't have the format. I just wanted to have multiple version of it. You will see later. And get current time, convert temperature. 
And I do have some mathematical functions, sine, cosine, exponential, and power. And they're really calculating. If you remember, if you have seen my video about Mistrol, there I challenge it with the complicated mathematical expressions. I wanted to see if you can break it down and then doing it. It was, couldn't, you know, but sort of half-half. Now let's see what's going to happen for this model. First of all, this model will return one functions. So it's not a multiple function, the parallel functions. So you have to iterate, like what I said. Okay, the first example. the better luck today in Paris also tell me the time in San Francisco. So there are two questions not necessarily relevant to each order. You can do this first or you can do the other one. But I want to see how it goes. And I pass those two functions here and I call the execute function. And that's what we're going to see. When you call this, the first things come out of the model is the get current weather in the Paris and location. And this is function result. I should say function called function response, but better than make a function response. So my function, that executed help for function, executed the relevant functions, and got the dummy data, and then pass it to model, and now we have another function called from the model. Model asking for another function to be called is get current time, which is the look values of San Francisco, and then the dummy functions returns back to 10, and then we said the water in Paris in today is 10 degrees Celsius and current time in San Francisco is 10 because I just returned 10 That interesting So the model could really check each one of these questions separately It didn miss stuff between Paris and San Francisco And some weaker models, they do that. Perfect. And I tried, I think you'd be more than the two, three, four. You can continue doing that. That is really cool stuff. And then the second example is a little bit different. So I do have this message first. What's the word like today in Paris? So you can see that the second one is dependent on the first one. So there's a chain between these two, nested calls, binding, and there's dependency actually. So I pass the two functions, get current temperatures and convert temperature. And then let's see what's going to happen. So the first function called, the first function detected is the current temperature, is the Paris. So we execute the function, return it back to the model, and model it's rate again. And the second is the convert temperature, which is the 10 from the previous function called, it and then from Celsius to Fahrenheit and because it's a dummy function as it turns back at 10 always and we got this one perfect so this one Matt that that's really interesting very clean and then you can continue doing that to get the response and and you know by the way you can you can do some apply some smart trick to build your own a parallel function call I'll talk about it maybe someday even even the model doesn't support now Now let's go to the challenge that none of them could do it with the GPD. So I have sine, I have cosine, exponential in the power, and I'm passing this one. Calculate the result of power of sign of exponential 10 plus cosine of the power of two, exponent and then when you took it all, then there's another power, use this as a base and the four as exponent. Okay, so let's see what's going to happen. 
it here so we have a bunch of function called first of all fantastic that it didn't try to solve it by itself some of the models they do that so it took it really brick it down I like that first went for exponential 10 and that's what you got here then called the sign over the exponential 10 so that is great cool things then went for the power two three perfect then calculate the cosine of that nice eight and then got this and then got the power okay so what it did it did the addition by itself it did the addition between this part and this part let me see if the addition is correct minus this addition doesn't look correct it looks like it looks like subtraction i guess let me test it with my calculator uh is is 0.68 minus yeah it's Yeah, it's subtracted. Basically, this, if this minus this, it's going to give you the, but let me make sure everything is correct. Yeah. Well, okay, but that's beautiful. It got the exponent four, and this is great. This is fantastic. I mean, I think by maybe a little bit of adjustment, we can make it better. Or maybe we can create an extra function sum and don't let the model to do this edition by itself. We make it the sum like a binary operator, this sum of this, comma, this. You get what I mean? I will try that one. But that's fantastic. This absolutely, this is beautiful. This is great. Now I have, when I tried nostril, I said, I think so far this is the best function called model. But now I'm going to tell the same thing for this quote one. But seven to billion parameters, you know, that's the things. I like to have a function calling with 100 million parameters model. Anyway, this is great. This is a good model. I'm going to dig in more into this. And then that's it. I'll share the links to collapse so you can play around whether you can use that helper function. See how it goes. So this was Cohen 2. You can go to their blog. Definitely go to their blogs. A bunch of really interesting information over there. You can see what they have done. And that's it. So, Cohen 2, function calling. It was nice. And Uncle Code out.\n" 312 | ] 313 | } 314 | ], 315 | "source": [ 316 | "import os\n", 317 | "from google.colab import userdata\n", 318 | "os.environ['GROQ_API_KEY'] = userdata.get('GROQ_API_KEY')\n", 319 | "from hermes import transcribe\n", 320 | "\n", 321 | "# Replace with the actual path to your video file\n", 322 | "video_file = 'input.mp4'\n", 323 | "\n", 324 | "result = transcribe(video_file, provider='groq')\n", 325 | "\n", 326 | "# Print the transcription\n", 327 | "print(result['transcription'])" 328 | ], 329 | "id": "D4OM8q8RCIF1" 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": { 334 | "id": "aTZPOUVGCIF1" 335 | }, 336 | "source": [ 337 | "**Explanation:**\n", 338 | "\n", 339 | "- We import the `transcribe` function from the `hermes` library.\n", 340 | "- We provide the path to our video file.\n", 341 | "- We specify `provider='groq'` to use Groq's powerful transcription models.\n", 342 | "- The `transcribe` function returns a dictionary containing the transcription and other metadata." 343 | ], 344 | "id": "aTZPOUVGCIF1" 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": { 349 | "id": "O0mPy3xrCIF2" 350 | }, 351 | "source": [ 352 | "### Transcribing YouTube Videos\n", 353 | "\n", 354 | "Transcribing YouTube videos is a breeze with Hermes. 
Simply pass the YouTube URL to the `transcribe` function:" 355 | ], 356 | "id": "O0mPy3xrCIF2" 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": 2, 361 | "metadata": { 362 | "colab": { 363 | "base_uri": "https://localhost:8080/" 364 | }, 365 | "id": "TYyESkhMCIF2", 366 | "outputId": "f26119ea-13a3-4829-e1a4-db5dcc3fe154" 367 | }, 368 | "outputs": [ 369 | { 370 | "output_type": "stream", 371 | "name": "stdout", 372 | "text": [ 373 | "[youtube] Extracting URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ\n", 374 | "[youtube] dQw4w9WgXcQ: Downloading webpage\n", 375 | "[youtube] dQw4w9WgXcQ: Downloading ios player API JSON\n", 376 | "[youtube] dQw4w9WgXcQ: Downloading web creator player API JSON\n", 377 | "[youtube] dQw4w9WgXcQ: Downloading player a87a9450\n", 378 | "[youtube] dQw4w9WgXcQ: Downloading m3u8 information\n", 379 | "[info] dQw4w9WgXcQ: Downloading 1 format(s): 251\n", 380 | "[download] Destination: dQw4w9WgXcQ.webm\n", 381 | "[download] 100% of 3.28MiB in 00:00:00 at 10.12MiB/s \n", 382 | "[ExtractAudio] Destination: dQw4w9WgXcQ.mp3\n", 383 | "Deleting original file dQw4w9WgXcQ.webm (pass -k to keep)\n", 384 | " We're no strange to love. You know the rules, and so do I. I feel commitments want to thinking us. You wouldn't get this from any other guy I just want to tell you how I'm feeling Gotta make you understand Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell the lie And hurt you We've known each other for so long. Your heart's been aching, but you're too shy to say it. Inside we both know what's been going on. We know the game and we're gonna play it. And if you ask me how I'm feeling, don't tell me you're too blind to see. Never gonna give you up, never gonna let you down. down never gonna run around and desert you never gonna make you cry never gonna say goodbye never gonna tell a lie and hurt you never gonna give you up never gonna let you down never gonna run around and desert you never gonna make you cry never gonna say goodbye never gonna tell a bye and hurt you Never gonna give Never gonna give We've known each other For so long Your heart's been aching but You're too shy to say it It's how we both know what's been going on We know the game and we're gonna play it I just want to tell you how I feel like Gotta make you understand Never gonna give you up, never gonna let you down, never gonna run around and desert you, never gonna make you cry, never gonna say goodbye, never gonna tell a lie, and hurt you, never gonna give you up, never gonna let you down, never gonna run around and desert you, never gonna make you cry, never gonna say good bye, never gonna say good, never gonna say good Bye. Never going to say goodbye and hurt you. Never going to give you up. Never going to let you down. Never going to run around and desert you. Never going to make you cry. Never going to say goodbye. 
That I'm going to celebrate.\n" 385 | ] 386 | } 387 | ], 388 | "source": [ 389 | "from hermes import transcribe\n", 390 | "\n", 391 | "youtube_url = 'https://www.youtube.com/watch?v=PNulbFECY-I' # Example URL\n", 392 | "\n", 393 | "result = transcribe(youtube_url, provider='groq')\n", 394 | "print(result['transcription'])" 395 | ], 396 | "id": "TYyESkhMCIF2" 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": { 401 | "id": "K-Hky9-oCIF2" 402 | }, 403 | "source": [ 404 | "**Explanation:**\n", 405 | "\n", 406 | "- Hermes handles the YouTube video download automatically.\n", 407 | "- No need to manually download the video!" 408 | ], 409 | "id": "K-Hky9-oCIF2" 410 | }, 411 | { 412 | "cell_type": "markdown", 413 | "metadata": { 414 | "id": "YfGkV9jNCIF2" 415 | }, 416 | "source": [ 417 | "### Using Different Models\n", 418 | "\n", 419 | "Hermes supports various transcription models. You can specify the desired model using the `model` parameter:" 420 | ], 421 | "id": "YfGkV9jNCIF2" 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": 3, 426 | "metadata": { 427 | "colab": { 428 | "base_uri": "https://localhost:8080/" 429 | }, 430 | "id": "zQZH3e80CIF2", 431 | "outputId": "b8a9d956-b597-4a09-b374-c7556cfea79a" 432 | }, 433 | "outputs": [ 434 | { 435 | "output_type": "stream", 436 | "name": "stdout", 437 | "text": [ 438 | " Hello, you beautiful people. This is Uncle Code. And today I'm going to review quickly the Q1.2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with a really good stuff with 72 billion models. And you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions, and that is great because this is a large language model. sounds like all of us. And it's cool, and go and play around with it. So what I'm trying to do is trying to challenge a little bit the function calling. Like similar to the thing that I did for other models like the Mistral. So first, they have this nice library, the Cohen agents, that let you to create agent-next softwares and applications and also speed up the work with the large language model. What you can just install it quickly. And then there are different ways that you can work with the model. You can maybe use a Lama or Lama CCP and then you download it and install it. Or I just choose OpenRotor because they do have this going to 72 billion in stroke model. And I took it from reincred in accounts. You can have an API tokens, it's not difficult. Anyway, I just, the code is here from their own GitHub. If you want to use by Ollama. Then, okay, then if, you know what, when you call this models in typical way is you call a model, you pass the user query, and then the model returns you back a message that it has function called key in it, and content is empty or is not. And then you take the name of the function and then go to your functions and pass the detected, extracted arguments, get the result back, send it back to the model, and then the model produce the response for you, right? This process, when you do have multiple calls, let's say user asking you questions, that then when we get the first response, then we have another call, and again another call. So this is gonna be in kind of iteration. And then you have to continue doing that. 
So what I done I created a helper function which get the LME stance in this example for the coin and then messages in the list of functions you know JSON schema And it goes through them and then there a loop and if the last message of the response from the LLM has function call, it get it out, it call the functions and create the function message and send it back to the messaging story, and then iterate to the point that there's no any kind of function call and return it back. So it's a great helper because you can see, you're going to see, because one of the things I want to try is this multiple call back to each other. So that's one thing. And also I created some dummy functions, get current weather, it just slip a little bit and then return back in JSON in always Celsius. This is get current temperature, which is very similar to this one. It doesn't have the format. I just wanted to have multiple version of it. You will see later. And get current time, convert temperature. And I do have some mathematical functions, sine, cosine, exponential, and power and they're really calculating if you remember if you have seen my video about mistral there i challenge it with the complicated mathematical expressions i wanted to see if can break it down and then doing it it was couldn't you know but sort of half half now let's see what's going to happen for this model first of all this model always returns one functions so it's not a multiple function the parallel functions so you have to each way like what i said okay the first example what's the better luck today in paris also tell me the time in san francisco so there are two questions not necessarily relevant to each order so you can do this first or you can do the other one but i want to see how it goes and i pass those two functions here and i call the execute function and that's what we're going to see when you call this the first things come out of the model is the get current weather in the Paris and location and this is function result I should say function called function response but better than making function response so so my function that execute help for function executed the relevant functions and got the dummy data and then pass it to model and now we have another function called from the model model asking for the function to be called is get parent time which is the look values of San Francisco and then the dummy functions returns back to 10 and then we said the water in Paris in today is 10 degrees Celsius and current time in San Francisco is 10 because I just returned 10 That interesting So the model could really check each one of these questions separately It didn mess up between Paris and San Francisco. And some weaker models, they do that. Perfect. And I tried, I think you'd be more than the two, three, four. You can continue doing that. That is really cool stuff. And then the second example is a little bit different. So I do have this message first. What's the water like today in Paris, also converted to Fahrenheit. So you can see that the second one is dependent on the first one. So there's a chain between these two, nested calls, binding, and there's dependency actually. So I pass the two functions, get current temperatures and convert temperature, and then let's see what's going to happen. So first function call, the first function detected is the current temperature is the Paris. So we execute the function, return it back to the model, and model iterate again. 
And the second is the convert temperature, which is the 10 from the previous function call, took it, and then from Celsius to Fahrenheit, and because it's a dummy function, it returns back at 10 always, and we got this one. Perfect. So this one, that's really interesting, very clean, and then you can continue doing that to get the response. And you know, by the way, you can apply some smart trick to build your own parallel function call. I'll talk about it maybe someday even even the model doesn't support now let's go to the challenge that none of them could do it even invented gpd so i have sine i have cosine exponential in the power and i'm passing this one calculate the result of power of sine of exponential 10 plus cosine of the power of 2 exponent in 3 and then when you took it all then there's another power use this as a base and the for as exponent okay so let's see what's going to happen execute it here so we have a bunch of function calls first of all fantastic that it didn't try to solve it by itself some of the models they do that so it took it really break it down i like that first went for exponential 10 and and that what it got here then called the sign over the exponential 10 so that is great cool things then went for the power two three perfect then calculate the cosine of that nice eight and then got this and then got the power okay so what it did it did the addition by itself it did the addition between this part and this part let me see if the addition is correct minus this the addition doesn't looks correct it looks like it looks like subtraction i guess let me test it with my calculator uh is is 0.68 minus yeah it's yeah it's it's subtracted basically this if if this minus this it going to give you the but let me make sure everything is correct yeah well okay but that's beautiful it got the exponent four and and this is great this is fantastic i mean i think by maybe a little bit of adjustment we can make it better or maybe we can create an extra function sum and don't let the model to do this addition by itself we make it the sum like a binary operator There's this sum of this comma this. You get what I mean? I will try that one. But that's just fantastic. Absolutely, this is beautiful. This is great. Now, when I tried NISTROL, I said, I think so far this is the best function call model. But now I'm going to tell the same thing for this code. But 72 billion parameters. You know, that's the things. I like to have a function calling with 100 million parameters model. 
anyway um this is great this is a good model i i'm gonna dig in more into this and then that's it uh i'll share the links to colab so you can play around with it you can use that helper function see how it goes so this was cohen2 you can go to their blog definitely go to their blogs and read that a bunch of really interesting information over there you can see what they have done and that's it so cohen 2 function calling it was nice and uncle code out\n" 439 | ] 440 | } 441 | ], 442 | "source": [ 443 | "from hermes import transcribe\n", 444 | "\n", 445 | "video_file = 'input.mp4'\n", 446 | "\n", 447 | "result = transcribe(video_file, provider='groq', model='whisper-large-v3')\n", 448 | "print(result['transcription'])" 449 | ], 450 | "id": "zQZH3e80CIF2" 451 | }, 452 | { 453 | "cell_type": "markdown", 454 | "metadata": { 455 | "id": "iAuGXpr_CIF2" 456 | }, 457 | "source": [ 458 | "**Explanation:**\n", 459 | "\n", 460 | "- Here, we use the `whisper-large-v3` model instead of the default `distil-whisper` model." 461 | ], 462 | "id": "iAuGXpr_CIF2" 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": { 467 | "id": "dn0HQ0vICIF2" 468 | }, 469 | "source": [ 470 | "### JSON Output and LLM Processing\n", 471 | "\n", 472 | "- To get the transcription in JSON format, set `response_format='json'`.\n", 473 | "- To further process the transcription with an LLM (e.g., for summarization), use the `llm_prompt` parameter:" 474 | ], 475 | "id": "dn0HQ0vICIF2" 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": 4, 480 | "metadata": { 481 | "colab": { 482 | "base_uri": "https://localhost:8080/" 483 | }, 484 | "id": "RK077hZiCIF2", 485 | "outputId": "05c9ae86-9aea-4e8c-c391-48f747af14d5" 486 | }, 487 | "outputs": [ 488 | { 489 | "output_type": "stream", 490 | "name": "stdout", 491 | "text": [ 492 | "{\"text\":\" Hello, your beautiful people. This is Uncle Code and today I'm going to review quickly the Q1-2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with the really good stuff with 72 billion models and you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions. And that is great because this is a large language model sounds like all of us and it's cool and go and play around with it so what I'm what I'm trying to do is trying to challenge a little bit the function calling like similar the thing that I did for other models like the mistral so first they have this nice library the coin agents that led you to create agentic software and applications and also speed up the work with the large language model what you can just install it quickly and then there are different ways that you can work with the model you can maybe use Olama or Lama CCP and then you download it and install it or I just choose open rotor because they do have this carbon to 72 billion in Strug model and I took it from re-increting account so you can have an API tokens is not difficult anyway I just the code is here from their own GitHub if you want to use by Ollama then then okay then if if you You know what? When you call this models, in a typical way is you call a model, you pass the user query, and then the model returns you back a message that it has function called key in it, and content is empty or is not. 
And then you take the name of the function and then go to your functions and pass the detected, extract your arguments, get the result back, send it back to the model, and then the model produced the response for you. This process, when you do have multiple calls, a user asking you questions that then when we get the first response then we have another call and again another call so this is going to be in kind of iteration and then you have to continue doing that so what if down I created a helper function which get the LM instance in this example for the coin and then messages in the list of functions you know JSON schema And it goes through them and then there a loop and if the last message of the response from the LLM has function called, and get it out, they call the functions, and create the function message and send it back to the message in each story, and then iterate, till the point that there's no any kind of function call and return it back. So it's a great helper because you can see, you're gonna see it, because one of the thing that I want to try is this multiple call back to each other's. So that's one thing. And also I created some dummy functions. Get current weather, just slip a little bit, and then return back in JSON in always Celsius. This is get current temperature, which is very similar to this one. Doesn't have the format. I just wanted to have multiple version of it. You will see later. And get current time, convert temperature. And I do have some mathematical functions, sine, cosine, exponential, and power. And they're really calculating. If you remember, if you have seen my video about Mistrol, there I challenge it with the complicated mathematical expressions. I wanted to see if you can break it down and then doing it. It was, couldn't, you know, but sort of half-half. Now let's see what's going to happen for this model. First of all, this model will return one functions. So it's not a multiple function, the parallel functions. So you have to iterate, like what I said. Okay, the first example. the better luck today in Paris also tell me the time in San Francisco. So there are two questions not necessarily relevant to each order. You can do this first or you can do the other one. But I want to see how it goes. And I pass those two functions here and I call the execute function. And that's what we're going to see. When you call this, the first things come out of the model is the get current weather in the Paris and location. And this is function result. I should say function called function response, but better than make a function response. So my function, that executed help for function, executed the relevant functions, and got the dummy data, and then pass it to model, and now we have another function called from the model. Model asking for another function to be called is get current time, which is the look values of San Francisco, and then the dummy functions returns back to 10, and then we said the water in Paris in today is 10 degrees Celsius and current time in San Francisco is 10 because I just returned 10 That interesting So the model could really check each one of these questions separately It didn miss stuff between Paris and San Francisco And some weaker models, they do that. Perfect. And I tried, I think you'd be more than the two, three, four. You can continue doing that. That is really cool stuff. And then the second example is a little bit different. So I do have this message first. What's the word like today in Paris? 
So you can see that the second one is dependent on the first one. So there's a chain between these two, nested calls, binding, and there's dependency actually. So I pass the two functions, get current temperatures and convert temperature. And then let's see what's going to happen. So the first function called, the first function detected is the current temperature, is the Paris. So we execute the function, return it back to the model, and model it's rate again. And the second is the convert temperature, which is the 10 from the previous function called, it and then from Celsius to Fahrenheit and because it's a dummy function as it turns back at 10 always and we got this one perfect so this one Matt that that's really interesting very clean and then you can continue doing that to get the response and and you know by the way you can you can do some apply some smart trick to build your own a parallel function call I'll talk about it maybe someday even even the model doesn't support now Now let's go to the challenge that none of them could do it with the GPD. So I have sine, I have cosine, exponential in the power, and I'm passing this one. Calculate the result of power of sign of exponential 10 plus cosine of the power of two, exponent and then when you took it all, then there's another power, use this as a base and the four as exponent. Okay, so let's see what's going to happen. it here so we have a bunch of function called first of all fantastic that it didn't try to solve it by itself some of the models they do that so it took it really brick it down I like that first went for exponential 10 and that's what you got here then called the sign over the exponential 10 so that is great cool things then went for the power two three perfect then calculate the cosine of that nice eight and then got this and then got the power okay so what it did it did the addition by itself it did the addition between this part and this part let me see if the addition is correct minus this addition doesn't look correct it looks like it looks like subtraction i guess let me test it with my calculator uh is is 0.68 minus yeah it's Yeah, it's subtracted. Basically, this, if this minus this, it's going to give you the, but let me make sure everything is correct. Yeah. Well, okay, but that's beautiful. It got the exponent four, and this is great. This is fantastic. I mean, I think by maybe a little bit of adjustment, we can make it better. Or maybe we can create an extra function sum and don't let the model to do this edition by itself. We make it the sum like a binary operator, this sum of this, comma, this. You get what I mean? I will try that one. But that's fantastic. This absolutely, this is beautiful. This is great. Now I have, when I tried nostril, I said, I think so far this is the best function called model. But now I'm going to tell the same thing for this quote one. But seven to billion parameters, you know, that's the things. I like to have a function calling with 100 million parameters model. Anyway, this is great. This is a good model. I'm going to dig in more into this. And then that's it. I'll share the links to collapse so you can play around whether you can use that helper function. See how it goes. So this was Cohen 2. You can go to their blog. Definitely go to their blogs. A bunch of really interesting information over there. You can see what they have done. And that's it. So, Cohen 2, function calling. It was nice. 
And Uncle Code out.\",\"x_groq\":{\"id\":\"req_01j67j2k3bf4br65txs4fnzpf3\"}}\n", 493 | "\n", 494 | "Here are 3 bullet points summarizing the transcription:\n", 495 | "\n", 496 | "* The speaker, Uncle Code, reviews the Q1-2 function calling ability of a large language model and finds it impressive, comparing it to its previous experiments with other models. He highlights that the model has 72 billion parameters and supports multiple languages and regions.\n", 497 | "* Uncle Code creates a helper function to simplify function calling with the model by passing a list of functions, which are then iteratively called and executed by the model, resulting in a cascading response. He demonstrates this by passing multiple functions and shows the model's ability to handle nested calls and dependencies.\n", 498 | "* The speaker also conducts an experiment where he challenges the model with a complex mathematical expression involving trigonometric and exponential functions. To his surprise, the model breaks down the expression into smaller, executable components, unlike other models that may try to solve it directly or fail. Uncle Code is impressed with the model's ability to handle function calling and suggests potential improvements to optimize the process further.\n" 499 | ] 500 | } 501 | ], 502 | "source": [ 503 | "from hermes import transcribe\n", 504 | "\n", 505 | "video_file = 'input.mp4'\n", 506 | "\n", 507 | "# Get JSON output\n", 508 | "result = transcribe(video_file, provider='groq', response_format='json')\n", 509 | "print(result['transcription'])\n", 510 | "\n", 511 | "# Summarize with LLM\n", 512 | "result = transcribe(video_file, provider='groq', llm_prompt=\"Summarize this transcription in 3 bullet points\")\n", 513 | "print(result['llm_processed'])" 514 | ], 515 | "id": "RK077hZiCIF2" 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": { 520 | "id": "2Gi2p6ONCIF2" 521 | }, 522 | "source": [ 523 | "**Explanation:**\n", 524 | "\n", 525 | "- LLM processing requires an API key for the LLM provider (e.g., Groq). Make sure to set it up in your `~/.hermes/config.yml` file or as an environment variable." 526 | ], 527 | "id": "2Gi2p6ONCIF2" 528 | }, 529 | { 530 | "cell_type": "markdown", 531 | "metadata": { 532 | "id": "K12tLtI7CIF3" 533 | }, 534 | "source": [ 535 | "## Command Line Interface (CLI)\n", 536 | "\n", 537 | "Hermes also provides a convenient CLI for transcribing videos. Here are some examples:" 538 | ], 539 | "id": "K12tLtI7CIF3" 540 | }, 541 | { 542 | "cell_type": "markdown", 543 | "metadata": { 544 | "id": "IYxTYt86CIF3" 545 | }, 546 | "source": [ 547 | "### Basic Usage" 548 | ], 549 | "id": "IYxTYt86CIF3" 550 | }, 551 | { 552 | "cell_type": "code", 553 | "execution_count": 5, 554 | "metadata": { 555 | "colab": { 556 | "base_uri": "https://localhost:8080/" 557 | }, 558 | "id": "2PEQ0tLpCIF3", 559 | "outputId": "009c2a4e-0e25-4010-b301-708ae9e10744" 560 | }, 561 | "outputs": [ 562 | { 563 | "output_type": "stream", 564 | "name": "stdout", 565 | "text": [ 566 | " Hello, your beautiful people. This is Uncle Code and today I'm going to review quickly the Q1-2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with the really good stuff with 72 billion models and you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions. 
And that is great because this is a large language model sounds like all of us and it's cool and go and play around with it so what I'm what I'm trying to do is trying to challenge a little bit the function calling like similar the thing that I did for other models like the mistral so first they have this nice library the coin agents that led you to create agentic software and applications and also speed up the work with the large language model what you can just install it quickly and then there are different ways that you can work with the model you can maybe use Olama or Lama CCP and then you download it and install it or I just choose open rotor because they do have this carbon to 72 billion in Strug model and I took it from re-increting account so you can have an API tokens is not difficult anyway I just the code is here from their own GitHub if you want to use by Ollama then then okay then if if you You know what? When you call this models, in a typical way is you call a model, you pass the user query, and then the model returns you back a message that it has function called key in it, and content is empty or is not. And then you take the name of the function and then go to your functions and pass the detected, extract your arguments, get the result back, send it back to the model, and then the model produced the response for you. This process, when you do have multiple calls, a user asking you questions that then when we get the first response then we have another call and again another call so this is going to be in kind of iteration and then you have to continue doing that so what if down I created a helper function which get the LM instance in this example for the coin and then messages in the list of functions you know JSON schema And it goes through them and then there a loop and if the last message of the response from the LLM has function called, and get it out, they call the functions, and create the function message and send it back to the message in each story, and then iterate, till the point that there's no any kind of function call and return it back. So it's a great helper because you can see, you're gonna see it, because one of the thing that I want to try is this multiple call back to each other's. So that's one thing. And also I created some dummy functions. Get current weather, just slip a little bit, and then return back in JSON in always Celsius. This is get current temperature, which is very similar to this one. Doesn't have the format. I just wanted to have multiple version of it. You will see later. And get current time, convert temperature. And I do have some mathematical functions, sine, cosine, exponential, and power. And they're really calculating. If you remember, if you have seen my video about Mistrol, there I challenge it with the complicated mathematical expressions. I wanted to see if you can break it down and then doing it. It was, couldn't, you know, but sort of half-half. Now let's see what's going to happen for this model. First of all, this model will return one functions. So it's not a multiple function, the parallel functions. So you have to iterate, like what I said. Okay, the first example. the better luck today in Paris also tell me the time in San Francisco. So there are two questions not necessarily relevant to each order. You can do this first or you can do the other one. But I want to see how it goes. And I pass those two functions here and I call the execute function. And that's what we're going to see. 
When you call this, the first things come out of the model is the get current weather in the Paris and location. And this is function result. I should say function called function response, but better than make a function response. So my function, that executed help for function, executed the relevant functions, and got the dummy data, and then pass it to model, and now we have another function called from the model. Model asking for another function to be called is get current time, which is the look values of San Francisco, and then the dummy functions returns back to 10, and then we said the water in Paris in today is 10 degrees Celsius and current time in San Francisco is 10 because I just returned 10 That interesting So the model could really check each one of these questions separately It didn miss stuff between Paris and San Francisco And some weaker models, they do that. Perfect. And I tried, I think you'd be more than the two, three, four. You can continue doing that. That is really cool stuff. And then the second example is a little bit different. So I do have this message first. What's the word like today in Paris? So you can see that the second one is dependent on the first one. So there's a chain between these two, nested calls, binding, and there's dependency actually. So I pass the two functions, get current temperatures and convert temperature. And then let's see what's going to happen. So the first function called, the first function detected is the current temperature, is the Paris. So we execute the function, return it back to the model, and model it's rate again. And the second is the convert temperature, which is the 10 from the previous function called, it and then from Celsius to Fahrenheit and because it's a dummy function as it turns back at 10 always and we got this one perfect so this one Matt that that's really interesting very clean and then you can continue doing that to get the response and and you know by the way you can you can do some apply some smart trick to build your own a parallel function call I'll talk about it maybe someday even even the model doesn't support now Now let's go to the challenge that none of them could do it with the GPD. So I have sine, I have cosine, exponential in the power, and I'm passing this one. Calculate the result of power of sign of exponential 10 plus cosine of the power of two, exponent and then when you took it all, then there's another power, use this as a base and the four as exponent. Okay, so let's see what's going to happen. it here so we have a bunch of function called first of all fantastic that it didn't try to solve it by itself some of the models they do that so it took it really brick it down I like that first went for exponential 10 and that's what you got here then called the sign over the exponential 10 so that is great cool things then went for the power two three perfect then calculate the cosine of that nice eight and then got this and then got the power okay so what it did it did the addition by itself it did the addition between this part and this part let me see if the addition is correct minus this addition doesn't look correct it looks like it looks like subtraction i guess let me test it with my calculator uh is is 0.68 minus yeah it's Yeah, it's subtracted. Basically, this, if this minus this, it's going to give you the, but let me make sure everything is correct. Yeah. Well, okay, but that's beautiful. It got the exponent four, and this is great. This is fantastic. 
I mean, I think by maybe a little bit of adjustment, we can make it better. Or maybe we can create an extra function sum and don't let the model to do this edition by itself. We make it the sum like a binary operator, this sum of this, comma, this. You get what I mean? I will try that one. But that's fantastic. This absolutely, this is beautiful. This is great. Now I have, when I tried nostril, I said, I think so far this is the best function called model. But now I'm going to tell the same thing for this quote one. But seven to billion parameters, you know, that's the things. I like to have a function calling with 100 million parameters model. Anyway, this is great. This is a good model. I'm going to dig in more into this. And then that's it. I'll share the links to collapse so you can play around whether you can use that helper function. See how it goes. So this was Cohen 2. You can go to their blog. Definitely go to their blogs. A bunch of really interesting information over there. You can see what they have done. And that's it. So, Cohen 2, function calling. It was nice. And Uncle Code out.\n" 567 | ] 568 | } 569 | ], 570 | "source": [ 571 | "!hermes input.mp4 -p groq" 572 | ], 573 | "id": "2PEQ0tLpCIF3" 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": { 578 | "id": "J-JuXmM0CIF3" 579 | }, 580 | "source": [ 581 | "### YouTube Videos" 582 | ], 583 | "id": "J-JuXmM0CIF3" 584 | }, 585 | { 586 | "cell_type": "code", 587 | "execution_count": 7, 588 | "metadata": { 589 | "colab": { 590 | "base_uri": "https://localhost:8080/" 591 | }, 592 | "id": "TJbF3j6bCIF3", 593 | "outputId": "fb4142a5-18c0-4b28-fc7b-5dd16ed1c636" 594 | }, 595 | "outputs": [ 596 | { 597 | "output_type": "stream", 598 | "name": "stdout", 599 | "text": [ 600 | "[youtube] Extracting URL: https://www.youtube.com/watch?v=PNulbFECY-I\n", 601 | "[youtube] PNulbFECY-I: Downloading webpage\n", 602 | "[youtube] PNulbFECY-I: Downloading ios player API JSON\n", 603 | "[youtube] PNulbFECY-I: Downloading web creator player API JSON\n", 604 | "[youtube] PNulbFECY-I: Downloading m3u8 information\n", 605 | "[info] PNulbFECY-I: Downloading 1 format(s): 251\n", 606 | "[download] Destination: PNulbFECY-I.webm\n", 607 | "\u001b[K[download] 100% of 2.13MiB in \u001b[1;37m00:00:00\u001b[0m at \u001b[0;32m6.55MiB/s\u001b[0m\n", 608 | "[ExtractAudio] Destination: PNulbFECY-I.mp3\n", 609 | "Deleting original file PNulbFECY-I.webm (pass -k to keep)\n", 610 | " Much is said about the virtues and pleasures of individuality, of being someone who stands out from the crowd and delights in their own particularity. But let's also admit to how frankly lonely and frightening it can be to find ourselves yet again in a peculiar minority where the differences between us and others strike us as bewildering rather than emboldening. When, for example, everyone seems to want to gossip, but we prefer generosity and forgiveness. When everyone is at ease, but we're melancholy and self-conscious. When everyone is cheerful, but we can't seem to let go of anxiety and apprehension. When everyone seems confident, but we feel suspicious and ashamed of ourselves. When everyone is contented in their couples, but we're still searching. for a home, when everyone worries passionately about the future of the planet, but we feel cold and at times almost indifferent, when everyone seems to love life, but we're not sure if we quite do. 
At such times we might benefit from a few thoughts to alleviate the isolation Firstly we don know reality as well as we imagine What we believe that everyone is like may not be how they actually are We may have more friends than we think Also we getting statistics wrong These four or eight or twelve people in a room don't represent all of humanity. The 80 or so people in our extended social group are in fact always a minuscule part of the human story. There are still so many friends left to meet. Also, perhaps our existing companions actually know much more about the material we feel alone with than we suspect. They and we simply haven't found a way to share our true selves. Maybe they will feel what we feel one day, just not yet. It may be fine to belong to a minority. Minorities have sheltered some of the most accomplished spirits of it ever lived. Lived. Isolation may just be a price we have to pay for a certain complexity of mind. And lastly, we have art to bridge the gaps between ourselves and other people. Bookshops are an ideal destination for the lonely, given how many books were written because their authors couldn't find anyone to talk to. Maybe there are people nearby, perhaps in this community, who would understand who would understand very well indeed.\n" 611 | ] 612 | } 613 | ], 614 | "source": [ 615 | "!hermes https://www.youtube.com/watch?v=PNulbFECY-I -p groq" 616 | ], 617 | "id": "TJbF3j6bCIF3" 618 | }, 619 | { 620 | "cell_type": "markdown", 621 | "metadata": { 622 | "id": "qxauRAa8CIF3" 623 | }, 624 | "source": [ 625 | "### Different Models" 626 | ], 627 | "id": "qxauRAa8CIF3" 628 | }, 629 | { 630 | "cell_type": "code", 631 | "execution_count": 8, 632 | "metadata": { 633 | "colab": { 634 | "base_uri": "https://localhost:8080/" 635 | }, 636 | "id": "eCa8loClCIF3", 637 | "outputId": "a6678b3c-7b4e-48f0-819a-64cd8f4b73db" 638 | }, 639 | "outputs": [ 640 | { 641 | "output_type": "stream", 642 | "name": "stdout", 643 | "text": [ 644 | " Hello, you beautiful people. This is Uncle Code. And today I'm going to review quickly the Q1.2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with a really good stuff with 72 billion models. And you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions, and that is great because this is a large language model. sounds like all of us. And it's cool, and go and play around with it. So what I'm trying to do is trying to challenge a little bit the function calling. Like similar to the thing that I did for other models like the Mistral. So first, they have this nice library, the Cohen agents, that let you to create agent-next softwares and applications and also speed up the work with the large language model. What you can just install it quickly. And then there are different ways that you can work with the model. You can maybe use a Lama or Lama CCP and then you download it and install it. Or I just choose OpenRotor because they do have this going to 72 billion in stroke model. And I took it from reincred in accounts. You can have an API tokens, it's not difficult. Anyway, I just, the code is here from their own GitHub. If you want to use by Ollama. Then, okay, then if, you know what, when you call this models in typical way is you call a model, you pass the user query, and then the model returns you back a message that it has function called key in it, and content is empty or is not. 
And then you take the name of the function and then go to your functions and pass the detected, extracted arguments, get the result back, send it back to the model, and then the model produce the response for you, right? This process, when you do have multiple calls, let's say user asking you questions, that then when we get the first response, then we have another call, and again another call. So this is gonna be in kind of iteration. And then you have to continue doing that. So what I done I created a helper function which get the LME stance in this example for the coin and then messages in the list of functions you know JSON schema And it goes through them and then there a loop and if the last message of the response from the LLM has function call, it get it out, it call the functions and create the function message and send it back to the messaging story, and then iterate to the point that there's no any kind of function call and return it back. So it's a great helper because you can see, you're going to see, because one of the things I want to try is this multiple call back to each other. So that's one thing. And also I created some dummy functions, get current weather, it just slip a little bit and then return back in JSON in always Celsius. This is get current temperature, which is very similar to this one. It doesn't have the format. I just wanted to have multiple version of it. You will see later. And get current time, convert temperature. And I do have some mathematical functions, sine, cosine, exponential, and power and they're really calculating if you remember if you have seen my video about mistral there i challenge it with the complicated mathematical expressions i wanted to see if can break it down and then doing it it was couldn't you know but sort of half half now let's see what's going to happen for this model first of all this model always returns one functions so it's not a multiple function the parallel functions so you have to each way like what i said okay the first example what's the better luck today in paris also tell me the time in san francisco so there are two questions not necessarily relevant to each order so you can do this first or you can do the other one but i want to see how it goes and i pass those two functions here and i call the execute function and that's what we're going to see when you call this the first things come out of the model is the get current weather in the Paris and location and this is function result I should say function called function response but better than making function response so so my function that execute help for function executed the relevant functions and got the dummy data and then pass it to model and now we have another function called from the model model asking for the function to be called is get parent time which is the look values of San Francisco and then the dummy functions returns back to 10 and then we said the water in Paris in today is 10 degrees Celsius and current time in San Francisco is 10 because I just returned 10 That interesting So the model could really check each one of these questions separately It didn mess up between Paris and San Francisco. And some weaker models, they do that. Perfect. And I tried, I think you'd be more than the two, three, four. You can continue doing that. That is really cool stuff. And then the second example is a little bit different. So I do have this message first. What's the water like today in Paris, also converted to Fahrenheit. 
So you can see that the second one is dependent on the first one. So there's a chain between these two, nested calls, binding, and there's dependency actually. So I pass the two functions, get current temperatures and convert temperature, and then let's see what's going to happen. So first function call, the first function detected is the current temperature is the Paris. So we execute the function, return it back to the model, and model iterate again. And the second is the convert temperature, which is the 10 from the previous function call, took it, and then from Celsius to Fahrenheit, and because it's a dummy function, it returns back at 10 always, and we got this one. Perfect. So this one, that's really interesting, very clean, and then you can continue doing that to get the response. And you know, by the way, you can apply some smart trick to build your own parallel function call. I'll talk about it maybe someday even even the model doesn't support now let's go to the challenge that none of them could do it even invented gpd so i have sine i have cosine exponential in the power and i'm passing this one calculate the result of power of sine of exponential 10 plus cosine of the power of 2 exponent in 3 and then when you took it all then there's another power use this as a base and the for as exponent okay so let's see what's going to happen execute it here so we have a bunch of function calls first of all fantastic that it didn't try to solve it by itself some of the models they do that so it took it really break it down i like that first went for exponential 10 and and that what it got here then called the sign over the exponential 10 so that is great cool things then went for the power two three perfect then calculate the cosine of that nice eight and then got this and then got the power okay so what it did it did the addition by itself it did the addition between this part and this part let me see if the addition is correct minus this the addition doesn't looks correct it looks like it looks like subtraction i guess let me test it with my calculator uh is is 0.68 minus yeah it's yeah it's it's subtracted basically this if if this minus this it going to give you the but let me make sure everything is correct yeah well okay but that's beautiful it got the exponent four and and this is great this is fantastic i mean i think by maybe a little bit of adjustment we can make it better or maybe we can create an extra function sum and don't let the model to do this addition by itself we make it the sum like a binary operator There's this sum of this comma this. You get what I mean? I will try that one. But that's just fantastic. Absolutely, this is beautiful. This is great. Now, when I tried NISTROL, I said, I think so far this is the best function call model. But now I'm going to tell the same thing for this code. But 72 billion parameters. You know, that's the things. I like to have a function calling with 100 million parameters model. 
anyway um this is great this is a good model i i'm gonna dig in more into this and then that's it uh i'll share the links to colab so you can play around with it you can use that helper function see how it goes so this was cohen2 you can go to their blog definitely go to their blogs and read that a bunch of really interesting information over there you can see what they have done and that's it so cohen 2 function calling it was nice and uncle code out\n" 645 | ] 646 | } 647 | ], 648 | "source": [ 649 | "!hermes input.mp4 -p groq -m whisper-large-v3" 650 | ], 651 | "id": "eCa8loClCIF3" 652 | }, 653 | { 654 | "cell_type": "markdown", 655 | "metadata": { 656 | "id": "chb_DIGPCIF3" 657 | }, 658 | "source": [ 659 | "### JSON Output" 660 | ], 661 | "id": "chb_DIGPCIF3" 662 | }, 663 | { 664 | "cell_type": "code", 665 | "execution_count": 9, 666 | "metadata": { 667 | "colab": { 668 | "base_uri": "https://localhost:8080/" 669 | }, 670 | "id": "BUsIWsbkCIF3", 671 | "outputId": "4a509de1-74e6-4ab7-f147-f813c504a0c9" 672 | }, 673 | "outputs": [ 674 | { 675 | "output_type": "stream", 676 | "name": "stdout", 677 | "text": [ 678 | "{\"text\":\" Hello, your beautiful people. This is Uncle Code and today I'm going to review quickly the Q1-2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with the really good stuff with 72 billion models and you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions. And that is great because this is a large language model sounds like all of us and it's cool and go and play around with it so what I'm what I'm trying to do is trying to challenge a little bit the function calling like similar the thing that I did for other models like the mistral so first they have this nice library the coin agents that led you to create agentic software and applications and also speed up the work with the large language model what you can just install it quickly and then there are different ways that you can work with the model you can maybe use Olama or Lama CCP and then you download it and install it or I just choose open rotor because they do have this carbon to 72 billion in Strug model and I took it from re-increting account so you can have an API tokens is not difficult anyway I just the code is here from their own GitHub if you want to use by Ollama then then okay then if if you You know what? When you call this models, in a typical way is you call a model, you pass the user query, and then the model returns you back a message that it has function called key in it, and content is empty or is not. And then you take the name of the function and then go to your functions and pass the detected, extract your arguments, get the result back, send it back to the model, and then the model produced the response for you. 
This process, when you do have multiple calls, a user asking you questions that then when we get the first response then we have another call and again another call so this is going to be in kind of iteration and then you have to continue doing that so what if down I created a helper function which get the LM instance in this example for the coin and then messages in the list of functions you know JSON schema And it goes through them and then there a loop and if the last message of the response from the LLM has function called, and get it out, they call the functions, and create the function message and send it back to the message in each story, and then iterate, till the point that there's no any kind of function call and return it back. So it's a great helper because you can see, you're gonna see it, because one of the thing that I want to try is this multiple call back to each other's. So that's one thing. And also I created some dummy functions. Get current weather, just slip a little bit, and then return back in JSON in always Celsius. This is get current temperature, which is very similar to this one. Doesn't have the format. I just wanted to have multiple version of it. You will see later. And get current time, convert temperature. And I do have some mathematical functions, sine, cosine, exponential, and power. And they're really calculating. If you remember, if you have seen my video about Mistrol, there I challenge it with the complicated mathematical expressions. I wanted to see if you can break it down and then doing it. It was, couldn't, you know, but sort of half-half. Now let's see what's going to happen for this model. First of all, this model will return one functions. So it's not a multiple function, the parallel functions. So you have to iterate, like what I said. Okay, the first example. the better luck today in Paris also tell me the time in San Francisco. So there are two questions not necessarily relevant to each order. You can do this first or you can do the other one. But I want to see how it goes. And I pass those two functions here and I call the execute function. And that's what we're going to see. When you call this, the first things come out of the model is the get current weather in the Paris and location. And this is function result. I should say function called function response, but better than make a function response. So my function, that executed help for function, executed the relevant functions, and got the dummy data, and then pass it to model, and now we have another function called from the model. Model asking for another function to be called is get current time, which is the look values of San Francisco, and then the dummy functions returns back to 10, and then we said the water in Paris in today is 10 degrees Celsius and current time in San Francisco is 10 because I just returned 10 That interesting So the model could really check each one of these questions separately It didn miss stuff between Paris and San Francisco And some weaker models, they do that. Perfect. And I tried, I think you'd be more than the two, three, four. You can continue doing that. That is really cool stuff. And then the second example is a little bit different. So I do have this message first. What's the word like today in Paris? So you can see that the second one is dependent on the first one. So there's a chain between these two, nested calls, binding, and there's dependency actually. So I pass the two functions, get current temperatures and convert temperature. 
And then let's see what's going to happen. So the first function called, the first function detected is the current temperature, is the Paris. So we execute the function, return it back to the model, and model it's rate again. And the second is the convert temperature, which is the 10 from the previous function called, it and then from Celsius to Fahrenheit and because it's a dummy function as it turns back at 10 always and we got this one perfect so this one Matt that that's really interesting very clean and then you can continue doing that to get the response and and you know by the way you can you can do some apply some smart trick to build your own a parallel function call I'll talk about it maybe someday even even the model doesn't support now Now let's go to the challenge that none of them could do it with the GPD. So I have sine, I have cosine, exponential in the power, and I'm passing this one. Calculate the result of power of sign of exponential 10 plus cosine of the power of two, exponent and then when you took it all, then there's another power, use this as a base and the four as exponent. Okay, so let's see what's going to happen. it here so we have a bunch of function called first of all fantastic that it didn't try to solve it by itself some of the models they do that so it took it really brick it down I like that first went for exponential 10 and that's what you got here then called the sign over the exponential 10 so that is great cool things then went for the power two three perfect then calculate the cosine of that nice eight and then got this and then got the power okay so what it did it did the addition by itself it did the addition between this part and this part let me see if the addition is correct minus this addition doesn't look correct it looks like it looks like subtraction i guess let me test it with my calculator uh is is 0.68 minus yeah it's Yeah, it's subtracted. Basically, this, if this minus this, it's going to give you the, but let me make sure everything is correct. Yeah. Well, okay, but that's beautiful. It got the exponent four, and this is great. This is fantastic. I mean, I think by maybe a little bit of adjustment, we can make it better. Or maybe we can create an extra function sum and don't let the model to do this edition by itself. We make it the sum like a binary operator, this sum of this, comma, this. You get what I mean? I will try that one. But that's fantastic. This absolutely, this is beautiful. This is great. Now I have, when I tried nostril, I said, I think so far this is the best function called model. But now I'm going to tell the same thing for this quote one. But seven to billion parameters, you know, that's the things. I like to have a function calling with 100 million parameters model. Anyway, this is great. This is a good model. I'm going to dig in more into this. And then that's it. I'll share the links to collapse so you can play around whether you can use that helper function. See how it goes. So this was Cohen 2. You can go to their blog. Definitely go to their blogs. A bunch of really interesting information over there. You can see what they have done. And that's it. So, Cohen 2, function calling. It was nice. 
And Uncle Code out.\",\"x_groq\":{\"id\":\"req_01j67j2k3bf4br65txs4fnzpf3\"}}\n", 679 | "\n" 680 | ] 681 | } 682 | ], 683 | "source": [ 684 | "!hermes input.mp4 -p groq --response_format json" 685 | ], 686 | "id": "BUsIWsbkCIF3" 687 | }, 688 | { 689 | "cell_type": "markdown", 690 | "metadata": { 691 | "id": "QvmhqNx1CIF3" 692 | }, 693 | "source": [ 694 | "### LLM Processing" 695 | ], 696 | "id": "QvmhqNx1CIF3" 697 | }, 698 | { 699 | "cell_type": "code", 700 | "execution_count": 10, 701 | "metadata": { 702 | "colab": { 703 | "base_uri": "https://localhost:8080/" 704 | }, 705 | "id": "Z49ZHZLhCIF3", 706 | "outputId": "fc557396-3edf-4630-fe63-330d8803039a" 707 | }, 708 | "outputs": [ 709 | { 710 | "output_type": "stream", 711 | "name": "stdout", 712 | "text": [ 713 | " Hello, your beautiful people. This is Uncle Code and today I'm going to review quickly the Q1-2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with the really good stuff with 72 billion models and you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions. And that is great because this is a large language model sounds like all of us and it's cool and go and play around with it so what I'm what I'm trying to do is trying to challenge a little bit the function calling like similar the thing that I did for other models like the mistral so first they have this nice library the coin agents that led you to create agentic software and applications and also speed up the work with the large language model what you can just install it quickly and then there are different ways that you can work with the model you can maybe use Olama or Lama CCP and then you download it and install it or I just choose open rotor because they do have this carbon to 72 billion in Strug model and I took it from re-increting account so you can have an API tokens is not difficult anyway I just the code is here from their own GitHub if you want to use by Ollama then then okay then if if you You know what? When you call this models, in a typical way is you call a model, you pass the user query, and then the model returns you back a message that it has function called key in it, and content is empty or is not. And then you take the name of the function and then go to your functions and pass the detected, extract your arguments, get the result back, send it back to the model, and then the model produced the response for you. This process, when you do have multiple calls, a user asking you questions that then when we get the first response then we have another call and again another call so this is going to be in kind of iteration and then you have to continue doing that so what if down I created a helper function which get the LM instance in this example for the coin and then messages in the list of functions you know JSON schema And it goes through them and then there a loop and if the last message of the response from the LLM has function called, and get it out, they call the functions, and create the function message and send it back to the message in each story, and then iterate, till the point that there's no any kind of function call and return it back. So it's a great helper because you can see, you're gonna see it, because one of the thing that I want to try is this multiple call back to each other's. So that's one thing. And also I created some dummy functions. 
Get current weather, just slip a little bit, and then return back in JSON in always Celsius. This is get current temperature, which is very similar to this one. Doesn't have the format. I just wanted to have multiple version of it. You will see later. And get current time, convert temperature. And I do have some mathematical functions, sine, cosine, exponential, and power. And they're really calculating. If you remember, if you have seen my video about Mistrol, there I challenge it with the complicated mathematical expressions. I wanted to see if you can break it down and then doing it. It was, couldn't, you know, but sort of half-half. Now let's see what's going to happen for this model. First of all, this model will return one functions. So it's not a multiple function, the parallel functions. So you have to iterate, like what I said. Okay, the first example. the better luck today in Paris also tell me the time in San Francisco. So there are two questions not necessarily relevant to each order. You can do this first or you can do the other one. But I want to see how it goes. And I pass those two functions here and I call the execute function. And that's what we're going to see. When you call this, the first things come out of the model is the get current weather in the Paris and location. And this is function result. I should say function called function response, but better than make a function response. So my function, that executed help for function, executed the relevant functions, and got the dummy data, and then pass it to model, and now we have another function called from the model. Model asking for another function to be called is get current time, which is the look values of San Francisco, and then the dummy functions returns back to 10, and then we said the water in Paris in today is 10 degrees Celsius and current time in San Francisco is 10 because I just returned 10 That interesting So the model could really check each one of these questions separately It didn miss stuff between Paris and San Francisco And some weaker models, they do that. Perfect. And I tried, I think you'd be more than the two, three, four. You can continue doing that. That is really cool stuff. And then the second example is a little bit different. So I do have this message first. What's the word like today in Paris? So you can see that the second one is dependent on the first one. So there's a chain between these two, nested calls, binding, and there's dependency actually. So I pass the two functions, get current temperatures and convert temperature. And then let's see what's going to happen. So the first function called, the first function detected is the current temperature, is the Paris. So we execute the function, return it back to the model, and model it's rate again. And the second is the convert temperature, which is the 10 from the previous function called, it and then from Celsius to Fahrenheit and because it's a dummy function as it turns back at 10 always and we got this one perfect so this one Matt that that's really interesting very clean and then you can continue doing that to get the response and and you know by the way you can you can do some apply some smart trick to build your own a parallel function call I'll talk about it maybe someday even even the model doesn't support now Now let's go to the challenge that none of them could do it with the GPD. So I have sine, I have cosine, exponential in the power, and I'm passing this one. 
Calculate the result of power of sign of exponential 10 plus cosine of the power of two, exponent and then when you took it all, then there's another power, use this as a base and the four as exponent. Okay, so let's see what's going to happen. it here so we have a bunch of function called first of all fantastic that it didn't try to solve it by itself some of the models they do that so it took it really brick it down I like that first went for exponential 10 and that's what you got here then called the sign over the exponential 10 so that is great cool things then went for the power two three perfect then calculate the cosine of that nice eight and then got this and then got the power okay so what it did it did the addition by itself it did the addition between this part and this part let me see if the addition is correct minus this addition doesn't look correct it looks like it looks like subtraction i guess let me test it with my calculator uh is is 0.68 minus yeah it's Yeah, it's subtracted. Basically, this, if this minus this, it's going to give you the, but let me make sure everything is correct. Yeah. Well, okay, but that's beautiful. It got the exponent four, and this is great. This is fantastic. I mean, I think by maybe a little bit of adjustment, we can make it better. Or maybe we can create an extra function sum and don't let the model to do this edition by itself. We make it the sum like a binary operator, this sum of this, comma, this. You get what I mean? I will try that one. But that's fantastic. This absolutely, this is beautiful. This is great. Now I have, when I tried nostril, I said, I think so far this is the best function called model. But now I'm going to tell the same thing for this quote one. But seven to billion parameters, you know, that's the things. I like to have a function calling with 100 million parameters model. Anyway, this is great. This is a good model. I'm going to dig in more into this. And then that's it. I'll share the links to collapse so you can play around whether you can use that helper function. See how it goes. So this was Cohen 2. You can go to their blog. Definitely go to their blogs. A bunch of really interesting information over there. You can see what they have done. And that's it. So, Cohen 2, function calling. It was nice. And Uncle Code out.\n", 714 | "\n", 715 | "LLM Processed Result:\n", 716 | "Here are three bullet points summarizing the transcription:\n", 717 | "\n", 718 | "• Uncle Code reviews the Q1-2 function calling ability of a large language model, specifically the 72 billion parameter model from the Cohere API. He finds that the model can process multiple function calls and return the results, which is a more complex task than simply executing a single function.\n", 719 | "\n", 720 | "• Uncle Code creates a helper function to speed up the process of function calling, which involves iterating through the model's responses and executing the relevant functions. He also creates dummy functions to test the model's ability to process multiple function calls, including mathematical functions like sine, cosine, and exponential.\n", 721 | "\n", 722 | "• The model is able to process complex function calls, including nested calls and dependencies, and returns the correct results. 
Uncle Code is impressed with the model's ability to break down complex mathematical expressions and execute them step-by-step, and he believes that this model is one of the best function calling models he has tested so far.\n" 723 | ] 724 | } 725 | ], 726 | "source": [ 727 | "!hermes input.mp4 -p groq --llm_prompt \"Summarize this transcription in 3 bullet points\"" 728 | ], 729 | "id": "Z49ZHZLhCIF3" 730 | }, 731 | { 732 | "cell_type": "markdown", 733 | "metadata": { 734 | "id": "e-XcDMZZCIF3" 735 | }, 736 | "source": [ 737 | "## Conclusion\n", 738 | "\n", 739 | "That's it! You've learned the basics of using Hermes for lightning-fast video transcription. Explore the different providers, models, and response formats to find what works best for your needs. Happy transcribing!" 740 | ], 741 | "id": "e-XcDMZZCIF3" 742 | }, 743 | { 744 | "cell_type": "markdown", 745 | "metadata": { 746 | "id": "gIKs3uffCIF4" 747 | }, 748 | "source": [ 749 | "**Extra Comments:**\n", 750 | "\n", 751 | "- Remember to replace the example video file paths and YouTube URLs with your actual content.\n", 752 | "- Hermes has excellent performance, especially with Groq's `distil-whisper` model.\n", 753 | "- Check out the `examples` folder in the GitHub repository for more advanced usage.\n", 754 | "- Feel free to contribute to the project and report any issues you encounter.\n", 755 | "- Don't forget to star the repo and follow [@unclecode](https://twitter.com/unclecode) on X!" 756 | ], 757 | "id": "gIKs3uffCIF4" 758 | } 759 | ], 760 | "metadata": { 761 | "kernelspec": { 762 | "display_name": "Python 3", 763 | "language": "python", 764 | "name": "python3" 765 | }, 766 | "language_info": { 767 | "codemirror_mode": { 768 | "name": "ipython", 769 | "version": 3 770 | }, 771 | "file_extension": ".py", 772 | "mimetype": "text/x-python", 773 | "name": "python", 774 | "nbconvert_exporter": "python", 775 | "pygments_lexer": "ipython3", 776 | "version": "3.10.6" 777 | }, 778 | "colab": { 779 | "provenance": [] 780 | } 781 | }, 782 | "nbformat": 4, 783 | "nbformat_minor": 5 784 | } -------------------------------------------------------------------------------- /hermes.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Define colors 4 | GREEN='\033[0;32m' 5 | YELLOW='\033[1;33m' 6 | CYAN='\033[1;36m' 7 | RED='\033[0;31m' 8 | NC='\033[0m' # No Color 9 | 10 | show_help() { 11 | printf "${CYAN}Usage:${NC} $0 [provider: mlx | groq | openai] [response_format] [model] [additional-mlx-whisper-arguments]\n" 12 | printf "\n${GREEN}Hermes Video Transcription Script${NC}\n" 13 | printf "\n${CYAN}This script converts video to audio (mp3), supports YouTube URLs, and transcribes the audio using one of three providers: MLX, Groq, or OpenAI.${NC}\n" 14 | printf "\n${YELLOW}Positional Arguments:${NC}\n" 15 | printf " video-file-or-youtube-url The video file or YouTube URL to be processed.\n" 16 | printf "\n${YELLOW}Optional Arguments:${NC}\n" 17 | printf " provider The transcription provider. Options are:\n" 18 | printf " mlx - Uses mlx_whisper for transcription. Default model: distil-whisper-large-v3\n" 19 | printf " groq - Uses Groq API for transcription. Default model: distil-whisper-large-v3-en\n" 20 | printf " openai - Uses OpenAI API for transcription. Default model: whisper-1\n" 21 | printf "\n response_format Specifies the output format. 
Default: vtt for mlx, text for others\n" 22 | printf " Options: json, text, srt, verbose_json, vtt\n" 23 | printf "\n model The model to be used by the provider (optional).\n" 24 | printf "\n${YELLOW}Examples:${NC}\n" 25 | printf " Basic usage with MLX (default):\n" 26 | printf " $0 input.mp4\n" 27 | printf "\n Using Groq with default model:\n" 28 | printf " $0 input.mp4 groq\n" 29 | printf "\n Using OpenAI with srt output:\n" 30 | printf " $0 input.mp4 openai srt whisper-1\n" 31 | printf "\n Processing a YouTube video:\n" 32 | printf " $0 https://www.youtube.com/watch?v=v=PNulbFECY-I\n" 33 | } 34 | 35 | check_dependencies() { 36 | local deps=("yt-dlp" "ffmpeg" "mlx_whisper") 37 | for dep in "${deps[@]}"; do 38 | if ! command -v "$dep" &> /dev/null; then 39 | printf "${RED}Error: $dep is not installed. Please install it and try again.${NC}\n" 40 | exit 1 41 | fi 42 | done 43 | } 44 | 45 | check_yt_dlp() { 46 | if ! command -v yt-dlp &> /dev/null 47 | then 48 | printf "${YELLOW}yt-dlp is not installed. This is required for YouTube video processing.${NC}\n" 49 | printf "${CYAN}To install yt-dlp, you can use one of the following methods:${NC}\n" 50 | printf "1. Using pip (Python package manager):\n" 51 | printf " ${GREEN}pip install yt-dlp${NC}\n" 52 | printf "2. On macOS using Homebrew:\n" 53 | printf " ${GREEN}brew install yt-dlp${NC}\n" 54 | printf "3. On Ubuntu or Debian:\n" 55 | printf " ${GREEN}sudo apt-get install yt-dlp${NC}\n" 56 | printf "For other installation methods, please visit: https://github.com/yt-dlp/yt-dlp#installation\n" 57 | printf "${YELLOW}Please install yt-dlp and run the script again.${NC}\n" 58 | exit 1 59 | fi 60 | } 61 | 62 | get_video_duration() { 63 | local video_file="$1" 64 | ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$video_file" 65 | } 66 | 67 | download_youtube_audio() { 68 | check_yt_dlp 69 | 70 | local input="$1" 71 | local temp_file="temp_youtube_audio.mp3" 72 | local output_file="youtube_audio.mp3" 73 | local sample_rate=16000 74 | 75 | printf "${CYAN}Downloading audio from YouTube...${NC}\n" 76 | if yt-dlp -f 'bestaudio[ext=m4a]/bestaudio' \ 77 | --extract-audio \ 78 | --audio-format mp3 \ 79 | --audio-quality 0 \ 80 | -o "$temp_file" \ 81 | "$input" > /dev/null 2>&1; then 82 | printf "${GREEN}YouTube audio download complete. Converting to 16kHz...${NC}\n" 83 | if ffmpeg -i "$temp_file" -ar $sample_rate -ac 1 -q:a 0 "$output_file" -y > /dev/null 2>&1; then 84 | printf "${GREEN}Conversion to 16kHz completed: $output_file${NC}\n" 85 | rm -f "$temp_file" 86 | echo "$output_file" 87 | else 88 | printf "${RED}Failed to convert audio to 16kHz. Please check ffmpeg installation.${NC}\n" 89 | rm -f "$temp_file" 90 | return 1 91 | fi 92 | else 93 | printf "${RED}Failed to download YouTube audio. 
Please check the URL and try again.${NC}\n" 94 | return 1 95 | fi 96 | } 97 | 98 | transcribe_video() { 99 | local input="$1" 100 | local provider="${2:-groq}" 101 | local model 102 | local response_format 103 | local audio_file 104 | 105 | # Set default model and response format based on provider 106 | if [ "$provider" == "groq" ]; then 107 | model="${4:-distil-whisper-large-v3-en}" 108 | response_format="${3:-text}" 109 | elif [ "$provider" == "openai" ]; then 110 | model="${4:-whisper-1}" 111 | response_format="${3:-text}" 112 | elif [ "$provider" == "mlx" ]; then 113 | model="${4:-distil-whisper-large-v3}" 114 | response_format="${3:-vtt}" 115 | else 116 | # Set to groq as default provider 117 | provider="groq" 118 | model="${4:-distil-whisper-large-v3-en}" 119 | response_format="${3:-text}" 120 | fi 121 | 122 | # Check if input is a YouTube URL 123 | if [[ "$input" == http*youtube.com* ]] || [[ "$input" == http*youtu.be* ]] || [[ "$input" == http*youtube.com/shorts* ]]; then 124 | download_youtube_audio "$input" 125 | audio_file="youtube_audio.mp3" 126 | if [ $? -ne 0 ] || [ -z "$audio_file" ]; then 127 | printf "${RED}Failed to process YouTube audio. Exiting.${NC}\n" 128 | exit 1 129 | fi 130 | else 131 | # Convert local video to mp3 132 | audio_file="${input%.*}_temp.mp3" 133 | printf "${CYAN}Converting video to mp3...${NC}\n" 134 | if ! ffmpeg -loglevel error -i "$input" -ar 16000 -ac 1 -q:a 0 -map a "$audio_file" -y; then 135 | printf "${RED}Failed to convert video to audio. Please check the input file.${NC}\n" 136 | exit 1 137 | fi 138 | printf "${GREEN}Conversion completed.${NC}\n" 139 | fi 140 | 141 | start_time=$(date +%s) 142 | if [ "$provider" == "groq" ]; then 143 | printf "${CYAN}Starting transcription with Groq using model: $model...${NC}\n" 144 | if [ -z "$GROQ_API_KEY" ]; then 145 | printf "${RED}Error: GROQ_API_KEY is not set. Please set it and try again.${NC}\n" 146 | exit 1 147 | fi 148 | curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \ 149 | -H "Authorization: bearer $GROQ_API_KEY" \ 150 | -F "file=@$audio_file" \ 151 | -F "model=$model" \ 152 | -F "temperature=0" \ 153 | -F "response_format=$response_format" \ 154 | -F "language=en" > "${input##*/}_groq.$response_format" 155 | 156 | elif [ "$provider" == "openai" ]; then 157 | printf "${CYAN}Starting transcription with OpenAI using model: $model...${NC}\n" 158 | if [ -z "$OPENAI_API_KEY" ]; then 159 | printf "${RED}Error: OPENAI_API_KEY is not set. 
Please set it and try again.${NC}\n" 160 | exit 1 161 | fi 162 | curl -X POST "https://api.openai.com/v1/audio/transcriptions" \ 163 | -H "Authorization: Bearer $OPENAI_API_KEY" \ 164 | -H "Content-Type: multipart/form-data" \ 165 | -F "file=@$audio_file" \ 166 | -F "model=$model" \ 167 | -F "response_format=$response_format" > "${input##*/}_openai.$response_format" 168 | 169 | else 170 | printf "${CYAN}Starting transcription with mlx_whisper using model: mlx-community/$model...${NC}\n" 171 | mlx_whisper "$audio_file" --model "mlx-community/$model" --output-dir "./" "${@:5}" 172 | fi 173 | 174 | end_time=$(date +%s) 175 | duration_ns=$((end_time - start_time)) 176 | 177 | local duration=$(get_video_duration "$audio_file") 178 | printf "${YELLOW}Audio duration: ${duration} seconds${NC}\n" 179 | 180 | # Output transcription time in seconds for capturing by the benchmark script 181 | printf "${GREEN}Transcription completed in $duration_ns seconds.${NC}\n" 182 | echo "TRANSCRIPTION_TIME=$duration_ns" 183 | 184 | # Clean up the audio file 185 | rm -f "$audio_file" 186 | } 187 | 188 | # Main script execution 189 | check_dependencies 190 | 191 | if [ "$1" == "-h" ]; then 192 | show_help 193 | exit 0 194 | fi 195 | 196 | if [ -z "$1" ]; then 197 | printf "${RED}Missing required arguments. Use -h for help.${NC}\n" 198 | exit 1 199 | fi 200 | 201 | transcribe_video "$@" -------------------------------------------------------------------------------- /hermes/__init__.py: -------------------------------------------------------------------------------- 1 | from .core import Hermes, transcribe 2 | from .config import CONFIG 3 | from .strategies.source import SourceStrategy 4 | from .strategies.provider import ProviderStrategy 5 | from .utils.cache import Cache 6 | from .utils.llm import LLMProcessor 7 | from .cli import main as cli_main 8 | 9 | __version__ = "0.1.0" 10 | 11 | __all__ = [ 12 | "Hermes", 13 | "transcribe", 14 | "CONFIG", 15 | "SourceStrategy", 16 | "ProviderStrategy", 17 | "Cache", 18 | "LLMProcessor", 19 | ] 20 | -------------------------------------------------------------------------------- /hermes/cli.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import sys 3 | from typing import List 4 | from hermes.core import Hermes, transcribe 5 | 6 | def parse_args(args: List[str]) -> argparse.Namespace: 7 | parser = argparse.ArgumentParser(description="Hermes Video Transcription Tool") 8 | parser.add_argument("source", help="Source file, URL, or 'mic' for microphone input") 9 | parser.add_argument("-p", "--provider", choices=["groq", "openai", "mlx"], default="groq", help="Transcription provider") 10 | parser.add_argument("-m", "--model", help="Model to use for transcription") 11 | parser.add_argument("-o", "--output", help="Output file path") 12 | parser.add_argument("-f", "--force", action="store_true", help="Force transcription even if cached") 13 | parser.add_argument("--response_format", choices=["json", "text", "srt", "verbose_json", "vtt"], default="text", help="Response format") 14 | parser.add_argument("--llm_prompt", help="Prompt for LLM processing of transcription") 15 | 16 | # Parse known args first 17 | known_args, unknown_args = parser.parse_known_args(args) 18 | 19 | # Parse remaining args as key-value pairs 20 | extra_args = {} 21 | for i in range(0, len(unknown_args), 2): 22 | if i + 1 < len(unknown_args): 23 | key = unknown_args[i].lstrip('-') 24 | value = unknown_args[i + 1] 25 | extra_args[key] = value 26 | 27 | 
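    # For example, a hypothetical invocation such as `hermes input.mp4 --language en`
    # would yield extra_args == {'language': 'en'}; these extra key-value pairs are
    # forwarded to the transcription provider as additional keyword arguments.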
return known_args, extra_args 28 | 29 | def main(): 30 | known_args, extra_args = parse_args(sys.argv[1:]) 31 | 32 | try: 33 | result = transcribe( 34 | source=known_args.source, 35 | provider=known_args.provider, 36 | force=known_args.force, 37 | llm_prompt=known_args.llm_prompt, 38 | model=known_args.model, 39 | response_format=known_args.response_format, 40 | **extra_args 41 | ) 42 | 43 | if known_args.output: 44 | with open(known_args.output, 'w') as f: 45 | f.write(result['transcription']) 46 | print(f"Transcription saved to {known_args.output}") 47 | else: 48 | print(result['transcription']) 49 | 50 | if 'llm_processed' in result: 51 | print("\nLLM Processed Result:") 52 | print(result['llm_processed']) 53 | 54 | except Exception as e: 55 | print(f"Error: {str(e)}", file=sys.stderr) 56 | sys.exit(1) 57 | 58 | def cli_entry_point(): 59 | main() 60 | 61 | if __name__ == "__main__": 62 | cli_entry_point() -------------------------------------------------------------------------------- /hermes/cli/__main__.py: -------------------------------------------------------------------------------- 1 | from hermes.cli import main 2 | 3 | if __name__ == "__main__": 4 | main() -------------------------------------------------------------------------------- /hermes/config.py: -------------------------------------------------------------------------------- 1 | import os 2 | import yaml 3 | from typing import Dict, Any 4 | 5 | DEFAULT_CONFIG = { 6 | 'llm': { 7 | 'provider': 'groq', 8 | 'model': 'llama-3.1-8b-instant', 9 | 'api_key': None, 10 | }, 11 | 'transcription': { 12 | 'provider': 'groq', 13 | 'model': 'distil-whisper-large-v3-en', 14 | 'api_key': None, 15 | }, 16 | 'cache': { 17 | 'enabled': True, 18 | 'directory': '~/.hermes/cache', 19 | }, 20 | 'source_type': 'auto', 21 | } 22 | 23 | def load_config() -> Dict[str, Any]: 24 | config_path = os.path.expanduser('~/.hermes/config.yml') 25 | if os.path.exists(config_path): 26 | with open(config_path, 'r') as f: 27 | user_config = yaml.safe_load(f) 28 | else: 29 | user_config = {} 30 | 31 | # Merge user config with default config 32 | config = {**DEFAULT_CONFIG, **user_config} 33 | 34 | # Handle API keys 35 | for service in ['llm', 'transcription']: 36 | provider = config[service]['provider'] 37 | env_var = f"{provider.upper()}_API_KEY" 38 | 39 | # If API key is not in config, try to get it from environment 40 | if not config[service]['api_key']: 41 | config[service]['api_key'] = os.getenv(env_var) 42 | 43 | # If still no API key, raise an error 44 | if not config[service]['api_key']: 45 | raise ValueError(f"No API key found for {provider} in config or environment variable {env_var}. 
" 46 | f"Please set it in your config file or as an environment variable.") 47 | 48 | # Expand user directory for cache 49 | config['cache']['directory'] = os.path.expanduser(config['cache']['directory']) 50 | 51 | return config 52 | 53 | CONFIG = load_config() -------------------------------------------------------------------------------- /hermes/core.py: -------------------------------------------------------------------------------- 1 | import os 2 | from typing import Optional, Dict, Any 3 | from .strategies.source import SourceStrategy 4 | from .strategies.provider import ProviderStrategy 5 | from .utils.cache import Cache 6 | from .utils.llm import LLMProcessor 7 | from .config import CONFIG 8 | 9 | class Hermes: 10 | def __init__(self, config: Dict[str, Any] = None): 11 | self.config = config or CONFIG 12 | self.source_strategy = SourceStrategy.get_strategy(self.config['source_type']) 13 | self.provider_strategy = ProviderStrategy.get_strategy(self.config['transcription']['provider']) 14 | self.cache = Cache(self.config['cache']) 15 | self.llm_processor = LLMProcessor() 16 | 17 | def transcribe(self, source: str, force: bool = False, **kwargs) -> Dict[str, Any]: 18 | """ 19 | Transcribe audio from the given source. 20 | 21 | :param source: The source of the audio (file path, URL, etc.) 22 | :param force: If True, ignore cache and force new transcription 23 | :param kwargs: Additional arguments for the provider 24 | :return: A dictionary containing the transcription and metadata 25 | """ 26 | cache_key = f"{self.source_strategy.__class__.__name__}_{self.provider_strategy.__class__.__name__}_{self.config['transcription']['provider']}_{self.config['transcription']['model']}_{kwargs.get('response_format', 'text')}_{source.replace('/', '_')}" 27 | 28 | if not force: 29 | cached_result = self.cache.get(cache_key) 30 | if cached_result: 31 | return cached_result 32 | 33 | audio_data = self.source_strategy.get_audio(source) 34 | transcription = self.provider_strategy.transcribe(audio_data, params={**kwargs, **self.config['transcription']}) 35 | 36 | result = { 37 | "source": source, 38 | "provider": self.provider_strategy.__class__.__name__, 39 | "transcription": transcription 40 | } 41 | 42 | self.cache.set(cache_key, result) 43 | return result 44 | 45 | def process_with_llm(self, transcription: str, prompt: str) -> str: 46 | """ 47 | Process the transcription with a language model. 48 | 49 | :param transcription: The transcription text to process 50 | :param prompt: The prompt to send to the language model 51 | :return: The processed result from the language model 52 | """ 53 | return self.llm_processor.process(transcription, prompt) 54 | 55 | @classmethod 56 | def from_config(cls, config: Dict[str, Any]) -> 'Hermes': 57 | """ 58 | Create a Hermes instance from a configuration dictionary. 59 | 60 | :param config: A dictionary containing configuration options 61 | :return: A configured Hermes instance 62 | """ 63 | source_strategy = SourceStrategy.get_strategy(config['source_type']) 64 | provider_strategy = ProviderStrategy.get_strategy(config['transcription']['provider']) 65 | 66 | return cls(config) 67 | 68 | def transcribe(source: str, provider: Optional[str] = None, force: bool = False, llm_prompt: Optional[str] = None, model: Optional[str] = None, response_format: str = "text", **kwargs) -> Dict[str, Any]: 69 | """ 70 | Convenience function to transcribe audio and optionally process with LLM. 71 | 72 | :param source: The source of the audio (file path, URL, etc.) 
73 | :param provider: The name of the provider to use (default: None, will use the default provider) 74 | :param force: If True, ignore cache and force new transcription 75 | :param llm_prompt: If provided, process the transcription with this prompt using an LLM 76 | :param model: The model to use for transcription 77 | :param response_format: The desired response format (default: "text") 78 | :param kwargs: Additional arguments for the provider 79 | :return: A dictionary containing the transcription, metadata, and optional LLM processing result 80 | """ 81 | config = { 82 | **CONFIG, 83 | 'source_type': 'auto', 84 | 'transcription': { 85 | 'provider': provider or CONFIG['transcription']['provider'], 86 | 'model': model or CONFIG['transcription']['model'], 87 | } 88 | } 89 | hermes = Hermes.from_config(config) 90 | result = hermes.transcribe(source, force=force, response_format=response_format, **kwargs) 91 | 92 | if llm_prompt: 93 | result['llm_processed'] = hermes.process_with_llm(result['transcription'], llm_prompt) 94 | 95 | return result -------------------------------------------------------------------------------- /hermes/strategies/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/unclecode/hermes/806e1833167bc7735d470d3ab37f1f566b6aa463/hermes/strategies/__init__.py -------------------------------------------------------------------------------- /hermes/strategies/provider/__init__.py: -------------------------------------------------------------------------------- 1 | from .base import ProviderStrategy 2 | 3 | def get_provider_strategy(provider_name: str) -> ProviderStrategy: 4 | if provider_name == 'groq': 5 | from .groq import GroqProviderStrategy 6 | return GroqProviderStrategy() 7 | elif provider_name == 'openai': 8 | from .openai import OpenAIProviderStrategy 9 | return OpenAIProviderStrategy() 10 | elif provider_name == 'mlx': 11 | from .mlx import MLXProviderStrategy 12 | return MLXProviderStrategy() 13 | else: 14 | raise ValueError(f"Unknown provider: {provider_name}") -------------------------------------------------------------------------------- /hermes/strategies/provider/base.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | from typing import Dict, Any 3 | 4 | class ProviderStrategy(ABC): 5 | @abstractmethod 6 | def transcribe(self, audio_data: bytes, params: Dict[str, Any] = None) -> str: 7 | pass 8 | 9 | @classmethod 10 | def get_strategy(cls, provider_type: str) -> 'ProviderStrategy': 11 | if provider_type == 'groq': 12 | from .groq import GroqProviderStrategy 13 | return GroqProviderStrategy() 14 | elif provider_type == 'openai': 15 | from .openai import OpenAIProviderStrategy 16 | return OpenAIProviderStrategy() 17 | elif provider_type == 'mlx': 18 | from .mlx import MLXProviderStrategy 19 | return MLXProviderStrategy() 20 | else: 21 | raise ValueError(f"Unknown provider type: {provider_type}") -------------------------------------------------------------------------------- /hermes/strategies/provider/groq.py: -------------------------------------------------------------------------------- 1 | import os 2 | import requests 3 | from typing import Dict, Any 4 | from .base import ProviderStrategy 5 | 6 | class GroqProviderStrategy(ProviderStrategy): 7 | def __init__(self): 8 | self.api_key = os.getenv("GROQ_API_KEY") 9 | if not self.api_key: 10 | raise ValueError("GROQ_API_KEY environment variable is not 
set") 11 | self.base_url = "https://api.groq.com/openai/v1/audio/transcriptions" 12 | 13 | def transcribe(self, audio_data: bytes, params: Dict[str, Any] = None) -> str: 14 | params = params or {} 15 | model = params.get("model", "distil-whisper-large-v3-en") 16 | response_format = params.get("response_format", "text") 17 | 18 | headers = { 19 | "Authorization": f"Bearer {self.api_key}", 20 | } 21 | 22 | files = { 23 | "file": ("audio.wav", audio_data, "audio/wav"), 24 | } 25 | 26 | data = { 27 | "model": model, 28 | "response_format": response_format, 29 | "temperature": 0, 30 | "language": "en", 31 | } 32 | 33 | response = requests.post(self.base_url, headers=headers, files=files, data=data) 34 | response.raise_for_status() 35 | 36 | return response.text 37 | -------------------------------------------------------------------------------- /hermes/strategies/provider/mlx.py: -------------------------------------------------------------------------------- 1 | import tempfile 2 | import subprocess 3 | from typing import Dict, Any 4 | from .base import ProviderStrategy 5 | 6 | class MLXProviderStrategy(ProviderStrategy): 7 | def transcribe(self, audio_data: bytes, params: Dict[str, Any] = None) -> str: 8 | params = params or {} 9 | model = params.get("model", "mlx-community/distil-whisper-large-v3") 10 | output_dir = params.get("output_dir", ".") 11 | 12 | with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_audio: 13 | temp_audio.write(audio_data) 14 | temp_audio_path = temp_audio.name 15 | 16 | command = [ 17 | "mlx_whisper", 18 | temp_audio_path, 19 | "--model", model, 20 | "--output-dir", output_dir, 21 | ] 22 | 23 | result = subprocess.run(command, capture_output=True, text=True) 24 | if result.returncode != 0: 25 | raise RuntimeError(f"MLX Whisper transcription failed: {result.stderr}") 26 | 27 | # Assuming the output is in the same directory with the same name as the input file 28 | output_file = f"{output_dir}/{temp_audio_path.split('/')[-1]}.txt" 29 | with open(output_file, 'r') as f: 30 | transcription = f.read() 31 | 32 | return transcription 33 | -------------------------------------------------------------------------------- /hermes/strategies/provider/openai.py: -------------------------------------------------------------------------------- 1 | import os 2 | import requests 3 | from typing import Dict, Any 4 | from .base import ProviderStrategy 5 | 6 | class OpenAIProviderStrategy(ProviderStrategy): 7 | def __init__(self): 8 | self.api_key = os.getenv("OPENAI_API_KEY") 9 | if not self.api_key: 10 | raise ValueError("OPENAI_API_KEY environment variable is not set") 11 | self.base_url = "https://api.openai.com/v1/audio/transcriptions" 12 | 13 | def transcribe(self, audio_data: bytes, params: Dict[str, Any] = None) -> str: 14 | params = params or {} 15 | model = params.get("model", "whisper-1") 16 | response_format = params.get("response_format", "text") 17 | 18 | headers = { 19 | "Authorization": f"Bearer {self.api_key}", 20 | } 21 | 22 | files = { 23 | "file": ("audio.wav", audio_data, "audio/wav"), 24 | } 25 | 26 | data = { 27 | "model": model, 28 | "response_format": response_format, 29 | } 30 | 31 | response = requests.post(self.base_url, headers=headers, files=files, data=data) 32 | response.raise_for_status() 33 | 34 | return response.text if response_format == "text" else response.json() -------------------------------------------------------------------------------- /hermes/strategies/source/__init__.py: 
-------------------------------------------------------------------------------- 1 | from .base import SourceStrategy 2 | from .auto import AutoSourceStrategy -------------------------------------------------------------------------------- /hermes/strategies/source/auto.py: -------------------------------------------------------------------------------- 1 | from urllib.parse import urlparse 2 | from .base import SourceStrategy 3 | from .file import FileSourceStrategy 4 | from .youtube import YouTubeSourceStrategy 5 | from .web import WebSourceStrategy 6 | 7 | class AutoSourceStrategy(SourceStrategy): 8 | def get_audio(self, source: str) -> bytes: 9 | if self.is_youtube_url(source): 10 | return YouTubeSourceStrategy().get_audio(source) 11 | elif self.is_web_url(source): 12 | return WebSourceStrategy().get_audio(source) 13 | else: 14 | return FileSourceStrategy().get_audio(source) 15 | 16 | @staticmethod 17 | def is_youtube_url(url: str) -> bool: 18 | parsed = urlparse(url) 19 | return parsed.netloc in ['www.youtube.com', 'youtube.com', 'youtu.be'] 20 | 21 | @staticmethod 22 | def is_web_url(url: str) -> bool: 23 | parsed = urlparse(url) 24 | return parsed.scheme in ['http', 'https'] -------------------------------------------------------------------------------- /hermes/strategies/source/base.py: -------------------------------------------------------------------------------- 1 | from abc import ABC, abstractmethod 2 | from typing import Any 3 | 4 | class SourceStrategy(ABC): 5 | @abstractmethod 6 | def get_audio(self, source: str) -> bytes: 7 | """ 8 | Retrieve audio data from the given source. 9 | 10 | :param source: The source identifier (e.g., file path, URL) 11 | :return: Audio data as bytes in MP3 format 12 | """ 13 | pass 14 | 15 | @classmethod 16 | def get_strategy(cls, source_type: str) -> 'SourceStrategy': 17 | """ 18 | Factory method to get the appropriate source strategy. 
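        For example, get_strategy('youtube') returns a YouTubeSourceStrategy, while 'auto' returns an AutoSourceStrategy that dispatches based on the source string.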
19 | 20 | :param source_type: The type of source strategy to use 21 | :return: An instance of the appropriate SourceStrategy subclass 22 | """ 23 | if source_type == 'auto': 24 | from .auto import AutoSourceStrategy 25 | return AutoSourceStrategy() 26 | elif source_type == 'file': 27 | from .file import FileSourceStrategy 28 | return FileSourceStrategy() 29 | elif source_type == 'youtube': 30 | from .youtube import YouTubeSourceStrategy 31 | return YouTubeSourceStrategy() 32 | elif source_type == 'microphone': 33 | from .microphone import MicrophoneSourceStrategy 34 | return MicrophoneSourceStrategy() 35 | elif source_type == 'clipboard': 36 | from .clipboard import ClipboardSourceStrategy 37 | return ClipboardSourceStrategy() 38 | elif source_type == 'web': 39 | from .web import WebSourceStrategy 40 | return WebSourceStrategy() 41 | else: 42 | raise ValueError(f"Unknown source type: {source_type}") -------------------------------------------------------------------------------- /hermes/strategies/source/clipboard.py: -------------------------------------------------------------------------------- 1 | from .base import SourceStrategy 2 | from ...utils.audio import get_audio_from_clipboard 3 | from typing import Any 4 | class ClipboardSourceStrategy(SourceStrategy): 5 | def get_audio(self, source: str) -> bytes: 6 | audio_data = get_audio_from_clipboard() 7 | return audio_data.export(format="mp3").read() 8 | -------------------------------------------------------------------------------- /hermes/strategies/source/file.py: -------------------------------------------------------------------------------- 1 | import os 2 | from .base import SourceStrategy 3 | from ...utils.audio import load_audio_file, convert_to_wav 4 | from typing import Any 5 | 6 | class FileSourceStrategy(SourceStrategy): 7 | def get_audio(self, source: str) -> bytes: 8 | abs_path = os.path.abspath(source) 9 | if not os.path.isfile(abs_path): 10 | raise FileNotFoundError(f"The file {abs_path} does not exist") 11 | audio = load_audio_file(abs_path) 12 | return convert_to_wav(audio) -------------------------------------------------------------------------------- /hermes/strategies/source/microphone.py: -------------------------------------------------------------------------------- 1 | from .base import SourceStrategy 2 | from ...utils.audio import record_audio 3 | from typing import Any 4 | 5 | class MicrophoneSourceStrategy(SourceStrategy): 6 | def get_audio(self, source: str) -> bytes: 7 | audio_data = record_audio() 8 | return audio_data.export(format="mp3").read() -------------------------------------------------------------------------------- /hermes/strategies/source/web.py: -------------------------------------------------------------------------------- 1 | from .base import SourceStrategy 2 | from ...utils.audio import download_web_audio 3 | from typing import Any 4 | class WebSourceStrategy(SourceStrategy): 5 | def get_audio(self, source: str) -> bytes: 6 | audio_data = download_web_audio(source) 7 | return audio_data.export(format="mp3").read() -------------------------------------------------------------------------------- /hermes/strategies/source/youtube.py: -------------------------------------------------------------------------------- 1 | from .base import SourceStrategy 2 | from ...utils.audio import download_youtube_audio 3 | from typing import Any 4 | from pydub import AudioSegment 5 | 6 | class YouTubeSourceStrategy(SourceStrategy): 7 | def get_audio(self, source: str) -> bytes: 8 | audioSegment : 
AudioSegment = download_youtube_audio(source) 9 | return audioSegment.export(format="mp3").read() 10 | -------------------------------------------------------------------------------- /hermes/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/unclecode/hermes/806e1833167bc7735d470d3ab37f1f566b6aa463/hermes/utils/__init__.py -------------------------------------------------------------------------------- /hermes/utils/audio.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tempfile 3 | from typing import Any 4 | import ffmpeg 5 | import yt_dlp 6 | import pyperclip 7 | import sounddevice as sd 8 | import numpy as np 9 | import requests 10 | from pydub import AudioSegment 11 | 12 | def load_audio_file(file_path: str) -> AudioSegment: 13 | """ 14 | Load an audio file using pydub. 15 | 16 | :param file_path: Path to the audio file 17 | :return: AudioSegment object 18 | """ 19 | return AudioSegment.from_file(file_path) 20 | 21 | def download_youtube_audio(url: str) -> AudioSegment: 22 | """ 23 | Download audio from a YouTube video. 24 | 25 | :param url: YouTube video URL 26 | :return: AudioSegment object 27 | """ 28 | ydl_opts = { 29 | 'format': 'bestaudio/best', 30 | 'postprocessors': [{ 31 | 'key': 'FFmpegExtractAudio', 32 | 'preferredcodec': 'mp3', 33 | 'preferredquality': '192', 34 | }], 35 | 'outtmpl': '%(id)s.%(ext)s', 36 | } 37 | 38 | with yt_dlp.YoutubeDL(ydl_opts) as ydl: 39 | info = ydl.extract_info(url, download=True) 40 | filename = f"{info['id']}.mp3" 41 | 42 | audio = AudioSegment.from_mp3(filename) 43 | os.remove(filename) 44 | return audio 45 | 46 | def record_audio(duration: int = 10, sample_rate: int = 44100) -> AudioSegment: 47 | """ 48 | Record audio from the microphone. 49 | 50 | :param duration: Recording duration in seconds 51 | :param sample_rate: Sample rate for recording 52 | :return: AudioSegment object 53 | """ 54 | print(f"Recording for {duration} seconds...") 55 | recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=2) 56 | sd.wait() 57 | print("Recording finished.") 58 | 59 | # Convert numpy array to AudioSegment 60 | audio = AudioSegment( 61 | recording.tobytes(), 62 | frame_rate=sample_rate, 63 | sample_width=recording.dtype.itemsize, 64 | channels=2 65 | ) 66 | return audio 67 | 68 | def get_audio_from_clipboard() -> AudioSegment: 69 | """ 70 | Get audio data from the clipboard. 71 | 72 | :return: AudioSegment object 73 | """ 74 | clipboard_content = pyperclip.paste() 75 | if clipboard_content.startswith(('http://', 'https://')): 76 | return download_web_audio(clipboard_content) 77 | else: 78 | raise ValueError("No valid audio URL found in clipboard") 79 | 80 | def download_web_audio(url: str) -> AudioSegment: 81 | """ 82 | Download audio from a web URL. 83 | 84 | :param url: URL of the audio file 85 | :return: AudioSegment object 86 | """ 87 | response = requests.get(url) 88 | response.raise_for_status() 89 | 90 | with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as temp_file: 91 | temp_file.write(response.content) 92 | temp_file_path = temp_file.name 93 | 94 | audio = AudioSegment.from_mp3(temp_file_path) 95 | os.remove(temp_file_path) 96 | return audio 97 | 98 | def convert_to_wav(audio: AudioSegment, sample_rate: int = 16000) -> bytes: 99 | """ 100 | Convert AudioSegment to WAV format with specified sample rate. 
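    Whisper-style transcription models generally expect 16 kHz mono audio, which is why a 16000 Hz sample rate and a single channel are used as the defaults below.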
101 | 102 | :param audio: AudioSegment object 103 | :param sample_rate: Desired sample rate 104 | :return: WAV audio data as bytes 105 | """ 106 | try: 107 | audio = audio.set_frame_rate(sample_rate).set_channels(1) 108 | buffer = audio.export(format="wav") 109 | return buffer.read() 110 | except Exception as e: 111 | print(f"Error converting audio to WAV: {e}") 112 | return None 113 | 114 | def get_audio_duration(audio: AudioSegment) -> float: 115 | """ 116 | Get the duration of an AudioSegment in seconds. 117 | 118 | :param audio: AudioSegment object 119 | :return: Duration in seconds 120 | """ 121 | return len(audio) / 1000.0 122 | -------------------------------------------------------------------------------- /hermes/utils/cache.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | from pathlib import Path 4 | from typing import Dict, Any, Optional 5 | 6 | class Cache: 7 | def __init__(self, config: Dict[str, Any]): 8 | self.enabled = config.get('enabled', True) 9 | self.cache_dir = Path(config.get('directory', Path.home() / '.hermes' / 'cache')) 10 | if self.enabled: 11 | self.cache_dir.mkdir(parents=True, exist_ok=True) 12 | 13 | def get(self, key: str) -> Optional[str]: 14 | if not self.enabled: 15 | return None 16 | cache_file = self.cache_dir / f"{key}.json" 17 | if cache_file.exists(): 18 | with open(cache_file, 'r') as f: 19 | return json.load(f)['transcription'] 20 | return None 21 | 22 | def set(self, key: str, value: str): 23 | if not self.enabled: 24 | return 25 | cache_file = self.cache_dir / f"{key}.json" 26 | with open(cache_file, 'w') as f: 27 | json.dump({'transcription': value}, f) 28 | 29 | def clear(self): 30 | if not self.enabled: 31 | return 32 | for cache_file in self.cache_dir.glob('*.json'): 33 | cache_file.unlink() -------------------------------------------------------------------------------- /hermes/utils/llm.py: -------------------------------------------------------------------------------- 1 | from typing import Dict, Any 2 | from litellm import completion 3 | from hermes.config import CONFIG 4 | 5 | class LLMProcessor: 6 | def __init__(self): 7 | self.config = CONFIG['llm'] 8 | if not self.config['api_key']: 9 | raise ValueError(f"API key for {self.config['provider']} is required. Set it in config.yml or as an environment variable.") 10 | 11 | def process(self, text: str, prompt: str, **kwargs) -> str: 12 | """ 13 | Process the given text with a language model using the provided prompt. 
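        The prompt and the text are combined into a single user message and sent through litellm's completion() call shown below.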
14 | 15 | :param text: The text to process (e.g., transcription) 16 | :param prompt: The prompt to send to the language model 17 | :param kwargs: Additional arguments for the LLM API call 18 | :return: The processed result from the language model 19 | """ 20 | full_prompt = f"{prompt}\n\nText: {text}" 21 | 22 | messages = [ 23 | {"role": "system", "content": "You are a helpful assistant."}, 24 | {"role": "user", "content": full_prompt} 25 | ] 26 | 27 | # Use a valid model string format 28 | model = f"{self.config['provider']}/{self.config['model']}" 29 | 30 | response = completion( 31 | model=model, 32 | messages=messages, 33 | api_key=self.config['api_key'], 34 | **kwargs 35 | ) 36 | 37 | return response.choices[0].message.content.strip() -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | yt-dlp>=2024.8.6 2 | ffmpeg-python>=0.2.0 3 | openai>=1.42.0 4 | groq>=0.9.0 5 | mlx-whisper>=0.3.0 6 | pydub>=0.25.1 7 | pyperclip>=1.9.0 8 | sounddevice>=0.5.0 9 | numpy>=2.0.1 10 | requests>=2.32.3 11 | PyYAML>=6.0.2 12 | litellm>=1.44.5 13 | PyAudio>=0.2.14 -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | from setuptools.command.install import install 3 | import os 4 | import yaml 5 | from pathlib import Path 6 | 7 | # Read requirements.txt 8 | with open("requirements.txt") as f: 9 | requirements = [req for req in f.read().splitlines() if not req.startswith('mlx-')] 10 | 11 | # MLX-specific requirements 12 | mlx_requirements = ["mlx-whisper>=0.3.0"] 13 | 14 | # Define the default configuration 15 | DEFAULT_CONFIG = { 16 | 'llm': { 17 | 'provider': 'groq', 18 | 'model': 'llama-3.1-8b-instant', 19 | 'api_key': None, 20 | }, 21 | 'transcription': { 22 | 'provider': 'groq', 23 | 'model': 'distil-whisper-large-v3-en', 24 | }, 25 | 'cache': { 26 | 'enabled': True, 27 | 'directory': '~/.hermes/cache', 28 | }, 29 | 'source_type': 'auto', 30 | } 31 | 32 | def post_install(): 33 | # Create ~/.hermes directory 34 | hermes_dir = Path.home() / '.hermes' 35 | hermes_dir.mkdir(parents=True, exist_ok=True) 36 | 37 | # Create config.yaml if it doesn't exist 38 | config_path = hermes_dir / 'config.yaml' 39 | if not config_path.exists(): 40 | with open(config_path, 'w') as f: 41 | yaml.dump(DEFAULT_CONFIG, f) 42 | 43 | class PostInstallCommand(install): 44 | def run(self): 45 | install.run(self) 46 | post_install() 47 | 48 | setup( 49 | name="hermes", 50 | version="0.1.0", 51 | packages=find_packages(), 52 | install_requires=requirements, 53 | extras_require={ 54 | "mlx": mlx_requirements, 55 | }, 56 | entry_points={ 57 | "console_scripts": [ 58 | "hermes=hermes.cli:main", 59 | ], 60 | }, 61 | author="UncleCode", 62 | author_email="unclecode@kidocode.com", 63 | description="A versatile video transcription tool", 64 | long_description=open("README.md").read(), 65 | long_description_content_type="text/markdown", 66 | url="https://github.com/unclecode/hermes", 67 | classifiers=[ 68 | "Programming Language :: Python :: 3", 69 | "License :: OSI Approved :: MIT License", 70 | "Operating System :: OS Independent", 71 | ], 72 | python_requires=">=3.7", 73 | cmdclass={ 74 | 'install': PostInstallCommand, 75 | }, 76 | ) -------------------------------------------------------------------------------- /tests/__init__.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/unclecode/hermes/806e1833167bc7735d470d3ab37f1f566b6aa463/tests/__init__.py -------------------------------------------------------------------------------- /tests/test_cache.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | from pathlib import Path 3 | from unittest.mock import patch, mock_open 4 | from hermes.utils.cache import Cache 5 | 6 | @pytest.fixture 7 | def cache(): 8 | return Cache({'enabled': True, 'directory': '/tmp/hermes_cache'}) 9 | 10 | def test_cache_initialization(cache): 11 | assert cache.enabled == True 12 | assert cache.cache_dir == Path('/tmp/hermes_cache') 13 | 14 | @patch('pathlib.Path.mkdir') 15 | def test_cache_directory_creation(mock_mkdir): 16 | Cache({'enabled': True, 'directory': '/tmp/hermes_cache'}) 17 | mock_mkdir.assert_called_once_with(parents=True, exist_ok=True) 18 | 19 | @patch('pathlib.Path.exists') 20 | @patch('builtins.open', new_callable=mock_open, read_data='{"transcription": "test_value"}') 21 | def test_cache_get(mock_file, mock_exists, cache): 22 | mock_exists.return_value = True 23 | 24 | result = cache.get('test_key') 25 | 26 | assert result == 'test_value' 27 | mock_file.assert_called_once_with(Path('/tmp/hermes_cache/test_key.json'), 'r') 28 | 29 | def test_cache_get_disabled(): 30 | disabled_cache = Cache({'enabled': False}) 31 | result = disabled_cache.get('test_key') 32 | assert result is None 33 | 34 | @patch('pathlib.Path.exists') 35 | def test_cache_get_nonexistent(mock_exists, cache): 36 | mock_exists.return_value = False 37 | result = cache.get('nonexistent_key') 38 | assert result is None 39 | 40 | @patch('builtins.open', new_callable=mock_open) 41 | @patch('json.dump') 42 | def test_cache_set(mock_json_dump, mock_file, cache): 43 | cache.set('test_key', 'test_value') 44 | mock_file.assert_called_once_with(Path('/tmp/hermes_cache/test_key.json'), 'w') 45 | mock_json_dump.assert_called_once_with({'transcription': 'test_value'}, mock_file()) 46 | 47 | def test_cache_set_disabled(): 48 | disabled_cache = Cache({'enabled': False}) 49 | disabled_cache.set('test_key', 'test_value') 50 | # No assertion needed, just make sure it doesn't raise an exception 51 | 52 | @patch('pathlib.Path.exists') 53 | @patch('builtins.open', new_callable=mock_open, read_data='{"transcription": "old_value"}') 54 | @patch('json.dump') 55 | def test_cache_update(mock_json_dump, mock_file, mock_exists, cache): 56 | mock_exists.return_value = True 57 | cache.set('test_key', 'new_value') 58 | mock_file.assert_any_call(Path('/tmp/hermes_cache/test_key.json'), 'w') 59 | mock_json_dump.assert_called_once_with({'transcription': 'new_value'}, mock_file()) 60 | 61 | @patch('pathlib.Path.exists') 62 | @patch('pathlib.Path.unlink') 63 | def test_cache_clear(mock_unlink, mock_exists, cache): 64 | mock_exists.return_value = True 65 | cache.clear() 66 | 67 | def test_cache_clear_disabled(): 68 | disabled_cache = Cache({'enabled': False}) 69 | disabled_cache.clear() 70 | # No assertion needed, just make sure it doesn't raise an exception -------------------------------------------------------------------------------- /tests/test_cli.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | from unittest.mock import patch 3 | from hermes.cli import parse_args, main 4 | 5 | def test_parse_args(): 6 | known_args, extra_args = parse_args(['test_source', '-p', 
'groq', '-m', 'test_model', '--response_format', 'json']) 7 | assert known_args.source == 'test_source' 8 | assert known_args.provider == 'groq' 9 | assert known_args.model == 'test_model' 10 | assert known_args.response_format == 'json' 11 | assert extra_args == {} 12 | 13 | @patch('hermes.cli.transcribe') 14 | def test_main_success(mock_transcribe): 15 | mock_transcribe.return_value = {'transcription': 'Test transcription'} 16 | with patch('sys.argv', ['hermes', 'test_source']): 17 | main() 18 | mock_transcribe.assert_called_once() 19 | 20 | @patch('hermes.cli.transcribe') 21 | def test_main_with_output_file(mock_transcribe, tmp_path): 22 | mock_transcribe.return_value = {'transcription': 'Test transcription'} 23 | output_file = tmp_path / "output.txt" 24 | with patch('sys.argv', ['hermes', 'test_source', '-o', str(output_file)]): 25 | main() 26 | assert output_file.read_text() == 'Test transcription' 27 | 28 | @patch('hermes.cli.transcribe') 29 | def test_main_error(mock_transcribe, capsys): 30 | mock_transcribe.side_effect = Exception("Test error") 31 | with patch('sys.argv', ['hermes', 'test_source']): 32 | with pytest.raises(SystemExit): 33 | main() 34 | captured = capsys.readouterr() 35 | assert "Error: Test error" in captured.err 36 | 37 | @patch('hermes.cli.transcribe') 38 | def test_main_with_llm_processing(mock_transcribe): 39 | mock_transcribe.return_value = { 40 | 'transcription': 'Test transcription', 41 | 'llm_processed': 'Processed result' 42 | } 43 | with patch('sys.argv', ['hermes', 'test_source', '--llm_prompt', 'Summarize']): 44 | main() 45 | mock_transcribe.assert_called_once_with( 46 | source='test_source', 47 | provider='groq', 48 | force=False, 49 | llm_prompt='Summarize', 50 | model=None, 51 | response_format='text' 52 | ) -------------------------------------------------------------------------------- /tests/test_config.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | from unittest.mock import patch, mock_open 3 | from hermes.config import load_config, DEFAULT_CONFIG 4 | import os 5 | 6 | @pytest.fixture 7 | def mock_config_file(): 8 | config_content = """ 9 | llm: 10 | provider: test_provider 11 | model: test_model 12 | api_key: test_key 13 | transcription: 14 | provider: test_transcription_provider 15 | """ 16 | return mock_open(read_data=config_content) 17 | 18 | def test_load_config_default(): 19 | with patch('os.path.exists', return_value=False): 20 | config = load_config() 21 | assert config == DEFAULT_CONFIG 22 | 23 | def test_load_config_custom(mock_config_file): 24 | with patch('os.path.exists', return_value=True), \ 25 | patch('builtins.open', mock_config_file): 26 | config = load_config() 27 | assert config['llm']['provider'] == 'test_provider' 28 | assert config['llm']['model'] == 'test_model' 29 | assert config['transcription']['provider'] == 'test_transcription_provider' 30 | 31 | def test_load_config_env_vars(): 32 | with patch('os.path.exists', return_value=False), \ 33 | patch.dict('os.environ', {'GROQ_API_KEY': 'test_key'}, clear=True): 34 | config = load_config() 35 | assert config['llm']['api_key'] == os.environ['GROQ_API_KEY'] 36 | 37 | # Add more tests as needed -------------------------------------------------------------------------------- /tests/test_core.py: -------------------------------------------------------------------------------- 1 | import os, sys 2 | # Append the parent directory to the sys.path 3 | 
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | import pytest 5 | from unittest.mock import Mock, patch, ANY 6 | from hermes.core import Hermes, transcribe 7 | 8 | @pytest.fixture 9 | def mock_hermes(): 10 | with patch('hermes.core.SourceStrategy'), \ 11 | patch('hermes.core.ProviderStrategy'), \ 12 | patch('hermes.core.Cache'), \ 13 | patch('hermes.core.LLMProcessor'): 14 | yield Hermes() 15 | 16 | def test_hermes_initialization(mock_hermes): 17 | assert mock_hermes.source_strategy is not None 18 | assert mock_hermes.provider_strategy is not None 19 | assert mock_hermes.cache is not None 20 | assert mock_hermes.llm_processor is not None 21 | 22 | def test_hermes_transcribe(mock_hermes): 23 | mock_hermes.cache.get.return_value = None 24 | mock_hermes.source_strategy.get_audio.return_value = b'audio_data' 25 | mock_hermes.provider_strategy.transcribe.return_value = 'Transcription result' 26 | 27 | result = mock_hermes.transcribe('test_source') 28 | 29 | assert result['source'] == 'test_source' 30 | assert result['transcription'] == 'Transcription result' 31 | mock_hermes.source_strategy.get_audio.assert_called_once_with('test_source') 32 | mock_hermes.provider_strategy.transcribe.assert_called_once_with( 33 | b'audio_data', 34 | params={'provider': ANY, 'model': ANY} 35 | ) 36 | mock_hermes.cache.set.assert_called_once() 37 | 38 | def test_hermes_transcribe_cached(mock_hermes): 39 | mock_hermes.cache.get.return_value = {'cached': 'result'} 40 | 41 | result = mock_hermes.transcribe('test_source') 42 | 43 | assert result == {'cached': 'result'} 44 | mock_hermes.source_strategy.get_audio.assert_not_called() 45 | mock_hermes.provider_strategy.transcribe.assert_not_called() 46 | 47 | @patch('hermes.core.Hermes') 48 | def test_transcribe_function(mock_hermes_class): 49 | mock_hermes_instance = Mock() 50 | mock_hermes_class.from_config.return_value = mock_hermes_instance 51 | mock_hermes_instance.transcribe.return_value = {'transcription': 'Test'} 52 | 53 | result = transcribe('test_source', provider='test_provider', force=True) 54 | 55 | assert result['transcription'] == 'Test' 56 | mock_hermes_class.from_config.assert_called_once() 57 | mock_hermes_instance.transcribe.assert_called_once_with('test_source', force=True, response_format='text') 58 | 59 | def test_hermes_process_with_llm(mock_hermes): 60 | mock_hermes.llm_processor.process.return_value = 'Processed result' 61 | 62 | result = mock_hermes.process_with_llm('Test transcription', 'Summarize') 63 | 64 | assert result == 'Processed result' 65 | mock_hermes.llm_processor.process.assert_called_once_with('Test transcription', 'Summarize') 66 | 67 | if __name__ == "__main__": 68 | pytest.main() -------------------------------------------------------------------------------- /tests/test_integration.py: -------------------------------------------------------------------------------- 1 | import os, sys 2 | # Add the parent directory to the sys.path 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | 5 | import pytest 6 | from hermes.core import transcribe 7 | 8 | @pytest.mark.integration 9 | def test_transcribe_local_file(): 10 | result = transcribe('tests/assets/input.mp4', provider='groq') 11 | assert 'transcription' in result 12 | assert isinstance(result['transcription'], str) 13 | assert len(result['transcription']) > 0 14 | 15 | @pytest.mark.integration 16 | def test_transcribe_youtube_video(): 17 | result = transcribe('https://www.youtube.com/watch?v=v=PNulbFECY-I', 
provider='openai') 18 | assert 'transcription' in result 19 | assert isinstance(result['transcription'], str) 20 | assert len(result['transcription']) > 0 21 | 22 | @pytest.mark.integration 23 | def test_transcribe_with_llm_processing(): 24 | result = transcribe('tests/assets/input.mp4', provider='groq', llm_prompt='Summarize this transcription') 25 | assert 'transcription' in result 26 | assert 'llm_processed' in result 27 | assert isinstance(result['llm_processed'], str) 28 | assert len(result['llm_processed']) > 0 29 | 30 | @pytest.mark.integration 31 | def test_transcribe_with_different_response_formats(): 32 | for format in ['json', 'text', 'srt', 'vtt']: 33 | result = transcribe('tests/assets/input.mp4', provider='mlx', response_format=format) 34 | assert 'transcription' in result 35 | if format == 'json': 36 | assert isinstance(result['transcription'], dict) 37 | else: 38 | assert isinstance(result['transcription'], str) 39 | 40 | if __name__ == "__main__": 41 | pytest.main(["-v", __file__]) -------------------------------------------------------------------------------- /tests/test_llm_processor.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | from unittest.mock import patch, Mock 3 | from hermes.utils.llm import LLMProcessor 4 | 5 | @pytest.fixture 6 | def mock_config(): 7 | return { 8 | 'llm': { 9 | 'provider': 'groq', 10 | 'model': 'llama-3.1-8b-instant', 11 | 'api_key': 'test_api_key' 12 | } 13 | } 14 | 15 | def test_llm_processor_initialization(mock_config): 16 | with patch('hermes.utils.llm.CONFIG', mock_config): 17 | processor = LLMProcessor() 18 | assert processor.config == mock_config['llm'] 19 | 20 | def test_llm_processor_missing_api_key(): 21 | with patch('hermes.utils.llm.CONFIG', {'llm': {'provider': 'openai', 'model': 'gpt-3.5-turbo', 'api_key': None}}): 22 | with pytest.raises(ValueError, match="API key for openai is required"): 23 | LLMProcessor() 24 | 25 | @patch('hermes.utils.llm.completion') 26 | def test_llm_processor_process(mock_completion, mock_config): 27 | mock_response = Mock() 28 | mock_response.choices = [Mock(message=Mock(content='Processed result'))] 29 | mock_completion.return_value = mock_response 30 | 31 | with patch('hermes.utils.llm.CONFIG', mock_config): 32 | processor = LLMProcessor() 33 | result = processor.process('Test input', 'Test prompt') 34 | 35 | assert result == 'Processed result' 36 | mock_completion.assert_called_once_with( 37 | model=mock_config['llm']['model'], 38 | messages=[ 39 | {"role": "system", "content": "You are a helpful assistant."}, 40 | {"role": "user", "content": "Test prompt\n\nText: Test input"} 41 | ], 42 | api_key='test_api_key' 43 | ) -------------------------------------------------------------------------------- /tests/test_provider_strategies.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | from unittest.mock import patch, Mock 3 | from hermes.strategies.provider import ProviderStrategy, GroqProviderStrategy, OpenAIProviderStrategy, MLXProviderStrategy 4 | 5 | 6 | def test_get_strategy(): 7 | groq_strategy = ProviderStrategy.get_strategy('groq') 8 | assert isinstance(groq_strategy, GroqProviderStrategy) 9 | 10 | openai_strategy = ProviderStrategy.get_strategy('openai') 11 | assert isinstance(openai_strategy, OpenAIProviderStrategy) 12 | 13 | mlx_strategy = ProviderStrategy.get_strategy('mlx') 14 | assert isinstance(mlx_strategy, MLXProviderStrategy) 15 | 16 | with pytest.raises(ValueError): 17 | 
18 | 
19 | @patch('os.getenv')
20 | @patch('requests.post')
21 | def test_groq_provider_strategy(mock_post, mock_getenv):
22 |     mock_getenv.return_value = 'fake_api_key'
23 |     mock_response = Mock()
24 |     mock_response.text = 'Transcription result'
25 |     mock_post.return_value = mock_response
26 | 
27 |     strategy = GroqProviderStrategy()
28 |     result = strategy.transcribe(b'audio data', {'model': 'test-model', 'response_format': 'text'})
29 | 
30 |     assert result == 'Transcription result'
31 |     mock_post.assert_called_once()
32 |     args, kwargs = mock_post.call_args
33 |     assert args[0] == 'https://api.groq.com/openai/v1/audio/transcriptions'
34 |     assert kwargs['headers']['Authorization'] == 'Bearer fake_api_key'
35 |     assert kwargs['data']['model'] == 'test-model'
36 |     assert kwargs['data']['response_format'] == 'text'
37 | 
38 | @patch('os.getenv')
39 | @patch('openai.Audio.transcribe')
40 | def test_openai_provider_strategy(mock_transcribe, mock_getenv):
41 |     mock_getenv.return_value = 'fake_api_key'
42 |     mock_transcribe.return_value.text = 'OpenAI transcription result'
43 | 
44 |     strategy = OpenAIProviderStrategy()
45 |     result = strategy.transcribe(b'audio data', {'model': 'whisper-1', 'response_format': 'text'})
46 | 
47 |     assert result == 'OpenAI transcription result'
48 |     mock_transcribe.assert_called_once_with(
49 |         model='whisper-1',
50 |         file=('audio.wav', b'audio data'),
51 |         response_format='text'
52 |     )
53 | 
54 | @patch('subprocess.run')
55 | def test_mlx_provider_strategy(mock_run):
56 |     mock_run.return_value.returncode = 0
57 |     mock_run.return_value.stdout = 'MLX transcription result'
58 | 
59 |     with patch('builtins.open', mock_open(read_data='MLX transcription result')):
60 |         strategy = MLXProviderStrategy()
61 |         result = strategy.transcribe(b'audio data', {'model': 'mlx-community/test-model'})
62 | 
63 |     assert result == 'MLX transcription result'
64 |     mock_run.assert_called_once()
65 |     args, kwargs = mock_run.call_args
66 |     assert args[0][0] == 'mlx_whisper'
67 |     assert args[0][2] == '--model'
68 |     assert args[0][3] == 'mlx-community/test-model'
--------------------------------------------------------------------------------
/tests/test_source_strategies.py:
--------------------------------------------------------------------------------
1 | import pytest
2 | from unittest.mock import patch, mock_open
3 | from hermes.strategies.source import SourceStrategy, AutoSourceStrategy
4 | 
5 | def test_get_strategy():
6 |     strategy = SourceStrategy.get_strategy('auto')
7 |     assert isinstance(strategy, AutoSourceStrategy)
8 | 
9 |     with pytest.raises(ValueError):
10 |         SourceStrategy.get_strategy('invalid_type')
11 | 
12 | @patch('hermes.strategies.source.auto.AutoSourceStrategy.is_youtube_url')
13 | @patch('hermes.strategies.source.auto.AutoSourceStrategy.is_web_url')
14 | @patch('hermes.strategies.source.youtube.YouTubeSourceStrategy.get_audio')
15 | @patch('hermes.strategies.source.web.WebSourceStrategy.get_audio')
16 | @patch('hermes.strategies.source.file.FileSourceStrategy.get_audio')
17 | def test_auto_source_strategy(mock_file, mock_web, mock_youtube, mock_is_web, mock_is_youtube):
18 |     strategy = AutoSourceStrategy()
19 | 
20 |     # Test YouTube URL
21 |     mock_is_youtube.return_value = True
22 |     mock_youtube.return_value = b'youtube audio'
23 |     assert strategy.get_audio('https://www.youtube.com/watch?v=PNulbFECY-I') == b'youtube audio'
24 | 
25 |     # Test Web URL
26 |     mock_is_youtube.return_value = False
27 |     mock_is_web.return_value = True
28 |     mock_web.return_value = b'web audio'
29 |     assert strategy.get_audio('https://example.com/audio.mp3') == b'web audio'
30 | 
31 |     # Test File
32 |     mock_is_youtube.return_value = False
33 |     mock_is_web.return_value = False
34 |     mock_file.return_value = b'file audio'
35 |     assert strategy.get_audio('path/to/audio.mp3') == b'file audio'
36 | 
37 | @patch('hermes.strategies.source.auto.AutoSourceStrategy.is_youtube_url')
38 | @patch('hermes.strategies.source.auto.AutoSourceStrategy.is_web_url')
39 | def test_auto_source_strategy_url_detection(mock_is_web, mock_is_youtube):
40 |     strategy = AutoSourceStrategy()
41 | 
42 |     mock_is_youtube.return_value = True
43 |     assert strategy.is_youtube_url('https://www.youtube.com/watch?v=PNulbFECY-I')
44 |     assert strategy.is_youtube_url('https://youtu.be/PNulbFECY-I')
45 | 
46 |     mock_is_youtube.return_value = False
47 |     mock_is_web.return_value = True
48 |     assert strategy.is_web_url('https://example.com/audio.mp3')
49 |     assert strategy.is_web_url('http://example.com/audio.wav')
50 | 
51 |     mock_is_web.return_value = False
52 |     assert not strategy.is_web_url('file:///path/to/audio.mp3')
53 |     assert not strategy.is_web_url('/path/to/audio.mp3')
--------------------------------------------------------------------------------