├── SimplerLLM ├── tools │ ├── __init__.py │ ├── python_func.py │ ├── file_loader.py │ ├── pandas_func.py │ ├── file_functions.py │ ├── email_functions.py │ ├── youtube.py │ ├── generic_loader.py │ └── rapid_api.py ├── utils │ ├── __init__.py │ └── custom_verbose.py ├── prompts │ ├── __init__.py │ ├── hub │ │ ├── __init__.py │ │ └── agentic_prompts.py │ ├── messages_template.py │ └── prompt_builder.py ├── language │ ├── llm_providers │ │ ├── __init__.py │ │ ├── llm_response_models.py │ │ ├── deepseek_llm.py │ │ └── ollama_llm.py │ ├── flow │ │ ├── __init__.py │ │ ├── models.py │ │ └── tool_registry.py │ ├── llm_router │ │ ├── __init__.py │ │ └── models.py │ ├── llm │ │ ├── wrappers │ │ │ └── __init__.py │ │ ├── __init__.py │ │ └── base.py │ ├── guardrails │ │ ├── input_guardrails │ │ │ ├── __init__.py │ │ │ └── prompt_injection.py │ │ ├── output_guardrails │ │ │ └── __init__.py │ │ ├── __init__.py │ │ └── exceptions.py │ ├── llm_brainstorm │ │ └── __init__.py │ ├── llm_feedback │ │ └── __init__.py │ ├── llm_judge │ │ └── __init__.py │ ├── llm_validator │ │ ├── __init__.py │ │ └── models.py │ ├── llm_provider_router │ │ └── __init__.py │ ├── llm_retrieval │ │ └── __init__.py │ ├── llm_clustering │ │ └── __init__.py │ └── __init__.py ├── voice │ ├── stt │ │ ├── wrappers │ │ │ └── __init__.py │ │ ├── providers │ │ │ ├── __init__.py │ │ │ └── stt_response_models.py │ │ ├── __init__.py │ │ └── base.py │ ├── tts │ │ ├── __init__.py │ │ ├── wrappers │ │ │ └── __init__.py │ │ ├── providers │ │ │ ├── __init__.py │ │ │ └── tts_response_models.py │ │ └── base.py │ ├── realtime_voice │ │ ├── wrappers │ │ │ └── __init__.py │ │ ├── providers │ │ │ └── __init__.py │ │ └── __init__.py │ ├── dialogue_generator │ │ └── __init__.py │ ├── voice_chat │ │ ├── __init__.py │ │ ├── conversation.py │ │ └── models.py │ ├── live_voice_chat │ │ ├── __init__.py │ │ └── models.py │ ├── video_transcription │ │ ├── utils │ │ │ ├── __init__.py │ │ │ └── subtitle_formatter.py │ │ ├── __init__.py │ │ └── models.py │ ├── video_dubbing │ │ ├── __init__.py │ │ └── models.py │ └── __init__.py ├── vectors │ ├── vector_providers.py │ └── __init__.py └── image │ ├── generation │ ├── wrappers │ │ └── __init__.py │ ├── providers │ │ ├── __init__.py │ │ └── image_response_models.py │ └── __init__.py │ └── __init__.py ├── Documentation ├── static │ ├── .nojekyll │ └── img │ │ ├── Logo.png │ │ ├── favicon.ico │ │ ├── docusaurus.png │ │ └── docusaurus-social-card.jpg ├── docs │ ├── Advanced Tools │ │ ├── _category_.json │ │ ├── Extract YouTube Data.md │ │ ├── Chunking Methods.md │ │ ├── File Operations.md │ │ ├── Generic RapidAPI Loader.md │ │ └── Search Engine Integration.md │ ├── LLM Interaction │ │ ├── _category_.json │ │ ├── Prompt Builders.md │ │ ├── Getting Started.md │ │ └── Consistent JSON from any LLM.md │ ├── Vector Storage │ │ ├── _category_.json │ │ └── Vector Embeddings.md │ ├── Image Processing │ │ └── _category_.json │ └── Advanced AI Features │ │ └── _category_.json ├── babel.config.js ├── src │ ├── pages │ │ ├── markdown-page.md │ │ ├── index.module.css │ │ └── main_home_page.js │ ├── components │ │ └── HomepageFeatures │ │ │ ├── styles.module.css │ │ │ └── index.js │ └── css │ │ └── custom.css ├── blog │ ├── 2021-08-26-welcome │ │ ├── docusaurus-plushie-banner.jpeg │ │ └── index.md │ ├── 2019-05-28-first-blog-post.md │ ├── tags.yml │ ├── 2021-08-01-mdx-blog-post.mdx │ ├── authors.yml │ └── 2019-05-29-long-blog-post.md ├── .gitignore ├── sidebars.js ├── README.md ├── package.json └── docusaurus.config.js ├── 
.gitattributes ├── cartoon_me.png ├── confident_smirk.png ├── .claude └── settings.local.json ├── .env-example ├── requirements.txt ├── pytest.ini ├── LICENSE ├── generate_smirk_shot.py ├── .gitignore └── setup.py /SimplerLLM/tools/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /SimplerLLM/utils/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /Documentation/static/.nojekyll: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /SimplerLLM/prompts/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_providers/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /cartoon_me.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hassancs91/SimplerLLM/HEAD/cartoon_me.png -------------------------------------------------------------------------------- /confident_smirk.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hassancs91/SimplerLLM/HEAD/confident_smirk.png -------------------------------------------------------------------------------- /Documentation/docs/Advanced Tools/_category_.json: -------------------------------------------------------------------------------- 1 | { 2 | "label": "Advanced Tools", 3 | "position": 4 4 | } -------------------------------------------------------------------------------- /Documentation/docs/LLM Interaction/_category_.json: -------------------------------------------------------------------------------- 1 | { 2 | "label": "LLM Interaction", 3 | "position": 2 4 | } -------------------------------------------------------------------------------- /Documentation/docs/Vector Storage/_category_.json: -------------------------------------------------------------------------------- 1 | { 2 | "label": "Vector Storage", 3 | "position": 3 4 | } -------------------------------------------------------------------------------- /Documentation/docs/Image Processing/_category_.json: -------------------------------------------------------------------------------- 1 | { 2 | "label": "Image Processing", 3 | "position": 5 4 | } 5 | -------------------------------------------------------------------------------- /Documentation/static/img/Logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hassancs91/SimplerLLM/HEAD/Documentation/static/img/Logo.png -------------------------------------------------------------------------------- /Documentation/babel.config.js: -------------------------------------------------------------------------------- 1 | module.exports = { 2 | presets: 
[require.resolve('@docusaurus/core/lib/babel/preset')], 3 | }; 4 | -------------------------------------------------------------------------------- /Documentation/docs/Advanced AI Features/_category_.json: -------------------------------------------------------------------------------- 1 | { 2 | "label": "Advanced AI Features", 3 | "position": 3 4 | } 5 | -------------------------------------------------------------------------------- /Documentation/static/img/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hassancs91/SimplerLLM/HEAD/Documentation/static/img/favicon.ico -------------------------------------------------------------------------------- /SimplerLLM/voice/stt/wrappers/__init__.py: -------------------------------------------------------------------------------- 1 | from .openai_wrapper import OpenAISTT 2 | 3 | __all__ = [ 4 | 'OpenAISTT', 5 | ] 6 | -------------------------------------------------------------------------------- /Documentation/static/img/docusaurus.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hassancs91/SimplerLLM/HEAD/Documentation/static/img/docusaurus.png -------------------------------------------------------------------------------- /SimplerLLM/vectors/vector_providers.py: -------------------------------------------------------------------------------- 1 | from enum import Enum 2 | 3 | 4 | class VectorProvider(Enum): 5 | LOCAL = 1 6 | QDRANT = 2 7 | -------------------------------------------------------------------------------- /Documentation/static/img/docusaurus-social-card.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hassancs91/SimplerLLM/HEAD/Documentation/static/img/docusaurus-social-card.jpg -------------------------------------------------------------------------------- /Documentation/src/pages/markdown-page.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Markdown page example 3 | --- 4 | 5 | # Markdown page example 6 | 7 | You don't need React to write simple standalone pages. 8 | -------------------------------------------------------------------------------- /Documentation/blog/2021-08-26-welcome/docusaurus-plushie-banner.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hassancs91/SimplerLLM/HEAD/Documentation/blog/2021-08-26-welcome/docusaurus-plushie-banner.jpeg -------------------------------------------------------------------------------- /SimplerLLM/voice/stt/providers/__init__.py: -------------------------------------------------------------------------------- 1 | from .stt_response_models import STTFullResponse 2 | from . 
import openai_stt 3 | 4 | __all__ = [ 5 | 'STTFullResponse', 6 | 'openai_stt', 7 | ] 8 | -------------------------------------------------------------------------------- /SimplerLLM/language/flow/__init__.py: -------------------------------------------------------------------------------- 1 | from .flow import MiniAgent 2 | from .models import StepResult, FlowResult 3 | 4 | __all__ = [ 5 | 'MiniAgent', 6 | 'StepResult', 7 | 'FlowResult', 8 | ] 9 | -------------------------------------------------------------------------------- /.claude/settings.local.json: -------------------------------------------------------------------------------- 1 | { 2 | "permissions": { 3 | "allow": [ 4 | "Bash(find:*)", 5 | "Bash(python -c:*)", 6 | "Bash(tree:*)" 7 | ], 8 | "deny": [], 9 | "ask": [] 10 | } 11 | } 12 | -------------------------------------------------------------------------------- /Documentation/src/components/HomepageFeatures/styles.module.css: -------------------------------------------------------------------------------- 1 | .features { 2 | display: flex; 3 | align-items: center; 4 | padding: 2rem 0; 5 | width: 100%; 6 | } 7 | 8 | .featureSvg { 9 | height: 200px; 10 | width: 200px; 11 | } 12 | -------------------------------------------------------------------------------- /SimplerLLM/vectors/__init__.py: -------------------------------------------------------------------------------- 1 | from .vector_db import VectorDB 2 | from .vector_providers import VectorProvider 3 | from .simpler_vector import SimplerVectors, SerializationFormat 4 | 5 | __all__ = ['VectorDB', 'VectorProvider', 'SimplerVectors', 'SerializationFormat'] 6 | -------------------------------------------------------------------------------- /SimplerLLM/voice/stt/__init__.py: -------------------------------------------------------------------------------- 1 | from .base import STT, STTProvider 2 | from .wrappers import OpenAISTT 3 | from .providers import STTFullResponse 4 | 5 | __all__ = [ 6 | 'STT', 7 | 'STTProvider', 8 | 'OpenAISTT', 9 | 'STTFullResponse', 10 | ] 11 | -------------------------------------------------------------------------------- /SimplerLLM/voice/tts/__init__.py: -------------------------------------------------------------------------------- 1 | from .base import TTS, TTSProvider 2 | from .wrappers import OpenAITTS, ElevenLabsTTS 3 | from .providers import TTSFullResponse 4 | 5 | __all__ = [ 6 | 'TTS', 7 | 'TTSProvider', 8 | 'OpenAITTS', 9 | 'ElevenLabsTTS', 10 | 'TTSFullResponse', 11 | ] 12 | -------------------------------------------------------------------------------- /.env-example: -------------------------------------------------------------------------------- 1 | OPENAI_API_KEY = "XXX" 2 | GEMENI_API_KEY = "XXX" 3 | CLAUDE_API_KEY = "XXX" 4 | RAPIDAPI_API_KEY = "XXX" 5 | VALUE_SERP_API_KEY = "XXX" 6 | SERPER_API_KEY = "XXX" 7 | STABILITY_API_KEY = "XXX" 8 | 9 | # Retry Mechanism 10 | MAX_RETRIES = 3 11 | RETRY_DELAY = 2 12 | 13 | 14 | STREAMING_DELAY = 0.1 15 | -------------------------------------------------------------------------------- /SimplerLLM/voice/realtime_voice/wrappers/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | User-facing wrappers for Realtime Voice API providers. 
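
Example import (both wrapper classes are re-exported via __all__ below;
see the provider-specific wrappers for connection and session details):

    from SimplerLLM.voice.realtime_voice.wrappers import (
        OpenAIRealtimeVoice,
        ElevenLabsRealtimeVoice,
    )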
3 | """ 4 | 5 | from .openai_wrapper import OpenAIRealtimeVoice 6 | from .elevenlabs_wrapper import ElevenLabsRealtimeVoice 7 | 8 | __all__ = [ 9 | 'OpenAIRealtimeVoice', 10 | 'ElevenLabsRealtimeVoice' 11 | ] 12 | -------------------------------------------------------------------------------- /Documentation/.gitignore: -------------------------------------------------------------------------------- 1 | # Dependencies 2 | /node_modules 3 | 4 | # Production 5 | /build 6 | 7 | # Generated files 8 | .docusaurus 9 | .cache-loader 10 | 11 | # Misc 12 | .DS_Store 13 | .env.local 14 | .env.development.local 15 | .env.test.local 16 | .env.production.local 17 | 18 | npm-debug.log* 19 | yarn-debug.log* 20 | yarn-error.log* 21 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_router/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | LLM Router package initialization. 3 | 4 | This package provides a system for intelligent content routing using LLMs. 5 | """ 6 | 7 | from .models import RouterResponse, Choice, PromptTemplate 8 | from .router import LLMRouter 9 | 10 | __all__ = ['RouterResponse', 'Choice', 'PromptTemplate', 'LLMRouter'] 11 | -------------------------------------------------------------------------------- /Documentation/blog/2019-05-28-first-blog-post.md: -------------------------------------------------------------------------------- 1 | --- 2 | slug: first-blog-post 3 | title: First Blog Post 4 | authors: [slorber, yangshun] 5 | tags: [hola, docusaurus] 6 | --- 7 | 8 | Lorem ipsum dolor sit amet... 9 | 10 | 11 | 12 | ...consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 13 | -------------------------------------------------------------------------------- /SimplerLLM/voice/tts/wrappers/__init__.py: -------------------------------------------------------------------------------- 1 | from .openai_wrapper import OpenAITTS 2 | 3 | # Optional wrapper - only import if elevenlabs is available 4 | try: 5 | from .elevenlabs_wrapper import ElevenLabsTTS 6 | _has_elevenlabs = True 7 | except ImportError: 8 | _has_elevenlabs = False 9 | ElevenLabsTTS = None 10 | 11 | __all__ = [ 12 | 'OpenAITTS', 13 | ] 14 | 15 | if _has_elevenlabs: 16 | __all__.append('ElevenLabsTTS') 17 | -------------------------------------------------------------------------------- /Documentation/blog/tags.yml: -------------------------------------------------------------------------------- 1 | facebook: 2 | label: Facebook 3 | permalink: /facebook 4 | description: Facebook tag description 5 | 6 | hello: 7 | label: Hello 8 | permalink: /hello 9 | description: Hello tag description 10 | 11 | docusaurus: 12 | label: Docusaurus 13 | permalink: /docusaurus 14 | description: Docusaurus tag description 15 | 16 | hola: 17 | label: Hola 18 | permalink: /hola 19 | description: Hola tag description 20 | -------------------------------------------------------------------------------- /SimplerLLM/image/generation/wrappers/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Image generation wrapper classes. 3 | Provides unified interfaces for different image generation providers. 
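
Example (a usage sketch mirroring generate_smirk_shot.py at the repository
root, which drives the Google wrapper through the top-level ImageGenerator
factory; the prompt and output_path values here are placeholders):

    from SimplerLLM import ImageGenerator, ImageProvider, ImageSize

    img_gen = ImageGenerator.create(provider=ImageProvider.GOOGLE_GEMINI)
    result = img_gen.generate_image(
        prompt="A cartoon character on a clean, simple background",
        size=ImageSize.PORTRAIT_3_4,
        output_format="file",
        output_path="character.png",
    )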
4 | """ 5 | 6 | from .openai_wrapper import OpenAIImageGenerator 7 | from .stability_wrapper import StabilityImageGenerator 8 | from .google_wrapper import GoogleImageGenerator 9 | 10 | __all__ = [ 11 | 'OpenAIImageGenerator', 12 | 'StabilityImageGenerator', 13 | 'GoogleImageGenerator', 14 | ] 15 | -------------------------------------------------------------------------------- /SimplerLLM/image/generation/providers/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Image generation provider implementations. 3 | Contains the actual API calls to different image generation services. 4 | """ 5 | 6 | from .image_response_models import ImageGenerationResponse 7 | from . import openai_image 8 | from . import stability_image 9 | from . import google_image 10 | 11 | __all__ = [ 12 | 'ImageGenerationResponse', 13 | 'openai_image', 14 | 'stability_image', 15 | 'google_image', 16 | ] 17 | -------------------------------------------------------------------------------- /SimplerLLM/voice/dialogue_generator/__init__.py: -------------------------------------------------------------------------------- 1 | from .dialogue_generator import DialogueGenerator 2 | from .models import ( 3 | Dialogue, 4 | DialogueLine, 5 | SpeakerConfig, 6 | DialogueGenerationConfig, 7 | AudioDialogueResult, 8 | DialogueStyle 9 | ) 10 | 11 | __all__ = [ 12 | 'DialogueGenerator', 13 | 'Dialogue', 14 | 'DialogueLine', 15 | 'SpeakerConfig', 16 | 'DialogueGenerationConfig', 17 | 'AudioDialogueResult', 18 | 'DialogueStyle', 19 | ] 20 | -------------------------------------------------------------------------------- /Documentation/src/pages/index.module.css: -------------------------------------------------------------------------------- 1 | /** 2 | * CSS files with the .module.css suffix will be treated as CSS modules 3 | * and scoped locally. 4 | */ 5 | 6 | .heroBanner { 7 | padding: 4rem 0; 8 | text-align: center; 9 | position: relative; 10 | overflow: hidden; 11 | } 12 | 13 | @media screen and (max-width: 996px) { 14 | .heroBanner { 15 | padding: 2rem; 16 | } 17 | } 18 | 19 | .buttons { 20 | display: flex; 21 | align-items: center; 22 | justify-content: center; 23 | } 24 | -------------------------------------------------------------------------------- /SimplerLLM/voice/tts/providers/__init__.py: -------------------------------------------------------------------------------- 1 | from .tts_response_models import TTSFullResponse 2 | from . import openai_tts 3 | 4 | # Optional provider - only import if elevenlabs package is installed 5 | try: 6 | from . 
import elevenlabs_tts 7 | _has_elevenlabs = True 8 | except ImportError: 9 | _has_elevenlabs = False 10 | elevenlabs_tts = None 11 | 12 | __all__ = [ 13 | 'TTSFullResponse', 14 | 'openai_tts', 15 | ] 16 | 17 | if _has_elevenlabs: 18 | __all__.append('elevenlabs_tts') 19 | -------------------------------------------------------------------------------- /SimplerLLM/voice/voice_chat/__init__.py: -------------------------------------------------------------------------------- 1 | from .voice_chat import VoiceChat 2 | from .conversation import ConversationManager 3 | from .models import ( 4 | VoiceChatConfig, 5 | ConversationMessage, 6 | ConversationRole, 7 | VoiceTurnResult, 8 | VoiceChatSession 9 | ) 10 | 11 | __all__ = [ 12 | 'VoiceChat', 13 | 'ConversationManager', 14 | 'VoiceChatConfig', 15 | 'ConversationMessage', 16 | 'ConversationRole', 17 | 'VoiceTurnResult', 18 | 'VoiceChatSession', 19 | ] 20 | -------------------------------------------------------------------------------- /SimplerLLM/voice/realtime_voice/providers/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Provider implementations for Realtime Voice API. 3 | """ 4 | 5 | from .realtime_response_models import ( 6 | RealtimeFullResponse, 7 | RealtimeStreamChunk, 8 | RealtimeSessionInfo, 9 | RealtimeConversationItem, 10 | RealtimeFunctionCallResult 11 | ) 12 | 13 | __all__ = [ 14 | 'RealtimeFullResponse', 15 | 'RealtimeStreamChunk', 16 | 'RealtimeSessionInfo', 17 | 'RealtimeConversationItem', 18 | 'RealtimeFunctionCallResult' 19 | ] 20 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm/wrappers/__init__.py: -------------------------------------------------------------------------------- 1 | from .openai_wrapper import OpenAILLM 2 | from .gemini_wrapper import GeminiLLM 3 | from .anthropic_wrapper import AnthropicLLM 4 | from .ollama_wrapper import OllamaLLM 5 | from .deepseek_wrapper import DeepSeekLLM 6 | from .cohere_wrapper import CohereLLM 7 | from .perplexity_wrapper import PerplexityLLM 8 | 9 | __all__ = [ 10 | 'OpenAILLM', 11 | 'GeminiLLM', 12 | 'AnthropicLLM', 13 | 'OllamaLLM', 14 | 'DeepSeekLLM', 15 | 'CohereLLM', 16 | 'PerplexityLLM', 17 | ] 18 | -------------------------------------------------------------------------------- /SimplerLLM/image/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Image module for SimplerLLM. 3 | Provides image generation and manipulation capabilities across multiple providers. 4 | """ 5 | 6 | from .generation import ( 7 | ImageGenerator, 8 | ImageProvider, 9 | ImageSize, 10 | OpenAIImageGenerator, 11 | StabilityImageGenerator, 12 | GoogleImageGenerator, 13 | ImageGenerationResponse, 14 | ) 15 | 16 | __all__ = [ 17 | 'ImageGenerator', 18 | 'ImageProvider', 19 | 'ImageSize', 20 | 'OpenAIImageGenerator', 21 | 'StabilityImageGenerator', 22 | 'GoogleImageGenerator', 23 | 'ImageGenerationResponse', 24 | ] 25 | -------------------------------------------------------------------------------- /Documentation/blog/2021-08-01-mdx-blog-post.mdx: -------------------------------------------------------------------------------- 1 | --- 2 | slug: mdx-blog-post 3 | title: MDX Blog Post 4 | authors: [slorber] 5 | tags: [docusaurus] 6 | --- 7 | 8 | Blog posts support [Docusaurus Markdown features](https://docusaurus.io/docs/markdown-features), such as [MDX](https://mdxjs.com/). 
9 | 
10 | :::tip
11 | 
12 | Use the power of React to create interactive blog posts.
13 | 
14 | :::
15 | 
16 | {/* truncate */}
17 | 
18 | For example, use JSX to create an interactive button:
19 | 
20 | ```js
21 | <button onClick={() => alert('button clicked!')}>Click me!</button>
22 | ```
23 | 
24 | <button onClick={() => alert('button clicked!')}>Click me!</button>
25 | 
-------------------------------------------------------------------------------- /SimplerLLM/language/llm/__init__.py: --------------------------------------------------------------------------------
1 | from .base import LLM, LLMProvider
2 | from .reliable import ReliableLLM
3 | from .wrappers.openai_wrapper import OpenAILLM
4 | from .wrappers.gemini_wrapper import GeminiLLM
5 | from .wrappers.anthropic_wrapper import AnthropicLLM
6 | from .wrappers.ollama_wrapper import OllamaLLM
7 | from .wrappers.deepseek_wrapper import DeepSeekLLM
8 | from .wrappers.perplexity_wrapper import PerplexityLLM
9 | 
10 | __all__ = [
11 | 'LLM',
12 | 'LLMProvider',
13 | 'ReliableLLM',
14 | 'OpenAILLM',
15 | 'GeminiLLM',
16 | 'AnthropicLLM',
17 | 'OllamaLLM',
18 | 'DeepSeekLLM',
19 | 'PerplexityLLM',
20 | ]
21 | 
-------------------------------------------------------------------------------- /SimplerLLM/voice/live_voice_chat/__init__.py: --------------------------------------------------------------------------------
1 | from .models import LiveVoiceChatConfig
2 | 
3 | # These imports may fail if sounddevice/pynput not available (requires PortAudio)
4 | try:
5 | from .live_voice_chat import LiveVoiceChat
6 | from .audio_recorder import AudioRecorder
7 | from .audio_player import AudioPlayer
8 | _LIVE_VOICE_AVAILABLE = True
9 | except (ImportError, OSError):
10 | LiveVoiceChat = None
11 | AudioRecorder = None
12 | AudioPlayer = None
13 | _LIVE_VOICE_AVAILABLE = False
14 | 
15 | __all__ = ['LiveVoiceChatConfig', '_LIVE_VOICE_AVAILABLE']
16 | if _LIVE_VOICE_AVAILABLE:
17 | __all__.extend(['LiveVoiceChat', 'AudioRecorder', 'AudioPlayer'])
18 | 
-------------------------------------------------------------------------------- /SimplerLLM/voice/video_transcription/utils/__init__.py: --------------------------------------------------------------------------------
1 | """
2 | Utilities for video transcription.
3 | """
4 | from .video_utils import (
5 | extract_audio_from_video,
6 | get_video_duration,
7 | is_youtube_url,
8 | download_youtube_audio,
9 | cleanup_temp_file
10 | )
11 | from .subtitle_formatter import (
12 | format_segments_to_srt,
13 | format_segments_to_vtt,
14 | save_subtitles
15 | )
16 | 
17 | __all__ = [
18 | 'extract_audio_from_video',
19 | 'get_video_duration',
20 | 'is_youtube_url',
21 | 'download_youtube_audio',
22 | 'cleanup_temp_file',
23 | 'format_segments_to_srt',
24 | 'format_segments_to_vtt',
25 | 'save_subtitles'
26 | ]
27 | 
-------------------------------------------------------------------------------- /SimplerLLM/language/guardrails/input_guardrails/__init__.py: --------------------------------------------------------------------------------
1 | """
2 | Input guardrails that are applied before LLM generation.
3 | 
4 | Input guardrails can validate, modify, or block prompts before they
5 | are sent to the LLM. 
Common use cases include: 6 | - Injecting safety instructions 7 | - Blocking prohibited topics 8 | - Detecting PII in user input 9 | - Adding formatting instructions 10 | """ 11 | 12 | from .prompt_injection import PromptInjectionGuardrail 13 | from .topic_filter import TopicFilterGuardrail 14 | from .pii_detection import InputPIIDetectionGuardrail 15 | 16 | __all__ = [ 17 | "PromptInjectionGuardrail", 18 | "TopicFilterGuardrail", 19 | "InputPIIDetectionGuardrail", 20 | ] 21 | -------------------------------------------------------------------------------- /Documentation/blog/authors.yml: -------------------------------------------------------------------------------- 1 | yangshun: 2 | name: Yangshun Tay 3 | title: Front End Engineer @ Facebook 4 | url: https://github.com/yangshun 5 | image_url: https://github.com/yangshun.png 6 | page: true 7 | socials: 8 | x: yangshunz 9 | github: yangshun 10 | 11 | slorber: 12 | name: Sébastien Lorber 13 | title: Docusaurus maintainer 14 | url: https://sebastienlorber.com 15 | image_url: https://github.com/slorber.png 16 | page: 17 | # customize the url of the author page at /blog/authors/ 18 | permalink: '/all-sebastien-lorber-articles' 19 | socials: 20 | x: sebastienlorber 21 | linkedin: sebastienlorber 22 | github: slorber 23 | newsletter: https://thisweekinreact.com 24 | -------------------------------------------------------------------------------- /SimplerLLM/image/generation/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Image generation module. 3 | Provides unified interface for generating images from text prompts 4 | across multiple providers (OpenAI DALL-E, Stability AI, Google Gemini, etc.). 5 | """ 6 | 7 | from .base import ImageGenerator, ImageProvider, ImageSize 8 | from .wrappers.openai_wrapper import OpenAIImageGenerator 9 | from .wrappers.stability_wrapper import StabilityImageGenerator 10 | from .wrappers.google_wrapper import GoogleImageGenerator 11 | from .providers.image_response_models import ImageGenerationResponse 12 | 13 | __all__ = [ 14 | 'ImageGenerator', 15 | 'ImageProvider', 16 | 'ImageSize', 17 | 'OpenAIImageGenerator', 18 | 'StabilityImageGenerator', 19 | 'GoogleImageGenerator', 20 | 'ImageGenerationResponse', 21 | ] 22 | -------------------------------------------------------------------------------- /SimplerLLM/language/guardrails/output_guardrails/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Output guardrails that are applied after LLM generation. 3 | 4 | Output guardrails can validate, modify, or block responses after they 5 | are generated by the LLM. 
Common use cases include:
6 | - Content safety and moderation
7 | - PII detection and redaction
8 | - Format validation
9 | - Length constraints
10 | - Toxicity filtering
11 | """
12 | 
13 | from .format_validator import FormatValidatorGuardrail
14 | from .pii_detection import OutputPIIDetectionGuardrail
15 | from .content_safety import ContentSafetyGuardrail
16 | from .length_validator import LengthValidatorGuardrail
17 | 
18 | __all__ = [
19 | "FormatValidatorGuardrail",
20 | "OutputPIIDetectionGuardrail",
21 | "ContentSafetyGuardrail",
22 | "LengthValidatorGuardrail",
23 | ]
24 | 
-------------------------------------------------------------------------------- /SimplerLLM/voice/video_transcription/__init__.py: --------------------------------------------------------------------------------
1 | """
2 | Video transcription and multi-language caption generation.
3 | 
4 | This module provides functionality for:
5 | - Transcribing videos (local files or YouTube URLs) to text with timing
6 | - Generating multi-language captions using LLM translation
7 | - Exporting captions in SRT and VTT formats
8 | """
9 | 
10 | from .base import VideoTranscriber
11 | from .caption_generator import MultiLanguageCaptionGenerator
12 | from .models import (
13 | VideoTranscriptionResult,
14 | CaptionSegment,
15 | LanguageCaptions,
16 | MultiLanguageCaptionsResult
17 | )
18 | 
19 | __all__ = [
20 | 'VideoTranscriber',
21 | 'MultiLanguageCaptionGenerator',
22 | 'VideoTranscriptionResult',
23 | 'CaptionSegment',
24 | 'LanguageCaptions',
25 | 'MultiLanguageCaptionsResult'
26 | ]
27 | 
-------------------------------------------------------------------------------- /requirements.txt: --------------------------------------------------------------------------------
1 | aiohttp>=3.9
2 | cohere>=5.0
3 | duckduckgo_search>=5.3
4 | elevenlabs>=2.22.0
5 | lxml_html_clean>=0.1
6 | newspaper3k>=0.2
7 | numpy>=1.26
8 | openai>=1.59
9 | google-genai>=1.52.0
10 | Pillow>=10.0
11 | pydantic>=2.10
12 | PyPDF2>=3.0
13 | python-dotenv>=1.0
14 | python_docx>=1.1
15 | requests>=2.31
16 | youtube_transcript_api>=0.6
17 | colorama>=0.4
18 | scipy>=1.15
19 | tiktoken>=0.9
20 | qdrant-client==1.14.3
21 | voyageai==0.3.3
22 | pydub>=0.25.1
23 | 
24 | # sounddevice>=0.4.6 # Optional - for live voice features. Requires PortAudio. Install with: pip install SimplerLLM[voice]
25 | # pynput>=1.7.6 # Optional - for push-to-talk. Install with: pip install SimplerLLM[voice]
26 | # pygame>=2.5.0 # Optional - only needed for AudioPlayer file playback. 
Install with: pip install pygame>=2.5.0
27 | moviepy>=1.0.3
28 | yt-dlp>=2023.3.4
-------------------------------------------------------------------------------- /pytest.ini: --------------------------------------------------------------------------------
1 | [pytest]
2 | # Pytest configuration for SimplerLLM
3 | 
4 | # Test discovery patterns
5 | python_files = test_*.py
6 | python_classes = Test*
7 | python_functions = test_*
8 | 
9 | # Test paths
10 | testpaths = tests
11 | 
12 | # Output options
13 | addopts =
14 | -v
15 | --strict-markers
16 | --tb=short
17 | --disable-warnings
18 | 
19 | # Markers for test categorization
20 | markers =
21 | unit: Unit tests (fast, no external dependencies)
22 | integration: Integration tests (may require external services)
23 | live: Tests requiring live Qdrant server
24 | slow: Slow running tests
25 | qdrant: Qdrant-related tests
26 | 
27 | # Logging
28 | log_cli = false
29 | log_cli_level = INFO
30 | 
31 | # Coverage options (if pytest-cov is installed)
32 | # addopts = --cov=SimplerLLM --cov-report=html --cov-report=term-missing
33 | 
34 | # Minimum required pytest version (not the Python version)
35 | minversion = 3.6
36 | 
-------------------------------------------------------------------------------- /Documentation/sidebars.js: --------------------------------------------------------------------------------
1 | /**
2 | * Creating a sidebar enables you to:
3 | - create an ordered group of docs
4 | - render a sidebar for each doc of that group
5 | - provide next/previous navigation
6 | 
7 | The sidebars can be generated from the filesystem, or explicitly defined here.
8 | 
9 | Create as many sidebars as you want.
10 | */
11 | 
12 | // @ts-check
13 | 
14 | /** @type {import('@docusaurus/plugin-content-docs').SidebarsConfig} */
15 | const sidebars = {
16 | // By default, Docusaurus generates a sidebar from the docs folder structure
17 | tutorialSidebar: [{type: 'autogenerated', dirName: '.'}],
18 | 
19 | // But you can create a sidebar manually
20 | /*
21 | tutorialSidebar: [
22 | 'intro',
23 | 'hello',
24 | {
25 | type: 'category',
26 | label: 'Tutorial',
27 | items: ['tutorial-basics/create-a-document'],
28 | },
29 | ],
30 | */
31 | };
32 | 
33 | export default sidebars;
34 | 
-------------------------------------------------------------------------------- /Documentation/README.md: --------------------------------------------------------------------------------
1 | # Website
2 | 
3 | This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
4 | 
5 | ### Installation
6 | 
7 | ```
8 | $ yarn
9 | ```
10 | 
11 | ### Local Development
12 | 
13 | ```
14 | $ yarn start
15 | ```
16 | 
17 | This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
18 | 
19 | ### Build
20 | 
21 | ```
22 | $ yarn build
23 | ```
24 | 
25 | This command generates static content into the `build` directory and can be served using any static content hosting service.
26 | 
27 | ### Deployment
28 | 
29 | Using SSH:
30 | 
31 | ```
32 | $ USE_SSH=true yarn deploy
33 | ```
34 | 
35 | Not using SSH:
36 | 
37 | ```
38 | $ GIT_USER=<Your GitHub username> yarn deploy
39 | ```
40 | 
41 | If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch. 
42 | -------------------------------------------------------------------------------- /Documentation/blog/2021-08-26-welcome/index.md: -------------------------------------------------------------------------------- 1 | --- 2 | slug: welcome 3 | title: Welcome 4 | authors: [slorber, yangshun] 5 | tags: [facebook, hello, docusaurus] 6 | --- 7 | 8 | [Docusaurus blogging features](https://docusaurus.io/docs/blog) are powered by the [blog plugin](https://docusaurus.io/docs/api/plugins/@docusaurus/plugin-content-blog). 9 | 10 | Here are a few tips you might find useful. 11 | 12 | 13 | 14 | Simply add Markdown files (or folders) to the `blog` directory. 15 | 16 | Regular blog authors can be added to `authors.yml`. 17 | 18 | The blog post date can be extracted from filenames, such as: 19 | 20 | - `2019-05-30-welcome.md` 21 | - `2019-05-30-welcome/index.md` 22 | 23 | A blog post folder can be convenient to co-locate blog post images: 24 | 25 | ![Docusaurus Plushie](./docusaurus-plushie-banner.jpeg) 26 | 27 | The blog supports tags as well! 28 | 29 | **And if you don't want a blog**: just delete this directory, and use `blog: false` in your Docusaurus config. 30 | -------------------------------------------------------------------------------- /SimplerLLM/prompts/hub/__init__.py: -------------------------------------------------------------------------------- 1 | # SimplerLLM/prompts/hub/__init__.py 2 | 3 | """ 4 | This module provides access to prompt management features, including fetching 5 | prompts from the SimplerLLM Prompt Manager hub. 6 | """ 7 | 8 | from .prompt_manager import ( 9 | fetch_prompt_from_hub, 10 | ManagedPrompt, 11 | PromptManagerError, 12 | AuthenticationError, 13 | PromptNotFoundError, 14 | NetworkError, 15 | MissingAPIKeyError, 16 | VariableError, 17 | list_prompts_from_hub, # Added 18 | PromptSummaryData, # Added 19 | fetch_prompt_version_from_hub, # Added 20 | ) 21 | 22 | __all__ = [ 23 | "fetch_prompt_from_hub", 24 | "ManagedPrompt", 25 | "PromptManagerError", 26 | "AuthenticationError", 27 | "PromptNotFoundError", 28 | "NetworkError", 29 | "MissingAPIKeyError", 30 | "VariableError", 31 | "list_prompts_from_hub", # Added 32 | "PromptSummaryData", # Added 33 | "fetch_prompt_version_from_hub", # Added 34 | ] 35 | -------------------------------------------------------------------------------- /SimplerLLM/tools/python_func.py: -------------------------------------------------------------------------------- 1 | 2 | import traceback 3 | import sys 4 | import io 5 | 6 | 7 | def execute_python_code(input_code): 8 | """ 9 | Executes a given Python code snippet and captures its standard output. 10 | 11 | Parameters: 12 | input_code (str): A string containing the Python code to be executed. 13 | 14 | Returns: 15 | tuple: A tuple containing two elements: 16 | - output (str): Captured standard output of the executed code if successful, None otherwise. 17 | - error_trace (str): Traceback of the exception if an error occurs, None otherwise. 
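18 | 
19 | Examples:
20 | >>> output, error = execute_python_code("print(2 + 2)")
21 | >>> output
22 | '4\n'
23 | >>> error is None
24 | True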
25 | """
26 | old_stdout = sys.stdout
27 | new_stdout = io.StringIO()
28 | sys.stdout = new_stdout
29 | 
30 | try:
31 | exec(input_code, globals())
32 | # Reset standard output
33 | sys.stdout = old_stdout
34 | output = new_stdout.getvalue()
35 | return output, None
36 | except Exception as e:
37 | # Restore standard output so the redirect does not leak after a failure
38 | sys.stdout = old_stdout
39 | error_trace = traceback.format_exc()
40 | return None, error_trace
-------------------------------------------------------------------------------- /SimplerLLM/utils/custom_verbose.py: --------------------------------------------------------------------------------
1 | import colorama
2 | from colorama import Fore, Back, Style
3 | 
4 | # Initialize colorama
5 | colorama.init(autoreset=True)
6 | 
7 | 
8 | 
9 | def verbose_print(message, level='info', end='\n\n'):
10 | """
11 | Prints a message with color and style based on the level of verbosity.
12 | 
13 | Parameters:
14 | message (str): The message to print.
15 | level (str): The verbosity level ('debug', 'info', 'warning', 'error', 'critical'). Default is 'info'.
16 | end (str): The end character to print after the message. Default is '\n\n'.
17 | """
18 | styles = {
19 | 'debug': (Fore.CYAN, Style.DIM),
20 | 'info': (Fore.GREEN, Style.NORMAL),
21 | 'warning': (Fore.YELLOW, Style.BRIGHT),
22 | 'error': (Fore.RED, Style.NORMAL),
23 | 'critical': (Fore.WHITE, Back.RED, Style.BRIGHT)
24 | }
25 | 
26 | color, *style = styles.get(level, (Fore.WHITE, Style.NORMAL))
27 | style = ''.join(style)
28 | print(f"{color}{style}{message}{Style.RESET_ALL}", end=end)
-------------------------------------------------------------------------------- /SimplerLLM/tools/file_loader.py: --------------------------------------------------------------------------------
1 | import csv
2 | import os
3 | from pydantic import BaseModel
4 | from typing import Optional, List
5 | 
6 | 
7 | class CSVDocument(BaseModel):
8 | file_size: Optional[int] = None
9 | row_count: int
10 | column_count: int
11 | total_fields: int
12 | content: List[List[str]] # This represents the CSV data as a list of rows
13 | title: Optional[str] = None
14 | url_or_path: Optional[str] = None
15 | 
16 | 
17 | def read_csv_file(file_path: str) -> CSVDocument:
18 | with open(file_path, "r", encoding="utf-8") as file:
19 | reader = csv.reader(file)
20 | rows = list(reader)
21 | 
22 | file_size = os.path.getsize(file_path)
23 | row_count = len(rows)
24 | column_count = len(rows[0]) if rows else 0
25 | total_fields = sum(len(row) for row in rows)
26 | 
27 | return CSVDocument(
28 | file_size=file_size,
29 | row_count=row_count,
30 | column_count=column_count,
31 | total_fields=total_fields,
32 | content=rows,
33 | url_or_path=file_path,
34 | )
-------------------------------------------------------------------------------- /SimplerLLM/voice/video_dubbing/__init__.py: --------------------------------------------------------------------------------
1 | """
2 | Video dubbing functionality for creating dubbed videos in different languages. 
3 | 4 | This module provides functionality for: 5 | - Transcribing videos and translating to target languages 6 | - Generating TTS audio for translated segments 7 | - Adjusting audio timing to match original video 8 | - Replacing video audio tracks with dubbed audio 9 | """ 10 | 11 | from .base import VideoDubber 12 | from .models import DubbedSegment, DubbingConfig, VideoDubbingResult 13 | from .audio_sync import ( 14 | adjust_audio_speed, 15 | adjust_audio_speed_advanced, 16 | get_audio_duration, 17 | merge_audio_segments 18 | ) 19 | from .video_processor import ( 20 | replace_video_audio, 21 | get_video_info, 22 | trim_video 23 | ) 24 | 25 | __all__ = [ 26 | 'VideoDubber', 27 | 'DubbedSegment', 28 | 'DubbingConfig', 29 | 'VideoDubbingResult', 30 | 'adjust_audio_speed', 31 | 'adjust_audio_speed_advanced', 32 | 'get_audio_duration', 33 | 'merge_audio_segments', 34 | 'replace_video_audio', 35 | 'get_video_info', 36 | 'trim_video' 37 | ] 38 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Hasan Aboul Hasan 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /generate_smirk_shot.py: -------------------------------------------------------------------------------- 1 | from SimplerLLM import ImageGenerator, ImageProvider, ImageSize 2 | 3 | # === CONFIGURE YOUR CHARACTER HERE === 4 | REFERENCE_IMAGE = "cartoon_me.png" # Path to your character image 5 | OUTPUT_FILE = "confident_smirk.png" 6 | 7 | # === PROMPT === 8 | PROMPT = """Generate the exact same cartoon character from the reference image. 9 | Maintain the same art style, colors, proportions, and character design exactly. 10 | The character should be shown in full body view with a clean, simple background. 11 | 12 | Show the character with a confident smirk expression. 13 | One corner of the mouth raised in a knowing smile, eyes slightly narrowed with confidence. 14 | Self-assured, slightly cocky attitude. 
Full body, front-facing view.""" 15 | 16 | # === GENERATE === 17 | img_gen = ImageGenerator.create(provider=ImageProvider.GOOGLE_GEMINI) 18 | result = img_gen.generate_image( 19 | prompt=PROMPT, 20 | reference_images=[REFERENCE_IMAGE], 21 | size=ImageSize.PORTRAIT_3_4, 22 | output_format="file", 23 | output_path=OUTPUT_FILE, 24 | model="gemini-2.5-flash-image-preview", 25 | ) 26 | print(f"Generated: {result}") 27 | -------------------------------------------------------------------------------- /Documentation/src/css/custom.css: -------------------------------------------------------------------------------- 1 | /** 2 | * Any CSS included here will be global. The classic template 3 | * bundles Infima by default. Infima is a CSS framework designed to 4 | * work well for content-centric websites. 5 | */ 6 | 7 | /* You can override the default Infima variables here. */ 8 | :root { 9 | --ifm-color-primary: #2e8555; 10 | --ifm-color-primary-dark: #29784c; 11 | --ifm-color-primary-darker: #277148; 12 | --ifm-color-primary-darkest: #205d3b; 13 | --ifm-color-primary-light: #33925d; 14 | --ifm-color-primary-lighter: #359962; 15 | --ifm-color-primary-lightest: #3cad6e; 16 | --ifm-code-font-size: 95%; 17 | --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.1); 18 | } 19 | 20 | /* For readability concerns, you should choose a lighter palette in dark mode. */ 21 | [data-theme='dark'] { 22 | --ifm-color-primary: #25c2a0; 23 | --ifm-color-primary-dark: #21af90; 24 | --ifm-color-primary-darker: #1fa588; 25 | --ifm-color-primary-darkest: #1a8870; 26 | --ifm-color-primary-light: #29d5b0; 27 | --ifm-color-primary-lighter: #32d8b4; 28 | --ifm-color-primary-lightest: #4fddbf; 29 | --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.3); 30 | } 31 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .env 2 | venv/ 3 | **/__pycache__/ 4 | .pytest_cache/ 5 | /Tests 6 | /my_tests 7 | /old_codes 8 | /dist 9 | /codes 10 | /build 11 | /SimplerLLM/workflow 12 | /SimplerLLM.egg-info 13 | .pypirc 14 | generate_images.py 15 | generate_text.py 16 | chunk_text.py 17 | generate_code.py 18 | generate_embeddings.py 19 | test_embed.py 20 | test_generate.py 21 | test_chunker.py 22 | lib_test.py 23 | /SimplerLLM/tools/web_crawler.py 24 | /SimplerLLM/language/llm_providers/transformers_llm.py 25 | test_agents.py 26 | test_pydantic.py 27 | test_tool.py 28 | /SimplerLLM/tools/predefined_tools.py 29 | /SimplerLLM/agents_deprecated 30 | 31 | test.csv 32 | test_data_agent.py 33 | test_sql_agent.py 34 | vb_ui.py 35 | vd_test.py 36 | test.py 37 | file.txt 38 | Testing/ 39 | tweet_generator.py 40 | test_token_count.py 41 | test_reliable_json.py 42 | test_reliable_json.py 43 | test_llm_router.py 44 | test_deep.py 45 | test_complex_json.py 46 | setup.py 47 | 48 | CLAUDE.md 49 | 50 | output 51 | /output 52 | 53 | 54 | token.md 55 | gpt5_token_issue_writeup.md 56 | 57 | # Website (separate private repository) 58 | simplerllm_website/ 59 | MINIAGENT_ROADMAP.md 60 | mcp_server 61 | examples 62 | 63 | 64 | guides 65 | projects -------------------------------------------------------------------------------- /SimplerLLM/tools/pandas_func.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import io 3 | import traceback 4 | 5 | def execute_pandas_python_code(input_code, df): 6 | """ 7 | Executes a given Python code snippet within the context that includes the 
provided DataFrame. 8 | Captures its standard output and returns it along with any errors. 9 | 10 | Parameters: 11 | input_code (str): A string containing the Python code to be executed. 12 | 13 | Returns: 14 | tuple: A tuple containing two elements: 15 | - output (str): Captured standard output of the executed code if successful, None otherwise. 16 | - error_trace (str): Traceback of the exception if an error occurs, None otherwise. 17 | """ 18 | old_stdout = sys.stdout 19 | new_stdout = io.StringIO() 20 | sys.stdout = new_stdout 21 | 22 | local_vars = {"df": df} # Access the instance's DataFrame 23 | 24 | try: 25 | exec(input_code, globals(), local_vars) # Execute with local_vars as the local context 26 | sys.stdout = old_stdout 27 | output = new_stdout.getvalue() 28 | return output, None 29 | except Exception as e: 30 | sys.stdout = old_stdout 31 | error_trace = traceback.format_exc() 32 | return None, error_trace -------------------------------------------------------------------------------- /Documentation/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "documentation", 3 | "version": "0.0.0", 4 | "private": true, 5 | "scripts": { 6 | "docusaurus": "docusaurus", 7 | "start": "docusaurus start", 8 | "build": "docusaurus build", 9 | "swizzle": "docusaurus swizzle", 10 | "deploy": "docusaurus deploy", 11 | "clear": "docusaurus clear", 12 | "serve": "docusaurus serve", 13 | "write-translations": "docusaurus write-translations", 14 | "write-heading-ids": "docusaurus write-heading-ids" 15 | }, 16 | "dependencies": { 17 | "@docusaurus/core": "3.5.2", 18 | "@docusaurus/plugin-google-analytics": "^3.5.2", 19 | "@docusaurus/plugin-google-gtag": "^3.5.2", 20 | "@docusaurus/preset-classic": "3.5.2", 21 | "@mdx-js/react": "^3.0.0", 22 | "clsx": "^2.0.0", 23 | "prism-react-renderer": "^2.3.0", 24 | "react": "^18.0.0", 25 | "react-dom": "^18.0.0" 26 | }, 27 | "devDependencies": { 28 | "@docusaurus/module-type-aliases": "3.5.2", 29 | "@docusaurus/types": "3.5.2" 30 | }, 31 | "browserslist": { 32 | "production": [ 33 | ">0.5%", 34 | "not dead", 35 | "not op_mini all" 36 | ], 37 | "development": [ 38 | "last 3 chrome version", 39 | "last 3 firefox version", 40 | "last 5 safari version" 41 | ] 42 | }, 43 | "engines": { 44 | "node": ">=18.0" 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /Documentation/src/pages/main_home_page.js: -------------------------------------------------------------------------------- 1 | import clsx from 'clsx'; 2 | import Link from '@docusaurus/Link'; 3 | import useDocusaurusContext from '@docusaurus/useDocusaurusContext'; 4 | import Layout from '@theme/Layout'; 5 | import HomepageFeatures from '@site/src/components/HomepageFeatures'; 6 | 7 | import Heading from '@theme/Heading'; 8 | import styles from './index.module.css'; 9 | 10 | function HomepageHeader() { 11 | const {siteConfig} = useDocusaurusContext(); 12 | return ( 13 |
<header className={clsx('hero hero--primary', styles.heroBanner)}>
14 | <div className="container">
15 | <Heading as="h1" className="hero__title">
16 | {siteConfig.title}
17 | </Heading>
18 | <p className="hero__subtitle">{siteConfig.tagline}</p>
19 | <div className={styles.buttons}>
20 | {/* <Link
21 | className="button button--secondary button--lg"
22 | to="/docs/intro">
23 | Docusaurus Tutorial - 5min ⏱️
24 | </Link> */}
25 | </div>
26 | </div>
27 | </header>
28 | );
29 | }
30 | 
31 | export default function Home() {
32 | const {siteConfig} = useDocusaurusContext();
33 | return (
34 | <Layout
35 | title={`Hello from ${siteConfig.title}`}
36 | description="Description will go into a meta tag in <head />">
37 | <HomepageHeader />
38 | <main>
39 | <HomepageFeatures />
40 | </main>
41 | </Layout>
42 | ); 43 | } 44 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_brainstorm/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Recursive Brainstorm - Generate and expand ideas recursively using LLMs. 3 | 4 | This module provides flexible brainstorming capabilities with three generation modes: 5 | - Tree mode: Exponential expansion of all ideas 6 | - Linear mode: Focused refinement of the best ideas 7 | - Hybrid mode: Selective expansion of top-N ideas 8 | 9 | Example: 10 | >>> from SimplerLLM.language import LLM, LLMProvider 11 | >>> from SimplerLLM.language.llm_brainstorm import RecursiveBrainstorm 12 | >>> 13 | >>> llm = LLM.create(LLMProvider.OPENAI, model_name="gpt-4o") 14 | >>> brainstorm = RecursiveBrainstorm( 15 | ... llm=llm, 16 | ... max_depth=3, 17 | ... ideas_per_level=5, 18 | ... mode="tree" 19 | ... ) 20 | >>> 21 | >>> result = brainstorm.brainstorm("Ways to reduce carbon emissions") 22 | >>> print(f"Generated {result.total_ideas} ideas") 23 | >>> print(f"Best idea: {result.overall_best_idea.text}") 24 | """ 25 | 26 | from .recursive_brainstorm import RecursiveBrainstorm 27 | from .models import ( 28 | BrainstormIdea, 29 | BrainstormLevel, 30 | BrainstormIteration, 31 | BrainstormResult, 32 | IdeaGeneration, 33 | IdeaEvaluation, 34 | ) 35 | 36 | __all__ = [ 37 | "RecursiveBrainstorm", 38 | "BrainstormIdea", 39 | "BrainstormLevel", 40 | "BrainstormIteration", 41 | "BrainstormResult", 42 | "IdeaGeneration", 43 | "IdeaEvaluation", 44 | ] 45 | -------------------------------------------------------------------------------- /SimplerLLM/tools/file_functions.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | def save_text_to_file(text, filename="output.txt"): 5 | """ 6 | Saves the given text to a file specified by the filename. 7 | 8 | Parameters: 9 | text (str): The text to be saved to the file. This can be any length and include Unicode characters. 10 | filename (str): The name of the file where the text will be saved. Default is "output.txt". 11 | 12 | Returns: 13 | bool: True if the file is saved successfully, False otherwise. 14 | 15 | Raises: 16 | TypeError: If the provided text is not a string. 17 | OSError: If there are issues with writing to the file system, such as permissions. 18 | 19 | Examples: 20 | >>> save_text_to_file("Hello, world!", "greeting.txt") 21 | True 22 | 23 | >>> save_text_to_file(123, "numbers.txt") 24 | TypeError: Provided text must be a string. 25 | """ 26 | # Check if the provided 'text' is indeed a string 27 | if not isinstance(text, str): 28 | raise TypeError("Provided text must be a string.") 29 | 30 | try: 31 | # Open the file in write mode with encoding set to utf-8 32 | with open(filename, "w", encoding="utf-8") as file: 33 | file.write(text) 34 | return True 35 | except OSError as e: 36 | # Handle exceptions that may occur during file writing 37 | print(f"An error occurred while writing to the file: {e}") 38 | return False 39 | -------------------------------------------------------------------------------- /SimplerLLM/voice/video_transcription/utils/subtitle_formatter.py: -------------------------------------------------------------------------------- 1 | """ 2 | Subtitle formatting utilities for SRT and VTT formats. 
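
Example (sketch; assumes `segments` is a List[CaptionSegment], e.g. taken
from a transcription result produced by this package):

    srt_text = format_segments_to_srt(segments)
    save_subtitles(srt_text, "captions.srt")

    vtt_text = format_segments_to_vtt(segments)
    save_subtitles(vtt_text, "captions.vtt")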
3 | """ 4 | from typing import List 5 | from ..models import CaptionSegment 6 | 7 | 8 | def format_segments_to_srt(segments: List[CaptionSegment]) -> str: 9 | """ 10 | Format caption segments to SRT format. 11 | 12 | Args: 13 | segments: List of CaptionSegment objects 14 | 15 | Returns: 16 | SRT formatted string 17 | """ 18 | srt_content = [] 19 | 20 | for segment in segments: 21 | srt_content.append(segment.to_srt()) 22 | 23 | return "\n".join(srt_content) 24 | 25 | 26 | def format_segments_to_vtt(segments: List[CaptionSegment]) -> str: 27 | """ 28 | Format caption segments to VTT (WebVTT) format. 29 | 30 | Args: 31 | segments: List of CaptionSegment objects 32 | 33 | Returns: 34 | VTT formatted string 35 | """ 36 | vtt_content = ["WEBVTT\n"] 37 | 38 | for segment in segments: 39 | vtt_content.append(segment.to_vtt()) 40 | 41 | return "\n".join(vtt_content) 42 | 43 | 44 | def save_subtitles( 45 | content: str, 46 | output_path: str, 47 | encoding: str = 'utf-8' 48 | ) -> None: 49 | """ 50 | Save subtitle content to a file. 51 | 52 | Args: 53 | content: The formatted subtitle content (SRT or VTT) 54 | output_path: Path where to save the file 55 | encoding: File encoding (default: utf-8) 56 | """ 57 | with open(output_path, 'w', encoding=encoding) as f: 58 | f.write(content) 59 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_feedback/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | LLM Feedback Loop - Iterative self-improvement system. 3 | 4 | This module provides tools for iteratively refining LLM responses through 5 | critique and improvement cycles. Supports multiple architectural patterns: 6 | - Single provider self-critique 7 | - Dual provider (generator + critic) 8 | - Multi-provider rotation 9 | 10 | Main Classes: 11 | - LLMFeedbackLoop: Main class for iterative improvement 12 | - FeedbackResult: Complete result with history and final answer 13 | - IterationResult: Result from a single iteration 14 | - Critique: Structured critique model 15 | - FeedbackConfig: Configuration options 16 | 17 | Example: 18 | ```python 19 | from SimplerLLM.language import LLM, LLMProvider 20 | from SimplerLLM.language.llm_feedback import LLMFeedbackLoop 21 | 22 | # Single provider self-critique 23 | llm = LLM.create(LLMProvider.OPENAI, model_name="gpt-4") 24 | feedback = LLMFeedbackLoop(llm=llm, max_iterations=3) 25 | result = feedback.improve("Explain quantum computing") 26 | 27 | print(f"Improvement: {result.initial_score} → {result.final_score}") 28 | print(result.final_answer) 29 | ``` 30 | """ 31 | 32 | from .feedback_loop import LLMFeedbackLoop 33 | from .models import ( 34 | Critique, 35 | IterationResult, 36 | FeedbackResult, 37 | FeedbackConfig, 38 | TemperatureSchedule, 39 | ) 40 | 41 | __all__ = [ 42 | "LLMFeedbackLoop", 43 | "Critique", 44 | "IterationResult", 45 | "FeedbackResult", 46 | "FeedbackConfig", 47 | "TemperatureSchedule", 48 | ] 49 | -------------------------------------------------------------------------------- /SimplerLLM/voice/stt/providers/stt_response_models.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel 2 | from typing import Any, Optional 3 | 4 | 5 | class STTFullResponse(BaseModel): 6 | """Full response from STT transcription with metadata.""" 7 | 8 | text: str 9 | """The transcribed text""" 10 | 11 | model: str 12 | """The STT model used (e.g., 'whisper-1')""" 13 | 14 | language: 
Optional[str] = None 15 | """Detected or specified language code (e.g., 'en', 'es', 'fr')""" 16 | 17 | process_time: float 18 | """Time taken to transcribe the audio in seconds""" 19 | 20 | provider: Optional[str] = None 21 | """The STT provider used (e.g., 'OPENAI')""" 22 | 23 | audio_file: Optional[str] = None 24 | """Path to the audio file that was transcribed""" 25 | 26 | duration: Optional[float] = None 27 | """Duration of the audio file in seconds""" 28 | 29 | response_format: Optional[str] = None 30 | """Response format used (e.g., 'text', 'json', 'verbose_json')""" 31 | 32 | llm_provider_response: Optional[Any] = None 33 | """Raw response from the provider API""" 34 | 35 | class Config: 36 | json_schema_extra = { 37 | "example": { 38 | "text": "Hello, this is a transcription of the audio file.", 39 | "model": "whisper-1", 40 | "language": "en", 41 | "process_time": 2.45, 42 | "provider": "OPENAI", 43 | "audio_file": "recording.mp3", 44 | "duration": 10.5, 45 | "response_format": "text" 46 | } 47 | } 48 | -------------------------------------------------------------------------------- /SimplerLLM/voice/tts/providers/tts_response_models.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel 2 | from typing import Any, Optional, Union 3 | 4 | 5 | class TTSFullResponse(BaseModel): 6 | """Full response from TTS generation with metadata.""" 7 | 8 | audio_data: Union[bytes, str] 9 | """The generated audio - either bytes or file path""" 10 | 11 | model: str 12 | """The TTS model used (e.g., 'tts-1', 'tts-1-hd')""" 13 | 14 | voice: str 15 | """The voice used (e.g., 'alloy', 'nova', 'shimmer')""" 16 | 17 | format: str 18 | """The audio format (e.g., 'mp3', 'wav', 'opus')""" 19 | 20 | process_time: float 21 | """Time taken to generate the audio in seconds""" 22 | 23 | speed: Optional[float] = 1.0 24 | """The speech speed used (0.25 to 4.0). None if provider doesn't support speed control.""" 25 | 26 | file_size: Optional[int] = None 27 | """Size of the audio data in bytes""" 28 | 29 | output_path: Optional[str] = None 30 | """File path if audio was saved to disk""" 31 | 32 | provider: Optional[str] = None 33 | """The TTS provider used (e.g., 'OPENAI')""" 34 | 35 | llm_provider_response: Optional[Any] = None 36 | """Raw response from the provider API""" 37 | 38 | class Config: 39 | json_schema_extra = { 40 | "example": { 41 | "audio_data": "output/speech.mp3", 42 | "model": "tts-1-hd", 43 | "voice": "alloy", 44 | "format": "mp3", 45 | "process_time": 1.23, 46 | "speed": 1.0, 47 | "file_size": 24576, 48 | "output_path": "output/speech.mp3", 49 | "provider": "OPENAI" 50 | } 51 | } 52 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_judge/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | LLM Judge - Multi-provider orchestration and evaluation system. 3 | 4 | This module provides tools for orchestrating multiple LLM providers, 5 | evaluating their responses, and generating comparative analyses or synthesized answers. 
6 | 7 | Main Classes: 8 | - LLMJudge: Orchestrates multiple providers and evaluates responses 9 | - JudgeMode: Enum for evaluation modes (select_best, synthesize, compare) 10 | - JudgeResult: Complete result with evaluations and metadata 11 | - EvaluationReport: Statistical summary for batch evaluations 12 | 13 | Example: 14 | ```python 15 | from SimplerLLM.language import LLM, LLMProvider 16 | from SimplerLLM.language.llm_judge import LLMJudge, JudgeMode 17 | 18 | # Create providers 19 | providers = [ 20 | LLM.create(LLMProvider.OPENAI, model_name="gpt-4"), 21 | LLM.create(LLMProvider.ANTHROPIC, model_name="claude-sonnet-4"), 22 | ] 23 | judge_llm = LLM.create(LLMProvider.ANTHROPIC, model_name="claude-opus-4") 24 | 25 | # Initialize judge 26 | judge = LLMJudge(providers=providers, judge_llm=judge_llm) 27 | 28 | # Evaluate 29 | result = judge.generate("Explain quantum computing", mode="synthesize") 30 | print(result.final_answer) 31 | print(result.confidence_scores) 32 | ``` 33 | """ 34 | 35 | from .judge import LLMJudge 36 | from .models import ( 37 | JudgeMode, 38 | JudgeResult, 39 | ProviderResponse, 40 | ProviderEvaluation, 41 | EvaluationReport, 42 | RouterSummary, 43 | ) 44 | 45 | __all__ = [ 46 | "LLMJudge", 47 | "JudgeMode", 48 | "JudgeResult", 49 | "ProviderResponse", 50 | "ProviderEvaluation", 51 | "EvaluationReport", 52 | "RouterSummary", 53 | ] 54 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_validator/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | LLM Validator - Multi-provider validation system for AI-generated content. 3 | 4 | This module provides tools for validating AI-generated content using multiple 5 | LLM providers, with configurable aggregation methods and consensus detection. 6 | 7 | Main Classes: 8 | - LLMValidator: Validates content using multiple LLM providers 9 | - AggregationMethod: Enum for score aggregation methods 10 | - ValidationResult: Complete validation result with scores and metadata 11 | - ValidatorScore: Individual validator's score and explanation 12 | 13 | Example: 14 | ```python 15 | from SimplerLLM.language import LLM, LLMProvider 16 | from SimplerLLM.language.llm_validator import LLMValidator 17 | 18 | # Create validators 19 | validators = [ 20 | LLM.create(LLMProvider.OPENAI, model_name="gpt-4o"), 21 | LLM.create(LLMProvider.ANTHROPIC, model_name="claude-3-5-sonnet-20241022"), 22 | ] 23 | 24 | # Initialize validator 25 | validator = LLMValidator(validators=validators) 26 | 27 | # Validate content 28 | result = validator.validate( 29 | content="Paris is the capital of France.", 30 | validation_prompt="Check if the facts are accurate.", 31 | original_question="What is the capital of France?", 32 | ) 33 | print(f"Score: {result.overall_score}") 34 | print(f"Valid: {result.is_valid}") 35 | print(f"Consensus: {result.consensus}") 36 | ``` 37 | """ 38 | 39 | from .validator import LLMValidator 40 | from .models import ( 41 | AggregationMethod, 42 | ValidationResult, 43 | ValidatorScore, 44 | ) 45 | 46 | __all__ = [ 47 | "LLMValidator", 48 | "AggregationMethod", 49 | "ValidationResult", 50 | "ValidatorScore", 51 | ] 52 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_provider_router/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | LLM Provider Router - Smart wrapper for intelligent provider routing. 
3 | 4 | This module provides automatic query classification and provider routing, 5 | making it easy to use multiple LLM providers intelligently. 6 | 7 | Main Classes: 8 | - LLMProviderRouter: Main router class for automatic provider selection 9 | - ProviderConfig: Configuration for individual providers 10 | - RoutingResult: Complete result with answer and metadata 11 | - QueryClassification: Query classification details 12 | 13 | Example: 14 | ```python 15 | from SimplerLLM.language import LLM, LLMProvider 16 | from SimplerLLM.language.llm_provider_router import ( 17 | LLMProviderRouter, 18 | ProviderConfig 19 | ) 20 | 21 | # Configure providers 22 | providers = [ 23 | ProviderConfig( 24 | llm_provider="OPENAI", 25 | llm_model="gpt-4", 26 | specialties=["coding", "technical"], 27 | description="Best for code" 28 | ), 29 | ] 30 | 31 | # Create LLM instances 32 | llm_instances = [LLM.create(LLMProvider.OPENAI, model_name="gpt-4")] 33 | 34 | # Initialize router 35 | router = LLMProviderRouter( 36 | provider_configs=providers, 37 | llm_instances=llm_instances 38 | ) 39 | 40 | # Route and execute 41 | result = router.route("Write a Python function to reverse a string") 42 | print(result.answer) 43 | print(f"Used: {result.provider_used}") 44 | ``` 45 | """ 46 | 47 | from .provider_router import LLMProviderRouter 48 | from .query_classifier import QueryClassifier 49 | from .models import ( 50 | ProviderConfig, 51 | QueryClassification, 52 | RoutingResult, 53 | RouterConfig, 54 | ) 55 | 56 | __all__ = [ 57 | "LLMProviderRouter", 58 | "QueryClassifier", 59 | "ProviderConfig", 60 | "QueryClassification", 61 | "RoutingResult", 62 | "RouterConfig", 63 | ] 64 | -------------------------------------------------------------------------------- /Documentation/src/components/HomepageFeatures/index.js: -------------------------------------------------------------------------------- 1 | import clsx from 'clsx'; 2 | import Heading from '@theme/Heading'; 3 | import styles from './styles.module.css'; 4 | 5 | const FeatureList = [ 6 | { 7 | title: 'Easy to Use', 8 | Svg: require('@site/static/img/undraw_docusaurus_mountain.svg').default, 9 | description: ( 10 | <> 11 | Docusaurus was designed from the ground up to be easily installed and 12 | used to get your website up and running quickly. 13 | 14 | ), 15 | }, 16 | { 17 | title: 'Focus on What Matters', 18 | Svg: require('@site/static/img/undraw_docusaurus_tree.svg').default, 19 | description: ( 20 | <> 21 | Docusaurus lets you focus on your docs, and we'll do the chores. Go 22 | ahead and move your docs into the docs directory. 23 | 24 | ), 25 | }, 26 | { 27 | title: 'Powered by React', 28 | Svg: require('@site/static/img/undraw_docusaurus_react.svg').default, 29 | description: ( 30 | <> 31 | Extend or customize your website layout by reusing React. Docusaurus can 32 | be extended while reusing the same header and footer. 33 | 34 | ), 35 | }, 36 | ]; 37 | 38 | function Feature({Svg, title, description}) { 39 | return ( 40 |
<div className={clsx('col col--4')}>
41 | <div className="text--center">
42 | <Svg className={styles.featureSvg} role="img" />
43 | </div>
44 | <div className="text--center padding-horiz--md">
45 | <Heading as="h3">{title}</Heading>
46 | <p>{description}</p>
47 | </div>
48 | </div>
49 | );
50 | }
51 |
52 | export default function HomepageFeatures() {
53 | return (
54 | <section className={styles.features}>
55 | <div className="container">
56 | <div className="row">
57 | {FeatureList.map((props, idx) => (
58 | <Feature key={idx} {...props} />
59 | ))}
60 | </div>
61 | </div>
62 | </section>
63 | ); 64 | } 65 | -------------------------------------------------------------------------------- /SimplerLLM/voice/live_voice_chat/models.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel, Field 2 | from typing import Optional 3 | from ..voice_chat.models import VoiceChatConfig 4 | 5 | 6 | class LiveVoiceChatConfig(VoiceChatConfig): 7 | """ 8 | Configuration for LiveVoiceChat with microphone input. 9 | 10 | Extends VoiceChatConfig with additional settings for audio recording 11 | and playback. 12 | """ 13 | 14 | # Audio recording settings 15 | sample_rate: int = Field(default=16000, ge=8000, le=48000) 16 | """Audio sample rate in Hz (16000 recommended for STT)""" 17 | 18 | channels: int = Field(default=1, ge=1, le=2) 19 | """Number of audio channels (1=mono, 2=stereo)""" 20 | 21 | audio_dtype: str = 'int16' 22 | """Audio data type ('int16' or 'float32')""" 23 | 24 | # Push-to-talk settings 25 | push_to_talk_key: str = 'space' 26 | """Key to use for push-to-talk recording""" 27 | 28 | max_recording_duration: Optional[float] = None 29 | """Maximum recording duration in seconds (None = unlimited)""" 30 | 31 | # Playback settings 32 | auto_play_response: bool = True 33 | """Automatically play TTS response audio""" 34 | 35 | playback_volume: float = Field(default=1.0, ge=0.0, le=1.0) 36 | """Playback volume (0.0 to 1.0)""" 37 | 38 | # File handling 39 | cleanup_temp_files: bool = True 40 | """Automatically delete temporary audio files after processing""" 41 | 42 | temp_audio_dir: Optional[str] = None 43 | """Directory for temporary audio files (None = system temp)""" 44 | 45 | class Config: 46 | json_schema_extra = { 47 | "example": { 48 | "system_prompt": "You are a helpful voice assistant", 49 | "temperature": 0.7, 50 | "tts_voice": "nova", 51 | "sample_rate": 16000, 52 | "channels": 1, 53 | "push_to_talk_key": "space", 54 | "auto_play_response": True, 55 | "playback_volume": 0.8, 56 | "cleanup_temp_files": True, 57 | "max_history_length": 10 58 | } 59 | } 60 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | 4 | # Read requirements (excluding comments and empty lines) 5 | with open("requirements.txt") as f: 6 | requirements = [ 7 | line.strip() for line in f.read().splitlines() 8 | if line.strip() and not line.strip().startswith('#') 9 | ] 10 | 11 | # Optional dependencies for specific features 12 | extras_require = { 13 | 'voice': ['pygame>=2.5.0', 'sounddevice>=0.4.6', 'pynput>=1.7.6'], # Full voice support 14 | 'live_voice': ['sounddevice>=0.4.6', 'pynput>=1.7.6'], # LiveVoiceChat (requires PortAudio) 15 | 'all': ['pygame>=2.5.0', 'sounddevice>=0.4.6', 'pynput>=1.7.6'] # Install all optional dependencies 16 | } 17 | 18 | # Read the long description from the README file 19 | with open("README.md", encoding="utf-8") as f: 20 | long_description = f.read() 21 | 22 | setup( 23 | name="SimplerLLM", 24 | version="0.3.3.3", 25 | author="Hasan Aboul Hasan", 26 | author_email="hasan@learnwithhasan.com", 27 | description="An easy-to-use Library for interacting with language models.", 28 | long_description=long_description, 29 | long_description_content_type="text/markdown", 30 | url="https://github.com/hassancs91/SimplerLLM", 31 | packages=find_packages(), 32 | install_requires=requirements, 33 | extras_require=extras_require, 34 | 
python_requires=">=3.6",
35 | license='MIT',
36 | keywords="text generation, openai, LLM, RAG",
37 | classifiers=[
38 | "Development Status :: 4 - Beta",
39 | "Intended Audience :: Developers",
40 | "License :: OSI Approved :: MIT License",
41 | "Programming Language :: Python :: 3",
42 | "Programming Language :: Python :: 3.6",
43 | "Programming Language :: Python :: 3.7",
44 | "Programming Language :: Python :: 3.8",
45 | "Programming Language :: Python :: 3.9",
46 | "Programming Language :: Python :: 3.10",
47 | "Programming Language :: Python :: 3.11",
48 | "Programming Language :: Python :: 3.12",
49 | "Topic :: Software Development :: Libraries :: Python Modules",
50 | ],
51 |
52 |
53 | # Add additional fields as necessary
54 | )
55 |
-------------------------------------------------------------------------------- /SimplerLLM/image/generation/providers/image_response_models.py: --------------------------------------------------------------------------------
1 | from pydantic import BaseModel
2 | from typing import Any, Optional, Union
3 |
4 |
5 | class ImageGenerationResponse(BaseModel):
6 | """Full response from image generation with metadata."""
7 |
8 | image_data: Union[bytes, str]
9 | """The generated image - can be bytes, URL, or file path"""
10 |
11 | model: str
12 | """The model used (e.g., 'dall-e-3', 'dall-e-2')"""
13 |
14 | prompt: str
15 | """The original prompt provided by the user"""
16 |
17 | revised_prompt: Optional[str] = None
18 | """The revised/enhanced prompt used by the model (if provided by the model)"""
19 |
20 | size: str
21 | """The image dimensions (e.g., '1024x1024', '1792x1024')"""
22 |
23 | quality: Optional[str] = None
24 | """The quality setting (e.g., 'standard', 'hd' for DALL-E 3)"""
25 |
26 | style: Optional[str] = None
27 | """The style setting (e.g., 'vivid', 'natural' for DALL-E 3)"""
28 |
29 | process_time: float
30 | """Time taken to generate the image in seconds"""
31 |
32 | provider: Optional[str] = None
33 | """The image provider used (e.g., 'OPENAI_DALL_E')"""
34 |
35 | file_size: Optional[int] = None
36 | """Size of the image data in bytes"""
37 |
38 | output_path: Optional[str] = None
39 | """File path if image was saved to disk"""
40 |
41 | llm_provider_response: Optional[Any] = None
42 | """Raw response from the provider API"""
43 |
44 | class Config:
45 | json_schema_extra = {
46 | "example": {
47 | "image_data": "https://oaidalleapiprodscus.blob.core.windows.net/...",
48 | "model": "dall-e-3",
49 | "prompt": "A serene landscape with mountains",
50 | "revised_prompt": "A peaceful mountain landscape at sunset with snow-capped peaks",
51 | "size": "1024x1024",
52 | "quality": "standard",
53 | "style": "vivid",
54 | "process_time": 8.45,
55 | "provider": "OPENAI_DALL_E",
56 | "file_size": 245760,
57 | "output_path": "output/image.png",
58 | }
59 | }
60 |
-------------------------------------------------------------------------------- /Documentation/docs/Advanced Tools/Extract YouTube Data.md: --------------------------------------------------------------------------------
1 | ---
2 | sidebar_position: 3
3 | ---
4 |
5 | # Extract YouTube Data
6 |
7 | The functions in this section are designed to extract the transcript of any YouTube video, along with its timestamps if needed. You can use these capabilities to build powerful APIs, tools, and applications.
8 |
9 | ## YouTube Video Transcript
10 |
11 | The `get_youtube_transcript(video_url)` function takes only the `video_url` and returns the transcript of the YouTube video, formatted as a simple, readable string.
12 |
13 | ### Example Usage
14 |
15 | ```python
16 | from SimplerLLM.tools.youtube import get_youtube_transcript
17 |
18 | video_transcript = get_youtube_transcript("https://www.youtube.com/watch?v=r9PjzmUmk1w")
19 |
20 | print(video_transcript)
21 | ```
22 |
23 | ## YouTube Video Transcript With Timing
24 |
25 | The `get_youtube_transcript_with_timing(video_url)` function also takes only the `video_url` and retrieves the transcript of a YouTube video, including timing information for each line. It returns a list of dictionaries, where each dictionary represents one segment of the transcript and contains the following:
26 | - `text`: The transcript text of a specific segment of the video.
27 | - `start`: The start time of the segment in seconds.
28 | - `duration`: The duration of the segment in seconds.
29 |
30 | ### Example Usage
31 |
32 | ```python
33 | from SimplerLLM.tools.youtube import get_youtube_transcript_with_timing
34 |
35 | video_transcript = get_youtube_transcript_with_timing("https://www.youtube.com/watch?v=r9PjzmUmk1w")
36 |
37 | print(video_transcript)
38 | ```
39 |
40 | Here's the output format for a small section:
41 | ```
42 | [{'text': 'hi friends in this video I will show you', 'start': 0.12, 'duration': 6.08}, {'text': 'how to turn any WordPress website into a', 'start': 2.639, 'duration': 7.481}, {'text': 'full SAS business using only three', 'start': 6.2, 'duration': 7.639}, {'text': 'plugins this is exactly what I did on my', 'start': 10.12, 'duration': 6.56}, {'text': 'website you will see here I have a list', 'start': 13.839, 'duration': 5.401}]
43 | ```
44 |
45 | That's how you can benefit from SimplerLLM to make extracting YouTube data Simpler!
-------------------------------------------------------------------------------- /SimplerLLM/voice/video_dubbing/models.py: --------------------------------------------------------------------------------
1 | """
2 | Pydantic models for video dubbing.
3 | """ 4 | from pydantic import BaseModel, Field 5 | from typing import List, Optional, Dict 6 | 7 | 8 | class DubbedSegment(BaseModel): 9 | """Represents a single dubbed audio segment with timing information.""" 10 | 11 | index: int 12 | start_time: float # Original start time in seconds 13 | end_time: float # Original end time in seconds 14 | original_text: str 15 | translated_text: str 16 | audio_file: Optional[str] = None # Path to generated audio segment 17 | original_duration: float 18 | dubbed_duration: Optional[float] = None # Duration of generated audio 19 | speed_adjustment: Optional[float] = None # Speed factor applied (e.g., 1.2 = 20% faster) 20 | 21 | 22 | class DubbingConfig(BaseModel): 23 | """Configuration for video dubbing.""" 24 | 25 | target_language: str 26 | match_timing: bool = True # Whether to adjust speech speed to match original timing 27 | voice: Optional[str] = None # Voice ID or name (provider-specific) 28 | speed_range: tuple = (0.75, 1.5) # Min and max speed adjustment factors 29 | audio_format: str = "mp3" 30 | sample_rate: int = 44100 31 | 32 | 33 | class VideoDubbingResult(BaseModel): 34 | """Result of video dubbing operation.""" 35 | 36 | original_video_path: str 37 | output_video_path: str 38 | target_language: str 39 | source_language: Optional[str] = None 40 | segments: List[DubbedSegment] 41 | total_segments: int 42 | duration: float # Total video duration 43 | process_time: float 44 | tts_provider: Optional[str] = None 45 | tts_model: Optional[str] = None 46 | average_speed_adjustment: Optional[float] = None 47 | 48 | class Config: 49 | json_schema_extra = { 50 | "example": { 51 | "original_video_path": "video.mp4", 52 | "output_video_path": "video_dubbed_es.mp4", 53 | "target_language": "Spanish", 54 | "source_language": "English", 55 | "total_segments": 15, 56 | "duration": 120.5, 57 | "process_time": 45.2, 58 | "tts_provider": "ELEVENLABS", 59 | "average_speed_adjustment": 1.05 60 | } 61 | } 62 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_retrieval/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | LLM-based hierarchical retrieval module for SimplerLLM. 3 | 4 | This module provides intelligent retrieval through cluster trees using 5 | LLMRouter for navigation decisions, offering explainable and accurate 6 | retrieval without relying on embeddings for the retrieval step. 
7 | 8 | Key Features: 9 | - Hierarchical tree navigation using LLMRouter 10 | - Full explainability with reasoning chains 11 | - Confidence scores at every decision point 12 | - Multi-level cluster traversal 13 | - Performance statistics tracking 14 | 15 | Example Usage: 16 | ```python 17 | from SimplerLLM.language.llm import LLM, LLMProvider 18 | from SimplerLLM.language.llm_router import LLMRouter 19 | from SimplerLLM.language.llm_clustering import LLMClusterer, ChunkReference 20 | from SimplerLLM.language.llm_retrieval import LLMRetriever, RetrievalConfig 21 | 22 | # Setup 23 | llm = LLM.create(provider=LLMProvider.ANTHROPIC, model_name="claude-sonnet-4") 24 | router = LLMRouter(llm) 25 | 26 | # Cluster documents 27 | clusterer = LLMClusterer(llm) 28 | chunks = [ChunkReference(chunk_id=i, text=text) for i, text in enumerate(texts)] 29 | clustering_result = clusterer.cluster(chunks, build_hierarchy=True) 30 | 31 | # Setup retriever 32 | retriever = LLMRetriever(router, clustering_result.tree) 33 | 34 | # Retrieve relevant chunks 35 | response = retriever.retrieve( 36 | query="What are the main AI safety challenges?", 37 | top_k=3 38 | ) 39 | 40 | # Access results 41 | for result in response.results: 42 | print(f"Rank {result.rank}: {result.chunk_text[:100]}...") 43 | print(f"Confidence: {result.confidence:.2f}") 44 | print(f"Path: {' -> '.join(result.cluster_path)}") 45 | print(f"Reasoning: {result.reasoning}") 46 | 47 | # View navigation path 48 | print(response.format_navigation_path()) 49 | ``` 50 | """ 51 | 52 | from .models import ( 53 | # Retrieval models 54 | NavigationStep, 55 | RetrievalResult, 56 | HierarchicalRetrievalResponse, 57 | 58 | # Configuration 59 | RetrievalConfig, 60 | RetrievalStats, 61 | ) 62 | 63 | from .retriever import LLMRetriever 64 | 65 | __all__ = [ 66 | # Main API 67 | "LLMRetriever", 68 | 69 | # Data models 70 | "NavigationStep", 71 | "RetrievalResult", 72 | "HierarchicalRetrievalResponse", 73 | 74 | # Configuration 75 | "RetrievalConfig", 76 | "RetrievalStats", 77 | ] 78 | 79 | __version__ = "0.1.0" 80 | -------------------------------------------------------------------------------- /SimplerLLM/prompts/messages_template.py: -------------------------------------------------------------------------------- 1 | class MessagesTemplate: 2 | def __init__(self): 3 | self.messages = [] 4 | 5 | def add_user_message(self, content): 6 | self.messages.append({"role": "user", "content": content}) 7 | 8 | def add_assistant_message(self, content): 9 | self.messages.append({"role": "assistant", "content": content}) 10 | 11 | def validate_alternation(self): 12 | if not self.messages: 13 | return False, "MessageTemplate is empty." 14 | 15 | if self.messages[0]["role"] != "user": 16 | return False, "MessageTemplate must start with a user message." 17 | 18 | if self.messages[-1]["role"] != "user": 19 | return False, "MessageTemplate must end with a user message." 20 | 21 | for i in range(len(self.messages) - 1): 22 | if self.messages[i]["role"] == self.messages[i + 1]["role"]: 23 | return False, f"Consecutive messages found at index {i} and {i + 1}." 24 | 25 | return True, "MessageTemplate is valid." 
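# Usage sketch of the alternation contract enforced above: a template must
# start and end with a user message and never repeat a role twice in a row.
# The message strings below are placeholders, not part of the library.
#
#     template = MessagesTemplate()
#     template.add_user_message("What is SimplerLLM?")
#     template.add_assistant_message("A wrapper library for LLM providers.")
#     template.add_user_message("How do I install it?")
#     is_valid, detail = template.validate_alternation()
#     # -> (True, "MessageTemplate is valid.")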
26 | 27 | def prepend_messages(self, messages_list): 28 | """Prepends the current messages with a list of messages, ensuring alternation.""" 29 | if not all(isinstance(message, dict) and 'role' in message and 'content' in message for message in messages_list): 30 | raise ValueError("All items in the list must be dictionaries with 'role' and 'content' keys") 31 | 32 | if self.messages and messages_list: 33 | # Check if the first message of this template and the last message of the messages_list are from the same role 34 | if self.messages[0]['role'] == messages_list[-1]['role']: 35 | raise ValueError("Cannot merge messages due to role conflict at the boundary.") 36 | 37 | # Prepend by combining lists with messages_list coming first 38 | self.messages = messages_list + self.messages 39 | 40 | def get_messages(self): 41 | # Validate the message alternation 42 | is_valid, validation_message = self.validate_alternation() 43 | if is_valid: 44 | return self.messages 45 | else: 46 | raise ValueError(validation_message) 47 | 48 | def get_last_message(self): 49 | """Returns the last message if available, otherwise returns None.""" 50 | if not self.messages: 51 | return None 52 | return self.messages[-1] 53 | 54 | def __repr__(self): 55 | return f"MessageTemplate({self.messages})" 56 | 57 | 58 | 59 | -------------------------------------------------------------------------------- /Documentation/docs/LLM Interaction/Prompt Builders.md: -------------------------------------------------------------------------------- 1 | --- 2 | sidebar_position: 4 3 | --- 4 | 5 | # Prompt Template Builder 6 | 7 | Easily create and manage prompt templates with SimplerLLM. This feature allows you to define templates with dynamic placeholders and populate them with single or multiple sets of parameters. 8 | 9 | The `Prompt Template Builder` provides tools to define, customize, and reuse prompt templates. 
Here's how you can use it:
10 |
11 | ## Single Value Prompt Template
12 |
13 | For basic templates with a single set of parameters:
14 |
15 | ```python
16 | from SimplerLLM.prompts.prompt_builder import create_prompt_template
17 |
18 | # Define your prompt template
19 | basic_prompt = "Generate 5 titles for a blog about {topic} and {style}"
20 |
21 | # Create a prompt template
22 | prompt_template = create_prompt_template(basic_prompt)
23 |
24 | # Assign values to the parameters
25 | prompt_template.assign_parms(topic="marketing", style="catchy")
26 |
27 | # Access the populated prompt
28 | print(prompt_template)
29 | ```
30 |
31 | This will output the following: **Generate 5 titles for a blog about marketing and catchy**
32 |
33 | ## Multi-Value Prompt Template
34 |
35 | For working with multiple sets of parameters, use the `create_multi_value_prompts` function:
36 |
37 | ```python
38 | from SimplerLLM.prompts.prompt_builder import create_multi_value_prompts
39 |
40 | # Define your multi-value prompt template
41 | multi_value_prompt_template = """Hello {name}, your next meeting is on {date},
42 | and bring a {object} with you."""
43 |
44 | # Define multiple parameter sets
45 | params_list = [
46 | {"name": "Alice", "date": "January 10th", "object": "dog"},
47 | {"name": "Bob", "date": "January 12th", "object": "bag"},
48 | {"name": "Charlie", "date": "January 15th", "object": "pen"}
49 | ]
50 |
51 | # Create and generate multi-value prompts
52 | multi_value_prompt = create_multi_value_prompts(multi_value_prompt_template)
53 | generated_prompts = multi_value_prompt.generate_prompts(params_list)
54 |
55 | # Access the generated prompts
56 | print("This is the updated first prompt:", generated_prompts[0])
57 | print("This is the updated second prompt:", generated_prompts[1])
58 | print("This is the updated third prompt:", generated_prompts[2])
59 | ```
60 | This will output the following:
61 |
62 | ```bash
63 | This is the updated first prompt: Hello Alice, your next meeting is on January 10th, and bring a dog with you.
64 | This is the updated second prompt: Hello Bob, your next meeting is on January 12th, and bring a bag with you.
65 | This is the updated third prompt: Hello Charlie, your next meeting is on January 15th, and bring a pen with you.
66 | ```
67 |
68 | One more behavior worth knowing: `assign_parms` checks that every placeholder receives a value instead of silently producing a half-filled prompt; see the sketch below.
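This is a minimal sketch of that behavior (the template text here is purely illustrative):

```python
from SimplerLLM.prompts.prompt_builder import create_prompt_template

prompt_template = create_prompt_template("Write a {tone} intro about {topic}")

try:
    prompt_template.assign_parms(tone="friendly")  # "topic" is intentionally missing
except KeyError as error:
    # Prints something like: Missing a required key in the template: 'topic'
    print(f"Template error: {error}")
```

That's how you can benefit from SimplerLLM to make managing and creating prompts Simpler!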
-------------------------------------------------------------------------------- /SimplerLLM/tools/email_functions.py: --------------------------------------------------------------------------------
1 | import ssl
2 | import smtplib
3 | import aiosmtplib
4 | from dotenv import load_dotenv
5 | from email.mime.text import MIMEText
6 | from email.mime.multipart import MIMEMultipart
7 |
8 | load_dotenv(override=True)
9 |
10 | def send_email(subject, message, recipient_email, sender_email, sender_app_pass, sender_host, sender_port=465):
11 | msg = MIMEMultipart()
12 | msg['From'] = sender_email
13 | msg['To'] = recipient_email
14 | msg['Subject'] = subject
15 | msg.attach(MIMEText(message, 'plain'))
16 |
17 | try:
18 | context = ssl.create_default_context()
19 |
20 | if sender_port == 465: # SSL connection
21 | with smtplib.SMTP_SSL(sender_host, sender_port, context=context) as server:
22 | server.login(sender_email, sender_app_pass)
23 | server.sendmail(sender_email, recipient_email, msg.as_string())
24 | else: # STARTTLS connection (port 587)
25 | with smtplib.SMTP(sender_host, sender_port) as server:
26 | server.ehlo()
27 | server.starttls(context=context)
28 | server.ehlo()
29 | server.login(sender_email, sender_app_pass)
30 | server.sendmail(sender_email, recipient_email, msg.as_string())
31 |
32 | print("Email sent successfully!")
33 | except Exception as e:
34 | print(f"Failed to send email: {e}")
35 | raise
36 |
37 | async def send_email_async(subject, message, recipient_email, sender_email, sender_app_pass, sender_host, sender_port=465):
38 | msg = MIMEMultipart()
39 | msg['From'] = sender_email
40 | msg['To'] = recipient_email
41 | msg['Subject'] = subject
42 | msg.attach(MIMEText(message, 'plain'))
43 |
44 | try:
45 | context = ssl.create_default_context()
46 | if sender_port == 465: # SSL connection
47 | await aiosmtplib.send(
48 | msg,
49 | hostname=sender_host,
50 | port=sender_port,
51 | username=sender_email,
52 | password=sender_app_pass,
53 | use_tls=True,
54 | tls_context=context,
55 | )
56 | else: # STARTTLS connection (port 587)
57 | await aiosmtplib.send(
58 | msg,
59 | hostname=sender_host,
60 | port=sender_port,
61 | username=sender_email,
62 | password=sender_app_pass,
63 | start_tls=True,
64 | tls_context=context,
65 | )
66 |
67 | print("Email sent successfully!")
68 | except Exception as e:
69 | print(f"Failed to send email: {e}")
70 | raise
71 |
72 | # Manual smoke test: run this module directly instead of triggering a send on import.
73 | # The values below are placeholders; substitute your own account details.
74 | if __name__ == "__main__":
75 | send_email(
76 | "Test Subject", "Test Body",
77 | "recipient@example.com", "sender@example.com", "your_app_password",
78 | "smtp.gmail.com", sender_port=587
79 | )
80 |
-------------------------------------------------------------------------------- /Documentation/docs/LLM Interaction/Getting Started.md: --------------------------------------------------------------------------------
1 | ---
2 | sidebar_position: 1
3 | ---
4 |
5 | # Getting Started
6 |
7 | The unified LLM interface in SimplerLLM allows you to easily interact with multiple Large Language Models (LLMs) through a single, consistent function.
8 |
9 | Whether you plan on using OpenAI, Google Gemini, Anthropic, an Ollama local model, or even our own LLM Playground, SimplerLLM provides a clear and easy way to integrate and switch between these providers while maintaining a consistent code structure.
10 |
11 | ## Why Use a Unified Interface?
12 |
13 | Managing multiple LLM providers can be challenging, especially when each provider has its own API structure, unique methods, and configurations.
The unified interface solves this problem by standardizing the way you interact with these models, making it easier to:
14 | - **Switch between providers**: You can switch between LLM providers by changing just a few parameters in the same function, keeping the code structure as is.
15 | - **Reduce provider dependency**: If the LLM provider you're using gets shut down or stops working for some reason, you can easily switch to another provider while keeping your code as is.
16 |
17 | ## How It Works
18 |
19 | SimplerLLM’s unified interface is built on the concept of defining a common `LLMProvider` and then creating instances of the `LLM` class based on that provider. The API is designed to be simple and consistent across all supported models.
20 |
21 | ### Setting Up the LLM Instance
22 |
23 | Here’s a basic example that shows how you can easily switch between different providers:
24 |
25 | ```python
26 | from SimplerLLM.language.llm import LLM, LLMProvider
27 |
28 | # For OpenAI
29 | llm_instance = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-3.5-turbo")
30 |
31 | # For Google Gemini
32 | #llm_instance = LLM.create(provider=LLMProvider.GEMINI, model_name="gemini-1.5-flash")
33 |
34 | # For Anthropic Claude
35 | #llm_instance = LLM.create(provider=LLMProvider.ANTHROPIC, model_name="claude-3-5-sonnet-20240620")
36 |
37 | # For Ollama (Local Model)
38 | #llm_instance = LLM.create(provider=LLMProvider.OLLAMA, model_name="ollama-local-model")
39 |
40 | # Generate a response
41 | response = llm_instance.generate_response(prompt="generate a 5 words sentence")
42 | print(response)
43 | ```
44 |
45 | As you can see, switching between LLMs is straightforward: you create an LLM instance by picking the provider and model name you want, and you're ready to use it.
46 |
47 | After that, call the `generate_response` function, which stays the same regardless of the provider you choose, passing your desired prompt to call the API and get the response.
48 |
49 | Finally, print the response to see it in the terminal.
50 |
51 | Once set up, you're ready to generate responses through a consistent, simple API. Because the interface is identical across providers, you can even run the same prompt against several of them, as the sketch below shows.
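Since the call pattern never changes, comparing providers takes only a few lines. The following is a minimal sketch, assuming the `LLM.create` and `generate_response` API shown above and that each provider's API key is already configured in your environment; the model names are illustrative:

```python
from SimplerLLM.language.llm import LLM, LLMProvider

# Same code path for every provider: only the enum value and model name change
providers = [
    (LLMProvider.OPENAI, "gpt-4o"),
    (LLMProvider.ANTHROPIC, "claude-3-5-sonnet-20240620"),
    (LLMProvider.GEMINI, "gemini-1.5-flash"),
]

for provider, model_name in providers:
    llm_instance = LLM.create(provider=provider, model_name=model_name)
    response = llm_instance.generate_response(prompt="generate a 5 words sentence")
    print(f"{provider.name} ({model_name}): {response}")
```

If one provider is ever deprecated or unavailable, you remove one tuple from the list and the rest of the code is untouched.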
-------------------------------------------------------------------------------- /SimplerLLM/language/flow/models.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel, Field 2 | from typing import Optional, List, Any 3 | from datetime import datetime 4 | 5 | 6 | class StepResult(BaseModel): 7 | """Result from a single step execution in a flow.""" 8 | step_number: int = Field(description="The step number in the flow (1-indexed)") 9 | step_type: str = Field(description="Type of step: 'llm' or 'tool'") 10 | input_data: Any = Field(description="Input data for this step") 11 | output_data: Any = Field(description="Output data from this step (can be str, dict, or Pydantic model)") 12 | duration_seconds: float = Field(description="Time taken to execute this step in seconds") 13 | tool_used: Optional[str] = Field(default=None, description="Name of the tool used (if step_type is 'tool')") 14 | prompt_used: Optional[str] = Field(default=None, description="Prompt used (if step_type is 'llm')") 15 | output_model_class: Optional[str] = Field(default=None, description="Name of the Pydantic model class used for JSON output (if applicable)") 16 | error: Optional[str] = Field(default=None, description="Error message if the step failed") 17 | 18 | class Config: 19 | json_schema_extra = { 20 | "example": { 21 | "step_number": 1, 22 | "step_type": "tool", 23 | "input_data": "https://youtube.com/watch?v=xyz", 24 | "output_data": "Video transcript text...", 25 | "duration_seconds": 2.5, 26 | "tool_used": "youtube_transcript", 27 | "prompt_used": None, 28 | "error": None 29 | } 30 | } 31 | 32 | 33 | class FlowResult(BaseModel): 34 | """Result from a complete flow execution.""" 35 | agent_name: str = Field(description="Name of the mini agent that executed the flow") 36 | total_steps: int = Field(description="Total number of steps executed") 37 | steps: List[StepResult] = Field(description="List of step results in execution order") 38 | total_duration_seconds: float = Field(description="Total time taken to execute the flow in seconds") 39 | final_output: Any = Field(description="Final output from the last step") 40 | success: bool = Field(description="Whether the flow completed successfully") 41 | error: Optional[str] = Field(default=None, description="Error message if the flow failed") 42 | executed_at: datetime = Field(default_factory=datetime.now, description="Timestamp of execution") 43 | 44 | class Config: 45 | json_schema_extra = { 46 | "example": { 47 | "agent_name": "YouTube Summarizer", 48 | "total_steps": 2, 49 | "steps": [], 50 | "total_duration_seconds": 5.8, 51 | "final_output": "Summary of the video...", 52 | "success": True, 53 | "error": None, 54 | "executed_at": "2025-01-15T10:30:00" 55 | } 56 | } 57 | -------------------------------------------------------------------------------- /SimplerLLM/language/guardrails/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | SimplerLLM Guardrails Module 3 | 4 | This module provides a comprehensive guardrails system for SimplerLLM that adds 5 | safety, quality, and compliance checks to LLM interactions. 6 | 7 | Guardrails can be applied before (input) or after (output) LLM generation to: 8 | - Enforce safety and ethical guidelines 9 | - Detect and handle PII 10 | - Validate output formats 11 | - Filter prohibited content 12 | - Ensure quality constraints 13 | 14 | Example: 15 | >>> from SimplerLLM.language.guardrails import ( 16 | ... 
GuardrailsLLM, 17 | ... PromptInjectionGuardrail, 18 | ... OutputPIIDetectionGuardrail 19 | ... ) 20 | >>> from SimplerLLM.language.llm.base import LLM, LLMProvider 21 | >>> 22 | >>> # Create base LLM 23 | >>> llm = LLM.create(provider=LLMProvider.OPENAI, api_key="...") 24 | >>> 25 | >>> # Add guardrails 26 | >>> guardrailed_llm = GuardrailsLLM( 27 | ... llm_instance=llm, 28 | ... input_guardrails=[PromptInjectionGuardrail()], 29 | ... output_guardrails=[OutputPIIDetectionGuardrail(config={"action_on_detect": "redact"})] 30 | ... ) 31 | >>> 32 | >>> # Use like normal LLM 33 | >>> response = guardrailed_llm.generate_response( 34 | ... prompt="Hello!", 35 | ... full_response=True 36 | ... ) 37 | >>> print(response.guardrails_metadata) 38 | """ 39 | 40 | # Core classes 41 | from .base import ( 42 | GuardrailAction, 43 | GuardrailResult, 44 | InputGuardrail, 45 | OutputGuardrail, 46 | CompositeGuardrail, 47 | ) 48 | 49 | # Wrapper 50 | from .wrapper import GuardrailsLLM 51 | 52 | # Exceptions 53 | from .exceptions import ( 54 | GuardrailException, 55 | GuardrailBlockedException, 56 | GuardrailValidationException, 57 | GuardrailConfigurationException, 58 | GuardrailTimeoutException, 59 | ) 60 | 61 | # Input guardrails 62 | from .input_guardrails import ( 63 | PromptInjectionGuardrail, 64 | TopicFilterGuardrail, 65 | InputPIIDetectionGuardrail, 66 | ) 67 | 68 | # Output guardrails 69 | from .output_guardrails import ( 70 | FormatValidatorGuardrail, 71 | OutputPIIDetectionGuardrail, 72 | ContentSafetyGuardrail, 73 | LengthValidatorGuardrail, 74 | ) 75 | 76 | __all__ = [ 77 | # Core classes 78 | "GuardrailAction", 79 | "GuardrailResult", 80 | "InputGuardrail", 81 | "OutputGuardrail", 82 | "CompositeGuardrail", 83 | # Wrapper 84 | "GuardrailsLLM", 85 | # Exceptions 86 | "GuardrailException", 87 | "GuardrailBlockedException", 88 | "GuardrailValidationException", 89 | "GuardrailConfigurationException", 90 | "GuardrailTimeoutException", 91 | # Input guardrails 92 | "PromptInjectionGuardrail", 93 | "TopicFilterGuardrail", 94 | "InputPIIDetectionGuardrail", 95 | # Output guardrails 96 | "FormatValidatorGuardrail", 97 | "OutputPIIDetectionGuardrail", 98 | "ContentSafetyGuardrail", 99 | "LengthValidatorGuardrail", 100 | ] 101 | 102 | # Version 103 | __version__ = "1.0.0" 104 | -------------------------------------------------------------------------------- /SimplerLLM/voice/realtime_voice/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Realtime Voice API integration for SimplerLLM. 3 | 4 | This module provides a unified interface for building AI voice agents using 5 | realtime voice APIs including OpenAI Realtime API and ElevenLabs Conversational AI. 6 | 7 | Example (OpenAI): 8 | >>> from SimplerLLM import RealtimeVoice, RealtimeVoiceProvider 9 | >>> realtime = RealtimeVoice.create( 10 | ... provider=RealtimeVoiceProvider.OPENAI, 11 | ... model="gpt-4o-realtime-preview-2024-10-01", 12 | ... voice="alloy" 13 | ... ) 14 | >>> await realtime.connect() 15 | >>> await realtime.send_text("Hello!") 16 | 17 | Example (ElevenLabs with Custom Voice): 18 | >>> realtime = RealtimeVoice.create( 19 | ... provider=RealtimeVoiceProvider.ELEVENLABS, 20 | ... voice_id="your_cloned_voice_id", 21 | ... model="gpt-4o-mini" 22 | ... 
) 23 | >>> await realtime.connect() 24 | >>> await realtime.send_audio(audio_bytes) 25 | """ 26 | 27 | from .base import RealtimeVoice, RealtimeVoiceProvider 28 | from .models import ( 29 | RealtimeSessionConfig, 30 | ElevenLabsSessionConfig, 31 | RealtimeMessage, 32 | RealtimeResponse, 33 | RealtimeEvent, 34 | RealtimeError, 35 | RealtimeFunctionCall, 36 | RealtimeUsage, 37 | TurnDetectionType, 38 | TurnDetectionConfig, 39 | InputAudioTranscriptionConfig, 40 | Modality, 41 | AudioFormat, 42 | Voice 43 | ) 44 | from .providers import ( 45 | RealtimeFullResponse, 46 | RealtimeStreamChunk, 47 | RealtimeSessionInfo, 48 | RealtimeConversationItem, 49 | RealtimeFunctionCallResult 50 | ) 51 | from .wrappers import OpenAIRealtimeVoice, ElevenLabsRealtimeVoice 52 | from .realtime_voice_chat import RealtimeVoiceChat, RealtimeVoiceChatConfig 53 | from .audio_utils import ( 54 | resample_audio, 55 | resample_24k_to_16k, 56 | resample_16k_to_24k, 57 | AudioResampler 58 | ) 59 | 60 | __all__ = [ 61 | # Base classes 62 | 'RealtimeVoice', 63 | 'RealtimeVoiceProvider', 64 | 65 | # Configuration models 66 | 'RealtimeSessionConfig', 67 | 'ElevenLabsSessionConfig', 68 | 'TurnDetectionConfig', 69 | 'InputAudioTranscriptionConfig', 70 | 71 | # Message and response models 72 | 'RealtimeMessage', 73 | 'RealtimeResponse', 74 | 'RealtimeEvent', 75 | 'RealtimeError', 76 | 'RealtimeFunctionCall', 77 | 'RealtimeUsage', 78 | 79 | # Enums 80 | 'TurnDetectionType', 81 | 'Modality', 82 | 'AudioFormat', 83 | 'Voice', 84 | 85 | # Provider response models 86 | 'RealtimeFullResponse', 87 | 'RealtimeStreamChunk', 88 | 'RealtimeSessionInfo', 89 | 'RealtimeConversationItem', 90 | 'RealtimeFunctionCallResult', 91 | 92 | # Wrappers 93 | 'OpenAIRealtimeVoice', 94 | 'ElevenLabsRealtimeVoice', 95 | 96 | # Voice chat 97 | 'RealtimeVoiceChat', 98 | 'RealtimeVoiceChatConfig', 99 | 100 | # Audio utilities 101 | 'resample_audio', 102 | 'resample_24k_to_16k', 103 | 'resample_16k_to_24k', 104 | 'AudioResampler' 105 | ] 106 | -------------------------------------------------------------------------------- /Documentation/blog/2019-05-29-long-blog-post.md: -------------------------------------------------------------------------------- 1 | --- 2 | slug: long-blog-post 3 | title: Long Blog Post 4 | authors: yangshun 5 | tags: [hello, docusaurus] 6 | --- 7 | 8 | This is the summary of a very long blog post, 9 | 10 | Use a `` comment to limit blog post size in the list view. 11 | 12 | 13 | 14 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 15 | 16 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 17 | 18 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 19 | 20 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 21 | 22 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 23 | 24 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. 
Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 25 | 26 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 27 | 28 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 29 | 30 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 31 | 32 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 33 | 34 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 35 | 36 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 37 | 38 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 39 | 40 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 41 | 42 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 43 | 44 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet 45 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_router/models.py: -------------------------------------------------------------------------------- 1 | """ 2 | Models for the LLM Router system. 
3 | """ 4 | 5 | from typing import Dict, Optional, List 6 | from pydantic import BaseModel, Field 7 | 8 | class RouterChoice(BaseModel): 9 | """Response model for router predictions""" 10 | selected_index: int = Field( 11 | description="Index of the selected choice", 12 | ge=0 13 | ) 14 | confidence_score: float = Field( 15 | description="Confidence score for the selection", 16 | ge=0, 17 | le=1 18 | ) 19 | reasoning: str = Field( 20 | description="Explanation for why this choice was selected" 21 | ) 22 | 23 | class RouterMultiResponse(BaseModel): 24 | choices: List[RouterChoice] = Field( 25 | description="List of selected choices in order of confidence", 26 | max_items=3 27 | ) 28 | 29 | class RouterResponse(BaseModel): 30 | """Response model for router predictions""" 31 | selected_index: int = Field( 32 | description="Index of the selected choice", 33 | ge=0 34 | ) 35 | confidence_score: float = Field( 36 | description="Confidence score for the selection", 37 | ge=0, 38 | le=1 39 | ) 40 | reasoning: str = Field( 41 | description="Explanation for why this choice was selected" 42 | ) 43 | 44 | class Choice: 45 | """Represents a single choice in the router""" 46 | def __init__(self, content: str, metadata: Optional[Dict] = None): 47 | self.content = self._clean_string(content) 48 | self.metadata = metadata or {} 49 | 50 | @staticmethod 51 | def _clean_string(text: str) -> str: 52 | """Clean and normalize string content""" 53 | if not text: 54 | raise ValueError("Choice content cannot be empty") 55 | return " ".join(text.split()) 56 | 57 | def __repr__(self): 58 | return f"Choice(content={self.content[:50]}...)" 59 | 60 | class PromptTemplate: 61 | """Handles prompt generation for the router""" 62 | def __init__(self, template: Optional[str] = None): 63 | self.template = template or self._default_template() 64 | 65 | @staticmethod 66 | def _default_template() -> str: 67 | return """Your task is to choose the best choice from the list below based on the following input: {input} 68 | 69 | Choice List: 70 | {choices} 71 | 72 | Select the most appropriate choice based on relevance to the input. 73 | Provide a clear reasoning for your selection.""" 74 | 75 | @staticmethod 76 | def _default_template_top_k() -> str: 77 | return """Your task is to choose the top {k} most appropriate choices from the list below based on the following input: {input} 78 | 79 | Choice List: 80 | {choices} 81 | 82 | Select the {k} most appropriate choices based on relevance to the input. 83 | Provide reasoning for each selection. 84 | 85 | """ 86 | 87 | def format(self, input_text: str, choices_text: str, k: Optional[int] = None) -> str: 88 | if k is not None: 89 | return self._default_template_top_k().format( 90 | input=input_text, 91 | choices=choices_text, 92 | k=k 93 | ) 94 | return self.template.format( 95 | input=input_text, 96 | choices=choices_text 97 | ) 98 | -------------------------------------------------------------------------------- /SimplerLLM/prompts/hub/agentic_prompts.py: -------------------------------------------------------------------------------- 1 | tool_calling_agent_system_prompt = """ 2 | You are an agent with access to a toolbox. Given a user query, 3 | you will determine which tool, if any, is best suited to answer the query. 4 | 5 | IF NO TOOL IS REQUIRED, just provide a response to the query. 
6 |
7 | The tools available are:
8 | {actions_list}
9 |
10 | \n\n
11 | IF TOOL IS REQUIRED: Return the tool in the following JSON format:
12 | Action:
13 | {{
14 | "function_name": tool_name,
15 | "function_params": {{
16 | "parameter_name": "parameter_value"
17 | }}
18 | }}
19 | """.strip()
20 |
21 | reflection_core_agent_system_prompt = """
22 |
23 | You run in a loop of SELF REFLECTION AND CRITICISM.
24 |
25 | You will self-criticize your own answers to get better answers.
26 |
27 | There will be no human to ask; IT IS ONLY YOU GENERATING BETTER ANSWERS.
28 |
29 | """.strip()
30 |
31 | react_core_agent_system_prompt_test = """
32 |
33 |
34 | You run in a loop of THOUGHT, ACTION, PAUSE, OBSERVATION.
35 |
36 | Use THOUGHT to understand the question you have been asked.
37 | Use ACTION to run one of the actions available to you - then return PAUSE.
38 | OBSERVATION will be the result of running those actions.
39 |
40 | Your available Actions are:
41 | {actions_list}
42 |
43 | To use an action, MAKE SURE to use the following format:
44 | Action:
45 | {{
46 | "function_name": tool_name,
47 | "function_params": {{
48 | "parameter_name": "parameter_value"
49 | }}
50 | }}
51 |
52 | OBSERVATION: the result of the action.
53 |
54 |
55 |
56 | """.strip()
57 |
58 | react_core_agent_system_prompt = """
59 | You run in a loop of THOUGHT, ACTION, PAUSE, OBSERVATION.
60 |
61 | At the end of the loop you output an ANSWER.
62 |
63 | Use THOUGHT to understand the question you have been asked.
64 | Use ACTION to run one of the actions available to you - then return PAUSE.
65 | OBSERVATION will be the result of running those actions.
66 |
67 |
68 | Your available Actions are:
69 | {actions_list}
70 |
71 | To use an action, please use the following format:
72 | Action:
73 | {{
74 | "function_name": tool_name,
75 | "function_params": {{
76 | "parameter_name": "parameter_value"
77 | }}
78 | }}
79 |
80 | OBSERVATION: the result of the action.
81 |
82 |
83 | The ANSWER should be in this JSON format:
84 | Action:
85 | {{
86 | "final_answer": answer,
87 | }}
88 |
89 | """.strip()
-------------------------------------------------------------------------------- /SimplerLLM/prompts/prompt_builder.py: --------------------------------------------------------------------------------
1 | class SimplePrompt:
2 | """
3 | A class for creating and manipulating simple prompt templates.
4 | """
5 |
6 | def __init__(self, template: str):
7 | if not isinstance(template, str):
8 | raise ValueError("Template must be a string")
9 | self.template = template
10 | self.content = '' # Holds the latest filled template
11 |
12 | def assign_parms(self, **kwargs) -> str:
13 | """
14 | Assigns parameters to the template and returns the filled template.
15 | """
16 | try:
17 | self.content = self.template.format(**kwargs)
18 | except KeyError as e:
19 | raise KeyError(f"Missing a required key in the template: {e}")
20 | except Exception as e:
21 | # Catch-all for other exceptions related to string formatting
22 | raise ValueError(f"Error processing the template: {e}")
23 | return self.content
24 |
25 | def update_template(self, new_template: str):
26 | """
27 | Updates the template and clears the latest content.
28 | """
29 | if not isinstance(new_template, str):
30 | raise ValueError("New template must be a string")
31 | self.template = new_template
32 | self.content = ''
33 |
34 | def __str__(self) -> str:
35 | return self.content
36 |
37 | def create_prompt_template(template_string: str) -> SimplePrompt:
38 | """
39 | Factory function to create a SimplePrompt instance.
40 | """
41 | if not isinstance(template_string, str):
42 | raise ValueError("Template string must be a string")
43 | return SimplePrompt(template_string)
44 |
45 |
46 |
47 | class MultiValuePrompt:
48 | """
49 | A class for creating and manipulating prompt templates with multiple sets of parameters.
50 | """
51 |
52 | def __init__(self, template: str):
53 | if not isinstance(template, str):
54 | raise ValueError("Template must be a string")
55 | self.template = template
56 | self.generated_prompts = [] # Holds the generated prompts
57 |
58 | def generate_prompts(self, params_list: list) -> list:
59 | """
60 | Generates prompts for each set of parameters in the params_list.
61 | """
62 | if not all(isinstance(params, dict) for params in params_list):
63 | raise ValueError("Each item in params_list must be a dictionary")
64 |
65 | self.generated_prompts = []
66 | for params in params_list:
67 | try:
68 | filled_prompt = self.template.format(**params)
69 | self.generated_prompts.append(filled_prompt)
70 | except KeyError as e:
71 | raise KeyError(f"Missing a required key in the template: {e}")
72 | except Exception as e:
73 | raise ValueError(f"Error processing the template: {e}")
74 |
75 | return self.generated_prompts
76 |
77 | def __str__(self) -> str:
78 | return "\n".join(self.generated_prompts)
79 |
80 | def create_multi_value_prompts(template_string: str) -> MultiValuePrompt:
81 | """
82 | Factory function to create a MultiValuePrompt instance.
83 | """
84 | if not isinstance(template_string, str):
85 | raise ValueError("Template string must be a string")
86 | return MultiValuePrompt(template_string)
-------------------------------------------------------------------------------- /SimplerLLM/language/llm_clustering/__init__.py: --------------------------------------------------------------------------------
1 | """
2 | LLM-based clustering module for SimplerLLM.
3 |
4 | This module provides intelligent text clustering using LLM semantic understanding
5 | rather than traditional embedding-based approaches. It supports both flat clustering
6 | and hierarchical tree structures.
7 | 8 | Key Features: 9 | - Incremental cluster matching for consistency 10 | - Multi-cluster assignment support 11 | - Automatic hierarchical tree building 12 | - Rich metadata generation for each cluster 13 | - Configurable confidence thresholds and parameters 14 | 15 | Example Usage: 16 | ```python 17 | from SimplerLLM.language.llm import LLM, LLMProvider 18 | from SimplerLLM.language.llm_clustering import ( 19 | LLMClusterer, 20 | ChunkReference, 21 | ClusteringConfig, 22 | TreeConfig 23 | ) 24 | 25 | # Initialize LLM and clusterer 26 | llm = LLM.create(provider=LLMProvider.ANTHROPIC, model_name="claude-sonnet-4") 27 | clusterer = LLMClusterer(llm) 28 | 29 | # Prepare chunks 30 | chunks = [ 31 | ChunkReference(chunk_id=i, text=chunk_text) 32 | for i, chunk_text in enumerate(document_chunks) 33 | ] 34 | 35 | # Cluster with automatic hierarchy 36 | result = clusterer.cluster(chunks, build_hierarchy=True) 37 | 38 | # Access results 39 | print(f"Total clusters: {len(result.clusters)}") 40 | print(f"Tree depth: {result.tree.max_depth if result.tree else 0}") 41 | 42 | # Get clusters for a specific chunk 43 | chunk_clusters = result.get_clusters_for_chunk(chunk_id=5) 44 | ``` 45 | """ 46 | 47 | from .models import ( 48 | # Core data models 49 | Cluster, 50 | ClusterMetadata, 51 | ChunkReference, 52 | ClusterMatch, 53 | ChunkMatchingResult, 54 | ClusterTree, 55 | ClusteringResult, 56 | 57 | # Configuration 58 | ClusteringConfig, 59 | TreeConfig, 60 | ) 61 | 62 | from .clusterer import LLMClusterer 63 | from .flat_clusterer import FlatClusterer 64 | from .tree_builder import TreeBuilder 65 | from .persistence import ( 66 | save_clustering_result, 67 | load_clustering_result, 68 | save_cluster_tree, 69 | load_cluster_tree, 70 | get_clustering_stats, 71 | save_clustering_result_optimized, 72 | load_clustering_result_optimized 73 | ) 74 | from .chunk_store import ( 75 | ChunkStore, 76 | InMemoryChunkStore, 77 | SQLiteChunkStore, 78 | create_chunk_store 79 | ) 80 | 81 | __all__ = [ 82 | # Main API 83 | "LLMClusterer", 84 | 85 | # Advanced APIs 86 | "FlatClusterer", 87 | "TreeBuilder", 88 | 89 | # Data models 90 | "Cluster", 91 | "ClusterMetadata", 92 | "ChunkReference", 93 | "ClusterMatch", 94 | "ChunkMatchingResult", 95 | "ClusterTree", 96 | "ClusteringResult", 97 | 98 | # Configuration 99 | "ClusteringConfig", 100 | "TreeConfig", 101 | 102 | # Persistence 103 | "save_clustering_result", 104 | "load_clustering_result", 105 | "save_cluster_tree", 106 | "load_cluster_tree", 107 | "get_clustering_stats", 108 | "save_clustering_result_optimized", 109 | "load_clustering_result_optimized", 110 | 111 | # Chunk Storage 112 | "ChunkStore", 113 | "InMemoryChunkStore", 114 | "SQLiteChunkStore", 115 | "create_chunk_store", 116 | ] 117 | 118 | __version__ = "0.1.0" 119 | -------------------------------------------------------------------------------- /SimplerLLM/voice/__init__.py: -------------------------------------------------------------------------------- 1 | from .tts import TTS, TTSProvider, OpenAITTS, ElevenLabsTTS, TTSFullResponse 2 | from .stt import STT, STTProvider, OpenAISTT, STTFullResponse 3 | from .voice_chat import ( 4 | VoiceChat, 5 | VoiceChatConfig, 6 | ConversationMessage, 7 | ConversationRole, 8 | VoiceTurnResult, 9 | VoiceChatSession, 10 | ConversationManager 11 | ) 12 | from .live_voice_chat import LiveVoiceChatConfig 13 | 14 | # LiveVoiceChat requires sounddevice/pynput which need PortAudio 15 | try: 16 | from .live_voice_chat import ( 17 | LiveVoiceChat, 18 | AudioRecorder, 19 | 
AudioPlayer 20 | ) 21 | _LIVE_VOICE_AVAILABLE = True 22 | except (ImportError, OSError): 23 | LiveVoiceChat = None 24 | AudioRecorder = None 25 | AudioPlayer = None 26 | _LIVE_VOICE_AVAILABLE = False 27 | from .dialogue_generator import ( 28 | DialogueGenerator, 29 | Dialogue, 30 | DialogueLine, 31 | SpeakerConfig, 32 | DialogueGenerationConfig, 33 | AudioDialogueResult, 34 | DialogueStyle 35 | ) 36 | from .video_transcription import ( 37 | VideoTranscriber, 38 | MultiLanguageCaptionGenerator, 39 | VideoTranscriptionResult, 40 | CaptionSegment, 41 | LanguageCaptions, 42 | MultiLanguageCaptionsResult 43 | ) 44 | from .video_dubbing import ( 45 | VideoDubber, 46 | DubbedSegment, 47 | DubbingConfig, 48 | VideoDubbingResult 49 | ) 50 | from .realtime_voice import ( 51 | RealtimeVoice, 52 | RealtimeVoiceProvider, 53 | RealtimeSessionConfig, 54 | OpenAIRealtimeVoice, 55 | TurnDetectionType, 56 | Voice, 57 | AudioFormat, 58 | Modality, 59 | RealtimeVoiceChat, 60 | RealtimeVoiceChatConfig 61 | ) 62 | 63 | __all__ = [ 64 | # TTS 65 | 'TTS', 66 | 'TTSProvider', 67 | 'OpenAITTS', 68 | 'ElevenLabsTTS', 69 | 'TTSFullResponse', 70 | # STT 71 | 'STT', 72 | 'STTProvider', 73 | 'OpenAISTT', 74 | 'STTFullResponse', 75 | # VoiceChat 76 | 'VoiceChat', 77 | 'VoiceChatConfig', 78 | 'ConversationMessage', 79 | 'ConversationRole', 80 | 'VoiceTurnResult', 81 | 'VoiceChatSession', 82 | 'ConversationManager', 83 | # LiveVoiceChat (config always available, others require PortAudio) 84 | 'LiveVoiceChatConfig', 85 | '_LIVE_VOICE_AVAILABLE', 86 | # Dialogue Generator 87 | 'DialogueGenerator', 88 | 'Dialogue', 89 | 'DialogueLine', 90 | 'SpeakerConfig', 91 | 'DialogueGenerationConfig', 92 | 'AudioDialogueResult', 93 | 'DialogueStyle', 94 | # Video Transcription 95 | 'VideoTranscriber', 96 | 'MultiLanguageCaptionGenerator', 97 | 'VideoTranscriptionResult', 98 | 'CaptionSegment', 99 | 'LanguageCaptions', 100 | 'MultiLanguageCaptionsResult', 101 | # Video Dubbing 102 | 'VideoDubber', 103 | 'DubbedSegment', 104 | 'DubbingConfig', 105 | 'VideoDubbingResult', 106 | # Realtime Voice 107 | 'RealtimeVoice', 108 | 'RealtimeVoiceProvider', 109 | 'RealtimeSessionConfig', 110 | 'OpenAIRealtimeVoice', 111 | 'TurnDetectionType', 112 | 'Voice', 113 | 'AudioFormat', 114 | 'Modality', 115 | 'RealtimeVoiceChat', 116 | 'RealtimeVoiceChatConfig', 117 | ] 118 | 119 | # Conditionally add LiveVoiceChat exports if available 120 | if _LIVE_VOICE_AVAILABLE: 121 | __all__.extend(['LiveVoiceChat', 'AudioRecorder', 'AudioPlayer']) 122 | -------------------------------------------------------------------------------- /Documentation/docs/LLM Interaction/Consistent JSON from any LLM.md: -------------------------------------------------------------------------------- 1 | --- 2 | sidebar_position: 3 3 | --- 4 | 5 | # Consistent JSON with LLMs 6 | 7 | This section introduces how SimplerLLM helps ensure a consistent JSON-structured response from LLMs. This functionality is useful when integrating LLM outputs into your software, where maintaining a stable JSON format is important for processing or automation. 8 | 9 | The feature uses Pydantic models to validate and standardize LLM outputs, ensuring seamless integration into your applications. 10 | 11 | In this way you won't need to include in every prompt you give the LLM, how it should make your output in a json structure and in which format. 12 | 13 | ## Key Functions 14 | 15 | SimplerLLM offers two functions for this purpose: one synchronous and one asynchronous. 
Both rely on a Pydantic model that you define, which acts as the structure for the LLM's response. 16 | 17 | > **Note:** You can use any LLM provider by modifying the `llm_instance` variable to include the provider of your choice. To learn more about setting up different LLM providers, refer to the [Getting Started](https://docs.simplerllm.com/LLM%20Interaction/Getting%20Started) page in this documentation. 18 | 19 | ### Synchronous Function 20 | 21 | The synchronous function is `generate_pydantic_json_model`, and here's how you can use it: 22 | 23 | ```python 24 | from pydantic import BaseModel 25 | from SimplerLLM.language.llm import LLM, LLMProvider 26 | from SimplerLLM.language.llm_addons import generate_pydantic_json_model 27 | 28 | # Define your Pydantic model 29 | class LLMResponse(BaseModel): 30 | response: str 31 | 32 | # Initialize the LLM instance 33 | llm_instance = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o") 34 | prompt = "Generate a sentence about the importance of AI" 35 | 36 | # Generate and parse the JSON response 37 | output = generate_pydantic_json_model( 38 | llm_instance=llm_instance, 39 | prompt=prompt, 40 | model_class=LLMResponse 41 | ) 42 | json_output = output.model_dump() 43 | print(json_output) 44 | ``` 45 | 46 | The `output` is an object of type `LLMResponse`, and the `model_dump()` method converts it into a dictionary or JSON-like format. 47 | 48 | ### Asynchronous Function 49 | 50 | For asynchronous applications, the `generate_pydantic_json_model_async` function provides the same functionality but in an async context. Here's an example: 51 | 52 | ```python 53 | import asyncio 54 | from pydantic import BaseModel 55 | from SimplerLLM.language.llm import LLM, LLMProvider 56 | from SimplerLLM.language.llm_addons import generate_pydantic_json_model_async 57 | 58 | # Define your Pydantic model 59 | class LLMResponse(BaseModel): 60 | response: str 61 | 62 | # Initialize the LLM instance 63 | llm_instance = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o") 64 | prompt = "Generate a sentence about the importance of AI" 65 | 66 | # Asynchronous usage 67 | async def main(): 68 | output = await generate_pydantic_json_model_async( 69 | llm_instance=llm_instance, 70 | prompt=prompt, 71 | model_class=LLMResponse 72 | ) 73 | json_output = output.model_dump() 74 | print(json_output) 75 | 76 | asyncio.run(main()) 77 | ``` 78 | 79 | The asynchronous function is ideal for use cases where you need to fetch results without blocking other operations in your application. 80 | 81 | By using these functions, you can effortlessly maintain a stable JSON structure in your LLM responses, making interaction with LLM providers Simpler! 
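83 | 84 | The same pattern extends to richer structures as well. Here's a minimal sketch using a nested Pydantic model with a list field (the `BlogTitle`/`BlogTitles` models and the prompt below are our own example, not part of SimplerLLM, and the sketch assumes the function handles nested models the same way it handles flat ones): 85 | 86 | ```python 87 | from typing import List 88 | from pydantic import BaseModel 89 | from SimplerLLM.language.llm import LLM, LLMProvider 90 | from SimplerLLM.language.llm_addons import generate_pydantic_json_model 91 | 92 | # A nested structure: the LLM must return a JSON object with a "titles" list 93 | class BlogTitle(BaseModel): 94 | title: str 95 | 96 | class BlogTitles(BaseModel): 97 | titles: List[BlogTitle] 98 | 99 | llm_instance = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4o") 100 | prompt = "Generate 5 blog titles about the importance of AI" 101 | 102 | output = generate_pydantic_json_model( 103 | llm_instance=llm_instance, 104 | prompt=prompt, 105 | model_class=BlogTitles 106 | ) 107 | print(output.model_dump()) 108 | ```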
-------------------------------------------------------------------------------- /SimplerLLM/language/llm_providers/llm_response_models.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel 2 | from typing import Any, Optional, List, Dict 3 | from datetime import datetime 4 | 5 | 6 | class LLMFullResponse(BaseModel): 7 | generated_text: str 8 | model: str 9 | process_time: float 10 | input_token_count: Optional[int] = None 11 | output_token_count: Optional[int] = None 12 | llm_provider_response: Any 13 | model_object: Optional[Any] = None 14 | provider: Optional[Any] = None 15 | model_name: Optional[str] = None 16 | guardrails_metadata: Optional[Dict[str, Any]] = None 17 | web_sources: Optional[List[Dict[str, Any]]] = None 18 | 19 | 20 | class LLMEmbeddingsResponse(BaseModel): 21 | generated_embedding: Any 22 | model: str 23 | process_time: float 24 | llm_provider_response: Any 25 | 26 | 27 | class PatternMatch(BaseModel): 28 | """Represents a single pattern match extracted from text.""" 29 | value: str 30 | """The extracted value as found in the text""" 31 | 32 | normalized_value: Optional[str] = None 33 | """The normalized version of the value (if normalization was applied)""" 34 | 35 | pattern_type: str 36 | """The type of pattern matched (e.g., 'email', 'phone', 'custom')""" 37 | 38 | position: int 39 | """The starting position of the match in the original text""" 40 | 41 | is_valid: bool = True 42 | """Whether the match passed validation checks beyond regex""" 43 | 44 | validation_message: Optional[str] = None 45 | """Validation details or error message if validation failed""" 46 | 47 | confidence: Optional[float] = None 48 | """Match quality score (0-1), if applicable""" 49 | 50 | class Config: 51 | json_schema_extra = { 52 | "example": { 53 | "value": "john.doe@example.com", 54 | "normalized_value": "john.doe@example.com", 55 | "pattern_type": "email", 56 | "position": 42, 57 | "is_valid": True, 58 | "validation_message": "Valid email format", 59 | "confidence": 1.0 60 | } 61 | } 62 | 63 | 64 | class PatternExtractionResult(BaseModel): 65 | """Result of a pattern extraction operation from LLM output.""" 66 | matches: List[PatternMatch] 67 | """List of all extracted pattern matches""" 68 | 69 | total_matches: int 70 | """Total number of matches found""" 71 | 72 | pattern_used: str 73 | """The regex pattern that was used for extraction""" 74 | 75 | original_text: str 76 | """The original text from the LLM response""" 77 | 78 | extraction_timestamp: datetime 79 | """When the extraction was performed""" 80 | 81 | class Config: 82 | json_schema_extra = { 83 | "example": { 84 | "matches": [ 85 | { 86 | "value": "john@example.com", 87 | "normalized_value": "john@example.com", 88 | "pattern_type": "email", 89 | "position": 0, 90 | "is_valid": True, 91 | "validation_message": "Valid email format", 92 | "confidence": 1.0 93 | } 94 | ], 95 | "total_matches": 1, 96 | "pattern_used": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", 97 | "original_text": "Contact us at john@example.com for more information.", 98 | "extraction_timestamp": "2025-01-15T10:30:00" 99 | } 100 | } 101 | -------------------------------------------------------------------------------- /Documentation/docs/Vector Storage/Vector Embeddings.md: -------------------------------------------------------------------------------- 1 | --- 2 | sidebar_position: 1 3 | --- 4 | 5 | # Vector Embeddings 6 | 7 | This component of the SimplerLLM Library facilitates the 
generation of text embeddings, currently using OpenAI's embedding models. This functionality helps developers working on natural language processing applications, enabling them to easily generate embeddings for a variety of use cases such as [semantic chunking](https://docs.simplerllm.com/Advanced%20Tools/Chunking%20Methods#chunk_by_semantics-function), clustering in machine learning, and finding semantic similarity. 8 | 9 | Before using these functions, make sure that your environment is set up with the necessary API keys. Place your OpenAI API key in the `.env` file as shown below: 10 | 11 | ``` 12 | OPENAI_API_KEY="your_openai_api_key" 13 | ``` 14 | 15 | ## Using the `EmbeddingsLLM` Class 16 | 17 | The `EmbeddingsLLM` class is designed to handle the generation of text embeddings, supporting both synchronous and asynchronous operations. 18 | 19 | Start by creating an instance of the `EmbeddingsLLM` class, passing in the provider and the model you wish to use, like this: 20 | 21 | ```python 22 | from SimplerLLM.language.embeddings import EmbeddingsLLM, EmbeddingsProvider 23 | 24 | embeddings_llm_instance = EmbeddingsLLM.create(EmbeddingsProvider.OPENAI, "text-embedding-3-small") 25 | ``` 26 | 27 | ### Generating Embeddings Synchronously 28 | 29 | The `generate_embeddings` method allows you to generate embeddings synchronously. It takes 2 parameters: 30 | - `user_input`: String or list of strings for which embeddings are required. 31 | - `full_response` (Optional): Boolean indicating whether to return the full API response. It defaults to False, in which case only the text embeddings are returned rather than the full API response. 32 | 33 | Here's an example of generating embeddings for a list of strings: 34 | 35 | ```python 36 | from SimplerLLM.language.embeddings import EmbeddingsLLM, EmbeddingsProvider 37 | 38 | texts = ["Hello World", "Discussing AI.", "Artificial intelligence has many applications."] 39 | embeddings_llm_instance = EmbeddingsLLM.create(EmbeddingsProvider.OPENAI, "text-embedding-3-small") 40 | 41 | embeddings = embeddings_llm_instance.generate_embeddings(texts) 42 | 43 | print(embeddings) 44 | ``` 45 | 46 | ### Generating Embeddings Asynchronously 47 | 48 | For applications that benefit from non-blocking operations, use the `generate_embeddings_async` method to perform asynchronous embedding generation. It takes the same 2 parameters: 49 | - `user_input`: String or list of strings for which embeddings are required. 50 | - `full_response` (Optional): Boolean indicating whether to return the full API response. It defaults to False, in which case only the text embeddings are returned rather than the full API response. 51 | 52 | Generating embeddings asynchronously for a list of strings: 53 | 54 | ```python 55 | import asyncio 56 | from SimplerLLM.language.embeddings import EmbeddingsLLM, EmbeddingsProvider 57 | 58 | async def generate_async_embeddings(): 59 | texts = ["Hello World", "Discussing AI.", "Artificial intelligence has many applications."] 60 | embeddings_llm_instance = EmbeddingsLLM.create(EmbeddingsProvider.OPENAI, "text-embedding-3-small") 61 | 62 | tasks = [embeddings_llm_instance.generate_embeddings_async(text) for text in texts] 63 | 64 | embeddings = await asyncio.gather(*tasks) 65 | 66 | print(embeddings) 67 | 68 | asyncio.run(generate_async_embeddings()) 69 | ``` 70 | 71 | This method allows your application to remain responsive while processing multiple embedding requests simultaneously. 
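72 | 73 | Finally, as a quick illustration of the "finding semantic similarity" use case mentioned at the top of this page, here is a minimal sketch. Note that the `cosine_similarity` helper below is our own example (not part of SimplerLLM), that it assumes `numpy` is installed, and that it assumes `generate_embeddings` returns one plain embedding vector per input string: 74 | 75 | ```python 76 | import numpy as np 77 | from SimplerLLM.language.embeddings import EmbeddingsLLM, EmbeddingsProvider 78 | 79 | def cosine_similarity(a, b): 80 | # Cosine similarity: dot product of the vectors divided by the product of their norms 81 | a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float) 82 | return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))) 83 | 84 | embeddings_llm_instance = EmbeddingsLLM.create(EmbeddingsProvider.OPENAI, "text-embedding-3-small") 85 | 86 | # Assumes the default full_response=False return is the raw embedding vector 87 | embedding_one = embeddings_llm_instance.generate_embeddings("Discussing AI.") 88 | embedding_two = embeddings_llm_instance.generate_embeddings("Artificial intelligence has many applications.") 89 | 90 | print(cosine_similarity(embedding_one, embedding_two)) 91 | ```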
92 | 93 | --- 94 | 95 | That's how you can benefit from SimplerLLM to make Vector Embeddings Generation Simpler! 96 | -------------------------------------------------------------------------------- /SimplerLLM/language/flow/tool_registry.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tool Registry for Mini Agent Flows 3 | 4 | Maps tool names to actual tool functions from SimplerLLM.tools 5 | """ 6 | 7 | from SimplerLLM.tools.python_func import execute_python_code 8 | from SimplerLLM.tools.file_functions import save_text_to_file 9 | from SimplerLLM.tools.json_helpers import ( 10 | extract_json_from_text, 11 | validate_json_with_pydantic_model, 12 | convert_json_to_pydantic_model, 13 | generate_json_example_from_pydantic, 14 | ) 15 | from SimplerLLM.tools.youtube import get_youtube_transcript, get_youtube_transcript_with_timing 16 | from SimplerLLM.tools.serp import ( 17 | search_with_serper_api, 18 | search_with_value_serp, 19 | search_with_duck_duck_go, 20 | ) 21 | from SimplerLLM.tools.file_loader import read_csv_file 22 | from SimplerLLM.tools.text_chunker import ( 23 | chunk_by_max_chunk_size, 24 | chunk_by_sentences, 25 | chunk_by_paragraphs, 26 | ) 27 | from SimplerLLM.tools.brainstorm import ( 28 | recursive_brainstorm_tool, 29 | simple_brainstorm, 30 | ) 31 | 32 | 33 | class ToolRegistry: 34 | """Registry of available tools for Mini Agent flows.""" 35 | 36 | TOOLS = { 37 | # Python execution 38 | "execute_python_code": execute_python_code, 39 | 40 | # File operations 41 | "save_text_to_file": save_text_to_file, 42 | "read_csv_file": read_csv_file, 43 | 44 | # JSON operations 45 | "extract_json_from_text": extract_json_from_text, 46 | "validate_json_with_pydantic_model": validate_json_with_pydantic_model, 47 | "convert_json_to_pydantic_model": convert_json_to_pydantic_model, 48 | "generate_json_example_from_pydantic": generate_json_example_from_pydantic, 49 | 50 | # YouTube tools 51 | "youtube_transcript": get_youtube_transcript, 52 | "youtube_transcript_with_timing": get_youtube_transcript_with_timing, 53 | 54 | # Search tools 55 | "web_search_serper": search_with_serper_api, 56 | "web_search_value_serp": search_with_value_serp, 57 | "web_search_duckduckgo": search_with_duck_duck_go, 58 | 59 | # Text chunking 60 | "chunk_by_max_size": chunk_by_max_chunk_size, 61 | "chunk_by_sentences": chunk_by_sentences, 62 | "chunk_by_paragraphs": chunk_by_paragraphs, 63 | 64 | # Brainstorming tools 65 | "recursive_brainstorm": recursive_brainstorm_tool, 66 | "simple_brainstorm": simple_brainstorm, 67 | } 68 | 69 | @classmethod 70 | def get_tool(cls, tool_name: str): 71 | """ 72 | Get a tool function by name. 73 | 74 | Args: 75 | tool_name: Name of the tool to retrieve 76 | 77 | Returns: 78 | The tool function 79 | 80 | Raises: 81 | ValueError: If the tool name is not found in the registry 82 | """ 83 | if tool_name not in cls.TOOLS: 84 | available_tools = ", ".join(cls.TOOLS.keys()) 85 | raise ValueError( 86 | f"Tool '{tool_name}' not found in registry. " 87 | f"Available tools: {available_tools}" 88 | ) 89 | return cls.TOOLS[tool_name] 90 | 91 | @classmethod 92 | def list_tools(cls): 93 | """Return a list of all available tool names.""" 94 | return list(cls.TOOLS.keys()) 95 | 96 | @classmethod 97 | def register_tool(cls, name: str, func): 98 | """ 99 | Register a custom tool. 
100 | 101 | Args: 102 | name: Name for the tool 103 | func: The tool function to register 104 | """ 105 | cls.TOOLS[name] = func 106 | -------------------------------------------------------------------------------- /SimplerLLM/voice/stt/base.py: -------------------------------------------------------------------------------- 1 | from enum import Enum 2 | import os 3 | from SimplerLLM.utils.custom_verbose import verbose_print 4 | 5 | 6 | class STTProvider(Enum): 7 | """Enumeration of supported STT providers.""" 8 | OPENAI = 1 9 | # Future providers can be added here: 10 | # ASSEMBLYAI = 2 11 | # DEEPGRAM = 3 12 | # WHISPER_LOCAL = 4 13 | 14 | 15 | class STT: 16 | """ 17 | Base class for Speech-to-Text functionality. 18 | Provides a unified interface across different STT providers. 19 | """ 20 | 21 | def __init__( 22 | self, 23 | provider=STTProvider.OPENAI, 24 | model_name="whisper-1", 25 | api_key=None, 26 | verbose=False, 27 | ): 28 | """ 29 | Initialize STT instance. 30 | 31 | Args: 32 | provider: STT provider to use (STTProvider enum) 33 | model_name: Model to use (e.g., "whisper-1") 34 | api_key: API key for the provider (uses env var if not provided) 35 | verbose: Enable verbose logging 36 | """ 37 | self.provider = provider 38 | self.model_name = model_name 39 | self.api_key = api_key 40 | self.verbose = verbose 41 | 42 | if self.verbose: 43 | verbose_print( 44 | f"Initializing {provider.name} STT with model: {model_name}", 45 | "info" 46 | ) 47 | 48 | @staticmethod 49 | def create( 50 | provider=None, 51 | model_name=None, 52 | api_key=None, 53 | verbose=False, 54 | ): 55 | """ 56 | Factory method to create STT instances for different providers. 57 | 58 | Args: 59 | provider: STT provider (STTProvider enum) 60 | model_name: Model to use (provider-specific) 61 | api_key: API key for the provider 62 | verbose: Enable verbose logging 63 | 64 | Returns: 65 | Provider-specific STT instance (e.g., OpenAISTT) 66 | """ 67 | if provider == STTProvider.OPENAI: 68 | from .wrappers.openai_wrapper import OpenAISTT 69 | return OpenAISTT( 70 | provider=provider, 71 | model_name=model_name or "whisper-1", 72 | api_key=api_key, 73 | verbose=verbose, 74 | ) 75 | # Future providers can be added here 76 | # if provider == STTProvider.ASSEMBLYAI: 77 | # from .wrappers.assemblyai_wrapper import AssemblyAISTT 78 | # return AssemblyAISTT(...) 79 | else: 80 | return None 81 | 82 | def prepare_params(self, model=None, language=None, temperature=None): 83 | """ 84 | Prepare parameters for STT transcription, using instance defaults 85 | if parameters are not provided. 86 | 87 | Args: 88 | model: Model to use (None = use instance default) 89 | language: Language code (None = auto-detect) 90 | temperature: Temperature for transcription (None = use default 0.0) 91 | 92 | Returns: 93 | Dictionary of parameters 94 | """ 95 | return { 96 | "model_name": model if model is not None else self.model_name, 97 | "language": language, 98 | "temperature": temperature if temperature is not None else 0.0, 99 | } 100 | 101 | def set_provider(self, provider): 102 | """ 103 | Set the STT provider. 
104 | 105 | Args: 106 | provider: STTProvider enum value 107 | """ 108 | if not isinstance(provider, STTProvider): 109 | raise ValueError("Provider must be an instance of STTProvider Enum") 110 | self.provider = provider 111 | -------------------------------------------------------------------------------- /SimplerLLM/language/guardrails/exceptions.py: -------------------------------------------------------------------------------- 1 | """ 2 | Custom exceptions for the Guardrails system. 3 | """ 4 | 5 | 6 | class GuardrailException(Exception): 7 | """Base exception for all guardrail-related errors.""" 8 | 9 | def __init__(self, message: str, guardrail_name: str = "", metadata: dict = None): 10 | """ 11 | Initialize guardrail exception. 12 | 13 | Args: 14 | message: Error message 15 | guardrail_name: Name of the guardrail that raised the exception 16 | metadata: Additional metadata about the error 17 | """ 18 | super().__init__(message) 19 | self.guardrail_name = guardrail_name 20 | self.metadata = metadata or {} 21 | 22 | 23 | class GuardrailBlockedException(GuardrailException): 24 | """ 25 | Exception raised when a guardrail blocks a request or response. 26 | 27 | This exception is raised when a guardrail determines that content 28 | should not be processed (input) or returned (output). 29 | """ 30 | 31 | def __init__(self, message: str, guardrail_name: str = "", metadata: dict = None): 32 | """ 33 | Initialize blocked exception. 34 | 35 | Args: 36 | message: Reason for blocking 37 | guardrail_name: Name of the guardrail that blocked 38 | metadata: Additional metadata (e.g., detected patterns, violation details) 39 | """ 40 | super().__init__(message, guardrail_name, metadata) 41 | 42 | 43 | class GuardrailValidationException(GuardrailException): 44 | """ 45 | Exception raised when guardrail validation fails unexpectedly. 46 | 47 | This is different from GuardrailBlockedException - this indicates 48 | an error in the guardrail itself, not blocked content. 49 | """ 50 | 51 | def __init__(self, message: str, guardrail_name: str = "", original_exception: Exception = None): 52 | """ 53 | Initialize validation exception. 54 | 55 | Args: 56 | message: Error message 57 | guardrail_name: Name of the guardrail that failed 58 | original_exception: The original exception that caused the failure 59 | """ 60 | super().__init__(message, guardrail_name) 61 | self.original_exception = original_exception 62 | 63 | 64 | class GuardrailConfigurationException(GuardrailException): 65 | """ 66 | Exception raised when a guardrail is misconfigured. 67 | 68 | Examples: missing required config parameters, invalid parameter values, etc. 69 | """ 70 | 71 | def __init__(self, message: str, guardrail_name: str = "", config: dict = None): 72 | """ 73 | Initialize configuration exception. 74 | 75 | Args: 76 | message: Error message 77 | guardrail_name: Name of the misconfigured guardrail 78 | config: The problematic configuration 79 | """ 80 | super().__init__(message, guardrail_name, config or {}) 81 | self.config = config or {} 82 | 83 | 84 | class GuardrailTimeoutException(GuardrailException): 85 | """ 86 | Exception raised when a guardrail execution times out. 87 | 88 | Useful for guardrails that make external API calls or perform 89 | expensive computations. 90 | """ 91 | 92 | def __init__(self, message: str, guardrail_name: str = "", timeout_seconds: float = None): 93 | """ 94 | Initialize timeout exception. 
95 | 96 | Args: 97 | message: Error message 98 | guardrail_name: Name of the guardrail that timed out 99 | timeout_seconds: The timeout value that was exceeded 100 | """ 101 | metadata = {"timeout_seconds": timeout_seconds} if timeout_seconds else {} 102 | super().__init__(message, guardrail_name, metadata) 103 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm/base.py: -------------------------------------------------------------------------------- 1 | from enum import Enum 2 | import os 3 | import time 4 | import asyncio 5 | from typing import Type 6 | from pydantic import BaseModel 7 | 8 | from SimplerLLM.utils.custom_verbose import verbose_print 9 | from SimplerLLM.tools.json_helpers import ( 10 | extract_json_from_text, 11 | convert_json_to_pydantic_model, 12 | validate_json_with_pydantic_model, 13 | generate_json_example_from_pydantic, 14 | ) 15 | 16 | class LLMProvider(Enum): 17 | OPENAI = 1 18 | GEMINI = 2 19 | ANTHROPIC = 3 20 | OLLAMA = 4 21 | LWH = 5 22 | DEEPSEEK = 6 23 | OPENROUTER = 7 24 | COHERE = 8 25 | PERPLEXITY = 9 26 | 27 | class LLM: 28 | def __init__( 29 | self, 30 | provider=LLMProvider.OPENAI, 31 | model_name="gpt-4o-mini", 32 | temperature=0.7, 33 | top_p=1.0, 34 | api_key=None, 35 | user_id=None, 36 | verbose=False, 37 | ): 38 | self.provider = provider 39 | self.model_name = model_name 40 | self.temperature = temperature 41 | self.top_p = top_p 42 | self.api_key = api_key 43 | self.user_id = user_id 44 | self.verbose = verbose 45 | 46 | if self.verbose: 47 | verbose_print(f"Initializing {provider.name} LLM with model: {model_name}", "info") 48 | verbose_print(f"Configuration - Temperature: {temperature}, Top_p: {top_p}", "debug") 49 | 50 | @staticmethod 51 | def create( 52 | provider=None, 53 | model_name=None, 54 | temperature=0.7, 55 | top_p=1.0, 56 | api_key=None, 57 | user_id=None, 58 | verbose=False, 59 | ): 60 | if provider == LLMProvider.OPENAI: 61 | from .wrappers.openai_wrapper import OpenAILLM 62 | return OpenAILLM(provider, model_name, temperature, top_p, api_key, verbose=verbose) 63 | if provider == LLMProvider.GEMINI: 64 | from .wrappers.gemini_wrapper import GeminiLLM 65 | return GeminiLLM(provider, model_name, temperature, top_p, api_key, verbose=verbose) 66 | if provider == LLMProvider.ANTHROPIC: 67 | from .wrappers.anthropic_wrapper import AnthropicLLM 68 | return AnthropicLLM(provider, model_name, temperature, top_p, api_key, verbose=verbose) 69 | if provider == LLMProvider.OLLAMA: 70 | from .wrappers.ollama_wrapper import OllamaLLM 71 | return OllamaLLM(provider, model_name, temperature, top_p, verbose=verbose) 72 | if provider == LLMProvider.DEEPSEEK: 73 | from .wrappers.deepseek_wrapper import DeepSeekLLM 74 | return DeepSeekLLM(provider, model_name, temperature, top_p, api_key, verbose=verbose) 75 | if provider == LLMProvider.OPENROUTER: 76 | from .wrappers.openrouter_wrapper import OpenRouterLLM 77 | return OpenRouterLLM(provider, model_name, temperature, top_p, api_key, verbose=verbose) 78 | if provider == LLMProvider.COHERE: 79 | from .wrappers.cohere_wrapper import CohereLLM 80 | return CohereLLM(provider, model_name, temperature, top_p, api_key, verbose=verbose) 81 | if provider == LLMProvider.PERPLEXITY: 82 | from .wrappers.perplexity_wrapper import PerplexityLLM 83 | return PerplexityLLM(provider, model_name, temperature, top_p, api_key, verbose=verbose) 84 | else: 85 | return None 86 | 87 | def set_model(self, provider): 88 | if not isinstance(provider, LLMProvider): 89 | raise 
ValueError("Provider must be an instance of LLMProvider Enum") 90 | self.provider = provider 91 | 92 | def prepare_params(self, model_name, temperature, top_p): 93 | return { 94 | "model_name": model_name if model_name else self.model_name, 95 | "temperature": temperature if temperature else self.temperature, 96 | "top_p": top_p if top_p else self.top_p, 97 | } 98 | -------------------------------------------------------------------------------- /SimplerLLM/language/__init__.py: -------------------------------------------------------------------------------- 1 | from .llm.base import LLM, LLMProvider 2 | from .llm.reliable import ReliableLLM 3 | from .llm.wrappers import OpenAILLM, GeminiLLM, AnthropicLLM, OllamaLLM, DeepSeekLLM 4 | from .flow import MiniAgent, StepResult, FlowResult 5 | from .llm_judge import LLMJudge, JudgeMode, JudgeResult, ProviderResponse, ProviderEvaluation, EvaluationReport 6 | from .llm_validator import LLMValidator, ValidationResult, ValidatorScore, AggregationMethod 7 | from .llm_feedback import LLMFeedbackLoop, FeedbackResult, IterationResult, Critique 8 | from .llm_provider_router import LLMProviderRouter, ProviderConfig, RoutingResult, QueryClassification 9 | from .llm_clustering import ( 10 | LLMClusterer, 11 | Cluster, 12 | ClusterMetadata, 13 | ChunkReference, 14 | ClusterTree, 15 | ClusteringResult, 16 | ClusteringConfig, 17 | TreeConfig, 18 | save_clustering_result, 19 | load_clustering_result, 20 | save_cluster_tree, 21 | load_cluster_tree, 22 | get_clustering_stats, 23 | save_clustering_result_optimized, 24 | load_clustering_result_optimized, 25 | ChunkStore, 26 | InMemoryChunkStore, 27 | SQLiteChunkStore, 28 | create_chunk_store 29 | ) 30 | from .llm_retrieval import ( 31 | LLMRetriever, 32 | RetrievalResult, 33 | HierarchicalRetrievalResponse, 34 | RetrievalConfig 35 | ) 36 | from .guardrails import ( 37 | GuardrailsLLM, 38 | GuardrailAction, 39 | GuardrailResult, 40 | InputGuardrail, 41 | OutputGuardrail, 42 | CompositeGuardrail, 43 | GuardrailException, 44 | GuardrailBlockedException, 45 | GuardrailValidationException, 46 | GuardrailConfigurationException, 47 | GuardrailTimeoutException, 48 | PromptInjectionGuardrail, 49 | TopicFilterGuardrail, 50 | InputPIIDetectionGuardrail, 51 | FormatValidatorGuardrail, 52 | OutputPIIDetectionGuardrail, 53 | ContentSafetyGuardrail, 54 | LengthValidatorGuardrail, 55 | ) 56 | 57 | __all__ = [ 58 | 'LLM', 59 | 'LLMProvider', 60 | 'ReliableLLM', 61 | 'OpenAILLM', 62 | 'GeminiLLM', 63 | 'AnthropicLLM', 64 | 'OllamaLLM', 65 | 'DeepSeekLLM', 66 | 'MiniAgent', 67 | 'StepResult', 68 | 'FlowResult', 69 | 'LLMJudge', 70 | 'JudgeMode', 71 | 'JudgeResult', 72 | 'ProviderResponse', 73 | 'ProviderEvaluation', 74 | 'EvaluationReport', 75 | # LLM Validator 76 | 'LLMValidator', 77 | 'ValidationResult', 78 | 'ValidatorScore', 79 | 'AggregationMethod', 80 | 'LLMFeedbackLoop', 81 | 'FeedbackResult', 82 | 'IterationResult', 83 | 'Critique', 84 | 'LLMProviderRouter', 85 | 'ProviderConfig', 86 | 'RoutingResult', 87 | 'QueryClassification', 88 | # LLM Clustering 89 | 'LLMClusterer', 90 | 'Cluster', 91 | 'ClusterMetadata', 92 | 'ChunkReference', 93 | 'ClusterTree', 94 | 'ClusteringResult', 95 | 'ClusteringConfig', 96 | 'TreeConfig', 97 | 'save_clustering_result', 98 | 'load_clustering_result', 99 | 'save_cluster_tree', 100 | 'load_cluster_tree', 101 | 'get_clustering_stats', 102 | 'save_clustering_result_optimized', 103 | 'load_clustering_result_optimized', 104 | 'ChunkStore', 105 | 'InMemoryChunkStore', 106 | 'SQLiteChunkStore', 107 | 
'create_chunk_store', 108 | # LLM Retrieval 109 | 'LLMRetriever', 110 | 'RetrievalResult', 111 | 'HierarchicalRetrievalResponse', 112 | 'RetrievalConfig', 113 | # Guardrails 114 | 'GuardrailsLLM', 115 | 'GuardrailAction', 116 | 'GuardrailResult', 117 | 'InputGuardrail', 118 | 'OutputGuardrail', 119 | 'CompositeGuardrail', 120 | 'GuardrailException', 121 | 'GuardrailBlockedException', 122 | 'GuardrailValidationException', 123 | 'GuardrailConfigurationException', 124 | 'GuardrailTimeoutException', 125 | 'PromptInjectionGuardrail', 126 | 'TopicFilterGuardrail', 127 | 'InputPIIDetectionGuardrail', 128 | 'FormatValidatorGuardrail', 129 | 'OutputPIIDetectionGuardrail', 130 | 'ContentSafetyGuardrail', 131 | 'LengthValidatorGuardrail', 132 | ] 133 | -------------------------------------------------------------------------------- /SimplerLLM/voice/voice_chat/conversation.py: -------------------------------------------------------------------------------- 1 | from typing import List, Dict, Optional 2 | from .models import ConversationMessage, ConversationRole 3 | from SimplerLLM.utils.custom_verbose import verbose_print 4 | 5 | 6 | class ConversationManager: 7 | """ 8 | Manages conversation history with automatic truncation. 9 | 10 | Handles: 11 | - Message storage 12 | - History length limits 13 | - Conversion to LLM message format 14 | - System message preservation 15 | """ 16 | 17 | def __init__(self, max_length: int = 20, verbose: bool = False): 18 | """ 19 | Initialize conversation manager. 20 | 21 | Args: 22 | max_length: Maximum number of messages to keep (excluding system message) 23 | verbose: Enable verbose logging 24 | """ 25 | self.max_length = max_length 26 | self.verbose = verbose 27 | self.messages: List[ConversationMessage] = [] 28 | 29 | def add_message( 30 | self, 31 | role: ConversationRole, 32 | content: str, 33 | audio_file: Optional[str] = None, 34 | metadata: Optional[Dict] = None 35 | ): 36 | """ 37 | Add message to conversation history. 38 | 39 | Args: 40 | role: Message role (USER, ASSISTANT, SYSTEM) 41 | content: Message content 42 | audio_file: Optional path to audio file 43 | metadata: Optional metadata dictionary 44 | """ 45 | message = ConversationMessage( 46 | role=role, 47 | content=content, 48 | audio_file=audio_file, 49 | metadata=metadata 50 | ) 51 | 52 | self.messages.append(message) 53 | 54 | # Truncate if needed (keep system message + max_length recent messages) 55 | if len(self.messages) > self.max_length + 1: # +1 for system message 56 | # Find system message (should be first) 57 | system_msg = None 58 | other_msgs = [] 59 | 60 | for msg in self.messages: 61 | if msg.role == ConversationRole.SYSTEM: 62 | system_msg = msg 63 | else: 64 | other_msgs.append(msg) 65 | 66 | # Keep system message + last max_length messages 67 | if system_msg: 68 | self.messages = [system_msg] + other_msgs[-self.max_length:] 69 | else: 70 | self.messages = other_msgs[-self.max_length:] 71 | 72 | if self.verbose: 73 | verbose_print( 74 | f"Truncated conversation history to {self.max_length} messages (+ system)", 75 | "debug" 76 | ) 77 | 78 | def get_messages_for_llm(self) -> List[Dict[str, str]]: 79 | """ 80 | Convert conversation history to LLM message format. 81 | 82 | Returns: 83 | List of message dictionaries with 'role' and 'content' keys 84 | """ 85 | return [ 86 | {"role": msg.role.value, "content": msg.content} 87 | for msg in self.messages 88 | ] 89 | 90 | def get_all_messages(self) -> List[ConversationMessage]: 91 | """ 92 | Get all messages in conversation history. 
93 | 94 | Returns: 95 | Copy of all messages 96 | """ 97 | return self.messages.copy() 98 | 99 | def clear(self): 100 | """Clear all messages from conversation history.""" 101 | self.messages = [] 102 | if self.verbose: 103 | verbose_print("Conversation history cleared", "debug") 104 | 105 | def get_message_count(self) -> int: 106 | """Get total number of messages.""" 107 | return len(self.messages) 108 | 109 | def get_user_message_count(self) -> int: 110 | """Get number of user messages.""" 111 | return sum(1 for msg in self.messages if msg.role == ConversationRole.USER) 112 | 113 | def get_assistant_message_count(self) -> int: 114 | """Get number of assistant messages.""" 115 | return sum(1 for msg in self.messages if msg.role == ConversationRole.ASSISTANT) 116 | -------------------------------------------------------------------------------- /SimplerLLM/tools/youtube.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | import requests 4 | from typing import List 5 | from pydantic import BaseModel 6 | from dotenv import load_dotenv 7 | from youtube_transcript_api import YouTubeTranscriptApi 8 | 9 | load_dotenv(override=True) 10 | 11 | class TranscriptSegment(BaseModel): 12 | text: str 13 | start: float 14 | duration: float 15 | 16 | class Transcript(BaseModel): 17 | segments: List[TranscriptSegment] 18 | 19 | def get_youtube_transcript_with_timing(video_url): 20 | """ 21 | Fetches the transcript of a YouTube video given its URL. 22 | 23 | Parameters: 24 | video_url (str): The URL of the YouTube video. 25 | 26 | Returns: 27 | list: A list of dictionaries, where each dictionary contains the 'text' of the transcript 28 | and its associated 'start' time and 'duration'. 29 | Example: 30 | [ 31 | {'text': 'Hello world!', 'start': 0.0, 'duration': 2.0}, 32 | {'text': 'This is a transcript.', 'start': 2.0, 'duration': 3.0}, 33 | ... 34 | ] 35 | str: An error message if the transcript cannot be retrieved. 36 | 37 | Raises: 38 | Exception: If an error occurs while fetching the transcript. 39 | """ 40 | # Enhanced regex to handle different YouTube URL formats 41 | match = re.search(r"(?:youtube\.com/watch\?v=|youtu\.be/)([\w-]+)", video_url) 42 | if match: 43 | video_id = match.group(1) 44 | else: 45 | raise ValueError("Invalid YouTube URL") 46 | 47 | try: 48 | api_key = os.getenv("SEARCHAPI_API_KEY") 49 | url = "https://www.searchapi.io/api/v1/search" 50 | params = { 51 | "engine": "youtube_transcripts", 52 | "video_id": video_id, 53 | "api_key": api_key 54 | } 55 | 56 | response = requests.get(url, params=params) 57 | response.raise_for_status() 58 | api_response = response.json() 59 | 60 | transcript = api_response["transcripts"] 61 | transcript_list = Transcript(segments=transcript) 62 | #transcript_list = YouTubeTranscriptApi.get_transcript(video_id) 63 | 64 | return transcript_list 65 | 66 | except Exception as e: 67 | raise Exception(f"An error occurred while fetching the video details: {e}") 68 | 69 | def get_youtube_transcript(video_url): 70 | """ 71 | Retrieves the transcript of a YouTube video and returns it as a single string with sentences. 72 | 73 | Args: 74 | video_url (str): The URL of the YouTube video for which to retrieve the transcript. 75 | 76 | Returns: 77 | str: A single string containing the transcript with sentences. 78 | The function attempts to maintain sentence structure by adding periods where necessary. 79 | str: An error message if the transcript cannot be retrieved. 
80 | 81 | Raises: 82 | Exception: If an error occurs while fetching the transcript. 83 | """ 84 | match = re.search(r"(?:youtube\.com/watch\?v=|youtu\.be/)([\w-]+)", video_url) 85 | if match: 86 | video_id = match.group(1) 87 | else: 88 | raise ValueError("Invalid YouTube URL") 89 | 90 | try: 91 | api_key = os.getenv("SEARCHAPI_API_KEY") 92 | url = "https://www.searchapi.io/api/v1/search" 93 | params = { 94 | "engine": "youtube_transcripts", 95 | "video_id": video_id, 96 | "api_key": api_key 97 | } 98 | 99 | response = requests.get(url, params=params) 100 | response.raise_for_status() 101 | api_response = response.json() 102 | 103 | transcript = api_response["transcripts"] 104 | transcript_list = Transcript(segments=transcript) 105 | #transcript_list = YouTubeTranscriptApi.get_transcript(video_id) 106 | 107 | transcript_text = " ".join( 108 | [segment.text.strip() + "." if not segment.text.strip().endswith('.') else segment.text.strip() 109 | for segment in transcript_list.segments] 110 | ) 111 | 112 | return transcript_text 113 | except Exception as e: 114 | raise Exception(f"An error occurred while fetching the video details: {e}") -------------------------------------------------------------------------------- /SimplerLLM/voice/voice_chat/models.py: -------------------------------------------------------------------------------- 1 | from pydantic import BaseModel, Field 2 | from typing import List, Optional, Dict, Any 3 | from enum import Enum 4 | from datetime import datetime 5 | 6 | 7 | class ConversationRole(str, Enum): 8 | """Message roles in conversation.""" 9 | USER = "user" 10 | ASSISTANT = "assistant" 11 | SYSTEM = "system" 12 | 13 | 14 | class ConversationMessage(BaseModel): 15 | """Single message in conversation history.""" 16 | role: ConversationRole 17 | content: str 18 | timestamp: datetime = Field(default_factory=datetime.now) 19 | audio_file: Optional[str] = None 20 | metadata: Optional[Dict[str, Any]] = None 21 | 22 | class Config: 23 | json_schema_extra = { 24 | "example": { 25 | "role": "user", 26 | "content": "What's the weather today?", 27 | "timestamp": "2025-01-15T10:30:00", 28 | "audio_file": "input_001.wav" 29 | } 30 | } 31 | 32 | 33 | class VoiceChatConfig(BaseModel): 34 | """Configuration for VoiceChat session.""" 35 | 36 | # LLM settings 37 | system_prompt: str = "You are a helpful voice assistant." 
38 | temperature: float = Field(default=0.7, ge=0.0, le=2.0) 39 | max_tokens: int = Field(default=150, ge=1, le=4000) 40 | 41 | # Voice settings 42 | tts_voice: Optional[str] = None 43 | tts_speed: float = Field(default=1.0, ge=0.25, le=4.0) 44 | tts_model: Optional[str] = None 45 | stt_language: Optional[str] = None 46 | stt_model: Optional[str] = None 47 | 48 | # Conversation settings 49 | max_history_length: int = Field(default=20, ge=0, le=100) 50 | save_audio: bool = False 51 | output_dir: str = "voice_chat_output" 52 | 53 | # Advanced features (for future extensibility) 54 | enable_tools: bool = False 55 | enable_rag: bool = False 56 | enable_routing: bool = False 57 | 58 | class Config: 59 | json_schema_extra = { 60 | "example": { 61 | "system_prompt": "You are a friendly cooking assistant", 62 | "temperature": 0.8, 63 | "max_tokens": 200, 64 | "tts_voice": "nova", 65 | "tts_speed": 1.1, 66 | "max_history_length": 10, 67 | "save_audio": True, 68 | "output_dir": "cooking_chat" 69 | } 70 | } 71 | 72 | 73 | class VoiceTurnResult(BaseModel): 74 | """Result of a single voice interaction turn.""" 75 | user_audio_path: Optional[str] = None 76 | user_text: str 77 | assistant_text: str 78 | assistant_audio_path: Optional[str] = None 79 | 80 | stt_duration: Optional[float] = None 81 | llm_duration: Optional[float] = None 82 | tts_duration: Optional[float] = None 83 | total_duration: float 84 | 85 | timestamp: datetime = Field(default_factory=datetime.now) 86 | error: Optional[str] = None 87 | 88 | class Config: 89 | json_schema_extra = { 90 | "example": { 91 | "user_text": "What's the capital of France?", 92 | "assistant_text": "The capital of France is Paris.", 93 | "assistant_audio_path": "output/assistant_001.mp3", 94 | "stt_duration": 1.2, 95 | "llm_duration": 0.8, 96 | "tts_duration": 1.5, 97 | "total_duration": 3.5 98 | } 99 | } 100 | 101 | 102 | class VoiceChatSession(BaseModel): 103 | """Complete voice chat session data.""" 104 | session_id: str 105 | config: VoiceChatConfig 106 | conversation_history: List[ConversationMessage] 107 | turns: List[VoiceTurnResult] 108 | 109 | started_at: datetime 110 | ended_at: Optional[datetime] = None 111 | total_turns: int = 0 112 | success: bool = True 113 | 114 | metadata: Optional[Dict[str, Any]] = None 115 | 116 | class Config: 117 | json_schema_extra = { 118 | "example": { 119 | "session_id": "550e8400-e29b-41d4-a716-446655440000", 120 | "total_turns": 5, 121 | "started_at": "2025-01-15T10:00:00", 122 | "ended_at": "2025-01-15T10:15:00", 123 | "success": True 124 | } 125 | } 126 | -------------------------------------------------------------------------------- /Documentation/docs/Advanced Tools/Chunking Methods.md: -------------------------------------------------------------------------------- 1 | --- 2 | sidebar_position: 4 3 | --- 4 | 5 | # Chunking Methods 6 | 7 | This section provides detailed information on the text chunking capabilities of the SimplerLLM library. These functions allow users to split text into pieces based on sentence, paragraph, size, or semantic similarity. 8 | 9 | Each method is designed to accommodate different analytical needs, enhancing text processing tasks in various applications such as data preprocessing, content analysis, and information retrieval. 10 | 11 | The data returned by all of these functions comes in the form of a `Text Chunks` object, which includes the following attributes: 12 | - `chunk_list` (List): This is a list of `ChunkInfo` objects that includes: 13 | - `text` (string): The text of the chunk itself. 
14 | - `num_characters` (int): The number of characters in the chunk. 15 | - `num_words` (int): The number of words in the chunk. 16 | - `num_chunks` (int): The total number of chunks returned. 17 | 18 | ## chunk_by_sentences Function 19 | 20 | Breaks down the provided text into sentences using punctuation marks as delimiters. It takes 1 parameter: 21 | - `text` (str): Text you want to chunk into sentences. 22 | 23 | It then returns a `Text Chunks` object. Here's a sample usage: 24 | 25 | ```python 26 | from SimplerLLM.tools.text_chunker import chunk_by_sentences 27 | 28 | text = "First sentence. Second sentence? Third sentence!" 29 | 30 | sentences = chunk_by_sentences(text) 31 | 32 | print(sentences) 33 | ``` 34 | 35 | ## chunk_by_paragraphs Function 36 | 37 | Segments the provided text into paragraphs based on newline characters. It takes 1 parameter: 38 | - `text` (str): Text you want to chunk into paragraphs. 39 | 40 | It then returns a `Text Chunks` object. Here's a sample usage: 41 | 42 | ```python 43 | from SimplerLLM.tools.text_chunker import chunk_by_paragraphs 44 | 45 | text = "First paragraph, still going.\n\nSecond paragraph starts." 46 | 47 | paragraphs = chunk_by_paragraphs(text) 48 | 49 | print(paragraphs) 50 | ``` 51 | 52 | ## chunk_by_max_chunk_size Function 53 | 54 | Splits the input text into chunks that do not exceed a specified size. Additionally, it can preserve the meaning of sentences by ensuring that chunks do not split sentences in the middle. It takes 3 parameters: 55 | - `text` (str): The text you want to chunk. 56 | - `max_chunk_size` (int): The maximum size of each chunk in characters. 57 | - `preserve_sentence_structure` (bool, optional): Whether you want to preserve sentence meaning. Set to False by default. 58 | 59 | It returns a `Text Chunks` object. Here's how you can use it: 60 | 61 | ```python 62 | from SimplerLLM.tools.text_chunker import chunk_by_max_chunk_size 63 | 64 | text = "Hello world! This is an example of text chunking. Enjoy using SimplerLLM." 65 | 66 | chunks = chunk_by_max_chunk_size(text, 50, True) 67 | 68 | print(chunks) 69 | ``` 70 | 71 | ## chunk_by_semantics Function 72 | 73 | Uses semantic similarity to divide text into chunks. It takes 3 parameters: 74 | - `text` (str): Text to be segmented based on semantic content. 75 | - `llm_embeddings_instance` [(EmbeddingsLLM)](https://docs.simplerllm.com/Vector%20Storage/Vector%20Embeddings): An instance of a language model used to generate text embeddings for semantic analysis. 76 | - `threshold_percentage` (int, Optional): The percentile threshold you want to use to chunk the text. It is set by default to 90. 77 | 78 | It returns a list of `ChunkInfo` objects, each representing a semantically coherent segment of the original text. However, keep in mind that you need to have your OpenAI API key in the `.env` file so that the LLM embeddings instance can generate the text embeddings. Enter it in this format: 79 | 80 | ``` 81 | OPENAI_API_KEY="your_openai_api_key" 82 | ``` 83 | 84 | Here's an example usage: 85 | 86 | ```python 87 | from SimplerLLM.tools.text_chunker import chunk_by_semantics 88 | from SimplerLLM.language.embeddings import EmbeddingsLLM, EmbeddingsProvider 89 | 90 | text = "Discussing AI. Artificial intelligence has many applications. 
However, dogs like bones" 91 | embeddings_model = EmbeddingsLLM.create(provider=EmbeddingsProvider.OPENAI, 92 | model_name="text-embedding-3-small") 93 | 94 | semantic_chunks = chunk_by_semantics(text, embeddings_model, threshold_percentage=80) 95 | 96 | print(semantic_chunks) 97 | ``` 98 | That's how you can benefit from SimplerLLM to make Text Chunking Simpler! -------------------------------------------------------------------------------- /SimplerLLM/voice/tts/base.py: -------------------------------------------------------------------------------- 1 | from enum import Enum 2 | import os 3 | from SimplerLLM.utils.custom_verbose import verbose_print 4 | 5 | 6 | class TTSProvider(Enum): 7 | """Enumeration of supported TTS providers.""" 8 | OPENAI = 1 9 | ELEVENLABS = 2 10 | # Future providers can be added here: 11 | # GOOGLE = 3 12 | # AZURE = 4 13 | 14 | 15 | class TTS: 16 | """ 17 | Base class for Text-to-Speech functionality. 18 | Provides a unified interface across different TTS providers. 19 | """ 20 | 21 | def __init__( 22 | self, 23 | provider=TTSProvider.OPENAI, 24 | model_name="tts-1", 25 | voice="alloy", 26 | api_key=None, 27 | verbose=False, 28 | ): 29 | """ 30 | Initialize TTS instance. 31 | 32 | Args: 33 | provider: TTS provider to use (TTSProvider enum) 34 | model_name: Model to use (e.g., "tts-1", "tts-1-hd") 35 | voice: Default voice to use (e.g., "alloy", "nova", "shimmer") 36 | api_key: API key for the provider (uses env var if not provided) 37 | verbose: Enable verbose logging 38 | """ 39 | self.provider = provider 40 | self.model_name = model_name 41 | self.voice = voice 42 | self.api_key = api_key 43 | self.verbose = verbose 44 | 45 | if self.verbose: 46 | verbose_print( 47 | f"Initializing {provider.name} TTS with model: {model_name}, voice: {voice}", 48 | "info" 49 | ) 50 | 51 | @staticmethod 52 | def create( 53 | provider=None, 54 | model_name=None, 55 | voice=None, 56 | api_key=None, 57 | verbose=False, 58 | ): 59 | """ 60 | Factory method to create TTS instances for different providers. 61 | 62 | Args: 63 | provider: TTS provider (TTSProvider enum) 64 | model_name: Model to use (provider-specific) 65 | voice: Default voice to use 66 | api_key: API key for the provider 67 | verbose: Enable verbose logging 68 | 69 | Returns: 70 | Provider-specific TTS instance (e.g., OpenAITTS) 71 | """ 72 | if provider == TTSProvider.OPENAI: 73 | from .wrappers.openai_wrapper import OpenAITTS 74 | return OpenAITTS( 75 | provider=provider, 76 | model_name=model_name or "tts-1", 77 | voice=voice or "alloy", 78 | api_key=api_key, 79 | verbose=verbose, 80 | ) 81 | elif provider == TTSProvider.ELEVENLABS: 82 | from .wrappers.elevenlabs_wrapper import ElevenLabsTTS 83 | return ElevenLabsTTS( 84 | provider=provider, 85 | model_name=model_name or "eleven_turbo_v2", 86 | voice=voice or "21m00Tcm4TlvDq8ikWAM", # Rachel voice 87 | api_key=api_key, 88 | verbose=verbose, 89 | ) 90 | # Future providers can be added here 91 | # if provider == TTSProvider.GOOGLE: 92 | # from .wrappers.google_wrapper import GoogleTTS 93 | # return GoogleTTS(...) 94 | else: 95 | return None 96 | 97 | def prepare_params(self, voice=None, model=None, speed=None): 98 | """ 99 | Prepare parameters for TTS generation, using instance defaults 100 | if parameters are not provided. 
101 | 102 | Args: 103 | voice: Voice to use (None = use instance default) 104 | model: Model to use (None = use instance default) 105 | speed: Speed to use (None = use default 1.0) 106 | 107 | Returns: 108 | Dictionary of parameters 109 | """ 110 | return { 111 | "voice": voice if voice is not None else self.voice, 112 | "model_name": model if model is not None else self.model_name, 113 | "speed": speed if speed is not None else 1.0, 114 | } 115 | 116 | def set_provider(self, provider): 117 | """ 118 | Set the TTS provider. 119 | 120 | Args: 121 | provider: TTSProvider enum value 122 | """ 123 | if not isinstance(provider, TTSProvider): 124 | raise ValueError("Provider must be an instance of TTSProvider Enum") 125 | self.provider = provider 126 | -------------------------------------------------------------------------------- /SimplerLLM/voice/video_transcription/models.py: -------------------------------------------------------------------------------- 1 | """ 2 | Pydantic models for video transcription and caption generation. 3 | """ 4 | from pydantic import BaseModel, Field 5 | from typing import List, Optional, Dict 6 | from datetime import timedelta 7 | 8 | 9 | class CaptionSegment(BaseModel): 10 | """Represents a single caption segment with timing information.""" 11 | 12 | index: int 13 | start_time: float # in seconds 14 | end_time: float # in seconds 15 | text: str 16 | duration: float # in seconds 17 | 18 | def to_srt_time(self, time_seconds: float) -> str: 19 | """Convert seconds to SRT time format (HH:MM:SS,mmm).""" 20 | td = timedelta(seconds=time_seconds) 21 | hours = int(td.total_seconds() // 3600) 22 | minutes = int((td.total_seconds() % 3600) // 60) 23 | seconds = int(td.total_seconds() % 60) 24 | milliseconds = int((td.total_seconds() % 1) * 1000) 25 | return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}" 26 | 27 | def to_vtt_time(self, time_seconds: float) -> str: 28 | """Convert seconds to VTT time format (HH:MM:SS.mmm).""" 29 | td = timedelta(seconds=time_seconds) 30 | hours = int(td.total_seconds() // 3600) 31 | minutes = int((td.total_seconds() % 3600) // 60) 32 | seconds = int(td.total_seconds() % 60) 33 | milliseconds = int((td.total_seconds() % 1) * 1000) 34 | return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{milliseconds:03d}" 35 | 36 | def to_srt(self) -> str: 37 | """Convert segment to SRT format.""" 38 | start = self.to_srt_time(self.start_time) 39 | end = self.to_srt_time(self.end_time) 40 | return f"{self.index}\n{start} --> {end}\n{self.text}\n" 41 | 42 | def to_vtt(self) -> str: 43 | """Convert segment to VTT format.""" 44 | start = self.to_vtt_time(self.start_time) 45 | end = self.to_vtt_time(self.end_time) 46 | return f"{start} --> {end}\n{self.text}\n" 47 | 48 | 49 | class VideoTranscriptionResult(BaseModel): 50 | """Result of video transcription with timing information.""" 51 | 52 | text: str 53 | language: Optional[str] = None 54 | segments: List[CaptionSegment] 55 | duration: float # Total video duration in seconds 56 | process_time: float 57 | source_type: str # 'local_file' or 'youtube' 58 | source_path: str # File path or YouTube URL 59 | model: Optional[str] = None 60 | provider: Optional[str] = None 61 | 62 | def to_srt(self) -> str: 63 | """Convert all segments to SRT format.""" 64 | srt_content = "\n".join(segment.to_srt() for segment in self.segments) 65 | return srt_content 66 | 67 | def to_vtt(self) -> str: 68 | """Convert all segments to VTT format.""" 69 | vtt_content = "WEBVTT\n\n" 70 | vtt_content += 
"\n".join(segment.to_vtt() for segment in self.segments) 71 | return vtt_content 72 | 73 | 74 | class LanguageCaptions(BaseModel): 75 | """Captions in a specific language.""" 76 | 77 | language: str 78 | language_code: str # e.g., 'es', 'fr', 'de' 79 | segments: List[CaptionSegment] 80 | format: str # 'srt' or 'vtt' 81 | content: str # Formatted caption content 82 | 83 | def save_to_file(self, file_path: str): 84 | """Save captions to a file.""" 85 | with open(file_path, 'w', encoding='utf-8') as f: 86 | f.write(self.content) 87 | 88 | 89 | class MultiLanguageCaptionsResult(BaseModel): 90 | """Result containing captions in multiple languages.""" 91 | 92 | original_language: str 93 | original_transcription: VideoTranscriptionResult 94 | captions: Dict[str, LanguageCaptions] # language_code -> LanguageCaptions 95 | target_languages: List[str] 96 | process_time: float 97 | total_segments: int 98 | 99 | def get_captions(self, language_code: str) -> Optional[LanguageCaptions]: 100 | """Get captions for a specific language.""" 101 | return self.captions.get(language_code) 102 | 103 | def save_all(self, output_dir: str, base_filename: str): 104 | """Save all captions to files.""" 105 | import os 106 | os.makedirs(output_dir, exist_ok=True) 107 | 108 | for lang_code, caption in self.captions.items(): 109 | ext = 'srt' if caption.format == 'srt' else 'vtt' 110 | filename = f"{base_filename}.{lang_code}.{ext}" 111 | filepath = os.path.join(output_dir, filename) 112 | caption.save_to_file(filepath) 113 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_providers/deepseek_llm.py: -------------------------------------------------------------------------------- 1 | from dotenv import load_dotenv 2 | import asyncio 3 | import os 4 | import time 5 | import requests 6 | import aiohttp 7 | from .llm_response_models import LLMFullResponse 8 | 9 | # Load environment variables 10 | load_dotenv(override=True) 11 | 12 | MAX_RETRIES = int(os.getenv("MAX_RETRIES", 3)) 13 | RETRY_DELAY = int(os.getenv("RETRY_DELAY", 2)) 14 | 15 | def generate_response( 16 | model_name, 17 | messages=None, 18 | temperature=0.7, 19 | max_tokens=300, 20 | top_p=1.0, 21 | full_response=False, 22 | api_key=None, 23 | json_mode=False, 24 | ): 25 | start_time = time.time() if full_response else None 26 | headers = { 27 | "Content-Type": "application/json", 28 | "Accept": "application/json", 29 | "Authorization": f"Bearer {api_key}" 30 | } 31 | 32 | data = { 33 | "messages": messages, 34 | "model": model_name, 35 | "temperature": temperature, 36 | "max_tokens": max_tokens, 37 | "top_p": top_p, 38 | "stream": False 39 | } 40 | 41 | if json_mode: 42 | data["response_format"] = {"type": "json_object"} 43 | 44 | for attempt in range(MAX_RETRIES): 45 | try: 46 | response = requests.post( 47 | "https://api.deepseek.com/chat/completions", 48 | headers=headers, 49 | json=data 50 | ) 51 | response.raise_for_status() 52 | result = response.json() 53 | generated_text = result["choices"][0]["message"]["content"] 54 | 55 | if full_response: 56 | end_time = time.time() 57 | process_time = end_time - start_time 58 | return LLMFullResponse( 59 | generated_text=generated_text, 60 | model=model_name, 61 | process_time=process_time, 62 | input_token_count=result["usage"]["prompt_tokens"], 63 | output_token_count=result["usage"]["completion_tokens"], 64 | llm_provider_response=result, 65 | ) 66 | return generated_text 67 | 68 | except Exception as e: 69 | if attempt < MAX_RETRIES - 1: 70 | 
time.sleep(RETRY_DELAY * (2**attempt)) 71 | else: 72 | error_msg = f"Failed after {MAX_RETRIES} attempts due to: {e}" 73 | raise Exception(error_msg) 74 | 75 | async def generate_response_async( 76 | model_name, 77 | messages=None, 78 | temperature=0.7, 79 | max_tokens=300, 80 | top_p=1.0, 81 | full_response=False, 82 | api_key=None, 83 | json_mode=False, 84 | ): 85 | start_time = time.time() if full_response else None 86 | headers = { 87 | "Content-Type": "application/json", 88 | "Accept": "application/json", 89 | "Authorization": f"Bearer {api_key}" 90 | } 91 | 92 | data = { 93 | "messages": messages, 94 | "model": model_name, 95 | "temperature": temperature, 96 | "max_tokens": max_tokens, 97 | "top_p": top_p, 98 | "stream": False 99 | } 100 | 101 | if json_mode: 102 | data["response_format"] = {"type": "json_object"} 103 | 104 | for attempt in range(MAX_RETRIES): 105 | try: 106 | async with aiohttp.ClientSession() as session: 107 | async with session.post( 108 | "https://api.deepseek.com/chat/completions", 109 | headers=headers, 110 | json=data 111 | ) as response: 112 | response.raise_for_status() 113 | result = await response.json() 114 | generated_text = result["choices"][0]["message"]["content"] 115 | 116 | if full_response: 117 | end_time = time.time() 118 | process_time = end_time - start_time 119 | return LLMFullResponse( 120 | generated_text=generated_text, 121 | model=model_name, 122 | process_time=process_time, 123 | input_token_count=result["usage"]["prompt_tokens"], 124 | output_token_count=result["usage"]["completion_tokens"], 125 | llm_provider_response=result, 126 | ) 127 | return generated_text 128 | 129 | except Exception as e: 130 | if attempt < MAX_RETRIES - 1: 131 | await asyncio.sleep(RETRY_DELAY * (2**attempt)) 132 | else: 133 | error_msg = f"Failed after {MAX_RETRIES} attempts due to: {e}" 134 | raise Exception(error_msg) 135 | -------------------------------------------------------------------------------- /Documentation/docs/Advanced Tools/File Operations.md: -------------------------------------------------------------------------------- 1 | --- 2 | sidebar_position: 1 3 | --- 4 | 5 | # File Operations 6 | 7 | SimplerLLM supports creating and loading the content of various file types. This makes it easy to load the content of any file or even content from the internet using a generic function. 8 | 9 | The file operations available are categorized into three primary areas: 10 | - **Saving Text to Files**: This functionality allows writing text data to files while ensuring errors are handled correctly. 11 | - **Loading CSV Files**: This functionality allows easy reading of any CSV file, returning structured data. 12 | - **Generic File Loading**: This includes loading the details of various types of files, such as plain text, PDFs, DOCX, and even web pages. 13 | 14 | Here's how each of them works: 15 | 16 | ## Saving Text to File 17 | 18 | This operation contains a single function `save_text_to_file` which takes as input the text you want to save and the name of the file you want to save it in. 19 | 20 | If the file is already present in your directory, its content is overwritten; if it's not present, the file is created and the input text is written to it.
Here's an example: 21 | 22 | ```python 23 | from SimplerLLM.tools.file_functions import save_text_to_file 24 | 25 | success = save_text_to_file("This is the text saved in the file", "file.txt") 26 | 27 | print(success) 28 | ``` 29 | 30 | As you can see, it takes 2 parameters: 31 | - `text (str)`: The text content to save. 32 | - `filename (str)` (Optional): The destination filename. Defaults to "output.txt". 33 | 34 | It then returns a bool (True/False) indicating whether the file was saved successfully. 35 | 36 | ## Loading CSV Files 37 | 38 | This operation also contains a single function `read_csv_file` which takes as input the path to the CSV, and returns a `CSVDocument` object which provides a structured way to access the CSV data, including the following attributes that you can access independently: 39 | - `file_size`: The size of the CSV file in bytes. 40 | - `row_count`: Number of rows in the CSV. 41 | - `column_count`: Number of columns in the CSV. 42 | - `total_fields`: Total number of data fields. 43 | - `content`: Nested list representing rows and columns. 44 | - `title`: Title of the document (will be set to None by this function) 45 | - `url_or_path`: CSV file name. 46 | 47 | Here's an example of the function in action: 48 | 49 | ```python 50 | from SimplerLLM.tools.file_loader import read_csv_file 51 | 52 | csv_data = read_csv_file("text.csv") 53 | 54 | print(csv_data) 55 | ``` 56 | 57 | When you print `csv_data` as is, it will show the whole `CSVDocument` object with all its attributes. However, if you want to access only the content of the file, for example, here's how you do it: 58 | 59 | ```python 60 | from SimplerLLM.tools.file_loader import read_csv_file 61 | 62 | csv_data = read_csv_file("text.csv") 63 | 64 | print(csv_data.content) 65 | ``` 66 | 67 | Use the same method for accessing the other attributes. 68 | Here's another example of how to access the column count: 69 | 70 | ```python 71 | from SimplerLLM.tools.file_loader import read_csv_file 72 | 73 | csv_data = read_csv_file("text.csv") 74 | 75 | print(csv_data.column_count) 76 | ``` 77 | 78 | ## Generic Loading Of Other File Types 79 | 80 | This generic loader supports many file types, including: 81 | - Web Articles 82 | - Traditional formats like TXT, PDF, CSV, and DOCX. 83 | 84 | The `load_content` function takes the file name as input, and returns a `TextDocument` object that has the following attributes: 85 | - `file_size`: The size of the file in bytes. 86 | - `word_count`: The number of words in the file. 87 | - `character_count`: The number of characters in the file. 88 | - `content`: String representing the contents of the file. 89 | - `title`: Title of the document (if it has one) 90 | - `url_or_path`: file name. 91 | 92 | Here's an example of the function in action: 93 | 94 | ```python 95 | from SimplerLLM.tools.generic_loader import load_content 96 | 97 | file_data = load_content("file_name.csv") 98 | 99 | print(file_data) 100 | ``` 101 | 102 | When you print `file_data` as is, it will show the whole `TextDocument` object with all its attributes. However, if you want to access only the content of the file, for example, here's how you do it: 103 | 104 | ```python 105 | from SimplerLLM.tools.generic_loader import load_content 106 | 107 | file_data = load_content("file_name.csv") 108 | 109 | print(file_data.content) 110 | ``` 111 | 112 | Use the same method for accessing the other attributes.
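Because `load_content` returns the same `TextDocument` structure for every supported type, you can process a mixed batch of files with a single loop. Here's a minimal sketch (the file names are placeholders for illustration):

```python
from SimplerLLM.tools.generic_loader import load_content

# placeholder file names; swap in your own local files or article URLs
for path in ["notes.txt", "report.pdf", "summary.docx"]:
    doc = load_content(path)
    print(f"{doc.url_or_path}: {doc.word_count} words, {doc.file_size} bytes")
```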
113 | Here's another example on how to access the word count: 114 | 115 | ```python 116 | from SimplerLLM.tools.generic_loader import load_content 117 | 118 | file_data = load_content("file_name.csv") 119 | 120 | print(file_data.word_count) 121 | ``` 122 | 123 | That's how you can benefit from SimplerLLM to make interaction with files Simpler! -------------------------------------------------------------------------------- /SimplerLLM/tools/generic_loader.py: -------------------------------------------------------------------------------- 1 | import newspaper 2 | import os 3 | import PyPDF2 4 | import docx 5 | from youtube_transcript_api import YouTubeTranscriptApi 6 | import re 7 | from urllib.parse import urlparse 8 | from pydantic import BaseModel 9 | from typing import Optional 10 | 11 | 12 | class TextDocument(BaseModel): 13 | file_size: Optional[int] = None 14 | word_count: int 15 | character_count: int 16 | content: str 17 | title: Optional[str] = None 18 | url_or_path: Optional[str] = None 19 | 20 | 21 | def load_content(input_path_or_url): 22 | """ 23 | Load content from a given input path or URL. 24 | 25 | This function handles the following types of input: 26 | - URLs: Supports blog articles. 27 | - Local files: Supports .txt, .csv, .docx, and .pdf file extensions. 28 | Args: 29 | input_path_or_url (str) 30 | 31 | """ 32 | # Check if the input is a URL 33 | if re.match(r"http[s]?://", input_path_or_url): 34 | article = __read_blog_from_url(input_path_or_url) 35 | if article is not None: 36 | file_size = len(article.text.encode("utf-8")) # Size in bytes 37 | return TextDocument( 38 | word_count=len(article.text.split()), 39 | character_count=len(article.text), 40 | content=article.text, 41 | title=article.title, 42 | file_size=file_size, 43 | url_or_path=input_path_or_url, 44 | ) 45 | else: 46 | try: 47 | # Process based on file extension 48 | file_ext = os.path.splitext(input_path_or_url)[1].lower() 49 | if file_ext in [".txt", ".csv"]: 50 | file_size, num_words, num_chars, content = __read_text_file( 51 | input_path_or_url 52 | ) 53 | elif file_ext in [".docx"]: 54 | file_size, num_words, num_chars, content = __read_docx_file( 55 | input_path_or_url 56 | ) 57 | elif file_ext in [".pdf"]: 58 | file_size, num_words, num_chars, content = __read_pdf_file( 59 | input_path_or_url 60 | ) 61 | 62 | else: 63 | # Fallback: try reading as a text file 64 | file_size, num_words, num_chars, content = __read_text_file( 65 | input_path_or_url 66 | ) 67 | 68 | return TextDocument( 69 | file_size=file_size, 70 | word_count=num_words, 71 | character_count=num_chars, 72 | content=content, 73 | url_or_path=input_path_or_url, 74 | ) 75 | except Exception as e: 76 | raise ValueError(f"Error processing file: {e}") 77 | 78 | raise ValueError("Unable to process the input") 79 | 80 | 81 | def __read_text_file(file_path): 82 | with open(file_path, "r", encoding="utf-8") as file: 83 | content = file.read() 84 | 85 | file_size = os.path.getsize(file_path) 86 | words = content.split() 87 | num_words = len(words) 88 | num_chars = len(content) 89 | 90 | return file_size, num_words, num_chars, content 91 | 92 | 93 | def __read_docx_file(file_path): 94 | file_size = os.path.getsize(file_path) 95 | doc = docx.Document(file_path) 96 | content = "\n".join([para.text for para in doc.paragraphs]) 97 | 98 | words = content.split() 99 | num_words = len(words) 100 | num_chars = len(content) 101 | 102 | return file_size, num_words, num_chars, content 103 | 104 | 105 | def __read_pdf_file(file_path): 106 | file_size = 
os.path.getsize(file_path) 107 | 108 | with open(file_path, "rb") as file: 109 | reader = PyPDF2.PdfReader(file) 110 | content = "".join( 111 | [reader.pages[i].extract_text() for i in range(len(reader.pages))] 112 | ) 113 | 114 | words = content.split() 115 | num_words = len(words) 116 | num_chars = len(content) 117 | 118 | return file_size, num_words, num_chars, content 119 | 120 | 121 | def __read_blog_from_url(url): 122 | """ 123 | Extracts the article content from a given URL using the newspaper package. 124 | 125 | Parameters: 126 | url (str): The URL of the article to extract text from. 127 | 128 | Returns: 129 | newspaper.Article: The parsed article object if extraction is successful, None otherwise. 130 | """ 131 | try: 132 | article = newspaper.Article(url) 133 | article.download() 134 | 135 | if article.download_state == 2:  # 2 == ArticleDownloadState.SUCCESS 136 | article.parse() 137 | return article 138 | else: 139 | print("An error occurred while fetching the article") 140 | return None 141 | except newspaper.ArticleException as e: 142 | print(f"An error occurred while fetching the article: {e}") 143 | return None -------------------------------------------------------------------------------- /SimplerLLM/language/llm_validator/models.py: -------------------------------------------------------------------------------- 1 | """ 2 | Pydantic models for the LLM Validator system. 3 | 4 | This module defines the data structures used for multi-provider validation 5 | of AI-generated content, including scoring, confidence, and aggregation. 6 | """ 7 | 8 | from typing import Dict, List, Optional, Any 9 | from pydantic import BaseModel, Field 10 | from datetime import datetime 11 | from enum import Enum 12 | 13 | 14 | class AggregationMethod(str, Enum): 15 | """Method for aggregating scores from multiple validators.""" 16 | AVERAGE = "average" # Simple mean of all scores 17 | WEIGHTED = "weighted" # Weighted average (provider weights) 18 | MEDIAN = "median" # Median score 19 | CONSENSUS = "consensus" # Majority agreement threshold 20 | 21 | 22 | class ValidatorScoreOutput(BaseModel): 23 | """Internal model for structured validator output from LLM.""" 24 | score: float = Field( 25 | description="Validation score from 0.0 to 1.0 (0 = completely invalid, 1 = perfectly valid)", 26 | ge=0.0, 27 | le=1.0 28 | ) 29 | confidence: float = Field( 30 | description="How confident you are in this score from 0.0 to 1.0", 31 | ge=0.0, 32 | le=1.0 33 | ) 34 | explanation: str = Field( 35 | description="Detailed explanation of why you gave this score" 36 | ) 37 | 38 | 39 | class ValidatorScore(BaseModel): 40 | """Individual validator's score for the content.""" 41 | provider_name: str = Field(description="Name of the LLM provider (e.g., 'OPENAI', 'ANTHROPIC')") 42 | model_name: str = Field(description="Specific model used (e.g., 'gpt-4o', 'claude-3-5-sonnet')") 43 | score: float = Field( 44 | description="Validation score from 0.0 to 1.0", 45 | ge=0.0, 46 | le=1.0 47 | ) 48 | confidence: float = Field( 49 | description="How confident the validator is in its score (0.0 to 1.0)", 50 | ge=0.0, 51 | le=1.0 52 | ) 53 | explanation: str = Field(description="Explanation for the score") 54 | is_valid: bool = Field(description="Whether the content passes validation (score >= threshold)") 55 | execution_time: float = Field(description="Time taken to validate in seconds") 56 | error: Optional[str] = Field(default=None, description="Error message if validation failed") 57 | 58 | class Config: 59 | json_schema_extra = { 60 | "example": { 61 | "provider_name": 
"OPENAI", 62 | "model_name": "gpt-4o", 63 | "score": 0.85, 64 | "confidence": 0.92, 65 | "explanation": "The content is factually accurate and well-structured.", 66 | "is_valid": True, 67 | "execution_time": 1.5, 68 | "error": None 69 | } 70 | } 71 | 72 | 73 | class ValidationResult(BaseModel): 74 | """Complete result from LLM Validator.""" 75 | overall_score: float = Field( 76 | description="Aggregated validation score (0.0 to 1.0)", 77 | ge=0.0, 78 | le=1.0 79 | ) 80 | overall_confidence: float = Field( 81 | description="Aggregated confidence score (0.0 to 1.0)", 82 | ge=0.0, 83 | le=1.0 84 | ) 85 | is_valid: bool = Field(description="Overall pass/fail based on threshold") 86 | validators: List[ValidatorScore] = Field(description="Individual validator scores") 87 | consensus: bool = Field(description="Whether validators agreed on the result") 88 | consensus_details: str = Field(description="Explanation of validator agreement") 89 | aggregation_method: AggregationMethod = Field(description="Method used for aggregation") 90 | content_validated: str = Field(description="The content that was validated") 91 | validation_prompt: str = Field(description="The validation instructions used") 92 | original_question: Optional[str] = Field(default=None, description="The original question if provided") 93 | total_execution_time: float = Field(description="Total time for all validators in seconds") 94 | timestamp: datetime = Field(default_factory=datetime.now, description="When validation completed") 95 | 96 | class Config: 97 | json_schema_extra = { 98 | "example": { 99 | "overall_score": 0.85, 100 | "overall_confidence": 0.90, 101 | "is_valid": True, 102 | "validators": [], 103 | "consensus": True, 104 | "consensus_details": "All validators scored within 0.15 of each other", 105 | "aggregation_method": "average", 106 | "content_validated": "Paris is the capital of France.", 107 | "validation_prompt": "Check if the facts are accurate.", 108 | "original_question": "What is the capital of France?", 109 | "total_execution_time": 3.2, 110 | "timestamp": "2025-01-15T10:30:00" 111 | } 112 | } 113 | -------------------------------------------------------------------------------- /Documentation/docs/Advanced Tools/Generic RapidAPI Loader.md: -------------------------------------------------------------------------------- 1 | --- 2 | sidebar_position: 5 3 | --- 4 | 5 | # Generic RapidAPI Loader 6 | 7 | This tool helps calling any API on RapidAPI using a generic function, offering methods for both synchronous and asynchronous requests. This makes it easier for developers who need to integrate multiple API services into their applications. This client manages API keys, request headers, and error handling internally, reducing the overhead for the developers and allowing them to focus on implementing the core functionality. 8 | 9 | To use it, you'll need to input your API key in the `.env` file in this form: 10 | 11 | ``` 12 | RAPIDAPI_API_KEY="your_api_key" 13 | ``` 14 | 15 | ## Synchronous API Calls (`call_api`) 16 | 17 | The `call_api` method allows users to perform synchronous HTTP requests to any specified RapidAPI endpoint. This method is useful when the application requires a direct response from the API for further processing. 18 | 19 | ### Parameters 20 | - `api_url`: String specifying the full URL of the API endpoint. 21 | - `method`: HTTP method to use (e.g., 'GET', 'POST'), default is 'GET'. 22 | - `params` (Optional): Dictionary of query parameters for the 'GET' request. 
23 | - `headers_extra` (Optional): Dictionary to provide additional headers. 24 | - `data` (Optional): Dictionary specifying data for 'POST' requests. 25 | - `json` (Optional): Dictionary specifying JSON payload for 'POST' requests. 26 | - `max_retries` (Optional): Maximum number of retries. 27 | - `backoff_factor` (Optional): Factor by which the delay increases during each retry. 28 | 29 | The response from the API is checked for its HTTP status code. If the response indicates success (e.g., 200, 201), the method returns the JSON data. If the response is not successful, it raises an HTTP error with the status code and message. 30 | 31 | **This function includes automatic retries** for client connection errors (e.g., timeouts, server not available). By default, the method will retry the request up to 3 times with a backoff factor of 2. 32 | 33 | If you want to change these values, you can include them in the parameters when sending an API call. 34 | 35 | ### Sample Use Case 36 | 37 | Let's try using the [Domain Authority API](https://rapidapi.com/hassan.cs91/api/domain-authority1/playground/apiendpoint_f2c2bcde-e9c2-45aa-9d0c-47d6b21b876b) which is an API I developed that returns the domain power, organic clicks, average rank, and keywords rank for any domain name. 38 | 39 | ```python 40 | from SimplerLLM.tools.rapid_api import RapidAPIClient 41 | 42 | api_url = "https://domain-authority1.p.rapidapi.com/seo/get-domain-info" 43 | api_params = { 44 | 'domain': 'learnwithhasan.com', 45 | } 46 | 47 | api_client = RapidAPIClient() 48 | response = api_client.call_api(api_url, method='GET', params=api_params) 49 | ``` 50 | 51 | ## Asynchronous API Calls (`call_api_async`) 52 | 53 | The `call_api_async` function is designed for making asynchronous API calls. This is particularly useful in environments that support asynchronous operations, allowing non-blocking API calls that can greatly improve the efficiency of your application. 54 | 55 | ### Parameters 56 | - `api_url`: String specifying the full URL of the API endpoint. 57 | - `method`: HTTP method to use (e.g., 'GET', 'POST'), default is 'GET'. 58 | - `params` (Optional): Dictionary of query parameters for the 'GET' request. 59 | - `headers_extra` (Optional): Dictionary to provide additional headers. 60 | - `data` (Optional): Dictionary specifying data for 'POST' requests. 61 | - `json` (Optional): Dictionary specifying JSON payload for 'POST' requests. 62 | - `max_retries` (Optional): Maximum number of retries. 63 | - `backoff_factor` (Optional): Factor by which the delay increases during each retry. 64 | 65 | As you can see, it takes the same parameters as `call_api`, but each request is handled asynchronously. 66 | 67 | The response from the API is checked for its HTTP status code. If the response indicates success (e.g., 200, 201), the method returns the JSON data. If the response is not successful, it raises an HTTP error with the status code and message. 68 | 69 | **This function includes automatic retries** for client connection errors (e.g., timeouts, server not available). By default, the method will retry the request up to 3 times with a backoff factor of 2. 70 | 71 | If you want to change these values, you can include them in the parameters when sending an API call.
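For instance, here's a minimal sketch that retries an asynchronous call up to 5 times with a backoff factor of 3 (the endpoint URL is a placeholder):

```python
import asyncio
from SimplerLLM.tools.rapid_api import RapidAPIClient

async def fetch_with_custom_retries():
    api_client = RapidAPIClient()
    # with backoff_factor=3, failed attempts wait 3, 9, 27, ... seconds
    return await api_client.call_api_async(
        "https://example-api.p.rapidapi.com/endpoint",  # placeholder endpoint
        method='GET',
        max_retries=5,
        backoff_factor=3,
    )

response = asyncio.run(fetch_with_custom_retries())
```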
72 | 73 | ### Sample Use Case 74 | 75 | Let's try using the [Domain Authority API](https://rapidapi.com/hassan.cs91/api/domain-authority1/playground/apiendpoint_f2c2bcde-e9c2-45aa-9d0c-47d6b21b876b) which is an API I developed that returns the domain power, organic clicks, average rank, and keywords rank for any domain name. 76 | 77 | ```python 78 | import asyncio 79 | from SimplerLLM.tools.rapid_api import RapidAPIClient 80 | 81 | api_url = "https://domain-authority1.p.rapidapi.com/seo/get-domain-info" 82 | api_params = { 83 | 'domain': 'learnwithhasan.com', 84 | } 85 | 86 | async def fetch_domain_info(): 87 | api_client = RapidAPIClient() 88 | # the coroutine must be awaited for the request to actually run 89 | return await api_client.call_api_async(api_url, method='GET', params=api_params) 90 | 91 | response = asyncio.run(fetch_domain_info()) 92 | ``` 93 | 94 | That's how you can benefit from SimplerLLM to make RapidAPI calling Simpler! -------------------------------------------------------------------------------- /SimplerLLM/language/guardrails/input_guardrails/prompt_injection.py: -------------------------------------------------------------------------------- 1 | """ 2 | Prompt injection guardrail for adding safety and instruction rules to system prompts. 3 | """ 4 | 5 | from typing import Optional, List, Dict 6 | from SimplerLLM.language.guardrails.base import ( 7 | InputGuardrail, 8 | GuardrailResult, 9 | GuardrailAction 10 | ) 11 | 12 | 13 | class PromptInjectionGuardrail(InputGuardrail): 14 | """ 15 | Injects safety rules or instructions into the system prompt. 16 | 17 | This guardrail automatically adds safety rules, ethical guidelines, 18 | or specific instructions to the system prompt before LLM generation. 19 | 20 | Configuration: 21 | - safety_rules (str): Rules to inject (default: built-in safety rules) 22 | - position (str): Where to inject - 'prepend' or 'append' (default: 'prepend') 23 | - separator (str): Text to separate injected rules from original (default: '\\n\\n') 24 | - custom_instructions (list): Additional custom instructions to include 25 | 26 | Example: 27 | >>> guardrail = PromptInjectionGuardrail(config={ 28 | ... "safety_rules": "Always be helpful and harmless.", 29 | ... "position": "prepend" 30 | ... }) 31 | >>> result = guardrail.validate( 32 | ... prompt="Hello", 33 | ... system_prompt="You are an assistant" 34 | ... ) 35 | """ 36 | 37 | DEFAULT_SAFETY_RULES = """IMPORTANT SAFETY AND ETHICAL GUIDELINES: 38 | - Do not generate harmful, illegal, or unethical content 39 | - Respect user privacy and do not request sensitive personal information 40 | - Be truthful, accurate, and acknowledge uncertainty when appropriate 41 | - Decline requests that violate ethical guidelines or could cause harm 42 | - Avoid generating content that could be used for malicious purposes""" 43 | 44 | def __init__(self, config: Optional[Dict] = None): 45 | """ 46 | Initialize the prompt injection guardrail. 47 | 48 | Args: 49 | config: Configuration dictionary 50 | """ 51 | super().__init__(config) 52 | 53 | # Get configuration 54 | self.safety_rules = self.config.get("safety_rules", self.DEFAULT_SAFETY_RULES) 55 | self.position = self.config.get("position", "prepend") # prepend or append 56 | self.separator = self.config.get("separator", "\n\n") 57 | self.custom_instructions = self.config.get("custom_instructions", []) 58 | 59 | # Validate configuration 60 | if self.position not in ["prepend", "append"]: 61 | self.position = "prepend" 62 | 63 | def _inject_rules(self, system_prompt: str) -> str: 64 | """ 65 | Inject rules into system prompt.
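Combines the configured safety rules with any custom instructions, then places them before or after the original system prompt according to the configured position.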
66 | 67 | Args: 68 | system_prompt: Original system prompt 69 | 70 | Returns: 71 | Modified system prompt with injected rules 72 | """ 73 | # Combine safety rules with custom instructions 74 | all_rules = [self.safety_rules] 75 | if self.custom_instructions: 76 | all_rules.extend(self.custom_instructions) 77 | 78 | injected_content = self.separator.join(all_rules) 79 | 80 | # Inject based on position 81 | if self.position == "prepend": 82 | modified = f"{injected_content}{self.separator}{system_prompt}" 83 | else: # append 84 | modified = f"{system_prompt}{self.separator}{injected_content}" 85 | 86 | return modified 87 | 88 | def validate( 89 | self, 90 | prompt: str, 91 | system_prompt: str, 92 | messages: Optional[List[Dict]] = None, 93 | **kwargs 94 | ) -> GuardrailResult: 95 | """ 96 | Inject safety rules into system prompt. 97 | 98 | Args: 99 | prompt: User prompt (not modified) 100 | system_prompt: System prompt to modify 101 | messages: Optional conversation messages 102 | **kwargs: Additional context 103 | 104 | Returns: 105 | GuardrailResult with MODIFY action and modified system prompt 106 | """ 107 | modified_system = self._inject_rules(system_prompt) 108 | 109 | return GuardrailResult( 110 | action=GuardrailAction.MODIFY, 111 | passed=True, 112 | message="Injected safety rules into system prompt", 113 | modified_content=modified_system, 114 | metadata={ 115 | "system": True, 116 | "target": "system", 117 | "injection_position": self.position, 118 | "rules_count": 1 + len(self.custom_instructions) 119 | }, 120 | guardrail_name=self.name 121 | ) 122 | 123 | async def validate_async( 124 | self, 125 | prompt: str, 126 | system_prompt: str, 127 | messages: Optional[List[Dict]] = None, 128 | **kwargs 129 | ) -> GuardrailResult: 130 | """ 131 | Async version of validate. 132 | 133 | Since this guardrail doesn't make async calls, it just delegates 134 | to the sync version. 135 | """ 136 | return self.validate(prompt, system_prompt, messages, **kwargs) 137 | -------------------------------------------------------------------------------- /SimplerLLM/tools/rapid_api.py: -------------------------------------------------------------------------------- 1 | from dotenv import load_dotenv 2 | import os 3 | import time 4 | import requests 5 | import aiohttp 6 | import asyncio 7 | from typing import Optional, Any, Dict 8 | 9 | load_dotenv()  # Load the environment variables 10 | 11 | class RapidAPIClient: 12 | def __init__(self, api_key: Optional[str] = None, timeout: int = 30): 13 | """ 14 | Initialize the RapidAPI client. 15 | 16 | :param api_key: Optional API key. If not provided, it will be read from the environment variable 'RAPIDAPI_API_KEY'. 17 | :param timeout: Request timeout in seconds. 18 | """ 19 | self.api_key = api_key if api_key else os.getenv('RAPIDAPI_API_KEY') 20 | self.timeout = timeout 21 | 22 | if not self.api_key: 23 | raise ValueError("API key must be provided or set as an environment variable 'RAPIDAPI_API_KEY'") 24 | 25 | def _construct_headers(self, api_url: str, headers_extra: Optional[Dict[str, str]] = None) -> Dict[str, str]: 26 | """ 27 | Construct headers for the API call.
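The `x-rapidapi-host` header is derived automatically from the domain of the endpoint URL.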
28 | 29 | :param api_url: URL of the RapidAPI endpoint 30 | :param headers_extra: Additional headers if required by the API 31 | :return: Dictionary of headers 32 | """ 33 | headers = { 34 | 'x-rapidapi-key': self.api_key, 35 | 'x-rapidapi-host': api_url.split('/')[2] 36 | } 37 | 38 | if headers_extra: 39 | headers.update(headers_extra) 40 | 41 | return headers 42 | 43 | def _check_response(self, response: requests.Response) -> Any: 44 | """ 45 | Check the response status and return the JSON data if successful. 46 | 47 | :param response: Response object from requests library. 48 | :return: JSON response from the API 49 | """ 50 | if response.status_code in [200, 201, 202, 204]: 51 | return response.json() if response.text else None 52 | response.raise_for_status() 53 | 54 | def call_api(self, api_url: str, method: str = 'GET', headers_extra: Optional[Dict[str, str]] = None, params: Optional[Dict[str, str]] = None, data: Optional[Dict[str, str]] = None, json: Optional[Dict[str, Any]] = None, max_retries: int = 3, backoff_factor: int = 2) -> Any: 55 | """ 56 | Make a synchronous API call to a RapidAPI endpoint. 57 | 58 | :param api_url: URL of the RapidAPI endpoint 59 | :param method: HTTP method ('GET' or 'POST') 60 | :param headers_extra: Additional headers if required by the API 61 | :param params: Query parameters for GET request 62 | :param data: Form data for POST request 63 | :param json: JSON data for POST request 64 | :param max_retries: Maximum number of retries 65 | :param backoff_factor: Factor by which the delay increases during each retry 66 | :return: JSON response from the API 67 | """ 68 | headers = self._construct_headers(api_url, headers_extra) 69 | retries = 0 70 | 71 | while retries < max_retries: 72 | try: 73 | with requests.request(method, api_url, headers=headers, params=params, data=data, json=json, timeout=self.timeout) as response: 74 | return self._check_response(response) 75 | except requests.RequestException as e: 76 | retries += 1 77 | if retries >= max_retries: 78 | raise e 79 | time.sleep(backoff_factor ** retries) 80 | 81 | async def call_api_async(self, api_url: str, method: str = 'GET', headers_extra: Optional[Dict[str, str]] = None, params: Optional[Dict[str, str]] = None, data: Optional[Dict[str, str]] = None, json: Optional[Dict[str, Any]] = None, max_retries: int = 3, backoff_factor: int = 2) -> Any: 82 | """ 83 | Make an asynchronous API call to a RapidAPI endpoint. 
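This coroutine must be awaited from within an async context.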
84 | 85 | :param api_url: URL of the RapidAPI endpoint 86 | :param method: HTTP method ('GET' or 'POST') 87 | :param headers_extra: Additional headers if required by the API 88 | :param params: Query parameters for GET request 89 | :param data: Form data for POST request 90 | :param json: JSON data for POST request 91 | :param max_retries: Maximum number of retries 92 | :param backoff_factor: Factor by which the delay increases during each retry 93 | :return: JSON response from the API 94 | """ 95 | headers = self._construct_headers(api_url, headers_extra) 96 | 97 | async with aiohttp.ClientSession() as session: 98 | retries = 0 99 | while retries < max_retries: 100 | try: 101 | async with session.request(method, api_url, headers=headers, params=params, data=data, json=json, timeout=self.timeout) as response: 102 | if response.status in [200, 201, 202, 204]: 103 | return await response.json() if await response.text() else None  # aiohttp's .text() is a coroutine; await it to detect an empty body 104 | response.raise_for_status() 105 | except aiohttp.ClientError as e: 106 | retries += 1 107 | if retries >= max_retries: 108 | raise e 109 | await asyncio.sleep(backoff_factor ** retries) 110 | -------------------------------------------------------------------------------- /SimplerLLM/language/llm_providers/ollama_llm.py: -------------------------------------------------------------------------------- 1 | from typing import Dict, Optional 2 | import os 3 | from dotenv import load_dotenv 4 | import aiohttp 5 | import asyncio 6 | import time 7 | import json 8 | import requests 9 | from .llm_response_models import LLMFullResponse 10 | 11 | # Load environment variables 12 | load_dotenv(override=True) 13 | 14 | 15 | MAX_RETRIES = int(os.getenv("MAX_RETRIES", 3)) 16 | RETRY_DELAY = int(os.getenv("RETRY_DELAY", 2)) 17 | OLLAMA_URL = str(os.getenv("OLLAMA_URL", "http://localhost:11434/")) + "api/chat" 18 | 19 | def generate_response( 20 | model_name: str, 21 | messages=None, 22 | temperature: float = 0.7, 23 | max_tokens: int = 300, 24 | top_p: float = 1.0, 25 | full_response: bool = False, 26 | json_mode=False, 27 | ) -> Optional[Dict]: 28 | """ 29 | Makes a POST request to the local Ollama API to generate content based on the provided text 30 | with specified generation configuration settings.
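Failed requests are retried with exponential backoff before an exception is raised.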
31 | """ 32 | start_time = time.time() # Record the start time 33 | retry_attempts = 3 34 | retry_delay = 1 # initial delay between retries in seconds 35 | 36 | 37 | # Define the URL and headers 38 | url = OLLAMA_URL 39 | headers = { 40 | "content-type": "application/json", 41 | } 42 | 43 | # Create the data payload 44 | payload = { 45 | "model": model_name, 46 | "messages": messages, 47 | "temperature": temperature, 48 | "top_p": top_p, 49 | "num_predict": max_tokens, 50 | "stream": False, 51 | } 52 | 53 | for attempt in range(retry_attempts): 54 | try: 55 | response = requests.post(url, headers=headers, json=payload) 56 | response.raise_for_status() # Raises HTTPError for bad requests (4XX or 5XX) 57 | 58 | if full_response: 59 | response_json = response.json() 60 | return LLMFullResponse( 61 | generated_text=response_json["message"]["content"], 62 | model=model_name, 63 | process_time=time.time() - start_time, 64 | input_token_count=response_json["prompt_eval_count"], 65 | output_token_count=response_json["eval_count"], 66 | llm_provider_response=response_json, 67 | ) 68 | 69 | else: 70 | return response.json()["message"]["content"] 71 | 72 | except Exception as e: 73 | if attempt < retry_attempts - 1: 74 | print(f"Attempt {attempt + 1} failed: {e}") 75 | time.sleep(retry_delay) 76 | retry_delay *= 2 # Double the delay each retry 77 | else: 78 | error_msg = f"Failed after {retry_attempts} attempts due to: {e}" 79 | raise Exception(error_msg) 80 | 81 | async def generate_response_async( 82 | model_name: str, 83 | messages=None, 84 | temperature: float = 0.7, 85 | max_tokens: int = 300, 86 | top_p: float = 1.0, 87 | full_response: bool = False, 88 | json_mode=False, 89 | ) -> Optional[Dict]: 90 | """ 91 | Makes an asynchronous POST request to the Anthropic API to generate content based on the provided text 92 | with specified generation configuration settings using asyncio. 
93 | """ 94 | start_time = time.time() # Record the start time 95 | retry_attempts = 3 96 | retry_delay = 1 # initial delay between retries in seconds 97 | 98 | 99 | # Define the URL and headers 100 | url = OLLAMA_URL 101 | headers = { 102 | "content-type": "application/json", 103 | } 104 | 105 | # Create the data payload 106 | payload = { 107 | "model": model_name, 108 | "messages": messages, 109 | "temperature": temperature, 110 | "top_p": top_p, 111 | "num_predict": max_tokens, 112 | "stream": False, 113 | } 114 | 115 | 116 | async with aiohttp.ClientSession() as session: 117 | for attempt in range(retry_attempts): 118 | try: 119 | async with session.post(url, headers=headers, json=payload) as response: 120 | response.raise_for_status() # Raises HTTPError for bad requests (4XX or 5XX) 121 | data = await response.json() 122 | if full_response: 123 | return LLMFullResponse( 124 | generated_text=data["message"]["content"], 125 | model=model_name, 126 | process_time=time.time() - start_time, 127 | input_token_count=data["prompt_eval_count"], 128 | output_token_count=data["eval_count"], 129 | llm_provider_response=data, 130 | ) 131 | 132 | else: 133 | return data["message"]["content"] 134 | 135 | except Exception as e: 136 | if attempt < retry_attempts - 1: 137 | print(f"Attempt {attempt + 1} failed: {e}") 138 | await asyncio.sleep(retry_delay) 139 | retry_delay *= 2 # Double the delay each retry 140 | else: 141 | error_msg = f"Failed after {retry_attempts} attempts due to: {e}" 142 | raise Exception(error_msg) 143 | -------------------------------------------------------------------------------- /Documentation/docusaurus.config.js: -------------------------------------------------------------------------------- 1 | // @ts-check 2 | // `@type` JSDoc annotations allow editor autocompletion and type checking 3 | // (when paired with `@ts-check`). 4 | // There are various equivalent ways to declare your Docusaurus config. 5 | // See: https://docusaurus.io/docs/api/docusaurus-config 6 | 7 | import {themes as prismThemes} from 'prism-react-renderer'; 8 | 9 | /** @type {import('@docusaurus/types').Config} */ 10 | const config = { 11 | title: 'SimplerLLM', 12 | tagline: 'Easy Pass to Advanced AI', 13 | favicon: 'img/Logo.png', 14 | 15 | // Set the production url of your site here 16 | url: 'https://docs.simplerllm.com', 17 | // Set the // pathname under which your site is served 18 | // For GitHub pages deployment, it is often '//' 19 | baseUrl: '/', 20 | 21 | // GitHub pages deployment config. 22 | // If you aren't using GitHub pages, you don't need these. 23 | organizationName: 'LearnWithHasan', // Usually your GitHub org/user name. 24 | projectName: 'SimplerLLM', // Usually your repo name. 25 | 26 | onBrokenLinks: 'throw', 27 | onBrokenMarkdownLinks: 'warn', 28 | 29 | // Even if you don't use internationalization, you can use this field to set 30 | // useful metadata like html lang. For example, if your site is Chinese, you 31 | // may want to replace "en" with "zh-Hans". 32 | i18n: { 33 | defaultLocale: 'en', 34 | locales: ['en'], 35 | }, 36 | 37 | presets: [ 38 | [ 39 | 'classic', 40 | /** @type {import('@docusaurus/preset-classic').Options} */ 41 | ({ 42 | docs: { 43 | routeBasePath: '/', 44 | sidebarPath: './sidebars.js', 45 | // Please change this to your repo. 46 | // Remove this to remove the "edit this page" links. 
47 | editUrl: 48 | 'https://github.com/hassancs91/SimplerLLM/tree/main/', 49 | }, 50 | blog: false, 51 | //blog: { 52 | // showReadingTime: true, 53 | // feedOptions: { 54 | // type: ['rss', 'atom'], 55 | // xslt: true, 56 | // }, 57 | // Please change this to your repo. 58 | // Remove this to remove the "edit this page" links. 59 | // editUrl: 60 | // 'https://github.com/facebook/docusaurus/tree/main/packages/create-docusaurus/templates/shared/', 61 | // // Useful options to enforce blogging best practices 62 | // onInlineTags: 'warn', 63 | // onInlineAuthors: 'warn', 64 | // onUntruncatedBlogPosts: 'warn', 65 | // }, 66 | theme: { 67 | customCss: './src/css/custom.css', 68 | }, 69 | }), 70 | ], 71 | ], 72 | plugins: [ 73 | [ 74 | '@docusaurus/plugin-google-gtag', 75 | { 76 | trackingID: 'G-JF551810R5' 77 | }, 78 | ], 79 | ], 80 | themeConfig: 81 | /** @type {import('@docusaurus/preset-classic').ThemeConfig} */ 82 | ({ 83 | // Replace with your project's social card 84 | image: 'img/Logo.png', 85 | navbar: { 86 | title: 'SimplerLLM Documentation', 87 | logo: { 88 | alt: 'SimplerLLM Logo', 89 | src: './img/Logo.png', 90 | }, 91 | items: [ 92 | // { 93 | // type: 'docSidebar', 94 | // sidebarId: 'tutorialSidebar', 95 | // position: 'left', 96 | // label: 'Tutorial', 97 | // }, 98 | //{to: '/blog', label: 'Blog', position: 'left'}, 99 | { 100 | href: 'https://simplerllm.com/', 101 | label: 'Home Page', 102 | position: 'left', 103 | }, 104 | { 105 | href: 'https://github.com/hassancs91/SimplerLLM/blob/main/readme.md', 106 | label: 'GitHub', 107 | position: 'right', 108 | }, 109 | { 110 | href: 'https://discord.com/invite/HUrtZXyp3j', 111 | label: 'Discord', 112 | position: 'right', 113 | }, 114 | ], 115 | }, 116 | footer: { 117 | style: 'dark', 118 | // links: [ 119 | // { 120 | // title: 'Navigation', 121 | // items: [ 122 | // { 123 | // label: 'Introduction', 124 | // to: '/', 125 | // }, 126 | // { 127 | // label: 'Home Page', 128 | // to: 'https://simplerllm.com', 129 | // }, 130 | // ], 131 | // }, 132 | // { 133 | // title: 'Community', 134 | // items: [ 135 | // { 136 | // label: 'GitHub', 137 | // href: 'https://github.com/hassancs91/SimplerLLM/blob/main/readme.md', 138 | // }, 139 | // { 140 | // label: 'Discord', 141 | // href: 'https://discord.com/invite/HUrtZXyp3j', 142 | // }, 143 | // ], 144 | // }, 145 | // { 146 | // title: 'More', 147 | // items: [ 148 | // { 149 | // label: 'Blog', 150 | // to: '/blog', 151 | // }, 152 | // { 153 | // label: 'GitHub', 154 | // href: 'https://github.com/facebook/docusaurus', 155 | // }, 156 | // ], 157 | // }, 158 | //], 159 | copyright: `Copyright © ${new Date().getFullYear()} SimplerLLM. All Rights Reserved`, 160 | }, 161 | prism: { 162 | theme: prismThemes.github, 163 | darkTheme: prismThemes.dracula, 164 | }, 165 | }), 166 | }; 167 | 168 | export default config; 169 | -------------------------------------------------------------------------------- /Documentation/docs/Advanced Tools/Search Engine Integration.md: -------------------------------------------------------------------------------- 1 | --- 2 | sidebar_position: 2 3 | --- 4 | 5 | # Search Engine Integration 6 | 7 | This section provides an overview of how SimplerLLM facilitates the integration of Google and DuckDuckGo search engines in your code. It includes functions that let applications retrieve search results directly, through paid Google search APIs and the free-to-use DuckDuckGo API.
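Whichever engine you use, the calling pattern stays the same. Here's a minimal sketch using the DuckDuckGo function, since it needs no API key (the attribute names follow the standardized result fields listed below):

```python
from SimplerLLM.tools.serp import search_with_duck_duck_go

# fetch the top 5 results and print two of the standardized fields
results = search_with_duck_duck_go("What is SEO", 5)
for result in results:
    print(result.Domain, "-", result.Title)
```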
8 | 9 | The data is returned in the form of a `SearchResult` object, which is designed to standardize the format of search results across different search engines. This object includes the following fields: 10 | - `URL`: The URL of the search result. 11 | - `Domain`: The domain name extracted from the URL. 12 | - `Title`: The title of the search result. 13 | - `Description`: A brief description associated with the result. 14 | 15 | ## Google Search Integration 16 | 17 | Google search integration is supported through two paid APIs: the Serper API and the Value SERP API. These APIs provide excellent search capabilities and are suitable for applications requiring reliable, stable search functionality. 18 | 19 | ### Serper API Functions 20 | 21 | To use the Serper API functions, you'll need to have your Serper API Key in the `.env` file in your project folder in this format: 22 | 23 | ``` 24 | SERPER_API_KEY="your_serper_api_key" 25 | ``` 26 | 27 | **Synchronous `search_with_serper_api` Function** 28 | 29 | It takes 2 parameters: 30 | - `query` (string): The search query. 31 | - `num_results` (int, Optional): The maximum number of results to return, default is 50. 32 | 33 | It returns a list of `SearchResult` objects depending on the maximum number of results you specify. Here's an example usage: 34 | 35 | ```python 36 | from SimplerLLM.tools.serp import search_with_serper_api 37 | 38 | search_results = search_with_serper_api("What is SEO", 5) 39 | 40 | print(search_results) 41 | ``` 42 | 43 | **Asynchronous `search_with_serper_api_async` Function** 44 | 45 | It's the same as the synchronous function, but it fetches search results from Google asynchronously using the Serper API. 46 | 47 | It also takes the same 2 parameters, and returns a list of `SearchResult` objects depending on the maximum number of results you specify. Here's an example usage: 48 | 49 | ```python 50 | import asyncio 51 | from SimplerLLM.tools.serp import search_with_serper_api_async 52 | 53 | async def fetch_results(): 54 | search_result = await search_with_serper_api_async("Latest AI advancements", 5) 55 | print(search_result) 56 | 57 | asyncio.run(fetch_results()) 58 | ``` 59 | 60 | ### Value SERP API Functions 61 | 62 | To use the Value SERP API functions, you'll need to have your Value SERP API Key in the `.env` file in your project folder in this format: 63 | 64 | ``` 65 | VALUE_SERP_API_KEY="your_value_serp_api_key" 66 | ``` 67 | 68 | **Synchronous `search_with_value_serp` Function** 69 | 70 | It takes 2 parameters: 71 | - `query` (string): The search query. 72 | - `num_results` (int, Optional): The maximum number of results to return, default is 50. 73 | 74 | It returns a list of `SearchResult` objects depending on the maximum number of results you specify. Here's an example usage: 75 | 76 | ```python 77 | from SimplerLLM.tools.serp import search_with_value_serp 78 | 79 | search_results = search_with_value_serp("What is SEO", 5) 80 | 81 | print(search_results) 82 | ``` 83 | 84 | **Asynchronous `search_with_value_serp_async` Function** 85 | 86 | It's the same as the synchronous function, but it fetches search results from Google asynchronously using the Value SERP API. 87 | 88 | It also takes the same 2 parameters, and returns a list of `SearchResult` objects depending on the maximum number of results you specify.
Here's an example usage: 89 | 90 | ```python 91 | import asyncio 92 | from SimplerLLM.tools.serp import search_with_value_serp_async 93 | 94 | async def fetch_results(): 95 | search_results = await search_with_value_serp_async("Latest AI advancements", 5) 96 | print(search_results) 97 | 98 | asyncio.run(fetch_results()) 99 | ``` 100 | 101 | ## DuckDuckGo API Functions 102 | 103 | Unlike Google Search, which is integrated using paid APIs, the DuckDuckGo search integration is available through its own free-to-use API. However, DuckDuckGo prioritizes user privacy and doesn’t track search history, leading to less personalized search results. This can make its results less relevant compared to Google, which customizes searches using a lot of user data. 104 | 105 | **Synchronous `search_with_duck_duck_go` Function** 106 | 107 | It takes 2 parameters: 108 | - `query` (string): The search query. 109 | - `max_results` (int, Optional): The maximum number of results to return, default is 10. 110 | 111 | It returns a list of `SearchResult` objects depending on the maximum number of results you specify. Here's an example usage: 112 | 113 | ```python 114 | from SimplerLLM.tools.serp import search_with_duck_duck_go 115 | 116 | search_results = search_with_duck_duck_go("Open source projects", 10) 117 | 118 | print(search_results) 119 | ``` 120 | 121 | **Asynchronous `search_with_duck_duck_go_async` Function** 122 | 123 | It's the same as the synchronous function but fetches search results from DuckDuckGo asynchronously. 124 | 125 | It also takes the same 2 parameters, and returns a list of `SearchResult` objects depending on the maximum number of results you specify. Here's an example usage: 126 | 127 | ```python 128 | import asyncio 129 | from SimplerLLM.tools.serp import search_with_duck_duck_go_async 130 | 131 | async def fetch_results(): 132 | search_results = await search_with_duck_duck_go_async("Open source tools", 10) 133 | print(search_results) 134 | 135 | asyncio.run(fetch_results()) 136 | ``` 137 | 138 | That's how you can benefit from SimplerLLM to make Search Engine Integration Simpler! --------------------------------------------------------------------------------