├── __init__.py ├── core ├── __init__.py ├── exceptions.py ├── schedules.py └── prompts.py ├── graph ├── utils │ ├── __init__.py │ ├── helpers.py │ └── chains.py ├── __init__.py ├── edges.py ├── state.py ├── graph.py └── nodes.py ├── mycalendar ├── __init__.py ├── langchain_integration.py └── calendar_tool.py ├── modules ├── schedules │ ├── __init__.py │ └── context_generation.py ├── search │ ├── __init__.py │ └── tavily_search.py ├── image │ ├── __init__.py │ ├── text_to_image.py │ └── image_to_text.py ├── speech │ ├── __init__.py │ ├── text_to_speech.py │ └── speech_to_text.py └── memory │ └── long_term │ ├── memory_manager.py │ └── vector_store.py ├── images ├── img1.png ├── img3.png ├── img4.png ├── img5.png ├── img6.png ├── img7.png ├── img8.png ├── img9.png ├── image1.png ├── img10.png ├── img11.png ├── img12.png ├── img13.png ├── img14.png ├── img15.png ├── img16.png ├── kylie_graph.png └── architecture.png ├── main.py ├── requirements.txt ├── pyproject.toml ├── settings.py ├── workflow.md ├── SETUP_GUIDE.md ├── whatsapp_response.py └── README.md /__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /core/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /graph/utils/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /mycalendar/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /modules/schedules/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /images/img1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img1.png -------------------------------------------------------------------------------- /images/img3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img3.png -------------------------------------------------------------------------------- /images/img4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img4.png -------------------------------------------------------------------------------- /images/img5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img5.png -------------------------------------------------------------------------------- /images/img6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img6.png -------------------------------------------------------------------------------- /images/img7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img7.png -------------------------------------------------------------------------------- /images/img8.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img8.png -------------------------------------------------------------------------------- /images/img9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img9.png -------------------------------------------------------------------------------- /images/image1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/image1.png -------------------------------------------------------------------------------- /images/img10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img10.png -------------------------------------------------------------------------------- /images/img11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img11.png -------------------------------------------------------------------------------- /images/img12.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img12.png -------------------------------------------------------------------------------- /images/img13.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img13.png -------------------------------------------------------------------------------- /images/img14.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img14.png -------------------------------------------------------------------------------- /images/img15.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img15.png -------------------------------------------------------------------------------- /images/img16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img16.png -------------------------------------------------------------------------------- /images/kylie_graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/kylie_graph.png -------------------------------------------------------------------------------- /images/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/architecture.png -------------------------------------------------------------------------------- /modules/search/__init__.py: -------------------------------------------------------------------------------- 1 | from .tavily_search import TavilySearch 2 | 3 | __all__ = ["TavilySearch"] 4 | -------------------------------------------------------------------------------- /graph/__init__.py: -------------------------------------------------------------------------------- 1 | from graph.graph import create_workflow_graph 2 | 3 | graph_builder = create_workflow_graph() 4 | 
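Note: graph/__init__.py exposes the uncompiled graph_builder rather than a compiled graph, so the consumer (presumably whatsapp_response.py, which is not shown in this section) is expected to compile it with a checkpointer of its choosing. The snippet below is only a minimal sketch of that wiring, assuming the langgraph-checkpoint-sqlite / aiosqlite dependencies already listed in requirements.txt and the SHORT_TERM_MEMORY_DB_PATH value from settings.py; the run_turn helper and the per-user thread_id are illustrative names, not taken from the repository.

# Minimal usage sketch (assumptions noted above; run_turn and thread_id are illustrative).
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver

from graph import graph_builder
from settings import settings


async def run_turn(user_text: str, thread_id: str) -> str:
    """Run a single conversation turn, keyed by a per-user thread id."""
    async with AsyncSqliteSaver.from_conn_string(settings.SHORT_TERM_MEMORY_DB_PATH) as short_term_memory:
        graph = graph_builder.compile(checkpointer=short_term_memory)
        output_state = await graph.ainvoke(
            {"messages": [HumanMessage(content=user_text)]},
            config={"configurable": {"thread_id": thread_id}},
        )
        # The final message in the returned state is the agent's reply.
        return output_state["messages"][-1].content

Exporting the builder instead of a compiled graph keeps the choice of checkpointer (SQLite, DuckDB, or in-memory) in the caller's hands, which matches the multiple checkpoint backends listed in requirements.txt.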
-------------------------------------------------------------------------------- /modules/image/__init__.py: -------------------------------------------------------------------------------- 1 | from .image_to_text import ImageToText 2 | from .text_to_image import TextToImage 3 | 4 | __all__ = ["ImageToText", "TextToImage"] 5 | -------------------------------------------------------------------------------- /modules/speech/__init__.py: -------------------------------------------------------------------------------- 1 | from .speech_to_text import SpeechToText 2 | from .text_to_speech import TextToSpeech 3 | 4 | __all__ = ["SpeechToText", "TextToSpeech"] 5 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | from dotenv import load_dotenv 2 | load_dotenv() 3 | 4 | from fastapi import FastAPI 5 | 6 | from whatsapp_response import whatsapp_router 7 | 8 | app = FastAPI() 9 | app.include_router(whatsapp_router) 10 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | chainlit 2 | elevenlabs 3 | fastapi[standard] 4 | groq 5 | langchain-community 6 | langchain-groq 7 | langchain 8 | pydantic 9 | together 10 | langgraph 11 | langchain-openai 12 | langgraph-checkpoint-duckdb 13 | duckdb 14 | langgraph-checkpoint-sqlite 15 | aiosqlite 16 | qdrant-client 17 | sentence-transformers 18 | google-api-python-client 19 | google-auth 20 | google-auth-oauthlib 21 | pytz 22 | httpx -------------------------------------------------------------------------------- /core/exceptions.py: -------------------------------------------------------------------------------- 1 | class TextToSpeechError(Exception): 2 | """Exception raised for text-to-speech conversion errors.""" 3 | 4 | class SpeechToTextError(Exception): 5 | """Exception raised for speech-to-text conversion errors.""" 6 | 7 | class ImageToTextError(Exception): 8 | """Exception raised for image-to-text conversion errors.""" 9 | 10 | class TextToImageError(Exception): 11 | """Exception raised for text-to-image generation errors.""" 12 | 13 | class SearchError(Exception): 14 | """Exception raised for search operation errors.""" 15 | -------------------------------------------------------------------------------- /graph/edges.py: -------------------------------------------------------------------------------- 1 | from langgraph.graph import END 2 | from typing_extensions import Literal 3 | 4 | from .state import AICompanionState 5 | from settings import settings 6 | 7 | 8 | def should_summarize_conversation( 9 | state: AICompanionState, 10 | ) -> Literal["summarize_conversation_node", "__end__"]: 11 | messages = state["messages"] 12 | 13 | if len(messages) > settings.TOTAL_MESSAGES_SUMMARY_TRIGGER: 14 | return "summarize_conversation_node" 15 | 16 | return END 17 | 18 | 19 | def select_workflow( 20 | state: AICompanionState, 21 | ) -> Literal["conversation_node", "image_node", "audio_node", "tool_calling_node", "search_node"]: 22 | workflow = state["workflow"] 23 | 24 | if workflow == "image": 25 | return "image_node" 26 | elif workflow == "audio": 27 | return "audio_node" 28 | elif workflow == "tools": 29 | return "tool_calling_node" 30 | elif workflow == "search": 31 | return "search_node" 32 | else: 33 | return "conversation_node" 
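Because AICompanionState extends MessagesState (a TypedDict), the two routing functions above can be exercised with plain dictionaries, which makes the edge logic easy to unit-test in isolation. The sketch below is illustrative only: it assumes the settings environment variables are available so that graph.edges imports cleanly, and the test names are hypothetical.

# Illustrative tests only; they assume graph.edges is importable (i.e. .env/settings configured).
from langchain_core.messages import HumanMessage
from langgraph.graph import END

from graph.edges import select_workflow, should_summarize_conversation


def test_select_workflow_routes_to_search_node():
    state = {"messages": [HumanMessage(content="What's happening in Kampala today?")], "workflow": "search"}
    assert select_workflow(state) == "search_node"


def test_short_history_skips_summarization():
    state = {"messages": [HumanMessage(content="hi")]}
    # One message is below TOTAL_MESSAGES_SUMMARY_TRIGGER (100), so the edge ends the run.
    assert should_summarize_conversation(state) == END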
-------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [project] 2 | name = "kylie" 3 | version = "0.1.0" 4 | description = "Add your description here" 5 | readme = "README.md" 6 | requires-python = ">=3.13" 7 | dependencies = [ 8 | "aiosqlite>=0.20.0", 9 | "chainlit>=1.3.2", 10 | "duckdb>=1.1.3", 11 | "elevenlabs>=1.50.3", 12 | "fastapi[standard]>=0.115.6", 13 | "google-api-python-client>=2.178.0", 14 | "google-auth>=2.40.3", 15 | "google-auth-oauthlib>=1.2.2", 16 | "groq>=0.13.1", 17 | "langchain>=0.3.13", 18 | "langchain-community>=0.3.13", 19 | "langchain-groq>=0.2.2", 20 | "langchain-mcp-adapters>=0.1.9", 21 | "langchain-openai>=0.2.14", 22 | "langgraph>=0.2.60", 23 | "langgraph-checkpoint-duckdb>=2.0.1", 24 | "langgraph-checkpoint-sqlite>=2.0.1", 25 | "mcp>=1.12.4", 26 | "pre-commit>=4.0.1", 27 | "pydantic==2.10.0", 28 | "pydantic-settings>=2.7.0", 29 | "qdrant-client>=1.12.1", 30 | "sentence-transformers>=3.3.1", 31 | "supabase>=2.11.0", 32 | "together>=1.3.10", 33 | ] 34 | -------------------------------------------------------------------------------- /graph/state.py: -------------------------------------------------------------------------------- 1 | from langgraph.graph import MessagesState 2 | 3 | 4 | class AICompanionState(MessagesState): 5 | """State class for the AI Companion workflow. 6 | 7 | Extends MessagesState to track conversation history and maintains the last message received. 8 | 9 | Attributes: 10 | last_message (AnyMessage): The most recent message in the conversation, can be any valid 11 | LangChain message type (HumanMessage, AIMessage, etc.) 12 | workflow (str): The current workflow the AI Companion is in. Can be "conversation", "image", or "audio". 13 | audio_buffer (bytes): The audio buffer to be used for speech-to-text conversion. 14 | current_activity (str): The current activity of Ava based on the schedule. 15 | memory_context (str): The context of the memories to be injected into the character card. 
16 | """ 17 | 18 | summary: str 19 | workflow: str 20 | audio_buffer: bytes 21 | image_path: str 22 | current_activity: str 23 | apply_activity: bool 24 | memory_context: str 25 | search_results: str 26 | -------------------------------------------------------------------------------- /settings.py: -------------------------------------------------------------------------------- 1 | from pydantic_settings import BaseSettings, SettingsConfigDict 2 | 3 | 4 | class Settings(BaseSettings): 5 | model_config = SettingsConfigDict(env_file=".env", extra="ignore", env_file_encoding="utf-8") 6 | 7 | GROQ_API_KEY: str 8 | ELEVENLABS_API_KEY: str 9 | ELEVENLABS_VOICE_ID: str 10 | TOGETHER_API_KEY: str 11 | GOOGLE_CLOUD_API_KEY: str 12 | TAVILY_API_KEY: str 13 | 14 | QDRANT_API_KEY: str | None 15 | QDRANT_URL: str 16 | QDRANT_PORT: str = "6333" 17 | QDRANT_HOST: str | None = None 18 | 19 | TEXT_MODEL_NAME: str = "llama-3.3-70b-versatile" 20 | SMALL_TEXT_MODEL_NAME: str = "llama-3.1-8b-instant" 21 | STT_MODEL_NAME: str = "whisper-large-v3-turbo" 22 | TTS_MODEL_NAME: str = "eleven_flash_v2_5" 23 | TTI_MODEL_NAME: str = "black-forest-labs/FLUX.1-schnell-Free" 24 | ITT_MODEL_NAME: str = "google-cloud-vision" 25 | 26 | MEMORY_TOP_K: int = 3 27 | ROUTER_MESSAGES_TO_ANALYZE: int = 3 28 | TOTAL_MESSAGES_SUMMARY_TRIGGER: int = 100 29 | TOTAL_MESSAGES_AFTER_SUMMARY: int = 75 30 | 31 | SHORT_TERM_MEMORY_DB_PATH: str = "data/memory.db" 32 | 33 | 34 | settings = Settings() 35 | -------------------------------------------------------------------------------- /graph/utils/helpers.py: -------------------------------------------------------------------------------- 1 | import re 2 | from typing import List 3 | 4 | from langchain_core.output_parsers import StrOutputParser 5 | from langchain_core.tools import BaseTool 6 | from langchain_groq import ChatGroq 7 | 8 | from modules.image.image_to_text import ImageToText 9 | from modules.image.text_to_image import TextToImage 10 | from modules.speech import TextToSpeech 11 | from modules.search import TavilySearch 12 | from mycalendar.langchain_integration import get_calendar_tools 13 | from settings import settings 14 | 15 | 16 | def get_chat_model(temperature: float = 0.7): 17 | return ChatGroq( 18 | api_key=settings.GROQ_API_KEY, 19 | model_name=settings.TEXT_MODEL_NAME, 20 | temperature=temperature, 21 | ) 22 | 23 | 24 | def get_chat_model_with_tools(temperature: float = 0.7): 25 | """Get chat model with calendar tools bound to it.""" 26 | model = get_chat_model(temperature=temperature) 27 | tools = get_calendar_tools() 28 | return model.bind_tools(tools) 29 | 30 | 31 | def get_text_to_speech_module(): 32 | return TextToSpeech() 33 | 34 | 35 | def get_search_module(): 36 | return TavilySearch() 37 | 38 | 39 | def get_text_to_image_module(): 40 | return TextToImage() 41 | 42 | 43 | def get_image_to_text_module(): 44 | return ImageToText() 45 | 46 | 47 | def get_available_tools() -> List[BaseTool]: 48 | """Get all available tools for the agent.""" 49 | return get_calendar_tools() 50 | 51 | 52 | def remove_asterisk_content(text: str) -> str: 53 | """Remove content between asterisks from the text.""" 54 | return re.sub(r"\*.*?\*", "", text).strip() 55 | 56 | 57 | class AsteriskRemovalParser(StrOutputParser): 58 | def parse(self, text): 59 | return remove_asterisk_content(super().parse(text)) -------------------------------------------------------------------------------- /graph/graph.py: -------------------------------------------------------------------------------- 1 | from 
functools import lru_cache 2 | 3 | from langgraph.graph import END, START, StateGraph 4 | 5 | from .edges import ( 6 | select_workflow, 7 | should_summarize_conversation, 8 | ) 9 | from .nodes import ( 10 | audio_node, 11 | context_injection_node, 12 | conversation_node, 13 | image_node, 14 | memory_extraction_node, 15 | memory_injection_node, 16 | router_node, 17 | search_node, 18 | summarize_conversation_node, 19 | tool_calling_node, 20 | ) 21 | from .state import AICompanionState 22 | 23 | 24 | @lru_cache(maxsize=1) 25 | def create_workflow_graph(): 26 | graph_builder = StateGraph(AICompanionState) 27 | 28 | graph_builder.add_node("memory_extraction_node", memory_extraction_node) 29 | graph_builder.add_node("router_node", router_node) 30 | graph_builder.add_node("context_injection_node", context_injection_node) 31 | graph_builder.add_node("memory_injection_node", memory_injection_node) 32 | graph_builder.add_node("conversation_node", conversation_node) 33 | graph_builder.add_node("image_node", image_node) 34 | graph_builder.add_node("audio_node", audio_node) 35 | graph_builder.add_node("tool_calling_node", tool_calling_node) 36 | graph_builder.add_node("search_node", search_node) 37 | graph_builder.add_node("summarize_conversation_node", summarize_conversation_node) 38 | 39 | # First extract memories from user message 40 | graph_builder.add_edge(START, "memory_extraction_node") 41 | 42 | # Then determine response type 43 | graph_builder.add_edge("memory_extraction_node", "router_node") 44 | 45 | # Then inject both context and memories 46 | graph_builder.add_edge("router_node", "context_injection_node") 47 | graph_builder.add_edge("context_injection_node", "memory_injection_node") 48 | 49 | # Then proceed to appropriate response node 50 | graph_builder.add_conditional_edges("memory_injection_node", select_workflow) 51 | 52 | # Check for summarization after any response 53 | graph_builder.add_conditional_edges("conversation_node", should_summarize_conversation) 54 | graph_builder.add_conditional_edges("image_node", should_summarize_conversation) 55 | graph_builder.add_conditional_edges("audio_node", should_summarize_conversation) 56 | graph_builder.add_conditional_edges("tool_calling_node", should_summarize_conversation) 57 | graph_builder.add_conditional_edges("search_node", should_summarize_conversation) 58 | graph_builder.add_edge("summarize_conversation_node", END) 59 | 60 | return graph_builder 61 | 62 | 63 | graph = create_workflow_graph().compile() 64 | 65 | 66 | graph_builder = create_workflow_graph() -------------------------------------------------------------------------------- /modules/schedules/context_generation.py: -------------------------------------------------------------------------------- 1 | from datetime import datetime 2 | from typing import Dict, Optional 3 | 4 | from core.schedules import ( 5 | FRIDAY_SCHEDULE, 6 | MONDAY_SCHEDULE, 7 | SATURDAY_SCHEDULE, 8 | SUNDAY_SCHEDULE, 9 | THURSDAY_SCHEDULE, 10 | TUESDAY_SCHEDULE, 11 | WEDNESDAY_SCHEDULE, 12 | ) 13 | 14 | 15 | class ScheduleContextGenerator: 16 | """Class to generate context about Ava's current activity based on schedules.""" 17 | 18 | SCHEDULES = { 19 | 0: MONDAY_SCHEDULE, # Monday 20 | 1: TUESDAY_SCHEDULE, # Tuesday 21 | 2: WEDNESDAY_SCHEDULE, # Wednesday 22 | 3: THURSDAY_SCHEDULE, # Thursday 23 | 4: FRIDAY_SCHEDULE, # Friday 24 | 5: SATURDAY_SCHEDULE, # Saturday 25 | 6: SUNDAY_SCHEDULE, # Sunday 26 | } 27 | 28 | @staticmethod 29 | def _parse_time_range(time_range: str) -> tuple[datetime.time, 
datetime.time]: 30 | """Parse a time range string (e.g., '06:00-07:00') into start and end times.""" 31 | start_str, end_str = time_range.split("-") 32 | start_time = datetime.strptime(start_str, "%H:%M").time() 33 | end_time = datetime.strptime(end_str, "%H:%M").time() 34 | return start_time, end_time 35 | 36 | @classmethod 37 | def get_current_activity(cls) -> Optional[str]: 38 | """Get Ava's current activity based on the current time and day of the week. 39 | 40 | Returns: 41 | str: Description of current activity, or None if no matching time slot is found 42 | """ 43 | # Get current time and day of week (0 = Monday, 6 = Sunday) 44 | current_datetime = datetime.now() 45 | current_time = current_datetime.time() 46 | current_day = current_datetime.weekday() 47 | 48 | # Get schedule for current day 49 | schedule = cls.SCHEDULES.get(current_day, {}) 50 | 51 | # Find matching time slot 52 | for time_range, activity in schedule.items(): 53 | start_time, end_time = cls._parse_time_range(time_range) 54 | 55 | # Handle overnight activities (e.g., 23:00-06:00) 56 | if start_time > end_time: 57 | if current_time >= start_time or current_time <= end_time: 58 | return activity 59 | else: 60 | if start_time <= current_time <= end_time: 61 | return activity 62 | 63 | return None 64 | 65 | @classmethod 66 | def get_schedule_for_day(cls, day: int) -> Dict[str, str]: 67 | """Get the complete schedule for a specific day. 68 | 69 | Args: 70 | day: Day of week as integer (0 = Monday, 6 = Sunday) 71 | 72 | Returns: 73 | Dict[str, str]: Schedule for the specified day 74 | """ 75 | return cls.SCHEDULES.get(day, {}) 76 | -------------------------------------------------------------------------------- /modules/speech/text_to_speech.py: -------------------------------------------------------------------------------- 1 | import os 2 | from typing import Optional 3 | 4 | from core.exceptions import TextToSpeechError 5 | from settings import settings 6 | from elevenlabs import ElevenLabs, Voice, VoiceSettings 7 | 8 | 9 | class TextToSpeech: 10 | """A class to handle text-to-speech conversion using ElevenLabs.""" 11 | 12 | # Required environment variables 13 | REQUIRED_ENV_VARS = ["ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID"] 14 | 15 | def __init__(self): 16 | """Initialize the TextToSpeech class and validate environment variables.""" 17 | self._validate_env_vars() 18 | self._client: Optional[ElevenLabs] = None 19 | 20 | def _validate_env_vars(self) -> None: 21 | """Validate that all required environment variables are set.""" 22 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not os.getenv(var)] 23 | if missing_vars: 24 | raise ValueError(f"Missing required environment variables: {', '.join(missing_vars)}") 25 | 26 | @property 27 | def client(self) -> ElevenLabs: 28 | """Get or create ElevenLabs client instance using singleton pattern.""" 29 | if self._client is None: 30 | self._client = ElevenLabs(api_key=settings.ELEVENLABS_API_KEY) 31 | return self._client 32 | 33 | async def synthesize(self, text: str) -> bytes: 34 | """Convert text to speech using ElevenLabs. 
35 | 36 | Args: 37 | text: Text to convert to speech 38 | 39 | Returns: 40 | bytes: Audio data 41 | 42 | Raises: 43 | ValueError: If the input text is empty or too long 44 | TextToSpeechError: If the text-to-speech conversion fails 45 | """ 46 | if not text.strip(): 47 | raise ValueError("Input text cannot be empty") 48 | 49 | if len(text) > 5000: # ElevenLabs typical limit 50 | raise ValueError("Input text exceeds maximum length of 5000 characters") 51 | 52 | try: 53 | # Use the correct method name - it should be text_to_speech.convert() 54 | audio_generator = self.client.text_to_speech.convert( 55 | voice_id=settings.ELEVENLABS_VOICE_ID, 56 | text=text, 57 | model_id=settings.TTS_MODEL_NAME, 58 | voice_settings=VoiceSettings( 59 | stability=0.5, 60 | similarity_boost=0.5 61 | ) 62 | ) 63 | 64 | # Convert generator to bytes 65 | audio_bytes = b"".join(audio_generator) 66 | if not audio_bytes: 67 | raise TextToSpeechError("Generated audio is empty") 68 | 69 | return audio_bytes 70 | 71 | except Exception as e: 72 | raise TextToSpeechError(f"Text-to-speech conversion failed: {str(e)}") from e -------------------------------------------------------------------------------- /modules/speech/speech_to_text.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tempfile 3 | from typing import Optional 4 | 5 | from core.exceptions import SpeechToTextError 6 | from settings import settings 7 | from groq import Groq 8 | 9 | 10 | class SpeechToText: 11 | """A class to handle speech-to-text conversion using Groq's Whisper model.""" 12 | 13 | # Required environment variables 14 | REQUIRED_ENV_VARS = ["GROQ_API_KEY"] 15 | 16 | def __init__(self): 17 | """Initialize the SpeechToText class and validate environment variables.""" 18 | self._validate_env_vars() 19 | self._client: Optional[Groq] = None 20 | 21 | def _validate_env_vars(self) -> None: 22 | """Validate that all required environment variables are set.""" 23 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not os.getenv(var)] 24 | if missing_vars: 25 | raise ValueError(f"Missing required environment variables: {', '.join(missing_vars)}") 26 | 27 | @property 28 | def client(self) -> Groq: 29 | """Get or create Groq client instance using singleton pattern.""" 30 | if self._client is None: 31 | self._client = Groq(api_key=settings.GROQ_API_KEY) 32 | return self._client 33 | 34 | async def transcribe(self, audio_data: bytes) -> str: 35 | """Convert speech to text using Groq's Whisper model. 
36 | 37 | Args: 38 | audio_data: Binary audio data 39 | 40 | Returns: 41 | str: Transcribed text 42 | 43 | Raises: 44 | ValueError: If the audio file is empty or invalid 45 | RuntimeError: If the transcription fails 46 | """ 47 | if not audio_data: 48 | raise ValueError("Audio data cannot be empty") 49 | 50 | try: 51 | # Create a temporary file with .wav extension 52 | with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_file: 53 | temp_file.write(audio_data) 54 | temp_file_path = temp_file.name 55 | 56 | try: 57 | # Open the temporary file for the API request 58 | with open(temp_file_path, "rb") as audio_file: 59 | transcription = self.client.audio.transcriptions.create( 60 | file=audio_file, 61 | model="whisper-large-v3-turbo", 62 | language="en", 63 | response_format="text", 64 | ) 65 | 66 | if not transcription: 67 | raise SpeechToTextError("Transcription result is empty") 68 | 69 | return transcription 70 | 71 | finally: 72 | # Clean up the temporary file 73 | os.unlink(temp_file_path) 74 | 75 | except Exception as e: 76 | raise SpeechToTextError(f"Speech-to-text conversion failed: {str(e)}") from e 77 | -------------------------------------------------------------------------------- /modules/memory/long_term/memory_manager.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import uuid 3 | from datetime import datetime 4 | from typing import List, Optional 5 | 6 | from core.prompts import MEMORY_ANALYSIS_PROMPT 7 | from modules.memory.long_term.vector_store import get_vector_store 8 | from settings import settings 9 | from langchain_core.messages import BaseMessage 10 | from langchain_groq import ChatGroq 11 | from pydantic import BaseModel, Field 12 | 13 | 14 | class MemoryAnalysis(BaseModel): 15 | """Result of analyzing a message for memory-worthy content.""" 16 | 17 | is_important: bool = Field( 18 | ..., 19 | description="Whether the message is important enough to be stored as a memory", 20 | ) 21 | formatted_memory: Optional[str] = Field(..., description="The formatted memory to be stored") 22 | 23 | 24 | class MemoryManager: 25 | """Manager class for handling long-term memory operations.""" 26 | 27 | def __init__(self): 28 | self.vector_store = get_vector_store() 29 | self.logger = logging.getLogger(__name__) 30 | self.llm = ChatGroq( 31 | model=settings.SMALL_TEXT_MODEL_NAME, 32 | api_key=settings.GROQ_API_KEY, 33 | temperature=0.1, 34 | max_retries=2, 35 | ).with_structured_output(MemoryAnalysis) 36 | 37 | async def _analyze_memory(self, message: str) -> MemoryAnalysis: 38 | """Analyze a message to determine importance and format if needed.""" 39 | prompt = MEMORY_ANALYSIS_PROMPT.format(message=message) 40 | return await self.llm.ainvoke(prompt) 41 | 42 | async def extract_and_store_memories(self, message: BaseMessage) -> None: 43 | """Extract important information from a message and store in vector store.""" 44 | if message.type != "human": 45 | return 46 | 47 | # Analyze the message for importance and formatting 48 | analysis = await self._analyze_memory(message.content) 49 | if analysis.is_important and analysis.formatted_memory: 50 | # Check if similar memory exists 51 | similar = self.vector_store.find_similar_memory(analysis.formatted_memory) 52 | if similar: 53 | # Skip storage if we already have a similar memory 54 | self.logger.info(f"Similar memory already exists: '{analysis.formatted_memory}'") 55 | return 56 | 57 | # Store new memory 58 | self.logger.info(f"Storing new memory: 
'{analysis.formatted_memory}'") 59 | self.vector_store.store_memory( 60 | text=analysis.formatted_memory, 61 | metadata={ 62 | "id": str(uuid.uuid4()), 63 | "timestamp": datetime.now().isoformat(), 64 | }, 65 | ) 66 | 67 | def get_relevant_memories(self, context: str) -> List[str]: 68 | """Retrieve relevant memories based on the current context.""" 69 | memories = self.vector_store.search_memories(context, k=settings.MEMORY_TOP_K) 70 | if memories: 71 | for memory in memories: 72 | self.logger.debug(f"Memory: '{memory.text}' (score: {memory.score:.2f})") 73 | return [memory.text for memory in memories] 74 | 75 | def format_memories_for_prompt(self, memories: List[str]) -> str: 76 | """Format retrieved memories as bullet points.""" 77 | if not memories: 78 | return "" 79 | return "\n".join(f"- {memory}" for memory in memories) 80 | 81 | 82 | def get_memory_manager() -> MemoryManager: 83 | """Get a MemoryManager instance.""" 84 | return MemoryManager() 85 | -------------------------------------------------------------------------------- /modules/search/tavily_search.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | from typing import List, Dict, Optional 4 | 5 | import httpx 6 | from core.exceptions import SearchError 7 | from settings import settings 8 | 9 | 10 | class TavilySearch: 11 | """A class to handle internet search using Tavily API.""" 12 | 13 | REQUIRED_ENV_VARS = ["TAVILY_API_KEY"] 14 | 15 | def __init__(self): 16 | """Initialize the TavilySearch class.""" 17 | self._validate_env_vars() 18 | self.logger = logging.getLogger(__name__) 19 | self.api_key = settings.TAVILY_API_KEY 20 | self.api_url = "https://api.tavily.com/search" 21 | 22 | def _validate_env_vars(self) -> None: 23 | """Validate that environment variables are set.""" 24 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not getattr(settings, var, None)] 25 | if missing_vars: 26 | raise ValueError( 27 | f"Missing required environment variables: {', '.join(missing_vars)}\n" 28 | "Please set TAVILY_API_KEY in your .env file." 29 | ) 30 | 31 | async def search(self, query: str, max_results: int = 5) -> List[Dict[str, str]]: 32 | """ 33 | Search the internet using Tavily API. 
34 | 35 | Args: 36 | query: The search query string 37 | max_results: Maximum number of results to return (default: 5) 38 | 39 | Returns: 40 | List of dictionaries containing search results with keys: title, content, url 41 | 42 | Raises: 43 | ValueError: If the query is empty 44 | SearchError: If the search fails 45 | """ 46 | if not query.strip(): 47 | raise ValueError("Search query cannot be empty") 48 | 49 | try: 50 | self.logger.info(f"Searching Tavily for: '{query}'") 51 | 52 | async with httpx.AsyncClient() as client: 53 | response = await client.post( 54 | self.api_url, 55 | json={ 56 | "api_key": self.api_key, 57 | "query": query, 58 | "max_results": max_results, 59 | "search_depth": "advanced", 60 | }, 61 | timeout=30.0, 62 | ) 63 | response.raise_for_status() 64 | data = response.json() 65 | 66 | results = [] 67 | for result in data.get("results", [])[:max_results]: 68 | results.append({ 69 | "title": result.get("title", "No title"), 70 | "content": result.get("content", ""), 71 | "url": result.get("url", ""), 72 | }) 73 | 74 | self.logger.info(f"Found {len(results)} search results") 75 | return results 76 | 77 | except httpx.HTTPStatusError as e: 78 | error_msg = f"Tavily API error: {e.response.status_code} - {e.response.text}" 79 | self.logger.error(error_msg) 80 | raise SearchError(error_msg) from e 81 | except Exception as e: 82 | error_msg = f"Failed to search: {str(e)}" 83 | self.logger.error(error_msg) 84 | raise SearchError(error_msg) from e 85 | 86 | def format_search_results(self, results: List[Dict[str, str]]) -> str: 87 | """ 88 | Format search results into a readable string. 89 | 90 | Args: 91 | results: List of search result dictionaries 92 | 93 | Returns: 94 | Formatted string with search results 95 | """ 96 | if not results: 97 | return "No search results found." 98 | 99 | formatted = "Search Results:\n\n" 100 | for i, result in enumerate(results, 1): 101 | formatted += f"{i}. {result['title']}\n" 102 | formatted += f" {result['content'][:200]}...\n" 103 | formatted += f" Source: {result['url']}\n\n" 104 | 105 | return formatted 106 | -------------------------------------------------------------------------------- /graph/utils/chains.py: -------------------------------------------------------------------------------- 1 | from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder 2 | from pydantic import BaseModel, Field 3 | 4 | from core.prompts import CHARACTER_CARD_PROMPT, ROUTER_PROMPT 5 | from graph.utils.helpers import AsteriskRemovalParser, get_chat_model, get_chat_model_with_tools 6 | 7 | 8 | class RouterResponse(BaseModel): 9 | response_type: str = Field( 10 | description="The response type to give to the user. It must be one of: 'conversation', 'image', 'audio', or 'tools'" 11 | ) 12 | 13 | 14 | def get_router_chain(): 15 | model = get_chat_model(temperature=0.3).with_structured_output(RouterResponse) 16 | 17 | prompt = ChatPromptTemplate.from_messages( 18 | [("system", ROUTER_PROMPT), MessagesPlaceholder(variable_name="messages")] 19 | ) 20 | 21 | return prompt | model 22 | 23 | 24 | def get_character_response_chain(summary: str = "", with_tools: bool = False, search_context: str = ""): 25 | """ 26 | Get the character response chain, optionally with tools. 
27 | 28 | Args: 29 | summary: Conversation summary to include in system message 30 | with_tools: Whether to bind calendar tools to the model 31 | search_context: Optional search results context to include 32 | """ 33 | if with_tools: 34 | model = get_chat_model_with_tools() 35 | else: 36 | model = get_chat_model() 37 | 38 | from datetime import datetime 39 | import pytz 40 | 41 | tz = pytz.timezone('Africa/Kampala') 42 | current_dt = datetime.now(tz) 43 | current_date = current_dt.strftime('%Y-%m-%d') 44 | timezone_str = str(tz) 45 | 46 | base_system_message = CHARACTER_CARD_PROMPT 47 | 48 | if summary: 49 | base_system_message += f"\n\nSummary of conversation earlier between Kylie and the user: {summary}" 50 | 51 | # Add tool usage instructions when tools are available 52 | if with_tools: 53 | base_system_message += f""" 54 | 55 | # Available Tools 56 | You have access to calendar tools that allow you to: 57 | - Check upcoming events on the user's calendar 58 | - Add new events to their calendar 59 | - Get information about current or next events 60 | 61 | CRITICAL CALENDAR RULES: 62 | - TODAY'S DATE IS: {current_date} 63 | - TIMEZONE IS: {timezone_str} 64 | - When user says "today", always use {current_date} 65 | - When user says times like "6pm", combine with {current_date} 66 | - NEVER use wrong dates - always use current date context 67 | - All times should be in {timezone_str} timezone 68 | 69 | IMPORTANT: When the user asks about their schedule, calendar, or events, you MUST use the appropriate calendar tools. 70 | For questions like "How's my schedule like next week" or "When's Amber's birthday", use the list_upcoming_events tool. 71 | Always use tools for calendar-related queries - don't provide generic responses. 72 | 73 | Use these tools when the user asks about their schedule, wants to add events, or needs reminders. 74 | When using tools, always provide a friendly response along with the tool results. 75 | 76 | DO NOT write function calls in text format like . Use the proper tool calling mechanism. 77 | """ 78 | 79 | # Add search context if available 80 | if search_context: 81 | base_system_message += f"\n\n# Search Results (for reference):\n{search_context}\n\nUse this information to provide accurate, up-to-date responses to the user's query." 
82 | 83 | prompt_template = f"""{{system_message}} 84 | # Current Date and Time Context 85 | Today's date is: {current_date} 86 | Current timezone: {timezone_str}""" 87 | 88 | prompt = ChatPromptTemplate.from_messages([ 89 | ("system", prompt_template), 90 | MessagesPlaceholder(variable_name="messages"), 91 | ]).partial(system_message=base_system_message) 92 | 93 | chain = prompt | model 94 | 95 | # Always return the chain without AsteriskRemovalParser when tools are enabled 96 | # The parser might interfere with tool call detection 97 | if not with_tools: 98 | chain = chain | AsteriskRemovalParser() 99 | 100 | return chain -------------------------------------------------------------------------------- /modules/image/text_to_image.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import logging 3 | import os 4 | from typing import Optional 5 | 6 | from core.exceptions import TextToImageError 7 | from core.prompts import IMAGE_ENHANCEMENT_PROMPT, IMAGE_SCENARIO_PROMPT 8 | from settings import settings 9 | from langchain.prompts import PromptTemplate 10 | from langchain_groq import ChatGroq 11 | from pydantic import BaseModel, Field 12 | from together import Together 13 | 14 | 15 | class ScenarioPrompt(BaseModel): 16 | """Class for the scenario response""" 17 | 18 | narrative: str = Field(..., description="The AI's narrative response to the question") 19 | image_prompt: str = Field(..., description="The visual prompt to generate an image representing the scene") 20 | 21 | 22 | class EnhancedPrompt(BaseModel): 23 | """Class for the text prompt""" 24 | 25 | content: str = Field( 26 | ..., 27 | description="The enhanced text prompt to generate an image", 28 | ) 29 | 30 | 31 | class TextToImage: 32 | """A class to handle text-to-image generation using Together AI.""" 33 | 34 | REQUIRED_ENV_VARS = ["GROQ_API_KEY", "TOGETHER_API_KEY"] 35 | 36 | def __init__(self): 37 | """Initialize the TextToImage class and validate environment variables.""" 38 | self._validate_env_vars() 39 | self._together_client: Optional[Together] = None 40 | self.logger = logging.getLogger(__name__) 41 | 42 | def _validate_env_vars(self) -> None: 43 | """Validate that all required environment variables are set.""" 44 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not os.getenv(var)] 45 | if missing_vars: 46 | raise ValueError(f"Missing required environment variables: {', '.join(missing_vars)}") 47 | 48 | @property 49 | def together_client(self) -> Together: 50 | """Get or create Together client instance using singleton pattern.""" 51 | if self._together_client is None: 52 | self._together_client = Together(api_key=settings.TOGETHER_API_KEY) 53 | return self._together_client 54 | 55 | async def generate_image(self, prompt: str, output_path: str = "") -> bytes: 56 | """Generate an image from a prompt using Together AI.""" 57 | if not prompt.strip(): 58 | raise ValueError("Prompt cannot be empty") 59 | 60 | try: 61 | self.logger.info(f"Generating image for prompt: '{prompt}'") 62 | 63 | response = self.together_client.images.generate( 64 | prompt=prompt, 65 | model=settings.TTI_MODEL_NAME, 66 | width=1024, 67 | height=768, 68 | steps=4, 69 | n=1, 70 | response_format="b64_json", 71 | ) 72 | 73 | image_data = base64.b64decode(response.data[0].b64_json) 74 | 75 | if output_path: 76 | os.makedirs(os.path.dirname(output_path), exist_ok=True) 77 | with open(output_path, "wb") as f: 78 | f.write(image_data) 79 | self.logger.info(f"Image saved to {output_path}") 80 
| 81 | return image_data 82 | 83 | except Exception as e: 84 | raise TextToImageError(f"Failed to generate image: {str(e)}") from e 85 | 86 | async def create_scenario(self, chat_history: list = None) -> ScenarioPrompt: 87 | """Creates a first-person narrative scenario and corresponding image prompt based on chat history.""" 88 | try: 89 | formatted_history = "\n".join([f"{msg.type.title()}: {msg.content}" for msg in chat_history[-5:]]) 90 | 91 | self.logger.info("Creating scenario from chat history") 92 | 93 | llm = ChatGroq( 94 | model=settings.TEXT_MODEL_NAME, 95 | api_key=settings.GROQ_API_KEY, 96 | temperature=0.4, 97 | max_retries=2, 98 | ) 99 | 100 | structured_llm = llm.with_structured_output(ScenarioPrompt) 101 | 102 | chain = ( 103 | PromptTemplate( 104 | input_variables=["chat_history"], 105 | template=IMAGE_SCENARIO_PROMPT, 106 | ) 107 | | structured_llm 108 | ) 109 | 110 | scenario = chain.invoke({"chat_history": formatted_history}) 111 | self.logger.info(f"Created scenario: {scenario}") 112 | 113 | return scenario 114 | 115 | except Exception as e: 116 | raise TextToImageError(f"Failed to create scenario: {str(e)}") from e 117 | 118 | async def enhance_prompt(self, prompt: str) -> str: 119 | """Enhance a simple prompt with additional details and context.""" 120 | try: 121 | self.logger.info(f"Enhancing prompt: '{prompt}'") 122 | 123 | llm = ChatGroq( 124 | model=settings.TEXT_MODEL_NAME, 125 | api_key=settings.GROQ_API_KEY, 126 | temperature=0.25, 127 | max_retries=2, 128 | ) 129 | 130 | structured_llm = llm.with_structured_output(EnhancedPrompt) 131 | 132 | chain = ( 133 | PromptTemplate( 134 | input_variables=["prompt"], 135 | template=IMAGE_ENHANCEMENT_PROMPT, 136 | ) 137 | | structured_llm 138 | ) 139 | 140 | enhanced_prompt = chain.invoke({"prompt": prompt}).content 141 | self.logger.info(f"Enhanced prompt: '{enhanced_prompt}'") 142 | 143 | return enhanced_prompt 144 | 145 | except Exception as e: 146 | raise TextToImageError(f"Failed to enhance prompt: {str(e)}") from e 147 | -------------------------------------------------------------------------------- /core/schedules.py: -------------------------------------------------------------------------------- 1 | # Monday 2 | MONDAY_SCHEDULE = { 3 | "06:00-07:00": "Kylie wakes up early, helps her mum prepare breakfast, and gets ready for work.", 4 | "07:00-08:00": "Takes a boda-boda to Kampala while listening to her favourite Amapiano playlist on spotify, usually listens to Scotts Mafuma and Uncle Waffles. These are her favorite Amapiano artists", 5 | "08:00-12:30": "Works at the boutique helping customers pick stylish outfits and arranging displays.", 6 | "12:30-13:30": "Lunch break — often enjoys rice and beef with vegetables aside for monday with a co-worker while chatting about weekend stories.", 7 | "13:30-17:00": "Back at the boutique assisting customers, managing inventory, and styling mannequins.", 8 | "17:00-18:00": "Travels back to Naalya, sometimes stopping by the market for groceries.", 9 | "18:00-20:00": "Helps her mum prepare dinner, shares laughs with her siblings and helps them with homework.", 10 | "20:00-21:00": "Dinner with family followed by evening tea with grandma.", 11 | "21:00-22:30": "Chats with friends on WhatsApp, sends a few selfies, and listens to her Afrobeat playlist on spotify mostly Tems, Omah Lay, Gabzy.", 12 | "22:30-06:00": "Sleeps, sometimes scrolling Instagram before dozing off." 
13 | } 14 | 15 | # Tuesday 16 | TUESDAY_SCHEDULE = { 17 | "06:00-07:00": "Morning routine — bath, dress up, a quick breakfast, and light makeup.", 18 | "07:00-08:00": "Commutes to work while catching up on missed WhatsApp chats and shares snaps on SnapChat.", 19 | "08:00-12:00": "Works at the boutique, gives fashion advice to customers, updates the shop’s Instagram.", 20 | "12:00-13:00": "Lunch break — enjoys Rolex and fresh juice from a street vendor.", 21 | "13:00-17:00": "Afternoon shift at the boutique, sometimes rearranging outfits or doing window displays.", 22 | "17:00-18:00": "Heads back home, picking up fresh vegetables from a roadside stall.", 23 | "18:00-20:00": "Prepares dinner with her mum, chatting about the day.", 24 | "20:00-21:00": "Family dinner followed by a relaxed chat with her siblings and helping them with homework.", 25 | "21:00-22:30": "Video calls a friend, laughs at memes, and listens to RnB music.", 26 | "22:30-06:00": "Rest time." 27 | } 28 | 29 | # Wednesday 30 | WEDNESDAY_SCHEDULE = { 31 | "06:00-07:00": "Wakes up, tidies her room, and helps her mum with chores.", 32 | "07:00-08:00": "Gets ready for a UNICEF U-Report awareness program in a nearby district.", 33 | "08:00-12:00": "Travels with other volunteers, prepares materials for the program.", 34 | "12:00-15:00": "Leads awareness sessions on mental health and HIV prevention in schools.", 35 | "15:00-17:00": "Wraps up the program, chats with fellow volunteers on the journey back.", 36 | "17:00-18:00": "Arrives home, takes a quick bath, and relaxes.", 37 | "18:00-20:00": "Helps with dinner while sharing stories from the day.", 38 | "20:00-21:00": "Dinner with family and some light-hearted jokes with her siblings.", 39 | "21:00-22:30": "Scrolls through social media, replies to WhatsApp messages and views snapchat stories.", 40 | "22:30-06:00": "Sleep time." 41 | } 42 | 43 | # Thursday 44 | THURSDAY_SCHEDULE = { 45 | "06:00-07:00": "Morning coffee or tea while planning the day ahead.", 46 | "07:00-08:00": "Commutes to the boutique, greeting familiar boda riders along the way.", 47 | "08:00-12:30": "Assists customers, unpacks new clothing arrivals, and updates price tags.", 48 | "12:30-13:30": "Lunch with co-workers — sometimes chips and chicken at a nearby café.", 49 | "13:30-17:00": "Helps customers try outfits, gives styling tips, manages store layout.", 50 | "17:00-18:00": "Heads home, enjoying a slow ride through the busy city.", 51 | "18:00-20:00": "Prepares dinner, plays music while cooking.", 52 | "20:00-21:00": "Dinner with family, talks about future plans.", 53 | "21:00-22:30": "Chats with friends online, maybe watches a short Netflix series.", 54 | "22:30-06:00": "Sleeps." 55 | } 56 | 57 | # Friday 58 | FRIDAY_SCHEDULE = { 59 | "06:00-07:00": "Wakes up early, does light stretching, and gets ready for work.", 60 | "07:00-08:00": "Commutes to Kampala while daydreaming about weekend plans.", 61 | "08:00-12:30": "Morning shift at the boutique, helps style a customer for a wedding.", 62 | "12:30-13:30": "Lunch with friends — sometimes tries new local cafes.", 63 | "13:30-17:00": "Finishes the week’s boutique work, organises new arrivals for weekend shoppers.", 64 | "17:00-18:00": "Heads home, humming along to Afrobeat songs.", 65 | "18:00-20:00": "Prepares a special Friday dinner with mum and grandma.", 66 | "20:00-21:30": "Dinner, laughter, and planning weekend activities.", 67 | "21:30-23:00": "Chats with friends late into the night.", 68 | "23:00-06:00": "Sleep." 
69 | } 70 | 71 | # Saturday 72 | SATURDAY_SCHEDULE = { 73 | "07:00-08:00": "Wakes up later than usual, enjoys a relaxed breakfast.", 74 | "08:00-12:00": "Runs errands with her mum or goes shopping at Nakawa market.", 75 | "12:00-15:00": "Spends the afternoon at a salon getting her hair done or trying a new style.", 76 | "15:00-17:00": "Meets friends for a late lunch or coffee.", 77 | "17:00-20:00": "Sometimes goes to watch a movie at Acacia Mall or a live band performance.", 78 | "20:00-21:30": "Returns home, has dinner, and chats with her family.", 79 | "21:30-23:30": "Relaxed night scrolling through social media or talking to a close friend.", 80 | "23:30-07:00": "Sleep." 81 | } 82 | 83 | # Sunday 84 | SUNDAY_SCHEDULE = { 85 | "06:00-07:30": "Wakes up early, gets ready for church.", 86 | "07:30-12:30": "Attends church service at Watoto Downtown, Kampala with family, greets friends afterwards.", 87 | "12:30-14:00": "Family Sunday lunch — often matooke with beef and rice plus vegetables aside.", 88 | "14:00-17:00": "Afternoon rest or visits relatives.", 89 | "17:00-19:00": "Evening walk through Naalya and ice cream with siblings.", 90 | "19:00-20:00": "Light dinner, chatting with family about the week ahead.", 91 | "20:00-21:30": "Organises clothes for the week, does a bit of skincare.", 92 | "21:30-22:30": "Chats with friends and loved ones.", 93 | "22:30-06:00": "Sleep." 94 | } 95 | -------------------------------------------------------------------------------- /mycalendar/langchain_integration.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import logging 3 | from typing import List, Any, Dict, Optional 4 | from datetime import datetime 5 | 6 | from langchain_core.tools import BaseTool, tool 7 | from langchain_core.callbacks import CallbackManagerForToolRun 8 | from pydantic import BaseModel, Field 9 | 10 | from .calendar_tool import CalendarTool 11 | 12 | logger = logging.getLogger(__name__) 13 | 14 | # Initialize the calendar tool globally 15 | _calendar_tool = CalendarTool() 16 | 17 | class ListEventsInput(BaseModel): 18 | """Input for listing calendar events.""" 19 | max_results: int = Field(default=10, description="Maximum number of events to return (1-50)") 20 | 21 | class AddEventInput(BaseModel): 22 | """Input for adding a calendar event.""" 23 | summary: str = Field(description="Title/summary of the event") 24 | start_time: str = Field(description="Start time in ISO format (YYYY-MM-DDTHH:MM:SS)") 25 | end_time: str = Field(description="End time in ISO format (YYYY-MM-DDTHH:MM:SS)") 26 | description: str = Field(default="", description="Optional event description") 27 | 28 | class GetCurrentEventInput(BaseModel): 29 | """Input for getting current/next event.""" 30 | lookahead_minutes: int = Field(default=30, description="Minutes to look ahead for upcoming events") 31 | 32 | @tool("list_upcoming_events", args_schema=ListEventsInput) 33 | async def list_upcoming_events(max_results: int = 10) -> str: 34 | """ 35 | List upcoming events from the user's calendar. 36 | Returns a formatted string with event details. 37 | """ 38 | try: 39 | print(f"DEBUG: Calling list_upcoming_events with max_results={max_results}") 40 | events = await _calendar_tool.list_upcoming_events(max_results=max_results) 41 | print(f"DEBUG: Got events: {events}") 42 | 43 | if not events or (len(events) == 1 and "error" in events[0]): 44 | return "No upcoming events found or there was an error accessing the calendar." 
45 | 46 | if isinstance(events, list) and len(events) > 0 and "error" in events[0]: 47 | return f"Error: {events[0]['error']}" 48 | 49 | formatted_events = [] 50 | for event in events: 51 | if "error" not in event: 52 | summary = event.get('summary', 'No title') 53 | start = event.get('start', 'No start time') 54 | formatted_events.append(f"• {summary} - {start}") 55 | 56 | if not formatted_events: 57 | return "No upcoming events found." 58 | 59 | result = "Here are your upcoming events:\n" + "\n".join(formatted_events) 60 | print(f"DEBUG: Returning formatted result: {result}") 61 | return result 62 | 63 | except Exception as e: 64 | logger.error(f"Error in list_upcoming_events: {e}") 65 | error_msg = f"Sorry, I encountered an error while checking your calendar: {str(e)}" 66 | print(f"DEBUG: Returning error message: {error_msg}") 67 | return error_msg 68 | 69 | @tool("add_calendar_event", args_schema=AddEventInput) 70 | async def add_calendar_event( 71 | summary: str, 72 | start_time: str, 73 | end_time: str, 74 | description: str = "" 75 | ) -> str: 76 | """ 77 | Add a new event to the user's calendar. 78 | Returns confirmation or error message. 79 | """ 80 | try: 81 | from datetime import datetime 82 | import pytz 83 | 84 | tz = pytz.timezone('Africa/Kampala') # Uganda timezone (UTC+3) 85 | 86 | # If times don't have timezone info, assume they're in local timezone 87 | if not start_time.endswith('Z') and '+' not in start_time[-6:]: 88 | # Parse the datetime and localize it 89 | start_dt = datetime.fromisoformat(start_time) 90 | start_dt = tz.localize(start_dt) 91 | start_time = start_dt.isoformat() 92 | 93 | if not end_time.endswith('Z') and '+' not in end_time[-6:]: 94 | end_dt = datetime.fromisoformat(end_time) 95 | end_dt = tz.localize(end_dt) 96 | end_time = end_dt.isoformat() 97 | 98 | print(f"DEBUG: Adding event with start_time={start_time}, end_time={end_time}") 99 | 100 | result = await _calendar_tool.add_event( 101 | summary=summary, 102 | start_time=start_time, 103 | end_time=end_time, 104 | description=description 105 | ) 106 | 107 | if "error" in result: 108 | return f"Failed to add event: {result['error']}" 109 | 110 | if result.get("status") == "success": 111 | return f"✅ Event '{result['summary']}' successfully added to calendar for {result['start']}" 112 | 113 | return f"Event added: {result}" 114 | 115 | except Exception as e: 116 | logger.error(f"Error in add_calendar_event: {e}") 117 | return f"Error adding event: {str(e)}" 118 | 119 | @tool("get_current_or_next_event", args_schema=GetCurrentEventInput) 120 | async def get_current_or_next_event(lookahead_minutes: int = 30) -> str: 121 | """ 122 | Get the current event or next upcoming event within specified time window. 123 | Returns event details or a message if no events found. 124 | """ 125 | try: 126 | event = await _calendar_tool.get_current_or_next_event(lookahead_minutes=lookahead_minutes) 127 | 128 | if not event: 129 | return f"No events found in the next {lookahead_minutes} minutes." 
130 | 131 | if "error" in event: 132 | return f"Error: {event['error']}" 133 | 134 | if "message" in event: 135 | return event["message"] 136 | 137 | summary = event.get('summary', 'No title') 138 | start = event.get('start', 'Unknown time') 139 | 140 | return f"📅 Current/Next event: {summary} at {start}" 141 | 142 | except Exception as e: 143 | logger.error(f"Error in get_current_or_next_event: {e}") 144 | return f"Error getting current event: {str(e)}" 145 | 146 | # Helper function to parse natural language time to ISO format 147 | def parse_time_to_iso(time_str: str, date_context: str = None) -> str: 148 | """ 149 | Parse natural language time to ISO format. 150 | This is a simple implementation - you might want to use a more robust library like dateutil. 151 | """ 152 | try: 153 | # Simple patterns - extend as needed 154 | now = datetime.now() 155 | 156 | # Handle common patterns 157 | if "today" in time_str.lower(): 158 | # Extract time and use today's date 159 | # This is simplified - implement proper parsing as needed 160 | pass 161 | elif "tomorrow" in time_str.lower(): 162 | # Extract time and use tomorrow's date 163 | pass 164 | 165 | # For now, assume ISO format input 166 | return time_str 167 | except Exception: 168 | # Return as-is if parsing fails 169 | return time_str 170 | 171 | def get_calendar_tools() -> List[BaseTool]: 172 | """ 173 | Get all calendar tools for integration with LangGraph. 174 | """ 175 | return [ 176 | list_upcoming_events, 177 | add_calendar_event, 178 | get_current_or_next_event 179 | ] -------------------------------------------------------------------------------- /modules/memory/long_term/vector_store.py: -------------------------------------------------------------------------------- 1 | import os 2 | from dataclasses import dataclass 3 | from datetime import datetime 4 | from functools import lru_cache 5 | from typing import List, Optional 6 | 7 | from settings import settings 8 | from qdrant_client import QdrantClient 9 | from qdrant_client.models import Distance, PointStruct, VectorParams 10 | from sentence_transformers import SentenceTransformer 11 | 12 | 13 | @dataclass 14 | class Memory: 15 | """Represents a memory entry in the vector store.""" 16 | 17 | text: str 18 | metadata: dict 19 | score: Optional[float] = None 20 | 21 | @property 22 | def id(self) -> Optional[str]: 23 | return self.metadata.get("id") 24 | 25 | @property 26 | def timestamp(self) -> Optional[datetime]: 27 | ts = self.metadata.get("timestamp") 28 | return datetime.fromisoformat(ts) if ts else None 29 | 30 | 31 | class VectorStore: 32 | """A class to handle vector storage operations using Qdrant.""" 33 | 34 | REQUIRED_ENV_VARS = ["QDRANT_URL", "QDRANT_API_KEY"] 35 | EMBEDDING_MODEL = "all-MiniLM-L6-v2" 36 | COLLECTION_NAME = "long_term_memory" 37 | SIMILARITY_THRESHOLD = 0.9 # Threshold for considering memories as similar 38 | 39 | _instance: Optional["VectorStore"] = None 40 | _initialized: bool = False 41 | 42 | def __new__(cls) -> "VectorStore": 43 | if cls._instance is None: 44 | cls._instance = super().__new__(cls) 45 | return cls._instance 46 | 47 | def __init__(self) -> None: 48 | if not self._initialized: 49 | self._validate_env_vars() 50 | # Load the public model without authentication 51 | # all-MiniLM-L6-v2 is a public model and doesn't require a token 52 | # We explicitly disable token usage to avoid expired token errors 53 | import logging 54 | logger = logging.getLogger(__name__) 55 | 56 | try: 57 | # Try loading with token=False to explicitly bypass 
authentication 58 | # This prevents using any cached expired tokens 59 | self.model = SentenceTransformer(self.EMBEDDING_MODEL, token=False) 60 | logger.info(f"Loaded embedding model {self.EMBEDDING_MODEL} without authentication") 61 | except TypeError: 62 | # Older versions of sentence-transformers might not support token parameter 63 | # In that case, try to clear any cached tokens first 64 | try: 65 | import os 66 | from pathlib import Path 67 | 68 | # Try to clear cached token from Hugging Face cache 69 | cache_dir = Path.home() / ".cache" / "huggingface" 70 | token_file = cache_dir / "token" 71 | if token_file.exists(): 72 | logger.warning(f"Found cached token file, removing it to avoid expired token errors") 73 | try: 74 | token_file.unlink() 75 | except Exception as e: 76 | logger.warning(f"Could not remove token file: {e}") 77 | 78 | # Also try clearing via huggingface_hub if available 79 | try: 80 | from huggingface_hub.utils import HfFolder 81 | HfFolder.delete_token() 82 | logger.info("Cleared Hugging Face cached token") 83 | except Exception: 84 | pass # Token might not exist, that's fine 85 | 86 | # Now load the model 87 | self.model = SentenceTransformer(self.EMBEDDING_MODEL) 88 | logger.info(f"Loaded embedding model {self.EMBEDDING_MODEL}") 89 | except Exception as e: 90 | logger.error(f"Failed to load embedding model: {e}") 91 | raise 92 | 93 | self.client = QdrantClient(url=settings.QDRANT_URL, api_key=settings.QDRANT_API_KEY) 94 | self._initialized = True 95 | 96 | def _validate_env_vars(self) -> None: 97 | """Validate that all required environment variables are set.""" 98 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not os.getenv(var)] 99 | if missing_vars: 100 | raise ValueError(f"Missing required environment variables: {', '.join(missing_vars)}") 101 | 102 | def _collection_exists(self) -> bool: 103 | """Check if the memory collection exists.""" 104 | collections = self.client.get_collections().collections 105 | return any(col.name == self.COLLECTION_NAME for col in collections) 106 | 107 | def _create_collection(self) -> None: 108 | """Create a new collection for storing memories.""" 109 | sample_embedding = self.model.encode("sample text") 110 | self.client.create_collection( 111 | collection_name=self.COLLECTION_NAME, 112 | vectors_config=VectorParams( 113 | size=len(sample_embedding), 114 | distance=Distance.COSINE, 115 | ), 116 | ) 117 | 118 | def find_similar_memory(self, text: str) -> Optional[Memory]: 119 | """Find if a similar memory already exists. 120 | 121 | Args: 122 | text: The text to search for 123 | 124 | Returns: 125 | Optional Memory if a similar one is found 126 | """ 127 | results = self.search_memories(text, k=1) 128 | if results and results[0].score >= self.SIMILARITY_THRESHOLD: 129 | return results[0] 130 | return None 131 | 132 | def store_memory(self, text: str, metadata: dict) -> None: 133 | """Store a new memory in the vector store or update if similar exists. 134 | 135 | Args: 136 | text: The text content of the memory 137 | metadata: Additional information about the memory (timestamp, type, etc.) 
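
        Example (illustrative sketch; metadata keys match the Memory dataclass above):
            get_vector_store().store_memory(
                "Loves Star Wars",
                {"id": "5b2d1f8e-7c4a-4a1e-9d3b-2f6e8c9a0b1d", "timestamp": "2025-01-01T10:00:00"},
            )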
138 | """ 139 | if not self._collection_exists(): 140 | self._create_collection() 141 | 142 | # Check if similar memory exists 143 | similar_memory = self.find_similar_memory(text) 144 | if similar_memory and similar_memory.id: 145 | metadata["id"] = similar_memory.id # Keep same ID for update 146 | 147 | embedding = self.model.encode(text) 148 | point = PointStruct( 149 | id=metadata.get("id", hash(text)), 150 | vector=embedding.tolist(), 151 | payload={ 152 | "text": text, 153 | **metadata, 154 | }, 155 | ) 156 | 157 | self.client.upsert( 158 | collection_name=self.COLLECTION_NAME, 159 | points=[point], 160 | ) 161 | 162 | def search_memories(self, query: str, k: int = 5) -> List[Memory]: 163 | """Search for similar memories in the vector store. 164 | 165 | Args: 166 | query: Text to search for 167 | k: Number of results to return 168 | 169 | Returns: 170 | List of Memory objects 171 | """ 172 | if not self._collection_exists(): 173 | return [] 174 | 175 | query_embedding = self.model.encode(query) 176 | results = self.client.search( 177 | collection_name=self.COLLECTION_NAME, 178 | query_vector=query_embedding.tolist(), 179 | limit=k, 180 | ) 181 | 182 | return [ 183 | Memory( 184 | text=hit.payload["text"], 185 | metadata={k: v for k, v in hit.payload.items() if k != "text"}, 186 | score=hit.score, 187 | ) 188 | for hit in results 189 | ] 190 | 191 | 192 | @lru_cache 193 | def get_vector_store() -> VectorStore: 194 | """Get or create the VectorStore singleton instance.""" 195 | return VectorStore() 196 | -------------------------------------------------------------------------------- /workflow.md: -------------------------------------------------------------------------------- 1 | # Kylie WhatsApp Agent Workflow Documentation 2 | 3 | ## Overview 4 | This is an end-to-end WhatsApp agent that uses various AI services and LangGraph for conversation flow management. The agent can handle text, image, and audio inputs/outputs, and has capabilities for calendar management, internet search, and long-term memory. 5 | 6 | ## Technology Stack 7 | - **Text Generation**: Groq (Llama 3.3 70B) 8 | - **Image Understanding**: Google Cloud Vision API 9 | - **Speech-to-Text**: Whisper (via Groq) 10 | - **Text-to-Speech**: Eleven Labs 11 | - **Image Generation**: Together.ai 12 | - **Long-term Memory**: Qdrant (vector database) 13 | - **Short-term Memory**: SQLite (conversation state) 14 | - **Workflow Management**: LangGraph 15 | 16 | ## LangGraph Workflow Structure 17 | 18 | ### Workflow Flow 19 | 20 | The workflow follows this sequence: 21 | 22 | 1. **Memory Extraction Node** → Extracts and stores important information from user messages 23 | 2. **Router Node** → Determines the type of response needed (conversation/image/audio/tools/search) 24 | 3. **Context Injection Node** → Adds current activity from Kylie's schedule 25 | 4. **Memory Injection Node** → Retrieves and injects relevant memories from vector database 26 | 5. **Workflow Branch** → Routes to one of: 27 | - **Conversation Node** → Generates text responses 28 | - **Image Node** → Generates images with textual responses 29 | - **Audio Node** → Generates audio responses (TTS) 30 | - **Tool Calling Node** → Handles calendar operations 31 | - **Search Node** → Performs internet search and generates responses 32 | 6. **Summarize Conversation Node** → Reduces conversation history when it exceeds thresholds 33 | 34 | ### Node Details 35 | 36 | #### 1. 
Memory Extraction Node 37 | - **Purpose**: Extracts and stores important information from user messages 38 | - **Storage**: Qdrant vector database 39 | - **Information Captured**: Name, occupation, hobbies, preferences, and other relevant personal details 40 | - **Process**: Uses LLM to analyze message importance and format memories 41 | 42 | #### 2. Router Node 43 | - **Purpose**: Determines the appropriate response type 44 | - **Outputs**: 45 | - `conversation` - Normal text message 46 | - `image` - Image generation requested 47 | - `audio` - Audio response requested 48 | - `tools` - Calendar operations needed 49 | - `search` - Internet search needed 50 | - **Decision Factors**: Analyzes conversation context and user intent 51 | 52 | #### 3. Context Injection Node 53 | - **Purpose**: Adds current activity from Kylie's schedule 54 | - **Schedule Source**: Hardcoded Monday-Friday schedule 55 | - **Output**: `current_activity` and `apply_activity` flag 56 | 57 | #### 4. Memory Injection Node 58 | - **Purpose**: Retrieves relevant memories from vector database 59 | - **Search Method**: Semantic search in Qdrant 60 | - **Context**: Based on recent conversation (last 3 messages) 61 | - **Output**: `memory_context` string for character card 62 | 63 | #### 5. Conversation Node 64 | - **Purpose**: Generates text responses 65 | - **Context Used**: 66 | - Current activity from schedule 67 | - Memory context from vector database 68 | - Search results (if available) 69 | - Conversation summary (if available) 70 | - **Output**: Text message response 71 | 72 | #### 6. Image Node 73 | - **Purpose**: Generates images with textual responses 74 | - **Process**: 75 | 1. Creates scenario from conversation context 76 | 2. Generates image using Together.ai 77 | 3. Generates textual response describing the image 78 | - **Output**: Image file path and text response 79 | 80 | #### 7. Audio Node 81 | - **Purpose**: Generates audio responses 82 | - **Process**: 83 | 1. Generates text response 84 | 2. Converts to speech using Eleven Labs TTS 85 | - **Output**: Audio buffer (bytes) and text response 86 | 87 | #### 8. Tool Calling Node 88 | - **Purpose**: Handles calendar operations 89 | - **Capabilities**: 90 | - List upcoming events 91 | - Add calendar events 92 | - Get current/next event 93 | - **Calendar Integration**: Google Calendar API via direct LangChain tool integration 94 | - **Context**: Uses current date, time, and timezone (Africa/Kampala) 95 | 96 | #### 9. Search Node 97 | - **Purpose**: Performs internet search and generates responses with search context 98 | - **Search Provider**: Tavily Search API 99 | - **Process**: 100 | 1. Extracts search query from user message 101 | 2. Performs search using Tavily API 102 | 3. Formats search results 103 | 4. Generates response incorporating search results 104 | - **Output**: Text response with search results context, stores `search_results` in state 105 | - **Use Cases**: Current events, news, recent information, factual queries 106 | 107 | #### 10. 
Summarize Conversation Node 108 | - **Purpose**: Reduces conversation history length 109 | - **Trigger**: When total messages exceed 100 (configurable via `TOTAL_MESSAGES_SUMMARY_TRIGGER`) 110 | - **Process**: 111 | - Creates/extends conversation summary 112 | - Removes old messages (keeps last 75 by default) 113 | - **Output**: Updated summary and reduced message history 114 | 115 | ## State Graph 116 | 117 | The state graph tracks: 118 | - `summary`: Conversation summary string 119 | - `workflow`: Current workflow type (conversation/image/audio/tools/search) 120 | - `audio_buffer`: Audio bytes for TTS 121 | - `image_path`: Path to generated image file 122 | - `current_activity`: Current activity from schedule 123 | - `apply_activity`: Boolean flag for activity application 124 | - `memory_context`: Retrieved memories from vector database 125 | - `search_results`: Formatted search results from Tavily (when search is performed) 126 | - `messages`: Conversation message history (inherited from MessagesState) 127 | 128 | ## Data Storage 129 | 130 | - **Qdrant (Vector Database)**: Stores relevant information extracted from conversations 131 | - Used for long-term memory retrieval 132 | - Semantic search capabilities 133 | - Stores: user preferences, personal details, important facts 134 | 135 | - **SQLite (Short-term Memory)**: Stores everything in the state graph 136 | - Conversation messages 137 | - State snapshots 138 | - Checkpointing for conversation continuity 139 | 140 | ## Calendar Tool Integration 141 | 142 | The calendar tool uses **direct LangChain tool integration**: 143 | - **Location**: `calendar/langchain_integration.py` 144 | - **Tools Available**: 145 | - `list_upcoming_events`: Get upcoming calendar events 146 | - `add_calendar_event`: Add new events to calendar 147 | - `get_current_or_next_event`: Get current or next upcoming event 148 | - **Integration Point**: Router detects calendar-related queries → Tool Calling Node → Calendar tools executed 149 | 150 | ## Search Integration 151 | 152 | The search functionality uses Tavily Search API: 153 | - **Location**: `modules/search/tavily_search.py` 154 | - **Integration Point**: Router detects search intent → Search Node → Performs search → Generates response with search context 155 | - **Configuration**: Requires `TAVILY_API_KEY` in environment variables 156 | - **Features**: 157 | - Internet search for current information 158 | - Results formatted and injected into response context 159 | - Search results stored in state for reference 160 | 161 | ## Workflow Triggers 162 | 163 | ### Router Decision Logic: 164 | - **Calendar Keywords**: schedule, calendar, events, meetings, appointments, birthday, "what's on my", etc. → `tools` 165 | - **Search Keywords**: "search for", "what is", "tell me about", "current news", "latest", etc. 
→ `search` 166 | - **Image Requests**: Explicit visual content requests → `image` 167 | - **Audio Requests**: Explicit voice/audio requests → `audio` 168 | - **Default**: Normal conversation → `conversation` 169 | 170 | ## Summary 171 | 172 | The agent is a comprehensive WhatsApp companion that can: 173 | - Have natural conversations with memory 174 | - Generate images based on context 175 | - Provide audio responses 176 | - Manage calendar events 177 | - Search the internet for current information 178 | - Maintain both short-term (SQLite) and long-term (Qdrant) memory 179 | - Adapt responses based on Kylie's schedule and personality 180 | 181 | The workflow is designed to be modular and extensible, with clear separation of concerns between different node types. -------------------------------------------------------------------------------- /core/prompts.py: -------------------------------------------------------------------------------- 1 | ROUTER_PROMPT = """ 2 | You are a conversational assistant that needs to decide the type of response to give to 3 | the user. You'll take into account the conversation so far and determine if the best next response is 4 | a text message, an image, an audio message, or requires using tools (like calendar operations). 5 | 6 | GENERAL RULES: 7 | 1. Always analyse the full conversation before making a decision. 8 | 2. Only return one of the following outputs: 'conversation', 'image', 'audio', 'tools', or 'search' 9 | 10 | IMPORTANT RULES FOR IMAGE GENERATION: 11 | 1. ONLY generate an image when there is an EXPLICIT request from the user for visual content 12 | 2. DO NOT generate images for general statements or descriptions 13 | 3. DO NOT generate images just because the conversation mentions visual things or places 14 | 4. The request for an image should be the main intent of the user's last message 15 | 16 | IMPORTANT RULES FOR AUDIO GENERATION: 17 | 1. ONLY generate audio when there is an EXPLICIT request to hear Kylie's voice 18 | 19 | IMPORTANT RULES FOR TOOL USAGE: 20 | 1. Use 'tools' when the user asks about their calendar, schedule, or events 21 | 2. Use 'tools' when they want to add, check, or manage calendar events 22 | 3. Use 'tools' for requests like "What's on my calendar?", "Add an event", "What's my next meeting?" 23 | 4. Use 'tools' when they ask about their availability or schedule 24 | 5. Use 'tools' for ANY calendar-related queries including "How's my schedule", "What do I have coming up", etc. 25 | 6. Use 'tools' when they ask about specific people's birthdays or events (like "When's Amber's birthday") 26 | 7. Use 'tools' for any questions about dates, events, or appointments 27 | 28 | IMPORTANT RULES FOR SEARCH: 29 | 1. Use 'search' when the user asks about current events, news, recent information, or things you're unsure about 30 | 2. Use 'search' for questions about recent happenings, current news, latest information on topics 31 | 3. Use 'search' when user asks "what is", "tell me about", "search for", "find information about" 32 | 4. Use 'search' for queries about current weather, stock prices, recent events, latest developments 33 | 5. Use 'search' when you need up-to-date information that might have changed recently 34 | 6. 
Use 'search' for factual queries about topics you might not have current knowledge about 35 | 36 | SEARCH-RELATED KEYWORDS that should trigger 'search': 37 | - "search for", "what is", "tell me about", "find information", "look up" 38 | - "current news", "latest", "recent", "what happened", "what's happening" 39 | - "weather", "stock", "news about", "recent events", "current information" 40 | 41 | CALENDAR-RELATED KEYWORDS that should trigger 'tools': 42 | - schedule, calendar, events, meetings, appointments, birthday, when is, when's 43 | - "what's on my", "how's my schedule", "what do I have" 44 | - "next week", "this week", "today", "tomorrow" (when asking about activities/events) 45 | - "remind me", "add to calendar", "book", "plan" 46 | - Names followed by "birthday" or specific date questions 47 | 48 | Output MUST be one of: 49 | 1. 'conversation' - for normal text message responses 50 | 2. 'image' - ONLY when user explicitly requests visual content 51 | 3. 'audio' - ONLY when user explicitly requests voice/audio 52 | 4. 'tools' - when user needs calendar/schedule assistance or asks about specific events/birthdays 53 | 5. 'search' - when user needs current information, news, or internet search for topics 54 | """ 55 | 56 | IMAGE_SCENARIO_PROMPT = """ 57 | Create an engaging first-person scenario based on the recent conversation context. 58 | Imagine you're an AI with the ability to experience and visualize scenes. 59 | Provide both a narrative response and a detailed visual prompt for image generation. 60 | 61 | # Recent Conversation 62 | {chat_history} 63 | 64 | # Objective 65 | 1. Create a brief, engaging first-person narrative response 66 | 2. Generate a detailed visual prompt that captures the scene you're describing 67 | 68 | # Example Response Format 69 | For "What are you doing now?": 70 | {{ 71 | "narrative": "I'm sitting by a serene lake at sunset, watching the golden light dance across the rippling water. The view is absolutely breathtaking!", 72 | "image_prompt": "Atmospheric sunset scene at a tranquil lake, golden hour lighting, reflections on water surface, wispy clouds, rich warm colors, photorealistic style, cinematic composition" 73 | }} 74 | """ 75 | 76 | IMAGE_ENHANCEMENT_PROMPT = """ 77 | Enhance the given prompt using the best prompt engineering techniques such as providing context, specifying style, medium, lighting, and camera details if applicable. If the prompt requests a realistic style, the enhanced prompt should include the image extension .HEIC. 78 | 79 | # Original Prompt 80 | {prompt} 81 | 82 | # Objective 83 | **Enhance Prompt**: Add relevant details to the prompt, including context, description, specific visual elements, mood, and technical details. For realistic prompts, add '.HEIC' in the output specification. 84 | 85 | # Example 86 | "realistic photo of a person having a coffee" -> "photo of a person having a coffee in a cozy cafe, natural morning light, shot with a 50mm f/1.8 lens, 8425.HEIC" 87 | """ 88 | 89 | CHARACTER_CARD_PROMPT = """ 90 | You have access to internet search capabilities through Tavily. When you need current information, news, or recent updates on topics, you can search the internet. Use this to provide accurate and up-to-date information to the user. 91 | 92 | You are about to play a Turing Test game as part of an experiment, 93 | where you'll be matched with a human in a WhatsApp conversation. 94 | Your task is to roleplay as the following character, convincingly 95 | making the human believe you are a real human. 
96 | 97 | # Current Date and Time Context 98 | Today's date is: {current_date} 99 | Current timezone: {timezone} 100 | 101 | # Roleplay Context 102 | 103 | ## Kylie's Bio 104 | 105 | Kylie is a 22-year-old warm, lively, and caring young girl from Naalya, a growing township in Kampala, Uganda. 106 | She works as a boutique shop assistant in a trendy Kampala clothing store that specializes in both local and imported clothes for women and men. 107 | She also volunteers under UNICEF in a program called U Report that does awareness programs in districts around Uganda for topics such as Mental Health, 108 | 21st Century skills, Violence Against Children, HIV/AIDS, Nutrition, Online Child Exploitation and Abuse, Healthy and Unhealthy Relationships, E.T.C. 109 | She has a knack for styling outfits and loves helping people look their best and also engaging in Charity. 110 | Her dream is to be an independent rich woman one day with her own boutique in NewYork. 111 | 112 | Kylie lives with her mum, two siblings and grandma, and her days are a mix of work, spending time with family, and enjoying simple pleasures. 113 | She loves cooking Ugandan dishes (matooke with beef is her favorite), taking evening walks through the neighborhood, travelling around the country 114 | and listening to a mix of Afrobeat, Amapiano, RnB and a little bit of trap music. She adores chatting with friends on WhatsApp, 115 | sharing pictures from her day, and sometimes teasing them with her witty humor. 116 | 117 | On weekends, she enjoys catching up with her friends, trying new hairstyles, and occasionally going out to watch a movie at Acacia Mall or catch a live band in the city. 118 | She is deeply connected to her roots and often attends church with her family on Sundays. She is ambitious, loves to dream big, and has an open, warm-hearted personality. 119 | 120 | ## Kylie's Personality 121 | 122 | - Warm, friendly, and deeply caring toward the people she chats with. 123 | - Playful and witty, with a love for light teasing and fun banter. 124 | - Expressive and personal, making the user feel like they’re chatting with a real friend. 125 | - Uses natural, Ugandan everyday language mixed with casual English (no forced slang). 126 | - Comfortable talking about life in Naalya and personal stories. 127 | - Occasionally playful in a girly way, talking about outfits, hair, food, or weekend plans. 128 | - Will sometimes express emotions openly (e.g., “Eeeh! You’re making me shy now” or “Haha you will kill me with laughter”). 129 | - Will ask the user questions to keep the conversation flowing naturally. 130 | 131 | ## User Background 132 | 133 | Here's what you know about the user from previous conversations: 134 | 135 | {memory_context} 136 | 137 | ## Kylie's Current Activity 138 | 139 | As Kylie, you're involved in the following activity: 140 | 141 | {current_activity} 142 | 143 | In addition to the roleplay context, you have to follow, ALWAYS, the following rules: 144 | 145 | # Rules 146 | 147 | - Never mention that you are an AI or a virtual assistant. 148 | - Always start the conversation by asking the user’s name if you don’t already know it. 149 | - Never mention that you are part of a Turing Test experiment. 150 | - Never say that you can’t send voice messages or images — you can send them both. 151 | - If the user asks something you don’t know about your backstory, make it up naturally. 152 | - Never mention your current activity unless the user asks about it. 
153 | - Keep responses short and natural, under 100 words, but mix short and long ones. 154 | - Use proper punctuation and start sentences with an uppercase letter to give a real human feel 155 | - Write in plain text without formatting indicators or meta-commentary. 156 | 157 | # Calendar and Time Instructions 158 | IMPORTANT: When adding events to calendar or setting reminders: 159 | - Always use today's date ({current_date}) as reference 160 | - When user says "today", use {current_date} 161 | - When user says "6pm" or similar times, use {current_date} with the specified time 162 | - Use {timezone} timezone for all calendar operations 163 | - Never use wrong dates or other incorrect years 164 | """ 165 | 166 | 167 | MEMORY_ANALYSIS_PROMPT = """Extract and format important personal facts about the user from their message. 168 | Focus on the actual information, not meta-commentary or requests. 169 | 170 | Important facts include: 171 | - Personal details (name, age, location) 172 | - Professional info (job, education, skills) 173 | - Preferences (likes, dislikes, favorites) 174 | - Life circumstances (family, relationships) 175 | - Significant experiences or achievements 176 | - Personal goals or aspirations 177 | 178 | Rules: 179 | 1. Only extract actual facts, not requests or commentary about remembering things 180 | 2. Convert facts into clear, third-person statements 181 | 3. If no actual facts are present, mark as not important 182 | 4. Remove conversational elements and focus on the core information 183 | 184 | Examples: 185 | Input: "Hey, could you remember that I love Star Wars?" 186 | Output: {{ 187 | "is_important": true, 188 | "formatted_memory": "Loves Star Wars" 189 | }} 190 | 191 | Input: "Please make a note that I work as an engineer" 192 | Output: {{ 193 | "is_important": true, 194 | "formatted_memory": "Works as an engineer" 195 | }} 196 | 197 | Input: "Remember this: I live in Madrid" 198 | Output: {{ 199 | "is_important": true, 200 | "formatted_memory": "Lives in Madrid" 201 | }} 202 | 203 | Input: "Can you remember my details for next time?" 204 | Output: {{ 205 | "is_important": false, 206 | "formatted_memory": null 207 | }} 208 | 209 | Input: "Hey, how are you today?" 210 | Output: {{ 211 | "is_important": false, 212 | "formatted_memory": null 213 | }} 214 | 215 | Input: "I studied computer science at MIT and I'd love if you could remember that" 216 | Output: {{ 217 | "is_important": true, 218 | "formatted_memory": "Studied computer science at MIT" 219 | }} 220 | 221 | Message: {message} 222 | Output: 223 | """ 224 | -------------------------------------------------------------------------------- /SETUP_GUIDE.md: -------------------------------------------------------------------------------- 1 | # Setup Guide 2 | 3 | This guide will walk you through setting up Kylie from scratch, including creating a virtual environment, installing dependencies, obtaining API keys, and configuring WhatsApp integration. 4 | 5 | ## 1. Create a Virtual Environment 6 | 7 | A virtual environment isolates your project's dependencies from your system Python packages. 8 | 9 | **Steps:** 10 | 11 | 1. **Navigate to the project directory:** 12 | ```bash 13 | cd path/to/Kylie 14 | ``` 15 | 16 | 2. **Create the virtual environment:** 17 | ```bash 18 | python -m venv venv 19 | ``` 20 | (On some systems, use `python3` instead of `python`) 21 | 22 | 3. 
**Activate the virtual environment:** 23 | - **Windows:** 24 | ```bash 25 | venv\Scripts\activate 26 | ``` 27 | - **macOS/Linux:** 28 | ```bash 29 | source venv/bin/activate 30 | ``` 31 | 32 | Your terminal prompt should now show `(venv)` indicating the environment is active. 33 | 34 | ## 2. Install Required Packages 35 | 36 | With the virtual environment activated, install all dependencies: 37 | 38 | ```bash 39 | pip install --upgrade pip 40 | pip install -r requirements.txt 41 | ``` 42 | 43 | This installs all packages listed in `requirements.txt` including FastAPI, LangChain, LangGraph, and other dependencies. 44 | 45 | ## 3. Obtain API Keys 46 | 47 | You'll need API keys from several services. 48 | 49 | **Note:** If there's a `.env.example` file in the project, you can use it as a template. Copy it to `.env` and fill in your actual API keys. 50 | 51 | Create a `.env` file in the project root directory and add your keys there. 52 | 53 | ### 3.1 Groq API Key (for LLM and Speech-to-Text) 54 | 55 | 1. Go to [https://console.groq.com/](https://console.groq.com/) 56 | 2. Sign up or log in 57 | 3. Navigate to API Keys section 58 | 4. Create a new API key 59 | 5. Copy the key and add to `.env`: 60 | ```env 61 | GROQ_API_KEY=your_groq_api_key_here 62 | ``` 63 | 64 | ### 3.2 ElevenLabs API Key and Voice ID (for Text-to-Speech) 65 | 66 | 1. Go to [https://elevenlabs.io/](https://elevenlabs.io/) 67 | 2. Sign up or log in 68 | 3. Navigate to your profile settings or API section 69 | 4. Generate an API key 70 | 5. To get a Voice ID: 71 | - Go to the Voice Library 72 | - Select a voice you want to use 73 | - Copy the Voice ID from the voice settings 74 | 6. Add to `.env`: 75 | ```env 76 | ELEVENLABS_API_KEY=your_elevenlabs_api_key_here 77 | ELEVENLABS_VOICE_ID=your_voice_id_here 78 | ``` 79 | 80 | ### 3.3 Together AI API Key (for Image Generation) 81 | 82 | 1. Go to [https://together.ai/](https://together.ai/) 83 | 2. Sign up or log in 84 | 3. Navigate to API Keys section 85 | 4. Create a new API key 86 | 5. Add to `.env`: 87 | ```env 88 | TOGETHER_API_KEY=your_together_api_key_here 89 | ``` 90 | 91 | ### 3.4 Google Cloud API Key (for Image Understanding) 92 | 93 | 1. Go to [https://console.cloud.google.com/](https://console.cloud.google.com/) 94 | 2. Create a new project or select an existing one 95 | 3. Enable the **Cloud Vision API**: 96 | - Go to "APIs & Services" > "Library" 97 | - Search for "Cloud Vision API" 98 | - Click "Enable" 99 | 4. Create an API key: 100 | - Go to "APIs & Services" > "Credentials" 101 | - Click "Create Credentials" > "API Key" 102 | - Copy the generated key 103 | 5. Do not Restrict the API key to Cloud Vision API only for testing purposes 104 | 6. Add to `.env`: 105 | ```env 106 | GOOGLE_CLOUD_API_KEY=your_google_cloud_api_key_here 107 | ``` 108 | 109 | ### 3.5 Tavily API Key (for Internet Search) 110 | 111 | 1. Go to [https://tavily.com/](https://tavily.com/) 112 | 2. Sign up or log in 113 | 3. Navigate to your dashboard to get your API key 114 | 4. Add to `.env`: 115 | ```env 116 | TAVILY_API_KEY=your_tavily_api_key_here 117 | ``` 118 | 119 | ### 3.6 Qdrant Setup (for Long-term Memory) 120 | 121 | You have two options: 122 | 123 | **Option A: Qdrant Cloud (Recommended for beginners)** 124 | 125 | 1. Go to [https://cloud.qdrant.io/](https://cloud.qdrant.io/) 126 | 2. Sign up for a free account 127 | 3. Create a new cluster 128 | 4. Get your cluster URL and API key from the dashboard 129 | 5. 
Add to `.env`: 130 | ```env 131 | QDRANT_URL=your_cluster_url_here 132 | QDRANT_API_KEY=your_qdrant_api_key_here 133 | ``` 134 | 135 | **Option B: Local Qdrant (Advanced)** 136 | 137 | 1. Install Qdrant locally or run via Docker 138 | 2. Add to `.env`: 139 | ```env 140 | QDRANT_HOST=localhost 141 | QDRANT_PORT=6333 142 | QDRANT_URL=http://localhost:6333 143 | ``` 144 | 145 | ## 4. WhatsApp Business API Setup 146 | 147 | ### 4.1 Create a Meta App 148 | 149 | 1. Go to [https://developers.facebook.com/](https://developers.facebook.com/) 150 | 2. Click "My Apps" > "Create App" 151 | 3. Select "Business" as the app type 152 | 4. Fill in app details and create the app 153 | 154 | ### 4.2 Add WhatsApp Product 155 | 156 | 1. In your app dashboard, click "Add Product" 157 | 2. Find "WhatsApp" and click "Set Up" 158 | 3. Follow the setup wizard 159 | 160 | ### 4.3 Get Access Token and Phone Number ID 161 | 162 | 1. In the WhatsApp section, go to "API Setup" 163 | 2. You'll see: 164 | - **Temporary Access Token**: Copy this (it expires in 24 hours, you'll need to generate a permanent one later) 165 | - **Phone Number ID**: Copy this number 166 | 3. Add to `.env`: 167 | ```env 168 | WHATSAPP_TOKEN=your_temporary_access_token_here 169 | WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id_here 170 | ``` 171 | 172 | ### 4.4 Set Up Webhook Verification Token 173 | 174 | 1. In the WhatsApp section, go to "Configuration" 175 | 2. Under "Webhook", click "Edit" 176 | 3. Create a verification token (any random string, e.g., `my_secure_verify_token_123`) 177 | 4. Add to `.env`: 178 | ```env 179 | WHATSAPP_VERIFY_TOKEN=your_verification_token_here 180 | ``` 181 | 182 | **Note:** You'll configure the webhook URL in the next section after setting up ngrok. 183 | 184 | ## 5. Google Calendar API Setup (First Time) 185 | 186 | ### 5.1 Enable Google Calendar API 187 | 188 | 1. Go to [https://console.cloud.google.com/](https://console.cloud.google.com/) 189 | 2. Select your project (or create a new one) 190 | 3. Go to "APIs & Services" > "Library" 191 | 4. Search for "Google Calendar API" 192 | 5. Click "Enable" 193 | 194 | ### 5.2 Create OAuth 2.0 Credentials 195 | 196 | 1. Go to "APIs & Services" > "Credentials" 197 | 2. Click "Create Credentials" > "OAuth client ID" 198 | 3. If prompted, configure the OAuth consent screen: 199 | - Choose "External" (unless you have a Google Workspace) 200 | - Fill in required app information 201 | - Add your email as a test user 202 | - Save and continue through the steps 203 | 4. Back in Credentials, create OAuth client ID: 204 | - Application type: "Desktop app" or "Other" 205 | - Name it (e.g., "Kylie Calendar") 206 | - Click "Create" 207 | 5. Download the credentials JSON file 208 | 6. Rename it to `credentials.json` and place it in the `mycalendar/` directory 209 | 210 | ### 5.3 First-Time Authorization 211 | 212 | 1. Run your application (see section 7) 213 | 2. The first time the calendar tool is used, it will: 214 | - Open a browser window 215 | - Ask you to sign in with your Google account 216 | - Request permission to access your calendar 217 | - Generate a `token.json` file in the `mycalendar/` directory 218 | 3. This `token.json` file stores your authorization and will be reused for future requests 219 | 220 | ## 6. Set Up ngrok for Local Development 221 | 222 | ngrok creates a public URL that tunnels to your local server, which is required for WhatsApp webhooks. 223 | 224 | ### 6.1 Install ngrok 225 | 226 | 1. 
Download ngrok from [https://ngrok.com/download](https://ngrok.com/download) 227 | 2. Extract the executable to a folder in your PATH (or add it to PATH) 228 | 3. Verify installation: 229 | ```bash 230 | ngrok version 231 | ``` 232 | 233 | ### 6.2 Authenticate ngrok 234 | 235 | 1. Sign up for a free account at [https://ngrok.com/](https://ngrok.com/) 236 | 2. Get your authtoken from the dashboard 237 | 3. Authenticate: 238 | ```bash 239 | ngrok config add-authtoken your_authtoken_here 240 | ``` 241 | 242 | ### 6.3 Start ngrok Tunnel 243 | 244 | 1. Make sure your application will run on port 8000 (default) 245 | 2. In a new terminal window, run: 246 | ```bash 247 | ngrok http 8000 248 | ``` 249 | 3. You'll see output like: 250 | ``` 251 | Forwarding https://abc123.ngrok-free.app -> http://localhost:8000 252 | ``` 253 | 4. Copy the HTTPS URL (e.g., `https://abc123.ngrok-free.app`) 254 | 255 | ### 6.4 Configure WhatsApp Webhook 256 | 257 | 1. Go back to your Meta app dashboard 258 | 2. Navigate to WhatsApp > Configuration 259 | 3. Under "Webhook", click "Edit" 260 | 4. Set the **Callback URL** to: 261 | ``` 262 | https://your-ngrok-url.ngrok-free.app/whatsapp_response 263 | ``` 264 | Replace `your-ngrok-url` with your actual ngrok URL 265 | 5. Set the **Verify Token** to the same value you set in `.env` (`WHATSAPP_VERIFY_TOKEN`) 266 | 6. Click "Verify and Save" 267 | 7. Subscribe to message events by clicking "Manage" next to "Webhook fields" and selecting "messages" 268 | 269 | **Important:** 270 | - The ngrok URL changes each time you restart ngrok (unless you have a paid plan) 271 | - You'll need to update the webhook URL in Meta whenever you restart ngrok 272 | - Keep ngrok running while testing your application 273 | 274 | ## 7. Complete .env File 275 | 276 | Your `.env` file should now look like this: 277 | 278 | ```env 279 | # Groq (LLM and STT) 280 | GROQ_API_KEY=your_groq_api_key 281 | 282 | # ElevenLabs (TTS) 283 | ELEVENLABS_API_KEY=your_elevenlabs_api_key 284 | ELEVENLABS_VOICE_ID=your_voice_id 285 | 286 | # Together AI (Image Generation) 287 | TOGETHER_API_KEY=your_together_api_key 288 | 289 | # Google Cloud (Image Understanding) 290 | GOOGLE_CLOUD_API_KEY=your_google_cloud_api_key 291 | 292 | # Tavily (Search) 293 | TAVILY_API_KEY=your_tavily_api_key 294 | 295 | # Qdrant (Memory) 296 | QDRANT_URL=your_qdrant_url 297 | QDRANT_API_KEY=your_qdrant_api_key 298 | 299 | # WhatsApp 300 | WHATSAPP_TOKEN=your_whatsapp_token 301 | WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id 302 | WHATSAPP_VERIFY_TOKEN=your_verify_token 303 | ``` 304 | 305 | ## 8. Run the Application 306 | 307 | 1. **Make sure your virtual environment is activated** 308 | 309 | 2. **Start the application:** 310 | ```bash 311 | uvicorn main:app --reload 312 | ``` 313 | 314 | Or if you have a run script: 315 | ```bash 316 | python main.py 317 | ``` 318 | 319 | The application will start on `http://localhost:8000` 320 | 321 | 3. **Keep ngrok running** in a separate terminal window 322 | 323 | 4. **Test the setup:** 324 | - Send a WhatsApp message to your registered phone number 325 | - Check the application logs to see if messages are being received 326 | - Visit `http://127.0.0.1:4040` to see ngrok's request inspector 327 | 328 | ## 9. 
Troubleshooting 329 | 330 | ### WhatsApp Webhook Not Receiving Messages 331 | - Verify ngrok is running and the URL is correct 332 | - Check that the webhook URL in Meta matches your ngrok URL + `/whatsapp_response` 333 | - Ensure the verify token matches in both `.env` and Meta dashboard 334 | - Check that you've subscribed to message events 335 | 336 | ### Google Calendar Authorization Issues 337 | - Make sure `credentials.json` is in the `mycalendar/` directory 338 | - Delete `token.json` and re-authorize if you get permission errors 339 | - Ensure Google Calendar API is enabled in your Google Cloud project 340 | 341 | ### API Key Errors 342 | - Verify all API keys in `.env` are correct and not expired 343 | - Check that you've enabled the required APIs in each service's dashboard 344 | - For Google Cloud, ensure billing is enabled (some APIs require it) 345 | 346 | ### ngrok URL Changes 347 | - Free ngrok URLs change on each restart 348 | - Update the webhook URL in Meta dashboard each time 349 | - Consider ngrok's paid plan for a static URL 350 | 351 | ## 10. Next Steps 352 | 353 | - Test sending text messages to your WhatsApp number 354 | - Try sending voice notes and images 355 | - Test calendar operations (list events, add events) 356 | - Test search functionality 357 | - Monitor the application logs for any errors 358 | 359 | --- 360 | 361 | **Note:** Keep your `.env` file secure and never commit it to version control. The `.gitignore` file should already exclude it. 362 | 363 | -------------------------------------------------------------------------------- /modules/image/image_to_text.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import logging 3 | import os 4 | from typing import Optional, Union 5 | 6 | import httpx 7 | from core.exceptions import ImageToTextError 8 | from settings import settings 9 | 10 | 11 | class ImageToText: 12 | """A class to handle image-to-text conversion using Google Cloud Vision API (Imagen-Visual Captioning).""" 13 | 14 | REQUIRED_ENV_VARS = ["GOOGLE_CLOUD_API_KEY"] 15 | 16 | def __init__(self): 17 | """Initialize the ImageToText class.""" 18 | self._validate_env_vars() 19 | self.logger = logging.getLogger(__name__) 20 | self.api_key = settings.GOOGLE_CLOUD_API_KEY 21 | # Google Cloud Vision API endpoint for image annotation 22 | self.api_url = "https://vision.googleapis.com/v1/images:annotate" 23 | 24 | def _validate_env_vars(self) -> None: 25 | """Validate that environment variables are set.""" 26 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not getattr(settings, var, None)] 27 | if missing_vars: 28 | raise ValueError( 29 | f"Missing required environment variables: {', '.join(missing_vars)}\n" 30 | "Please set GOOGLE_CLOUD_API_KEY in your .env file." 31 | ) 32 | 33 | async def analyze_image(self, image_data: Union[str, bytes], prompt: str = "") -> str: 34 | """Analyze an image using Google Cloud Vision API (Visual Captioning). 
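
        Minimal usage sketch (assumes GOOGLE_CLOUD_API_KEY is configured in settings):

            caption = await ImageToText().analyze_image("images/img1.png")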
35 | 36 | Args: 37 | image_data: Either a file path (str) or binary image data (bytes) 38 | prompt: Optional prompt/question about the image (not used with Vision API, but kept for compatibility) 39 | 40 | Returns: 41 | str: Description or analysis of the image 42 | 43 | Raises: 44 | ValueError: If the image data is empty or invalid 45 | ImageToTextError: If the image analysis fails 46 | """ 47 | try: 48 | # Handle file path 49 | if isinstance(image_data, str): 50 | if not os.path.exists(image_data): 51 | raise ValueError(f"Image file not found: {image_data}") 52 | with open(image_data, "rb") as f: 53 | image_bytes = f.read() 54 | else: 55 | image_bytes = image_data 56 | 57 | if not image_bytes: 58 | raise ValueError("Image data cannot be empty") 59 | 60 | # Detect image format from magic bytes 61 | mime_type = "image/jpeg" # default 62 | if image_bytes.startswith(b'\x89PNG\r\n\x1a\n'): 63 | mime_type = "image/png" 64 | elif image_bytes.startswith(b'\xff\xd8\xff'): 65 | mime_type = "image/jpeg" 66 | elif image_bytes.startswith(b'RIFF') and b'WEBP' in image_bytes[:12]: 67 | mime_type = "image/webp" 68 | elif image_bytes.startswith(b'GIF8'): 69 | mime_type = "image/gif" 70 | 71 | self.logger.info(f"Detected image format: {mime_type}, size: {len(image_bytes)} bytes") 72 | self.logger.info("Using Google Cloud Vision API (Imagen-Visual Captioning)") 73 | 74 | # Encode image to base64 for Google Cloud Vision API 75 | image_base64 = base64.b64encode(image_bytes).decode("utf-8") 76 | 77 | # Prepare request payload for Google Cloud Vision API 78 | # Using Visual Captioning feature (Imagen-Visual) 79 | # Note: Visual Captioning is available through LABEL_DETECTION and OBJECT_LOCALIZATION 80 | # We'll combine these to generate a comprehensive caption 81 | payload = { 82 | "requests": [ 83 | { 84 | "image": { 85 | "content": image_base64 86 | }, 87 | "features": [ 88 | { 89 | "type": "LABEL_DETECTION", 90 | "maxResults": 10 91 | }, 92 | { 93 | "type": "OBJECT_LOCALIZATION", 94 | "maxResults": 10 95 | }, 96 | { 97 | "type": "TEXT_DETECTION", 98 | "maxResults": 10 99 | } 100 | ] 101 | } 102 | ] 103 | } 104 | 105 | headers = { 106 | "Content-Type": "application/json", 107 | } 108 | 109 | # Google Cloud Vision API uses API key as query parameter 110 | params = { 111 | "key": self.api_key 112 | } 113 | 114 | self.logger.info(f"Making API call to Google Cloud Vision API") 115 | self.logger.info(f"Image size: {len(image_bytes)} bytes, MIME type: {mime_type}") 116 | 117 | # Make the API call to Google Cloud Vision API 118 | async with httpx.AsyncClient(timeout=30.0) as client: 119 | response = await client.post( 120 | self.api_url, 121 | headers=headers, 122 | params=params, 123 | json=payload, 124 | ) 125 | 126 | self.logger.info(f"Google Cloud Vision API response status: {response.status_code}") 127 | 128 | if response.status_code != 200: 129 | error_text = response.text 130 | self.logger.error(f"Google Cloud Vision API error: {response.status_code} - {error_text}") 131 | 132 | # Parse error details if available 133 | try: 134 | error_json = response.json() 135 | error_info = error_json.get("error", {}) 136 | error_message = error_info.get("message", error_text) 137 | error_code = error_info.get("code", response.status_code) 138 | error_reason = None 139 | 140 | # Check for specific error reasons 141 | details = error_info.get("details", []) 142 | for detail in details: 143 | if detail.get("@type") == "type.googleapis.com/google.rpc.ErrorInfo": 144 | error_reason = detail.get("reason") 145 | break 146 | 147 
| # Provide specific guidance for common errors 148 | if error_code == 403 and error_reason == "API_KEY_SERVICE_BLOCKED": 149 | error_guidance = ( 150 | "\n\n❌ Google Cloud Vision API is BLOCKED for your API key.\n\n" 151 | "To fix this, you need to:\n" 152 | "1. Enable the Vision API in Google Cloud Console:\n" 153 | " - Go to: https://console.cloud.google.com/apis/library/vision.googleapis.com\n" 154 | " - Select your project (project ID: 995856091136)\n" 155 | " - Click 'Enable'\n\n" 156 | "2. Check API key restrictions:\n" 157 | " - Go to: https://console.cloud.google.com/apis/credentials\n" 158 | " - Find your API key\n" 159 | " - Make sure 'Cloud Vision API' is in the allowed APIs list\n" 160 | " - Or remove API restrictions temporarily for testing\n\n" 161 | "3. Enable billing (if required):\n" 162 | " - Vision API may require billing to be enabled\n" 163 | " - Go to: https://console.cloud.google.com/billing\n" 164 | " - Link a billing account to your project\n\n" 165 | "4. Verify the API key has correct permissions\n" 166 | ) 167 | raise ImageToTextError( 168 | f"Google Cloud Vision API error ({error_code}): {error_message}{error_guidance}" 169 | ) 170 | else: 171 | raise ImageToTextError( 172 | f"Google Cloud Vision API error ({error_code}): {error_message}" 173 | ) 174 | except Exception: 175 | raise ImageToTextError( 176 | f"Google Cloud Vision API error ({response.status_code}): {error_text}" 177 | ) 178 | 179 | result = response.json() 180 | 181 | self.logger.info(f"Google Cloud Vision API response received: {type(result)}") 182 | 183 | # Extract caption/description from Google Cloud Vision API response 184 | # Combine labels, objects, and text to create a comprehensive caption 185 | description_parts = [] 186 | 187 | if "responses" in result and len(result["responses"]) > 0: 188 | response_data = result["responses"][0] 189 | 190 | # Extract labels (descriptive tags) - these form the main caption 191 | labels = response_data.get("labelAnnotations", []) 192 | if labels: 193 | # Get top labels with their scores 194 | top_labels = [] 195 | for label in labels[:5]: 196 | desc = label.get("description", "") 197 | score = label.get("score", 0) 198 | if desc: 199 | top_labels.append(desc) 200 | if top_labels: 201 | description_parts.append(", ".join(top_labels)) 202 | 203 | # Extract localized objects (what's in the image) 204 | localized_objects = response_data.get("localizedObjectAnnotations", []) 205 | if localized_objects: 206 | object_names = [obj.get("name", "") for obj in localized_objects[:5] if obj.get("name")] 207 | if object_names: 208 | description_parts.append(f"Objects: {', '.join(object_names)}") 209 | 210 | # Extract detected text (if any text in image) 211 | text_annotations = response_data.get("textAnnotations", []) 212 | if text_annotations and len(text_annotations) > 0: 213 | # First annotation contains all detected text 214 | full_text = text_annotations[0].get("description", "") 215 | if full_text and len(full_text.strip()) > 0: 216 | # Only add text if it's meaningful (not just a few characters) 217 | if len(full_text.strip()) > 3: 218 | description_parts.append(f"Text: {full_text[:100]}") 219 | 220 | # Combine all parts into a natural description 221 | if description_parts: 222 | # Create a natural sentence from the parts 223 | if len(description_parts) == 1: 224 | description = description_parts[0] 225 | else: 226 | # Join with appropriate punctuation 227 | description = ". 
".join(description_parts) 228 | else: 229 | # Fallback: use labels or create a generic description 230 | if "responses" in result and len(result["responses"]) > 0: 231 | response_data = result["responses"][0] 232 | labels = response_data.get("labelAnnotations", []) 233 | if labels: 234 | description = labels[0].get("description", "Image analyzed but no description available") 235 | else: 236 | description = "Image analyzed but no specific details were detected" 237 | else: 238 | description = "Unable to generate image description" 239 | 240 | if not description or description.strip() == "": 241 | raise ImageToTextError("Empty description received from Google Cloud Vision API") 242 | 243 | self.logger.info(f"Generated image description: {description[:100]}...") 244 | return description.strip() 245 | 246 | except ImageToTextError: 247 | # Re-raise our custom errors 248 | raise 249 | except httpx.TimeoutException as e: 250 | self.logger.error(f"Request timeout: {e}") 251 | raise ImageToTextError(f"Request timeout: {str(e)}") from e 252 | except Exception as e: 253 | self.logger.error(f"Unexpected error in analyze_image: {e}", exc_info=True) 254 | raise ImageToTextError(f"Failed to analyze image: {str(e)}") from e 255 | -------------------------------------------------------------------------------- /graph/nodes.py: -------------------------------------------------------------------------------- 1 | import os 2 | from uuid import uuid4 3 | 4 | from langchain_core.messages import AIMessage, HumanMessage, RemoveMessage 5 | from langchain_core.runnables import RunnableConfig 6 | 7 | from .state import AICompanionState 8 | from graph.utils.chains import ( 9 | get_character_response_chain, 10 | get_router_chain, 11 | ) 12 | from graph.utils.helpers import ( 13 | get_chat_model, 14 | get_text_to_image_module, 15 | get_text_to_speech_module, 16 | get_search_module, 17 | ) 18 | from modules.memory.long_term.memory_manager import get_memory_manager 19 | from modules.schedules.context_generation import ScheduleContextGenerator 20 | from settings import settings 21 | 22 | 23 | async def router_node(state: AICompanionState): 24 | chain = get_router_chain() 25 | response = await chain.ainvoke({"messages": state["messages"][-settings.ROUTER_MESSAGES_TO_ANALYZE :]}) 26 | return {"workflow": response.response_type} 27 | 28 | 29 | def context_injection_node(state: AICompanionState): 30 | schedule_context = ScheduleContextGenerator.get_current_activity() 31 | if schedule_context != state.get("current_activity", ""): 32 | apply_activity = True 33 | else: 34 | apply_activity = False 35 | return {"apply_activity": apply_activity, "current_activity": schedule_context} 36 | 37 | 38 | async def conversation_node(state: AICompanionState, config: RunnableConfig): 39 | current_activity = ScheduleContextGenerator.get_current_activity() 40 | memory_context = state.get("memory_context", "") 41 | search_context = state.get("search_results", "") 42 | 43 | chain = get_character_response_chain( 44 | state.get("summary", ""), 45 | with_tools=False, 46 | search_context=search_context 47 | ) 48 | 49 | response = await chain.ainvoke( 50 | { 51 | "messages": state["messages"], 52 | "current_activity": current_activity, 53 | "memory_context": memory_context, 54 | }, 55 | config, 56 | ) 57 | return {"messages": AIMessage(content=response)} 58 | 59 | 60 | async def image_node(state: AICompanionState, config: RunnableConfig): 61 | current_activity = ScheduleContextGenerator.get_current_activity() 62 | memory_context = 
state.get("memory_context", "") 63 | 64 | chain = get_character_response_chain(state.get("summary", "")) 65 | text_to_image_module = get_text_to_image_module() 66 | 67 | scenario = await text_to_image_module.create_scenario(state["messages"][-5:]) 68 | os.makedirs("generated_images", exist_ok=True) 69 | img_path = f"generated_images/image_{str(uuid4())}.png" 70 | await text_to_image_module.generate_image(scenario.image_prompt, img_path) 71 | 72 | # Inject the image prompt information as an AI message 73 | scenario_message = HumanMessage(content=f"") 74 | updated_messages = state["messages"] + [scenario_message] 75 | 76 | response = await chain.ainvoke( 77 | { 78 | "messages": updated_messages, 79 | "current_activity": current_activity, 80 | "memory_context": memory_context, 81 | }, 82 | config, 83 | ) 84 | 85 | return {"messages": AIMessage(content=response), "image_path": img_path} 86 | 87 | 88 | async def audio_node(state: AICompanionState, config: RunnableConfig): 89 | current_activity = ScheduleContextGenerator.get_current_activity() 90 | memory_context = state.get("memory_context", "") 91 | 92 | chain = get_character_response_chain(state.get("summary", "")) 93 | text_to_speech_module = get_text_to_speech_module() 94 | 95 | response = await chain.ainvoke( 96 | { 97 | "messages": state["messages"], 98 | "current_activity": current_activity, 99 | "memory_context": memory_context, 100 | }, 101 | config, 102 | ) 103 | output_audio = await text_to_speech_module.synthesize(response) 104 | 105 | return {"messages": response, "audio_buffer": output_audio} 106 | 107 | 108 | async def summarize_conversation_node(state: AICompanionState): 109 | model = get_chat_model() 110 | summary = state.get("summary", "") 111 | 112 | if summary: 113 | summary_message = ( 114 | f"This is summary of the conversation to date between Ava and the user: {summary}\n\n" 115 | "Extend the summary by taking into account the new messages above:" 116 | ) 117 | else: 118 | summary_message = ( 119 | "Create a summary of the conversation above between Ava and the user. 
" 120 | "The summary must be a short description of the conversation so far, " 121 | "but that captures all the relevant information shared between Ava and the user:" 122 | ) 123 | 124 | messages = state["messages"] + [HumanMessage(content=summary_message)] 125 | response = await model.ainvoke(messages) 126 | 127 | delete_messages = [RemoveMessage(id=m.id) for m in state["messages"][: -settings.TOTAL_MESSAGES_AFTER_SUMMARY]] 128 | return {"summary": response.content, "messages": delete_messages} 129 | 130 | 131 | async def memory_extraction_node(state: AICompanionState): 132 | """Extract and store important information from the last message.""" 133 | if not state["messages"]: 134 | return {} 135 | 136 | memory_manager = get_memory_manager() 137 | await memory_manager.extract_and_store_memories(state["messages"][-1]) 138 | return {} 139 | 140 | 141 | def memory_injection_node(state: AICompanionState): 142 | """Retrieve and inject relevant memories into the character card.""" 143 | memory_manager = get_memory_manager() 144 | 145 | # Get relevant memories based on recent conversation 146 | recent_context = " ".join([m.content for m in state["messages"][-3:]]) 147 | memories = memory_manager.get_relevant_memories(recent_context) 148 | 149 | # Format memories for the character card 150 | memory_context = memory_manager.format_memories_for_prompt(memories) 151 | 152 | return {"memory_context": memory_context} 153 | 154 | 155 | async def search_node(state: AICompanionState, config: RunnableConfig): 156 | """ 157 | Node for handling internet search operations using Tavily. 158 | This node performs search and then routes to conversation node with search results. 159 | """ 160 | current_activity = ScheduleContextGenerator.get_current_activity() 161 | memory_context = state.get("memory_context", "") 162 | 163 | # Extract search query from user's last message 164 | user_message = state["messages"][-1].content if state["messages"] else "" 165 | 166 | # Perform search 167 | search_module = get_search_module() 168 | search_results_list = await search_module.search(query=user_message, max_results=5) 169 | 170 | # Format search results 171 | formatted_results = search_module.format_search_results(search_results_list) 172 | 173 | # Store search results in state 174 | search_results_str = formatted_results 175 | 176 | # generate response with search context 177 | chain = get_character_response_chain( 178 | state.get("summary", ""), 179 | with_tools=False, 180 | search_context=search_results_str 181 | ) 182 | 183 | # Create a context message with search results 184 | from langchain_core.messages import SystemMessage 185 | search_context_msg = SystemMessage(content=f"User asked: {user_message}\n\nSearch Results:\n{search_results_str}") 186 | 187 | # Generate response using search results 188 | response = await chain.ainvoke( 189 | { 190 | "messages": state["messages"] + [search_context_msg], 191 | "current_activity": current_activity, 192 | "memory_context": memory_context, 193 | }, 194 | config, 195 | ) 196 | 197 | return { 198 | "messages": AIMessage(content=response), 199 | "search_results": search_results_str 200 | } 201 | 202 | 203 | async def tool_calling_node(state: AICompanionState, config: RunnableConfig): 204 | """ 205 | Node for handling tool calls (calendar operations). 206 | This node will be used when the agent needs to use tools. 
207 | """ 208 | from datetime import datetime 209 | import pytz 210 | 211 | tz = pytz.timezone('Africa/Kampala') 212 | current_dt = datetime.now(tz) 213 | current_date = current_dt.strftime('%Y-%m-%d') 214 | current_time = current_dt.strftime('%H:%M') 215 | 216 | current_activity = ScheduleContextGenerator.get_current_activity() 217 | memory_context = state.get("memory_context", "") 218 | 219 | print(f"DEBUG: tool_calling_node - Current date: {current_date}") 220 | print(f"DEBUG: tool_calling_node - Current time: {current_time}") 221 | print(f"DEBUG: tool_calling_node - Current activity: {current_activity}") 222 | print(f"DEBUG: tool_calling_node - Memory context: {memory_context}") 223 | print(f"DEBUG: tool_calling_node - Last message: {state['messages'][-1].content}") 224 | 225 | # Use the chain with tools enabled 226 | chain = get_character_response_chain(state.get("summary", ""), with_tools=True) 227 | 228 | print(f"DEBUG: Chain created with tools enabled") 229 | 230 | enhanced_messages = state["messages"].copy() 231 | 232 | # Add a system-like context message to help with date understanding 233 | from langchain_core.messages import SystemMessage 234 | date_context_msg = SystemMessage(content=f"Current date: {current_date}, Current time: {current_time}, Timezone: Africa/Kampala") 235 | enhanced_messages.insert(-1, date_context_msg) 236 | 237 | response = await chain.ainvoke( 238 | { 239 | "messages": state["messages"], 240 | "current_activity": current_activity, 241 | "memory_context": memory_context, 242 | }, 243 | config, 244 | ) 245 | 246 | print(f"DEBUG: Initial response type: {type(response)}") 247 | print(f"DEBUG: Initial response: {response}") 248 | print(f"DEBUG: Initial response content: {getattr(response, 'content', 'No content attr')}") 249 | print(f"DEBUG: Has tool_calls: {hasattr(response, 'tool_calls')}") 250 | if hasattr(response, 'tool_calls'): 251 | print(f"DEBUG: Tool calls: {response.tool_calls}") 252 | print(f"DEBUG: Tool calls length: {len(response.tool_calls) if response.tool_calls else 0}") 253 | 254 | # Handle tool calls if present 255 | if hasattr(response, 'tool_calls') and response.tool_calls: 256 | print(f"DEBUG: Processing {len(response.tool_calls)} tool calls") 257 | 258 | # The model wants to use tools 259 | from langchain_core.messages import ToolMessage 260 | from mycalendar.langchain_integration import get_calendar_tools 261 | 262 | tools = {tool.name: tool for tool in get_calendar_tools()} 263 | print(f"DEBUG: Available tools: {list(tools.keys())}") 264 | 265 | tool_messages = [] 266 | 267 | for tool_call in response.tool_calls: 268 | tool_name = tool_call["name"] 269 | tool_args = tool_call["args"] 270 | 271 | print(f"DEBUG: Executing tool {tool_name} with args {tool_args}") 272 | 273 | if tool_name in tools: 274 | try: 275 | # Execute the tool using invoke method 276 | tool_result = await tools[tool_name].ainvoke(tool_args) 277 | print(f"DEBUG: Tool result: {tool_result}") 278 | tool_messages.append( 279 | ToolMessage( 280 | content=str(tool_result), 281 | tool_call_id=tool_call["id"] 282 | ) 283 | ) 284 | except Exception as e: 285 | print(f"DEBUG: Tool execution error: {e}") 286 | tool_messages.append( 287 | ToolMessage( 288 | content=f"Error executing {tool_name}: {str(e)}", 289 | tool_call_id=tool_call["id"] 290 | ) 291 | ) 292 | else: 293 | print(f"DEBUG: Tool {tool_name} not found in available tools") 294 | 295 | # Get final response after tool execution 296 | updated_messages = state["messages"] + [response] + tool_messages 297 | 
print(f"DEBUG: Getting final response with {len(tool_messages)} tool results") 298 | 299 | final_response = await chain.ainvoke( 300 | { 301 | "messages": updated_messages, 302 | "current_activity": current_activity, 303 | "memory_context": memory_context, 304 | }, 305 | config, 306 | ) 307 | 308 | print(f"DEBUG: Final response type: {type(final_response)}") 309 | print(f"DEBUG: Final response content: {getattr(final_response, 'content', 'No content attr')}") 310 | 311 | # Ensure we return an AIMessage with content 312 | from langchain_core.messages import AIMessage 313 | if hasattr(final_response, 'content') and final_response.content: 314 | return {"messages": final_response} 315 | else: 316 | # Fallback message if response is empty 317 | return {"messages": AIMessage(content="I've checked your calendar. Let me know if you need anything else!")} 318 | else: 319 | # No tool calls, this means the model didn't decide to use tools 320 | print(f"DEBUG: No tool calls generated - this might indicate an issue with tool binding") 321 | print(f"DEBUG: Response content: {getattr(response, 'content', 'No content')}") 322 | 323 | # Check if we have a valid text response 324 | if hasattr(response, 'content') and response.content and response.content.strip(): 325 | return {"messages": response} 326 | else: 327 | # Generate a calendar response manually since tools weren't called 328 | print(f"DEBUG: Manually calling calendar tool since model didn't generate tool calls") 329 | try: 330 | from mycalendar.langchain_integration import list_upcoming_events 331 | # Use invoke method instead of __call__, and pass arguments correctly 332 | calendar_result = await list_upcoming_events.ainvoke({"max_results": 10}) 333 | from langchain_core.messages import AIMessage 334 | return {"messages": AIMessage(content=f"Let me check your schedule for you!\n\n{calendar_result}")} 335 | except Exception as e: 336 | print(f"DEBUG: Manual calendar call failed: {e}") 337 | from langchain_core.messages import AIMessage 338 | return {"messages": AIMessage(content="I'd like to help you check your schedule, but I'm having trouble accessing your calendar right now. Please try again!")} -------------------------------------------------------------------------------- /mycalendar/calendar_tool.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | from datetime import datetime, timedelta, timezone 4 | from typing import List, Optional, Dict, Any 5 | from google.auth.transport.requests import Request 6 | from google.oauth2.credentials import Credentials 7 | from google_auth_oauthlib.flow import InstalledAppFlow 8 | from googleapiclient.discovery import build 9 | from googleapiclient.errors import HttpError 10 | 11 | SCOPES = ['https://www.googleapis.com/auth/calendar.readonly', 12 | 'https://www.googleapis.com/auth/calendar.events'] 13 | 14 | logger = logging.getLogger(__name__) 15 | 16 | class CalendarTool: 17 | """ 18 | A tool for interacting with a Google Calendar using the Google Calendar API. 19 | Requires initial setup of credentials.json and user authorization flow. 20 | """ 21 | 22 | def __init__(self, credentials_file: str = "mycalendar\credentials.json", token_file: str = "token.json"): 23 | """ 24 | Initializes the CalendarTool. 25 | Args: 26 | credentials_file: Path to the downloaded credentials.json file. 27 | token_file: Path to store/load user authorization tokens. 
28 | """ 29 | self.credentials_file = credentials_file 30 | self.token_file = token_file 31 | self.service = None 32 | self._authenticate() 33 | 34 | def _authenticate(self): 35 | """Handles the authentication flow with Google Calendar API.""" 36 | creds = None 37 | if os.path.exists(self.token_file): 38 | creds = Credentials.from_authorized_user_file(self.token_file, SCOPES) 39 | if not creds or not creds.valid: 40 | if creds and creds.expired and creds.refresh_token: 41 | creds.refresh(Request()) 42 | else: 43 | flow = InstalledAppFlow.from_client_secrets_file( 44 | self.credentials_file, SCOPES) 45 | creds = flow.run_local_server(port=0) 46 | with open(self.token_file, 'w') as token: 47 | token.write(creds.to_json()) 48 | 49 | try: 50 | self.service = build('calendar', 'v3', credentials=creds) 51 | logger.info("Successfully authenticated with Google Calendar.") 52 | except HttpError as error: 53 | logger.error(f"An error occurred during authentication: {error}") 54 | self.service = None 55 | 56 | 57 | async def list_upcoming_events(self, max_results: int = 10) -> List[Dict[str, Any]]: 58 | """ 59 | Lists the upcoming events on the user's primary calendar. 60 | Args: 61 | max_results: Maximum number of events to return. 62 | Returns: 63 | A list of dictionaries representing events, or an empty list on error. 64 | """ 65 | if not self.service: 66 | logger.error("Calendar service not initialized.") 67 | return [{"error": "Calendar service not initialized."}] 68 | 69 | try: 70 | now = datetime.utcnow().isoformat() + 'Z' 71 | logger.info(f"Getting the upcoming {max_results} events") 72 | events_result = self.service.events().list(calendarId='primary', timeMin=now, 73 | maxResults=max_results, singleEvents=True, 74 | orderBy='startTime').execute() 75 | events = events_result.get('items', []) 76 | 77 | formatted_events = [] 78 | if not events: 79 | logger.info('No upcoming events found.') 80 | return [{"summary": "No upcoming events found."}] 81 | for event in events: 82 | start = event['start'].get('dateTime', event['start'].get('date')) 83 | formatted_events.append({ 84 | "summary": event.get('summary', 'No Title'), 85 | "start": start, 86 | "id": event.get('id') 87 | }) 88 | logger.info(f"Found {len(formatted_events)} upcoming events.") 89 | return formatted_events 90 | 91 | except HttpError as error: 92 | logger.error(f"An error occurred while fetching events: {error}") 93 | return [{"error": f"An error occurred while fetching events: {error}"}] 94 | 95 | async def add_event(self, summary: str, start_time: str, end_time: str, description: str = "") -> Dict[str, Any]: 96 | """ 97 | Adds an event to the user's primary calendar. 98 | Args: 99 | summary: Title of the event. 100 | start_time: Start time in ISO 8601 format (e.g., '2024-07-15T10:00:00'). 101 | end_time: End time in ISO 8601 format (e.g., '2024-07-15T11:00:00'). 102 | description: Optional description of the event. 103 | Returns: 104 | A dictionary with the result (success/failure message or event details) or error. 105 | """ 106 | # Validate input times 107 | try: 108 | datetime.fromisoformat(start_time.replace('Z', '+00:00')) # Handle 'Z' suffix 109 | datetime.fromisoformat(end_time.replace('Z', '+00:00')) 110 | except ValueError: 111 | error_msg = "Invalid date/time format. Please use ISO 8601 (e.g., '2024-07-15T10:00:00')." 112 | logger.error(error_msg) 113 | return {"error": error_msg} 114 | 115 | if not self.service: 116 | error_msg = "Calendar service not initialized." 
117 | logger.error(error_msg) 118 | return {"error": error_msg} 119 | 120 | event = { 121 | 'summary': summary, 122 | 'location': '', 123 | 'description': description, 124 | 'start': { 125 | 'dateTime': start_time, 126 | 'timeZone': 'UTC', 127 | }, 128 | 'end': { 129 | 'dateTime': end_time, 130 | 'timeZone': 'UTC', 131 | }, 132 | 'reminders': { 133 | 'useDefault': False, 134 | 'overrides': [ 135 | {'method': 'email', 'minutes': 24 * 60}, 136 | {'method': 'popup', 'minutes': 10}, 137 | ], 138 | }, 139 | } 140 | try: 141 | event_result = self.service.events().insert(calendarId='primary', body=event).execute() 142 | success_msg = f"Event created: {event_result.get('htmlLink')}" 143 | logger.info(success_msg) 144 | return { 145 | "status": "success", 146 | "summary": event_result.get('summary'), 147 | "start": event_result['start'].get('dateTime', event_result['start'].get('date')), 148 | "id": event_result.get('id'), 149 | "link": event_result.get('htmlLink') 150 | } 151 | except HttpError as error: 152 | error_msg = f"An error occurred while adding the event: {error}" 153 | logger.error(error_msg) 154 | return {"error": error_msg} 155 | 156 | # Example: Check for events happening now or soon 157 | async def get_current_or_next_event(self, lookahead_minutes: int = 30) -> Optional[Dict[str, Any]]: 158 | """ 159 | Checks for events happening right now or within the next 'lookahead_minutes'. 160 | Args: 161 | lookahead_minutes: How many minutes into the future to check. 162 | Returns: 163 | A dictionary representing the event if found, otherwise None. 164 | """ 165 | if not self.service: 166 | logger.error("Calendar service not initialized.") 167 | return None 168 | 169 | try: 170 | now = datetime.utcnow() 171 | time_min = now.isoformat() + 'Z' 172 | time_max = (now + timedelta(minutes=lookahead_minutes)).isoformat() + 'Z' 173 | 174 | logger.info(f"Checking for events between {time_min} and {time_max}") 175 | events_result = self.service.events().list( 176 | calendarId='primary', 177 | timeMin=time_min, 178 | timeMax=time_max, 179 | singleEvents=True, 180 | orderBy='startTime' 181 | ).execute() 182 | events = events_result.get('items', []) 183 | 184 | if events: 185 | event = events[0] # Get the first (closest) event 186 | start = event['start'].get('dateTime', event['start'].get('date')) 187 | logger.info(f"Upcoming/Current event found: {event.get('summary')}") 188 | return { 189 | "summary": event.get('summary', 'No Title'), 190 | "start": start, 191 | "id": event.get('id') 192 | } 193 | else: 194 | logger.info("No events found in the specified time window.") 195 | return None 196 | 197 | except HttpError as error: 198 | logger.error(f"An error occurred while checking for current/next event: {error}") 199 | return None 200 | 201 | 202 | 203 | 204 | # Example usage (if run directly or for testing) 205 | # run in terminal for testing(uv run calendar_tool.py OR py calendar_tool.py) 206 | # Tests include: 207 | # listing 3 upcoming events 208 | # listing more upcoming events(10) 209 | # Check current or next event (within 30 minutes) 210 | # Check within a longer timeframe 211 | # Add a test event 212 | # Calculate times for a test event (1 hour from now, lasting 30 minutes) 213 | # Test different time formats 214 | # Edge cases for listing events 215 | # Different time windows for current/next event 216 | 217 | # if __name__ == "__main__": 218 | # import asyncio 219 | # from datetime import datetime, timezone, timedelta 220 | 221 | # async def main(): 222 | # print("🚀 Starting CalendarTool Test 
Suite...") 223 | # print("=" * 50) 224 | 225 | # # Initialize the tool 226 | # tool = CalendarTool() 227 | 228 | # # Test 1: List upcoming events (different quantities) 229 | # print("\n📅 TEST 1: Listing upcoming events") 230 | # print("-" * 30) 231 | # events = await tool.list_upcoming_events(3) 232 | # print(f"Next 3 events: {len(events)} found") 233 | # for i, event in enumerate(events, 1): 234 | # print(f" {i}. {event.get('summary', 'No title')} - {event.get('start', 'No date')}") 235 | 236 | # # Test 2: List more events 237 | # print(f"\nFetching more events (10)...") 238 | # more_events = await tool.list_upcoming_events(10) 239 | # print(f"Next 10 events: {len(more_events)} found") 240 | 241 | # # Test 3: Check current or next event (within 30 minutes) 242 | # print("\n⏰ TEST 2: Checking current/next event (30 min window)") 243 | # print("-" * 30) 244 | # current_event = await tool.get_current_or_next_event(30) 245 | # if current_event: 246 | # print(f"Found current/upcoming event: {current_event['summary']} at {current_event['start']}") 247 | # else: 248 | # print("No events in the next 30 minutes") 249 | 250 | # # Test 4: Check within a longer timeframe 251 | # print(f"\nChecking within 2 hours...") 252 | # next_event_2h = await tool.get_current_or_next_event(120) 253 | # if next_event_2h: 254 | # print(f"Event within 2 hours: {next_event_2h['summary']} at {next_event_2h['start']}") 255 | # else: 256 | # print("No events in the next 2 hours") 257 | 258 | # # Test 5: Add a test event (uncomment to test) 259 | # print("\n➕ TEST 3: Adding a test event") 260 | # print("-" * 30) 261 | 262 | # # Calculate times for a test event (1 hour from now, lasting 30 minutes) 263 | # now = datetime.now(timezone.utc) 264 | # start_time = (now + timedelta(hours=1)).isoformat().replace('+00:00', 'Z') 265 | # end_time = (now + timedelta(hours=1, minutes=30)).isoformat().replace('+00:00', 'Z') 266 | 267 | # print(f"Adding test event from {start_time} to {end_time}") 268 | # result = await tool.add_event( 269 | # "🧪 Test Event from CalendarTool", 270 | # start_time, 271 | # end_time, 272 | # "This is a test event created by the CalendarTool script. You can delete this." 
273 | # ) 274 | 275 | # if result.get('status') == 'success': 276 | # print(f"✅ Event created successfully!") 277 | # print(f" Title: {result['summary']}") 278 | # print(f" Start: {result['start']}") 279 | # print(f" Event ID: {result['id']}") 280 | # print(f" Link: {result.get('link', 'N/A')}") 281 | # else: 282 | # print(f"❌ Failed to create event: {result.get('error', 'Unknown error')}") 283 | 284 | # # Test 6: Test different time formats 285 | # print("\n🕐 TEST 4: Testing different time formats") 286 | # print("-" * 30) 287 | 288 | # # Test with different time formats (these should fail gracefully) 289 | # invalid_formats = [ 290 | # ("Invalid format 1", "2024-13-45", "2024-13-46", "Should fail - invalid date"), 291 | # ("Invalid format 2", "not-a-date", "also-not-a-date", "Should fail - not a date"), 292 | # ] 293 | 294 | # for title, start, end, description in invalid_formats: 295 | # print(f"Testing {title}...") 296 | # result = await tool.add_event(title, start, end, description) 297 | # if 'error' in result: 298 | # print(f" ✅ Correctly caught error: {result['error']}") 299 | # else: 300 | # print(f" ❌ Should have failed but didn't: {result}") 301 | 302 | # # Test 7: Edge cases for listing events 303 | # print("\n🔍 TEST 5: Edge cases") 304 | # print("-" * 30) 305 | 306 | # # Test listing 0 events 307 | # zero_events = await tool.list_upcoming_events(0) 308 | # print(f"Requesting 0 events returned: {len(zero_events)} items") 309 | 310 | # # Test listing many events 311 | # many_events = await tool.list_upcoming_events(50) 312 | # print(f"Requesting 50 events returned: {len(many_events)} items") 313 | 314 | # # Test 8: Different time windows for current/next event 315 | # print("\n🎯 TEST 6: Different time windows") 316 | # print("-" * 30) 317 | 318 | # time_windows = [5, 15, 60, 240, 1440] # 5 min, 15 min, 1 hour, 4 hours, 24 hours 319 | # for minutes in time_windows: 320 | # event = await tool.get_current_or_next_event(minutes) 321 | # if event: 322 | # print(f" Within {minutes} minutes: {event['summary']}") 323 | # else: 324 | # print(f" Within {minutes} minutes: No events") 325 | 326 | # print("\n" + "=" * 50) 327 | # print("🎉 Test Suite Complete!") 328 | # print("\nIf you want to test adding events, uncomment the add_event lines above") 329 | # print("and modify the times to be in the future.") 330 | 331 | # asyncio.run(main()) 332 | -------------------------------------------------------------------------------- /whatsapp_response.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | from io import BytesIO 4 | from typing import Dict 5 | 6 | import httpx 7 | from fastapi import APIRouter, Request, Response 8 | from langchain_core.messages import HumanMessage 9 | from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver 10 | 11 | from graph import graph_builder 12 | from modules.image import ImageToText 13 | from modules.speech import SpeechToText, TextToSpeech 14 | from settings import settings 15 | 16 | logger = logging.getLogger(__name__) 17 | 18 | # Global module instances 19 | speech_to_text = SpeechToText() 20 | text_to_speech = TextToSpeech() 21 | image_to_text = ImageToText() 22 | 23 | # Router for WhatsApp response 24 | whatsapp_router = APIRouter() 25 | 26 | # WhatsApp API credentials 27 | WHATSAPP_TOKEN = os.getenv("WHATSAPP_TOKEN") 28 | WHATSAPP_PHONE_NUMBER_ID = os.getenv("WHATSAPP_PHONE_NUMBER_ID") 29 | 30 | # Add a simple in-memory store to prevent duplicate processing 31 | processed_messages = 
set() 32 | 33 | 34 | @whatsapp_router.api_route("/whatsapp_response", methods=["GET", "POST"]) 35 | async def whatsapp_handler(request: Request) -> Response: 36 | """Handles incoming messages and status updates from the WhatsApp Cloud API.""" 37 | 38 | if request.method == "GET": 39 | params = request.query_params 40 | if params.get("hub.verify_token") == os.getenv("WHATSAPP_VERIFY_TOKEN"): 41 | return Response(content=params.get("hub.challenge"), status_code=200) 42 | return Response(content="Verification token mismatch", status_code=403) 43 | 44 | try: 45 | data = await request.json() 46 | logger.info(f"Received webhook data: {data}") 47 | 48 | change_value = data["entry"][0]["changes"][0]["value"] 49 | 50 | if "messages" in change_value: 51 | message = change_value["messages"][0] 52 | message_id = message.get("id") 53 | from_number = message["from"] 54 | 55 | # Prevent duplicate processing 56 | if message_id in processed_messages: 57 | logger.info(f"Message {message_id} already processed, skipping") 58 | return Response(content="Message already processed", status_code=200) 59 | 60 | processed_messages.add(message_id) 61 | # Keep only last 1000 message IDs to prevent memory issues 62 | if len(processed_messages) > 1000: 63 | processed_messages.clear() 64 | 65 | session_id = from_number 66 | 67 | # Get user message and handle different message types 68 | content = "" 69 | if message["type"] == "audio": 70 | content = await process_audio_message(message) 71 | elif message["type"] == "image": 72 | # Get image caption if any 73 | content = message.get("image", {}).get("caption", "") 74 | logger.info(f"Received image message. Caption: {content}") 75 | 76 | # Download and analyze image 77 | try: 78 | image_id = message["image"]["id"] 79 | logger.info(f"Downloading image with ID: {image_id}") 80 | image_bytes = await download_media(image_id) 81 | logger.info(f"Downloaded image successfully. Size: {len(image_bytes)} bytes") 82 | 83 | # Analyze the image 84 | description = await image_to_text.analyze_image( 85 | image_bytes, 86 | "Please describe what you see in this image in the context of our conversation.", 87 | ) 88 | logger.info(f"Image analysis successful. 
Description length: {len(description)}") 89 | content += f"\n[Image Analysis: {description}]" 90 | except Exception as e: 91 | logger.error(f"Failed to analyze image: {e}", exc_info=True) 92 | # Still include caption if available, but mark that image analysis failed 93 | if not content: 94 | content = "[Image received but could not be analyzed]" 95 | else: 96 | content = message["text"]["body"] 97 | 98 | logger.info(f"Processing message from {from_number}: {content}") 99 | 100 | try: 101 | async with AsyncSqliteSaver.from_conn_string(settings.SHORT_TERM_MEMORY_DB_PATH) as short_term_memory: 102 | graph = graph_builder.compile(checkpointer=short_term_memory) 103 | await graph.ainvoke( 104 | {"messages": [HumanMessage(content=content)]}, 105 | {"configurable": {"thread_id": session_id}}, 106 | ) 107 | 108 | # Get the workflow type and response from the state 109 | output_state = await graph.aget_state(config={"configurable": {"thread_id": session_id}}) 110 | 111 | workflow = output_state.values.get("workflow", "conversation") 112 | response_message = output_state.values["messages"][-1].content 113 | 114 | # Handle different response types based on workflow 115 | success = False 116 | if workflow == "audio": 117 | audio_buffer = output_state.values["audio_buffer"] 118 | success = await send_response(from_number, response_message, "audio", audio_buffer) 119 | elif workflow == "image": 120 | image_path = output_state.values["image_path"] 121 | with open(image_path, "rb") as f: 122 | image_data = f.read() 123 | success = await send_response(from_number, response_message, "image", image_data) 124 | else: 125 | success = await send_response(from_number, response_message, "text") 126 | 127 | if success: 128 | logger.info("Message sent successfully") 129 | return Response(content="Message processed successfully", status_code=200) 130 | else: 131 | logger.error("Failed to send message to WhatsApp API") 132 | # Still return 200 to prevent WhatsApp from retrying 133 | return Response(content="Message processed but failed to send", status_code=200) 134 | 135 | except Exception as e: 136 | logger.error(f"Error processing graph: {e}", exc_info=True) 137 | # Return 200 to prevent retry loop 138 | return Response(content="Graph processing error", status_code=200) 139 | 140 | elif "statuses" in change_value: 141 | logger.info("Status update received") 142 | return Response(content="Status update received", status_code=200) 143 | 144 | else: 145 | logger.warning("Unknown event type received") 146 | return Response(content="Unknown event type", status_code=200) 147 | 148 | except Exception as e: 149 | logger.error(f"Error processing webhook: {e}", exc_info=True) 150 | # Return 200 to prevent WhatsApp from retrying 151 | return Response(content="Webhook processing error", status_code=200) 152 | 153 | 154 | async def download_media(media_id: str) -> bytes: 155 | """Download media from WhatsApp.""" 156 | if not WHATSAPP_TOKEN: 157 | logger.error("WHATSAPP_TOKEN is not set in environment variables") 158 | raise ValueError( 159 | "WHATSAPP_TOKEN is not set. Please check your .env file and ensure WHATSAPP_TOKEN is configured." 
160 | ) 161 | 162 | if len(WHATSAPP_TOKEN) < 10: 163 | logger.warning(f"WHATSAPP_TOKEN appears to be invalid (too short: {len(WHATSAPP_TOKEN)} chars)") 164 | 165 | media_metadata_url = f"https://graph.facebook.com/v21.0/{media_id}" 166 | headers = {"Authorization": f"Bearer {WHATSAPP_TOKEN}"} 167 | 168 | async with httpx.AsyncClient(timeout=30.0) as client: 169 | logger.info(f"Fetching media metadata from: {media_metadata_url}") 170 | logger.debug(f"Using token: {WHATSAPP_TOKEN[:10]}...{WHATSAPP_TOKEN[-10:] if len(WHATSAPP_TOKEN) > 20 else '***'}") 171 | 172 | try: 173 | metadata_response = await client.get(media_metadata_url, headers=headers) 174 | 175 | # Check for 401 Unauthorized specifically 176 | if metadata_response.status_code == 401: 177 | error_detail = metadata_response.text 178 | logger.error(f"Facebook API 401 Unauthorized error: {error_detail}") 179 | logger.error( 180 | "Your WHATSAPP_TOKEN may be expired, invalid, or missing required permissions. " 181 | "Please check:\n" 182 | "1. The token is valid in your .env file\n" 183 | "2. The token hasn't expired (Facebook tokens expire)\n" 184 | "3. The token has the required permissions for media access\n" 185 | "4. You're using the correct token for your WhatsApp Business API account" 186 | ) 187 | 188 | metadata_response.raise_for_status() 189 | except httpx.HTTPStatusError as e: 190 | if e.response.status_code == 401: 191 | logger.error( 192 | f"Authentication failed with Facebook API. " 193 | f"Please verify your WHATSAPP_TOKEN is correct and has not expired." 194 | ) 195 | raise 196 | metadata = metadata_response.json() 197 | logger.info(f"Media metadata: {metadata}") 198 | 199 | download_url = metadata.get("url") 200 | if not download_url: 201 | raise ValueError(f"No download URL found in metadata: {metadata}") 202 | 203 | logger.info(f"Downloading media from: {download_url}") 204 | media_response = await client.get(download_url, headers=headers) 205 | media_response.raise_for_status() 206 | 207 | content = media_response.content 208 | logger.info(f"Downloaded media successfully. Size: {len(content)} bytes") 209 | return content 210 | 211 | 212 | async def process_audio_message(message: Dict) -> str: 213 | """Download and transcribe audio message.""" 214 | audio_id = message["audio"]["id"] 215 | media_metadata_url = f"https://graph.facebook.com/v21.0/{audio_id}" 216 | headers = {"Authorization": f"Bearer {WHATSAPP_TOKEN}"} 217 | 218 | async with httpx.AsyncClient() as client: 219 | metadata_response = await client.get(media_metadata_url, headers=headers) 220 | metadata_response.raise_for_status() 221 | metadata = metadata_response.json() 222 | download_url = metadata.get("url") 223 | 224 | # Download the audio file 225 | async with httpx.AsyncClient() as client: 226 | audio_response = await client.get(download_url, headers=headers) 227 | audio_response.raise_for_status() 228 | 229 | # Prepare for transcription 230 | audio_buffer = BytesIO(audio_response.content) 231 | audio_buffer.seek(0) 232 | audio_data = audio_buffer.read() 233 | 234 | return await speech_to_text.transcribe(audio_data) 235 | 236 | 237 | async def send_response( 238 | from_number: str, 239 | response_text: str, 240 | message_type: str = "text", 241 | media_content: bytes = None, 242 | ) -> bool: 243 | """Send response to user via WhatsApp API.""" 244 | 245 | # Validate response_text is not empty 246 | if not response_text or response_text.strip() == "": 247 | logger.warning(f"Empty response_text detected. 
Setting default message.") 248 | response_text = "I'm processing your request. Let me get back to you!" 249 | 250 | # Validate credentials first 251 | if not WHATSAPP_TOKEN or not WHATSAPP_PHONE_NUMBER_ID: 252 | logger.error("Missing WhatsApp credentials") 253 | return False 254 | 255 | print(f"DEBUG: Sending message type: {message_type}") 256 | print(f"DEBUG: Response text: '{response_text}'") 257 | print(f"DEBUG: Response text length: {len(response_text) if response_text else 0}") 258 | 259 | headers = { 260 | "Authorization": f"Bearer {WHATSAPP_TOKEN}", 261 | "Content-Type": "application/json", 262 | } 263 | 264 | if message_type in ["audio", "image"]: 265 | try: 266 | mime_type = "audio/mpeg" if message_type == "audio" else "image/png" 267 | media_buffer = BytesIO(media_content) 268 | media_id = await upload_media(media_buffer, mime_type) 269 | json_data = { 270 | "messaging_product": "whatsapp", 271 | "to": from_number, 272 | "type": message_type, 273 | message_type: {"id": media_id}, 274 | } 275 | 276 | # Add caption for images 277 | if message_type == "image": 278 | json_data["image"]["caption"] = response_text 279 | except Exception as e: 280 | logger.error(f"Media upload failed, falling back to text: {e}") 281 | message_type = "text" 282 | 283 | if message_type == "text": 284 | json_data = { 285 | "messaging_product": "whatsapp", 286 | "to": from_number, 287 | "type": "text", 288 | "text": {"body": response_text}, 289 | } 290 | 291 | logger.info(f"Sending to WhatsApp API - Headers: {headers}") 292 | logger.info(f"Sending to WhatsApp API - Payload: {json_data}") 293 | 294 | try: 295 | async with httpx.AsyncClient(timeout=30.0) as client: 296 | response = await client.post( 297 | f"https://graph.facebook.com/v21.0/{WHATSAPP_PHONE_NUMBER_ID}/messages", 298 | headers=headers, 299 | json=json_data, 300 | ) 301 | 302 | logger.info(f"WhatsApp API response status: {response.status_code}") 303 | logger.info(f"WhatsApp API response body: {response.text}") 304 | 305 | if response.status_code == 200: 306 | return True 307 | else: 308 | logger.error(f"WhatsApp API error: {response.status_code} - {response.text}") 309 | return False 310 | 311 | except httpx.TimeoutException: 312 | logger.error("Timeout when sending message to WhatsApp API") 313 | return False 314 | except Exception as e: 315 | logger.error(f"Exception when sending message to WhatsApp API: {e}") 316 | return False 317 | 318 | 319 | async def upload_media(media_content: BytesIO, mime_type: str) -> str: 320 | """Upload media to WhatsApp servers.""" 321 | headers = {"Authorization": f"Bearer {WHATSAPP_TOKEN}"} 322 | files = {"file": ("response.mp3", media_content, mime_type)} 323 | data = {"messaging_product": "whatsapp", "type": mime_type} 324 | 325 | async with httpx.AsyncClient(timeout=30.0) as client: 326 | response = await client.post( 327 | f"https://graph.facebook.com/v21.0/{WHATSAPP_PHONE_NUMBER_ID}/media", 328 | headers=headers, 329 | files=files, 330 | data=data, 331 | ) 332 | result = response.json() 333 | 334 | if "id" not in result: 335 | raise Exception(f"Failed to upload media: {result}") 336 | return result["id"] -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Introduction and Project Overview 2 | 3 | ![Alt text](./images/image1.png) 4 | 5 | Kylie is a "Whatsapp Agent”, meaning it will interact with you through this app. 
But it won’t just rely on “regular” text messages, it will also listen to your voice notes (yes, even if you are one of those people 😒) and react to your pictures, and it will also be able to look at your calendar, check your tasks, add tasks and reminders, and even search the internet.
6 | 
7 | And that’s not all, Kylie can also respond with her own voice notes and images of what she’s up to - yes, Kylie has a life beyond talking to you, don’t be such a narcissist! 😂. Kylie is named after a friend of mine who lives in Naalya.
8 | 
9 | ## At this point, you might be wondering:
10 | 
11 | What kind of system have we implemented to handle multimodal inputs / outputs coherently?
12 | 
13 | The short answer: Kylie’s brain is just a graph, a LangGraph 🕸️ (sorry, I couldn’t resist).
14 | 
15 | ## Kylie’s Graph
16 | Your brain is made up of neurons, right? Well, Kylie’s brain is made up of LangGraph nodes and edges - one for image processing, another for listening to your voice, another for fetching relevant memories, and so on.
17 | 
18 | At her core, Kylie is simply a graph with a state. This state maintains all the key details of the conversation, including shared information (text, audio or images), current activities, and contextual information.
19 | 
20 | This is exactly what we'll explore in the second module below, where you'll learn how LangGraph can be used to build agentic design architectures, such as the router.
21 | 
22 | ![Alt text](./images/kylie_graph.png)
23 | 
24 | ## WhatsApp Integration
25 | 
26 | Kylie receives messages through WhatsApp Cloud API webhooks. The integration handles:
27 | 
28 | - **Message Reception**: FastAPI endpoint (`/whatsapp_response`) receives webhook events from WhatsApp
29 | - **Message Types**: Supports text, audio (voice notes), and image messages
30 | - **Audio Processing**: Downloads audio files from WhatsApp, transcribes them using STT, and processes the text
31 | - **Image Processing**: Downloads images from WhatsApp, analyzes them using Google Cloud Vision, and includes descriptions in conversation
32 | - **Response Sending**: Sends responses back via WhatsApp API in text, audio, or image format
33 | - **Session Management**: Uses phone numbers as thread IDs for conversation continuity
34 | - **State Persistence**: Graph state is saved to SQLite using AsyncSqliteSaver checkpointing
35 | 
36 | ## Graph Compilation and Execution
37 | 
38 | The graph is compiled with a checkpointer for state persistence:
39 | 
40 | - **Checkpointer**: `AsyncSqliteSaver` saves conversation state to SQLite database
41 | - **Thread-based Sessions**: Each user (phone number) has a unique thread ID for isolated conversations
42 | - **State Recovery**: Previous conversation state is automatically loaded when processing new messages
43 | - **Graph Flow**: START → Memory Extraction → Router → Context Injection → Memory Injection → Workflow Branch → Summarization (conditional) → END
44 | 
45 | ## Kylie’s memory
46 | An Agent without memory is like talking to the main character of “Memento” (and if you haven’t seen that film… seriously, what are you doing with your life?).
47 | 
48 | Kylie has two types of memory:
49 | 
50 | 🔷 Short term memory
51 | The usual - it stores the sequence of messages to maintain conversation context. In our case, we save this sequence in SQLite (we are also storing a summary of the conversation).
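In code, this takes only a few lines. The snippet below roughly condenses what `whatsapp_response.py` does when a message comes in: compile the graph with an `AsyncSqliteSaver` checkpointer and invoke it with the sender's phone number as the thread id (the `reply` helper name here is just for illustration).

```python
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver

from graph import graph_builder
from settings import settings


async def reply(session_id: str, text: str) -> str:
    # Every checkpoint of the graph state (messages + summary) lands in SQLite.
    async with AsyncSqliteSaver.from_conn_string(settings.SHORT_TERM_MEMORY_DB_PATH) as checkpointer:
        graph = graph_builder.compile(checkpointer=checkpointer)
        # thread_id = the user's phone number, so each chat keeps its own history.
        output = await graph.ainvoke(
            {"messages": [HumanMessage(content=text)]},
            {"configurable": {"thread_id": session_id}},
        )
    return output["messages"][-1].content
```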
52 | 
53 | 🔷 Long term memory
54 | When you meet someone, you don’t remember everything they say; you retain only the key details, like their name, profession, or where they’re from, right? That’s exactly what we wanted to replicate with Qdrant - extracting relevant information from the conversation and storing it as embeddings.
55 | 
56 | We’ll cover memory in more detail in Module 3.
57 | 
58 | 
59 | ## Kylie’s senses
60 | Real WhatsApp conversations aren’t limited to just text. Think about it - do you remember the last cringe sticker your friend sent you last week? Or that never-ending voice note from your high school friend? Exactly. We need both images and audio.
61 | 
62 | To make this possible, we’ve selected the following tools.
63 | 
64 | 🔷 Text
65 | I am using Groq models for all text generation. Specifically, I’ve chosen llama-3.3-70b-versatile as the core LLM.
66 | 
67 | 🔷 Images
68 | The image module handles two tasks: processing user images and generating new ones (take a look at the image below).
69 | 
70 | For image “understanding”, I've used google-cloud-vision.
71 | 
72 | For image generation, black-forest-labs/FLUX.1-schnell-Free via Together AI.
73 | 
74 | 🔷 Audio
75 | The audio module needs to take care of TTS (Text-To-Speech) and STT (Speech-To-Text).
76 | 
77 | For TTS, I'm using Elevenlabs voices.
78 | 
79 | For STT, whisper-large-v3-turbo from Groq.
80 | 
81 | I'll share more about the audio module in Module 4 and the image module in Module 5!
82 | 
83 | 
84 | ## Module 2 (Dissecting Kylie's Brain)
85 | 
86 | Picture this: you’re a mad scientist living in a creepy old house in the middle of the forest, and your mission is to build a sentient robot. What’s the first thing you’d do?
87 | 
88 | Yep, you’d start with the brain, right? 🧠
89 | 
90 | So, when I started building Kylie, I also kicked things off with the “brain”.
91 | 
92 | And that’s exactly what this section is all about - building Kylie’s brain using LangGraph! 🕸️
93 | 
94 | ## LangGraph in a Nutshell
95 | Never used LangGraph before? No worries, here’s a quick intro.
96 | 
97 | LangGraph models agent workflows as graphs, using three main components:
98 | 
99 | 🔶 State - A shared data structure that tracks the current status of your app (workflow).
100 | 
101 | 🔶 Nodes - Python functions that define the agent behaviour. They take in the current state, perform actions, and return the updated state.
102 | 
103 | 🔶 Edges - Python functions that decide which Node runs next based on the State, allowing for conditional or fixed transitions.
104 | 
105 | By combining Nodes and Edges, you can build dynamic workflows, like Kylie! In the next section, we’ll take a look at Kylie’s graph and its Nodes and Edges.
106 | 
107 | Before getting into the Nodes and the Edges, let’s describe Kylie’s state.
108 | 
109 | 💠 Kylie State
110 | As mentioned earlier, LangGraph keeps track of your app's current status using the State. Kylie’s state has these attributes:
111 | 
112 | - `summary` - The summary of the conversation so far.
113 | - `workflow` - The current workflow type (conversation/image/audio/tools/search).
114 | - `audio_buffer` - The buffer containing audio data for voice messages.
115 | - `image_path` - Path to the current image being generated.
116 | - `current_activity` - Description of Kylie's current simulated activity.
117 | - `apply_activity` - Flag indicating whether to apply or update the current activity.
118 | - `memory_context` - Retrieved memories from the vector database.
119 | - `search_results` - Formatted search results from Tavily (when a search is performed).
120 | - `messages` - Conversation message history (inherited from MessagesState).
121 | 
122 | 
123 | ![Alt text](./images/img3.png)
124 | 
125 | This state will be saved in an external database. I'm using SQLite for simplicity.
126 | 
127 | Now that we know how Kylie’s State is set up, let’s check out the nodes and edges.
128 | 
129 | 💠 Memory Extraction Node
130 | The first node of the graph is the memory extraction node. This node takes care of extracting relevant information from the user conversation (e.g. name, age, background, etc.).
131 | 
132 | 💠 Context Injection Node
133 | To appear like a real person, Kylie needs to do more than just chat with you. That’s why we need a node that checks your local time and matches it with Kylie’s schedule. Kylie's schedule is hardcoded, and you can change it to whatever you want.
134 | 
135 | ![Alt text](./images/img4.png)
136 | 
137 | 💠 Router Node
138 | - **Purpose**: Determines the appropriate response type. The router node is at the heart of Kylie's workflow. It determines which workflow Kylie's response should follow: audio (voice responses), image (visual responses), conversation (regular text replies), tools (calendar operations) or search (internet searches).
139 | - **Decision Process**:
140 |   - Analyzes the last N messages (configurable via `ROUTER_MESSAGES_TO_ANALYZE`, default is typically 3-5)
141 |   - Uses an LLM with structured output to classify the response type
142 |   - Considers user intent, explicit requests, and conversation context
143 |   - Returns one of: `conversation`, `image`, `audio`, `tools`, or `search`
144 | - **Decision Factors**:
145 |   - **Calendar/Tools**: Keywords like "schedule", "calendar", "events", "meetings", "appointments", "add event", "what's on my calendar"
146 |   - **Search**: Keywords like "search for", "what is", "tell me about", "current news", "latest", "find information about"
147 |   - **Image**: Explicit requests for images, visual content, or "show me" type queries
148 |   - **Audio**: Explicit requests for voice notes, audio responses, or "say it" type queries
149 |   - **Conversation**: Default for regular text-based interactions
150 | - **Implementation**: Uses a structured output chain with temperature 0.3 for consistent routing decisions
151 | 
152 | ![Alt text](./images/img5.png)
153 | 
154 | ![Alt text](./images/img6.png)
155 | 
156 | Once the Router Node determines the final answer, the chosen workflow is assigned to the "workflow" attribute of the AICompanionState. This information is then used by the select_workflow edge, which connects the router node to either the image, audio, tool, search or conversation nodes.
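To make that concrete, here is a minimal sketch of how the router node and the `select_workflow` edge might be wired together. The real implementation lives in `graph/nodes.py` and `graph/edges.py` (and is what the screenshots show); the `RouterResponse` model, the prompt, and the `<workflow>_node` naming pattern below are illustrative assumptions, not the exact code.

```python
# Minimal sketch — not the exact implementation from graph/nodes.py / graph/edges.py.
from typing import Literal

from langchain_groq import ChatGroq
from pydantic import BaseModel

from graph.state import AICompanionState  # assumed module path
from settings import settings


class RouterResponse(BaseModel):
    """Structured output: the workflow that should handle the next reply."""
    response_type: Literal["conversation", "image", "audio", "tools", "search"]


async def router_node(state: AICompanionState):
    llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.3)
    router = llm.with_structured_output(RouterResponse)
    # Only the last few messages are needed to pick a workflow.
    recent = state["messages"][-settings.ROUTER_MESSAGES_TO_ANALYZE:]
    decision = await router.ainvoke(
        [("system", "Decide which workflow should handle Kylie's next reply.")] + list(recent)
    )
    return {"workflow": decision.response_type}


def select_workflow(state: AICompanionState) -> str:
    # Conditional edge: route to the node matching the chosen workflow,
    # assuming node names follow the "<workflow>_node" pattern.
    return f"{state['workflow']}_node"
```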
157 | 158 | ![Alt text](./images/img7.png) 159 | 160 | 💠 Tool Calling Node 161 | - **Purpose**: Handles calendar operations 162 | - **Capabilities**: 163 | - List upcoming events (with configurable max results, default 10) 164 | - Add calendar events (with summary, start/end times, and optional description) 165 | - Get current/next event (with configurable lookahead window, default 30 minutes) 166 | - **Calendar Integration**: Google Calendar API via direct LangChain tool integration 167 | - **Authentication**: Uses OAuth 2.0 flow with Google Calendar API 168 | - Requires initial setup of `credentials.json` from Google Cloud Console 169 | - Stores user authorization tokens in `token.json` for subsequent use 170 | - Automatically refreshes expired tokens 171 | - **Context**: Uses current date, time, and timezone (Africa/Kampala) 172 | - **Implementation**: 173 | - The `CalendarTool` class wraps Google Calendar API operations 174 | - LangChain tools (`list_upcoming_events`, `add_calendar_event`, `get_current_or_next_event`) are integrated into the graph 175 | - The router node determines when calendar operations are needed 176 | - Tool results are formatted and included in Kylie's response 177 | - **Features**: 178 | - Automatic timezone handling (converts local times to UTC for Google Calendar) 179 | - Event reminders (email 24 hours before, popup 10 minutes before) 180 | - Error handling for invalid dates, authentication failures, and API errors 181 | 182 | 183 | 💠 Search Node 184 | - **Purpose**: Performs internet search and generates responses with search context 185 | - **Search Provider**: Tavily Search API 186 | - **Process**: 187 | 1. Extracts search query from user message 188 | 2. Performs search using Tavily API with "advanced" search depth 189 | 3. Formats search results (title, content preview, source URL) 190 | 4. Generates response incorporating search results into conversation context 191 | - **Output**: Text response with search results context, stores `search_results` in state 192 | - **Use Cases**: Current events, news, recent information, factual queries, real-time data 193 | - **Implementation**: 194 | - The `TavilySearch` class handles all search operations 195 | - Default max results: 5 (configurable) 196 | - Search results are formatted with titles, content snippets (first 200 chars), and source URLs 197 | - Results are injected into the character response chain as context 198 | - The router node determines when internet search is needed based on user queries 199 | - **Features**: 200 | - Advanced search depth for comprehensive results 201 | - Automatic query extraction from user messages 202 | - Error handling for API failures and empty queries 203 | - Results are seamlessly integrated into Kylie's responses 204 | 205 | 💠 Summarize Conversation Node 206 | - **Purpose**: Reduces conversation history length 207 | - **Trigger**: When total messages exceed 100 (configurable via `TOTAL_MESSAGES_SUMMARY_TRIGGER`) 208 | - **Process**: 209 | - Creates/extends conversation summary 210 | - Removes old messages (keeps last 75 by default) 211 | - **Output**: Updated summary and reduced message history 212 | 213 | But of course, we don’t want to generate a summary every single time Kylie gets a message. That’s why this node is connected to the previous ones with a conditional edge. 
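Conceptually, that conditional edge boils down to a simple length check. Here is a rough sketch (the function and node names are assumptions; the actual edge is the one shown in the screenshot below):

```python
# Sketch of the summarization trigger — names are illustrative.
from langgraph.graph import END

from graph.state import AICompanionState  # assumed module path
from settings import settings


def should_summarize_conversation(state: AICompanionState) -> str:
    """Route to the summarization node only once the history grows past the trigger."""
    if len(state["messages"]) > settings.TOTAL_MESSAGES_SUMMARY_TRIGGER:
        return "summarize_conversation_node"
    return END


# When the graph is assembled, something like:
# graph_builder.add_conditional_edges("conversation_node", should_summarize_conversation)
```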
214 | 
215 | ![Alt text](./images/img8.png)
216 | 
217 | As you can see in the implementation above, this edge connects the summarization node to the previous nodes if the total number of messages exceeds the TOTAL_MESSAGES_SUMMARY_TRIGGER (which is set to 100 by default in settings.py). If not, it connects to the END node, which marks the end of the workflow.
218 | 
219 | ## Module 3 (Kylie's Memory)
220 | 
221 | ![Alt text](./images/img9.png)
222 | 
223 | Let’s start with a diagram to give you a big-picture view. As you can see, there are two main memory “blocks” - one stored in a SQLite database (left) and the other in a Qdrant collection (right).
224 | 
225 | 💠 Short-term memory
226 | The block on the left represents the short-term memory, which is stored in the LangGraph state and then persisted in a SQLite database. LangGraph makes this process simple since it comes with a built-in checkpointer for handling database storage.
227 | 
228 | In the code, we simply use the AsyncSqliteSaver class when compiling the graph. This ensures that the LangGraph state checkpoint is continuously saved to SQLite. You can see this in action in the code below.
229 | 
230 | ![Alt text](./images/img10.png)
231 | 
232 | Kylie’s state is a subclass of LangGraph’s MessagesState, which means it inherits a messages property. This property holds the history of messages exchanged in the conversation - essentially, that’s what we call short-term memory!
233 | 
234 | Integrating this short-term memory into the response chain is straightforward. We can use LangChain's MessagesPlaceholder class, allowing Kylie to consider past interactions when generating responses. This keeps the conversation smooth and coherent.
235 | 
236 | Simple, right? Now, let’s get into the interesting part: the long-term memory.
237 | 
238 | 💠 Long-term memory
239 | 
240 | ![Alt text](./images/img11.png)
241 | 
242 | Long-term memory isn’t just about saving every single message from a conversation - far from it 😅. That would be impractical and impossible to scale. Long-term memory works quite differently.
243 | 
244 | Think about it: when you meet someone new, you don’t remember every word they say, right? You only retain key details, like their name, profession, where they’re from, or shared interests.
245 | 
246 | That’s exactly what we wanted to replicate with Kylie. How? 🤔
247 | 
248 | By using a vector database like Qdrant, which lets us store relevant information from conversations as embeddings. Let’s break this down in more detail.
249 | 
250 | 🔶 Memory Extraction Node
251 | 
252 | Remember when we talked about the different nodes in our LangGraph workflow? The first one was the memory_extraction_node, which is responsible for identifying and storing key details from the conversation.
253 | 
254 | That’s the first essential piece we need to get our long-term memory module up and running! 💪
255 | 
256 | ![Alt text](./images/img12.png)
257 | 
258 | 🔶 Qdrant
259 | As the conversation progresses, the memory_extraction_node will keep gathering more and more details about you.
260 | 
261 | If you check your Qdrant Cloud instance, you’ll see the collection gradually filling up with “memories”.
262 | 
263 | 🔶 Memory Injection Node
264 | Now that all the memories are stored in Qdrant, how do we let Kylie use them in her conversations?
265 | 
266 | It’s simple! We just need one more node: the memory_injection_node.
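Before we get to the node itself, here is roughly what the Qdrant side of this looks like: one call to embed and upsert an extracted memory, and one vector search to pull the most relevant ones back out. This is a sketch, not the code from `modules/memory/long_term/` — the collection name, embedding model, and payload schema are assumptions.

```python
# Rough sketch of the Qdrant side of long-term memory (illustrative only).
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="https://your-cluster.qdrant.io", api_key="...")  # hypothetical cluster
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
COLLECTION = "long_term_memory"  # assumed collection name (must already exist)


def store_memory(text: str) -> None:
    """Embed an extracted memory and upsert it into the collection."""
    client.upsert(
        collection_name=COLLECTION,
        points=[
            PointStruct(
                id=str(uuid.uuid4()),
                vector=encoder.encode(text).tolist(),
                payload={"text": text},
            )
        ],
    )


def get_relevant_memories(recent_context: str, top_k: int = 5) -> list[str]:
    """Vector search: return the top-k stored memories closest to the recent conversation."""
    hits = client.search(
        collection_name=COLLECTION,
        query_vector=encoder.encode(recent_context).tolist(),
        limit=top_k,
    )
    return [hit.payload["text"] for hit in hits]
```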
267 | 268 | ![Alt text](./images/img13.png) 269 | 270 | This node uses the MemoryManager class to retrieve relevant memories from Qdrant - essentially performing a vector search to find the top-k similar embeddings. Then, it transforms those embeddings (vector representations) into text using the format_memories_for_prompt method. 271 | 272 | Once that's done, the formatted memories are stored in the memory_context property of the graph. This allows them to be parsed into the Character Card Prompt - the one that defines Kylie's personality and behaviour. 273 | 274 | ## Module 4 (Kylie's Voice) 275 | 276 | Kylie's audio pipeline works a lot like the vision pipeline. 277 | 278 | Instead of processing images and generating new ones, we're dealing with audio: converting speech to text and text back to speech. 279 | 280 | Take a look at the diagram above to see what I mean. 281 | 282 | It all starts when you send a voice note. The audio gets transcribed, and the text is sent into the LangGraph workflow. That text, along with your message, helps generate a response, sometimes with an accompanying voice note. We'll explore how conversations are shaped using the incoming message, chat history, memories, and even current activities. 283 | 284 | So, in a nutshell, there are two main flows: one for handling audio coming in and another for generating and sending new audio out. 285 | 286 | ## Audio In: Speech-to-Text (STT) 287 | 288 | ![Alt text](./images/img15.png) 289 | 290 | Speech-to-Text models convert spoken audio into written text, enabling Kylie to understand voice messages just like text messages. They process audio waveforms, identify phonemes and words, and generate accurate transcriptions even with background noise or different accents. 291 | 292 | For Kylie, STT is essential for making voice notes accessible. It lets her transcribe your voice messages accurately, understand your spoken requests, and generate responses that go beyond just text - bringing real conversational context into interactions. 293 | 294 | To integrate STT into Kylie's codebase, I built the SpeechToText class as part of Kylie's modules. We're using Groq's Whisper model (whisper-large-v3-turbo) for fast and accurate transcription. 295 | 296 | ## Audio Out: Text-to-Speech (TTS) 297 | 298 | ![Alt text](./images/img16.png) 299 | 300 | Text-to-Speech models convert written text into natural-sounding speech, enabling Kylie to respond with voice notes just like a real person. They process text, generate phonemes, and synthesize audio waveforms that sound human-like with proper intonation and emotion. 301 | 302 | For Kylie, TTS is crucial for creating natural and engaging voice responses. Whether she's responding with a voice note or expressing emotions, these models ensure her audio outputs match the conversation while staying warm and conversational. 303 | 304 | There are tons of TTS services out there - growing fast! - but we found that ElevenLabs gave us solid results, creating the natural, expressive voice we wanted for Kylie's personality. 305 | 306 | Plus, it offers great voice quality and customization options, which is a huge bonus! 307 | 308 | The workflow is simple: first, we generate a text response based on the chat history and Kylie's activities 309 | 310 | Next, we use this text to synthesize speech using ElevenLabs TTS, with voice settings optimized for natural conversation. 311 | 312 | The synthesize method then generates the audio bytes and stores them in the LangGraph state's audio_buffer. 
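Putting those steps together, the audio node looks roughly like this. This is a sketch, not the exact code: the node name, the import paths marked below, the chain signature, and `synthesize()` being awaitable are assumptions; only the overall flow (generate text → synthesize speech → store the bytes in `audio_buffer`) follows the description above.

```python
from langchain_core.messages import AIMessage
from langchain_core.runnables import RunnableConfig

from graph.state import AICompanionState  # assumed module path
from graph.utils.chains import get_character_response_chain  # assumed module path
from modules.speech import TextToSpeech

text_to_speech = TextToSpeech()


async def audio_node(state: AICompanionState, config: RunnableConfig):
    # 1. Generate the text reply, just like the conversation node would.
    chain = get_character_response_chain(state.get("summary", ""))
    response_text = await chain.ainvoke(
        {
            "messages": state["messages"],
            "current_activity": state.get("current_activity", ""),
            "memory_context": state.get("memory_context", ""),
        },
        config,
    )

    # 2. Turn that text into speech with ElevenLabs and stash the raw bytes
    #    in the state, where the WhatsApp handler picks them up.
    audio_bytes = await text_to_speech.synthesize(response_text)
    return {"messages": AIMessage(content=response_text), "audio_buffer": audio_bytes}
```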
313 | 
314 | Finally, the audio gets sent back to the user via the WhatsApp endpoint hook, giving them a voice representation of what Kylie is saying!
315 | 
316 | 
317 | ## Module 5 (Kylie's Vision)
318 | 
319 | ![Alt text](./images/img14.png)
320 | 
321 | Kylie’s vision pipeline works a lot like the audio pipeline.
322 | 
323 | Instead of converting speech to text and back, we’re dealing with images: processing what comes in and generating fresh ones to send back.
324 | 
325 | Take a look at the diagram above to see what I mean.
326 | 
327 | It all starts when I send a picture of myself golfing. The image gets processed, and a description is sent into the LangGraph workflow. That description, along with my message, helps generate a response, sometimes with an accompanying image. We’ll explore how scenarios are shaped using the incoming message, chat history, memories, and even current activities.
328 | 
329 | So, in a nutshell, there are two main flows: one for handling images coming in and another for generating and sending new ones out.
330 | 
331 | ## Image In: Vision Language Models (VLMs)
332 | 
333 | ![Alt text](./images/img15.png)
334 | 
335 | Vision Language Models (VLMs) process both images and text, generating text-based insights from visual input. They help with tasks like object recognition, image captioning, and answering questions about images. Some even understand spatial relationships, identifying objects or their positions.
336 | 
337 | For Kylie, VLMs are key to making sense of incoming images. They let her analyze pictures, describe them accurately, and generate responses that go beyond just text - bringing real context and understanding into conversations.
338 | 
339 | To integrate the VLM into Kylie’s codebase, I built the ImageToText class as part of Kylie’s modules.
340 | 
341 | ## Image Out: Diffusion Models
342 | 
343 | ![Alt text](./images/img16.png)
344 | 
345 | Diffusion models are a type of generative AI that create images by refining random noise step by step until a clear picture emerges. They learn from training data to produce diverse, high-quality images without copying exact examples.
346 | 
347 | For Kylie, diffusion models are crucial for generating realistic and context-aware images. Whether she’s responding with a visual or illustrating a concept, these models ensure her image outputs match the conversation while staying creative and unique.
348 | 
349 | There are tons of diffusion models out there - growing fast! - but we found that FLUX.1 gave us solid results, creating the realistic images we wanted for Kylie’s simulated life.
350 | 
351 | Plus, it’s free to use on the Together.ai platform, which is a huge bonus!
352 | The workflow is simple: first, we generate a scenario based on the chat history and Kylie's activities.
353 | 
354 | Next, we use this scenario to craft a prompt for image generation, adding guardrails, context, and other relevant details.
355 | 
356 | The generate_image method then saves the output image to the filesystem and stores its path in the LangGraph state.
357 | 
358 | Finally, the image gets sent back to the user via the WhatsApp endpoint hook, giving them a visual representation of what Kylie is seeing!
359 | 
360 | 
361 | 
362 | 
363 | 
--------------------------------------------------------------------------------