├── __init__.py ├── core ├── __init__.py ├── exceptions.py ├── schedules.py └── prompts.py ├── graph ├── utils │ ├── __init__.py │ ├── helpers.py │ └── chains.py ├── __init__.py ├── edges.py ├── state.py ├── graph.py └── nodes.py ├── mycalendar ├── __init__.py ├── langchain_integration.py └── calendar_tool.py ├── modules ├── schedules │ ├── __init__.py │ └── context_generation.py ├── search │ ├── __init__.py │ └── tavily_search.py ├── image │ ├── __init__.py │ ├── text_to_image.py │ └── image_to_text.py ├── speech │ ├── __init__.py │ ├── text_to_speech.py │ └── speech_to_text.py └── memory │ └── long_term │ ├── memory_manager.py │ └── vector_store.py ├── images ├── img1.png ├── img3.png ├── img4.png ├── img5.png ├── img6.png ├── img7.png ├── img8.png ├── img9.png ├── image1.png ├── img10.png ├── img11.png ├── img12.png ├── img13.png ├── img14.png ├── img15.png ├── img16.png ├── kylie_graph.png └── architecture.png ├── main.py ├── requirements.txt ├── pyproject.toml ├── settings.py ├── workflow.md ├── SETUP_GUIDE.md ├── whatsapp_response.py └── README.md /__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /core/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /graph/utils/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /mycalendar/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /modules/schedules/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /images/img1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img1.png -------------------------------------------------------------------------------- /images/img3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img3.png -------------------------------------------------------------------------------- /images/img4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img4.png -------------------------------------------------------------------------------- /images/img5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img5.png -------------------------------------------------------------------------------- /images/img6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img6.png -------------------------------------------------------------------------------- /images/img7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img7.png -------------------------------------------------------------------------------- /images/img8.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img8.png -------------------------------------------------------------------------------- /images/img9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img9.png -------------------------------------------------------------------------------- /images/image1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/image1.png -------------------------------------------------------------------------------- /images/img10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img10.png -------------------------------------------------------------------------------- /images/img11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img11.png -------------------------------------------------------------------------------- /images/img12.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img12.png -------------------------------------------------------------------------------- /images/img13.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img13.png -------------------------------------------------------------------------------- /images/img14.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img14.png -------------------------------------------------------------------------------- /images/img15.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img15.png -------------------------------------------------------------------------------- /images/img16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/img16.png -------------------------------------------------------------------------------- /images/kylie_graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/kylie_graph.png -------------------------------------------------------------------------------- /images/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jonathanmuk/Kylie/HEAD/images/architecture.png -------------------------------------------------------------------------------- /modules/search/__init__.py: -------------------------------------------------------------------------------- 1 | from .tavily_search import TavilySearch 2 | 3 | __all__ = ["TavilySearch"] 4 | -------------------------------------------------------------------------------- /graph/__init__.py: -------------------------------------------------------------------------------- 1 | from graph.graph import create_workflow_graph 2 | 3 | graph_builder = create_workflow_graph() 4 | 
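Note: graph/__init__.py exposes the uncompiled graph_builder rather than a compiled graph, so the consumer (presumably whatsapp_response.py, which is not shown in this section) is expected to compile it with a checkpointer of its choosing. The snippet below is only a minimal sketch of that wiring, assuming the langgraph-checkpoint-sqlite / aiosqlite dependencies already listed in requirements.txt and the SHORT_TERM_MEMORY_DB_PATH value from settings.py; the run_turn helper and the per-user thread_id are illustrative names, not taken from the repository.

# Minimal usage sketch (assumptions noted above; run_turn and thread_id are illustrative).
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver

from graph import graph_builder
from settings import settings


async def run_turn(user_text: str, thread_id: str) -> str:
    """Run a single conversation turn, keyed by a per-user thread id."""
    async with AsyncSqliteSaver.from_conn_string(settings.SHORT_TERM_MEMORY_DB_PATH) as short_term_memory:
        graph = graph_builder.compile(checkpointer=short_term_memory)
        output_state = await graph.ainvoke(
            {"messages": [HumanMessage(content=user_text)]},
            config={"configurable": {"thread_id": thread_id}},
        )
        # The final message in the returned state is the agent's reply.
        return output_state["messages"][-1].content

Exporting the builder instead of a compiled graph keeps the choice of checkpointer (SQLite, DuckDB, or in-memory) in the caller's hands, which matches the multiple checkpoint backends listed in requirements.txt.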
-------------------------------------------------------------------------------- /modules/image/__init__.py: -------------------------------------------------------------------------------- 1 | from .image_to_text import ImageToText 2 | from .text_to_image import TextToImage 3 | 4 | __all__ = ["ImageToText", "TextToImage"] 5 | -------------------------------------------------------------------------------- /modules/speech/__init__.py: -------------------------------------------------------------------------------- 1 | from .speech_to_text import SpeechToText 2 | from .text_to_speech import TextToSpeech 3 | 4 | __all__ = ["SpeechToText", "TextToSpeech"] 5 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | from dotenv import load_dotenv 2 | load_dotenv() 3 | 4 | from fastapi import FastAPI 5 | 6 | from whatsapp_response import whatsapp_router 7 | 8 | app = FastAPI() 9 | app.include_router(whatsapp_router) 10 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | chainlit 2 | elevenlabs 3 | fastapi[standard] 4 | groq 5 | langchain-community 6 | langchain-groq 7 | langchain 8 | pydantic 9 | together 10 | langgraph 11 | langchain-openai 12 | langgraph-checkpoint-duckdb 13 | duckdb 14 | langgraph-checkpoint-sqlite 15 | aiosqlite 16 | qdrant-client 17 | sentence-transformers 18 | google-api-python-client 19 | google-auth 20 | google-auth-oauthlib 21 | pytz 22 | httpx -------------------------------------------------------------------------------- /core/exceptions.py: -------------------------------------------------------------------------------- 1 | class TextToSpeechError(Exception): 2 | """Exception raised for text-to-speech conversion errors.""" 3 | 4 | class SpeechToTextError(Exception): 5 | """Exception raised for speech-to-text conversion errors.""" 6 | 7 | class ImageToTextError(Exception): 8 | """Exception raised for image-to-text conversion errors.""" 9 | 10 | class TextToImageError(Exception): 11 | """Exception raised for text-to-image generation errors.""" 12 | 13 | class SearchError(Exception): 14 | """Exception raised for search operation errors.""" 15 | -------------------------------------------------------------------------------- /graph/edges.py: -------------------------------------------------------------------------------- 1 | from langgraph.graph import END 2 | from typing_extensions import Literal 3 | 4 | from .state import AICompanionState 5 | from settings import settings 6 | 7 | 8 | def should_summarize_conversation( 9 | state: AICompanionState, 10 | ) -> Literal["summarize_conversation_node", "__end__"]: 11 | messages = state["messages"] 12 | 13 | if len(messages) > settings.TOTAL_MESSAGES_SUMMARY_TRIGGER: 14 | return "summarize_conversation_node" 15 | 16 | return END 17 | 18 | 19 | def select_workflow( 20 | state: AICompanionState, 21 | ) -> Literal["conversation_node", "image_node", "audio_node", "tool_calling_node", "search_node"]: 22 | workflow = state["workflow"] 23 | 24 | if workflow == "image": 25 | return "image_node" 26 | elif workflow == "audio": 27 | return "audio_node" 28 | elif workflow == "tools": 29 | return "tool_calling_node" 30 | elif workflow == "search": 31 | return "search_node" 32 | else: 33 | return "conversation_node" 
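Because AICompanionState extends MessagesState (a TypedDict), the two routing functions above can be exercised with plain dictionaries, which makes the edge logic easy to unit-test in isolation. The sketch below is illustrative only: it assumes the settings environment variables are available so that graph.edges imports cleanly, and the test names are hypothetical.

# Illustrative tests only; they assume graph.edges is importable (i.e. .env/settings configured).
from langchain_core.messages import HumanMessage
from langgraph.graph import END

from graph.edges import select_workflow, should_summarize_conversation


def test_select_workflow_routes_to_search_node():
    state = {"messages": [HumanMessage(content="What's happening in Kampala today?")], "workflow": "search"}
    assert select_workflow(state) == "search_node"


def test_short_history_skips_summarization():
    state = {"messages": [HumanMessage(content="hi")]}
    # One message is below TOTAL_MESSAGES_SUMMARY_TRIGGER (100), so the edge ends the run.
    assert should_summarize_conversation(state) == END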
-------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [project] 2 | name = "kylie" 3 | version = "0.1.0" 4 | description = "Add your description here" 5 | readme = "README.md" 6 | requires-python = ">=3.13" 7 | dependencies = [ 8 | "aiosqlite>=0.20.0", 9 | "chainlit>=1.3.2", 10 | "duckdb>=1.1.3", 11 | "elevenlabs>=1.50.3", 12 | "fastapi[standard]>=0.115.6", 13 | "google-api-python-client>=2.178.0", 14 | "google-auth>=2.40.3", 15 | "google-auth-oauthlib>=1.2.2", 16 | "groq>=0.13.1", 17 | "langchain>=0.3.13", 18 | "langchain-community>=0.3.13", 19 | "langchain-groq>=0.2.2", 20 | "langchain-mcp-adapters>=0.1.9", 21 | "langchain-openai>=0.2.14", 22 | "langgraph>=0.2.60", 23 | "langgraph-checkpoint-duckdb>=2.0.1", 24 | "langgraph-checkpoint-sqlite>=2.0.1", 25 | "mcp>=1.12.4", 26 | "pre-commit>=4.0.1", 27 | "pydantic==2.10.0", 28 | "pydantic-settings>=2.7.0", 29 | "qdrant-client>=1.12.1", 30 | "sentence-transformers>=3.3.1", 31 | "supabase>=2.11.0", 32 | "together>=1.3.10", 33 | ] 34 | -------------------------------------------------------------------------------- /graph/state.py: -------------------------------------------------------------------------------- 1 | from langgraph.graph import MessagesState 2 | 3 | 4 | class AICompanionState(MessagesState): 5 | """State class for the AI Companion workflow. 6 | 7 | Extends MessagesState to track conversation history and maintains the last message received. 8 | 9 | Attributes: 10 | last_message (AnyMessage): The most recent message in the conversation, can be any valid 11 | LangChain message type (HumanMessage, AIMessage, etc.) 12 | workflow (str): The current workflow the AI Companion is in. Can be "conversation", "image", or "audio". 13 | audio_buffer (bytes): The audio buffer to be used for speech-to-text conversion. 14 | current_activity (str): The current activity of Ava based on the schedule. 15 | memory_context (str): The context of the memories to be injected into the character card. 
16 | """ 17 | 18 | summary: str 19 | workflow: str 20 | audio_buffer: bytes 21 | image_path: str 22 | current_activity: str 23 | apply_activity: bool 24 | memory_context: str 25 | search_results: str 26 | -------------------------------------------------------------------------------- /settings.py: -------------------------------------------------------------------------------- 1 | from pydantic_settings import BaseSettings, SettingsConfigDict 2 | 3 | 4 | class Settings(BaseSettings): 5 | model_config = SettingsConfigDict(env_file=".env", extra="ignore", env_file_encoding="utf-8") 6 | 7 | GROQ_API_KEY: str 8 | ELEVENLABS_API_KEY: str 9 | ELEVENLABS_VOICE_ID: str 10 | TOGETHER_API_KEY: str 11 | GOOGLE_CLOUD_API_KEY: str 12 | TAVILY_API_KEY: str 13 | 14 | QDRANT_API_KEY: str | None 15 | QDRANT_URL: str 16 | QDRANT_PORT: str = "6333" 17 | QDRANT_HOST: str | None = None 18 | 19 | TEXT_MODEL_NAME: str = "llama-3.3-70b-versatile" 20 | SMALL_TEXT_MODEL_NAME: str = "llama-3.1-8b-instant" 21 | STT_MODEL_NAME: str = "whisper-large-v3-turbo" 22 | TTS_MODEL_NAME: str = "eleven_flash_v2_5" 23 | TTI_MODEL_NAME: str = "black-forest-labs/FLUX.1-schnell-Free" 24 | ITT_MODEL_NAME: str = "google-cloud-vision" 25 | 26 | MEMORY_TOP_K: int = 3 27 | ROUTER_MESSAGES_TO_ANALYZE: int = 3 28 | TOTAL_MESSAGES_SUMMARY_TRIGGER: int = 100 29 | TOTAL_MESSAGES_AFTER_SUMMARY: int = 75 30 | 31 | SHORT_TERM_MEMORY_DB_PATH: str = "data/memory.db" 32 | 33 | 34 | settings = Settings() 35 | -------------------------------------------------------------------------------- /graph/utils/helpers.py: -------------------------------------------------------------------------------- 1 | import re 2 | from typing import List 3 | 4 | from langchain_core.output_parsers import StrOutputParser 5 | from langchain_core.tools import BaseTool 6 | from langchain_groq import ChatGroq 7 | 8 | from modules.image.image_to_text import ImageToText 9 | from modules.image.text_to_image import TextToImage 10 | from modules.speech import TextToSpeech 11 | from modules.search import TavilySearch 12 | from mycalendar.langchain_integration import get_calendar_tools 13 | from settings import settings 14 | 15 | 16 | def get_chat_model(temperature: float = 0.7): 17 | return ChatGroq( 18 | api_key=settings.GROQ_API_KEY, 19 | model_name=settings.TEXT_MODEL_NAME, 20 | temperature=temperature, 21 | ) 22 | 23 | 24 | def get_chat_model_with_tools(temperature: float = 0.7): 25 | """Get chat model with calendar tools bound to it.""" 26 | model = get_chat_model(temperature=temperature) 27 | tools = get_calendar_tools() 28 | return model.bind_tools(tools) 29 | 30 | 31 | def get_text_to_speech_module(): 32 | return TextToSpeech() 33 | 34 | 35 | def get_search_module(): 36 | return TavilySearch() 37 | 38 | 39 | def get_text_to_image_module(): 40 | return TextToImage() 41 | 42 | 43 | def get_image_to_text_module(): 44 | return ImageToText() 45 | 46 | 47 | def get_available_tools() -> List[BaseTool]: 48 | """Get all available tools for the agent.""" 49 | return get_calendar_tools() 50 | 51 | 52 | def remove_asterisk_content(text: str) -> str: 53 | """Remove content between asterisks from the text.""" 54 | return re.sub(r"\*.*?\*", "", text).strip() 55 | 56 | 57 | class AsteriskRemovalParser(StrOutputParser): 58 | def parse(self, text): 59 | return remove_asterisk_content(super().parse(text)) -------------------------------------------------------------------------------- /graph/graph.py: -------------------------------------------------------------------------------- 1 | from 
functools import lru_cache 2 | 3 | from langgraph.graph import END, START, StateGraph 4 | 5 | from .edges import ( 6 | select_workflow, 7 | should_summarize_conversation, 8 | ) 9 | from .nodes import ( 10 | audio_node, 11 | context_injection_node, 12 | conversation_node, 13 | image_node, 14 | memory_extraction_node, 15 | memory_injection_node, 16 | router_node, 17 | search_node, 18 | summarize_conversation_node, 19 | tool_calling_node, 20 | ) 21 | from .state import AICompanionState 22 | 23 | 24 | @lru_cache(maxsize=1) 25 | def create_workflow_graph(): 26 | graph_builder = StateGraph(AICompanionState) 27 | 28 | graph_builder.add_node("memory_extraction_node", memory_extraction_node) 29 | graph_builder.add_node("router_node", router_node) 30 | graph_builder.add_node("context_injection_node", context_injection_node) 31 | graph_builder.add_node("memory_injection_node", memory_injection_node) 32 | graph_builder.add_node("conversation_node", conversation_node) 33 | graph_builder.add_node("image_node", image_node) 34 | graph_builder.add_node("audio_node", audio_node) 35 | graph_builder.add_node("tool_calling_node", tool_calling_node) 36 | graph_builder.add_node("search_node", search_node) 37 | graph_builder.add_node("summarize_conversation_node", summarize_conversation_node) 38 | 39 | # First extract memories from user message 40 | graph_builder.add_edge(START, "memory_extraction_node") 41 | 42 | # Then determine response type 43 | graph_builder.add_edge("memory_extraction_node", "router_node") 44 | 45 | # Then inject both context and memories 46 | graph_builder.add_edge("router_node", "context_injection_node") 47 | graph_builder.add_edge("context_injection_node", "memory_injection_node") 48 | 49 | # Then proceed to appropriate response node 50 | graph_builder.add_conditional_edges("memory_injection_node", select_workflow) 51 | 52 | # Check for summarization after any response 53 | graph_builder.add_conditional_edges("conversation_node", should_summarize_conversation) 54 | graph_builder.add_conditional_edges("image_node", should_summarize_conversation) 55 | graph_builder.add_conditional_edges("audio_node", should_summarize_conversation) 56 | graph_builder.add_conditional_edges("tool_calling_node", should_summarize_conversation) 57 | graph_builder.add_conditional_edges("search_node", should_summarize_conversation) 58 | graph_builder.add_edge("summarize_conversation_node", END) 59 | 60 | return graph_builder 61 | 62 | 63 | graph = create_workflow_graph().compile() 64 | 65 | 66 | graph_builder = create_workflow_graph() -------------------------------------------------------------------------------- /modules/schedules/context_generation.py: -------------------------------------------------------------------------------- 1 | from datetime import datetime 2 | from typing import Dict, Optional 3 | 4 | from core.schedules import ( 5 | FRIDAY_SCHEDULE, 6 | MONDAY_SCHEDULE, 7 | SATURDAY_SCHEDULE, 8 | SUNDAY_SCHEDULE, 9 | THURSDAY_SCHEDULE, 10 | TUESDAY_SCHEDULE, 11 | WEDNESDAY_SCHEDULE, 12 | ) 13 | 14 | 15 | class ScheduleContextGenerator: 16 | """Class to generate context about Ava's current activity based on schedules.""" 17 | 18 | SCHEDULES = { 19 | 0: MONDAY_SCHEDULE, # Monday 20 | 1: TUESDAY_SCHEDULE, # Tuesday 21 | 2: WEDNESDAY_SCHEDULE, # Wednesday 22 | 3: THURSDAY_SCHEDULE, # Thursday 23 | 4: FRIDAY_SCHEDULE, # Friday 24 | 5: SATURDAY_SCHEDULE, # Saturday 25 | 6: SUNDAY_SCHEDULE, # Sunday 26 | } 27 | 28 | @staticmethod 29 | def _parse_time_range(time_range: str) -> tuple[datetime.time, 
datetime.time]: 30 | """Parse a time range string (e.g., '06:00-07:00') into start and end times.""" 31 | start_str, end_str = time_range.split("-") 32 | start_time = datetime.strptime(start_str, "%H:%M").time() 33 | end_time = datetime.strptime(end_str, "%H:%M").time() 34 | return start_time, end_time 35 | 36 | @classmethod 37 | def get_current_activity(cls) -> Optional[str]: 38 | """Get Ava's current activity based on the current time and day of the week. 39 | 40 | Returns: 41 | str: Description of current activity, or None if no matching time slot is found 42 | """ 43 | # Get current time and day of week (0 = Monday, 6 = Sunday) 44 | current_datetime = datetime.now() 45 | current_time = current_datetime.time() 46 | current_day = current_datetime.weekday() 47 | 48 | # Get schedule for current day 49 | schedule = cls.SCHEDULES.get(current_day, {}) 50 | 51 | # Find matching time slot 52 | for time_range, activity in schedule.items(): 53 | start_time, end_time = cls._parse_time_range(time_range) 54 | 55 | # Handle overnight activities (e.g., 23:00-06:00) 56 | if start_time > end_time: 57 | if current_time >= start_time or current_time <= end_time: 58 | return activity 59 | else: 60 | if start_time <= current_time <= end_time: 61 | return activity 62 | 63 | return None 64 | 65 | @classmethod 66 | def get_schedule_for_day(cls, day: int) -> Dict[str, str]: 67 | """Get the complete schedule for a specific day. 68 | 69 | Args: 70 | day: Day of week as integer (0 = Monday, 6 = Sunday) 71 | 72 | Returns: 73 | Dict[str, str]: Schedule for the specified day 74 | """ 75 | return cls.SCHEDULES.get(day, {}) 76 | -------------------------------------------------------------------------------- /modules/speech/text_to_speech.py: -------------------------------------------------------------------------------- 1 | import os 2 | from typing import Optional 3 | 4 | from core.exceptions import TextToSpeechError 5 | from settings import settings 6 | from elevenlabs import ElevenLabs, Voice, VoiceSettings 7 | 8 | 9 | class TextToSpeech: 10 | """A class to handle text-to-speech conversion using ElevenLabs.""" 11 | 12 | # Required environment variables 13 | REQUIRED_ENV_VARS = ["ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID"] 14 | 15 | def __init__(self): 16 | """Initialize the TextToSpeech class and validate environment variables.""" 17 | self._validate_env_vars() 18 | self._client: Optional[ElevenLabs] = None 19 | 20 | def _validate_env_vars(self) -> None: 21 | """Validate that all required environment variables are set.""" 22 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not os.getenv(var)] 23 | if missing_vars: 24 | raise ValueError(f"Missing required environment variables: {', '.join(missing_vars)}") 25 | 26 | @property 27 | def client(self) -> ElevenLabs: 28 | """Get or create ElevenLabs client instance using singleton pattern.""" 29 | if self._client is None: 30 | self._client = ElevenLabs(api_key=settings.ELEVENLABS_API_KEY) 31 | return self._client 32 | 33 | async def synthesize(self, text: str) -> bytes: 34 | """Convert text to speech using ElevenLabs. 
35 | 36 | Args: 37 | text: Text to convert to speech 38 | 39 | Returns: 40 | bytes: Audio data 41 | 42 | Raises: 43 | ValueError: If the input text is empty or too long 44 | TextToSpeechError: If the text-to-speech conversion fails 45 | """ 46 | if not text.strip(): 47 | raise ValueError("Input text cannot be empty") 48 | 49 | if len(text) > 5000: # ElevenLabs typical limit 50 | raise ValueError("Input text exceeds maximum length of 5000 characters") 51 | 52 | try: 53 | # Use the correct method name - it should be text_to_speech.convert() 54 | audio_generator = self.client.text_to_speech.convert( 55 | voice_id=settings.ELEVENLABS_VOICE_ID, 56 | text=text, 57 | model_id=settings.TTS_MODEL_NAME, 58 | voice_settings=VoiceSettings( 59 | stability=0.5, 60 | similarity_boost=0.5 61 | ) 62 | ) 63 | 64 | # Convert generator to bytes 65 | audio_bytes = b"".join(audio_generator) 66 | if not audio_bytes: 67 | raise TextToSpeechError("Generated audio is empty") 68 | 69 | return audio_bytes 70 | 71 | except Exception as e: 72 | raise TextToSpeechError(f"Text-to-speech conversion failed: {str(e)}") from e -------------------------------------------------------------------------------- /modules/speech/speech_to_text.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tempfile 3 | from typing import Optional 4 | 5 | from core.exceptions import SpeechToTextError 6 | from settings import settings 7 | from groq import Groq 8 | 9 | 10 | class SpeechToText: 11 | """A class to handle speech-to-text conversion using Groq's Whisper model.""" 12 | 13 | # Required environment variables 14 | REQUIRED_ENV_VARS = ["GROQ_API_KEY"] 15 | 16 | def __init__(self): 17 | """Initialize the SpeechToText class and validate environment variables.""" 18 | self._validate_env_vars() 19 | self._client: Optional[Groq] = None 20 | 21 | def _validate_env_vars(self) -> None: 22 | """Validate that all required environment variables are set.""" 23 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not os.getenv(var)] 24 | if missing_vars: 25 | raise ValueError(f"Missing required environment variables: {', '.join(missing_vars)}") 26 | 27 | @property 28 | def client(self) -> Groq: 29 | """Get or create Groq client instance using singleton pattern.""" 30 | if self._client is None: 31 | self._client = Groq(api_key=settings.GROQ_API_KEY) 32 | return self._client 33 | 34 | async def transcribe(self, audio_data: bytes) -> str: 35 | """Convert speech to text using Groq's Whisper model. 
36 | 37 | Args: 38 | audio_data: Binary audio data 39 | 40 | Returns: 41 | str: Transcribed text 42 | 43 | Raises: 44 | ValueError: If the audio file is empty or invalid 45 | RuntimeError: If the transcription fails 46 | """ 47 | if not audio_data: 48 | raise ValueError("Audio data cannot be empty") 49 | 50 | try: 51 | # Create a temporary file with .wav extension 52 | with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_file: 53 | temp_file.write(audio_data) 54 | temp_file_path = temp_file.name 55 | 56 | try: 57 | # Open the temporary file for the API request 58 | with open(temp_file_path, "rb") as audio_file: 59 | transcription = self.client.audio.transcriptions.create( 60 | file=audio_file, 61 | model="whisper-large-v3-turbo", 62 | language="en", 63 | response_format="text", 64 | ) 65 | 66 | if not transcription: 67 | raise SpeechToTextError("Transcription result is empty") 68 | 69 | return transcription 70 | 71 | finally: 72 | # Clean up the temporary file 73 | os.unlink(temp_file_path) 74 | 75 | except Exception as e: 76 | raise SpeechToTextError(f"Speech-to-text conversion failed: {str(e)}") from e 77 | -------------------------------------------------------------------------------- /modules/memory/long_term/memory_manager.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import uuid 3 | from datetime import datetime 4 | from typing import List, Optional 5 | 6 | from core.prompts import MEMORY_ANALYSIS_PROMPT 7 | from modules.memory.long_term.vector_store import get_vector_store 8 | from settings import settings 9 | from langchain_core.messages import BaseMessage 10 | from langchain_groq import ChatGroq 11 | from pydantic import BaseModel, Field 12 | 13 | 14 | class MemoryAnalysis(BaseModel): 15 | """Result of analyzing a message for memory-worthy content.""" 16 | 17 | is_important: bool = Field( 18 | ..., 19 | description="Whether the message is important enough to be stored as a memory", 20 | ) 21 | formatted_memory: Optional[str] = Field(..., description="The formatted memory to be stored") 22 | 23 | 24 | class MemoryManager: 25 | """Manager class for handling long-term memory operations.""" 26 | 27 | def __init__(self): 28 | self.vector_store = get_vector_store() 29 | self.logger = logging.getLogger(__name__) 30 | self.llm = ChatGroq( 31 | model=settings.SMALL_TEXT_MODEL_NAME, 32 | api_key=settings.GROQ_API_KEY, 33 | temperature=0.1, 34 | max_retries=2, 35 | ).with_structured_output(MemoryAnalysis) 36 | 37 | async def _analyze_memory(self, message: str) -> MemoryAnalysis: 38 | """Analyze a message to determine importance and format if needed.""" 39 | prompt = MEMORY_ANALYSIS_PROMPT.format(message=message) 40 | return await self.llm.ainvoke(prompt) 41 | 42 | async def extract_and_store_memories(self, message: BaseMessage) -> None: 43 | """Extract important information from a message and store in vector store.""" 44 | if message.type != "human": 45 | return 46 | 47 | # Analyze the message for importance and formatting 48 | analysis = await self._analyze_memory(message.content) 49 | if analysis.is_important and analysis.formatted_memory: 50 | # Check if similar memory exists 51 | similar = self.vector_store.find_similar_memory(analysis.formatted_memory) 52 | if similar: 53 | # Skip storage if we already have a similar memory 54 | self.logger.info(f"Similar memory already exists: '{analysis.formatted_memory}'") 55 | return 56 | 57 | # Store new memory 58 | self.logger.info(f"Storing new memory: 
'{analysis.formatted_memory}'") 59 | self.vector_store.store_memory( 60 | text=analysis.formatted_memory, 61 | metadata={ 62 | "id": str(uuid.uuid4()), 63 | "timestamp": datetime.now().isoformat(), 64 | }, 65 | ) 66 | 67 | def get_relevant_memories(self, context: str) -> List[str]: 68 | """Retrieve relevant memories based on the current context.""" 69 | memories = self.vector_store.search_memories(context, k=settings.MEMORY_TOP_K) 70 | if memories: 71 | for memory in memories: 72 | self.logger.debug(f"Memory: '{memory.text}' (score: {memory.score:.2f})") 73 | return [memory.text for memory in memories] 74 | 75 | def format_memories_for_prompt(self, memories: List[str]) -> str: 76 | """Format retrieved memories as bullet points.""" 77 | if not memories: 78 | return "" 79 | return "\n".join(f"- {memory}" for memory in memories) 80 | 81 | 82 | def get_memory_manager() -> MemoryManager: 83 | """Get a MemoryManager instance.""" 84 | return MemoryManager() 85 | -------------------------------------------------------------------------------- /modules/search/tavily_search.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | from typing import List, Dict, Optional 4 | 5 | import httpx 6 | from core.exceptions import SearchError 7 | from settings import settings 8 | 9 | 10 | class TavilySearch: 11 | """A class to handle internet search using Tavily API.""" 12 | 13 | REQUIRED_ENV_VARS = ["TAVILY_API_KEY"] 14 | 15 | def __init__(self): 16 | """Initialize the TavilySearch class.""" 17 | self._validate_env_vars() 18 | self.logger = logging.getLogger(__name__) 19 | self.api_key = settings.TAVILY_API_KEY 20 | self.api_url = "https://api.tavily.com/search" 21 | 22 | def _validate_env_vars(self) -> None: 23 | """Validate that environment variables are set.""" 24 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not getattr(settings, var, None)] 25 | if missing_vars: 26 | raise ValueError( 27 | f"Missing required environment variables: {', '.join(missing_vars)}\n" 28 | "Please set TAVILY_API_KEY in your .env file." 29 | ) 30 | 31 | async def search(self, query: str, max_results: int = 5) -> List[Dict[str, str]]: 32 | """ 33 | Search the internet using Tavily API. 
34 | 35 | Args: 36 | query: The search query string 37 | max_results: Maximum number of results to return (default: 5) 38 | 39 | Returns: 40 | List of dictionaries containing search results with keys: title, content, url 41 | 42 | Raises: 43 | ValueError: If the query is empty 44 | SearchError: If the search fails 45 | """ 46 | if not query.strip(): 47 | raise ValueError("Search query cannot be empty") 48 | 49 | try: 50 | self.logger.info(f"Searching Tavily for: '{query}'") 51 | 52 | async with httpx.AsyncClient() as client: 53 | response = await client.post( 54 | self.api_url, 55 | json={ 56 | "api_key": self.api_key, 57 | "query": query, 58 | "max_results": max_results, 59 | "search_depth": "advanced", 60 | }, 61 | timeout=30.0, 62 | ) 63 | response.raise_for_status() 64 | data = response.json() 65 | 66 | results = [] 67 | for result in data.get("results", [])[:max_results]: 68 | results.append({ 69 | "title": result.get("title", "No title"), 70 | "content": result.get("content", ""), 71 | "url": result.get("url", ""), 72 | }) 73 | 74 | self.logger.info(f"Found {len(results)} search results") 75 | return results 76 | 77 | except httpx.HTTPStatusError as e: 78 | error_msg = f"Tavily API error: {e.response.status_code} - {e.response.text}" 79 | self.logger.error(error_msg) 80 | raise SearchError(error_msg) from e 81 | except Exception as e: 82 | error_msg = f"Failed to search: {str(e)}" 83 | self.logger.error(error_msg) 84 | raise SearchError(error_msg) from e 85 | 86 | def format_search_results(self, results: List[Dict[str, str]]) -> str: 87 | """ 88 | Format search results into a readable string. 89 | 90 | Args: 91 | results: List of search result dictionaries 92 | 93 | Returns: 94 | Formatted string with search results 95 | """ 96 | if not results: 97 | return "No search results found." 98 | 99 | formatted = "Search Results:\n\n" 100 | for i, result in enumerate(results, 1): 101 | formatted += f"{i}. {result['title']}\n" 102 | formatted += f" {result['content'][:200]}...\n" 103 | formatted += f" Source: {result['url']}\n\n" 104 | 105 | return formatted 106 | -------------------------------------------------------------------------------- /graph/utils/chains.py: -------------------------------------------------------------------------------- 1 | from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder 2 | from pydantic import BaseModel, Field 3 | 4 | from core.prompts import CHARACTER_CARD_PROMPT, ROUTER_PROMPT 5 | from graph.utils.helpers import AsteriskRemovalParser, get_chat_model, get_chat_model_with_tools 6 | 7 | 8 | class RouterResponse(BaseModel): 9 | response_type: str = Field( 10 | description="The response type to give to the user. It must be one of: 'conversation', 'image', 'audio', or 'tools'" 11 | ) 12 | 13 | 14 | def get_router_chain(): 15 | model = get_chat_model(temperature=0.3).with_structured_output(RouterResponse) 16 | 17 | prompt = ChatPromptTemplate.from_messages( 18 | [("system", ROUTER_PROMPT), MessagesPlaceholder(variable_name="messages")] 19 | ) 20 | 21 | return prompt | model 22 | 23 | 24 | def get_character_response_chain(summary: str = "", with_tools: bool = False, search_context: str = ""): 25 | """ 26 | Get the character response chain, optionally with tools. 
27 | 28 | Args: 29 | summary: Conversation summary to include in system message 30 | with_tools: Whether to bind calendar tools to the model 31 | search_context: Optional search results context to include 32 | """ 33 | if with_tools: 34 | model = get_chat_model_with_tools() 35 | else: 36 | model = get_chat_model() 37 | 38 | from datetime import datetime 39 | import pytz 40 | 41 | tz = pytz.timezone('Africa/Kampala') 42 | current_dt = datetime.now(tz) 43 | current_date = current_dt.strftime('%Y-%m-%d') 44 | timezone_str = str(tz) 45 | 46 | base_system_message = CHARACTER_CARD_PROMPT 47 | 48 | if summary: 49 | base_system_message += f"\n\nSummary of conversation earlier between Kylie and the user: {summary}" 50 | 51 | # Add tool usage instructions when tools are available 52 | if with_tools: 53 | base_system_message += f""" 54 | 55 | # Available Tools 56 | You have access to calendar tools that allow you to: 57 | - Check upcoming events on the user's calendar 58 | - Add new events to their calendar 59 | - Get information about current or next events 60 | 61 | CRITICAL CALENDAR RULES: 62 | - TODAY'S DATE IS: {current_date} 63 | - TIMEZONE IS: {timezone_str} 64 | - When user says "today", always use {current_date} 65 | - When user says times like "6pm", combine with {current_date} 66 | - NEVER use wrong dates - always use current date context 67 | - All times should be in {timezone_str} timezone 68 | 69 | IMPORTANT: When the user asks about their schedule, calendar, or events, you MUST use the appropriate calendar tools. 70 | For questions like "How's my schedule like next week" or "When's Amber's birthday", use the list_upcoming_events tool. 71 | Always use tools for calendar-related queries - don't provide generic responses. 72 | 73 | Use these tools when the user asks about their schedule, wants to add events, or needs reminders. 74 | When using tools, always provide a friendly response along with the tool results. 75 | 76 | DO NOT write function calls in text format like . Use the proper tool calling mechanism. 77 | """ 78 | 79 | # Add search context if available 80 | if search_context: 81 | base_system_message += f"\n\n# Search Results (for reference):\n{search_context}\n\nUse this information to provide accurate, up-to-date responses to the user's query." 
82 | 83 | prompt_template = f"""{{system_message}} 84 | # Current Date and Time Context 85 | Today's date is: {current_date} 86 | Current timezone: {timezone_str}""" 87 | 88 | prompt = ChatPromptTemplate.from_messages([ 89 | ("system", prompt_template), 90 | MessagesPlaceholder(variable_name="messages"), 91 | ]).partial(system_message=base_system_message) 92 | 93 | chain = prompt | model 94 | 95 | # Always return the chain without AsteriskRemovalParser when tools are enabled 96 | # The parser might interfere with tool call detection 97 | if not with_tools: 98 | chain = chain | AsteriskRemovalParser() 99 | 100 | return chain -------------------------------------------------------------------------------- /modules/image/text_to_image.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import logging 3 | import os 4 | from typing import Optional 5 | 6 | from core.exceptions import TextToImageError 7 | from core.prompts import IMAGE_ENHANCEMENT_PROMPT, IMAGE_SCENARIO_PROMPT 8 | from settings import settings 9 | from langchain.prompts import PromptTemplate 10 | from langchain_groq import ChatGroq 11 | from pydantic import BaseModel, Field 12 | from together import Together 13 | 14 | 15 | class ScenarioPrompt(BaseModel): 16 | """Class for the scenario response""" 17 | 18 | narrative: str = Field(..., description="The AI's narrative response to the question") 19 | image_prompt: str = Field(..., description="The visual prompt to generate an image representing the scene") 20 | 21 | 22 | class EnhancedPrompt(BaseModel): 23 | """Class for the text prompt""" 24 | 25 | content: str = Field( 26 | ..., 27 | description="The enhanced text prompt to generate an image", 28 | ) 29 | 30 | 31 | class TextToImage: 32 | """A class to handle text-to-image generation using Together AI.""" 33 | 34 | REQUIRED_ENV_VARS = ["GROQ_API_KEY", "TOGETHER_API_KEY"] 35 | 36 | def __init__(self): 37 | """Initialize the TextToImage class and validate environment variables.""" 38 | self._validate_env_vars() 39 | self._together_client: Optional[Together] = None 40 | self.logger = logging.getLogger(__name__) 41 | 42 | def _validate_env_vars(self) -> None: 43 | """Validate that all required environment variables are set.""" 44 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not os.getenv(var)] 45 | if missing_vars: 46 | raise ValueError(f"Missing required environment variables: {', '.join(missing_vars)}") 47 | 48 | @property 49 | def together_client(self) -> Together: 50 | """Get or create Together client instance using singleton pattern.""" 51 | if self._together_client is None: 52 | self._together_client = Together(api_key=settings.TOGETHER_API_KEY) 53 | return self._together_client 54 | 55 | async def generate_image(self, prompt: str, output_path: str = "") -> bytes: 56 | """Generate an image from a prompt using Together AI.""" 57 | if not prompt.strip(): 58 | raise ValueError("Prompt cannot be empty") 59 | 60 | try: 61 | self.logger.info(f"Generating image for prompt: '{prompt}'") 62 | 63 | response = self.together_client.images.generate( 64 | prompt=prompt, 65 | model=settings.TTI_MODEL_NAME, 66 | width=1024, 67 | height=768, 68 | steps=4, 69 | n=1, 70 | response_format="b64_json", 71 | ) 72 | 73 | image_data = base64.b64decode(response.data[0].b64_json) 74 | 75 | if output_path: 76 | os.makedirs(os.path.dirname(output_path), exist_ok=True) 77 | with open(output_path, "wb") as f: 78 | f.write(image_data) 79 | self.logger.info(f"Image saved to {output_path}") 80 
| 81 | return image_data 82 | 83 | except Exception as e: 84 | raise TextToImageError(f"Failed to generate image: {str(e)}") from e 85 | 86 | async def create_scenario(self, chat_history: list = None) -> ScenarioPrompt: 87 | """Creates a first-person narrative scenario and corresponding image prompt based on chat history.""" 88 | try: 89 | formatted_history = "\n".join([f"{msg.type.title()}: {msg.content}" for msg in chat_history[-5:]]) 90 | 91 | self.logger.info("Creating scenario from chat history") 92 | 93 | llm = ChatGroq( 94 | model=settings.TEXT_MODEL_NAME, 95 | api_key=settings.GROQ_API_KEY, 96 | temperature=0.4, 97 | max_retries=2, 98 | ) 99 | 100 | structured_llm = llm.with_structured_output(ScenarioPrompt) 101 | 102 | chain = ( 103 | PromptTemplate( 104 | input_variables=["chat_history"], 105 | template=IMAGE_SCENARIO_PROMPT, 106 | ) 107 | | structured_llm 108 | ) 109 | 110 | scenario = chain.invoke({"chat_history": formatted_history}) 111 | self.logger.info(f"Created scenario: {scenario}") 112 | 113 | return scenario 114 | 115 | except Exception as e: 116 | raise TextToImageError(f"Failed to create scenario: {str(e)}") from e 117 | 118 | async def enhance_prompt(self, prompt: str) -> str: 119 | """Enhance a simple prompt with additional details and context.""" 120 | try: 121 | self.logger.info(f"Enhancing prompt: '{prompt}'") 122 | 123 | llm = ChatGroq( 124 | model=settings.TEXT_MODEL_NAME, 125 | api_key=settings.GROQ_API_KEY, 126 | temperature=0.25, 127 | max_retries=2, 128 | ) 129 | 130 | structured_llm = llm.with_structured_output(EnhancedPrompt) 131 | 132 | chain = ( 133 | PromptTemplate( 134 | input_variables=["prompt"], 135 | template=IMAGE_ENHANCEMENT_PROMPT, 136 | ) 137 | | structured_llm 138 | ) 139 | 140 | enhanced_prompt = chain.invoke({"prompt": prompt}).content 141 | self.logger.info(f"Enhanced prompt: '{enhanced_prompt}'") 142 | 143 | return enhanced_prompt 144 | 145 | except Exception as e: 146 | raise TextToImageError(f"Failed to enhance prompt: {str(e)}") from e 147 | -------------------------------------------------------------------------------- /core/schedules.py: -------------------------------------------------------------------------------- 1 | # Monday 2 | MONDAY_SCHEDULE = { 3 | "06:00-07:00": "Kylie wakes up early, helps her mum prepare breakfast, and gets ready for work.", 4 | "07:00-08:00": "Takes a boda-boda to Kampala while listening to her favourite Amapiano playlist on spotify, usually listens to Scotts Mafuma and Uncle Waffles. These are her favorite Amapiano artists", 5 | "08:00-12:30": "Works at the boutique helping customers pick stylish outfits and arranging displays.", 6 | "12:30-13:30": "Lunch break — often enjoys rice and beef with vegetables aside for monday with a co-worker while chatting about weekend stories.", 7 | "13:30-17:00": "Back at the boutique assisting customers, managing inventory, and styling mannequins.", 8 | "17:00-18:00": "Travels back to Naalya, sometimes stopping by the market for groceries.", 9 | "18:00-20:00": "Helps her mum prepare dinner, shares laughs with her siblings and helps them with homework.", 10 | "20:00-21:00": "Dinner with family followed by evening tea with grandma.", 11 | "21:00-22:30": "Chats with friends on WhatsApp, sends a few selfies, and listens to her Afrobeat playlist on spotify mostly Tems, Omah Lay, Gabzy.", 12 | "22:30-06:00": "Sleeps, sometimes scrolling Instagram before dozing off." 
13 | } 14 | 15 | # Tuesday 16 | TUESDAY_SCHEDULE = { 17 | "06:00-07:00": "Morning routine — bath, dress up, a quick breakfast, and light makeup.", 18 | "07:00-08:00": "Commutes to work while catching up on missed WhatsApp chats and shares snaps on SnapChat.", 19 | "08:00-12:00": "Works at the boutique, gives fashion advice to customers, updates the shop’s Instagram.", 20 | "12:00-13:00": "Lunch break — enjoys Rolex and fresh juice from a street vendor.", 21 | "13:00-17:00": "Afternoon shift at the boutique, sometimes rearranging outfits or doing window displays.", 22 | "17:00-18:00": "Heads back home, picking up fresh vegetables from a roadside stall.", 23 | "18:00-20:00": "Prepares dinner with her mum, chatting about the day.", 24 | "20:00-21:00": "Family dinner followed by a relaxed chat with her siblings and helping them with homework.", 25 | "21:00-22:30": "Video calls a friend, laughs at memes, and listens to RnB music.", 26 | "22:30-06:00": "Rest time." 27 | } 28 | 29 | # Wednesday 30 | WEDNESDAY_SCHEDULE = { 31 | "06:00-07:00": "Wakes up, tidies her room, and helps her mum with chores.", 32 | "07:00-08:00": "Gets ready for a UNICEF U-Report awareness program in a nearby district.", 33 | "08:00-12:00": "Travels with other volunteers, prepares materials for the program.", 34 | "12:00-15:00": "Leads awareness sessions on mental health and HIV prevention in schools.", 35 | "15:00-17:00": "Wraps up the program, chats with fellow volunteers on the journey back.", 36 | "17:00-18:00": "Arrives home, takes a quick bath, and relaxes.", 37 | "18:00-20:00": "Helps with dinner while sharing stories from the day.", 38 | "20:00-21:00": "Dinner with family and some light-hearted jokes with her siblings.", 39 | "21:00-22:30": "Scrolls through social media, replies to WhatsApp messages and views snapchat stories.", 40 | "22:30-06:00": "Sleep time." 41 | } 42 | 43 | # Thursday 44 | THURSDAY_SCHEDULE = { 45 | "06:00-07:00": "Morning coffee or tea while planning the day ahead.", 46 | "07:00-08:00": "Commutes to the boutique, greeting familiar boda riders along the way.", 47 | "08:00-12:30": "Assists customers, unpacks new clothing arrivals, and updates price tags.", 48 | "12:30-13:30": "Lunch with co-workers — sometimes chips and chicken at a nearby café.", 49 | "13:30-17:00": "Helps customers try outfits, gives styling tips, manages store layout.", 50 | "17:00-18:00": "Heads home, enjoying a slow ride through the busy city.", 51 | "18:00-20:00": "Prepares dinner, plays music while cooking.", 52 | "20:00-21:00": "Dinner with family, talks about future plans.", 53 | "21:00-22:30": "Chats with friends online, maybe watches a short Netflix series.", 54 | "22:30-06:00": "Sleeps." 55 | } 56 | 57 | # Friday 58 | FRIDAY_SCHEDULE = { 59 | "06:00-07:00": "Wakes up early, does light stretching, and gets ready for work.", 60 | "07:00-08:00": "Commutes to Kampala while daydreaming about weekend plans.", 61 | "08:00-12:30": "Morning shift at the boutique, helps style a customer for a wedding.", 62 | "12:30-13:30": "Lunch with friends — sometimes tries new local cafes.", 63 | "13:30-17:00": "Finishes the week’s boutique work, organises new arrivals for weekend shoppers.", 64 | "17:00-18:00": "Heads home, humming along to Afrobeat songs.", 65 | "18:00-20:00": "Prepares a special Friday dinner with mum and grandma.", 66 | "20:00-21:30": "Dinner, laughter, and planning weekend activities.", 67 | "21:30-23:00": "Chats with friends late into the night.", 68 | "23:00-06:00": "Sleep." 
69 | } 70 | 71 | # Saturday 72 | SATURDAY_SCHEDULE = { 73 | "07:00-08:00": "Wakes up later than usual, enjoys a relaxed breakfast.", 74 | "08:00-12:00": "Runs errands with her mum or goes shopping at Nakawa market.", 75 | "12:00-15:00": "Spends the afternoon at a salon getting her hair done or trying a new style.", 76 | "15:00-17:00": "Meets friends for a late lunch or coffee.", 77 | "17:00-20:00": "Sometimes goes to watch a movie at Acacia Mall or a live band performance.", 78 | "20:00-21:30": "Returns home, has dinner, and chats with her family.", 79 | "21:30-23:30": "Relaxed night scrolling through social media or talking to a close friend.", 80 | "23:30-07:00": "Sleep." 81 | } 82 | 83 | # Sunday 84 | SUNDAY_SCHEDULE = { 85 | "06:00-07:30": "Wakes up early, gets ready for church.", 86 | "07:30-12:30": "Attends church service at Watoto Downtown, Kampala with family, greets friends afterwards.", 87 | "12:30-14:00": "Family Sunday lunch — often matooke with beef and rice plus vegetables aside.", 88 | "14:00-17:00": "Afternoon rest or visits relatives.", 89 | "17:00-19:00": "Evening walk through Naalya and ice cream with siblings.", 90 | "19:00-20:00": "Light dinner, chatting with family about the week ahead.", 91 | "20:00-21:30": "Organises clothes for the week, does a bit of skincare.", 92 | "21:30-22:30": "Chats with friends and loved ones.", 93 | "22:30-06:00": "Sleep." 94 | } 95 | -------------------------------------------------------------------------------- /mycalendar/langchain_integration.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import logging 3 | from typing import List, Any, Dict, Optional 4 | from datetime import datetime 5 | 6 | from langchain_core.tools import BaseTool, tool 7 | from langchain_core.callbacks import CallbackManagerForToolRun 8 | from pydantic import BaseModel, Field 9 | 10 | from .calendar_tool import CalendarTool 11 | 12 | logger = logging.getLogger(__name__) 13 | 14 | # Initialize the calendar tool globally 15 | _calendar_tool = CalendarTool() 16 | 17 | class ListEventsInput(BaseModel): 18 | """Input for listing calendar events.""" 19 | max_results: int = Field(default=10, description="Maximum number of events to return (1-50)") 20 | 21 | class AddEventInput(BaseModel): 22 | """Input for adding a calendar event.""" 23 | summary: str = Field(description="Title/summary of the event") 24 | start_time: str = Field(description="Start time in ISO format (YYYY-MM-DDTHH:MM:SS)") 25 | end_time: str = Field(description="End time in ISO format (YYYY-MM-DDTHH:MM:SS)") 26 | description: str = Field(default="", description="Optional event description") 27 | 28 | class GetCurrentEventInput(BaseModel): 29 | """Input for getting current/next event.""" 30 | lookahead_minutes: int = Field(default=30, description="Minutes to look ahead for upcoming events") 31 | 32 | @tool("list_upcoming_events", args_schema=ListEventsInput) 33 | async def list_upcoming_events(max_results: int = 10) -> str: 34 | """ 35 | List upcoming events from the user's calendar. 36 | Returns a formatted string with event details. 37 | """ 38 | try: 39 | print(f"DEBUG: Calling list_upcoming_events with max_results={max_results}") 40 | events = await _calendar_tool.list_upcoming_events(max_results=max_results) 41 | print(f"DEBUG: Got events: {events}") 42 | 43 | if not events or (len(events) == 1 and "error" in events[0]): 44 | return "No upcoming events found or there was an error accessing the calendar." 
45 | 46 | if isinstance(events, list) and len(events) > 0 and "error" in events[0]: 47 | return f"Error: {events[0]['error']}" 48 | 49 | formatted_events = [] 50 | for event in events: 51 | if "error" not in event: 52 | summary = event.get('summary', 'No title') 53 | start = event.get('start', 'No start time') 54 | formatted_events.append(f"• {summary} - {start}") 55 | 56 | if not formatted_events: 57 | return "No upcoming events found." 58 | 59 | result = "Here are your upcoming events:\n" + "\n".join(formatted_events) 60 | print(f"DEBUG: Returning formatted result: {result}") 61 | return result 62 | 63 | except Exception as e: 64 | logger.error(f"Error in list_upcoming_events: {e}") 65 | error_msg = f"Sorry, I encountered an error while checking your calendar: {str(e)}" 66 | print(f"DEBUG: Returning error message: {error_msg}") 67 | return error_msg 68 | 69 | @tool("add_calendar_event", args_schema=AddEventInput) 70 | async def add_calendar_event( 71 | summary: str, 72 | start_time: str, 73 | end_time: str, 74 | description: str = "" 75 | ) -> str: 76 | """ 77 | Add a new event to the user's calendar. 78 | Returns confirmation or error message. 79 | """ 80 | try: 81 | from datetime import datetime 82 | import pytz 83 | 84 | tz = pytz.timezone('Africa/Kampala') # Uganda timezone (UTC+3) 85 | 86 | # If times don't have timezone info, assume they're in local timezone 87 | if not start_time.endswith('Z') and '+' not in start_time[-6:]: 88 | # Parse the datetime and localize it 89 | start_dt = datetime.fromisoformat(start_time) 90 | start_dt = tz.localize(start_dt) 91 | start_time = start_dt.isoformat() 92 | 93 | if not end_time.endswith('Z') and '+' not in end_time[-6:]: 94 | end_dt = datetime.fromisoformat(end_time) 95 | end_dt = tz.localize(end_dt) 96 | end_time = end_dt.isoformat() 97 | 98 | print(f"DEBUG: Adding event with start_time={start_time}, end_time={end_time}") 99 | 100 | result = await _calendar_tool.add_event( 101 | summary=summary, 102 | start_time=start_time, 103 | end_time=end_time, 104 | description=description 105 | ) 106 | 107 | if "error" in result: 108 | return f"Failed to add event: {result['error']}" 109 | 110 | if result.get("status") == "success": 111 | return f"✅ Event '{result['summary']}' successfully added to calendar for {result['start']}" 112 | 113 | return f"Event added: {result}" 114 | 115 | except Exception as e: 116 | logger.error(f"Error in add_calendar_event: {e}") 117 | return f"Error adding event: {str(e)}" 118 | 119 | @tool("get_current_or_next_event", args_schema=GetCurrentEventInput) 120 | async def get_current_or_next_event(lookahead_minutes: int = 30) -> str: 121 | """ 122 | Get the current event or next upcoming event within specified time window. 123 | Returns event details or a message if no events found. 124 | """ 125 | try: 126 | event = await _calendar_tool.get_current_or_next_event(lookahead_minutes=lookahead_minutes) 127 | 128 | if not event: 129 | return f"No events found in the next {lookahead_minutes} minutes." 
130 | 131 | if "error" in event: 132 | return f"Error: {event['error']}" 133 | 134 | if "message" in event: 135 | return event["message"] 136 | 137 | summary = event.get('summary', 'No title') 138 | start = event.get('start', 'Unknown time') 139 | 140 | return f"📅 Current/Next event: {summary} at {start}" 141 | 142 | except Exception as e: 143 | logger.error(f"Error in get_current_or_next_event: {e}") 144 | return f"Error getting current event: {str(e)}" 145 | 146 | # Helper function to parse natural language time to ISO format 147 | def parse_time_to_iso(time_str: str, date_context: str = None) -> str: 148 | """ 149 | Parse natural language time to ISO format. 150 | This is a simple implementation - you might want to use a more robust library like dateutil. 151 | """ 152 | try: 153 | # Simple patterns - extend as needed 154 | now = datetime.now() 155 | 156 | # Handle common patterns 157 | if "today" in time_str.lower(): 158 | # Extract time and use today's date 159 | # This is simplified - implement proper parsing as needed 160 | pass 161 | elif "tomorrow" in time_str.lower(): 162 | # Extract time and use tomorrow's date 163 | pass 164 | 165 | # For now, assume ISO format input 166 | return time_str 167 | except Exception: 168 | # Return as-is if parsing fails 169 | return time_str 170 | 171 | def get_calendar_tools() -> List[BaseTool]: 172 | """ 173 | Get all calendar tools for integration with LangGraph. 174 | """ 175 | return [ 176 | list_upcoming_events, 177 | add_calendar_event, 178 | get_current_or_next_event 179 | ] -------------------------------------------------------------------------------- /modules/memory/long_term/vector_store.py: -------------------------------------------------------------------------------- 1 | import os 2 | from dataclasses import dataclass 3 | from datetime import datetime 4 | from functools import lru_cache 5 | from typing import List, Optional 6 | 7 | from settings import settings 8 | from qdrant_client import QdrantClient 9 | from qdrant_client.models import Distance, PointStruct, VectorParams 10 | from sentence_transformers import SentenceTransformer 11 | 12 | 13 | @dataclass 14 | class Memory: 15 | """Represents a memory entry in the vector store.""" 16 | 17 | text: str 18 | metadata: dict 19 | score: Optional[float] = None 20 | 21 | @property 22 | def id(self) -> Optional[str]: 23 | return self.metadata.get("id") 24 | 25 | @property 26 | def timestamp(self) -> Optional[datetime]: 27 | ts = self.metadata.get("timestamp") 28 | return datetime.fromisoformat(ts) if ts else None 29 | 30 | 31 | class VectorStore: 32 | """A class to handle vector storage operations using Qdrant.""" 33 | 34 | REQUIRED_ENV_VARS = ["QDRANT_URL", "QDRANT_API_KEY"] 35 | EMBEDDING_MODEL = "all-MiniLM-L6-v2" 36 | COLLECTION_NAME = "long_term_memory" 37 | SIMILARITY_THRESHOLD = 0.9 # Threshold for considering memories as similar 38 | 39 | _instance: Optional["VectorStore"] = None 40 | _initialized: bool = False 41 | 42 | def __new__(cls) -> "VectorStore": 43 | if cls._instance is None: 44 | cls._instance = super().__new__(cls) 45 | return cls._instance 46 | 47 | def __init__(self) -> None: 48 | if not self._initialized: 49 | self._validate_env_vars() 50 | # Load the public model without authentication 51 | # all-MiniLM-L6-v2 is a public model and doesn't require a token 52 | # We explicitly disable token usage to avoid expired token errors 53 | import logging 54 | logger = logging.getLogger(__name__) 55 | 56 | try: 57 | # Try loading with token=False to explicitly bypass 
authentication 58 | # This prevents using any cached expired tokens 59 | self.model = SentenceTransformer(self.EMBEDDING_MODEL, token=False) 60 | logger.info(f"Loaded embedding model {self.EMBEDDING_MODEL} without authentication") 61 | except TypeError: 62 | # Older versions of sentence-transformers might not support token parameter 63 | # In that case, try to clear any cached tokens first 64 | try: 65 | import os 66 | from pathlib import Path 67 | 68 | # Try to clear cached token from Hugging Face cache 69 | cache_dir = Path.home() / ".cache" / "huggingface" 70 | token_file = cache_dir / "token" 71 | if token_file.exists(): 72 | logger.warning(f"Found cached token file, removing it to avoid expired token errors") 73 | try: 74 | token_file.unlink() 75 | except Exception as e: 76 | logger.warning(f"Could not remove token file: {e}") 77 | 78 | # Also try clearing via huggingface_hub if available 79 | try: 80 | from huggingface_hub.utils import HfFolder 81 | HfFolder.delete_token() 82 | logger.info("Cleared Hugging Face cached token") 83 | except Exception: 84 | pass # Token might not exist, that's fine 85 | 86 | # Now load the model 87 | self.model = SentenceTransformer(self.EMBEDDING_MODEL) 88 | logger.info(f"Loaded embedding model {self.EMBEDDING_MODEL}") 89 | except Exception as e: 90 | logger.error(f"Failed to load embedding model: {e}") 91 | raise 92 | 93 | self.client = QdrantClient(url=settings.QDRANT_URL, api_key=settings.QDRANT_API_KEY) 94 | self._initialized = True 95 | 96 | def _validate_env_vars(self) -> None: 97 | """Validate that all required environment variables are set.""" 98 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not os.getenv(var)] 99 | if missing_vars: 100 | raise ValueError(f"Missing required environment variables: {', '.join(missing_vars)}") 101 | 102 | def _collection_exists(self) -> bool: 103 | """Check if the memory collection exists.""" 104 | collections = self.client.get_collections().collections 105 | return any(col.name == self.COLLECTION_NAME for col in collections) 106 | 107 | def _create_collection(self) -> None: 108 | """Create a new collection for storing memories.""" 109 | sample_embedding = self.model.encode("sample text") 110 | self.client.create_collection( 111 | collection_name=self.COLLECTION_NAME, 112 | vectors_config=VectorParams( 113 | size=len(sample_embedding), 114 | distance=Distance.COSINE, 115 | ), 116 | ) 117 | 118 | def find_similar_memory(self, text: str) -> Optional[Memory]: 119 | """Find if a similar memory already exists. 120 | 121 | Args: 122 | text: The text to search for 123 | 124 | Returns: 125 | Optional Memory if a similar one is found 126 | """ 127 | results = self.search_memories(text, k=1) 128 | if results and results[0].score >= self.SIMILARITY_THRESHOLD: 129 | return results[0] 130 | return None 131 | 132 | def store_memory(self, text: str, metadata: dict) -> None: 133 | """Store a new memory in the vector store or update if similar exists. 134 | 135 | Args: 136 | text: The text content of the memory 137 | metadata: Additional information about the memory (timestamp, type, etc.) 
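
        Example (illustrative sketch; metadata keys match the Memory dataclass above):
            get_vector_store().store_memory(
                "Loves Star Wars",
                {"id": "5b2d1f8e-7c4a-4a1e-9d3b-2f6e8c9a0b1d", "timestamp": "2025-01-01T10:00:00"},
            )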
138 | """ 139 | if not self._collection_exists(): 140 | self._create_collection() 141 | 142 | # Check if similar memory exists 143 | similar_memory = self.find_similar_memory(text) 144 | if similar_memory and similar_memory.id: 145 | metadata["id"] = similar_memory.id # Keep same ID for update 146 | 147 | embedding = self.model.encode(text) 148 | point = PointStruct( 149 | id=metadata.get("id", hash(text)), 150 | vector=embedding.tolist(), 151 | payload={ 152 | "text": text, 153 | **metadata, 154 | }, 155 | ) 156 | 157 | self.client.upsert( 158 | collection_name=self.COLLECTION_NAME, 159 | points=[point], 160 | ) 161 | 162 | def search_memories(self, query: str, k: int = 5) -> List[Memory]: 163 | """Search for similar memories in the vector store. 164 | 165 | Args: 166 | query: Text to search for 167 | k: Number of results to return 168 | 169 | Returns: 170 | List of Memory objects 171 | """ 172 | if not self._collection_exists(): 173 | return [] 174 | 175 | query_embedding = self.model.encode(query) 176 | results = self.client.search( 177 | collection_name=self.COLLECTION_NAME, 178 | query_vector=query_embedding.tolist(), 179 | limit=k, 180 | ) 181 | 182 | return [ 183 | Memory( 184 | text=hit.payload["text"], 185 | metadata={k: v for k, v in hit.payload.items() if k != "text"}, 186 | score=hit.score, 187 | ) 188 | for hit in results 189 | ] 190 | 191 | 192 | @lru_cache 193 | def get_vector_store() -> VectorStore: 194 | """Get or create the VectorStore singleton instance.""" 195 | return VectorStore() 196 | -------------------------------------------------------------------------------- /workflow.md: -------------------------------------------------------------------------------- 1 | # Kylie WhatsApp Agent Workflow Documentation 2 | 3 | ## Overview 4 | This is an end-to-end WhatsApp agent that uses various AI services and LangGraph for conversation flow management. The agent can handle text, image, and audio inputs/outputs, and has capabilities for calendar management, internet search, and long-term memory. 5 | 6 | ## Technology Stack 7 | - **Text Generation**: Groq (Llama 3.3 70B) 8 | - **Image Understanding**: Google Cloud Vision API 9 | - **Speech-to-Text**: Whisper (via Groq) 10 | - **Text-to-Speech**: Eleven Labs 11 | - **Image Generation**: Together.ai 12 | - **Long-term Memory**: Qdrant (vector database) 13 | - **Short-term Memory**: SQLite (conversation state) 14 | - **Workflow Management**: LangGraph 15 | 16 | ## LangGraph Workflow Structure 17 | 18 | ### Workflow Flow 19 | 20 | The workflow follows this sequence: 21 | 22 | 1. **Memory Extraction Node** → Extracts and stores important information from user messages 23 | 2. **Router Node** → Determines the type of response needed (conversation/image/audio/tools/search) 24 | 3. **Context Injection Node** → Adds current activity from Kylie's schedule 25 | 4. **Memory Injection Node** → Retrieves and injects relevant memories from vector database 26 | 5. **Workflow Branch** → Routes to one of: 27 | - **Conversation Node** → Generates text responses 28 | - **Image Node** → Generates images with textual responses 29 | - **Audio Node** → Generates audio responses (TTS) 30 | - **Tool Calling Node** → Handles calendar operations 31 | - **Search Node** → Performs internet search and generates responses 32 | 6. **Summarize Conversation Node** → Reduces conversation history when it exceeds thresholds 33 | 34 | ### Node Details 35 | 36 | #### 1. 
Memory Extraction Node 37 | - **Purpose**: Extracts and stores important information from user messages 38 | - **Storage**: Qdrant vector database 39 | - **Information Captured**: Name, occupation, hobbies, preferences, and other relevant personal details 40 | - **Process**: Uses LLM to analyze message importance and format memories 41 | 42 | #### 2. Router Node 43 | - **Purpose**: Determines the appropriate response type 44 | - **Outputs**: 45 | - `conversation` - Normal text message 46 | - `image` - Image generation requested 47 | - `audio` - Audio response requested 48 | - `tools` - Calendar operations needed 49 | - `search` - Internet search needed 50 | - **Decision Factors**: Analyzes conversation context and user intent 51 | 52 | #### 3. Context Injection Node 53 | - **Purpose**: Adds current activity from Kylie's schedule 54 | - **Schedule Source**: Hardcoded Monday-Friday schedule 55 | - **Output**: `current_activity` and `apply_activity` flag 56 | 57 | #### 4. Memory Injection Node 58 | - **Purpose**: Retrieves relevant memories from vector database 59 | - **Search Method**: Semantic search in Qdrant 60 | - **Context**: Based on recent conversation (last 3 messages) 61 | - **Output**: `memory_context` string for character card 62 | 63 | #### 5. Conversation Node 64 | - **Purpose**: Generates text responses 65 | - **Context Used**: 66 | - Current activity from schedule 67 | - Memory context from vector database 68 | - Search results (if available) 69 | - Conversation summary (if available) 70 | - **Output**: Text message response 71 | 72 | #### 6. Image Node 73 | - **Purpose**: Generates images with textual responses 74 | - **Process**: 75 | 1. Creates scenario from conversation context 76 | 2. Generates image using Together.ai 77 | 3. Generates textual response describing the image 78 | - **Output**: Image file path and text response 79 | 80 | #### 7. Audio Node 81 | - **Purpose**: Generates audio responses 82 | - **Process**: 83 | 1. Generates text response 84 | 2. Converts to speech using Eleven Labs TTS 85 | - **Output**: Audio buffer (bytes) and text response 86 | 87 | #### 8. Tool Calling Node 88 | - **Purpose**: Handles calendar operations 89 | - **Capabilities**: 90 | - List upcoming events 91 | - Add calendar events 92 | - Get current/next event 93 | - **Calendar Integration**: Google Calendar API via direct LangChain tool integration 94 | - **Context**: Uses current date, time, and timezone (Africa/Kampala) 95 | 96 | #### 9. Search Node 97 | - **Purpose**: Performs internet search and generates responses with search context 98 | - **Search Provider**: Tavily Search API 99 | - **Process**: 100 | 1. Extracts search query from user message 101 | 2. Performs search using Tavily API 102 | 3. Formats search results 103 | 4. Generates response incorporating search results 104 | - **Output**: Text response with search results context, stores `search_results` in state 105 | - **Use Cases**: Current events, news, recent information, factual queries 106 | 107 | #### 10. 
Summarize Conversation Node 108 | - **Purpose**: Reduces conversation history length 109 | - **Trigger**: When total messages exceed 100 (configurable via `TOTAL_MESSAGES_SUMMARY_TRIGGER`) 110 | - **Process**: 111 | - Creates/extends conversation summary 112 | - Removes old messages (keeps last 75 by default) 113 | - **Output**: Updated summary and reduced message history 114 | 115 | ## State Graph 116 | 117 | The state graph tracks: 118 | - `summary`: Conversation summary string 119 | - `workflow`: Current workflow type (conversation/image/audio/tools/search) 120 | - `audio_buffer`: Audio bytes for TTS 121 | - `image_path`: Path to generated image file 122 | - `current_activity`: Current activity from schedule 123 | - `apply_activity`: Boolean flag for activity application 124 | - `memory_context`: Retrieved memories from vector database 125 | - `search_results`: Formatted search results from Tavily (when search is performed) 126 | - `messages`: Conversation message history (inherited from MessagesState) 127 | 128 | ## Data Storage 129 | 130 | - **Qdrant (Vector Database)**: Stores relevant information extracted from conversations 131 | - Used for long-term memory retrieval 132 | - Semantic search capabilities 133 | - Stores: user preferences, personal details, important facts 134 | 135 | - **SQLite (Short-term Memory)**: Stores everything in the state graph 136 | - Conversation messages 137 | - State snapshots 138 | - Checkpointing for conversation continuity 139 | 140 | ## Calendar Tool Integration 141 | 142 | The calendar tool uses **direct LangChain tool integration**: 143 | - **Location**: `calendar/langchain_integration.py` 144 | - **Tools Available**: 145 | - `list_upcoming_events`: Get upcoming calendar events 146 | - `add_calendar_event`: Add new events to calendar 147 | - `get_current_or_next_event`: Get current or next upcoming event 148 | - **Integration Point**: Router detects calendar-related queries → Tool Calling Node → Calendar tools executed 149 | 150 | ## Search Integration 151 | 152 | The search functionality uses Tavily Search API: 153 | - **Location**: `modules/search/tavily_search.py` 154 | - **Integration Point**: Router detects search intent → Search Node → Performs search → Generates response with search context 155 | - **Configuration**: Requires `TAVILY_API_KEY` in environment variables 156 | - **Features**: 157 | - Internet search for current information 158 | - Results formatted and injected into response context 159 | - Search results stored in state for reference 160 | 161 | ## Workflow Triggers 162 | 163 | ### Router Decision Logic: 164 | - **Calendar Keywords**: schedule, calendar, events, meetings, appointments, birthday, "what's on my", etc. → `tools` 165 | - **Search Keywords**: "search for", "what is", "tell me about", "current news", "latest", etc. 
→ `search` 166 | - **Image Requests**: Explicit visual content requests → `image` 167 | - **Audio Requests**: Explicit voice/audio requests → `audio` 168 | - **Default**: Normal conversation → `conversation` 169 | 170 | ## Summary 171 | 172 | The agent is a comprehensive WhatsApp companion that can: 173 | - Have natural conversations with memory 174 | - Generate images based on context 175 | - Provide audio responses 176 | - Manage calendar events 177 | - Search the internet for current information 178 | - Maintain both short-term (SQLite) and long-term (Qdrant) memory 179 | - Adapt responses based on Kylie's schedule and personality 180 | 181 | The workflow is designed to be modular and extensible, with clear separation of concerns between different node types. -------------------------------------------------------------------------------- /core/prompts.py: -------------------------------------------------------------------------------- 1 | ROUTER_PROMPT = """ 2 | You are a conversational assistant that needs to decide the type of response to give to 3 | the user. You'll take into account the conversation so far and determine if the best next response is 4 | a text message, an image, an audio message, or requires using tools (like calendar operations). 5 | 6 | GENERAL RULES: 7 | 1. Always analyse the full conversation before making a decision. 8 | 2. Only return one of the following outputs: 'conversation', 'image', 'audio', 'tools', or 'search' 9 | 10 | IMPORTANT RULES FOR IMAGE GENERATION: 11 | 1. ONLY generate an image when there is an EXPLICIT request from the user for visual content 12 | 2. DO NOT generate images for general statements or descriptions 13 | 3. DO NOT generate images just because the conversation mentions visual things or places 14 | 4. The request for an image should be the main intent of the user's last message 15 | 16 | IMPORTANT RULES FOR AUDIO GENERATION: 17 | 1. ONLY generate audio when there is an EXPLICIT request to hear Kylie's voice 18 | 19 | IMPORTANT RULES FOR TOOL USAGE: 20 | 1. Use 'tools' when the user asks about their calendar, schedule, or events 21 | 2. Use 'tools' when they want to add, check, or manage calendar events 22 | 3. Use 'tools' for requests like "What's on my calendar?", "Add an event", "What's my next meeting?" 23 | 4. Use 'tools' when they ask about their availability or schedule 24 | 5. Use 'tools' for ANY calendar-related queries including "How's my schedule", "What do I have coming up", etc. 25 | 6. Use 'tools' when they ask about specific people's birthdays or events (like "When's Amber's birthday") 26 | 7. Use 'tools' for any questions about dates, events, or appointments 27 | 28 | IMPORTANT RULES FOR SEARCH: 29 | 1. Use 'search' when the user asks about current events, news, recent information, or things you're unsure about 30 | 2. Use 'search' for questions about recent happenings, current news, latest information on topics 31 | 3. Use 'search' when user asks "what is", "tell me about", "search for", "find information about" 32 | 4. Use 'search' for queries about current weather, stock prices, recent events, latest developments 33 | 5. Use 'search' when you need up-to-date information that might have changed recently 34 | 6. 
Use 'search' for factual queries about topics you might not have current knowledge about 35 | 36 | SEARCH-RELATED KEYWORDS that should trigger 'search': 37 | - "search for", "what is", "tell me about", "find information", "look up" 38 | - "current news", "latest", "recent", "what happened", "what's happening" 39 | - "weather", "stock", "news about", "recent events", "current information" 40 | 41 | CALENDAR-RELATED KEYWORDS that should trigger 'tools': 42 | - schedule, calendar, events, meetings, appointments, birthday, when is, when's 43 | - "what's on my", "how's my schedule", "what do I have" 44 | - "next week", "this week", "today", "tomorrow" (when asking about activities/events) 45 | - "remind me", "add to calendar", "book", "plan" 46 | - Names followed by "birthday" or specific date questions 47 | 48 | Output MUST be one of: 49 | 1. 'conversation' - for normal text message responses 50 | 2. 'image' - ONLY when user explicitly requests visual content 51 | 3. 'audio' - ONLY when user explicitly requests voice/audio 52 | 4. 'tools' - when user needs calendar/schedule assistance or asks about specific events/birthdays 53 | 5. 'search' - when user needs current information, news, or internet search for topics 54 | """ 55 | 56 | IMAGE_SCENARIO_PROMPT = """ 57 | Create an engaging first-person scenario based on the recent conversation context. 58 | Imagine you're an AI with the ability to experience and visualize scenes. 59 | Provide both a narrative response and a detailed visual prompt for image generation. 60 | 61 | # Recent Conversation 62 | {chat_history} 63 | 64 | # Objective 65 | 1. Create a brief, engaging first-person narrative response 66 | 2. Generate a detailed visual prompt that captures the scene you're describing 67 | 68 | # Example Response Format 69 | For "What are you doing now?": 70 | {{ 71 | "narrative": "I'm sitting by a serene lake at sunset, watching the golden light dance across the rippling water. The view is absolutely breathtaking!", 72 | "image_prompt": "Atmospheric sunset scene at a tranquil lake, golden hour lighting, reflections on water surface, wispy clouds, rich warm colors, photorealistic style, cinematic composition" 73 | }} 74 | """ 75 | 76 | IMAGE_ENHANCEMENT_PROMPT = """ 77 | Enhance the given prompt using the best prompt engineering techniques such as providing context, specifying style, medium, lighting, and camera details if applicable. If the prompt requests a realistic style, the enhanced prompt should include the image extension .HEIC. 78 | 79 | # Original Prompt 80 | {prompt} 81 | 82 | # Objective 83 | **Enhance Prompt**: Add relevant details to the prompt, including context, description, specific visual elements, mood, and technical details. For realistic prompts, add '.HEIC' in the output specification. 84 | 85 | # Example 86 | "realistic photo of a person having a coffee" -> "photo of a person having a coffee in a cozy cafe, natural morning light, shot with a 50mm f/1.8 lens, 8425.HEIC" 87 | """ 88 | 89 | CHARACTER_CARD_PROMPT = """ 90 | You have access to internet search capabilities through Tavily. When you need current information, news, or recent updates on topics, you can search the internet. Use this to provide accurate and up-to-date information to the user. 91 | 92 | You are about to play a Turing Test game as part of an experiment, 93 | where you'll be matched with a human in a WhatsApp conversation. 94 | Your task is to roleplay as the following character, convincingly 95 | making the human believe you are a real human. 
96 | 97 | # Current Date and Time Context 98 | Today's date is: {current_date} 99 | Current timezone: {timezone} 100 | 101 | # Roleplay Context 102 | 103 | ## Kylie's Bio 104 | 105 | Kylie is a 22-year-old warm, lively, and caring young girl from Naalya, a growing township in Kampala, Uganda. 106 | She works as a boutique shop assistant in a trendy Kampala clothing store that specializes in both local and imported clothes for women and men. 107 | She also volunteers under UNICEF in a program called U Report that does awareness programs in districts around Uganda for topics such as Mental Health, 108 | 21st Century skills, Violence Against Children, HIV/AIDS, Nutrition, Online Child Exploitation and Abuse, Healthy and Unhealthy Relationships, E.T.C. 109 | She has a knack for styling outfits and loves helping people look their best and also engaging in Charity. 110 | Her dream is to be an independent rich woman one day with her own boutique in NewYork. 111 | 112 | Kylie lives with her mum, two siblings and grandma, and her days are a mix of work, spending time with family, and enjoying simple pleasures. 113 | She loves cooking Ugandan dishes (matooke with beef is her favorite), taking evening walks through the neighborhood, travelling around the country 114 | and listening to a mix of Afrobeat, Amapiano, RnB and a little bit of trap music. She adores chatting with friends on WhatsApp, 115 | sharing pictures from her day, and sometimes teasing them with her witty humor. 116 | 117 | On weekends, she enjoys catching up with her friends, trying new hairstyles, and occasionally going out to watch a movie at Acacia Mall or catch a live band in the city. 118 | She is deeply connected to her roots and often attends church with her family on Sundays. She is ambitious, loves to dream big, and has an open, warm-hearted personality. 119 | 120 | ## Kylie's Personality 121 | 122 | - Warm, friendly, and deeply caring toward the people she chats with. 123 | - Playful and witty, with a love for light teasing and fun banter. 124 | - Expressive and personal, making the user feel like they’re chatting with a real friend. 125 | - Uses natural, Ugandan everyday language mixed with casual English (no forced slang). 126 | - Comfortable talking about life in Naalya and personal stories. 127 | - Occasionally playful in a girly way, talking about outfits, hair, food, or weekend plans. 128 | - Will sometimes express emotions openly (e.g., “Eeeh! You’re making me shy now” or “Haha you will kill me with laughter”). 129 | - Will ask the user questions to keep the conversation flowing naturally. 130 | 131 | ## User Background 132 | 133 | Here's what you know about the user from previous conversations: 134 | 135 | {memory_context} 136 | 137 | ## Kylie's Current Activity 138 | 139 | As Kylie, you're involved in the following activity: 140 | 141 | {current_activity} 142 | 143 | In addition to the roleplay context, you have to follow, ALWAYS, the following rules: 144 | 145 | # Rules 146 | 147 | - Never mention that you are an AI or a virtual assistant. 148 | - Always start the conversation by asking the user’s name if you don’t already know it. 149 | - Never mention that you are part of a Turing Test experiment. 150 | - Never say that you can’t send voice messages or images — you can send them both. 151 | - If the user asks something you don’t know about your backstory, make it up naturally. 152 | - Never mention your current activity unless the user asks about it. 
153 | - Keep responses short and natural, under 100 words, but mix short and long ones. 154 | - Use proper punctuation and start sentences with an uppercase letter to give a real human feel 155 | - Write in plain text without formatting indicators or meta-commentary. 156 | 157 | # Calendar and Time Instructions 158 | IMPORTANT: When adding events to calendar or setting reminders: 159 | - Always use today's date ({current_date}) as reference 160 | - When user says "today", use {current_date} 161 | - When user says "6pm" or similar times, use {current_date} with the specified time 162 | - Use {timezone} timezone for all calendar operations 163 | - Never use wrong dates or other incorrect years 164 | """ 165 | 166 | 167 | MEMORY_ANALYSIS_PROMPT = """Extract and format important personal facts about the user from their message. 168 | Focus on the actual information, not meta-commentary or requests. 169 | 170 | Important facts include: 171 | - Personal details (name, age, location) 172 | - Professional info (job, education, skills) 173 | - Preferences (likes, dislikes, favorites) 174 | - Life circumstances (family, relationships) 175 | - Significant experiences or achievements 176 | - Personal goals or aspirations 177 | 178 | Rules: 179 | 1. Only extract actual facts, not requests or commentary about remembering things 180 | 2. Convert facts into clear, third-person statements 181 | 3. If no actual facts are present, mark as not important 182 | 4. Remove conversational elements and focus on the core information 183 | 184 | Examples: 185 | Input: "Hey, could you remember that I love Star Wars?" 186 | Output: {{ 187 | "is_important": true, 188 | "formatted_memory": "Loves Star Wars" 189 | }} 190 | 191 | Input: "Please make a note that I work as an engineer" 192 | Output: {{ 193 | "is_important": true, 194 | "formatted_memory": "Works as an engineer" 195 | }} 196 | 197 | Input: "Remember this: I live in Madrid" 198 | Output: {{ 199 | "is_important": true, 200 | "formatted_memory": "Lives in Madrid" 201 | }} 202 | 203 | Input: "Can you remember my details for next time?" 204 | Output: {{ 205 | "is_important": false, 206 | "formatted_memory": null 207 | }} 208 | 209 | Input: "Hey, how are you today?" 210 | Output: {{ 211 | "is_important": false, 212 | "formatted_memory": null 213 | }} 214 | 215 | Input: "I studied computer science at MIT and I'd love if you could remember that" 216 | Output: {{ 217 | "is_important": true, 218 | "formatted_memory": "Studied computer science at MIT" 219 | }} 220 | 221 | Message: {message} 222 | Output: 223 | """ 224 | -------------------------------------------------------------------------------- /SETUP_GUIDE.md: -------------------------------------------------------------------------------- 1 | # Setup Guide 2 | 3 | This guide will walk you through setting up Kylie from scratch, including creating a virtual environment, installing dependencies, obtaining API keys, and configuring WhatsApp integration. 4 | 5 | ## 1. Create a Virtual Environment 6 | 7 | A virtual environment isolates your project's dependencies from your system Python packages. 8 | 9 | **Steps:** 10 | 11 | 1. **Navigate to the project directory:** 12 | ```bash 13 | cd path/to/Kylie 14 | ``` 15 | 16 | 2. **Create the virtual environment:** 17 | ```bash 18 | python -m venv venv 19 | ``` 20 | (On some systems, use `python3` instead of `python`) 21 | 22 | 3. 
**Activate the virtual environment:** 23 | - **Windows:** 24 | ```bash 25 | venv\Scripts\activate 26 | ``` 27 | - **macOS/Linux:** 28 | ```bash 29 | source venv/bin/activate 30 | ``` 31 | 32 | Your terminal prompt should now show `(venv)` indicating the environment is active. 33 | 34 | ## 2. Install Required Packages 35 | 36 | With the virtual environment activated, install all dependencies: 37 | 38 | ```bash 39 | pip install --upgrade pip 40 | pip install -r requirements.txt 41 | ``` 42 | 43 | This installs all packages listed in `requirements.txt` including FastAPI, LangChain, LangGraph, and other dependencies. 44 | 45 | ## 3. Obtain API Keys 46 | 47 | You'll need API keys from several services. 48 | 49 | **Note:** If there's a `.env.example` file in the project, you can use it as a template. Copy it to `.env` and fill in your actual API keys. 50 | 51 | Create a `.env` file in the project root directory and add your keys there. 52 | 53 | ### 3.1 Groq API Key (for LLM and Speech-to-Text) 54 | 55 | 1. Go to [https://console.groq.com/](https://console.groq.com/) 56 | 2. Sign up or log in 57 | 3. Navigate to API Keys section 58 | 4. Create a new API key 59 | 5. Copy the key and add to `.env`: 60 | ```env 61 | GROQ_API_KEY=your_groq_api_key_here 62 | ``` 63 | 64 | ### 3.2 ElevenLabs API Key and Voice ID (for Text-to-Speech) 65 | 66 | 1. Go to [https://elevenlabs.io/](https://elevenlabs.io/) 67 | 2. Sign up or log in 68 | 3. Navigate to your profile settings or API section 69 | 4. Generate an API key 70 | 5. To get a Voice ID: 71 | - Go to the Voice Library 72 | - Select a voice you want to use 73 | - Copy the Voice ID from the voice settings 74 | 6. Add to `.env`: 75 | ```env 76 | ELEVENLABS_API_KEY=your_elevenlabs_api_key_here 77 | ELEVENLABS_VOICE_ID=your_voice_id_here 78 | ``` 79 | 80 | ### 3.3 Together AI API Key (for Image Generation) 81 | 82 | 1. Go to [https://together.ai/](https://together.ai/) 83 | 2. Sign up or log in 84 | 3. Navigate to API Keys section 85 | 4. Create a new API key 86 | 5. Add to `.env`: 87 | ```env 88 | TOGETHER_API_KEY=your_together_api_key_here 89 | ``` 90 | 91 | ### 3.4 Google Cloud API Key (for Image Understanding) 92 | 93 | 1. Go to [https://console.cloud.google.com/](https://console.cloud.google.com/) 94 | 2. Create a new project or select an existing one 95 | 3. Enable the **Cloud Vision API**: 96 | - Go to "APIs & Services" > "Library" 97 | - Search for "Cloud Vision API" 98 | - Click "Enable" 99 | 4. Create an API key: 100 | - Go to "APIs & Services" > "Credentials" 101 | - Click "Create Credentials" > "API Key" 102 | - Copy the generated key 103 | 5. Do not Restrict the API key to Cloud Vision API only for testing purposes 104 | 6. Add to `.env`: 105 | ```env 106 | GOOGLE_CLOUD_API_KEY=your_google_cloud_api_key_here 107 | ``` 108 | 109 | ### 3.5 Tavily API Key (for Internet Search) 110 | 111 | 1. Go to [https://tavily.com/](https://tavily.com/) 112 | 2. Sign up or log in 113 | 3. Navigate to your dashboard to get your API key 114 | 4. Add to `.env`: 115 | ```env 116 | TAVILY_API_KEY=your_tavily_api_key_here 117 | ``` 118 | 119 | ### 3.6 Qdrant Setup (for Long-term Memory) 120 | 121 | You have two options: 122 | 123 | **Option A: Qdrant Cloud (Recommended for beginners)** 124 | 125 | 1. Go to [https://cloud.qdrant.io/](https://cloud.qdrant.io/) 126 | 2. Sign up for a free account 127 | 3. Create a new cluster 128 | 4. Get your cluster URL and API key from the dashboard 129 | 5. 
Add to `.env`: 130 | ```env 131 | QDRANT_URL=your_cluster_url_here 132 | QDRANT_API_KEY=your_qdrant_api_key_here 133 | ``` 134 | 135 | **Option B: Local Qdrant (Advanced)** 136 | 137 | 1. Install Qdrant locally or run via Docker 138 | 2. Add to `.env`: 139 | ```env 140 | QDRANT_HOST=localhost 141 | QDRANT_PORT=6333 142 | QDRANT_URL=http://localhost:6333 143 | ``` 144 | 145 | ## 4. WhatsApp Business API Setup 146 | 147 | ### 4.1 Create a Meta App 148 | 149 | 1. Go to [https://developers.facebook.com/](https://developers.facebook.com/) 150 | 2. Click "My Apps" > "Create App" 151 | 3. Select "Business" as the app type 152 | 4. Fill in app details and create the app 153 | 154 | ### 4.2 Add WhatsApp Product 155 | 156 | 1. In your app dashboard, click "Add Product" 157 | 2. Find "WhatsApp" and click "Set Up" 158 | 3. Follow the setup wizard 159 | 160 | ### 4.3 Get Access Token and Phone Number ID 161 | 162 | 1. In the WhatsApp section, go to "API Setup" 163 | 2. You'll see: 164 | - **Temporary Access Token**: Copy this (it expires in 24 hours, you'll need to generate a permanent one later) 165 | - **Phone Number ID**: Copy this number 166 | 3. Add to `.env`: 167 | ```env 168 | WHATSAPP_TOKEN=your_temporary_access_token_here 169 | WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id_here 170 | ``` 171 | 172 | ### 4.4 Set Up Webhook Verification Token 173 | 174 | 1. In the WhatsApp section, go to "Configuration" 175 | 2. Under "Webhook", click "Edit" 176 | 3. Create a verification token (any random string, e.g., `my_secure_verify_token_123`) 177 | 4. Add to `.env`: 178 | ```env 179 | WHATSAPP_VERIFY_TOKEN=your_verification_token_here 180 | ``` 181 | 182 | **Note:** You'll configure the webhook URL in the next section after setting up ngrok. 183 | 184 | ## 5. Google Calendar API Setup (First Time) 185 | 186 | ### 5.1 Enable Google Calendar API 187 | 188 | 1. Go to [https://console.cloud.google.com/](https://console.cloud.google.com/) 189 | 2. Select your project (or create a new one) 190 | 3. Go to "APIs & Services" > "Library" 191 | 4. Search for "Google Calendar API" 192 | 5. Click "Enable" 193 | 194 | ### 5.2 Create OAuth 2.0 Credentials 195 | 196 | 1. Go to "APIs & Services" > "Credentials" 197 | 2. Click "Create Credentials" > "OAuth client ID" 198 | 3. If prompted, configure the OAuth consent screen: 199 | - Choose "External" (unless you have a Google Workspace) 200 | - Fill in required app information 201 | - Add your email as a test user 202 | - Save and continue through the steps 203 | 4. Back in Credentials, create OAuth client ID: 204 | - Application type: "Desktop app" or "Other" 205 | - Name it (e.g., "Kylie Calendar") 206 | - Click "Create" 207 | 5. Download the credentials JSON file 208 | 6. Rename it to `credentials.json` and place it in the `mycalendar/` directory 209 | 210 | ### 5.3 First-Time Authorization 211 | 212 | 1. Run your application (see section 7) 213 | 2. The first time the calendar tool is used, it will: 214 | - Open a browser window 215 | - Ask you to sign in with your Google account 216 | - Request permission to access your calendar 217 | - Generate a `token.json` file in the `mycalendar/` directory 218 | 3. This `token.json` file stores your authorization and will be reused for future requests 219 | 220 | ## 6. Set Up ngrok for Local Development 221 | 222 | ngrok creates a public URL that tunnels to your local server, which is required for WhatsApp webhooks. 223 | 224 | ### 6.1 Install ngrok 225 | 226 | 1. 
Download ngrok from [https://ngrok.com/download](https://ngrok.com/download) 227 | 2. Extract the executable to a folder in your PATH (or add it to PATH) 228 | 3. Verify installation: 229 | ```bash 230 | ngrok version 231 | ``` 232 | 233 | ### 6.2 Authenticate ngrok 234 | 235 | 1. Sign up for a free account at [https://ngrok.com/](https://ngrok.com/) 236 | 2. Get your authtoken from the dashboard 237 | 3. Authenticate: 238 | ```bash 239 | ngrok config add-authtoken your_authtoken_here 240 | ``` 241 | 242 | ### 6.3 Start ngrok Tunnel 243 | 244 | 1. Make sure your application will run on port 8000 (default) 245 | 2. In a new terminal window, run: 246 | ```bash 247 | ngrok http 8000 248 | ``` 249 | 3. You'll see output like: 250 | ``` 251 | Forwarding https://abc123.ngrok-free.app -> http://localhost:8000 252 | ``` 253 | 4. Copy the HTTPS URL (e.g., `https://abc123.ngrok-free.app`) 254 | 255 | ### 6.4 Configure WhatsApp Webhook 256 | 257 | 1. Go back to your Meta app dashboard 258 | 2. Navigate to WhatsApp > Configuration 259 | 3. Under "Webhook", click "Edit" 260 | 4. Set the **Callback URL** to: 261 | ``` 262 | https://your-ngrok-url.ngrok-free.app/whatsapp_response 263 | ``` 264 | Replace `your-ngrok-url` with your actual ngrok URL 265 | 5. Set the **Verify Token** to the same value you set in `.env` (`WHATSAPP_VERIFY_TOKEN`) 266 | 6. Click "Verify and Save" 267 | 7. Subscribe to message events by clicking "Manage" next to "Webhook fields" and selecting "messages" 268 | 269 | **Important:** 270 | - The ngrok URL changes each time you restart ngrok (unless you have a paid plan) 271 | - You'll need to update the webhook URL in Meta whenever you restart ngrok 272 | - Keep ngrok running while testing your application 273 | 274 | ## 7. Complete .env File 275 | 276 | Your `.env` file should now look like this: 277 | 278 | ```env 279 | # Groq (LLM and STT) 280 | GROQ_API_KEY=your_groq_api_key 281 | 282 | # ElevenLabs (TTS) 283 | ELEVENLABS_API_KEY=your_elevenlabs_api_key 284 | ELEVENLABS_VOICE_ID=your_voice_id 285 | 286 | # Together AI (Image Generation) 287 | TOGETHER_API_KEY=your_together_api_key 288 | 289 | # Google Cloud (Image Understanding) 290 | GOOGLE_CLOUD_API_KEY=your_google_cloud_api_key 291 | 292 | # Tavily (Search) 293 | TAVILY_API_KEY=your_tavily_api_key 294 | 295 | # Qdrant (Memory) 296 | QDRANT_URL=your_qdrant_url 297 | QDRANT_API_KEY=your_qdrant_api_key 298 | 299 | # WhatsApp 300 | WHATSAPP_TOKEN=your_whatsapp_token 301 | WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id 302 | WHATSAPP_VERIFY_TOKEN=your_verify_token 303 | ``` 304 | 305 | ## 8. Run the Application 306 | 307 | 1. **Make sure your virtual environment is activated** 308 | 309 | 2. **Start the application:** 310 | ```bash 311 | uvicorn main:app --reload 312 | ``` 313 | 314 | Or if you have a run script: 315 | ```bash 316 | python main.py 317 | ``` 318 | 319 | The application will start on `http://localhost:8000` 320 | 321 | 3. **Keep ngrok running** in a separate terminal window 322 | 323 | 4. **Test the setup:** 324 | - Send a WhatsApp message to your registered phone number 325 | - Check the application logs to see if messages are being received 326 | - Visit `http://127.0.0.1:4040` to see ngrok's request inspector 327 | 328 | ## 9. 
Troubleshooting 329 | 330 | ### WhatsApp Webhook Not Receiving Messages 331 | - Verify ngrok is running and the URL is correct 332 | - Check that the webhook URL in Meta matches your ngrok URL + `/whatsapp_response` 333 | - Ensure the verify token matches in both `.env` and Meta dashboard 334 | - Check that you've subscribed to message events 335 | 336 | ### Google Calendar Authorization Issues 337 | - Make sure `credentials.json` is in the `mycalendar/` directory 338 | - Delete `token.json` and re-authorize if you get permission errors 339 | - Ensure Google Calendar API is enabled in your Google Cloud project 340 | 341 | ### API Key Errors 342 | - Verify all API keys in `.env` are correct and not expired 343 | - Check that you've enabled the required APIs in each service's dashboard 344 | - For Google Cloud, ensure billing is enabled (some APIs require it) 345 | 346 | ### ngrok URL Changes 347 | - Free ngrok URLs change on each restart 348 | - Update the webhook URL in Meta dashboard each time 349 | - Consider ngrok's paid plan for a static URL 350 | 351 | ## 10. Next Steps 352 | 353 | - Test sending text messages to your WhatsApp number 354 | - Try sending voice notes and images 355 | - Test calendar operations (list events, add events) 356 | - Test search functionality 357 | - Monitor the application logs for any errors 358 | 359 | --- 360 | 361 | **Note:** Keep your `.env` file secure and never commit it to version control. The `.gitignore` file should already exclude it. 362 | 363 | -------------------------------------------------------------------------------- /modules/image/image_to_text.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import logging 3 | import os 4 | from typing import Optional, Union 5 | 6 | import httpx 7 | from core.exceptions import ImageToTextError 8 | from settings import settings 9 | 10 | 11 | class ImageToText: 12 | """A class to handle image-to-text conversion using Google Cloud Vision API (Imagen-Visual Captioning).""" 13 | 14 | REQUIRED_ENV_VARS = ["GOOGLE_CLOUD_API_KEY"] 15 | 16 | def __init__(self): 17 | """Initialize the ImageToText class.""" 18 | self._validate_env_vars() 19 | self.logger = logging.getLogger(__name__) 20 | self.api_key = settings.GOOGLE_CLOUD_API_KEY 21 | # Google Cloud Vision API endpoint for image annotation 22 | self.api_url = "https://vision.googleapis.com/v1/images:annotate" 23 | 24 | def _validate_env_vars(self) -> None: 25 | """Validate that environment variables are set.""" 26 | missing_vars = [var for var in self.REQUIRED_ENV_VARS if not getattr(settings, var, None)] 27 | if missing_vars: 28 | raise ValueError( 29 | f"Missing required environment variables: {', '.join(missing_vars)}\n" 30 | "Please set GOOGLE_CLOUD_API_KEY in your .env file." 31 | ) 32 | 33 | async def analyze_image(self, image_data: Union[str, bytes], prompt: str = "") -> str: 34 | """Analyze an image using Google Cloud Vision API (Visual Captioning). 
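
        Minimal usage sketch (assumes GOOGLE_CLOUD_API_KEY is configured in settings):

            caption = await ImageToText().analyze_image("images/img1.png")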
35 | 36 | Args: 37 | image_data: Either a file path (str) or binary image data (bytes) 38 | prompt: Optional prompt/question about the image (not used with Vision API, but kept for compatibility) 39 | 40 | Returns: 41 | str: Description or analysis of the image 42 | 43 | Raises: 44 | ValueError: If the image data is empty or invalid 45 | ImageToTextError: If the image analysis fails 46 | """ 47 | try: 48 | # Handle file path 49 | if isinstance(image_data, str): 50 | if not os.path.exists(image_data): 51 | raise ValueError(f"Image file not found: {image_data}") 52 | with open(image_data, "rb") as f: 53 | image_bytes = f.read() 54 | else: 55 | image_bytes = image_data 56 | 57 | if not image_bytes: 58 | raise ValueError("Image data cannot be empty") 59 | 60 | # Detect image format from magic bytes 61 | mime_type = "image/jpeg" # default 62 | if image_bytes.startswith(b'\x89PNG\r\n\x1a\n'): 63 | mime_type = "image/png" 64 | elif image_bytes.startswith(b'\xff\xd8\xff'): 65 | mime_type = "image/jpeg" 66 | elif image_bytes.startswith(b'RIFF') and b'WEBP' in image_bytes[:12]: 67 | mime_type = "image/webp" 68 | elif image_bytes.startswith(b'GIF8'): 69 | mime_type = "image/gif" 70 | 71 | self.logger.info(f"Detected image format: {mime_type}, size: {len(image_bytes)} bytes") 72 | self.logger.info("Using Google Cloud Vision API (Imagen-Visual Captioning)") 73 | 74 | # Encode image to base64 for Google Cloud Vision API 75 | image_base64 = base64.b64encode(image_bytes).decode("utf-8") 76 | 77 | # Prepare request payload for Google Cloud Vision API 78 | # Using Visual Captioning feature (Imagen-Visual) 79 | # Note: Visual Captioning is available through LABEL_DETECTION and OBJECT_LOCALIZATION 80 | # We'll combine these to generate a comprehensive caption 81 | payload = { 82 | "requests": [ 83 | { 84 | "image": { 85 | "content": image_base64 86 | }, 87 | "features": [ 88 | { 89 | "type": "LABEL_DETECTION", 90 | "maxResults": 10 91 | }, 92 | { 93 | "type": "OBJECT_LOCALIZATION", 94 | "maxResults": 10 95 | }, 96 | { 97 | "type": "TEXT_DETECTION", 98 | "maxResults": 10 99 | } 100 | ] 101 | } 102 | ] 103 | } 104 | 105 | headers = { 106 | "Content-Type": "application/json", 107 | } 108 | 109 | # Google Cloud Vision API uses API key as query parameter 110 | params = { 111 | "key": self.api_key 112 | } 113 | 114 | self.logger.info(f"Making API call to Google Cloud Vision API") 115 | self.logger.info(f"Image size: {len(image_bytes)} bytes, MIME type: {mime_type}") 116 | 117 | # Make the API call to Google Cloud Vision API 118 | async with httpx.AsyncClient(timeout=30.0) as client: 119 | response = await client.post( 120 | self.api_url, 121 | headers=headers, 122 | params=params, 123 | json=payload, 124 | ) 125 | 126 | self.logger.info(f"Google Cloud Vision API response status: {response.status_code}") 127 | 128 | if response.status_code != 200: 129 | error_text = response.text 130 | self.logger.error(f"Google Cloud Vision API error: {response.status_code} - {error_text}") 131 | 132 | # Parse error details if available 133 | try: 134 | error_json = response.json() 135 | error_info = error_json.get("error", {}) 136 | error_message = error_info.get("message", error_text) 137 | error_code = error_info.get("code", response.status_code) 138 | error_reason = None 139 | 140 | # Check for specific error reasons 141 | details = error_info.get("details", []) 142 | for detail in details: 143 | if detail.get("@type") == "type.googleapis.com/google.rpc.ErrorInfo": 144 | error_reason = detail.get("reason") 145 | break 146 | 147 
| # Provide specific guidance for common errors 148 | if error_code == 403 and error_reason == "API_KEY_SERVICE_BLOCKED": 149 | error_guidance = ( 150 | "\n\n❌ Google Cloud Vision API is BLOCKED for your API key.\n\n" 151 | "To fix this, you need to:\n" 152 | "1. Enable the Vision API in Google Cloud Console:\n" 153 | " - Go to: https://console.cloud.google.com/apis/library/vision.googleapis.com\n" 154 | " - Select your project (project ID: 995856091136)\n" 155 | " - Click 'Enable'\n\n" 156 | "2. Check API key restrictions:\n" 157 | " - Go to: https://console.cloud.google.com/apis/credentials\n" 158 | " - Find your API key\n" 159 | " - Make sure 'Cloud Vision API' is in the allowed APIs list\n" 160 | " - Or remove API restrictions temporarily for testing\n\n" 161 | "3. Enable billing (if required):\n" 162 | " - Vision API may require billing to be enabled\n" 163 | " - Go to: https://console.cloud.google.com/billing\n" 164 | " - Link a billing account to your project\n\n" 165 | "4. Verify the API key has correct permissions\n" 166 | ) 167 | raise ImageToTextError( 168 | f"Google Cloud Vision API error ({error_code}): {error_message}{error_guidance}" 169 | ) 170 | else: 171 | raise ImageToTextError( 172 | f"Google Cloud Vision API error ({error_code}): {error_message}" 173 | ) 174 | except Exception: 175 | raise ImageToTextError( 176 | f"Google Cloud Vision API error ({response.status_code}): {error_text}" 177 | ) 178 | 179 | result = response.json() 180 | 181 | self.logger.info(f"Google Cloud Vision API response received: {type(result)}") 182 | 183 | # Extract caption/description from Google Cloud Vision API response 184 | # Combine labels, objects, and text to create a comprehensive caption 185 | description_parts = [] 186 | 187 | if "responses" in result and len(result["responses"]) > 0: 188 | response_data = result["responses"][0] 189 | 190 | # Extract labels (descriptive tags) - these form the main caption 191 | labels = response_data.get("labelAnnotations", []) 192 | if labels: 193 | # Get top labels with their scores 194 | top_labels = [] 195 | for label in labels[:5]: 196 | desc = label.get("description", "") 197 | score = label.get("score", 0) 198 | if desc: 199 | top_labels.append(desc) 200 | if top_labels: 201 | description_parts.append(", ".join(top_labels)) 202 | 203 | # Extract localized objects (what's in the image) 204 | localized_objects = response_data.get("localizedObjectAnnotations", []) 205 | if localized_objects: 206 | object_names = [obj.get("name", "") for obj in localized_objects[:5] if obj.get("name")] 207 | if object_names: 208 | description_parts.append(f"Objects: {', '.join(object_names)}") 209 | 210 | # Extract detected text (if any text in image) 211 | text_annotations = response_data.get("textAnnotations", []) 212 | if text_annotations and len(text_annotations) > 0: 213 | # First annotation contains all detected text 214 | full_text = text_annotations[0].get("description", "") 215 | if full_text and len(full_text.strip()) > 0: 216 | # Only add text if it's meaningful (not just a few characters) 217 | if len(full_text.strip()) > 3: 218 | description_parts.append(f"Text: {full_text[:100]}") 219 | 220 | # Combine all parts into a natural description 221 | if description_parts: 222 | # Create a natural sentence from the parts 223 | if len(description_parts) == 1: 224 | description = description_parts[0] 225 | else: 226 | # Join with appropriate punctuation 227 | description = ". 
".join(description_parts) 228 | else: 229 | # Fallback: use labels or create a generic description 230 | if "responses" in result and len(result["responses"]) > 0: 231 | response_data = result["responses"][0] 232 | labels = response_data.get("labelAnnotations", []) 233 | if labels: 234 | description = labels[0].get("description", "Image analyzed but no description available") 235 | else: 236 | description = "Image analyzed but no specific details were detected" 237 | else: 238 | description = "Unable to generate image description" 239 | 240 | if not description or description.strip() == "": 241 | raise ImageToTextError("Empty description received from Google Cloud Vision API") 242 | 243 | self.logger.info(f"Generated image description: {description[:100]}...") 244 | return description.strip() 245 | 246 | except ImageToTextError: 247 | # Re-raise our custom errors 248 | raise 249 | except httpx.TimeoutException as e: 250 | self.logger.error(f"Request timeout: {e}") 251 | raise ImageToTextError(f"Request timeout: {str(e)}") from e 252 | except Exception as e: 253 | self.logger.error(f"Unexpected error in analyze_image: {e}", exc_info=True) 254 | raise ImageToTextError(f"Failed to analyze image: {str(e)}") from e 255 | -------------------------------------------------------------------------------- /graph/nodes.py: -------------------------------------------------------------------------------- 1 | import os 2 | from uuid import uuid4 3 | 4 | from langchain_core.messages import AIMessage, HumanMessage, RemoveMessage 5 | from langchain_core.runnables import RunnableConfig 6 | 7 | from .state import AICompanionState 8 | from graph.utils.chains import ( 9 | get_character_response_chain, 10 | get_router_chain, 11 | ) 12 | from graph.utils.helpers import ( 13 | get_chat_model, 14 | get_text_to_image_module, 15 | get_text_to_speech_module, 16 | get_search_module, 17 | ) 18 | from modules.memory.long_term.memory_manager import get_memory_manager 19 | from modules.schedules.context_generation import ScheduleContextGenerator 20 | from settings import settings 21 | 22 | 23 | async def router_node(state: AICompanionState): 24 | chain = get_router_chain() 25 | response = await chain.ainvoke({"messages": state["messages"][-settings.ROUTER_MESSAGES_TO_ANALYZE :]}) 26 | return {"workflow": response.response_type} 27 | 28 | 29 | def context_injection_node(state: AICompanionState): 30 | schedule_context = ScheduleContextGenerator.get_current_activity() 31 | if schedule_context != state.get("current_activity", ""): 32 | apply_activity = True 33 | else: 34 | apply_activity = False 35 | return {"apply_activity": apply_activity, "current_activity": schedule_context} 36 | 37 | 38 | async def conversation_node(state: AICompanionState, config: RunnableConfig): 39 | current_activity = ScheduleContextGenerator.get_current_activity() 40 | memory_context = state.get("memory_context", "") 41 | search_context = state.get("search_results", "") 42 | 43 | chain = get_character_response_chain( 44 | state.get("summary", ""), 45 | with_tools=False, 46 | search_context=search_context 47 | ) 48 | 49 | response = await chain.ainvoke( 50 | { 51 | "messages": state["messages"], 52 | "current_activity": current_activity, 53 | "memory_context": memory_context, 54 | }, 55 | config, 56 | ) 57 | return {"messages": AIMessage(content=response)} 58 | 59 | 60 | async def image_node(state: AICompanionState, config: RunnableConfig): 61 | current_activity = ScheduleContextGenerator.get_current_activity() 62 | memory_context = 
state.get("memory_context", "") 63 | 64 | chain = get_character_response_chain(state.get("summary", "")) 65 | text_to_image_module = get_text_to_image_module() 66 | 67 | scenario = await text_to_image_module.create_scenario(state["messages"][-5:]) 68 | os.makedirs("generated_images", exist_ok=True) 69 | img_path = f"generated_images/image_{str(uuid4())}.png" 70 | await text_to_image_module.generate_image(scenario.image_prompt, img_path) 71 | 72 | # Inject the image prompt information as an AI message 73 | scenario_message = HumanMessage(content=f"") 74 | updated_messages = state["messages"] + [scenario_message] 75 | 76 | response = await chain.ainvoke( 77 | { 78 | "messages": updated_messages, 79 | "current_activity": current_activity, 80 | "memory_context": memory_context, 81 | }, 82 | config, 83 | ) 84 | 85 | return {"messages": AIMessage(content=response), "image_path": img_path} 86 | 87 | 88 | async def audio_node(state: AICompanionState, config: RunnableConfig): 89 | current_activity = ScheduleContextGenerator.get_current_activity() 90 | memory_context = state.get("memory_context", "") 91 | 92 | chain = get_character_response_chain(state.get("summary", "")) 93 | text_to_speech_module = get_text_to_speech_module() 94 | 95 | response = await chain.ainvoke( 96 | { 97 | "messages": state["messages"], 98 | "current_activity": current_activity, 99 | "memory_context": memory_context, 100 | }, 101 | config, 102 | ) 103 | output_audio = await text_to_speech_module.synthesize(response) 104 | 105 | return {"messages": response, "audio_buffer": output_audio} 106 | 107 | 108 | async def summarize_conversation_node(state: AICompanionState): 109 | model = get_chat_model() 110 | summary = state.get("summary", "") 111 | 112 | if summary: 113 | summary_message = ( 114 | f"This is summary of the conversation to date between Ava and the user: {summary}\n\n" 115 | "Extend the summary by taking into account the new messages above:" 116 | ) 117 | else: 118 | summary_message = ( 119 | "Create a summary of the conversation above between Ava and the user. 
" 120 | "The summary must be a short description of the conversation so far, " 121 | "but that captures all the relevant information shared between Ava and the user:" 122 | ) 123 | 124 | messages = state["messages"] + [HumanMessage(content=summary_message)] 125 | response = await model.ainvoke(messages) 126 | 127 | delete_messages = [RemoveMessage(id=m.id) for m in state["messages"][: -settings.TOTAL_MESSAGES_AFTER_SUMMARY]] 128 | return {"summary": response.content, "messages": delete_messages} 129 | 130 | 131 | async def memory_extraction_node(state: AICompanionState): 132 | """Extract and store important information from the last message.""" 133 | if not state["messages"]: 134 | return {} 135 | 136 | memory_manager = get_memory_manager() 137 | await memory_manager.extract_and_store_memories(state["messages"][-1]) 138 | return {} 139 | 140 | 141 | def memory_injection_node(state: AICompanionState): 142 | """Retrieve and inject relevant memories into the character card.""" 143 | memory_manager = get_memory_manager() 144 | 145 | # Get relevant memories based on recent conversation 146 | recent_context = " ".join([m.content for m in state["messages"][-3:]]) 147 | memories = memory_manager.get_relevant_memories(recent_context) 148 | 149 | # Format memories for the character card 150 | memory_context = memory_manager.format_memories_for_prompt(memories) 151 | 152 | return {"memory_context": memory_context} 153 | 154 | 155 | async def search_node(state: AICompanionState, config: RunnableConfig): 156 | """ 157 | Node for handling internet search operations using Tavily. 158 | This node performs search and then routes to conversation node with search results. 159 | """ 160 | current_activity = ScheduleContextGenerator.get_current_activity() 161 | memory_context = state.get("memory_context", "") 162 | 163 | # Extract search query from user's last message 164 | user_message = state["messages"][-1].content if state["messages"] else "" 165 | 166 | # Perform search 167 | search_module = get_search_module() 168 | search_results_list = await search_module.search(query=user_message, max_results=5) 169 | 170 | # Format search results 171 | formatted_results = search_module.format_search_results(search_results_list) 172 | 173 | # Store search results in state 174 | search_results_str = formatted_results 175 | 176 | # generate response with search context 177 | chain = get_character_response_chain( 178 | state.get("summary", ""), 179 | with_tools=False, 180 | search_context=search_results_str 181 | ) 182 | 183 | # Create a context message with search results 184 | from langchain_core.messages import SystemMessage 185 | search_context_msg = SystemMessage(content=f"User asked: {user_message}\n\nSearch Results:\n{search_results_str}") 186 | 187 | # Generate response using search results 188 | response = await chain.ainvoke( 189 | { 190 | "messages": state["messages"] + [search_context_msg], 191 | "current_activity": current_activity, 192 | "memory_context": memory_context, 193 | }, 194 | config, 195 | ) 196 | 197 | return { 198 | "messages": AIMessage(content=response), 199 | "search_results": search_results_str 200 | } 201 | 202 | 203 | async def tool_calling_node(state: AICompanionState, config: RunnableConfig): 204 | """ 205 | Node for handling tool calls (calendar operations). 206 | This node will be used when the agent needs to use tools. 
207 | """ 208 | from datetime import datetime 209 | import pytz 210 | 211 | tz = pytz.timezone('Africa/Kampala') 212 | current_dt = datetime.now(tz) 213 | current_date = current_dt.strftime('%Y-%m-%d') 214 | current_time = current_dt.strftime('%H:%M') 215 | 216 | current_activity = ScheduleContextGenerator.get_current_activity() 217 | memory_context = state.get("memory_context", "") 218 | 219 | print(f"DEBUG: tool_calling_node - Current date: {current_date}") 220 | print(f"DEBUG: tool_calling_node - Current time: {current_time}") 221 | print(f"DEBUG: tool_calling_node - Current activity: {current_activity}") 222 | print(f"DEBUG: tool_calling_node - Memory context: {memory_context}") 223 | print(f"DEBUG: tool_calling_node - Last message: {state['messages'][-1].content}") 224 | 225 | # Use the chain with tools enabled 226 | chain = get_character_response_chain(state.get("summary", ""), with_tools=True) 227 | 228 | print(f"DEBUG: Chain created with tools enabled") 229 | 230 | enhanced_messages = state["messages"].copy() 231 | 232 | # Add a system-like context message to help with date understanding 233 | from langchain_core.messages import SystemMessage 234 | date_context_msg = SystemMessage(content=f"Current date: {current_date}, Current time: {current_time}, Timezone: Africa/Kampala") 235 | enhanced_messages.insert(-1, date_context_msg) 236 | 237 | response = await chain.ainvoke( 238 | { 239 | "messages": state["messages"], 240 | "current_activity": current_activity, 241 | "memory_context": memory_context, 242 | }, 243 | config, 244 | ) 245 | 246 | print(f"DEBUG: Initial response type: {type(response)}") 247 | print(f"DEBUG: Initial response: {response}") 248 | print(f"DEBUG: Initial response content: {getattr(response, 'content', 'No content attr')}") 249 | print(f"DEBUG: Has tool_calls: {hasattr(response, 'tool_calls')}") 250 | if hasattr(response, 'tool_calls'): 251 | print(f"DEBUG: Tool calls: {response.tool_calls}") 252 | print(f"DEBUG: Tool calls length: {len(response.tool_calls) if response.tool_calls else 0}") 253 | 254 | # Handle tool calls if present 255 | if hasattr(response, 'tool_calls') and response.tool_calls: 256 | print(f"DEBUG: Processing {len(response.tool_calls)} tool calls") 257 | 258 | # The model wants to use tools 259 | from langchain_core.messages import ToolMessage 260 | from mycalendar.langchain_integration import get_calendar_tools 261 | 262 | tools = {tool.name: tool for tool in get_calendar_tools()} 263 | print(f"DEBUG: Available tools: {list(tools.keys())}") 264 | 265 | tool_messages = [] 266 | 267 | for tool_call in response.tool_calls: 268 | tool_name = tool_call["name"] 269 | tool_args = tool_call["args"] 270 | 271 | print(f"DEBUG: Executing tool {tool_name} with args {tool_args}") 272 | 273 | if tool_name in tools: 274 | try: 275 | # Execute the tool using invoke method 276 | tool_result = await tools[tool_name].ainvoke(tool_args) 277 | print(f"DEBUG: Tool result: {tool_result}") 278 | tool_messages.append( 279 | ToolMessage( 280 | content=str(tool_result), 281 | tool_call_id=tool_call["id"] 282 | ) 283 | ) 284 | except Exception as e: 285 | print(f"DEBUG: Tool execution error: {e}") 286 | tool_messages.append( 287 | ToolMessage( 288 | content=f"Error executing {tool_name}: {str(e)}", 289 | tool_call_id=tool_call["id"] 290 | ) 291 | ) 292 | else: 293 | print(f"DEBUG: Tool {tool_name} not found in available tools") 294 | 295 | # Get final response after tool execution 296 | updated_messages = state["messages"] + [response] + tool_messages 297 | 
print(f"DEBUG: Getting final response with {len(tool_messages)} tool results") 298 | 299 | final_response = await chain.ainvoke( 300 | { 301 | "messages": updated_messages, 302 | "current_activity": current_activity, 303 | "memory_context": memory_context, 304 | }, 305 | config, 306 | ) 307 | 308 | print(f"DEBUG: Final response type: {type(final_response)}") 309 | print(f"DEBUG: Final response content: {getattr(final_response, 'content', 'No content attr')}") 310 | 311 | # Ensure we return an AIMessage with content 312 | from langchain_core.messages import AIMessage 313 | if hasattr(final_response, 'content') and final_response.content: 314 | return {"messages": final_response} 315 | else: 316 | # Fallback message if response is empty 317 | return {"messages": AIMessage(content="I've checked your calendar. Let me know if you need anything else!")} 318 | else: 319 | # No tool calls, this means the model didn't decide to use tools 320 | print(f"DEBUG: No tool calls generated - this might indicate an issue with tool binding") 321 | print(f"DEBUG: Response content: {getattr(response, 'content', 'No content')}") 322 | 323 | # Check if we have a valid text response 324 | if hasattr(response, 'content') and response.content and response.content.strip(): 325 | return {"messages": response} 326 | else: 327 | # Generate a calendar response manually since tools weren't called 328 | print(f"DEBUG: Manually calling calendar tool since model didn't generate tool calls") 329 | try: 330 | from mycalendar.langchain_integration import list_upcoming_events 331 | # Use invoke method instead of __call__, and pass arguments correctly 332 | calendar_result = await list_upcoming_events.ainvoke({"max_results": 10}) 333 | from langchain_core.messages import AIMessage 334 | return {"messages": AIMessage(content=f"Let me check your schedule for you!\n\n{calendar_result}")} 335 | except Exception as e: 336 | print(f"DEBUG: Manual calendar call failed: {e}") 337 | from langchain_core.messages import AIMessage 338 | return {"messages": AIMessage(content="I'd like to help you check your schedule, but I'm having trouble accessing your calendar right now. Please try again!")} -------------------------------------------------------------------------------- /mycalendar/calendar_tool.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | from datetime import datetime, timedelta, timezone 4 | from typing import List, Optional, Dict, Any 5 | from google.auth.transport.requests import Request 6 | from google.oauth2.credentials import Credentials 7 | from google_auth_oauthlib.flow import InstalledAppFlow 8 | from googleapiclient.discovery import build 9 | from googleapiclient.errors import HttpError 10 | 11 | SCOPES = ['https://www.googleapis.com/auth/calendar.readonly', 12 | 'https://www.googleapis.com/auth/calendar.events'] 13 | 14 | logger = logging.getLogger(__name__) 15 | 16 | class CalendarTool: 17 | """ 18 | A tool for interacting with a Google Calendar using the Google Calendar API. 19 | Requires initial setup of credentials.json and user authorization flow. 20 | """ 21 | 22 | def __init__(self, credentials_file: str = "mycalendar\credentials.json", token_file: str = "token.json"): 23 | """ 24 | Initializes the CalendarTool. 25 | Args: 26 | credentials_file: Path to the downloaded credentials.json file. 27 | token_file: Path to store/load user authorization tokens. 
28 | """ 29 | self.credentials_file = credentials_file 30 | self.token_file = token_file 31 | self.service = None 32 | self._authenticate() 33 | 34 | def _authenticate(self): 35 | """Handles the authentication flow with Google Calendar API.""" 36 | creds = None 37 | if os.path.exists(self.token_file): 38 | creds = Credentials.from_authorized_user_file(self.token_file, SCOPES) 39 | if not creds or not creds.valid: 40 | if creds and creds.expired and creds.refresh_token: 41 | creds.refresh(Request()) 42 | else: 43 | flow = InstalledAppFlow.from_client_secrets_file( 44 | self.credentials_file, SCOPES) 45 | creds = flow.run_local_server(port=0) 46 | with open(self.token_file, 'w') as token: 47 | token.write(creds.to_json()) 48 | 49 | try: 50 | self.service = build('calendar', 'v3', credentials=creds) 51 | logger.info("Successfully authenticated with Google Calendar.") 52 | except HttpError as error: 53 | logger.error(f"An error occurred during authentication: {error}") 54 | self.service = None 55 | 56 | 57 | async def list_upcoming_events(self, max_results: int = 10) -> List[Dict[str, Any]]: 58 | """ 59 | Lists the upcoming events on the user's primary calendar. 60 | Args: 61 | max_results: Maximum number of events to return. 62 | Returns: 63 | A list of dictionaries representing events, or an empty list on error. 64 | """ 65 | if not self.service: 66 | logger.error("Calendar service not initialized.") 67 | return [{"error": "Calendar service not initialized."}] 68 | 69 | try: 70 | now = datetime.utcnow().isoformat() + 'Z' 71 | logger.info(f"Getting the upcoming {max_results} events") 72 | events_result = self.service.events().list(calendarId='primary', timeMin=now, 73 | maxResults=max_results, singleEvents=True, 74 | orderBy='startTime').execute() 75 | events = events_result.get('items', []) 76 | 77 | formatted_events = [] 78 | if not events: 79 | logger.info('No upcoming events found.') 80 | return [{"summary": "No upcoming events found."}] 81 | for event in events: 82 | start = event['start'].get('dateTime', event['start'].get('date')) 83 | formatted_events.append({ 84 | "summary": event.get('summary', 'No Title'), 85 | "start": start, 86 | "id": event.get('id') 87 | }) 88 | logger.info(f"Found {len(formatted_events)} upcoming events.") 89 | return formatted_events 90 | 91 | except HttpError as error: 92 | logger.error(f"An error occurred while fetching events: {error}") 93 | return [{"error": f"An error occurred while fetching events: {error}"}] 94 | 95 | async def add_event(self, summary: str, start_time: str, end_time: str, description: str = "") -> Dict[str, Any]: 96 | """ 97 | Adds an event to the user's primary calendar. 98 | Args: 99 | summary: Title of the event. 100 | start_time: Start time in ISO 8601 format (e.g., '2024-07-15T10:00:00'). 101 | end_time: End time in ISO 8601 format (e.g., '2024-07-15T11:00:00'). 102 | description: Optional description of the event. 103 | Returns: 104 | A dictionary with the result (success/failure message or event details) or error. 105 | """ 106 | # Validate input times 107 | try: 108 | datetime.fromisoformat(start_time.replace('Z', '+00:00')) # Handle 'Z' suffix 109 | datetime.fromisoformat(end_time.replace('Z', '+00:00')) 110 | except ValueError: 111 | error_msg = "Invalid date/time format. Please use ISO 8601 (e.g., '2024-07-15T10:00:00')." 112 | logger.error(error_msg) 113 | return {"error": error_msg} 114 | 115 | if not self.service: 116 | error_msg = "Calendar service not initialized." 
117 | logger.error(error_msg) 118 | return {"error": error_msg} 119 | 120 | event = { 121 | 'summary': summary, 122 | 'location': '', 123 | 'description': description, 124 | 'start': { 125 | 'dateTime': start_time, 126 | 'timeZone': 'UTC', 127 | }, 128 | 'end': { 129 | 'dateTime': end_time, 130 | 'timeZone': 'UTC', 131 | }, 132 | 'reminders': { 133 | 'useDefault': False, 134 | 'overrides': [ 135 | {'method': 'email', 'minutes': 24 * 60}, 136 | {'method': 'popup', 'minutes': 10}, 137 | ], 138 | }, 139 | } 140 | try: 141 | event_result = self.service.events().insert(calendarId='primary', body=event).execute() 142 | success_msg = f"Event created: {event_result.get('htmlLink')}" 143 | logger.info(success_msg) 144 | return { 145 | "status": "success", 146 | "summary": event_result.get('summary'), 147 | "start": event_result['start'].get('dateTime', event_result['start'].get('date')), 148 | "id": event_result.get('id'), 149 | "link": event_result.get('htmlLink') 150 | } 151 | except HttpError as error: 152 | error_msg = f"An error occurred while adding the event: {error}" 153 | logger.error(error_msg) 154 | return {"error": error_msg} 155 | 156 | # Example: Check for events happening now or soon 157 | async def get_current_or_next_event(self, lookahead_minutes: int = 30) -> Optional[Dict[str, Any]]: 158 | """ 159 | Checks for events happening right now or within the next 'lookahead_minutes'. 160 | Args: 161 | lookahead_minutes: How many minutes into the future to check. 162 | Returns: 163 | A dictionary representing the event if found, otherwise None. 164 | """ 165 | if not self.service: 166 | logger.error("Calendar service not initialized.") 167 | return None 168 | 169 | try: 170 | now = datetime.utcnow() 171 | time_min = now.isoformat() + 'Z' 172 | time_max = (now + timedelta(minutes=lookahead_minutes)).isoformat() + 'Z' 173 | 174 | logger.info(f"Checking for events between {time_min} and {time_max}") 175 | events_result = self.service.events().list( 176 | calendarId='primary', 177 | timeMin=time_min, 178 | timeMax=time_max, 179 | singleEvents=True, 180 | orderBy='startTime' 181 | ).execute() 182 | events = events_result.get('items', []) 183 | 184 | if events: 185 | event = events[0] # Get the first (closest) event 186 | start = event['start'].get('dateTime', event['start'].get('date')) 187 | logger.info(f"Upcoming/Current event found: {event.get('summary')}") 188 | return { 189 | "summary": event.get('summary', 'No Title'), 190 | "start": start, 191 | "id": event.get('id') 192 | } 193 | else: 194 | logger.info("No events found in the specified time window.") 195 | return None 196 | 197 | except HttpError as error: 198 | logger.error(f"An error occurred while checking for current/next event: {error}") 199 | return None 200 | 201 | 202 | 203 | 204 | # Example usage (if run directly or for testing) 205 | # run in terminal for testing(uv run calendar_tool.py OR py calendar_tool.py) 206 | # Tests include: 207 | # listing 3 upcoming events 208 | # listing more upcoming events(10) 209 | # Check current or next event (within 30 minutes) 210 | # Check within a longer timeframe 211 | # Add a test event 212 | # Calculate times for a test event (1 hour from now, lasting 30 minutes) 213 | # Test different time formats 214 | # Edge cases for listing events 215 | # Different time windows for current/next event 216 | 217 | # if __name__ == "__main__": 218 | # import asyncio 219 | # from datetime import datetime, timezone, timedelta 220 | 221 | # async def main(): 222 | # print("🚀 Starting CalendarTool Test 
Suite...") 223 | # print("=" * 50) 224 | 225 | # # Initialize the tool 226 | # tool = CalendarTool() 227 | 228 | # # Test 1: List upcoming events (different quantities) 229 | # print("\n📅 TEST 1: Listing upcoming events") 230 | # print("-" * 30) 231 | # events = await tool.list_upcoming_events(3) 232 | # print(f"Next 3 events: {len(events)} found") 233 | # for i, event in enumerate(events, 1): 234 | # print(f" {i}. {event.get('summary', 'No title')} - {event.get('start', 'No date')}") 235 | 236 | # # Test 2: List more events 237 | # print(f"\nFetching more events (10)...") 238 | # more_events = await tool.list_upcoming_events(10) 239 | # print(f"Next 10 events: {len(more_events)} found") 240 | 241 | # # Test 3: Check current or next event (within 30 minutes) 242 | # print("\n⏰ TEST 2: Checking current/next event (30 min window)") 243 | # print("-" * 30) 244 | # current_event = await tool.get_current_or_next_event(30) 245 | # if current_event: 246 | # print(f"Found current/upcoming event: {current_event['summary']} at {current_event['start']}") 247 | # else: 248 | # print("No events in the next 30 minutes") 249 | 250 | # # Test 4: Check within a longer timeframe 251 | # print(f"\nChecking within 2 hours...") 252 | # next_event_2h = await tool.get_current_or_next_event(120) 253 | # if next_event_2h: 254 | # print(f"Event within 2 hours: {next_event_2h['summary']} at {next_event_2h['start']}") 255 | # else: 256 | # print("No events in the next 2 hours") 257 | 258 | # # Test 5: Add a test event (uncomment to test) 259 | # print("\n➕ TEST 3: Adding a test event") 260 | # print("-" * 30) 261 | 262 | # # Calculate times for a test event (1 hour from now, lasting 30 minutes) 263 | # now = datetime.now(timezone.utc) 264 | # start_time = (now + timedelta(hours=1)).isoformat().replace('+00:00', 'Z') 265 | # end_time = (now + timedelta(hours=1, minutes=30)).isoformat().replace('+00:00', 'Z') 266 | 267 | # print(f"Adding test event from {start_time} to {end_time}") 268 | # result = await tool.add_event( 269 | # "🧪 Test Event from CalendarTool", 270 | # start_time, 271 | # end_time, 272 | # "This is a test event created by the CalendarTool script. You can delete this." 
273 | # ) 274 | 275 | # if result.get('status') == 'success': 276 | # print(f"✅ Event created successfully!") 277 | # print(f" Title: {result['summary']}") 278 | # print(f" Start: {result['start']}") 279 | # print(f" Event ID: {result['id']}") 280 | # print(f" Link: {result.get('link', 'N/A')}") 281 | # else: 282 | # print(f"❌ Failed to create event: {result.get('error', 'Unknown error')}") 283 | 284 | # # Test 6: Test different time formats 285 | # print("\n🕐 TEST 4: Testing different time formats") 286 | # print("-" * 30) 287 | 288 | # # Test with different time formats (these should fail gracefully) 289 | # invalid_formats = [ 290 | # ("Invalid format 1", "2024-13-45", "2024-13-46", "Should fail - invalid date"), 291 | # ("Invalid format 2", "not-a-date", "also-not-a-date", "Should fail - not a date"), 292 | # ] 293 | 294 | # for title, start, end, description in invalid_formats: 295 | # print(f"Testing {title}...") 296 | # result = await tool.add_event(title, start, end, description) 297 | # if 'error' in result: 298 | # print(f" ✅ Correctly caught error: {result['error']}") 299 | # else: 300 | # print(f" ❌ Should have failed but didn't: {result}") 301 | 302 | # # Test 7: Edge cases for listing events 303 | # print("\n🔍 TEST 5: Edge cases") 304 | # print("-" * 30) 305 | 306 | # # Test listing 0 events 307 | # zero_events = await tool.list_upcoming_events(0) 308 | # print(f"Requesting 0 events returned: {len(zero_events)} items") 309 | 310 | # # Test listing many events 311 | # many_events = await tool.list_upcoming_events(50) 312 | # print(f"Requesting 50 events returned: {len(many_events)} items") 313 | 314 | # # Test 8: Different time windows for current/next event 315 | # print("\n🎯 TEST 6: Different time windows") 316 | # print("-" * 30) 317 | 318 | # time_windows = [5, 15, 60, 240, 1440] # 5 min, 15 min, 1 hour, 4 hours, 24 hours 319 | # for minutes in time_windows: 320 | # event = await tool.get_current_or_next_event(minutes) 321 | # if event: 322 | # print(f" Within {minutes} minutes: {event['summary']}") 323 | # else: 324 | # print(f" Within {minutes} minutes: No events") 325 | 326 | # print("\n" + "=" * 50) 327 | # print("🎉 Test Suite Complete!") 328 | # print("\nIf you want to test adding events, uncomment the add_event lines above") 329 | # print("and modify the times to be in the future.") 330 | 331 | # asyncio.run(main()) 332 | -------------------------------------------------------------------------------- /whatsapp_response.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | from io import BytesIO 4 | from typing import Dict 5 | 6 | import httpx 7 | from fastapi import APIRouter, Request, Response 8 | from langchain_core.messages import HumanMessage 9 | from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver 10 | 11 | from graph import graph_builder 12 | from modules.image import ImageToText 13 | from modules.speech import SpeechToText, TextToSpeech 14 | from settings import settings 15 | 16 | logger = logging.getLogger(__name__) 17 | 18 | # Global module instances 19 | speech_to_text = SpeechToText() 20 | text_to_speech = TextToSpeech() 21 | image_to_text = ImageToText() 22 | 23 | # Router for WhatsApp response 24 | whatsapp_router = APIRouter() 25 | 26 | # WhatsApp API credentials 27 | WHATSAPP_TOKEN = os.getenv("WHATSAPP_TOKEN") 28 | WHATSAPP_PHONE_NUMBER_ID = os.getenv("WHATSAPP_PHONE_NUMBER_ID") 29 | 30 | # Add a simple in-memory store to prevent duplicate processing 31 | processed_messages = 
set() 32 | 33 | 34 | @whatsapp_router.api_route("/whatsapp_response", methods=["GET", "POST"]) 35 | async def whatsapp_handler(request: Request) -> Response: 36 | """Handles incoming messages and status updates from the WhatsApp Cloud API.""" 37 | 38 | if request.method == "GET": 39 | params = request.query_params 40 | if params.get("hub.verify_token") == os.getenv("WHATSAPP_VERIFY_TOKEN"): 41 | return Response(content=params.get("hub.challenge"), status_code=200) 42 | return Response(content="Verification token mismatch", status_code=403) 43 | 44 | try: 45 | data = await request.json() 46 | logger.info(f"Received webhook data: {data}") 47 | 48 | change_value = data["entry"][0]["changes"][0]["value"] 49 | 50 | if "messages" in change_value: 51 | message = change_value["messages"][0] 52 | message_id = message.get("id") 53 | from_number = message["from"] 54 | 55 | # Prevent duplicate processing 56 | if message_id in processed_messages: 57 | logger.info(f"Message {message_id} already processed, skipping") 58 | return Response(content="Message already processed", status_code=200) 59 | 60 | processed_messages.add(message_id) 61 | # Keep only last 1000 message IDs to prevent memory issues 62 | if len(processed_messages) > 1000: 63 | processed_messages.clear() 64 | 65 | session_id = from_number 66 | 67 | # Get user message and handle different message types 68 | content = "" 69 | if message["type"] == "audio": 70 | content = await process_audio_message(message) 71 | elif message["type"] == "image": 72 | # Get image caption if any 73 | content = message.get("image", {}).get("caption", "") 74 | logger.info(f"Received image message. Caption: {content}") 75 | 76 | # Download and analyze image 77 | try: 78 | image_id = message["image"]["id"] 79 | logger.info(f"Downloading image with ID: {image_id}") 80 | image_bytes = await download_media(image_id) 81 | logger.info(f"Downloaded image successfully. Size: {len(image_bytes)} bytes") 82 | 83 | # Analyze the image 84 | description = await image_to_text.analyze_image( 85 | image_bytes, 86 | "Please describe what you see in this image in the context of our conversation.", 87 | ) 88 | logger.info(f"Image analysis successful. 
Description length: {len(description)}") 89 | content += f"\n[Image Analysis: {description}]" 90 | except Exception as e: 91 | logger.error(f"Failed to analyze image: {e}", exc_info=True) 92 | # Still include caption if available, but mark that image analysis failed 93 | if not content: 94 | content = "[Image received but could not be analyzed]" 95 | else: 96 | content = message["text"]["body"] 97 | 98 | logger.info(f"Processing message from {from_number}: {content}") 99 | 100 | try: 101 | async with AsyncSqliteSaver.from_conn_string(settings.SHORT_TERM_MEMORY_DB_PATH) as short_term_memory: 102 | graph = graph_builder.compile(checkpointer=short_term_memory) 103 | await graph.ainvoke( 104 | {"messages": [HumanMessage(content=content)]}, 105 | {"configurable": {"thread_id": session_id}}, 106 | ) 107 | 108 | # Get the workflow type and response from the state 109 | output_state = await graph.aget_state(config={"configurable": {"thread_id": session_id}}) 110 | 111 | workflow = output_state.values.get("workflow", "conversation") 112 | response_message = output_state.values["messages"][-1].content 113 | 114 | # Handle different response types based on workflow 115 | success = False 116 | if workflow == "audio": 117 | audio_buffer = output_state.values["audio_buffer"] 118 | success = await send_response(from_number, response_message, "audio", audio_buffer) 119 | elif workflow == "image": 120 | image_path = output_state.values["image_path"] 121 | with open(image_path, "rb") as f: 122 | image_data = f.read() 123 | success = await send_response(from_number, response_message, "image", image_data) 124 | else: 125 | success = await send_response(from_number, response_message, "text") 126 | 127 | if success: 128 | logger.info("Message sent successfully") 129 | return Response(content="Message processed successfully", status_code=200) 130 | else: 131 | logger.error("Failed to send message to WhatsApp API") 132 | # Still return 200 to prevent WhatsApp from retrying 133 | return Response(content="Message processed but failed to send", status_code=200) 134 | 135 | except Exception as e: 136 | logger.error(f"Error processing graph: {e}", exc_info=True) 137 | # Return 200 to prevent retry loop 138 | return Response(content="Graph processing error", status_code=200) 139 | 140 | elif "statuses" in change_value: 141 | logger.info("Status update received") 142 | return Response(content="Status update received", status_code=200) 143 | 144 | else: 145 | logger.warning("Unknown event type received") 146 | return Response(content="Unknown event type", status_code=200) 147 | 148 | except Exception as e: 149 | logger.error(f"Error processing webhook: {e}", exc_info=True) 150 | # Return 200 to prevent WhatsApp from retrying 151 | return Response(content="Webhook processing error", status_code=200) 152 | 153 | 154 | async def download_media(media_id: str) -> bytes: 155 | """Download media from WhatsApp.""" 156 | if not WHATSAPP_TOKEN: 157 | logger.error("WHATSAPP_TOKEN is not set in environment variables") 158 | raise ValueError( 159 | "WHATSAPP_TOKEN is not set. Please check your .env file and ensure WHATSAPP_TOKEN is configured." 
160 | ) 161 | 162 | if len(WHATSAPP_TOKEN) < 10: 163 | logger.warning(f"WHATSAPP_TOKEN appears to be invalid (too short: {len(WHATSAPP_TOKEN)} chars)") 164 | 165 | media_metadata_url = f"https://graph.facebook.com/v21.0/{media_id}" 166 | headers = {"Authorization": f"Bearer {WHATSAPP_TOKEN}"} 167 | 168 | async with httpx.AsyncClient(timeout=30.0) as client: 169 | logger.info(f"Fetching media metadata from: {media_metadata_url}") 170 | logger.debug(f"Using token: {WHATSAPP_TOKEN[:10]}...{WHATSAPP_TOKEN[-10:] if len(WHATSAPP_TOKEN) > 20 else '***'}") 171 | 172 | try: 173 | metadata_response = await client.get(media_metadata_url, headers=headers) 174 | 175 | # Check for 401 Unauthorized specifically 176 | if metadata_response.status_code == 401: 177 | error_detail = metadata_response.text 178 | logger.error(f"Facebook API 401 Unauthorized error: {error_detail}") 179 | logger.error( 180 | "Your WHATSAPP_TOKEN may be expired, invalid, or missing required permissions. " 181 | "Please check:\n" 182 | "1. The token is valid in your .env file\n" 183 | "2. The token hasn't expired (Facebook tokens expire)\n" 184 | "3. The token has the required permissions for media access\n" 185 | "4. You're using the correct token for your WhatsApp Business API account" 186 | ) 187 | 188 | metadata_response.raise_for_status() 189 | except httpx.HTTPStatusError as e: 190 | if e.response.status_code == 401: 191 | logger.error( 192 | f"Authentication failed with Facebook API. " 193 | f"Please verify your WHATSAPP_TOKEN is correct and has not expired." 194 | ) 195 | raise 196 | metadata = metadata_response.json() 197 | logger.info(f"Media metadata: {metadata}") 198 | 199 | download_url = metadata.get("url") 200 | if not download_url: 201 | raise ValueError(f"No download URL found in metadata: {metadata}") 202 | 203 | logger.info(f"Downloading media from: {download_url}") 204 | media_response = await client.get(download_url, headers=headers) 205 | media_response.raise_for_status() 206 | 207 | content = media_response.content 208 | logger.info(f"Downloaded media successfully. Size: {len(content)} bytes") 209 | return content 210 | 211 | 212 | async def process_audio_message(message: Dict) -> str: 213 | """Download and transcribe audio message.""" 214 | audio_id = message["audio"]["id"] 215 | media_metadata_url = f"https://graph.facebook.com/v21.0/{audio_id}" 216 | headers = {"Authorization": f"Bearer {WHATSAPP_TOKEN}"} 217 | 218 | async with httpx.AsyncClient() as client: 219 | metadata_response = await client.get(media_metadata_url, headers=headers) 220 | metadata_response.raise_for_status() 221 | metadata = metadata_response.json() 222 | download_url = metadata.get("url") 223 | 224 | # Download the audio file 225 | async with httpx.AsyncClient() as client: 226 | audio_response = await client.get(download_url, headers=headers) 227 | audio_response.raise_for_status() 228 | 229 | # Prepare for transcription 230 | audio_buffer = BytesIO(audio_response.content) 231 | audio_buffer.seek(0) 232 | audio_data = audio_buffer.read() 233 | 234 | return await speech_to_text.transcribe(audio_data) 235 | 236 | 237 | async def send_response( 238 | from_number: str, 239 | response_text: str, 240 | message_type: str = "text", 241 | media_content: bytes = None, 242 | ) -> bool: 243 | """Send response to user via WhatsApp API.""" 244 | 245 | # Validate response_text is not empty 246 | if not response_text or response_text.strip() == "": 247 | logger.warning(f"Empty response_text detected. 
Setting default message.") 248 | response_text = "I'm processing your request. Let me get back to you!" 249 | 250 | # Validate credentials first 251 | if not WHATSAPP_TOKEN or not WHATSAPP_PHONE_NUMBER_ID: 252 | logger.error("Missing WhatsApp credentials") 253 | return False 254 | 255 | print(f"DEBUG: Sending message type: {message_type}") 256 | print(f"DEBUG: Response text: '{response_text}'") 257 | print(f"DEBUG: Response text length: {len(response_text) if response_text else 0}") 258 | 259 | headers = { 260 | "Authorization": f"Bearer {WHATSAPP_TOKEN}", 261 | "Content-Type": "application/json", 262 | } 263 | 264 | if message_type in ["audio", "image"]: 265 | try: 266 | mime_type = "audio/mpeg" if message_type == "audio" else "image/png" 267 | media_buffer = BytesIO(media_content) 268 | media_id = await upload_media(media_buffer, mime_type) 269 | json_data = { 270 | "messaging_product": "whatsapp", 271 | "to": from_number, 272 | "type": message_type, 273 | message_type: {"id": media_id}, 274 | } 275 | 276 | # Add caption for images 277 | if message_type == "image": 278 | json_data["image"]["caption"] = response_text 279 | except Exception as e: 280 | logger.error(f"Media upload failed, falling back to text: {e}") 281 | message_type = "text" 282 | 283 | if message_type == "text": 284 | json_data = { 285 | "messaging_product": "whatsapp", 286 | "to": from_number, 287 | "type": "text", 288 | "text": {"body": response_text}, 289 | } 290 | 291 | logger.info(f"Sending to WhatsApp API - Headers: {headers}") 292 | logger.info(f"Sending to WhatsApp API - Payload: {json_data}") 293 | 294 | try: 295 | async with httpx.AsyncClient(timeout=30.0) as client: 296 | response = await client.post( 297 | f"https://graph.facebook.com/v21.0/{WHATSAPP_PHONE_NUMBER_ID}/messages", 298 | headers=headers, 299 | json=json_data, 300 | ) 301 | 302 | logger.info(f"WhatsApp API response status: {response.status_code}") 303 | logger.info(f"WhatsApp API response body: {response.text}") 304 | 305 | if response.status_code == 200: 306 | return True 307 | else: 308 | logger.error(f"WhatsApp API error: {response.status_code} - {response.text}") 309 | return False 310 | 311 | except httpx.TimeoutException: 312 | logger.error("Timeout when sending message to WhatsApp API") 313 | return False 314 | except Exception as e: 315 | logger.error(f"Exception when sending message to WhatsApp API: {e}") 316 | return False 317 | 318 | 319 | async def upload_media(media_content: BytesIO, mime_type: str) -> str: 320 | """Upload media to WhatsApp servers.""" 321 | headers = {"Authorization": f"Bearer {WHATSAPP_TOKEN}"} 322 | files = {"file": ("response.mp3", media_content, mime_type)} 323 | data = {"messaging_product": "whatsapp", "type": mime_type} 324 | 325 | async with httpx.AsyncClient(timeout=30.0) as client: 326 | response = await client.post( 327 | f"https://graph.facebook.com/v21.0/{WHATSAPP_PHONE_NUMBER_ID}/media", 328 | headers=headers, 329 | files=files, 330 | data=data, 331 | ) 332 | result = response.json() 333 | 334 | if "id" not in result: 335 | raise Exception(f"Failed to upload media: {result}") 336 | return result["id"] -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Introduction and Project Overview 2 | 3 | ![Alt text](./images/image1.png) 4 | 5 | Kylie is a "Whatsapp Agent”, meaning it will interact with you through this app. 
But it won’t just rely on “regular” text messages, it will also listen to your voice notes (yes, even if you are one of those people 😒) and react to your pictures, and it will also be able to look at your calendar, check your tasks, add tasks and reminders, and even search the internet.
6 | 
7 | And that’s not all, Kylie can also respond with her own voice notes and images of what she’s up to - yes, Kylie has a life beyond talking to you, don’t be such a narcissist! 😂. Kylie is named after a friend of mine who lives in Naalya.
8 | 
9 | ## At this point, you might be wondering:
10 | 
11 | What kind of system have we implemented to handle multimodal inputs / outputs coherently?
12 | 
13 | The short answer: Kylie’s brain is just a graph, a LangGraph 🕸️ (sorry, I couldn’t resist).
14 | 
15 | ## Kylie’s Graph
16 | Your brain is made up of neurons, right? Well, Kylie’s brain is made up of LangGraph nodes and edges - one for image processing, another for listening to your voice, another for fetching relevant memories, and so on.
17 | 
18 | At her core, Kylie is simply a graph with a state. This state maintains all the key details of the conversation, including shared information (text, audio or images), current activities, and contextual information.
19 | 
20 | This is exactly what we'll explore in the second module below, where you'll learn how LangGraph can be used to build agentic design architectures, such as the router.
21 | 
22 | ![Alt text](./images/kylie_graph.png)
23 | 
24 | ## WhatsApp Integration
25 | 
26 | Kylie receives messages through WhatsApp Cloud API webhooks. The integration handles:
27 | 
28 | - **Message Reception**: FastAPI endpoint (`/whatsapp_response`) receives webhook events from WhatsApp
29 | - **Message Types**: Supports text, audio (voice notes), and image messages
30 | - **Audio Processing**: Downloads audio files from WhatsApp, transcribes them using STT, and processes the text
31 | - **Image Processing**: Downloads images from WhatsApp, analyzes them using Google Cloud Vision, and includes descriptions in conversation
32 | - **Response Sending**: Sends responses back via WhatsApp API in text, audio, or image format
33 | - **Session Management**: Uses phone numbers as thread IDs for conversation continuity
34 | - **State Persistence**: Graph state is saved to SQLite using AsyncSqliteSaver checkpointing
35 | 
36 | ## Graph Compilation and Execution
37 | 
38 | The graph is compiled with a checkpointer for state persistence:
39 | 
40 | - **Checkpointer**: `AsyncSqliteSaver` saves conversation state to SQLite database
41 | - **Thread-based Sessions**: Each user (phone number) has a unique thread ID for isolated conversations
42 | - **State Recovery**: Previous conversation state is automatically loaded when processing new messages
43 | - **Graph Flow**: START → Memory Extraction → Router → Context Injection → Memory Injection → Workflow Branch → Summarization (conditional) → END
44 | 
45 | ## Kylie’s memory
46 | An Agent without memory is like talking to the main character of “Memento” (and if you haven’t seen that film… seriously, what are you doing with your life?).
47 | 
48 | Kylie has two types of memory:
49 | 
50 | 🔷 Short term memory
51 | The usual - it stores the sequence of messages to maintain conversation context. In our case, we save this sequence in SQLite (we are also storing a summary of the conversation).
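In code, this takes only a few lines. The snippet below roughly condenses what `whatsapp_response.py` does when a message comes in: compile the graph with an `AsyncSqliteSaver` checkpointer and invoke it with the sender's phone number as the thread id (the `reply` helper name here is just for illustration).

```python
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver

from graph import graph_builder
from settings import settings


async def reply(session_id: str, text: str) -> str:
    # Every checkpoint of the graph state (messages + summary) lands in SQLite.
    async with AsyncSqliteSaver.from_conn_string(settings.SHORT_TERM_MEMORY_DB_PATH) as checkpointer:
        graph = graph_builder.compile(checkpointer=checkpointer)
        # thread_id = the user's phone number, so each chat keeps its own history.
        output = await graph.ainvoke(
            {"messages": [HumanMessage(content=text)]},
            {"configurable": {"thread_id": session_id}},
        )
    return output["messages"][-1].content
```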
52 | 
53 | 🔷 Long term memory
54 | When you meet someone, you don’t remember everything they say; you retain only the key details, like their name, profession, or where they’re from, right? That’s exactly what we wanted to replicate with Qdrant - extracting relevant information from the conversation and storing it as embeddings.
55 | 
56 | We’ll cover memory in more detail in Module 3.
57 | 
58 | 
59 | ## Kylie’s senses
60 | Real WhatsApp conversations aren’t limited to just text. Think about it - do you remember the last cringe sticker your friend sent you last week? Or that never-ending voice note from your high school friend? Exactly. We need both images and audio.
61 | 
62 | To make this possible, we’ve selected the following tools.
63 | 
64 | 🔷 Text
65 | I am using Groq models for all text generation. Specifically, I’ve chosen llama-3.3-70b-versatile as the core LLM.
66 | 
67 | 🔷 Images
68 | The image module handles two tasks: processing user images and generating new ones (take a look at the image below).
69 | 
70 | For image “understanding”, I've used google-cloud-vision.
71 | 
72 | For image generation, black-forest-labs/FLUX.1-schnell-Free via Together AI.
73 | 
74 | 🔷 Audio
75 | The audio module needs to take care of TTS (Text-To-Speech) and STT (Speech-To-Text).
76 | 
77 | For TTS, I'm using Elevenlabs voices.
78 | 
79 | For STT, whisper-large-v3-turbo from Groq.
80 | 
81 | I'll share more about the audio module in Module 4 and the image module in Module 5!
82 | 
83 | 
84 | ## Module 2 (Dissecting Kylie's Brain)
85 | 
86 | Picture this: you’re a mad scientist living in a creepy old house in the middle of the forest, and your mission is to build a sentient robot. What’s the first thing you’d do?
87 | 
88 | Yep, you’d start with the brain, right? 🧠
89 | 
90 | So, when I started building Kylie, I also kicked things off with the “brain”.
91 | 
92 | And that’s exactly what this section is all about - building Kylie’s brain using LangGraph! 🕸️
93 | 
94 | ## LangGraph in a Nutshell
95 | Never used LangGraph before? No worries, here’s a quick intro.
96 | 
97 | LangGraph models agent workflows as graphs, using three main components:
98 | 
99 | 🔶 State - A shared data structure that tracks the current status of your app (workflow).
100 | 
101 | 🔶 Nodes - Python functions that define the agent behaviour. They take in the current state, perform actions, and return the updated state.
102 | 
103 | 🔶 Edges - Python functions that decide which Node runs next based on the State, allowing for conditional or fixed transitions.
104 | 
105 | By combining Nodes and Edges, you can build dynamic workflows, like Kylie! In the next section, we’ll take a look at Kylie’s graph and its Nodes and Edges.
106 | 
107 | Before getting into the Nodes and the Edges, let’s describe Kylie’s state.
108 | 
109 | 💠 Kylie State
110 | As mentioned earlier, LangGraph keeps track of your app's current status using the State. Kylie’s state has these attributes:
111 | 
112 | - `summary` - The summary of the conversation so far.
113 | - `workflow` - The current workflow type (conversation/image/audio/tools/search).
114 | - `audio_buffer` - The buffer containing audio data for voice messages.
115 | - `image_path` - Path to the current image being generated.
116 | - `current_activity` - Description of Kylie's current simulated activity.
117 | - `apply_activity` - Flag indicating whether to apply or update the current activity.
118 | - `memory_context` - Retrieved memories from the vector database.
119 | - `search_results` - Formatted search results from Tavily (when a search is performed).
120 | - `messages` - Conversation message history (inherited from MessagesState).
121 | 
122 | 
123 | ![Alt text](./images/img3.png)
124 | 
125 | This state will be saved in an external database. I'm using SQLite for simplicity.
126 | 
127 | Now that we know how Kylie’s State is set up, let’s check out the nodes and edges.
128 | 
129 | 💠 Memory Extraction Node
130 | The first node of the graph is the memory extraction node. This node takes care of extracting relevant information from the user conversation (e.g. name, age, background, etc.).
131 | 
132 | 💠 Context Injection Node
133 | To appear like a real person, Kylie needs to do more than just chat with you. That’s why we need a node that checks your local time and matches it with Kylie’s schedule. Kylie's schedule is hardcoded, and you can change it to whatever you want.
134 | 
135 | ![Alt text](./images/img4.png)
136 | 
137 | 💠 Router Node
138 | - **Purpose**: Determines the appropriate response type. The router node is at the heart of Kylie's workflow. It determines which workflow Kylie's response should follow: audio (voice responses), image (visual responses), conversation (regular text replies), tools (calendar operations) or search (internet searches).
139 | - **Decision Process**:
140 |   - Analyzes the last N messages (configurable via `ROUTER_MESSAGES_TO_ANALYZE`, default is typically 3-5)
141 |   - Uses an LLM with structured output to classify the response type
142 |   - Considers user intent, explicit requests, and conversation context
143 |   - Returns one of: `conversation`, `image`, `audio`, `tools`, or `search`
144 | - **Decision Factors**:
145 |   - **Calendar/Tools**: Keywords like "schedule", "calendar", "events", "meetings", "appointments", "add event", "what's on my calendar"
146 |   - **Search**: Keywords like "search for", "what is", "tell me about", "current news", "latest", "find information about"
147 |   - **Image**: Explicit requests for images, visual content, or "show me" type queries
148 |   - **Audio**: Explicit requests for voice notes, audio responses, or "say it" type queries
149 |   - **Conversation**: Default for regular text-based interactions
150 | - **Implementation**: Uses a structured output chain with temperature 0.3 for consistent routing decisions
151 | 
152 | ![Alt text](./images/img5.png)
153 | 
154 | ![Alt text](./images/img6.png)
155 | 
156 | Once the Router Node determines the final answer, the chosen workflow is assigned to the "workflow" attribute of the AICompanionState. This information is then used by the select_workflow edge, which connects the router node to either the image, audio, tool, search or conversation nodes.
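To make that concrete, here is a minimal sketch of how the router node and the `select_workflow` edge might be wired together. The real implementation lives in `graph/nodes.py` and `graph/edges.py` (and is what the screenshots show); the `RouterResponse` model, the prompt, and the `<workflow>_node` naming pattern below are illustrative assumptions, not the exact code.

```python
# Minimal sketch — not the exact implementation from graph/nodes.py / graph/edges.py.
from typing import Literal

from langchain_groq import ChatGroq
from pydantic import BaseModel

from graph.state import AICompanionState  # assumed module path
from settings import settings


class RouterResponse(BaseModel):
    """Structured output: the workflow that should handle the next reply."""
    response_type: Literal["conversation", "image", "audio", "tools", "search"]


async def router_node(state: AICompanionState):
    llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.3)
    router = llm.with_structured_output(RouterResponse)
    # Only the last few messages are needed to pick a workflow.
    recent = state["messages"][-settings.ROUTER_MESSAGES_TO_ANALYZE:]
    decision = await router.ainvoke(
        [("system", "Decide which workflow should handle Kylie's next reply.")] + list(recent)
    )
    return {"workflow": decision.response_type}


def select_workflow(state: AICompanionState) -> str:
    # Conditional edge: route to the node matching the chosen workflow,
    # assuming node names follow the "<workflow>_node" pattern.
    return f"{state['workflow']}_node"
```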
157 | 158 | ![Alt text](./images/img7.png) 159 | 160 | 💠 Tool Calling Node 161 | - **Purpose**: Handles calendar operations 162 | - **Capabilities**: 163 | - List upcoming events (with configurable max results, default 10) 164 | - Add calendar events (with summary, start/end times, and optional description) 165 | - Get current/next event (with configurable lookahead window, default 30 minutes) 166 | - **Calendar Integration**: Google Calendar API via direct LangChain tool integration 167 | - **Authentication**: Uses OAuth 2.0 flow with Google Calendar API 168 | - Requires initial setup of `credentials.json` from Google Cloud Console 169 | - Stores user authorization tokens in `token.json` for subsequent use 170 | - Automatically refreshes expired tokens 171 | - **Context**: Uses current date, time, and timezone (Africa/Kampala) 172 | - **Implementation**: 173 | - The `CalendarTool` class wraps Google Calendar API operations 174 | - LangChain tools (`list_upcoming_events`, `add_calendar_event`, `get_current_or_next_event`) are integrated into the graph 175 | - The router node determines when calendar operations are needed 176 | - Tool results are formatted and included in Kylie's response 177 | - **Features**: 178 | - Automatic timezone handling (converts local times to UTC for Google Calendar) 179 | - Event reminders (email 24 hours before, popup 10 minutes before) 180 | - Error handling for invalid dates, authentication failures, and API errors 181 | 182 | 183 | 💠 Search Node 184 | - **Purpose**: Performs internet search and generates responses with search context 185 | - **Search Provider**: Tavily Search API 186 | - **Process**: 187 | 1. Extracts search query from user message 188 | 2. Performs search using Tavily API with "advanced" search depth 189 | 3. Formats search results (title, content preview, source URL) 190 | 4. Generates response incorporating search results into conversation context 191 | - **Output**: Text response with search results context, stores `search_results` in state 192 | - **Use Cases**: Current events, news, recent information, factual queries, real-time data 193 | - **Implementation**: 194 | - The `TavilySearch` class handles all search operations 195 | - Default max results: 5 (configurable) 196 | - Search results are formatted with titles, content snippets (first 200 chars), and source URLs 197 | - Results are injected into the character response chain as context 198 | - The router node determines when internet search is needed based on user queries 199 | - **Features**: 200 | - Advanced search depth for comprehensive results 201 | - Automatic query extraction from user messages 202 | - Error handling for API failures and empty queries 203 | - Results are seamlessly integrated into Kylie's responses 204 | 205 | 💠 Summarize Conversation Node 206 | - **Purpose**: Reduces conversation history length 207 | - **Trigger**: When total messages exceed 100 (configurable via `TOTAL_MESSAGES_SUMMARY_TRIGGER`) 208 | - **Process**: 209 | - Creates/extends conversation summary 210 | - Removes old messages (keeps last 75 by default) 211 | - **Output**: Updated summary and reduced message history 212 | 213 | But of course, we don’t want to generate a summary every single time Kylie gets a message. That’s why this node is connected to the previous ones with a conditional edge. 
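Conceptually, that conditional edge boils down to a simple length check. Here is a rough sketch (the function and node names are assumptions; the actual edge is the one shown in the screenshot below):

```python
# Sketch of the summarization trigger — names are illustrative.
from langgraph.graph import END

from graph.state import AICompanionState  # assumed module path
from settings import settings


def should_summarize_conversation(state: AICompanionState) -> str:
    """Route to the summarization node only once the history grows past the trigger."""
    if len(state["messages"]) > settings.TOTAL_MESSAGES_SUMMARY_TRIGGER:
        return "summarize_conversation_node"
    return END


# When the graph is assembled, something like:
# graph_builder.add_conditional_edges("conversation_node", should_summarize_conversation)
```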
214 | 
215 | ![Alt text](./images/img8.png)
216 | 
217 | As you can see in the implementation above, this edge connects the summarization node to the previous nodes if the total number of messages exceeds the TOTAL_MESSAGES_SUMMARY_TRIGGER (which is set to 100 by default in settings.py). If not, it connects to the END node, which marks the end of the workflow.
218 | 
219 | ## Module 3 (Kylie's Memory)
220 | 
221 | ![Alt text](./images/img9.png)
222 | 
223 | Let’s start with a diagram to give you a big-picture view. As you can see, there are two main memory “blocks” - one stored in a SQLite database (left) and the other in a Qdrant collection (right).
224 | 
225 | 💠 Short-term memory
226 | The block on the left represents the short-term memory, which is stored in the LangGraph state and then persisted in a SQLite database. LangGraph makes this process simple since it comes with a built-in checkpointer for handling database storage.
227 | 
228 | In the code, we simply use the AsyncSqliteSaver class when compiling the graph. This ensures that the LangGraph state checkpoint is continuously saved to SQLite. You can see this in action in the code below.
229 | 
230 | ![Alt text](./images/img10.png)
231 | 
232 | Kylie’s state is a subclass of LangGraph’s MessagesState, which means it inherits a messages property. This property holds the history of messages exchanged in the conversation - essentially, that’s what we call short-term memory!
233 | 
234 | Integrating this short-term memory into the response chain is straightforward. We can use LangChain's MessagesPlaceholder class, allowing Kylie to consider past interactions when generating responses. This keeps the conversation smooth and coherent.
235 | 
236 | Simple, right? Now, let’s get into the interesting part: the long-term memory.
237 | 
238 | 💠 Long-term memory
239 | 
240 | ![Alt text](./images/img11.png)
241 | 
242 | Long-term memory isn’t just about saving every single message from a conversation - far from it 😅. That would be impractical and impossible to scale. Long-term memory works quite differently.
243 | 
244 | Think about it: when you meet someone new, you don’t remember every word they say, right? You only retain key details, like their name, profession, where they’re from, or shared interests.
245 | 
246 | That’s exactly what we wanted to replicate with Kylie. How? 🤔
247 | 
248 | By using a vector database like Qdrant, which lets us store relevant information from conversations as embeddings. Let’s break this down in more detail.
249 | 
250 | 🔶 Memory Extraction Node
251 | 
252 | Remember when we talked about the different nodes in our LangGraph workflow? The first one was the memory_extraction_node, which is responsible for identifying and storing key details from the conversation.
253 | 
254 | That’s the first essential piece we need to get our long-term memory module up and running! 💪
255 | 
256 | ![Alt text](./images/img12.png)
257 | 
258 | 🔶 Qdrant
259 | As the conversation progresses, the memory_extraction_node will keep gathering more and more details about you.
260 | 
261 | If you check your Qdrant Cloud instance, you’ll see the collection gradually filling up with “memories”.
262 | 
263 | 🔶 Memory Injection Node
264 | Now that all the memories are stored in Qdrant, how do we let Kylie use them in her conversations?
265 | 
266 | It’s simple! We just need one more node: the memory_injection_node.
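Before we get to the node itself, here is roughly what the Qdrant side of this looks like: one call to embed and upsert an extracted memory, and one vector search to pull the most relevant ones back out. This is a sketch, not the code from `modules/memory/long_term/` — the collection name, embedding model, and payload schema are assumptions.

```python
# Rough sketch of the Qdrant side of long-term memory (illustrative only).
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="https://your-cluster.qdrant.io", api_key="...")  # hypothetical cluster
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
COLLECTION = "long_term_memory"  # assumed collection name (must already exist)


def store_memory(text: str) -> None:
    """Embed an extracted memory and upsert it into the collection."""
    client.upsert(
        collection_name=COLLECTION,
        points=[
            PointStruct(
                id=str(uuid.uuid4()),
                vector=encoder.encode(text).tolist(),
                payload={"text": text},
            )
        ],
    )


def get_relevant_memories(recent_context: str, top_k: int = 5) -> list[str]:
    """Vector search: return the top-k stored memories closest to the recent conversation."""
    hits = client.search(
        collection_name=COLLECTION,
        query_vector=encoder.encode(recent_context).tolist(),
        limit=top_k,
    )
    return [hit.payload["text"] for hit in hits]
```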
267 | 268 | ![Alt text](./images/img13.png) 269 | 270 | This node uses the MemoryManager class to retrieve relevant memories from Qdrant - essentially performing a vector search to find the top-k similar embeddings. Then, it transforms those embeddings (vector representations) into text using the format_memories_for_prompt method. 271 | 272 | Once that's done, the formatted memories are stored in the memory_context property of the graph. This allows them to be parsed into the Character Card Prompt - the one that defines Kylie's personality and behaviour. 273 | 274 | ## Module 4 (Kylie's Voice) 275 | 276 | Kylie's audio pipeline works a lot like the vision pipeline. 277 | 278 | Instead of processing images and generating new ones, we're dealing with audio: converting speech to text and text back to speech. 279 | 280 | Take a look at the diagram above to see what I mean. 281 | 282 | It all starts when you send a voice note. The audio gets transcribed, and the text is sent into the LangGraph workflow. That text, along with your message, helps generate a response, sometimes with an accompanying voice note. We'll explore how conversations are shaped using the incoming message, chat history, memories, and even current activities. 283 | 284 | So, in a nutshell, there are two main flows: one for handling audio coming in and another for generating and sending new audio out. 285 | 286 | ## Audio In: Speech-to-Text (STT) 287 | 288 | ![Alt text](./images/img15.png) 289 | 290 | Speech-to-Text models convert spoken audio into written text, enabling Kylie to understand voice messages just like text messages. They process audio waveforms, identify phonemes and words, and generate accurate transcriptions even with background noise or different accents. 291 | 292 | For Kylie, STT is essential for making voice notes accessible. It lets her transcribe your voice messages accurately, understand your spoken requests, and generate responses that go beyond just text - bringing real conversational context into interactions. 293 | 294 | To integrate STT into Kylie's codebase, I built the SpeechToText class as part of Kylie's modules. We're using Groq's Whisper model (whisper-large-v3-turbo) for fast and accurate transcription. 295 | 296 | ## Audio Out: Text-to-Speech (TTS) 297 | 298 | ![Alt text](./images/img16.png) 299 | 300 | Text-to-Speech models convert written text into natural-sounding speech, enabling Kylie to respond with voice notes just like a real person. They process text, generate phonemes, and synthesize audio waveforms that sound human-like with proper intonation and emotion. 301 | 302 | For Kylie, TTS is crucial for creating natural and engaging voice responses. Whether she's responding with a voice note or expressing emotions, these models ensure her audio outputs match the conversation while staying warm and conversational. 303 | 304 | There are tons of TTS services out there - growing fast! - but we found that ElevenLabs gave us solid results, creating the natural, expressive voice we wanted for Kylie's personality. 305 | 306 | Plus, it offers great voice quality and customization options, which is a huge bonus! 307 | 308 | The workflow is simple: first, we generate a text response based on the chat history and Kylie's activities 309 | 310 | Next, we use this text to synthesize speech using ElevenLabs TTS, with voice settings optimized for natural conversation. 311 | 312 | The synthesize method then generates the audio bytes and stores them in the LangGraph state's audio_buffer. 
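Putting those steps together, the audio node looks roughly like this. This is a sketch, not the exact code: the node name, the import paths marked below, the chain signature, and `synthesize()` being awaitable are assumptions; only the overall flow (generate text → synthesize speech → store the bytes in `audio_buffer`) follows the description above.

```python
from langchain_core.messages import AIMessage
from langchain_core.runnables import RunnableConfig

from graph.state import AICompanionState  # assumed module path
from graph.utils.chains import get_character_response_chain  # assumed module path
from modules.speech import TextToSpeech

text_to_speech = TextToSpeech()


async def audio_node(state: AICompanionState, config: RunnableConfig):
    # 1. Generate the text reply, just like the conversation node would.
    chain = get_character_response_chain(state.get("summary", ""))
    response_text = await chain.ainvoke(
        {
            "messages": state["messages"],
            "current_activity": state.get("current_activity", ""),
            "memory_context": state.get("memory_context", ""),
        },
        config,
    )

    # 2. Turn that text into speech with ElevenLabs and stash the raw bytes
    #    in the state, where the WhatsApp handler picks them up.
    audio_bytes = await text_to_speech.synthesize(response_text)
    return {"messages": AIMessage(content=response_text), "audio_buffer": audio_bytes}
```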
313 | 
314 | Finally, the audio gets sent back to the user via the WhatsApp endpoint hook, giving them a voice representation of what Kylie is saying!
315 | 
316 | 
317 | ## Module 5 (Kylie's Vision)
318 | 
319 | ![Alt text](./images/img14.png)
320 | 
321 | Kylie’s vision pipeline works a lot like the audio pipeline.
322 | 
323 | Instead of converting speech to text and back, we’re dealing with images: processing what comes in and generating fresh ones to send back.
324 | 
325 | Take a look at the diagram above to see what I mean.
326 | 
327 | It all starts when I send a picture of myself golfing. The image gets processed, and a description is sent into the LangGraph workflow. That description, along with my message, helps generate a response, sometimes with an accompanying image. We’ll explore how scenarios are shaped using the incoming message, chat history, memories, and even current activities.
328 | 
329 | So, in a nutshell, there are two main flows: one for handling images coming in and another for generating and sending new ones out.
330 | 
331 | ## Image In: Vision Language Models (VLMs)
332 | 
333 | ![Alt text](./images/img15.png)
334 | 
335 | Vision Language Models (VLMs) process both images and text, generating text-based insights from visual input. They help with tasks like object recognition, image captioning, and answering questions about images. Some even understand spatial relationships, identifying objects or their positions.
336 | 
337 | For Kylie, VLMs are key to making sense of incoming images. They let her analyze pictures, describe them accurately, and generate responses that go beyond just text - bringing real context and understanding into conversations.
338 | 
339 | To integrate the VLM into Kylie’s codebase, I built the ImageToText class as part of Kylie’s modules.
340 | 
341 | ## Image Out: Diffusion Models
342 | 
343 | ![Alt text](./images/img16.png)
344 | 
345 | Diffusion models are a type of generative AI that create images by refining random noise step by step until a clear picture emerges. They learn from training data to produce diverse, high-quality images without copying exact examples.
346 | 
347 | For Kylie, diffusion models are crucial for generating realistic and context-aware images. Whether she’s responding with a visual or illustrating a concept, these models ensure her image outputs match the conversation while staying creative and unique.
348 | 
349 | There are tons of diffusion models out there - growing fast! - but we found that FLUX.1 gave us solid results, creating the realistic images we wanted for Kylie’s simulated life.
350 | 
351 | Plus, it’s free to use on the Together.ai platform, which is a huge bonus!
352 | The workflow is simple: first, we generate a scenario based on the chat history and Kylie's activities.
353 | 
354 | Next, we use this scenario to craft a prompt for image generation, adding guardrails, context, and other relevant details.
355 | 
356 | The generate_image method then saves the output image to the filesystem and stores its path in the LangGraph state.
357 | 
358 | Finally, the image gets sent back to the user via the WhatsApp endpoint hook, giving them a visual representation of what Kylie is seeing!
359 | 
360 | 
361 | 
362 | 
363 | 
--------------------------------------------------------------------------------