├── src ├── __init__.py └── redis_context_course │ ├── scripts │ ├── __init__.py │ ├── generate_courses_from_hierarchical.py │ └── load_hierarchical_courses.py │ ├── __init__.py │ ├── models.py │ └── redis_config.py ├── .python-version ├── tests ├── __init__.py ├── conftest.py ├── test_tools.py └── test_package.py ├── context_overview.png ├── public ├── ce-overview.png ├── context-problems.png ├── agents-token-hungry.png └── chroma_distractors.png ├── data └── arxiv_2504_02268.pdf ├── .env.example ├── progressive_agents ├── stage1_baseline_rag │ └── agent │ │ ├── __init__.py │ │ ├── state.py │ │ ├── workflow.py │ │ └── setup.py ├── stage3_full_agent_without_memory │ ├── agent │ │ ├── __init__.py │ │ ├── state.py │ │ ├── edges.py │ │ └── workflow.py │ ├── test_linear_algebra.py │ ├── test_simple.py │ └── debug_search.py ├── stage5_working_memory │ ├── agent │ │ ├── __init__.py │ │ ├── edges.py │ │ ├── react_parser.py │ │ └── state.py │ ├── test_linear_algebra.py │ ├── test_simple.py │ ├── test_exact_match.py │ ├── test_exact_match_react.py │ ├── test_react_multi_turn.py │ ├── debug_search.py │ └── test_react_simple.py ├── stage6_full_memory │ ├── agent │ │ ├── __init__.py │ │ ├── edges.py │ │ ├── react_parser.py │ │ └── state.py │ ├── test_linear_algebra.py │ ├── test_simple.py │ ├── test_simple_memory.py │ ├── test_exact_match.py │ ├── test_react_linear_algebra.py │ ├── debug_search.py │ └── test_react_simple.py ├── stage4_hybrid_search │ └── agent │ │ ├── __init__.py │ │ ├── state.py │ │ ├── react_parser.py │ │ ├── workflow.py │ │ ├── react_prompts.py │ │ └── setup.py └── stage2_context_engineered │ └── agent │ ├── state.py │ ├── __init__.py │ ├── workflow.py │ ├── context_engineering.py │ └── setup.py ├── .gitignore ├── docker-compose.yml ├── requirements.txt ├── workshop_boa ├── redis_context_course_boa │ ├── __init__.py │ └── redis_config_boa.py └── 03_data_engineering_theory_README.md ├── pyproject.toml ├── notebooks └── SETUP_GUIDE.md └── 
test_openai_connection.py /src/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.python-version: -------------------------------------------------------------------------------- 1 | 3.13.7 2 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tests for the Redis Context Course package. 3 | """ 4 | -------------------------------------------------------------------------------- /context_overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/redis-developer/context-eng-matters/main/context_overview.png -------------------------------------------------------------------------------- /public/ce-overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/redis-developer/context-eng-matters/main/public/ce-overview.png -------------------------------------------------------------------------------- /data/arxiv_2504_02268.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/redis-developer/context-eng-matters/main/data/arxiv_2504_02268.pdf -------------------------------------------------------------------------------- /public/context-problems.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/redis-developer/context-eng-matters/main/public/context-problems.png -------------------------------------------------------------------------------- /public/agents-token-hungry.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/redis-developer/context-eng-matters/main/public/agents-token-hungry.png -------------------------------------------------------------------------------- /public/chroma_distractors.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/redis-developer/context-eng-matters/main/public/chroma_distractors.png -------------------------------------------------------------------------------- /src/redis_context_course/scripts/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Scripts package for Redis Context Course. 3 | 4 | This package contains command-line scripts for data generation, 5 | ingestion, and other utilities for the context engineering course. 6 | 7 | Available scripts: 8 | - generate_courses_from_hierarchical: Generate sample course catalog data 9 | - load_hierarchical_courses: Ingest course data into Redis 10 | """ 11 | 12 | __all__ = ["generate_courses_from_hierarchical", "load_hierarchical_courses"] 13 | -------------------------------------------------------------------------------- /.env.example: -------------------------------------------------------------------------------- 1 | # Required 2 | OPENAI_API_KEY=sk-your-key-here 3 | 4 | # Optional - Redis configuration (defaults provided) 5 | REDIS_URL=redis://localhost:6379 6 | 7 | # Optional - Agent Memory Server configuration 8 | AGENT_MEMORY_SERVER_URL=http://localhost:8088 9 | AGENT_MEMORY_URL=http://localhost:8088 10 | 11 | # Optional - Redis index name for course data 12 | REDIS_INDEX_NAME=course_catalog 13 | 14 | # Optional - OpenAI model configuration 15 | OPENAI_MODEL=gpt-4o 16 | 17 | -------------------------------------------------------------------------------- /tests/conftest.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | 4 | import pytest 5 | from testcontainers.core.container import DockerContainer 6 | 7 | 8 |
@pytest.fixture(scope="session") 9 | def redis_stack_url(): 10 | """Start a Redis 8 container (modules built-in) and yield REDIS_URL.""" 11 | image = os.getenv("TEST_REDIS_IMAGE", "redis:8.2.1") 12 | # Expose the port before entering the context manager, 13 | # since DockerContainer.__enter__ starts the container. 14 | with DockerContainer(image).with_exposed_ports(6379) as c: 15 | host = c.get_container_host_ip() 16 | port = int(c.get_exposed_port(6379)) 17 | url = f"redis://{host}:{port}" 18 | # Tiny wait for readiness 19 | time.sleep(1.0) 20 | yield url 21 | -------------------------------------------------------------------------------- /progressive_agents/stage1_baseline_rag/agent/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Stage 1: Baseline RAG Agent 3 | 4 | A simple RAG agent that demonstrates the BASELINE approach: 5 | - Semantic search with Redis vector embeddings 6 | - RAW, unoptimized context (no context engineering) 7 | - Simple 2-node LangGraph workflow 8 | - No query decomposition 9 | - No quality evaluation 10 | 11 | This agent intentionally shows the problems that context engineering solves. 12 | Students will see: 13 | - Noisy context (unnecessary fields) 14 | - Token waste (inefficient) 15 | - Poor LLM parsing (verbose JSON) 16 | 17 | Stage 2 will apply Section 2 context engineering techniques to fix these issues.
18 | """ 19 | 20 | from .setup import cleanup_courses, load_courses_if_needed, setup_agent 21 | from .state import AgentState, initialize_state 22 | from .workflow import create_workflow 23 | 24 | __all__ = [ 25 | "setup_agent", 26 | "load_courses_if_needed", 27 | "cleanup_courses", 28 | "create_workflow", 29 | "AgentState", 30 | "initialize_state", 31 | ] 32 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Python 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | *.so 6 | .Python 7 | build/ 8 | develop-eggs/ 9 | dist/ 10 | downloads/ 11 | eggs/ 12 | .eggs/ 13 | lib/ 14 | lib64/ 15 | parts/ 16 | sdist/ 17 | var/ 18 | wheels/ 19 | *.egg-info/ 20 | .installed.cfg 21 | *.egg 22 | 23 | # Virtual environments 24 | .venv/ 25 | venv/ 26 | ENV/ 27 | env/ 28 | 29 | # UV 30 | .uv/ 31 | 32 | # IDE 33 | .idea/ 34 | .vscode/ 35 | *.swp 36 | *.swo 37 | *~ 38 | 39 | # Jupyter 40 | .ipynb_checkpoints/ 41 | *.ipynb_checkpoints 42 | 43 | # Environment 44 | .env 45 | .env.local 46 | 47 | # Testing 48 | .pytest_cache/ 49 | .coverage 50 | htmlcov/ 51 | .tox/ 52 | .nox/ 53 | 54 | # Linting 55 | .ruff_cache/ 56 | .mypy_cache/ 57 | 58 | # OS 59 | .DS_Store 60 | Thumbs.db 61 | 62 | # Project specific 63 | test_output/ 64 | *.log 65 | 66 | test_outputs/ 67 | workshop/arch 68 | workshop/run_comparison.py 69 | workshop/test_agents_simple.py 70 | 71 | # Archived progressive agent stages 72 | progressive_agents/archive/ 73 | 74 | # Local analysis files (not for version control) 75 | local_analysis/ 76 | -------------------------------------------------------------------------------- /progressive_agents/stage3_full_agent_without_memory/agent/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Course Q&A Agent - Stage 3 (Full Agent without Memory) 3 | 4 | A LangGraph-based agent for answering questions about 
courses using semantic search 5 | and context engineering techniques. This is Stage 3 of the progressive learning path. 6 | 7 | Adapted from the caching-agent architecture with CourseManager integration. 8 | """ 9 | 10 | from .setup import cleanup_courses, initialize_course_manager, setup_agent 11 | from .state import WorkflowMetrics, WorkflowState, initialize_metrics 12 | from .tools import optimize_course_text, search_courses, transform_course_to_text 13 | from .workflow import create_workflow, run_agent 14 | 15 | __all__ = [ 16 | # State management 17 | "WorkflowState", 18 | "WorkflowMetrics", 19 | "initialize_metrics", 20 | # Workflow 21 | "create_workflow", 22 | "run_agent", 23 | # Setup 24 | "setup_agent", 25 | "initialize_course_manager", 26 | "cleanup_courses", 27 | # Tools 28 | "search_courses", 29 | "transform_course_to_text", 30 | "optimize_course_text", 31 | ] 32 | 33 | __version__ = "0.1.0" 34 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | redis: 3 | image: redis:8.2.2 4 | container_name: redis-context-engineering 5 | ports: 6 | - "6379:6379" 7 | environment: 8 | - REDIS_ARGS=--save 60 1 --loglevel warning 9 | volumes: 10 | - redis-data:/data 11 | healthcheck: 12 | test: ["CMD", "redis-cli", "ping"] 13 | interval: 5s 14 | timeout: 3s 15 | retries: 5 16 | 17 | agent-memory-server: 18 | image: ghcr.io/redis/agent-memory-server:0.12.3 19 | container_name: agent-memory-server 20 | command: ["agent-memory", "api", "--host", "0.0.0.0", "--port", "8000", "--no-worker"] 21 | ports: 22 | - "8088:8000" # Host port changed to avoid conflicts 23 | environment: 24 | - REDIS_URL=redis://redis:6379 25 | - OPENAI_API_KEY=${OPENAI_API_KEY} 26 | - LOG_LEVEL=INFO 27 | depends_on: 28 | redis: 29 | condition: service_healthy 30 | healthcheck: 31 | test: ["CMD", "curl", "-f", "http://localhost:8000/v1/health"] 32 | 
interval: 10s 33 | timeout: 5s 34 | retries: 5 35 | start_period: 30s 36 | 37 | volumes: 38 | redis-data: 39 | 40 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # Context Engineering Course - Core Dependencies 2 | # Python 3.10+ 3 | 4 | # LangChain ecosystem 5 | langchain>=0.2.0 6 | langchain-openai>=0.1.0 7 | langchain-core>=0.2.0 8 | langchain-community>=0.2.0 9 | langchain-experimental>=0.3.0 10 | langchain-text-splitters>=0.3.0 11 | 12 | # LangGraph for agent workflows 13 | langgraph>=0.2.0 14 | langgraph-checkpoint>=1.0.0 15 | langgraph-checkpoint-redis>=0.1.0 16 | 17 | # Redis and vector search 18 | redis>=6.0.0 19 | redisvl>=0.8.0 20 | 21 | # OpenAI 22 | openai>=1.0.0 23 | 24 | # Agent Memory Server client 25 | agent-memory-client>=0.12.3 26 | 27 | # Data validation and models 28 | pydantic>=2.0.0 29 | 30 | # Utilities 31 | python-dotenv>=1.0.0 32 | click>=8.0.0 33 | rich>=13.0.0 34 | tiktoken>=0.5.0 35 | python-ulid>=3.0.0 36 | 37 | # Data generation 38 | faker>=20.0.0 39 | pandas>=2.0.0 40 | numpy>=1.24.0 41 | 42 | # Jupyter notebooks 43 | jupyter>=1.0.0 44 | ipykernel>=6.0.0 45 | 46 | # Embeddings (for notebooks) 47 | sentence-transformers>=2.0.0 48 | langchain-huggingface>=0.1.0 49 | pypdf>=6.3.0 50 | 51 | # Testing (optional - uncomment if needed) 52 | # pytest>=7.0.0 53 | # pytest-asyncio>=0.21.0 54 | 55 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/agent/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Course Q&A Agent - Stage 5 (Memory-Augmented Agent) 3 | 4 | A LangGraph-based agent for answering questions about courses with working memory 5 | for multi-turn conversations. This is Stage 5 of the progressive learning path. 
6 | 7 | Extends Stage 4 with Agent Memory Server integration. 8 | """ 9 | 10 | from .setup import cleanup_courses, initialize_course_manager, setup_agent 11 | from .state import WorkflowMetrics, WorkflowState, initialize_metrics, initialize_state 12 | from .tools import optimize_course_text, search_courses, transform_course_to_text 13 | from .workflow import create_workflow, run_agent, run_agent_async 14 | 15 | __all__ = [ 16 | # State management 17 | "WorkflowState", 18 | "WorkflowMetrics", 19 | "initialize_metrics", 20 | "initialize_state", 21 | # Workflow 22 | "create_workflow", 23 | "run_agent", 24 | "run_agent_async", 25 | # Setup 26 | "setup_agent", 27 | "initialize_course_manager", 28 | "cleanup_courses", 29 | # Tools 30 | "search_courses", 31 | "transform_course_to_text", 32 | "optimize_course_text", 33 | ] 34 | 35 | __version__ = "0.1.0" 36 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/agent/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Course Q&A Agent - Stage 6 (Full Memory Agent) 3 | 4 | A LangGraph-based agent for answering questions about courses with working and 5 | long-term memory for multi-turn conversations. This is Stage 6 of the progressive learning path. 6 | 7 | Extends Stage 5 with long-term memory via the Agent Memory Server.
8 | """ 9 | 10 | from .setup import cleanup_courses, initialize_course_manager, setup_agent 11 | from .state import WorkflowMetrics, WorkflowState, initialize_metrics, initialize_state 12 | from .tools import optimize_course_text, search_courses, transform_course_to_text 13 | from .workflow import create_workflow, run_agent, run_agent_async 14 | 15 | __all__ = [ 16 | # State management 17 | "WorkflowState", 18 | "WorkflowMetrics", 19 | "initialize_metrics", 20 | "initialize_state", 21 | # Workflow 22 | "create_workflow", 23 | "run_agent", 24 | "run_agent_async", 25 | # Setup 26 | "setup_agent", 27 | "initialize_course_manager", 28 | "cleanup_courses", 29 | # Tools 30 | "search_courses", 31 | "transform_course_to_text", 32 | "optimize_course_text", 33 | ] 34 | 35 | __version__ = "0.1.0" 36 | -------------------------------------------------------------------------------- /progressive_agents/stage4_hybrid_search/agent/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Course Q&A Agent - Stage 4 ReAct (Hybrid Search with NER + ReAct Loop) 3 | 4 | A LangGraph-based agent for answering questions about courses using: 5 | - Hybrid search (semantic + exact match) 6 | - Named Entity Recognition (NER) for course codes 7 | - ReAct (Reasoning + Acting) loop for explicit reasoning 8 | 9 | This is a ReAct-enhanced variant of the base Stage 4 hybrid search agent.
10 | """ 11 | 12 | from .setup import cleanup_courses, initialize_course_manager, setup_agent 13 | from .state import WorkflowMetrics, WorkflowState, initialize_metrics 14 | from .tools import optimize_course_text, search_courses_tool, transform_course_to_text 15 | from .workflow import create_workflow, run_agent, run_agent_async 16 | 17 | __all__ = [ 18 | # State management 19 | "WorkflowState", 20 | "WorkflowMetrics", 21 | "initialize_metrics", 22 | # Workflow 23 | "create_workflow", 24 | "run_agent", 25 | "run_agent_async", 26 | # Setup 27 | "setup_agent", 28 | "initialize_course_manager", 29 | "cleanup_courses", 30 | # Tools 31 | "search_courses_tool", 32 | "transform_course_to_text", 33 | "optimize_course_text", 34 | ] 35 | 36 | __version__ = "0.1.0" 37 | 38 | -------------------------------------------------------------------------------- /progressive_agents/stage1_baseline_rag/agent/state.py: -------------------------------------------------------------------------------- 1 | """ 2 | State definitions for Stage 1 Baseline RAG Agent. 3 | 4 | This is a simplified state compared to Stage 3 - no metrics tracking, 5 | no quality scores, no iteration tracking. Just the basics. 6 | """ 7 | 8 | from typing import Optional, TypedDict 9 | 10 | 11 | class AgentState(TypedDict): 12 | """ 13 | Simple state for baseline RAG agent. 14 | 15 | This is intentionally minimal to show the baseline approach 16 | without any advanced features. 17 | """ 18 | 19 | # Input 20 | query: str 21 | 22 | # Research results 23 | raw_context: str # Raw course data (JSON or basic string) 24 | courses_found: int 25 | 26 | # Output 27 | final_answer: str 28 | 29 | # Simple metrics 30 | total_tokens: Optional[int] 31 | total_time_ms: Optional[float] 32 | 33 | 34 | def initialize_state(query: str) -> AgentState: 35 | """ 36 | Initialize agent state with a query. 
37 | 38 | Args: 39 | query: User's question 40 | 41 | Returns: 42 | Initialized state dictionary 43 | """ 44 | return { 45 | "query": query, 46 | "raw_context": "", 47 | "courses_found": 0, 48 | "final_answer": "", 49 | "total_tokens": None, 50 | "total_time_ms": None, 51 | } 52 | -------------------------------------------------------------------------------- /progressive_agents/stage2_context_engineered/agent/state.py: -------------------------------------------------------------------------------- 1 | """ 2 | State definitions for Stage 2 Context-Engineered Agent. 3 | 4 | Same simple state as Stage 1 - the difference is in HOW we process 5 | the context (with context engineering), not in the state structure. 6 | """ 7 | 8 | from typing import Optional, TypedDict 9 | 10 | 11 | class AgentState(TypedDict): 12 | """ 13 | Simple state for context-engineered RAG agent. 14 | 15 | Same structure as Stage 1, but the context will be engineered 16 | using Section 2 techniques (cleaning, transformation, optimization). 17 | """ 18 | 19 | # Input 20 | query: str 21 | 22 | # Research results 23 | engineered_context: str # Context-engineered course data (natural text) 24 | courses_found: int 25 | 26 | # Output 27 | final_answer: str 28 | 29 | # Simple metrics 30 | total_tokens: Optional[int] 31 | total_time_ms: Optional[float] 32 | 33 | 34 | def initialize_state(query: str) -> AgentState: 35 | """ 36 | Initialize agent state with a query. 
37 | 38 | Args: 39 | query: User's question 40 | 41 | Returns: 42 | Initialized state dictionary 43 | """ 44 | return { 45 | "query": query, 46 | "engineered_context": "", 47 | "courses_found": 0, 48 | "final_answer": "", 49 | "total_tokens": None, 50 | "total_time_ms": None, 51 | } 52 | -------------------------------------------------------------------------------- /progressive_agents/stage2_context_engineered/agent/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Stage 2: Context-Engineered Agent 3 | 4 | A RAG agent that applies Section 2 context engineering techniques: 5 | - Semantic search with Redis vector embeddings (same as Stage 1) 6 | - CONTEXT ENGINEERING: Clean, transform, and optimize retrieved data 7 | - Simple 2-node LangGraph workflow (same as Stage 1) 8 | - No query decomposition (same as Stage 1) 9 | - No quality evaluation (same as Stage 1) 10 | 11 | This agent demonstrates the VALUE of context engineering by comparing 12 | to Stage 1's baseline approach. Same architecture, better context! 13 | 14 | Context Engineering Techniques Applied: 15 | 1. CLEANING: Remove noise fields (id, timestamps, enrollment data) 16 | 2. TRANSFORMATION: Convert JSON → natural text format 17 | 3. 
OPTIMIZATION: Efficient token usage while preserving information 18 | 19 | Students will see: 20 | - 40-50% token reduction compared to Stage 1 21 | - Better LLM understanding (natural text vs JSON) 22 | - Same functionality, better efficiency 23 | - Clear ROI on context engineering 24 | """ 25 | 26 | from .context_engineering import ( 27 | format_courses_for_llm, 28 | optimize_course_text, 29 | transform_course_to_text, 30 | ) 31 | from .setup import cleanup_courses, load_courses_if_needed, setup_agent 32 | from .state import AgentState, initialize_state 33 | from .workflow import create_workflow 34 | 35 | __all__ = [ 36 | "setup_agent", 37 | "load_courses_if_needed", 38 | "cleanup_courses", 39 | "create_workflow", 40 | "AgentState", 41 | "initialize_state", 42 | "transform_course_to_text", 43 | "optimize_course_text", 44 | "format_courses_for_llm", 45 | ] 46 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/test_linear_algebra.py: -------------------------------------------------------------------------------- 1 | """ 2 | Test to see how the agent handles "linear algebra" queries. 3 | 4 | Key question: Does it use vector search (semantic) or exact match? 
5 | """ 6 | 7 | import asyncio 8 | import logging 9 | import sys 10 | from pathlib import Path 11 | 12 | # Add parent directory to path 13 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 14 | 15 | from redis_context_course import CourseManager 16 | 17 | from progressive_agents.stage6_full_memory.agent.workflow import ( 18 | create_workflow, 19 | run_agent_async, 20 | ) 21 | 22 | # Configure logging to show tool calls 23 | logging.basicConfig( 24 | level=logging.INFO, 25 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 26 | ) 27 | 28 | 29 | async def test_query(agent, query, session_id): 30 | """Test a single query and show what strategy was used.""" 31 | print("\n" + "=" * 80) 32 | print(f"Query: {query}") 33 | print("=" * 80) 34 | 35 | result = await run_agent_async( 36 | agent=agent, 37 | query=query, 38 | session_id=session_id, 39 | student_id="test_user", 40 | enable_caching=False, 41 | ) 42 | 43 | response = result.get("final_response", "No response") 44 | print(f"\nResponse:\n{response}\n") 45 | 46 | 47 | async def main(): 48 | """Run linear algebra tests.""" 49 | print("Initializing Course Manager...") 50 | course_manager = CourseManager() 51 | 52 | print("Creating agent workflow...") 53 | agent = create_workflow(course_manager) 54 | 55 | # Test questions about "linear algebra" (not a course code) 56 | questions = [ 57 | ("general_1", "I am interested in linear algebra"), 58 | ("prereq_1", "What are the prerequisites for linear algebra?"), 59 | ("topics_1", "What's the topics for linear algebra?"), 60 | ("assignments_1", "What are the assignments for linear algebra course?"), 61 | ] 62 | 63 | for session_id, question in questions: 64 | await test_query(agent, question, session_id) 65 | 66 | 67 | if __name__ == "__main__": 68 | asyncio.run(main()) 69 | 70 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/test_linear_algebra.py: 
-------------------------------------------------------------------------------- 1 | """ 2 | Test to see how the agent handles "linear algebra" queries. 3 | 4 | Key question: Does it use vector search (semantic) or exact match? 5 | """ 6 | 7 | import asyncio 8 | import logging 9 | import sys 10 | from pathlib import Path 11 | 12 | # Add parent directory to path 13 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 14 | 15 | from redis_context_course import CourseManager 16 | 17 | from progressive_agents.stage5_working_memory.agent.workflow import ( 18 | create_workflow, 19 | run_agent_async, 20 | ) 21 | 22 | # Configure logging to show tool calls 23 | logging.basicConfig( 24 | level=logging.INFO, 25 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 26 | ) 27 | 28 | 29 | async def test_query(agent, query, session_id): 30 | """Test a single query and show what strategy was used.""" 31 | print("\n" + "=" * 80) 32 | print(f"Query: {query}") 33 | print("=" * 80) 34 | 35 | result = await run_agent_async( 36 | agent=agent, 37 | query=query, 38 | session_id=session_id, 39 | student_id="test_user", 40 | enable_caching=False, 41 | ) 42 | 43 | response = result.get("final_response", "No response") 44 | print(f"\nResponse:\n{response}\n") 45 | 46 | 47 | async def main(): 48 | """Run linear algebra tests.""" 49 | print("Initializing Course Manager...") 50 | course_manager = CourseManager() 51 | 52 | print("Creating agent workflow...") 53 | agent = create_workflow(course_manager) 54 | 55 | # Test questions about "linear algebra" (not a course code) 56 | questions = [ 57 | ("general_1", "I am interested in linear algebra"), 58 | ("prereq_1", "What are the prerequisites for linear algebra?"), 59 | ("topics_1", "What's the topics for linear algebra?"), 60 | ("assignments_1", "What are the assignments for linear algebra course?"), 61 | ] 62 | 63 | for session_id, question in questions: 64 | await test_query(agent, question, session_id) 65 | 66 | 67 | if 
__name__ == "__main__": 68 | asyncio.run(main()) 69 | 70 | -------------------------------------------------------------------------------- /progressive_agents/stage1_baseline_rag/agent/workflow.py: -------------------------------------------------------------------------------- 1 | """ 2 | Workflow builder for Stage 1 Baseline RAG Agent. 3 | 4 | This is a SIMPLE 2-node workflow: 5 | 1. Research (semantic search with raw context) 6 | 2. Synthesize (LLM answer generation) 7 | 8 | No decomposition, no quality evaluation, no iteration. 9 | Just the basics to show the baseline approach. 10 | """ 11 | 12 | import logging 13 | 14 | from langgraph.graph import END, START, StateGraph 15 | 16 | from .nodes import research_node, synthesize_node, set_verbose 17 | from .state import AgentState 18 | 19 | logger = logging.getLogger("stage1-baseline") 20 | 21 | 22 | def create_workflow(verbose: bool = True) -> StateGraph: 23 | """ 24 | Create a simple 2-node RAG workflow. 25 | 26 | Args: 27 | verbose: If True, show detailed logging. If False, suppress intermediate logs. 28 | 29 | Workflow: 30 | START → research → synthesize → END 31 | 32 | This is intentionally simple to show the baseline approach. 33 | Stage 2 will have the same structure but with context engineering. 34 | Stage 3 adds decomposition and quality evaluation. 
35 | 36 | Returns: 37 | Compiled LangGraph workflow 38 | """ 39 | # Set verbose mode for nodes 40 | set_verbose(verbose) 41 | 42 | # Control logger level based on verbose flag 43 | if not verbose: 44 | logging.getLogger("stage1-baseline").setLevel(logging.CRITICAL) 45 | else: 46 | logging.getLogger("stage1-baseline").setLevel(logging.INFO) 47 | 48 | logger.info("🏗️ Building Stage 1 Baseline RAG workflow...") 49 | 50 | # Create state graph 51 | workflow = StateGraph(AgentState) 52 | 53 | # Add nodes 54 | workflow.add_node("research", research_node) 55 | workflow.add_node("synthesize", synthesize_node) 56 | 57 | # Define edges (linear flow) 58 | workflow.add_edge(START, "research") 59 | workflow.add_edge("research", "synthesize") 60 | workflow.add_edge("synthesize", END) 61 | 62 | # Compile 63 | app = workflow.compile() 64 | 65 | logger.info("✅ Workflow created successfully") 66 | logger.info("📊 Workflow: START → research → synthesize → END") 67 | 68 | return app 69 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/test_simple.py: -------------------------------------------------------------------------------- 1 | """ 2 | Simple test to verify the agent works with tool calling. 
3 | """ 4 | 5 | import asyncio 6 | import logging 7 | import sys 8 | from pathlib import Path 9 | 10 | # Add parent directory to path 11 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 12 | 13 | from redis_context_course import CourseManager 14 | 15 | from progressive_agents.stage6_full_memory.agent.workflow import ( 16 | create_workflow, 17 | run_agent_async, 18 | ) 19 | 20 | # Configure logging 21 | logging.basicConfig( 22 | level=logging.INFO, 23 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 24 | ) 25 | 26 | 27 | async def main(): 28 | """Run a simple test.""" 29 | print("Initializing Course Manager...") 30 | course_manager = CourseManager() 31 | 32 | print("Creating agent workflow...") 33 | agent = create_workflow(course_manager) 34 | 35 | # Test questions 36 | questions = [ 37 | ("GENERAL", "What is CS004?"), 38 | ("PREREQUISITES", "What are the prerequisites for CS004?"), 39 | ("SYLLABUS", "What's the syllabus for CS004?"), 40 | ("ASSIGNMENTS", "What are the assignments for CS004?"), 41 | ] 42 | 43 | for intent_type, question in questions: 44 | print("\n" + "=" * 80) 45 | print(f"Testing: {intent_type}") 46 | print(f"Question: {question}") 47 | print("=" * 80) 48 | 49 | try: 50 | result = await run_agent_async( 51 | agent=agent, 52 | query=question, 53 | session_id=f"test_{intent_type.lower()}", 54 | student_id="test_user", 55 | enable_caching=False, 56 | ) 57 | 58 | response = result.get("final_response", "No response") 59 | execution_path = " → ".join(result.get("execution_path", [])) 60 | 61 | print(f"\nResponse:\n{response}\n") 62 | print(f"Execution Path: {execution_path}") 63 | 64 | except Exception as e: 65 | print(f"❌ Error: {e}") 66 | import traceback 67 | 68 | traceback.print_exc() 69 | 70 | 71 | if __name__ == "__main__": 72 | asyncio.run(main()) 73 | 74 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/test_simple.py: 
-------------------------------------------------------------------------------- 1 | """ 2 | Simple test to verify the agent works with tool calling. 3 | """ 4 | 5 | import asyncio 6 | import logging 7 | import sys 8 | from pathlib import Path 9 | 10 | # Add parent directory to path 11 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 12 | 13 | from redis_context_course import CourseManager 14 | 15 | from progressive_agents.stage5_working_memory.agent.workflow import ( 16 | create_workflow, 17 | run_agent_async, 18 | ) 19 | 20 | # Configure logging 21 | logging.basicConfig( 22 | level=logging.INFO, 23 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 24 | ) 25 | 26 | 27 | async def main(): 28 | """Run a simple test.""" 29 | print("Initializing Course Manager...") 30 | course_manager = CourseManager() 31 | 32 | print("Creating agent workflow...") 33 | agent = create_workflow(course_manager) 34 | 35 | # Test questions 36 | questions = [ 37 | ("GENERAL", "What is CS004?"), 38 | ("PREREQUISITES", "What are the prerequisites for CS004?"), 39 | ("SYLLABUS", "What's the syllabus for CS004?"), 40 | ("ASSIGNMENTS", "What are the assignments for CS004?"), 41 | ] 42 | 43 | for intent_type, question in questions: 44 | print("\n" + "=" * 80) 45 | print(f"Testing: {intent_type}") 46 | print(f"Question: {question}") 47 | print("=" * 80) 48 | 49 | try: 50 | result = await run_agent_async( 51 | agent=agent, 52 | query=question, 53 | session_id=f"test_{intent_type.lower()}", 54 | student_id="test_user", 55 | enable_caching=False, 56 | ) 57 | 58 | response = result.get("final_response", "No response") 59 | execution_path = " → ".join(result.get("execution_path", [])) 60 | 61 | print(f"\nResponse:\n{response}\n") 62 | print(f"Execution Path: {execution_path}") 63 | 64 | except Exception as e: 65 | print(f"❌ Error: {e}") 66 | import traceback 67 | 68 | traceback.print_exc() 69 | 70 | 71 | if __name__ == "__main__": 72 | asyncio.run(main()) 73 | 74 | 
-------------------------------------------------------------------------------- /progressive_agents/stage2_context_engineered/agent/workflow.py: -------------------------------------------------------------------------------- 1 | """ 2 | Workflow builder for Stage 2 Context-Engineered Agent. 3 | 4 | This is a SIMPLE 2-node workflow (same as Stage 1): 5 | 1. Research (semantic search with ENGINEERED context) 6 | 2. Synthesize (LLM answer generation) 7 | 8 | No decomposition, no quality evaluation, no iteration. 9 | The ONLY difference from Stage 1 is context engineering! 10 | """ 11 | 12 | import logging 13 | 14 | from langgraph.graph import END, START, StateGraph 15 | 16 | from .nodes import research_node, synthesize_node, set_verbose 17 | from .state import AgentState 18 | 19 | logger = logging.getLogger("stage2-engineered") 20 | 21 | 22 | def create_workflow(verbose: bool = True) -> StateGraph: 23 | """ 24 | Create a simple 2-node RAG workflow with context engineering. 25 | 26 | Args: 27 | verbose: If True, show detailed logging. If False, suppress intermediate logs. 28 | 29 | Workflow: 30 | START → research (with context engineering) → synthesize → END 31 | 32 | Same structure as Stage 1, but research node applies Section 2 techniques. 33 | This makes it easy to compare and see the impact of context engineering. 
34 | 35 | Returns: 36 | Compiled LangGraph workflow 37 | """ 38 | # Set verbose mode for nodes 39 | set_verbose(verbose) 40 | 41 | # Control logger level based on verbose flag 42 | if not verbose: 43 | logging.getLogger("stage2-engineered").setLevel(logging.CRITICAL) 44 | else: 45 | logging.getLogger("stage2-engineered").setLevel(logging.INFO) 46 | 47 | logger.info("🏗️ Building Stage 2 Context-Engineered workflow...") 48 | 49 | # Create state graph 50 | workflow = StateGraph(AgentState) 51 | 52 | # Add nodes 53 | workflow.add_node("research", research_node) 54 | workflow.add_node("synthesize", synthesize_node) 55 | 56 | # Define edges (linear flow) 57 | workflow.add_edge(START, "research") 58 | workflow.add_edge("research", "synthesize") 59 | workflow.add_edge("synthesize", END) 60 | 61 | # Compile 62 | app = workflow.compile() 63 | 64 | logger.info("✅ Workflow created successfully") 65 | logger.info("📊 Workflow: START → research (engineered) → synthesize → END") 66 | 67 | return app 68 | -------------------------------------------------------------------------------- /workshop_boa/redis_context_course_boa/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Redis Context Course BOA - Bank of America Workshop Version 3 | 4 | This is a BOA-specific version of redis_context_course that uses Orchestra API 5 | for embeddings and LLM calls. It's designed for the workshop_boa directory. 
6 | 7 | Key differences from the original: 8 | - Uses Orchestra API instead of OpenAI 9 | - Includes placeholder mode for testing without Orchestra 10 | - Clear TODO markers for configuration 11 | - Non-breaking changes with fallback support 12 | 13 | Main Components: 14 | - hierarchical_manager_boa: Course manager with Orchestra embeddings 15 | - hierarchical_models: Data models (same as original) 16 | - hierarchical_context: Context assemblers (same as original) 17 | - redis_config_boa: Redis configuration with Orchestra support 18 | 19 | Usage in workshop notebooks: 20 | # Import BOA version instead of original 21 | from redis_context_course_boa import HierarchicalCourseManager 22 | from redis_context_course_boa import CourseSummary, CourseDetails 23 | from redis_context_course_boa import HierarchicalContextAssembler 24 | """ 25 | 26 | # Import models from original package (no changes needed) 27 | from redis_context_course.hierarchical_models import ( 28 | Assignment, 29 | AssignmentType, 30 | CourseDetails, 31 | CourseSummary, 32 | CourseSyllabus, 33 | GradingPolicy, 34 | HierarchicalCourse, 35 | SyllabusWeek, 36 | ) 37 | 38 | # Import context assemblers from original package (no changes needed) 39 | from redis_context_course.hierarchical_context import ( 40 | HierarchicalContextAssembler, 41 | FlatContextAssembler, 42 | ) 43 | 44 | # Import BOA-specific components 45 | from .hierarchical_manager_boa import HierarchicalCourseManager 46 | from .redis_config_boa import RedisConfig, redis_config 47 | 48 | __all__ = [ 49 | # Core classes (BOA versions) 50 | "HierarchicalCourseManager", 51 | "RedisConfig", 52 | "redis_config", 53 | # Data models (from original) 54 | "CourseSummary", 55 | "CourseDetails", 56 | "HierarchicalCourse", 57 | "CourseSyllabus", 58 | "Assignment", 59 | "AssignmentType", 60 | "GradingPolicy", 61 | "SyllabusWeek", 62 | # Context assemblers (from original) 63 | "HierarchicalContextAssembler", 64 | "FlatContextAssembler", 65 | ] 66 | 67 | 
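The docstring above promises a "placeholder mode" and "non-breaking changes with fallback support". One common way to express that kind of soft dependency is a guarded import — `orchestra_client` and `embed_text` below are illustrative names, not the package's actual API:

```python
from typing import List

# Illustrative soft-import fallback: prefer a site-specific client, and
# degrade to a deterministic placeholder when it is unavailable, so
# workshop notebooks still run end to end without the internal service.
try:
    from orchestra_client import embed_text  # hypothetical internal API
except ImportError:
    def embed_text(text: str) -> List[float]:
        # Placeholder embedding used when the real client is absent.
        return [0.0] * 8

vector = embed_text("machine learning courses")
print(len(vector))  # 8 when the fallback path is used
```

The fallback keeps the import surface identical for callers, which is what makes the change non-breaking: downstream code calls `embed_text` the same way in both modes.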
-------------------------------------------------------------------------------- /tests/test_tools.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | from redis_context_course import tools as tools_mod 4 | 5 | 6 | class FakeCourse: 7 | def __init__(self, code, title, desc, credits=3, fmt="Online", diff="Beginner"): 8 | self.course_code = code 9 | self.title = title 10 | self.description = desc 11 | self.credits = credits 12 | self.format = type("Fmt", (), {"value": fmt}) 13 | self.difficulty_level = type("Diff", (), {"value": diff}) 14 | self.prerequisites = [] 15 | 16 | 17 | class FakeCourseManager: 18 | async def search_courses(self, query: str, limit: int = 5): 19 | return [ 20 | FakeCourse("CS101", "Intro to CS", "Learn basics of programming"), 21 | FakeCourse("CS102", "Python Basics", "Introductory Python course"), 22 | ][:limit] 23 | 24 | async def get_course(self, course_code: str): 25 | if course_code == "MISSING": 26 | return None 27 | return FakeCourse(course_code, "Some Course", "Detailed description") 28 | 29 | 30 | @pytest.mark.asyncio 31 | async def test_search_courses_tool_formats_result(): 32 | cm = FakeCourseManager() 33 | (search_tool, get_details_tool, check_prereq_tool) = tools_mod.create_course_tools( 34 | cm 35 | ) 36 | 37 | out = await search_tool.ainvoke({"query": "python beginner", "limit": 2}) 38 | assert "CS101" in out and "CS102" in out 39 | assert "Credits:" in out and "Online" in out 40 | 41 | 42 | @pytest.mark.asyncio 43 | async def test_get_course_details_handles_missing(): 44 | cm = FakeCourseManager() 45 | (_, get_details_tool, _) = tools_mod.create_course_tools(cm) 46 | 47 | out = await get_details_tool.ainvoke({"course_code": "MISSING"}) 48 | assert "not found" in out.lower() 49 | 50 | 51 | def test_select_tools_by_keywords(): 52 | tools_map = { 53 | "search": ["S1"], 54 | "memory": ["M1"], 55 | } 56 | res1 = tools_mod.select_tools_by_keywords("find programming courses", 
tools_map) 57 | res2 = tools_mod.select_tools_by_keywords( 58 | "please remember my preferences", tools_map 59 | ) 60 | res3 = tools_mod.select_tools_by_keywords("random", tools_map) 61 | 62 | assert res1 == ["S1"] 63 | assert res2 == ["M1"] 64 | assert res3 == ["S1"] # defaults to search 65 | 66 | 67 | # NOTE: Agent-specific tool tests have been removed. 68 | # The tools are now available via create_agent_tools() in tools.py 69 | # Use the progressive_agents stages for testing full agent functionality. 70 | -------------------------------------------------------------------------------- /progressive_agents/stage3_full_agent_without_memory/test_linear_algebra.py: -------------------------------------------------------------------------------- 1 | """ 2 | Test script for Stage 3 with topic-based queries (linear algebra). 3 | 4 | Tests how the agent handles: 5 | 1. Topic-based queries (should use semantic_only strategy) 6 | 2. Different intents with the same topic 7 | """ 8 | 9 | import asyncio 10 | import logging 11 | import sys 12 | from pathlib import Path 13 | 14 | # Add parent directory to path 15 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 16 | 17 | from redis_context_course import CourseManager 18 | 19 | from progressive_agents.stage3_full_agent_without_memory.agent.workflow import ( 20 | create_workflow, 21 | ) 22 | 23 | # Configure logging 24 | logging.basicConfig( 25 | level=logging.INFO, 26 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 27 | ) 28 | 29 | # Suppress verbose logs 30 | logging.getLogger("httpx").setLevel(logging.WARNING) 31 | logging.getLogger("openai").setLevel(logging.WARNING) 32 | 33 | 34 | async def test_linear_algebra(): 35 | """Test the agent with linear algebra queries.""" 36 | 37 | print("\n" + "=" * 80) 38 | print("STAGE 3 LINEAR ALGEBRA TEST") 39 | print("=" * 80) 40 | 41 | # Initialize course manager 42 | print("\n📚 Initializing CourseManager...") 43 | course_manager = CourseManager() 44 | 45 | 
# Create workflow 46 | print("🔧 Creating workflow...") 47 | workflow = create_workflow(course_manager) 48 | 49 | # Test questions about linear algebra 50 | test_questions = [ 51 | "I am interested in linear algebra", 52 | "What are the prerequisites for linear algebra?", 53 | "What's the topics for linear algebra?", 54 | "What are the assignments for linear algebra course?", 55 | ] 56 | 57 | for i, question in enumerate(test_questions, 1): 58 | print(f"\n{'=' * 80}") 59 | print(f"Test {i}/{len(test_questions)}") 60 | print(f"{'=' * 80}") 61 | print(f"Query: {question}") 62 | 63 | # Run workflow 64 | initial_state = { 65 | "original_query": question, 66 | "execution_path": [], 67 | "llm_calls": {}, 68 | "metrics": {}, 69 | } 70 | 71 | result = await workflow.ainvoke(initial_state) 72 | 73 | # Extract response 74 | response = result.get("final_response", "No response") 75 | 76 | print(f"\nResponse:\n{response}") 77 | print(f"\n{'-' * 80}") 78 | 79 | 80 | if __name__ == "__main__": 81 | asyncio.run(test_linear_algebra()) 82 | 83 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/test_simple_memory.py: -------------------------------------------------------------------------------- 1 | """ 2 | Simple test to verify Stage 6 memory tools work correctly. 
3 | """ 4 | 5 | import asyncio 6 | import logging 7 | import sys 8 | import uuid 9 | from pathlib import Path 10 | 11 | # Add parent directory to path 12 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 13 | 14 | from redis_context_course import CourseManager 15 | 16 | from progressive_agents.stage6_full_memory.agent.workflow import ( 17 | create_workflow, 18 | run_agent_async, 19 | ) 20 | 21 | # Configure logging 22 | logging.basicConfig( 23 | level=logging.WARNING, # Reduce noise 24 | format="%(message)s", 25 | ) 26 | 27 | 28 | async def main(): 29 | """Run a simple memory test.""" 30 | print("=" * 80) 31 | print("STAGE 6: SIMPLE MEMORY TEST") 32 | print("=" * 80) 33 | 34 | course_manager = CourseManager() 35 | agent = create_workflow(course_manager) 36 | 37 | student_id = f"demo_student_{uuid.uuid4().hex[:6]}" 38 | 39 | # Test 1: Store preferences 40 | print(f"\n👤 Student: {student_id}") 41 | print("\n📍 SESSION 1: Store Preferences") 42 | print("-" * 80) 43 | 44 | session_1 = f"session_1_{uuid.uuid4().hex[:6]}" 45 | query_1 = "I'm interested in machine learning and I prefer online courses." 46 | 47 | print(f"User: {query_1}") 48 | 49 | result_1 = await run_agent_async( 50 | agent=agent, 51 | query=query_1, 52 | session_id=session_1, 53 | student_id=student_id, 54 | enable_caching=False, 55 | ) 56 | 57 | response_1 = result_1.get("final_response", "No response") 58 | print(f"\nAgent: {response_1}") 59 | 60 | # Test 2: Retrieve preferences in new session 61 | print("\n\n📍 SESSION 2: Retrieve Preferences (New Session)") 62 | print("-" * 80) 63 | 64 | session_2 = f"session_2_{uuid.uuid4().hex[:6]}" 65 | query_2 = "What courses would you recommend for me?" 
66 | 67 | print(f"User: {query_2}") 68 | 69 | result_2 = await run_agent_async( 70 | agent=agent, 71 | query=query_2, 72 | session_id=session_2, 73 | student_id=student_id, 74 | enable_caching=False, 75 | ) 76 | 77 | response_2 = result_2.get("final_response", "No response") 78 | print(f"\nAgent: {response_2}") 79 | 80 | # Check if personalization worked 81 | print("\n" + "=" * 80) 82 | if "machine learning" in response_2.lower() or "online" in response_2.lower(): 83 | print("✅ SUCCESS: Agent used stored preferences for personalization!") 84 | else: 85 | print("⚠️ Note: Response may not have explicitly mentioned stored preferences") 86 | print("=" * 80) 87 | 88 | 89 | if __name__ == "__main__": 90 | asyncio.run(main()) 91 | 92 | -------------------------------------------------------------------------------- /progressive_agents/stage3_full_agent_without_memory/agent/state.py: -------------------------------------------------------------------------------- 1 | """ 2 | Workflow state definitions for the Course Q&A Agent. 3 | 4 | Adapted from the caching-agent for course-specific question answering. 5 | """ 6 | 7 | from typing import Dict, List, Optional, TypedDict 8 | 9 | 10 | class WorkflowMetrics(TypedDict): 11 | """Metrics tracking for workflow performance analysis.""" 12 | 13 | total_latency: float 14 | decomposition_latency: float 15 | cache_latency: float 16 | research_latency: float 17 | synthesis_latency: float 18 | cache_hit_rate: float 19 | cache_hits_count: int 20 | questions_researched: int 21 | total_research_iterations: int 22 | llm_calls: Dict[str, int] 23 | sub_question_count: int 24 | execution_path: str 25 | 26 | 27 | class WorkflowState(TypedDict): 28 | """ 29 | State for Course Q&A workflow with semantic caching and quality evaluation. 30 | 31 | Tracks query processing, course retrieval, caching, and performance metrics. 
32 | """ 33 | 34 | # Core query management 35 | original_query: str 36 | sub_questions: List[str] 37 | sub_answers: Dict[str, str] 38 | final_response: Optional[str] 39 | 40 | # Query intent classification 41 | query_intent: Optional[ 42 | str 43 | ] # "GREETING", "GENERAL", "SYLLABUS_OBJECTIVES", "ASSIGNMENTS", "PREREQUISITES" 44 | 45 | # Cache management (granular per sub-question) 46 | # NOTE: Semantic caching is commented out for now - will be added later 47 | cache_hits: Dict[str, bool] 48 | cache_confidences: Dict[str, float] 49 | cache_enabled: bool 50 | 51 | # Research iteration and quality control 52 | research_iterations: Dict[str, int] 53 | max_research_iterations: int 54 | research_quality_scores: Dict[str, float] 55 | research_feedback: Dict[str, str] 56 | current_research_strategy: Dict[str, str] 57 | 58 | # Agent coordination 59 | execution_path: List[str] 60 | active_sub_question: Optional[str] 61 | 62 | # Metrics and tracking 63 | metrics: WorkflowMetrics 64 | timestamp: str 65 | comparison_mode: bool 66 | llm_calls: Dict[str, int] 67 | 68 | 69 | def initialize_metrics() -> WorkflowMetrics: 70 | """Initialize a clean metrics structure with default values.""" 71 | return { 72 | "total_latency": 0.0, 73 | "decomposition_latency": 0.0, 74 | "cache_latency": 0.0, 75 | "research_latency": 0.0, 76 | "synthesis_latency": 0.0, 77 | "cache_hit_rate": 0.0, 78 | "cache_hits_count": 0, 79 | "questions_researched": 0, 80 | "total_research_iterations": 0, 81 | "llm_calls": {}, 82 | "sub_question_count": 0, 83 | "execution_path": "", 84 | } 85 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/test_exact_match.py: -------------------------------------------------------------------------------- 1 | """ 2 | Test exact match search to debug the CS002 issue. 
3 | """ 4 | 5 | import asyncio 6 | import logging 7 | from redis_context_course import CourseManager 8 | 9 | logging.basicConfig(level=logging.INFO) 10 | logger = logging.getLogger(__name__) 11 | 12 | 13 | async def test_exact_search(): 14 | """Test exact course code search.""" 15 | course_manager = CourseManager() 16 | 17 | # Test searching for CS002 18 | print("\n" + "=" * 80) 19 | print("Testing exact search for CS002") 20 | print("=" * 80) 21 | 22 | results = await course_manager.search_courses( 23 | query="CS002", 24 | filters=None, 25 | limit=5, 26 | similarity_threshold=0.0, # Get all results to see what's happening 27 | ) 28 | 29 | print(f"\nFound {len(results)} results:") 30 | for i, course in enumerate(results, 1): 31 | print(f"{i}. {course.course_code}: {course.title}") 32 | print(f" Department: {course.department}") 33 | print(f" Description: {course.description[:100]}...") 34 | print() 35 | 36 | # Test searching for CS004 37 | print("\n" + "=" * 80) 38 | print("Testing exact search for CS004") 39 | print("=" * 80) 40 | 41 | results = await course_manager.search_courses( 42 | query="CS004", 43 | filters=None, 44 | limit=5, 45 | similarity_threshold=0.0, 46 | ) 47 | 48 | print(f"\nFound {len(results)} results:") 49 | for i, course in enumerate(results, 1): 50 | print(f"{i}. 
{course.course_code}: {course.title}") 51 | print(f" Department: {course.department}") 52 | print(f" Description: {course.description[:100]}...") 53 | print() 54 | 55 | # Get all courses to see what's available 56 | print("\n" + "=" * 80) 57 | print("Getting all CS courses") 58 | print("=" * 80) 59 | 60 | all_courses = await course_manager.get_all_courses() 61 | cs_courses = [c for c in all_courses if c.course_code.startswith("CS")] 62 | cs_courses.sort(key=lambda x: x.course_code) 63 | 64 | print(f"\nFound {len(cs_courses)} CS courses:") 65 | for course in cs_courses: 66 | print(f" {course.course_code}: {course.title}") 67 | 68 | # Try get_course_by_code 69 | print("\n" + "=" * 80) 70 | print("Testing get_course_by_code for CS002") 71 | print("=" * 80) 72 | 73 | course = await course_manager.get_course_by_code("CS002") 74 | if course: 75 | print(f"Found: {course.course_code}: {course.title}") 76 | print(f"Description: {course.description}") 77 | else: 78 | print("Course not found!") 79 | 80 | 81 | if __name__ == "__main__": 82 | asyncio.run(test_exact_search()) 83 | 84 | -------------------------------------------------------------------------------- /progressive_agents/stage4_hybrid_search/agent/state.py: -------------------------------------------------------------------------------- 1 | """ 2 | Workflow state definitions for the Stage 4 ReAct Course Q&A Agent. 3 | 4 | Extends Stage 3 state with ReAct-specific fields.
5 | """ 6 | 7 | from typing import Any, Dict, List, Optional, TypedDict 8 | 9 | 10 | class WorkflowMetrics(TypedDict): 11 | """Metrics tracking for workflow performance analysis.""" 12 | 13 | total_latency: float 14 | decomposition_latency: float 15 | cache_latency: float 16 | research_latency: float 17 | synthesis_latency: float 18 | cache_hit_rate: float 19 | cache_hits_count: int 20 | questions_researched: int 21 | total_research_iterations: int 22 | llm_calls: Dict[str, int] 23 | sub_question_count: int 24 | execution_path: str 25 | 26 | 27 | class WorkflowState(TypedDict): 28 | """ 29 | State for Stage 4 ReAct Course Q&A workflow. 30 | 31 | Combines hybrid search with NER and ReAct reasoning loop. 32 | """ 33 | 34 | # Core query management 35 | original_query: str 36 | sub_questions: List[str] 37 | sub_answers: Dict[str, str] 38 | final_response: Optional[str] 39 | 40 | # Query intent classification 41 | query_intent: Optional[str] # "GREETING", "GENERAL", "SYLLABUS_OBJECTIVES", etc. 
42 | 43 | # Named Entity Recognition (NER) - from Stage 4 44 | extracted_entities: Optional[Dict[str, Any]] 45 | search_strategy: Optional[str] # "exact_match", "hybrid", "semantic_only" 46 | 47 | # Hybrid search results - from Stage 4 48 | exact_matches: Optional[List[str]] 49 | metadata_filters: Optional[Dict[str, Any]] 50 | 51 | # ReAct-specific fields 52 | reasoning_trace: List[Dict[str, Any]] # Thought/Action/Observation history 53 | react_iterations: int # Number of ReAct loop iterations 54 | 55 | # Cache management (disabled for now) 56 | cache_hits: Dict[str, bool] 57 | cache_confidences: Dict[str, float] 58 | cache_enabled: bool 59 | 60 | # Research iteration and quality control 61 | research_iterations: Dict[str, int] 62 | max_research_iterations: int 63 | research_quality_scores: Dict[str, float] 64 | research_feedback: Dict[str, str] 65 | current_research_strategy: Dict[str, str] 66 | 67 | # Agent coordination 68 | execution_path: List[str] 69 | active_sub_question: Optional[str] 70 | 71 | # Metrics and tracking 72 | metrics: WorkflowMetrics 73 | timestamp: str 74 | comparison_mode: bool 75 | llm_calls: Dict[str, int] 76 | 77 | 78 | def initialize_metrics() -> WorkflowMetrics: 79 | """Initialize a clean metrics structure with default values.""" 80 | return { 81 | "total_latency": 0.0, 82 | "decomposition_latency": 0.0, 83 | "cache_latency": 0.0, 84 | "research_latency": 0.0, 85 | "synthesis_latency": 0.0, 86 | "cache_hit_rate": 0.0, 87 | "cache_hits_count": 0, 88 | "questions_researched": 0, 89 | "total_research_iterations": 0, 90 | "llm_calls": {}, 91 | "sub_question_count": 0, 92 | "execution_path": "", 93 | } 94 | 95 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/agent/edges.py: -------------------------------------------------------------------------------- 1 | """ 2 | Workflow edges and routing logic for the Course Q&A Agent. 
3 | 4 | Adapted from caching-agent with minimal changes for course Q&A routing. 5 | """ 6 | 7 | import logging 8 | from typing import Any, Dict, Literal 9 | 10 | # Configure logger 11 | logger = logging.getLogger("course-qa-workflow") 12 | 13 | # NOTE: Semantic cache initialization commented out for now 14 | # Global cache variable 15 | # cache = None 16 | 17 | 18 | def initialize_edges(): 19 | """Initialize edges with required dependencies.""" 20 | # NOTE: Semantic cache initialization commented out 21 | # global cache 22 | # cache = semantic_cache 23 | pass 24 | 25 | 26 | def route_after_cache_check(state: Dict[str, Any]) -> Literal["research", "synthesize"]: 27 | """ 28 | Route after cache check based on cache hit results. 29 | 30 | Returns: 31 | "research" if any cache misses detected 32 | "synthesize" if all sub-questions are cached 33 | """ 34 | cache_hits = state.get("cache_hits", {}) 35 | 36 | # Count cache misses 37 | cache_misses = [ 38 | question for question, is_cached in cache_hits.items() if not is_cached 39 | ] 40 | 41 | if cache_misses: 42 | logger.info( 43 | f"🔀 Routing to researcher: {len(cache_misses)} cache misses detected" 44 | ) 45 | for question in cache_misses: 46 | logger.info(f" 🔍 Will research: '{question[:50]}...'") 47 | return "research" 48 | else: 49 | logger.info("🔀 Routing to synthesis: all sub-questions cached!") 50 | return "synthesize" 51 | 52 | 53 | def route_after_quality_evaluation( 54 | state: Dict[str, Any], 55 | ) -> Literal["research", "synthesize"]: 56 | """ 57 | Route after quality evaluation based on research quality scores. 
58 | 59 | Returns: 60 | "research" if any answers need improvement and haven't exceeded max iterations 61 | "synthesize" if all answers are adequate or max iterations reached 62 | """ 63 | quality_scores = state.get("research_quality_scores", {}) 64 | research_iterations = state.get("research_iterations", {}) 65 | max_iterations = state.get("max_research_iterations", 2) 66 | 67 | # Find questions that need improvement and haven't exceeded max iterations 68 | needs_improvement = [] 69 | for question, score in quality_scores.items(): 70 | current_iterations = research_iterations.get(question, 0) 71 | if score < 0.7 and current_iterations < max_iterations: 72 | needs_improvement.append((question, score, current_iterations)) 73 | 74 | if needs_improvement: 75 | logger.info( 76 | f"🔄 Routing to additional research: {len(needs_improvement)} questions need improvement" 77 | ) 78 | for question, score, iteration in needs_improvement: 79 | logger.info( 80 | f" 🔍 Improve: '{question[:40]}...' (score: {score:.2f}, iteration: {iteration + 1})" 81 | ) 82 | return "research" 83 | else: 84 | logger.info("🔀 Routing to synthesis: all research quality is adequate!") 85 | return "synthesize" 86 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/agent/edges.py: -------------------------------------------------------------------------------- 1 | """ 2 | Workflow edges and routing logic for the Course Q&A Agent. 3 | 4 | Adapted from caching-agent with minimal changes for course Q&A routing. 
5 | """ 6 | 7 | import logging 8 | from typing import Any, Dict, Literal 9 | 10 | # Configure logger 11 | logger = logging.getLogger("course-qa-workflow") 12 | 13 | # NOTE: Semantic cache initialization commented out for now 14 | # Global cache variable 15 | # cache = None 16 | 17 | 18 | def initialize_edges(): 19 | """Initialize edges with required dependencies.""" 20 | # NOTE: Semantic cache initialization commented out 21 | # global cache 22 | # cache = semantic_cache 23 | pass 24 | 25 | 26 | def route_after_cache_check(state: Dict[str, Any]) -> Literal["research", "synthesize"]: 27 | """ 28 | Route after cache check based on cache hit results. 29 | 30 | Returns: 31 | "research" if any cache misses detected 32 | "synthesize" if all sub-questions are cached 33 | """ 34 | cache_hits = state.get("cache_hits", {}) 35 | 36 | # Count cache misses 37 | cache_misses = [ 38 | question for question, is_cached in cache_hits.items() if not is_cached 39 | ] 40 | 41 | if cache_misses: 42 | logger.info( 43 | f"🔀 Routing to researcher: {len(cache_misses)} cache misses detected" 44 | ) 45 | for question in cache_misses: 46 | logger.info(f" 🔍 Will research: '{question[:50]}...'") 47 | return "research" 48 | else: 49 | logger.info("🔀 Routing to synthesis: all sub-questions cached!") 50 | return "synthesize" 51 | 52 | 53 | def route_after_quality_evaluation( 54 | state: Dict[str, Any], 55 | ) -> Literal["research", "synthesize"]: 56 | """ 57 | Route after quality evaluation based on research quality scores. 
58 | 59 | Returns: 60 | "research" if any answers need improvement and haven't exceeded max iterations 61 | "synthesize" if all answers are adequate or max iterations reached 62 | """ 63 | quality_scores = state.get("research_quality_scores", {}) 64 | research_iterations = state.get("research_iterations", {}) 65 | max_iterations = state.get("max_research_iterations", 2) 66 | 67 | # Find questions that need improvement and haven't exceeded max iterations 68 | needs_improvement = [] 69 | for question, score in quality_scores.items(): 70 | current_iterations = research_iterations.get(question, 0) 71 | if score < 0.7 and current_iterations < max_iterations: 72 | needs_improvement.append((question, score, current_iterations)) 73 | 74 | if needs_improvement: 75 | logger.info( 76 | f"🔄 Routing to additional research: {len(needs_improvement)} questions need improvement" 77 | ) 78 | for question, score, iteration in needs_improvement: 79 | logger.info( 80 | f" 🔍 Improve: '{question[:40]}...' (score: {score:.2f}, iteration: {iteration + 1})" 81 | ) 82 | return "research" 83 | else: 84 | logger.info("🔀 Routing to synthesis: all research quality is adequate!") 85 | return "synthesize" 86 | -------------------------------------------------------------------------------- /progressive_agents/stage3_full_agent_without_memory/agent/edges.py: -------------------------------------------------------------------------------- 1 | """ 2 | Workflow edges and routing logic for the Course Q&A Agent. 3 | 4 | Adapted from caching-agent with minimal changes for course Q&A routing. 
5 | """ 6 | 7 | import logging 8 | from typing import Any, Dict, Literal 9 | 10 | # Configure logger 11 | logger = logging.getLogger("course-qa-workflow") 12 | 13 | # NOTE: Semantic cache initialization commented out for now 14 | # Global cache variable 15 | # cache = None 16 | 17 | 18 | def initialize_edges(): 19 | """Initialize edges with required dependencies.""" 20 | # NOTE: Semantic cache initialization commented out 21 | # global cache 22 | # cache = semantic_cache 23 | pass 24 | 25 | 26 | def route_after_cache_check(state: Dict[str, Any]) -> Literal["research", "synthesize"]: 27 | """ 28 | Route after cache check based on cache hit results. 29 | 30 | Returns: 31 | "research" if any cache misses detected 32 | "synthesize" if all sub-questions are cached 33 | """ 34 | cache_hits = state.get("cache_hits", {}) 35 | 36 | # Count cache misses 37 | cache_misses = [ 38 | question for question, is_cached in cache_hits.items() if not is_cached 39 | ] 40 | 41 | if cache_misses: 42 | logger.info( 43 | f"🔀 Routing to researcher: {len(cache_misses)} cache misses detected" 44 | ) 45 | for question in cache_misses: 46 | logger.info(f" 🔍 Will research: '{question[:50]}...'") 47 | return "research" 48 | else: 49 | logger.info("🔀 Routing to synthesis: all sub-questions cached!") 50 | return "synthesize" 51 | 52 | 53 | def route_after_quality_evaluation( 54 | state: Dict[str, Any], 55 | ) -> Literal["research", "synthesize"]: 56 | """ 57 | Route after quality evaluation based on research quality scores. 
58 | 59 | Returns: 60 | "research" if any answers need improvement and haven't exceeded max iterations 61 | "synthesize" if all answers are adequate or max iterations reached 62 | """ 63 | quality_scores = state.get("research_quality_scores", {}) 64 | research_iterations = state.get("research_iterations", {}) 65 | max_iterations = state.get("max_research_iterations", 2) 66 | 67 | # Find questions that need improvement and haven't exceeded max iterations 68 | needs_improvement = [] 69 | for question, score in quality_scores.items(): 70 | current_iterations = research_iterations.get(question, 0) 71 | if score < 0.7 and current_iterations < max_iterations: 72 | needs_improvement.append((question, score, current_iterations)) 73 | 74 | if needs_improvement: 75 | logger.info( 76 | f"🔄 Routing to additional research: {len(needs_improvement)} questions need improvement" 77 | ) 78 | for question, score, iteration in needs_improvement: 79 | logger.info( 80 | f" 🔍 Improve: '{question[:40]}...' (score: {score:.2f}, iteration: {iteration + 1})" 81 | ) 82 | return "research" 83 | else: 84 | logger.info("🔀 Routing to synthesis: all research quality is adequate!") 85 | return "synthesize" 86 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/test_exact_match.py: -------------------------------------------------------------------------------- 1 | """ 2 | Test exact match with the Stage 6 ReAct agent for CS002.
3 | """ 4 | 5 | import asyncio 6 | import logging 7 | 8 | from langchain_openai import ChatOpenAI 9 | from redis_context_course import CourseManager 10 | 11 | from progressive_agents.stage6_full_memory.agent.react_agent import run_react_agent 12 | from progressive_agents.stage6_full_memory.agent.tools import initialize_tools 13 | 14 | logging.basicConfig( 15 | level=logging.INFO, 16 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 17 | ) 18 | logger = logging.getLogger("course-qa-workflow") 19 | 20 | 21 | async def test_cs002(): 22 | """Test exact match for CS002.""" 23 | print("\n" + "=" * 80) 24 | print("STAGE 6 TEST: Exact Match for CS002") 25 | print("=" * 80) 26 | 27 | # Initialize 28 | llm = ChatOpenAI(model="gpt-4o-mini", temperature=0) 29 | course_manager = CourseManager() 30 | initialize_tools(course_manager, user_id="test_user") 31 | 32 | # Test CS002 33 | query = "What is CS002?" 34 | print(f"\nQuery: {query}") 35 | 36 | result = await run_react_agent( 37 | query=query, 38 | llm=llm, 39 | user_id="test_user", 40 | max_iterations=10, 41 | ) 42 | 43 | # Print results 44 | print(f"\n{'─' * 80}") 45 | print(f"FINAL ANSWER:") 46 | print(f"{'─' * 80}") 47 | print(result["answer"]) 48 | print(f"\nIterations: {result['iterations']}") 49 | print(f"Success: {result['success']}") 50 | 51 | # Verify it's the right course 52 | if "Machine Learning" in result["answer"]: 53 | print("\n✅ CORRECT: Found CS002 - Machine Learning Fundamentals") 54 | else: 55 | print("\n❌ WRONG: Did not find the correct course") 56 | 57 | 58 | async def test_cs001(): 59 | """Test exact match for CS001.""" 60 | print("\n" + "=" * 80) 61 | print("STAGE 6 TEST: Exact Match for CS001") 62 | print("=" * 80) 63 | 64 | # Initialize 65 | llm = ChatOpenAI(model="gpt-4o-mini", temperature=0) 66 | course_manager = CourseManager() 67 | initialize_tools(course_manager, user_id="test_user") 68 | 69 | # Test CS001 70 | query = "What is CS001?"
71 | print(f"\nQuery: {query}") 72 | 73 | result = await run_react_agent( 74 | query=query, 75 | llm=llm, 76 | user_id="test_user", 77 | max_iterations=10, 78 | ) 79 | 80 | # Print answer 81 | print(f"\n{'─' * 80}") 82 | print(f"FINAL ANSWER:") 83 | print(f"{'─' * 80}") 84 | print(result["answer"]) 85 | print(f"\nIterations: {result['iterations']}") 86 | 87 | # Verify it's the right course 88 | if "Database" in result["answer"]: 89 | print("\n✅ CORRECT: Found CS001 - Database Systems") 90 | else: 91 | print("\n❌ WRONG: Did not find the correct course") 92 | 93 | 94 | async def main(): 95 | """Run tests.""" 96 | await test_cs002() 97 | await test_cs001() 98 | 99 | print("\n" + "=" * 80) 100 | print("STAGE 7 EXACT MATCH TESTS COMPLETE") 101 | print("=" * 80) 102 | 103 | 104 | if __name__ == "__main__": 105 | asyncio.run(main()) 106 | 107 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/test_exact_match_react.py: -------------------------------------------------------------------------------- 1 | """ 2 | Test exact match with ReAct agent for CS002. 
3 | """ 4 | 5 | import asyncio 6 | import logging 7 | 8 | from langchain_openai import ChatOpenAI 9 | from redis_context_course import CourseManager 10 | 11 | from progressive_agents.stage5_working_memory.agent.react_agent import run_react_agent 12 | from progressive_agents.stage5_working_memory.agent.tools import initialize_tools 13 | 14 | logging.basicConfig( 15 | level=logging.INFO, 16 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 17 | ) 18 | logger = logging.getLogger("course-qa-workflow") 19 | 20 | 21 | async def test_cs002(): 22 | """Test exact match for CS002.""" 23 | print("\n" + "=" * 80) 24 | print("TEST: Exact Match for CS002") 25 | print("=" * 80) 26 | 27 | # Initialize 28 | llm = ChatOpenAI(model="gpt-4o-mini", temperature=0) 29 | course_manager = CourseManager() 30 | initialize_tools(course_manager) 31 | 32 | # Test CS002 33 | query = "What is CS002?" 34 | print(f"\nQuery: {query}") 35 | 36 | result = await run_react_agent( 37 | query=query, 38 | llm=llm, 39 | max_iterations=10, 40 | ) 41 | 42 | # Print results 43 | print(f"\n{'─' * 80}") 44 | print("REASONING TRACE:") 45 | print(f"{'─' * 80}") 46 | for i, step in enumerate(result["reasoning_trace"], 1): 47 | print(f"\nStep {i}:") 48 | print(f" Thought: {step.thought}") 49 | print(f" Action: {step.action}") 50 | if step.action != "FINISH": 51 | print(f" Action Input: {step.action_input}") 52 | print(f" Observation: {step.observation[:300]}...") 53 | else: 54 | print(f" Final Answer: {step.action_input}") 55 | 56 | print(f"\n{'─' * 80}") 57 | print(f"FINAL ANSWER:") 58 | print(f"{'─' * 80}") 59 | print(result["answer"]) 60 | print(f"\nIterations: {result['iterations']}") 61 | print(f"Success: {result['success']}") 62 | 63 | # Verify it's the right course 64 | if "Machine Learning" in result["answer"]: 65 | print("\n✅ CORRECT: Found CS002 - Machine Learning Fundamentals") 66 | else: 67 | print("\n❌ WRONG: Did not find the correct course") 68 | 69 | 70 | async def test_cs001(): 71 | 
"""Test exact match for CS001.""" 72 | print("\n" + "=" * 80) 73 | print("TEST: Exact Match for CS001") 74 | print("=" * 80) 75 | 76 | # Initialize 77 | llm = ChatOpenAI(model="gpt-4o-mini", temperature=0) 78 | course_manager = CourseManager() 79 | initialize_tools(course_manager) 80 | 81 | # Test CS001 82 | query = "What is CS001?" 83 | print(f"\nQuery: {query}") 84 | 85 | result = await run_react_agent( 86 | query=query, 87 | llm=llm, 88 | max_iterations=10, 89 | ) 90 | 91 | # Print answer 92 | print(f"\n{'─' * 80}") 93 | print(f"FINAL ANSWER:") 94 | print(f"{'─' * 80}") 95 | print(result["answer"]) 96 | print(f"\nIterations: {result['iterations']}") 97 | 98 | 99 | async def main(): 100 | """Run tests.""" 101 | await test_cs002() 102 | await test_cs001() 103 | 104 | print("\n" + "=" * 80) 105 | print("EXACT MATCH TESTS COMPLETE") 106 | print("=" * 80) 107 | 108 | 109 | if __name__ == "__main__": 110 | asyncio.run(main()) 111 | 112 | -------------------------------------------------------------------------------- /progressive_agents/stage3_full_agent_without_memory/test_simple.py: -------------------------------------------------------------------------------- 1 | """ 2 | Simple test script for Stage 3 agentic workflow. 3 | 4 | Tests basic functionality with one question per intent type. 
5 | """ 6 | 7 | import asyncio 8 | import logging 9 | import sys 10 | from pathlib import Path 11 | 12 | # Add parent directory to path 13 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 14 | 15 | from redis_context_course import CourseManager 16 | 17 | from progressive_agents.stage3_full_agent_without_memory.agent.workflow import ( 18 | create_workflow, 19 | ) 20 | 21 | # Configure logging 22 | logging.basicConfig( 23 | level=logging.INFO, 24 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 25 | ) 26 | 27 | # Suppress verbose logs 28 | logging.getLogger("httpx").setLevel(logging.WARNING) 29 | logging.getLogger("openai").setLevel(logging.WARNING) 30 | 31 | 32 | async def test_agent(): 33 | """Test the agent with simple questions.""" 34 | 35 | print("\n" + "=" * 80) 36 | print("STAGE 3 AGENTIC WORKFLOW TEST") 37 | print("=" * 80) 38 | 39 | # Initialize course manager 40 | print("\n📚 Initializing CourseManager...") 41 | course_manager = CourseManager() 42 | 43 | # Create workflow 44 | print("🔧 Creating workflow...") 45 | workflow = create_workflow(course_manager) 46 | 47 | # Test questions (one per intent) 48 | test_questions = [ 49 | ("GENERAL", "What is CS004?"), 50 | ("PREREQUISITES", "What are the prerequisites for CS004?"), 51 | ("SYLLABUS", "What's the syllabus for CS004?"), 52 | ("ASSIGNMENTS", "What are the assignments for CS004?"), 53 | ] 54 | 55 | results = [] 56 | 57 | for intent_type, question in test_questions: 58 | print(f"\n{'=' * 80}") 59 | print(f"Testing Intent: {intent_type}") 60 | print(f"{'=' * 80}") 61 | print(f"Question: {question}") 62 | 63 | # Run workflow 64 | initial_state = { 65 | "original_query": question, 66 | "execution_path": [], 67 | "llm_calls": {}, 68 | "metrics": {}, 69 | } 70 | 71 | result = await workflow.ainvoke(initial_state) 72 | 73 | # Extract response 74 | response = result.get("final_response", "No response") 75 | 76 | print(f"\nResponse: {response[:200]}...") 77 | 78 | # Check if intent 
matches 79 | intent_match = "✅" if any( 80 | keyword in response.lower() 81 | for keyword in { 82 | "GENERAL": ["computer vision", "cs004"], 83 | "PREREQUISITES": ["prerequisite", "required"], 84 | "SYLLABUS": ["syllabus", "topics", "week"], 85 | "ASSIGNMENTS": ["assignment", "project"], 86 | }[intent_type] 87 | ) else "❌" 88 | 89 | print(f"Intent Match: {intent_match}") 90 | 91 | results.append({ 92 | "intent": intent_type, 93 | "question": question, 94 | "response": response, 95 | "match": intent_match, 96 | }) 97 | 98 | # Summary 99 | print(f"\n{'=' * 80}") 100 | print("SUMMARY") 101 | print(f"{'=' * 80}") 102 | 103 | passed = sum(1 for r in results if r["match"] == "✅") 104 | total = len(results) 105 | 106 | print(f"\nTests Passed: {passed}/{total}") 107 | 108 | for r in results: 109 | print(f"\n{r['match']} {r['intent']}: {r['question']}") 110 | 111 | print(f"\n{'=' * 80}\n") 112 | 113 | 114 | if __name__ == "__main__": 115 | asyncio.run(test_agent()) 116 | 117 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/test_react_linear_algebra.py: -------------------------------------------------------------------------------- 1 | """ 2 | Test to see how the ReAct agent handles "linear algebra" queries. 3 | 4 | Key question: Does it use vector search (semantic) or exact match? 5 | Tests the agent's reasoning about search strategy selection. 
6 | """ 7 | 8 | import asyncio 9 | import logging 10 | import sys 11 | from pathlib import Path 12 | 13 | # Add parent directory to path 14 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 15 | 16 | from redis_context_course import CourseManager 17 | 18 | from progressive_agents.stage6_full_memory.agent.workflow import ( 19 | create_workflow, 20 | run_agent_async, 21 | ) 22 | 23 | # Configure logging to show tool calls 24 | logging.basicConfig( 25 | level=logging.INFO, 26 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 27 | ) 28 | 29 | 30 | async def test_query(agent, query, session_id): 31 | """Test a single query and show what strategy was used.""" 32 | print("\n" + "=" * 80) 33 | print(f"Query: {query}") 34 | print("=" * 80) 35 | 36 | result = await run_agent_async( 37 | agent=agent, 38 | query=query, 39 | session_id=session_id, 40 | student_id="test_user_linear_algebra", 41 | enable_caching=False, 42 | ) 43 | 44 | response = result.get("final_response", "No response") 45 | react_iterations = result.get("react_iterations", 0) 46 | reasoning_trace = result.get("reasoning_trace", []) 47 | 48 | print(f"\nResponse:\n{response}\n") 49 | print(f"ReAct Iterations: {react_iterations}") 50 | print(f"Reasoning Steps: {len(reasoning_trace)}") 51 | 52 | # Show reasoning trace 53 | if reasoning_trace: 54 | print("\nReasoning Trace:") 55 | print("-" * 80) 56 | for i, step in enumerate(reasoning_trace, 1): 57 | if step["type"] == "thought": 58 | print(f"{i}. 💭 Thought: {step['content'][:150]}...") 59 | elif step["type"] == "action": 60 | print(f"{i}. 🔧 Action: {step['action']}") 61 | print(f" Input: {step['input']}") 62 | # Check search strategy 63 | if step["action"] == "search_courses" and "search_strategy" in step["input"]: 64 | strategy = step["input"]["search_strategy"] 65 | print(f" ⭐ Search Strategy: {strategy}") 66 | elif step["type"] == "finish": 67 | print(f"{i}. 
✅ FINISH") 68 | print("-" * 80) 69 | 70 | 71 | async def main(): 72 | """Run linear algebra tests.""" 73 | print("Initializing Course Manager...") 74 | course_manager = CourseManager() 75 | 76 | print("Creating agent workflow...") 77 | agent = create_workflow(course_manager) 78 | 79 | print("\n" + "=" * 80) 80 | print("STAGE 7 REACT AGENT - LINEAR ALGEBRA TESTS") 81 | print("=" * 80) 82 | print("\nThese tests check if the agent correctly chooses search strategies") 83 | print("for 'linear algebra' (not a course code, requires semantic search)") 84 | print("=" * 80) 85 | 86 | # Test questions about "linear algebra" (not a course code) 87 | questions = [ 88 | ("general_1", "I am interested in linear algebra"), 89 | ("prereq_1", "What are the prerequisites for linear algebra?"), 90 | ("topics_1", "What topics are covered in linear algebra?"), 91 | ("assignments_1", "What are the assignments for linear algebra course?"), 92 | ] 93 | 94 | for session_id, question in questions: 95 | await test_query(agent, question, session_id) 96 | 97 | print("\n" + "=" * 80) 98 | print("✅ LINEAR ALGEBRA TESTS COMPLETE") 99 | print("=" * 80) 100 | 101 | 102 | if __name__ == "__main__": 103 | asyncio.run(main()) 104 | 105 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [project] 2 | name = "context-eng-matters" 3 | version = "1.0.0" 4 | description = "Context Engineering Course - A comprehensive, hands-on course teaching practical context engineering patterns for AI agents using Redis, Agent Memory Server, LangChain, and LangGraph" 5 | readme = "README.md" 6 | requires-python = ">=3.10" 7 | license = {text = "MIT"} 8 | keywords = [ 9 | "redis", 10 | "ai", 11 | "context-engineering", 12 | "langgraph", 13 | "langchain", 14 | "openai", 15 | "vector-database", 16 | "semantic-search", 17 | "memory-management", 18 | "rag", 19 | "agents", 20 | ] 21 | 
classifiers = [ 22 | "Development Status :: 4 - Beta", 23 | "Intended Audience :: Developers", 24 | "Intended Audience :: Education", 25 | "Topic :: Software Development :: Libraries :: Python Modules", 26 | "Topic :: Scientific/Engineering :: Artificial Intelligence", 27 | "License :: OSI Approved :: MIT License", 28 | "Programming Language :: Python :: 3", 29 | "Programming Language :: Python :: 3.10", 30 | "Programming Language :: Python :: 3.11", 31 | "Programming Language :: Python :: 3.12", 32 | ] 33 | 34 | dependencies = [ 35 | # LangChain ecosystem 36 | "langchain>=0.2.0", 37 | "langchain-openai>=0.1.0", 38 | "langchain-core>=0.2.0", 39 | "langchain-community>=0.2.0", 40 | "langchain-experimental>=0.3.0", 41 | "langchain-text-splitters>=0.3.0", 42 | # LangGraph for agent workflows 43 | "langgraph>=0.2.0", 44 | "langgraph-checkpoint>=1.0.0", 45 | "langgraph-checkpoint-redis>=0.1.0", 46 | # Redis and vector search 47 | "redis>=6.0.0", 48 | "redisvl>=0.8.0", 49 | # OpenAI 50 | "openai>=1.0.0", 51 | # Agent Memory Server client 52 | "agent-memory-client>=0.12.3", 53 | # Data validation and models 54 | "pydantic>=2.0.0", 55 | # Utilities 56 | "python-dotenv>=1.0.0", 57 | "click>=8.0.0", 58 | "rich>=13.0.0", 59 | "tiktoken>=0.5.0", 60 | "python-ulid>=3.0.0", 61 | # Data generation 62 | "faker>=20.0.0", 63 | "pandas>=2.0.0", 64 | "numpy>=1.24.0", 65 | # Jupyter notebooks 66 | "jupyter>=1.0.0", 67 | "ipykernel>=6.0.0", 68 | # Embeddings (for notebooks) 69 | "sentence-transformers>=2.0.0", 70 | "langchain-huggingface>=0.1.0", 71 | "pypdf", 72 | ] 73 | 74 | [project.optional-dependencies] 75 | dev = [ 76 | "pytest>=7.0.0", 77 | "pytest-asyncio>=0.21.0", 78 | "black>=23.0.0", 79 | "isort>=5.12.0", 80 | "mypy>=1.5.0", 81 | "ruff>=0.1.0", 82 | ] 83 | 84 | [project.scripts] 85 | generate-courses = "redis_context_course.scripts.generate_courses:main" 86 | generate-hierarchical-courses = "redis_context_course.scripts.generate_hierarchical_courses:main" 87 | ingest-courses 
= "redis_context_course.scripts.ingest_courses:main" 88 | load-hierarchical-courses = "redis_context_course.scripts.load_hierarchical_courses:main" 89 | 90 | [project.urls] 91 | Homepage = "https://github.com/redis-developer/redis-ai-resources" 92 | Documentation = "https://github.com/redis-developer/redis-ai-resources/blob/main/python-recipes/context-engineering/README.md" 93 | Repository = "https://github.com/redis-developer/redis-ai-resources.git" 94 | 95 | [build-system] 96 | requires = ["hatchling"] 97 | build-backend = "hatchling.build" 98 | 99 | [tool.hatch.build.targets.wheel] 100 | packages = ["src/redis_context_course"] 101 | 102 | [tool.ruff] 103 | line-length = 88 104 | target-version = "py310" 105 | 106 | [tool.ruff.lint] 107 | select = ["E", "F", "I", "W"] 108 | ignore = ["E501"] 109 | 110 | [tool.pytest.ini_options] 111 | testpaths = ["tests"] 112 | python_files = ["test_*.py"] 113 | python_classes = ["Test*"] 114 | python_functions = ["test_*"] 115 | addopts = "-v --tb=short" 116 | asyncio_mode = "auto" 117 | -------------------------------------------------------------------------------- /src/redis_context_course/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Redis Context Course - Context Engineering Reference Implementation 3 | 4 | This package provides a complete reference implementation of a context-aware 5 | AI agent for university course recommendations and academic planning. 
6 | 7 | The agent demonstrates key context engineering concepts: 8 | - System context management 9 | - Working memory and long-term memory (via Redis Agent Memory Server) 10 | - Tool integration and usage 11 | - Semantic search and retrieval 12 | - Personalized recommendations 13 | 14 | Main Components: 15 | - models: Data models for courses and students 16 | - memory_client: Interface to Redis Agent Memory Server 17 | - course_manager: Course storage and recommendation engine 18 | - redis_config: Redis configuration and connections 19 | - tools: Tool definitions for building agents 20 | 21 | Installation: 22 | pip install redis-context-course agent-memory-server 23 | 24 | Usage: 25 | from redis_context_course import CourseManager, MemoryClient, create_agent_tools 26 | 27 | # Initialize components 28 | course_manager = CourseManager() 29 | memory_client = MemoryClient(config=MemoryClientConfig(...)) 30 | 31 | # Create tools for your agent 32 | tools = create_agent_tools(course_manager, memory_client, "student_id") 33 | 34 | Command Line Tools: 35 | generate-courses --courses-per-major 15 36 | ingest-courses --catalog course_catalog.json 37 | """ 38 | 39 | # Import core models (these have minimal dependencies) 40 | # Import memory client directly from agent_memory_client 41 | from agent_memory_client import MemoryAPIClient as MemoryClient 42 | from agent_memory_client import MemoryClientConfig 43 | 44 | # Import course manager 45 | from .course_manager import CourseManager 46 | from .models import ( 47 | AgentResponse, 48 | Course, 49 | CourseFormat, 50 | CourseRecommendation, 51 | CourseSchedule, 52 | DayOfWeek, 53 | DifficultyLevel, 54 | Major, 55 | Prerequisite, 56 | Semester, 57 | StudentProfile, 58 | ) 59 | 60 | # Import optimization helpers (from Section 4) 61 | from .optimization_helpers import ( 62 | classify_intent_with_llm, 63 | count_tokens, 64 | create_summary_view, 65 | create_user_profile_view, 66 | estimate_token_budget, 67 | extract_references, 68 | 
filter_tools_by_intent, 69 | format_context_for_llm, 70 | hybrid_retrieval, 71 | ) 72 | from .redis_config import RedisConfig, redis_config 73 | 74 | # Import tools (used in notebooks and for building agents) 75 | from .tools import ( 76 | create_agent_tools, 77 | create_course_tools, 78 | create_memory_tools, 79 | select_tools_by_keywords, 80 | ) 81 | 82 | __version__ = "1.0.0" 83 | __author__ = "Redis AI Resources Team" 84 | __email__ = "redis-ai@redis.com" 85 | __license__ = "MIT" 86 | __description__ = ( 87 | "Context Engineering with Redis - University Class Agent Reference Implementation" 88 | ) 89 | 90 | __all__ = [ 91 | # Core classes 92 | "MemoryClient", 93 | "MemoryClientConfig", 94 | "CourseManager", 95 | "RedisConfig", 96 | "redis_config", 97 | # Data models 98 | "Course", 99 | "Major", 100 | "StudentProfile", 101 | "CourseRecommendation", 102 | "AgentResponse", 103 | "Prerequisite", 104 | "CourseSchedule", 105 | # Enums 106 | "DifficultyLevel", 107 | "CourseFormat", 108 | "Semester", 109 | "DayOfWeek", 110 | # Tools (for notebooks and building agents) 111 | "create_agent_tools", 112 | "create_course_tools", 113 | "create_memory_tools", 114 | "select_tools_by_keywords", 115 | # Optimization helpers (Section 4) 116 | "count_tokens", 117 | "estimate_token_budget", 118 | "hybrid_retrieval", 119 | "create_summary_view", 120 | "create_user_profile_view", 121 | "filter_tools_by_intent", 122 | "classify_intent_with_llm", 123 | "extract_references", 124 | "format_context_for_llm", 125 | ] 126 | -------------------------------------------------------------------------------- /progressive_agents/stage2_context_engineered/agent/context_engineering.py: -------------------------------------------------------------------------------- 1 | """ 2 | Context Engineering Functions for Stage 2. 3 | 4 | These functions are from Section 2 Notebook 2 and demonstrate 5 | the core context engineering techniques: 6 | 1. Context Cleaning - Remove noise fields 7 | 2. 
Context Transformation - JSON → natural text 8 | 3. Context Optimization - Token compression 9 | 10 | Students will see how these techniques improve upon Stage 1's raw context. 11 | """ 12 | 13 | from redis_context_course.models import Course 14 | 15 | 16 | def transform_course_to_text(course: Course) -> str: 17 | """ 18 | Transform course object to LLM-optimized text format. 19 | 20 | This is the context engineering technique from Section 2 Notebook 2. 21 | Converts structured course data into natural text format that's easier 22 | for LLMs to process. 23 | 24 | Context Engineering Steps Applied: 25 | 1. CLEAN: Only include relevant fields (no id, timestamps, enrollment) 26 | 2. TRANSFORM: Convert JSON → natural text format 27 | 3. STRUCTURE: Use consistent, readable formatting 28 | 29 | Args: 30 | course: Course object to transform 31 | 32 | Returns: 33 | LLM-friendly text representation 34 | """ 35 | # Build prerequisites text 36 | prereq_text = "" 37 | if course.prerequisites: 38 | prereq_codes = [p.course_code for p in course.prerequisites] 39 | prereq_text = f"\nPrerequisites: {', '.join(prereq_codes)}" 40 | 41 | # Build learning objectives text 42 | objectives_text = "" 43 | if course.learning_objectives: 44 | objectives_text = f"\nLearning Objectives:\n" + "\n".join( 45 | f" - {obj}" for obj in course.learning_objectives 46 | ) 47 | 48 | # Build course text (CLEANED and TRANSFORMED) 49 | course_text = f"""{course.course_code}: {course.title} 50 | Department: {course.department} 51 | Credits: {course.credits} 52 | Level: {course.difficulty_level.value} 53 | Format: {course.format.value} 54 | Instructor: {course.instructor}{prereq_text} 55 | Description: {course.description}{objectives_text}""" 56 | 57 | return course_text 58 | 59 | 60 | def optimize_course_text(course: Course) -> str: 61 | """ 62 | Create ultra-compact course description. 63 | 64 | This is the optimization technique from Section 2 Notebook 2. 
65 | Reduces token count while preserving essential information. 66 | 67 | Use this when you need maximum token efficiency (e.g., many courses). 68 | Use transform_course_to_text() when you need full details. 69 | 70 | Args: 71 | course: Course object to optimize 72 | 73 | Returns: 74 | Compact text representation 75 | """ 76 | prereqs = ( 77 | f" (Prereq: {', '.join([p.course_code for p in course.prerequisites])})" 78 | if course.prerequisites 79 | else "" 80 | ) 81 | return ( 82 | f"{course.course_code}: {course.title} - {course.description[:100]}...{prereqs}" 83 | ) 84 | 85 | 86 | def format_courses_for_llm(courses: list[Course], use_optimized: bool = False) -> str: 87 | """ 88 | Format a list of courses for LLM consumption. 89 | 90 | Applies context engineering to all courses and combines them 91 | into a single, well-structured context string. 92 | 93 | Args: 94 | courses: List of Course objects 95 | use_optimized: If True, use compact format; if False, use full format 96 | 97 | Returns: 98 | Formatted context string ready for LLM 99 | """ 100 | if not courses: 101 | return "No courses found." 102 | 103 | formatted_courses = [] 104 | 105 | for i, course in enumerate(courses, 1): 106 | if use_optimized: 107 | course_text = optimize_course_text(course) 108 | else: 109 | course_text = transform_course_to_text(course) 110 | 111 | formatted_courses.append(f"Course {i}:\n{course_text}") 112 | 113 | return "\n\n".join(formatted_courses) 114 | -------------------------------------------------------------------------------- /notebooks/SETUP_GUIDE.md: -------------------------------------------------------------------------------- 1 | # 🚀 Setup Guide for Context Engineering Notebooks 2 | 3 | This guide helps you set up all required services for the Context Engineering course notebooks. 4 | 5 | ## 📋 Prerequisites 6 | 7 | Before running any notebooks, you need: 8 | 9 | 1. **Docker Desktop** - For Redis and Agent Memory Server 10 | 2. 
**Python 3.10+** - For running notebooks (matches the project's `requires-python = ">=3.10"` in `pyproject.toml`)
11 | 3. **OpenAI API Key** - For LLM functionality
12 | 
13 | ## ⚡ Quick Setup (Recommended)
14 | 
15 | From the repository root:
16 | 
17 | ```bash
18 | # 1. Copy environment file and add your OpenAI API key
19 | cp .env.example .env
20 | # Edit .env and add your OPENAI_API_KEY
21 | 
22 | # 2. Start services
23 | docker-compose up -d
24 | 
25 | # 3. Install dependencies
26 | uv sync
27 | 
28 | # 4. Load course data
29 | uv run python -m redis_context_course.scripts.ingest_courses \
30 |   --catalog src/redis_context_course/data/courses.json \
31 |   --index-name hierarchical_courses \
32 |   --clear
33 | 
34 | # 5. Run notebooks
35 | uv run jupyter notebook notebooks/
36 | ```
37 | 
38 | ## 🔧 Manual Setup
39 | 
40 | If you prefer to set up services manually:
41 | 
42 | ### 1. Environment Variables
43 | 
44 | Create a `.env` file in the repository root:
45 | 
46 | ```bash
47 | # Create .env file
48 | cat > .env << EOF
49 | OPENAI_API_KEY=your_openai_api_key_here
50 | REDIS_URL=redis://localhost:6379
51 | AGENT_MEMORY_SERVER_URL=http://localhost:8088
52 | OPENAI_MODEL=gpt-4o
53 | EOF
54 | ```
55 | 
56 | ### 2. Start Redis
57 | 
58 | ```bash
59 | docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest
60 | ```
61 | 
62 | ### 3. 
Start Agent Memory Server
63 | 
64 | ```bash
65 | docker run -d --name agent-memory-server \
66 |   -p 8088:8000 \
67 |   -e REDIS_URL=redis://host.docker.internal:6379 \
68 |   -e OPENAI_API_KEY="your_openai_api_key_here" \
69 |   ghcr.io/redis/agent-memory-server:0.12.3
70 | ```
71 | 
72 | ## ✅ Verify Setup
73 | 
74 | ```bash
75 | # Check Redis (the container was started with --name redis-stack-server)
76 | docker exec redis-stack-server redis-cli ping
77 | # Should return: PONG
78 | 
79 | # Check Agent Memory Server
80 | curl http://localhost:8088/v1/health
81 | # Should return a JSON body like: {"now": <current timestamp>}
82 | 
83 | # Check Docker containers
84 | docker ps
85 | # Should show both redis-stack-server and agent-memory-server
86 | ```
87 | 
88 | ## 🚨 Troubleshooting
89 | 
90 | ### Redis Connection Issues
91 | 
92 | If you see Redis connection errors:
93 | 
94 | ```bash
95 | # Stop and restart Agent Memory Server
96 | docker stop agent-memory-server
97 | docker rm agent-memory-server
98 | 
99 | # Restart with correct Redis URL
100 | docker run -d --name agent-memory-server \
101 |   -p 8088:8000 \
102 |   -e REDIS_URL=redis://host.docker.internal:6379 \
103 |   -e OPENAI_API_KEY="your_openai_api_key_here" \
104 |   ghcr.io/redis/agent-memory-server:0.12.3
105 | ```
106 | 
107 | ### Port Conflicts
108 | 
109 | If ports 6379 or 8088 are in use:
110 | 
111 | ```bash
112 | # Check what's using the ports
113 | lsof -i :6379
114 | lsof -i :8088
115 | 
116 | # Stop conflicting services or use different ports
117 | ```
118 | 
119 | ### Docker Issues
120 | 
121 | If Docker commands fail:
122 | 
123 | 1. Make sure Docker Desktop is running
124 | 2. Check Docker has enough resources allocated
125 | 3. Try restarting Docker Desktop
126 | 
127 | ## 📚 Next Steps
128 | 
129 | Once setup is complete:
130 | 
131 | 1. **Start with Section 1** if you're new to context engineering
132 | 2. **Jump to Section 4** if you want to learn about memory tools and agents
133 | 3. 
**Check the README** in each section for specific requirements 134 | 135 | ## 🔗 Section-Specific Requirements 136 | 137 | ### Section 3 & 4: Memory Systems & Tools/Agents 138 | - ✅ Redis (for vector storage) 139 | - ✅ Agent Memory Server (for memory management) 140 | - ✅ OpenAI API key 141 | 142 | ### Section 2: RAG Foundations 143 | - ✅ Redis (for vector storage) 144 | - ✅ OpenAI API key 145 | 146 | ### Section 1: Context Foundations 147 | - ✅ OpenAI API key only 148 | 149 | --- 150 | 151 | **Need help?** Check the troubleshooting section or review the setup scripts for detailed error handling. 152 | -------------------------------------------------------------------------------- /test_openai_connection.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Test script to diagnose OpenAI API connection issues. 4 | """ 5 | 6 | import os 7 | import sys 8 | from pathlib import Path 9 | from dotenv import load_dotenv 10 | 11 | # Load .env 12 | env_path = Path(__file__).parent / ".env" 13 | print(f"Loading .env from: {env_path}") 14 | load_dotenv(env_path) 15 | 16 | api_key = os.getenv("OPENAI_API_KEY") 17 | print(f"API Key loaded: {'Yes' if api_key else 'No'}") 18 | if api_key: 19 | print(f"API Key prefix: {api_key[:20]}...") 20 | 21 | print("\n" + "="*80) 22 | print("Test 1: Direct socket connection") 23 | print("="*80) 24 | 25 | import socket 26 | import ssl 27 | 28 | try: 29 | sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 30 | sock.settimeout(5) 31 | print("Connecting to api.openai.com:443...") 32 | sock.connect(("api.openai.com", 443)) 33 | print("✅ TCP connection successful!") 34 | 35 | context = ssl.create_default_context() 36 | ssock = context.wrap_socket(sock, server_hostname="api.openai.com") 37 | print("✅ SSL handshake successful!") 38 | ssock.close() 39 | except Exception as e: 40 | print(f"❌ Failed: {type(e).__name__}: {e}") 41 | sys.exit(1) 42 | 43 | print("\n" + "="*80) 44 | 
print("Test 2: OpenAI Python SDK (synchronous)") 45 | print("="*80) 46 | 47 | try: 48 | import openai 49 | 50 | print("Creating OpenAI client with 10s timeout...") 51 | client = openai.OpenAI(timeout=10.0, max_retries=1) 52 | 53 | print("Making chat completion request...") 54 | response = client.chat.completions.create( 55 | model="gpt-4o-mini", 56 | messages=[{"role": "user", "content": "Say 'Hello' in one word"}], 57 | max_tokens=5 58 | ) 59 | 60 | result = response.choices[0].message.content 61 | print(f"✅ Success! Response: {result}") 62 | 63 | except Exception as e: 64 | print(f"❌ Failed: {type(e).__name__}: {e}") 65 | import traceback 66 | traceback.print_exc() 67 | sys.exit(1) 68 | 69 | print("\n" + "="*80) 70 | print("Test 3: LangChain ChatOpenAI (synchronous)") 71 | print("="*80) 72 | 73 | try: 74 | from langchain_openai import ChatOpenAI 75 | from langchain_core.messages import HumanMessage 76 | 77 | print("Creating ChatOpenAI with 10s timeout...") 78 | llm = ChatOpenAI( 79 | model="gpt-4o-mini", 80 | temperature=0.1, 81 | max_tokens=10, 82 | timeout=10, 83 | max_retries=1 84 | ) 85 | 86 | print("Making invoke call...") 87 | response = llm.invoke([HumanMessage(content="Say 'Hello' in one word")]) 88 | 89 | print(f"✅ Success! 
Response: {response.content}") 90 | 91 | except Exception as e: 92 | print(f"❌ Failed: {type(e).__name__}: {e}") 93 | import traceback 94 | traceback.print_exc() 95 | sys.exit(1) 96 | 97 | print("\n" + "="*80) 98 | print("Test 4: LangChain ChatOpenAI (async)") 99 | print("="*80) 100 | 101 | try: 102 | import asyncio 103 | from langchain_openai import ChatOpenAI 104 | from langchain_core.messages import HumanMessage 105 | 106 | async def test_async(): 107 | print("Creating ChatOpenAI with 10s timeout...") 108 | llm = ChatOpenAI( 109 | model="gpt-4o-mini", 110 | temperature=0.1, 111 | max_tokens=10, 112 | timeout=10, 113 | max_retries=1 114 | ) 115 | 116 | print("Making ainvoke call...") 117 | response = await llm.ainvoke([HumanMessage(content="Say 'Hello' in one word")]) 118 | return response.content 119 | 120 | result = asyncio.run(test_async()) 121 | print(f"✅ Success! Response: {result}") 122 | 123 | except Exception as e: 124 | print(f"❌ Failed: {type(e).__name__}: {e}") 125 | import traceback 126 | traceback.print_exc() 127 | sys.exit(1) 128 | 129 | print("\n" + "="*80) 130 | print("✅ ALL TESTS PASSED!") 131 | print("="*80) 132 | print("\nOpenAI API connection is working correctly.") 133 | print("The agents should work now.") 134 | 135 | -------------------------------------------------------------------------------- /progressive_agents/stage4_hybrid_search/agent/react_parser.py: -------------------------------------------------------------------------------- 1 | """ 2 | ReAct output parser for Stage 4 ReAct. 3 | 4 | Parses LLM output in the Thought → Action → Observation format. 5 | """ 6 | 7 | import json 8 | import re 9 | from typing import Any, Dict, Optional 10 | 11 | 12 | def parse_react_output(text: str) -> Dict[str, Optional[str]]: 13 | """ 14 | Parse ReAct format output from LLM. 
15 | 16 | Expected format: 17 | Thought: [reasoning] 18 | Action: [action_name] 19 | Action Input: [JSON input] 20 | 21 | Args: 22 | text: Raw LLM output text 23 | 24 | Returns: 25 | Dictionary with 'thought', 'action', and 'action_input' keys 26 | """ 27 | # Extract Thought (everything between "Thought:" and "Action:") 28 | thought_match = re.search( 29 | r"Thought:\s*(.+?)(?=\nAction:|\Z)", text, re.DOTALL | re.IGNORECASE 30 | ) 31 | 32 | # Extract Action (word after "Action:") 33 | action_match = re.search(r"Action:\s*(\w+)", text, re.IGNORECASE) 34 | 35 | # Extract Action Input - everything after "Action Input:" until end or next section 36 | # For FINISH actions, this can be multi-line plain text 37 | action_input_match = re.search( 38 | r"Action Input:\s*(.+?)(?=\nThought:|\nObservation:|\nAction:|\Z)", 39 | text, 40 | re.DOTALL | re.IGNORECASE, 41 | ) 42 | 43 | return { 44 | "thought": thought_match.group(1).strip() if thought_match else None, 45 | "action": action_match.group(1).strip() if action_match else None, 46 | "action_input": ( 47 | action_input_match.group(1).strip() if action_input_match else None 48 | ), 49 | } 50 | 51 | 52 | def validate_action_input(action_input: str) -> Optional[Dict[str, Any]]: 53 | """ 54 | Validate and parse action input as JSON. 
55 | 56 | Args: 57 | action_input: Raw action input string 58 | 59 | Returns: 60 | Parsed JSON dict, or None if invalid 61 | """ 62 | if not action_input: 63 | return None 64 | 65 | try: 66 | # Try to parse as JSON 67 | parsed = json.loads(action_input) 68 | return parsed 69 | except json.JSONDecodeError: 70 | # If not valid JSON, try to extract JSON-like content 71 | json_match = re.search(r"({.+}|\[.+\])", action_input, re.DOTALL) 72 | if json_match: 73 | try: 74 | return json.loads(json_match.group(1)) 75 | except json.JSONDecodeError: 76 | pass 77 | return None 78 | 79 | 80 | def format_observation(result: str, max_length: int = 8000) -> str: 81 | """ 82 | Format tool result as observation for ReAct loop. 83 | 84 | Args: 85 | result: Raw tool result 86 | max_length: Maximum length of observation (default 8000 chars ≈ 2000 tokens) 87 | This should be large enough to include full syllabus data. 88 | 89 | Returns: 90 | Formatted observation string 91 | """ 92 | if len(result) > max_length: 93 | result = result[:max_length] + "... [truncated]" 94 | return f"Observation: {result}" 95 | 96 | 97 | def extract_final_answer(action_input: str) -> str: 98 | """ 99 | Extract final answer from FINISH action input. 100 | 101 | Args: 102 | action_input: Action input for FINISH action 103 | 104 | Returns: 105 | Final answer text 106 | """ 107 | parsed = validate_action_input(action_input) 108 | if parsed: 109 | if isinstance(parsed, dict): 110 | return parsed.get("answer", parsed.get("response", str(parsed))) 111 | return str(parsed) 112 | return action_input.strip() 113 | 114 | 115 | def is_valid_react_output(text: str) -> bool: 116 | """ 117 | Check if LLM output is valid ReAct format. 
118 | 119 | Args: 120 | text: Raw LLM output 121 | 122 | Returns: 123 | True if valid ReAct format, False otherwise 124 | """ 125 | parsed = parse_react_output(text) 126 | if not parsed["action"]: 127 | return False 128 | if parsed["action"].upper() != "FINISH" and not parsed["action_input"]: 129 | return False 130 | return True 131 | 132 | 133 | def format_react_error(error: str) -> str: 134 | """ 135 | Format error message for ReAct loop. 136 | 137 | Args: 138 | error: Error message 139 | 140 | Returns: 141 | Formatted error observation 142 | """ 143 | return f"Observation: Error - {error}" 144 | 145 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/test_react_multi_turn.py: -------------------------------------------------------------------------------- 1 | """ 2 | Multi-turn conversation tests for Stage 5 ReAct agent. 3 | 4 | Tests the ReAct pattern with conversation history and context. 5 | """ 6 | 7 | import asyncio 8 | import logging 9 | import os 10 | from pathlib import Path 11 | from typing import List, Dict 12 | 13 | from langchain_openai import ChatOpenAI 14 | from redis_context_course import CourseManager 15 | 16 | from progressive_agents.stage5_working_memory.agent.react_agent import run_react_agent 17 | from progressive_agents.stage5_working_memory.agent.tools import initialize_tools 18 | 19 | # Configure logging 20 | logging.basicConfig( 21 | level=logging.INFO, 22 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 23 | ) 24 | logger = logging.getLogger("course-qa-workflow") 25 | 26 | 27 | async def run_conversation(queries: List[str], conversation_name: str): 28 | """Run a multi-turn conversation.""" 29 | print("\n" + "=" * 80) 30 | print(f"CONVERSATION: {conversation_name}") 31 | print("=" * 80) 32 | 33 | # Initialize 34 | llm = ChatOpenAI(model="gpt-4o-mini", temperature=0) 35 | course_manager = CourseManager() 36 | initialize_tools(course_manager) 37 | 38 | 
# Track conversation history 39 | conversation_history = [] 40 | 41 | for i, query in enumerate(queries, 1): 42 | print(f"\n{'─' * 80}") 43 | print(f"Turn {i}: {query}") 44 | print(f"{'─' * 80}") 45 | 46 | # Run agent 47 | result = await run_react_agent( 48 | query=query, 49 | llm=llm, 50 | conversation_history=conversation_history, 51 | max_iterations=10, 52 | ) 53 | 54 | # Print reasoning trace 55 | print(f"\nREASONING STEPS:") 56 | for j, step in enumerate(result["reasoning_trace"], 1): 57 | print(f"\n Step {j}:") 58 | print(f" Thought: {step.thought}") 59 | print(f" Action: {step.action}") 60 | if step.action != "FINISH": 61 | print(f" Observation: {step.observation[:150]}...") 62 | 63 | # Print answer 64 | print(f"\n{'─' * 80}") 65 | print(f"ANSWER:") 66 | print(f"{'─' * 80}") 67 | print(result["answer"]) 68 | print(f"\nIterations: {result['iterations']} | Success: {result['success']}") 69 | 70 | # Update conversation history 71 | conversation_history.append({"role": "user", "content": query}) 72 | conversation_history.append({"role": "assistant", "content": result["answer"]}) 73 | 74 | 75 | async def test_pronoun_resolution(): 76 | """Test pronoun resolution across turns.""" 77 | queries = [ 78 | "What is CS002?", 79 | "What are the prerequisites for it?", 80 | "Tell me more about the syllabus", 81 | ] 82 | await run_conversation(queries, "Pronoun Resolution Test") 83 | 84 | 85 | async def test_follow_up_questions(): 86 | """Test follow-up questions building on previous context.""" 87 | queries = [ 88 | "Tell me about machine learning courses", 89 | "Which one is best for beginners?", 90 | "What are the prerequisites for that course?", 91 | ] 92 | await run_conversation(queries, "Follow-up Questions Test") 93 | 94 | 95 | async def test_comparison_across_turns(): 96 | """Test comparing courses mentioned in different turns.""" 97 | queries = [ 98 | "What is CS001?", 99 | "What is CS002?", 100 | "Which one should I take first?", 101 | ] 102 | await 
run_conversation(queries, "Comparison Across Turns Test") 103 | 104 | 105 | async def test_context_accumulation(): 106 | """Test accumulating context across multiple turns.""" 107 | queries = [ 108 | "I'm interested in computer vision", 109 | "What courses cover that topic?", 110 | "What are the prerequisites for the advanced one?", 111 | "Are there any beginner courses I should take first?", 112 | ] 113 | await run_conversation(queries, "Context Accumulation Test") 114 | 115 | 116 | async def main(): 117 | """Run all multi-turn tests.""" 118 | await test_pronoun_resolution() 119 | await test_follow_up_questions() 120 | await test_comparison_across_turns() 121 | await test_context_accumulation() 122 | 123 | print("\n" + "=" * 80) 124 | print("ALL MULTI-TURN TESTS COMPLETE") 125 | print("=" * 80) 126 | 127 | 128 | if __name__ == "__main__": 129 | asyncio.run(main()) 130 | 131 | -------------------------------------------------------------------------------- /progressive_agents/stage1_baseline_rag/agent/setup.py: -------------------------------------------------------------------------------- 1 | """ 2 | Setup and initialization for Stage 1 Baseline RAG Agent. 3 | 4 | Handles: 5 | - Course data loading 6 | - Agent initialization 7 | - Cleanup 8 | """ 9 | 10 | import logging 11 | from typing import Optional 12 | 13 | from redis_context_course import CourseManager 14 | from redis_context_course.scripts.generate_courses import CourseGenerator 15 | from redis_context_course.scripts.ingest_courses import CourseIngestionPipeline 16 | 17 | logger = logging.getLogger("stage1-baseline") 18 | 19 | 20 | async def load_courses_if_needed( 21 | course_manager: CourseManager, force_reload: bool = False 22 | ) -> int: 23 | """ 24 | Load sample courses into Redis if not already present. 
25 | 26 | Args: 27 | course_manager: CourseManager instance 28 | force_reload: If True, regenerate courses even if they exist 29 | 30 | Returns: 31 | Number of courses loaded 32 | """ 33 | # Check if courses already exist 34 | existing_courses = await course_manager.get_all_courses() 35 | 36 | if existing_courses and not force_reload: 37 | logger.info(f"📚 Found {len(existing_courses)} existing courses in Redis") 38 | return len(existing_courses) 39 | 40 | logger.info("📚 Generating sample courses...") 41 | 42 | # Generate sample courses 43 | generator = CourseGenerator() 44 | courses = generator.generate_courses(courses_per_major=10) 45 | 46 | # Convert to format expected by ingestion pipeline 47 | courses_data = [course.model_dump(mode="json") for course in courses] 48 | 49 | logger.info(f"📥 Ingesting {len(courses_data)} courses into Redis...") 50 | 51 | # Ingest into Redis 52 | ingestion = CourseIngestionPipeline() 53 | 54 | # Clear existing data if force reload 55 | if force_reload: 56 | ingestion.clear_existing_data() 57 | 58 | ingested_count = await ingestion.ingest_courses(courses_data) 59 | 60 | logger.info(f"✅ Successfully loaded {ingested_count} courses") 61 | 62 | return ingested_count 63 | 64 | 65 | async def cleanup_courses(course_manager: CourseManager): 66 | """ 67 | Remove all courses from Redis. 
68 | 69 | Args: 70 | course_manager: CourseManager instance 71 | """ 72 | logger.info("🧹 Cleaning up courses from Redis...") 73 | 74 | # Get all courses 75 | courses = await course_manager.get_all_courses() 76 | 77 | if not courses: 78 | logger.info("No courses to clean up") 79 | return 80 | 81 | # Delete each course 82 | for course in courses: 83 | await course_manager.delete_course(course.course_code) 84 | 85 | logger.info(f"✅ Removed {len(courses)} courses from Redis") 86 | 87 | 88 | def setup_agent( 89 | course_manager: Optional[CourseManager] = None, 90 | auto_load_courses: bool = True, 91 | verbose: bool = True, 92 | ) -> tuple: 93 | """ 94 | Initialize the Stage 1 Baseline RAG agent. 95 | 96 | Args: 97 | course_manager: Optional CourseManager instance (creates new if None) 98 | auto_load_courses: If True, automatically load courses if Redis is empty 99 | verbose: If True, show detailed logging. If False, suppress intermediate logs. 100 | 101 | Returns: 102 | Tuple of (workflow, course_manager) 103 | """ 104 | from .nodes import initialize_nodes 105 | from .workflow import create_workflow 106 | 107 | # Control logger level based on verbose flag 108 | if not verbose: 109 | logger.setLevel(logging.CRITICAL) 110 | else: 111 | logger.setLevel(logging.INFO) 112 | 113 | logger.info("🚀 Initializing Stage 1 Baseline RAG Agent...") 114 | 115 | # Create course manager if not provided 116 | if course_manager is None: 117 | course_manager = CourseManager() 118 | 119 | # Auto-load courses if needed 120 | if auto_load_courses: 121 | import asyncio 122 | 123 | try: 124 | loop = asyncio.get_event_loop() 125 | except RuntimeError: 126 | loop = asyncio.new_event_loop() 127 | asyncio.set_event_loop(loop) 128 | 129 | courses_loaded = loop.run_until_complete(load_courses_if_needed(course_manager)) 130 | logger.info(f"✅ {courses_loaded} courses available") 131 | 132 | # Initialize nodes with course manager 133 | initialize_nodes(course_manager) 134 | 135 | # Create workflow with 
verbose setting 136 | workflow = create_workflow(verbose=verbose) 137 | 138 | logger.info("✅ Stage 1 Baseline RAG Agent initialized") 139 | logger.warning("⚠️ This agent uses RAW context - no optimization!") 140 | 141 | return workflow, course_manager 142 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/agent/react_parser.py: -------------------------------------------------------------------------------- 1 | """ 2 | ReAct output parser for Stage 5 (working memory). 3 | 4 | Parses LLM output in the Thought → Action → Observation format. 5 | """ 6 | 7 | import json 8 | import re 9 | from typing import Any, Dict, Optional 10 | 11 | 12 | def parse_react_output(text: str) -> Dict[str, Optional[str]]: 13 | """ 14 | Parse ReAct format output from LLM. 15 | 16 | Expected format: 17 | Thought: [reasoning] 18 | Action: [action_name] 19 | Action Input: [JSON input] 20 | 21 | Args: 22 | text: Raw LLM output text 23 | 24 | Returns: 25 | Dictionary with 'thought', 'action', and 'action_input' keys 26 | """ 27 | # Extract Thought (everything between "Thought:" and "Action:") 28 | thought_match = re.search( 29 | r"Thought:\s*(.+?)(?=\nAction:|\Z)", text, re.DOTALL | re.IGNORECASE 30 | ) 31 | 32 | # Extract Action (word after "Action:") 33 | action_match = re.search(r"Action:\s*(\w+)", text, re.IGNORECASE) 34 | 35 | # Extract Action Input - everything after "Action Input:" until end or next section 36 | # For FINISH actions, this can be multi-line plain text 37 | action_input_match = re.search( 38 | r"Action Input:\s*(.+?)(?=\nThought:|\nObservation:|\nAction:|\Z)", 39 | text, 40 | re.DOTALL | re.IGNORECASE, 41 | ) 42 | 43 | return { 44 | "thought": thought_match.group(1).strip() if thought_match else None, 45 | "action": action_match.group(1).strip() if action_match else None, 46 | "action_input": ( 47 | action_input_match.group(1).strip() if action_input_match else None 48 | ), 49 | } 50 | 51 | 52 | def
validate_action_input(action_input: str) -> Optional[Dict[str, Any]]: 53 | """ 54 | Validate and parse action input as JSON. 55 | 56 | Args: 57 | action_input: Raw action input string 58 | 59 | Returns: 60 | Parsed JSON dict, or None if invalid 61 | """ 62 | if not action_input: 63 | return None 64 | 65 | try: 66 | # Try to parse as JSON 67 | parsed = json.loads(action_input) 68 | return parsed 69 | except json.JSONDecodeError: 70 | # If not valid JSON, try to extract JSON-like content 71 | # Sometimes LLM adds extra text around JSON 72 | json_match = re.search(r"({.+}|\[.+\])", action_input, re.DOTALL) 73 | if json_match: 74 | try: 75 | return json.loads(json_match.group(1)) 76 | except json.JSONDecodeError: 77 | pass 78 | 79 | return None 80 | 81 | 82 | def format_observation(result: str, max_length: int = 8000) -> str: 83 | """ 84 | Format tool result as observation for ReAct loop. 85 | 86 | Args: 87 | result: Raw tool result 88 | max_length: Maximum length of observation 89 | 90 | Returns: 91 | Formatted observation string 92 | """ 93 | # Truncate if too long 94 | if len(result) > max_length: 95 | result = result[:max_length] + "..." 96 | 97 | return f"Observation: {result}" 98 | 99 | 100 | def extract_final_answer(action_input: str) -> str: 101 | """ 102 | Extract final answer from FINISH action input. 103 | 104 | Args: 105 | action_input: Action input for FINISH action 106 | 107 | Returns: 108 | Final answer text 109 | """ 110 | # If it's JSON, try to extract 'answer' or 'response' field 111 | parsed = validate_action_input(action_input) 112 | if parsed: 113 | if isinstance(parsed, dict): 114 | return parsed.get("answer", parsed.get("response", str(parsed))) 115 | return str(parsed) 116 | 117 | # Otherwise, return as-is 118 | return action_input.strip() 119 | 120 | 121 | def is_valid_react_output(text: str) -> bool: 122 | """ 123 | Check if LLM output is valid ReAct format. 
124 | 125 | Args: 126 | text: Raw LLM output 127 | 128 | Returns: 129 | True if valid ReAct format, False otherwise 130 | """ 131 | parsed = parse_react_output(text) 132 | 133 | # Must have at least an action 134 | if not parsed["action"]: 135 | return False 136 | 137 | # If action is not FINISH, must have action_input 138 | if parsed["action"].upper() != "FINISH" and not parsed["action_input"]: 139 | return False 140 | 141 | return True 142 | 143 | 144 | def format_react_error(error: str) -> str: 145 | """ 146 | Format error message for ReAct loop. 147 | 148 | Args: 149 | error: Error message 150 | 151 | Returns: 152 | Formatted error observation 153 | """ 154 | return f"Observation: Error - {error}" 155 | 156 | -------------------------------------------------------------------------------- /progressive_agents/stage2_context_engineered/agent/setup.py: -------------------------------------------------------------------------------- 1 | """ 2 | Setup and initialization for Stage 2 Context-Engineered Agent. 3 | 4 | Handles: 5 | - Course data loading 6 | - Agent initialization 7 | - Cleanup 8 | """ 9 | 10 | import logging 11 | from typing import Optional 12 | 13 | from redis_context_course import CourseManager 14 | from redis_context_course.scripts.generate_courses import CourseGenerator 15 | from redis_context_course.scripts.ingest_courses import CourseIngestionPipeline 16 | 17 | logger = logging.getLogger("stage2-engineered") 18 | 19 | 20 | async def load_courses_if_needed( 21 | course_manager: CourseManager, force_reload: bool = False 22 | ) -> int: 23 | """ 24 | Load sample courses into Redis if not already present. 
25 | 26 | Args: 27 | course_manager: CourseManager instance 28 | force_reload: If True, regenerate courses even if they exist 29 | 30 | Returns: 31 | Number of courses loaded 32 | """ 33 | # Check if courses already exist 34 | existing_courses = await course_manager.get_all_courses() 35 | 36 | if existing_courses and not force_reload: 37 | logger.info(f"📚 Found {len(existing_courses)} existing courses in Redis") 38 | return len(existing_courses) 39 | 40 | logger.info("📚 Generating sample courses...") 41 | 42 | # Generate sample courses 43 | generator = CourseGenerator() 44 | courses = generator.generate_courses(courses_per_major=10) 45 | 46 | # Convert to format expected by ingestion pipeline 47 | courses_data = [course.model_dump(mode="json") for course in courses] 48 | 49 | logger.info(f"📥 Ingesting {len(courses_data)} courses into Redis...") 50 | 51 | # Ingest into Redis 52 | ingestion = CourseIngestionPipeline() 53 | 54 | # Clear existing data if force reload 55 | if force_reload: 56 | ingestion.clear_existing_data() 57 | 58 | ingested_count = await ingestion.ingest_courses(courses_data) 59 | 60 | logger.info(f"✅ Successfully loaded {ingested_count} courses") 61 | 62 | return ingested_count 63 | 64 | 65 | async def cleanup_courses(course_manager: CourseManager): 66 | """ 67 | Remove all courses from Redis. 
68 | 69 | Args: 70 | course_manager: CourseManager instance 71 | """ 72 | logger.info("🧹 Cleaning up courses from Redis...") 73 | 74 | # Get all courses 75 | courses = await course_manager.get_all_courses() 76 | 77 | if not courses: 78 | logger.info("No courses to clean up") 79 | return 80 | 81 | # Delete each course 82 | for course in courses: 83 | await course_manager.delete_course(course.course_code) 84 | 85 | logger.info(f"✅ Removed {len(courses)} courses from Redis") 86 | 87 | 88 | def setup_agent( 89 | course_manager: Optional[CourseManager] = None, 90 | auto_load_courses: bool = True, 91 | verbose: bool = True, 92 | ) -> tuple: 93 | """ 94 | Initialize the Stage 2 Context-Engineered Agent. 95 | 96 | Args: 97 | course_manager: Optional CourseManager instance (creates new if None) 98 | auto_load_courses: If True, automatically load courses if Redis is empty 99 | verbose: If True, show detailed logging. If False, suppress intermediate logs. 100 | 101 | Returns: 102 | Tuple of (workflow, course_manager) 103 | """ 104 | from .nodes import initialize_nodes 105 | from .workflow import create_workflow 106 | 107 | # Control logger level based on verbose flag 108 | if not verbose: 109 | logger.setLevel(logging.CRITICAL) 110 | else: 111 | logger.setLevel(logging.INFO) 112 | 113 | logger.info("🚀 Initializing Stage 2 Context-Engineered Agent...") 114 | 115 | # Create course manager if not provided 116 | if course_manager is None: 117 | course_manager = CourseManager() 118 | 119 | # Auto-load courses if needed 120 | if auto_load_courses: 121 | import asyncio 122 | 123 | try: 124 | loop = asyncio.get_event_loop() 125 | except RuntimeError: 126 | loop = asyncio.new_event_loop() 127 | asyncio.set_event_loop(loop) 128 | 129 | courses_loaded = loop.run_until_complete(load_courses_if_needed(course_manager)) 130 | logger.info(f"✅ {courses_loaded} courses available") 131 | 132 | # Initialize nodes with course manager 133 | initialize_nodes(course_manager) 134 | 135 | # Create 
workflow with verbose setting 136 | workflow = create_workflow(verbose=verbose) 137 | 138 | logger.info("✅ Stage 2 Context-Engineered Agent initialized") 139 | logger.info("✨ This agent uses Section 2 context engineering techniques!") 140 | 141 | return workflow, course_manager 142 | -------------------------------------------------------------------------------- /src/redis_context_course/scripts/generate_courses_from_hierarchical.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Generate courses.json from hierarchical_courses.json. 4 | 5 | This ensures Redis only contains courses that have full hierarchical data 6 | (syllabi, assignments, etc.), avoiding the "No hierarchical data" warnings. 7 | """ 8 | 9 | import json 10 | from datetime import datetime, time 11 | from pathlib import Path 12 | from typing import Any, Dict, List 13 | 14 | from ulid import ULID 15 | 16 | 17 | def generate_schedule(format_type: str) -> Dict[str, Any] | None: 18 | """Generate a course schedule based on format.""" 19 | if format_type == "online": 20 | return None # Online courses typically don't have fixed schedules 21 | 22 | # In-person or hybrid courses have schedules 23 | return { 24 | "days": ["tuesday", "thursday"], 25 | "start_time": "10:00", 26 | "end_time": "11:30", 27 | "location": "Room 101" 28 | } 29 | 30 | 31 | def hierarchical_to_course(h_course: Dict[str, Any]) -> Dict[str, Any]: 32 | """Convert hierarchical course data to Course model format.""" 33 | summary = h_course["summary"] 34 | details = h_course["details"] 35 | 36 | # Convert prerequisites from hierarchical format 37 | prerequisites = [] 38 | if details.get("prerequisites"): 39 | for prereq in details["prerequisites"]: 40 | if isinstance(prereq, dict): 41 | prerequisites.append(prereq) 42 | else: 43 | # It's just a course code string 44 | prerequisites.append({ 45 | "course_code": prereq, 46 | "course_title": f"Prerequisite {prereq}", 47 | 
"minimum_grade": "C", 48 | "can_be_concurrent": False 49 | }) 50 | 51 | # Determine major from department 52 | dept = summary.get("department", "Computer Science") 53 | major_map = { 54 | "Computer Science": "Computer Science", 55 | "Mathematics": "Mathematics", 56 | "Data Science": "Data Science", 57 | "Engineering": "Engineering", 58 | } 59 | major = major_map.get(dept, dept) 60 | 61 | return { 62 | "id": str(ULID()), 63 | "course_code": summary["course_code"], 64 | "title": summary["title"], 65 | "description": details.get("full_description", summary.get("short_description", "")), 66 | "credits": summary["credits"], 67 | "difficulty_level": summary["difficulty_level"], 68 | "format": summary["format"], 69 | "department": dept, 70 | "major": major, 71 | "prerequisites": prerequisites, 72 | "schedule": generate_schedule(summary["format"]), 73 | "semester": details.get("semester", "fall"), 74 | "year": details.get("year", 2024), 75 | "instructor": summary["instructor"], 76 | "max_enrollment": details.get("max_enrollment", 30), 77 | "current_enrollment": 0, 78 | "tags": summary.get("tags", []) + details.get("tags", []), 79 | "learning_objectives": details.get("learning_objectives", []), 80 | "created_at": datetime.now().isoformat(), 81 | "updated_at": datetime.now().isoformat(), 82 | } 83 | 84 | 85 | def main(): 86 | """Generate courses.json from hierarchical_courses.json.""" 87 | # Paths 88 | data_dir = Path(__file__).parent.parent / "data" 89 | hierarchical_path = data_dir / "hierarchical" / "hierarchical_courses.json" 90 | output_path = data_dir / "courses.json" 91 | 92 | # Load hierarchical courses 93 | print(f"Loading hierarchical courses from: {hierarchical_path}") 94 | with open(hierarchical_path) as f: 95 | hierarchical_data = json.load(f) 96 | 97 | # Convert to Course format 98 | courses = [] 99 | seen_codes = set() # Track unique course codes 100 | 101 | for h_course in hierarchical_data["courses"]: 102 | code = h_course["summary"]["course_code"] 103 | if 
code in seen_codes: 104 | print(f" Skipping duplicate: {code}") 105 | continue 106 | seen_codes.add(code) 107 | 108 | course = hierarchical_to_course(h_course) 109 | courses.append(course) 110 | print(f" Converted: {code} - {course['title']}") 111 | 112 | # Save courses.json 113 | print(f"\nSaving {len(courses)} courses to: {output_path}") 114 | with open(output_path, "w") as f: 115 | json.dump({"courses": courses}, f, indent=2, default=str) 116 | 117 | print(f"\n✅ Generated courses.json with {len(courses)} courses") 118 | print(f" All courses have matching hierarchical data (syllabi, assignments)") 119 | 120 | 121 | if __name__ == "__main__": 122 | main() 123 | 124 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/agent/react_parser.py: -------------------------------------------------------------------------------- 1 | """ 2 | ReAct output parser for Stage 6 (full memory). 3 | 4 | Parses LLM output in the Thought → Action → Observation format. 5 | """ 6 | 7 | import json 8 | import re 9 | from typing import Any, Dict, Optional 10 | 11 | 12 | def parse_react_output(text: str) -> Dict[str, Optional[str]]: 13 | """ 14 | Parse ReAct format output from LLM.
15 | 16 | Expected format: 17 | Thought: [reasoning] 18 | Action: [action_name] 19 | Action Input: [JSON input] 20 | 21 | Args: 22 | text: Raw LLM output text 23 | 24 | Returns: 25 | Dictionary with 'thought', 'action', and 'action_input' keys 26 | """ 27 | # Extract Thought (everything between "Thought:" and "Action:") 28 | thought_match = re.search( 29 | r"Thought:\s*(.+?)(?=\nAction:|\Z)", text, re.DOTALL | re.IGNORECASE 30 | ) 31 | 32 | # Extract Action (word after "Action:") 33 | action_match = re.search(r"Action:\s*(\w+)", text, re.IGNORECASE) 34 | 35 | # Extract Action Input - everything after "Action Input:" until end or next section 36 | # For FINISH actions, this can be multi-line plain text 37 | action_input_match = re.search( 38 | r"Action Input:\s*(.+?)(?=\nThought:|\nObservation:|\nAction:|\Z)", 39 | text, 40 | re.DOTALL | re.IGNORECASE, 41 | ) 42 | 43 | return { 44 | "thought": thought_match.group(1).strip() if thought_match else None, 45 | "action": action_match.group(1).strip() if action_match else None, 46 | "action_input": ( 47 | action_input_match.group(1).strip() if action_input_match else None 48 | ), 49 | } 50 | 51 | 52 | def validate_action_input(action_input: str) -> Optional[Dict[str, Any]]: 53 | """ 54 | Validate and parse action input as JSON. 
55 | 56 | Args: 57 | action_input: Raw action input string 58 | 59 | Returns: 60 | Parsed JSON dict, or None if invalid 61 | """ 62 | if not action_input: 63 | return None 64 | 65 | try: 66 | # Try to parse as JSON 67 | parsed = json.loads(action_input) 68 | return parsed 69 | except json.JSONDecodeError: 70 | # If not valid JSON, try to extract JSON-like content 71 | # Sometimes LLM adds extra text around JSON 72 | json_match = re.search(r"({.+}|\[.+\])", action_input, re.DOTALL) 73 | if json_match: 74 | try: 75 | return json.loads(json_match.group(1)) 76 | except json.JSONDecodeError: 77 | pass 78 | 79 | return None 80 | 81 | 82 | def format_observation(result: str, max_length: int = 8000) -> str: 83 | """ 84 | Format tool result as observation for ReAct loop. 85 | 86 | Args: 87 | result: Raw tool result 88 | max_length: Maximum length of observation (default 8000 chars ≈ 2000 tokens) 89 | This should be large enough to include full syllabus data. 90 | 91 | Returns: 92 | Formatted observation string 93 | """ 94 | # Truncate if too long 95 | if len(result) > max_length: 96 | result = result[:max_length] + "... [truncated]" 97 | 98 | return f"Observation: {result}" 99 | 100 | 101 | def extract_final_answer(action_input: str) -> str: 102 | """ 103 | Extract final answer from FINISH action input. 104 | 105 | Args: 106 | action_input: Action input for FINISH action 107 | 108 | Returns: 109 | Final answer text 110 | """ 111 | # If it's JSON, try to extract 'answer' or 'response' field 112 | parsed = validate_action_input(action_input) 113 | if parsed: 114 | if isinstance(parsed, dict): 115 | return parsed.get("answer", parsed.get("response", str(parsed))) 116 | return str(parsed) 117 | 118 | # Otherwise, return as-is 119 | return action_input.strip() 120 | 121 | 122 | def is_valid_react_output(text: str) -> bool: 123 | """ 124 | Check if LLM output is valid ReAct format. 
125 | 126 | Args: 127 | text: Raw LLM output 128 | 129 | Returns: 130 | True if valid ReAct format, False otherwise 131 | """ 132 | parsed = parse_react_output(text) 133 | 134 | # Must have at least an action 135 | if not parsed["action"]: 136 | return False 137 | 138 | # If action is not FINISH, must have action_input 139 | if parsed["action"].upper() != "FINISH" and not parsed["action_input"]: 140 | return False 141 | 142 | return True 143 | 144 | 145 | def format_react_error(error: str) -> str: 146 | """ 147 | Format error message for ReAct loop. 148 | 149 | Args: 150 | error: Error message 151 | 152 | Returns: 153 | Formatted error observation 154 | """ 155 | return f"Observation: Error - {error}" 156 | 157 | -------------------------------------------------------------------------------- /src/redis_context_course/models.py: -------------------------------------------------------------------------------- 1 | """ 2 | Data models for the Redis University Class Agent. 3 | 4 | This module defines the core data structures used throughout the application, 5 | including courses, majors, prerequisites, and student information. 
6 | """ 7 | 8 | from datetime import datetime, time 9 | from enum import Enum 10 | from typing import Any, Dict, List, Optional 11 | 12 | from pydantic import BaseModel, ConfigDict, Field 13 | from ulid import ULID 14 | 15 | 16 | class DifficultyLevel(str, Enum): 17 | """Course difficulty levels.""" 18 | 19 | BEGINNER = "beginner" 20 | INTERMEDIATE = "intermediate" 21 | ADVANCED = "advanced" 22 | GRADUATE = "graduate" 23 | 24 | 25 | class CourseFormat(str, Enum): 26 | """Course delivery formats.""" 27 | 28 | IN_PERSON = "in_person" 29 | ONLINE = "online" 30 | HYBRID = "hybrid" 31 | 32 | 33 | class Semester(str, Enum): 34 | """Academic semesters.""" 35 | 36 | FALL = "fall" 37 | SPRING = "spring" 38 | SUMMER = "summer" 39 | WINTER = "winter" 40 | 41 | 42 | class DayOfWeek(str, Enum): 43 | """Days of the week for scheduling.""" 44 | 45 | MONDAY = "monday" 46 | TUESDAY = "tuesday" 47 | WEDNESDAY = "wednesday" 48 | THURSDAY = "thursday" 49 | FRIDAY = "friday" 50 | SATURDAY = "saturday" 51 | SUNDAY = "sunday" 52 | 53 | 54 | class CourseSchedule(BaseModel): 55 | """Course schedule information.""" 56 | 57 | days: List[DayOfWeek] 58 | start_time: time 59 | end_time: time 60 | location: Optional[str] = None 61 | 62 | model_config = ConfigDict(json_encoders={time: lambda v: v.strftime("%H:%M")}) 63 | 64 | 65 | class Prerequisite(BaseModel): 66 | """Course prerequisite information.""" 67 | 68 | course_code: str 69 | course_title: str 70 | minimum_grade: Optional[str] = "C" 71 | can_be_concurrent: bool = False 72 | 73 | 74 | class Course(BaseModel): 75 | """Complete course information.""" 76 | 77 | id: str = Field(default_factory=lambda: str(ULID())) 78 | course_code: str # e.g., "CS101" 79 | title: str 80 | description: str 81 | credits: int 82 | difficulty_level: DifficultyLevel 83 | format: CourseFormat 84 | department: str 85 | major: str 86 | prerequisites: List[Prerequisite] = Field(default_factory=list) 87 | schedule: Optional[CourseSchedule] = None 88 | semester: 
Semester 89 | year: int 90 | instructor: str 91 | max_enrollment: int 92 | current_enrollment: int = 0 93 | tags: List[str] = Field(default_factory=list) 94 | learning_objectives: List[str] = Field(default_factory=list) 95 | created_at: datetime = Field(default_factory=datetime.now) 96 | updated_at: datetime = Field(default_factory=datetime.now) 97 | 98 | 99 | class Major(BaseModel): 100 | """Academic major information.""" 101 | 102 | id: str = Field(default_factory=lambda: str(ULID())) 103 | name: str 104 | code: str # e.g., "CS", "MATH", "ENG" 105 | department: str 106 | description: str 107 | required_credits: int 108 | core_courses: List[str] = Field(default_factory=list) # Course codes 109 | elective_courses: List[str] = Field(default_factory=list) # Course codes 110 | career_paths: List[str] = Field(default_factory=list) 111 | created_at: datetime = Field(default_factory=datetime.now) 112 | 113 | 114 | class StudentProfile(BaseModel): 115 | """Student profile and preferences.""" 116 | 117 | id: str = Field(default_factory=lambda: str(ULID())) 118 | name: str 119 | email: str 120 | major: Optional[str] = None 121 | year: int = 1 # 1-4 for undergraduate, 5+ for graduate 122 | completed_courses: List[str] = Field(default_factory=list) # Course codes 123 | current_courses: List[str] = Field(default_factory=list) # Course codes 124 | interests: List[str] = Field(default_factory=list) 125 | preferred_format: Optional[CourseFormat] = None 126 | preferred_difficulty: Optional[DifficultyLevel] = None 127 | max_credits_per_semester: int = 15 128 | created_at: datetime = Field(default_factory=datetime.now) 129 | updated_at: datetime = Field(default_factory=datetime.now) 130 | 131 | 132 | class CourseRecommendation(BaseModel): 133 | """Course recommendation with reasoning.""" 134 | 135 | course: Course 136 | relevance_score: float = Field(ge=0.0, le=1.0) 137 | reasoning: str 138 | prerequisites_met: bool 139 | fits_schedule: bool = True 140 | fits_preferences: bool = 
True 141 | 142 | 143 | class AgentResponse(BaseModel): 144 | """Structured response from the agent.""" 145 | 146 | message: str 147 | recommendations: List[CourseRecommendation] = Field(default_factory=list) 148 | suggested_actions: List[str] = Field(default_factory=list) 149 | metadata: Dict[str, Any] = Field(default_factory=dict) 150 | -------------------------------------------------------------------------------- /src/redis_context_course/redis_config.py: -------------------------------------------------------------------------------- 1 | """ 2 | Redis configuration and connection management for the Class Agent. 3 | 4 | This module handles all Redis connections, including vector storage 5 | and checkpointing. 6 | """ 7 | 8 | import os 9 | from typing import Optional 10 | 11 | import redis 12 | from langchain_openai import OpenAIEmbeddings 13 | from langgraph.checkpoint.redis import RedisSaver 14 | from redisvl.index import SearchIndex 15 | from redisvl.schema import IndexSchema 16 | 17 | 18 | class RedisConfig: 19 | """Redis configuration management.""" 20 | 21 | def __init__( 22 | self, 23 | redis_url: Optional[str] = None, 24 | vector_index_name: str = "course_catalog", 25 | checkpoint_namespace: str = "class_agent", 26 | ): 27 | self.redis_url = redis_url or os.getenv("REDIS_URL", "redis://localhost:6379") 28 | # Allow override via environment variable for progressive agents 29 | self.vector_index_name = os.getenv("COURSE_INDEX_NAME", vector_index_name) 30 | self.checkpoint_namespace = checkpoint_namespace 31 | 32 | # Initialize connections 33 | self._redis_client = None 34 | self._vector_index = None 35 | self._checkpointer = None 36 | self._embeddings = None 37 | 38 | @property 39 | def redis_client(self) -> redis.Redis: 40 | """Get Redis client instance.""" 41 | if self._redis_client is None: 42 | self._redis_client = redis.from_url(self.redis_url, decode_responses=True) 43 | return self._redis_client 44 | 45 | @property 46 | def embeddings(self) -> 
OpenAIEmbeddings: 47 | """Get OpenAI embeddings instance.""" 48 | if self._embeddings is None: 49 | self._embeddings = OpenAIEmbeddings(model="text-embedding-3-small") 50 | return self._embeddings 51 | 52 | @property 53 | def vector_index(self) -> SearchIndex: 54 | """Get or create vector search index for courses.""" 55 | if self._vector_index is None: 56 | schema = IndexSchema.from_dict( 57 | { 58 | "index": { 59 | "name": self.vector_index_name, 60 | "prefix": f"{self.vector_index_name}:", 61 | "storage_type": "hash", 62 | }, 63 | "fields": [ 64 | {"name": "id", "type": "tag"}, 65 | {"name": "course_code", "type": "tag"}, 66 | {"name": "title", "type": "text"}, 67 | {"name": "description", "type": "text"}, 68 | {"name": "department", "type": "tag"}, 69 | {"name": "major", "type": "tag"}, 70 | {"name": "difficulty_level", "type": "tag"}, 71 | {"name": "format", "type": "tag"}, 72 | {"name": "semester", "type": "tag"}, 73 | {"name": "year", "type": "numeric"}, 74 | {"name": "credits", "type": "numeric"}, 75 | {"name": "tags", "type": "tag"}, 76 | { 77 | "name": "content_vector", 78 | "type": "vector", 79 | "attrs": { 80 | "dims": 1536, 81 | "distance_metric": "cosine", 82 | "algorithm": "hnsw", 83 | "datatype": "float32", 84 | }, 85 | }, 86 | ], 87 | } 88 | ) 89 | 90 | # Initialize index with connection params (avoid deprecated .connect()) 91 | self._vector_index = SearchIndex(schema, redis_url=self.redis_url) 92 | 93 | # Create index if it doesn't exist 94 | try: 95 | self._vector_index.create(overwrite=False) 96 | except Exception: 97 | # Index likely already exists 98 | pass 99 | 100 | return self._vector_index 101 | 102 | @property 103 | def checkpointer(self) -> RedisSaver: 104 | """Get Redis checkpointer for LangGraph state management.""" 105 | if self._checkpointer is None: 106 | self._checkpointer = RedisSaver(redis_client=self.redis_client) 107 | self._checkpointer.setup() 108 | return self._checkpointer 109 | 110 | def health_check(self) -> bool: 111 | 
"""Check if Redis connection is healthy.""" 112 | try: 113 | return self.redis_client.ping() 114 | except Exception: 115 | return False 116 | 117 | # EXPERIMENTAL: Not currently used in notebooks or progressive_agents but available for external use 118 | def cleanup(self): 119 | """Clean up connections.""" 120 | if self._redis_client: 121 | self._redis_client.close() 122 | if self._vector_index: 123 | self._vector_index.disconnect() 124 | 125 | 126 | # Global configuration instance 127 | redis_config = RedisConfig() 128 | -------------------------------------------------------------------------------- /progressive_agents/stage4_hybrid_search/agent/workflow.py: -------------------------------------------------------------------------------- 1 | """ 2 | Main workflow builder and runner for the Stage 4 ReAct Course Q&A Agent. 3 | 4 | Implements ReAct (Reasoning + Acting) loop with hybrid search. 5 | """ 6 | 7 | import logging 8 | import time 9 | from datetime import datetime 10 | from typing import Any, Dict 11 | 12 | from langgraph.graph import END, StateGraph 13 | 14 | from .react_agent import react_agent_node, set_verbose 15 | from .state import WorkflowState, initialize_metrics 16 | from .tools import initialize_tools 17 | 18 | # Configure logger 19 | logger = logging.getLogger("course-qa-workflow") 20 | 21 | 22 | def create_workflow(course_manager, verbose: bool = True): 23 | """ 24 | Create and compile the Stage 4 ReAct Course Q&A agent workflow. 25 | 26 | Args: 27 | course_manager: CourseManager instance for course search 28 | verbose: If True, show detailed logging. If False, suppress intermediate logs. 
29 | 30 | Returns: 31 | Compiled LangGraph workflow 32 | """ 33 | # Set verbose mode for react agent 34 | set_verbose(verbose) 35 | 36 | # Control logger level based on verbose flag 37 | if not verbose: 38 | logger.setLevel(logging.CRITICAL) 39 | else: 40 | logger.setLevel(logging.INFO) 41 | 42 | # Initialize tools 43 | initialize_tools(course_manager) 44 | 45 | # Create workflow graph 46 | workflow = StateGraph(WorkflowState) 47 | 48 | # Add ReAct agent node 49 | workflow.add_node("react_agent", react_agent_node) 50 | 51 | # Set entry point 52 | workflow.set_entry_point("react_agent") 53 | 54 | # Add edge to end 55 | workflow.add_edge("react_agent", END) 56 | 57 | # Compile and return 58 | return workflow.compile() 59 | 60 | 61 | def run_agent(agent, query: str, enable_caching: bool = False) -> Dict[str, Any]: 62 | """ 63 | Run the Stage 4 ReAct Course Q&A agent on a query (synchronous). 64 | 65 | Args: 66 | agent: Compiled LangGraph workflow 67 | query: User query about courses 68 | enable_caching: Whether to use semantic caching (currently disabled) 69 | 70 | Returns: 71 | Dictionary with results and metrics 72 | """ 73 | import asyncio  # asyncio.run() raises RuntimeError inside an already-running event loop (e.g., Jupyter); call run_agent_async directly there 74 | return asyncio.run(run_agent_async(agent, query, enable_caching)) 75 | 76 | 77 | async def run_agent_async( 78 | agent, query: str, enable_caching: bool = False 79 | ) -> Dict[str, Any]: 80 | """ 81 | Run the Stage 4 ReAct Course Q&A agent on a query (async). 
82 | 83 | Args: 84 | agent: Compiled LangGraph workflow 85 | query: User query about courses 86 | enable_caching: Whether to use semantic caching (currently disabled) 87 | 88 | Returns: 89 | Dictionary with results and metrics 90 | """ 91 | start_time = time.perf_counter() 92 | 93 | # Initialize state for the workflow 94 | initial_state: WorkflowState = { 95 | "original_query": query, 96 | "sub_questions": [], 97 | "sub_answers": {}, 98 | "query_intent": None, 99 | "extracted_entities": None, 100 | "search_strategy": None, 101 | "exact_matches": None, 102 | "metadata_filters": None, 103 | # ReAct-specific fields 104 | "reasoning_trace": [], 105 | "react_iterations": 0, 106 | # Cache (disabled) 107 | "cache_hits": {}, 108 | "cache_confidences": {}, 109 | "cache_enabled": enable_caching, 110 | # Research 111 | "research_iterations": {}, 112 | "max_research_iterations": 2, 113 | "research_quality_scores": {}, 114 | "research_feedback": {}, 115 | "current_research_strategy": {}, 116 | # Output 117 | "final_response": None, 118 | "execution_path": [], 119 | "active_sub_question": None, 120 | "metrics": initialize_metrics(), 121 | "timestamp": datetime.now().isoformat(), 122 | "comparison_mode": False, 123 | "llm_calls": {}, 124 | } 125 | 126 | logger.info("=" * 80) 127 | logger.info(f"🚀 Starting Stage 4 ReAct workflow for query: '{query[:50]}...'") 128 | 129 | try: 130 | # Execute the workflow 131 | final_state = await agent.ainvoke(initial_state) 132 | 133 | # Calculate final metrics 134 | total_time = (time.perf_counter() - start_time) * 1000 135 | final_state["metrics"]["total_latency"] = total_time 136 | 137 | # Create execution path string 138 | execution_path = " → ".join(final_state["execution_path"]) 139 | final_state["metrics"]["execution_path"] = execution_path 140 | 141 | logger.info("=" * 80) 142 | logger.info(f"✅ Workflow completed in {total_time:.2f}ms") 143 | logger.info(f"📊 Execution path: {execution_path}") 144 | logger.info(f"🔄 ReAct iterations: 
{final_state.get('react_iterations', 0)}") 145 | 146 | return final_state 147 | 148 | except Exception as e: 149 | logger.error(f"Workflow execution failed: {e}") 150 | return { 151 | "original_query": query, 152 | "final_response": f"Error: {e}", 153 | "execution_path": ["failed"], 154 | "metrics": {"total_latency": (time.perf_counter() - start_time) * 1000}, 155 | "reasoning_trace": [], 156 | "react_iterations": 0, 157 | } 158 | 159 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/debug_search.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Debug script to test course search functionality. 4 | """ 5 | 6 | import asyncio 7 | import sys 8 | from pathlib import Path 9 | 10 | from dotenv import load_dotenv 11 | 12 | # Load .env from project root 13 | env_path = Path(__file__).parent.parent / "src" / ".env" 14 | load_dotenv(env_path) 15 | 16 | # Add agent module to path 17 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 18 | 19 | from agent.setup import setup_agent 20 | from agent.tools import search_courses 21 | 22 | 23 | async def test_course_search(): 24 | """Test course search functionality.""" 25 | print("=" * 80) 26 | print("Course Search Debug Tool") 27 | print("=" * 80) 28 | print() 29 | 30 | # Initialize CourseManager 31 | print("🔧 Initializing CourseManager...") 32 | course_manager, _ = await setup_agent(auto_load_courses=True) 33 | print() 34 | 35 | # Test 1: Get all courses 36 | print("=" * 80) 37 | print("Test 1: Get All Courses") 38 | print("=" * 80) 39 | all_courses = await course_manager.get_all_courses() 40 | print(f"✅ Found {len(all_courses)} courses in Redis") 41 | 42 | if all_courses: 43 | print("\nFirst 3 courses:") 44 | for i, course in enumerate(all_courses[:3], 1): 45 | print(f"\n{i}. 
{course.course_code}: {course.title}") 46 | print(f" Department: {course.department}") 47 | print(f" Description: {course.description[:100]}...") 48 | print() 49 | 50 | # Test 2: Direct CourseManager search 51 | print("=" * 80) 52 | print("Test 2: Direct CourseManager Search") 53 | print("=" * 80) 54 | test_queries = [ 55 | "machine learning", 56 | "database", 57 | "web development", 58 | "python programming", 59 | ] 60 | 61 | for query in test_queries: 62 | print(f"\n🔍 Query: '{query}'") 63 | results = await course_manager.search_courses( 64 | query=query, limit=3, similarity_threshold=0.6 65 | ) 66 | print(f" Results: {len(results)} courses found") 67 | 68 | if results: 69 | for i, course in enumerate(results, 1): 70 | print(f" {i}. {course.course_code}: {course.title}") 71 | else: 72 | print(" ⚠️ No courses found!") 73 | print() 74 | 75 | # Test 3: Lower similarity threshold 76 | print("=" * 80) 77 | print("Test 3: Search with Lower Threshold (0.3)") 78 | print("=" * 80) 79 | query = "machine learning" 80 | print(f"🔍 Query: '{query}'") 81 | results = await course_manager.search_courses( 82 | query=query, 83 | limit=5, 84 | similarity_threshold=0.3, # Lower threshold 85 | ) 86 | print(f" Results: {len(results)} courses found") 87 | 88 | if results: 89 | for i, course in enumerate(results, 1): 90 | print(f" {i}. 
{course.course_code}: {course.title}") 91 | print(f" Department: {course.department}") 92 | print() 93 | 94 | # Test 4: Test the search_courses tool 95 | print("=" * 80) 96 | print("Test 4: Agent Tool Search (search_courses)") 97 | print("=" * 80) 98 | 99 | # Initialize the tool with course_manager 100 | from agent.tools import initialize_tools 101 | 102 | initialize_tools(course_manager) 103 | 104 | query = "machine learning courses" 105 | print(f"🔍 Query: '{query}'") 106 | result = await search_courses(query, top_k=3) 107 | print("\nTool Result:") 108 | print("-" * 80) 109 | print(result) 110 | print("-" * 80) 111 | print() 112 | 113 | # Test 5: Check course content for embeddings 114 | print("=" * 80) 115 | print("Test 5: Sample Course Content") 116 | print("=" * 80) 117 | if all_courses: 118 | sample = all_courses[0] 119 | print(f"Course: {sample.course_code}: {sample.title}") 120 | print(f"Department: {sample.department}") 121 | print(f"Major: {sample.major}") 122 | print(f"Tags: {', '.join(sample.tags)}") 123 | print(f"Description: {sample.description}") 124 | print(f"Learning Objectives: {', '.join(sample.learning_objectives[:2])}...") 125 | 126 | # Show what gets embedded 127 | content = f"{sample.title} {sample.description} {sample.department} {sample.major} {' '.join(sample.tags)} {' '.join(sample.learning_objectives)}" 128 | print(f"\nEmbedded content length: {len(content)} chars") 129 | print(f"Embedded content preview: {content[:200]}...") 130 | print() 131 | 132 | # Test 6: Try different query formats 133 | print("=" * 80) 134 | print("Test 6: Different Query Formats") 135 | print("=" * 80) 136 | 137 | query_variations = [ 138 | "What machine learning courses are available?", 139 | "machine learning", 140 | "ML courses", 141 | "artificial intelligence", 142 | "data science", 143 | ] 144 | 145 | for query in query_variations: 146 | print(f"\n🔍 Query: '{query}'") 147 | results = await course_manager.search_courses( 148 | query=query, limit=2, 
similarity_threshold=0.5 149 | ) 150 | print(f" Results: {len(results)} courses") 151 | if results: 152 | print(f" Top: {results[0].course_code}: {results[0].title}") 153 | 154 | print() 155 | print("=" * 80) 156 | print("✅ Debug Complete") 157 | print("=" * 80) 158 | 159 | 160 | if __name__ == "__main__": 161 | asyncio.run(test_course_search()) 162 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/debug_search.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Debug script to test course search functionality. 4 | """ 5 | 6 | import asyncio 7 | import sys 8 | from pathlib import Path 9 | 10 | from dotenv import load_dotenv 11 | 12 | # Load .env from project root 13 | env_path = Path(__file__).parent.parent / "src" / ".env" 14 | load_dotenv(env_path) 15 | 16 | # Add agent module to path 17 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 18 | 19 | from agent.setup import setup_agent 20 | from agent.tools import search_courses 21 | 22 | 23 | async def test_course_search(): 24 | """Test course search functionality.""" 25 | print("=" * 80) 26 | print("Course Search Debug Tool") 27 | print("=" * 80) 28 | print() 29 | 30 | # Initialize CourseManager 31 | print("🔧 Initializing CourseManager...") 32 | course_manager, _ = await setup_agent(auto_load_courses=True) 33 | print() 34 | 35 | # Test 1: Get all courses 36 | print("=" * 80) 37 | print("Test 1: Get All Courses") 38 | print("=" * 80) 39 | all_courses = await course_manager.get_all_courses() 40 | print(f"✅ Found {len(all_courses)} courses in Redis") 41 | 42 | if all_courses: 43 | print("\nFirst 3 courses:") 44 | for i, course in enumerate(all_courses[:3], 1): 45 | print(f"\n{i}. 
{course.course_code}: {course.title}") 46 | print(f" Department: {course.department}") 47 | print(f" Description: {course.description[:100]}...") 48 | print() 49 | 50 | # Test 2: Direct CourseManager search 51 | print("=" * 80) 52 | print("Test 2: Direct CourseManager Search") 53 | print("=" * 80) 54 | test_queries = [ 55 | "machine learning", 56 | "database", 57 | "web development", 58 | "python programming", 59 | ] 60 | 61 | for query in test_queries: 62 | print(f"\n🔍 Query: '{query}'") 63 | results = await course_manager.search_courses( 64 | query=query, limit=3, similarity_threshold=0.6 65 | ) 66 | print(f" Results: {len(results)} courses found") 67 | 68 | if results: 69 | for i, course in enumerate(results, 1): 70 | print(f" {i}. {course.course_code}: {course.title}") 71 | else: 72 | print(" ⚠️ No courses found!") 73 | print() 74 | 75 | # Test 3: Lower similarity threshold 76 | print("=" * 80) 77 | print("Test 3: Search with Lower Threshold (0.3)") 78 | print("=" * 80) 79 | query = "machine learning" 80 | print(f"🔍 Query: '{query}'") 81 | results = await course_manager.search_courses( 82 | query=query, 83 | limit=5, 84 | similarity_threshold=0.3, # Lower threshold 85 | ) 86 | print(f" Results: {len(results)} courses found") 87 | 88 | if results: 89 | for i, course in enumerate(results, 1): 90 | print(f" {i}. 
{course.course_code}: {course.title}") 91 | print(f" Department: {course.department}") 92 | print() 93 | 94 | # Test 4: Test the search_courses tool 95 | print("=" * 80) 96 | print("Test 4: Agent Tool Search (search_courses)") 97 | print("=" * 80) 98 | 99 | # Initialize the tool with course_manager 100 | from agent.tools import initialize_tools 101 | 102 | initialize_tools(course_manager) 103 | 104 | query = "machine learning courses" 105 | print(f"🔍 Query: '{query}'") 106 | result = await search_courses(query, top_k=3) 107 | print("\nTool Result:") 108 | print("-" * 80) 109 | print(result) 110 | print("-" * 80) 111 | print() 112 | 113 | # Test 5: Check course content for embeddings 114 | print("=" * 80) 115 | print("Test 5: Sample Course Content") 116 | print("=" * 80) 117 | if all_courses: 118 | sample = all_courses[0] 119 | print(f"Course: {sample.course_code}: {sample.title}") 120 | print(f"Department: {sample.department}") 121 | print(f"Major: {sample.major}") 122 | print(f"Tags: {', '.join(sample.tags)}") 123 | print(f"Description: {sample.description}") 124 | print(f"Learning Objectives: {', '.join(sample.learning_objectives[:2])}...") 125 | 126 | # Show what gets embedded 127 | content = f"{sample.title} {sample.description} {sample.department} {sample.major} {' '.join(sample.tags)} {' '.join(sample.learning_objectives)}" 128 | print(f"\nEmbedded content length: {len(content)} chars") 129 | print(f"Embedded content preview: {content[:200]}...") 130 | print() 131 | 132 | # Test 6: Try different query formats 133 | print("=" * 80) 134 | print("Test 6: Different Query Formats") 135 | print("=" * 80) 136 | 137 | query_variations = [ 138 | "What machine learning courses are available?", 139 | "machine learning", 140 | "ML courses", 141 | "artificial intelligence", 142 | "data science", 143 | ] 144 | 145 | for query in query_variations: 146 | print(f"\n🔍 Query: '{query}'") 147 | results = await course_manager.search_courses( 148 | query=query, limit=2, 
similarity_threshold=0.5 149 | ) 150 | print(f" Results: {len(results)} courses") 151 | if results: 152 | print(f" Top: {results[0].course_code}: {results[0].title}") 153 | 154 | print() 155 | print("=" * 80) 156 | print("✅ Debug Complete") 157 | print("=" * 80) 158 | 159 | 160 | if __name__ == "__main__": 161 | asyncio.run(test_course_search()) 162 | -------------------------------------------------------------------------------- /progressive_agents/stage3_full_agent_without_memory/debug_search.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Debug script to test course search functionality. 4 | """ 5 | 6 | import asyncio 7 | import sys 8 | from pathlib import Path 9 | 10 | from dotenv import load_dotenv 11 | 12 | # Load .env from project root 13 | env_path = Path(__file__).parent.parent / "src" / ".env" 14 | load_dotenv(env_path) 15 | 16 | # Add agent module to path 17 | sys.path.insert(0, str(Path(__file__).parent.parent / "src")) 18 | 19 | from agent.setup import setup_agent 20 | from agent.tools import search_courses 21 | 22 | 23 | async def test_course_search(): 24 | """Test course search functionality.""" 25 | print("=" * 80) 26 | print("Course Search Debug Tool") 27 | print("=" * 80) 28 | print() 29 | 30 | # Initialize CourseManager 31 | print("🔧 Initializing CourseManager...") 32 | course_manager, _ = await setup_agent(auto_load_courses=True) 33 | print() 34 | 35 | # Test 1: Get all courses 36 | print("=" * 80) 37 | print("Test 1: Get All Courses") 38 | print("=" * 80) 39 | all_courses = await course_manager.get_all_courses() 40 | print(f"✅ Found {len(all_courses)} courses in Redis") 41 | 42 | if all_courses: 43 | print("\nFirst 3 courses:") 44 | for i, course in enumerate(all_courses[:3], 1): 45 | print(f"\n{i}. 
{course.course_code}: {course.title}") 46 | print(f" Department: {course.department}") 47 | print(f" Description: {course.description[:100]}...") 48 | print() 49 | 50 | # Test 2: Direct CourseManager search 51 | print("=" * 80) 52 | print("Test 2: Direct CourseManager Search") 53 | print("=" * 80) 54 | test_queries = [ 55 | "machine learning", 56 | "database", 57 | "web development", 58 | "python programming", 59 | ] 60 | 61 | for query in test_queries: 62 | print(f"\n🔍 Query: '{query}'") 63 | results = await course_manager.search_courses( 64 | query=query, limit=3, similarity_threshold=0.6 65 | ) 66 | print(f" Results: {len(results)} courses found") 67 | 68 | if results: 69 | for i, course in enumerate(results, 1): 70 | print(f" {i}. {course.course_code}: {course.title}") 71 | else: 72 | print(" ⚠️ No courses found!") 73 | print() 74 | 75 | # Test 3: Lower similarity threshold 76 | print("=" * 80) 77 | print("Test 3: Search with Lower Threshold (0.3)") 78 | print("=" * 80) 79 | query = "machine learning" 80 | print(f"🔍 Query: '{query}'") 81 | results = await course_manager.search_courses( 82 | query=query, 83 | limit=5, 84 | similarity_threshold=0.3, # Lower threshold 85 | ) 86 | print(f" Results: {len(results)} courses found") 87 | 88 | if results: 89 | for i, course in enumerate(results, 1): 90 | print(f" {i}. 
{course.course_code}: {course.title}") 91 | print(f" Department: {course.department}") 92 | print() 93 | 94 | # Test 4: Test the search_courses tool 95 | print("=" * 80) 96 | print("Test 4: Agent Tool Search (search_courses)") 97 | print("=" * 80) 98 | 99 | # Initialize the tool with course_manager 100 | from agent.tools import initialize_tools 101 | 102 | initialize_tools(course_manager) 103 | 104 | query = "machine learning courses" 105 | print(f"🔍 Query: '{query}'") 106 | result = await search_courses(query, top_k=3) 107 | print("\nTool Result:") 108 | print("-" * 80) 109 | print(result) 110 | print("-" * 80) 111 | print() 112 | 113 | # Test 5: Check course content for embeddings 114 | print("=" * 80) 115 | print("Test 5: Sample Course Content") 116 | print("=" * 80) 117 | if all_courses: 118 | sample = all_courses[0] 119 | print(f"Course: {sample.course_code}: {sample.title}") 120 | print(f"Department: {sample.department}") 121 | print(f"Major: {sample.major}") 122 | print(f"Tags: {', '.join(sample.tags)}") 123 | print(f"Description: {sample.description}") 124 | print(f"Learning Objectives: {', '.join(sample.learning_objectives[:2])}...") 125 | 126 | # Show what gets embedded 127 | content = f"{sample.title} {sample.description} {sample.department} {sample.major} {' '.join(sample.tags)} {' '.join(sample.learning_objectives)}" 128 | print(f"\nEmbedded content length: {len(content)} chars") 129 | print(f"Embedded content preview: {content[:200]}...") 130 | print() 131 | 132 | # Test 6: Try different query formats 133 | print("=" * 80) 134 | print("Test 6: Different Query Formats") 135 | print("=" * 80) 136 | 137 | query_variations = [ 138 | "What machine learning courses are available?", 139 | "machine learning", 140 | "ML courses", 141 | "artificial intelligence", 142 | "data science", 143 | ] 144 | 145 | for query in query_variations: 146 | print(f"\n🔍 Query: '{query}'") 147 | results = await course_manager.search_courses( 148 | query=query, limit=2, 
similarity_threshold=0.5 149 | ) 150 | print(f" Results: {len(results)} courses") 151 | if results: 152 | print(f" Top: {results[0].course_code}: {results[0].title}") 153 | 154 | print() 155 | print("=" * 80) 156 | print("✅ Debug Complete") 157 | print("=" * 80) 158 | 159 | 160 | if __name__ == "__main__": 161 | asyncio.run(test_course_search()) 162 | -------------------------------------------------------------------------------- /progressive_agents/stage4_hybrid_search/agent/react_prompts.py: -------------------------------------------------------------------------------- 1 | """ 2 | ReAct prompts for Stage 4 ReAct. 3 | 4 | Defines the system prompt and examples for ReAct (Reasoning + Acting) loop. 5 | This version focuses on hybrid search with NER (no memory capabilities). 6 | """ 7 | 8 | REACT_SYSTEM_PROMPT = """You are a helpful Redis University course advisor assistant. 9 | 10 | You have access to ONE tool: 11 | 12 | **search_courses** - Search the Redis University course catalog with hybrid search 13 | Parameters: 14 | - query (str): Search query 15 | - intent (str): GENERAL, PREREQUISITES, SYLLABUS_OBJECTIVES, or ASSIGNMENTS 16 | - search_strategy (str): "exact_match", "hybrid", or "semantic_only" 17 | - course_codes (list): Specific course codes to search for (use for exact matches) 18 | - information_type (list): What info to retrieve (e.g., ["prerequisites", "syllabus"]) 19 | - departments (list): Filter by department 20 | 21 | You must use the following format: 22 | 23 | Thought: [Your reasoning about what to do next] 24 | Action: [One of: search_courses or FINISH] 25 | Action Input: [Valid JSON with the required parameters] 26 | 27 | You will receive: 28 | Observation: [Result of the action] 29 | 30 | Then you continue with another Thought/Action/Observation cycle. 
31 | 32 | When you have enough information to answer the user's question, use: 33 | Thought: I have enough information to provide a complete answer 34 | Action: FINISH 35 | Action Input: [Your final answer to the user] 36 | 37 | IMPORTANT GUIDELINES: 38 | - Always start with a Thought explaining your reasoning 39 | - Only use ONE Action per turn 40 | - Action Input must be valid JSON matching the tool's parameters 41 | - Use "exact_match" strategy when the user mentions specific course codes (e.g., CS002, CS009) 42 | - Use "hybrid" strategy for topic-based searches (e.g., "machine learning courses") 43 | - Use FINISH when you're ready to provide the final answer to the user 44 | 45 | INTERPRETING SEARCH RESULTS: 46 | - If a search returns course data with an empty field (e.g., "prerequisites": []), that means the field has NO VALUE - not that the search failed 47 | - Empty prerequisites [] means "no prerequisites required" - this IS a valid answer 48 | - Empty syllabus or assignments means "not specified" - provide what IS available 49 | - Only retry a search if you get an actual error or no courses are found at all 50 | - Do NOT keep searching with different strategies when you already have the course data 51 | - After 1-2 search attempts, use whatever information you have to answer the user 52 | 53 | EXAMPLES: 54 | 55 | Example 1: Specific course code search 56 | User: "What is CS004?" 57 | Thought: The user is asking about a specific course code. I should search for it using exact match. 58 | Action: search_courses 59 | Action Input: {"query": "CS004", "intent": "GENERAL", "search_strategy": "exact_match", "course_codes": ["CS004"]} 60 | Observation: Found course CS004 - Advanced Algorithms. This course covers... 61 | 62 | Thought: I have found the course information. I can now provide a complete answer. 63 | Action: FINISH 64 | Action Input: CS004 is Advanced Algorithms, a course that covers algorithm design, complexity analysis, and optimization techniques. 
65 | 66 | Example 2: Topic-based search 67 | User: "What machine learning courses are available?" 68 | Thought: The user is asking about a topic. I should use hybrid search to find relevant courses. 69 | Action: search_courses 70 | Action Input: {"query": "machine learning", "intent": "GENERAL", "search_strategy": "hybrid"} 71 | Observation: Found 5 courses: CS002 (Machine Learning Fundamentals), CS006 (Deep Learning)... 72 | 73 | Thought: I have found several machine learning courses. I can provide a helpful response. 74 | Action: FINISH 75 | Action Input: Here are the machine learning courses available: CS002 (Machine Learning Fundamentals), CS006 (Deep Learning), and more. Would you like details on any specific course? 76 | 77 | Example 3: Prerequisites query (IMPORTANT - handle empty results correctly) 78 | User: "What are the prerequisites for CS002?" 79 | Thought: The user is asking about prerequisites for a specific course. I'll search with exact match. 80 | Action: search_courses 81 | Action Input: {"query": "CS002", "intent": "PREREQUISITES", "search_strategy": "exact_match", "course_codes": ["CS002"], "information_type": ["prerequisites"]} 82 | Observation: Found CS002 - Machine Learning Fundamentals. Prerequisites: [] (empty). Level: Advanced. 83 | 84 | Thought: I found the course and it has an empty prerequisites array. This means CS002 has NO prerequisites required - this is a valid answer, not missing data. 85 | Action: FINISH 86 | Action Input: CS002 (Machine Learning Fundamentals) has no formal prerequisites listed. However, since it's an advanced-level course, having a background in programming (especially Python) and basic statistics would be helpful. 87 | 88 | Example 4: Syllabus query 89 | User: "What's the syllabus for CS006?" 90 | Thought: The user wants syllabus information. I'll search with SYLLABUS_OBJECTIVES intent. 
91 | Action: search_courses 92 | Action Input: {"query": "CS006", "intent": "SYLLABUS_OBJECTIVES", "search_strategy": "exact_match", "course_codes": ["CS006"], "information_type": ["syllabus"]} 93 | Observation: Found CS006 - Deep Learning. Syllabus: Week 1: Introduction to Neural Networks... 94 | 95 | Thought: I have the syllabus information. I can provide a complete answer. 96 | Action: FINISH 97 | Action Input: Here is the syllabus for CS006 (Deep Learning): Week 1 covers Introduction to Neural Networks, Week 2 covers... 98 | 99 | Now, respond to the user's query using this format.""" 100 | 101 | -------------------------------------------------------------------------------- /tests/test_package.py: -------------------------------------------------------------------------------- 1 | """ 2 | Basic tests to verify the package structure and imports work correctly. 3 | """ 4 | 5 | import pytest 6 | 7 | 8 | def test_package_imports(): 9 | """Test that the main package imports work correctly.""" 10 | try: 11 | import redis_context_course 12 | 13 | assert redis_context_course.__version__ == "1.0.0" 14 | assert redis_context_course.__author__ == "Redis AI Resources Team" 15 | except ImportError as e: 16 | pytest.fail(f"Failed to import redis_context_course: {e}") 17 | 18 | 19 | def test_model_imports(): 20 | """Test that model imports work correctly.""" 21 | try: 22 | from redis_context_course.models import ( 23 | CourseFormat, 24 | DifficultyLevel, 25 | ) 26 | 27 | # Test enum values 28 | assert DifficultyLevel.BEGINNER == "beginner" 29 | assert CourseFormat.ONLINE == "online" 30 | 31 | except ImportError as e: 32 | pytest.fail(f"Failed to import models: {e}") 33 | 34 | 35 | def test_manager_imports(): 36 | """Test that manager imports work correctly.""" 37 | try: 38 | from redis_context_course import MemoryClient, MemoryClientConfig 39 | from redis_context_course.course_manager import CourseManager 40 | from redis_context_course.redis_config import RedisConfig 41 | 42 
| # Test that classes can be instantiated (without Redis connection) 43 | assert MemoryClient is not None 44 | assert MemoryClientConfig is not None 45 | assert CourseManager is not None 46 | assert RedisConfig is not None 47 | 48 | except ImportError as e: 49 | pytest.fail(f"Failed to import managers: {e}") 50 | 51 | 52 | def test_tools_module_imports(): 53 | """Test that tools module imports work correctly.""" 54 | try: 55 | from redis_context_course.tools import create_agent_tools 56 | 57 | assert create_agent_tools is not None 58 | 59 | except ImportError as e: 60 | pytest.fail(f"Failed to import tools: {e}") 61 | 62 | 63 | def test_scripts_imports(): 64 | """Test that script imports work correctly.""" 65 | try: 66 | from redis_context_course.scripts import generate_courses, ingest_courses 67 | 68 | assert generate_courses is not None 69 | assert ingest_courses is not None 70 | 71 | except ImportError as e: 72 | pytest.fail(f"Failed to import scripts: {e}") 73 | 74 | 75 | def test_create_agent_tools_exists(): 76 | """Test that create_agent_tools function exists in the package.""" 77 | try: 78 | from redis_context_course import create_agent_tools 79 | 80 | assert create_agent_tools is not None 81 | assert callable(create_agent_tools) 82 | 83 | except ImportError as e: 84 | pytest.fail(f"Failed to import create_agent_tools: {e}") 85 | 86 | 87 | def test_tools_imports(): 88 | """Test that tools module imports work correctly.""" 89 | try: 90 | from redis_context_course.tools import ( 91 | create_course_tools, 92 | create_memory_tools, 93 | select_tools_by_keywords, 94 | ) 95 | 96 | assert create_course_tools is not None 97 | assert create_memory_tools is not None 98 | assert select_tools_by_keywords is not None 99 | 100 | except ImportError as e: 101 | pytest.fail(f"Failed to import tools: {e}") 102 | 103 | 104 | def test_optimization_helpers_imports(): 105 | """Test that optimization helpers import work correctly.""" 106 | try: 107 | from 
redis_context_course.optimization_helpers import ( 108 | count_tokens, 109 | create_summary_view, 110 | estimate_token_budget, 111 | filter_tools_by_intent, 112 | format_context_for_llm, 113 | hybrid_retrieval, 114 | ) 115 | 116 | assert count_tokens is not None 117 | assert estimate_token_budget is not None 118 | assert hybrid_retrieval is not None 119 | assert create_summary_view is not None 120 | assert filter_tools_by_intent is not None 121 | assert format_context_for_llm is not None 122 | 123 | except ImportError as e: 124 | pytest.fail(f"Failed to import optimization helpers: {e}") 125 | 126 | 127 | def test_count_tokens_basic(): 128 | """Test basic token counting functionality.""" 129 | try: 130 | from redis_context_course.optimization_helpers import count_tokens 131 | 132 | # Test with simple text 133 | text = "Hello, world!" 134 | tokens = count_tokens(text) 135 | 136 | assert isinstance(tokens, int) 137 | assert tokens > 0 138 | 139 | except Exception as e: 140 | pytest.fail(f"Token counting failed: {e}") 141 | 142 | 143 | def test_filter_tools_by_intent_basic(): 144 | """Test basic tool filtering functionality.""" 145 | try: 146 | from redis_context_course.optimization_helpers import filter_tools_by_intent 147 | 148 | # Mock tool groups 149 | tool_groups = { 150 | "search": ["search_tool"], 151 | "memory": ["memory_tool"], 152 | } 153 | 154 | # Test search intent 155 | result = filter_tools_by_intent("find courses", tool_groups) 156 | assert result == ["search_tool"] 157 | 158 | # Test memory intent 159 | result = filter_tools_by_intent("remember this", tool_groups) 160 | assert result == ["memory_tool"] 161 | 162 | except Exception as e: 163 | pytest.fail(f"Tool filtering failed: {e}") 164 | 165 | 166 | if __name__ == "__main__": 167 | pytest.main([__file__]) 168 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/test_react_simple.py: 
-------------------------------------------------------------------------------- 1 | """ 2 | Simple tests for Stage 5 ReAct agent (single-turn conversations). 3 | 4 | Tests the ReAct pattern with working memory and search_courses tool. 5 | """ 6 | 7 | import asyncio 8 | import logging 9 | 10 | 11 | 12 | from langchain_openai import ChatOpenAI 13 | from redis_context_course import CourseManager 14 | 15 | from progressive_agents.stage5_working_memory.agent.react_agent import run_react_agent 16 | from progressive_agents.stage5_working_memory.agent.tools import initialize_tools 17 | 18 | # Configure logging 19 | logging.basicConfig( 20 | level=logging.INFO, 21 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 22 | ) 23 | logger = logging.getLogger("course-qa-workflow") 24 | 25 | 26 | async def test_simple_query(): 27 | """Test a simple course lookup.""" 28 | print("\n" + "=" * 80) 29 | print("TEST 1: Simple Course Lookup") 30 | print("=" * 80) 31 | 32 | # Initialize 33 | llm = ChatOpenAI(model="gpt-4o-mini", temperature=0) 34 | 35 | # Initialize course manager 36 | course_manager = CourseManager() 37 | 38 | # Initialize tools 39 | initialize_tools(course_manager) 40 | 41 | # Run query 42 | query = "What is CS004?"
43 | print(f"\nQuery: {query}") 44 | 45 | result = await run_react_agent( 46 | query=query, 47 | llm=llm, 48 | max_iterations=10, 49 | ) 50 | 51 | # Print results 52 | print(f"\n{'─' * 80}") 53 | print("REASONING TRACE:") 54 | print(f"{'─' * 80}") 55 | for i, step in enumerate(result["reasoning_trace"], 1): 56 | print(f"\nStep {i}:") 57 | print(f" Thought: {step.thought}") 58 | print(f" Action: {step.action}") 59 | if step.action != "FINISH": 60 | print(f" Action Input: {step.action_input}") 61 | print(f" Observation: {step.observation[:200]}...") 62 | else: 63 | print(f" Final Answer: {step.action_input}") 64 | 65 | print(f"\n{'─' * 80}") 66 | print(f"FINAL ANSWER:") 67 | print(f"{'─' * 80}") 68 | print(result["answer"]) 69 | print(f"\nIterations: {result['iterations']}") 70 | print(f"Success: {result['success']}") 71 | 72 | 73 | async def test_semantic_search(): 74 | """Test semantic search for courses.""" 75 | print("\n" + "=" * 80) 76 | print("TEST 2: Semantic Search") 77 | print("=" * 80) 78 | 79 | # Initialize 80 | llm = ChatOpenAI(model="gpt-4o-mini", temperature=0) 81 | 82 | # Initialize course manager 83 | course_manager = CourseManager() 84 | 85 | # Initialize tools 86 | initialize_tools(course_manager) 87 | 88 | # Run query 89 | query = "I want to learn about machine learning" 90 | print(f"\nQuery: {query}") 91 | 92 | result = await run_react_agent( 93 | query=query, 94 | llm=llm, 95 | max_iterations=10, 96 | ) 97 | 98 | # Print results 99 | print(f"\n{'─' * 80}") 100 | print("REASONING TRACE:") 101 | print(f"{'─' * 80}") 102 | for i, step in enumerate(result["reasoning_trace"], 1): 103 | print(f"\nStep {i}:") 104 | print(f" Thought: {step.thought}") 105 | print(f" Action: {step.action}") 106 | if step.action != "FINISH": 107 | print(f" Action Input: {step.action_input}") 108 | print(f" Observation: {step.observation[:200]}...") 109 | else: 110 | print(f" Final Answer: {step.action_input[:200]}...") 111 | 112 | print(f"\n{'─' * 80}") 113 | print(f"FINAL 
ANSWER:") 114 | print(f"{'─' * 80}") 115 | print(result["answer"]) 116 | print(f"\nIterations: {result['iterations']}") 117 | print(f"Success: {result['success']}") 118 | 119 | 120 | async def test_prerequisites(): 121 | """Test prerequisite lookup.""" 122 | print("\n" + "=" * 80) 123 | print("TEST 3: Prerequisites Lookup") 124 | print("=" * 80) 125 | 126 | # Initialize 127 | llm = ChatOpenAI(model="gpt-4o-mini", temperature=0) 128 | 129 | # Initialize course manager 130 | course_manager = CourseManager() 131 | 132 | # Initialize tools 133 | initialize_tools(course_manager) 134 | 135 | # Run query 136 | query = "What are the prerequisites for CS004?" 137 | print(f"\nQuery: {query}") 138 | 139 | result = await run_react_agent( 140 | query=query, 141 | llm=llm, 142 | max_iterations=10, 143 | ) 144 | 145 | # Print results 146 | print(f"\n{'─' * 80}") 147 | print("REASONING TRACE:") 148 | print(f"{'─' * 80}") 149 | for i, step in enumerate(result["reasoning_trace"], 1): 150 | print(f"\nStep {i}:") 151 | print(f" Thought: {step.thought}") 152 | print(f" Action: {step.action}") 153 | if step.action != "FINISH": 154 | print(f" Action Input: {step.action_input}") 155 | print(f" Observation: {step.observation[:200]}...") 156 | else: 157 | print(f" Final Answer: {step.action_input}") 158 | 159 | print(f"\n{'─' * 80}") 160 | print(f"FINAL ANSWER:") 161 | print(f"{'─' * 80}") 162 | print(result["answer"]) 163 | print(f"\nIterations: {result['iterations']}") 164 | print(f"Success: {result['success']}") 165 | 166 | 167 | async def main(): 168 | """Run all tests.""" 169 | await test_simple_query() 170 | await test_semantic_search() 171 | await test_prerequisites() 172 | 173 | print("\n" + "=" * 80) 174 | print("ALL TESTS COMPLETE") 175 | print("=" * 80) 176 | 177 | 178 | if __name__ == "__main__": 179 | asyncio.run(main()) 180 | 181 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/test_react_simple.py: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | Simple test for the Stage 6 Full Memory ReAct Loop Agent. 4 | 5 | Tests basic ReAct functionality with explicit reasoning traces. 6 | """ 7 | 8 | import asyncio 9 | import logging 10 | import sys 11 | from pathlib import Path 12 | 13 | # Add the repo root's src/ directory to the path (this file lives two levels below the root) 14 | sys.path.insert(0, str(Path(__file__).parent.parent.parent / "src")) 15 | 16 | from redis_context_course import CourseManager 17 | 18 | from progressive_agents.stage6_full_memory.agent.workflow import ( 19 | create_workflow, 20 | run_agent_async, 21 | ) 22 | 23 | # Configure logging 24 | logging.basicConfig( 25 | level=logging.INFO, 26 | format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", 27 | ) 28 | 29 | 30 | async def test_react_simple(): 31 | """Test simple ReAct loop with course search.""" 32 | print("=" * 80) 33 | print("TEST: Simple ReAct Loop - Course Search") 34 | print("=" * 80) 35 | print() 36 | 37 | # Setup 38 | print("Initializing Course Manager...") 39 | course_manager = CourseManager() 40 | 41 | print("Creating agent workflow...") 42 | agent = create_workflow(course_manager) 43 | 44 | # Test query 45 | query = "What is CS004?"
46 | student_id = "test_student_react" 47 | session_id = "test_session_react_001" 48 | 49 | print(f"Query: {query}") 50 | print(f"Student: {student_id}") 51 | print() 52 | 53 | # Run agent 54 | result = await run_agent_async( 55 | agent, query, session_id=session_id, student_id=student_id, enable_caching=False 56 | ) 57 | 58 | # Display results 59 | print("=" * 80) 60 | print("RESULTS") 61 | print("=" * 80) 62 | print() 63 | 64 | print(f"Final Response: {result['final_response']}") 65 | print() 66 | 67 | print(f"ReAct Iterations: {result.get('react_iterations', 0)}") 68 | print(f"Reasoning Steps: {len(result.get('reasoning_trace', []))}") 69 | print() 70 | 71 | # Show reasoning trace 72 | if result.get("reasoning_trace"): 73 | print("Reasoning Trace:") 74 | print("-" * 80) 75 | for i, step in enumerate(result["reasoning_trace"], 1): 76 | print(f"\nStep {i}:") 77 | if step["type"] == "thought": 78 | print(f" 💭 Thought: {step['content'][:100]}...") 79 | elif step["type"] == "action": 80 | print(f" 🔧 Action: {step['action']}") 81 | print(f" Input: {step['input']}") 82 | print(f" 👁️ Observation: {step['observation'][:100]}...") 83 | elif step["type"] == "finish": 84 | print(f" ✅ FINISH") 85 | print() 86 | 87 | print("=" * 80) 88 | print("✅ TEST PASSED") 89 | print("=" * 80) 90 | 91 | 92 | async def test_react_memory(): 93 | """Test ReAct loop with memory storage and retrieval.""" 94 | print("\n\n") 95 | print("=" * 80) 96 | print("TEST: ReAct Loop - Memory Storage and Retrieval") 97 | print("=" * 80) 98 | print() 99 | 100 | # Setup 101 | print("Initializing Course Manager...") 102 | course_manager = CourseManager() 103 | 104 | print("Creating agent workflow...") 105 | agent = create_workflow(course_manager) 106 | 107 | student_id = "test_student_react_memory" 108 | session_id = "test_session_react_002" 109 | 110 | # Test 1: Store preferences 111 | query1 = "I'm interested in machine learning and prefer online courses." 
112 | print(f"Query 1: {query1}") 113 | print() 114 | 115 | result1 = await run_agent_async( 116 | agent, query1, session_id=session_id, student_id=student_id, enable_caching=False 117 | ) 118 | 119 | print(f"Response 1: {result1['final_response'][:100]}...") 120 | print(f"Iterations: {result1.get('react_iterations', 0)}") 121 | print(f"Reasoning Steps: {len(result1.get('reasoning_trace', []))}") 122 | print() 123 | 124 | # Test 2: Retrieve preferences 125 | query2 = "What courses would you recommend for me?" 126 | print(f"Query 2: {query2}") 127 | print() 128 | 129 | result2 = await run_agent_async( 130 | agent, query2, session_id=session_id, student_id=student_id, enable_caching=False 131 | ) 132 | 133 | print(f"Response 2: {result2['final_response'][:100]}...") 134 | print(f"Iterations: {result2.get('react_iterations', 0)}") 135 | print(f"Reasoning Steps: {len(result2.get('reasoning_trace', []))}") 136 | print() 137 | 138 | # Show reasoning trace for query 2 139 | if result2.get("reasoning_trace"): 140 | print("Reasoning Trace (Query 2):") 141 | print("-" * 80) 142 | for i, step in enumerate(result2["reasoning_trace"], 1): 143 | print(f"\nStep {i}:") 144 | if step["type"] == "thought": 145 | print(f" 💭 Thought: {step['content'][:100]}...") 146 | elif step["type"] == "action": 147 | print(f" 🔧 Action: {step['action']}") 148 | print(f" Input: {step['input']}") 149 | print(f" 👁️ Observation: {step['observation'][:100]}...") 150 | elif step["type"] == "finish": 151 | print(f" ✅ FINISH") 152 | print() 153 | 154 | print("=" * 80) 155 | print("✅ TEST PASSED") 156 | print("=" * 80) 157 | 158 | 159 | async def main(): 160 | """Run all tests.""" 161 | try: 162 | await test_react_simple() 163 | await test_react_memory() 164 | except Exception as e: 165 | print(f"\n❌ TEST FAILED: {e}") 166 | import traceback 167 | 168 | traceback.print_exc() 169 | sys.exit(1) 170 | 171 | 172 | if __name__ == "__main__": 173 | asyncio.run(main()) 174 | 175 | 
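The two memory tests above print each reasoning-trace step but never assert on its shape. A small, hedged helper could make that check explicit — `validate_reasoning_trace` is a hypothetical addition, and it assumes the step dicts keep exactly the `type`/`content`/`action`/`input`/`observation` keys this file already prints:

```python
def validate_reasoning_trace(trace):
    """Return True if every step dict matches the shapes printed by the tests above.

    Hypothetical helper: the key names mirror the step dicts this file
    already prints (type / content / action / input / observation).
    """
    # Required keys per step type, based on how the trace is printed above.
    required = {
        "thought": {"content"},
        "action": {"action", "input", "observation"},
        "finish": set(),  # FINISH steps carry no extra payload here
    }
    for step in trace:
        expected = required.get(step.get("type"))
        if expected is None:
            return False  # unknown step type
        if not expected.issubset(step):
            return False  # missing a required key
    return True
```

A test could then call `assert validate_reasoning_trace(result2["reasoning_trace"])` instead of only printing the trace.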
-------------------------------------------------------------------------------- /src/redis_context_course/scripts/load_hierarchical_courses.py: -------------------------------------------------------------------------------- 1 | """ 2 | Load hierarchical courses into Redis. 3 | 4 | This script loads courses generated by generate_hierarchical_courses.py 5 | into the two-tier Redis storage (summaries + details). 6 | 7 | Usage: 8 | # Load courses (appends to existing data) 9 | python -m redis_context_course.scripts.load_hierarchical_courses \ 10 | -i src/redis_context_course/data/hierarchical/hierarchical_courses.json 11 | 12 | # Force reload (clears existing data first) 13 | python -m redis_context_course.scripts.load_hierarchical_courses \ 14 | -i src/redis_context_course/data/hierarchical/hierarchical_courses.json \ 15 | --force 16 | """ 17 | 18 | import asyncio 19 | import json 20 | import click 21 | from pathlib import Path 22 | from datetime import datetime 23 | from dotenv import load_dotenv 24 | 25 | # Load environment variables from .env file 26 | load_dotenv() 27 | 28 | from redis_context_course.hierarchical_models import ( 29 | CourseSummary, 30 | CourseDetails, 31 | HierarchicalCourse, 32 | ) 33 | from redis_context_course.hierarchical_manager import HierarchicalCourseManager 34 | from redis_context_course.redis_config import redis_config 35 | 36 | 37 | def clear_existing_data(summary_index: str, details_prefix: str): 38 | """Clear existing course data from Redis.""" 39 | redis_client = redis_config.redis_client 40 | 41 | # Clear summary keys 42 | summary_keys = redis_client.keys(f'{summary_index}:*') 43 | if summary_keys: 44 | for key in summary_keys: 45 | redis_client.delete(key) 46 | print(f" Deleted {len(summary_keys)} summary keys") 47 | 48 | # Clear detail keys 49 | detail_keys = redis_client.keys(f'{details_prefix}:*') 50 | if detail_keys: 51 | for key in detail_keys: 52 | redis_client.delete(key) 53 | print(f" Deleted {len(detail_keys)} detail 
keys") 54 | 55 | # Drop the index 56 | try: 57 | redis_client.execute_command('FT.DROPINDEX', summary_index) 58 | print(f" Dropped index: {summary_index}") 59 | except Exception: 60 | pass # Index may not exist 61 | 62 | return len(summary_keys), len(detail_keys) 63 | 64 | 65 | async def load_courses_from_json(json_file: Path, manager: HierarchicalCourseManager): 66 | """Load courses from JSON file into Redis.""" 67 | 68 | print(f"📖 Loading courses from {json_file}...") 69 | 70 | with open(json_file, 'r') as f: 71 | data = json.load(f) 72 | 73 | courses_data = data.get("courses", []) 74 | print(f"Found {len(courses_data)} courses in file") 75 | 76 | loaded = 0 77 | failed = 0 78 | 79 | for course_data in courses_data: 80 | try: 81 | # Reconstruct HierarchicalCourse 82 | summary = CourseSummary(**course_data["summary"]) 83 | details = CourseDetails(**course_data["details"]) 84 | 85 | course = HierarchicalCourse( 86 | id=course_data["id"], 87 | summary=summary, 88 | details=details, 89 | created_at=datetime.fromisoformat(course_data["created_at"]), 90 | ) 91 | 92 | # Add to Redis 93 | success = await manager.add_course(course) 94 | if success: 95 | loaded += 1 96 | if loaded % 10 == 0: 97 | print(f" Loaded {loaded}/{len(courses_data)} courses...") 98 | else: 99 | failed += 1 100 | 101 | except Exception as e: 102 | print(f" ❌ Error loading course: {e}") 103 | failed += 1 104 | 105 | print(f"\n✅ Loading complete!") 106 | print(f" Loaded: {loaded}") 107 | print(f" Failed: {failed}") 108 | 109 | return loaded, failed 110 | 111 | 112 | @click.command() 113 | @click.option( 114 | '--input-file', 115 | '-i', 116 | type=click.Path(exists=True), 117 | required=True, 118 | help='JSON file with hierarchical courses' 119 | ) 120 | @click.option( 121 | '--summary-index', 122 | '-s', 123 | default='course_summaries', 124 | help='Name for summary vector index' 125 | ) 126 | @click.option( 127 | '--details-prefix', 128 | '-d', 129 | default='course_details', 130 | help='Prefix for 
details hash keys' 131 | ) 132 | @click.option( 133 | '--force', 134 | '-f', 135 | is_flag=True, 136 | default=False, 137 | help='Clear existing data before loading (force reload)' 138 | ) 139 | def main(input_file: str, summary_index: str, details_prefix: str, force: bool): 140 | """Load hierarchical courses into Redis. 141 | 142 | By default, appends to existing data. Use --force to clear existing 143 | data before loading (useful when course data has been regenerated). 144 | """ 145 | 146 | print("🚀 Hierarchical Course Loader\n") 147 | 148 | # Clear existing data if force flag is set 149 | if force: 150 | print("🗑️ Clearing existing data (--force flag set)...") 151 | summaries_deleted, details_deleted = clear_existing_data(summary_index, details_prefix) 152 | print(f" ✅ Cleared {summaries_deleted} summaries and {details_deleted} details\n") 153 | 154 | # Create manager 155 | manager = HierarchicalCourseManager( 156 | summary_index_name=summary_index, 157 | details_prefix=details_prefix, 158 | ) 159 | 160 | # Ensure index is created before loading data 161 | manager._get_summary_index() 162 | 163 | # Load courses 164 | input_path = Path(input_file) 165 | loaded, failed = asyncio.run(load_courses_from_json(input_path, manager)) 166 | 167 | if loaded > 0: 168 | print(f"\n📊 Redis Storage:") 169 | print(f" Summary index: {summary_index}") 170 | print(f" Details prefix: {details_prefix}") 171 | print(f"\n💡 You can now use HierarchicalCourseManager to search courses!") 172 | 173 | 174 | if __name__ == "__main__": 175 | main() 176 | 177 | -------------------------------------------------------------------------------- /progressive_agents/stage3_full_agent_without_memory/agent/workflow.py: -------------------------------------------------------------------------------- 1 | """ 2 | Main workflow builder and runner for the Course Q&A Agent. 3 | 4 | Adapted from caching-agent to use CourseManager for course search. 
5 | """ 6 | 7 | import logging 8 | import time 9 | from datetime import datetime 10 | from typing import Any, Dict 11 | 12 | from langgraph.graph import END, StateGraph 13 | 14 | from .edges import ( 15 | initialize_edges, 16 | route_after_cache_check, 17 | route_after_quality_evaluation, 18 | ) 19 | from .nodes import ( 20 | agent_node, 21 | check_cache_node, 22 | classify_intent_node, 23 | decompose_query_node, 24 | evaluate_quality_node, 25 | handle_greeting_node, 26 | initialize_nodes, 27 | research_node, 28 | set_verbose, 29 | synthesize_response_node, 30 | ) 31 | from .state import WorkflowState, initialize_metrics 32 | from .tools import initialize_tools 33 | 34 | # Configure logger 35 | logger = logging.getLogger("course-qa-workflow") 36 | 37 | 38 | def create_workflow(course_manager, verbose: bool = True): 39 | """ 40 | Create and compile the complete Course Q&A agent workflow. 41 | 42 | Args: 43 | course_manager: CourseManager instance for course search 44 | verbose: If True, show detailed logging. If False, suppress intermediate logs. 
45 | 46 | Returns: 47 | Compiled LangGraph workflow 48 | """ 49 | # Set verbose mode for nodes 50 | set_verbose(verbose) 51 | 52 | # Control logger level based on verbose flag 53 | if not verbose: 54 | logger.setLevel(logging.CRITICAL) 55 | else: 56 | logger.setLevel(logging.INFO) 57 | 58 | # Initialize all components 59 | initialize_nodes() 60 | initialize_edges() 61 | initialize_tools(course_manager) 62 | 63 | # Create workflow graph 64 | workflow = StateGraph(WorkflowState) 65 | 66 | # Add nodes 67 | workflow.add_node("classify_intent", classify_intent_node) 68 | workflow.add_node("handle_greeting", handle_greeting_node) 69 | workflow.add_node("agent", agent_node) # NEW: Agent with tool calling 70 | 71 | # Set entry point 72 | workflow.set_entry_point("classify_intent") 73 | 74 | # Add routing function for intent classification 75 | def route_after_intent(state: WorkflowState) -> str: 76 | """Route based on query intent.""" 77 | intent = state.get("query_intent", "GENERAL") 78 | 79 | if intent == "GREETING": 80 | return "handle_greeting" 81 | else: 82 | return "agent" # Route to agent for all non-greeting queries 83 | 84 | # Add edges 85 | workflow.add_conditional_edges( 86 | "classify_intent", 87 | route_after_intent, 88 | { 89 | "handle_greeting": "handle_greeting", 90 | "agent": "agent", 91 | }, 92 | ) 93 | workflow.add_edge("handle_greeting", END) 94 | workflow.add_edge("agent", END) 95 | 96 | # Compile and return 97 | return workflow.compile() 98 | 99 | 100 | def run_agent(agent, query: str, enable_caching: bool = False) -> Dict[str, Any]: 101 | """ 102 | Run the Course Q&A agent on a query (synchronous wrapper). 
103 | 104 | Args: 105 | agent: Compiled LangGraph workflow 106 | query: User query about courses 107 | enable_caching: Whether to use semantic caching (currently disabled) 108 | 109 | Returns: 110 | Dictionary with results and metrics 111 | """ 112 | import asyncio 113 | 114 | return asyncio.run(run_agent_async(agent, query, enable_caching)) 115 | 116 | 117 | async def run_agent_async( 118 | agent, query: str, enable_caching: bool = False 119 | ) -> Dict[str, Any]: 120 | """ 121 | Run the Course Q&A agent on a query (async). 122 | 123 | Args: 124 | agent: Compiled LangGraph workflow 125 | query: User query about courses 126 | enable_caching: Whether to use semantic caching (currently disabled) 127 | 128 | Returns: 129 | Dictionary with results and metrics 130 | """ 131 | start_time = time.perf_counter() 132 | 133 | # Initialize state for the workflow 134 | initial_state: WorkflowState = { 135 | "original_query": query, 136 | "sub_questions": [], 137 | "sub_answers": {}, 138 | "query_intent": None, 139 | "detail_level": None, 140 | "cache_hits": {}, 141 | "cache_confidences": {}, 142 | "cache_enabled": enable_caching, # Currently always False 143 | "research_iterations": {}, 144 | "max_research_iterations": 2, 145 | "research_quality_scores": {}, 146 | "research_feedback": {}, 147 | "current_research_strategy": {}, 148 | "final_response": None, 149 | "execution_path": [], 150 | "active_sub_question": None, 151 | "metrics": initialize_metrics(), 152 | "timestamp": datetime.now().isoformat(), 153 | "comparison_mode": False, 154 | "llm_calls": {}, 155 | } 156 | 157 | logger.info("=" * 80) 158 | logger.info(f"🚀 Starting Course Q&A workflow for query: '{query[:50]}...'") 159 | 160 | try: 161 | # Execute the workflow (async) 162 | final_state = await agent.ainvoke(initial_state) 163 | 164 | # Calculate final metrics 165 | total_time = (time.perf_counter() - start_time) * 1000 166 | final_state["metrics"]["total_latency"] = total_time 167 | 168 | # Create execution path 
string 169 | execution_path = " → ".join(final_state["execution_path"]) 170 | final_state["metrics"]["execution_path"] = execution_path 171 | 172 | logger.info("=" * 80) 173 | logger.info(f"✅ Workflow completed in {total_time:.2f}ms") 174 | logger.info(f"📊 Execution path: {execution_path}") 175 | 176 | return final_state 177 | 178 | except Exception as e: 179 | logger.error(f"Workflow execution failed: {e}") 180 | return { 181 | "original_query": query, 182 | "final_response": f"Error: {e}", 183 | "execution_path": ["failed"], 184 | "metrics": {"total_latency": (time.perf_counter() - start_time) * 1000}, 185 | } 186 | -------------------------------------------------------------------------------- /progressive_agents/stage4_hybrid_search/agent/setup.py: -------------------------------------------------------------------------------- 1 | """ 2 | Setup and initialization for the Stage 4 ReAct Course Q&A Agent. 3 | 4 | Initializes CourseManager, Redis, and other dependencies. 5 | """ 6 | 7 | import json 8 | import logging 9 | import os 10 | from pathlib import Path 11 | from typing import Optional 12 | 13 | from redis_context_course import CourseManager 14 | from redis_context_course.redis_config import RedisConfig 15 | from redis_context_course.scripts.generate_courses import CourseGenerator 16 | from redis_context_course.scripts.ingest_courses import CourseIngestionPipeline 17 | 18 | # Progressive agents use the hierarchical_courses index 19 | # This index contains only courses with full syllabus data 20 | PROGRESSIVE_AGENTS_INDEX = "hierarchical_courses" 21 | 22 | # Configure logging 23 | logging.basicConfig( 24 | level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s" 25 | ) 26 | 27 | logger = logging.getLogger("course-qa-setup") 28 | 29 | 30 | async def load_courses_if_needed( 31 | course_manager: CourseManager, force_reload: bool = False 32 | ) -> int: 33 | """ 34 | Load courses into Redis if they don't exist or if force_reload is True. 
35 | 36 | Args: 37 | course_manager: CourseManager instance 38 | force_reload: If True, clear existing courses and reload 39 | 40 | Returns: 41 | Number of courses loaded 42 | """ 43 | existing_courses = await course_manager.get_all_courses() 44 | 45 | if existing_courses and not force_reload: 46 | logger.info(f"📚 Found {len(existing_courses)} existing courses in Redis") 47 | return len(existing_courses) 48 | 49 | if not existing_courses: 50 | logger.info("📦 No courses found in Redis. Generating and loading sample courses...") 51 | else: 52 | logger.info("🔄 Force reload requested. Regenerating courses...") 53 | 54 | try: 55 | generator = CourseGenerator() 56 | courses = generator.generate_courses(courses_per_major=10) 57 | 58 | logger.info(f"✅ Generated {len(courses)} sample courses") 59 | 60 | temp_catalog_path = Path("/tmp/course_catalog_temp.json") 61 | catalog_data = { 62 | "majors": [], 63 | "courses": [course.model_dump(mode="json") for course in courses], 64 | } 65 | 66 | with open(temp_catalog_path, "w") as f: 67 | json.dump(catalog_data, f, indent=2, default=str) 68 | 69 | logger.info(f"💾 Saved catalog to {temp_catalog_path}") 70 | 71 | ingestion = CourseIngestionPipeline(config=course_manager._config) 72 | 73 | if force_reload or not existing_courses: 74 | logger.info("🧹 Clearing existing course data...") 75 | ingestion.clear_existing_data() 76 | 77 | catalog_data = ingestion.load_catalog_from_json(str(temp_catalog_path)) 78 | courses_data = catalog_data.get("courses", []) 79 | 80 | ingested_count = await ingestion.ingest_courses(courses_data) 81 | 82 | logger.info(f"✅ Loaded {ingested_count} courses into Redis") 83 | 84 | temp_catalog_path.unlink() 85 | 86 | return ingested_count 87 | 88 | except Exception as e: 89 | logger.error(f"Failed to load courses: {e}") 90 | raise 91 | 92 | 93 | async def initialize_course_manager( 94 | redis_url: Optional[str] = None, auto_load: bool = True 95 | ) -> CourseManager: 96 | """ 97 | Initialize the CourseManager for 
course search. 98 | 99 | Uses the hierarchical_courses index which contains only courses 100 | with full syllabus data (matching hierarchical_courses.json). 101 | 102 | Args: 103 | redis_url: Redis connection URL (defaults to env var REDIS_URL) 104 | auto_load: If True, automatically load courses if they don't exist 105 | 106 | Returns: 107 | Initialized CourseManager instance 108 | """ 109 | if redis_url is None: 110 | redis_url = os.getenv("REDIS_URL", "redis://localhost:6379") 111 | 112 | logger.info(f"Initializing CourseManager with Redis URL: {redis_url}") 113 | logger.info(f"📇 Using index: {PROGRESSIVE_AGENTS_INDEX}") 114 | 115 | try: 116 | # Create config with hierarchical_courses index 117 | config = RedisConfig( 118 | redis_url=redis_url, 119 | vector_index_name=PROGRESSIVE_AGENTS_INDEX 120 | ) 121 | 122 | # Create CourseManager instance with custom config 123 | course_manager = CourseManager(config=config) 124 | 125 | if auto_load: 126 | course_count = await load_courses_if_needed(course_manager) 127 | logger.info(f"✅ CourseManager initialized with {course_count} courses") 128 | else: 129 | all_courses = await course_manager.get_all_courses() 130 | logger.info(f"✅ CourseManager initialized with {len(all_courses)} courses") 131 | 132 | return course_manager 133 | 134 | except Exception as e: 135 | logger.error(f"Failed to initialize CourseManager: {e}") 136 | raise 137 | 138 | 139 | async def cleanup_courses(course_manager: CourseManager): 140 | """Clean up courses from Redis.""" 141 | logger.info("🧹 Cleaning up courses from Redis...") 142 | 143 | try: 144 | ingestion = CourseIngestionPipeline(config=course_manager._config) 145 | ingestion.clear_existing_data() 146 | logger.info("✅ Courses cleaned up successfully") 147 | except Exception as e: 148 | logger.error(f"Failed to cleanup courses: {e}") 149 | 150 | 151 | async def setup_agent(auto_load_courses: bool = True): 152 | """ 153 | Complete setup for the Stage 4 ReAct Course Q&A Agent. 
154 | 155 | Args: 156 | auto_load_courses: If True, automatically load courses if they don't exist 157 | 158 | Returns: 159 | Tuple of (course_manager, None) - no semantic cache for this stage 160 | """ 161 | logger.info("=" * 80) 162 | logger.info("Setting up Stage 4 ReAct Course Q&A Agent") 163 | logger.info("=" * 80) 164 | 165 | course_manager = await initialize_course_manager(auto_load=auto_load_courses) 166 | 167 | logger.info("=" * 80) 168 | logger.info("✅ Stage 4 ReAct Course Q&A Agent setup complete") 169 | logger.info("=" * 80) 170 | 171 | return course_manager, None 172 | 173 | -------------------------------------------------------------------------------- /progressive_agents/stage6_full_memory/agent/state.py: -------------------------------------------------------------------------------- 1 | """ 2 | Workflow state definitions for the ReAct Loop Course Q&A Agent (Stage 6). 3 | 4 | Extends Stage 5 with a reasoning trace for the ReAct (Reasoning + Acting) loop. 5 | """ 6 | 7 | from typing import Any, Dict, List, Optional, TypedDict 8 | 9 | 10 | class WorkflowMetrics(TypedDict): 11 | """Metrics tracking for workflow performance analysis.""" 12 | 13 | total_latency: float 14 | decomposition_latency: float 15 | cache_latency: float 16 | research_latency: float 17 | synthesis_latency: float 18 | memory_load_latency: float # NEW: Time to load working memory 19 | memory_save_latency: float # NEW: Time to save working memory 20 | cache_hit_rate: float 21 | cache_hits_count: int 22 | questions_researched: int 23 | total_research_iterations: int 24 | llm_calls: Dict[str, int] 25 | sub_question_count: int 26 | execution_path: str 27 | 28 | 29 | class WorkflowState(TypedDict): 30 | """ 31 | State for ReAct Loop Course Q&A workflow (Stage 6). 32 | 33 | Extends Stage 5 with a reasoning trace for explicit Thought → Action → Observation.
34 | """ 35 | 36 | # Core query management 37 | original_query: str 38 | sub_questions: List[str] 39 | sub_answers: Dict[str, str] 40 | final_response: Optional[str] 41 | 42 | # NEW: ReAct reasoning trace 43 | reasoning_trace: List[ 44 | Dict[str, Any] 45 | ] # List of {type, thought, action, input, observation} 46 | react_iterations: int # Number of ReAct loop iterations 47 | 48 | # Query intent classification 49 | query_intent: Optional[ 50 | str 51 | ] # "GREETING", "GENERAL", "SYLLABUS_OBJECTIVES", "ASSIGNMENTS", "PREREQUISITES" 52 | 53 | # Named Entity Recognition (NER) - from Stage 4 54 | extracted_entities: Optional[ 55 | Dict[str, Any] 56 | ] # course_codes, course_names, departments, etc. 57 | search_strategy: Optional[str] # "exact_match", "hybrid", "semantic_only" 58 | 59 | # Hybrid search results - from Stage 4 60 | exact_matches: Optional[List[str]] # Course codes that matched exactly 61 | metadata_filters: Optional[Dict[str, Any]] # Filters extracted from query 62 | 63 | # NEW: Working Memory fields 64 | session_id: str # Session identifier for conversation continuity 65 | student_id: str # User identifier 66 | working_memory_loaded: bool # Track if memory was loaded this turn 67 | conversation_history: List[Dict[str, str]] # Previous messages from working memory 68 | current_turn_messages: List[ 69 | Dict[str, str] 70 | ] # Messages from current turn (to be saved) 71 | 72 | # Cache management (granular per sub-question) 73 | # NOTE: Semantic caching is commented out for now - will be added later 74 | cache_hits: Dict[str, bool] 75 | cache_confidences: Dict[str, float] 76 | cache_enabled: bool 77 | 78 | # Research iteration and quality control 79 | research_iterations: Dict[str, int] 80 | max_research_iterations: int 81 | research_quality_scores: Dict[str, float] 82 | research_feedback: Dict[str, str] 83 | current_research_strategy: Dict[str, str] 84 | 85 | # Agent coordination 86 | execution_path: List[str] 87 | active_sub_question: Optional[str] 88 
| 89 | # Metrics and tracking 90 | metrics: WorkflowMetrics 91 | timestamp: str 92 | comparison_mode: bool 93 | llm_calls: Dict[str, int] 94 | 95 | 96 | def initialize_metrics() -> WorkflowMetrics: 97 | """Initialize a clean metrics structure with default values.""" 98 | return { 99 | "total_latency": 0.0, 100 | "decomposition_latency": 0.0, 101 | "cache_latency": 0.0, 102 | "research_latency": 0.0, 103 | "synthesis_latency": 0.0, 104 | "memory_load_latency": 0.0, 105 | "memory_save_latency": 0.0, 106 | "cache_hit_rate": 0.0, 107 | "cache_hits_count": 0, 108 | "questions_researched": 0, 109 | "total_research_iterations": 0, 110 | "llm_calls": {}, 111 | "sub_question_count": 0, 112 | "execution_path": "", 113 | } 114 | 115 | 116 | def initialize_state( 117 | query: str, 118 | session_id: str, 119 | student_id: str, 120 | cache_enabled: bool = False, 121 | max_research_iterations: int = 2, 122 | comparison_mode: bool = False, 123 | ) -> WorkflowState: 124 | """ 125 | Initialize workflow state for a new query. 
126 | 127 | Args: 128 | query: User's question 129 | session_id: Session identifier for conversation continuity 130 | student_id: User identifier 131 | cache_enabled: Whether semantic caching is enabled 132 | max_research_iterations: Maximum research attempts per sub-question 133 | comparison_mode: Whether to run in comparison mode 134 | 135 | Returns: 136 | Initialized WorkflowState 137 | """ 138 | from datetime import datetime 139 | 140 | return { 141 | # Core query 142 | "original_query": query, 143 | "sub_questions": [], 144 | "sub_answers": {}, 145 | "final_response": None, 146 | # Intent and NER 147 | "query_intent": None, 148 | "extracted_entities": None, 149 | "search_strategy": None, 150 | # Hybrid search 151 | "exact_matches": None, 152 | "metadata_filters": None, 153 | # Working memory (NEW) 154 | "session_id": session_id, 155 | "student_id": student_id, 156 | "working_memory_loaded": False, 157 | "conversation_history": [], 158 | "current_turn_messages": [], 159 | # Cache 160 | "cache_hits": {}, 161 | "cache_confidences": {}, 162 | "cache_enabled": cache_enabled, 163 | # Research 164 | "research_iterations": {}, 165 | "max_research_iterations": max_research_iterations, 166 | "research_quality_scores": {}, 167 | "research_feedback": {}, 168 | "current_research_strategy": {}, 169 | # Coordination 170 | "execution_path": [], 171 | "active_sub_question": None, 172 | # ReAct (required by WorkflowState above) 173 | "reasoning_trace": [], 174 | "react_iterations": 0, 175 | # Metrics 176 | "metrics": initialize_metrics(), 177 | "timestamp": datetime.now().isoformat(), 178 | "comparison_mode": comparison_mode, 179 | "llm_calls": {}, 180 | } 181 | -------------------------------------------------------------------------------- /progressive_agents/stage5_working_memory/agent/state.py: -------------------------------------------------------------------------------- 1 | """ 2 | Workflow state definitions for the Memory-Augmented Course Q&A Agent. 3 | 4 | Extends Stage 4 state with working memory fields for multi-turn conversations.
5 | """ 6 | 7 | from typing import Any, Dict, List, Optional, TypedDict 8 | 9 | 10 | class WorkflowMetrics(TypedDict): 11 | """Metrics tracking for workflow performance analysis.""" 12 | 13 | total_latency: float 14 | decomposition_latency: float 15 | cache_latency: float 16 | research_latency: float 17 | synthesis_latency: float 18 | memory_load_latency: float # NEW: Time to load working memory 19 | memory_save_latency: float # NEW: Time to save working memory 20 | cache_hit_rate: float 21 | cache_hits_count: int 22 | questions_researched: int 23 | total_research_iterations: int 24 | llm_calls: Dict[str, int] 25 | sub_question_count: int 26 | execution_path: str 27 | 28 | 29 | class WorkflowState(TypedDict): 30 | """ 31 | State for Memory-Augmented Course Q&A workflow. 32 | 33 | Extends Stage 4 with working memory for multi-turn conversations. 34 | """ 35 | 36 | # Core query management 37 | original_query: str 38 | sub_questions: List[str] 39 | sub_answers: Dict[str, str] 40 | final_response: Optional[str] 41 | 42 | # Query intent classification 43 | query_intent: Optional[ 44 | str 45 | ] # "GREETING", "GENERAL", "SYLLABUS_OBJECTIVES", "ASSIGNMENTS", "PREREQUISITES" 46 | 47 | # Named Entity Recognition (NER) - from Stage 4 48 | extracted_entities: Optional[ 49 | Dict[str, Any] 50 | ] # course_codes, course_names, departments, etc. 
51 | search_strategy: Optional[str] # "exact_match", "hybrid", "semantic_only" 52 | 53 | # Hybrid search results - from Stage 4 54 | exact_matches: Optional[List[str]] # Course codes that matched exactly 55 | metadata_filters: Optional[Dict[str, Any]] # Filters extracted from query 56 | 57 | # NEW: Working Memory fields 58 | session_id: str # Session identifier for conversation continuity 59 | student_id: str # User identifier 60 | working_memory_loaded: bool # Track if memory was loaded this turn 61 | conversation_history: List[Dict[str, str]] # Previous messages from working memory 62 | current_turn_messages: List[ 63 | Dict[str, str] 64 | ] # Messages from current turn (to be saved) 65 | 66 | # Cache management (granular per sub-question) 67 | # NOTE: Semantic caching is commented out for now - will be added later 68 | cache_hits: Dict[str, bool] 69 | cache_confidences: Dict[str, float] 70 | cache_enabled: bool 71 | 72 | # Research iteration and quality control 73 | research_iterations: Dict[str, int] 74 | max_research_iterations: int 75 | research_quality_scores: Dict[str, float] 76 | research_feedback: Dict[str, str] 77 | current_research_strategy: Dict[str, str] 78 | 79 | # Agent coordination 80 | execution_path: List[str] 81 | active_sub_question: Optional[str] 82 | 83 | # ReAct-specific fields 84 | reasoning_trace: List[Dict[str, Any]] # Thought/Action/Observation history 85 | react_iterations: int # Number of ReAct loop iterations 86 | 87 | # Metrics and tracking 88 | metrics: WorkflowMetrics 89 | timestamp: str 90 | comparison_mode: bool 91 | llm_calls: Dict[str, int] 92 | 93 | 94 | def initialize_metrics() -> WorkflowMetrics: 95 | """Initialize a clean metrics structure with default values.""" 96 | return { 97 | "total_latency": 0.0, 98 | "decomposition_latency": 0.0, 99 | "cache_latency": 0.0, 100 | "research_latency": 0.0, 101 | "synthesis_latency": 0.0, 102 | "memory_load_latency": 0.0, 103 | "memory_save_latency": 0.0, 104 | "cache_hit_rate": 0.0, 
105 | "cache_hits_count": 0, 106 | "questions_researched": 0, 107 | "total_research_iterations": 0, 108 | "llm_calls": {}, 109 | "sub_question_count": 0, 110 | "execution_path": "", 111 | } 112 | 113 | 114 | def initialize_state( 115 | query: str, 116 | session_id: str, 117 | student_id: str, 118 | cache_enabled: bool = False, 119 | max_research_iterations: int = 2, 120 | comparison_mode: bool = False, 121 | ) -> WorkflowState: 122 | """ 123 | Initialize workflow state for a new query. 124 | 125 | Args: 126 | query: User's question 127 | session_id: Session identifier for conversation continuity 128 | student_id: User identifier 129 | cache_enabled: Whether semantic caching is enabled 130 | max_research_iterations: Maximum research attempts per sub-question 131 | comparison_mode: Whether to run in comparison mode 132 | 133 | Returns: 134 | Initialized WorkflowState 135 | """ 136 | from datetime import datetime 137 | 138 | return { 139 | # Core query 140 | "original_query": query, 141 | "sub_questions": [], 142 | "sub_answers": {}, 143 | "final_response": None, 144 | # Intent and NER 145 | "query_intent": None, 146 | "extracted_entities": None, 147 | "search_strategy": None, 148 | # Hybrid search 149 | "exact_matches": None, 150 | "metadata_filters": None, 151 | # Working memory (NEW) 152 | "session_id": session_id, 153 | "student_id": student_id, 154 | "working_memory_loaded": False, 155 | "conversation_history": [], 156 | "current_turn_messages": [], 157 | # Cache 158 | "cache_hits": {}, 159 | "cache_confidences": {}, 160 | "cache_enabled": cache_enabled, 161 | # Research 162 | "research_iterations": {}, 163 | "max_research_iterations": max_research_iterations, 164 | "research_quality_scores": {}, 165 | "research_feedback": {}, 166 | "current_research_strategy": {}, 167 | # Coordination 168 | "execution_path": [], 169 | "active_sub_question": None, 170 | # ReAct 171 | "reasoning_trace": [], 172 | "react_iterations": 0, 173 | # Metrics 174 | "metrics": 
initialize_metrics(), 175 | "timestamp": datetime.now().isoformat(), 176 | "comparison_mode": comparison_mode, 177 | "llm_calls": {}, 178 | } 179 | -------------------------------------------------------------------------------- /workshop_boa/redis_context_course_boa/redis_config_boa.py: -------------------------------------------------------------------------------- 1 | """ 2 | Redis configuration and connection management - BOA VERSION 3 | 4 | This is a BOA-specific version that uses Orchestra API for embeddings. 5 | 6 | Key changes from original: 7 | - Uses Orchestra embeddings instead of OpenAI 8 | - Supports placeholder mode for testing 9 | - Clear TODO markers for configuration 10 | """ 11 | 12 | import os 13 | from typing import Optional 14 | 15 | import redis 16 | from redisvl.index import SearchIndex 17 | from redisvl.schema import IndexSchema 18 | 19 | 20 | class RedisConfig: 21 | """Redis configuration management - BOA VERSION.""" 22 | 23 | def __init__( 24 | self, 25 | redis_url: Optional[str] = None, 26 | vector_index_name: str = "course_catalog", 27 | checkpoint_namespace: str = "class_agent", 28 | use_placeholder: bool = False, 29 | ): 30 | """ 31 | Initialize Redis configuration. 
32 | 33 | Args: 34 | redis_url: Redis connection URL 35 | vector_index_name: Name for vector index 36 | checkpoint_namespace: Namespace for checkpoints 37 | use_placeholder: If True, uses OpenAI instead of Orchestra (for testing) 38 | """ 39 | # TODO Orchestra: Update Redis URL if needed 40 | # Default: redis://localhost:6379 41 | # For BOA: Update to your Redis instance 42 | self.redis_url = redis_url or os.getenv("REDIS_URL", "redis://localhost:6379") 43 | 44 | # Allow override via environment variable for progressive agents 45 | self.vector_index_name = os.getenv("COURSE_INDEX_NAME", vector_index_name) 46 | self.checkpoint_namespace = checkpoint_namespace 47 | self.use_placeholder = use_placeholder 48 | 49 | # Initialize connections 50 | self._redis_client = None 51 | self._vector_index = None 52 | self._embeddings = None 53 | 54 | @property 55 | def redis_client(self) -> redis.Redis: 56 | """Get or create Redis client.""" 57 | if self._redis_client is None: 58 | self._redis_client = redis.from_url( 59 | self.redis_url, decode_responses=False 60 | ) 61 | return self._redis_client 62 | 63 | @property 64 | def embeddings(self): 65 | """Get or create embeddings function.""" 66 | if self._embeddings is None: 67 | if self.use_placeholder: 68 | # TODO Orchestra: Using OpenAI as placeholder for testing 69 | print("⚠️ RedisConfig: Using OpenAI embeddings as placeholder") 70 | from langchain_openai import OpenAIEmbeddings 71 | self._embeddings = OpenAIEmbeddings(model="text-embedding-3-small") 72 | else: 73 | # TODO Orchestra: Use Orchestra embeddings 74 | # Import Orchestra utilities 75 | import sys 76 | sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))) 77 | from orchestra_utils import OrchestraEmbeddings 78 | 79 | # TODO Orchestra: Update these parameters as needed 80 | self._embeddings = OrchestraEmbeddings( 81 | model="gpt-4o", # TODO Orchestra: Update model name 82 | user="workshop-user", # TODO Orchestra: Update user identifier 83 | 
data_privacy="confidential", # TODO Orchestra: Update privacy level 84 | residency="on-prem", # TODO Orchestra: Update residency 85 | source_id="workshop-boa" # TODO Orchestra: Update source ID 86 | ) 87 | print("✅ RedisConfig: Using Orchestra embeddings") 88 | 89 | return self._embeddings 90 | 91 | @property 92 | def vector_index(self) -> SearchIndex: 93 | """Get or create vector search index.""" 94 | if self._vector_index is None: 95 | # Define schema for course catalog 96 | schema = IndexSchema.from_dict( 97 | { 98 | "index": { 99 | "name": self.vector_index_name, 100 | "prefix": f"{self.vector_index_name}:", 101 | "storage_type": "hash", 102 | }, 103 | "fields": [ 104 | {"name": "id", "type": "tag"}, 105 | {"name": "course_code", "type": "tag"}, 106 | {"name": "title", "type": "text"}, 107 | {"name": "description", "type": "text"}, 108 | {"name": "department", "type": "tag"}, 109 | {"name": "major", "type": "tag"}, 110 | {"name": "difficulty_level", "type": "tag"}, 111 | {"name": "format", "type": "tag"}, 112 | {"name": "semester", "type": "tag"}, 113 | {"name": "year", "type": "numeric"}, 114 | {"name": "credits", "type": "numeric"}, 115 | {"name": "tags", "type": "tag"}, 116 | { 117 | "name": "content_vector", 118 | "type": "vector", 119 | "attrs": { 120 | "dims": 1536, 121 | "distance_metric": "cosine", 122 | "algorithm": "hnsw", 123 | "datatype": "float32", 124 | }, 125 | }, 126 | ], 127 | } 128 | ) 129 | 130 | self._vector_index = SearchIndex(schema=schema) 131 | self._vector_index.set_client(self.redis_client) 132 | 133 | return self._vector_index 134 | 135 | def create_index(self, overwrite: bool = False): 136 | """Create the vector index in Redis.""" 137 | try: 138 | self.vector_index.create(overwrite=overwrite) 139 | print(f"✅ Created vector index: {self.vector_index_name}") 140 | except Exception as e: 141 | if "Index already exists" in str(e): 142 | print(f"ℹ️ Vector index already exists: {self.vector_index_name}") 143 | else: 144 | raise 145 | 146 | 
147 | # Global instance for convenience 148 | # TODO Orchestra: Set use_placeholder=False when ready for Orchestra API 149 | redis_config = RedisConfig(use_placeholder=True) 150 | 151 | -------------------------------------------------------------------------------- /workshop_boa/03_data_engineering_theory_README.md: -------------------------------------------------------------------------------- 1 | # Data Engineering for Context Systems: Theoretical Foundation 2 | 3 | ## Overview 4 | 5 | This notebook provides a comprehensive theoretical foundation for data engineering in RAG and context systems. It's a companion to `03_data_engineering.ipynb`, focusing on the **why** behind chunking and data modeling decisions rather than just the **how**. 6 | 7 | ## What Makes This Different 8 | 9 | Unlike typical chunking tutorials that jump straight to "use 512 tokens," this notebook teaches you to: 10 | 11 | 1. **Ask the right questions first**: "What is my natural retrieval unit?" 12 | 2. **Understand when NOT to chunk**: Many structured data types are already optimal 13 | 3. **Choose strategies based on data characteristics**: Not one-size-fits-all 14 | 4. **Connect to context engineering principles**: How each decision affects what reaches the LLM 15 | 16 | ## Structure 17 | 18 | ### Part 1: The Foundation - Data Modeling for RAG 19 | - **The critical first question**: What is your natural retrieval unit? 20 | - **The "Don't Chunk" strategy**: When structured records are already optimal 21 | - **Hierarchical patterns**: Summaries + Details architecture 22 | - **Practical example**: Course catalog analysis 23 | 24 | ### Part 2: When Chunking Matters 25 | - **Comparative analysis**: Structured records vs. 
long-form documents 26 | - **Research foundations**: Lost in the Middle, Context Rot, NIAH 27 | - **The retrieval precision problem**: 8-12x reduction in irrelevant context 28 | - **Context engineering impact**: How chunking affects LLM performance 29 | 30 | ### Part 3: Core Chunking Strategies 31 | - **Strategy 1: Document-Based (Structure-Aware)** 32 | - Theory, implementation, trade-offs 33 | - Best for: Research papers, technical docs 34 | - Optimizes for: Semantic completeness 35 | 36 | - **Strategy 2: Fixed-Size (Token-Based)** 37 | - Theory, implementation, trade-offs 38 | - Best for: Unstructured text, consistent sizes 39 | - Optimizes for: Predictability 40 | 41 | - **Strategy 3: Semantic (Meaning-Based)** 42 | - Theory, implementation, trade-offs 43 | - Best for: Dense academic text, adaptive boundaries 44 | - Optimizes for: Topical coherence 45 | 46 | - **Decision framework**: Step-by-step guide for choosing strategies 47 | 48 | ### Part 4: Advanced Topics 49 | - **Multimodal content**: Handling tables, formulas, figures, code 50 | - **Complex documents**: Legal contracts, knowledge graphs, recursive retrieval 51 | - **Troubleshooting**: Common failure patterns and solutions 52 | 53 | ### Part 5: Context Engineering Principles 54 | - **The context engineering stack**: How data decisions cascade through the system 55 | - **Core principles**: Precision over completeness, semantic boundaries, natural units 56 | - **Token efficiency vs. retrieval precision**: Understanding trade-offs 57 | - **Production-ready framework**: Step-by-step process for real systems 58 | 59 | ## Key Concepts 60 | 61 | ### The Critical First Question 62 | > "What is the natural unit of information I want to retrieve?" 63 | 64 | This single question determines whether you need chunking at all. 65 | 66 | ### Context Engineering Principles 67 | 68 | 1. **Precision Over Completeness**: Better to retrieve 500 relevant tokens than 6,000 mixed tokens 69 | 2. 
**Semantic Boundaries Over Arbitrary Boundaries**: Keep tables with captions, formulas with definitions 70 | 3. **Natural Units Over Forced Chunking**: Don't chunk data that's already at optimal granularity 71 | 4. **Structure-Aware Over Structure-Blind**: Respect document organization when it aligns with semantics 72 | 5. **Measure, Don't Assume**: Test strategies on YOUR data with YOUR queries 73 | 74 | ### The Core Insight 75 | 76 | > **Chunking isn't about fitting in context windows - it's about data modeling for retrieval.** 77 | 78 | Just like database schema design, how you structure your knowledge base dramatically affects retrieval quality, token efficiency, and system performance. 79 | 80 | ## Learning Outcomes 81 | 82 | After completing this notebook, you will be able to: 83 | 84 | - ✅ Identify natural retrieval units in your data 85 | - ✅ Decide when to chunk vs. when not to chunk 86 | - ✅ Choose appropriate chunking strategies based on data characteristics 87 | - ✅ Understand how data engineering decisions affect context quality 88 | - ✅ Implement production-ready chunking systems 89 | - ✅ Handle multimodal content (tables, formulas, figures) 90 | - ✅ Troubleshoot common chunking failures 91 | - ✅ Apply context engineering principles to real-world systems 92 | 93 | ## Prerequisites 94 | 95 | - Understanding of vector embeddings and semantic search 96 | - Familiarity with RAG (Retrieval-Augmented Generation) concepts 97 | - Basic knowledge of LLM context windows 98 | - Completed Module 2: RAG Fundamentals (recommended) 99 | 100 | ## Running the Notebook 101 | 102 | 1. Ensure Redis is running with course data loaded 103 | 2. Set environment variables (OPENAI_API_KEY, REDIS_URL) 104 | 3. 
Run cells sequentially to see theory + practice examples 105 | 106 | ## Estimated Time 107 | 108 | 45-60 minutes for complete walkthrough 109 | 110 | ## Related Resources 111 | 112 | - **Practical companion**: `03_data_engineering.ipynb` - Hands-on implementation 113 | - **Research papers**: Lost in the Middle, Context Rot, Contextual Retrieval 114 | - **Tools**: LangChain Text Splitters, RedisVL, HuggingFace Embeddings 115 | 116 | ## Key Takeaways 117 | 118 | 1. **Many structured data types don't need chunking** - they're already at optimal granularity 119 | 2. **Chunking is a design choice**, not a default step - understand your data first 120 | 3. **Different strategies optimize for different goals** - structure-aware for completeness, fixed-size for predictability, semantic for coherence 121 | 4. **Context engineering is about controlling what reaches the LLM** - every data decision affects final quality 122 | 5. **Measure and iterate** - there's no universal "best" chunk size or strategy 123 | 124 | ## Questions This Notebook Answers 125 | 126 | - When should I chunk my data? 127 | - When should I NOT chunk my data? 128 | - What chunk size should I use? (Spoiler: It depends!) 129 | - How do I handle tables and formulas in documents? 130 | - Why does my RAG system return incomplete answers? 131 | - How can I reduce token costs without sacrificing quality? 132 | - What's the difference between chunking strategies? 133 | - How do I choose the right strategy for my use case? 134 | 135 | ## Next Steps 136 | 137 | After mastering data engineering for RAG: 138 | - **Module 4**: Memory Systems for Context Engineering 139 | - **Module 5**: Building Complete Agent Systems 140 | 141 | --- 142 | 143 | **Created**: 2025-12-10 144 | **Purpose**: Theoretical foundation for data engineering in context systems 145 | **Audience**: Engineers building production RAG and agent systems 146 | 147 | --------------------------------------------------------------------------------
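As a companion to the strategy descriptions in Part 3, here is a minimal sketch of "Strategy 2: Fixed-Size (Token-Based)" chunking with overlap. It approximates tokens by whitespace splitting rather than using a real tokenizer such as tiktoken, and the `chunk_size` and `overlap` defaults are illustrative, not recommendations — measure on your own data, as Part 5 advises.

```python
def fixed_size_chunks(text: str, chunk_size: int = 64, overlap: int = 16) -> list[str]:
    """Split text into overlapping fixed-size chunks (sizes in pseudo-tokens).

    Tokens are approximated by whitespace words; swap in a real tokenizer
    for production use. Consecutive chunks share `overlap` tokens so that
    sentences cut at a boundary still appear whole in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break  # last window already covers the end of the text
    return chunks
```

Note how the overlap trades token efficiency for retrieval robustness: each boundary region is embedded twice, which costs storage but prevents the "incomplete answer at a chunk seam" failure pattern discussed in Part 4.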