├── LICENSE
├── README.md
├── contextual_engineering.ipynb
├── requirements.txt
└── utils.py

/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | This project includes code and ideas adapted from:
4 | 
5 | LangChain's context_engineering repository (https://github.com/langchain-ai/context_engineering)
6 | 
7 | Permission is hereby granted, free of charge, to any person obtaining a copy
8 | of this software and associated documentation files (the "Software"), to deal
9 | in the Software without restriction, including without limitation the rights
10 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
11 | copies of the Software, and to permit persons to whom the Software is
12 | furnished to do so, subject to the following conditions:
13 | 
14 | The above copyright notice and this permission notice shall be included in all
15 | copies or substantial portions of the Software.
16 | 
17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
23 | SOFTWARE.
24 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 | # LangChain AI Agents Using Contextual Engineering
3 | 
4 | Context engineering means creating the right setup for an AI before giving it a task. This setup includes:
5 | 
6 | * **Instructions** on how the AI should act, like being a helpful budget travel guide.
7 | * Access to **useful info** from databases, documents, or live sources.
8 | * Remembering **past conversations** to avoid repeats or forgetting.
9 | * **Tools** the AI can use, such as calculators or search features.
10 | * Important details about you, like your **preferences** or location.
11 | 
12 | ![Context Engineering](https://cdn-images-1.medium.com/max/1500/1*sCTOzjG6KP7slQuxLZUtNg.png)
13 | *Context Engineering (From [LangChain](https://blog.langchain.com/context-engineering-for-agents/) and [12Factor](https://github.com/humanlayer/12-factor-agents/tree/main))*
14 | 
15 | [AI engineers are now shifting](https://diamantai.substack.com/p/why-ai-experts-are-moving-from-prompt) from prompt engineering to context engineering because…
16 | 
17 | > context engineering focuses on providing AI with the right background and tools, making its answers smarter and more useful.
18 | 
19 | In this blog, we will explore how **LangChain** and **LangGraph**, two powerful tools for building AI agents, RAG apps, and LLM apps, can be used to implement **contextual engineering** effectively and improve our AI agents.
20 | 
21 | This guide builds on top of the [contextual engineering guide](https://github.com/FareedKhan-dev/contextual-engineering-guide).
22 | 23 | --- 24 | 25 | 26 | ### Table of Contents 27 | - [What is Context Engineering?](#what-is-context-engineering) 28 | - [Scratchpad with LangGraph](#scratchpad-with-langgraph) 29 | - [Creating StateGraph](#creating-stategraph) 30 | - [Memory Writing in LangGraph](#memory-writing-in-langgraph) 31 | - [Scratchpad Selection Approach](#scratchpad-selection-approach) 32 | - [Memory Selection Ability](#memory-selection-ability) 33 | - [Advantage of LangGraph BigTool Calling](#advantage-of-langgraph-bigtool-calling) 34 | - [RAG with Contextual Engineering](#rag-with-contextual-engineering) 35 | - [Compression Strategy with knowledgeable Agents](#compression-strategy-with-knowledgeable-agents) 36 | - [Isolating Context using Sub-Agents Architecture](#isolating-context-using-sub-agents-architecture) 37 | - [Isolation using Sandboxed Environments](#isolation-using-sandboxed-environments) 38 | - [State Isolation in LangGraph](#state-isolation-in-langgraph) 39 | - [Summarizing Everything](#summarizing-everything) 40 | 41 | ### What is Context Engineering? 42 | LLMs work like a new type of operating system. The LLM acts like the CPU, and its context window works like RAM, serving as its short-term memory. But, like RAM, the context window has limited space for different information. 43 | 44 | > Just as an operating system decides what goes into RAM, “context engineering” is about choosing what the LLM should keep in its context. 45 | 46 | ![Different Context Types](https://cdn-images-1.medium.com/max/1000/1*kMEQSslFkhLiuJS8-WEMIg.png) 47 | 48 | When building LLM applications, we need to manage different types of context. Context engineering covers these main types: 49 | 50 | * Instructions: prompts, examples, memories, and tool descriptions 51 | * Knowledge: facts, stored information, and memories 52 | * Tools: feedback and results from tool calls 53 | 54 | This year, more people are interested in agents because LLMs are better at thinking and using tools. Agents work on long tasks by using LLMs and tools together, choosing the next step based on the tool’s feedback. 55 | 56 | ![Agent Workflow](https://cdn-images-1.medium.com/max/1500/1*Do44CZkpPYyIJefuNQ69GA.png) 57 | 58 | But long tasks and collecting too much feedback from tools use a lot of tokens. This can create problems: the context window can overflow, costs and delays can increase, and the agent might work worse. 
59 | 60 | Drew Breunig explained how too much context can hurt performance, including: 61 | 62 | * Context Poisoning: [when a mistake or hallucination gets added to the context](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-poisoning) 63 | * Context Distraction: [when too much context confuses the model](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-distraction) 64 | * Context Confusion: [when extra, unnecessary details affect the answer](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-confusion) 65 | * Context Clash: [when parts of the context give conflicting information](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-clash) 66 | 67 | ![Multiple turns in Agent](https://cdn-images-1.medium.com/max/1500/1*ZJeZJPKI5jC_1BMCoghZxA.png) 68 | 69 | Anthropic [in their research](https://www.anthropic.com/engineering/built-multi-agent-research-system?ref=blog.langchain.com) stressed the need for it: 70 | 71 | > Agents often have conversations with hundreds of turns, so managing context carefully is crucial. 72 | 73 | So, how are people solving this problem today? Common strategies for agent context engineering can be grouped into four main types: 74 | 75 | * Write: creating clear and useful context 76 | * Select: picking only the most relevant information 77 | * Compress: shortening context to save space 78 | * Isolate: keeping different types of context separate 79 | 80 | ![Categories of Context Engineering](https://cdn-images-1.medium.com/max/2600/1*CacnXVAI6wR4eSIWgnZ9sg.png) 81 | *Categories of Context Engineering (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))* 82 | 83 | [LangGraph](https://www.langchain.com/langgraph) is built to support all these strategies. We will go through each of these components one by one in [LangGraph](https://www.langchain.com/langgraph) and see how they help make our AI agents work better. 84 | 85 | ### Scratchpad with LangGraph 86 | Just like humans take notes to remember things for later tasks, agents can do the same using a [scratchpad](https://www.anthropic.com/engineering/claude-think-tool). It stores information outside the context window so the agent can access it whenever needed. 87 | 88 | ![First Component of CE](https://cdn-images-1.medium.com/max/1000/1*aXpKxYt03iZPcrGkxsFvrQ.png) 89 | *First Component of CE (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))* 90 | 91 | A good example is [Anthropic multi-agent researcher](https://www.anthropic.com/engineering/built-multi-agent-research-system): 92 | 93 | > *The LeadResearcher plans its approach and saves it to memory, because if the context window goes beyond 200,000 tokens, it gets cut off so saving the plan ensures it isn’t lost.* 94 | 95 | Scratchpads can be implemented in different ways: 96 | 97 | * As a [tool call](https://www.anthropic.com/engineering/claude-think-tool) that [writes to a file](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem). 98 | * As a field in a runtime [state object](https://langchain-ai.github.io/langgraph/concepts/low_level/#state) that persists during the session. 99 | 100 | In short, scratchpads help agents keep important notes during a session to complete tasks effectively. 
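To make the first option concrete, here is a minimal sketch (not from the original guide) of a file-backed scratchpad exposed as two LangChain tools. The file path, tool names, and wording are illustrative assumptions; any tool-calling agent could use something like this to jot down notes and read them back later.

```python
# A minimal file-backed scratchpad, sketched as two LangChain tools.
# Assumption: the agent process can read and write "scratchpad.md" in its working directory.
from pathlib import Path

from langchain_core.tools import tool

SCRATCHPAD_PATH = Path("scratchpad.md")  # hypothetical note file


@tool
def write_scratchpad(note: str) -> str:
    """Append a note to the scratchpad so it persists outside the context window."""
    with SCRATCHPAD_PATH.open("a", encoding="utf-8") as f:
        f.write(note.rstrip() + "\n")
    return f"Saved note to {SCRATCHPAD_PATH.name}"


@tool
def read_scratchpad() -> str:
    """Return everything written to the scratchpad so far."""
    if not SCRATCHPAD_PATH.exists():
        return "Scratchpad is empty."
    return SCRATCHPAD_PATH.read_text(encoding="utf-8")
```

An agent given these tools decides for itself when to write down a plan and when to re-read it; that is the tool-call flavor of a scratchpad. The state-object flavor is what the rest of this section builds with LangGraph.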
101 | 102 | In terms of LangGraph, it supports both [short-term](https://langchain-ai.github.io/langgraph/concepts/memory/#short-term-memory) (thread-scoped) and [long-term memory](https://langchain-ai.github.io/langgraph/concepts/memory/#long-term-memory). 103 | 104 | * Short-term memory uses [checkpointing](https://langchain-ai.github.io/langgraph/concepts/persistence/) to save the [agent state](https://langchain-ai.github.io/langgraph/concepts/low_level/#state) during a session. It works like a scratchpad, letting you store information while the agent runs and retrieve it later. 105 | 106 | The state object is the main structure passed between graph nodes. You can define its format (usually a Python dictionary). It acts as a shared scratchpad, where each node can read and update specific fields. 107 | 108 | > We will only import the modules when we need them, so we can learn step by step in a clear way. 109 | 110 | For better and cleaner output, we will use Python `pprint` module for pretty printing and the `Console` module from the `rich` library. Let’s import and initialize them first: 111 | 112 | ```python 113 | # Import necessary libraries 114 | from typing import TypedDict # For defining the state schema with type hints 115 | 116 | from rich.console import Console # For pretty-printing output 117 | from rich.pretty import pprint # For pretty-printing Python objects 118 | 119 | # Initialize a console for rich, formatted output in the notebook. 120 | console = Console() 121 | ``` 122 | 123 | Next, we will create a `TypedDict` for the state object. 124 | 125 | ```python 126 | # Define the schema for the graph's state using TypedDict. 127 | # This class acts as a data structure that will be passed between nodes in the graph. 128 | # It ensures that the state has a consistent shape and provides type hints. 129 | class State(TypedDict): 130 | """ 131 | Defines the structure of the state for our joke generator workflow. 132 | 133 | Attributes: 134 | topic: The input topic for which a joke will be generated. 135 | joke: The output field where the generated joke will be stored. 136 | """ 137 | 138 | topic: str 139 | joke: str 140 | ``` 141 | 142 | This state object will store the topic and the joke that we ask our agent to generate based on the given topic. 143 | 144 | ### Creating StateGraph 145 | Once we define a state object, we can write context to it using a [StateGraph](https://langchain-ai.github.io/langgraph/concepts/low_level/#stategraph). 146 | 147 | A StateGraph is LangGraph’s main tool for building stateful [agents or workflows](https://langchain-ai.github.io/langgraph/concepts/workflows/). Think of it as a directed graph: 148 | 149 | * Nodes are steps in the workflow. Each node takes the current state as input, updates it, and returns the changes. 150 | * Edges connect nodes, defining how execution flows this can be linear, conditional, or even cyclical. 151 | 152 | Next, we will: 153 | 154 | 1. Create a [chat model](https://python.langchain.com/api_reference/langchain/chat_models/langchain.chat_models.base.init_chat_model.html) by choosing from [Anthropic models](https://docs.anthropic.com/en/docs/about-claude/models/overview). 155 | 2. Use it in a LangGraph workflow. 
156 | 157 | ```python 158 | # Import necessary libraries for environment management, display, and LangGraph 159 | import getpass 160 | import os 161 | 162 | from IPython.display import Image, display 163 | from langchain.chat_models import init_chat_model 164 | from langgraph.graph import END, START, StateGraph 165 | 166 | # --- Environment and Model Setup --- 167 | # Set the Anthropic API key to authenticate requests 168 | from dotenv import load_dotenv 169 | api_key = os.getenv("ANTHROPIC_API_KEY") 170 | if not api_key: 171 | raise ValueError("Missing ANTHROPIC_API_KEY in environment") 172 | 173 | # Initialize the chat model to be used in the workflow 174 | # We use a specific Claude model with temperature=0 for deterministic outputs 175 | llm = init_chat_model("anthropic:claude-sonnet-4-20250514", temperature=0) 176 | ``` 177 | We’ve initialized our Sonnet model. LangChain supports many open-source and closed models through their APIs, so you can use any of them. 178 | 179 | Now, we need to create a function that generates a response using this Sonnet model. 180 | ```python 181 | # --- Define Workflow Node --- 182 | def generate_joke(state: State) -> dict[str, str]: 183 | """ 184 | A node function that generates a joke based on the topic in the current state. 185 | 186 | This function reads the 'topic' from the state, uses the LLM to generate a joke, 187 | and returns a dictionary to update the 'joke' field in the state. 188 | 189 | Args: 190 | state: The current state of the graph, which must contain a 'topic'. 191 | 192 | Returns: 193 | A dictionary with the 'joke' key to update the state. 194 | """ 195 | # Read the topic from the state 196 | topic = state["topic"] 197 | print(f"Generating a joke about: {topic}") 198 | 199 | # Invoke the language model to generate a joke 200 | msg = llm.invoke(f"Write a short joke about {topic}") 201 | 202 | # Return the generated joke to be written back to the state 203 | return {"joke": msg.content} 204 | ``` 205 | This function simply returns a dictionary containing the generated response (the joke). 206 | 207 | Now, using the StateGraph, we can easily build and compile the graph. Let’s do that next. 208 | ```python 209 | # --- Build and Compile the Graph --- 210 | # Initialize a new StateGraph with the predefined State schema 211 | workflow = StateGraph(State) 212 | 213 | # Add the 'generate_joke' function as a node in the graph 214 | workflow.add_node("generate_joke", generate_joke) 215 | 216 | # Define the workflow's execution path: 217 | # The graph starts at the START entrypoint and flows to our 'generate_joke' node. 218 | workflow.add_edge(START, "generate_joke") 219 | # After 'generate_joke' completes, the graph execution ends. 220 | workflow.add_edge("generate_joke", END) 221 | 222 | # Compile the workflow into an executable chain 223 | chain = workflow.compile() 224 | 225 | # --- Visualize the Graph --- 226 | # Display a visual representation of the compiled workflow graph 227 | display(Image(chain.get_graph().draw_mermaid_png())) 228 | ``` 229 | ![Our Generated Graph](https://cdn-images-1.medium.com/max/1000/1*SxWwYN-oO_rG9xUFgeuB-A.png) 230 | 231 | Now we can execute this workflow. 232 | ```python 233 | # --- Execute the Workflow --- 234 | # Invoke the compiled graph with an initial state containing the topic. 235 | # The `invoke` method runs the graph from the START node to the END node. 
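# (Aside, not from the original guide) Besides `invoke`, compiled graphs also expose `stream()`,
# which yields incremental updates as each node finishes instead of only the final state, e.g.:
#     for update in chain.stream({"topic": "cats"}):
#         print(update)   # e.g. {"generate_joke": {"joke": "..."}}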
236 | joke_generator_state = chain.invoke({"topic": "cats"}) 237 | 238 | # --- Display the Final State --- 239 | # Print the final state of the graph after execution. 240 | # This will show both the input 'topic' and the output 'joke' that was written to the state. 241 | console.print("\n[bold blue]Joke Generator State:[/bold blue]") 242 | pprint(joke_generator_state) 243 | 244 | #### OUTPUT #### 245 | { 246 | 'topic': 'cats', 247 | 'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!' 248 | } 249 | ``` 250 | It returns the dictionary which is basically the joke generation state of our agent. This simple example shows how we can write context to state. 251 | 252 | > You can learn more about [Checkpointing](https://langchain-ai.github.io/langgraph/concepts/persistence/) for saving and resuming graph states, and [Human-in-the-loop](https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/) for pausing workflows to get human input before continuing. 253 | 254 | ### Memory Writing in LangGraph 255 | Scratchpads help agents work within a single session, but sometimes agents need to remember things across multiple sessions. 256 | 257 | * [Reflexion](https://arxiv.org/abs/2303.11366) introduced the idea of agents reflecting after each turn and reusing self-generated hints. 258 | * [Generative Agents](https://ar5iv.labs.arxiv.org/html/2304.03442) created long-term memories by summarizing past agent feedback. 259 | 260 | ![Memory Writing](https://cdn-images-1.medium.com/max/1000/1*VaMVevdSVxDITLK1j0LfRQ.png) 261 | *Memory Writing (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))* 262 | 263 | These ideas are now used in products like [ChatGPT](https://help.openai.com/en/articles/8590148-memory-faq), [Cursor](https://forum.cursor.com/t/0-51-memories-feature/98509), and [Windsurf](https://docs.windsurf.com/windsurf/cascade/memories), which automatically create long-term memories from user interactions. 264 | 265 | * Checkpointing saves the graph’s state at each step in a [thread](https://langchain-ai.github.io/langgraph/concepts/persistence/). A thread has a unique ID and usually represents one interaction — like a single chat in ChatGPT. 266 | * Long-term memory lets you keep specific context across threads. You can save [individual files](https://langchain-ai.github.io/langgraph/concepts/memory/#profile) (e.g., a user profile) or [collections](https://langchain-ai.github.io/langgraph/concepts/memory/#collection) of memories. 267 | * It uses the [BaseStore](https://langchain-ai.github.io/langgraph/reference/store/) interface, a key-value store. You can use it in memory (as shown here) or with [LangGraph Platform deployments](https://langchain-ai.github.io/langgraph/concepts/persistence/#langgraph-platform). 268 | 269 | Let’s now create an `InMemoryStore` to use across multiple sessions in this notebook. 270 | 271 | ```python 272 | from langgraph.store.memory import InMemoryStore 273 | 274 | # --- Initialize Long-Term Memory Store --- 275 | # Create an instance of InMemoryStore, which provides a simple, non-persistent, 276 | # key-value storage system for use within the current session. 277 | store = InMemoryStore() 278 | 279 | # --- Define a Namespace for Organization --- 280 | # A namespace is used to logically group related data within the store. 281 | # Here, we use a tuple to represent a hierarchical namespace, 282 | # which could correspond to a user ID and an application context. 
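# For example, a per-user layout could be ("user_123", "joke_generator"), where "user_123" is a
# hypothetical user ID; any tuple of strings is a valid namespace.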
283 | namespace = ("rlm", "joke_generator") 284 | 285 | # --- Write Data to the Memory Store --- 286 | # Use the `put` method to save a key-value pair into the specified namespace. 287 | # This operation persists the joke generated in the previous step, making it 288 | # available for retrieval across different sessions or threads. 289 | store.put( 290 | namespace, # The namespace to write to 291 | "last_joke", # The key for the data entry 292 | {"joke": joke_generator_state["joke"]}, # The value to be stored 293 | ) 294 | ``` 295 | We’ll discuss how to select context from a namespace in the upcoming section. For now, we can use the [search](https://langchain-ai.github.io/langgraph/reference/store/#langgraph.store.base.BaseStore.search) method to view items within a namespace and confirm that we successfully wrote to it. 296 | ```python 297 | # Search the namespace to view all stored items 298 | stored_items = list(store.search(namespace)) 299 | 300 | # Display the stored items with rich formatting 301 | console.print("\n[bold green]Stored Items in Memory:[/bold green]") 302 | pprint(stored_items) 303 | 304 | #### OUTPUT #### 305 | [ 306 | Item(namespace=['rlm', 'joke_generator'], key='last_joke', 307 | value={'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'}, 308 | created_at='2025-07-24T02:12:25.936238+00:00', 309 | updated_at='2025-07-24T02:12:25.936238+00:00', score=None) 310 | ] 311 | ``` 312 | Now, let’s embed everything we did into a LangGraph workflow. 313 | 314 | We will compile the workflow with two arguments: 315 | 316 | * `checkpointer` saves the graph state at each step in a thread. 317 | * `store` keeps context across different threads. 318 | 319 | ```python 320 | from langgraph.checkpoint.memory import InMemorySaver 321 | from langgraph.store.base import BaseStore 322 | from langgraph.store.memory import InMemoryStore 323 | 324 | # Initialize storage components 325 | checkpointer = InMemorySaver() # For thread-level state persistence 326 | memory_store = InMemoryStore() # For cross-thread memory storage 327 | 328 | 329 | def generate_joke(state: State, store: BaseStore) -> dict[str, str]: 330 | """Generate a joke with memory awareness. 331 | 332 | This enhanced version checks for existing jokes in memory 333 | before generating new ones. 
334 | 335 | Args: 336 | state: Current state containing the topic 337 | store: Memory store for persistent context 338 | 339 | Returns: 340 | Dictionary with the generated joke 341 | """ 342 | # Check if there's an existing joke in memory 343 | existing_jokes = list(store.search(namespace)) 344 | if existing_jokes: 345 | existing_joke = existing_jokes[0].value 346 | print(f"Existing joke: {existing_joke}") 347 | else: 348 | print("Existing joke: No existing joke") 349 | 350 | # Generate a new joke based on the topic 351 | msg = llm.invoke(f"Write a short joke about {state['topic']}") 352 | 353 | # Store the new joke in long-term memory 354 | store.put(namespace, "last_joke", {"joke": msg.content}) 355 | 356 | # Return the joke to be added to state 357 | return {"joke": msg.content} 358 | 359 | 360 | # Build the workflow with memory capabilities 361 | workflow = StateGraph(State) 362 | 363 | # Add the memory-aware joke generation node 364 | workflow.add_node("generate_joke", generate_joke) 365 | 366 | # Connect the workflow components 367 | workflow.add_edge(START, "generate_joke") 368 | workflow.add_edge("generate_joke", END) 369 | 370 | # Compile with both checkpointing and memory store 371 | chain = workflow.compile(checkpointer=checkpointer, store=memory_store) 372 | ``` 373 | Great! Now we can simply execute the updated workflow and test how it works with the memory feature enabled. 374 | ```python 375 | # Execute the workflow with thread-based configuration 376 | config = {"configurable": {"thread_id": "1"}} 377 | joke_generator_state = chain.invoke({"topic": "cats"}, config) 378 | 379 | # Display the workflow result with rich formatting 380 | console.print("\n[bold cyan]Workflow Result (Thread 1):[/bold cyan]") 381 | pprint(joke_generator_state) 382 | 383 | #### OUTPUT #### 384 | Existing joke: No existing joke 385 | 386 | Workflow Result (Thread 1): 387 | { 'topic': 'cats', 388 | 'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'} 389 | ``` 390 | Since this is thread 1, there’s no existing joke stored in our AI agent’s memory which is exactly what we’d expect for a fresh thread. 391 | 392 | Because we compiled the workflow with a checkpointer, we can now view the [latest state](https://langchain-ai.github.io/langgraph/concepts/persistence/#get-state) of the graph. 393 | ```python 394 | # --- Retrieve and Inspect the Graph State --- 395 | # Use the `get_state` method to retrieve the latest state snapshot for the 396 | # thread specified in the `config` (in this case, thread "1"). This is 397 | # possible because we compiled the graph with a checkpointer. 398 | latest_state = chain.get_state(config) 399 | 400 | # --- Display the State Snapshot --- 401 | # Print the retrieved state to the console. The StateSnapshot includes not only 402 | # the data ('topic', 'joke') but also execution metadata. 403 | console.print("\n[bold magenta]Latest Graph State (Thread 1):[/bold magenta]") 404 | pprint(latest_state) 405 | ``` 406 | Take a look at the output: 407 | ``` 408 | ### OUTPUT OF OUR LATEST STATE ### 409 | Latest Graph State: 410 | 411 | StateSnapshot( 412 | values={ 413 | 'topic': 'cats', 414 | 'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!' 
415 | }, 416 | next=(), 417 | config={ 418 | 'configurable': { 419 | 'thread_id': '1', 420 | 'checkpoint_ns': '', 421 | 'checkpoint_id': '1f06833a-53a7-65a8-8001-548e412001c4' 422 | } 423 | }, 424 | metadata={'source': 'loop', 'step': 1, 'parents': {}}, 425 | created_at='2025-07-24T02:12:27.317802+00:00', 426 | parent_config={ 427 | 'configurable': { 428 | 'thread_id': '1', 429 | 'checkpoint_ns': '', 430 | 'checkpoint_id': '1f06833a-4a50-6108-8000-245cde0c2411' 431 | } 432 | }, 433 | tasks=(), 434 | interrupts=() 435 | ) 436 | ``` 437 | You can see that our state now shows the last conversation we had with the agent in this case, where we asked it to tell a joke about cats. 438 | 439 | Let’s rerun the workflow with different ID. 440 | ```python 441 | # Execute the workflow with a different thread ID 442 | config = {"configurable": {"thread_id": "2"}} 443 | joke_generator_state = chain.invoke({"topic": "cats"}, config) 444 | 445 | # Display the result showing memory persistence across threads 446 | console.print("\n[bold yellow]Workflow Result (Thread 2):[/bold yellow]") 447 | pprint(joke_generator_state) 448 | 449 | #### OUTPUT #### 450 | Existing joke: {'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'} 451 | Workflow Result (Thread 2): 452 | {'topic': 'cats', 'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'} 453 | ``` 454 | We can see that the joke from the first thread has been successfully saved to memory. 455 | 456 | > You can learn more about [LangMem](https://langchain-ai.github.io/langmem/) for memory abstractions and the [Ambient Agents Course](https://github.com/langchain-ai/agents-from-scratch/blob/main/notebooks/memory.ipynb) for an overview of memory in LangGraph agents. 457 | 458 | ### Scratchpad Selection Approach 459 | How you select context from a scratchpad depends on its implementation: 460 | 461 | * If it’s a [tool](https://www.anthropic.com/engineering/claude-think-tool), the agent can read it directly by making a tool call. 462 | * If it’s part of the agent’s runtime state, you (the developer) decide which parts of the state to share with the agent at each step. This gives you fine-grained control over what context is exposed. 463 | 464 | ![Second Component of CE](https://cdn-images-1.medium.com/max/1000/1*VZiHtQ_8AlNdV3HIMrbBZA.png) 465 | *Second Component of CE (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))* 466 | 467 | In previous step, we learned how to write to the LangGraph state object. Now, we’ll learn how to select context from the state and pass it to an LLM call in a downstream node. 468 | 469 | This selective approach lets you control exactly what context the LLM sees during execution. 470 | ```python 471 | def generate_joke(state: State) -> dict[str, str]: 472 | """Generate an initial joke about the topic. 473 | 474 | Args: 475 | state: Current state containing the topic 476 | 477 | Returns: 478 | Dictionary with the generated joke 479 | """ 480 | msg = llm.invoke(f"Write a short joke about {state['topic']}") 481 | return {"joke": msg.content} 482 | 483 | 484 | def improve_joke(state: State) -> dict[str, str]: 485 | """Improve an existing joke by adding wordplay. 486 | 487 | This demonstrates selecting context from state - we read the existing 488 | joke from state and use it to generate an improved version. 
489 | 490 | Args: 491 | state: Current state containing the original joke 492 | 493 | Returns: 494 | Dictionary with the improved joke 495 | """ 496 | print(f"Initial joke: {state['joke']}") 497 | 498 | # Select the joke from state to present it to the LLM 499 | msg = llm.invoke(f"Make this joke funnier by adding wordplay: {state['joke']}") 500 | return {"improved_joke": msg.content} 501 | ``` 502 | To make things a bit more complex, we’re now adding two workflows to our agent: 503 | 504 | 1. Generate Joke same as before. 505 | 2. Improve Joke takes the generated joke and makes it better. 506 | 507 | This setup will help us understand how scratchpad selection works in LangGraph. Let’s now compile this workflow the same way we did earlier and check how our graph looks. 508 | ```python 509 | # Build the workflow with two sequential nodes 510 | workflow = StateGraph(State) 511 | 512 | # Add both joke generation nodes 513 | workflow.add_node("generate_joke", generate_joke) 514 | workflow.add_node("improve_joke", improve_joke) 515 | 516 | # Connect nodes in sequence 517 | workflow.add_edge(START, "generate_joke") 518 | workflow.add_edge("generate_joke", "improve_joke") 519 | workflow.add_edge("improve_joke", END) 520 | 521 | # Compile the workflow 522 | chain = workflow.compile() 523 | 524 | # Display the workflow visualization 525 | display(Image(chain.get_graph().draw_mermaid_png())) 526 | ``` 527 | ![Our Generated Graph](https://cdn-images-1.medium.com/max/1000/1*XU_CMOwwboMYcK6lw3HjrA.png) 528 | 529 | When we execute this workflow, this is what we get. 530 | ```python 531 | # Execute the workflow to see context selection in action 532 | joke_generator_state = chain.invoke({"topic": "cats"}) 533 | 534 | # Display the final state with rich formatting 535 | console.print("\n[bold blue]Final Workflow State:[/bold blue]") 536 | pprint(joke_generator_state) 537 | 538 | #### OUTPUT #### 539 | Initial joke: Why did the cat join a band? 540 | 541 | Because it wanted to be the purr-cussionist! 542 | Final Workflow State: 543 | { 544 | 'topic': 'cats', 545 | 'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'} 546 | ``` 547 | Now that we have executed our workflow, we can move on to using it in our memory selection step. 548 | 549 | ### Memory Selection Ability 550 | If agents can save memories, they also need to select relevant memories for the task at hand. This is useful for: 551 | 552 | * [Episodic memories](https://langchain-ai.github.io/langgraph/concepts/memory/#memory-types) few-shot examples showing desired behavior. 553 | * [Procedural memories](https://langchain-ai.github.io/langgraph/concepts/memory/#memory-types) instructions to guide behavior. 554 | * [Semantic memories](https://langchain-ai.github.io/langgraph/concepts/memory/#memory-types) facts or relationships that provide task-relevant context. 555 | 556 | Some agents use narrow, predefined files to store memories: 557 | 558 | * Claude Code uses [`CLAUDE.md`](http://claude.md/). 559 | * [Cursor](https://docs.cursor.com/context/rules) and [Windsurf](https://windsurf.com/editor/directory) use “rules” files for instructions or examples. 560 | 561 | But when storing a large [collection](https://langchain-ai.github.io/langgraph/concepts/memory/#collection) of facts (semantic memories), selection gets harder. 
562 | 563 | * [ChatGPT](https://help.openai.com/en/articles/8590148-memory-faq) sometimes retrieves irrelevant memories, as shown by [Simon Willison](https://simonwillison.net/2025/Jun/6/six-months-in-llms/) when ChatGPT wrongly fetched his location and injected it into an image making the context feel like it “no longer belonged to him”. 564 | * To improve selection, embeddings or [knowledge graphs](https://neo4j.com/blog/developer/graphiti-knowledge-graph-memory/#:~:text=changes%20since%20updates%20can%20trigger,and%20holistic%20memory%20for%20agentic) are used for indexing. 565 | 566 | In our previous section, we wrote to the `InMemoryStore` in graph nodes. Now, we can select context from it using the [get](https://langchain-ai.github.io/langgraph/concepts/memory/#memory-storage) method to pull relevant state into our workflow. 567 | 568 | ```python 569 | from langgraph.store.memory import InMemoryStore 570 | 571 | # Initialize the memory store 572 | store = InMemoryStore() 573 | 574 | # Define namespace for organizing memories 575 | namespace = ("rlm", "joke_generator") 576 | 577 | # Store the generated joke in memory 578 | store.put( 579 | namespace, # namespace for organization 580 | "last_joke", # key identifier 581 | {"joke": joke_generator_state["joke"]} # value to store 582 | ) 583 | 584 | # Select (retrieve) the joke from memory 585 | retrieved_joke = store.get(namespace, "last_joke").value 586 | 587 | # Display the retrieved context 588 | console.print("\n[bold green]Retrieved Context from Memory:[/bold green]") 589 | pprint(retrieved_joke) 590 | 591 | #### OUTPUT #### 592 | Retrieved Context from Memory: 593 | {'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'} 594 | ``` 595 | It successfully retrieves the correct joke from memory. 596 | 597 | Now, we need to write a proper `generate_joke` function that can: 598 | 599 | 1. Take the current state (for the scratchpad context). 600 | 2. Use memory (to fetch past jokes if we’re performing a joke improvement task). 601 | 602 | Let’s code that next. 603 | ```python 604 | # Initialize storage components 605 | checkpointer = InMemorySaver() 606 | memory_store = InMemoryStore() 607 | 608 | def generate_joke(state: State, store: BaseStore) -> dict[str, str]: 609 | """Generate a joke with memory-aware context selection. 610 | 611 | This function demonstrates selecting context from memory before 612 | generating new content, ensuring consistency and avoiding duplication. 613 | 614 | Args: 615 | state: Current state containing the topic 616 | store: Memory store for persistent context 617 | 618 | Returns: 619 | Dictionary with the generated joke 620 | """ 621 | # Select prior joke from memory if it exists 622 | prior_joke = store.get(namespace, "last_joke") 623 | if prior_joke: 624 | prior_joke_text = prior_joke.value["joke"] 625 | print(f"Prior joke: {prior_joke_text}") 626 | else: 627 | print("Prior joke: None!") 628 | 629 | # Generate a new joke that differs from the prior one 630 | prompt = ( 631 | f"Write a short joke about {state['topic']}, " 632 | f"but make it different from any prior joke you've written: {prior_joke_text if prior_joke else 'None'}" 633 | ) 634 | msg = llm.invoke(prompt) 635 | 636 | # Store the new joke in memory for future context selection 637 | store.put(namespace, "last_joke", {"joke": msg.content}) 638 | 639 | return {"joke": msg.content} 640 | ``` 641 | We can now simply execute this memory-aware workflow the same way we did earlier. 
642 | ```python 643 | # Build the memory-aware workflow 644 | workflow = StateGraph(State) 645 | workflow.add_node("generate_joke", generate_joke) 646 | 647 | # Connect the workflow 648 | workflow.add_edge(START, "generate_joke") 649 | workflow.add_edge("generate_joke", END) 650 | 651 | # Compile with both checkpointing and memory store 652 | chain = workflow.compile(checkpointer=checkpointer, store=memory_store) 653 | 654 | # Execute the workflow with the first thread 655 | config = {"configurable": {"thread_id": "1"}} 656 | joke_generator_state = chain.invoke({"topic": "cats"}, config) 657 | 658 | #### OUTPUT #### 659 | Prior joke: None! 660 | ``` 661 | No prior joke is detected, We can now print the latest state structure. 662 | ```python 663 | # Get the latest state of the graph 664 | latest_state = chain.get_state(config) 665 | 666 | console.print("\n[bold magenta]Latest Graph State:[/bold magenta]") 667 | pprint(latest_state) 668 | ``` 669 | Our output: 670 | ``` 671 | #### OUTPUT OF LATEST STATE #### 672 | StateSnapshot( 673 | values={ 674 | 'topic': 'cats', 675 | 'joke': "Here's a new one:\n\nWhy did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!" 676 | }, 677 | next=(), 678 | config={ 679 | 'configurable': { 680 | 'thread_id': '1', 681 | 'checkpoint_ns': '', 682 | 'checkpoint_id': '1f068357-cc8d-68cb-8001-31f64daf7bb6' 683 | } 684 | }, 685 | metadata={'source': 'loop', 'step': 1, 'parents': {}}, 686 | created_at='2025-07-24T02:25:38.457825+00:00', 687 | parent_config={ 688 | 'configurable': { 689 | 'thread_id': '1', 690 | 'checkpoint_ns': '', 691 | 'checkpoint_id': '1f068357-c459-6deb-8000-16ce383a5b6b' 692 | } 693 | }, 694 | tasks=(), 695 | interrupts=() 696 | ) 697 | ``` 698 | We fetch the previous joke from memory and pass it to the LLM to improve it. 699 | ```python 700 | # Execute the workflow with a second thread to demonstrate memory persistence 701 | config = {"configurable": {"thread_id": "2"}} 702 | joke_generator_state = chain.invoke({"topic": "cats"}, config) 703 | 704 | 705 | #### OUTPUT #### 706 | Prior joke: Here is a new one: 707 | Why did the cat join a band? 708 | Because it wanted to be the purr-cussionist! 709 | ``` 710 | It has successfully **fetched the correct joke from memory** and **improved it** as expected. 711 | 712 | ### Advantage of LangGraph BigTool Calling 713 | Agents use tools, but giving them too many tools can cause confusion, especially when tool descriptions overlap. This makes it harder for the model to choose the right tool. 714 | 715 | A solution is to use RAG (Retrieval-Augmented Generation) on tool descriptions to fetch only the most relevant tools based on semantic similarity a method Drew Breunig calls [tool loadout](https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html). 716 | 717 | > According to [recent research](https://arxiv.org/abs/2505.03275), this improves tool selection accuracy by up to 3x. 718 | 719 | For tool selection, the [LangGraph Bigtool](https://github.com/langchain-ai/langgraph-bigtool) library is ideal. It applies semantic similarity search over tool descriptions to select the most relevant ones for the task. It uses LangGraph’s long-term memory store, allowing agents to search and retrieve the right tools for a given problem. 720 | 721 | Let’s understand `langgraph-bigtool` by using an agent with all functions from Python’s built-in math library. 
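The snippet below, and the ones that follow in this section, are adapted from the `langgraph-bigtool` examples and rely on a few imports and helpers that are not shown inline. A minimal setup might look like this (the import paths are assumptions and may differ between library versions; `convert_positional_only_function_to_tool` is a helper that ships with the library):

```python
# Assumed setup for the langgraph-bigtool snippets below (paths may vary by version).
import types
import uuid

from langchain.embeddings import init_embeddings        # factory for embedding models
from langgraph.store.memory import InMemoryStore        # store that will index tool descriptions

from langgraph_bigtool import create_agent              # builds the tool-retrieving agent
from langgraph_bigtool.utils import (
    convert_positional_only_function_to_tool,           # wraps math's positional-only functions as tools
)
```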
722 | ```python 723 | import math 724 | 725 | # Collect functions from `math` built-in 726 | all_tools = [] 727 | for function_name in dir(math): 728 | function = getattr(math, function_name) 729 | if not isinstance( 730 | function, types.BuiltinFunctionType 731 | ): 732 | continue 733 | # This is an idiosyncrasy of the `math` library 734 | if tool := convert_positional_only_function_to_tool( 735 | function 736 | ): 737 | all_tools.append(tool) 738 | ``` 739 | We first append all functions from Python’s math module into a list. Next, we need to convert these tool descriptions into vector embeddings so the agent can perform semantic similarity searches. 740 | 741 | For this, we will use an embedding model in our case, the OpenAI text-embedding model. 742 | ```python 743 | # Create registry of tools. This is a dict mapping 744 | # identifiers to tool instances. 745 | tool_registry = { 746 | str(uuid.uuid4()): tool 747 | for tool in all_tools 748 | } 749 | 750 | # Index tool names and descriptions in the LangGraph 751 | # Store. Here we use a simple in-memory store. 752 | embeddings = init_embeddings("openai:text-embedding-3-small") 753 | 754 | store = InMemoryStore( 755 | index={ 756 | "embed": embeddings, 757 | "dims": 1536, 758 | "fields": ["description"], 759 | } 760 | ) 761 | for tool_id, tool in tool_registry.items(): 762 | store.put( 763 | ("tools",), 764 | tool_id, 765 | { 766 | "description": f"{tool.name}: {tool.description}", 767 | }, 768 | ) 769 | ``` 770 | Each function is assigned a unique ID, and we structure these functions into a proper standardized format. This structured format ensures that the functions can be easily converted into embeddings for semantic search. 771 | 772 | Let’s now visualize the agent to see how it looks with all the math functions embedded and ready for semantic search! 773 | ```python 774 | # Initialize agent 775 | builder = create_agent(llm, tool_registry) 776 | agent = builder.compile(store=store) 777 | agent 778 | ``` 779 | ![Our Tool Agent](https://cdn-images-1.medium.com/max/1000/1*7uXCS9bgbNCwxB-6t6ZXOw.png) 780 | 781 | We can now invoke our agent with a simple query and observe how our tool-calling agent selects and uses the most relevant math functions to answer the question. 782 | ```python 783 | # Import a utility function to format and display messages 784 | from utils import format_messages 785 | 786 | # Define the query for the agent. 787 | # This query asks the agent to use one of its math tools to find the arc cosine. 788 | query = "Use available tools to calculate arc cosine of 0.5." 789 | 790 | # Invoke the agent with the query. The agent will search its tools, 791 | # select the 'acos' tool based on the query's semantics, and execute it. 792 | result = agent.invoke({"messages": query}) 793 | 794 | # Format and display the final messages from the agent's execution. 795 | format_messages(result['messages']) 796 | ``` 797 | ``` 798 | ┌────────────── Human ───────────────┐ 799 | │ Use available tools to calculate │ 800 | │ arc cosine of 0.5. │ 801 | └──────────────────────────────────────┘ 802 | 803 | ┌────────────── 📝 AI ─────────────────┐ 804 | │ I will search for a tool to calculate│ 805 | │ the arc cosine of 0.5. 
│ 806 | │ │ 807 | │ 🔧 Tool Call: retrieve_tools │ 808 | │ Args: { │ 809 | │ "query": "arc cosine arccos │ 810 | │ inverse cosine trig" │ 811 | │ } │ 812 | └──────────────────────────────────────┘ 813 | 814 | ┌────────────── 🔧 Tool Output ────────┐ 815 | │ Available tools: ['acos', 'acosh'] │ 816 | └──────────────────────────────────────┘ 817 | 818 | ┌────────────── 📝 AI ─────────────────┐ 819 | │ Perfect! I found the `acos` function │ 820 | │ which calculates the arc cosine. │ 821 | │ Now I will use it to calculate the │ 822 | │ arc │ 823 | │ cosine of 0.5. │ 824 | │ │ 825 | │ 🔧 Tool Call: acos │ 826 | │ Args: { "x": 0.5 } │ 827 | └──────────────────────────────────────┘ 828 | 829 | ┌────────────── 🔧 Tool Output ────────┐ 830 | │ 1.0471975511965976 │ 831 | └──────────────────────────────────────┘ 832 | 833 | ┌────────────── 📝 AI ─────────────────┐ 834 | │ The arc cosine of 0.5 is ≈**1.047** │ 835 | │ radians. │ 836 | │ │ 837 | │ ✔ Check: cos(π/3)=0.5, π/3≈1.047 rad │ 838 | │ (60°). │ 839 | └──────────────────────────────────────┘ 840 | ``` 841 | You can see how efficiently our ai agent is calling the correct tool. You can learn more about: 842 | 843 | * [**Toolshed**](https://arxiv.org/abs/2410.14594) introduces Toolshed Knowledge Bases and Advanced RAG-Tool Fusion for better tool selection in AI agents. 844 | * [**Graph RAG-Tool Fusion**](https://arxiv.org/abs/2502.07223) combines vector retrieval with graph traversal to capture tool dependencies. 845 | * [**LLM-Tool-Survey**](https://github.com/quchangle1/LLM-Tool-Survey) a comprehensive survey of tool learning with LLMs. 846 | * [**ToolRet**](https://arxiv.org/abs/2503.01763) a benchmark for evaluating and improving tool retrieval in LLMs. 847 | 848 | ### RAG with Contextual Engineering 849 | [RAG (Retrieval-Augmented Generation)](https://github.com/langchain-ai/rag-from-scratch) is a vast topic, and code agents are some of the best examples of agentic RAG in production. 850 | 851 | In practice, RAG is often the central challenge of context engineering. As [Varun from Windsurf](https://x.com/_mohansolo/status/1899630246862966837) points out: 852 | > Indexing ≠ context retrieval. Embedding search with AST-based chunking works, but fails as codebases grow. We need hybrid retrieval: grep/file search, knowledge-graph linking, and relevance-based re-ranking. 853 | 854 | LangGraph provides [tutorials and videos](https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/) to help integrate RAG into agents. Typically, you build a retrieval tool that can use any combination of RAG techniques mentioned above. 855 | 856 | To demonstrate, we’ll fetch documents for our RAG system using three of the most recent pages from Lilian Weng’s excellent blog. 857 | 858 | We will start by pulling page content with the `WebBaseLoader` utility. 859 | ```python 860 | # Import the WebBaseLoader to fetch documents from URLs 861 | from langchain_community.document_loaders import WebBaseLoader 862 | 863 | # Define the list of URLs for Lilian Weng's blog posts 864 | urls = [ 865 | "https://lilianweng.github.io/posts/2025-05-01-thinking/", 866 | "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/", 867 | "https://lilianweng.github.io/posts/2024-07-07-hallucination/", 868 | "https://lilianweng.github.io/posts/2024-04-12-diffusion-video/", 869 | ] 870 | 871 | # Load the documents from the specified URLs using a list comprehension. 872 | # This creates a WebBaseLoader for each URL and calls its load() method. 
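# Note (environment assumption): WebBaseLoader parses pages with BeautifulSoup, so the
# `beautifulsoup4` package must be installed. Each .load() call returns a list of Document
# objects, which is why the results are flattened into a single list in the next step.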
873 | docs = [WebBaseLoader(url).load() for url in urls] 874 | ``` 875 | There are different ways to chunk data for RAG, and proper chunking is crucial for effective retrieval. 876 | 877 | Here, we’ll split the fetched documents into smaller chunks before indexing them into our vectorstore. We’ll use a simple, direct approach such as recursive chunking with overlapping segments to preserve context across chunks while keeping them manageable for embedding and retrieval. 878 | ```python 879 | # Import the text splitter for chunking documents 880 | from langchain_text_splitters import RecursiveCharacterTextSplitter 881 | 882 | # Flatten the list of documents. WebBaseLoader returns a list of documents for each URL, 883 | # so we have a list of lists. This comprehension combines them into a single list. 884 | docs_list = [item for sublist in docs for item in sublist] 885 | 886 | # Initialize the text splitter. This will split the documents into smaller chunks 887 | # of a specified size, with some overlap between chunks to maintain context. 888 | text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder( 889 | chunk_size=2000, chunk_overlap=50 890 | ) 891 | 892 | # Split the documents into chunks. 893 | doc_splits = text_splitter.split_documents(docs_list) 894 | ``` 895 | Now that we have our split documents, we can index them into a vector store that we’ll use for semantic search. 896 | ```python 897 | # Import the necessary class for creating an in-memory vector store 898 | from langchain_core.vectorstores import InMemoryVectorStore 899 | 900 | # Create an in-memory vector store from the document splits. 901 | # This uses the 'doc_splits' created in the previous cell and the 'embeddings' model 902 | # initialized earlier to create vector representations of the text chunks. 903 | vectorstore = InMemoryVectorStore.from_documents( 904 | documents=doc_splits, embedding=embeddings 905 | ) 906 | 907 | # Create a retriever from the vector store. 908 | # The retriever provides an interface to search for relevant documents 909 | # based on a query. 910 | retriever = vectorstore.as_retriever() 911 | ``` 912 | We have to create a retriever tool that we can use in our agent. 913 | ```python 914 | # Import the function to create a retriever tool 915 | from langchain.tools.retriever import create_retriever_tool 916 | 917 | # Create a retriever tool from the vector store retriever. 918 | # This tool allows the agent to search for and retrieve relevant 919 | # documents from the blog posts based on a query. 920 | retriever_tool = create_retriever_tool( 921 | retriever, 922 | "retrieve_blog_posts", 923 | "Search and return information about Lilian Weng blog posts.", 924 | ) 925 | 926 | # The following line is an example of how to invoke the tool directly. 927 | # It's commented out as it's not needed for the agent execution flow but can be useful for testing. 928 | # retriever_tool.invoke({"query": "types of reward hacking"}) 929 | ``` 930 | Now, we can implement an agent that can select context from the tool. 931 | ```python 932 | # Augment the LLM with tools 933 | tools = [retriever_tool] 934 | tools_by_name = {tool.name: tool for tool in tools} 935 | llm_with_tools = llm.bind_tools(tools) 936 | ``` 937 | For RAG based solutions, we need to create a clear system prompt to guide our agent’s behavior. This prompt acts as its core instruction set. 
938 | ```python 939 | from langgraph.graph import MessagesState 940 | from langchain_core.messages import SystemMessage, ToolMessage 941 | from typing_extensions import Literal 942 | 943 | rag_prompt = """You are a helpful assistant tasked with retrieving information from a series of technical blog posts by Lilian Weng. 944 | Clarify the scope of research with the user before using your retrieval tool to gather context. Reflect on any context you fetch, and 945 | proceed until you have sufficient context to answer the user's research request.""" 946 | ``` 947 | Next, we define the nodes of our graph. We’ll need two main nodes: 948 | 949 | 1. `llm_call` This is the brain of our agent. It takes the current conversation history (user query + previous tool outputs). It then decides the next step, call a tool or generate a final answer. 950 | 2. `tool_node` This is the action part of our agent. It executes the tool call requested by `llm_call`. It returns the tool’s result back to the agent. 951 | 952 | ```python 953 | # --- Define Agent Nodes --- 954 | 955 | def llm_call(state: MessagesState): 956 | """LLM decides whether to call a tool or generate a final answer.""" 957 | # Add the system prompt to the current message state 958 | messages_with_prompt = [SystemMessage(content=rag_prompt)] + state["messages"] 959 | 960 | # Invoke the LLM with the augmented message list 961 | response = llm_with_tools.invoke(messages_with_prompt) 962 | 963 | # Return the LLM's response to be added to the state 964 | return {"messages": [response]} 965 | 966 | def tool_node(state: dict): 967 | """Performs the tool call and returns the observation.""" 968 | # Get the last message, which should contain the tool calls 969 | last_message = state["messages"][-1] 970 | 971 | # Execute each tool call and collect the results 972 | result = [] 973 | for tool_call in last_message.tool_calls: 974 | tool = tools_by_name[tool_call["name"]] 975 | observation = tool.invoke(tool_call["args"]) 976 | result.append(ToolMessage(content=str(observation), tool_call_id=tool_call["id"])) 977 | 978 | # Return the tool's output as a message 979 | return {"messages": result} 980 | ``` 981 | We need a way to control the agent’s flow deciding whether it should call a tool or if it’s finished. 982 | 983 | To handle this, we will create a conditional edge function called `should_continue`. 984 | 985 | * This function checks if the last message from the LLM contains a tool call. 986 | * If it does, the graph routes to the `tool_node`. 987 | * If not, the execution ends. 988 | 989 | ```python 990 | # --- Define Conditional Edge --- 991 | 992 | def should_continue(state: MessagesState) -> Literal["Action", END]: 993 | """Decides the next step based on whether the LLM made a tool call.""" 994 | last_message = state["messages"][-1] 995 | 996 | # If the LLM made a tool call, route to the tool_node 997 | if last_message.tool_calls: 998 | return "Action" 999 | # Otherwise, end the workflow 1000 | return END 1001 | ``` 1002 | We can now simply build the workflow and compile the graph. 
1003 | ```python 1004 | # Build workflow 1005 | agent_builder = StateGraph(MessagesState) 1006 | 1007 | # Add nodes 1008 | agent_builder.add_node("llm_call", llm_call) 1009 | agent_builder.add_node("environment", tool_node) 1010 | 1011 | # Add edges to connect nodes 1012 | agent_builder.add_edge(START, "llm_call") 1013 | agent_builder.add_conditional_edges( 1014 | "llm_call", 1015 | should_continue, 1016 | { 1017 | # Name returned by should_continue : Name of next node to visit 1018 | "Action": "environment", 1019 | END: END, 1020 | }, 1021 | ) 1022 | agent_builder.add_edge("environment", "llm_call") 1023 | 1024 | # Compile the agent 1025 | agent = agent_builder.compile() 1026 | 1027 | # Show the agent 1028 | display(Image(agent.get_graph(xray=True).draw_mermaid_png())) 1029 | ``` 1030 | ![RAG Based Agent](https://cdn-images-1.medium.com/max/1000/1*0QxVbzakDabkoMfgURIx2w.png) 1031 | 1032 | The graph shows a clear cycle: 1033 | 1034 | 1. the agent starts, calls the LLM. 1035 | 2. based on the LLM’s decision, it either performs an action (calls our retriever tool) and loops back, or it finishes and provides the answer 1036 | 1037 | Let’s test our RAG agent. We’ll ask it a specific question about **“reward hacking”** that can only be answered by retrieving information from the blog posts we indexed. 1038 | ```python 1039 | # Define the user's query 1040 | query = "What are the types of reward hacking discussed in the blogs?" 1041 | 1042 | # Invoke the agent with the query 1043 | result = agent.invoke({"messages": [("user", query)]}) 1044 | 1045 | # --- Display the Final Messages --- 1046 | # Format and print the conversation flow 1047 | format_messages(result['messages']) 1048 | ``` 1049 | ``` 1050 | ┌────────────── Human ───────────────┐ 1051 | │ Clarify scope: I want types of │ 1052 | │ reward hacking from Lilian Weng’s │ 1053 | │ blog on RL. │ 1054 | └──────────────────────────────────────┘ 1055 | 1056 | ┌────────────── 📝 AI ─────────────────┐ 1057 | │ Fetching context from her posts... │ 1058 | └──────────────────────────────────────┘ 1059 | 1060 | ┌────────────── 🔧 Tool Output ────────┐ 1061 | │ She lists 3 main types of reward │ 1062 | │ hacking in RL: │ 1063 | └──────────────────────────────────────┘ 1064 | 1065 | ┌────────────── 📝 AI ─────────────────┐ 1066 | │ 1. **Spec gaming** – Exploit reward │ 1067 | │ loopholes, not real goal. │ 1068 | │ │ 1069 | │ 2. **Reward tampering** – Change or │ 1070 | │ hack reward signals. │ 1071 | │ │ 1072 | │ 3. **Wireheading** – Self-stimulate │ 1073 | │ reward instead of task. │ 1074 | └──────────────────────────────────────┘ 1075 | 1076 | ┌────────────── 📝 AI ─────────────────┐ 1077 | │ These can cause harmful, unintended │ 1078 | │ behaviors in RL agents. │ 1079 | └──────────────────────────────────────┘ 1080 | ``` 1081 | As you can see, the agent correctly identified that it needed to use its retrieval tool. It then successfully retrieved the relevant context from the blog posts and used that information to provide a detailed and accurate answer. 1082 | 1083 | > This is a perfect example of how contextual engineering through RAG can create powerful, knowledgeable agents. 1084 | 1085 | ### Compression Strategy with knowledgeable Agents 1086 | Agent interactions can span [hundreds of turns](https://www.anthropic.com/engineering/built-multi-agent-research-system) and involve token-heavy tool calls. Summarization is a common way to manage this. 
1087 | 
1088 | ![Third Component of CE](https://cdn-images-1.medium.com/max/1000/1*Xu76qgF1u2G3JipeIgHo5Q.png)
1089 | *Third Component of CE (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*
1090 | 
1091 | For example:
1092 | 
1093 | * Claude Code uses “[auto-compact](https://docs.anthropic.com/en/docs/claude-code/costs)” when the context window exceeds 95%, summarizing the entire user-agent interaction history.
1094 | * Summarization can compress an [agent trajectory](https://langchain-ai.github.io/langgraph/concepts/memory/#manage-short-term-memory) using strategies like [recursive](https://arxiv.org/pdf/2308.15022#:~:text=the%20retrieved%20utterances%20capture%20the,based%203) or [hierarchical](https://alignment.anthropic.com/2025/summarization-for-monitoring/#:~:text=We%20addressed%20these%20issues%20by,of%20our%20computer%20use%20capability) summarization.
1095 | 
1096 | You can also add summarization at specific points:
1097 | 
1098 | * After token-heavy tool calls (e.g., search tools); see [example here](https://github.com/langchain-ai/open_deep_research/blob/e5a5160a398a3699857d00d8569cb7fd0ac48a4f/src/open_deep_research/utils.py#L1407).
1099 | * At agent-agent boundaries for knowledge transfer; [Cognition](https://cognition.ai/blog/dont-build-multi-agents#a-theory-of-building-long-running-agents) does this in Devin using a fine-tuned model.
1100 | 
1101 | ![Summarization approach langgraph](https://cdn-images-1.medium.com/max/1500/1*y5AhaYoM_XDDrvlAnnFhcQ.png)
1102 | *Summarization approach in LangGraph (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*
1103 | 
1104 | LangGraph is a [low-level orchestration framework](https://blog.langchain.com/how-to-think-about-agent-frameworks/), giving you full control over:
1105 | 
1106 | * Designing your agent as a set of [nodes](https://www.youtube.com/watch?v=aHCDrAbH_go).
1107 | * Explicitly defining logic within each node.
1108 | * Passing a shared state object between nodes.
1109 | 
1110 | This makes it easy to compress context in different ways. For instance, you can:
1111 | 
1112 | * Use a message list as the agent state.
1113 | * Summarize it with [built-in utilities](https://langchain-ai.github.io/langgraph/how-tos/memory/add-memory/#manage-short-term-memory).
1114 | 
1115 | We will reuse the same RAG-based tool-calling agent we coded earlier and add summarization of its conversation history.
1116 | 
1117 | First, we need to extend our graph’s state to include a field for the final summary.
1118 | ```python
1119 | # Define extended state with a summary field
1120 | class State(MessagesState):
1121 |     """Extended state that includes a summary field for context compression."""
1122 |     summary: str
1123 | ```
1124 | Next, we’ll define a dedicated prompt for summarization and keep our RAG prompt from before.
1125 | ```python
1126 | # Define the summarization prompt
1127 | summarization_prompt = """Summarize the full chat history and all tool feedback to
1128 | give an overview of what the user asked about and what the agent did."""
1129 | ```
1130 | Now, we’ll create a `summary_node`.
1131 | 
1132 | * This node will be triggered at the end of the agent’s work to generate a concise summary of the entire interaction.
1133 | * The `llm_call` and `tool_node` remain unchanged.
1134 | ```python
1135 | def summary_node(state: MessagesState) -> dict:
1136 |     """
1137 |     Generate a summary of the conversation and tool interactions.
1138 | 1139 | Args: 1140 | state: The current state of the graph, containing the message history. 1141 | 1142 | Returns: 1143 | A dictionary with the key "summary" and the generated summary string 1144 | as the value, which updates the state. 1145 | """ 1146 | # Prepend the summarization system prompt to the message history 1147 | messages = [SystemMessage(content=summarization_prompt)] + state["messages"] 1148 | 1149 | # Invoke the language model to generate the summary 1150 | result = llm.invoke(messages) 1151 | 1152 | # Return the summary to be stored in the 'summary' field of the state 1153 | return {"summary": result.content} 1154 | ``` 1155 | Our conditional edge should_continue now needs to decide whether to call a tool or move forward to the new summary_node. 1156 | ```python 1157 | def should_continue(state: MessagesState) -> Literal["Action", "summary_node"]: 1158 | """Determine next step based on whether LLM made tool calls.""" 1159 | last_message = state["messages"][-1] 1160 | 1161 | # If LLM made tool calls, execute them 1162 | if last_message.tool_calls: 1163 | return "Action" 1164 | # Otherwise, proceed to summarization 1165 | return "summary_node" 1166 | ``` 1167 | Let’s build the graph with this new summarization step at the end. 1168 | ```python 1169 | # Build the RAG agent workflow 1170 | agent_builder = StateGraph(State) 1171 | 1172 | # Add nodes to the workflow 1173 | agent_builder.add_node("llm_call", llm_call) 1174 | agent_builder.add_node("Action", tool_node) 1175 | agent_builder.add_node("summary_node", summary_node) 1176 | 1177 | # Define the workflow edges 1178 | agent_builder.add_edge(START, "llm_call") 1179 | agent_builder.add_conditional_edges( 1180 | "llm_call", 1181 | should_continue, 1182 | { 1183 | "Action": "Action", 1184 | "summary_node": "summary_node", 1185 | }, 1186 | ) 1187 | agent_builder.add_edge("Action", "llm_call") 1188 | agent_builder.add_edge("summary_node", END) 1189 | 1190 | # Compile the agent 1191 | agent = agent_builder.compile() 1192 | 1193 | # Display the agent workflow 1194 | display(Image(agent.get_graph(xray=True).draw_mermaid_png())) 1195 | ``` 1196 | ![Our Created Agent](https://cdn-images-1.medium.com/max/1000/1*UTtZj95DQ9_0hXb-h2UetQ.png) 1197 | 1198 | Now, let’s run it with a query that will require fetching a lot of context. 1199 | ```python 1200 | from rich.markdown import Markdown 1201 | 1202 | query = "Why does RL improve LLM reasoning according to the blogs?" 1203 | result = agent.invoke({"messages": [("user", query)]}) 1204 | 1205 | # Print the final message to the user 1206 | format_message(result['messages'][-1]) 1207 | 1208 | # Print the generated summary 1209 | Markdown(result["summary"]) 1210 | 1211 | 1212 | #### OUTPUT #### 1213 | The user asked about why reinforcement learning (RL) improves LLM re... 1214 | ``` 1215 | Nice, but it uses **115k tokens**! You can see the full trace [here](https://smith.langchain.com/public/50d70503-1a8e-46c1-bbba-a1efb8626b05/r). This is a common challenge with agents that have token-heavy tool calls. 1216 | 1217 | A more efficient approach is to compress the context *before* it enters the agent’s main scratchpad. Let’s update the RAG agent to summarize the tool call output on the fly. 1218 | 1219 | First, a new prompt for this specific task: 1220 | ```python 1221 | tool_summarization_prompt = """You will be provided a doc from a RAG system. 1222 | Summarize the docs, ensuring to retain all relevant / essential information. 
1223 | Your goal is simply to reduce the size of the doc (tokens) to a more manageable size.""" 1224 | ``` 1225 | Next, we’ll modify our **tool_node** to include this summarization step. 1226 | ```python 1227 | def tool_node_with_summarization(state: dict): 1228 | """Performs the tool call and then summarizes the output.""" 1229 | result = [] 1230 | for tool_call in state["messages"][-1].tool_calls: 1231 | tool = tools_by_name[tool_call["name"]] 1232 | observation = tool.invoke(tool_call["args"]) 1233 | 1234 | # Summarize the doc 1235 | summary_msg = llm.invoke([ 1236 | SystemMessage(content=tool_summarization_prompt), 1237 | ("user", str(observation)) 1238 | ]) 1239 | 1240 | result.append(ToolMessage(content=summary_msg.content, tool_call_id=tool_call["id"])) 1241 | return {"messages": result} 1242 | ``` 1243 | Now, our `should_continue` edge can be simplified since we don’t need the final `summary_node` anymore. 1244 | ```python 1245 | def should_continue(state: MessagesState) -> Literal["Action", END]: 1246 | """Decide if we should continue the loop or stop.""" 1247 | if state["messages"][-1].tool_calls: 1248 | return "Action" 1249 | return END 1250 | ``` 1251 | Let’s build and compile this more efficient agent. 1252 | ```python 1253 | # Build workflow 1254 | agent_builder = StateGraph(MessagesState) 1255 | 1256 | # Add nodes 1257 | agent_builder.add_node("llm_call", llm_call) 1258 | agent_builder.add_node("Action", tool_node_with_summarization) 1259 | 1260 | # Add edges to connect nodes 1261 | agent_builder.add_edge(START, "llm_call") 1262 | agent_builder.add_conditional_edges( 1263 | "llm_call", 1264 | should_continue, 1265 | { 1266 | "Action": "Action", 1267 | END: END, 1268 | }, 1269 | ) 1270 | agent_builder.add_edge("Action", "llm_call") 1271 | 1272 | # Compile the agent 1273 | agent = agent_builder.compile() 1274 | 1275 | # Show the agent 1276 | display(Image(agent.get_graph(xray=True).draw_mermaid_png())) 1277 | ``` 1278 | ![Our Updated Agent](https://cdn-images-1.medium.com/max/1000/1*FCRrXQxZveaQxyLHf6AROQ.png) 1279 | 1280 | Let’s run the same query and see the difference. 1281 | ```python 1282 | query = "Why does RL improve LLM reasoning according to the blogs?" 1283 | result = agent.invoke({"messages": [("user", query)]}) 1284 | format_messages(result['messages']) 1285 | ``` 1286 | ``` 1287 | ┌────────────── user ───────────────┐ 1288 | │ Why does RL improve LLM reasoning?│ 1289 | │ According to the blogs? │ 1290 | └───────────────────────────────────┘ 1291 | 1292 | ┌────────────── 📝 AI ──────────────┐ 1293 | │ Searching Lilian Weng’s blog for │ 1294 | │ how RL improves LLM reasoning... │ 1295 | │ │ 1296 | │ 🔧 Tool Call: retrieve_blog_posts │ 1297 | │ Args: { │ 1298 | │ "query": "Reinforcement Learning │ 1299 | │ for LLM reasoning" │ 1300 | │ } │ 1301 | └───────────────────────────────────┘ 1302 | 1303 | ┌────────────── 🔧 Tool Output ─────┐ 1304 | │ Lilian Weng explains RL helps LLM │ 1305 | │ reasoning by training on rewards │ 1306 | │ for each reasoning step (Process- │ 1307 | │ based Reward Models). This guides │ 1308 | │ the model to think step-by-step, │ 1309 | │ improving coherence and logic. │ 1310 | └───────────────────────────────────┘ 1311 | 1312 | ┌────────────── 📝 AI ──────────────┐ 1313 | │ RL improves LLM reasoning by │ 1314 | │ rewarding stepwise thinking via │ 1315 | │ PRMs, encouraging coherent, │ 1316 | │ logical argumentation over final │ 1317 | │ answers. It helps the model self- │ 1318 | │ correct and explore better paths. 
│ 1319 | └───────────────────────────────────┘ 1320 | ``` 1321 | > This time, the agent only used **60k tokens**! See the trace [here](https://smith.langchain.com/public/994cdf93-e837-4708-9628-c83b397dd4b5/r). 1322 | 1323 | This simple change cut our token usage nearly in half, making the agent far more efficient and cost-effective. 1324 | 1325 | You can learn more about: 1326 | 1327 | * [**Heuristic Compression and Message Trimming**](https://langchain-ai.github.io/langgraph/how-tos/memory/add-memory/#trim-messages) managing token limits by trimming messages to prevent context overflow. 1328 | * [**SummarizationNode as Pre-Model Hook**](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent-manage-message-history/) summarizing conversation history to control token usage in ReAct agents. 1329 | * [**LangMem Summarization**](https://langchain-ai.github.io/langmem/guides/summarization/) strategies for long context management with message summarization and running summaries. 1330 | 1331 | ### Isolating Context using Sub-Agents Architecture 1332 | A common way to isolate context is by splitting it across sub-agents. OpenAI’s [Swarm](https://github.com/openai/swarm) library was designed for this “[separation of concerns](https://openai.github.io/openai-agents-python/ref/agent/)”, where each agent manages a specific sub-task with its own tools, instructions, and context window. 1333 | 1334 | ![Fourth Component of CE](https://cdn-images-1.medium.com/max/1000/1*-b9BLPkLHkYsy2iLQIdxUg.png) 1335 | *Fourth Component of CE (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))* 1336 | 1337 | Anthropic’s [multi-agent researcher](https://www.anthropic.com/engineering/built-multi-agent-research-system) showed that multiple agents with isolated contexts outperformed a single agent by 90.2%, as each sub-agent focuses on a narrower sub-task. 1338 | 1339 | > *Subagents operate in parallel with their own context windows, exploring different aspects of the question simultaneously.* 1340 | 1341 | However, multi-agent systems have challenges: 1342 | 1343 | * Much higher token use (sometimes 15× more tokens than single-agent chat). 1344 | * Careful [prompt engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system) is required to plan sub-agent work. 1345 | * Coordinating sub-agents can be complex. 1346 | 1347 | ![Multi Agent Parallelization](https://cdn-images-1.medium.com/max/1000/1*N_BT9M5OyYB7UJfDkpcL-g.png) 1348 | *Multi Agent Parallelization (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))* 1349 | 1350 | LangGraph supports multi-agent setups. A common approach is the [supervisor](https://github.com/langchain-ai/langgraph-supervisor-py) architecture, also used in Anthropic’s multi-agent researcher. The supervisor delegates tasks to sub-agents, each running in its own context window. 1351 | 1352 | Let’s build a simple supervisor that manages two agents: 1353 | 1354 | * `math_expert` handles mathematical calculations. 1355 | * `research_expert` searches and provides researched information. 1356 | 1357 | The supervisor will decide which expert to call based on the query and coordinate their responses within the LangGraph workflow.
1358 | ```python 1359 | from langgraph.prebuilt import create_react_agent 1360 | from langgraph_supervisor import create_supervisor 1361 | 1362 | # --- Define Tools for Each Agent --- 1363 | def add(a: float, b: float) -> float: 1364 | """Add two numbers.""" 1365 | return a + b 1366 | 1367 | def multiply(a: float, b: float) -> float: 1368 | """Multiply two numbers.""" 1369 | return a * b 1370 | 1371 | def web_search(query: str) -> str: 1372 | """Mock web search function that returns FAANG company headcounts.""" 1373 | return ( 1374 | "Here are the headcounts for each of the FAANG companies in 2024:\n" 1375 | "1. **Facebook (Meta)**: 67,317 employees.\n" 1376 | "2. **Apple**: 164,000 employees.\n" 1377 | "3. **Amazon**: 1,551,000 employees.\n" 1378 | "4. **Netflix**: 14,000 employees.\n" 1379 | "5. **Google (Alphabet)**: 181,269 employees." 1380 | ) 1381 | ``` 1382 | Now we can create our specialized agents and the supervisor to manage them. 1383 | ```python 1384 | # --- Create Specialized Agents with Isolated Contexts --- 1385 | math_agent = create_react_agent( 1386 | model=llm, 1387 | tools=[add, multiply], 1388 | name="math_expert", 1389 | prompt="You are a math expert. Always use one tool at a time." 1390 | ) 1391 | 1392 | research_agent = create_react_agent( 1393 | model=llm, 1394 | tools=[web_search], 1395 | name="research_expert", 1396 | prompt="You are a world class researcher with access to web search. Do not do any math." 1397 | ) 1398 | 1399 | # --- Create Supervisor Workflow for Coordinating Agents --- 1400 | workflow = create_supervisor( 1401 | [research_agent, math_agent], 1402 | model=llm, 1403 | prompt=( 1404 | "You are a team supervisor managing a research expert and a math expert. " 1405 | "Delegate tasks to the appropriate agent to answer the user's query. " 1406 | "For current events or facts, use research_agent. " 1407 | "For math problems, use math_agent." 1408 | ) 1409 | ) 1410 | 1411 | # Compile the multi-agent application 1412 | app = workflow.compile() 1413 | ``` 1414 | Let’s execute the workflow and see how the supervisor delegates tasks. 1415 | ```python 1416 | # --- Execute the Multi-Agent Workflow --- 1417 | result = app.invoke({ 1418 | "messages": [ 1419 | { 1420 | "role": "user", 1421 | "content": "what's the combined headcount of the FAANG companies in 2024?" 1422 | } 1423 | ] 1424 | }) 1425 | 1426 | # Format and display the results 1427 | format_messages(result['messages']) 1428 | ``` 1429 | ``` 1430 | ┌────────────── user ───────────────┐ 1431 | │ Learn more about LangGraph Swarm │ 1432 | │ and multi-agent systems. │ 1433 | └───────────────────────────────────┘ 1434 | 1435 | ┌────────────── 📝 AI ──────────────┐ 1436 | │ Fetching details on LangGraph │ 1437 | │ Swarm and related resources... │ 1438 | └───────────────────────────────────┘ 1439 | 1440 | ┌────────────── 🔧 Tool Output ─────┐ 1441 | │ **LangGraph Swarm** │ 1442 | │ Repo: │ 1443 | │ https://github.com/langchain-ai/ │ 1444 | │ langgraph-swarm-py │ 1445 | │ │ 1446 | │ • Python library for multi-agent │ 1447 | │ AI with dynamic collaboration. │ 1448 | │ • Agents hand off control based │ 1449 | │ on specialization, keeping │ 1450 | │ conversation context. │ 1451 | │ • Supports custom handoffs, │ 1452 | │ streaming, memory, and human- │ 1453 | │ in-the-loop. │ 1454 | │ • Install: │ 1455 | │ `pip install langgraph-swarm` │ 1456 | └───────────────────────────────────┘ 1457 | 1458 | ┌────────────── 🔧 Tool Output ─────┐ 1459 | │ **Videos on multi-agent systems** │ 1460 | │ 1. 
https://youtu.be/4nZl32FwU-o │ 1461 | │ 2. https://youtu.be/JeyDrn1dSUQ │ 1462 | │ 3. https://youtu.be/B_0TNuYi56w │ 1463 | └───────────────────────────────────┘ 1464 | 1465 | ┌────────────── 📝 AI ──────────────┐ 1466 | │ LangGraph Swarm makes it easy to │ 1467 | │ build context-aware multi-agent │ 1468 | │ systems. Check videos for deeper │ 1469 | │ insights on multi-agent behavior. │ 1470 | └───────────────────────────────────┘ 1471 | ``` 1472 | Here, the supervisor correctly isolates the context for each task, sending the research query to the researcher and the math problem to the mathematician, showing effective context isolation. 1473 | 1474 | You can learn more about: 1475 | 1476 | * [**LangGraph Swarm**](https://github.com/langchain-ai/langgraph-swarm-py) a Python library for building multi-agent systems with dynamic handoffs, memory, and human-in-the-loop support. 1477 | * [**Videos on multi-agent systems**](https://www.youtube.com/watch?v=4nZl32FwU-o) additional insights into building collaborative AI agents ([video 2](https://www.youtube.com/watch?v=JeyDrn1dSUQ), [video 3](https://www.youtube.com/watch?v=B_0TNuYi56w)). 1478 | 1479 | ### Isolation using Sandboxed Environments 1480 | HuggingFace’s [deep researcher](https://huggingface.co/blog/open-deep-research#:~:text=From%20building%20,it%20can%20still%20use%20it) shows a cool way to isolate context. Most agents use [tool calling APIs](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview) that return JSON arguments to run tools like search APIs and get results. 1481 | 1482 | HuggingFace uses a [CodeAgent](https://huggingface.co/papers/2402.01030) that writes code to call tools. This code runs in a secure [sandbox](https://e2b.dev/), and results from running the code are sent back to the LLM. 1483 | 1484 | This keeps heavy data (like images or audio) outside the LLM’s token limit. HuggingFace explains: 1485 | 1486 | > *[Code Agents allow for] better handling of state … Need to store this image/audio/other for later? Just save it as a variable in your state and use it later.* 1487 | 1488 | Using sandboxes with LangGraph is easy. The [LangChain Sandbox](https://github.com/langchain-ai/langchain-sandbox) runs untrusted Python code securely using Pyodide (Python compiled to WebAssembly). You can add this as a tool to any LangGraph agent. 1489 | 1490 | **Note:** Deno is required. Install it here: https://docs.deno.com/runtime/getting_started/installation/ 1491 | ```python 1492 | from langchain_sandbox import PyodideSandboxTool 1493 | from langgraph.prebuilt import create_react_agent 1494 | 1495 | # Create a sandbox tool with network access for package installation 1496 | tool = PyodideSandboxTool(allow_net=True) 1497 | 1498 | # Create a ReAct agent with the sandbox tool 1499 | agent = create_react_agent(llm, tools=[tool]) 1500 | 1501 | # Execute a mathematical query using the sandbox 1502 | result = await agent.ainvoke( 1503 | {"messages": [{"role": "user", "content": "what's 5 + 7?"}]}, 1504 | ) 1505 | 1506 | # Format and display the results 1507 | format_messages(result['messages']) 1508 | ``` 1509 | ``` 1510 | ┌────────────── user ───────────────┐ 1511 | │ what's 5 + 7? │ 1512 | └──────────────────────────────────┘ 1513 | 1514 | ┌────────────── 📝 AI ──────────────┐ 1515 | │ I can solve this by executing │ 1516 | │ Python code in the sandbox.
│ 1517 | │ │ 1518 | │ 🔧 Tool Call: pyodide_sandbox │ 1519 | │ Args: { │ 1520 | │ "code": "print(5 + 7)" │ 1521 | │ } │ 1522 | └──────────────────────────────────┘ 1523 | 1524 | ┌────────────── 🔧 Tool Output ─────┐ 1525 | │ 12 │ 1526 | └──────────────────────────────────┘ 1527 | 1528 | ┌────────────── 📝 AI ──────────────┐ 1529 | │ The answer is 12. │ 1530 | └──────────────────────────────────┘ 1531 | ``` 1532 | ### State Isolation in LangGraph 1533 | An agent’s **runtime state object** is another great way to isolate context, similar to sandboxing. You can design this state with a schema (like a Pydantic model) that has different fields for storing context. 1534 | 1535 | For example, one field (like `messages`) is shown to the LLM each turn, while other fields keep information isolated until needed. 1536 | 1537 | LangGraph is built around a [**state**](https://langchain-ai.github.io/langgraph/concepts/low_level/#state) object, letting you create a custom state schema and access its fields throughout the agent’s workflow. 1538 | 1539 | For instance, you can store tool call results in specific fields, keeping them hidden from the LLM until necessary. You’ve seen many examples of this in these notebooks. 1540 | 1541 | ### Summarizing Everything 1542 | Let’s summarize what we have done so far: 1543 | 1544 | * We used LangGraph `StateGraph` to create a **"scratchpad"** for short-term memory and an `InMemoryStore` for long-term memory, allowing our agent to store and recall information. 1545 | * We demonstrated how to selectively pull relevant information from the agent’s state and long-term memory. This included using Retrieval-Augmented Generation (`RAG`) to find specific knowledge and `langgraph-bigtool` to select the right tool from many options. 1546 | * To manage long conversations and token-heavy tool outputs, we implemented summarization. 1547 | * We showed how to compress `RAG` results on-the-fly to make the agent more efficient and reduce token usage. 1548 | * We explored keeping contexts separate to avoid confusion by building a multi-agent system with a supervisor that delegates tasks to specialized sub-agents and by using sandboxed environments to run code. 1549 | 1550 | All these techniques fall under **“Contextual Engineering”**, a strategy to improve AI agents by carefully managing their working memory (`context`) to make them more efficient, accurate, and capable of handling complex, long-running tasks. -------------------------------------------------------------------------------- /contextual_engineering.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "e4a2c5f1", 6 | "metadata": {}, 7 | "source": [ 8 | "\n", 9 | "# LangChain AI Agents Using Contextual Engineering\n", 10 | "\n", 11 | "Context engineering means creating the right setup for an AI before giving it a task.
This setup includes:\n", 12 | "\n", 13 | "* **Instructions** on how the AI should act, like being a helpful budget travel guide\n", 14 | "* Access to **useful info** from databases, documents, or live sources.\n", 15 | "* Remembering **past conversations** to avoid repeats or forgetting.\n", 16 | "* **Tools** the AI can use, such as calculators or search features.\n", 17 | "* Important details about you, like your **preferences** or location.\n", 18 | "\n", 19 | "![Context Engineering](https://cdn-images-1.medium.com/max/1500/1*sCTOzjG6KP7slQuxLZUtNg.png)\n", 20 | "*Context Engineering (From [LangChain](https://blog.langchain.com/context-engineering-for-agents/) and [12Factor](https://github.com/humanlayer/12-factor-agents/tree/main))*\n", 21 | "\n", 22 | "[AI engineers are now shifting](https://diamantai.substack.com/p/why-ai-experts-are-moving-from-prompt) from prompt engineering to context engineering because…\n", 23 | "\n", 24 | "> context engineering focuses on providing AI with the right background and tools, making its answers smarter and more useful.\n", 25 | "\n", 26 | "In this notebook, we will explore how **LangChain** and **LangGraph**, two powerful tools for building AI agents, RAG apps, and LLM apps, can be used to implement **contextual engineering** effectively to improve our AI Agents." 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "id": "a9b1d2c3", 32 | "metadata": {}, 33 | "source": [ 34 | "### Table of Contents\n", 35 | "- [What is Context Engineering?](#what-is-context-engineering)\n", 36 | "- [Writing Context: Scratchpad and Memory](#writing-context-scratchpad-and-memory)\n", 37 | "- [Selecting Context: State, Memory, RAG, and Tools](#selecting-context-state-memory-rag-and-tools)\n", 38 | "- [Compressing Context: Summarization Strategies](#compressing-context-summarization-strategies)\n", 39 | "- [Isolating Context: Sub-Agents and Sandboxing](#isolating-context-sub-agents-and-sandboxing)\n", 40 | "- [Summarizing Everything](#summarizing-everything)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "id": "f5g6h7i8", 46 | "metadata": {}, 47 | "source": [ 48 | "### What is Context Engineering?\n", 49 | "LLMs work like a new type of operating system. The LLM acts like the CPU, and its context window works like RAM, serving as its short-term memory. But, like RAM, the context window has limited space for different information.\n", 50 | "\n", 51 | "> Just as an operating system decides what goes into RAM, “context engineering” is about choosing what the LLM should keep in its context.\n", 52 | "\n", 53 | "![Different Context Types](https://cdn-images-1.medium.com/max/1000/1*kMEQSslFkhLiuJS8-WEMIg.png)\n", 54 | "\n", 55 | "When building LLM applications, we need to manage different types of context. Context engineering covers these main types:\n", 56 | "\n", 57 | "* Instructions: prompts, examples, memories, and tool descriptions\n", 58 | "* Knowledge: facts, stored information, and memories\n", 59 | "* Tools: feedback and results from tool calls\n", 60 | "\n", 61 | "This year, more people are interested in agents because LLMs are better at thinking and using tools. Agents work on long tasks by using LLMs and tools together, choosing the next step based on the tool’s feedback.\n", 62 | "\n", 63 | "![Agent Workflow](https://cdn-images-1.medium.com/max/1500/1*Do44CZkpPYyIJefuNQ69GA.png)\n", 64 | "\n", 65 | "But long tasks and collecting too much feedback from tools use a lot of tokens. 
This can create problems: the context window can overflow, costs and delays can increase, and the agent might work worse.\n", 66 | "\n", 67 | "Drew Breunig explained how too much context can hurt performance, including:\n", 68 | "\n", 69 | "* Context Poisoning: [when a mistake or hallucination gets added to the context](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-poisoning)\n", 70 | "* Context Distraction: [when too much context confuses the model](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-distraction)\n", 71 | "* Context Confusion: [when extra, unnecessary details affect the answer](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-confusion)\n", 72 | "* Context Clash: [when parts of the context give conflicting information](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-clash)\n", 73 | "\n", 74 | "![Multiple turns in Agent](https://cdn-images-1.medium.com/max/1500/1*ZJeZJPKI5jC_1BMCoghZxA.png)\n", 75 | "\n", 76 | "Anthropic [in their research](https://www.anthropic.com/engineering/built-multi-agent-research-system?ref=blog.langchain.com) stressed the need for it:\n", 77 | "\n", 78 | "> Agents often have conversations with hundreds of turns, so managing context carefully is crucial.\n", 79 | "\n", 80 | "So, how are people solving this problem today? Common strategies for agent context engineering can be grouped into four main types:\n", 81 | "\n", 82 | "* **Write**: creating clear and useful context\n", 83 | "* **Select**: picking only the most relevant information\n", 84 | "* **Compress**: shortening context to save space\n", 85 | "* **Isolate**: keeping different types of context separate\n", 86 | "\n", 87 | "![Categories of Context Engineering](https://cdn-images-1.medium.com/max/2600/1*CacnXVAI6wR4eSIWgnZ9sg.png)\n", 88 | "*Categories of Context Engineering (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*\n", 89 | "\n", 90 | "[LangGraph](https://www.langchain.com/langgraph) is built to support all these strategies. We will go through each of these components one by one and see how they help make our AI agents work better." 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "id": "j9k1l2m3", 96 | "metadata": {}, 97 | "source": [ 98 | "### Writing Context: Scratchpad and Memory\n", 99 | "\n", 100 | "The first principle of contextual engineering is **writing** context. This means creating and storing information outside the LLM's immediate context window, which the agent can access later. We will explore two primary mechanisms for this in LangGraph: the **scratchpad** (for short-term, session-specific notes) and **memory** (for long-term persistence across sessions).\n", 101 | "\n", 102 | "![First Component of CE](https://cdn-images-1.medium.com/max/1000/1*aXpKxYt03iZPcrGkxsFvrQ.png)" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "id": "n4o5p6q7", 108 | "metadata": {}, 109 | "source": [ 110 | "#### Scratchpad with LangGraph\n", 111 | "Just like humans take notes to remember things for later tasks, agents can do the same using a [scratchpad](https://www.anthropic.com/engineering/claude-think-tool). 
It stores information outside the context window so the agent can access it whenever needed.\n", 112 | "\n", 113 | "A good example is [Anthropic's multi-agent researcher](https://www.anthropic.com/engineering/built-multi-agent-research-system):\n", 114 | "\n", 115 | "> *The LeadResearcher plans its approach and saves it to memory, because if the context window goes beyond 200,000 tokens, it gets cut off so saving the plan ensures it isn’t lost.*\n", 116 | "\n", 117 | "In LangGraph, the `StateGraph` object serves as this scratchpad. The state is the central data structure passed between nodes in your graph. You define its schema, and each node can read from and write to it. This provides a powerful way to maintain short-term, thread-scoped memory for your agent.\n", 118 | "\n", 119 | "First, let's set up our environment and helper utilities for printing." 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "id": "r8s9t0u1", 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "# Import necessary libraries for typing, formatting, and environment management\n", 130 | "import getpass\n", 131 | "import os\n", 132 | "from typing import TypedDict\n", 133 | "\n", 134 | "from IPython.display import Image, display\n", 135 | "from rich.console import Console\n", 136 | "from rich.pretty import pprint\n", 137 | "\n", 138 | "# Initialize a console for rich, formatted output in the notebook.\n", 139 | "console = Console()\n", 140 | "\n", 141 | "# Set the Anthropic API key to authenticate requests\n", 142 | "# It's recommended to set this as an environment variable for security\n", 143 | "if \"ANTHROPIC_API_KEY\" not in os.environ:\n", 144 | " os.environ[\"ANTHROPIC_API_KEY\"] = getpass.getpass(\"Provide your Anthropic API key: \")" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "id": "v2w3x4y5", 150 | "metadata": {}, 151 | "source": [ 152 | "Next, we will create a `TypedDict` for the state object. This defines the schema of our scratchpad, ensuring data consistency as it flows through the graph." 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "id": "z6a7b8c9", 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [ 162 | "# Define the schema for the graph's state using TypedDict.\n", 163 | "# This class acts as a data structure that will be passed between nodes in the graph.\n", 164 | "# It ensures that the state has a consistent shape and provides type hints.\n", 165 | "class State(TypedDict):\n", 166 | " \"\"\"\n", 167 | " Defines the structure of the state for our joke generator workflow.\n", 168 | "\n", 169 | " Attributes:\n", 170 | " topic: The input topic for which a joke will be generated.\n", 171 | " joke: The output field where the generated joke will be stored.\n", 172 | " \"\"\"\n", 173 | "\n", 174 | " topic: str\n", 175 | " joke: str" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "id": "d0e1f2g3", 181 | "metadata": {}, 182 | "source": [ 183 | "#### Creating a StateGraph to Write to the Scratchpad\n", 184 | "Once we define a state object, we can write context to it using a `StateGraph`. A StateGraph is LangGraph’s main tool for building stateful agents.\n", 185 | "\n", 186 | "- **Nodes** are steps in the workflow. 
Each node is a function that takes the current state as input and returns updates.\n", 187 | "- **Edges** connect nodes, defining the execution flow.\n", 188 | "\n", 189 | "Let's create a chat model and a node function that uses it to generate a joke and write it to our state object." 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "id": "h4i5j6k7", 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "# Import necessary libraries for LangChain and LangGraph\n", 200 | "from langchain.chat_models import init_chat_model\n", 201 | "from langgraph.graph import END, START, StateGraph\n", 202 | "\n", 203 | "# --- Model Setup ---\n", 204 | "# Initialize the chat model to be used in the workflow\n", 205 | "# We use a specific Claude model with temperature=0 for deterministic outputs\n", 206 | "llm = init_chat_model(\"anthropic:claude-3-sonnet-20240229\", temperature=0)\n", 207 | "\n", 208 | "# --- Define Workflow Node ---\n", 209 | "def generate_joke(state: State) -> dict[str, str]:\n", 210 | " \"\"\"\n", 211 | " A node function that generates a joke based on the topic in the current state.\n", 212 | "\n", 213 | " This function reads the 'topic' from the state, uses the LLM to generate a joke,\n", 214 | " and returns a dictionary to update the 'joke' field in the state.\n", 215 | "\n", 216 | " Args:\n", 217 | " state: The current state of the graph, which must contain a 'topic'.\n", 218 | "\n", 219 | " Returns:\n", 220 | " A dictionary with the 'joke' key to update the state.\n", 221 | " \"\"\"\n", 222 | " # Read the topic from the state\n", 223 | " topic = state[\"topic\"]\n", 224 | " print(f\"Generating a joke about: {topic}\")\n", 225 | "\n", 226 | " # Invoke the language model to generate a joke\n", 227 | " msg = llm.invoke(f\"Write a short joke about {topic}\")\n", 228 | "\n", 229 | " # Return the generated joke to be written back to the state\n", 230 | " return {\"joke\": msg.content}\n", 231 | "\n", 232 | "# --- Build and Compile the Graph ---\n", 233 | "# Initialize a new StateGraph with the predefined State schema\n", 234 | "workflow = StateGraph(State)\n", 235 | "\n", 236 | "# Add the 'generate_joke' function as a node in the graph\n", 237 | "workflow.add_node(\"generate_joke\", generate_joke)\n", 238 | "\n", 239 | "# Define the workflow's execution path:\n", 240 | "# The graph starts at the START entrypoint and flows to our 'generate_joke' node.\n", 241 | "workflow.add_edge(START, \"generate_joke\")\n", 242 | "# After 'generate_joke' completes, the graph execution ends.\n", 243 | "workflow.add_edge(\"generate_joke\", END)\n", 244 | "\n", 245 | "# Compile the workflow into an executable chain\n", 246 | "chain = workflow.compile()\n", 247 | "\n", 248 | "# --- Visualize the Graph ---\n", 249 | "# Display a visual representation of the compiled workflow graph\n", 250 | "display(Image(chain.get_graph().draw_mermaid_png()))" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "id": "l8m9n0o1", 256 | "metadata": {}, 257 | "source": [ 258 | "Now we can execute this workflow. It will take an initial state with a `topic`, run the `generate_joke` node, and write the result into the `joke` field of the state." 
259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": null, 264 | "id": "p2q3r4s5", 265 | "metadata": {}, 266 | "outputs": [], 267 | "source": [ 268 | "# --- Execute the Workflow ---\n", 269 | "# Invoke the compiled graph with an initial state containing the topic.\n", 270 | "# The `invoke` method runs the graph from the START node to the END node.\n", 271 | "joke_generator_state = chain.invoke({\"topic\": \"cats\"})\n", 272 | "\n", 273 | "# --- Display the Final State ---\n", 274 | "# Print the final state of the graph after execution.\n", 275 | "# This will show both the input 'topic' and the output 'joke' that was written to the state.\n", 276 | "console.print(\"\\n[bold blue]Joke Generator Final State:[/bold blue]\")\n", 277 | "pprint(joke_generator_state)" 278 | ] 279 | }, 280 | { 281 | "cell_type": "markdown", 282 | "id": "t6u7v8w9", 283 | "metadata": {}, 284 | "source": [ 285 | "#### Memory Writing in LangGraph\n", 286 | "Scratchpads help agents work within a single session, but sometimes agents need to remember things across multiple sessions. This is where long-term memory comes in.\n", 287 | "\n", 288 | "* [Reflexion](https://arxiv.org/abs/2303.11366) introduced the idea of agents reflecting after each turn and reusing self-generated hints.\n", 289 | "* [Generative Agents](https://ar5iv.labs.arxiv.org/html/2304.03442) created long-term memories by summarizing past agent feedback.\n", 290 | "\n", 291 | "![Memory Writing](https://cdn-images-1.medium.com/max/1000/1*VaMVevdSVxDITLK1j0LfRQ.png)\n", 292 | "\n", 293 | "LangGraph supports long-term memory through a `store` that can be passed to a compiled graph. This allows you to persist context *across threads* (e.g., different chat sessions).\n", 294 | "\n", 295 | "- **Checkpointing** saves the graph’s state at each step in a `thread`.\n", 296 | "- **Long-term memory** lets you keep specific context across threads using a key-value `BaseStore`.\n", 297 | "\n", 298 | "Let's enhance our agent to use both short-term checkpointing and a long-term memory store." 
299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": null, 304 | "id": "x0y1z2a3", 305 | "metadata": {}, 306 | "outputs": [], 307 | "source": [ 308 | "# Import memory and persistence components from LangGraph\n", 309 | "from langgraph.checkpoint.memory import InMemorySaver\n", 310 | "from langgraph.store.base import BaseStore\n", 311 | "from langgraph.store.memory import InMemoryStore\n", 312 | "\n", 313 | "# Initialize storage components\n", 314 | "checkpointer = InMemorySaver() # For thread-level state persistence (short-term memory)\n", 315 | "memory_store = InMemoryStore() # For cross-thread memory storage (long-term memory)\n", 316 | "\n", 317 | "# Define a namespace to logically group related data in the long-term store.\n", 318 | "namespace = (\"rlm\", \"joke_generator\")\n", 319 | "\n", 320 | "def generate_joke_with_memory(state: State, store: BaseStore) -> dict[str, str]:\n", 321 | " \"\"\"Generate a joke with memory awareness.\n", 322 | " \n", 323 | " This enhanced version checks for existing jokes in long-term memory\n", 324 | " before generating a new one and saves the new joke.\n", 325 | " \n", 326 | " Args:\n", 327 | " state: Current state containing the topic.\n", 328 | " store: Memory store for persistent context.\n", 329 | " \n", 330 | " Returns:\n", 331 | " A dictionary with the generated joke.\n", 332 | " \"\"\"\n", 333 | " # Check if there's an existing joke in memory (we will cover selection later)\n", 334 | " existing_jokes = list(store.search(namespace))\n", 335 | " if existing_jokes:\n", 336 | " existing_joke_content = existing_jokes[0].value\n", 337 | " print(f\"Found existing joke in memory: {existing_joke_content}\")\n", 338 | " else:\n", 339 | " print(\"No existing joke found in memory.\")\n", 340 | "\n", 341 | " # Generate a new joke based on the topic\n", 342 | " msg = llm.invoke(f\"Write a short joke about {state['topic']}\")\n", 343 | " \n", 344 | " # Write the new joke to long-term memory\n", 345 | " store.put(namespace, \"last_joke\", {\"joke\": msg.content})\n", 346 | " print(f\"Wrote new joke to memory: {msg.content[:50]}...\")\n", 347 | "\n", 348 | " # Return the joke to be added to the current session's state (scratchpad)\n", 349 | " return {\"joke\": msg.content}\n", 350 | "\n", 351 | "\n", 352 | "# Build the workflow with memory capabilities\n", 353 | "workflow_with_memory = StateGraph(State)\n", 354 | "workflow_with_memory.add_node(\"generate_joke\", generate_joke_with_memory)\n", 355 | "workflow_with_memory.add_edge(START, \"generate_joke\")\n", 356 | "workflow_with_memory.add_edge(\"generate_joke\", END)\n", 357 | "\n", 358 | "# Compile with both checkpointing (for session state) and a memory store (for long-term)\n", 359 | "chain_with_memory = workflow_with_memory.compile(checkpointer=checkpointer, store=memory_store)" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "id": "b4c5d6e7", 365 | "metadata": {}, 366 | "source": [ 367 | "Now, let's execute the updated workflow. We'll use a `config` object to specify a `thread_id`. This identifies the current session. The first time we run it, there should be no joke in long-term memory." 
368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": null, 373 | "id": "f8g9h0i1", 374 | "metadata": {}, 375 | "outputs": [], 376 | "source": [ 377 | "# Execute the workflow within a specific thread (e.g., a user session)\n", 378 | "config_thread_1 = {\"configurable\": {\"thread_id\": \"1\"}}\n", 379 | "joke_state_thread_1 = chain_with_memory.invoke({\"topic\": \"dogs\"}, config_thread_1)\n", 380 | "\n", 381 | "# Display the workflow result for the first thread\n", 382 | "console.print(\"\\n[bold cyan]Workflow Result (Thread 1):[/bold cyan]\")\n", 383 | "pprint(joke_state_thread_1)" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "id": "j2k3l4m5", 389 | "metadata": {}, 390 | "source": [ 391 | "Because we compiled the workflow with a checkpointer, we can now view the latest state of the graph for that thread. This shows the value of the short-term scratchpad." 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": null, 397 | "id": "n6o7p8q9", 398 | "metadata": {}, 399 | "outputs": [], 400 | "source": [ 401 | "# --- Retrieve and Inspect the Graph State ---\n", 402 | "# Use the `get_state` method to retrieve the latest state snapshot for thread \"1\".\n", 403 | "latest_state_thread_1 = chain_with_memory.get_state(config_thread_1)\n", 404 | "\n", 405 | "# --- Display the State Snapshot ---\n", 406 | "# The StateSnapshot includes not only the data ('topic', 'joke') but also execution metadata.\n", 407 | "console.print(\"\\n[bold magenta]Latest Graph State (Thread 1):[/bold magenta]\")\n", 408 | "pprint(latest_state_thread_1)" 409 | ] 410 | }, 411 | { 412 | "cell_type": "markdown", 413 | "id": "r0s1t2u3", 414 | "metadata": {}, 415 | "source": [ 416 | "Now, let's run the workflow again but with a *different* `thread_id`. This simulates a new session. Our long-term memory store should now contain the joke from the first session, demonstrating how context can be persisted and shared across threads." 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": null, 422 | "id": "v4w5x6y7", 423 | "metadata": {}, 424 | "outputs": [], 425 | "source": [ 426 | "# Execute the workflow with a different thread ID to simulate a new session\n", 427 | "config_thread_2 = {\"configurable\": {\"thread_id\": \"2\"}}\n", 428 | "joke_state_thread_2 = chain_with_memory.invoke({\"topic\": \"birds\"}, config_thread_2)\n", 429 | "\n", 430 | "# Display the result, which should show that it found the joke from the previous thread in memory\n", 431 | "console.print(\"\\n[bold yellow]Workflow Result (Thread 2):[/bold yellow]\")\n", 432 | "pprint(joke_state_thread_2)" 433 | ] 434 | }, 435 | { 436 | "cell_type": "markdown", 437 | "id": "b1c2d3e4", 438 | "metadata": {}, 439 | "source": [ 440 | "### Selecting Context: State, Memory, RAG, and Tools\n", 441 | "\n", 442 | "The second principle is **selecting** context. Once context is written, agents need to be able to retrieve the *most relevant* pieces of information for the current task. This prevents context window overflow and keeps the agent focused.\n", 443 | "\n", 444 | "![Second Component of CE](https://cdn-images-1.medium.com/max/1000/1*VZiHtQ_8AlNdV3HIMrbBZA.png)\n", 445 | "\n", 446 | "We will explore four ways to select context:\n", 447 | "1. **From the Scratchpad (State):** Selecting data written in the current session.\n", 448 | "2. **From Long-Term Memory:** Retrieving data from past sessions.\n", 449 | "3. 
**From Knowledge (RAG):** Using Retrieval-Augmented Generation to fetch information from documents.\n", 450 | "4. **From Tools (Tool-RAG):** Using RAG to select the best tool for a job." 451 | ] 452 | }, 453 | { 454 | "cell_type": "markdown", 455 | "id": "f5g6h7i8j9", 456 | "metadata": {}, 457 | "source": [ 458 | "#### Scratchpad Selection Approach\n", 459 | "How you select context from a scratchpad depends on its implementation. Since our scratchpad is the agent's runtime `State` object, we (the developer) decide which parts of the state to share with the agent at each step. This gives fine-grained control.\n", 460 | "\n", 461 | "Let's create a two-step workflow. The first node generates a joke (writes to state). The second node *selects* that joke from the state and improves it." 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": null, 467 | "id": "k1l2m3n4", 468 | "metadata": {}, 469 | "outputs": [], 470 | "source": [ 471 | "# We need a state that can hold the original and the improved joke\n", 472 | "class JokeImprovementState(TypedDict):\n", 473 | " topic: str\n", 474 | " joke: str\n", 475 | " improved_joke: str\n", 476 | "\n", 477 | "def improve_joke(state: JokeImprovementState) -> dict[str, str]:\n", 478 | " \"\"\"Improve an existing joke by adding wordplay.\n", 479 | " \n", 480 | " This demonstrates selecting context from state - we read the existing\n", 481 | " joke from state and use it to generate an improved version.\n", 482 | " \n", 483 | " Args:\n", 484 | " state: Current state containing the original joke.\n", 485 | " \n", 486 | " Returns:\n", 487 | " A dictionary with the improved joke.\n", 488 | " \"\"\"\n", 489 | " initial_joke = state[\"joke\"]\n", 490 | " print(f\"Initial joke selected from state: {initial_joke[:50]}...\")\n", 491 | " \n", 492 | " # Select the joke from state to present it to the LLM\n", 493 | " msg = llm.invoke(f\"Make this joke funnier by adding wordplay: {initial_joke}\")\n", 494 | " return {\"improved_joke\": msg.content}\n", 495 | "\n", 496 | "# --- Build the two-step workflow ---\n", 497 | "selection_workflow = StateGraph(JokeImprovementState)\n", 498 | "\n", 499 | "# Add the initial joke generation node (reusing from before)\n", 500 | "selection_workflow.add_node(\"generate_joke\", generate_joke)\n", 501 | "# Add the new improvement node\n", 502 | "selection_workflow.add_node(\"improve_joke\", improve_joke)\n", 503 | "\n", 504 | "# Connect nodes in sequence\n", 505 | "selection_workflow.add_edge(START, \"generate_joke\")\n", 506 | "selection_workflow.add_edge(\"generate_joke\", \"improve_joke\")\n", 507 | "selection_workflow.add_edge(\"improve_joke\", END)\n", 508 | "\n", 509 | "# Compile the workflow\n", 510 | "selection_chain = selection_workflow.compile()\n", 511 | "\n", 512 | "# Visualize the new graph\n", 513 | "display(Image(selection_chain.get_graph().draw_mermaid_png()))" 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": null, 519 | "id": "o5p6q7r8", 520 | "metadata": {}, 521 | "outputs": [], 522 | "source": [ 523 | "# Execute the workflow to see context selection in action\n", 524 | "joke_improvement_state = selection_chain.invoke({\"topic\": \"computers\"})\n", 525 | "\n", 526 | "# Display the final state with rich formatting\n", 527 | "console.print(\"\\n[bold blue]Final Joke Improvement State:[/bold blue]\")\n", 528 | "pprint(joke_improvement_state)" 529 | ] 530 | }, 531 | { 532 | "cell_type": "markdown", 533 | "id": "s9t0u1v2", 534 | "metadata": {}, 535 | "source": [ 536 | "#### 
Memory Selection Ability\n", 537 | "If agents can save memories, they also need to select relevant memories for the task at hand. This is useful for recalling:\n", 538 | "- **Episodic memories:** Few-shot examples of desired behavior.\n", 539 | "- **Procedural memories:** Instructions to guide behavior.\n", 540 | "- **Semantic memories:** Facts or relationships for task-relevant context.\n", 541 | "\n", 542 | "In our previous example, we wrote to the `InMemoryStore`. Now, we can select context from it using the `store.get()` method to pull relevant state into our workflow. Let's create a node that selects the previously stored joke and tries to generate a *different* one." 543 | ] 544 | }, 545 | { 546 | "cell_type": "code", 547 | "execution_count": null, 548 | "id": "w3x4y5z6", 549 | "metadata": {}, 550 | "outputs": [], 551 | "source": [ 552 | "# Re-initialize storage components for this example\n", 553 | "checkpointer_select = InMemorySaver()\n", 554 | "memory_store_select = InMemoryStore()\n", 555 | "# Pre-populate the store with a joke for selection\n", 556 | "memory_store_select.put(namespace, \"last_joke\", {\"joke\": \"Why was the computer cold? Because it left its Windows open!\"})\n", 557 | "\n", 558 | "def generate_different_joke(state: State, store: BaseStore) -> dict[str, str]:\n", 559 | " \"\"\"Generate a joke with memory-aware context selection.\n", 560 | " \n", 561 | " This function demonstrates selecting context from memory before\n", 562 | " generating new content, ensuring it doesn't repeat itself.\n", 563 | " \n", 564 | " Args:\n", 565 | " state: Current state containing the topic\n", 566 | " store: Memory store for persistent context\n", 567 | " \n", 568 | " Returns:\n", 569 | " Dictionary with the newly generated joke\n", 570 | " \"\"\"\n", 571 | " # Select prior joke from memory if it exists\n", 572 | " prior_joke_item = store.get(namespace, \"last_joke\")\n", 573 | " prior_joke_text = \"None\"\n", 574 | " if prior_joke_item:\n", 575 | " prior_joke_text = prior_joke_item.value[\"joke\"]\n", 576 | " print(f\"Selected prior joke from memory: {prior_joke_text}\")\n", 577 | " else:\n", 578 | " print(\"No prior joke found in memory.\")\n", 579 | "\n", 580 | " # Generate a new joke that differs from the prior one\n", 581 | " prompt = (\n", 582 | " f\"Write a short joke about {state['topic']}, \"\n", 583 | " f\"but make it different from this prior joke: '{prior_joke_text}'\"\n", 584 | " )\n", 585 | " msg = llm.invoke(prompt)\n", 586 | "\n", 587 | " # Store the new joke in memory for future context selection\n", 588 | " store.put(namespace, \"last_joke\", {\"joke\": msg.content})\n", 589 | "\n", 590 | " return {\"joke\": msg.content}\n", 591 | "\n", 592 | "# Build the memory-aware workflow\n", 593 | "memory_selection_workflow = StateGraph(State)\n", 594 | "memory_selection_workflow.add_node(\"generate_joke\", generate_different_joke)\n", 595 | "memory_selection_workflow.add_edge(START, \"generate_joke\")\n", 596 | "memory_selection_workflow.add_edge(\"generate_joke\", END)\n", 597 | "\n", 598 | "# Compile with both checkpointing and memory store\n", 599 | "memory_selection_chain = memory_selection_workflow.compile(checkpointer=checkpointer_select, store=memory_store_select)\n", 600 | "\n", 601 | "# Execute the workflow\n", 602 | "config = {\"configurable\": {\"thread_id\": \"3\"}}\n", 603 | "new_joke_state = memory_selection_chain.invoke({\"topic\": \"computers\"}, config)\n", 604 | "\n", 605 | "console.print(\"\\n[bold green]Memory Selection Workflow Final State:[/bold 
green]\")\n", 606 | "pprint(new_joke_state)" 607 | ] 608 | }, 609 | { 610 | "cell_type": "markdown", 611 | "id": "a7b8c9d0", 612 | "metadata": {}, 613 | "source": [ 614 | "#### Advantage of LangGraph BigTool Calling (Tool Selection)\n", 615 | "Agents use tools, but giving them too many can cause confusion, especially when tool descriptions overlap. A solution is to use RAG on tool descriptions to fetch only the most relevant tools for a task.\n", 616 | "\n", 617 | "> According to [recent research](https://arxiv.org/abs/2505.03275), this improves tool selection accuracy by up to 3x.\n", 618 | "\n", 619 | "The `langgraph-bigtool` library is ideal for this. It applies semantic similarity search over tool descriptions to select the most relevant ones. Let’s demonstrate by creating an agent with all functions from Python’s built-in `math` library and see how it selects the correct one." 620 | ] 621 | }, 622 | { 623 | "cell_type": "code", 624 | "execution_count": null, 625 | "id": "e1f2g3h4", 626 | "metadata": {}, 627 | "outputs": [], 628 | "source": [ 629 | "# Import necessary libraries for this example\n", 630 | "import math\n", 631 | "import types\n", 632 | "import uuid\n", 633 | "\n", 634 | "from langchain.embeddings import init_embeddings\n", 635 | "from langgraph_bigtool import create_agent\n", 636 | "from langgraph_bigtool.utils import convert_positional_only_function_to_tool\n", 637 | "from utils import format_messages # A helper from the provided utils.py\n", 638 | "\n", 639 | "# Ensure OpenAI API key is set for embeddings\n", 640 | "if \"OPENAI_API_KEY\" not in os.environ:\n", 641 | " os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Provide your OpenAI API key: \")\n", 642 | "\n", 643 | "# --- 1. Collect and Prepare Tools ---\n", 644 | "# Collect all built-in functions from the `math` module\n", 645 | "all_math_tools = []\n", 646 | "for function_name in dir(math):\n", 647 | " function = getattr(math, function_name)\n", 648 | " if isinstance(function, types.BuiltinFunctionType):\n", 649 | " # This handles an idiosyncrasy of the `math` library's function signatures\n", 650 | " if tool := convert_positional_only_function_to_tool(function):\n", 651 | " all_math_tools.append(tool)\n", 652 | "\n", 653 | "# Create a registry mapping unique IDs to each tool instance\n", 654 | "tool_registry = {str(uuid.uuid4()): tool for tool in all_math_tools}\n", 655 | "\n", 656 | "# --- 2. Index Tools for Semantic Search ---\n", 657 | "# Initialize the embeddings model\n", 658 | "embeddings = init_embeddings(\"openai:text-embedding-3-small\")\n", 659 | "\n", 660 | "# Set up an in-memory store configured for vector search on tool descriptions\n", 661 | "tool_store = InMemoryStore(\n", 662 | " index={\n", 663 | " \"embed\": embeddings,\n", 664 | " \"dims\": 1536, # Dimension for text-embedding-3-small\n", 665 | " \"fields\": [\"description\"],\n", 666 | " }\n", 667 | ")\n", 668 | "\n", 669 | "# Index each tool's name and description into the store\n", 670 | "for tool_id, tool in tool_registry.items():\n", 671 | " tool_store.put(\n", 672 | " (\"tools\",), # A namespace for tools\n", 673 | " tool_id,\n", 674 | " {\"description\": f\"{tool.name}: {tool.description}\"},\n", 675 | " )\n", 676 | "\n", 677 | "# --- 3. 
Create and Compile the Agent ---\n", 678 | "# The create_agent function from langgraph-bigtool sets up the agent logic\n", 679 | "builder = create_agent(llm, tool_registry)\n", 680 | "bigtool_agent = builder.compile(store=tool_store)\n", 681 | "\n", 682 | "display(Image(bigtool_agent.get_graph().draw_mermaid_png()))" 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "execution_count": null, 688 | "id": "i5j6k7l8", 689 | "metadata": {}, 690 | "outputs": [], 691 | "source": [ 692 | "# --- 4. Invoke the Agent ---\n", 693 | "# Define the query for the agent. This requires selecting the correct math tool.\n", 694 | "query = \"Use available tools to calculate arc cosine of 0.5.\"\n", 695 | "\n", 696 | "# Invoke the agent. It will first search its tools, select 'acos', and then execute it.\n", 697 | "result = bigtool_agent.invoke({\"messages\": query})\n", 698 | "\n", 699 | "# Format and display the final messages from the agent's execution.\n", 700 | "# The output will show the agent's thought process: searching, finding, and using the tool.\n", 701 | "format_messages(result['messages'])" 702 | ] 703 | }, 704 | { 705 | "cell_type": "markdown", 706 | "id": "m9n0o1p2", 707 | "metadata": {}, 708 | "source": [ 709 | "#### RAG with Contextual Engineering (Knowledge Selection)\n", 710 | "[RAG (Retrieval-Augmented Generation)](https://github.com/langchain-ai/rag-from-scratch) is a cornerstone of context engineering. It allows agents to select relevant knowledge from vast document stores.\n", 711 | "\n", 712 | "In LangGraph, this is typically done by creating a retrieval tool. Let's build a RAG agent that can answer questions about Lilian Weng’s blog posts." 713 | ] 714 | }, 715 | { 716 | "cell_type": "code", 717 | "execution_count": null, 718 | "id": "q3r4s5t6", 719 | "metadata": {}, 720 | "outputs": [], 721 | "source": [ 722 | "# Import necessary components for RAG\n", 723 | "from langchain_community.document_loaders import WebBaseLoader\n", 724 | "from langchain_core.vectorstores import InMemoryVectorStore\n", 725 | "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", 726 | "from langchain.tools.retriever import create_retriever_tool\n", 727 | "from langgraph.graph import MessagesState\n", 728 | "from langchain_core.messages import SystemMessage, ToolMessage\n", 729 | "from typing_extensions import Literal\n", 730 | "\n", 731 | "# --- 1. Load and Chunk Documents ---\n", 732 | "# Define the URLs for Lilian Weng's blog posts\n", 733 | "urls = [\n", 734 | " \"https://lilianweng.github.io/posts/2025-05-01-thinking/\",\n", 735 | " \"https://lilianweng.github.io/posts/2024-11-28-reward-hacking/\",\n", 736 | " \"https://lilianweng.github.io/posts/2024-07-07-hallucination/\",\n", 737 | " \"https://lilianweng.github.io/posts/2024-04-12-diffusion-video/\",\n", 738 | "]\n", 739 | "docs = [WebBaseLoader(url).load() for url in urls]\n", 740 | "docs_list = [item for sublist in docs for item in sublist]\n", 741 | "\n", 742 | "# Split the documents into smaller chunks for effective retrieval\n", 743 | "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n", 744 | " chunk_size=2000, chunk_overlap=50\n", 745 | ")\n", 746 | "doc_splits = text_splitter.split_documents(docs_list)\n", 747 | "\n", 748 | "# --- 2. 
Create Vector Store and Retriever Tool ---\n", 749 | "vectorstore = InMemoryVectorStore.from_documents(documents=doc_splits, embedding=embeddings)\n", 750 | "retriever = vectorstore.as_retriever()\n", 751 | "\n", 752 | "# Create a retriever tool that the agent can call\n", 753 | "retriever_tool = create_retriever_tool(\n", 754 | " retriever,\n", 755 | " \"retrieve_blog_posts\",\n", 756 | " \"Search and return information about Lilian Weng blog posts.\",\n", 757 | ")\n", 758 | "\n", 759 | "rag_tools = [retriever_tool]\n", 760 | "rag_tools_by_name = {tool.name: tool for tool in rag_tools}\n", 761 | "llm_with_rag_tools = llm.bind_tools(rag_tools)" 762 | ] 763 | }, 764 | { 765 | "cell_type": "markdown", 766 | "id": "u7v8w9x0", 767 | "metadata": {}, 768 | "source": [ 769 | "Now we define the graph components for our RAG agent: the prompt, the nodes for calling the LLM and the tool, and a conditional edge to create a loop." 770 | ] 771 | }, 772 | { 773 | "cell_type": "code", 774 | "execution_count": null, 775 | "id": "y1z2a3b4", 776 | "metadata": {}, 777 | "outputs": [], 778 | "source": [ 779 | "# --- 3. Define the RAG Agent Graph ---\n", 780 | "rag_prompt = \"\"\"You are a helpful assistant tasked with retrieving information from a series of technical blog posts by Lilian Weng. \n", 781 | "Clarify the scope of research with the user before using your retrieval tool to gather context. Reflect on any context you fetch, and\n", 782 | "proceed until you have sufficient context to answer the user's research request.\"\"\"\n", 783 | "\n", 784 | "def rag_llm_call(state: MessagesState):\n", 785 | " \"\"\"Node to call the LLM. The LLM decides whether to call a tool or generate a final answer.\"\"\"\n", 786 | " messages_with_prompt = [SystemMessage(content=rag_prompt)] + state[\"messages\"]\n", 787 | " response = llm_with_rag_tools.invoke(messages_with_prompt)\n", 788 | " return {\"messages\": [response]}\n", 789 | "\n", 790 | "def rag_tool_node(state: dict):\n", 791 | " \"\"\"Node to perform the tool call and return the observation.\"\"\"\n", 792 | " last_message = state[\"messages\"][-1]\n", 793 | " result = []\n", 794 | " for tool_call in last_message.tool_calls:\n", 795 | " tool = rag_tools_by_name[tool_call[\"name\"]]\n", 796 | " observation = tool.invoke(tool_call[\"args\"])\n", 797 | " result.append(ToolMessage(content=str(observation), tool_call_id=tool_call[\"id\"]))\n", 798 | " return {\"messages\": result}\n", 799 | "\n", 800 | "def should_continue_rag(state: MessagesState) -> Literal[\"Action\", END]:\n", 801 | " \"\"\"Conditional edge to decide the next step. If the LLM made a tool call, route to the tool node. 
Otherwise, end.\"\"\"\n", 802 | " if state[\"messages\"][-1].tool_calls:\n", 803 | " return \"Action\"\n", 804 | " return END\n", 805 | "\n", 806 | "# Build the RAG agent workflow\n", 807 | "rag_agent_builder = StateGraph(MessagesState)\n", 808 | "rag_agent_builder.add_node(\"llm_call\", rag_llm_call)\n", 809 | "rag_agent_builder.add_node(\"Action\", rag_tool_node)\n", 810 | "rag_agent_builder.set_entry_point(\"llm_call\")\n", 811 | "rag_agent_builder.add_conditional_edges(\"llm_call\", should_continue_rag, {\"Action\": \"Action\", END: END})\n", 812 | "rag_agent_builder.add_edge(\"Action\", \"llm_call\")\n", 813 | "\n", 814 | "rag_agent = rag_agent_builder.compile()\n", 815 | "display(Image(rag_agent.get_graph(xray=True).draw_mermaid_png()))" 816 | ] 817 | }, 818 | { 819 | "cell_type": "code", 820 | "execution_count": null, 821 | "id": "c5d6e7f8", 822 | "metadata": {}, 823 | "outputs": [], 824 | "source": [ 825 | "# --- 4. Invoke the RAG Agent ---\n", 826 | "query = \"What are the types of reward hacking discussed in the blogs?\"\n", 827 | "result = rag_agent.invoke({\"messages\": [(\"user\", query)]})\n", 828 | "format_messages(result['messages'])" 829 | ] 830 | }, 831 | { 832 | "cell_type": "markdown", 833 | "id": "g9h0i1j2", 834 | "metadata": {}, 835 | "source": [ 836 | "### Compressing Context: Summarization Strategies\n", 837 | "\n", 838 | "The third principle is **compressing** context. Agent interactions can span hundreds of turns and involve token-heavy tool calls. Summarization is a common and effective way to manage this, reducing token count while retaining essential information.\n", 839 | "\n", 840 | "![Third Component of CE](https://cdn-images-1.medium.com/max/1000/1*Xu76qgF1u2G3JipeIgHo5Q.png)\n", 841 | "\n", 842 | "We can add summarization at different points in the agent's workflow:\n", 843 | "- At the end of a conversation to create a summary of the entire interaction.\n", 844 | "- After a token-heavy tool call to compress its output before it enters the agent's scratchpad.\n", 845 | "\n", 846 | "Let's explore both approaches." 847 | ] 848 | }, 849 | { 850 | "cell_type": "markdown", 851 | "id": "k3l4m5n6", 852 | "metadata": {}, 853 | "source": [ 854 | "#### Approach 1: Summarizing the Entire Conversation\n", 855 | "\n", 856 | "First, we'll build an agent that performs its RAG task and then, as a final step, generates a summary of the whole interaction. This can be useful for logging or creating a concise record of the agent's work." 
857 | ] 858 | }, 859 | { 860 | "cell_type": "code", 861 | "execution_count": null, 862 | "id": "o7p8q9r0", 863 | "metadata": {}, 864 | "outputs": [], 865 | "source": [ 866 | "from rich.markdown import Markdown\n", 867 | "\n", 868 | "# Define an extended state that includes a summary field\n", 869 | "class StateWithSummary(MessagesState):\n", 870 | " summary: str\n", 871 | "\n", 872 | "summarization_prompt = \"\"\"Summarize the full chat history and all tool feedback to give an overview of what the user asked about and what the agent did.\"\"\"\n", 873 | "\n", 874 | "def summary_node(state: MessagesState) -> dict:\n", 875 | " \"\"\"Node to generate a summary of the conversation.\"\"\"\n", 876 | " messages = [SystemMessage(content=summarization_prompt)] + state[\"messages\"]\n", 877 | " result = llm.invoke(messages)\n", 878 | " return {\"summary\": result.content}\n", 879 | "\n", 880 | "def should_continue_to_summary(state: MessagesState) -> Literal[\"Action\", \"summary_node\"]:\n", 881 | " \"\"\"Conditional edge to route to tool action or to the final summary node.\"\"\"\n", 882 | " if state[\"messages\"][-1].tool_calls:\n", 883 | " return \"Action\"\n", 884 | " return \"summary_node\"\n", 885 | "\n", 886 | "# Build the workflow with a final summary step\n", 887 | "summary_agent_builder = StateGraph(StateWithSummary)\n", 888 | "summary_agent_builder.add_node(\"llm_call\", rag_llm_call)\n", 889 | "summary_agent_builder.add_node(\"Action\", rag_tool_node)\n", 890 | "summary_agent_builder.add_node(\"summary_node\", summary_node)\n", 891 | "summary_agent_builder.set_entry_point(\"llm_call\")\n", 892 | "summary_agent_builder.add_conditional_edges(\"llm_call\", should_continue_to_summary, {\"Action\": \"Action\", \"summary_node\": \"summary_node\"})\n", 893 | "summary_agent_builder.add_edge(\"Action\", \"llm_call\")\n", 894 | "summary_agent_builder.add_edge(\"summary_node\", END)\n", 895 | "\n", 896 | "summary_agent = summary_agent_builder.compile()\n", 897 | "display(Image(summary_agent.get_graph(xray=True).draw_mermaid_png()))" 898 | ] 899 | }, 900 | { 901 | "cell_type": "code", 902 | "execution_count": null, 903 | "id": "s1t2u3v4", 904 | "metadata": {}, 905 | "outputs": [], 906 | "source": [ 907 | "# Run the agent and display the final summary\n", 908 | "query = \"Why does RL improve LLM reasoning according to the blogs?\"\n", 909 | "result = summary_agent.invoke({\"messages\": [(\"user\", query)]})\n", 910 | "\n", 911 | "console.print(\"\\n[bold green]Final Agent Message:[/bold green]\")\n", 912 | "format_messages([result['messages'][-1]])\n", 913 | "\n", 914 | "console.print(\"\\n[bold purple]Generated Conversation Summary:[/bold purple]\")\n", 915 | "display(Markdown(result[\"summary\"]))" 916 | ] 917 | }, 918 | { 919 | "cell_type": "markdown", 920 | "id": "w5x6y7z8", 921 | "metadata": {}, 922 | "source": [ 923 | "**Note:** While effective, this approach can be token-intensive, as the full, uncompressed tool outputs are passed through the loop. For the query above, this can use over 100k tokens.\n", 924 | "\n", 925 | "#### Approach 2: Compressing Tool Outputs On-the-Fly\n", 926 | "A more efficient approach is to compress the context *before* it enters the agent’s main scratchpad. Let’s update the RAG agent to summarize the tool call output immediately after it's received." 
927 | ] 928 | }, 929 | { 930 | "cell_type": "code", 931 | "execution_count": null, 932 | "id": "a9b0c1d2", 933 | "metadata": {}, 934 | "outputs": [], 935 | "source": [ 936 | "tool_summarization_prompt = \"\"\"You will be provided a document from a RAG system.\n", 937 | "Summarize the document, ensuring to retain all relevant and essential information.\n", 938 | "Your goal is to reduce the size of the document (tokens) to a more manageable size for an agent.\"\"\"\n", 939 | "\n", 940 | "def tool_node_with_summarization(state: dict):\n", 941 | " \"\"\"Performs the tool call and then immediately summarizes the output.\"\"\"\n", 942 | " last_message = state[\"messages\"][-1]\n", 943 | " result = []\n", 944 | " for tool_call in last_message.tool_calls:\n", 945 | " tool = rag_tools_by_name[tool_call[\"name\"]]\n", 946 | " observation = tool.invoke(tool_call[\"args\"])\n", 947 | " \n", 948 | " # Summarize the document before adding it to the state\n", 949 | " summary_msg = llm.invoke([\n", 950 | " SystemMessage(content=tool_summarization_prompt),\n", 951 | " (\"user\", str(observation))\n", 952 | " ])\n", 953 | " \n", 954 | " result.append(ToolMessage(content=summary_msg.content, tool_call_id=tool_call[\"id\"]))\n", 955 | " return {\"messages\": result}\n", 956 | "\n", 957 | "# Build the more efficient workflow\n", 958 | "efficient_agent_builder = StateGraph(MessagesState)\n", 959 | "efficient_agent_builder.add_node(\"llm_call\", rag_llm_call)\n", 960 | "efficient_agent_builder.add_node(\"Action\", tool_node_with_summarization)\n", 961 | "efficient_agent_builder.set_entry_point(\"llm_call\")\n", 962 | "efficient_agent_builder.add_conditional_edges(\"llm_call\", should_continue_rag, {\"Action\": \"Action\", END: END})\n", 963 | "efficient_agent_builder.add_edge(\"Action\", \"llm_call\")\n", 964 | "\n", 965 | "efficient_agent = efficient_agent_builder.compile()\n", 966 | "display(Image(efficient_agent.get_graph(xray=True).draw_mermaid_png()))" 967 | ] 968 | }, 969 | { 970 | "cell_type": "code", 971 | "execution_count": null, 972 | "id": "e3f4g5h6", 973 | "metadata": {}, 974 | "outputs": [], 975 | "source": [ 976 | "# Run the same query with the efficient agent\n", 977 | "query = \"Why does RL improve LLM reasoning according to the blogs?\"\n", 978 | "result = efficient_agent.invoke({\"messages\": [(\"user\", query)]})\n", 979 | "\n", 980 | "console.print(\"\\n[bold green]Efficient Agent Conversation Flow:[/bold green]\")\n", 981 | "format_messages(result['messages'])" 982 | ] 983 | }, 984 | { 985 | "cell_type": "markdown", 986 | "id": "i7j8k9l0", 987 | "metadata": {}, 988 | "source": [ 989 | "**Result:** This simple change can cut token usage by nearly half, making the agent far more efficient and cost-effective, demonstrating the power of on-the-fly context compression." 990 | ] 991 | }, 992 | { 993 | "cell_type": "markdown", 994 | "id": "m1n2o3p4", 995 | "metadata": {}, 996 | "source": [ 997 | "### Isolating Context: Sub-Agents and Sandboxing\n", 998 | "The final principle is **isolating** context. This involves splitting up the context to prevent different tasks or types of information from interfering with each other. This is crucial for complex, multi-step problems.\n", 999 | "\n", 1000 | "![Fourth Component of CE](https://cdn-images-1.medium.com/max/1000/1*-b9BLPkLHkYsy2iLQIdxUg.png)\n", 1001 | "\n", 1002 | "We will look at two powerful isolation techniques:\n", 1003 | "1. **Sub-Agent Architectures:** Using multiple, specialized agents managed by a supervisor.\n", 1004 | "2. 
**Sandboxed Environments:** Executing code in a secure, isolated environment." 1005 | ] 1006 | }, 1007 | { 1008 | "cell_type": "markdown", 1009 | "id": "q5r6s7t8", 1010 | "metadata": {}, 1011 | "source": [ 1012 | "#### Isolating Context using Sub-Agents Architecture\n", 1013 | "\n", 1014 | "A common way to isolate context is by splitting tasks across sub-agents. OpenAI's [Swarm](https://github.com/openai/swarm) library was designed for this \"separation of concerns,\" where each agent manages a specific sub-task with its own tools, instructions, and context window.\n", 1015 | "\n", 1016 | "> *Subagents operate in parallel with their own context windows, exploring different aspects of the question simultaneously.* - Anthropic\n", 1017 | "\n", 1018 | "LangGraph supports this through a **supervisor** architecture. The supervisor delegates tasks to specialized sub-agents, each running in its own isolated context window. Let’s build a supervisor that manages a `math_expert` and a `research_expert`." 1019 | ] 1020 | }, 1021 | { 1022 | "cell_type": "code", 1023 | "execution_count": null, 1024 | "id": "u9v0w1x2", 1025 | "metadata": {}, 1026 | "outputs": [], 1027 | "source": [ 1028 | "# Import prebuilt agent creators\n", 1029 | "from langgraph.prebuilt import create_react_agent\n", 1030 | "from langgraph_supervisor import create_supervisor\n", 1031 | "\n", 1032 | "# --- 1. Define Tools for Each Agent ---\n", 1033 | "def add(a: float, b: float) -> float:\n", 1034 | " \"\"\"Add two numbers.\"\"\"\n", 1035 | " return a + b\n", 1036 | "\n", 1037 | "def multiply(a: float, b: float) -> float:\n", 1038 | " \"\"\"Multiply two numbers.\"\"\"\n", 1039 | " return a * b\n", 1040 | "\n", 1041 | "def web_search(query: str) -> str:\n", 1042 | " \"\"\"Mock web search function that returns FAANG company headcounts.\"\"\"\n", 1043 | " return (\n", 1044 | " \"Here are the headcounts for each of the FAANG companies in 2024:\\n\"\n", 1045 | " \"1. **Facebook (Meta)**: 67,317 employees.\\n\"\n", 1046 | " \"2. **Apple**: 164,000 employees.\\n\"\n", 1047 | " \"3. **Amazon**: 1,551,000 employees.\\n\"\n", 1048 | " \"4. **Netflix**: 14,000 employees.\\n\"\n", 1049 | " \"5. **Google (Alphabet)**: 181,269 employees.\"\n", 1050 | " )\n", 1051 | "\n", 1052 | "# --- 2. Create Specialized Agents ---\n", 1053 | "# Each agent has its own tools and instructions, isolating its context\n", 1054 | "math_agent = create_react_agent(\n", 1055 | " model=llm,\n", 1056 | " tools=[add, multiply],\n", 1057 | " name=\"math_expert\",\n", 1058 | " prompt=\"You are a math expert. Always use one tool at a time.\"\n", 1059 | ")\n", 1060 | "\n", 1061 | "research_agent = create_react_agent(\n", 1062 | " model=llm,\n", 1063 | " tools=[web_search],\n", 1064 | " name=\"research_expert\",\n", 1065 | " prompt=\"You are a world class researcher with access to web search. Do not do any math.\"\n", 1066 | ")\n", 1067 | "\n", 1068 | "# --- 3. Create Supervisor Workflow ---\n", 1069 | "# The supervisor coordinates the agents\n", 1070 | "supervisor_workflow = create_supervisor(\n", 1071 | " [research_agent, math_agent],\n", 1072 | " model=llm,\n", 1073 | " prompt=(\n", 1074 | " \"You are a team supervisor managing a research expert and a math expert. \"\n", 1075 | " \"Delegate tasks to the appropriate agent to answer the user's query. \"\n", 1076 | " \"For current events or facts, use research_agent. 
\"\n", 1077 | " \"For math problems, use math_agent.\"\n", 1078 | " )\n", 1079 | ")\n", 1080 | "\n", 1081 | "# Compile the multi-agent application\n", 1082 | "multi_agent_app = supervisor_workflow.compile()" 1083 | ] 1084 | }, 1085 | { 1086 | "cell_type": "code", 1087 | "execution_count": null, 1088 | "id": "y3z4a5b6", 1089 | "metadata": {}, 1090 | "outputs": [], 1091 | "source": [ 1092 | "# --- 4. Execute the Multi-Agent Workflow ---\n", 1093 | "result = multi_agent_app.invoke({\n", 1094 | " \"messages\": [\n", 1095 | " {\n", 1096 | " \"role\": \"user\",\n", 1097 | " \"content\": \"what's the combined headcount of the FAANG companies in 2024?\"\n", 1098 | " }\n", 1099 | " ]\n", 1100 | "})\n", 1101 | "\n", 1102 | "# Format and display the results, showing the delegation in action\n", 1103 | "format_messages(result['messages'])" 1104 | ] 1105 | }, 1106 | { 1107 | "cell_type": "markdown", 1108 | "id": "c7d8e9f0", 1109 | "metadata": {}, 1110 | "source": [ 1111 | "#### Isolation using Sandboxed Environments\n", 1112 | "Another powerful way to isolate context is to use a sandboxed execution environment. Instead of the LLM just calling tools via JSON, a `CodeAgent` can write and execute code in a secure sandbox. The results are then returned to the LLM.\n", 1113 | "\n", 1114 | "This keeps heavy data or complex state (like variables in a script) outside the LLM’s token limit, isolating it in the environment.\n", 1115 | "\n", 1116 | "The `langchain-sandbox` provides a secure environment for executing untrusted Python code using Pyodide (Python compiled to WebAssembly). We can add this as a tool to any LangGraph agent.\n", 1117 | "\n", 1118 | "**Note:** Deno is required. Install it from: https://docs.deno.com/runtime/getting_started/installation/" 1119 | ] 1120 | }, 1121 | { 1122 | "cell_type": "code", 1123 | "execution_count": null, 1124 | "id": "g1h2i3j4", 1125 | "metadata": {}, 1126 | "outputs": [], 1127 | "source": [ 1128 | "# Import the sandbox tool and a prebuilt agent\n", 1129 | "from langchain_sandbox import PyodideSandboxTool\n", 1130 | "from langgraph.prebuilt import create_react_agent\n", 1131 | "\n", 1132 | "# Create a sandbox tool. allow_net=True lets it install packages if needed.\n", 1133 | "sandbox_tool = PyodideSandboxTool(allow_net=True)\n", 1134 | "\n", 1135 | "# Create a ReAct agent equipped with the sandbox tool\n", 1136 | "sandbox_agent = create_react_agent(llm, tools=[sandbox_tool])\n", 1137 | "\n", 1138 | "# Execute a query that the agent can solve by writing and running Python code\n", 1139 | "result = await sandbox_agent.ainvoke(\n", 1140 | " {\"messages\": [{\"role\": \"user\", \"content\": \"what's 5 + 7?\"}]},\n", 1141 | ")\n", 1142 | "\n", 1143 | "# Format and display the results\n", 1144 | "format_messages(result['messages'])" 1145 | ] 1146 | }, 1147 | { 1148 | "cell_type": "markdown", 1149 | "id": "k5l6m7n8", 1150 | "metadata": {}, 1151 | "source": [ 1152 | "#### State Isolation in LangGraph\n", 1153 | "Finally, it's important to remember that the agent’s **runtime state object** is itself a powerful way to isolate context. By designing a state schema with different fields, you can control what the LLM sees.\n", 1154 | "\n", 1155 | "For example, one field (like `messages`) can be shown to the LLM on each turn, while other fields store information (like raw tool outputs or intermediate calculations) that remains isolated until a specific node needs to access it. 
You’ve seen many examples of this throughout this notebook, where we explicitly read from and write to specific fields of the state object." 1156 | ] 1157 | }, 1158 | { 1159 | "cell_type": "markdown", 1160 | "id": "o9p0q1r2", 1161 | "metadata": {}, 1162 | "source": [ 1163 | "### Summarizing Everything\n", 1164 | "Let’s summarize what we have done so far:\n", 1165 | "\n", 1166 | "* **Write:** We used LangGraph `StateGraph` to create a **\"scratchpad\"** for short-term memory and an `InMemoryStore` for long-term memory, allowing our agent to store and recall information.\n", 1167 | "* **Select:** We demonstrated how to selectively pull relevant information from the agent’s state and long-term memory. This included using Retrieval-Augmented Generation (`RAG`) to find specific knowledge and `langgraph-bigtool` to select the right tool from many options.\n", 1168 | "* **Compress:** To manage long conversations and token-heavy tool outputs, we implemented summarization. We showed how to compress `RAG` results on-the-fly to make the agent more efficient and reduce token usage.\n", 1169 | "* **Isolate:** We explored keeping contexts separate to avoid confusion by building a multi-agent system with a supervisor that delegates tasks to specialized sub-agents and by using sandboxed environments to run code.\n", 1170 | "\n", 1171 | "All these techniques fall under **“Contextual Engineering”** — a strategy to improve AI agents by carefully managing their working memory (`context`) to make them more efficient, accurate, and capable of handling complex, long-running tasks." 1172 | ] 1173 | } 1174 | ], 1175 | "metadata": { 1176 | "kernelspec": { 1177 | "display_name": "Python 3 (ipykernel)", 1178 | "language": "python", 1179 | "name": "python3" 1180 | }, 1181 | "language_info": { 1182 | "codemirror_mode": { 1183 | "name": "ipython", 1184 | "version": 3 1185 | }, 1186 | "file_extension": ".py", 1187 | "mimetype": "text/x-python", 1188 | "name": "python", 1189 | "nbconvert_exporter": "python", 1190 | "pygments_lexer": "ipython3", 1191 | "version": "3.10.0" 1192 | } 1193 | }, 1194 | "nbformat": 4, 1195 | "nbformat_minor": 5 1196 | } -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # Core LangGraph and LangChain dependencies 2 | langgraph>=0.2.0 3 | langchain>=0.3.0 4 | langchain-openai>=0.2.0 5 | langchain-anthropic>=0.3.0 6 | langchain-sandbox>=0.0.6 7 | langgraph_bigtool>=0.0.3 8 | langchain_community>=0.3.27 9 | langgraph_supervisor>=0.0.27 10 | langgraph_swarm>=0.0.12 11 | 12 | # Data validation and type checking 13 | pydantic>=2.0.0 14 | 15 | # Optional dependencies for examples 16 | pandas>=2.0.0 17 | numpy>=1.24.0 18 | matplotlib>=3.7.0 19 | httpx>=0.24.0 20 | rich>=14.0.0 21 | 22 | # Jupyter notebook support 23 | jupyter>=1.0.0 24 | ipykernel>=6.20.0 -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | Utility functions for formatting and displaying conversation messages. 3 | 4 | This module provides functions to render message objects in a structured 5 | and visually appealing way in the console using the `rich` library. 
6 | """ 7 | 8 | # Import necessary standard libraries 9 | import json 10 | # Import typing hints for clear function signatures 11 | from typing import Any, List 12 | 13 | # Import components from the rich library for enhanced console output 14 | from rich.console import Console 15 | from rich.panel import Panel 16 | 17 | # Initialize a global Console object from the rich library. 18 | # This object will be used for all styled output to the terminal. 19 | console = Console() 20 | 21 | 22 | def format_message_content(message: Any) -> str: 23 | """ 24 | Converts the content of a message object into a displayable string. 25 | 26 | This function handles simple string content as well as complex list-based 27 | content, such as tool calls, by parsing and formatting them appropriately. 28 | 29 | Args: 30 | message: A message object that has a 'content' attribute. 31 | 32 | Returns: 33 | A formatted string representation of the message content. 34 | """ 35 | # Retrieve the content from the message object. 36 | content = message.content 37 | 38 | # Check if the content is a simple string. 39 | if isinstance(content, str): 40 | # If it is, return it directly. 41 | return content 42 | # Check if the content is a list, which often indicates complex data like tool calls. 43 | elif isinstance(content, list): 44 | # Initialize an empty list to hold formatted parts of the content. 45 | parts = [] 46 | # Iterate over each item in the content list. 47 | for item in content: 48 | # If the item is a simple text block. 49 | if item.get("type") == "text": 50 | # Append the text directly to our parts list. 51 | parts.append(item["text"]) 52 | # If the item represents a tool being used. 53 | elif item.get("type") == "tool_use": 54 | # Format a string to indicate a tool call, including the tool's name. 55 | tool_call_str = f"\n🔧 Tool Call: {item.get('name')}" 56 | # Format the tool's input arguments as a pretty-printed JSON string. 57 | tool_args_str = f" Args: {json.dumps(item.get('input', {}), indent=2)}" 58 | # Add the formatted tool call strings to our parts list. 59 | parts.extend([tool_call_str, tool_args_str]) 60 | # Join all the formatted parts into a single string, separated by newlines. 61 | return "\n".join(parts) 62 | # For any other type of content. 63 | else: 64 | # Convert the content to a string as a fallback. 65 | return str(content) 66 | 67 | 68 | def format_messages(messages: List[Any]) -> None: 69 | """ 70 | Formats and displays a list of messages using rich Panels. 71 | 72 | Each message is rendered inside a styled panel, with a title and border 73 | color that corresponds to its role (e.g., Human, AI, Tool). 74 | 75 | Args: 76 | messages: A list of message objects to be displayed. 77 | """ 78 | # Iterate through each message object in the provided list. 79 | for m in messages: 80 | # Determine the message type by getting the class name and removing "Message". 81 | msg_type = m.__class__.__name__.replace("Message", "") 82 | # Get the formatted string content of the message using our helper function. 83 | content = format_message_content(m) 84 | 85 | # Define default arguments for the rich Panel. 86 | panel_args = {"title": f"📝 {msg_type}", "border_style": "white"} 87 | 88 | # Customize panel appearance based on the message type. 89 | # If the message is from a human user. 90 | if msg_type == "Human": 91 | # Update the title and set the border color to blue. 92 | panel_args.update(title="🧑 Human", border_style="blue") 93 | # If the message is from the AI assistant. 
94 | elif msg_type == "Ai": 95 | # Update the title and set the border color to green. 96 | panel_args.update(title="🤖 Assistant", border_style="green") 97 | # If the message is a tool's output. 98 | elif msg_type == "Tool": 99 | # Update the title and set the border color to yellow. 100 | panel_args.update(title="🔧 Tool Output", border_style="yellow") 101 | 102 | # Create a Panel with the formatted content and customized arguments. 103 | # Then, print the panel to the console. 104 | console.print(Panel(content, **panel_args)) 105 | 106 | 107 | def format_message(messages: List[Any]) -> None: 108 | """ 109 | Alias for the format_messages function. 110 | 111 | This provides backward compatibility for any code that might still 112 | use the singular name `format_message`. 113 | 114 | Args: 115 | messages: A list of message objects to be displayed. 116 | """ 117 | # Call the main format_messages function to perform the rendering. 118 | format_messages(messages) --------------------------------------------------------------------------------