├── LICENSE ├── README.md ├── content ├── appendix-13-pre-fetch.md ├── brief-history-of-software.md ├── factor-1-natural-language-to-tool-calls.md ├── factor-10-small-focused-agents.md ├── factor-11-trigger-from-anywhere.md ├── factor-12-stateless-reducer.md ├── factor-2-own-your-prompts.md ├── factor-3-own-your-context-window.md ├── factor-4-tools-are-structured-outputs.md ├── factor-5-unify-execution-state.md ├── factor-6-launch-pause-resume.md ├── factor-7-contact-humans-with-tools.md ├── factor-8-own-your-control-flow.md └── factor-9-compact-errors.md └── img ├── 010-software-dag.png ├── 015-dag-orchestrators.png ├── 020-dags-with-ml.png ├── 025-agent-dag.png ├── 026-agent-dag-lines.png ├── 027-agent-loop-animation.gif ├── 027-agent-loop-animation.mp4 ├── 027-agent-loop-dag.png ├── 027-agent-loop.png ├── 028-micro-agent-dag.png ├── 029-deploybot-high-level.png ├── 030-deploybot-animation.gif ├── 030-deploybot-animation.mp4 ├── 031-deploybot-animation-5.gif ├── 031-deploybot-animation-5.mp4 ├── 031-deploybot-animation.gif ├── 031-deploybot-animation.mp4 ├── 033-deploybot.gif ├── 035-deploybot-conversation.png ├── 040-4-components.png ├── 110-natural-language-tool-calls.png ├── 120-own-your-prompts.png ├── 130-own-your-context-building.png ├── 140-tools-are-just-structured-outputs.png ├── 150-all-state-in-context-window.png ├── 150-unify-state.png ├── 155-unify-state-animation.gif ├── 160-pause-resume-with-simple-apis.png ├── 165-pause-resume-animation.gif ├── 170-contact-humans-with-tools.png ├── 175-outer-loop-agents.png ├── 180-control-flow.png ├── 190-factor-9-errors-static.png ├── 195-factor-9-errors.gif ├── 1a0-small-focused-agents.png ├── 1a5-agent-scope-grow.gif ├── 1b0-trigger-from-anywhere.png ├── 1c0-stateless-reducer.png ├── 1c5-agent-foldl.png └── 220-context-engineering.png /LICENSE: -------------------------------------------------------------------------------- 1 | Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) 2 | 3 | This is a human-readable summary of (and not a substitute for) the license. Disclaimer. 4 | 5 | You are free to: 6 | 7 | - Share — copy and redistribute the material in any medium or format 8 | - Adapt — remix, transform, and build upon the material for any purpose, even commercially. 9 | 10 | This license is acceptable for Free Cultural Works. 11 | 12 | The licensor cannot revoke these freedoms as long as you follow the license terms. 13 | 14 | Under the following terms: 15 | 16 | - Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. 17 | - ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. 18 | - No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits. 19 | 20 | Notices: 21 | 22 | You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation. 23 | 24 | No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material. 
25 | 26 | For the full text of this license, visit https://creativecommons.org/licenses/by-sa/4.0/legalcode -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 7 | 8 | 9 | # 12 Factor Agents - Principles for building reliable LLM applications 10 | 11 | 12 | 13 | *In the spirit of [12 Factor Apps](https://12factor.net/)*. *The source for this project is public at https://github.com/humanlayer/12-factor-agents, and I welcome your feedback and contributions. Let's figure this out together!* 14 | 15 | 16 | 17 |
18 | 19 | License: CC BY-SA 4.0 20 | 21 | Discord Server 22 | 23 | YouTube
 24 | Deep Dive 25 | 26 |
27 | 28 | 29 | 30 | 31 | Screenshot 2025-04-03 at 2 49 07 PM 32 | 33 | 34 | Hi, I'm Dex. I've been [hacking](https://youtu.be/8bIHcttkOTE) on [AI agents](https://theouterloop.substack.com) for [a while](https://humanlayer.dev). 35 | 36 | 37 | **I've tried every agent framework out there**, from the plug-and-play crew/langchains to the "minimalist" smolagents of the world to the "production grade" langraph, griptape, etc. 38 | 39 | **I've talked to a lot of really strong founders**, in and out of YC, who are all building really impressive things with AI. Most of them are rolling the stack themselves. I don't see a lot of frameworks in production customer-facing agents. 40 | 41 | **I've been surprised to find** that most of the products out there billing themselves as "AI Agents" are not all that agentic. A lot of them are mostly deterministic code, with LLM steps sprinkled in at just the right points to make the experience truly magical. 42 | 43 | Agents, at least the good ones, don't follow the ["here's your prompt, here's a bag of tools, loop until you hit the goal"](https://www.anthropic.com/engineering/building-effective-agents#agents) pattern. Rather, they are comprised of mostly just software. 44 | 45 | So, I set out to answer: 46 | 47 | > ### **What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?** 48 | 49 | Welcome to 12-factor agents. As every Chicago mayor since Daley has consistently plastered all over the city's major airports, we're glad you're here. 50 | 51 | *Special thanks to [@iantbutler01](https://github.com/iantbutler01), [@tnm](https://github.com/tnm), [@hellovai](https://www.github.com/hellovai), [@stantonk](https://www.github.com/stantonk), [@balanceiskey](https://www.github.com/balanceiskey), [@AdjectiveAllison](https://www.github.com/AdjectiveAllison), [@pfbyjy](https://www.github.com/pfbyjy), [@a-churchill](https://www.github.com/a-churchill), and the SF MLOps community for early feedback on this guide.* 52 | 53 | ## The Short Version: The 12 Factors 54 | 55 | Even if LLMs [continue to get exponentially more powerful](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md#what-if-llms-get-smarter), there will be core engineering techniques that make LLM-powered software more reliable, more scalable, and easier to maintain. 
56 | 57 | - [How We Got Here: A Brief History of Software](https://github.com/humanlayer/12-factor-agents/blob/main/content/brief-history-of-software.md) 58 | - [Factor 1: Natural Language to Tool Calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-1-natural-language-to-tool-calls.md) 59 | - [Factor 2: Own your prompts](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md) 60 | - [Factor 3: Own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md) 61 | - [Factor 4: Tools are just structured outputs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md) 62 | - [Factor 5: Unify execution state and business state](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md) 63 | - [Factor 6: Launch/Pause/Resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md) 64 | - [Factor 7: Contact humans with tool calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md) 65 | - [Factor 8: Own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md) 66 | - [Factor 9: Compact Errors into Context Window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-9-compact-errors.md) 67 | - [Factor 10: Small, Focused Agents](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md) 68 | - [Factor 11: Trigger from anywhere, meet users where they are](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md) 69 | - [Factor 12: Make your agent a stateless reducer](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-12-stateless-reducer.md) 70 | 71 | ### Visual Nav 72 | 73 | | | | | 74 | |----|----|-----| 75 | |[![factor 1](https://github.com/humanlayer/12-factor-agents/blob/main/img/110-natural-language-tool-calls.png)](./content/factor-1-natural-language-to-tool-calls.md) | [![factor 2](https://github.com/humanlayer/12-factor-agents/blob/main/img/120-own-your-prompts.png)](./content/factor-2-own-your-prompts.md) | [![factor 3](https://github.com/humanlayer/12-factor-agents/blob/main/img/130-own-your-context-building.png)](./content/factor-3-own-your-context-window.md) | 76 | |[![factor 4](https://github.com/humanlayer/12-factor-agents/blob/main/img/140-tools-are-just-structured-outputs.png)](./content/factor-4-tools-are-structured-outputs.md) | [![factor 5](https://github.com/humanlayer/12-factor-agents/blob/main/img/150-unify-state.png)](./content/factor-5-unify-execution-state.md) | [![factor 6](https://github.com/humanlayer/12-factor-agents/blob/main/img/160-pause-resume-with-simple-apis.png)](./content/factor-6-launch-pause-resume.md) | 77 | | [![factor 7](https://github.com/humanlayer/12-factor-agents/blob/main/img/170-contact-humans-with-tools.png)](./content/factor-7-contact-humans-with-tools.md) | [![factor 8](https://github.com/humanlayer/12-factor-agents/blob/main/img/180-control-flow.png)](./content/factor-8-own-your-control-flow.md) | [![factor 9](https://github.com/humanlayer/12-factor-agents/blob/main/img/190-factor-9-errors-static.png)](./content/factor-9-compact-errors.md) | 78 | | [![factor 
10](https://github.com/humanlayer/12-factor-agents/blob/main/img/1a0-small-focused-agents.png)](./content/factor-10-small-focused-agents.md) | [![factor 11](https://github.com/humanlayer/12-factor-agents/blob/main/img/1b0-trigger-from-anywhere.png)](./content/factor-11-trigger-from-anywhere.md) | [![factor 12](https://github.com/humanlayer/12-factor-agents/blob/main/img/1c0-stateless-reducer.png)](./content/factor-12-stateless-reducer.md) | 79 | 80 | ## How we got here 81 | 82 | For a deeper dive on my agent journey and what led us here, check out [A Brief History of Software](./content/brief-history-of-software.md) - a quick summary here: 83 | 84 | ### The promise of agents 85 | 86 | We're gonna talk a lot about Directed Graphs (DGs) and their Acyclic friends, DAGs. I'll start by pointing out that...well...software is a directed graph. There's a reason we used to represent programs as flow charts. 87 | 88 | ![010-software-dag](https://github.com/humanlayer/12-factor-agents/blob/main/img/010-software-dag.png) 89 | 90 | ### From code to DAGs 91 | 92 | Around 20 years ago, we started to see DAG orchestrators become popular. We're talking classics like [Airflow](https://airflow.apache.org/), [Prefect](https://www.prefect.io/), some predecessors, and some newer ones like ([dagster](https://dagster.io/), [inggest](https://www.inngest.com/), [windmill](https://www.windmill.dev/)). These followed the same graph pattern, with the added benefit of observability, modularity, retries, administration, etc. 93 | 94 | ![015-dag-orchestrators](https://github.com/humanlayer/12-factor-agents/blob/main/img/015-dag-orchestrators.png) 95 | 96 | ### The promise of agents 97 | 98 | I'm not the first [person to say this](https://youtu.be/Dc99-zTMyMg?si=bcT0hIwWij2mR-40&t=73), but my biggest takeaway when I started learning about agents, was that you get to throw the DAG away. Instead of software engineers coding each step and edge case, you can give the agent a goal and a set of transitions: 99 | 100 | ![025-agent-dag](https://github.com/humanlayer/12-factor-agents/blob/main/img/025-agent-dag.png) 101 | 102 | And let the LLM make decisions in real time to figure out the path 103 | 104 | ![026-agent-dag-lines](https://github.com/humanlayer/12-factor-agents/blob/main/img/026-agent-dag-lines.png) 105 | 106 | The promise here is that you write less software, you just give the LLM the "edges" of the graph and let it figure out the nodes. You can recover from errors, you can write less code, and you may find that LLMs find novel solutions to problems. 107 | 108 | 109 | ### Agents as loops 110 | 111 | As we'll see later, it turns out this doesn't quite work. 112 | 113 | Let's dive one step deeper - with agents you've got this loop consisting of 3 steps: 114 | 115 | 1. LLM determines the next step in the workflow, outputting structured json ("tool calling") 116 | 2. Deterministic code executes the tool call 117 | 3. The result is appended to the context window 118 | 4. 
Repeat until the next step is determined to be "done" 119 | 120 | ```python 121 | initial_event = {"message": "..."} 122 | context = [initial_event] 123 | while True: 124 | next_step = await llm.determine_next_step(context) 125 | context.append(next_step) 126 | 127 | if (next_step.intent === "done"): 128 | return next_step.final_answer 129 | 130 | result = await execute_step(next_step) 131 | context.append(result) 132 | ``` 133 | 134 | Our initial context is just the starting event (maybe a user message, maybe a cron fired, maybe a webhook, etc), and we ask the llm to choose the next step (tool) or to determine that we're done. 135 | 136 | Here's a multi-step example: 137 | 138 | [![027-agent-loop-animation](https://github.com/humanlayer/12-factor-agents/blob/main/img/027-agent-loop-animation.gif)](https://github.com/user-attachments/assets/3beb0966-fdb1-4c12-a47f-ed4e8240f8fd) 139 | 140 |
141 | GIF Version 142 | 143 | ![027-agent-loop-animation](https://github.com/humanlayer/12-factor-agents/blob/main/img/027-agent-loop-animation.gif) 144 | 145 | 
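To make the deterministic half of that loop concrete, here's a minimal sketch of what an `execute_step` function might look like. This is illustrative only - the intents and helpers (`fetch_git_tags`, `deploy_backend`) are stand-ins, not part of any particular framework or API:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class NextStep:
    intent: str                                   # e.g. "list_git_tags", "deploy_backend", "done"
    parameters: dict[str, Any] = field(default_factory=dict)
    final_answer: str | None = None

# stand-ins for real integrations (git hosting API, deploy pipeline, ...)
async def fetch_git_tags() -> list[str]:
    return ["v1.2.3", "v1.2.2", "v1.2.1"]

async def deploy_backend(tag: str) -> str:
    return f"deployed backend at {tag}"

async def execute_step(next_step: NextStep) -> dict[str, Any]:
    """The deterministic half of the loop: map the LLM's structured output onto real actions."""
    if next_step.intent == "list_git_tags":
        return {"type": "list_git_tags_result", "data": await fetch_git_tags()}
    if next_step.intent == "deploy_backend":
        return {"type": "deploy_backend_result", "data": await deploy_backend(next_step.parameters["tag"])}
    # the model asked for something we don't handle - record that so it can recover
    return {"type": "error", "data": f"unknown intent {next_step.intent!r}"}
```

The important part is that everything outside the `llm.determine_next_step` call is plain, testable code - a theme that recurs throughout the factors below.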
146 | 147 | ## Why 12-factor agents? 148 | 149 | At the end of the day, this approach just doesn't work as well as we want it to. 150 | 151 | In building HumanLayer, I've talked to at least 100 SaaS builders (mostly technical founders) looking to make their existing product more agentic. The journey usually goes something like: 152 | 153 | 1. Decide you want to build an agent 154 | 2. Product design, UX mapping, what problems to solve 155 | 3. Want to move fast, so grab $FRAMEWORK and *get to building* 156 | 4. Get to 70-80% quality bar 157 | 5. Realize that 80% isn't good enough for most customer-facing features 158 | 6. Realize that getting past 80% requires reverse-engineering the framework, prompts, flow, etc. 159 | 7. Start over from scratch 160 | 161 |
162 | Random Disclaimers 163 | 164 | **DISCLAIMER**: I'm not sure the exact right place to say this, but here seems as good as any: **this is BY NO MEANS meant to be a dig on either the many frameworks out there, or the pretty dang smart people who work on them**. They enable incredible things and have accelerated the AI ecosystem. 165 | 166 | I hope that one outcome of this post is that agent framework builders can learn from the journeys of myself and others, and make frameworks even better. 167 | 168 | Especially for builders who want to move fast but need deep control. 169 | 170 | **DISCLAIMER 2**: I'm not going to talk about MCP. I'm sure you can see where it fits in. 171 | 172 | **DISCLAIMER 3**: I'm using mostly typescript, for [reasons](https://www.linkedin.com/posts/dexterihorthy_llms-typescript-aiagents-activity-7290858296679313408-Lh9e?utm_source=share&utm_medium=member_desktop&rcm=ACoAAA4oHTkByAiD-wZjnGsMBUL_JT6nyyhOh30), but all this stuff works in python or any other language you prefer. 173 | 174 | 175 | Anyways back to the thing... 176 | 177 | 
178 | 179 | ### Design Patterns for great LLM applications 180 | 181 | After digging through hundreds of AI libriaries and working with dozens of founders, my instinct is this: 182 | 183 | 1. There are some core things that make agents great 184 | 2. Going all in on a framework and building what is essentially a greenfield rewrite may be counter-productive 185 | 3. There are some core principles that make agents great, and you will get most/all of them if you pull in a framework 186 | 4. BUT, the fastest way I've seen for builders to get high-quality AI software in the hands of customers is to take small, modular concepts from agent building, and incorporate them into their existing product 187 | 5. These modular concepts from agents can be defined and applied by most skilled software engineers, even if they don't have an AI background 188 | 189 | > #### The fastest way I've seen for builders to get good AI software in the hands of customers is to take small, modular concepts from agent building, and incorporate them into their existing product 190 | 191 | 192 | ## The 12 Factors (again) 193 | 194 | 195 | - [How We Got Here: A Brief History of Software](./content/brief-history-of-software.md) 196 | - [Factor 1: Natural Language to Tool Calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-1-natural-language-to-tool-calls.md) 197 | - [Factor 2: Own your prompts](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md) 198 | - [Factor 3: Own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md) 199 | - [Factor 4: Tools are just structured outputs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md) 200 | - [Factor 5: Unify execution state and business state](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md) 201 | - [Factor 6: Launch/Pause/Resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md) 202 | - [Factor 7: Contact humans with tool calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md) 203 | - [Factor 8: Own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md) 204 | - [Factor 9: Compact Errors into Context Window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-9-compact-errors.md) 205 | - [Factor 10: Small, Focused Agents](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md) 206 | - [Factor 11: Trigger from anywhere, meet users where they are](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md) 207 | - [Factor 12: Make your agent a stateless reducer](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-12-stateless-reducer.md) 208 | 209 | ## Honorable Mentions / other advice 210 | 211 | - [Factor 13: Pre-fetch all the context you might need](./content/appendix-13-pre-fetch.md) 212 | 213 | ## Related Resources 214 | 215 | - Contribute to this guide [here](https://github.com/humanlayer/12-factor-agents) 216 | - [I talked about a lot of this on an episode of the Tool Use podcast](https://youtu.be/8bIHcttkOTE) in March 2025 217 | - I write about some of this stuff at [The Outer Loop](https://theouterloop.substack.com) 218 | - I do 
[webinars about Maximizing LLM Performance](https://github.com/hellovai/ai-that-works/tree/main) with [@hellovai](https://github.com/hellovai) 219 | - We build OSS agents with this methodology under [got-agents/agents](https://github.com/got-agents/agents) 220 | - We ignored all our own advice and built a [framework for running distributed agents in kubernetes](https://github.com/humanlayer/kubechain) 221 | - Other links from this guide: 222 | - [12 Factor Apps](https://12factor.net) 223 | - [Building Effective Agents (Anthropic)](https://www.anthropic.com/engineering/building-effective-agents#agents) 224 | - [Prompts are Functions](https://thedataexchange.media/baml-revolution-in-ai-engineering/ ) 225 | - [Library patterns: Why frameworks are evil](https://tomasp.net/blog/2015/library-frameworks/) 226 | - [The Wrong Abstraction](https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction) 227 | - [Mailcrew Agent](https://github.com/dexhorthy/mailcrew) 228 | - [Mailcrew Demo Video](https://www.youtube.com/watch?v=f_cKnoPC_Oo) 229 | - [Chainlit Demo](https://x.com/chainlit_io/status/1858613325921480922) 230 | - [TypeScript for LLMs](https://www.linkedin.com/posts/dexterihorthy_llms-typescript-aiagents-activity-7290858296679313408-Lh9e) 231 | - [Schema Aligned Parsing](https://www.boundaryml.com/blog/schema-aligned-parsing) 232 | - [Function Calling vs Structured Outputs vs JSON Mode](https://www.vellum.ai/blog/when-should-i-use-function-calling-structured-outputs-or-json-mode) 233 | - [BAML on GitHub](https://github.com/boundaryml/baml) 234 | - [OpenAI JSON vs Function Calling](https://docs.llamaindex.ai/en/stable/examples/llm/openai_json_vs_function_calling/) 235 | - [Outer Loop Agents](https://theouterloop.substack.com/p/openais-realtime-api-is-a-step-towards) 236 | - [Airflow](https://airflow.apache.org/) 237 | - [Prefect](https://www.prefect.io/) 238 | - [Dagster](https://dagster.io/) 239 | - [Inngest](https://www.inngest.dev/) 240 | - [Windmill](https://www.windmill.dev/) 241 | - [The AI Agent Index (MIT)](https://aiagentindex.mit.edu/) 242 | - [NotebookLM on Finding Model Capability Boundaries](https://open.substack.com/pub/swyx/p/notebooklm?selection=08e1187c-cfee-4c63-93c9-71216640a5f8) 243 | 244 | 245 | 246 | 247 | -------------------------------------------------------------------------------- /content/appendix-13-pre-fetch.md: -------------------------------------------------------------------------------- 1 | ### Factor 13 - pre-fetch all the context you might need 2 | 3 | If there's a high chance that your model will call tool X, don't waste token round trips telling the model to fetch it, that is, instead of a pseudo-prompt like: 4 | 5 | ```jinja 6 | When looking at deployments, you will likely want to fetch the list of published git tags, 7 | so you can use it to deploy to prod. 8 | 9 | Here's what happened so far: 10 | 11 | {{ thread.events }} 12 | 13 | What's the next step? 
14 | 15 | Answer in JSON format with one of the following intents: 16 | 17 | { 18 | intent: 'deploy_backend_to_prod', 19 | tag: string 20 | } OR { 21 | intent: 'list_git_tags' 22 | } OR { 23 | intent: 'done_for_now', 24 | message: string 25 | } 26 | ``` 27 | 28 | and your code looks like 29 | 30 | ```python 31 | thread = {"events": [inital_message]} 32 | next_step = await determine_next_step(thread) 33 | 34 | while True: 35 | switch next_step.intent: 36 | case 'list_git_tags': 37 | tags = await fetch_git_tags() 38 | thread["events"].append({ 39 | type: 'list_git_tags', 40 | data: tags, 41 | }) 42 | case 'deploy_backend_to_prod': 43 | deploy_result = await deploy_backend_to_prod(next_step.data.tag) 44 | thread["events"].append({ 45 | "type": 'deploy_backend_to_prod', 46 | "data": deploy_result, 47 | }) 48 | case 'done_for_now': 49 | await notify_human(next_step.message) 50 | break 51 | # ... 52 | ``` 53 | 54 | You might as well just fetch the tags and include them in the context window, like: 55 | 56 | ```diff 57 | - When looking at deployments, you will likely want to fetch the list of published git tags, 58 | - so you can use it to deploy to prod. 59 | 60 | + The current git tags are: 61 | 62 | + {{ git_tags }} 63 | 64 | 65 | Here's what happened so far: 66 | 67 | {{ thread.events }} 68 | 69 | What's the next step? 70 | 71 | Answer in JSON format with one of the following intents: 72 | 73 | { 74 | intent: 'deploy_backend_to_prod', 75 | tag: string 76 | - } OR { 77 | - intent: 'list_git_tags' 78 | } OR { 79 | intent: 'done_for_now', 80 | message: string 81 | } 82 | 83 | ``` 84 | 85 | and your code looks like 86 | 87 | ```diff 88 | thread = {"events": [inital_message]} 89 | + git_tags = await fetch_git_tags() 90 | 91 | - next_step = await determine_next_step(thread) 92 | + next_step = await determine_next_step(thread, git_tags) 93 | 94 | while True: 95 | switch next_step.intent: 96 | - case 'list_git_tags': 97 | - tags = await fetch_git_tags() 98 | - thread["events"].append({ 99 | - type: 'list_git_tags', 100 | - data: tags, 101 | - }) 102 | case 'deploy_backend_to_prod': 103 | deploy_result = await deploy_backend_to_prod(next_step.data.tag) 104 | thread["events"].append({ 105 | "type": 'deploy_backend_to_prod', 106 | "data": deploy_result, 107 | }) 108 | case 'done_for_now': 109 | await notify_human(next_step.message) 110 | break 111 | # ... 112 | ``` 113 | 114 | or even just include the tags in the thread and remove the specific parameter from your prompt template: 115 | 116 | ```diff 117 | thread = {"events": [inital_message]} 118 | + # add the request 119 | + thread["events"].append({ 120 | + "type": 'list_git_tags', 121 | + }) 122 | 123 | git_tags = await fetch_git_tags() 124 | 125 | + # add the result 126 | + thread["events"].append({ 127 | + "type": 'list_git_tags_result', 128 | + "data": git_tags, 129 | + }) 130 | 131 | - next_step = await determine_next_step(thread, git_tags) 132 | + next_step = await determine_next_step(thread) 133 | 134 | while True: 135 | switch next_step.intent: 136 | case 'deploy_backend_to_prod': 137 | deploy_result = await deploy_backend_to_prod(next_step.data.tag) 138 | thread["events"].append(deploy_result) 139 | case 'done_for_now': 140 | await notify_human(next_step.message) 141 | break 142 | # ... 
143 | ``` 144 | 145 | Overall: 146 | 147 | > #### If you already know what tools you'll want the model to call, just call them DETERMINISTICALLY and let the model do the hard part of figuring out how to use their outputs 148 | 149 | Again, AI engineering is all about [Context Engineering](./factor-3-own-your-context-window.md). 150 | 151 | [← Stateless Reducer](./factor-12-stateless-reducer.md) | [Further Reading →](../README.md#related-resources) 152 | -------------------------------------------------------------------------------- /content/brief-history-of-software.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ## The longer version: how we got here 4 | 5 | ### You don't have to listen to me 6 | 7 | Whether you're new to agents or an ornery old veteran like me, I'm going to try to convince you to throw out most of what you think about AI Agents, take a step back, and rethink them from first principles. (spoiler alert if you didn't catch the OpenAI responses launch a few weeks back, but pushing MORE agent logic behind an API ain't it) 8 | 9 | 10 | ## Agents are software, and a brief history thereof 11 | 12 | let's talk about how we got here 13 | 14 | ### 60 years ago 15 | 16 | We're gonna talk a lot about Directed Graphs (DGs) and their Acyclic friends, DAGs. I'll start by pointing out that...well...software is a directed graph. There's a reason we used to represent programs as flow charts. 17 | 18 | ![010-software-dag](https://github.com/humanlayer/12-factor-agents/blob/main/img/010-software-dag.png) 19 | 20 | ### 20 years ago 21 | 22 | Around 20 years ago, we started to see DAG orchestrators become popular. We're talking classics like [Airflow](https://airflow.apache.org/), [Prefect](https://www.prefect.io/), some predecessors, and some newer ones like ([dagster](https://dagster.io/), [inggest](https://www.inngest.dev/), [windmill](https://www.windmill.dev/)). These followed the same graph pattern, with the added benefit of observability, modularity, retries, administration, etc. 23 | 24 | ![015-dag-orchestrators](https://github.com/humanlayer/12-factor-agents/blob/main/img/015-dag-orchestrators.png) 25 | 26 | ### 10-15 years ago 27 | 28 | When ML models started to get good enough to be useful, we started to see DAGs with ML models sprinkled in. You might imagine steps like "summarize the text in this column into a new column" or "classify the support issues by severity or sentiment". 29 | 30 | ![020-dags-with-ml](https://github.com/humanlayer/12-factor-agents/blob/main/img/020-dags-with-ml.png) 31 | 32 | But at the end of the day, it's still mostly the same good old deterministic software. 33 | 34 | ### The promise of agents 35 | 36 | I'm not the first [person to say this](https://youtu.be/Dc99-zTMyMg?si=bcT0hIwWij2mR-40&t=73), but my biggest takeaway when I started learning about agents, was that you get to throw the DAG away. 
Instead of software engineers coding each step and edge case, you can give the agent a goal and a set of transitions: 37 | 38 | ![025-agent-dag](https://github.com/humanlayer/12-factor-agents/blob/main/img/025-agent-dag.png) 39 | 40 | And let the LLM make decisions in real time to figure out the path 41 | 42 | ![026-agent-dag-lines](https://github.com/humanlayer/12-factor-agents/blob/main/img/026-agent-dag-lines.png) 43 | 44 | The promise here is that you write less software, you just give the LLM the "edges" of the graph and let it figure out the nodes. You can recover from errors, you can write less code, and you may find that LLMs find novel solutions to problems. 45 | 46 | ### Agents as loops 47 | 48 | Put another way, you've got this loop consisting of 3 steps: 49 | 50 | 1. LLM determines the next step in the workflow, outputting structured json ("tool calling") 51 | 2. Deterministic code executes the tool call 52 | 3. The result is appended to the context window 53 | 4. repeat until the next step is determined to be "done" 54 | 55 | ```python 56 | initial_event = {"message": "..."} 57 | context = [initial_event] 58 | while True: 59 | next_step = await llm.determine_next_step(context) 60 | context.append(next_step) 61 | 62 | if (next_step.intent === "done"): 63 | return next_step.final_answer 64 | 65 | result = await execute_step(next_step) 66 | context.append(result) 67 | ``` 68 | 69 | Our initial context is just the starting event (maybe a user message, maybe a cron fired, maybe a webhook, etc), 70 | and we ask the llm to choose the next step (tool) or to determine that we're done. 71 | 72 | Here's a multi-step example: 73 | 74 | [![027-agent-loop-animation](https://github.com/humanlayer/12-factor-agents/blob/main/img/027-agent-loop-animation.gif)](https://github.com/user-attachments/assets/3beb0966-fdb1-4c12-a47f-ed4e8240f8fd) 75 | 76 |
77 | GIF Version 78 | 79 | ![027-agent-loop-animation](https://github.com/humanlayer/12-factor-agents/blob/main/img/027-agent-loop-animation.gif) 80 | 81 | 
82 | 83 | And the "materialized" DAG that was generated would look something like: 84 | 85 | ![027-agent-loop-dag](https://github.com/humanlayer/12-factor-agents/blob/main/img/027-agent-loop-dag.png) 86 | 87 | ### The problem with this "loop until you solve it" pattern 88 | 89 | The biggest problems with this pattern: 90 | 91 | - Agents get lost when the context window gets too long - they spin out trying the same broken approach over and over again 92 | - literally thats it, but that's enough to kneecap the approach 93 | 94 | Even if you haven't hand-rolled an agent, you've probably seen this long-context problem in working with agentic coding tools. They just get lost after a while and you need to start a new chat. 95 | 96 | I'll even perhaps posit something I've heard in passing quite a bit, and that YOU probably have developed your own intuition around: 97 | 98 | > ### **Even as models support longer and longer context windows, you'll ALWAYS get better results with a small, focused prompt and context** 99 | 100 | Most builders I've talked to **pushed the "tool calling loop" idea to the side** when they realized that anything more than 10-20 turns becomes a big mess that the LLM can't recover from. Even if the agent gets it right 90% of the time, that's miles away from "good enough to put in customer hands". Can you imagine a web app that crashed on 10% of page loads? 101 | 102 | ### What actually works - micro agents 103 | 104 | One thing that I **have** seen in the wild quite a bit is taking the agent pattern and sprinkling it into a broader more deterministic DAG. 105 | 106 | ![micro-agent-dag](https://github.com/humanlayer/12-factor-agents/blob/main/img/028-micro-agent-dag.png) 107 | 108 | You might be asking - "why use agents at all in this case?" - we'll get into that shortly, but basically, having language models managing well-scoped sets of tasks makes it easy to incorporate live human feedback, translating it into workflow steps without spinning out into context error loops. ([factor 1](./factor-1-natural-language-to-tool-calls.md), [factor 3](./factor-3-own-your-context-window.md) [factor 7](./factor-7-contact-humans-with-tools.md)). 109 | 110 | > #### having language models managing well-scoped sets of tasks makes it easy to incorporate live human feedback...without spinning out into context error loops 111 | 112 | ### A real life micro agent 113 | 114 | Here's an example of how deterministic code might run one micro agent responsible for handling the human-in-the-loop steps for deployment. 115 | 116 | ![029-deploybot-high-level](https://github.com/humanlayer/12-factor-agents/blob/main/img/029-deploybot-high-level.png) 117 | 118 | * **Human** Merges PR to GitHub main branch 119 | * **Deterministic Code** Deploys to staging env 120 | * **Deterministic Code** Runs end-to-end (e2e) tests against staging 121 | * **Deterministic Code** Hands to agent for prod deployment, with initial context: "deploy SHA 4af9ec0 to production" 122 | * **Agent** calls `deploy_frontend_to_prod(4af9ec0)` 123 | * **Deterministic code** requests human approval on this action 124 | * **Human** Rejects the action with feedback "can you deploy the backend first?" 
125 | * **Agent** calls `deploy_backend_to_prod(4af9ec0)` 126 | * **Deterministic code** requests human approval on this action 127 | * **Human** approves the action 128 | * **Deterministic code** executes the backend deployment 129 | * **Agent** calls `deploy_frontend_to_prod(4af9ec0)` 130 | * **Deterministic code** requests human approval on this action 131 | * **Human** approves the action 132 | * **Deterministic code** executes the frontend deployment 133 | * **Agent** determines that the task was completed successfully, we're done! 134 | * **Deterministic code** runs the end-to-end tests against production 135 | * **Deterministic code** task completed, OR passes to a rollback agent to review failures and potentially roll back (a minimal code sketch of this flow follows the animation below) 136 | 137 | [![033-deploybot-animation](https://github.com/humanlayer/12-factor-agents/blob/main/img/033-deploybot.gif)](https://github.com/user-attachments/assets/deb356e9-0198-45c2-9767-231cb569ae13) 138 | 139 | 
140 | GIF Version 141 | 142 | ![033-deploybot-animation](https://github.com/humanlayer/12-factor-agents/blob/main/img/033-deploybot.gif) 143 | 144 | 
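Here's one way the control flow from the walkthrough above might look in code. This is a minimal sketch, not the actual deploybot implementation - every helper and intent name here is an illustrative stub:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Approval:
    approved: bool
    comment: str = ""

HIGH_STAKES = {"deploy_backend_to_prod", "deploy_frontend_to_prod"}

# stand-ins for the real integrations (LLM call, Slack/email approval, deploy pipeline)
async def determine_next_step(thread: list[dict[str, Any]]) -> dict[str, Any]:
    return {"intent": "done_for_now", "message": "nothing to do"}

async def request_human_approval(step: dict[str, Any]) -> Approval:
    return Approval(approved=True)

async def execute_step(step: dict[str, Any]) -> str:
    return f"ran {step['intent']}"

async def notify_human(message: str) -> None:
    print(message)

async def run_deploy_agent(thread: list[dict[str, Any]]) -> None:
    """A micro agent: a short loop, a small context window, and an approval gate on risky steps."""
    while True:
        next_step = await determine_next_step(thread)           # LLM picks the next step
        thread.append({"type": next_step["intent"], "data": next_step})

        if next_step["intent"] == "done_for_now":
            await notify_human(next_step["message"])
            return

        if next_step["intent"] in HIGH_STAKES:
            approval = await request_human_approval(next_step)   # pause and wait for a human
            if not approval.approved:
                # fold the human's plaintext feedback back into context and let the LLM re-plan
                thread.append({"type": "human_feedback", "data": approval.comment})
                continue

        result = await execute_step(next_step)                   # deterministic code does the work
        thread.append({"type": f"{next_step['intent']}_result", "data": result})
```

The LLM only ever proposes the next step; the surrounding code decides when to pause, whom to ask, and what actually gets executed.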
145 | 146 | This example is based on a real life [OSS agent we've shipped to manage our deployments at Humanlayer](https://github.com/got-agents/agents/tree/main/deploybot-ts) - here is a real conversation I had with it last week: 147 | 148 | ![035-deploybot-conversation](https://github.com/humanlayer/12-factor-agents/blob/main/img/035-deploybot-conversation.png) 149 | 150 | 151 | We haven't given this agent a huge pile of tools or tasks. The primary value in the LLM is parsing the human's plaintext feedback and proposing an updated course of action. We isolate tasks and contexts as much as possible to keep the LLM focused on a small, 5-10 step workflow. 152 | 153 | Here's another [more classic support / chatbot demo](https://x.com/chainlit_io/status/1858613325921480922). 154 | 155 | ### So what's an agent really? 156 | 157 | - **prompt** - tell an LLM how to behave, and what "tools" it has available. The output of the prompt is a JSON object that describe the next step in the workflow (the "tool call" or "function call"). ([factor 2](./factor-2-own-your-prompts.md)) 158 | - **switch statement** - based on the JSON that the LLM returns, decide what to do with it. (part of [factor 8](./factor-8-own-your-control-flow.md)) 159 | - **accumulated context** - store the list of steps that have happened and their results ([factor 3](./factor-3-own-your-context-window.md)) 160 | - **for loop** - until the LLM emits some sort of "Terminal" tool call (or plaintext response), add the result of the switch statement to the context window and ask the LLM to choose the next step. ([factor 8](./factor-8-own-your-control-flow.md)) 161 | 162 | ![040-4-components](https://github.com/humanlayer/12-factor-agents/blob/main/img/040-4-components.png) 163 | 164 | In the "deploybot" example, we gain a couple benefits from owning the control flow and context accumulation: 165 | 166 | - In our **switch statement** and **for loop**, we can hijack control flow to pause for human input or to wait for completion of long-running tasks 167 | - We can trivially serialize the **context** window for pause+resume 168 | - In our **prompt**, we can optimize the heck out of how we pass instructions and "what happened so far" to the LLM 169 | 170 | 171 | [Part II](https://github.com/humanlayer/12-factor-agents/blob/main/README.md#12-factor-agents) will **formalize these patterns** so they can be applied to add impressive AI features to any software project, without needing to go all in on conventional implementations/definitions of "AI agent". 172 | 173 | [Factor 1 - Natural Language to Tool Calls →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-1-natural-language-to-tool-calls.md) -------------------------------------------------------------------------------- /content/factor-1-natural-language-to-tool-calls.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 1. Natural Language to Tool Calls 4 | 5 | One of the most common patterns in agent building is to convert natural language to structured tool calls. This is a powerful pattern that allows you to build agents that can reason about tasks and execute them. 
6 | 7 | ![110-natural-language-tool-calls](https://github.com/humanlayer/12-factor-agents/blob/main/img/110-natural-language-tool-calls.png) 8 | 9 | This pattern, when applied atomically, is the simple translation of a phrase like 10 | 11 | > can you create a payment link for $750 to Terri for sponsoring the february AI tinkerers meetup? 12 | 13 | to a structured object that describes a Stripe API call like 14 | 15 | ```json 16 | { 17 | "function": { 18 | "name": "create_payment_link", 19 | "parameters": { 20 | "amount": 750, 21 | "customer": "cust_128934ddasf9", 22 | "product": "prod_8675309", 23 | "price": "prc_09874329fds", 24 | "quantity": 1, 25 | "memo": "Hey Jeff - see below for the payment link for the february ai tinkerers meetup" 26 | } 27 | } 28 | } 29 | ``` 30 | 31 | **Note**: in reality the stripe API is a bit more complex, a [real agent that does this](https://github.com/dexhorthy/mailcrew) ([video](https://www.youtube.com/watch?v=f_cKnoPC_Oo)) would list customers, list products, list prices, etc to build this payload with the proper ids, or include those ids in the prompt/context window (we'll see below how those are kinda the same thing though!) 32 | 33 | From there, deterministic code can pick up the payload and do something with it. (More on this in [factor 3](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md)) 34 | 35 | ```python 36 | # The LLM takes natural language and returns a structured object 37 | nextStep = await llm.determineNextStep( 38 | """ 39 | create a payment link for $750 to Jeff 40 | for sponsoring the february AI tinkerers meetup 41 | """ 42 | ) 43 | 44 | // Handle the structured output based on its function 45 | if nextStep.function == 'create_payment_link': 46 | stripe.paymentlinks.create(nextStep.parameters) 47 | return # or whatever you want, see below 48 | elif nextStep.function == 'something_else': 49 | # ... more cases 50 | pass 51 | else: # the model didn't call a tool we know about 52 | # do something else 53 | pass 54 | ``` 55 | 56 | **NOTE**: While a full agent would then receive the API call result and loop with it, eventually returning something like 57 | 58 | > I've successfully created a payment link for $750 to Terri for sponsoring the february AI tinkerers meetup. Here's the link: https://buy.stripe.com/test_1234567890 59 | 60 | **Instead**, We're actually going to skip that step here, and save it for another factor, which you may or may not want to also incorporate (up to you!) 61 | 62 | [← How We Got Here](https://github.com/humanlayer/12-factor-agents/blob/main/content/brief-history-of-software.md) | [Own Your Prompts →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md) -------------------------------------------------------------------------------- /content/factor-10-small-focused-agents.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 10. Small, Focused Agents 4 | 5 | Rather than building monolithic agents that try to do everything, build small, focused agents that do one thing well. Agents are just one building block in a larger, mostly deterministic system. 
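To make that concrete, here's a rough sketch of a mostly deterministic pipeline that hands exactly one well-scoped step to a small agent. The helper names are illustrative, and `run_deploy_agent` stands in for something like the deploybot example in [How We Got Here](./brief-history-of-software.md):

```python
import asyncio

# illustrative stand-ins for deterministic pipeline steps and a small, focused agent
async def deploy_to_staging(sha: str) -> None: ...
async def run_e2e_tests(sha: str, env: str = "staging") -> None: ...
async def run_deploy_agent(thread: list[dict]) -> None: ...

async def release_pipeline(sha: str) -> None:
    """A mostly deterministic pipeline that delegates ONE well-scoped step to an agent."""
    await deploy_to_staging(sha)            # deterministic
    await run_e2e_tests(sha)                # deterministic

    # the only agentic step: a short loop with a handful of tools and 5-10 steps of context
    await run_deploy_agent([{"type": "user_request", "data": f"deploy {sha} to production"}])

    await run_e2e_tests(sha, env="prod")    # deterministic

asyncio.run(release_pipeline("4af9ec0"))
```

Each agent stays small enough that its context window never has to hold the whole pipeline.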
6 | 7 | ![1a0-small-focused-agents](https://github.com/humanlayer/12-factor-agents/blob/main/img/1a0-small-focused-agents.png) 8 | 9 | The key insight here is about LLM limitations: the bigger and more complex a task is, the more steps it will take, which means a longer context window. As context grows, LLMs are more likely to get lost or lose focus. By keeping agents focused on specific domains with 3-10, maybe 20 steps max, we keep context windows manageable and LLM performance high. 10 | 11 | > #### As context grows, LLMs are more likely to get lost or lose focus 12 | 13 | Benefits of small, focused agents: 14 | 15 | 1. **Manageable Context**: Smaller context windows mean better LLM performance 16 | 2. **Clear Responsibilities**: Each agent has a well-defined scope and purpose 17 | 3. **Better Reliability**: Less chance of getting lost in complex workflows 18 | 4. **Easier Testing**: Simpler to test and validate specific functionality 19 | 5. **Improved Debugging**: Easier to identify and fix issues when they occur 20 | 21 | ### What if LLMs get smarter? 22 | 23 | Do we still need this if LLMs get smart enough to handle 100-step+ workflows? 24 | 25 | tl;dr yes. As agents and LLMs improve, they **might** naturally expand to be able to handle longer context windows. This means handling MORE of a larger DAG. This small, focused approach ensures you can get results TODAY, while preparing you to slowly expand agent scope as LLM context windows become more reliable. (If you've refactored large deterministic code bases before, you may be nodding your head right now). 26 | 27 | [![gif](https://github.com/humanlayer/12-factor-agents/blob/main/img/1a5-agent-scope-grow.gif)](https://github.com/user-attachments/assets/0cd3f52c-046e-4d5e-bab4-57657157c82f 28 | ) 29 | 30 |
31 | GIF Version 32 | ![gif](https://github.com/humanlayer/12-factor-agents/blob/main/img/1a5-agent-scope-grow.gif) 33 |
34 | 35 | Being intentional about size/scope of agents, and only growing in ways that allow you to maintain quality, is key here. As the [team that built NotebookLM put it](https://open.substack.com/pub/swyx/p/notebooklm?selection=08e1187c-cfee-4c63-93c9-71216640a5f8&utm_campaign=post-share-selection&utm_medium=web): 36 | 37 | > I feel like consistently, the most magical moments out of AI building come about for me when I'm really, really, really just close to the edge of the model capability 38 | 39 | Regardless of where that boundary is, if you can find that boundary and get it right consistently, you'll be building magical experiences. There are many moats to be built here, but as usual, they take some engineering rigor. 40 | 41 | [← Compact Errors](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-9-compact-errors.md) | [Trigger From Anywhere →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md) 42 | -------------------------------------------------------------------------------- /content/factor-11-trigger-from-anywhere.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 11. Trigger from anywhere, meet users where they are 4 | 5 | If you're waiting for the [humanlayer](https://humanlayer.dev) pitch, you made it. If you're doing [factor 6 - launch/pause/resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md) and [factor 7 - contact humans with tool calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md), you're ready to incorporate this factor. 6 | 7 | ![1b0-trigger-from-anywhere](https://github.com/humanlayer/12-factor-agents/blob/main/img/1b0-trigger-from-anywhere.png) 8 | 9 | Enable users to trigger agents from slack, email, sms, or whatever other channel they want. Enable agents to respond via the same channels. 10 | 11 | Benefits: 12 | 13 | - **Meet users where they are**: This helps you build AI applications that feel like real humans, or at the very least, digital coworkers 14 | - **Outer Loop Agents**: Enable agents to be triggered by non-humans, e.g. events, crons, outages, whatever else. They may work for 5, 20, 90 minutes, but when they get to a critical point, they can contact a human for help, feedback, or approval 15 | - **High Stakes Tools**: If you're able to quickly loop in a variety of humans, you can give agents access to higher stakes operations like sending external emails, updating production data and more. Maintaining clear standards gets you auditability and confidence in agents that [perform bigger better things](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md#what-if-llms-get-smarter) 16 | 17 | [← Small Focused Agents](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md) | [Stateless Reducer →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-12-stateless-reducer.md) -------------------------------------------------------------------------------- /content/factor-12-stateless-reducer.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 12. 
Make your agent a stateless reducer 4 | 5 | Okay so we're over 1000 lines of markdown at this point. This one is mostly just for fun. 6 | 7 | ![1c0-stateless-reducer](https://github.com/humanlayer/12-factor-agents/blob/main/img/1c0-stateless-reducer.png) 8 | 9 | 10 | ![1c5-agent-foldl](https://github.com/humanlayer/12-factor-agents/blob/main/img/1c5-agent-foldl.png) 11 | 12 | [← Trigger From Anywhere](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md) | [Appendix - Pre-Fetch Context →](./appendix-13-pre-fetch.md) 13 | -------------------------------------------------------------------------------- /content/factor-2-own-your-prompts.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 2. Own your prompts 4 | 5 | Don't outsource your prompt engineering to a framework. 6 | 7 | ![120-own-your-prompts](https://github.com/humanlayer/12-factor-agents/blob/main/img/120-own-your-prompts.png) 8 | 9 | Some frameworks provide a "black box" approach like this: 10 | 11 | ```python 12 | agent = Agent( 13 | role="...", 14 | goal="...", 15 | personality="...", 16 | tools=[tool1, tool2, tool3] 17 | ) 18 | 19 | task = Task( 20 | instructions="...", 21 | expected_output=OutputModel 22 | ) 23 | 24 | result = agent.run(task) 25 | ``` 26 | 27 | This is great for pulling in some TOP NOTCH prompt engineering to get you started, but it is often difficult to tune and/or reverse engineer to get exactly the right tokens into your model. 28 | 29 | Instead, own your prompts and treat them as first-class code: 30 | 31 | ```rust 32 | function DetermineNextStep(thread: string) -> DoneForNow | ListGitTags | DeployBackend | DeployFrontend | RequestMoreInformation { 33 | prompt #" 34 | {{ _.role("system") }} 35 | 36 | You are a helpful assistant that manages deployments for frontend and backend systems. 37 | You work diligently to ensure safe and successful deployments by following best practices 38 | and proper deployment procedures. 39 | 40 | Before deploying any system, you should check: 41 | - The deployment environment (staging vs production) 42 | - The correct tag/version to deploy 43 | - The current system status 44 | 45 | You can use tools like deploy_backend, deploy_frontend, and check_deployment_status 46 | to manage deployments. For sensitive deployments, use request_approval to get 47 | human verification. 48 | 49 | Always think about what to do first, like: 50 | - Check current deployment status 51 | - Verify the deployment tag exists 52 | - Request approval if needed 53 | - Deploy to staging before production 54 | - Monitor deployment progress 55 | 56 | {{ _.role("user") }} 57 | 58 | {{ thread }} 59 | 60 | What should the next step be? 61 | "# 62 | } 63 | ``` 64 | 65 | (the above example uses [BAML](https://github.com/boundaryml/baml) to generate the prompt, but you can do this with any prompt engineering tool you want, or even just template it manually) 66 | 67 | If the signature looks a little funny, we'll get to that in [factor 4 - tools are just structured outputs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md) 68 | 69 | ```typescript 70 | function DetermineNextStep(thread: string) -> DoneForNow | ListGitTags | DeployBackend | DeployFrontend | RequestMoreInformation { 71 | ``` 72 | 73 | Key benefits of owning your prompts: 74 | 75 | 1. 
**Full Control**: Write exactly the instructions your agent needs, no black box abstractions 76 | 2. **Testing and Evals**: Build tests and evals for your prompts just like you would for any other code 77 | 3. **Iteration**: Quickly modify prompts based on real-world performance 78 | 4. **Transparency**: Know exactly what instructions your agent is working with 79 | 5. **Role Hacking**: take advantage of APIs that support nonstandard usage of user/assistant roles - for example, the now-deprecated non-chat flavor of OpenAI "completions" API. This includes some so-called "model gaslighting" techniques 80 | 81 | Remember: Your prompts are the primary interface between your application logic and the LLM. 82 | 83 | Having full control over your prompts gives you the flexibility and prompt control you need for production-grade agents. 84 | 85 | I don't know what's the best prompt, but I know you want the flexibility to be able to try EVERYTHING. 86 | 87 | [← Natural Language To Tool Calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-1-natural-language-to-tool-calls.md) | [Own Your Context Window →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md) -------------------------------------------------------------------------------- /content/factor-3-own-your-context-window.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 3. Own your context window 4 | 5 | You don't necessarily need to use standard message-based formats for conveying context to an LLM. 6 | 7 | > #### At any given point, your input to an LLM in an agent is "here's what's happened so far, what's the next step" 8 | 9 | 10 | 11 | 12 | Everything is context engineering. [LLMs are stateless functions](https://thedataexchange.media/baml-revolution-in-ai-engineering/) that turn inputs into outputs. To get the best outputs, you need to give them the best inputs. 13 | 14 | Creating great context means: 15 | 16 | - The prompt and instructions you give to the model 17 | - Any documents or external data you retrieve (e.g. RAG) 18 | - Any past state, tool calls, results, or other history 19 | - Any past messages or events from related but separate histories/conversations (Memory) 20 | - Instructions about what sorts of structured data to output 21 | 22 | ![220-context-engineering](https://github.com/humanlayer/12-factor-agents/blob/main/img/220-context-engineering.png) 23 | 24 | This guide is all about getting as much as possible out of today's models. Notably not mentioned are: 25 | 26 | - Changes to models parameters like temperature, top_p, frequency_penalty, presence_penalty, etc. 27 | - Training your own completion or embedding models 28 | - Fine-tuning existing models 29 | 30 | Again, I don't know what's the best way to hand context to an LLM, but I know you want the flexibility to be able to try EVERYTHING. 31 | 32 | #### Standard vs Custom Context Formats 33 | 34 | Most LLM clients use a standard message-based format like this: 35 | 36 | ```yaml 37 | [ 38 | { 39 | "role": "system", 40 | "content": "You are a helpful assistant..." 41 | }, 42 | { 43 | "role": "user", 44 | "content": "Can you deploy the backend?" 
45 | }, 46 | { 47 | "role": "assistant", 48 | "content": null, 49 | "tool_calls": [ 50 | { 51 | "id": "1", 52 | "name": "list_git_tags", 53 | "arguments": "{}" 54 | } 55 | ] 56 | }, 57 | { 58 | "role": "tool", 59 | "name": "list_git_tags", 60 | "content": "{\"tags\": [{\"name\": \"v1.2.3\", \"commit\": \"abc123\", \"date\": \"2024-03-15T10:00:00Z\"}, {\"name\": \"v1.2.2\", \"commit\": \"def456\", \"date\": \"2024-03-14T15:30:00Z\"}, {\"name\": \"v1.2.1\", \"commit\": \"abe033d\", \"date\": \"2024-03-13T09:15:00Z\"}]}", 61 | "tool_call_id": "1" 62 | } 63 | ] 64 | ``` 65 | 66 | While this works great for most use cases, if you want to really get THE MOST out of today's LLMs, you need to get your context into the LLM in the most token- and attention-efficient way you can. 67 | 68 | As an alternative to the standard message-based format, you can build your own context format that's optimized for your use case. For example, you can use custom objects and pack/spread them into one or more user, system, assistant, or tool messages as makes sense. 69 | 70 | Here's an example of putting the whole context window into a single user message: 71 | ```yaml 72 | 73 | [ 74 | { 75 | "role": "system", 76 | "content": "You are a helpful assistant..." 77 | }, 78 | { 79 | "role": "user", 80 | "content": | 81 | Here's everything that happened so far: 82 | 83 | 84 | From: @alex 85 | Channel: #deployments 86 | Text: Can you deploy the backend? 87 | 88 | 89 | 90 | intent: "list_git_tags" 91 | 92 | 93 | 94 | tags: 95 | - name: "v1.2.3" 96 | commit: "abc123" 97 | date: "2024-03-15T10:00:00Z" 98 | - name: "v1.2.2" 99 | commit: "def456" 100 | date: "2024-03-14T15:30:00Z" 101 | - name: "v1.2.1" 102 | commit: "ghi789" 103 | date: "2024-03-13T09:15:00Z" 104 | 105 | 106 | what's the next step? 107 | } 108 | ] 109 | ``` 110 | 111 | The model may infer that you're asking it `what's the next step` by the tool schemas you supply, but it never hurts to roll it into your prompt template. 112 | 113 | ### code example 114 | 115 | We can build this with something like: 116 | 117 | ```python 118 | 119 | class Thread: 120 | events: List[Event] 121 | 122 | class Event: 123 | # could just use string, or could be explicit - up to you 124 | type: Literal["list_git_tags", "deploy_backend", "deploy_frontend", "request_more_information", "done_for_now", "list_git_tags_result", "deploy_backend_result", "deploy_frontend_result", "request_more_information_result", "done_for_now_result", "error"] 125 | data: ListGitTags | DeployBackend | DeployFrontend | RequestMoreInformation | 126 | ListGitTagsResult | DeployBackendResult | DeployFrontendResult | RequestMoreInformationResult | string 127 | 128 | def event_to_prompt(event: Event) -> str: 129 | data = event.data if isinstance(event.data, str) \ 130 | else stringifyToYaml(event.data) 131 | 132 | return f"<{event.type}>\n{data}\n" 133 | 134 | 135 | def thread_to_prompt(thread: Thread) -> str: 136 | return '\n\n'.join(event_to_prompt(event) for event in thread.events) 137 | ``` 138 | 139 | #### Example Context Windows 140 | 141 | Here's how context windows might look with this approach: 142 | 143 | **Initial Slack Request:** 144 | ```xml 145 | 146 | From: @alex 147 | Channel: #deployments 148 | Text: Can you deploy the latest backend to production? 149 | 150 | ``` 151 | 152 | **After Listing Git Tags:** 153 | ```xml 154 | 155 | From: @alex 156 | Channel: #deployments 157 | Text: Can you deploy the latest backend to production? 
158 | Thread: []
159 | </slack_message>
160 |
161 | <list_git_tags>
162 | intent: "list_git_tags"
163 | </list_git_tags>
164 |
165 | <list_git_tags_result>
166 | tags:
167 | - name: "v1.2.3"
168 | commit: "abc123"
169 | date: "2024-03-15T10:00:00Z"
170 | - name: "v1.2.2"
171 | commit: "def456"
172 | date: "2024-03-14T15:30:00Z"
173 | - name: "v1.2.1"
174 | commit: "ghi789"
175 | date: "2024-03-13T09:15:00Z"
176 | </list_git_tags_result>
177 | ```
178 |
179 | **After Error and Recovery:**
180 | ```xml
181 | <slack_message>
182 | From: @alex
183 | Channel: #deployments
184 | Text: Can you deploy the latest backend to production?
185 | Thread: []
186 | </slack_message>
187 |
188 | <deploy_backend>
189 | intent: "deploy_backend"
190 | tag: "v1.2.3"
191 | environment: "production"
192 | </deploy_backend>
193 |
194 | <error>
195 | error running deploy_backend: Failed to connect to deployment service
196 | </error>
197 |
198 | <request_more_information>
199 | intent: "request_more_information_from_human"
200 | question: "I had trouble connecting to the deployment service, can you provide more details and/or check on the status of the service?"
201 | </request_more_information>
202 |
203 | <human_response>
204 | data:
205 | response: "I'm not sure what's going on, can you check on the status of the latest workflow?"
206 | </human_response>
207 | ```
208 |
209 | From here your next step might be:
210 |
211 | ```python
212 | nextStep = await determine_next_step(thread_to_prompt(thread))
213 | ```
214 |
215 | ```python
216 | {
217 | "intent": "get_workflow_status",
218 | "workflow_name": "tag_push_prod.yaml",
219 | }
220 | ```
221 |
222 | The XML-style format is just one example - the point is you can build your own format that makes sense for your application. You'll get better quality if you have the flexibility to experiment with different context structures and what you store vs. what you pass to the LLM.
223 |
224 | Key benefits of owning your context window:
225 |
226 | 1. **Information Density**: Structure information in ways that maximize the LLM's understanding
227 | 2. **Error Handling**: Include error information in a format that helps the LLM recover. Consider hiding errors and failed calls from the context window once they are resolved.
228 | 3. **Safety**: Control what information gets passed to the LLM, filtering out sensitive data
229 | 4. **Flexibility**: Adapt the format as you learn what works best for your use case
230 | 5. **Token Efficiency**: Optimize context format for token efficiency and LLM understanding
231 |
232 | Context includes: prompts, instructions, RAG documents, history, tool calls, memory
233 |
234 |
235 | Remember: The context window is your primary interface with the LLM. Taking control of how you structure and present information can dramatically improve your agent's performance.
236 |
237 | Example - information density - same message, fewer tokens:
238 |
239 | ![Loom Screenshot 2025-04-22 at 09 00 56](https://github.com/user-attachments/assets/5cf041c6-72da-4943-be8a-99c73162b12a)
240 |
241 |
242 | Recurring theme here: I don't know what's the best approach, but I know you want the flexibility to be able to try EVERYTHING.
243 |
244 | [← Own Your Prompts](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md) | [Tools Are Structured Outputs →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md)
245 | -------------------------------------------------------------------------------- /content/factor-4-tools-are-structured-outputs.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 4.
Tools are just structured outputs 4 | 5 | Tools don't need to be complex. At their core, they're just structured output from your LLM that triggers deterministic code. 6 | 7 | ![140-tools-are-just-structured-outputs](https://github.com/humanlayer/12-factor-agents/blob/main/img/140-tools-are-just-structured-outputs.png) 8 | 9 | For example, lets say you have two tools `CreateIssue` and `SearchIssues`. To ask an LLM to "use one of several tools" is just to ask it to output JSON we can parse into an object representing those tools. 10 | 11 | ```python 12 | 13 | class Issue: 14 | title: str 15 | description: str 16 | team_id: str 17 | assignee_id: str 18 | 19 | class CreateIssue: 20 | intent: "create_issue" 21 | issue: Issue 22 | 23 | class SearchIssues: 24 | intent: "search_issues" 25 | query: str 26 | what_youre_looking_for: str 27 | ``` 28 | 29 | The pattern is simple: 30 | 1. LLM outputs structured JSON 31 | 3. Deterministic code executes the appropriate action (like calling an external API) 32 | 4. Results are captured and fed back into the context 33 | 34 | This creates a clean separation between the LLM's decision-making and your application's actions. The LLM decides what to do, but your code controls how it's done. Just because an LLM "called a tool" doesn't mean you have to go execute a specific corresponding function in the same way every time. 35 | 36 | If you recall our switch statement from above 37 | 38 | ```python 39 | if nextStep.intent == 'create_payment_link': 40 | stripe.paymentlinks.create(nextStep.parameters) 41 | return # or whatever you want, see below 42 | elif nextStep.intent == 'wait_for_a_while': 43 | # do something monadic idk 44 | else: #... the model didn't call a tool we know about 45 | # do something else 46 | ``` 47 | 48 | **Note**: there has been a lot said about the benefits of "plain prompting" vs. "tool calling" vs. "JSON mode" and the performance tradeoffs of each. We'll link some resources to that stuff soon, but not gonna get into it here. See [Prompting vs JSON Mode vs Function Calling vs Constrained Generation vs SAP](https://www.boundaryml.com/blog/schema-aligned-parsing), [When should I use function calling, structured outputs, or JSON mode?](https://www.vellum.ai/blog/when-should-i-use-function-calling-structured-outputs-or-json-mode#:~:text=We%20don%27t%20recommend%20using%20JSON,always%20use%20Structured%20Outputs%20instead) and [OpenAI JSON vs Function Calling](https://docs.llamaindex.ai/en/stable/examples/llm/openai_json_vs_function_calling/). 49 | 50 | The "next step" might not be as atomic as just "run a pure function and return the result". You unlock a lot of flexibility when you think of "tool calls" as just a model outputting JSON describing what deterministic code should do. Put this together with [factor 8 own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md). 51 | 52 | [← Own Your Context Window](./factor-3-own-your-context-window.md) | [Unify Execution State →](./factor-5-unify-execution-state.md) 53 | -------------------------------------------------------------------------------- /content/factor-5-unify-execution-state.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 5. Unify execution state and business state 4 | 5 | Even outside the AI world, many infrastructure systems try to separate "execution state" from "business state". 
For AI apps, this might involve complex abstractions to track things like current step, next step, waiting status, retry counts, etc. This separation creates complexity that may be worthwhile, but may be overkill for your use case. 6 | 7 | As always, it's up to you to decide what's right for your application. But don't think you *have* to manage them separately. 8 | 9 | More clearly: 10 | 11 | - **Execution state**: current step, next step, waiting status, retry counts, etc. 12 | - **Business state**: What's happened in the agent workflow so far (e.g. list of OpenAI messages, list of tool calls and results, etc.) 13 | 14 | If possible, SIMPLIFY - unify these as much as possible. 15 | 16 | [![155-unify-state](https://github.com/humanlayer/12-factor-agents/blob/main/img/155-unify-state-animation.gif)](https://github.com/user-attachments/assets/e5a851db-f58f-43d8-8b0c-1926c99fc68d) 17 | 18 | 19 |
20 | GIF Version 21 | 22 | ![155-unify-state](https://github.com/humanlayer/12-factor-agents/blob/main/img/155-unify-state-animation.gif) 23 | 24 |
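
To make this concrete, here's a minimal sketch of what a unified thread can look like. The event types and helper methods are illustrative, not a prescribed API - the point is that the business state is just the list of events, and the execution state is computed from that same list, so there's exactly one thing to persist:

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Event:
    type: str   # e.g. "deploy_backend", "deploy_backend_result", "request_human_input", "error"
    data: Any


@dataclass
class Thread:
    events: List[Event] = field(default_factory=list)

    # business state: the events themselves
    def append(self, type: str, data: Any) -> None:
        self.events.append(Event(type, data))

    # execution state: derived from the events, never stored separately
    def awaiting_human(self) -> bool:
        # we're "paused" exactly when the last event is an unanswered request to a human
        return bool(self.events) and self.events[-1].type == "request_human_input"

    def consecutive_errors(self) -> int:
        count = 0
        for event in reversed(self.events):
            if event.type != "error":
                break
            count += 1
        return count

    # serialization: one source of truth, trivially saved, loaded, or forked
    def serialize(self) -> str:
        return json.dumps([{"type": e.type, "data": e.data} for e in self.events])

    @classmethod
    def deserialize(cls, raw: str) -> "Thread":
        return cls([Event(**e) for e in json.loads(raw)])
```

Because the whole thread round-trips through plain JSON, pausing is just saving it, and resuming or forking is just loading it (or a subset of it) back.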
25 | 26 | In reality, you can engineer your application so that you can infer all execution state from the context window. In many cases, execution state (current step, waiting status, etc.) is just metadata about what has happened so far. 27 | 28 | You may have things that can't go in the context window, like session ids, password contexts, etc, but your goal should be to minimize those things. By embracing [factor 3](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md) you can control what actually goes into the LLM 29 | 30 | This approach has several benefits: 31 | 32 | 1. **Simplicity**: One source of truth for all state 33 | 2. **Serialization**: The thread is trivially serializable/deserializable 34 | 3. **Debugging**: The entire history is visible in one place 35 | 4. **Flexibility**: Easy to add new state by just adding new event types 36 | 5. **Recovery**: Can resume from any point by just loading the thread 37 | 6. **Forking**: Can fork the thread at any point by copying some subset of the thread into a new context / state ID 38 | 7. **Human Interfaces and Observability**: Trivial to convert a thread into a human-readable markdown or a rich Web app UI 39 | 40 | [← Tools Are Structured Outputs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md) | [Launch/Pause/Resume →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md) 41 | -------------------------------------------------------------------------------- /content/factor-6-launch-pause-resume.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 6. Launch/Pause/Resume with simple APIs 4 | 5 | Agents are just programs, and we have things we expect from how to launch, query, resume, and stop them. 6 | 7 | [![pause-resume animation](https://github.com/humanlayer/12-factor-agents/blob/main/img/165-pause-resume-animation.gif)](https://github.com/user-attachments/assets/feb1a425-cb96-4009-a133-8bd29480f21f) 8 | 9 |
10 | GIF Version 11 | 12 | ![pause-resume animation](https://github.com/humanlayer/12-factor-agents/blob/main/img/165-pause-resume-animation.gif) 13 | 14 |
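
Here's a rough sketch of the kind of API surface this points at - assuming a FastAPI-style app purely for illustration, with an in-memory store and a `run_agent_loop` stub standing in for your real loop and database:

```python
import uuid
from fastapi import FastAPI, Request

app = FastAPI()
THREADS: dict[str, dict] = {}  # stand-in for a real database keyed by thread id


async def run_agent_loop(thread_id: str) -> None:
    # stand-in for your actual loop: call determine_next_step(...) until you
    # hit a pause point (human input, long-running task) or done_for_now
    ...


@app.post("/threads")
async def launch(req: Request):
    # launch: a user, pipeline, cron, or another agent starts a run with one POST
    thread_id = str(uuid.uuid4())
    THREADS[thread_id] = {"events": [{"type": "user_request", "data": await req.json()}]}
    await run_agent_loop(thread_id)  # in production, hand this to a queue/worker instead
    return {"thread_id": thread_id}


@app.get("/threads/{thread_id}")
async def status(thread_id: str):
    # query: the serialized thread *is* the status - no separate execution-state table
    return THREADS[thread_id]


@app.post("/threads/{thread_id}/events")
async def resume(thread_id: str, req: Request):
    # resume: an external trigger (approval, webhook, timer) appends an event and
    # re-enters the loop exactly where the agent left off
    THREADS[thread_id]["events"].append({"type": "external_event", "data": await req.json()})
    await run_agent_loop(thread_id)
    return {"status": "resumed"}
```
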
15 | 16 | 17 | It should be easy for users, apps, pipelines, and other agents to launch an agent with a simple API. 18 | 19 | Agents and their orchestrating deterministic code should be able to pause an agent when a long-running operation is needed. 20 | 21 | External triggers like webhooks should enable agents to resume from where they left off without deep integration with the agent orchestrator. 22 | 23 | Closely related to [factor 5 - unify execution state and business state](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md) and [factor 8 - own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md), but can be implemented independently. 24 | 25 | 26 | 27 | **Note** - often AI orchestrators will allow for pause and resume, but not between the moment of tool selection and tool execution. See also [factor 7 - contact humans with tool calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md) and [factor 11 - trigger from anywhere, meet users where they are](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md). 28 | 29 | [← Unify Execution State](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md) | [Contact Humans With Tools →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md) -------------------------------------------------------------------------------- /content/factor-7-contact-humans-with-tools.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 7. Contact humans with tool calls 4 | 5 | By default, LLM APIs rely on a fundamental HIGH-STAKES token choice: Are we returning plaintext content, or are we returning structured data? 6 | 7 | ![170-contact-humans-with-tools](https://github.com/humanlayer/12-factor-agents/blob/main/img/170-contact-humans-with-tools.png) 8 | 9 | You're putting a lot of weight on that choice of first token, which, in the `the weather in tokyo` case, is 10 | 11 | > "the" 12 | 13 | but in the `fetch_weather` case, it's some special token to denote the start of a JSON object. 14 | 15 | > |JSON> 16 | 17 | You might get better results by having the LLM *always* output json, and then declare it's intent with some natural language tokens like `request_human_input` or `done_for_now` (as opposed to a "proper" tool like `check_weather_in_city`). 18 | 19 | Again, you might not get any performance boost from this, but you should experiment, and ensure you're free to try weird stuff to get the best results. 20 | 21 | ```python 22 | 23 | class Options: 24 | urgency: Literal["low", "medium", "high"] 25 | format: Literal["free_text", "yes_no", "multiple_choice"] 26 | choices: List[str] 27 | 28 | # Tool definition for human interaction 29 | class RequestHumanInput: 30 | intent: "request_human_input" 31 | question: str 32 | context: str 33 | options: Options 34 | 35 | # Example usage in the agent loop 36 | if nextStep.intent == 'request_human_input': 37 | thread.events.append({ 38 | type: 'human_input_requested', 39 | data: nextStep 40 | }) 41 | thread_id = await save_state(thread) 42 | await notify_human(nextStep, thread_id) 43 | return # Break loop and wait for response to come back with thread ID 44 | else: 45 | # ... 
other cases
46 | ```
47 |
48 | Later, you might receive a webhook from a system that handles slack, email, sms, or other events.
49 |
50 | ```python
51 |
52 | @app.post('/webhook')
53 | async def webhook(req: Request):
54 | thread_id = req.body.threadId
55 | thread = await load_state(thread_id)
56 | thread.events.append({
57 | type: 'response_from_human',
58 | data: req.body
59 | })
60 | # ... simplified for brevity, you likely don't want to block the web worker here
61 | next_step = await determine_next_step(thread_to_prompt(thread))
62 | thread.events.append(next_step)
63 | result = await handle_next_step(thread, next_step)
64 | # todo - loop or break or whatever you want
65 |
66 | return {"status": "ok"}
67 | ```
68 |
69 | The above includes patterns from [factor 5 - unify execution state and business state](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md), [factor 8 - own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md), [factor 3 - own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md), and [factor 4 - tools are just structured outputs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md), and several others.
70 |
71 | If we were using the XML-y format from [factor 3 - own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md), our context window after a few turns might look like this:
72 |
73 | ```xml
74 |
75 | (snipped for brevity)
76 |
77 | <slack_message>
78 | From: @alex
79 | Channel: #deployments
80 | Text: Can you deploy backend v1.2.3 to production?
81 | Thread: []
82 | </slack_message>
83 |
84 | <request_human_input>
85 | intent: "request_human_input"
86 | question: "Would you like to proceed with deploying v1.2.3 to production?"
87 | context: "This is a production deployment that will affect live users."
88 | options: {
89 | urgency: "high"
90 | format: "yes_no"
91 | }
92 | </request_human_input>
93 |
94 | <human_response>
95 | response: "yes please proceed"
96 | approved: true
97 | timestamp: "2024-03-15T10:30:00Z"
98 | user: "alex@company.com"
99 | </human_response>
100 |
101 | <deploy_backend>
102 | intent: "deploy_backend"
103 | tag: "v1.2.3"
104 | environment: "production"
105 | </deploy_backend>
106 |
107 | <deploy_backend_result>
108 | status: "success"
109 | message: "Deployment v1.2.3 to production completed successfully."
110 | timestamp: "2024-03-15T10:30:00Z"
111 | </deploy_backend_result>
112 | ```
113 |
114 |
115 | Benefits:
116 |
117 | 1. **Clear Instructions**: Tools for different types of human contact allow for more specificity from the LLM
118 | 2. **Inner vs Outer Loop**: Enables agent workflows **outside** of the traditional ChatGPT-style interface, where the control flow and context initialization may be `Agent->Human` rather than `Human->Agent` (think, agents kicked off by a cron or an event)
119 | 3. **Multiple Human Access**: Can easily track and coordinate input from different humans through structured events
120 | 4. **Multi-Agent**: Simple abstraction can be easily extended to support `Agent->Agent` requests and responses
121 | 5.
**Durable**: Combined with [factor 6 - launch/pause/resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md), this makes for durable, reliable, and introspectable multiplayer workflows 122 | 123 | [More on Outer Loop Agents over here](https://theouterloop.substack.com/p/openais-realtime-api-is-a-step-towards) 124 | 125 | ![175-outer-loop-agents](https://github.com/humanlayer/12-factor-agents/blob/main/img/175-outer-loop-agents.png) 126 | 127 | Works great with [factor 11 - trigger from anywhere, meet users where they are](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md) 128 | 129 | [← Launch/Pause/Resume](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md) | [Own Your Control Flow →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md) -------------------------------------------------------------------------------- /content/factor-8-own-your-control-flow.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 8. Own your control flow 4 | 5 | If you own your control flow, you can do lots of fun things. 6 | 7 | ![180-control-flow](https://github.com/humanlayer/12-factor-agents/blob/main/img/180-control-flow.png) 8 | 9 | 10 | Build your own control structures that make sense for your specific use case. Specifically, certain types of tool calls may be reason to break out of the loop and wait for a response from a human or another long-running task like a training pipeline. You may also want to incorporate custom implementation of: 11 | 12 | - summarization or caching of tool call results 13 | - LLM-as-judge on structured output 14 | - context window compaction or other [memory management](https://github.com/humanlayer/12-factor-agents/blob/main/content/appendix-14-everything-is-context-engineering.md) 15 | - logging, tracing, and metrics 16 | - client-side rate limiting 17 | - durable sleep / pause / "wait for event" 18 | 19 | 20 | The below example shows three possible control flow patterns: 21 | 22 | 23 | - request_clarification: model asked for more info, break the loop and wait for a response from a human 24 | - fetch_git_tags: model asked for a list of git tags, fetch the tags, append to context window, and pass straight back to the model 25 | - deploy_backend: model asked to deploy a backend, this is a high-stakes thing, so break the loop and wait for human approval 26 | 27 | ```python 28 | def handle_next_step(thread: Thread): 29 | 30 | while True: 31 | next_step = await determine_next_step(thread_to_prompt(thread)) 32 | 33 | # inlined for clarity - in reality you could put 34 | # this in a method, use exceptions for control flow, or whatever you want 35 | if next_step.intent == 'request_clarification': 36 | thread.events.append({ 37 | type: 'request_clarification', 38 | data: nextStep, 39 | }) 40 | 41 | await send_message_to_human(next_step) 42 | await db.save_thread(thread) 43 | # async step - break the loop, we'll get a webhook later 44 | break 45 | elif next_step.intent == 'fetch_open_issues': 46 | thread.events.append({ 47 | type: 'fetch_open_issues', 48 | data: next_step, 49 | }) 50 | 51 | issues = await linear_client.issues() 52 | 53 | thread.events.append({ 54 | type: 'fetch_open_issues_result', 55 | data: issues, 56 | }) 57 | # sync step - 
pass the new context to the LLM to determine the NEXT next step 58 | continue 59 | elif next_step.intent == 'create_issue': 60 | thread.events.append({ 61 | type: 'create_issue', 62 | data: next_step, 63 | }) 64 | 65 | await request_human_approval(next_step) 66 | await db.save_thread(thread) 67 | # async step - break the loop, we'll get a webhook later 68 | break 69 | ``` 70 | 71 | This pattern allows you to interrupt and resume your agent's flow as needed, creating more natural conversations and workflows. 72 | 73 | **Example** - the number one feature request I have for every AI framework out there is we need to be able to interrupt 74 | a working agent and resume later, ESPECIALLY between the moment of tool **selection** and the moment of tool **invocation**. 75 | 76 | Without this level of resumability/granularity, there's no way to review/approve the tool call before it runs, which means 77 | you're forced to either: 78 | 79 | 1. Pause the task in memory while waiting for the long-running thing to complete (think `while...sleep`) and restart it from the beginning if the process is interrupted 80 | 2. Restrict the agent to only low-stakes, low-risk calls like research and summarization 81 | 3. Give the agent access to do bigger, more useful things, and just yolo hope it doesn't screw up 82 | 83 | 84 | You may notice this is closely related to [factor 5 - unify execution state and business state](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md) and [factor 6 - launch/pause/resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md), but can be implemented independently. 85 | 86 | [← Contact Humans With Tools](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md) | [Compact Errors →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-9-compact-errors.md) 87 | -------------------------------------------------------------------------------- /content/factor-9-compact-errors.md: -------------------------------------------------------------------------------- 1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md) 2 | 3 | ### 9. Compact Errors into Context Window 4 | 5 | This one is a little short but is worth mentioning. One of these benefits of agents is "self-healing" - for short tasks, an LLM might call a tool that fails. Good LLMs have a fairly good chance of reading an error message or stack trace and figuring out what to change in a subsequent tool call. 6 | 7 | 8 | Most frameworks implement this, but you can do JUST THIS without doing any of the other 11 factors. Here's an example of 9 | 10 | 11 | ```python 12 | thread = {"events": [inital_message]} 13 | 14 | while True: 15 | next_step = await determine_next_step(thread_to_prompt(thread)) 16 | thread["events"].append({ 17 | "type": next_step.intent, 18 | "data": next_step, 19 | }) 20 | try: 21 | result = await handle_next_step(thread, next_step) # our switch statement 22 | except Exception as e: 23 | # if we get an error, we can add it to the context window and try again 24 | thread["events"].append({ 25 | "type": 'error', 26 | "data": format_error(e), 27 | }) 28 | # loop, or do whatever else here to try to recover 29 | ``` 30 | 31 | You may want to implement an errorCounter for a specific tool call, to limit to ~3 attempts of a single tool, or whatever other logic makes sense for your use case. 
32 |
33 | ```python
34 | consecutive_errors = 0
35 |
36 | while True:
37 |
38 | # ... existing code ...
39 |
40 | try:
41 | result = await handle_next_step(thread, next_step)
42 | thread["events"].append({
43 | "type": next_step.intent + '_result',
44 | "data": result,
45 | })
46 | # success! reset the error counter
47 | consecutive_errors = 0
48 | except Exception as e:
49 | consecutive_errors += 1
50 | if consecutive_errors < 3:
51 | # do the loop and try again
52 | thread["events"].append({
53 | "type": 'error',
54 | "data": format_error(e),
55 | })
56 | else:
57 | # break the loop, reset parts of the context window, escalate to a human, or whatever else you want to do
58 | break
59 |
60 |
61 | ```
62 | Hitting some consecutive-error threshold might be a great place to [escalate to a human](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md), whether by model decision or via deterministic takeover of the control flow.
63 |
64 | [![195-factor-9-errors](https://github.com/humanlayer/12-factor-agents/blob/main/img/195-factor-9-errors.gif)](https://github.com/user-attachments/assets/cd7ed814-8309-4baf-81a5-9502f91d4043)
65 |
66 |
67 |
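
Here's a sketch of the "deterministic takeover" flavor - when the threshold is hit, instead of just breaking you append a `request_human_input` event and notify someone, reusing the `format_error`, `save_state`, and `notify_human` helpers from the earlier snippets (so this fragment isn't runnable on its own):

```python
async def escalate_after_repeated_errors(thread: dict, next_step, error: Exception, attempts: int) -> str:
    # drop-in for the `else:` branch above: stop retrying and hand the thread to a human
    escalation = {
        "intent": "request_human_input",
        "question": (
            f"I hit {attempts} consecutive errors running '{next_step.intent}'. "
            "Should I retry, try something else, or stop?"
        ),
        "context": format_error(error),  # the compacted error, not a raw stack trace
        "options": {"urgency": "high", "format": "free_text"},
    }
    thread["events"].append({"type": "human_input_requested", "data": escalation})
    thread_id = await save_state(thread)      # persist first, as in factor 7
    await notify_human(escalation, thread_id)
    return thread_id  # the factor 7 webhook resumes the thread with the human's answer
```
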
68 | [GIF Version](https://github.com/humanlayer/12-factor-agents/blob/main/img/195-factor-9-errors.gif) 69 | 70 | ![195-factor-9-errors](https://github.com/humanlayer/12-factor-agents/blob/main/img/195-factor-9-errors.gif) 71 | 72 |
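
The snippets above lean on a `format_error` helper without pinning down what it does - that's where the compaction happens. One possible shape, assuming all you want is the exception type, the message, and the innermost frame, under a hard length cap:

```python
import traceback


def format_error(e: Exception, max_chars: int = 500) -> str:
    # keep the type, the message, and only the innermost frame of the traceback -
    # enough for the model to self-correct without flooding the context window
    tb = traceback.extract_tb(e.__traceback__)
    location = ""
    if tb:
        frame = tb[-1]
        location = f" (at {frame.filename}:{frame.lineno} in {frame.name})"
    return f"{type(e).__name__}: {e}{location}"[:max_chars]
```
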
73 | 74 | Benefits: 75 | 76 | 1. **Self-Healing**: The LLM can read the error message and figure out what to change in a subsequent tool call 77 | 2. **Durable**: The agent can continue to run even if one tool call fails 78 | 79 | I'm sure you will find that if you do this TOO much, your agent will start to spin out and might repeat the same error over and over again. 80 | 81 | That's where [factor 8 - own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md) and [factor 3 - own your context building](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md) come in - you don't need to just put the raw error back on, you can completely restructure how it's represented, remove previous events from the context window, or whatever deterministic thing you find works to get an agent back on track. 82 | 83 | But the number one way to prevent error spin-outs is to embrace [factor 10 - small, focused agents](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md). 84 | 85 | [← Own Your Control Flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md) | [Small Focused Agents →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md) -------------------------------------------------------------------------------- /img/010-software-dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/010-software-dag.png -------------------------------------------------------------------------------- /img/015-dag-orchestrators.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/015-dag-orchestrators.png -------------------------------------------------------------------------------- /img/020-dags-with-ml.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/020-dags-with-ml.png -------------------------------------------------------------------------------- /img/025-agent-dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/025-agent-dag.png -------------------------------------------------------------------------------- /img/026-agent-dag-lines.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/026-agent-dag-lines.png -------------------------------------------------------------------------------- /img/027-agent-loop-animation.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/027-agent-loop-animation.gif -------------------------------------------------------------------------------- /img/027-agent-loop-animation.mp4: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/027-agent-loop-animation.mp4 -------------------------------------------------------------------------------- /img/027-agent-loop-dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/027-agent-loop-dag.png -------------------------------------------------------------------------------- /img/027-agent-loop.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/027-agent-loop.png -------------------------------------------------------------------------------- /img/028-micro-agent-dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/028-micro-agent-dag.png -------------------------------------------------------------------------------- /img/029-deploybot-high-level.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/029-deploybot-high-level.png -------------------------------------------------------------------------------- /img/030-deploybot-animation.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/030-deploybot-animation.gif -------------------------------------------------------------------------------- /img/030-deploybot-animation.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/030-deploybot-animation.mp4 -------------------------------------------------------------------------------- /img/031-deploybot-animation-5.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/031-deploybot-animation-5.gif -------------------------------------------------------------------------------- /img/031-deploybot-animation-5.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/031-deploybot-animation-5.mp4 -------------------------------------------------------------------------------- /img/031-deploybot-animation.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/031-deploybot-animation.gif -------------------------------------------------------------------------------- /img/031-deploybot-animation.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/031-deploybot-animation.mp4 -------------------------------------------------------------------------------- /img/033-deploybot.gif: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/033-deploybot.gif -------------------------------------------------------------------------------- /img/035-deploybot-conversation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/035-deploybot-conversation.png -------------------------------------------------------------------------------- /img/040-4-components.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/040-4-components.png -------------------------------------------------------------------------------- /img/110-natural-language-tool-calls.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/110-natural-language-tool-calls.png -------------------------------------------------------------------------------- /img/120-own-your-prompts.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/120-own-your-prompts.png -------------------------------------------------------------------------------- /img/130-own-your-context-building.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/130-own-your-context-building.png -------------------------------------------------------------------------------- /img/140-tools-are-just-structured-outputs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/140-tools-are-just-structured-outputs.png -------------------------------------------------------------------------------- /img/150-all-state-in-context-window.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/150-all-state-in-context-window.png -------------------------------------------------------------------------------- /img/150-unify-state.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/150-unify-state.png -------------------------------------------------------------------------------- /img/155-unify-state-animation.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/155-unify-state-animation.gif -------------------------------------------------------------------------------- /img/160-pause-resume-with-simple-apis.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/160-pause-resume-with-simple-apis.png -------------------------------------------------------------------------------- /img/165-pause-resume-animation.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/165-pause-resume-animation.gif -------------------------------------------------------------------------------- /img/170-contact-humans-with-tools.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/170-contact-humans-with-tools.png -------------------------------------------------------------------------------- /img/175-outer-loop-agents.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/175-outer-loop-agents.png -------------------------------------------------------------------------------- /img/180-control-flow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/180-control-flow.png -------------------------------------------------------------------------------- /img/190-factor-9-errors-static.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/190-factor-9-errors-static.png -------------------------------------------------------------------------------- /img/195-factor-9-errors.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/195-factor-9-errors.gif -------------------------------------------------------------------------------- /img/1a0-small-focused-agents.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/1a0-small-focused-agents.png -------------------------------------------------------------------------------- /img/1a5-agent-scope-grow.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/1a5-agent-scope-grow.gif -------------------------------------------------------------------------------- /img/1b0-trigger-from-anywhere.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/1b0-trigger-from-anywhere.png -------------------------------------------------------------------------------- /img/1c0-stateless-reducer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/1c0-stateless-reducer.png -------------------------------------------------------------------------------- 
/img/1c5-agent-foldl.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/1c5-agent-foldl.png -------------------------------------------------------------------------------- /img/220-context-engineering.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/220-context-engineering.png --------------------------------------------------------------------------------