├── LICENSE
├── README.md
├── content
│   ├── appendix-13-pre-fetch.md
│   ├── brief-history-of-software.md
│   ├── factor-1-natural-language-to-tool-calls.md
│   ├── factor-10-small-focused-agents.md
│   ├── factor-11-trigger-from-anywhere.md
│   ├── factor-12-stateless-reducer.md
│   ├── factor-2-own-your-prompts.md
│   ├── factor-3-own-your-context-window.md
│   ├── factor-4-tools-are-structured-outputs.md
│   ├── factor-5-unify-execution-state.md
│   ├── factor-6-launch-pause-resume.md
│   ├── factor-7-contact-humans-with-tools.md
│   ├── factor-8-own-your-control-flow.md
│   └── factor-9-compact-errors.md
└── img
    ├── 010-software-dag.png
    ├── 015-dag-orchestrators.png
    ├── 020-dags-with-ml.png
    ├── 025-agent-dag.png
    ├── 026-agent-dag-lines.png
    ├── 027-agent-loop-animation.gif
    ├── 027-agent-loop-animation.mp4
    ├── 027-agent-loop-dag.png
    ├── 027-agent-loop.png
    ├── 028-micro-agent-dag.png
    ├── 029-deploybot-high-level.png
    ├── 030-deploybot-animation.gif
    ├── 030-deploybot-animation.mp4
    ├── 031-deploybot-animation-5.gif
    ├── 031-deploybot-animation-5.mp4
    ├── 031-deploybot-animation.gif
    ├── 031-deploybot-animation.mp4
    ├── 033-deploybot.gif
    ├── 035-deploybot-conversation.png
    ├── 040-4-components.png
    ├── 110-natural-language-tool-calls.png
    ├── 120-own-your-prompts.png
    ├── 130-own-your-context-building.png
    ├── 140-tools-are-just-structured-outputs.png
    ├── 150-all-state-in-context-window.png
    ├── 150-unify-state.png
    ├── 155-unify-state-animation.gif
    ├── 160-pause-resume-with-simple-apis.png
    ├── 165-pause-resume-animation.gif
    ├── 170-contact-humans-with-tools.png
    ├── 175-outer-loop-agents.png
    ├── 180-control-flow.png
    ├── 190-factor-9-errors-static.png
    ├── 195-factor-9-errors.gif
    ├── 1a0-small-focused-agents.png
    ├── 1a5-agent-scope-grow.gif
    ├── 1b0-trigger-from-anywhere.png
    ├── 1c0-stateless-reducer.png
    ├── 1c5-agent-foldl.png
    └── 220-context-engineering.png
/LICENSE:
--------------------------------------------------------------------------------
1 | Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0)
2 |
3 | This is a human-readable summary of (and not a substitute for) the license. Disclaimer.
4 |
5 | You are free to:
6 |
7 | - Share — copy and redistribute the material in any medium or format
8 | - Adapt — remix, transform, and build upon the material for any purpose, even commercially.
9 |
10 | This license is acceptable for Free Cultural Works.
11 |
12 | The licensor cannot revoke these freedoms as long as you follow the license terms.
13 |
14 | Under the following terms:
15 |
16 | - Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
17 | - ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
18 | - No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
19 |
20 | Notices:
21 |
22 | You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
23 |
24 | No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.
25 |
26 | For the full text of this license, visit https://creativecommons.org/licenses/by-sa/4.0/legalcode
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
7 |
8 |
9 | # 12 Factor Agents - Principles for building reliable LLM applications
10 |
11 |
12 |
13 | *In the spirit of [12 Factor Apps](https://12factor.net/)*. *The source for this project is public at https://github.com/humanlayer/12-factor-agents, and I welcome your feedback and contributions. Let's figure this out together!*
14 |
15 |
16 |
17 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 | Hi, I'm Dex. I've been [hacking](https://youtu.be/8bIHcttkOTE) on [AI agents](https://theouterloop.substack.com) for [a while](https://humanlayer.dev).
35 |
36 |
37 | **I've tried every agent framework out there**, from the plug-and-play crew/langchains to the "minimalist" smolagents of the world to the "production grade" langgraph, griptape, etc.
38 |
39 | **I've talked to a lot of really strong founders**, in and out of YC, who are all building really impressive things with AI. Most of them are rolling the stack themselves. I don't see a lot of frameworks in production customer-facing agents.
40 |
41 | **I've been surprised to find** that most of the products out there billing themselves as "AI Agents" are not all that agentic. A lot of them are mostly deterministic code, with LLM steps sprinkled in at just the right points to make the experience truly magical.
42 |
43 | Agents, at least the good ones, don't follow the ["here's your prompt, here's a bag of tools, loop until you hit the goal"](https://www.anthropic.com/engineering/building-effective-agents#agents) pattern. Rather, they are comprised of mostly just software.
44 |
45 | So, I set out to answer:
46 |
47 | > ### **What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?**
48 |
49 | Welcome to 12-factor agents. As every Chicago mayor since Daley has consistently plastered all over the city's major airports, we're glad you're here.
50 |
51 | *Special thanks to [@iantbutler01](https://github.com/iantbutler01), [@tnm](https://github.com/tnm), [@hellovai](https://www.github.com/hellovai), [@stantonk](https://www.github.com/stantonk), [@balanceiskey](https://www.github.com/balanceiskey), [@AdjectiveAllison](https://www.github.com/AdjectiveAllison), [@pfbyjy](https://www.github.com/pfbyjy), [@a-churchill](https://www.github.com/a-churchill), and the SF MLOps community for early feedback on this guide.*
52 |
53 | ## The Short Version: The 12 Factors
54 |
55 | Even if LLMs [continue to get exponentially more powerful](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md#what-if-llms-get-smarter), there will be core engineering techniques that make LLM-powered software more reliable, more scalable, and easier to maintain.
56 |
57 | - [How We Got Here: A Brief History of Software](https://github.com/humanlayer/12-factor-agents/blob/main/content/brief-history-of-software.md)
58 | - [Factor 1: Natural Language to Tool Calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-1-natural-language-to-tool-calls.md)
59 | - [Factor 2: Own your prompts](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md)
60 | - [Factor 3: Own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md)
61 | - [Factor 4: Tools are just structured outputs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md)
62 | - [Factor 5: Unify execution state and business state](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md)
63 | - [Factor 6: Launch/Pause/Resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md)
64 | - [Factor 7: Contact humans with tool calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md)
65 | - [Factor 8: Own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md)
66 | - [Factor 9: Compact Errors into Context Window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-9-compact-errors.md)
67 | - [Factor 10: Small, Focused Agents](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md)
68 | - [Factor 11: Trigger from anywhere, meet users where they are](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md)
69 | - [Factor 12: Make your agent a stateless reducer](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-12-stateless-reducer.md)
70 |
71 | ### Visual Nav
72 |
73 | | | | |
74 | |----|----|-----|
75 | | [Factor 1](./content/factor-1-natural-language-to-tool-calls.md) | [Factor 2](./content/factor-2-own-your-prompts.md) | [Factor 3](./content/factor-3-own-your-context-window.md) |
76 | | [Factor 4](./content/factor-4-tools-are-structured-outputs.md) | [Factor 5](./content/factor-5-unify-execution-state.md) | [Factor 6](./content/factor-6-launch-pause-resume.md) |
77 | | [Factor 7](./content/factor-7-contact-humans-with-tools.md) | [Factor 8](./content/factor-8-own-your-control-flow.md) | [Factor 9](./content/factor-9-compact-errors.md) |
78 | | [Factor 10](./content/factor-10-small-focused-agents.md) | [Factor 11](./content/factor-11-trigger-from-anywhere.md) | [Factor 12](./content/factor-12-stateless-reducer.md) |
79 |
80 | ## How we got here
81 |
82 | For a deeper dive on my agent journey and what led us here, check out [A Brief History of Software](./content/brief-history-of-software.md) - a quick summary here:
83 |
84 | ### Software is a directed graph
85 |
86 | We're gonna talk a lot about Directed Graphs (DGs) and their Acyclic friends, DAGs. I'll start by pointing out that...well...software is a directed graph. There's a reason we used to represent programs as flow charts.
87 |
88 | 
89 |
90 | ### From code to DAGs
91 |
92 | Around 20 years ago, we started to see DAG orchestrators become popular. We're talking classics like [Airflow](https://airflow.apache.org/), [Prefect](https://www.prefect.io/), some predecessors, and some newer ones like [Dagster](https://dagster.io/), [Inngest](https://www.inngest.com/), and [Windmill](https://www.windmill.dev/). These followed the same graph pattern, with the added benefit of observability, modularity, retries, administration, etc.
93 |
94 | 
95 |
96 | ### The promise of agents
97 |
98 | I'm not the first [person to say this](https://youtu.be/Dc99-zTMyMg?si=bcT0hIwWij2mR-40&t=73), but my biggest takeaway when I started learning about agents was that you get to throw the DAG away. Instead of software engineers coding each step and edge case, you can give the agent a goal and a set of transitions:
99 |
100 | 
101 |
102 | And let the LLM make decisions in real time to figure out the path:
103 |
104 | 
105 |
106 | The promise here is that you write less software: you just give the LLM the "edges" of the graph and let it figure out the nodes. You can recover from errors, you can write less code, and you may find that LLMs find novel solutions to problems.
107 |
108 |
109 | ### Agents as loops
110 |
111 | As we'll see later, it turns out this doesn't quite work.
112 |
113 | Let's dive one step deeper - with agents you've got this loop consisting of 4 steps:
114 |
115 | 1. LLM determines the next step in the workflow, outputting structured json ("tool calling")
116 | 2. Deterministic code executes the tool call
117 | 3. The result is appended to the context window
118 | 4. Repeat until the next step is determined to be "done"
119 |
120 | ```python
121 | async def run_agent(initial_event: dict):  # e.g. {"message": "..."}
122 |     context = [initial_event]
123 |     while True:
124 |         next_step = await llm.determine_next_step(context)
125 |         context.append(next_step)
126 |
127 |         if next_step.intent == "done":
128 |             return next_step.final_answer
129 |
130 |         result = await execute_step(next_step)
131 |         context.append(result)
132 | ```
133 |
134 | Our initial context is just the starting event (maybe a user message, maybe a cron fired, maybe a webhook, etc), and we ask the llm to choose the next step (tool) or to determine that we're done.
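
To make the loop concrete, here is a minimal runnable sketch. Everything in it is an assumption for illustration: `Step`, `ScriptedLLM`, and the echoing `execute_step` are hypothetical stand-ins, and the "model" just replays a fixed script instead of calling a real LLM.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """Hypothetical shape for the structured output ("tool call") of the model."""
    intent: str
    args: dict = field(default_factory=dict)
    final_answer: Optional[str] = None


class ScriptedLLM:
    """Stand-in for a real model: replays a fixed list of steps."""

    def __init__(self, steps: list):
        self._steps = iter(steps)

    async def determine_next_step(self, context: list) -> Step:
        return next(self._steps)


async def execute_step(step: Step) -> dict:
    # Deterministic code runs the tool call; this stub just echoes the arguments.
    return {"type": f"{step.intent}_result", "data": step.args}


async def run_agent(llm: ScriptedLLM, initial_event: dict):
    context: list = [initial_event]
    while True:
        next_step = await llm.determine_next_step(context)
        context.append(next_step)
        if next_step.intent == "done":
            return next_step.final_answer, context
        context.append(await execute_step(next_step))


llm = ScriptedLLM([
    Step("list_git_tags"),
    Step("deploy_backend_to_prod", {"tag": "v1.2.3"}),
    Step("done", final_answer="deployed v1.2.3"),
])
answer, context = asyncio.run(run_agent(llm, {"message": "deploy the backend"}))
```

The returned `context` holds the initial event plus every step and result in order, which is the whole "memory" of the agent in this pattern.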
135 |
136 | Here's a multi-step example:
137 |
138 | [](https://github.com/user-attachments/assets/3beb0966-fdb1-4c12-a47f-ed4e8240f8fd)
139 |
140 |
144 |
145 |
146 |
147 | ## Why 12-factor agents?
148 |
149 | At the end of the day, this approach just doesn't work as well as we want it to.
150 |
151 | In building HumanLayer, I've talked to at least 100 SaaS builders (mostly technical founders) looking to make their existing product more agentic. The journey usually goes something like:
152 |
153 | 1. Decide you want to build an agent
154 | 2. Product design, UX mapping, what problems to solve
155 | 3. Want to move fast, so grab $FRAMEWORK and *get to building*
156 | 4. Get to 70-80% quality bar
157 | 5. Realize that 80% isn't good enough for most customer-facing features
158 | 6. Realize that getting past 80% requires reverse-engineering the framework, prompts, flow, etc.
159 | 7. Start over from scratch
160 |
161 |
162 | Random Disclaimers
163 |
164 | **DISCLAIMER**: I'm not sure the exact right place to say this, but here seems as good as any: **this is BY NO MEANS meant to be a dig on either the many frameworks out there, or the pretty dang smart people who work on them**. They enable incredible things and have accelerated the AI ecosystem.
165 |
166 | I hope that one outcome of this post is that agent framework builders can learn from the journeys of myself and others, and make frameworks even better.
167 |
168 | Especially for builders who want to move fast but need deep control.
169 |
170 | **DISCLAIMER 2**: I'm not going to talk about MCP. I'm sure you can see where it fits in.
171 |
172 | **DISCLAIMER 3**: I'm using mostly TypeScript, for [reasons](https://www.linkedin.com/posts/dexterihorthy_llms-typescript-aiagents-activity-7290858296679313408-Lh9e?utm_source=share&utm_medium=member_desktop&rcm=ACoAAA4oHTkByAiD-wZjnGsMBUL_JT6nyyhOh30), but all this stuff works in Python or any other language you prefer.
173 |
174 |
175 | Anyways back to the thing...
176 |
177 |
178 |
179 | ### Design Patterns for great LLM applications
180 |
181 | After digging through hundreds of AI libraries and working with dozens of founders, my instinct is this:
182 |
183 | 1. There are some core things that make agents great
184 | 2. Going all in on a framework and building what is essentially a greenfield rewrite may be counter-productive
185 | 3. There are some core principles that make agents great, and you will get most/all of them if you pull in a framework
186 | 4. BUT, the fastest way I've seen for builders to get high-quality AI software in the hands of customers is to take small, modular concepts from agent building, and incorporate them into their existing product
187 | 5. These modular concepts from agents can be defined and applied by most skilled software engineers, even if they don't have an AI background
188 |
189 | > #### The fastest way I've seen for builders to get good AI software in the hands of customers is to take small, modular concepts from agent building, and incorporate them into their existing product
190 |
191 |
192 | ## The 12 Factors (again)
193 |
194 |
195 | - [How We Got Here: A Brief History of Software](./content/brief-history-of-software.md)
196 | - [Factor 1: Natural Language to Tool Calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-1-natural-language-to-tool-calls.md)
197 | - [Factor 2: Own your prompts](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md)
198 | - [Factor 3: Own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md)
199 | - [Factor 4: Tools are just structured outputs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md)
200 | - [Factor 5: Unify execution state and business state](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md)
201 | - [Factor 6: Launch/Pause/Resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md)
202 | - [Factor 7: Contact humans with tool calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md)
203 | - [Factor 8: Own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md)
204 | - [Factor 9: Compact Errors into Context Window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-9-compact-errors.md)
205 | - [Factor 10: Small, Focused Agents](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md)
206 | - [Factor 11: Trigger from anywhere, meet users where they are](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md)
207 | - [Factor 12: Make your agent a stateless reducer](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-12-stateless-reducer.md)
208 |
209 | ## Honorable Mentions / other advice
210 |
211 | - [Factor 13: Pre-fetch all the context you might need](./content/appendix-13-pre-fetch.md)
212 |
213 | ## Related Resources
214 |
215 | - Contribute to this guide [here](https://github.com/humanlayer/12-factor-agents)
216 | - [I talked about a lot of this on an episode of the Tool Use podcast](https://youtu.be/8bIHcttkOTE) in March 2025
217 | - I write about some of this stuff at [The Outer Loop](https://theouterloop.substack.com)
218 | - I do [webinars about Maximizing LLM Performance](https://github.com/hellovai/ai-that-works/tree/main) with [@hellovai](https://github.com/hellovai)
219 | - We build OSS agents with this methodology under [got-agents/agents](https://github.com/got-agents/agents)
220 | - We ignored all our own advice and built a [framework for running distributed agents in kubernetes](https://github.com/humanlayer/kubechain)
221 | - Other links from this guide:
222 | - [12 Factor Apps](https://12factor.net)
223 | - [Building Effective Agents (Anthropic)](https://www.anthropic.com/engineering/building-effective-agents#agents)
224 | - [Prompts are Functions](https://thedataexchange.media/baml-revolution-in-ai-engineering/ )
225 | - [Library patterns: Why frameworks are evil](https://tomasp.net/blog/2015/library-frameworks/)
226 | - [The Wrong Abstraction](https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction)
227 | - [Mailcrew Agent](https://github.com/dexhorthy/mailcrew)
228 | - [Mailcrew Demo Video](https://www.youtube.com/watch?v=f_cKnoPC_Oo)
229 | - [Chainlit Demo](https://x.com/chainlit_io/status/1858613325921480922)
230 | - [TypeScript for LLMs](https://www.linkedin.com/posts/dexterihorthy_llms-typescript-aiagents-activity-7290858296679313408-Lh9e)
231 | - [Schema Aligned Parsing](https://www.boundaryml.com/blog/schema-aligned-parsing)
232 | - [Function Calling vs Structured Outputs vs JSON Mode](https://www.vellum.ai/blog/when-should-i-use-function-calling-structured-outputs-or-json-mode)
233 | - [BAML on GitHub](https://github.com/boundaryml/baml)
234 | - [OpenAI JSON vs Function Calling](https://docs.llamaindex.ai/en/stable/examples/llm/openai_json_vs_function_calling/)
235 | - [Outer Loop Agents](https://theouterloop.substack.com/p/openais-realtime-api-is-a-step-towards)
236 | - [Airflow](https://airflow.apache.org/)
237 | - [Prefect](https://www.prefect.io/)
238 | - [Dagster](https://dagster.io/)
239 | - [Inngest](https://www.inngest.dev/)
240 | - [Windmill](https://www.windmill.dev/)
241 | - [The AI Agent Index (MIT)](https://aiagentindex.mit.edu/)
242 | - [NotebookLM on Finding Model Capability Boundaries](https://open.substack.com/pub/swyx/p/notebooklm?selection=08e1187c-cfee-4c63-93c9-71216640a5f8)
243 |
244 |
245 |
246 |
247 |
--------------------------------------------------------------------------------
/content/appendix-13-pre-fetch.md:
--------------------------------------------------------------------------------
1 | ### Factor 13 - pre-fetch all the context you might need
2 |
3 | If there's a high chance that your model will call tool X, don't waste token round trips telling the model to fetch it. That is, instead of a pseudo-prompt like:
4 |
5 | ```jinja
6 | When looking at deployments, you will likely want to fetch the list of published git tags,
7 | so you can use it to deploy to prod.
8 |
9 | Here's what happened so far:
10 |
11 | {{ thread.events }}
12 |
13 | What's the next step?
14 |
15 | Answer in JSON format with one of the following intents:
16 |
17 | {
18 | intent: 'deploy_backend_to_prod',
19 | tag: string
20 | } OR {
21 | intent: 'list_git_tags'
22 | } OR {
23 | intent: 'done_for_now',
24 | message: string
25 | }
26 | ```
27 |
28 | and your code looks like
29 |
30 | ```python
31 | thread = {"events": [initial_message]}
32 | next_step = await determine_next_step(thread)
33 |
34 | while True:
35 |     match next_step.intent:  # structural pattern matching, Python 3.10+
36 |         case 'list_git_tags':
37 |             tags = await fetch_git_tags()
38 |             thread["events"].append({
39 |                 "type": 'list_git_tags',
40 |                 "data": tags,
41 |             })
42 |         case 'deploy_backend_to_prod':
43 |             deploy_result = await deploy_backend_to_prod(next_step.data.tag)
44 |             thread["events"].append({
45 |                 "type": 'deploy_backend_to_prod',
46 |                 "data": deploy_result,
47 |             })
48 |         case 'done_for_now':
49 |             await notify_human(next_step.message)
50 |             break
51 |     # ...
52 | ```
53 |
54 | You might as well just fetch the tags and include them in the context window, like:
55 |
56 | ```diff
57 | - When looking at deployments, you will likely want to fetch the list of published git tags,
58 | - so you can use it to deploy to prod.
59 |
60 | + The current git tags are:
61 |
62 | + {{ git_tags }}
63 |
64 |
65 | Here's what happened so far:
66 |
67 | {{ thread.events }}
68 |
69 | What's the next step?
70 |
71 | Answer in JSON format with one of the following intents:
72 |
73 | {
74 | intent: 'deploy_backend_to_prod',
75 | tag: string
76 | - } OR {
77 | - intent: 'list_git_tags'
78 | } OR {
79 | intent: 'done_for_now',
80 | message: string
81 | }
82 |
83 | ```
84 |
85 | and your code looks like
86 |
87 | ```diff
88 |   thread = {"events": [initial_message]}
89 | + git_tags = await fetch_git_tags()
90 |
91 | - next_step = await determine_next_step(thread)
92 | + next_step = await determine_next_step(thread, git_tags)
93 |
94 |   while True:
95 |       match next_step.intent:
96 | -         case 'list_git_tags':
97 | -             tags = await fetch_git_tags()
98 | -             thread["events"].append({
99 | -                 "type": 'list_git_tags',
100 | -                 "data": tags,
101 | -             })
102 |           case 'deploy_backend_to_prod':
103 |               deploy_result = await deploy_backend_to_prod(next_step.data.tag)
104 |               thread["events"].append({
105 |                   "type": 'deploy_backend_to_prod',
106 |                   "data": deploy_result,
107 |               })
108 |           case 'done_for_now':
109 |               await notify_human(next_step.message)
110 |               break
111 |       # ...
112 | ```
113 |
114 | or even just include the tags in the thread and remove the specific parameter from your prompt template:
115 |
116 | ```diff
117 |   thread = {"events": [initial_message]}
118 | + # add the request
119 | + thread["events"].append({
120 | +     "type": 'list_git_tags',
121 | + })
122 |
123 |   git_tags = await fetch_git_tags()
124 |
125 | + # add the result
126 | + thread["events"].append({
127 | +     "type": 'list_git_tags_result',
128 | +     "data": git_tags,
129 | + })
130 |
131 | - next_step = await determine_next_step(thread, git_tags)
132 | + next_step = await determine_next_step(thread)
133 |
134 |   while True:
135 |       match next_step.intent:
136 |           case 'deploy_backend_to_prod':
137 |               deploy_result = await deploy_backend_to_prod(next_step.data.tag)
138 |               thread["events"].append(deploy_result)
139 |           case 'done_for_now':
140 |               await notify_human(next_step.message)
141 |               break
142 |       # ...
143 | ```
144 |
145 | Overall:
146 |
147 | > #### If you already know what tools you'll want the model to call, just call them DETERMINISTICALLY and let the model do the hard part of figuring out how to use their outputs
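
As a self-contained illustration of the last variant, here is a minimal sketch that pre-fetches the tags and records them in the thread before the model is ever asked for a next step. The tag values, the `user_message` event type, and the `build_thread` helper are assumptions made up for this example; `fetch_git_tags` is a stub, not a real lookup.

```python
import asyncio


async def fetch_git_tags() -> list:
    # Hypothetical stand-in for a real lookup (e.g. a git or registry query).
    return ["v1.2.2", "v1.2.3"]


async def build_thread(initial_message: dict) -> dict:
    """Pre-fetch the tags and record them as if the model had asked for them."""
    thread = {"events": [initial_message]}
    # add the request
    thread["events"].append({"type": "list_git_tags"})
    git_tags = await fetch_git_tags()
    # add the result
    thread["events"].append({"type": "list_git_tags_result", "data": git_tags})
    return thread


thread = asyncio.run(
    build_thread({"type": "user_message", "data": "deploy latest to prod"})
)
event_types = [event["type"] for event in thread["events"]]
```

Because the tags already sit in `thread["events"]`, the first model call can go straight to choosing a deploy step, and the prompt template needs no tag-specific parameter.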
148 |
149 | Again, AI engineering is all about [Context Engineering](./factor-3-own-your-context-window.md).
150 |
151 | [← Stateless Reducer](./factor-12-stateless-reducer.md) | [Further Reading →](../README.md#related-resources)
152 |
--------------------------------------------------------------------------------
/content/brief-history-of-software.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ## The longer version: how we got here
4 |
5 | ### You don't have to listen to me
6 |
7 | Whether you're new to agents or an ornery old veteran like me, I'm going to try to convince you to throw out most of what you think about AI Agents, take a step back, and rethink them from first principles. (spoiler alert if you didn't catch the OpenAI responses launch a few weeks back, but pushing MORE agent logic behind an API ain't it)
8 |
9 |
10 | ## Agents are software, and a brief history thereof
11 |
12 | Let's talk about how we got here.
13 |
14 | ### 60 years ago
15 |
16 | We're gonna talk a lot about Directed Graphs (DGs) and their Acyclic friends, DAGs. I'll start by pointing out that...well...software is a directed graph. There's a reason we used to represent programs as flow charts.
17 |
18 | 
19 |
20 | ### 20 years ago
21 |
22 | Around 20 years ago, we started to see DAG orchestrators become popular. We're talking classics like [Airflow](https://airflow.apache.org/), [Prefect](https://www.prefect.io/), some predecessors, and some newer ones like [Dagster](https://dagster.io/), [Inngest](https://www.inngest.dev/), and [Windmill](https://www.windmill.dev/). These followed the same graph pattern, with the added benefit of observability, modularity, retries, administration, etc.
23 |
24 | 
25 |
26 | ### 10-15 years ago
27 |
28 | When ML models started to get good enough to be useful, we started to see DAGs with ML models sprinkled in. You might imagine steps like "summarize the text in this column into a new column" or "classify the support issues by severity or sentiment".
29 |
30 | 
31 |
32 | But at the end of the day, it's still mostly the same good old deterministic software.
33 |
34 | ### The promise of agents
35 |
36 | I'm not the first [person to say this](https://youtu.be/Dc99-zTMyMg?si=bcT0hIwWij2mR-40&t=73), but my biggest takeaway when I started learning about agents was that you get to throw the DAG away. Instead of software engineers coding each step and edge case, you can give the agent a goal and a set of transitions:
37 |
38 | 
39 |
40 | And let the LLM make decisions in real time to figure out the path:
41 |
42 | 
43 |
44 | The promise here is that you write less software, you just give the LLM the "edges" of the graph and let it figure out the nodes. You can recover from errors, you can write less code, and you may find that LLMs find novel solutions to problems.
45 |
46 | ### Agents as loops
47 |
48 | Put another way, you've got this loop consisting of 4 steps:
49 |
50 | 1. LLM determines the next step in the workflow, outputting structured json ("tool calling")
51 | 2. Deterministic code executes the tool call
52 | 3. The result is appended to the context window
53 | 4. Repeat until the next step is determined to be "done"
54 |
55 | ```python
56 | async def run_agent(initial_event: dict):  # e.g. {"message": "..."}
57 |     context = [initial_event]
58 |     while True:
59 |         next_step = await llm.determine_next_step(context)
60 |         context.append(next_step)
61 |
62 |         if next_step.intent == "done":
63 |             return next_step.final_answer
64 |
65 |         result = await execute_step(next_step)
66 |         context.append(result)
67 | ```
68 |
69 | Our initial context is just the starting event (maybe a user message, maybe a cron fired, maybe a webhook, etc),
70 | and we ask the llm to choose the next step (tool) or to determine that we're done.
71 |
72 | Here's a multi-step example:
73 |
74 | [](https://github.com/user-attachments/assets/3beb0966-fdb1-4c12-a47f-ed4e8240f8fd)
75 |
76 |
80 |
81 |
82 |
83 | And the "materialized" DAG that was generated would look something like:
84 |
85 | 
86 |
87 | ### The problem with this "loop until you solve it" pattern
88 |
89 | The biggest problems with this pattern:
90 |
91 | - Agents get lost when the context window gets too long - they spin out trying the same broken approach over and over again
92 | - Literally, that's it, but that's enough to kneecap the approach
93 |
94 | Even if you haven't hand-rolled an agent, you've probably seen this long-context problem in working with agentic coding tools. They just get lost after a while and you need to start a new chat.
95 |
96 | I'll even perhaps posit something I've heard in passing quite a bit, and that YOU probably have developed your own intuition around:
97 |
98 | > ### **Even as models support longer and longer context windows, you'll ALWAYS get better results with a small, focused prompt and context**
99 |
100 | Most builders I've talked to **pushed the "tool calling loop" idea to the side** when they realized that anything more than 10-20 turns becomes a big mess that the LLM can't recover from. Even if the agent gets it right 90% of the time, that's miles away from "good enough to put in customer hands". Can you imagine a web app that crashed on 10% of page loads?
101 |
102 | ### What actually works - micro agents
103 |
104 | One thing that I **have** seen in the wild quite a bit is taking the agent pattern and sprinkling it into a broader more deterministic DAG.
105 |
106 | 
107 |
108 | You might be asking - "why use agents at all in this case?" - we'll get into that shortly, but basically, having language models managing well-scoped sets of tasks makes it easy to incorporate live human feedback, translating it into workflow steps without spinning out into context error loops ([factor 1](./factor-1-natural-language-to-tool-calls.md), [factor 3](./factor-3-own-your-context-window.md), [factor 7](./factor-7-contact-humans-with-tools.md)).
109 |
110 | > #### having language models managing well-scoped sets of tasks makes it easy to incorporate live human feedback...without spinning out into context error loops
111 |
112 | ### A real life micro agent
113 |
114 | Here's an example of how deterministic code might run one micro agent responsible for handling the human-in-the-loop steps for deployment.
115 |
116 | 
117 |
118 | * **Human** merges PR to GitHub main branch
119 | * **Deterministic code** deploys to staging env
120 | * **Deterministic code** runs end-to-end (e2e) tests against staging
121 | * **Deterministic code** hands to agent for prod deployment, with initial context: "deploy SHA 4af9ec0 to production"
122 | * **Agent** calls `deploy_frontend_to_prod(4af9ec0)`
123 | * **Deterministic code** requests human approval on this action
124 | * **Human** rejects the action with feedback "can you deploy the backend first?"
125 | * **Agent** calls `deploy_backend_to_prod(4af9ec0)`
126 | * **Deterministic code** requests human approval on this action
127 | * **Human** approves the action
128 | * **Deterministic code** executes the backend deployment
129 | * **Agent** calls `deploy_frontend_to_prod(4af9ec0)`
130 | * **Deterministic code** requests human approval on this action
131 | * **Human** approves the action
132 | * **Deterministic code** executes the frontend deployment
133 | * **Agent** determines that the task was completed successfully, we're done!
134 | * **Deterministic code** runs the end-to-end tests against production
135 | * **Deterministic code** marks the task completed, OR passes to a rollback agent to review failures and potentially roll back
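
The approval portion of the walkthrough above can be sketched roughly as follows. This is not the deploybot implementation: `Approval`, `run_with_approval`, the event type names, and the scripted `proposals` / `decisions` are all hypothetical stand-ins for a real model and a real human.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Approval:
    approved: bool
    feedback: Optional[str] = None


def run_with_approval(
    propose: Callable[[list], dict],       # the "agent": picks the next step
    ask_human: Callable[[dict], Approval],  # deterministic code asks a human
    execute: Callable[[dict], str],         # deterministic code runs the step
    context: list,
    max_turns: int = 10,
) -> list:
    for _ in range(max_turns):
        step = propose(context)
        context.append({"type": "proposed", "data": step})
        if step["intent"] == "done":
            break
        decision = ask_human(step)
        if decision.approved:
            context.append({"type": "executed", "data": execute(step)})
        else:
            # A rejection is just another event the agent sees on its next turn.
            context.append({"type": "human_feedback", "data": decision.feedback})
    return context


# Scripted stand-ins that replay the walkthrough above.
proposals = iter([
    {"intent": "deploy_frontend_to_prod", "sha": "4af9ec0"},
    {"intent": "deploy_backend_to_prod", "sha": "4af9ec0"},
    {"intent": "deploy_frontend_to_prod", "sha": "4af9ec0"},
    {"intent": "done"},
])
decisions = iter([
    Approval(False, "can you deploy the backend first?"),
    Approval(True),
    Approval(True),
])

context = run_with_approval(
    propose=lambda ctx: next(proposals),
    ask_human=lambda step: next(decisions),
    execute=lambda step: f"{step['intent']} ok",
    context=[{"type": "initial", "data": "deploy SHA 4af9ec0 to production"}],
)
executed = [e["data"] for e in context if e["type"] == "executed"]
```

The `max_turns` cap is a design choice in this sketch: it keeps the micro agent to a small, bounded workflow instead of letting it loop indefinitely.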
136 |
137 | [](https://github.com/user-attachments/assets/deb356e9-0198-45c2-9767-231cb569ae13)
138 |
139 |
143 |
144 |
145 |
146 | This example is based on a real-life [OSS agent we've shipped to manage our deployments at HumanLayer](https://github.com/got-agents/agents/tree/main/deploybot-ts) - here is a real conversation I had with it last week:
147 |
148 | 
149 |
150 |
151 | We haven't given this agent a huge pile of tools or tasks. The primary value in the LLM is parsing the human's plaintext feedback and proposing an updated course of action. We isolate tasks and contexts as much as possible to keep the LLM focused on a small, 5-10 step workflow.
152 |
153 | Here's another [more classic support / chatbot demo](https://x.com/chainlit_io/status/1858613325921480922).
154 |
155 | ### So what's an agent really?
156 |
157 | - **prompt** - tell an LLM how to behave, and what "tools" it has available. The output of the prompt is a JSON object that describes the next step in the workflow (the "tool call" or "function call"). ([factor 2](./factor-2-own-your-prompts.md))
158 | - **switch statement** - based on the JSON that the LLM returns, decide what to do with it. (part of [factor 8](./factor-8-own-your-control-flow.md))
159 | - **accumulated context** - store the list of steps that have happened and their results ([factor 3](./factor-3-own-your-context-window.md))
160 | - **for loop** - until the LLM emits some sort of "Terminal" tool call (or plaintext response), add the result of the switch statement to the context window and ask the LLM to choose the next step. ([factor 8](./factor-8-own-your-control-flow.md))
161 |
162 | 
163 |
164 | In the "deploybot" example, we gain a couple of benefits from owning the control flow and context accumulation:
165 |
166 | - In our **switch statement** and **for loop**, we can hijack control flow to pause for human input or to wait for completion of long-running tasks
167 | - We can trivially serialize the **context** window for pause+resume
168 | - In our **prompt**, we can optimize the heck out of how we pass instructions and "what happened so far" to the LLM
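
The four pieces above fit together in a few lines. What follows is a minimal sketch, not the deploybot implementation: `determine_next_step` is a hypothetical stand-in for the LLM call (here a scripted stub), and the tool names and event shapes are made up for illustration.

```python
import json
from typing import Callable, Dict, List

# Hypothetical stand-in for the LLM call: given the context so far,
# return the next step as a JSON-like dict. A real version would send
# a prompt to a model and parse its structured output.
def determine_next_step(context: List[dict]) -> dict:
    done = {e["type"] for e in context}
    if "deploy_backend_result" not in done:
        return {"intent": "deploy_backend", "sha": "4af9ec0"}
    if "deploy_frontend_result" not in done:
        return {"intent": "deploy_frontend", "sha": "4af9ec0"}
    return {"intent": "done_for_now", "message": "deployed 4af9ec0"}

# Deterministic handlers, keyed by intent (the "switch statement").
TOOLS: Dict[str, Callable[[dict], str]] = {
    "deploy_backend": lambda step: f"backend {step['sha']} deployed",
    "deploy_frontend": lambda step: f"frontend {step['sha']} deployed",
}

def run_agent(context: List[dict], max_steps: int = 10) -> List[dict]:
    for _ in range(max_steps):                      # the "for loop"
        step = determine_next_step(context)         # the "prompt"
        context.append({"type": step["intent"], "data": step})
        if step["intent"] == "done_for_now":        # terminal tool call
            break
        result = TOOLS[step["intent"]](step)        # the "switch statement"
        context.append({"type": step["intent"] + "_result", "data": result})
    return context                                  # the accumulated context

thread = run_agent([{"type": "user_message", "data": "deploy SHA 4af9ec0 to production"}])
serialized = json.dumps(thread)  # the whole run is plain data, so it can be stored and reloaded
```

Because the loop and the dispatch table are ordinary code, this is also where a pause for human approval could go: return before calling the handler and pick the thread up again later.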
169 |
170 |
171 | [Part II](https://github.com/humanlayer/12-factor-agents/blob/main/README.md#12-factor-agents) will **formalize these patterns** so they can be applied to add impressive AI features to any software project, without needing to go all in on conventional implementations/definitions of "AI agent".
172 |
173 | [Factor 1 - Natural Language to Tool Calls →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-1-natural-language-to-tool-calls.md)
--------------------------------------------------------------------------------
/content/factor-1-natural-language-to-tool-calls.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 1. Natural Language to Tool Calls
4 |
5 | One of the most common patterns in agent building is to convert natural language to structured tool calls. This is a powerful pattern that allows you to build agents that can reason about tasks and execute them.
6 |
7 | 
8 |
9 | This pattern, when applied atomically, is the simple translation of a phrase like
10 |
11 | > can you create a payment link for $750 to Terri for sponsoring the february AI tinkerers meetup?
12 |
13 | to a structured object that describes a Stripe API call like
14 |
15 | ```json
16 | {
17 |   "function": {
18 |     "name": "create_payment_link",
19 |     "parameters": {
20 |       "amount": 750,
21 |       "customer": "cust_128934ddasf9",
22 |       "product": "prod_8675309",
23 |       "price": "prc_09874329fds",
24 |       "quantity": 1,
25 |       "memo": "Hey Terri - see below for the payment link for the february ai tinkerers meetup"
26 |     }
27 |   }
28 | }
29 | ```
30 |
31 | **Note**: in reality the Stripe API is a bit more complex. A [real agent that does this](https://github.com/dexhorthy/mailcrew) ([video](https://www.youtube.com/watch?v=f_cKnoPC_Oo)) would list customers, list products, list prices, etc. to build this payload with the proper IDs, or include those IDs in the prompt/context window (we'll see below how those are kinda the same thing though!)
32 |
33 | From there, deterministic code can pick up the payload and do something with it. (More on this in [factor 3](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md))
34 |
35 | ```python
36 | # (Inside an async handler) the LLM takes natural language and returns a structured object
37 | nextStep = await llm.determineNextStep(
38 |     """
39 |     create a payment link for $750 to Terri
40 |     for sponsoring the february AI tinkerers meetup
41 |     """
42 | )
43 |
44 | # Handle the structured output based on its function name
45 | if nextStep.function.name == 'create_payment_link':
46 |     stripe.paymentlinks.create(nextStep.function.parameters)
47 |     return  # or whatever you want, see below
48 | elif nextStep.function.name == 'something_else':
49 |     # ... more cases
50 |     pass
51 | else:  # the model didn't call a tool we know about
52 |     # do something else
53 |     pass
54 | ```
55 |
56 | **NOTE**: A full agent would then receive the API call result and loop with it, eventually returning something like
57 |
58 | > I've successfully created a payment link for $750 to Terri for sponsoring the february AI tinkerers meetup. Here's the link: https://buy.stripe.com/test_1234567890
59 |
60 | **Instead**, we're going to skip that step here and save it for another factor, which you may or may not want to incorporate (up to you!)
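
The translation step on its own can be sketched as follows. This is a minimal sketch, assuming the model returns the JSON shape shown earlier as raw text; `ToolCall`, `KNOWN_TOOLS`, and `parse_tool_call` are hypothetical names, not part of any library.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class ToolCall:
    name: str
    parameters: Dict[str, Any]

KNOWN_TOOLS = {"create_payment_link"}  # hypothetical registry of supported tools

def parse_tool_call(raw: str) -> ToolCall:
    """Turn the model's raw text into a ToolCall, or raise ValueError."""
    try:
        payload = json.loads(raw)
        function = payload["function"]
        call = ToolCall(name=function["name"], parameters=dict(function["parameters"]))
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise ValueError(f"model output is not a valid tool call: {exc}") from exc
    if call.name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {call.name}")
    return call

raw_output = '{"function": {"name": "create_payment_link", "parameters": {"amount": 750, "quantity": 1}}}'
call = parse_tool_call(raw_output)
```

Everything after `parse_tool_call` is deterministic code working with a typed object, and a plaintext or malformed reply surfaces as an ordinary exception.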
61 |
62 | [← How We Got Here](https://github.com/humanlayer/12-factor-agents/blob/main/content/brief-history-of-software.md) | [Own Your Prompts →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md)
--------------------------------------------------------------------------------
/content/factor-10-small-focused-agents.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 10. Small, Focused Agents
4 |
5 | Rather than building monolithic agents that try to do everything, build small, focused agents that do one thing well. Agents are just one building block in a larger, mostly deterministic system.
6 |
7 | 
8 |
9 | The key insight here is about LLM limitations: the bigger and more complex a task is, the more steps it will take, which means a longer context window. As context grows, LLMs are more likely to get lost or lose focus. By keeping agents focused on specific domains with 3-10, maybe 20 steps max, we keep context windows manageable and LLM performance high.
10 |
11 | > #### As context grows, LLMs are more likely to get lost or lose focus
12 |
13 | Benefits of small, focused agents:
14 |
15 | 1. **Manageable Context**: Smaller context windows mean better LLM performance
16 | 2. **Clear Responsibilities**: Each agent has a well-defined scope and purpose
17 | 3. **Better Reliability**: Less chance of getting lost in complex workflows
18 | 4. **Easier Testing**: Simpler to test and validate specific functionality
19 | 5. **Improved Debugging**: Easier to identify and fix issues when they occur
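
One way to keep an agent inside that range is to let deterministic code enforce a step budget. This is a rough sketch under stated assumptions: the budget of 20, the event shapes, and the `hand_off` outcome are illustrative choices, not a prescribed design.

```python
from typing import List

MAX_STEPS = 20  # assumed budget; the text suggests 3-10, maybe 20 steps max

def within_budget(events: List[dict], max_steps: int = MAX_STEPS) -> bool:
    """Count tool-call events (not their results) already in the thread."""
    steps = [e for e in events if not e["type"].endswith("_result")]
    return len(steps) < max_steps

def next_action(events: List[dict]) -> str:
    # Deterministic code decides whether the agent may take another step
    # or whether the work should be handed to a human or another agent.
    return "continue" if within_budget(events) else "hand_off"

short_thread = [{"type": "deploy_backend"}, {"type": "deploy_backend_result"}]
long_thread = [{"type": "list_git_tags"}] * 20
```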
20 |
21 | ### What if LLMs get smarter?
22 |
23 | Do we still need this if LLMs get smart enough to handle 100-step+ workflows?
24 |
25 | tl;dr yes. As agents and LLMs improve, they **might** naturally expand to be able to handle longer context windows. This means handling MORE of a larger DAG. This small, focused approach ensures you can get results TODAY, while preparing you to slowly expand agent scope as LLM context windows become more reliable. (If you've refactored large deterministic code bases before, you may be nodding your head right now).
26 |
27 | [](https://github.com/user-attachments/assets/0cd3f52c-046e-4d5e-bab4-57657157c82f)
28 |
29 |
30 |
32 | 
33 |
34 |
35 | Being intentional about size/scope of agents, and only growing in ways that allow you to maintain quality, is key here. As the [team that built NotebookLM put it](https://open.substack.com/pub/swyx/p/notebooklm?selection=08e1187c-cfee-4c63-93c9-71216640a5f8&utm_campaign=post-share-selection&utm_medium=web):
36 |
37 | > I feel like consistently, the most magical moments out of AI building come about for me when I'm really, really, really just close to the edge of the model capability
38 |
39 | Regardless of where that boundary is, if you can find that boundary and get it right consistently, you'll be building magical experiences. There are many moats to be built here, but as usual, they take some engineering rigor.
40 |
41 | [← Compact Errors](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-9-compact-errors.md) | [Trigger From Anywhere →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md)
42 |
--------------------------------------------------------------------------------
/content/factor-11-trigger-from-anywhere.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 11. Trigger from anywhere, meet users where they are
4 |
5 | If you're waiting for the [humanlayer](https://humanlayer.dev) pitch, you made it. If you're doing [factor 6 - launch/pause/resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md) and [factor 7 - contact humans with tool calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md), you're ready to incorporate this factor.
6 |
7 | 
8 |
9 | Enable users to trigger agents from Slack, email, SMS, or whatever other channel they want. Enable agents to respond via the same channels.
10 |
11 | Benefits:
12 |
13 | - **Meet users where they are**: This helps you build AI applications that feel like real humans, or at the very least, digital coworkers
14 | - **Outer Loop Agents**: Enable agents to be triggered by non-humans, e.g. events, crons, outages, whatever else. They may work for 5, 20, 90 minutes, but when they get to a critical point, they can contact a human for help, feedback, or approval
15 | - **High Stakes Tools**: If you're able to quickly loop in a variety of humans, you can give agents access to higher stakes operations like sending external emails, updating production data and more. Maintaining clear standards gets you auditability and confidence in agents that [perform bigger better things](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md#what-if-llms-get-smarter)
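
In practice this can mean a thin adapter per channel that turns each trigger into the same kind of thread. The sketch below assumes made-up payload fields (they are not real Slack or email webhook schemas) and a hypothetical `reply_channel` string telling the agent where to answer.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Thread:
    events: List[dict] = field(default_factory=list)
    reply_channel: str = ""  # where the agent should answer

# Each adapter turns a channel-specific payload into the same event shape.
def from_slack(payload: dict) -> Thread:
    event = {"type": "slack_message", "data": {"from": payload["user"], "text": payload["text"]}}
    return Thread(events=[event], reply_channel=f"slack:{payload['channel']}")

def from_email(payload: dict) -> Thread:
    event = {"type": "email", "data": {"from": payload["sender"], "text": payload["body"]}}
    return Thread(events=[event], reply_channel=f"email:{payload['sender']}")

def from_cron(job_name: str) -> Thread:
    # Outer-loop trigger: no human started this run, so replies go to a default channel.
    event = {"type": "cron_trigger", "data": {"job": job_name}}
    return Thread(events=[event], reply_channel="slack:#deployments")

threads = [
    from_slack({"user": "@alex", "channel": "#deployments", "text": "Can you deploy the backend?"}),
    from_email({"sender": "alex@example.com", "body": "Can you deploy the backend?"}),
    from_cron("nightly_health_check"),
]
```

Once every trigger produces a `Thread`, the agent loop itself does not need to know which channel started the run.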
16 |
17 | [← Small Focused Agents](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md) | [Stateless Reducer →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-12-stateless-reducer.md)
--------------------------------------------------------------------------------
/content/factor-12-stateless-reducer.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 12. Make your agent a stateless reducer
4 |
5 | Okay so we're over 1000 lines of markdown at this point. This one is mostly just for fun.
6 |
7 | 
8 |
9 |
10 | 
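
One way to read this factor, as a minimal sketch: the agent is a left fold (`reduce`) of a pure step function over a list of events. The event shapes and the `agent_step` name are made up for illustration.

```python
from functools import reduce
from typing import List, Tuple

State = Tuple[dict, ...]  # immutable thread: a tuple of events

def agent_step(state: State, event: dict) -> State:
    """Pure reducer: (state, event) -> new state. No globals, no mutation."""
    return state + (event,)

events: List[dict] = [
    {"type": "user_message", "data": "deploy the backend"},
    {"type": "deploy_backend", "data": {"tag": "v1.2.3"}},
    {"type": "deploy_backend_result", "data": "ok"},
]

final_state = reduce(agent_step, events, ())   # a left fold over the events
replayed = reduce(agent_step, events[:2], ())  # any prefix replays to the state at that point
```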
11 |
12 | [← Trigger From Anywhere](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md) | [Appendix - Pre-Fetch Context →](./appendix-13-pre-fetch.md)
13 |
--------------------------------------------------------------------------------
/content/factor-2-own-your-prompts.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 2. Own your prompts
4 |
5 | Don't outsource your prompt engineering to a framework.
6 |
7 | 
8 |
9 | Some frameworks provide a "black box" approach like this:
10 |
11 | ```python
12 | agent = Agent(
13 | role="...",
14 | goal="...",
15 | personality="...",
16 | tools=[tool1, tool2, tool3]
17 | )
18 |
19 | task = Task(
20 | instructions="...",
21 | expected_output=OutputModel
22 | )
23 |
24 | result = agent.run(task)
25 | ```
26 |
27 | This is great for pulling in some TOP NOTCH prompt engineering to get you started, but it is often difficult to tune and/or reverse engineer to get exactly the right tokens into your model.
28 |
29 | Instead, own your prompts and treat them as first-class code:
30 |
31 | ```rust
32 | function DetermineNextStep(thread: string) -> DoneForNow | ListGitTags | DeployBackend | DeployFrontend | RequestMoreInformation {
33 | prompt #"
34 | {{ _.role("system") }}
35 |
36 | You are a helpful assistant that manages deployments for frontend and backend systems.
37 | You work diligently to ensure safe and successful deployments by following best practices
38 | and proper deployment procedures.
39 |
40 | Before deploying any system, you should check:
41 | - The deployment environment (staging vs production)
42 | - The correct tag/version to deploy
43 | - The current system status
44 |
45 | You can use tools like deploy_backend, deploy_frontend, and check_deployment_status
46 | to manage deployments. For sensitive deployments, use request_approval to get
47 | human verification.
48 |
49 | Always think about what to do first, like:
50 | - Check current deployment status
51 | - Verify the deployment tag exists
52 | - Request approval if needed
53 | - Deploy to staging before production
54 | - Monitor deployment progress
55 |
56 | {{ _.role("user") }}
57 |
58 | {{ thread }}
59 |
60 | What should the next step be?
61 | "#
62 | }
63 | ```
64 |
65 | (the above example uses [BAML](https://github.com/boundaryml/baml) to generate the prompt, but you can do this with any prompt engineering tool you want, or even just template it manually)
66 |
67 | If the signature looks a little funny, we'll get to that in [factor 4 - tools are just structured outputs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md)
68 |
69 | ```typescript
70 | function DetermineNextStep(thread: string) -> DoneForNow | ListGitTags | DeployBackend | DeployFrontend | RequestMoreInformation {
71 | ```
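
If you'd rather template by hand, the same idea fits in a few lines of plain Python. This is a sketch assuming a chat-style message list; `SYSTEM_PROMPT`, `USER_TEMPLATE`, and `build_messages` are illustrative names, and the system prompt is a shortened version of the one above.

```python
SYSTEM_PROMPT = """\
You are a helpful assistant that manages deployments for frontend and backend systems.
Before deploying any system, check the environment, the tag/version, and the current status.
"""

USER_TEMPLATE = """\
{thread}

What should the next step be?
"""

def build_messages(thread: str) -> list:
    """Render the prompt as plain chat messages; every token is visible and versioned in your repo."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_TEMPLATE.format(thread=thread)},
    ]

messages = build_messages("From: @alex\nText: Can you deploy the backend?")
```

Since the prompt is just a function of its inputs, it can be unit tested and diffed like any other code.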
72 |
73 | Key benefits of owning your prompts:
74 |
75 | 1. **Full Control**: Write exactly the instructions your agent needs, no black box abstractions
76 | 2. **Testing and Evals**: Build tests and evals for your prompts just like you would for any other code
77 | 3. **Iteration**: Quickly modify prompts based on real-world performance
78 | 4. **Transparency**: Know exactly what instructions your agent is working with
79 | 5. **Role Hacking**: take advantage of APIs that support nonstandard usage of user/assistant roles - for example, the now-deprecated non-chat flavor of OpenAI "completions" API. This includes some so-called "model gaslighting" techniques
80 |
81 | Remember: Your prompts are the primary interface between your application logic and the LLM.
82 |
83 | Having full control over your prompts gives you the flexibility you need for production-grade agents.
84 |
85 | I don't know what's the best prompt, but I know you want the flexibility to be able to try EVERYTHING.
86 |
87 | [← Natural Language To Tool Calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-1-natural-language-to-tool-calls.md) | [Own Your Context Window →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md)
--------------------------------------------------------------------------------
/content/factor-3-own-your-context-window.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 3. Own your context window
4 |
5 | You don't necessarily need to use standard message-based formats for conveying context to an LLM.
6 |
7 | > #### At any given point, your input to an LLM in an agent is "here's what's happened so far, what's the next step"
8 |
9 |
10 |
11 |
12 | Everything is context engineering. [LLMs are stateless functions](https://thedataexchange.media/baml-revolution-in-ai-engineering/) that turn inputs into outputs. To get the best outputs, you need to give them the best inputs.
13 |
14 | Creating great context means:
15 |
16 | - The prompt and instructions you give to the model
17 | - Any documents or external data you retrieve (e.g. RAG)
18 | - Any past state, tool calls, results, or other history
19 | - Any past messages or events from related but separate histories/conversations (Memory)
20 | - Instructions about what sorts of structured data to output
21 |
22 | 
23 |
24 | This guide is all about getting as much as possible out of today's models. Notably not mentioned are:
25 |
26 | - Changes to model parameters like temperature, top_p, frequency_penalty, presence_penalty, etc.
27 | - Training your own completion or embedding models
28 | - Fine-tuning existing models
29 |
30 | Again, I don't know what's the best way to hand context to an LLM, but I know you want the flexibility to be able to try EVERYTHING.
31 |
32 | #### Standard vs Custom Context Formats
33 |
34 | Most LLM clients use a standard message-based format like this:
35 |
36 | ```yaml
37 | [
38 | {
39 | "role": "system",
40 | "content": "You are a helpful assistant..."
41 | },
42 | {
43 | "role": "user",
44 | "content": "Can you deploy the backend?"
45 | },
46 | {
47 | "role": "assistant",
48 | "content": null,
49 | "tool_calls": [
50 | {
51 | "id": "1",
52 | "name": "list_git_tags",
53 | "arguments": "{}"
54 | }
55 | ]
56 | },
57 | {
58 | "role": "tool",
59 | "name": "list_git_tags",
60 | "content": "{\"tags\": [{\"name\": \"v1.2.3\", \"commit\": \"abc123\", \"date\": \"2024-03-15T10:00:00Z\"}, {\"name\": \"v1.2.2\", \"commit\": \"def456\", \"date\": \"2024-03-14T15:30:00Z\"}, {\"name\": \"v1.2.1\", \"commit\": \"abe033d\", \"date\": \"2024-03-13T09:15:00Z\"}]}",
61 | "tool_call_id": "1"
62 | }
63 | ]
64 | ```
65 |
66 | While this works great for most use cases, if you want to really get THE MOST out of today's LLMs, you need to get your context into the LLM in the most token- and attention-efficient way you can.
67 |
68 | As an alternative to the standard message-based format, you can build your own context format that's optimized for your use case. For example, you can use custom objects and pack/spread them into one or more user, system, assistant, or tool messages as makes sense.
69 |
70 | Here's an example of putting the whole context window into a single user message:
71 | ```yaml
72 |
73 | [
74 |   {
75 |     "role": "system",
76 |     "content": "You are a helpful assistant..."
77 |   },
78 |   {
79 |     "role": "user",
80 |     "content": |
81 |       Here's everything that happened so far:
82 |
83 |       <slack_message>
84 |         From: @alex
85 |         Channel: #deployments
86 |         Text: Can you deploy the backend?
87 |       </slack_message>
88 |
89 |       <list_git_tags>
90 |         intent: "list_git_tags"
91 |       </list_git_tags>
92 |
93 |       <list_git_tags_result>
94 |         tags:
95 |           - name: "v1.2.3"
96 |             commit: "abc123"
97 |             date: "2024-03-15T10:00:00Z"
98 |           - name: "v1.2.2"
99 |             commit: "def456"
100 |             date: "2024-03-14T15:30:00Z"
101 |           - name: "v1.2.1"
102 |             commit: "ghi789"
103 |             date: "2024-03-13T09:15:00Z"
104 |       </list_git_tags_result>
105 |
106 |       what's the next step?
107 |   }
108 | ]
109 | ```
110 |
111 | The model may infer that you're asking it `what's the next step` by the tool schemas you supply, but it never hurts to roll it into your prompt template.
112 |
113 | ### code example
114 |
115 | We can build this with something like:
116 |
117 | ```python
118 | from dataclasses import dataclass
119 | from typing import List, Literal, Union
120 | import yaml  # PyYAML; any YAML serializer works here
121 | @dataclass
122 | class Event:
123 |     # could just use string, or could be explicit - up to you
124 |     type: Literal["list_git_tags", "deploy_backend", "deploy_frontend", "request_more_information", "done_for_now", "list_git_tags_result", "deploy_backend_result", "deploy_frontend_result", "request_more_information_result", "done_for_now_result", "error"]
125 |     # a plain string, or a tool call / result payload (ListGitTags, DeployBackendResult, ...) as a dict
126 |     data: Union[str, dict]
127 |
128 | @dataclass
129 | class Thread:
130 |     events: List[Event]
131 |
132 | def event_to_prompt(event: Event) -> str:
133 |     data = event.data if isinstance(event.data, str) else yaml.safe_dump(event.data).strip()
134 |     return f"<{event.type}>\n{data}\n</{event.type}>"
135 |
136 | def thread_to_prompt(thread: Thread) -> str:
137 |     return '\n\n'.join(event_to_prompt(event) for event in thread.events)
137 | ```
138 |
139 | #### Example Context Windows
140 |
141 | Here's how context windows might look with this approach:
142 |
143 | **Initial Slack Request:**
144 | ```xml
145 | <slack_message>
146 |     From: @alex
147 |     Channel: #deployments
148 |     Text: Can you deploy the latest backend to production?
149 | </slack_message>
150 | ```
151 |
152 | **After Listing Git Tags:**
153 | ```xml
154 | <slack_message>
155 |     From: @alex
156 |     Channel: #deployments
157 |     Text: Can you deploy the latest backend to production?
158 |     Thread: []
159 | </slack_message>
160 |
161 | <list_git_tags>
162 |     intent: "list_git_tags"
163 | </list_git_tags>
164 |
165 | <list_git_tags_result>
166 |     tags:
167 |       - name: "v1.2.3"
168 |         commit: "abc123"
169 |         date: "2024-03-15T10:00:00Z"
170 |       - name: "v1.2.2"
171 |         commit: "def456"
172 |         date: "2024-03-14T15:30:00Z"
173 |       - name: "v1.2.1"
174 |         commit: "ghi789"
175 |         date: "2024-03-13T09:15:00Z"
176 | </list_git_tags_result>
177 | ```
178 |
179 | **After Error and Recovery:**
180 | ```xml
181 | <slack_message>
182 |     From: @alex
183 |     Channel: #deployments
184 |     Text: Can you deploy the latest backend to production?
185 |     Thread: []
186 | </slack_message>
187 |
188 | <deploy_backend>
189 |     intent: "deploy_backend"
190 |     tag: "v1.2.3"
191 |     environment: "production"
192 | </deploy_backend>
193 |
194 | <error>
195 |     error running deploy_backend: Failed to connect to deployment service
196 | </error>
197 |
198 | <request_more_information>
199 |     intent: "request_more_information_from_human"
200 |     question: "I had trouble connecting to the deployment service, can you provide more details and/or check on the status of the service?"
201 | </request_more_information>
202 |
203 | <human_response>
204 |     data:
205 |       response: "I'm not sure what's going on, can you check on the status of the latest workflow?"
206 | </human_response>
207 | ```
208 |
209 | From here your next step might be:
210 |
211 | ```python
212 | nextStep = await determine_next_step(thread_to_prompt(thread))
213 | ```
214 |
215 | ```python
216 | {
217 | "intent": "get_workflow_status",
218 | "workflow_name": "tag_push_prod.yaml",
219 | }
220 | ```
221 |
222 | The XML-style format is just one example - the point is you can build your own format that makes sense for your application. You'll get better quality if you have the flexibility to experiment with different context structures and what you store vs. what you pass to the LLM.
223 |
224 | Key benefits of owning your context window:
225 |
226 | 1. **Information Density**: Structure information in ways that maximize the LLM's understanding
227 | 2. **Error Handling**: Include error information in a format that helps the LLM recover. Consider hiding errors and failed calls from the context window once they are resolved.
228 | 3. **Safety**: Control what information gets passed to the LLM, filtering out sensitive data
229 | 4. **Flexibility**: Adapt the format as you learn what works best for your use case
230 | 5. **Token Efficiency**: Optimize context format for token efficiency and LLM understanding
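
The error-handling and safety points can be applied as a filter between what you store and what you send. This is a sketch under stated assumptions: the key names in `SENSITIVE_KEYS` are guesses, and "resolved" is approximated as "some later call produced a result", which is a deliberately simple heuristic.

```python
from typing import List

SENSITIVE_KEYS = {"api_key", "password", "token"}  # assumed names, adjust to your data

def redact(data):
    """Replace sensitive values in (possibly nested) dicts before they reach the LLM."""
    if isinstance(data, dict):
        return {k: "[REDACTED]" if k in SENSITIVE_KEYS else redact(v) for k, v in data.items()}
    if isinstance(data, list):
        return [redact(item) for item in data]
    return data

def drop_resolved_errors(events: List[dict]) -> List[dict]:
    """Keep an error only if nothing succeeded after it (i.e. it is still unresolved)."""
    last_success = max((i for i, e in enumerate(events) if e["type"].endswith("_result")), default=-1)
    return [e for i, e in enumerate(events) if e["type"] != "error" or i > last_success]

def build_context(events: List[dict]) -> List[dict]:
    return [{"type": e["type"], "data": redact(e["data"])} for e in drop_resolved_errors(events)]

events = [
    {"type": "deploy_backend", "data": {"tag": "v1.2.3", "token": "s3cr3t"}},
    {"type": "error", "data": "Failed to connect to deployment service"},
    {"type": "deploy_backend", "data": {"tag": "v1.2.3", "token": "s3cr3t"}},
    {"type": "deploy_backend_result", "data": {"status": "ok"}},
]
context = build_context(events)
```

The stored `events` list is left untouched, so the full history stays available for debugging even though the LLM sees a trimmed view.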
231 |
232 | Context includes: prompts, instructions, RAG documents, history, tool calls, memory
233 |
234 |
235 | Remember: The context window is your primary interface with the LLM. Taking control of how you structure and present information can dramatically improve your agent's performance.
236 |
237 | Example - information density - same message, fewer tokens:
238 |
239 | 
240 |
241 |
242 | Recurring theme here: I don't know what's the best approach, but I know you want the flexibility to be able to try EVERYTHING.
243 |
244 | [← Own Your Prompts](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md) | [Tools Are Structured Outputs →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md)
245 |
--------------------------------------------------------------------------------
/content/factor-4-tools-are-structured-outputs.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 4. Tools are just structured outputs
4 |
5 | Tools don't need to be complex. At their core, they're just structured output from your LLM that triggers deterministic code.
6 |
7 | 
8 |
9 | For example, let's say you have two tools `CreateIssue` and `SearchIssues`. To ask an LLM to "use one of several tools" is just to ask it to output JSON we can parse into an object representing those tools.
10 |
11 | ```python
12 | from typing import Literal
13 | class Issue:
14 |     title: str
15 |     description: str
16 |     team_id: str
17 |     assignee_id: str
18 |
19 | class CreateIssue:
20 |     intent: Literal["create_issue"]
21 |     issue: Issue
22 |
23 | class SearchIssues:
24 |     intent: Literal["search_issues"]
25 |     query: str
26 |     what_youre_looking_for: str
27 | ```
28 |
29 | The pattern is simple:
30 | 1. LLM outputs structured JSON
31 | 2. Deterministic code executes the appropriate action (like calling an external API)
32 | 3. Results are captured and fed back into the context
33 |
34 | This creates a clean separation between the LLM's decision-making and your application's actions. The LLM decides what to do, but your code controls how it's done. Just because an LLM "called a tool" doesn't mean you have to go execute a specific corresponding function in the same way every time.
35 |
36 | If you recall our switch statement from above
37 |
38 | ```python
39 | if nextStep.intent == 'create_payment_link':
40 |     stripe.paymentlinks.create(nextStep.parameters)
41 |     return  # or whatever you want, see below
42 | elif nextStep.intent == 'wait_for_a_while':
43 |     pass  # do something monadic idk
44 | else:  # ... the model didn't call a tool we know about
45 |     pass  # do something else
46 | ```
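
Putting the tool shapes and the switch together, a self-contained version might look like this. It is a sketch, not a reference implementation: the dataclasses are simplified versions of `CreateIssue` and `SearchIssues` (flattened fields, no `Issue` object), and `to_tool` and `execute` are hypothetical helpers.

```python
import json
from dataclasses import dataclass
from typing import Union

@dataclass
class CreateIssue:
    title: str
    description: str

@dataclass
class SearchIssues:
    query: str

Tool = Union[CreateIssue, SearchIssues]

def to_tool(raw: str) -> Tool:
    """Map the model's JSON onto one of the known tool shapes, keyed by `intent`."""
    payload = json.loads(raw)
    intent = payload.pop("intent", None)
    if intent == "create_issue":
        return CreateIssue(**payload)
    if intent == "search_issues":
        return SearchIssues(**payload)
    raise ValueError(f"unknown intent: {intent!r}")

def execute(tool: Tool) -> str:
    # Your code decides *how* each intent is carried out; these are placeholders.
    if isinstance(tool, CreateIssue):
        return f"would create issue: {tool.title}"
    return f"would search for: {tool.query}"

result = execute(to_tool('{"intent": "search_issues", "query": "login bug"}'))
```

Nothing here is a "tool" in any framework sense: the model emitted JSON, and ordinary code chose what to do with it.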
47 |
48 | **Note**: a lot has been said about the benefits of "plain prompting" vs. "tool calling" vs. "JSON mode" and the performance tradeoffs of each. We're not gonna get into it here, but see [Prompting vs JSON Mode vs Function Calling vs Constrained Generation vs SAP](https://www.boundaryml.com/blog/schema-aligned-parsing), [When should I use function calling, structured outputs, or JSON mode?](https://www.vellum.ai/blog/when-should-i-use-function-calling-structured-outputs-or-json-mode#:~:text=We%20don%27t%20recommend%20using%20JSON,always%20use%20Structured%20Outputs%20instead) and [OpenAI JSON vs Function Calling](https://docs.llamaindex.ai/en/stable/examples/llm/openai_json_vs_function_calling/).
49 |
50 | The "next step" might not be as atomic as just "run a pure function and return the result". You unlock a lot of flexibility when you think of "tool calls" as just a model outputting JSON describing what deterministic code should do. Put this together with [factor 8 own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md).
51 |
52 | [← Own Your Context Window](./factor-3-own-your-context-window.md) | [Unify Execution State →](./factor-5-unify-execution-state.md)
53 |
--------------------------------------------------------------------------------
/content/factor-5-unify-execution-state.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 5. Unify execution state and business state
4 |
5 | Even outside the AI world, many infrastructure systems try to separate "execution state" from "business state". For AI apps, this might involve complex abstractions to track things like current step, next step, waiting status, retry counts, etc. This separation creates complexity that may be worthwhile, but may be overkill for your use case.
6 |
7 | As always, it's up to you to decide what's right for your application. But don't think you *have* to manage them separately.
8 |
9 | More clearly:
10 |
11 | - **Execution state**: current step, next step, waiting status, retry counts, etc.
12 | - **Business state**: What's happened in the agent workflow so far (e.g. list of OpenAI messages, list of tool calls and results, etc.)
13 |
14 | If possible, SIMPLIFY - unify these as much as possible.
15 |
16 | [](https://github.com/user-attachments/assets/e5a851db-f58f-43d8-8b0c-1926c99fc68d)
17 |
18 |
19 |
23 |
24 |
25 |
26 | In reality, you can engineer your application so that you can infer all execution state from the context window. In many cases, execution state (current step, waiting status, etc.) is just metadata about what has happened so far.
27 |
28 | You may have things that can't go in the context window, like session ids, password contexts, etc, but your goal should be to minimize those things. By embracing [factor 3](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md) you can control what actually goes into the LLM.
29 |
30 | This approach has several benefits:
31 |
32 | 1. **Simplicity**: One source of truth for all state
33 | 2. **Serialization**: The thread is trivially serializable/deserializable
34 | 3. **Debugging**: The entire history is visible in one place
35 | 4. **Flexibility**: Easy to add new state by just adding new event types
36 | 5. **Recovery**: Can resume from any point by just loading the thread
37 | 6. **Forking**: Can fork the thread at any point by copying some subset of the thread into a new context / state ID
38 | 7. **Human Interfaces and Observability**: Trivial to convert a thread into a human-readable markdown or a rich Web app UI
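
To make the idea concrete, here is a minimal sketch in which execution state is computed from the thread rather than stored next to it. The event type names (`human_input_requested`, `done_for_now`, `error`) follow the examples in this guide, but the `execution_state` helper and its fields are assumptions for illustration.

```python
import json
from typing import List

def execution_state(events: List[dict]) -> dict:
    """Infer 'execution state' from the event history instead of storing it separately."""
    last = events[-1]["type"] if events else None
    return {
        "current_step": last,
        "waiting_for_human": last == "human_input_requested",
        "retry_count": sum(1 for e in events if e["type"] == "error"),
        "done": last == "done_for_now",
    }

thread = [
    {"type": "deploy_backend", "data": {"tag": "v1.2.3"}},
    {"type": "error", "data": "Failed to connect to deployment service"},
    {"type": "human_input_requested", "data": {"question": "Is the deployment service up?"}},
]

saved = json.dumps(thread)     # serialization: the thread is plain data
restored = json.loads(saved)   # recovery: load it and carry on
fork = restored[:1]            # forking: copy a prefix into a new thread
state = execution_state(restored)
```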
39 |
40 | [← Tools Are Structured Outputs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md) | [Launch/Pause/Resume →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md)
41 |
--------------------------------------------------------------------------------
/content/factor-6-launch-pause-resume.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 6. Launch/Pause/Resume with simple APIs
4 |
5 | Agents are just programs, and we have expectations about how to launch, query, resume, and stop them.
6 |
7 | [](https://github.com/user-attachments/assets/feb1a425-cb96-4009-a133-8bd29480f21f)
8 |
9 |
13 |
14 |
15 |
16 |
17 | It should be easy for users, apps, pipelines, and other agents to launch an agent with a simple API.
18 |
19 | Agents and their orchestrating deterministic code should be able to pause an agent when a long-running operation is needed.
20 |
21 | External triggers like webhooks should enable agents to resume from where they left off without deep integration with the agent orchestrator.
22 |
23 | Closely related to [factor 5 - unify execution state and business state](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md) and [factor 8 - own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md), but can be implemented independently.
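
A launch/pause/resume surface can be very small. The sketch below assumes an in-memory dict as the store and invented function names (`launch`, `pause`, `resume`, `status`); a real system would persist threads in a database and expose these over HTTP or a queue.

```python
import uuid
from typing import Dict, List

THREADS: Dict[str, List[dict]] = {}  # in-memory store, for illustration only

def launch(message: str) -> str:
    """Start a new agent run and return its thread id."""
    thread_id = str(uuid.uuid4())
    THREADS[thread_id] = [{"type": "user_message", "data": message}]
    return thread_id

def pause(thread_id: str, reason: str) -> None:
    """Record why the run stopped; nothing stays in memory besides the stored thread."""
    THREADS[thread_id].append({"type": "paused", "data": reason})

def resume(thread_id: str, payload: dict) -> List[dict]:
    """Called by an external trigger (e.g. a webhook handler) with only the thread id and new data."""
    thread = THREADS[thread_id]
    thread.append({"type": "resumed", "data": payload})
    return thread  # hand this back to the agent loop to pick the next step

def status(thread_id: str) -> str:
    return THREADS[thread_id][-1]["type"]

tid = launch("deploy the backend")
pause(tid, "waiting for human approval")
paused_status = status(tid)
resume(tid, {"approved": True})
```

The caller of `resume` only needs the thread id, which is what lets a webhook pick the run back up without any deeper integration with the orchestrator.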
24 |
25 |
26 |
27 | **Note** - often AI orchestrators will allow for pause and resume, but not between the moment of tool selection and tool execution. See also [factor 7 - contact humans with tool calls](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md) and [factor 11 - trigger from anywhere, meet users where they are](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md).
28 |
29 | [← Unify Execution State](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md) | [Contact Humans With Tools →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md)
--------------------------------------------------------------------------------
/content/factor-7-contact-humans-with-tools.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 7. Contact humans with tool calls
4 |
5 | By default, LLM APIs rely on a fundamental HIGH-STAKES token choice: Are we returning plaintext content, or are we returning structured data?
6 |
7 | 
8 |
9 | You're putting a lot of weight on that choice of first token, which, in the `the weather in tokyo` case, is
10 |
11 | > "the"
12 |
13 | but in the `fetch_weather` case, it's some special token to denote the start of a JSON object.
14 |
15 | > |JSON>
16 |
17 | You might get better results by having the LLM *always* output JSON, and then declare its intent with some natural language tokens like `request_human_input` or `done_for_now` (as opposed to a "proper" tool like `check_weather_in_city`).
18 |
19 | Again, you might not get any performance boost from this, but you should experiment, and ensure you're free to try weird stuff to get the best results.
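
To make the "always output JSON, then declare an intent" idea concrete, here is a small sketch of parsing and routing a model response. The `HUMAN_INTENTS` set and the `parse_next_step` and `route` helpers are assumptions for illustration, not part of any particular LLM API.

```python
import json
from typing import Any, Dict

# Intents that mean "contact a human" rather than "call a tool".
# The two names come from the text above; treating them as a set is an assumption.
HUMAN_INTENTS = {"request_human_input", "done_for_now"}


def parse_next_step(raw: str) -> Dict[str, Any]:
    """Parse model output, insisting on a JSON object with an `intent` field."""
    step = json.loads(raw)
    if not isinstance(step, dict) or "intent" not in step:
        raise ValueError("model output must be a JSON object with an 'intent' field")
    return step


def route(step: Dict[str, Any]) -> str:
    """Decide whether this step goes to a human or to a regular tool."""
    return "human" if step["intent"] in HUMAN_INTENTS else "tool"
```

With this shape, plaintext replies are rejected at parse time, so the branch between "talk to a human" and "call a tool" happens in your code rather than on the model's first token.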
20 |
21 | ```python
22 | from typing import List, Literal
23 | class Options:
24 |     urgency: Literal["low", "medium", "high"]
25 |     format: Literal["free_text", "yes_no", "multiple_choice"]
26 |     choices: List[str]
27 |
28 | # Tool definition for human interaction
29 | class RequestHumanInput:
30 |     intent: Literal["request_human_input"]
31 |     question: str
32 |     context: str
33 |     options: Options
34 |
35 | # Example usage in the agent loop (inside an async function)
36 | if next_step.intent == "request_human_input":
37 |     thread.events.append({
38 |         "type": "human_input_requested",
39 |         "data": next_step,
40 |     })
41 |     thread_id = await save_state(thread)
42 |     await notify_human(next_step, thread_id)
43 |     return  # Break loop and wait for response to come back with thread ID
44 | else:
45 |     ...  # other cases
46 | ```
47 |
48 | Later, you might receive a webhook from a system that handles Slack, email, SMS, or other events.
49 |
50 | ```python
51 |
52 | @app.post('/webhook')
53 | async def webhook(req: Request):
54 |     thread_id = req.body.threadId
55 |     thread = await load_state(thread_id)
56 |     thread.events.append({
57 |         "type": "response_from_human",
58 |         "data": req.body,
59 |     })
60 |     # ... simplified for brevity, you likely don't want to block the web worker here
61 |     next_step = await determine_next_step(thread_to_prompt(thread))
62 |     thread.events.append(next_step)
63 |     result = await handle_next_step(thread, next_step)
64 |     # todo - loop or break or whatever you want
65 |
66 |     return {"status": "ok"}
67 | ```
68 |
69 | The above includes patterns from [factor 5 - unify execution state and business state](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md), [factor 8 - own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md), [factor 3 - own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md), [factor 4 - tools are just structured outputs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-4-tools-are-structured-outputs.md), and several others.
70 |
71 | If we were using the XML-y format from [factor 3 - own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md), our context window after a few turns might look like this:
72 |
73 | ```xml
74 |
75 | (snipped for brevity)
76 |
77 | <slack_message>
78 |     From: @alex
79 |     Channel: #deployments
80 |     Text: Can you deploy backend v1.2.3 to production?
81 |     Thread: []
82 | </slack_message>
83 |
84 | <request_human_input>
85 |     intent: "request_human_input"
86 |     question: "Would you like to proceed with deploying v1.2.3 to production?"
87 |     context: "This is a production deployment that will affect live users."
88 |     options: {
89 |         urgency: "high"
90 |         format: "yes_no"
91 |     }
92 | </request_human_input>
93 |
94 | <human_response>
95 |     response: "yes please proceed"
96 |     approved: true
97 |     timestamp: "2024-03-15T10:30:00Z"
98 |     user: "alex@company.com"
99 | </human_response>
100 |
101 | <deploy_backend>
102 |     intent: "deploy_backend"
103 |     tag: "v1.2.3"
104 |     environment: "production"
105 | </deploy_backend>
106 |
107 | <deploy_backend_result>
108 |     status: "success"
109 |     message: "Deployment v1.2.3 to production completed successfully."
110 |     timestamp: "2024-03-15T10:30:00Z"
111 | </deploy_backend_result>
112 | ```
113 |
114 |
115 | Benefits:
116 |
117 | 1. **Clear Instructions**: Tools for different types of human contact allow for more specificity from the LLM
118 | 2. **Inner vs Outer Loop**: Enables agent workflows **outside** of the traditional ChatGPT-style interface, where the control flow and context initialization may be `Agent->Human` rather than `Human->Agent` (think, agents kicked off by a cron or an event)
119 | 3. **Multiple Human Access**: Can easily track and coordinate input from different humans through structured events
120 | 4. **Multi-Agent**: Simple abstraction can be easily extended to support `Agent->Agent` requests and responses
121 | 5. **Durable**: Combined with [factor 6 - launch/pause/resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md), this makes for durable, reliable, and introspectable multiplayer workflows
122 |
123 | [More on Outer Loop Agents over here](https://theouterloop.substack.com/p/openais-realtime-api-is-a-step-towards)
124 |
125 | 
126 |
127 | Works great with [factor 11 - trigger from anywhere, meet users where they are](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-11-trigger-from-anywhere.md)
128 |
129 | [← Launch/Pause/Resume](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md) | [Own Your Control Flow →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md)
--------------------------------------------------------------------------------
/content/factor-8-own-your-control-flow.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 8. Own your control flow
4 |
5 | If you own your control flow, you can do lots of fun things.
6 |
7 | 
8 |
9 |
10 | Build your own control structures that make sense for your specific use case. Specifically, certain types of tool calls may be reason to break out of the loop and wait for a response from a human or another long-running task like a training pipeline. You may also want to incorporate custom implementations of:
11 |
12 | - summarization or caching of tool call results
13 | - LLM-as-judge on structured output
14 | - context window compaction or other [memory management](https://github.com/humanlayer/12-factor-agents/blob/main/content/appendix-14-everything-is-context-engineering.md)
15 | - logging, tracing, and metrics
16 | - client-side rate limiting
17 | - durable sleep / pause / "wait for event"
18 |
19 |
20 | The below example shows three possible control flow patterns:
21 |
22 |
23 | - request_clarification: model asked for more info, break the loop and wait for a response from a human
24 | - fetch_open_issues: model asked for a list of open issues, fetch the issues, append to context window, and pass straight back to the model
25 | - create_issue: model asked to create an issue, this is a high-stakes thing, so break the loop and wait for human approval
26 |
27 | ```python
28 | async def handle_next_step(thread: Thread):
29 |
30 |     while True:
31 |         next_step = await determine_next_step(thread_to_prompt(thread))
32 |
33 |         # inlined for clarity - in reality you could put
34 |         # this in a method, use exceptions for control flow, or whatever you want
35 |         if next_step.intent == 'request_clarification':
36 |             thread.events.append({
37 |                 'type': 'request_clarification',
38 |                 'data': next_step,
39 |             })
40 |
41 |             await send_message_to_human(next_step)
42 |             await db.save_thread(thread)
43 |             # async step - break the loop, we'll get a webhook later
44 |             break
45 |         elif next_step.intent == 'fetch_open_issues':
46 |             thread.events.append({
47 |                 'type': 'fetch_open_issues',
48 |                 'data': next_step,
49 |             })
50 |
51 |             issues = await linear_client.issues()
52 |
53 |             thread.events.append({
54 |                 'type': 'fetch_open_issues_result',
55 |                 'data': issues,
56 |             })
57 |             # sync step - pass the new context to the LLM to determine the NEXT next step
58 |             continue
59 |         elif next_step.intent == 'create_issue':
60 |             thread.events.append({
61 |                 'type': 'create_issue',
62 |                 'data': next_step,
63 |             })
64 |
65 |             await request_human_approval(next_step)
66 |             await db.save_thread(thread)
67 |             # async step - break the loop, we'll get a webhook later
68 |             break
69 | ```
70 |
71 | This pattern allows you to interrupt and resume your agent's flow as needed, creating more natural conversations and workflows.
72 |
73 | **Example** - the number one feature request I have for every AI framework out there is we need to be able to interrupt
74 | a working agent and resume later, ESPECIALLY between the moment of tool **selection** and the moment of tool **invocation**.
75 |
76 | Without this level of resumability/granularity, there's no way to review/approve the tool call before it runs, which means
77 | you're forced to either:
78 |
79 | 1. Pause the task in memory while waiting for the long-running thing to complete (think `while...sleep`) and restart it from the beginning if the process is interrupted
80 | 2. Restrict the agent to only low-stakes, low-risk calls like research and summarization
81 | 3. Give the agent access to do bigger, more useful things, and just yolo hope it doesn't screw up
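
One way to avoid all three of these is to persist the selected tool call before running it, then execute it only after an approval arrives. The sketch below is a simplified illustration under stated assumptions: the `TOOLS` registry, the `NEEDS_APPROVAL` set, and the `select_tool`, `snapshot`, and `on_approval` helpers are hypothetical names, and the thread is a plain dict serialized with `json`.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry; `create_issue` is a stand-in, not a real client.
TOOLS: Dict[str, Callable[..., str]] = {
    "create_issue": lambda title: f"created issue: {title}",
}
NEEDS_APPROVAL = {"create_issue"}  # assumed set of high-stakes intents


def select_tool(thread: Dict[str, Any], next_step: Dict[str, Any]) -> str:
    """Record the model's chosen tool call WITHOUT running it."""
    thread["events"].append({"type": "tool_selected", "data": next_step})
    needs_review = next_step["intent"] in NEEDS_APPROVAL
    thread["status"] = "awaiting_approval" if needs_review else "ready"
    return thread["status"]


def snapshot(thread: Dict[str, Any]) -> str:
    """Serialize the thread so the process can exit while a human decides."""
    return json.dumps(thread)


def on_approval(serialized: str, approved: bool) -> Dict[str, Any]:
    """Resume from a snapshot (e.g. in a webhook) and run or reject the call."""
    thread = json.loads(serialized)
    # Find the most recent tool call that was selected but not yet executed.
    pending = next(
        e["data"] for e in reversed(thread["events"]) if e["type"] == "tool_selected"
    )
    thread["events"].append({"type": "human_approval", "data": {"approved": approved}})
    if approved:
        result = TOOLS[pending["intent"]](**pending["args"])
        thread["events"].append({"type": pending["intent"] + "_result", "data": result})
    thread["status"] = "ready" if approved else "rejected"
    return thread
```

Because the pending call lives in the serialized thread rather than in process memory, nothing is lost if the worker restarts between selection and invocation.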
82 |
83 |
84 | You may notice this is closely related to [factor 5 - unify execution state and business state](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-5-unify-execution-state.md) and [factor 6 - launch/pause/resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md), but can be implemented independently.
85 |
86 | [← Contact Humans With Tools](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md) | [Compact Errors →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-9-compact-errors.md)
87 |
--------------------------------------------------------------------------------
/content/factor-9-compact-errors.md:
--------------------------------------------------------------------------------
1 | [← Back to README](https://github.com/humanlayer/12-factor-agents/blob/main/README.md)
2 |
3 | ### 9. Compact Errors into Context Window
4 |
5 | This one is a little short but is worth mentioning. One of the benefits of agents is "self-healing" - for short tasks, an LLM might call a tool that fails. Good LLMs have a fairly good chance of reading an error message or stack trace and figuring out what to change in a subsequent tool call.
6 |
7 |
8 | Most frameworks implement this, but you can do JUST THIS without doing any of the other 11 factors. Here's an example:
9 |
10 |
11 | ```python
12 | thread = {"events": [initial_message]}
13 |
14 | while True:
15 |     next_step = await determine_next_step(thread_to_prompt(thread))
16 |     thread["events"].append({
17 |         "type": next_step.intent,
18 |         "data": next_step,
19 |     })
20 |     try:
21 |         result = await handle_next_step(thread, next_step)  # our switch statement
22 |     except Exception as e:
23 |         # if we get an error, we can add it to the context window and try again
24 |         thread["events"].append({
25 |             "type": 'error',
26 |             "data": format_error(e),
27 |         })
28 |         # loop, or do whatever else here to try to recover
29 | ```
30 |
31 | You may want to implement an error counter for a specific tool call, to limit it to ~3 attempts of a single tool, or whatever other logic makes sense for your use case.
32 |
33 | ```python
34 | consecutive_errors = 0
35 |
36 | while True:
37 |
38 |     # ... existing code ...
39 |
40 |     try:
41 |         result = await handle_next_step(thread, next_step)
42 |         thread["events"].append({
43 |             "type": next_step.intent + '_result',
44 |             "data": result,
45 |         })
46 |         # success! reset the error counter
47 |         consecutive_errors = 0
48 |     except Exception as e:
49 |         consecutive_errors += 1
50 |         if consecutive_errors < 3:
51 |             # do the loop and try again
52 |             thread["events"].append({
53 |                 "type": 'error',
54 |                 "data": format_error(e),
55 |             })
56 |         else:
57 |             # break the loop, reset parts of the context window,
58 |             # escalate to a human, or whatever else you want to do
59 |             break
60 |
61 | ```
62 | Hitting some consecutive-error-threshold might be a great place to [escalate to a human](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md), whether by model decision or via deterministic takeover of the control flow.
63 |
64 | [](https://github.com/user-attachments/assets/cd7ed814-8309-4baf-81a5-9502f91d4043)
65 |
66 |
67 |
68 | [GIF Version](https://github.com/humanlayer/12-factor-agents/blob/main/img/195-factor-9-errors.gif)
69 |
70 | 
71 |
72 |
73 |
74 | Benefits:
75 |
76 | 1. **Self-Healing**: The LLM can read the error message and figure out what to change in a subsequent tool call
77 | 2. **Durable**: The agent can continue to run even if one tool call fails
78 |
79 | I'm sure you will find that if you do this TOO much, your agent will start to spin out and might repeat the same error over and over again.
80 |
81 | That's where [factor 8 - own your control flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md) and [factor 3 - own your context building](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md) come in - you don't need to just put the raw error back on, you can completely restructure how it's represented, remove previous events from the context window, or whatever deterministic thing you find works to get an agent back on track.
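
As one possible take on "restructure how it's represented", the sketch below collapses older error events into a single summary event and keeps only the most recent errors verbatim. The `compact_errors` helper and the `error_summary` event type are hypothetical, and the event shape (`type` and `data` keys) is assumed to match the examples above.

```python
from typing import Any, Dict, List


def compact_errors(events: List[Dict[str, Any]], keep_last: int = 1) -> List[Dict[str, Any]]:
    """Return a new event list with older errors replaced by one summary event."""
    error_idx = [i for i, e in enumerate(events) if e["type"] == "error"]
    # Everything except the last `keep_last` errors gets dropped.
    drop = set(error_idx[:-keep_last]) if keep_last > 0 else set(error_idx)
    if not drop:
        return list(events)

    compacted = [e for i, e in enumerate(events) if i not in drop]
    summary = {
        "type": "error_summary",
        "data": f"{len(drop)} earlier error(s) omitted",
    }
    # All events before the first dropped error are kept, so its index is unchanged.
    compacted.insert(min(drop), summary)
    return compacted
```

Running this just before `thread_to_prompt` keeps the latest failure visible to the model while trimming the repeated noise that tends to cause spin-outs. The input list is left untouched, so the full history can still be stored for debugging.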
82 |
83 | But the number one way to prevent error spin-outs is to embrace [factor 10 - small, focused agents](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md).
84 |
85 | [← Own Your Control Flow](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-8-own-your-control-flow.md) | [Small Focused Agents →](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-10-small-focused-agents.md)
--------------------------------------------------------------------------------
/img/010-software-dag.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/010-software-dag.png
--------------------------------------------------------------------------------
/img/015-dag-orchestrators.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/015-dag-orchestrators.png
--------------------------------------------------------------------------------
/img/020-dags-with-ml.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/020-dags-with-ml.png
--------------------------------------------------------------------------------
/img/025-agent-dag.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/025-agent-dag.png
--------------------------------------------------------------------------------
/img/026-agent-dag-lines.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/026-agent-dag-lines.png
--------------------------------------------------------------------------------
/img/027-agent-loop-animation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/027-agent-loop-animation.gif
--------------------------------------------------------------------------------
/img/027-agent-loop-animation.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/027-agent-loop-animation.mp4
--------------------------------------------------------------------------------
/img/027-agent-loop-dag.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/027-agent-loop-dag.png
--------------------------------------------------------------------------------
/img/027-agent-loop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/027-agent-loop.png
--------------------------------------------------------------------------------
/img/028-micro-agent-dag.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/028-micro-agent-dag.png
--------------------------------------------------------------------------------
/img/029-deploybot-high-level.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/029-deploybot-high-level.png
--------------------------------------------------------------------------------
/img/030-deploybot-animation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/030-deploybot-animation.gif
--------------------------------------------------------------------------------
/img/030-deploybot-animation.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/030-deploybot-animation.mp4
--------------------------------------------------------------------------------
/img/031-deploybot-animation-5.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/031-deploybot-animation-5.gif
--------------------------------------------------------------------------------
/img/031-deploybot-animation-5.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/031-deploybot-animation-5.mp4
--------------------------------------------------------------------------------
/img/031-deploybot-animation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/031-deploybot-animation.gif
--------------------------------------------------------------------------------
/img/031-deploybot-animation.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/031-deploybot-animation.mp4
--------------------------------------------------------------------------------
/img/033-deploybot.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/033-deploybot.gif
--------------------------------------------------------------------------------
/img/035-deploybot-conversation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/035-deploybot-conversation.png
--------------------------------------------------------------------------------
/img/040-4-components.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/040-4-components.png
--------------------------------------------------------------------------------
/img/110-natural-language-tool-calls.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/110-natural-language-tool-calls.png
--------------------------------------------------------------------------------
/img/120-own-your-prompts.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/120-own-your-prompts.png
--------------------------------------------------------------------------------
/img/130-own-your-context-building.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/130-own-your-context-building.png
--------------------------------------------------------------------------------
/img/140-tools-are-just-structured-outputs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/140-tools-are-just-structured-outputs.png
--------------------------------------------------------------------------------
/img/150-all-state-in-context-window.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/150-all-state-in-context-window.png
--------------------------------------------------------------------------------
/img/150-unify-state.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/150-unify-state.png
--------------------------------------------------------------------------------
/img/155-unify-state-animation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/155-unify-state-animation.gif
--------------------------------------------------------------------------------
/img/160-pause-resume-with-simple-apis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/160-pause-resume-with-simple-apis.png
--------------------------------------------------------------------------------
/img/165-pause-resume-animation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/165-pause-resume-animation.gif
--------------------------------------------------------------------------------
/img/170-contact-humans-with-tools.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/170-contact-humans-with-tools.png
--------------------------------------------------------------------------------
/img/175-outer-loop-agents.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/175-outer-loop-agents.png
--------------------------------------------------------------------------------
/img/180-control-flow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/180-control-flow.png
--------------------------------------------------------------------------------
/img/190-factor-9-errors-static.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/190-factor-9-errors-static.png
--------------------------------------------------------------------------------
/img/195-factor-9-errors.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/195-factor-9-errors.gif
--------------------------------------------------------------------------------
/img/1a0-small-focused-agents.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/1a0-small-focused-agents.png
--------------------------------------------------------------------------------
/img/1a5-agent-scope-grow.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/1a5-agent-scope-grow.gif
--------------------------------------------------------------------------------
/img/1b0-trigger-from-anywhere.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/1b0-trigger-from-anywhere.png
--------------------------------------------------------------------------------
/img/1c0-stateless-reducer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/1c0-stateless-reducer.png
--------------------------------------------------------------------------------
/img/1c5-agent-foldl.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/1c5-agent-foldl.png
--------------------------------------------------------------------------------
/img/220-context-engineering.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/humanlayer/12-factor-agents/89747813dd5b383efd980becc6756f8027a3c2e0/img/220-context-engineering.png
--------------------------------------------------------------------------------