├── .dev.vars.example
├── .gitignore
├── .node-version
├── .prettierrc
├── README.md
├── app.vue
├── assets
│   └── css
│       └── style.css
├── components
│   ├── ChatHistory.vue
│   └── ChatMessage.vue
├── nuxt.config.ts
├── package-lock.json
├── package.json
├── pages
│   └── index.vue
├── plugins
│   └── vuetify.ts
├── public
│   ├── favicon.ico
│   ├── github-mark.svg
│   └── images
│       ├── cloudflare-env-vars.png
│       └── interaction.png
├── server
│   ├── api
│   │   ├── chat.post.ts
│   │   ├── ingest.ts
│   │   └── trace.post.ts
│   └── tsconfig.json
├── tsconfig.json
├── utils
│   ├── conversational_retrieval_chain.ts
│   └── data
│       └── lilian_weng_agents_blog.ts
└── wrangler.toml

/.dev.vars.example:
--------------------------------------------------------------------------------
1 | # https://platform.openai.com/api-keys
2 | OPENAI_API_KEY=
3 | 
4 | # https://dash.cloudflare.com/
5 | CLOUDFLARE_ACCOUNT_ID=
6 | 
7 | # https://developers.cloudflare.com/workers-ai/get-started/rest-api/
8 | CLOUDFLARE_WORKERSAI_API_TOKEN=
9 | 
10 | # For tracing via LangSmith
11 | # https://docs.smith.langchain.com/
12 | LANGCHAIN_TRACING_V2=
13 | LANGCHAIN_SESSION=
14 | LANGCHAIN_API_KEY=
15 | 
16 | ENVIRONMENT="local"
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Nuxt dev/build outputs
2 | .output
3 | .data
4 | .nuxt
5 | .nitro
6 | .cache
7 | dist
8 | 
9 | # Node dependencies
10 | node_modules
11 | .yarn
12 | 
13 | # Logs
14 | logs
15 | *.log
16 | 
17 | # Misc
18 | .DS_Store
19 | .fleet
20 | .idea
21 | 
22 | # Local env files
23 | .env
24 | .env.*
25 | !.env.example
26 | 
27 | # Wrangler
28 | .wrangler
29 | .dev.vars
30 | 
--------------------------------------------------------------------------------
/.node-version:
--------------------------------------------------------------------------------
1 | 18
--------------------------------------------------------------------------------
/.prettierrc:
--------------------------------------------------------------------------------
1 | {}
2 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ☁️🦜 Cloudflare + LangChain + Nuxt Multi-source Chat
2 | 
3 | This repo contains an example of a conversational retrieval system that can route between multiple data sources, choosing the one
4 | most relevant to an incoming question. This cuts down on distraction from off-topic documents that a single vectorstore's
5 | similarity search might otherwise pull in, which is particularly important for small models.
6 | 
7 | 🚀 **Live version here:** https://langchain-cloudflare-nuxt-template.jacob-ee9.workers.dev/
8 | 
9 | The base version runs entirely on the Cloudflare WorkersAI stack with a tiny open-source Llama 2-7B model, but you can
10 | swap in more powerful models such as Anthropic's `claude-2` or OpenAI's `gpt-3.5-turbo` to improve performance in key places if desired.
11 | 
12 | ![](/public/images/interaction.png)
13 | 
14 | It uses:
15 | 
16 | - A chat variant of Llama 2-7B run via the [Cloudflare WorkersAI network](https://developers.cloudflare.com/workers-ai/)
17 | - A Cloudflare WorkersAI embeddings model
18 | - Two different [Cloudflare Vectorize DBs](https://developers.cloudflare.com/vectorize/) for different knowledge bases (you could add more!)
19 | - [Cloudflare Workers](https://developers.cloudflare.com/workers/) for hosting
20 | - [LangChain.js](https://js.langchain.com/) for orchestration
21 | - [Nuxt](https://nuxt.com/) + [Vue](https://vuejs.org/) for the frontend
22 | 
23 | ## How it works
24 | 
25 | Here's how it works:
26 | 
27 | 0. The vectorstores are populated with ingested data via a local ping to the Nuxt server route under `/server/api/ingest.ts`.
28 | 1. An incoming question is routed to the Nuxt server route under `/server/api/chat.post.ts`. This route constructs and calls a [LangChain Expression Language](https://js.langchain.com/docs/expression_language/) chain located under `/utils/conversational_retrieval_chain.ts`.
29 | 2. If there have been previous messages in the conversation, the chain first transforms the original question into a _standalone question_, free of pronouns and other references to chat history. This is important since vectorstores return results based on similarity between ingested docs and the query.
30 | 3. Based on this rephrased query, the chain selects which vectorstore to retrieve from.
31 | 4. The chain retrieves context docs from the chosen vectorstore based on the output of the previous step.
32 | 5. The chain generates a final answer based on this retrieved context and the standalone question.
33 | 
34 | Here's an illustrative [LangSmith trace of the steps](https://smith.langchain.com/public/0474c554-01ab-4f7f-937f-b6c205fa91f5/r) involved.
35 | 
36 | Because we use a small model, removing as much distraction as possible via routing is even more helpful here.
37 | 
38 | The two default data sources are populated in `server/api/ingest.ts` via [LangChain document loaders](https://js.langchain.com/docs/modules/data_connection/document_loaders/).
39 | They are [a PDF detailing some of Cloudflare's features](https://www.cloudflare.com/resources/assets/slt3lc6tev37/3HWObubm6fybC0FWUdFYAJ/5d5e3b0a4d9c5a7619984ed6076f01fe/Cloudflare_for_Campaigns_Security_Guide.pdf) and a [blog post by Lilian Weng](https://lilianweng.github.io/posts/2023-06-23-agent/) that covers autonomous agents.
40 | 
41 | If you want to use alternate data sources, remember to also update the routing steps in the chain!
42 | 
43 | ## Setup
44 | 
45 | ### Install dependencies:
46 | 
47 | ```bash
48 | $ npm install
49 | ```
50 | 
51 | ### Create Vectorize DBs
52 | 
53 | Note that this currently requires you to be on a paid Cloudflare Workers plan.
54 | 
55 | We configure the databases to work with Cloudflare WorkersAI's `@cf/baai/bge-base-en-v1.5` embeddings model.
56 | 
57 | ```bash
58 | $ npx wrangler vectorize create langchain_cloudflare_docs_index --preset @cf/baai/bge-base-en-v1.5
59 | $ npx wrangler vectorize create langchain_ai_docs_index --preset @cf/baai/bge-base-en-v1.5
60 | ```
61 | 
62 | The names match those found in the default `wrangler.toml` file. If you choose different names, you will need to update the bindings there.
63 | 
64 | **Note:** If you want to delete your databases, you can run the following commands:
65 | 
66 | ```bash
67 | $ npx wrangler vectorize delete langchain_cloudflare_docs_index
68 | $ npx wrangler vectorize delete langchain_ai_docs_index
69 | ```
70 | 
71 | You can use other presets or parameters for other embedding models.
72 | See the [Wrangler CLI reference docs](https://developers.cloudflare.com/workers/wrangler/commands/#vectorize) for a full list of options.
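
The indexes you create here are exposed to your Worker as Vectorize bindings (declared in `wrangler.toml`) and consumed through LangChain's Cloudflare integrations. As a rough reference, here is a condensed TypeScript sketch of how the server routes in this repo (`server/api/chat.post.ts` and `server/api/ingest.ts`) wrap those bindings; the query string and return value are purely illustrative:

```ts
// Condensed sketch, not a drop-in route: the binding names (AI, CLOUDFLARE_KNOWLEDGE_VECTORIZE_INDEX)
// come from wrangler.toml and must point at the indexes created above.
import {
  CloudflareVectorizeStore,
  CloudflareWorkersAIEmbeddings,
} from "@langchain/cloudflare";

export default defineEventHandler(async (event) => {
  const env = event.context?.cloudflare?.env;
  if (!env) {
    throw new Error("No Cloudflare bindings found.");
  }
  // Use the same WorkersAI embeddings model the indexes were created for.
  const embeddings = new CloudflareWorkersAIEmbeddings({
    binding: env.AI,
    modelName: "@cf/baai/bge-base-en-v1.5",
  });
  const cloudflareKnowledgeVectorstore = new CloudflareVectorizeStore(embeddings, {
    index: env.CLOUDFLARE_KNOWLEDGE_VECTORIZE_INDEX,
  });
  // Illustrative similarity search against the Cloudflare knowledge index.
  const docs = await cloudflareKnowledgeVectorstore.similaritySearch(
    "How does Cloudflare protect sites from DDoS attacks?",
    4,
  );
  return docs.map((doc) => doc.pageContent);
});
```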
73 | 
74 | ### Set up required env vars
75 | 
76 | Copy `.dev.vars.example` to `.dev.vars` and fill in the required variables:
77 | 
78 | ```ini
79 | # https://dash.cloudflare.com/
80 | CLOUDFLARE_ACCOUNT_ID=
81 | 
82 | # https://developers.cloudflare.com/workers-ai/get-started/rest-api/
83 | CLOUDFLARE_WORKERSAI_API_TOKEN=
84 | 
85 | # For tracing via LangSmith
86 | # https://docs.smith.langchain.com/
87 | LANGCHAIN_TRACING_V2=
88 | LANGCHAIN_SESSION=
89 | LANGCHAIN_API_KEY=
90 | 
91 | # If swapping in OpenAI somewhere, https://platform.openai.com/api-keys
92 | OPENAI_API_KEY=
93 | 
94 | ENVIRONMENT="local"
95 | ```
96 | 
97 | ### Build project
98 | 
99 | ```bash
100 | $ npm run build
101 | ```
102 | 
103 | ### Start in preview mode
104 | 
105 | ```bash
106 | $ npm run preview
107 | ```
108 | 
109 | You can combine the previous two steps to reflect changes while developing (Nuxt hot reload coming soon!):
110 | 
111 | ```bash
112 | $ npm run reload
113 | ```
114 | 
115 | ### Ingestion
116 | 
117 | Ping `http://localhost:3000/api/ingest` to populate the Vectorize DBs you set up earlier.
118 | 
119 | And finally, go to `http://localhost:3000` and try asking a question about either Cloudflare or AI!
120 | 
121 | ## Deployment
122 | 
123 | When you're ready, deploy to Cloudflare Workers via Wrangler with:
124 | 
125 | ```bash
126 | $ npm run deploy
127 | ```
128 | 
129 | You'll need to set private encrypted environment variables in your Cloudflare console under your Worker's `Settings -> Variables` page for `CLOUDFLARE_WORKERSAI_API_TOKEN`, `CLOUDFLARE_ACCOUNT_ID`, and, if you are using tracing, `LANGCHAIN_API_KEY`:
130 | 
131 | ![](/public/images/cloudflare-env-vars.png)
132 | 
133 | ## Improving performance
134 | 
135 | By default, the only APIs and resources this app uses are within Cloudflare. However, by leveraging frontier models with stronger reasoning capabilities at key points
136 | in the retrieval chain, we can drastically improve performance without using an excessive number of tokens.
137 | 
138 | The two places where this increased reasoning power is most helpful are the question rephrasing step and the routing step. Conveniently,
139 | both steps are light on tokens because they do not involve passing a long list of retrieved docs as context.
140 | 
141 | If you set the following environment variables, this app will use Anthropic's Claude 2 hosted via AWS Bedrock for routing and follow-up question rephrasing (see the sketch at the end of this README):
142 | 
143 | ```ini
144 | BEDROCK_AWS_ACCESS_KEY_ID=
145 | BEDROCK_AWS_SECRET_ACCESS_KEY=
146 | BEDROCK_AWS_REGION=
147 | ```
148 | 
149 | To avoid distracting the default Llama 2 model, we do not pass chat history into the final answer generation call. This makes the app incapable of answering certain
150 | meta-questions about the conversation such as "how many questions have I asked?". To properly answer those questions, swap in a more powerful model and
151 | uncomment the corresponding line in `utils/conversational_retrieval_chain.ts`.
152 | 
153 | ## Thank you!
154 | 
155 | Special thanks to [Pooya Parsa](https://x.com/_pi0_) for help with integrating Nuxt with Cloudflare's Wrangler.
156 | 
157 | For more, follow LangChain on X (formerly Twitter) [@LangChainAI](https://x.com/langchainai/).
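
As a reference for the Bedrock swap described under "Improving performance" above, the relevant logic in `server/api/chat.post.ts` boils down to the following condensed sketch (the `createModels` wrapper is a hypothetical helper added here for readability; the actual route inlines this code):

```ts
import { BedrockChat } from "@langchain/community/chat_models/bedrock/web";
import { ChatCloudflareWorkersAI } from "@langchain/cloudflare";

// Condensed from server/api/chat.post.ts: when Bedrock credentials are present,
// Claude 2 is passed to the chain as `largerModel` and used for the routing and
// question-rephrasing steps; otherwise the WorkersAI Llama 2 model handles everything.
export function createModels() {
  const cloudflareModel = new ChatCloudflareWorkersAI({
    model: "@cf/meta/llama-2-7b-chat-fp16",
    cloudflareAccountId: process.env.CLOUDFLARE_ACCOUNT_ID,
    cloudflareApiToken: process.env.CLOUDFLARE_WORKERSAI_API_TOKEN,
  });

  const bedrockModel =
    process.env.BEDROCK_AWS_ACCESS_KEY_ID !== undefined
      ? new BedrockChat({
          model: "anthropic.claude-v2",
          region: process.env.BEDROCK_AWS_REGION,
          credentials: {
            accessKeyId: process.env.BEDROCK_AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.BEDROCK_AWS_SECRET_ACCESS_KEY!,
          },
        })
      : undefined;

  // createConversationalRetrievalChain uses `largerModel ?? model` for routing and rephrasing.
  return { model: cloudflareModel, largerModel: bedrockModel };
}
```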
158 | -------------------------------------------------------------------------------- /app.vue: -------------------------------------------------------------------------------- 1 | 8 | -------------------------------------------------------------------------------- /assets/css/style.css: -------------------------------------------------------------------------------- 1 | body { 2 | --color-white: #fafafa; 3 | --color-black: #111111; 4 | 5 | --color-blue: #275dad; 6 | 7 | --color-background: #292626; 8 | 9 | --box-shadow: 0px 1px 2px rgba(0, 0, 0, 0.1); 10 | 11 | font-family: Inter, system-ui, Avenir, Helvetica, Arial, sans-serif; 12 | line-height: 1.5; 13 | font-weight: 400; 14 | 15 | color-scheme: light dark; 16 | color: var(--color-white); 17 | background-color: var(--color-background); 18 | 19 | font-synthesis: none; 20 | text-rendering: optimizeLegibility; 21 | -webkit-font-smoothing: antialiased; 22 | -moz-osx-font-smoothing: grayscale; 23 | -webkit-text-size-adjust: 100%; 24 | } 25 | 26 | a { 27 | font-weight: 500; 28 | color: var(--color-white); 29 | text-decoration: none; 30 | } 31 | 32 | a:hover { 33 | color: var(--color-white); 34 | text-decoration: underline; 35 | } 36 | 37 | #app { 38 | overflow: hidden; 39 | } 40 | 41 | h1 { 42 | font-size: 3em; 43 | line-height: 1.1; 44 | } 45 | -------------------------------------------------------------------------------- /components/ChatHistory.vue: -------------------------------------------------------------------------------- 1 | 48 | 49 | 57 | 58 | 66 | -------------------------------------------------------------------------------- /components/ChatMessage.vue: -------------------------------------------------------------------------------- 1 | 7 | 8 | 13 | 14 | 19 | 20 | 39 | -------------------------------------------------------------------------------- /nuxt.config.ts: -------------------------------------------------------------------------------- 1 | import vuetify, { transformAssetUrls } from "vite-plugin-vuetify"; 2 | 3 | // https://nuxt.com/docs/api/configuration/nuxt-config 4 | export default defineNuxtConfig({ 5 | //... 6 | // devtools: { enabled: true }, 7 | build: { 8 | transpile: ["vuetify"], 9 | }, 10 | nitro: { 11 | // Useful for debugging 12 | // minify: false, 13 | commands: { 14 | preview: 15 | "npx wrangler dev ./server/index.mjs --site ./public --remote --port 3000", 16 | }, 17 | }, 18 | css: ["~/assets/css/style.css"], 19 | modules: [ 20 | (_options, nuxt) => { 21 | nuxt.hooks.hook("vite:extendConfig", (config) => { 22 | // @ts-expect-error 23 | config.plugins.push(vuetify({ autoImport: true })); 24 | }); 25 | }, 26 | //... 27 | ], 28 | vite: { 29 | vue: { 30 | template: { 31 | transformAssetUrls, 32 | }, 33 | }, 34 | }, 35 | }); 36 | -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "langchain-cloudflare-nuxt-template", 3 | "private": true, 4 | "type": "module", 5 | "scripts": { 6 | "build": "NITRO_PRESET=cloudflare-module nuxt build", 7 | "dev": "nuxt dev", 8 | "preview": "nuxt preview", 9 | "postinstall": "nuxt prepare", 10 | "reload": "npm run build && npm run preview", 11 | "deploy": "npm run build && npx wrangler deploy .output/server/index.mjs --site .output/public", 12 | "format": "prettier . 
--write" 13 | }, 14 | "devDependencies": { 15 | "@cloudflare/workers-types": "^4.20231218.0", 16 | "@nuxt/devtools": "latest", 17 | "nuxt": "^3.8.0", 18 | "prettier": "3.1.0", 19 | "vite-plugin-vuetify": "^1.0.2", 20 | "vue": "^3.3.7", 21 | "vue-router": "^4.2.5", 22 | "vuetify": "^3.3.23", 23 | "wrangler": "^3.15.0" 24 | }, 25 | "dependencies": { 26 | "@aws-crypto/sha256-js": "^5.2.0", 27 | "@aws-sdk/types": "^3.451.0", 28 | "@langchain/cloudflare": "0.0.1", 29 | "@langchain/community": "^0.0.16", 30 | "@mdi/font": "^7.3.67", 31 | "@microsoft/fetch-event-source": "^2.0.1", 32 | "@smithy/eventstream-codec": "^2.0.14", 33 | "@smithy/protocol-http": "^3.0.10", 34 | "@smithy/signature-v4": "^2.0.16", 35 | "@smithy/util-utf8": "^2.0.2", 36 | "langchain": "^0.1.2", 37 | "langsmith": "^0.0.57", 38 | "pdf-parse": "^1.1.1" 39 | } 40 | } 41 | -------------------------------------------------------------------------------- /pages/index.vue: -------------------------------------------------------------------------------- 1 | 109 | 110 | 153 | 154 | 218 | -------------------------------------------------------------------------------- /plugins/vuetify.ts: -------------------------------------------------------------------------------- 1 | import "@mdi/font/css/materialdesignicons.css"; 2 | import "vuetify/styles"; 3 | import { createVuetify } from "vuetify"; 4 | 5 | export default defineNuxtPlugin((app) => { 6 | const vuetify = createVuetify({ 7 | theme: { 8 | defaultTheme: "dark", 9 | }, 10 | }); 11 | app.vueApp.use(vuetify); 12 | }); 13 | -------------------------------------------------------------------------------- /public/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/langchain-ai/langchain-cloudflare-nuxt-template/29d14bb366a34d292056d942d6580513e6871790/public/favicon.ico -------------------------------------------------------------------------------- /public/github-mark.svg: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /public/images/cloudflare-env-vars.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/langchain-ai/langchain-cloudflare-nuxt-template/29d14bb366a34d292056d942d6580513e6871790/public/images/cloudflare-env-vars.png -------------------------------------------------------------------------------- /public/images/interaction.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/langchain-ai/langchain-cloudflare-nuxt-template/29d14bb366a34d292056d942d6580513e6871790/public/images/interaction.png -------------------------------------------------------------------------------- /server/api/chat.post.ts: -------------------------------------------------------------------------------- 1 | import { HumanMessage, AIMessage } from "@langchain/core/messages"; 2 | import { HttpResponseOutputParser } from "langchain/output_parsers"; 3 | 4 | import { BedrockChat } from "@langchain/community/chat_models/bedrock/web"; 5 | import { 6 | ChatCloudflareWorkersAI, 7 | CloudflareVectorizeStore, 8 | CloudflareWorkersAIEmbeddings, 9 | } from "@langchain/cloudflare"; 10 | import { createConversationalRetrievalChain } from "~/utils/conversational_retrieval_chain"; 11 | 12 | const formatChatHistory = ( 13 | chatHistory: { type: "ai" | "human"; content: string }[], 14 | ) => { 15 
| const messages = chatHistory.map((message) => { 16 | if (message.type === "ai") { 17 | return new AIMessage({ content: message.content }); 18 | } else { 19 | return new HumanMessage({ content: message.content }); 20 | } 21 | }); 22 | 23 | return messages; 24 | }; 25 | 26 | export default defineEventHandler(async (event) => { 27 | const cloudflareBindings = event.context?.cloudflare?.env; 28 | if (!cloudflareBindings) { 29 | throw new Error("No Cloudflare bindings found."); 30 | } 31 | const body = await readBody(event); 32 | const { messages } = body; 33 | const history = messages.slice(0, -1); 34 | const currentMessage = messages[messages.length - 1]; 35 | 36 | const embeddings = new CloudflareWorkersAIEmbeddings({ 37 | binding: cloudflareBindings.AI, 38 | modelName: "@cf/baai/bge-base-en-v1.5", 39 | }); 40 | 41 | const aiKnowledgeVectorstore = new CloudflareVectorizeStore(embeddings, { 42 | index: cloudflareBindings.AI_KNOWLEDGE_VECTORIZE_INDEX, 43 | }); 44 | 45 | const cloudflareKnowledgeVectorstore = new CloudflareVectorizeStore( 46 | embeddings, 47 | { 48 | index: cloudflareBindings.CLOUDFLARE_KNOWLEDGE_VECTORIZE_INDEX, 49 | }, 50 | ); 51 | 52 | const cloudflareModel = new ChatCloudflareWorkersAI({ 53 | model: "@cf/meta/llama-2-7b-chat-fp16", 54 | cloudflareAccountId: process.env.CLOUDFLARE_ACCOUNT_ID, 55 | cloudflareApiToken: process.env.CLOUDFLARE_WORKERSAI_API_TOKEN, 56 | }); 57 | 58 | // Set process.env.BEDROCK_AWS_ACCESS_KEY_ID to use a larger model for more reasoning-intensive, 59 | // low-token tasks like routing and question rephrasing 60 | const bedrockModel = 61 | process.env.BEDROCK_AWS_ACCESS_KEY_ID !== undefined 62 | ? new BedrockChat({ 63 | model: "anthropic.claude-v2", 64 | region: process.env.BEDROCK_AWS_REGION, 65 | credentials: { 66 | accessKeyId: process.env.BEDROCK_AWS_ACCESS_KEY_ID!, 67 | secretAccessKey: process.env.BEDROCK_AWS_SECRET_ACCESS_KEY!, 68 | }, 69 | }) 70 | : undefined; 71 | 72 | const chain = createConversationalRetrievalChain({ 73 | model: cloudflareModel, 74 | largerModel: bedrockModel, 75 | aiKnowledgeVectorstore, 76 | cloudflareKnowledgeVectorstore, 77 | }); 78 | 79 | let runIdResolver: (runId: string) => void; 80 | const runIdPromise = new Promise((resolve) => { 81 | runIdResolver = resolve; 82 | }); 83 | 84 | const stream = await chain 85 | .pipe(new HttpResponseOutputParser({ contentType: "text/event-stream" })) 86 | .stream( 87 | { 88 | chat_history: formatChatHistory(history), 89 | question: currentMessage.content, 90 | }, 91 | { 92 | callbacks: [ 93 | { 94 | handleChainStart(_llm, _prompts, runId) { 95 | runIdResolver(runId); 96 | }, 97 | }, 98 | ], 99 | }, 100 | ); 101 | 102 | const runId = await runIdPromise; 103 | return new Response(stream, { 104 | headers: { 105 | "Content-Type": "text/event-stream", 106 | "X-Langsmith-Run-Id": runId, 107 | }, 108 | }); 109 | }); 110 | -------------------------------------------------------------------------------- /server/api/ingest.ts: -------------------------------------------------------------------------------- 1 | import { WebPDFLoader } from "langchain/document_loaders/web/pdf"; 2 | import { RecursiveCharacterTextSplitter } from "langchain/text_splitter"; 3 | import { Document } from "@langchain/core/documents"; 4 | import type { VectorStore } from "@langchain/core/vectorstores"; 5 | 6 | import { 7 | CloudflareVectorizeStore, 8 | CloudflareWorkersAIEmbeddings, 9 | } from "@langchain/cloudflare"; 10 | 11 | import { BLOG_POST_TEXT } from "~/utils/data/lilian_weng_agents_blog"; 12 | 13 | const 
upsertDocsToVectorstore = async ( 14 | vectorstore: VectorStore, 15 | docs: Document[], 16 | ) => { 17 | const ids = []; 18 | const encoder = new TextEncoder(); 19 | for (const doc of docs) { 20 | // Vectorize does not support object metadata, and we won't be needing it for 21 | // this app. 22 | doc.metadata = {}; 23 | const insecureHash = await crypto.subtle.digest( 24 | "SHA-1", 25 | encoder.encode(doc.pageContent), 26 | ); 27 | // Use a hash of the page content as an id 28 | const hashArray = Array.from(new Uint8Array(insecureHash)); 29 | const readableId = hashArray 30 | .map((b) => b.toString(16).padStart(2, "0")) 31 | .join(""); 32 | ids.push(readableId); 33 | } 34 | const result = await vectorstore.addDocuments(docs, { ids }); 35 | return result; 36 | }; 37 | 38 | export default defineEventHandler(async (event) => { 39 | if (process.env.ENVIRONMENT !== "local") { 40 | throw new Error( 41 | `You must run the ingest script with process.env.ENVIRONMENT set to "local".`, 42 | ); 43 | } 44 | const cloudflareBindings = event.context?.cloudflare?.env; 45 | if (!cloudflareBindings) { 46 | throw new Error("No Cloudflare bindings found."); 47 | } 48 | const embeddings = new CloudflareWorkersAIEmbeddings({ 49 | binding: cloudflareBindings.AI, 50 | modelName: "@cf/baai/bge-base-en-v1.5", 51 | }); 52 | 53 | // Tune based on your raw content. 54 | const splitter = new RecursiveCharacterTextSplitter({ 55 | chunkSize: 1024, 56 | chunkOverlap: 100, 57 | }); 58 | 59 | // Ingest content from a blog post on AI agents 60 | const aiAgentDocument = new Document({ 61 | pageContent: BLOG_POST_TEXT, 62 | }); 63 | const splitAiAgentDocs = await splitter.splitDocuments([aiAgentDocument]); 64 | const aiKnowledgeVectorstore = new CloudflareVectorizeStore(embeddings, { 65 | index: cloudflareBindings.AI_KNOWLEDGE_VECTORIZE_INDEX, 66 | }); 67 | 68 | await upsertDocsToVectorstore(aiKnowledgeVectorstore, splitAiAgentDocs); 69 | 70 | // Ingest content about Cloudflare 71 | // Need to polyfill a method that Cloudflare Workers is missing for the PDF loader 72 | globalThis.setImmediate = ((fn: () => {}) => setTimeout(fn, 0)) as any; 73 | const cloudflareFetchResponse = await fetch( 74 | "https://www.cloudflare.com/resources/assets/slt3lc6tev37/3HWObubm6fybC0FWUdFYAJ/5d5e3b0a4d9c5a7619984ed6076f01fe/Cloudflare_for_Campaigns_Security_Guide.pdf", 75 | ); 76 | const cloudflarePdfBlob = await cloudflareFetchResponse.blob(); 77 | const pdfLoader = new WebPDFLoader(cloudflarePdfBlob, { 78 | parsedItemSeparator: "", 79 | }); 80 | const cloudflareDocs = await pdfLoader.load(); 81 | const splitCloudflareDocs = await splitter.splitDocuments(cloudflareDocs); 82 | const cloudflareKnowledgeVectorstore = new CloudflareVectorizeStore( 83 | embeddings, 84 | { 85 | index: cloudflareBindings.CLOUDFLARE_KNOWLEDGE_VECTORIZE_INDEX, 86 | }, 87 | ); 88 | await upsertDocsToVectorstore( 89 | cloudflareKnowledgeVectorstore, 90 | splitCloudflareDocs, 91 | ); 92 | 93 | return "Ingest complete!"; 94 | }); 95 | -------------------------------------------------------------------------------- /server/api/trace.post.ts: -------------------------------------------------------------------------------- 1 | import { Client } from "langsmith"; 2 | 3 | export default defineEventHandler(async (event) => { 4 | if (!process.env.LANGCHAIN_API_KEY) { 5 | throw new Error("No LangChain API key set."); 6 | } 7 | const langsmithClient = new Client({ 8 | webUrl: "https://smith.langchain.com", 9 | }); 10 | const body = await readBody(event); 11 | const runId = 
body.run_id; 12 | if (!runId) { 13 | return Response.json( 14 | { error: "You must provide a run id." }, 15 | { status: 400 }, 16 | ); 17 | } 18 | const traceUrl = await langsmithClient.shareRun(runId); 19 | return Response.json({ url: traceUrl }, { status: 200 }); 20 | }); 21 | -------------------------------------------------------------------------------- /server/tsconfig.json: -------------------------------------------------------------------------------- 1 | { 2 | "extends": "../.nuxt/tsconfig.server.json" 3 | } 4 | -------------------------------------------------------------------------------- /tsconfig.json: -------------------------------------------------------------------------------- 1 | { 2 | // https://nuxt.com/docs/guide/concepts/typescript 3 | "extends": "./.nuxt/tsconfig.json" 4 | } 5 | -------------------------------------------------------------------------------- /utils/conversational_retrieval_chain.ts: -------------------------------------------------------------------------------- 1 | import type { VectorStore } from "@langchain/core/vectorstores"; 2 | import type { BaseLanguageModel } from "@langchain/core/language_models/base"; 3 | import type { Document } from "@langchain/core/documents"; 4 | 5 | import { 6 | ChatPromptTemplate, 7 | MessagesPlaceholder, 8 | } from "@langchain/core/prompts"; 9 | import { RunnableSequence, RunnableBranch } from "@langchain/core/runnables"; 10 | import { StringOutputParser } from "@langchain/core/output_parsers"; 11 | 12 | const CONDENSE_QUESTION_SYSTEM_TEMPLATE = `You are an experienced researcher, expert at interpreting and answering questions based on provided sources. 13 | Your job is to remove references to chat history from incoming questions, rephrasing them as standalone questions.`; 14 | 15 | const CONDENSE_QUESTION_HUMAN_TEMPLATE = `Using only previous conversation as context, rephrase the following question to be a standalone question. 16 | 17 | Do not respond with anything other than a rephrased standalone question. Be concise, but complete and resolve all references to the chat history. 18 | 19 | 20 | {question} 21 | `; 22 | const condenseQuestionPrompt = ChatPromptTemplate.fromMessages([ 23 | ["system", CONDENSE_QUESTION_SYSTEM_TEMPLATE], 24 | new MessagesPlaceholder("chat_history"), 25 | ["human", CONDENSE_QUESTION_HUMAN_TEMPLATE], 26 | ]); 27 | 28 | const ROUTER_TEMPLATE = `You are an experienced researcher, expert at interpreting and answering questions based on provided sources. 29 | You have access to two databases: one with information about Cloudflare, and another about artificial intelligence. 30 | Your job is to pick the database that would be more useful to answer the following question: 31 | 32 | 33 | {standalone_question} 34 | 35 | 36 | You must respond with one of the following answers: "Cloudflare", "Artificial Intelligence", or "Neither". Do not include anything else in your response.`; 37 | 38 | // This is equivalent to a human message 39 | const routerPrompt = ChatPromptTemplate.fromTemplate(ROUTER_TEMPLATE); 40 | 41 | const ANSWER_SYSTEM_TEMPLATE = `You are an experienced researcher, expert at interpreting and answering questions based on provided sources. 42 | Using the provided context, answer the user's question to the best of your ability using only the resources provided. 43 | Generate a concise answer for a given question based solely on the provided context. 44 | You must only use information from the provided search results. Use an unbiased and journalistic tone. 
Combine search results together into a coherent answer. Do not repeat text. 45 | If there is no information in the context relevant to the question at hand, just say "Hmm, I'm not sure." 46 | Anything between the following \`context\` html blocks is retrieved from a knowledge bank, not part of the conversation with the user. 47 | 48 | 49 | {context} 50 | 51 | 52 | REMEMBER: Be concise, and only use facts from the provided context.`; 53 | 54 | const ANSWER_HUMAN_TEMPLATE = `Answer the following question to the best of your ability. This is extremely important for my career! 55 | 56 | {standalone_question}`; 57 | 58 | const answerPrompt = ChatPromptTemplate.fromMessages([ 59 | ["system", ANSWER_SYSTEM_TEMPLATE], 60 | // Adding chat history as part of the final answer generation is distracting for a small model like Llama 2-7B. 61 | // If using a more powerful model, you can re-enable to better support meta-questions about the conversation. 62 | // new MessagesPlaceholder("chat_history"), 63 | ["human", ANSWER_HUMAN_TEMPLATE], 64 | ]); 65 | 66 | const formatDocuments = (docs: Document[]) => { 67 | return docs 68 | .map((doc, i) => { 69 | return `\n${doc.pageContent}\n`; 70 | }) 71 | .join("\n"); 72 | }; 73 | 74 | export function createConversationalRetrievalChain({ 75 | model, 76 | largerModel, 77 | cloudflareKnowledgeVectorstore, 78 | aiKnowledgeVectorstore, 79 | }: { 80 | model: BaseLanguageModel; 81 | largerModel?: BaseLanguageModel; 82 | cloudflareKnowledgeVectorstore: VectorStore; 83 | aiKnowledgeVectorstore: VectorStore; 84 | }) { 85 | const cloudflareKnowledgeRetriever = cloudflareKnowledgeVectorstore 86 | .asRetriever() 87 | .withConfig({ runName: "CloudflareKnowledgeRetriever" }); 88 | const aiKnowledgeRetriever = aiKnowledgeVectorstore 89 | .asRetriever() 90 | .withConfig({ runName: "AIKnowledgeRetriever" }); 91 | 92 | const routingChain = RunnableSequence.from([ 93 | routerPrompt, 94 | largerModel ?? model, 95 | new StringOutputParser(), 96 | ]).withConfig({ runName: "RoutingChain" }); 97 | 98 | const retrievalChain = RunnableSequence.from([ 99 | { 100 | standalone_question: (input) => input.standalone_question, 101 | knowledge_base_name: routingChain, 102 | }, 103 | // Default to the AI retriever if the model does not think Cloudflare would be helpful. 104 | // You could change this to e.g. a general search retriever instead. 105 | RunnableBranch.from([ 106 | [ 107 | (output) => 108 | output.knowledge_base_name.toLowerCase().includes("cloudflare"), 109 | RunnableSequence.from([ 110 | // Retrievers only take a single string as input, 111 | // so we have to extract it from the previous step output. 112 | (output) => output.standalone_question, 113 | cloudflareKnowledgeRetriever, 114 | ]), 115 | ], 116 | RunnableSequence.from([ 117 | (output) => output.standalone_question, 118 | aiKnowledgeRetriever, 119 | ]), 120 | ]), 121 | formatDocuments, 122 | ]).withConfig({ runName: "RetrievalChain" }); 123 | 124 | const standaloneQuestionChain = RunnableSequence.from([ 125 | condenseQuestionPrompt, 126 | largerModel ?? model, 127 | new StringOutputParser(), 128 | ]).withConfig({ runName: "RephraseQuestionChain" }); 129 | 130 | const answerChain = RunnableSequence.from([ 131 | { 132 | standalone_question: (input) => input.standalone_question, 133 | chat_history: (input) => input.chat_history, 134 | context: retrievalChain, 135 | }, 136 | answerPrompt, 137 | model, 138 | ]).withConfig({ runName: "AnswerGenerationChain" }); 139 | 140 | /** 141 | * Chain steps are: 142 | * 1. 
If there is chat history, rephrase initial question as standalone question with standaloneQuestionChain 143 | * If question is not a followup, pass the user's question directly through 144 | * 2. Choose proper vectorstore based on the question using routingChain 145 | * 3. Retrieve context docs based on the output of routingChain using retrievalChain 146 | * 4. Generate a final answer based on context, question, and chat history in answerChain 147 | * 148 | * Illustrative trace: 149 | * https://smith.langchain.com/public/0474c554-01ab-4f7f-937f-b6c205fa91f5/r 150 | */ 151 | return RunnableSequence.from([ 152 | { 153 | standalone_question: RunnableBranch.from([ 154 | [(input) => input.chat_history.length > 0, standaloneQuestionChain], 155 | (input) => input.question, 156 | ]), 157 | chat_history: (input) => input.chat_history, 158 | }, 159 | answerChain, 160 | ]).withConfig({ runName: "ConversationalRetrievalChain" }); 161 | } 162 | -------------------------------------------------------------------------------- /utils/data/lilian_weng_agents_blog.ts: -------------------------------------------------------------------------------- 1 | /* 2 | @article{weng2023prompt, 3 | title = "LLM-powered Autonomous Agents"", 4 | author = "Weng, Lilian", 5 | journal = "lilianweng.github.io", 6 | year = "2023", 7 | month = "Jun", 8 | url = "https://lilianweng.github.io/posts/2023-06-23-agent/" 9 | } 10 | */ 11 | 12 | export const BLOG_POST_TEXT = `Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver. 13 | 14 | Agent System Overview 15 | In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components: 16 | 17 | Planning 18 | Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks. 19 | Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results. 20 | Memory 21 | Short-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn. 22 | Long-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval. 23 | Tool use 24 | The agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more. 25 | 26 | Component One: Planning 27 | A complicated task usually involves many steps. An agent needs to know what they are and plan ahead. 28 | 29 | Task Decomposition 30 | Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process. 
31 | 32 | Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote. 33 | 34 | Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs. 35 | 36 | Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains. 37 | 38 | Self-Reflection 39 | Self-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable. 40 | 41 | ReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language. 42 | 43 | The ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as: 44 | 45 | Thought: ... 46 | Action: ... 47 | Observation: ... 48 | ... (Repeated many times) 49 | 50 | In both experiments on knowledge-intensive tasks and decision-making tasks, ReAct works better than the Act-only baseline where Thought: … step is removed. 51 | 52 | Reflexion (Shinn & Labash 2023) is a framework to equips agents with dynamic memory and self-reflection capabilities to improve reasoning skills. Reflexion has a standard RL setup, in which the reward model provides a simple binary reward and the action space follows the setup in ReAct where the task-specific action space is augmented with language to enable complex reasoning steps. After each action 53 | , the agent computes a heuristic 54 | and optionally may decide to reset the environment to start a new trial depending on the self-reflection results. 55 | 56 | 57 | The heuristic function determines when the trajectory is inefficient or contains hallucination and should be stopped. Inefficient planning refers to trajectories that take too long without success. Hallucination is defined as encountering a sequence of consecutive identical actions that lead to the same observation in the environment. 58 | 59 | Self-reflection is created by showing two-shot examples to LLM and each example is a pair of (failed trajectory, ideal reflection for guiding future changes in the plan). 
Then reflections are added into the agent’s working memory, up to three, to be used as context for querying LLM. 60 | 61 | 62 | Chain of Hindsight (CoH; Liu et al. 2023) encourages the model to improve on its own outputs by explicitly presenting it with a sequence of past outputs, each annotated with feedback. The model is finetuned to only predict 63 | where conditioned on the sequence prefix, such that the model can self-reflect to produce better output based on the feedback sequence. The model can optionally receive multiple rounds of instructions with human annotators at test time. 64 | 65 | To avoid overfitting, CoH adds a regularization term to maximize the log-likelihood of the pre-training dataset. To avoid shortcutting and copying (because there are many common words in feedback sequences), they randomly mask 0% - 5% of past tokens during training. 66 | 67 | The training dataset in their experiments is a combination of WebGPT comparisons, summarization from human feedback and human preference dataset. 68 | 69 | 70 | The idea of CoH is to present a history of sequentially improved outputs in context and train the model to take on the trend to produce better outputs. Algorithm Distillation (AD; Laskin et al. 2023) applies the same idea to cross-episode trajectories in reinforcement learning tasks, where an algorithm is encapsulated in a long history-conditioned policy. Considering that an agent interacts with the environment many times and in each episode the agent gets a little better, AD concatenates this learning history and feeds that into the model. Hence we should expect the next predicted action to lead to better performance than previous trials. The goal is to learn the process of RL instead of training a task-specific policy itself. 71 | 72 | The paper hypothesizes that any algorithm that generates a set of learning histories can be distilled into a neural network by performing behavioral cloning over actions. The history data is generated by a set of source policies, each trained for a specific task. At the training stage, during each RL run, a random task is sampled and a subsequence of multi-episode history is used for training, such that the learned policy is task-agnostic. 73 | 74 | In reality, the model has limited context window length, so episodes should be short enough to construct multi-episode history. Multi-episodic contexts of 2-4 episodes are necessary to learn a near-optimal in-context RL algorithm. The emergence of in-context RL requires long enough context. 75 | 76 | In comparison with three baselines, including ED (expert distillation, behavior cloning with expert trajectories instead of learning history), source policy (used for generating trajectories for distillation by UCB), RL^2 (Duan et al. 2017; used as upper bound since it needs online RL), AD demonstrates in-context RL with performance getting close to RL^2 despite only using offline RL and learns much faster than other baselines. When conditioned on partial training history of the source policy, AD also improves much faster than ED baseline. 77 | 78 | Component Two: Memory 79 | (Big thank you to ChatGPT for helping me draft this section. I’ve learned a lot about the human brain and data structure for fast MIPS in my conversations with ChatGPT.) 80 | 81 | Types of Memory 82 | Memory can be defined as the processes used to acquire, store, retain, and later retrieve information. There are several types of memory in human brains. 
83 | 84 | Sensory Memory: This is the earliest stage of memory, providing the ability to retain impressions of sensory information (visual, auditory, etc) after the original stimuli have ended. Sensory memory typically only lasts for up to a few seconds. Subcategories include iconic memory (visual), echoic memory (auditory), and haptic memory (touch). 85 | 86 | Short-Term Memory (STM) or Working Memory: It stores information that we are currently aware of and needed to carry out complex cognitive tasks such as learning and reasoning. Short-term memory is believed to have the capacity of about 7 items (Miller 1956) and lasts for 20-30 seconds. 87 | 88 | Long-Term Memory (LTM): Long-term memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. There are two subtypes of LTM: 89 | 90 | Explicit / declarative memory: This is memory of facts and events, and refers to those memories that can be consciously recalled, including episodic memory (events and experiences) and semantic memory (facts and concepts). 91 | Implicit / procedural memory: This type of memory is unconscious and involves skills and routines that are performed automatically, like riding a bike or typing on a keyboard. 92 | 93 | We can roughly consider the following mappings: 94 | 95 | Sensory memory as learning embedding representations for raw inputs, including text, image or other modalities; 96 | Short-term memory as in-context learning. It is short and finite, as it is restricted by the finite context window length of Transformer. 97 | Long-term memory as the external vector store that the agent can attend to at query time, accessible via fast retrieval. 98 | Maximum Inner Product Search (MIPS) 99 | The external memory can alleviate the restriction of finite attention span. A standard practice is to save the embedding representation of information into a vector store database that can support fast maximum inner-product search (MIPS). To optimize the retrieval speed, the common choice is the approximate nearest neighbors (ANN)​ algorithm to return approximately top k nearest neighbors to trade off a little accuracy lost for a huge speedup. 100 | 101 | A couple common choices of ANN algorithms for fast MIPS: 102 | 103 | LSH (Locality-Sensitive Hashing): It introduces a hashing function such that similar input items are mapped to the same buckets with high probability, where the number of buckets is much smaller than the number of inputs. 104 | ANNOY (Approximate Nearest Neighbors Oh Yeah): The core data structure are random projection trees, a set of binary trees where each non-leaf node represents a hyperplane splitting the input space into half and each leaf stores one data point. Trees are built independently and at random, so to some extent, it mimics a hashing function. ANNOY search happens in all the trees to iteratively search through the half that is closest to the query and then aggregates the results. The idea is quite related to KD tree but a lot more scalable. 105 | HNSW (Hierarchical Navigable Small World): It is inspired by the idea of small world networks where most nodes can be reached by any other nodes within a small number of steps; e.g. “six degrees of separation” feature of social networks. HNSW builds hierarchical layers of these small-world graphs, where the bottom layers contain the actual data points. The layers in the middle create shortcuts to speed up search. 
When performing a search, HNSW starts from a random node in the top layer and navigates towards the target. When it can’t get any closer, it moves down to the next layer, until it reaches the bottom layer. Each move in the upper layers can potentially cover a large distance in the data space, and each move in the lower layers refines the search quality. 106 | FAISS (Facebook AI Similarity Search): It operates on the assumption that in high dimensional space, distances between nodes follow a Gaussian distribution and thus there should exist clustering of data points. FAISS applies vector quantization by partitioning the vector space into clusters and then refining the quantization within clusters. Search first looks for cluster candidates with coarse quantization and then further looks into each cluster with finer quantization. 107 | ScaNN (Scalable Nearest Neighbors): The main innovation in ScaNN is anisotropic vector quantization. It quantizes a data point 108 | to 109 | such that the inner product 110 | is as similar to the original distance of 111 | as possible, instead of picking the closet quantization centroid points. 112 | 113 | Check more MIPS algorithms and performance comparison in ann-benchmarks.com. 114 | 115 | Component Three: Tool Use 116 | Tool use is a remarkable and distinguishing characteristic of human beings. We create, modify and utilize external objects to do things that go beyond our physical and cognitive limits. Equipping LLMs with external tools can significantly extend the model capabilities. 117 | 118 | 119 | MRKL (Karpas et al. 2022), short for “Modular Reasoning, Knowledge and Language”, is a neuro-symbolic architecture for autonomous agents. A MRKL system is proposed to contain a collection of “expert” modules and the general-purpose LLM works as a router to route inquiries to the best suitable expert module. These modules can be neural (e.g. deep learning models) or symbolic (e.g. math calculator, currency converter, weather API). 120 | 121 | They did an experiment on fine-tuning LLM to call a calculator, using arithmetic as a test case. Their experiments showed that it was harder to solve verbal math problems than explicitly stated math problems because LLMs (7B Jurassic1-large model) failed to extract the right arguments for the basic arithmetic reliably. The results highlight when the external symbolic tools can work reliably, knowing when to and how to use the tools are crucial, determined by the LLM capability. 122 | 123 | Both TALM (Tool Augmented Language Models; Parisi et al. 2022) and Toolformer (Schick et al. 2023) fine-tune a LM to learn to use external tool APIs. The dataset is expanded based on whether a newly added API call annotation can improve the quality of model outputs. See more details in the “External APIs” section of Prompt Engineering. 124 | 125 | ChatGPT Plugins and OpenAI API function calling are good examples of LLMs augmented with tool use capability working in practice. The collection of tool APIs can be provided by other developers (as in Plugins) or self-defined (as in function calls). 126 | 127 | HuggingGPT (Shen et al. 2023) is a framework to use ChatGPT as the task planner to select models available in HuggingFace platform according to the model descriptions and summarize the response based on the execution results. 128 | 129 | 130 | The system comprises of 4 stages: 131 | 132 | (1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. 
There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning. 133 | 134 | Instruction: 135 | 136 | The AI assistant can parse user input to several tasks: [{"task": task, "id", task_id, "dep": dependency_task_ids, "args": {"text": text, "image": URL, "audio": URL, "video": URL}}]. The "dep" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag "-task_id" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can't be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning. 137 | (2) Model selection: LLM distributes the tasks to expert models, where the request is framed as a multiple-choice question. LLM is presented with a list of models to choose from. Due to the limited context length, task type based filtration is needed. 138 | 139 | Instruction: 140 | 141 | Given the user request and the call command, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The AI assistant merely outputs the model id of the most appropriate model. The output must be in a strict JSON format: "id": "id", "reason": "your detail reason for the choice". We have a list of models for you to choose from {{ Candidate Models }}. Please select one model from the list. 142 | (3) Task execution: Expert models execute on the specific tasks and log results. 143 | 144 | Instruction: 145 | 146 | With the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path. 147 | (4) Response generation: LLM receives the execution results and provides summarized results to users. 148 | 149 | To put HuggingGPT into real world usage, a couple challenges need to solve: (1) Efficiency improvement is needed as both LLM inference rounds and interactions with other models slow down the process; (2) It relies on a long context window to communicate over complicated task content; (3) Stability improvement of LLM outputs and external model services. 150 | 151 | API-Bank (Li et al. 2023) is a benchmark for evaluating the performance of tool-augmented LLMs. It contains 53 commonly used API tools, a complete tool-augmented LLM workflow, and 264 annotated dialogues that involve 568 API calls. The selection of APIs is quite diverse, including search engines, calculator, calendar queries, smart home control, schedule management, health data management, account authentication workflow and more. 
Because there are a large number of APIs, LLM first has access to API search engine to find the right API to call and then uses the corresponding documentation to make a call. 152 | 153 | 154 | In the API-Bank workflow, LLMs need to make a couple of decisions and at each step we can evaluate how accurate that decision is. Decisions include: 155 | 156 | Whether an API call is needed. 157 | Identify the right API to call: if not good enough, LLMs need to iteratively modify the API inputs (e.g. deciding search keywords for Search Engine API). 158 | Response based on the API results: the model can choose to refine and call again if results are not satisfied. 159 | This benchmark evaluates the agent’s tool use capabilities at three levels: 160 | 161 | Level-1 evaluates the ability to call the API. Given an API’s description, the model needs to determine whether to call a given API, call it correctly, and respond properly to API returns. 162 | Level-2 examines the ability to retrieve the API. The model needs to search for possible APIs that may solve the user’s requirement and learn how to use them by reading documentation. 163 | Level-3 assesses the ability to plan API beyond retrieve and call. Given unclear user requests (e.g. schedule group meetings, book flight/hotel/restaurant for a trip), the model may have to conduct multiple API calls to solve it. 164 | Case Studies 165 | Scientific Discovery Agent 166 | ChemCrow (Bran et al. 2023) is a domain-specific example in which LLM is augmented with 13 expert-designed tools to accomplish tasks across organic synthesis, drug discovery, and materials design. The workflow, implemented in LangChain, reflects what was previously described in the ReAct and MRKLs and combines CoT reasoning with tools relevant to the tasks: 167 | 168 | The LLM is provided with a list of tool names, descriptions of their utility, and details about the expected input/output. 169 | It is then instructed to answer a user-given prompt using the tools provided when necessary. The instruction suggests the model to follow the ReAct format - Thought, Action, Action Input, Observation. 170 | One interesting observation is that while the LLM-based evaluation concluded that GPT-4 and ChemCrow perform nearly equivalently, human evaluations with experts oriented towards the completion and chemical correctness of the solutions showed that ChemCrow outperforms GPT-4 by a large margin. This indicates a potential problem with using LLM to evaluate its own performance on domains that requires deep expertise. The lack of expertise may cause LLMs not knowing its flaws and thus cannot well judge the correctness of task results. 171 | 172 | Boiko et al. (2023) also looked into LLM-empowered agents for scientific discovery, to handle autonomous design, planning, and performance of complex scientific experiments. This agent can use tools to browse the Internet, read documentation, execute code, call robotics experimentation APIs and leverage other LLMs. 173 | 174 | For example, when requested to "develop a novel anticancer drug", the model came up with the following reasoning steps: 175 | 176 | inquired about current trends in anticancer drug discovery; 177 | selected a target; 178 | requested a scaffold targeting these compounds; 179 | Once the compound was identified, the model attempted its synthesis. 180 | They also discussed the risks, especially with illicit drugs and bioweapons. 
They developed a test set containing a list of known chemical weapon agents and asked the agent to synthesize them. 4 out of 11 requests (36%) were accepted to obtain a synthesis solution and the agent attempted to consult documentation to execute the procedure. 7 out of 11 were rejected and among these 7 rejected cases, 5 happened after a Web search while 2 were rejected based on prompt only. 181 | 182 | Generative Agents Simulation 183 | Generative Agents (Park, et al. 2023) is super fun experiment where 25 virtual characters, each controlled by a LLM-powered agent, are living and interacting in a sandbox environment, inspired by The Sims. Generative agents create believable simulacra of human behavior for interactive applications. 184 | 185 | The design of generative agents combines LLM with memory, planning and reflection mechanisms to enable agents to behave conditioned on past experience, as well as to interact with other agents. 186 | 187 | Memory stream: is a long-term memory module (external database) that records a comprehensive list of agents’ experience in natural language. 188 | Each element is an observation, an event directly provided by the agent. - Inter-agent communication can trigger new natural language statements. 189 | Retrieval model: surfaces the context to inform the agent’s behavior, according to relevance, recency and importance. 190 | Recency: recent events have higher scores 191 | Importance: distinguish mundane from core memories. Ask LM directly. 192 | Relevance: based on how related it is to the current situation / query. 193 | Reflection mechanism: synthesizes memories into higher level inferences over time and guides the agent’s future behavior. They are higher-level summaries of past events (<- note that this is a bit different from self-reflection above) 194 | Prompt LM with 100 most recent observations and to generate 3 most salient high-level questions given a set of observations/statements. Then ask LM to answer those questions. 195 | Planning & Reacting: translate the reflections and the environment information into actions 196 | Planning is essentially in order to optimize believability at the moment vs in time. 197 | Prompt template: {Intro of an agent X}. Here is X's plan today in broad strokes: 1) 198 | Relationships between agents and observations of one agent by another are all taken into consideration for planning and reacting. 199 | Environment information is present in a tree structure. 200 | 201 | This fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing the conversation topic) and coordination of social events (e.g. host a party and invite many others). 202 | 203 | Proof-of-Concept Examples 204 | AutoGPT has drawn a lot of attention into the possibility of setting up autonomous agents with LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but nevertheless a cool proof-of-concept demo. A lot of code in AutoGPT is about format parsing. 205 | 206 | Here is the system message used by AutoGPT, where {{...}} are user inputs: 207 | 208 | You are {{ai-name}}, {{user-provided AI bot description}}. 209 | Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications. 210 | 211 | GOALS: 212 | 213 | 1. {{user-provided goal 1}} 214 | 2. {{user-provided goal 2}} 215 | 3. ... 216 | 4. ... 217 | 5. ... 
218 | 219 | Constraints: 220 | 1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files. 221 | 2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember. 222 | 3. No user assistance 223 | 4. Exclusively use the commands listed in double quotes e.g. "command name" 224 | 5. Use subprocesses for commands that will not terminate within a few minutes 225 | 226 | Commands: 227 | 1. Google Search: "google", args: "input": "" 228 | 2. Browse Website: "browse_website", args: "url": "", "question": "" 229 | 3. Start GPT Agent: "start_agent", args: "name": "", "task": "", "prompt": "" 230 | 4. Message GPT Agent: "message_agent", args: "key": "", "message": "" 231 | 5. List GPT Agents: "list_agents", args: 232 | 6. Delete GPT Agent: "delete_agent", args: "key": "" 233 | 7. Clone Repository: "clone_repository", args: "repository_url": "", "clone_path": "" 234 | 8. Write to file: "write_to_file", args: "file": "", "text": "" 235 | 9. Read file: "read_file", args: "file": "" 236 | 10. Append to file: "append_to_file", args: "file": "", "text": "" 237 | 11. Delete file: "delete_file", args: "file": "" 238 | 12. Search Files: "search_files", args: "directory": "" 239 | 13. Analyze Code: "analyze_code", args: "code": "" 240 | 14. Get Improved Code: "improve_code", args: "suggestions": "", "code": "" 241 | 15. Write Tests: "write_tests", args: "code": "", "focus": "" 242 | 16. Execute Python File: "execute_python_file", args: "file": "" 243 | 17. Generate Image: "generate_image", args: "prompt": "" 244 | 18. Send Tweet: "send_tweet", args: "text": "" 245 | 19. Do Nothing: "do_nothing", args: 246 | 20. Task Complete (Shutdown): "task_complete", args: "reason": "" 247 | 248 | Resources: 249 | 1. Internet access for searches and information gathering. 250 | 2. Long Term memory management. 251 | 3. GPT-3.5 powered Agents for delegation of simple tasks. 252 | 4. File output. 253 | 254 | Performance Evaluation: 255 | 1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities. 256 | 2. Constructively self-criticize your big-picture behavior constantly. 257 | 3. Reflect on past decisions and strategies to refine your approach. 258 | 4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps. 259 | 260 | You should only respond in JSON format as described below 261 | Response Format: 262 | { 263 | "thoughts": { 264 | "text": "thought", 265 | "reasoning": "reasoning", 266 | "plan": "- short bulleted\n- list that conveys\n- long-term plan", 267 | "criticism": "constructive self-criticism", 268 | "speak": "thoughts summary to say to user" 269 | }, 270 | "command": { 271 | "name": "command name", 272 | "args": { 273 | "arg name": "value" 274 | } 275 | } 276 | } 277 | Ensure the response can be parsed by Python json.loads 278 | GPT-Engineer is another project to create a whole repository of code given a task specified in natural language. GPT-Engineer is instructed to think over a list of smaller components to build and to ask for user input to clarify questions as needed. 279 | 280 | Here is a sample conversation for task clarification, sent to the OpenAI ChatCompletion endpoint used by GPT-Engineer. The user inputs are wrapped in {{user input text}}.
281 | 282 | [ 283 | { 284 | "role": "system", 285 | "content": "You will read instructions and not carry them out, only seek to clarify them.\nSpecifically you will first summarise a list of super short bullets of areas that need clarification.\nThen you will pick one clarifying question, and wait for an answer from the user.\n" 286 | }, 287 | { 288 | "role": "user", 289 | "content": "We are writing {{a Super Mario game in python. MVC components split in separate files. Keyboard control.}}\n" 290 | }, 291 | { 292 | "role": "assistant", 293 | "content": "Summary of areas that need clarification:\n1. Specifics of the Super Mario game (e.g. level design, characters, gameplay mechanics)\n2. Details about the MVC components (e.g. which components are in each file)\n3. Keyboard control implementation (e.g. which keys to use, how to handle input)\n\nClarifying question:\nCan you provide more details about the Super Mario game, such as level design, characters, and gameplay mechanics?" 294 | }, 295 | { 296 | "role": "user", 297 | "content": "{{There are 10 levels in total. The main character is a plumber named Mario, who can walk and jump. It is a classical platform game just like Super Mario. The main character moves from left to right, trying to get to the destination, where there are many obstacles and attacks from enemies in the process.}}\n\nIs anything else unclear? If yes, only answer in the form:\n{remaining unclear areas} remaining questions.\n{Next question}\nIf everything is sufficiently clear, only answer \"Nothing more to clarify.\"." 298 | }, 299 | { 300 | "role": "assistant", 301 | "content": "Remaining unclear areas: 2 remaining questions.\nCan you provide more information about how the MVC components are split into separate files?" 302 | }, 303 | { 304 | "role": "user", 305 | "content": "{{Make your own assumptions and state them explicitly before starting}}" 306 | } 307 | ] 308 | Then after these clarifications, the agent moved into the code-writing mode with a different system message. System message: 309 | 310 | You will get instructions for code to write. You will write a very long answer. Make sure that every detail of the architecture is, in the end, implemented as code. Make sure that every detail of the architecture is, in the end, implemented as code. 311 | Think step by step and reason yourself to the right decisions to make sure we get it right. You will first lay out the names of the core classes, functions, methods that will be necessary, as well as a quick comment on their purpose. 312 | 313 | Then you will output the content of each file including ALL code. Each file must strictly follow a markdown code block format, where the following tokens must be replaced such that FILENAME is the lowercase file name including the file extension, LANG is the markup code block language for the code’s language, and CODE is the code: 314 | 315 | FILENAME 316 | 317 | CODE 318 | You will start with the "entrypoint" file, then go to the ones that are imported by that file, and so on. Please note that the code should be fully functional. No placeholders. 319 | 320 | Follow a language and framework appropriate best practice file naming convention. Make sure that files contain all imports, types etc. Make sure that code in different files are compatible with each other. Ensure to implement all code, if you are unsure, write a plausible implementation. Include module dependency or package manager dependency definition file.
Before you finish, double check that all parts of the architecture is present in the files. 321 | 322 | Useful to know: You almost always put different classes in different files. For Python, you always create an appropriate requirements.txt file. For NodeJS, you always create an appropriate package.json file. You always add a comment briefly describing the purpose of the function definition. You try to add comments explaining very complex bits of logic. You always follow the best practices for the requested languages in terms of describing the code written as a defined package/project. 323 | 324 | Python toolbelt preferences: 325 | 326 | pytest 327 | dataclasses 328 | Conversation samples: 329 | 330 | [ 331 | { 332 | "role": "system", 333 | "content": "You will get instructions for code to write.\nYou will write a very long answer. Make sure that every detail of the architecture is, in the end, implemented as code.\nMake sure that every detail of the architecture is, in the end, implemented as code.\n\nThink step by step and reason yourself to the right decisions to make sure we get it right.\nYou will first lay out the names of the core classes, functions, methods that will be necessary, as well as a quick comment on their purpose.\n\nThen you will output the content of each file including ALL code.\nEach file must strictly follow a markdown code block format, where the following tokens must be replaced such that\nFILENAME is the lowercase file name including the file extension,\nLANG is the markup code block language for the code's language, and CODE is the code:\n\nFILENAME\n\`\`\`LANG\nCODE\n\`\`\`\n\nYou will start with the \"entrypoint\" file, then go to the ones that are imported by that file, and so on.\nPlease note that the code should be fully functional. No placeholders.\n\nFollow a language and framework appropriate best practice file naming convention.\nMake sure that files contain all imports, types etc. Make sure that code in different files are compatible with each other.\nEnsure to implement all code, if you are unsure, write a plausible implementation.\nInclude module dependency or package manager dependency definition file.\nBefore you finish, double check that all parts of the architecture is present in the files.\n\nUseful to know:\nYou almost always put different classes in different files.\nFor Python, you always create an appropriate requirements.txt file.\nFor NodeJS, you always create an appropriate package.json file.\nYou always add a comment briefly describing the purpose of the function definition.\nYou try to add comments explaining very complex bits of logic.\nYou always follow the best practices for the requested languages in terms of describing the code written as a defined\npackage/project.\n\n\nPython toolbelt preferences:\n- pytest\n- dataclasses\n" 334 | }, 335 | # … same conversation as earlier, ended with "Make your own assumptions and state them explicitly before starting". 336 | { 337 | "role": "assistant", 338 | "content": "Assumptions:\n1. Model: The model will contain the game's data, such as level information, character states, and enemy positions.\n2. View: The view will handle the game's visuals, including rendering the game objects, backgrounds, and updating the display.\n3. Controller: The controller will manage user input, such as keyboard controls, and update the model accordingly.\n\nWith these assumptions in mind, is there any specific aspect of the keyboard control implementation you would like me to clarify?" 
339 | }, 340 | { 341 | "role": "user", 342 | "content": "Please now remember the steps:\n\nThink step by step and reason yourself to the right decisions to make sure we get it right.\nFirst lay out the names of the core classes, functions, methods that will be necessary, As well as a quick comment on their purpose.\n\nThen you will output the content of each file including ALL code.\nEach file must strictly follow a markdown code block format, where the following tokens must be replaced such that\nFILENAME is the lowercase file name including the file extension,\nLANG is the markup code block language for the code's language, and CODE is the code:\n\nFILENAME\n\`\`\`LANG\nCODE\n\`\`\`\n\nPlease note that the code should be fully functional. No placeholders.\n\nYou will start with the \"entrypoint\" file, then go to the ones that are imported by that file, and so on.\nFollow a language and framework appropriate best practice file naming convention.\nMake sure that files contain all imports, types etc. The code should be fully functional. Make sure that code in different files are compatible with each other.\nBefore you finish, double check that all parts of the architecture is present in the files.\n" 343 | } 344 | ] 345 | Challenges 346 | After going through key ideas and demos of building LLM-centered agents, I start to see a couple of common limitations: 347 | 348 | Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representational power is not as strong as that of full attention. 349 | 350 | Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust than humans, who learn from trial and error. 351 | 352 | Reliability of natural language interface: Current agent systems rely on natural language as an interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refusing to follow an instruction).
Consequently, much of the agent demo code focuses on parsing model output.`; 353 | -------------------------------------------------------------------------------- /wrangler.toml: -------------------------------------------------------------------------------- 1 | name = "langchain-cloudflare-nuxt-template" 2 | main = "./.output/server/index.mjs" 3 | compatibility_date = "2024-01-10" 4 | compatibility_flags = [ "nodejs_compat" ] 5 | 6 | [[vectorize]] 7 | binding = "AI_KNOWLEDGE_VECTORIZE_INDEX" # available within your Worker on env.AI_KNOWLEDGE_VECTORIZE_INDEX 8 | index_name = "langchain_ai_docs_index" 9 | 10 | [[vectorize]] 11 | binding = "CLOUDFLARE_KNOWLEDGE_VECTORIZE_INDEX" # available within your Worker on env.CLOUDFLARE_KNOWLEDGE_VECTORIZE_INDEX 12 | index_name = "langchain_cloudflare_docs_index" 13 | 14 | [ai] 15 | binding = "AI" 16 | 17 | [vars] 18 | LANGCHAIN_TRACING_V2 = "true" 19 | LANGCHAIN_SESSION = "cloudflare-nuxt" --------------------------------------------------------------------------------
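As a quick illustration of how the Vectorize and WorkersAI bindings declared in `wrangler.toml` above might be consumed from a Nuxt server route, here is a minimal, hypothetical sketch using the `CloudflareVectorizeStore` and `CloudflareWorkersAIEmbeddings` classes from LangChain.js's `@langchain/cloudflare` package. The binding names (`AI`, `AI_KNOWLEDGE_VECTORIZE_INDEX`, `CLOUDFLARE_KNOWLEDGE_VECTORIZE_INDEX`) and the embeddings model match the config above, but the route path, the `event.context.cloudflare.env` access pattern, and the exact option names are assumptions for illustration and may differ from the actual implementation in `server/api/chat.post.ts` and `utils/conversational_retrieval_chain.ts`.

```ts
// server/api/similarity-search.post.ts (hypothetical route, not part of the template)
import {
  CloudflareVectorizeStore,
  CloudflareWorkersAIEmbeddings,
} from "@langchain/cloudflare";

export default defineEventHandler(async (event) => {
  const { query, knowledgeBase } = await readBody<{
    query: string;
    knowledgeBase: "ai" | "cloudflare";
  }>(event);

  // Assumption: with the Cloudflare preset, Nitro exposes Worker bindings on the event context.
  const env = event.context.cloudflare.env;

  // Same embeddings model the Vectorize indexes were created with.
  const embeddings = new CloudflareWorkersAIEmbeddings({
    binding: env.AI,
    model: "@cf/baai/bge-base-en-v1.5",
  });

  // Pick the Vectorize binding that matches the requested knowledge base,
  // mirroring the routing idea described in the README.
  const index =
    knowledgeBase === "ai"
      ? env.AI_KNOWLEDGE_VECTORIZE_INDEX
      : env.CLOUDFLARE_KNOWLEDGE_VECTORIZE_INDEX;

  const vectorstore = new CloudflareVectorizeStore(embeddings, { index });

  // Return the page content of the top 4 most similar chunks for the query.
  const docs = await vectorstore.similaritySearch(query, 4);
  return docs.map((doc) => doc.pageContent);
});
```

A route like this could be exercised locally with `npm run dev` and a `POST` body such as `{"query": "What are LLM agents?", "knowledgeBase": "ai"}`; the real chat route composes the same pieces into a full conversational retrieval chain rather than returning raw chunks.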