├── .gitignore ├── LICENSE ├── README.md ├── __init__.py ├── assets ├── ServerlessRAG.png └── full-architecture.png ├── data-pipeline ├── .gitignore ├── README.md ├── docs │ └── bedrock-docs-2012.10.20.pdf ├── ingest.py ├── requirements.txt └── run.sh ├── events ├── event-question.json └── event-warmup.json ├── example-pattern.json ├── ragfunction ├── .gitignore ├── Dockerfile ├── __init__.py ├── app.py └── requirements.txt ├── template.yaml └── test.sh /.gitignore: -------------------------------------------------------------------------------- 1 | samconfig.toml -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Giuseppe 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Serverless Retrieval Augmented Generation (RAG) with Amazon Bedrock, Lambda, and S3 2 | 3 | This pattern demonstrates an implementation of Retrieval Augmented Generation using Amazon Bedrock, AWS Lambda, and a LanceDB embedding store backed by Amazon S3. 4 | 5 | Learn more about this pattern at Serverless Land Patterns [here](https://github.com/aws-samples/serverless-patterns/tree/main/apigw-lambda-bedrock-s3-rag) 6 | 7 | Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example. 8 | 9 | ## Requirements 10 | 11 | * [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources. 12 | * [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured 13 | * [Git installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) 14 | * [AWS Serverless Application Model](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) (AWS SAM) installed 15 | 16 | ## Deployment Instructions 17 | 18 | 1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository: 19 | ``` 20 | git clone https://github.com/giusedroid/serverless-rag-python.git 21 | ``` 22 | 1. Change directory to the pattern directory: 23 | ``` 24 | cd serverless-rag-python 25 | ``` 26 | 1.
From the command line, use AWS SAM to deploy the AWS resources for the pattern as specified in the template.yaml file: 27 | ``` 28 | sam build 29 | sam deploy --guided 30 | ``` 31 | 1. During the prompts: 32 | * Enter a stack name 33 | * Enter the desired AWS Region 34 | * Allow SAM CLI to create IAM roles with the required permissions. 35 | 36 | Once you have run `sam deploy --guided` once and saved your arguments to a configuration file (samconfig.toml), you can use `sam deploy` in the future to reuse these defaults. 37 | 38 | 1. Note your stack name and outputs from the SAM deployment process. These contain the resource names and/or ARNs which are used for testing. 39 | 40 | ## How it works 41 | ![high level diagram](./assets/ServerlessRAG.png) 42 | 43 | With this pattern we want to showcase how to implement a serverless Retrieval Augmented Generation (RAG) architecture. 44 | Customers asked for a way to quickly test RAG capabilities on a small number of documents without managing infrastructure for contextual knowledge and non-parametric memory. 45 | In this pattern, we run a RAG workflow in a single Lambda function, so that customers only pay for the infrastructure they use, when they use it. 46 | We use [LanceDB](https://lancedb.com/) with Amazon S3 as the backend for embedding storage. 47 | This pattern deploys one Lambda function behind an API Gateway and an S3 bucket to store your embeddings. 48 | This pattern uses Amazon Bedrock to calculate embeddings with Amazon Titan Embeddings and uses Claude as the prediction LLM. 49 | We also provide a local pipeline to ingest your PDFs and upload them to S3.
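At query time, the Lambda function embeds the incoming question, retrieves the most similar document chunks from the LanceDB table on S3, and passes them as context to Claude. The retrieval step can be illustrated with a toy, dependency-free sketch; the vectors, texts, and the `retrieve` helper below are made up for illustration, and in this pattern LanceDB and Titan Embeddings do the real work:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(question_vector, store, top_k=1):
    # Rank stored (vector, text) pairs by similarity to the question vector
    ranked = sorted(store, key=lambda pair: cosine_similarity(question_vector, pair[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

# Toy 3-dimensional "embeddings"; Titan Embeddings actually produces 1536-dimensional vectors
store = [
    ([1.0, 0.0, 0.0], "Bedrock pricing is based on input and output tokens."),
    ([0.0, 1.0, 0.0], "Lambda functions can run container images."),
]
print(retrieve([0.9, 0.1, 0.0], store))  # the pricing chunk is the closest match
```

The retrieved chunks are then interpolated into the prompt template in `ragfunction/app.py` before the model is invoked.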
50 | 51 | ![Full architecture](./assets/full-architecture.png) 52 | 53 | ## Ingest Documents 54 | Once your stack has been deployed, you can ingest PDF documents by following the instructions in [`./data-pipeline/README.md`](./data-pipeline/README.md) 55 | 56 | ## Testing 57 | 58 | ```bash 59 | ./test.sh <your-stack-name> # find your stack name in ./samconfig.toml 60 | ``` 61 | 62 | ### Sample Output 63 | ```bash 64 | Starting warmup at 2023-11-01 16:20:38.662 65 | Warmup ended at 2023-11-01 16:20:41.156 66 | Time to Lambda timeout: 28999 ms 67 | 68 | Testing prediction endpoint at 2023-11-01 16:20:41.183 69 | Prediction ended at 2023-11-01 16:20:59.934 70 | 71 | {"message": " Based on the provided context, the relevant information about Amazon Bedrock pricing is:\n\nWith Amazon Bedrock, you pay to run inference on any of the third-party foundation models. Pricing is based on the volume of input tokens and output tokens, and on whether you have purchased provisioned throughput for the model. For more inform tion, see the Model providers page in the Amazon Bedrock console. For each model, pricing is listed following the model version. For more information about purchasing provisioned throughput, see Provisioned throughput (p. 55).\n\nFor more information, see Amazon Bedrock Pricing.\n\nSo in summary, the Amazon Bedrock pricing model is based on the number of tokens processed during inference and whether provisioned throughput is purchased. The pricing details for each model are listed in the console."} 72 | ``` 73 | 74 | ## Known Limitations 75 | This implementation is subject to the hard 29-second integration timeout of Amazon API Gateway. We have measured the probability of failure because of `endpoint timeout` errors to be around 5% (n=700 observations). 76 | 77 | ## Cleanup 78 | 79 | 1. Delete the stack 80 | ```bash 81 | sam delete 82 | ``` 83 | 84 | ## Authors 85 | ### Giuseppe Battista 86 | Giuseppe Battista is a Senior Solutions Architect at Amazon Web Services.
He leads solutions architecture for Early Stage Startups in the UK and Ireland. He hosts the Twitch Show "Let's Build a Startup" on twitch.tv/aws and he's head of Unicorn's Den accelerator. 87 | 88 | [LinkedIn](https://www.linkedin.com/in/giusedroid/) 89 | [GitHub](https://github.com/giusedroid) 90 | [Buy me a Pint](https://monzo.me/giusebattista?amount=7) 91 | 92 | ### Kevin Shaffer-Morrison 93 | Kevin Shaffer-Morrison is a Senior Solutions Architect at Amazon Web Services. He's helped hundreds of startups get off the ground quickly and up into the cloud. Kevin focuses on helping the earliest-stage founders with code samples and Twitch live streams on twitch.tv/aws. 94 | 95 | [LinkedIn](https://www.linkedin.com/in/kshaffermorrison) 96 | [GitHub](https://github.com/shafkevi) 97 | 98 | ---- 99 | 100 | SPDX-License-Identifier: MIT-0 101 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giusedroid/serverless-rag-python/b6d2caa2c7a473f38894b2f53d7ed3a73d08a881/__init__.py -------------------------------------------------------------------------------- /assets/ServerlessRAG.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giusedroid/serverless-rag-python/b6d2caa2c7a473f38894b2f53d7ed3a73d08a881/assets/ServerlessRAG.png -------------------------------------------------------------------------------- /assets/full-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giusedroid/serverless-rag-python/b6d2caa2c7a473f38894b2f53d7ed3a73d08a881/assets/full-architecture.png -------------------------------------------------------------------------------- /data-pipeline/.gitignore: -------------------------------------------------------------------------------- 1 |
embeddings/ -------------------------------------------------------------------------------- /data-pipeline/README.md: -------------------------------------------------------------------------------- 1 | # Serverless RAG with Lambda, S3, and LanceDB - Data Ingestion 2 | 3 | This data ingestion pipeline allows you to create embeddings from your PDF 4 | documents and make them available to LanceDB in your Lambda function. 5 | 6 | ## Prerequisites 7 | 8 | As you'll be running this locally, make sure your AWS CLI is configured with 9 | permissions to PUT files on S3 and invoke models on Amazon Bedrock. 10 | 11 | ## Usage 12 | 13 | 1. Make sure you have deployed the stack to your AWS account first. More info in 14 | the [`README`](../README.md) in the root of this repository 15 | 1. `pip install -r requirements.txt` 16 | 1. Copy all the `.pdf` documents you want to ingest to `./docs`. Make sure they 17 | all have `.pdf` extensions. In this example, we'll only ingest PDF documents. 18 | 1. Get your stack name from `../samconfig.toml` and pass it to the script: 19 | 20 | 21 | ```bash 22 | ./run.sh <your-stack-name> 23 | ``` 24 | 25 | ## Notes 26 | 27 | Once you've run the script, you can find your embeddings in `./embeddings`. 28 | These embeddings are produced with 29 | [Amazon's Titan Embeddings model](https://aws.amazon.com/bedrock/titan/#Titan_Embeddings_.28generally_available.29).
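Before embedding, `ingest.py` splits each document into 1,000-character chunks with a 200-character overlap. A simplified, standard-library-only sketch of that fixed-size chunking — a stand-in for langchain's `CharacterTextSplitter`, which additionally splits on separators rather than at hard character offsets:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Split text into fixed-size chunks; consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghij" * 220                     # 2,200 characters
chunks = chunk_text(sample)
print(len(chunks))                              # 3 chunks: 0-1000, 800-1800, 1600-2200
print(chunks[0][-200:] == chunks[1][:200])      # True: neighbouring chunks share 200 characters
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, which tends to improve retrieval quality.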
-------------------------------------------------------------------------------- /data-pipeline/docs/bedrock-docs-2012.10.20.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giusedroid/serverless-rag-python/b6d2caa2c7a473f38894b2f53d7ed3a73d08a881/data-pipeline/docs/bedrock-docs-2012.10.20.pdf -------------------------------------------------------------------------------- /data-pipeline/ingest.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from langchain.text_splitter import CharacterTextSplitter 4 | from langchain.vectorstores import LanceDB 5 | from langchain.embeddings import BedrockEmbeddings 6 | from langchain.document_loaders import PyPDFDirectoryLoader 7 | 8 | import lancedb as ldb 9 | import pyarrow as pa 10 | 11 | embeddings = BedrockEmbeddings( 12 | region_name="us-west-2" 13 | ) 14 | 15 | # we split the data into chunks of 1,000 characters, with an overlap 16 | # of 200 characters between the chunks, which helps to give better results 17 | # and contain the context of the information between chunks 18 | text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200) 19 | 20 | db = ldb.connect('/tmp/embeddings') 21 | 22 | schema = pa.schema( 23 | [ 24 | pa.field("vector", pa.list_(pa.float32(), 1536)), # document vector with 1.5k dimensions (TitanEmbedding) 25 | pa.field("text", pa.string()), # langchain requires it 26 | pa.field("id", pa.string()) # langchain requires it 27 | ]) 28 | 29 | tbl = db.create_table("doc_table", schema=schema) 30 | 31 | # load the document as before 32 | 33 | loader = PyPDFDirectoryLoader("./docs/") 34 | 35 | docs = loader.load() 36 | docs = text_splitter.split_documents(docs) 37 | 38 | LanceDB.from_documents(docs, embeddings, connection=tbl) 39 | -------------------------------------------------------------------------------- /data-pipeline/requirements.txt: 
-------------------------------------------------------------------------------- 1 | aiohttp==3.8.6 2 | aiosignal==1.3.1 3 | annotated-types==0.6.0 4 | anyio==3.7.1 5 | asgiref==3.7.2 6 | astroid==2.15.5 7 | asttokens==2.2.1 8 | async-timeout==4.0.3 9 | attrs==23.1.0 10 | Automat==20.2.0 11 | Babel==2.8.0 12 | backcall==0.2.0 13 | bcrypt==3.2.0 14 | blinker==1.4 15 | boto==2.49.0 16 | boto3==1.28.73 17 | botocore==1.31.73 18 | cachetools==5.3.2 19 | certifi==2020.6.20 20 | chardet==4.0.0 21 | charset-normalizer==3.3.1 22 | click==8.1.7 23 | cloud-init==23.2.2 24 | colorama==0.4.4 25 | command-not-found==0.3 26 | configobj==5.0.6 27 | constantly==15.1.0 28 | cryptography==3.4.8 29 | dataclasses-json==0.6.1 30 | dbus-python==1.2.18 31 | decorator==5.1.1 32 | deprecation==2.1.0 33 | dill==0.3.7 34 | distro==1.7.0 35 | distro-info==1.1+ubuntu0.1 36 | Django==4.2.2 37 | ec2-hibinit-agent==1.0.0 38 | exceptiongroup==1.1.3 39 | executing==1.2.0 40 | frozenlist==1.4.0 41 | git-remote-codecommit==1.16 42 | greenlet==3.0.1 43 | hibagent==1.0.1 44 | httplib2==0.20.2 45 | hyperlink==21.0.0 46 | idna==3.3 47 | ikp3db @ https://dhj20r2nmszcd.cloudfront.net/static/ikp3db-1.4.1.post1.tar.gz 48 | importlib-metadata==4.6.4 49 | incremental==21.3.0 50 | ipython==8.14.0 51 | isort==5.12.0 52 | jedi==0.18.2 53 | jeepney==0.7.1 54 | Jinja2==3.0.3 55 | jmespath==1.0.1 56 | jsonpatch==1.33 57 | jsonpointer==2.0 58 | jsonschema==3.2.0 59 | keyring==23.5.0 60 | lancedb==0.3.2 61 | langchain==0.0.325 62 | langsmith==0.0.53 63 | launchpadlib==1.10.16 64 | lazr.restfulclient==0.14.4 65 | lazr.uri==1.0.6 66 | lazy-object-proxy==1.9.0 67 | MarkupSafe==2.0.1 68 | marshmallow==3.20.1 69 | matplotlib-inline==0.1.6 70 | mccabe==0.7.0 71 | more-itertools==8.10.0 72 | multidict==6.0.4 73 | mypy-extensions==1.0.0 74 | netifaces==0.11.0 75 | numpy==1.26.1 76 | oauthlib==3.2.0 77 | packaging==23.2 78 | pandas==2.1.2 79 | parso==0.8.3 80 | pexpect==4.8.0 81 | pickleshare==0.7.5 82 | platformdirs==3.10.0 
83 | prompt-toolkit==3.0.39 84 | ptyprocess==0.7.0 85 | pure-eval==0.2.2 86 | py==1.11.0 87 | pyarrow==13.0.0 88 | pyasn1==0.4.8 89 | pyasn1-modules==0.2.1 90 | pydantic==2.4.2 91 | pydantic_core==2.10.1 92 | Pygments==2.16.1 93 | PyGObject==3.42.1 94 | PyHamcrest==2.0.2 95 | PyJWT==2.3.0 96 | pylance==0.8.7 97 | pylint==2.17.4 98 | pylint-django==2.5.3 99 | pylint-flask==0.6 100 | pylint-plugin-utils==0.8.2 101 | pyOpenSSL==21.0.0 102 | pyparsing==2.4.7 103 | pypdf==3.17.0 104 | pyrsistent==0.18.1 105 | pyserial==3.5 106 | python-apt==2.4.0+ubuntu2 107 | python-dateutil==2.8.2 108 | python-debian==0.1.43+ubuntu1.1 109 | python-magic==0.4.24 110 | pytz==2022.1 111 | PyYAML==6.0.1 112 | ratelimiter==1.2.0.post0 113 | requests==2.31.0 114 | retry==0.9.2 115 | s3transfer==0.7.0 116 | SecretStorage==3.3.1 117 | semver==3.0.2 118 | service-identity==18.1.0 119 | six==1.16.0 120 | sniffio==1.3.0 121 | sos==4.5.6 122 | SQLAlchemy==2.0.22 123 | sqlparse==0.4.4 124 | ssh-import-id==5.11 125 | stack-data==0.6.2 126 | systemd-python==234 127 | tenacity==8.2.3 128 | tomli==2.0.1 129 | tomlkit==0.12.1 130 | tqdm==4.66.1 131 | traitlets==5.9.0 132 | Twisted==22.1.0 133 | typing-inspect==0.9.0 134 | typing_extensions==4.7.1 135 | tzdata==2023.3 136 | ubuntu-advantage-tools==8001 137 | ufw==0.36.1 138 | unattended-upgrades==0.1 139 | urllib3==1.26.5 140 | wadllib==1.3.6 141 | wcwidth==0.2.6 142 | wrapt==1.15.0 143 | yarl==1.9.2 144 | zipp==1.0.0 145 | zope.interface==5.4.0 146 | -------------------------------------------------------------------------------- /data-pipeline/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Function to display help message 4 | usage() { 5 | echo "Usage: $0 <stack-name>" 6 | echo "Please provide your stack name.
You can find it in your samconfig.toml" 7 | echo "Example: $0 my-stack-name" 8 | } 9 | 10 | # Check if at least one argument is provided 11 | if [ "$#" -ne 1 ]; then 12 | usage 13 | exit 1 14 | fi 15 | 16 | # Define the directory and file extension for documents 17 | directory="./docs" 18 | file_extension="pdf" # Change this to your desired file extension 19 | 20 | # Function to display error message 21 | error_message() { 22 | echo "Error: No valid documents found in $directory. Make sure your document has a .pdf extension" 23 | } 24 | 25 | 26 | # Find and print all documents in the directory 27 | documents=$(find "$directory" -type f -name "*.$file_extension") 28 | if [ -z "$documents" ]; then 29 | error_message 30 | exit 1 31 | else 32 | echo "Importing documents into LanceDB:" 33 | echo "$documents" 34 | fi 35 | 36 | rm -rf /tmp/embeddings 37 | python3 ingest.py 38 | 39 | 40 | STACK_NAME=$1 41 | BUCKET_NAME=$(aws cloudformation describe-stacks --stack-name $STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`DocumentBucketName`].OutputValue' --output text) 42 | 43 | 44 | 45 | cp -r /tmp/embeddings ./ 46 | 47 | echo "Exporting embeddings to s3://${BUCKET_NAME}" 48 | aws s3 sync ./embeddings s3://${BUCKET_NAME} -------------------------------------------------------------------------------- /events/event-question.json: -------------------------------------------------------------------------------- 1 | { 2 | "queryStringParameters": { 3 | "question": "what is the pricing model of Amazon Bedrock?" 
4 | } 5 | } 6 | -------------------------------------------------------------------------------- /events/event-warmup.json: -------------------------------------------------------------------------------- 1 | { 2 | "queryStringParameters": { 3 | "warmup": "true" 4 | } 5 | } 6 | -------------------------------------------------------------------------------- /example-pattern.json: -------------------------------------------------------------------------------- 1 | { 2 | "title": "Serverless Retrieval Augmented Generation (RAG) with Bedrock, Lambda, S3", 3 | "description": "With this pattern we want to showcase how to implement a serverless RAG architecture.", 4 | "language": "Python", 5 | "level": "400", 6 | "framework": "SAM", 7 | "introBox": { 8 | "headline": "How it works", 9 | "text": [ 10 | "With this pattern we want to showcase how to implement a serverless RAG architecture.", 11 | "Customers asked for a way to quickly test RAG capabilities on a small number of documents without managing infrastructure for contextual knowledge and non-parametric memory.", 12 | "In this pattern, we run a RAG workflow in a single Lambda function, so that customers only pay for the infrastructure they use, when they use it. We use LanceDB with S3 as backend for embedding storage.", 13 | "This pattern deploys one Lambda function behind an API Gateway and an S3 Bucket where to store your embeddings.", 14 | "This pattern makes use of Bedrock to calculate embeddings with Amazon Titan Embedding and Claude as prediction LLM", 15 | "We also provide a local pipeline to ingest your PDFs and upload them to S3." 
16 | ] 17 | }, 18 | "gitHub": { 19 | "template": { 20 | "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/apigw-lambda-bedrock-s3-rag", 21 | "templateURL": "serverless-patterns/apigw-lambda-bedrock-s3-rag", 22 | "projectFolder": "apigw-lambda-bedrock-s3-rag", 23 | "templateFile": "template.yaml" 24 | } 25 | }, 26 | "resources": { 27 | "bullets": [ 28 | { 29 | "text": "Amazon Bedrock", 30 | "link": "https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html" 31 | }, 32 | { 33 | "text": "Retrieval Augmented Generation (RAG) - original research paper", 34 | "link": "https://arxiv.org/abs/2005.11401" 35 | }, 36 | { 37 | "text": "Amazon Titan Embedding", 38 | "link": "https://aws.amazon.com/bedrock/titan/#Titan_Embeddings_.28generally_available.29" 39 | } 40 | ] 41 | }, 42 | "deploy": { 43 | "text": [ 44 | "sam build", 45 | "sam deploy -g" 46 | ] 47 | }, 48 | "testing": { 49 | "text": [ 50 | "See the GitHub repo for detailed testing instructions." 51 | ] 52 | }, 53 | "cleanup": { 54 | "text": [ 55 | "sam delete" 56 | ] 57 | }, 58 | "authors": [ 59 | { 60 | "name": "Kevin Shaffer-Morrison", 61 | "image": "https://kevin.shaffer-morrison.com/images/sideProfileHeadshot.jpg", 62 | "bio": "Kevin Shaffer-Morrison is a Senior Solutions Architect at Amazon Web Services. He's helped hundreds of startups get off the ground quickly and up into the cloud. Kevin focuses on helping the earliest-stage founders with code samples and Twitch live streams on twitch.tv/aws.", 63 | "linkedin": "kshaffermorrison" 64 | }, 65 | { 66 | "name": "Giuseppe Battista", 67 | "image": "https://media.licdn.com/dms/image/D4E03AQGOz6p4p-rSfg/profile-displayphoto-shrink_800_800/0/1686238097666?e=1703721600&v=beta&t=bhKqpehfON17sLUNDn-yFUKJhohuWljVKNCarDnsdYA", 68 | "bio": "Giuseppe Battista is a Senior Solutions Architect at Amazon Web Services. He leads solutions architecture for Early Stage Startups in the UK and Ireland.
He hosts the Twitch Show \"Let's Build a Startup\" on twitch.tv/aws and he's head of Unicorn's Den accelerator.", 69 | "linkedin": "giusedroid", 70 | "twitter": "giusedroid" 71 | } 72 | ] 73 | } 74 | -------------------------------------------------------------------------------- /ragfunction/.gitignore: -------------------------------------------------------------------------------- 1 | .aws-sam/ -------------------------------------------------------------------------------- /ragfunction/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM public.ecr.aws/lambda/python:3.10 2 | 3 | COPY app.py requirements.txt ./ 4 | 5 | RUN python3.10 -m pip install -r requirements.txt -t . 6 | 7 | # Command can be overwritten by providing a different command in the template directly. 8 | CMD ["app.lambda_handler"] 9 | -------------------------------------------------------------------------------- /ragfunction/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giusedroid/serverless-rag-python/b6d2caa2c7a473f38894b2f53d7ed3a73d08a881/ragfunction/__init__.py -------------------------------------------------------------------------------- /ragfunction/app.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | 4 | from langchain.prompts import ChatPromptTemplate 5 | from langchain.chat_models import BedrockChat 6 | from langchain.embeddings import BedrockEmbeddings 7 | from langchain.schema.output_parser import StrOutputParser 8 | from langchain.schema.runnable import RunnablePassthrough 9 | from langchain.vectorstores import LanceDB 10 | 11 | import lancedb as ldb 12 | 13 | embeddings = BedrockEmbeddings( 14 | region_name="us-west-2" 15 | ) 16 | 17 | # Retrieve data from S3 and load it into local LanceDB 18 | s3_bucket = os.environ.get('s3BucketName') 19 | db = ldb.connect(f"s3://{s3_bucket}/") 
20 | tbl = db.open_table('doc_table') 21 | 22 | # Initialize LanceDB instance within langchain 23 | vectorstore = LanceDB(connection=tbl, embedding=embeddings) 24 | retriever = vectorstore.as_retriever() 25 | 26 | template = """Answer the question based only on the following context: 27 | {context} 28 | 29 | Question: {question} 30 | """ 31 | 32 | # Create langchain prompt and initialize the Bedrock model 33 | prompt = ChatPromptTemplate.from_template(template) 34 | model = BedrockChat(model_id="anthropic.claude-v2", model_kwargs={"temperature": 0.1}) 35 | 36 | # Chain together the LanceDB retriever, our prompt, the LLM, and an output parser that stringifies the result 37 | chain = ( 38 | {"context": retriever, "question": RunnablePassthrough()} 39 | | prompt 40 | | model 41 | | StrOutputParser() 42 | ) 43 | 44 | def lambda_handler(event, context): 45 | 46 | # The query string may be absent entirely, so default to an empty dict 47 | params = event.get('queryStringParameters') or {} 48 | 49 | if params.get('warmup') is not None: 50 | return { 51 | "statusCode": 202, 52 | "body": json.dumps({ 53 | "message": "warming up", 54 | "toTimeout": context.get_remaining_time_in_millis() 55 | }) 56 | } 57 | 58 | question = params.get('question') 59 | 60 | if question is not None: 61 | results = chain.invoke(question) 62 | print(results) 63 | 64 | return { 65 | "statusCode": 200, 66 | "body": json.dumps({ 67 | "message": results, 68 | }), 69 | } 70 | 71 | return { 72 | "statusCode": 400, 73 | "body": json.dumps({ 74 | "message": "provide either a question or a warmup directive" 75 | }) 76 | } 77 | 78 | 79 | -------------------------------------------------------------------------------- /ragfunction/requirements.txt: -------------------------------------------------------------------------------- 1 | lancedb==0.3.2 2 | langchain==0.0.325 3 | boto3==1.28.73 4 | pandas==2.1.2 -------------------------------------------------------------------------------- /template.yaml: -------------------------------------------------------------------------------- 1 |
AWSTemplateFormatVersion: '2010-09-09' 2 | Transform: AWS::Serverless-2016-10-31 3 | Description: > 4 | python3.10 5 | 6 | Sample SAM Template for RAG in a Lambda 7 | 8 | # More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst 9 | Globals: 10 | Function: 11 | Timeout: 29 12 | 13 | Resources: 14 | 15 | DocumentBucket: 16 | Type: 'AWS::S3::Bucket' 17 | Properties: 18 | BucketName: !Join [ "-", [!Ref AWS::StackName, document-bucket ] ] 19 | 20 | 21 | RAGFunction: 22 | Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction 23 | Properties: 24 | PackageType: Image 25 | MemorySize: 4048 26 | Architectures: 27 | - x86_64 28 | Events: 29 | HelloWorld: 30 | Type: Api # More info about API Event Source: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#api 31 | Properties: 32 | Path: /rag 33 | Method: get 34 | RequestParameters: 35 | - method.request.querystring.question 36 | - method.request.querystring.warmup 37 | Environment: 38 | Variables: 39 | s3BucketName: !Ref DocumentBucket 40 | Policies: 41 | - Statement: 42 | - Effect: Allow 43 | Action: 'bedrock:InvokeModel' 44 | Resource: '*' 45 | - Effect: Allow 46 | Action: s3:GetObject 47 | Resource: !Sub arn:aws:s3:::${DocumentBucket}/* 48 | - Effect: Allow 49 | Action: s3:ListBucket 50 | Resource: !Sub arn:aws:s3:::${DocumentBucket} 51 | - Effect: Allow 52 | Action: s3:ListAllMyBuckets 53 | Resource: "*" 54 | Metadata: 55 | Dockerfile: Dockerfile 56 | DockerContext: ./ragfunction 57 | DockerTag: python3.10-v1 58 | 59 | Outputs: 60 | # ServerlessRestApi is an implicit API created out of Events key under Serverless::Function 61 | # Find out more about other implicit resources you can reference within SAM 62 | # 
https://github.com/awslabs/serverless-application-model/blob/master/docs/internals/generated_resources.rst#api 63 | RAGApi: 64 | Description: "API Gateway endpoint URL for the Prod stage of the RAG function" 65 | Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/rag/" 66 | RAGFunction: 67 | Description: "RAG Lambda Function ARN" 68 | Value: !GetAtt RAGFunction.Arn 69 | RAGFunctionIamRole: 70 | Description: "Implicit IAM Role created for the RAG function" 71 | Value: !GetAtt RAGFunctionRole.Arn 72 | DocumentBucketName: 73 | Description: "S3 bucket where LanceDB sources embeddings. Check this repository's README for instructions on how to import your documents" 74 | Value: !Ref DocumentBucket 75 | -------------------------------------------------------------------------------- /test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Function to display help message 4 | usage() { 5 | echo "Usage: $0 <stack-name>" 6 | echo "Please provide your stack name.
You can find it in your samconfig.toml" 7 | echo "Example: $0 my-stack-name" 8 | } 9 | 10 | # Check if at least one argument is provided 11 | if [ "$#" -ne 1 ]; then 12 | usage 13 | exit 1 14 | fi 15 | 16 | STACK_NAME=$1 17 | API_URL=$(aws cloudformation describe-stacks --stack-name $STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`RAGApi`].OutputValue' --output text) 18 | 19 | WARMUP_URL="$API_URL?warmup=true" 20 | PROMPT_URL="$API_URL?question=amazon+bedrock+pricing+model" 21 | 22 | WARMUP_FILE=warmup.txt 23 | BODY_FILE=response_body.json 24 | 25 | echo "Starting warmup at $(date +"%Y-%m-%d %T.%3N")" 26 | curl -s $WARMUP_URL -o $WARMUP_FILE 27 | echo "Warmup ended at $(date +"%Y-%m-%d %T.%3N")" 28 | echo "Time to Lambda timeout: $(cat $WARMUP_FILE | jq .toTimeout) ms" 29 | 30 | echo "Testing prediction endpoint at $(date +"%Y-%m-%d %T.%3N")" 31 | curl -s $PROMPT_URL -o $BODY_FILE 32 | echo "Prediction ended at $(date +"%Y-%m-%d %T.%3N")" 33 | 34 | cat $BODY_FILE 35 | 36 | rm $BODY_FILE 37 | rm $WARMUP_FILE 38 | --------------------------------------------------------------------------------