├── .gitignore ├── LICENSE ├── README.md ├── __init__.py ├── assets ├── ServerlessRAG.png └── full-architecture.png ├── data-pipeline ├── .gitignore ├── README.md ├── docs │ └── bedrock-docs-2012.10.20.pdf ├── ingest.py ├── requirements.txt └── run.sh ├── events ├── event-question.json └── event-warmup.json ├── example-pattern.json ├── ragfunction ├── .gitignore ├── Dockerfile ├── __init__.py ├── app.py └── requirements.txt ├── template.yaml └── test.sh /.gitignore: -------------------------------------------------------------------------------- 1 | samconfig.toml -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Giuseppe 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Serverless Retrieval Augmented Generation (RAG) with Amazon Bedrock, Lambda, and S3 2 | 3 | This pattern demonstrates an implementation of Retrieval Augmented Generation using Amazon Bedrock, AWS Lambda, and a LanceDB embedding store backed by Amazon S3. 4 | 5 | Learn more about this pattern at Serverless Land Patterns [here](https://github.com/aws-samples/serverless-patterns/tree/main/apigw-lambda-bedrock-s3-rag) 6 | 7 | Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example. 8 | 9 | ## Requirements 10 | 11 | * [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources. 12 | * [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured 13 | * [Git installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) 14 | * [AWS Serverless Application Model](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) (AWS SAM) installed 15 | 16 | ## Deployment Instructions 17 | 18 | 1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository: 19 | ``` 20 | git clone https://github.com/giusedroid/serverless-rag-python.git 21 | ``` 22 | 1. Change directory to the pattern directory: 23 | ``` 24 | cd serverless-rag-python 25 | ``` 26 | 1.
From the command line, use AWS SAM to deploy the AWS resources for the pattern as specified in the template.yaml file: 27 | ``` 28 | sam build 29 | sam deploy --guided 30 | ``` 31 | 1. During the prompts: 32 | * Enter a stack name 33 | * Enter the desired AWS Region 34 | * Allow SAM CLI to create IAM roles with the required permissions. 35 | 36 | Once you have run `sam deploy --guided` once and saved your arguments to a configuration file (samconfig.toml), you can use `sam deploy` in the future to reuse these defaults. 37 | 38 | 1. Note your stack name and outputs from the SAM deployment process. These contain the resource names and/or ARNs which are used for testing. 39 | 40 | ## How it works 41 | ![high level diagram](./assets/ServerlessRAG.png) 42 | 43 | With this pattern we want to showcase how to implement a serverless Retrieval Augmented Generation (RAG) architecture. 44 | Customers asked for a way to quickly test RAG capabilities on a small number of documents without managing infrastructure for contextual knowledge and non-parametric memory. 45 | In this pattern, we run a RAG workflow in a single Lambda function, so that customers only pay for the infrastructure they use, when they use it. 46 | We use [LanceDB](https://lancedb.com/) with Amazon S3 as the backend for embedding storage. 47 | This pattern deploys one Lambda function behind an API Gateway and an S3 bucket to store your embeddings. 48 | This pattern uses Amazon Bedrock to calculate embeddings with Amazon Titan Embeddings and uses Claude as the prediction LLM. 49 | We also provide a local pipeline to ingest your PDFs and upload them to S3.
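At query time, the Lambda function embeds the incoming question, retrieves the most similar document chunks from the LanceDB table on S3, and passes them as context to Claude. The retrieval step can be illustrated with a toy, dependency-free sketch; the vectors, texts, and the `retrieve` helper below are made up for illustration, and in this pattern LanceDB and Titan Embeddings do the real work:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(question_vector, store, top_k=1):
    # Rank stored (vector, text) pairs by similarity to the question vector
    ranked = sorted(store, key=lambda pair: cosine_similarity(question_vector, pair[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

# Toy 3-dimensional "embeddings"; Titan Embeddings actually produces 1536-dimensional vectors
store = [
    ([1.0, 0.0, 0.0], "Bedrock pricing is based on input and output tokens."),
    ([0.0, 1.0, 0.0], "Lambda functions can run container images."),
]
print(retrieve([0.9, 0.1, 0.0], store))  # the pricing chunk is the closest match
```

The retrieved chunks are then interpolated into the prompt template in `ragfunction/app.py` before the model is invoked.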
50 | 51 | ![Full architecture](./assets/full-architecture.png) 52 | 53 | ## Ingest Documents 54 | Once your stack has been deployed, you can ingest PDF documents by following the instructions in [`./data-pipeline/README.md`](./data-pipeline/README.md) 55 | 56 | ## Testing 57 | 58 | ```bash 59 | ./test.sh <your-stack-name> # find your stack name in ./samconfig.toml 60 | ``` 61 | 62 | ### Sample Output 63 | ```bash 64 | Starting warmup at 2023-11-01 16:20:38.662 65 | Warmup ended at 2023-11-01 16:20:41.156 66 | Time to Lambda timeout: 28999 ms 67 | 68 | Testing prediction endpoint at 2023-11-01 16:20:41.183 69 | Prediction ended at 2023-11-01 16:20:59.934 70 | 71 | {"message": " Based on the provided context, the relevant information about Amazon Bedrock pricing is:\n\nWith Amazon Bedrock, you pay to run inference on any of the third-party foundation models. Pricing is based on the volume of input tokens and output tokens, and on whether you have purchased provisioned throughput for the model. For more inform tion, see the Model providers page in the Amazon Bedrock console. For each model, pricing is listed following the model version. For more information about purchasing provisioned throughput, see Provisioned throughput (p. 55).\n\nFor more information, see Amazon Bedrock Pricing.\n\nSo in summary, the Amazon Bedrock pricing model is based on the number of tokens processed during inference and whether provisioned throughput is purchased. The pricing details for each model are listed in the console."} 72 | ``` 73 | 74 | ## Known Limitations 75 | This implementation is subject to the hard 29-second integration timeout of Amazon API Gateway. We have measured the probability of failure because of `endpoint timeout` errors to be around 5% (n=700 observations). 76 | 77 | ## Cleanup 78 | 79 | 1. Delete the stack 80 | ```bash 81 | sam delete 82 | ``` 83 | 84 | ## Authors 85 | ### Giuseppe Battista 86 | Giuseppe Battista is a Senior Solutions Architect at Amazon Web Services.
He leads solutions architecture for Early Stage Startups in the UK and Ireland. He hosts the Twitch Show "Let's Build a Startup" on twitch.tv/aws and he's head of Unicorn's Den accelerator. 87 | 88 | [LinkedIn](https://www.linkedin.com/in/giusedroid/) 89 | [GitHub](https://github.com/giusedroid) 90 | [Buy me a Pint](https://monzo.me/giusebattista?amount=7) 91 | 92 | ### Kevin Shaffer-Morrison 93 | Kevin Shaffer-Morrison is a Senior Solutions Architect at Amazon Web Services. He's helped hundreds of startups get off the ground quickly and up into the cloud. Kevin focuses on helping the earliest-stage founders with code samples and Twitch live streams on twitch.tv/aws. 94 | 95 | [LinkedIn](https://www.linkedin.com/in/kshaffermorrison) 96 | [GitHub](https://github.com/shafkevi) 97 | 98 | ---- 99 | 100 | SPDX-License-Identifier: MIT-0 101 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giusedroid/serverless-rag-python/b6d2caa2c7a473f38894b2f53d7ed3a73d08a881/__init__.py -------------------------------------------------------------------------------- /assets/ServerlessRAG.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giusedroid/serverless-rag-python/b6d2caa2c7a473f38894b2f53d7ed3a73d08a881/assets/ServerlessRAG.png -------------------------------------------------------------------------------- /assets/full-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giusedroid/serverless-rag-python/b6d2caa2c7a473f38894b2f53d7ed3a73d08a881/assets/full-architecture.png -------------------------------------------------------------------------------- /data-pipeline/.gitignore: -------------------------------------------------------------------------------- 1 |
embeddings/ -------------------------------------------------------------------------------- /data-pipeline/README.md: -------------------------------------------------------------------------------- 1 | # Serverless RAG with Lambda, S3, and LanceDB - Data Ingestion 2 | 3 | This data ingestion pipeline allows you to create embeddings from your PDF 4 | documents and make them available to LanceDB in your Lambda function. 5 | 6 | ## Prerequisites 7 | 8 | As you'll be running this locally, make sure your AWS CLI is configured with 9 | permissions to PUT files on S3 and invoke models on Amazon Bedrock. 10 | 11 | ## Usage 12 | 13 | 1. Make sure you have deployed the stack to your AWS account first. More info in 14 | the [`README`](../README.md) in the root of this repository 15 | 1. `pip install -r requirements.txt` 16 | 1. Copy all the `.pdf` documents you want to ingest to `./docs`. Make sure they 17 | all have `.pdf` extensions. In this example, we'll only ingest PDF documents. 18 | 1. Get your stack name from `../samconfig.toml` and pass it to the script: 19 | 20 | 21 | ```bash 22 | ./run.sh <your-stack-name> 23 | ``` 24 | 25 | ## Notes 26 | 27 | Once you've run the script, you can find your embeddings in `./embeddings`. 28 | These embeddings are produced with 29 | [Amazon's Titan Embeddings model](https://aws.amazon.com/bedrock/titan/#Titan_Embeddings_.28generally_available.29).
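Before embedding, `ingest.py` splits each document into 1,000-character chunks with a 200-character overlap. A simplified, standard-library-only sketch of that fixed-size chunking — a stand-in for langchain's `CharacterTextSplitter`, which additionally splits on separators rather than at hard character offsets:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Split text into fixed-size chunks; consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghij" * 220                     # 2,200 characters
chunks = chunk_text(sample)
print(len(chunks))                              # 3 chunks: 0-1000, 800-1800, 1600-2200
print(chunks[0][-200:] == chunks[1][:200])      # True: neighbouring chunks share 200 characters
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, which tends to improve retrieval quality.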
-------------------------------------------------------------------------------- /data-pipeline/docs/bedrock-docs-2012.10.20.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giusedroid/serverless-rag-python/b6d2caa2c7a473f38894b2f53d7ed3a73d08a881/data-pipeline/docs/bedrock-docs-2012.10.20.pdf -------------------------------------------------------------------------------- /data-pipeline/ingest.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | from langchain.text_splitter import CharacterTextSplitter 4 | from langchain.vectorstores import LanceDB 5 | from langchain.embeddings import BedrockEmbeddings 6 | from langchain.document_loaders import PyPDFDirectoryLoader 7 | 8 | import lancedb as ldb 9 | import pyarrow as pa 10 | 11 | embeddings = BedrockEmbeddings( 12 | region_name="us-west-2" 13 | ) 14 | 15 | # we split the data into chunks of 1,000 characters, with an overlap 16 | # of 200 characters between the chunks, which helps to give better results 17 | # and contain the context of the information between chunks 18 | text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200) 19 | 20 | db = ldb.connect('/tmp/embeddings') 21 | 22 | schema = pa.schema( 23 | [ 24 | pa.field("vector", pa.list_(pa.float32(), 1536)), # document vector with 1.5k dimensions (TitanEmbedding) 25 | pa.field("text", pa.string()), # langchain requires it 26 | pa.field("id", pa.string()) # langchain requires it 27 | ]) 28 | 29 | tbl = db.create_table("doc_table", schema=schema) 30 | 31 | # load the document as before 32 | 33 | loader = PyPDFDirectoryLoader("./docs/") 34 | 35 | docs = loader.load() 36 | docs = text_splitter.split_documents(docs) 37 | 38 | LanceDB.from_documents(docs, embeddings, connection=tbl) 39 | -------------------------------------------------------------------------------- /data-pipeline/requirements.txt: 
-------------------------------------------------------------------------------- 1 | aiohttp==3.8.6 2 | aiosignal==1.3.1 3 | annotated-types==0.6.0 4 | anyio==3.7.1 5 | asgiref==3.7.2 6 | astroid==2.15.5 7 | asttokens==2.2.1 8 | async-timeout==4.0.3 9 | attrs==23.1.0 10 | Automat==20.2.0 11 | Babel==2.8.0 12 | backcall==0.2.0 13 | bcrypt==3.2.0 14 | blinker==1.4 15 | boto==2.49.0 16 | boto3==1.28.73 17 | botocore==1.31.73 18 | cachetools==5.3.2 19 | certifi==2020.6.20 20 | chardet==4.0.0 21 | charset-normalizer==3.3.1 22 | click==8.1.7 23 | cloud-init==23.2.2 24 | colorama==0.4.4 25 | command-not-found==0.3 26 | configobj==5.0.6 27 | constantly==15.1.0 28 | cryptography==3.4.8 29 | dataclasses-json==0.6.1 30 | dbus-python==1.2.18 31 | decorator==5.1.1 32 | deprecation==2.1.0 33 | dill==0.3.7 34 | distro==1.7.0 35 | distro-info==1.1+ubuntu0.1 36 | Django==4.2.2 37 | ec2-hibinit-agent==1.0.0 38 | exceptiongroup==1.1.3 39 | executing==1.2.0 40 | frozenlist==1.4.0 41 | git-remote-codecommit==1.16 42 | greenlet==3.0.1 43 | hibagent==1.0.1 44 | httplib2==0.20.2 45 | hyperlink==21.0.0 46 | idna==3.3 47 | ikp3db @ https://dhj20r2nmszcd.cloudfront.net/static/ikp3db-1.4.1.post1.tar.gz 48 | importlib-metadata==4.6.4 49 | incremental==21.3.0 50 | ipython==8.14.0 51 | isort==5.12.0 52 | jedi==0.18.2 53 | jeepney==0.7.1 54 | Jinja2==3.0.3 55 | jmespath==1.0.1 56 | jsonpatch==1.33 57 | jsonpointer==2.0 58 | jsonschema==3.2.0 59 | keyring==23.5.0 60 | lancedb==0.3.2 61 | langchain==0.0.325 62 | langsmith==0.0.53 63 | launchpadlib==1.10.16 64 | lazr.restfulclient==0.14.4 65 | lazr.uri==1.0.6 66 | lazy-object-proxy==1.9.0 67 | MarkupSafe==2.0.1 68 | marshmallow==3.20.1 69 | matplotlib-inline==0.1.6 70 | mccabe==0.7.0 71 | more-itertools==8.10.0 72 | multidict==6.0.4 73 | mypy-extensions==1.0.0 74 | netifaces==0.11.0 75 | numpy==1.26.1 76 | oauthlib==3.2.0 77 | packaging==23.2 78 | pandas==2.1.2 79 | parso==0.8.3 80 | pexpect==4.8.0 81 | pickleshare==0.7.5 82 | platformdirs==3.10.0 
83 | prompt-toolkit==3.0.39 84 | ptyprocess==0.7.0 85 | pure-eval==0.2.2 86 | py==1.11.0 87 | pyarrow==13.0.0 88 | pyasn1==0.4.8 89 | pyasn1-modules==0.2.1 90 | pydantic==2.4.2 91 | pydantic_core==2.10.1 92 | Pygments==2.16.1 93 | PyGObject==3.42.1 94 | PyHamcrest==2.0.2 95 | PyJWT==2.3.0 96 | pylance==0.8.7 97 | pylint==2.17.4 98 | pylint-django==2.5.3 99 | pylint-flask==0.6 100 | pylint-plugin-utils==0.8.2 101 | pyOpenSSL==21.0.0 102 | pyparsing==2.4.7 103 | pypdf==3.17.0 104 | pyrsistent==0.18.1 105 | pyserial==3.5 106 | python-apt==2.4.0+ubuntu2 107 | python-dateutil==2.8.2 108 | python-debian==0.1.43+ubuntu1.1 109 | python-magic==0.4.24 110 | pytz==2022.1 111 | PyYAML==6.0.1 112 | ratelimiter==1.2.0.post0 113 | requests==2.31.0 114 | retry==0.9.2 115 | s3transfer==0.7.0 116 | SecretStorage==3.3.1 117 | semver==3.0.2 118 | service-identity==18.1.0 119 | six==1.16.0 120 | sniffio==1.3.0 121 | sos==4.5.6 122 | SQLAlchemy==2.0.22 123 | sqlparse==0.4.4 124 | ssh-import-id==5.11 125 | stack-data==0.6.2 126 | systemd-python==234 127 | tenacity==8.2.3 128 | tomli==2.0.1 129 | tomlkit==0.12.1 130 | tqdm==4.66.1 131 | traitlets==5.9.0 132 | Twisted==22.1.0 133 | typing-inspect==0.9.0 134 | typing_extensions==4.7.1 135 | tzdata==2023.3 136 | ubuntu-advantage-tools==8001 137 | ufw==0.36.1 138 | unattended-upgrades==0.1 139 | urllib3==1.26.5 140 | wadllib==1.3.6 141 | wcwidth==0.2.6 142 | wrapt==1.15.0 143 | yarl==1.9.2 144 | zipp==1.0.0 145 | zope.interface==5.4.0 146 | -------------------------------------------------------------------------------- /data-pipeline/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Function to display help message 4 | usage() { 5 | echo "Usage: $0 <stack-name>" 6 | echo "Please provide your stack name.
You can find it in your samconfig.toml" 7 | echo "Example: $0 my-stack-name" 8 | } 9 | 10 | # Check if at least one argument is provided 11 | if [ "$#" -ne 1 ]; then 12 | usage 13 | exit 1 14 | fi 15 | 16 | # Define the directory and file extension for documents 17 | directory="./docs" 18 | file_extension="pdf" # Change this to your desired file extension 19 | 20 | # Function to display error message 21 | error_message() { 22 | echo "Error: No valid documents found in $directory. Make sure your document has a .pdf extension" 23 | } 24 | 25 | 26 | # Find and print all documents in the directory 27 | documents=$(find "$directory" -type f -name "*.$file_extension") 28 | if [ -z "$documents" ]; then 29 | error_message 30 | exit 1 31 | else 32 | echo "Importing documents into LanceDB:" 33 | echo "$documents" 34 | fi 35 | 36 | rm -rf /tmp/embeddings 37 | python3 ingest.py 38 | 39 | 40 | STACK_NAME=$1 41 | BUCKET_NAME=$(aws cloudformation describe-stacks --stack-name $STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`DocumentBucketName`].OutputValue' --output text) 42 | 43 | 44 | 45 | cp -r /tmp/embeddings ./ 46 | 47 | echo "Exporting embeddings to s3://${BUCKET_NAME}" 48 | aws s3 sync ./embeddings s3://${BUCKET_NAME} -------------------------------------------------------------------------------- /events/event-question.json: -------------------------------------------------------------------------------- 1 | { 2 | "queryStringParameters": { 3 | "question": "what is the pricing model of Amazon Bedrock?" 
4 | } 5 | } 6 | -------------------------------------------------------------------------------- /events/event-warmup.json: -------------------------------------------------------------------------------- 1 | { 2 | "queryStringParameters": { 3 | "warmup": "true" 4 | } 5 | } 6 | -------------------------------------------------------------------------------- /example-pattern.json: -------------------------------------------------------------------------------- 1 | { 2 | "title": "Serverless Retrieval Augmented Generation (RAG) with Bedrock, Lambda, S3", 3 | "description": "With this pattern we want to showcase how to implement a serverless RAG architecture.", 4 | "language": "Python", 5 | "level": "400", 6 | "framework": "SAM", 7 | "introBox": { 8 | "headline": "How it works", 9 | "text": [ 10 | "With this pattern we want to showcase how to implement a serverless RAG architecture.", 11 | "Customers asked for a way to quickly test RAG capabilities on a small number of documents without managing infrastructure for contextual knowledge and non-parametric memory.", 12 | "In this pattern, we run a RAG workflow in a single Lambda function, so that customers only pay for the infrastructure they use, when they use it. We use LanceDB with S3 as backend for embedding storage.", 13 | "This pattern deploys one Lambda function behind an API Gateway and an S3 Bucket where to store your embeddings.", 14 | "This pattern makes use of Bedrock to calculate embeddings with Amazon Titan Embedding and Claude as prediction LLM", 15 | "We also provide a local pipeline to ingest your PDFs and upload them to S3." 
16 | ] 17 | }, 18 | "gitHub": { 19 | "template": { 20 | "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/apigw-lambda-bedrock-s3-rag", 21 | "templateURL": "serverless-patterns/apigw-lambda-bedrock-s3-rag", 22 | "projectFolder": "apigw-lambda-bedrock-s3-rag", 23 | "templateFile": "template.yaml" 24 | } 25 | }, 26 | "resources": { 27 | "bullets": [ 28 | { 29 | "text": "Amazon Bedrock", 30 | "link": "https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html" 31 | }, 32 | { 33 | "text": "Retrieval Augmented Generation (RAG) - original research paper", 34 | "link": "https://arxiv.org/abs/2005.11401" 35 | }, 36 | { 37 | "text": "Amazon Titan Embedding", 38 | "link": "https://aws.amazon.com/bedrock/titan/#Titan_Embeddings_.28generally_available.29" 39 | } 40 | ] 41 | }, 42 | "deploy": { 43 | "text": [ 44 | "sam build", 45 | "sam deploy -g" 46 | ] 47 | }, 48 | "testing": { 49 | "text": [ 50 | "See the GitHub repo for detailed testing instructions." 51 | ] 52 | }, 53 | "cleanup": { 54 | "text": [ 55 | "sam delete" 56 | ] 57 | }, 58 | "authors": [ 59 | { 60 | "name": "Kevin Shaffer-Morrison", 61 | "image": "https://kevin.shaffer-morrison.com/images/sideProfileHeadshot.jpg", 62 | "bio": "Kevin Shaffer-Morrison is a Senior Solutions Architect at Amazon Web Services. He's helped hundreds of startups get off the ground quickly and up into the cloud. Kevin focuses on helping the earliest-stage founders with code samples and Twitch live streams on twitch.tv/aws.", 63 | "linkedin": "kshaffermorrison" 64 | }, 65 | { 66 | "name": "Giuseppe Battista", 67 | "image": "https://media.licdn.com/dms/image/D4E03AQGOz6p4p-rSfg/profile-displayphoto-shrink_800_800/0/1686238097666?e=1703721600&v=beta&t=bhKqpehfON17sLUNDn-yFUKJhohuWljVKNCarDnsdYA", 68 | "bio": "Giuseppe Battista is a Senior Solutions Architect at Amazon Web Services. He leads solutions architecture for Early Stage Startups in the UK and Ireland.
He hosts the Twitch Show \"Let's Build a Startup\" on twitch.tv/aws and he's head of Unicorn's Den accelerator.", 69 | "linkedin": "giusedroid", 70 | "twitter": "giusedroid" 71 | } 72 | ] 73 | } 74 | -------------------------------------------------------------------------------- /ragfunction/.gitignore: -------------------------------------------------------------------------------- 1 | .aws-sam/ -------------------------------------------------------------------------------- /ragfunction/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM public.ecr.aws/lambda/python:3.10 2 | 3 | COPY app.py requirements.txt ./ 4 | 5 | RUN python3.10 -m pip install -r requirements.txt -t . 6 | 7 | # Command can be overwritten by providing a different command in the template directly. 8 | CMD ["app.lambda_handler"] 9 | -------------------------------------------------------------------------------- /ragfunction/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giusedroid/serverless-rag-python/b6d2caa2c7a473f38894b2f53d7ed3a73d08a881/ragfunction/__init__.py -------------------------------------------------------------------------------- /ragfunction/app.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | 4 | from langchain.prompts import ChatPromptTemplate 5 | from langchain.chat_models import BedrockChat 6 | from langchain.embeddings import BedrockEmbeddings 7 | from langchain.schema.output_parser import StrOutputParser 8 | from langchain.schema.runnable import RunnablePassthrough 9 | from langchain.vectorstores import LanceDB 10 | 11 | import lancedb as ldb 12 | 13 | embeddings = BedrockEmbeddings( 14 | region_name="us-west-2" 15 | ) 16 | 17 | # Retrieve data from S3 and load it into local LanceDB 18 | s3_bucket = os.environ.get('s3BucketName') 19 | db = ldb.connect(f"s3://{s3_bucket}/") 
20 | tbl = db.open_table('doc_table') 21 | 22 | # Initialize LanceDB instance within langchain 23 | vectorstore = LanceDB(connection=tbl, embedding=embeddings) 24 | retriever = vectorstore.as_retriever() 25 | 26 | template = """Answer the question based only on the following context: 27 | {context} 28 | 29 | Question: {question} 30 | """ 31 | 32 | # Create langchain prompt and initialize the Bedrock model 33 | prompt = ChatPromptTemplate.from_template(template) 34 | model = BedrockChat(model_id="anthropic.claude-v2", model_kwargs={"temperature": 0.1}) 35 | 36 | # Chain together the LanceDB retriever, our prompt, the LLM, and an output parser that stringifies the result 37 | chain = ( 38 | {"context": retriever, "question": RunnablePassthrough()} 39 | | prompt 40 | | model 41 | | StrOutputParser() 42 | ) 43 | 44 | def lambda_handler(event, context): 45 | 46 | # The query string may be absent entirely, so default to an empty dict 47 | params = event.get('queryStringParameters') or {} 48 | 49 | if params.get('warmup') is not None: 50 | return { 51 | "statusCode": 202, 52 | "body": json.dumps({ 53 | "message": "warming up", 54 | "toTimeout": context.get_remaining_time_in_millis() 55 | }) 56 | } 57 | 58 | question = params.get('question') 59 | 60 | if question is not None: 61 | results = chain.invoke(question) 62 | print(results) 63 | 64 | return { 65 | "statusCode": 200, 66 | "body": json.dumps({ 67 | "message": results, 68 | }), 69 | } 70 | 71 | return { 72 | "statusCode": 400, 73 | "body": json.dumps({ 74 | "message": "provide either a question or a warmup directive" 75 | }) 76 | } 77 | 78 | 79 | -------------------------------------------------------------------------------- /ragfunction/requirements.txt: -------------------------------------------------------------------------------- 1 | lancedb==0.3.2 2 | langchain==0.0.325 3 | boto3==1.28.73 4 | pandas==2.1.2 -------------------------------------------------------------------------------- /template.yaml: -------------------------------------------------------------------------------- 1 |
AWSTemplateFormatVersion: '2010-09-09' 2 | Transform: AWS::Serverless-2016-10-31 3 | Description: > 4 | python3.10 5 | 6 | Sample SAM Template for RAG in a Lambda 7 | 8 | # More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst 9 | Globals: 10 | Function: 11 | Timeout: 29 12 | 13 | Resources: 14 | 15 | DocumentBucket: 16 | Type: 'AWS::S3::Bucket' 17 | Properties: 18 | BucketName: !Join [ "-", [!Ref AWS::StackName, document-bucket ] ] 19 | 20 | 21 | RAGFunction: 22 | Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction 23 | Properties: 24 | PackageType: Image 25 | MemorySize: 4048 26 | Architectures: 27 | - x86_64 28 | Events: 29 | HelloWorld: 30 | Type: Api # More info about API Event Source: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#api 31 | Properties: 32 | Path: /rag 33 | Method: get 34 | RequestParameters: 35 | - method.request.querystring.question 36 | - method.request.querystring.warmup 37 | Environment: 38 | Variables: 39 | s3BucketName: !Ref DocumentBucket 40 | Policies: 41 | - Statement: 42 | - Effect: Allow 43 | Action: 'bedrock:InvokeModel' 44 | Resource: '*' 45 | - Effect: Allow 46 | Action: s3:GetObject 47 | Resource: !Sub arn:aws:s3:::${DocumentBucket}/* 48 | - Effect: Allow 49 | Action: s3:ListBucket 50 | Resource: !Sub arn:aws:s3:::${DocumentBucket} 51 | - Effect: Allow 52 | Action: s3:ListAllMyBuckets 53 | Resource: "*" 54 | Metadata: 55 | Dockerfile: Dockerfile 56 | DockerContext: ./ragfunction 57 | DockerTag: python3.10-v1 58 | 59 | Outputs: 60 | # ServerlessRestApi is an implicit API created out of Events key under Serverless::Function 61 | # Find out more about other implicit resources you can reference within SAM 62 | # 
https://github.com/awslabs/serverless-application-model/blob/master/docs/internals/generated_resources.rst#api 63 | RAGApi: 64 | Description: "API Gateway endpoint URL for the Prod stage of the RAG function" 65 | Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/rag/" 66 | RAGFunction: 67 | Description: "RAG Lambda Function ARN" 68 | Value: !GetAtt RAGFunction.Arn 69 | RAGFunctionIamRole: 70 | Description: "Implicit IAM Role created for the RAG function" 71 | Value: !GetAtt RAGFunctionRole.Arn 72 | DocumentBucketName: 73 | Description: "S3 bucket where LanceDB sources embeddings. Check this repository's README for instructions on how to import your documents" 74 | Value: !Ref DocumentBucket 75 | -------------------------------------------------------------------------------- /test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Function to display help message 4 | usage() { 5 | echo "Usage: $0 <stack-name>" 6 | echo "Please provide your stack name.
You can find it in your samconfig.toml" 7 | echo "Example: $0 my-stack-name" 8 | } 9 | 10 | # Check if at least one argument is provided 11 | if [ "$#" -ne 1 ]; then 12 | usage 13 | exit 1 14 | fi 15 | 16 | STACK_NAME=$1 17 | API_URL=$(aws cloudformation describe-stacks --stack-name $STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`RAGApi`].OutputValue' --output text) 18 | 19 | WARMUP_URL="$API_URL?warmup=true" 20 | PROMPT_URL="$API_URL?question=amazon+bedrock+pricing+model" 21 | 22 | WARMUP_FILE=warmup.txt 23 | BODY_FILE=response_body.json 24 | 25 | echo "Starting warmup at $(date +"%Y-%m-%d %T.%3N")" 26 | curl -s $WARMUP_URL -o $WARMUP_FILE 27 | echo "Warmup ended at $(date +"%Y-%m-%d %T.%3N")" 28 | echo "Time to Lambda timeout: $(cat $WARMUP_FILE | jq .toTimeout) ms" 29 | 30 | echo "Testing prediction endpoint at $(date +"%Y-%m-%d %T.%3N")" 31 | curl -s $PROMPT_URL -o $BODY_FILE 32 | echo "Prediction ended at $(date +"%Y-%m-%d %T.%3N")" 33 | 34 | cat $BODY_FILE 35 | 36 | rm $BODY_FILE 37 | rm $WARMUP_FILE 38 | --------------------------------------------------------------------------------