├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── api ├── app │ ├── __init__.py │ ├── api │ │ ├── __init__.py │ │ └── api_v1 │ │ │ ├── __init__.py │ │ │ ├── api.py │ │ │ └── endpoints │ │ │ ├── __init__.py │ │ │ ├── fastapi_request.py │ │ │ ├── initialize.py │ │ │ └── llm_ep.py │ └── main.py └── requirements.txt ├── app ├── requirements.txt ├── sessions.py └── webapp.py ├── chatbot-manifest ├── chatbot-vs.yaml └── chatbot.yaml ├── cleanup.sh ├── configure-istio.sh ├── create-dynamodb-table.sh ├── data ├── Amazon_EMR_FAQs.csv └── Amazon_SageMaker_FAQs.csv ├── data_ingestion_to_vectordb ├── data_ingestion_to_vectordb.py └── requirements.txt ├── deploy-eks.sh ├── deploy-istio.sh ├── deploy-tenant-services.sh ├── deploy-userpools.sh ├── envoy-config ├── envoy-cds.yaml ├── envoy-lds.yaml └── envoy.yaml ├── fastapi_request.py ├── hosts-file-entry.sh ├── iam ├── chatbot-access-role-trust-policy.json ├── dynamodb-access-policy.json ├── s3-access-role-trust-policy.json ├── s3-contextual-data-access-policy.json └── s3-envoy-config-access-policy.json ├── image-build ├── Dockerfile-api ├── Dockerfile-app ├── build-chatbot-image.sh └── build-rag-api-image.sh ├── istio-proxy-v2-config ├── enable-X-Forwarded-For-header.yaml └── proxy-protocol-envoy-filter.yaml └── setup.sh /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. 
Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # How to Build a Multitenant Chatbot with Retrieval Augmented Generation (RAG) using Amazon Bedrock and Amazon EKS 2 | Note that the instructions below are intended to give you step-by-step, how-to instructions for getting this solution up and running in your own AWS account. 
For a general description and overview of the solution, please see the 3 | [blog post here](https://aws.amazon.com/blogs/containers/build-a-multi-tenant-chatbot-with-rag-using-amazon-bedrock-and-amazon-eks/). 4 | 5 | ## BEFORE YOU START 6 | In order to build this environment, ensure that you have access to the required Bedrock models: 7 | * Anthropic Claude Instant (Model Id: anthropic.claude-instant-v1) 8 | * Titan Embeddings G1 - Text (Model Id: amazon.titan-embed-text-v1) 9 | 10 | You can edit the **setup.sh** file to set the appropriate environment variables: 11 | * TEXT2TEXT_MODEL_ID 12 | * EMBEDDING_MODEL_ID 13 | 14 | ## CRITICAL STEPS 15 | Pay close attention to **Steps 2 & 3**, otherwise the deployment will not succeed. 16 | 17 | ## Setting up the environment 18 | 19 | > :warning: The Cloud9 workspace should be built by an IAM user with Administrator privileges, not the root account user. Please ensure you are logged in as an IAM user, not the root account user. 20 | 21 | 1. Create new Cloud9 Environment 22 | * Launch Cloud9 in your closest region, e.g.: `https://us-west-2.console.aws.amazon.com/cloud9/home?region=us-west-2` 23 | * Select Create environment 24 | * Name it whatever you want 25 | * Choose "t3.small" for instance type, take all default values and click Create environment 26 | * For Platform use "Amazon Linux 2" 27 | 2. Create EC2 Instance Role 28 | * Follow this [deep link](https://console.aws.amazon.com/iam/home#/roles$new?step=review&commonUseCase=EC2%2BEC2&selectedUseCase=EC2&policies=arn:aws:iam::aws:policy%2FAdministratorAccess) to create an IAM role with Administrator access. 29 | * Confirm that AWS service and EC2 are selected, then click `Next: Permissions` to view permissions. 30 | * Confirm that AdministratorAccess is checked, then click `Next: Tags` to assign tags. 31 | * Take the defaults, and click `Next: Review` to review. 32 | * Enter `Cloud9AdminRole` for the Name, and click `Create role`. 33 | 3. Remove managed credentials and attach EC2 Instance Role to Cloud9 Instance 34 | * Click the gear in the upper right-hand corner of the IDE, which opens settings. Click `AWS Settings` on the left and, under `Credentials`, slide the button to the left for `AWS Managed Temporary Credentials`. The button should be greyed out when done, indicating it's off. 35 | * Click the round button with a letter in the upper right-hand corner of the IDE and click `Manage EC2 Instance`. This will take you to the EC2 portion of the AWS Console. 36 | * Right-click the Cloud9 EC2 instance and in the menu, click `Security` -> `Modify IAM Role` 37 | * Choose the Role you created in step 2 above. It should be titled `Cloud9AdminRole`; click `Save`. 38 | 39 | 4. Clone the repo and run the setup script 40 | * Return to the Cloud9 IDE 41 | * In the upper left part of the main screen, click the round green button with a `+` on it and click `New Terminal` 42 | * Enter the following in the terminal window 43 | 44 | ```bash 45 | git clone https://github.com/aws-samples/multi-tenant-chatbot-using-rag-with-amazon-bedrock.git 46 | cd multi-tenant-chatbot-using-rag-with-amazon-bedrock 47 | chmod +x setup.sh 48 | ./setup.sh 49 | ``` 50 | 51 | This [script](./setup.sh) sets up all Kubernetes tools, updates the AWS CLI, and installs other dependencies that we'll use later. Ensure that the Administrator EC2 role was created and successfully attached to the EC2 instance that's running your Cloud9 IDE.
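Before moving on, you can optionally confirm that the Administrator role is the identity the AWS CLI is actually using. This is only a suggested sanity check, and it assumes the role was named `Cloud9AdminRole` as in Step 2:

```bash
# Should print an assumed-role ARN containing "Cloud9AdminRole"
aws sts get-caller-identity --query Arn --output text
```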
Also ensure you turned off `AWS Managed Credentials` inside your Cloud9 IDE (refer to steps 2 and 3). 52 | 53 | 54 | 5. Create and Populate DynamoDB Table 55 | > :warning: Close the terminal window that you created the cluster in, and open a new terminal before starting this step; otherwise, you may get errors about your AWS_REGION not being set. 56 | * Open a **_NEW_** terminal window and `cd` back into `multi-tenant-chatbot-using-rag-with-amazon-bedrock` and run the following script: 57 | 58 | ```bash 59 | chmod +x create-dynamodb-table.sh 60 | ./create-dynamodb-table.sh 61 | ``` 62 | 63 | This [script](./create-dynamodb-table.sh) creates the Sessions and ChatHistory DynamoDB tables for the chat application to maintain session information and LangChain chat history. 64 | 65 | 6. Create the EKS Cluster 66 | * Run the following script to create a cluster configuration file, and subsequently provision the cluster using `eksctl`: 67 | 68 | ```bash 69 | chmod +x deploy-eks.sh 70 | ./deploy-eks.sh 71 | ``` 72 | 73 | This [script](./deploy-eks.sh) creates a cluster configuration file and subsequently provisions the cluster using `eksctl`. 74 | 75 | The cluster will take approximately 30 minutes to deploy. 76 | 77 | After the EKS Cluster is set up, the script also: 78 | 79 | a. associates an OIDC provider with the Cluster 80 | 81 | b. deploys the AWS Load Balancer Controller on the cluster 82 | 83 | c. creates IAM roles and policies for various containers to access S3, DynamoDB and Bedrock 84 | 85 | 86 | 7. Deploy Istio Service Mesh 87 | > :warning: Close the terminal window that you created the cluster in, and open a new terminal before starting this step; otherwise, you may get errors about your AWS_REGION not being set. 88 | * Open a **_NEW_** terminal window and `cd` back into `multi-tenant-chatbot-using-rag-with-amazon-bedrock` and run the following script: 89 | 90 | ```bash 91 | chmod +x deploy-istio.sh 92 | ./deploy-istio.sh 93 | ``` 94 | 95 | This [script](./deploy-istio.sh) deploys the Istio Service Mesh, including the Istio Ingress Gateway with Kubernetes annotations that signal the AWS Load Balancer Controller to automatically deploy a Network Load Balancer (NLB) and associate it with the Ingress Gateway service. It also enables the proxy v2 protocol on the Istio Ingress Gateway, which helps preserve the client IP forwarded by the NLB. 96 | 97 | 98 | 8. Deploy Cognito User Pools 99 | > :warning: Close the terminal window that you created the cluster in, and open a new terminal before starting this step; otherwise, you may get errors about your AWS_REGION not being set. 100 | * Open a **_NEW_** terminal window and `cd` back into `multi-tenant-chatbot-using-rag-with-amazon-bedrock` and run the following script: 101 | 102 | ```bash 103 | chmod +x deploy-userpools.sh 104 | ./deploy-userpools.sh 105 | ``` 106 | 107 | This [script](./deploy-userpools.sh) deploys Cognito User Pools for two (2) example tenants: tenanta and tenantb. The script will ask for the passwords to set for the user created in each User Pool (an optional verification command is shown below). 108 | 109 | The script also generates the YAML files for the OIDC proxy configuration, which will be deployed in the next step: 110 | 111 | a. oauth2-proxy configuration for each tenant 112 | 113 | b. External Authorization Policy for Istio Ingress Gateway 114 | 115 |
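If you would like to verify that the User Pools were created before continuing, you can list them from the terminal. This is an optional check; the expected names follow the `<tenant>_chatbot_example_com_<random-string>` pattern used by the deployment and cleanup scripts.

```bash
# Expect one pool per tenant, e.g. tenanta_chatbot_example_com_<random-string>
aws cognito-idp list-user-pools --max-results 20 --query 'UserPools[].Name' --output text
```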
116 | 9. Configure Istio Ingress Gateway 117 | > :warning: Close the terminal window that you created the cluster in, and open a new terminal before starting this step; otherwise, you may get errors about your AWS_REGION not being set. 118 | * Open a **_NEW_** terminal window and `cd` back into `multi-tenant-chatbot-using-rag-with-amazon-bedrock` and run the following script: 119 | 120 | ```bash 121 | chmod +x configure-istio.sh 122 | ./configure-istio.sh 123 | ``` 124 | 125 | This [script](./configure-istio.sh) creates the following to configure the Istio Ingress Gateway: 126 | 127 | a. Self-signed Root CA Cert and Key 128 | 129 | b. Istio Ingress Gateway Cert signed by the Root CA 130 | 131 | It also completes the following steps: 132 | 133 | a. Creates a TLS secret object for the Istio Ingress Gateway Cert and Key 134 | 135 | b. Creates namespaces for the Gateway, Envoy Reverse Proxy, OIDC Proxies, and the example tenants 136 | 137 | c. Deploys an Istio Gateway resource 138 | 139 | d. Deploys an Envoy reverse proxy 140 | 141 | e. Deploys oauth2-proxy along with the configuration generated in Step 8 142 | 143 | f. Adds an Istio External Authorization Provider definition pointing to the Envoy Reverse Proxy 144 | 145 | 146 | 10. Deploy Tenant Application Microservices 147 | > :warning: Close the terminal window that you created the cluster in, and open a new terminal before starting this step; otherwise, you may get errors about your AWS_REGION not being set. 148 | * Open a **_NEW_** terminal window and `cd` back into `multi-tenant-chatbot-using-rag-with-amazon-bedrock` and run the following script: 149 | 150 | ```bash 151 | chmod +x deploy-tenant-services.sh 152 | ./deploy-tenant-services.sh 153 | ``` 154 | 155 | This [script](./deploy-tenant-services.sh) creates the service deployments for the two (2) sample tenants, along with Istio VirtualService constructs that define the routing rules. 156 | 157 | 158 | 11. Once you have finished running all the above steps, the chatbot application can be accessed using the following steps. 159 | 160 | a. Run the following command in the Cloud9 shell 161 | ```bash 162 | chmod +x hosts-file-entry.sh 163 | ./hosts-file-entry.sh 164 | ``` 165 | 166 | b. Append the output of the command to your local hosts file. The script identifies the load balancer instance associated with the Istio Ingress Gateway and looks up the public IP addresses assigned to it. 167 | 168 | c. To avoid TLS cert conflicts, configure the browser on your desktop/laptop with a new profile 169 | 170 | d. The browser used to test this deployment was Mozilla Firefox, in which a new profile can be created by pointing the browser to "about:profiles" 171 | 172 | e. Create a new profile, such as "multitenant-chatbot" 173 | 174 | f. After creating the profile, click on "Launch profile in new browser" 175 | 176 | g. In the browser, open two tabs, one for each of the following URLs: 177 | 178 | ``` 179 | https://tenanta.example.com/ 180 | 181 | https://tenantb.example.com/ 182 | ``` 183 | 184 | h. Because of the self-signed TLS certificates, you may receive a certificate-related error or warning from the browser 185 | 186 | i. 
When the login prompt appears: 187 | 188 | In the browser windows with the "multitenant-chatapp" profile, login with: 189 | 190 | ``` 191 | user1@tenanta.com 192 | 193 | user1@tenantb.com 194 | ``` 195 | 196 | ## Sample Prompts 197 | 198 | # Tenant A 199 | * What are Foundation Models 200 | * Can I use R in SageMaker 201 | * how to detect statistical bias across a model training workflow 202 | * how can i monitor the performance of a model and take corrective action when drift is detected 203 | * what is tranium 204 | * what is inferentia 205 | 206 | # Tenant B 207 | * What are the applications of Impala 208 | * How to use HBase in EMR 209 | * What does it mean when the EMR cluster is BOOTSTRAPPING 210 | * How to use Kinesis for data ingestion 211 | 212 | ## Cleanup 213 | 214 | 1. The deployed components can be cleaned up by running the following: 215 | 216 | ```bash 217 | chmod +x cleanup.sh 218 | ./cleanup.sh 219 | ``` 220 | 221 | 2. You can also delete 222 | 223 | a. The EC2 Instance Role `Cloud9AdminRole` 224 | 225 | b. The Cloud9 Environment 226 | 227 | ## License 228 | 229 | This library is licensed under the MIT-0 License. See the LICENSE file. 230 | ## Security 231 | 232 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 233 | 234 | ## License 235 | 236 | This library is licensed under the MIT-0 License. See the LICENSE file. 237 | 238 | -------------------------------------------------------------------------------- /api/app/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/multi-tenant-chatbot-using-rag-with-amazon-bedrock/6ecf194a0c2a800716dc8f7979df86aa29731024/api/app/__init__.py -------------------------------------------------------------------------------- /api/app/api/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/multi-tenant-chatbot-using-rag-with-amazon-bedrock/6ecf194a0c2a800716dc8f7979df86aa29731024/api/app/api/__init__.py -------------------------------------------------------------------------------- /api/app/api/api_v1/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/multi-tenant-chatbot-using-rag-with-amazon-bedrock/6ecf194a0c2a800716dc8f7979df86aa29731024/api/app/api/api_v1/__init__.py -------------------------------------------------------------------------------- /api/app/api/api_v1/api.py: -------------------------------------------------------------------------------- 1 | from .endpoints import llm_ep 2 | from fastapi import APIRouter 3 | 4 | router = APIRouter() 5 | router.include_router(llm_ep.router, prefix="/llm", tags=["llm"]) 6 | -------------------------------------------------------------------------------- /api/app/api/api_v1/endpoints/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/multi-tenant-chatbot-using-rag-with-amazon-bedrock/6ecf194a0c2a800716dc8f7979df86aa29731024/api/app/api/api_v1/endpoints/__init__.py -------------------------------------------------------------------------------- /api/app/api/api_v1/endpoints/fastapi_request.py: -------------------------------------------------------------------------------- 1 | import os 2 | import boto3 3 | from enum import Enum 4 | from pydantic import BaseModel 5 | from typing import List 6 | 7 | class 
Text2TextModelName(str, Enum): 8 | modelId = os.environ.get('TEXT2TEXT_MODEL_ID') 9 | 10 | class EmbeddingsModelName(str, Enum): 11 | modelId = os.environ.get('EMBEDDING_MODEL_ID') 12 | 13 | class VectorDBType(str, Enum): 14 | FAISS = "faiss" 15 | 16 | class Request(BaseModel): 17 | q: str 18 | user_session_id: str 19 | max_length: int = 500 20 | num_return_sequences: int = 1 21 | do_sample: bool = False 22 | verbose: bool = False 23 | max_matching_docs: int = 3 24 | # Bedrock / Titan 25 | temperature: float = 0.1 26 | maxTokenCount: int = 512 27 | stopSequences: List = ['\n\nHuman:'] 28 | topP: float = 0.9 29 | topK: int = 250 30 | 31 | text_generation_model: Text2TextModelName = Text2TextModelName.modelId 32 | embeddings_generation_model: EmbeddingsModelName = EmbeddingsModelName.modelId 33 | vectordb_s3_path: str = f"s3://{os.environ.get('CONTEXTUAL_DATA_BUCKET')}/faiss_index/" 34 | vectordb_type: VectorDBType = VectorDBType.FAISS 35 | 36 | MODEL_ENDPOINT_MAPPING = { 37 | Text2TextModelName.modelId: os.environ.get('TEXT2TEXT_MODEL_ID'), 38 | EmbeddingsModelName.modelId: os.environ.get('EMBEDDING_MODEL_ID'), 39 | } 40 | -------------------------------------------------------------------------------- /api/app/api/api_v1/endpoints/initialize.py: -------------------------------------------------------------------------------- 1 | import os 2 | import boto3 3 | import logging 4 | from urllib.parse import urlparse 5 | from langchain.vectorstores import FAISS 6 | from langchain.embeddings import BedrockEmbeddings 7 | 8 | logger = logging.getLogger(__name__) 9 | 10 | def load_vector_db_faiss(vectordb_s3_path: str, vectordb_local_path: str, embeddings_model: str, bedrock_service: str) -> FAISS: 11 | os.makedirs(vectordb_local_path, exist_ok=True) 12 | # download the vectordb files from S3 13 | # note that the following code is only applicable to FAISS 14 | # would need to be enhanced to support other vector dbs 15 | vectordb_files = ["index.pkl", "index.faiss"] 16 | for vdb_file in vectordb_files: 17 | s3 = boto3.client('s3') 18 | fpath = os.path.join(vectordb_local_path, vdb_file) 19 | with open(fpath, 'wb') as f: 20 | parsed = urlparse(vectordb_s3_path) 21 | bucket = parsed.netloc 22 | path = os.path.join(parsed.path[1:], vdb_file) 23 | logger.info(f"going to download from bucket={bucket}, path={path}, to {fpath}") 24 | s3.download_fileobj(bucket, path, f) 25 | logger.info(f"after downloading from bucket={bucket}, path={path}, to {fpath}") 26 | 27 | logger.info("Creating an embeddings object to hydrate the vector db") 28 | 29 | boto3_bedrock = boto3.client(service_name=bedrock_service) 30 | br_embeddings = BedrockEmbeddings(client=boto3_bedrock, model_id=embeddings_model) 31 | 32 | vector_db = FAISS.load_local(vectordb_local_path, br_embeddings) 33 | logger.info(f"vector db hydrated, type={type(vector_db)} it has {vector_db.index.ntotal} embeddings") 34 | 35 | return vector_db 36 | -------------------------------------------------------------------------------- /api/app/api/api_v1/endpoints/llm_ep.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import boto3 4 | import logging 5 | from typing import Any, Dict 6 | from fastapi import APIRouter 7 | from urllib.parse import urlparse 8 | from langchain.chains import ConversationChain 9 | from langchain.memory import ConversationBufferMemory 10 | from langchain.chains import ConversationalRetrievalChain 11 | from langchain.memory.chat_message_histories import 
DynamoDBChatMessageHistory 12 | from langchain.llms.bedrock import Bedrock 13 | from langchain.prompts import PromptTemplate 14 | # from langchain import PromptTemplate 15 | from .fastapi_request import (Request, 16 | Text2TextModelName, 17 | EmbeddingsModelName, 18 | VectorDBType) 19 | from .initialize import load_vector_db_faiss 20 | 21 | logging.getLogger().setLevel(logging.INFO) 22 | logger = logging.getLogger() 23 | 24 | CHATHISTORY_TABLE=os.environ.get('CHATHISTORY_TABLE') 25 | 26 | EMBEDDINGS_MODEL = os.environ.get('EMBEDDING_MODEL_ID') 27 | TEXT2TEXT_MODEL_ID = os.environ.get('TEXT2TEXT_MODEL_ID') 28 | BEDROCK_SERVICE = os.environ.get('BEDROCK_SERVICE') 29 | 30 | 31 | VECTOR_DB_DIR = os.path.join("/tmp", "_vectordb") 32 | _vector_db = None 33 | vectordb_s3_path: str = f"s3://{os.environ.get('CONTEXTUAL_DATA_BUCKET')}/faiss_index/" 34 | 35 | if _vector_db is None: 36 | _vector_db = load_vector_db_faiss(vectordb_s3_path, 37 | VECTOR_DB_DIR, 38 | EMBEDDINGS_MODEL, 39 | BEDROCK_SERVICE) 40 | router = APIRouter() 41 | 42 | @router.post("/rag") 43 | def rag_handler(req: Request) -> Dict[str, Any]: 44 | # dump the received request for debugging purposes 45 | logger.info(f"req={req}") 46 | 47 | 48 | # Use the vector db to find similar documents to the query 49 | # the vector db call would automatically convert the query text 50 | # into embeddings 51 | docs = _vector_db.similarity_search(req.q, k=req.max_matching_docs) 52 | logger.info(f"here are the {req.max_matching_docs} closest matching docs to the query=\"{req.q}\"") 53 | for d in docs: 54 | logger.info(f"---------") 55 | logger.info(d) 56 | logger.info(f"---------") 57 | 58 | parameters = { 59 | "max_tokens_to_sample": req.maxTokenCount, 60 | "stop_sequences": req.stopSequences, 61 | "temperature": req.temperature, 62 | "top_k": req.topK, 63 | "top_p": req.topP 64 | } 65 | 66 | endpoint_name = req.text_generation_model 67 | logger.info(f"ModelId: {TEXT2TEXT_MODEL_ID}, Bedrock Model: {BEDROCK_SERVICE}") 68 | 69 | session_id = req.user_session_id 70 | boto3_bedrock = boto3.client(service_name=BEDROCK_SERVICE) 71 | bedrock_llm = Bedrock(model_id=TEXT2TEXT_MODEL_ID, client=boto3_bedrock) 72 | bedrock_llm.model_kwargs = parameters 73 | 74 | message_history = DynamoDBChatMessageHistory(table_name=CHATHISTORY_TABLE, session_id=session_id) 75 | memory_chain = ConversationBufferMemory( 76 | memory_key="chat_history", 77 | chat_memory=message_history, 78 | input_key="question", 79 | ai_prefix="Assistant", 80 | return_messages=True 81 | ) 82 | 83 | condense_prompt_claude = PromptTemplate.from_template(""" 84 | Answer only with the new question. 85 | 86 | Human: How would you ask the question considering the previous conversation: {question} 87 | 88 | Assistant: Question:""") 89 | 90 | qa = ConversationalRetrievalChain.from_llm( 91 | llm=bedrock_llm, 92 | retriever=_vector_db.as_retriever(search_type='similarity', search_kwargs={"k": req.max_matching_docs}), 93 | memory=memory_chain, 94 | condense_question_prompt=condense_prompt_claude, 95 | chain_type='stuff', # 'refine', 96 | ) 97 | 98 | qa.combine_docs_chain.llm_chain.prompt = PromptTemplate.from_template(""" 99 | {context} 100 | 101 | Human: Answer the question inside the XML tags. 102 | 103 | {question} 104 | 105 | Do not use any XML tags in the answer. If you don't know the answer or if the answer is not in the context say "Sorry, I don't know." 
106 | 107 | Assistant:""") 108 | 109 | answer = "" 110 | answer = qa.run({'question': req.q }) 111 | 112 | logger.info(f"answer received from llm,\nquestion: \"{req.q}\"\nanswer: \"{answer}\"") 113 | resp = {'question': req.q, 'answer': answer, 'session_id': req.user_session_id} 114 | if req.verbose is True: 115 | resp['docs'] = docs 116 | 117 | return resp 118 | -------------------------------------------------------------------------------- /api/app/main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import boto3 4 | import logging 5 | import subprocess 6 | from fastapi import FastAPI 7 | from api.api_v1.api import router as api_router 8 | 9 | logging.getLogger().setLevel(logging.INFO) 10 | logger = logging.getLogger() 11 | 12 | app = FastAPI() 13 | 14 | @app.get("/") 15 | async def root(): 16 | return {"message": "API for question answering bot"} 17 | 18 | app.include_router(api_router, prefix="/api/v1") 19 | -------------------------------------------------------------------------------- /api/requirements.txt: -------------------------------------------------------------------------------- 1 | fastapi==0.103.2 2 | uvicorn==0.23.2 3 | gunicorn==21.2.0 4 | pydantic==1.10.13 5 | langchain==0.0.305 6 | faiss-cpu==1.7.4 7 | numpy==1.26.0 8 | -------------------------------------------------------------------------------- /app/requirements.txt: -------------------------------------------------------------------------------- 1 | streamlit==1.27.1 2 | boto3 3 | cryptography==41.0.4 4 | httpx==0.25.0 -------------------------------------------------------------------------------- /app/sessions.py: -------------------------------------------------------------------------------- 1 | import uuid 2 | import logging 3 | from datetime import datetime 4 | from botocore.exceptions import ClientError 5 | 6 | logger = logging.getLogger(__name__) 7 | 8 | class Sessions: 9 | """ 10 | Encapsulates an Amazon DynamoDB table of session data. 11 | """ 12 | def __init__(self, dyn_resource): 13 | """ 14 | :param dyn_resource: A Boto3 DynamoDB resource. 15 | """ 16 | self.dyn_resource = dyn_resource 17 | self.table = None 18 | 19 | def exists(self, table_name): 20 | """ 21 | Determines whether a table exists. As a side effect, stores the table in 22 | a member variable. 23 | 24 | :param table_name: The name of the table to check. 25 | :return: True when the table exists; otherwise, False. 26 | """ 27 | try: 28 | table = self.dyn_resource.Table(table_name) 29 | table.load() 30 | exists = True 31 | except ClientError as err: 32 | if err.response['Error']['Code'] == 'ResourceNotFoundException': 33 | exists = False 34 | else: 35 | logger.error( 36 | "Couldn't check for existence of %s. Here's why: %s: %s", 37 | table_name, 38 | err.response['Error']['Code'], err.response['Error']['Message']) 39 | raise 40 | else: 41 | self.table = table 42 | return exists 43 | 44 | def get_session(self, tenant_id): 45 | """ 46 | Gets session data from the table for a specific session. 47 | 48 | :param tenantid: The Tenant ID 49 | :return: The data about the requested session. 50 | """ 51 | try: 52 | response = self.table.get_item(Key={'TenantId': tenant_id}) 53 | except ClientError as err: 54 | logger.error( 55 | "Couldn't get session %s from table %s. 
Here's why: %s: %s", 56 | tenant_id, self.table.name, 57 | err.response['Error']['Code'], err.response['Error']['Message']) 58 | raise 59 | else: 60 | if "Item" in response.keys(): 61 | return response["Item"] 62 | 63 | def add_session(self, tenant_id): 64 | """ 65 | Adds a session to the table. 66 | 67 | :param session_id: The current session_id for the tenant/user 68 | """ 69 | session_id = str(uuid.uuid4()) 70 | new_interaction = int(datetime.now().timestamp()) 71 | try: 72 | self.table.put_item( 73 | Item={ 74 | 'TenantId': tenant_id, 75 | 'session_id': session_id, 76 | 'last_interaction': new_interaction 77 | } 78 | ) 79 | except ClientError as err: 80 | logger.error( 81 | "Couldn't add session %s to table %s. Here's why: %s: %s", 82 | session_id, self.table.name, 83 | err.response['Error']['Code'], err.response['Error']['Message']) 84 | raise 85 | 86 | def update_session_last_interaction(self, tenant_id, new_interaction): 87 | """ 88 | Updates last_interaction data for a session in the table. 89 | 90 | :param tenant_id: The tenant_id of the session to update. 91 | :param last_interaction: The last_interaction time of the session to update. 92 | :return: The fields that were updated, with their new values. 93 | """ 94 | try: 95 | response = self.table.update_item( 96 | Key={'TenantId': tenant_id}, 97 | UpdateExpression="set last_interaction=:new_interaction", 98 | ExpressionAttributeValues={':new_interaction': new_interaction}, 99 | ReturnValues="UPDATED_NEW") 100 | except ClientError as err: 101 | logger.error( 102 | "Couldn't update session %s in table %s. Here's why: %s: %s", 103 | tenant_id, self.table.name, 104 | err.response['Error']['Code'], err.response['Error']['Message']) 105 | raise 106 | else: 107 | return response['Attributes'] 108 | 109 | def update_session(self, tenant_id, new_interaction): 110 | """ 111 | Updates session_id and last_interaction data for a session in the table. 112 | 113 | :param tenant_id: The tenant_id of the session to update. 114 | :param last_interaction: The last_interaction time of the session to update. 115 | :return: The fields that were updated, with their new values. 116 | """ 117 | session_id = str(uuid.uuid4()) 118 | 119 | try: 120 | response = self.table.update_item( 121 | Key={'TenantId': tenant_id}, 122 | UpdateExpression="set session_id=:session_id, last_interaction=:new_interaction", 123 | ExpressionAttributeValues={':session_id': session_id, 124 | ':new_interaction': new_interaction}, 125 | ReturnValues="UPDATED_NEW") 126 | except ClientError as err: 127 | logger.error( 128 | "Couldn't update session %s in table %s. Here's why: %s: %s", 129 | tenant_id, self.table.name, 130 | err.response['Error']['Code'], err.response['Error']['Message']) 131 | raise 132 | else: 133 | return response['Attributes'] -------------------------------------------------------------------------------- /app/webapp.py: -------------------------------------------------------------------------------- 1 | """ 2 | A simple web application to implement a chatbot. This app uses Streamlit 3 | for the UI and the Python requests package to talk to an API endpoint that 4 | implements text generation and Retrieval Augmented Generation (RAG) using 5 | Amazon Bedrock and FAISS as the vector database. 
6 | """ 7 | import os 8 | import httpx 9 | import json 10 | from datetime import datetime 11 | import boto3 12 | import streamlit as st 13 | import requests as req 14 | from typing import List, Tuple, Dict 15 | import streamlit as st 16 | from streamlit.web.server.websocket_headers import _get_websocket_headers 17 | 18 | from sessions import Sessions 19 | 20 | table_name = os.getenv('SESSIONS_TABLE') 21 | 22 | # Create DynamoDB client 23 | dynamodb = boto3.resource("dynamodb") 24 | 25 | # global constants 26 | STREAMLIT_SESSION_VARS: List[Tuple] = [("generated", []), ("past", []), ("input", ""), ("stored_session", [])] 27 | HTTP_OK: int = 200 28 | 29 | MODE_RAG: str = 'RAG' 30 | MODE_VALUES: List[str] = [MODE_RAG] 31 | 32 | TEXT2TEXT_MODEL_LIST: List[str] = ["anthropic.claude-instant-v1"] 33 | EMBEDDINGS_MODEL_LIST: List[str] = ["amazon.titan-embed-text-v1"] 34 | 35 | # API endpoint 36 | api: str = "http://127.0.0.1:8000" 37 | api_rag_ep: str = f"{api}/api/v1/llm/rag" 38 | print(f"api_rag_ep={api_rag_ep}") 39 | 40 | #################### 41 | # Streamlit code 42 | #################### 43 | # Get request headers 44 | headers = _get_websocket_headers() 45 | tenantid = headers.get("X-Auth-Request-Tenantid") 46 | user_email = headers.get("X-Auth-Request-Email") 47 | 48 | tenant_id = tenantid + ":" + user_email 49 | dyn_resource = boto3.resource('dynamodb') 50 | IDLE_TIME = 600 # seconds 51 | current_time = int(datetime.now().timestamp()) 52 | 53 | # Page title 54 | st.set_page_config(page_title='Virtual assistant for knowledge base 👩‍💻', layout='wide') 55 | 56 | # keep track of conversations by using streamlit_session 57 | _ = [st.session_state.setdefault(k, v) for k,v in STREAMLIT_SESSION_VARS] 58 | 59 | # Define function to get user input 60 | def get_user_input() -> str: 61 | """ 62 | Returns the text entered by the user 63 | """ 64 | print(st.session_state) 65 | input_text = st.text_input("You: ", 66 | st.session_state["input"], 67 | key="input", 68 | placeholder="Ask me a question and I will consult the knowledge base to answer...", 69 | label_visibility='hidden') 70 | return input_text 71 | 72 | 73 | # sidebar with options 74 | with st.sidebar.expander("⚙️", expanded=True): 75 | text2text_model = st.selectbox(label='Text2Text Model', options=TEXT2TEXT_MODEL_LIST) 76 | embeddings_model = st.selectbox(label='Embeddings Model', options=EMBEDDINGS_MODEL_LIST) 77 | mode = st.selectbox(label='Mode', options=MODE_VALUES) 78 | 79 | # streamlit app layout sidebar + main panel 80 | # the main panel has a title, a sub header and user input textbox 81 | # and a text area for response and history 82 | st.title("👩‍💻 Virtual assistant for a knowledge base") 83 | st.subheader(f" Powered by :blue[{TEXT2TEXT_MODEL_LIST[0]}] for text generation and :blue[{EMBEDDINGS_MODEL_LIST[0]}] for embeddings") 84 | st.markdown(f"**Tenant ID:** :blue[{tenantid}]     **User:** :blue[{user_email}]") 85 | 86 | # get user input 87 | user_input: str = get_user_input() 88 | 89 | # based on the selected mode type call the appropriate API endpoint 90 | if user_input: 91 | try: 92 | sessions = Sessions(dyn_resource) 93 | sessions_exists = sessions.exists(table_name) 94 | if sessions_exists: 95 | session = sessions.get_session(tenant_id) 96 | if session: 97 | if ((current_time - session['last_interaction']) < IDLE_TIME): 98 | sessions.update_session_last_interaction(tenant_id, current_time) 99 | updated_session = sessions.get_session(tenant_id) 100 | print(updated_session['session_id']) 101 | else: 102 | 
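                    # Idle for longer than IDLE_TIME: update_session() below rotates to a new
                    # session_id (which starts a fresh chat history) and resets last_interaction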
sessions.update_session(tenant_id, current_time) 103 | updated_session = sessions.get_session(tenant_id) 104 | else: 105 | sessions.add_session(tenant_id) 106 | session = sessions.get_session(tenant_id) 107 | except Exception as e: 108 | print(f"Something went wrong: {e}") 109 | 110 | # headers for request and response encoding, same for both endpoints 111 | headers: Dict = {"accept": "application/json", 112 | "Content-Type": "application/json" 113 | } 114 | output: str = None 115 | if mode == MODE_RAG: 116 | user_session_id = tenant_id + ":" + session["session_id"] 117 | data = {"q": user_input, "user_session_id": user_session_id, "verbose": True} 118 | resp = req.post(api_rag_ep, headers=headers, json=data) 119 | if resp.status_code != HTTP_OK: 120 | output = resp.text 121 | else: 122 | resp = resp.json() 123 | sources = list(set([d['metadata']['source'] for d in resp['docs']])) 124 | output = f"{resp['answer']} \n \n Sources: {sources}" 125 | else: 126 | print("error") 127 | output = f"unhandled mode value={mode}" 128 | st.session_state.past.append(user_input) 129 | st.session_state.generated.append(output) 130 | 131 | # download the chat history 132 | download_str: List = [] 133 | with st.expander("Conversation", expanded=True): 134 | for i in range(len(st.session_state['generated'])-1, -1, -1): 135 | st.info(st.session_state["past"][i],icon="❓") 136 | st.success(st.session_state["generated"][i], icon="👩‍💻") 137 | download_str.append(st.session_state["past"][i]) 138 | download_str.append(st.session_state["generated"][i]) 139 | 140 | download_str = '\n'.join(download_str) 141 | if download_str: 142 | st.download_button('Download', download_str) 143 | -------------------------------------------------------------------------------- /chatbot-manifest/chatbot-vs.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: networking.istio.io/v1alpha3 2 | kind: VirtualService 3 | metadata: 4 | name: chatbot 5 | spec: 6 | hosts: 7 | - ${TENANT}.example.com 8 | gateways: 9 | - llm-demo-gateway-ns/llm-demo-gateway 10 | http: 11 | - route: 12 | - destination: 13 | host: chatbot.${NAMESPACE}.svc.cluster.local 14 | port: 15 | number: 80 16 | 17 | -------------------------------------------------------------------------------- /chatbot-manifest/chatbot.yaml: -------------------------------------------------------------------------------- 1 | ################################################################################################## 2 | # Tenant Service Account 3 | ################################################################################################## 4 | apiVersion: v1 5 | kind: ServiceAccount 6 | metadata: 7 | annotations: 8 | eks.amazonaws.com/role-arn: arn:aws:iam::${ACCOUNT_ID}:role/${EKS_CLUSTER_NAME}-${TENANT}-chatbot-access-role-${RANDOM_STRING} 9 | name: ${SA_NAME} 10 | --- 11 | apiVersion: apps/v1 12 | kind: Deployment 13 | metadata: 14 | name: chatbot 15 | labels: 16 | app: chatbot 17 | spec: 18 | replicas: 1 19 | selector: 20 | matchLabels: 21 | app: chatbot 22 | template: 23 | metadata: 24 | labels: 25 | workload-tier: frontend 26 | app: chatbot 27 | spec: 28 | serviceAccountName: ${SA_NAME} 29 | containers: 30 | - image: ${REPO_URI_CHATBOT}:latest 31 | imagePullPolicy: Always 32 | name: chatbot 33 | ports: 34 | - containerPort: 8501 35 | env: 36 | - name: ISSUER_URI 37 | value: ${ISSUER_URI} 38 | - name: SESSIONS_TABLE 39 | value: ${SESSIONS_TABLE} 40 | - image: ${REPO_URI_RAGAPI}:latest 41 | imagePullPolicy: Always 42 | name: 
ragapi 43 | ports: 44 | - containerPort: 8000 45 | env: 46 | - name: CONTEXTUAL_DATA_BUCKET 47 | value: contextual-data-${TENANT}-${RANDOM_STRING} 48 | - name: CHATHISTORY_TABLE 49 | value: ${CHATHISTORY_TABLE} 50 | - name: TEXT2TEXT_MODEL_ID 51 | value: ${TEXT2TEXT_MODEL_ID} 52 | - name: EMBEDDING_MODEL_ID 53 | value: ${EMBEDDING_MODEL_ID} 54 | - name: BEDROCK_SERVICE 55 | value: ${BEDROCK_SERVICE} 56 | - name: AWS_DEFAULT_REGION 57 | value: ${AWS_DEFAULT_REGION} 58 | --- 59 | kind: Service 60 | apiVersion: v1 61 | metadata: 62 | name: chatbot 63 | labels: 64 | app: chatbot 65 | spec: 66 | selector: 67 | app: chatbot 68 | ports: 69 | - port: 80 70 | name: http 71 | targetPort: 8501 72 | -------------------------------------------------------------------------------- /cleanup.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | source ~/.bash_profile 3 | 4 | echo "Deleting ECR Repositories" 5 | aws ecr delete-repository \ 6 | --force \ 7 | --repository-name ${EKS_CLUSTER_NAME}-${RANDOM_STRING}-chatbot 2>&1 > /dev/null 8 | 9 | aws ecr delete-repository \ 10 | --force \ 11 | --repository-name ${EKS_CLUSTER_NAME}-${RANDOM_STRING}-rag-api 2>&1 > /dev/null 12 | 13 | docker_images=$(docker images -a -q) 14 | if [ ! -z "$docker_images" ] 15 | then 16 | docker rmi -f $(docker images -a -q) 17 | fi 18 | 19 | echo "Deleting EKS Cluster" 20 | eksctl delete cluster --name ${EKS_CLUSTER_NAME} 21 | 22 | echo "Deleting Envoy Config S3 Bucket" 23 | aws s3 rm s3://envoy-config-${RANDOM_STRING} --recursive 24 | aws s3 rb s3://envoy-config-${RANDOM_STRING} --force 25 | 26 | echo "Detaching IAM policies from Envoy & Chabot SA Roles" 27 | aws iam detach-role-policy \ 28 | --role-name ${EKS_CLUSTER_NAME}-s3-access-role-${RANDOM_STRING} \ 29 | --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/s3-envoy-config-access-policy-${RANDOM_STRING} 30 | 31 | echo "Deleting S3 Access SA Roles in IAM" 32 | aws iam delete-role \ 33 | --role-name ${EKS_CLUSTER_NAME}-s3-access-role-${RANDOM_STRING} 34 | 35 | aws iam delete-policy \ 36 | --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/s3-envoy-config-access-policy-${RANDOM_STRING} 37 | 38 | TENANTS="tenanta tenantb" 39 | for t in $TENANTS 40 | do 41 | POOLNAME=${t}_chatbot_example_com_${RANDOM_STRING} 42 | POOLID=$(aws cognito-idp list-user-pools \ 43 | --max-results 20 \ 44 | --query 'UserPools[?Name==`'${POOLNAME}'`].Id' \ 45 | --output text) 46 | DOMAIN=$(aws cognito-idp describe-user-pool \ 47 | --user-pool-id ${POOLID} \ 48 | --query 'UserPool.Domain' \ 49 | --output text) 50 | 51 | aws cognito-idp delete-user-pool-domain \ 52 | --user-pool-id ${POOLID} \ 53 | --domain ${DOMAIN} 54 | 55 | aws cognito-idp delete-user-pool \ 56 | --user-pool-id ${POOLID} 57 | 58 | aws s3 rm s3://contextual-data-${t}-${RANDOM_STRING} --recursive 59 | aws s3 rb s3://contextual-data-${t}-${RANDOM_STRING} --force 60 | 61 | aws dynamodb delete-table \ 62 | --table-name Sessions_${t}_${RANDOM_STRING} 63 | aws dynamodb delete-table \ 64 | --table-name ChatHistory_${t}_${RANDOM_STRING} 65 | 66 | aws iam detach-role-policy \ 67 | --role-name ${EKS_CLUSTER_NAME}-${t}-chatbot-access-role-${RANDOM_STRING} \ 68 | --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/s3-contextual-data-access-policy-${t}-${RANDOM_STRING} 69 | 70 | aws iam detach-role-policy \ 71 | --role-name ${EKS_CLUSTER_NAME}-${t}-chatbot-access-role-${RANDOM_STRING} \ 72 | --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/dynamodb-access-policy-${t}-${RANDOM_STRING} 73 | 74 | aws iam 
delete-role \ 75 | --role-name ${EKS_CLUSTER_NAME}-${t}-chatbot-access-role-${RANDOM_STRING} 76 | 77 | aws iam delete-policy \ 78 | --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/s3-contextual-data-access-policy-${t}-${RANDOM_STRING} 79 | aws iam delete-policy \ 80 | --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/dynamodb-access-policy-${t}-${RANDOM_STRING} 81 | done 82 | 83 | echo "Removing KMS Key and Alias" 84 | export KMS_KEY_ALIAS=${EKS_CLUSTER_NAME}-${RANDOM_STRING} 85 | export MASTER_ARN=$(aws kms describe-key \ 86 | --key-id alias/${KMS_KEY_ALIAS} \ 87 | --query KeyMetadata.Arn --output text) 88 | 89 | aws kms disable-key \ 90 | --key-id ${MASTER_ARN} 91 | 92 | aws kms delete-alias \ 93 | --alias-name alias/${KMS_KEY_ALIAS} 94 | 95 | echo "Deleting EC2 Key-Pair" 96 | aws ec2 delete-key-pair \ 97 | --key-name ${EC2_KEY_NAME} 98 | 99 | echo "Removing Environemnt Variables from .bash_profile" 100 | sed -i '/export ACCOUNT_ID/d' ~/.bash_profile 101 | sed -i '/export AWS_REGION/d' ~/.bash_profile 102 | sed -i '/export AWS_DEFAULT_REGION/d' ~/.bash_profile 103 | sed -i '/export YAML_PATH/d' ~/.bash_profile 104 | sed -i '/export EKS_CLUSTER_NAME/d' ~/.bash_profile 105 | sed -i '/export RANDOM_STRING/d' ~/.bash_profile 106 | sed -i '/export EC2_KEY_NAME/d' ~/.bash_profile 107 | sed -i '/export KMS_KEY_ALIAS/d' ~/.bash_profile 108 | sed -i '/export MASTER_ARN/d' ~/.bash_profile 109 | sed -i '/export ISTIO_VERSION/d' ~/.bash_profile 110 | sed -i '/export ENVOY_CONFIG_BUCKET/d' ~/.bash_profile 111 | sed -i '/export ECR_REPO_CHATBOT/d' ~/.bash_profile 112 | sed -i '/export REPO_URI_CHATBOT/d' ~/.bash_profile 113 | sed -i '/export ECR_REPO_RAGAPI/d' ~/.bash_profile 114 | sed -i '/export REPO_URI_RAGAPI/d' ~/.bash_profile 115 | sed -i '/export ENVOY_IRSA/d' ~/.bash_profile 116 | sed -i '/export BEDROCK_REGION/d' ~/.bash_profile 117 | sed -i '/export BEDROCK_ENDPOINT/d' ~/.bash_profile 118 | sed -i '/export TEXT2TEXT_MODEL_ID/d' ~/.bash_profile 119 | sed -i '/export EMBEDDING_MODEL_ID/d' ~/.bash_profile 120 | sed -i '/export BEDROCK_SERVICE/d' ~/.bash_profile 121 | 122 | 123 | unset ACCOUNT_ID 124 | unset AWS_REGION 125 | unset AWS_DEFAULT_REGION 126 | unset YAML_PATH 127 | unset EKS_CLUSTER_NAME 128 | unset RANDOM_STRING 129 | unset EC2_KEY_NAME 130 | unset KMS_KEY_ALIAS 131 | unset MASTER_ARN 132 | unset ISTIO_VERSION 133 | unset ENVOY_CONFIG_BUCKET 134 | unset ECR_REPO_CHATBOT 135 | unset REPO_URI_CHATBOT 136 | unset ECR_REPO_RAGAPI 137 | unset REPO_URI_RAGAPI 138 | unset ENVOY_IRSA 139 | unset BEDROCK_REGION 140 | unset BEDROCK_ENDPOINT 141 | unset TEXT2TEXT_MODEL_ID 142 | unset EMBEDDING_MODEL_ID 143 | unset BEDROCK_SERVICE 144 | 145 | rm -rf $HOME/.ssh/id_rsa 146 | rm -rf certs 147 | rm -rf yaml 148 | rm -rf faiss_index* 149 | rm -rf bedrock-sdk/* -------------------------------------------------------------------------------- /configure-istio.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | . 
~/.bash_profile 3 | 4 | # Directory for generated certs 5 | mkdir certs 6 | 7 | echo "Creating Root CA Cert and Key" 8 | openssl req -x509 -sha256 -nodes -days 365 \ 9 | -newkey rsa:2048 \ 10 | -subj '/O=Cluster 1 CA/CN=ca.example.com' \ 11 | -keyout certs/ca_example_com.key \ 12 | -out certs/ca_example_com.crt 13 | 14 | echo "Creating Cert and Key for Istio Ingress Gateway" 15 | openssl req \ 16 | -newkey rsa:2048 -nodes \ 17 | -subj "/O=Cluster 1/CN=*.example.com" \ 18 | -keyout certs/example_com.key \ 19 | -out certs/example_com.csr 20 | 21 | openssl x509 -req -days 365 \ 22 | -set_serial 0 \ 23 | -CA certs/ca_example_com.crt \ 24 | -CAkey certs/ca_example_com.key \ 25 | -in certs/example_com.csr \ 26 | -out certs/example_com.crt 27 | 28 | echo "Creating TLS secret for Istio Ingress Gateway" 29 | kubectl create -n istio-ingress secret generic credentials \ 30 | --from-file=tls.key=certs/example_com.key \ 31 | --from-file=tls.crt=certs/example_com.crt 32 | 33 | echo "Creating namespaces" 34 | kubectl create namespace llm-demo-gateway-ns 35 | kubectl create namespace envoy-reverse-proxy-ns 36 | kubectl create namespace tenanta-oidc-proxy-ns 37 | kubectl create namespace tenantb-oidc-proxy-ns 38 | kubectl create namespace tenanta-ns 39 | kubectl create namespace tenantb-ns 40 | 41 | echo "Enabling sidecar injection in namespaces" 42 | kubectl label namespace envoy-reverse-proxy-ns istio-injection=enabled 43 | kubectl label namespace tenanta-oidc-proxy-ns istio-injection=enabled 44 | kubectl label namespace tenantb-oidc-proxy-ns istio-injection=enabled 45 | kubectl label namespace tenanta-ns istio-injection=enabled 46 | kubectl label namespace tenantb-ns istio-injection=enabled 47 | 48 | kubectl get namespace -L istio-injection 49 | 50 | echo "Applying STRICT mTLS Policy on all application namespaces" 51 | cat << EOF > ${YAML_PATH}/strictmtls.yaml 52 | --- 53 | apiVersion: security.istio.io/v1beta1 54 | kind: PeerAuthentication 55 | metadata: 56 | name: strict-mtls 57 | spec: 58 | mtls: 59 | mode: STRICT 60 | EOF 61 | kubectl -n tenanta-ns apply -f ${YAML_PATH}/strictmtls.yaml 62 | kubectl -n tenantb-ns apply -f ${YAML_PATH}/strictmtls.yaml 63 | kubectl -n envoy-reverse-proxy-ns apply -f ${YAML_PATH}/strictmtls.yaml 64 | 65 | kubectl -n tenanta-ns get PeerAuthentication 66 | kubectl -n tenantb-ns get PeerAuthentication 67 | kubectl -n envoy-reverse-proxy-ns get PeerAuthentication 68 | 69 | rm -rf ${YAML_PATH}/strictmtls.yaml 70 | 71 | echo "Deploying Istio Gateway resource" 72 | cat << EOF > ${YAML_PATH}/llm-demo-gateway.yaml 73 | --- 74 | apiVersion: networking.istio.io/v1alpha3 75 | kind: Gateway 76 | metadata: 77 | name: llm-demo-gateway 78 | spec: 79 | selector: 80 | istio: ingressgateway 81 | servers: 82 | - port: 83 | number: 443 84 | name: https 85 | protocol: HTTPS 86 | tls: 87 | mode: SIMPLE 88 | credentialName: credentials 89 | minProtocolVersion: TLSV1_2 90 | maxProtocolVersion: TLSV1_3 91 | hosts: 92 | - 'tenanta-ns/*' 93 | - 'tenantb-ns/*' 94 | EOF 95 | kubectl -n llm-demo-gateway-ns apply -f ${YAML_PATH}/llm-demo-gateway.yaml 96 | 97 | rm -rf ${YAML_PATH}/llm-demo-gateway.yaml 98 | 99 | # Copying Envoy Dynamic Config files to S3 bucket 100 | echo "Copying Envoy Dynamic Config files to S3 bucket" 101 | aws s3 cp envoy-config/envoy.yaml s3://${ENVOY_CONFIG_BUCKET} 102 | aws s3 cp envoy-config/envoy-lds.yaml s3://${ENVOY_CONFIG_BUCKET} 103 | aws s3 cp envoy-config/envoy-cds.yaml s3://${ENVOY_CONFIG_BUCKET} 104 | 105 | echo "Deploying Envoy Reverse Proxy" 106 | export DOLLAR='$' 
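# DOLLAR holds a literal '$' so that ${DOLLAR}{ENVOY_CONFIG_S3_BUCKET} in the unquoted heredoc below
# is written to the manifest as ${ENVOY_CONFIG_S3_BUCKET} and resolved from the container environment
# at runtime, instead of being expanded by this script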
107 | cat << EOF > ${YAML_PATH}/envoy-reverse-proxy.yaml 108 | --- 109 | apiVersion: v1 110 | kind: ServiceAccount 111 | metadata: 112 | annotations: 113 | eks.amazonaws.com/role-arn: arn:aws:iam::${ACCOUNT_ID}:role/${EKS_CLUSTER_NAME}-s3-access-role-${RANDOM_STRING} 114 | name: envoy-reverse-proxy-sa 115 | --- 116 | apiVersion: v1 117 | kind: Service 118 | metadata: 119 | name: envoy-reverse-proxy 120 | labels: 121 | app: envoy-reverse-proxy 122 | spec: 123 | selector: 124 | app: envoy-reverse-proxy 125 | ports: 126 | - port: 80 127 | name: http 128 | targetPort: 8000 129 | --- 130 | apiVersion: apps/v1 131 | kind: Deployment 132 | metadata: 133 | name: envoy-reverse-proxy 134 | labels: 135 | app: envoy-reverse-proxy 136 | spec: 137 | replicas: 2 138 | selector: 139 | matchLabels: 140 | app: envoy-reverse-proxy 141 | minReadySeconds: 60 142 | strategy: 143 | type: RollingUpdate 144 | rollingUpdate: 145 | maxSurge: 1 146 | maxUnavailable: 1 147 | template: 148 | metadata: 149 | labels: 150 | app: envoy-reverse-proxy 151 | annotations: 152 | eks.amazonaws.com/skip-containers: "envoy-reverse-proxy" 153 | spec: 154 | serviceAccountName: envoy-reverse-proxy-sa 155 | initContainers: 156 | - name: envoy-reverse-proxy-bootstrap 157 | image: public.ecr.aws/aws-cli/aws-cli:2.13.6 158 | volumeMounts: 159 | - name: envoy-config-volume 160 | mountPath: /config/envoy 161 | command: ["/bin/sh", "-c"] 162 | args: 163 | - aws s3 cp s3://${DOLLAR}{ENVOY_CONFIG_S3_BUCKET}/envoy.yaml /config/envoy; 164 | aws s3 cp s3://${DOLLAR}{ENVOY_CONFIG_S3_BUCKET}/envoy-lds.yaml /config/envoy; 165 | aws s3 cp s3://${DOLLAR}{ENVOY_CONFIG_S3_BUCKET}/envoy-cds.yaml /config/envoy; 166 | env: 167 | - name: ENVOY_CONFIG_S3_BUCKET 168 | value: ${ENVOY_CONFIG_BUCKET} 169 | containers: 170 | - name: envoy-reverse-proxy 171 | image: envoyproxy/envoy:v1.27.0 172 | args: ["-c", "/config/envoy/envoy.yaml"] 173 | imagePullPolicy: Always 174 | ports: 175 | - containerPort: 8000 176 | volumeMounts: 177 | - name: envoy-config-volume 178 | mountPath: /config/envoy 179 | volumes: 180 | - name: envoy-config-volume 181 | emptyDir: {} 182 | EOF 183 | kubectl -n envoy-reverse-proxy-ns apply -f ${YAML_PATH}/envoy-reverse-proxy.yaml 184 | 185 | rm -rf ${YAML_PATH}/envoy-reverse-proxy.yaml 186 | 187 | echo "Adding Istio External Authorization Provider" 188 | cat << EOF > ${YAML_PATH}/auth-provider.yaml 189 | --- 190 | apiVersion: v1 191 | data: 192 | mesh: |- 193 | accessLogFile: /dev/stdout 194 | defaultConfig: 195 | discoveryAddress: istiod.istio-system.svc:15012 196 | proxyMetadata: {} 197 | tracing: 198 | zipkin: 199 | address: zipkin.istio-system:9411 200 | enablePrometheusMerge: true 201 | rootNamespace: istio-system 202 | trustDomain: cluster.local 203 | extensionProviders: 204 | - name: rev-proxy 205 | envoyExtAuthzHttp: 206 | service: envoy-reverse-proxy.envoy-reverse-proxy-ns.svc.cluster.local 207 | port: "80" 208 | timeout: 1.5s 209 | includeHeadersInCheck: ["authorization", "cookie"] 210 | headersToUpstreamOnAllow: ["authorization", "path", "x-auth-request-user", "x-auth-request-email"] 211 | headersToDownstreamOnDeny: ["content-type", "set-cookie"] 212 | EOF 213 | kubectl -n istio-system patch configmap istio --patch "$(cat ${YAML_PATH}/auth-provider.yaml)" 214 | kubectl rollout restart deployment/istiod -n istio-system 215 | 216 | rm -rf ${YAML_PATH}/auth-provider.yaml 217 | 218 | echo "Configuring AuthorizationPolicy on Istio Ingress Gateway" 219 | kubectl apply -f ${YAML_PATH}/chatbot-auth-policy.yaml 220 | rm -rf 
${YAML_PATH}/chatbot-auth-policy.yaml -------------------------------------------------------------------------------- /create-dynamodb-table.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | source ~/.bash_profile 3 | 4 | TENANTS="tenanta tenantb" 5 | 6 | for t in $TENANTS 7 | do 8 | 9 | export TABLE_NAME="Sessions_${t}_${RANDOM_STRING}" 10 | 11 | echo "Creating DynamoDB table ${TABLE_NAME}" 12 | export DDB_TABLE=$(aws dynamodb create-table \ 13 | --table-name ${TABLE_NAME} \ 14 | --attribute-definitions \ 15 | AttributeName=TenantId,AttributeType=S \ 16 | --provisioned-throughput \ 17 | ReadCapacityUnits=5,WriteCapacityUnits=5 \ 18 | --key-schema \ 19 | AttributeName=TenantId,KeyType=HASH \ 20 | --table-class STANDARD 21 | ) 22 | 23 | export TABLE_NAME="ChatHistory_${t}_${RANDOM_STRING}" 24 | 25 | echo "Creating DynamoDB table ${TABLE_NAME}" 26 | export DDB_TABLE=$(aws dynamodb create-table \ 27 | --table-name ${TABLE_NAME} \ 28 | --attribute-definitions \ 29 | AttributeName=SessionId,AttributeType=S \ 30 | --provisioned-throughput \ 31 | ReadCapacityUnits=5,WriteCapacityUnits=5 \ 32 | --key-schema \ 33 | AttributeName=SessionId,KeyType=HASH \ 34 | --table-class STANDARD 35 | ) 36 | done -------------------------------------------------------------------------------- /data/Amazon_SageMaker_FAQs.csv: -------------------------------------------------------------------------------- 1 | What is Amazon SageMaker?,"Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows." 2 | "In which Regions is Amazon SageMaker available? 3 | ","For a list of the supported Amazon SageMaker AWS Regions, please visit the AWS Regional Services page. Also, for more information, see Regional endpoints in the AWS general reference guide." 4 | "What is the service availability of Amazon SageMaker? 5 | ","Amazon SageMaker is designed for high availability. There are no maintenance windows or scheduled downtimes. SageMaker APIs run in Amazon’s proven, high-availability data centers, with service stack replication configured across three facilities in each AWS Region to provide fault tolerance in the event of a server failure or Availability Zone outage." 6 | How does Amazon SageMaker secure my code?,"Amazon SageMaker stores code in ML storage volumes, secured by security groups and optionally encrypted at rest." 7 | What security measures does Amazon SageMaker have?,"Amazon SageMaker ensures that ML model artifacts and other system artifacts are encrypted in transit and at rest. Requests to the SageMaker API and console are made over a secure (SSL) connection. You pass AWS Identity and Access Management roles to SageMaker to provide permissions to access resources on your behalf for training and deployment. You can use encrypted Amazon Simple Storage Service (Amazon S3) buckets for model artifacts and data, as well as pass an AWS Key Management Service (KMS) key to SageMaker notebooks, training jobs, and endpoints, to encrypt the attached ML storage volume. Amazon SageMaker also supports Amazon Virtual Private Cloud (VPC) and AWS PrivateLink support." 8 | "Does Amazon SageMaker use or share models, training data, or algorithms?","Amazon SageMaker does not use or share customer models, training data, or algorithms. We know that customers care deeply about privacy and data security. 
That's why AWS gives you ownership and control over your content through simple, powerful tools that allow you to determine where your content will be stored, secure your content in transit and at rest, and manage your access to AWS services and resources for your users. We also implement responsible and sophisticated technical and physical controls that are designed to prevent unauthorized access to or disclosure of your content. As a customer, you maintain ownership of your content, and you select which AWS services can process, store, and host your content. We do not access your content for any purpose without your consent." 9 | "How am I charged for Amazon SageMaker? 10 | ","You pay for ML compute, storage, and data processing resources you use for hosting the notebook, training the model, performing predictions, and logging the outputs. Amazon SageMaker allows you to select the number and type of instance used for the hosted notebook, training, and model hosting. You pay only for what you use, as you use it; there are no minimum fees and no upfront commitments. See the Amazon SageMaker pricing page and the Amazon SageMaker Pricing calculator for details." 11 | "How can I optimize my Amazon SageMaker costs, such as detecting and stopping idle resources in order to avoid unnecessary charges?","There are several best practices you can adopt to optimize your Amazon SageMaker resource utilization. Some approaches involve configuration optimizations; others involve programmatic solutions. A full guide on this concept, complete with visual tutorials and code samples, can be found in this blog post." 12 | "What if I have my own notebook, training, or hosting environment?","Amazon SageMaker provides a full end-to-end workflow, but you can continue to use your existing tools with SageMaker. You can easily transfer the results of each stage in and out of SageMaker as your business requirements dictate." 13 | Is R supported with Amazon SageMaker?,"Yes, R is supported with Amazon SageMaker. You can use R within SageMaker notebook instances, which include a preinstalled R kernel and the reticulate library. Reticulate offers an R interface for the Amazon SageMaker Python SDK, enabling ML practitioners to build, train, tune, and deploy R models. 14 | " 15 | How can I check for imbalances in my model?,"Amazon SageMaker Clarify helps improve model transparency by detecting statistical bias across the entire ML workflow. SageMaker Clarify checks for imbalances during data preparation, after training, and ongoing over time, and also includes tools to help explain ML models and their predictions. Findings can be shared through explainability reports." 16 | What kind of bias does Amazon SageMaker Clarify detect?,"Measuring bias in ML models is a first step to mitigating bias. Bias may be measured before training and after training, as well as for inference for a deployed model. Each measure of bias corresponds to a different notion of fairness. Even considering simple notions of fairness leads to many different measures applicable in various contexts. You need to choose bias notions and metrics that are valid for the application and the situation under investigation. SageMaker currently supports the computation of different bias metrics for training data (as part of SageMaker data preparation), for the trained model (as part of Amazon SageMaker Experiments), and for inference for a deployed model (as part of Amazon SageMaker Model Monitor). 
For example, before training, we provide metrics for checking whether the training data is representative (that is, whether one group is underrepresented) and whether there are differences in the label distribution across groups. After training or during deployment, metrics can be helpful to measure whether (and by how much) the performance of the model differs across groups. For example, start by comparing the error rates (how likely a model's prediction is to differ from the true label) or break further down into precision (how likely a positive prediction is to be correct) and recall (how likely the model will correctly label a positive example)." 17 | How does Amazon SageMaker Clarify improve model explainability?,Amazon SageMaker Clarify is integrated with Amazon SageMaker Experiments to provide a feature importance graph detailing the importance of each input for your model’s overall decision-making process after the model has been trained. These details can help determine if a particular model input has more influence than it should on overall model behavior. SageMaker Clarify also makes explanations for individual predictions available via an API.  18 | What is Amazon SageMaker Studio?,"Amazon SageMaker Studio provides a single, web-based visual interface where you can perform all ML development steps. SageMaker Studio gives you complete access, control, and visibility into each step required to prepare data and build, train, and deploy models. You can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, compare results, and deploy models to production all in one place, making you much more productive. All ML development activities including notebooks, experiment management, automatic model creation, debugging and profiling, and model drift detection can be performed within the unified SageMaker Studio visual interface." 19 | What is RStudio on Amazon SageMaker?,"RStudio on Amazon SageMaker is the first fully managed RStudio Workbench in the cloud. You can quickly launch the familiar RStudio integrated development environment (IDE) and dial up and down the underlying compute resources without interrupting your work, making it easy to build machine learning (ML) and analytics solutions in R at scale. You can seamlessly switch between the RStudio IDE and Amazon SageMaker Studio notebooks for R and Python development. All your work, including code, datasets, repositories, and other artifacts, is automatically synchronized between the two environments to reduce context switch and boost productivity." 20 | "How does Amazon SageMaker Studio pricing work? 21 | ",There is no additional charge for using Amazon SageMaker Studio. You pay only for the underlying compute and storage charges on the services you use within Amazon SageMaker Studio. 22 | In which Regions is Amazon SageMaker Studio supported?,You can find the Regions where Amazon SageMaker Studio is supported in the documentation here. 23 | What ML governance tools does Amazon SageMaker provide?,"Amazon SageMaker provides purpose-built ML governance tools across the ML lifecycle. With SageMaker Role Manager, administrators can define minimum permissions in minutes. SageMaker Model Cards makes it easier to capture, retrieve, and share essential model information from conception to deployment, and SageMaker Model Dashboard keeps you informed on production model behavior, all in one place. View more details." 
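The SageMaker Clarify entries earlier in this FAQ describe bias checks during data preparation and after training. As a rough illustration only, the sketch below runs a pre-training bias analysis with the SageMaker Python SDK; the bucket, role ARN, column names, and facet values are placeholders, and the exact arguments should be verified against the current `sagemaker.clarify` documentation.

```python
# Hedged sketch: pre-training bias report with SageMaker Clarify.
# The S3 paths, role ARN, and column names below are placeholders.
import sagemaker
from sagemaker import clarify

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train/train.csv",  # placeholder input
    s3_output_path="s3://my-bucket/clarify-output/",      # placeholder output
    label="target",                                       # placeholder label column
    headers=["target", "age", "income", "gender"],        # placeholder headers
    dataset_type="text/csv",
)

bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],  # value treated as the positive outcome
    facet_name="gender",            # column to check for imbalance
)

# Launches a processing job and writes a bias report to the output path.
clarify_processor.run_pre_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
)
```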
24 | What does Amazon SageMaker Role Manager do?,"You can define minimum permissions in minutes with Amazon SageMaker Role Manager. SageMaker Role Manager provides a baseline set of permissions for ML activities and personas with a catalog of pre-built IAM policies. You can keep the baseline permissions, or customize them further based on your specific needs. With a few self-guided prompts, you can quickly input common governance constructs such as network access boundaries and encryption keys. SageMaker Role Manager will then generate the IAM policy automatically. You can discover the generated role and associated policies through the AWS IAM console. To further tailor the permissions to your use case, attach your managed IAM policies to the IAM role that you create with SageMaker Role Manager. You can also add tags to help identify the role and organize across AWS services." 25 | What does Amazon SageMaker Model Cards do?,Amazon SageMaker Model Cards helps you centralize and standardize model documentation throughout the ML lifecycle by creating a single source of truth for model information. SageMaker Model Cards auto-populates training details to accelerate the documentation process. You can also add details such as the purpose of the model and the performance goals. You can attach model evaluation results to your model card and provide visualizations to gain key insights into model performance. SageMaker Model Cards can easily be shared with others by exporting to a pdf format. 26 | What does Amazon SageMaker Model Dashboard do? ,"Amazon SageMaker Model Dashboard gives you a comprehensive overview of deployed models and endpoints, letting you track resources and model behavior violations through one pane. It allows you to monitor model behavior in four dimensions, including data and model quality, and bias and feature attribution drift through its integration with Amazon SageMaker Model Monitor and Amazon SageMaker Clarify. SageMaker Model Dashboard also provides an integrated experience to set up and receive alerts for missing and inactive model monitoring jobs, and deviations in model behavior for model quality, data quality, bias drift, and feature attribution drift. You can further inspect individual models and analyze factors impacting model performance over time. Then, you can follow up with ML practitioners to take corrective measures." 27 | What is Amazon SageMaker Autopilot?,"Amazon SageMaker Autopilot is the industry’s first automated machine learning capability that gives you complete control and visibility into your ML models. SageMaker Autopilot automatically inspects raw data, applies feature processors, picks the best set of algorithms, trains and tunes multiple models, tracks their performance, and then ranks the models based on performance, all with just a few clicks. The result is the best-performing model that you can deploy at a fraction of the time normally required to train the model. You get full visibility into how the model was created and what’s in it, and SageMaker Autopilot integrates with Amazon SageMaker Studio. You can explore up to 50 different models generated by SageMaker Autopilot inside SageMaker Studio so it’s easy to pick the best model for your use case. SageMaker Autopilot can be used by people without ML experience to easily produce a model, or it can be used by experienced developers to quickly develop a baseline model on which teams can further iterate." 
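The Autopilot answer above focuses on the Studio experience, but the same capability can be driven from code. Below is a minimal sketch using the SageMaker Python SDK's `AutoML` estimator; the S3 locations, role ARN, and target column are placeholders, and argument names should be confirmed against the SDK version in use.

```python
# Hedged sketch: launch a SageMaker Autopilot job programmatically.
# The S3 locations, role ARN, and target column are placeholders.
import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

automl = AutoML(
    role=role,
    target_attribute_name="churn",            # placeholder target column
    sagemaker_session=session,
    max_candidates=10,                        # cap the number of candidate models
    output_path="s3://my-bucket/autopilot/",  # placeholder output location
)

# Autopilot inspects the CSV, infers the problem type, engineers features,
# and trains and tunes candidate models.
automl.fit(inputs="s3://my-bucket/train/train.csv", wait=False)

# Once complete, the best candidate can be inspected and deployed, e.g.:
# best = automl.best_candidate()
# automl.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```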
28 | What built-in algorithms are supported in Amazon SageMaker Autopilot?,Amazon SageMaker Autopilot supports 2 built-in algorithms: XGBoost and Linear Learner. 29 | Can I stop an Amazon SageMaker Autopilot job manually?,"Yes. You can stop a job at any time. When an Amazon SageMaker Autopilot job is stopped, all ongoing trials will be stopped and no new trial will be started." 30 | How do I get started with Amazon SageMaker quickly?,"Amazon SageMaker JumpStart helps you quickly and easily get started with ML. SageMaker JumpStart provides a set of solutions for the most common use cases that can be deployed readily with just a few clicks. The solutions are fully customizable and showcase the use of AWS CloudFormation templates and reference architectures so you can accelerate your ML journey. SageMaker JumpStart also supports one-click deployment and fine-tuning of more than 150 popular open-source models such as transformer, object detection, and image classification models." 31 | Which open-source models are supported with Amazon SageMaker JumpStart?,"Amazon SageMaker JumpStart includes 150+ pre-trained open-source models from PyTorch Hub and TensorFlow Hub. For vision tasks such as image classification and object detection, you can use models such as ResNet, MobileNet, and Single-Shot Detector (SSD). For text tasks such as sentence classification, text classification, and question answering, you can use models such as BERT, RoBERTa, and DistilBERT." 32 | "What solutions come pre-built with Amazon SageMaker JumpStart? 33 | ","SageMaker JumpStart includes solutions that are preconfigured with all necessary AWS services to launch a solution into production. Solutions are fully customizable so you can easily modify them to fit your specific use case and dataset. You can use solutions for over 15 use cases including demand forecasting, fraud detection, and predictive maintenance, and readily deploy solutions with just a few clicks. For more information about all solutions available, visit the SageMaker getting started page. " 34 | How can I share ML artifacts with others within my organization?,"With Amazon SageMaker JumpStart, data scientists and ML developers can easily share ML artifacts, including notebooks and models, within their organization. Administrators can set up a repository that is accessible by a defined set of users. All users with permission to access the repository can browse, search, and use models and notebooks as well as the public content inside of SageMaker JumpStart. Users can select artifacts to train models, deploy endpoints, and execute notebooks in SageMaker JumpStart." 35 | Why should I use Amazon SageMaker JumpStart to share ML artifacts with others within my organization?,"With Amazon SageMaker JumpStart, you can accelerate time-to-market when building ML applications. Models and notebooks built by one team inside of your organization can be easily shared with other teams within your organization with just a few clicks. Internal knowledge sharing and asset reuse can significantly increase the productivity of your organization." 36 | "How does Amazon SageMaker JumpStart pricing work? 37 | ","You are charged for the AWS services launched from Amazon SageMaker JumpStart, such as training jobs and endpoints, based on SageMaker pricing. There is no additional charge for using SageMaker JumpStart." 
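Since the JumpStart answers above mention one-click deployment of pre-trained models, here is a rough code-level equivalent with the SageMaker Python SDK. The model ID and instance type are illustrative assumptions; valid JumpStart model IDs should be looked up in the SageMaker documentation or the Studio UI.

```python
# Hedged sketch: deploy a JumpStart pre-trained model as a real-time endpoint.
# The model_id and instance type are examples, not recommendations.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-text2text-flan-t5-base")  # example model ID

# Creates a SageMaker endpoint backed by the pre-trained model.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
)

# response = predictor.predict({"inputs": "..."})  # payload format depends on the model

# Clean up when done to stop incurring charges.
predictor.delete_endpoint()
```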
38 | What is Amazon SageMaker Canvas?,"Amazon SageMaker Canvas is a no-code service with an intuitive, point-and-click interface that lets you create highly accurate ML-based predictions from your data. SageMaker Canvas lets you access and combine data from a variety of sources using a drag-and-drop user interface, automatically cleaning and preparing data to minimize manual cleanup. SageMaker Canvas applies a variety of state-of-the-art ML algorithms to find highly accurate predictive models and provides an intuitive interface to make predictions. You can use SageMaker Canvas to make much more precise predictions in a variety of business applications and easily collaborate with data scientists and analysts in your enterprise by sharing your models, data, and reports. To learn more about SageMaker Canvas, please visit the SageMaker Canvas FAQ page." 39 | "How does Amazon SageMaker Canvas pricing work? 40 | ","With Amazon SageMaker Canvas, you pay based on usage. SageMaker Canvas lets you interactively ingest, explore, and prepare your data from multiple sources, train highly accurate ML models with your data, and generate predictions. There are two components that determine your bill: session charges based on the number of hours for which SageMaker Canvas is used or logged into, and charges for training the model based on the size of the dataset used to build the model. For more information see the SageMaker Canvas pricing page." 41 | How can I build a continuous integration and delivery (CI/CD) pipeline with Amazon SageMaker?,"Amazon SageMaker Pipelines helps you create fully automated ML workflows from data preparation through model deployment so you can scale to thousands of ML models in production. SageMaker Pipelines comes with a Python SDK that connects to Amazon SageMaker Studio so you can take advantage of a visual interface to build each step of the workflow. Then using a single API, you can connect each step to create an end-to-end workflow. SageMaker Pipelines takes care of managing data between steps, packaging the code recipes, and orchestrating their execution, reducing months of coding to a few hours. Every time a workflow executes, a complete record of the data processed and actions taken is kept so data scientists and ML developers can quickly debug problems." 42 | How do I view all my trained models to choose the best model to move to production?,"Amazon SageMaker Pipelines provides a central repository of trained models called a model registry. You can discover models and access the model registry visually through SageMaker Studio or programmatically through the Python SDK, making it easy to choose your desired model to deploy into production." 43 | What components of Amazon SageMaker can be added to Amazon SageMaker Pipelines?,"The components available through Amazon SageMaker Studio, including Amazon SageMaker Clarify, Amazon SageMaker Data Wrangler, Amazon SageMaker Feature Store, Amazon SageMaker Experiments, Amazon SageMaker Debugger, and Amazon SageMaker Model Monitor, can be added to SageMaker Pipelines." 44 | How do I track my model components across the entire ML workflow?,"Amazon SageMaker Pipelines automatically keeps track of all model constituents and keeps an audit trail of all changes, thereby eliminating manual tracking, and can help you achieve compliance goals. You can track data, code, trained models, and more with SageMaker Pipelines."
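To make the Pipelines answers above more concrete, here is a heavily simplified sketch of defining and starting a pipeline with the SageMaker Python SDK. The role ARN, S3 paths, and hyperparameters are placeholders, and a real pipeline would typically chain processing, training, evaluation, and model-registration steps.

```python
# Hedged sketch: a one-step SageMaker Pipeline wrapping a training job.
# Role ARN, S3 paths, and hyperparameters are placeholders.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Built-in XGBoost container used purely as an example algorithm.
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/pipeline-output/",  # placeholder
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

train_step = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={"train": TrainingInput("s3://my-bucket/train/", content_type="text/csv")},
)

pipeline = Pipeline(name="demo-pipeline", steps=[train_step], sagemaker_session=session)

# Creates (or updates) the pipeline definition and starts an execution.
pipeline.upsert(role_arn=role)
execution = pipeline.start()
```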
45 | How does the pricing for Amazon SageMaker Pipelines work?,There is no additional charge for Amazon SageMaker Pipelines. You pay only for the underlying compute or any separate AWS services you use within SageMaker Pipelines. 46 | Can I use Kubeflow with Amazon SageMaker?,"Yes. Amazon SageMaker Components for Kubeflow Pipelines are open-source plugins that allow you to use Kubeflow Pipelines to define your ML workflows and use SageMaker for the data labeling, training, and inference steps. Kubeflow Pipelines is an add-on to Kubeflow that lets you build and deploy portable and scalable end-to-end ML pipelines. However, when using Kubeflow Pipelines, ML ops teams need to manage a Kubernetes cluster with CPU and GPU instances and keep its utilization high at all times to reduce operational costs. Maximizing the utilization of a cluster across data science teams is challenging and adds additional operational overhead to the ML ops teams. As an alternative to an ML-optimized Kubernetes cluster, with SageMaker Components for Kubeflow Pipelines you can take advantage of powerful SageMaker features such as data labeling, fully managed large-scale hyperparameter tuning and distributed training jobs, one-click secure and scalable model deployment, and cost-effective training through Amazon EC2 Spot instances without needing to configure and manage Kubernetes clusters specifically to run the ML jobs." 47 | How does Amazon SageMaker Components for Kubeflow Pipelines pricing work?,There is no additional charge for using Amazon SageMaker Components for Kubeflow Pipelines.  48 | How can Amazon SageMaker prepare data for ML?,"Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for ML. From a single interface in Amazon SageMaker Studio, you can browse and import data from Amazon S3, Amazon Athena, Amazon Redshift, AWS Lake Formation, Amazon SageMaker Feature Store, and Snowflake in just a few clicks. You can also query and import data that is transferred from over 40 data sources and registered in AWS Glue Data Catalog by Amazon AppFlow. SageMaker Data Wrangler will automatically load, aggregate, and display the raw data. After importing your data into SageMaker Data Wrangler, you can see automatically generated column summaries and histograms. You can then dig deeper to understand your data and identify potential errors with the SageMaker Data Wrangler Data Quality and Insights report, which provides summary statistics and data quality warnings. You can also run bias analysis supported by Amazon SageMaker Clarify directly from SageMaker Data Wrangler to detect potential bias during data preparation. From there, you can use SageMaker Data Wrangler’s pre-built transformations to prepare your data. Once your data is prepared, you can build fully automated ML workflows with Amazon SageMaker Pipelines or import that data into Amazon SageMaker Feature Store." 49 | How can I create model features with Amazon SageMaker Data Wrangler?,"Without writing a single line of code, Amazon SageMaker Data Wrangler can automatically transform your data into new features. SageMaker Data Wrangler offers a selection of preconfigured data transforms, impute missing data, one-hot encoding, dimensionality reduction using principal components analysis (PCA), as well as time-series specific transformers. For example, you can convert a text field column into a numerical column with a single click. You can also author a code snippet from SageMaker Data Wrangler’s library of snippets." 
50 | How can I visualize my data in Amazon SageMaker Data Wrangler?,"Amazon SageMaker Data Wrangler helps you understand your data and identify potential errors and extreme values with a set of robust pre-configured visualization templates. Histograms, scatter plots, and ML-specific visualizations, such as target leakage detection, are all available without writing a single line of code. You can also create and edit your own visualizations." 51 | How does the pricing for Amazon SageMaker Data Wrangler work?,"You pay for all ML compute, storage, and data processing resources you use for Amazon SageMaker Data Wrangler. You can review all the details of SageMaker Data Wrangler pricing here. As part of the AWS Free Tier, you can also get started with SageMaker Data Wrangler for free." 52 | How can I train machine learning models with data prepared in Amazon SageMaker Data Wrangler?,"Amazon SageMaker Data Wrangler provides a unified experience enabling you to prepare data and seamlessly train a machine learning model in Amazon SageMaker Autopilot. SageMaker Autopilot automatically builds, trains, and tunes the best ML models based on your data. With SageMaker Autopilot, you still maintain full control and visibility of your data and model. You can also use features prepared in SageMaker Data Wrangler with your existing models. You can configure Amazon SageMaker Data Wrangler processing jobs to run as part of your SageMaker training pipeline either by configuring the job in the user interface (UI) or exporting a notebook with the orchestration code." 53 | How does Amazon SageMaker Data Wrangler handle new data when I have prepared my features on historical data?,"You can configure and launch Amazon SageMaker processing jobs directly from the SageMaker Data Wrangler UI, including scheduling your data processing job and parametrizing your data sources to easily transform new batches of data at scale." 54 | How does Amazon SageMaker Data Wrangler work with my CI/CD processes?,"Once you have prepared your data, Amazon SageMaker Data Wrangler provides different options for promoting your SageMaker Data Wrangler flow to production and integrates seamlessly with MLOps and CI/CD capabilities. You can configure and launch SageMaker processing jobs directly from the SageMaker Data Wrangler UI, including scheduling your data processing job and parametrizing your data sources to easily transform new batches of data at scale. Alternatively, SageMaker Data Wrangler integrates seamlessly with SageMaker processing and the SageMaker Spark container, allowing you to easily use SageMaker SDKs to integrate SageMaker Data Wrangler into your production workflow." 55 | "What model does Amazon SageMaker Data Wrangler Quick Model use? 56 | ","In a few clicks of a button, Amazon SageMaker Data Wrangler splits and trains an XGBoost model with default hyperparameters. Based on the problem type, SageMaker Data Wrangler provides a model summary, feature summary, and confusion matrix to quickly give you insight so you can iterate on your data preparation flows. " 57 | "What size data does Amazon SageMaker Data Wrangler support? 58 | ","Amazon SageMaker Data Wrangler supports various sampling techniques–such as top-K, random, and stratified sampling for importing data—so that you can quickly transform your data using SageMaker Data Wrangler’s UI. If you are using large or wide datasets, you can increase the SageMaker Data Wrangler instance size to improve performance. 
Once you have created your flow, you can process your full dataset using SageMaker Data Wrangler processing jobs." 59 | Does Amazon SageMaker Data Wrangler work with Amazon SageMaker Feature Store?,You can configure Amazon SageMaker Feature Store as a destination for your features prepared in Amazon SageMaker Data Wrangler. This can be done directly in the UI or you can export a notebook generated specifically for processing data with SageMaker Feature Store as the destination. 60 | How do I store features for my ML models?,"Amazon SageMaker Feature Store provides a central repository for data features with low latency (milliseconds) reads and writes. Features can be stored, retrieved, discovered, and shared through SageMaker Feature Store for easy reuse across models and teams with secure access and control. SageMaker Feature Store supports both online and offline features generated via batch or streaming pipelines. It supports backfilling the features and provides both online and offline stores to maintain parity between features used in model training and inference." 61 | How do I maintain consistency between online and offline features?,Amazon SageMaker Feature Store automatically maintains consistency between online and offline features without additional management or code. SageMaker Feature Store is fully managed and maintains consistency across training and inference environments. 62 | How can I reproduce a feature from a given moment in time?,Amazon SageMaker Feature Store maintains time stamps for all features at every instance of time. This helps you retrieve features at any period of time for business or compliance requirements. You can easily explain model features and their values from when they were first created to the present time by reproducing the model from a given moment in time. 63 | What are offline features?,"Offline features are used for training because you need access to very large volumes over a long period of time. These features are served from a high-throughput, high-bandwidth repository." 64 | What are online features?,Online features are used in applications required to make real-time predictions. Online features are served from a high-throughput repository with single-digit millisecond latency for fast predictions. 65 | How does pricing work for Amazon SageMaker Feature Store?,"You can get started with Amazon SageMaker Feature Store for free, as part of the AWS Free Tier. With SageMaker Feature Store, you pay for writing into the feature store, and reading and storage from the online feature store. For pricing details, see the SageMaker Pricing Page." 66 | What does Amazon SageMaker offer for data labeling?,"Amazon SageMaker provides two data labeling offerings, Amazon SageMaker Ground Truth Plus and Amazon SageMaker Ground Truth. Both options allow you to identify raw data, such as images, text files, and videos, and add informative labels to create high-quality training datasets for your ML models. To learn more, visit the SageMaker Data Labeling webpage." 67 | What is geospatial data? ,"Geospatial data represents features or objects on the Earth’s surface. The first type of geospatial data is vector data which uses two-dimensional geometries such as, points, lines, or polygons to represent objects like roads and land boundaries. The second type of geospatial data is raster data such as imagery captured by satellite, aerial platforms, or remote sensing data. This data type uses a matrix of pixels to define where features are located. 
You can use raster formats for storing data that varies. A third type of geospatial data is geo-tagged location data. It includes points of interest—for example, the Eiffel Tower—location tagged social media posts, latitude and longitude coordinates, or different styles and formats of street addresses. " 68 | What are Amazon SageMaker geospatial capabilities? ,"Amazon SageMaker geospatial capabilities make it easier for data scientists and machine learning (ML) engineers to build, train, and deploy ML models for making predictions using geospatial data. You can bring your own data, for example, Planet Labs satellite data from Amazon S3, or acquire data from Open Data on AWS, Amazon Location Service, and other Amazon SageMaker geospatial data sources. " 69 | Why should I use geospatial ML on Amazon SageMaker?,"You can use Amazon SageMaker geospatial capabilities to make predictions on geospatial data faster than do-it-yourself solutions. Amazon SageMaker geospatial capabilities make it easier to access geospatial data from your existing customer data lakes, open-source datasets, and other Amazon SageMaker geospatial data sources. Amazon SageMaker geospatial capabilities minimize the need for building custom infrastructure and data pre-processing functions by offering purpose-built algorithms for efficient data preparation, model training, and inference. You can also create and share custom visualizations and data with your organization from Amazon SageMaker Studio. Amazon SageMaker geospatial capabilities include pre-trained models for common uses in agriculture, real estate, insurance, and financial services." 70 | What are Amazon SageMaker Studio notebooks?,"Amazon SageMaker Studio notebooks are quick start, collaborative, managed Jupyter notebooks. Amazon SageMaker Studio notebooks integrate with purpose-built ML tools in SageMaker and other AWS services for end-to-end ML development in Amazon SageMaker Studio, the fully integrated development environment (IDE) for ML." 71 | How are Amazon SageMaker Studio notebooks different from the instance-based notebooks offering?,"SageMaker Studio notebooks offer a few important features that differentiate them from the instance-based notebooks. With the Studio notebooks, you can quickly launch notebooks without needing to manually provision an instance and waiting for it to be operational. The startup time of launching the UI to read and execute a notebook is faster than the instance-based notebooks. You also have the flexibility to choose from a large collection of instance types from within the UI at any time. You do not need to go to the AWS Management Console to start new instances and port over your notebooks. Each user has an isolated home directory independent of a particular instance. This directory is automatically mounted into all notebook servers and kernels as they’re started, so you can access your notebooks and other files even when you switch instances to view and run your notebooks. SageMaker Studio notebooks are integrated with AWS IAM Identity Center (successor to AWS SSO), making it easy to use your organizational credentials to access the notebooks. Notebook sharing is an integrated feature in SageMaker Studio notebooks. You can share your notebooks with your peers using a single click or even co-edit a single notebook at the same time." 72 | How do Amazon SageMaker Studio notebooks work?,"Amazon SageMaker Studio notebooks are one-click Jupyter notebooks that can be spun quickly. 
The underlying compute resources are fully elastic, so you can easily dial up or down the available resources and the changes take place automatically in the background without interrupting your work. SageMaker also enables one-click sharing of notebooks. You can easily share notebooks with others and they’ll get the exact same notebook, saved in the same place. With SageMaker Studio notebooks you can sign in with your corporate credentials using AWS IAM Identity Center (successor to AWS SSO). Sharing notebooks within and across teams is easy, since the dependencies needed to run a notebook are automatically tracked in work images that are encapsulated with the notebook as it is shared." 73 | What are the shared spaces in Amazon SageMaker?,"Machine learning practitioners can create a shared workspace where teammates can read and edit Amazon SageMaker Studio notebooks together. By using the shared spaces, teammates can coedit the same notebook file, run notebook code simultaneously, and review the results together to eliminate back and forth and streamline collaboration. In the shared spaces, ML teams will have built-in support for services like BitBucket and AWS CodeCommit, so they can easily manage different versions of their notebook and compare changes over time. Any resources created from within the notebooks, such as experiments and ML models, are automatically saved and associated with the specific workspace where they were created so teams can more easily stay organized and accelerate ML model development." 74 | How do Amazon SageMaker Studio notebooks work with other AWS services?,"Amazon SageMaker Studio notebooks give you access to all SageMaker features, such as distributed training, batch transform, hosting, and experiment management. You can access other services such as datasets in Amazon S3, Amazon Redshift, AWS Glue, Amazon EMR, or AWS Lake Formation from SageMaker notebooks." 75 | How does Amazon SageMaker Studio notebooks pricing work?,"You pay for both compute and storage when you use SageMaker Studio notebooks. See Amazon SageMaker Pricing for charges by compute instance type. Your notebooks and associated artifacts such as data files and scripts are persisted on Amazon EFS. See Amazon EFS Pricing for storage charges. As part of the AWS Free Tier, you can get started with Amazon SageMaker Studio notebooks for free." 76 | Do I get charged separately for each notebook created and run in SageMaker Studio?,"No. You can create and run multiple notebooks on the same compute instance. You pay only for the compute that you use, not for individual items. You can read more about this in our metering guide. In addition to the notebooks, you can also start and run terminals and interactive shells in SageMaker Studio, all on the same compute instance. Each application runs within a container or image. SageMaker Studio provides several built-in images purpose-built and preconfigured for data science and ML. You can read more about the SageMaker Studio developer environment in the guide for using SageMaker Studio notebooks." 77 | How do I monitor and shut down the resources used by my notebooks?,You can monitor and shut down the resources used by your SageMaker Studio notebooks through both the SageMaker Studio visual interface and the AWS Management Console. See the documentation for more details. 78 | "I’m running a SageMaker Studio notebook.
Will I still be charged if I close my browser, close the notebook tab, or just leave the browser open?","Yes, you will continue to be charged for the compute. This is similar to starting Amazon EC2 instances in the AWS Management Console and then closing the browser. The Amazon EC2 instances are still running and you still incur charges unless you explicitly shut down the instance." 79 | Do I get charged for creating and setting up an Amazon SageMaker Studio domain?,"No, you don’t get charged for creating or configuring an Amazon SageMaker Studio domain, including adding, updating, and deleting user profiles." 80 | How do I see the itemized charges for Amazon SageMaker Studio notebooks or other Amazon SageMaker services?,"As an admin, you can view the list of itemized charges for Amazon SageMaker, including SageMaker Studio, in the AWS Billing console. From the AWS Management Console for SageMaker, choose Services on the top menu, type ""billing"" in the search box and select Billing from the dropdown, then select Bills on the left panel. In the Details section, you can click on SageMaker to expand the list of Regions and drill down to the itemized charges." 81 | What is Amazon SageMaker Studio Lab?,"Amazon SageMaker Studio Lab is a free ML development environment that provides the compute, storage (up to 15 GB), and security—all at no cost—for anyone to learn and experiment with ML. All you need to get started is a valid email ID; you don’t need to configure infrastructure or manage identity and access or even sign up for an AWS account. SageMaker Studio Lab accelerates model building through GitHub integration, and it comes preconfigured with the most popular ML tools, frameworks, and libraries to get you started immediately. SageMaker Studio Lab automatically saves your work so you don’t need to restart between sessions. It’s as easy as closing your laptop and coming back later." 82 | Why should I use Amazon SageMaker Studio Lab?,"Amazon SageMaker Studio Lab is for students, researchers, and data scientists who need a free notebook development environment with no setup required for their ML classes and experiments. SageMaker Studio Lab is ideal for users who do not need a production environment but still want a subset of the SageMaker functionality to improve their ML skills. SageMaker sessions are automatically saved, enabling users to pick up where they left off for each user session." 83 | How does Amazon SageMaker Studio Lab work with other AWS services?,"Amazon SageMaker Studio Lab is a service built on AWS and uses many of the same core services as Amazon SageMaker Studio, such as Amazon S3 and Amazon EC2. Unlike the other services, customers will not need an AWS account. Instead, they will create an Amazon SageMaker Studio Lab specific account with an email address. This will give the user access to a limited environment (15 GB of storage, and 12 hour sessions) for them to run ML notebooks." 84 | What is Amazon SageMaker Canvas?,"Amazon SageMaker Canvas is a visual drag-and-drop service that allows business analysts to build ML models and generate accurate predictions without writing any code or requiring ML expertise. SageMaker Canvas makes it easy to access and combine data from a variety of sources, automatically clean data and apply a variety of data adjustments, and build ML models to generate accurate predictions with a single click. You can also easily publish results, explain and interpret models, and share models with others within your organization to review." 
85 | "What data sources does Amazon SageMaker Canvas support? 86 | ","Amazon SageMaker Canvas enables you to seamlessly discover AWS data sources that your account has access to, including Amazon S3 and Amazon Redshift. You can browse and import data using the SageMaker Canvas visual drag-and-drop interface. Additionally, you can drag and drop files from your local disk, and use pre-built connectors to import data from third-party sources such as Snowflake." 87 | How do I build an ML model to generate accurate predictions in Amazon SageMaker Canvas?,"Once you have connected sources, selected a dataset, and prepared your data, you can select the target column that you want to predict to initiate a model creation job. Amazon SageMaker Canvas will automatically identify the problem type, generate new relevant features, test a comprehensive set of prediction models using ML techniques such as linear regression, logistic regression, deep learning, time-series forecasting, and gradient boosting, and build the model that makes accurate predictions based on your dataset." 88 | "How long does it take to build a model in Amazon SageMaker Canvas? How can I monitor progress during model creation? 89 | ","The time it takes to build a model depends on the size of your dataset. Small datasets can take less than 30 minutes, and large datasets can take a few hours. As the model creation job progresses, Amazon SageMaker Canvas provides detailed visual updates, including percent job complete and the amount of time left for job completion." 90 | What is Amazon SageMaker Experiments?,"Amazon SageMaker Experiments helps you organize and track iterations to ML models. SageMaker Experiments helps you manage iterations by automatically capturing the input parameters, configurations, and results, and storing them as ""experiments"". You can work within the visual interface of Amazon SageMaker Studio, where you can browse active experiments, search for previous experiments by their characteristics, review previous experiments with their results, and compare experiment results visually." 91 | What is Amazon SageMaker Debugger?,"Amazon SageMaker Debugger automatically captures real-time metrics during training, such as confusion matrices and learning gradients, to help improve model accuracy. The metrics from SageMaker Debugger can be visualized in Amazon SageMaker Studio for easy understanding. SageMaker Debugger can also generate warnings and remediation advice when common training problems are detected. SageMaker Debugger also automatically monitors and profiles system resources such as CPUs, GPUs, network, and memory in real time, and provides recommendations on re-allocation of these resources. This enables you to use your resources efficiently during training and helps reduce costs and resources." 92 | Does Amazon SageMaker support distributed training?,"Yes. Amazon SageMaker can automatically distribute deep learning models and large training sets across AWS GPU instances in a fraction of the time it takes to build and optimize these distribution strategies manually. The two distributed training techniques that SageMaker applies are data parallelism and model parallelism. Data parallelism is applied to improve training speeds by dividing the data equally across multiple GPU instances, allowing each instance to train concurrently. Model parallelism is useful for models too large to be stored on a single GPU and require the model to be partitioned into smaller parts before distributing across multiple GPUs. 
With only a few lines of additional code in your PyTorch and TensorFlow training scripts, SageMaker will automatically apply data parallelism or model parallelism for you, allowing you to develop and deploy your models faster. SageMaker will determine the best approach to split your model by using graph partitioning algorithms to balance the computation of each GPU while minimizing the communication between GPU instances. SageMaker also optimizes your distributed training jobs through algorithms that fully utilize the AWS compute and network in order to achieve near-linear scaling efficiency, which allows you to complete training faster than manual open-source implementations." 93 | What is Amazon SageMaker Training Compiler?,"Amazon SageMaker Training Compiler is a deep learning (DL) compiler that accelerates DL model training by up to 50 percent through graph- and kernel-level optimizations to use GPUs more efficiently. SageMaker Training Compiler is integrated with versions of TensorFlow and PyTorch in SageMaker, so you can speed up training in these popular frameworks with minimal code changes." 94 | How does Amazon SageMaker Training Compiler work?,"Amazon SageMaker Training Compiler accelerates training jobs by converting DL models from their high-level language representation to hardware-optimized instructions that train faster than jobs with the native frameworks. More specifically, SageMaker Training Compiler uses graph-level optimization (operator fusion, memory planning, and algebraic simplification), data flow-level optimizations (layout transformation, common sub-expression elimination), and backend optimizations (memory latency hiding, loop oriented optimizations) to produce an optimized model training job that more efficiently uses hardware resources and, as a result, trains faster." 95 | How can I use Amazon SageMaker Training Compiler?,"Amazon SageMaker Training Compiler is built into the SageMaker Python SDK and SageMaker Hugging Face Deep Learning Containers. You don’t need to change your workflows to access its speedup benefits. You can run training jobs in the same way as you already do, using any of the SageMaker interfaces: Amazon SageMaker notebook instances, Amazon SageMaker Studio, AWS SDK for Python (Boto3), and AWS Command Line Interface. You can enable SageMaker Training Compiler by adding a TrainingCompilerConfig class as a parameter when you create a framework estimator object. Practically, this means a couple of lines of code added to your existing training job script for a single GPU instance. Most up-to-date detailed documentation, sample notebooks, and examples are available in the documentation." 96 | What is the pricing of Amazon SageMaker Training Compiler? ,Training Compiler is a SageMaker Training feature and is provided at no additional charge exclusively to SageMaker customers. Customers can actually reduce their costs with Training Compiler as training times are reduced. 97 | What is Managed Spot Training?,"Managed Spot Training with Amazon SageMaker lets you train your ML models using Amazon EC2 Spot instances, while reducing the cost of training your models by up to 90%." 98 | How do I use Managed Spot Training?,"You enable the Managed Spot Training option when submitting your training jobs and you also specify how long you want to wait for Spot capacity. Amazon SageMaker will then use Amazon EC2 Spot instances to run your job and manages the Spot capacity. 
You have full visibility into the status of your training jobs, both while they are running and while they are waiting for capacity." 99 | When should I use Managed Spot Training?,"Managed Spot Training is ideal when you have flexibility with your training runs and when you want to minimize the cost of your training jobs. With Managed Spot Training, you can reduce the cost of training your ML models by up to 90%." 100 | How does Managed Spot Training work?,"Managed Spot Training uses Amazon EC2 Spot instances for training, and these instances can be pre-empted when AWS needs capacity. As a result, Managed Spot Training jobs can run in small increments as and when capacity becomes available. The training jobs need not be restarted from scratch when there is an interruption, as Amazon SageMaker can resume the training jobs using the latest model checkpoint. The built-in frameworks and the built-in computer vision algorithms with SageMaker enable periodic checkpoints, and you can enable checkpoints with custom models." 101 | Do I need to periodically checkpoint with Managed Spot Training?,"We recommend periodic checkpoints as a general best practice for long-running training jobs. This prevents your Managed Spot Training jobs from restarting if capacity is pre-empted. When you enable checkpoints, Amazon SageMaker resumes your Managed Spot Training jobs from the last checkpoint." 102 | How do you calculate the cost savings with Managed Spot Training jobs?,"Once a Managed Spot Training job is completed, you can see the savings in the AWS Management Console and also calculate the cost savings as the percentage difference between the duration for which the training job ran and the duration for which you were billed. Regardless of how many times your Managed Spot Training jobs are interrupted, you are charged only once for the duration for which the data was downloaded." 103 | Which instances can I use with Managed Spot Training?,"Managed Spot Training can be used with all instances supported in Amazon SageMaker. 104 | " 105 | Which AWS Regions are supported with Managed Spot Training?,"Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available. 106 | " 107 | Are there limits to the size of the dataset I can use for training?,"There are no fixed limits to the size of the dataset you can use for training models with Amazon SageMaker. 108 | " 109 | What algorithms does Amazon SageMaker use to generate models?,"Amazon SageMaker includes built-in algorithms for linear regression, logistic regression, k-means clustering, principal component analysis, factorization machines, neural topic modeling, latent dirichlet allocation, gradient boosted trees, sequence2sequence, time-series forecasting, word2vec, and image classification. SageMaker also provides optimized Apache MXNet, Tensorflow, Chainer, PyTorch, Gluon, Keras, Horovod, Scikit-learn, and Deep Graph Library containers. In addition, Amazon SageMaker supports your custom training algorithms provided through a Docker image adhering to the documented specification." 110 | "What is Automatic Model Tuning? 111 | ",Most ML algorithms expose a variety of parameters that control how the underlying algorithm operates. Those parameters are generally referred to as hyperparameters and their values affect the quality of the trained models. Automatic model tuning is the process of finding a set of hyperparameters for an algorithm that can yield an optimal model. 
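Since the answer above defines automatic model tuning, a hedged sketch of launching a hyperparameter tuning job with the SageMaker Python SDK follows. The estimator configuration, objective metric, ranges, and S3 locations are placeholder assumptions chosen for illustration.

```python
# Hedged sketch: automatic model tuning (hyperparameter optimization) in SageMaker.
# The estimator configuration, ranges, and S3 locations are placeholders.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")
estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/hpo-output/",  # placeholder
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",  # built-in XGBoost metric
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=10,          # total training jobs
    max_parallel_jobs=2,  # simultaneous training jobs
)

tuner.fit({
    "train": TrainingInput("s3://my-bucket/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://my-bucket/validation/", content_type="text/csv"),
})
# tuner.best_training_job() returns the name of the best candidate once tuning completes.
```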
112 | What models can be tuned with Automatic Model Tuning?,"You can run automatic model tuning in Amazon SageMaker on top of any algorithm as long as it’s scientifically feasible, including built-in SageMaker algorithms, deep neural networks, or arbitrary algorithms you bring to SageMaker in the form of Docker images." 113 | Can I use Automatic Model Tuning outside of Amazon SageMaker?,Not at this time. The best model tuning performance and experience is within Amazon SageMaker. 114 | What is the underlying tuning algorithm for Automatic Model Tuning?,"Currently, the algorithm for tuning hyperparameters is a customized implementation of Bayesian Optimization. It aims to optimize a customer-specified objective metric throughout the tuning process. Specifically, it checks the objective metric of completed training jobs, and uses the knowledge to infer the hyperparameter combination for the next training job." 115 | "Does Automatic Model Tuning recommend specific hyperparameters for tuning? 116 | ","No. How certain hyperparameters impact the model performance depends on various factors, and it is hard to definitively say one hyperparameter is more important than the others and thus needs to be tuned. For built-in algorithms within Amazon SageMaker, we do call out whether or not a hyperparameter is tunable." 117 | "How long does a hyperparameter tuning job take? 118 | ","The length of time for a hyperparameter tuning job depends on multiple factors, including the size of the data, the underlying algorithm, and the values of the hyperparameters. Additionally, customers can choose the number of simultaneous training jobs and total number of training jobs. All these choices affect how long a hyperparameter tuning job can last." 119 | "Can I optimize multiple objectives simultaneously, such as optimizing a model to be both fast and accurate? 120 | ","Not at this time. Currently, you need to specify a single objective metric to optimize or change your algorithm code to emit a new metric, which is a weighted average between two or more useful metrics, and have the tuning process optimize towards that objective metric." 121 | "How much does Automatic Model Tuning cost? 122 | ","There is no charge for a hyperparameter tuning job itself. You will be charged by the training jobs that are launched by the hyperparameter tuning job, based on model training pricing." 123 | "How do I decide to use Amazon SageMaker Autopilot or Automatic Model Tuning? 124 | ","Amazon SageMaker Autopilot automates everything in a typical ML workflow, including feature preprocessing, algorithm selection, and hyperparameter tuning, while specifically focusing on classification and regression use cases. Automatic Model Tuning, on the other hand, is designed to tune any model, no matter whether it is based on built-in algorithms, deep learning frameworks, or custom containers. In exchange for the flexibility, you have to manually pick the specific algorithm, hyperparameters to tune, and corresponding search ranges." 125 | What is reinforcement learning?,Reinforcement learning is an ML technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences. 126 | "Can I train reinforcement learning models in Amazon SageMaker? 127 | ","Yes, you can train reinforcement learning models in Amazon SageMaker in addition to supervised and unsupervised learning models."
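As a very rough illustration of the reinforcement learning support described above, the sketch below launches an RL training job with the SageMaker Python SDK's `RLEstimator`. The entry-point script, source directory, toolkit/framework versions, and role ARN are all assumptions; supported toolkit and framework combinations should be verified in the SageMaker RL documentation.

```python
# Hedged sketch: training an RL agent with SageMaker RL and the Ray RLlib toolkit.
# The entry-point script, versions, and role ARN are placeholders; supported
# toolkit/framework combinations are listed in the SageMaker RL documentation.
import sagemaker
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

estimator = RLEstimator(
    entry_point="train_cartpole.py",   # placeholder training script using RLlib
    source_dir="src",                  # placeholder directory containing the script
    toolkit=RLToolkit.RAY,
    toolkit_version="1.6.0",           # assumed supported version; verify in the docs
    framework=RLFramework.TENSORFLOW,
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

estimator.fit(wait=False)  # launches the RL training job
```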
128 | How is reinforcement learning different from supervised learning?,"Though both supervised and reinforcement learning use mapping between input and output, unlike supervised learning where the feedback provided to the agent is the correct set of actions for performing a task, reinforcement learning uses a delayed feedback where reward signals are optimized to ensure a long-term goal through a sequence of actions." 129 | "When should I use reinforcement learning? 130 | ","While the goal of supervised learning techniques is to find the right answer based on the patterns in the training data, the goal of unsupervised learning techniques is to find similarities and differences between data points. In contrast, the goal of reinforcement learning (RL) techniques is to learn how to achieve a desired outcome even when it is not clear how to accomplish that outcome. As a result, RL is more suited to enabling intelligent applications where an agent can make autonomous decisions such as robotics, autonomous vehicles, HVAC, industrial control, and more." 131 | "What type of environments can I use for training RL models? 132 | ","Amazon SageMaker RL supports a number of different environments for training RL models. You can use AWS services such as AWS RoboMaker, open-source environments or custom environments developed using Open AI Gym interfaces, or commercial simulation environments such as MATLAB and SimuLink." 133 | "Do I need to write my own RL agent algorithms to train RL models? 134 | ","No, Amazon SageMaker RL includes RL toolkits such as Coach and Ray RLLib that offer implementations of RL agent algorithms such as DQN, PPO, A3C, and many more." 135 | Can I bring my own RL libraries and algorithm implementation and run them in Amazon SageMaker RL?,"Yes, you can bring your own RL libraries and algorithm implementations in Docker Containers and run those in Amazon SageMaker RL." 136 | "Can I do distributed rollouts using Amazon SageMaker RL? 137 | ",Yes. You can even select a heterogeneous cluster where the training can run on a GPU instance and the simulations can run on multiple CPU instances. 138 | What deployment options does Amazon SageMaker provide? ,"After you build and train models, Amazon SageMaker provides three options to deploy them so you can start making predictions. Real-time inference is suitable for workloads with millisecond latency requirements, payload sizes up to 6 MB, and processing times of up to 60 seconds. Batch transform is ideal for offline predictions on large batches of data that are available up front. Asynchronous inference is designed for workloads that do not have sub-second latency requirements, payload sizes up to 1 GB, and processing times of up to 15 minutes. " 139 | What is Amazon SageMaker Asynchronous Inference?,"Amazon SageMaker Asynchronous Inference queues incoming requests and processes them asynchronously. This option is ideal for requests with large payload sizes and/or long processing times that need to be processed as they arrive. Optionally, you can configure auto-scaling settings to scale down the instance count to zero when not actively processing requests to save on costs. " 140 | How do I configure auto-scaling settings to scale down the instance count to zero when not actively processing requests?,"You can scale down the Amazon SageMaker Asynchronous Inference endpoint instance count to zero in order to save on costs when you are not actively processing requests. 
You need to define a scaling policy that scales on the ""ApproximateBacklogPerInstance"" custom metric and set the ""MinCapacity"" value to zero. For step-by-step instructions, please visit the autoscale an asynchronous endpoint section of the developer guide. " 141 | What is Amazon SageMaker Serverless Inference?,"Amazon SageMaker Serverless Inference is a purpose-built serverless model serving option that makes it easy to deploy and scale ML models. SageMaker Serverless Inference endpoints automatically start the compute resources and scale them in and out depending on traffic, eliminating the need for you to choose instance type, run provisioned capacity, or manage scaling. You can optionally specify the memory requirements for your serverless inference endpoint. You pay only for the duration of running the inference code and the amount of data processed, not for idle periods." 142 | Why should I use Amazon SageMaker Serverless Inference?,"Amazon SageMaker Serverless Inference simplifies the developer experience by eliminating the need to provision capacity up front and manage scaling policies. SageMaker Serverless Inference can scale instantly from tens to thousands of inferences within seconds based on the usage patterns, making it ideal for ML applications with intermittent or unpredictable traffic. For example, a chatbot service used by a payroll processing company experiences an increase in inquiries at the end of the month while for rest of the month traffic is intermittent. Provisioning instances for the entire month in such scenarios is not cost-effective, as you end up paying for idle periods. SageMaker Serverless Inference helps address these types of use cases by providing you automatic and fast scaling out of the box without the need for you to forecast traffic up front or manage scaling policies. Additionally, you pay only for the compute time to run your inference code (billed in milliseconds) and for data processing, making it a cost-effective option for workloads with intermittent traffic." 143 | What is Amazon SageMaker shadow testing?,"SageMaker helps you run shadow tests to evaluate a new ML model before production release by testing its performance against the currently deployed model. SageMaker deploys the new model in shadow mode alongside the current production model and mirrors a user-specified portion of the production traffic to the new model. It optionally logs the model inferences for offline comparison. It also provides a live dashboard with a comparison of key performance metrics, such as latency and error rate, between the production and shadow models to help you decide whether to promote the new model to production." 144 | Why should I use SageMaker for shadow testing?,"SageMaker simplifies the process of setting up and monitoring shadow variants so you can evaluate the performance of the new ML model on live production traffic. SageMaker eliminates the need for you to orchestrate infrastructure for shadow testing. It lets you control testing parameters such as the percentage of traffic mirrored to the shadow variant and the duration of the test. As a result, you can start small and increase the inference requests to the new model after you gain confidence in model performance. SageMaker creates a live dashboard displaying performance differences across key metrics, so you can easily compare model performance to evaluate how the new model differs from the production model." 
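The asynchronous inference answer earlier in this file describes scaling an endpoint's instance count to zero with a scaling policy on a backlog-per-instance metric. Below is a hedged sketch using Application Auto Scaling through boto3; the endpoint and variant names are placeholders, and the exact metric name and policy settings should be confirmed against the "autoscale an asynchronous endpoint" section of the developer guide referenced in that answer.

```python
# Hedged sketch: allow an asynchronous inference endpoint to scale in to zero instances.
# Endpoint and variant names are placeholders; confirm the metric name in the developer guide.
import boto3

endpoint_name = "my-async-endpoint"  # placeholder
variant_name = "AllTraffic"          # placeholder
resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

autoscaling = boto3.client("application-autoscaling")

# Register the endpoint variant with MinCapacity=0 so it can scale in to zero.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=4,
)

# Target-tracking policy on the per-instance backlog of queued asynchronous requests.
autoscaling.put_scaling_policy(
    PolicyName="async-backlog-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,  # desired backlog size per instance
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",  # assumed metric name
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 300,
    },
)
```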
145 | What is Amazon SageMaker Inference Recommender?,"Amazon SageMaker Inference Recommender is a new capability of Amazon SageMaker that reduces the time required to get ML models in production by automating performance benchmarking and tuning model performance across SageMaker ML instances. You can now use SageMaker Inference Recommender to deploy your model to an endpoint that delivers the best performance and minimizes cost. You can get started with SageMaker Inference Recommender in minutes while selecting an instance type and get recommendations for optimal endpoint configurations within hours, eliminating weeks of manual testing and tuning time. With SageMaker Inference Recommender, you pay only for the SageMaker ML instances used during load testing, and there are no additional charges." 146 | Why should I use Amazon SageMaker Inference Recommender?,"You should use SageMaker Inference Recommender if you need recommendations for the right endpoint configuration to improve performance and reduce costs. Previously, data scientists who wanted to deploy their models had to run manual benchmarks to select the right endpoint configuration. They had to first select the right ML instance type out of the 70+ available instance types based on the resource requirements of their models and sample payloads, and then optimize the model to account for differing hardware. Then, they had to conduct extensive load tests to validate that latency and throughput requirements are met and that the costs are low. SageMaker Inference Recommender eliminates this complexity by making it easy for you to: 1) get started in minutes with an instance recommendation; 2) conduct load tests across instance types to get recommendations on your endpoint configuration within hours; and 3) automatically tune container and model server parameters as well as perform model optimizations for a given instance type." 147 | How does Amazon SageMaker Inference Recommender work with other AWS services?,"Data scientists can access Amazon SageMaker Inference Recommender from SageMaker Studio, AWS SDK for Python (Boto3), or AWS CLI. They can get deployment recommendations within SageMaker Studio in the SageMaker model registry for registered model versions. Data scientists can search and filter the recommendations through SageMaker Studio, AWS SDK, or AWS CLI." 148 | "Can Amazon SageMaker Inference Recommender support multi-model endpoints or multi-container endpoints? 149 | ","No, we currently support only a single model per endpoint." 150 | What type of endpoints does SageMaker Inference Recommender support?,Currently we support only real-time endpoints. 151 | Can I use SageMaker Inference Recommender in one Region and benchmark in different Regions?,"At launch, we will support all Regions supported by Amazon SageMaker, except the AWS China Regions." 152 | Does Amazon SageMaker Inference Recommender support Amazon EC2 Inf1 instances?,"Yes, we support all types of containers. Amazon EC2 Inf1, based on the AWS Inferentia chip, requires a compiled model artifact using either the Neuron compiler or Amazon SageMaker Neo. Once you have a compiled model for an Inferentia target and the associated container image URI, you can use Amazon SageMaker Inference Recommender to benchmark different Inferentia instance types." 153 | What is Amazon SageMaker Model Monitor?,"Amazon SageMaker Model Monitor allows developers to detect and remediate concept drift. 
SageMaker Model Monitor automatically detects concept drift in deployed models and provides detailed alerts that help identify the source of the problem. All models trained in SageMaker automatically emit key metrics that can be collected and viewed in Amazon SageMaker Studio. From inside SageMaker Studio, you can configure data to be collected, how to view it, and when to receive alerts." 154 | Can I access the infrastructure that Amazon SageMaker runs on?,"No. Amazon SageMaker operates the compute infrastructure on your behalf, allowing it to perform health checks, apply security patches, and do other routine maintenance. You can also deploy the model artifacts from training with custom inference code in your own hosting environment." 155 | "How do I scale the size and performance of an Amazon SageMaker model once in production? 156 | ","Amazon SageMaker hosting automatically scales to the performance needed for your application using Application Auto Scaling. In addition, you can manually change the instance number and type without incurring downtime by modifying the endpoint configuration." 157 | How do I monitor my Amazon SageMaker production environment?,"Amazon SageMaker emits performance metrics to Amazon CloudWatch Metrics so you can track metrics, set alarms, and automatically react to changes in production traffic. In addition, Amazon SageMaker writes logs to Amazon CloudWatch Logs to let you monitor and troubleshoot your production environment." 158 | What kinds of models can be hosted with Amazon SageMaker?,Amazon SageMaker can host any model that adheres to the documented specification for inference Docker images. This includes models created from Amazon SageMaker model artifacts and inference code. 159 | How many concurrent real-time API requests does Amazon SageMaker support?,Amazon SageMaker is designed to scale to a large number of transactions per second. The precise number varies based on the deployed model and the number and type of instances to which the model is deployed. 160 | What is Batch Transform?,"Batch Transform enables you to run predictions on large or small batch data. There is no need to break down the dataset into multiple chunks or manage real-time endpoints. With a simple API, you can request predictions for a large number of data records and transform the data quickly and easily." 161 | What is Amazon SageMaker Edge Manager?,"Amazon SageMaker Edge Manager makes it easier to optimize, secure, monitor, and maintain ML models on fleets of edge devices such as smart cameras, robots, personal computers, and mobile devices. SageMaker Edge Manager helps ML developers operate ML models on a variety of edge devices at scale." 162 | How do I get started with Amazon SageMaker Edge Manager?,"To get started with Amazon SageMaker Edge Manager, you need to compile and package your trained ML models in the cloud, register your devices, and prepare your devices with the SageMaker Edge Manager SDK. To prepare your model for deployment, SageMaker Edge Manager uses SageMaker Neo to compile your model for your target edge hardware. Once a model is compiled, SageMaker Edge Manager signs the model with an AWS generated key, then packages the model with its runtime and your necessary credentials to get it ready for deployment. On the device side, you register your device with SageMaker Edge Manager, download the SageMaker Edge Manager SDK, and then follow the instructions to install the SageMaker Edge Manager agent on your devices. 
The tutorial notebook provides a step-by-step example of how you can prepare the models and connect your models on edge devices with SageMaker Edge Manager." 163 | What devices are supported by Amazon SageMaker Edge Manager?,"Amazon SageMaker Edge Manager supports common CPU (ARM, x86) and GPU (ARM, Nvidia) based devices with Linux and Windows operating systems. Over time, SageMaker Edge Manager will expand to support more embedded processors and mobile platforms that are also supported by SageMaker Neo." 164 | Do I need to use Amazon SageMaker to train my model in order to use Amazon SageMaker Edge Manager?,"No, you do not. You can train your models elsewhere or use a pre-trained model from open source or from your model vendor." 165 | Do I need to use Amazon SageMaker Neo to compile my model in order to use Amazon SageMaker Edge Manager?,"Yes, you do. Amazon SageMaker Neo converts and compiles your models into an executable that you can then package and deploy on your edge devices. Once the model package is deployed, the Amazon SageMaker Edge Manager agent will unpack the model package and run the model on the device." 166 | How do I deploy models to the edge devices?,Amazon SageMaker Edge Manager stores the model package in your specified Amazon S3 bucket. You can use the over-the-air (OTA) deployment feature provided by AWS IoT Greengrass or any other deployment mechanism of your choice to deploy the model package from your S3 bucket to the devices. 167 | How is Amazon SageMaker Edge Manager SDK different from the SageMaker Neo runtime (dlr)?,"Neo dlr is an open-source runtime that only runs models compiled by the Amazon SageMaker Neo service. Compared to the open source dlr, the SageMaker Edge Manager SDK includes an enterprise grade on-device agent with additional security, model management, and model serving features. The SageMaker Edge Manager SDK is suitable for production deployment at scale." 168 | How is Amazon SageMaker Edge Manager related to AWS IoT Greengrass?,"Amazon SageMaker Edge Manager and AWS IoT Greengrass can work together in your IoT solution. Once your ML model is packaged with SageMaker Edge Manager, you can use AWS IoT Greengrass’s OTA update feature to deploy the model package  to your device. AWS IoT Greengrass allows you to monitor your IoT devices remotely, while SageMaker Edge Manager helps you monitor and maintain the ML models on the devices." 169 | How is Amazon SageMaker Edge Manager related to AWS Panorama? When should I use Amazon SageMaker Edge Manager versus AWS Panorama?,"AWS offers the most breadth and depth of capabilities for running models on edge devices. We have services to support a wide range of use cases, including computer vision, voice recognition, and predictive maintenance.For companies looking to run computer vision on edge devices such as cameras and appliances, you can use AWS Panorama. Panorama offers ready-to-deploy computer vision applications for edge devices. It’s easy to get started with AWS Panorama by logging into the cloud console, specifying the model you would like to use in Amazon S3 or in SageMaker, and then writing business logic as a Python script. AWS Panorama compiles the model for the target device and creates an application package so it can be deployed to your devices with just a few clicks. In addition, independent software vendors who want to build their own custom applications can use the AWS Panorama SDK, and device manufacturers can use the Device SDK to certify their devices for AWS Panorama. 
Customers who want to build their own models and have more granular control over model features can use Amazon SageMaker Edge Manager. SageMaker Edge Manager is a managed service to prepare, run, monitor, and update ML models across fleets of edge devices such as smart cameras, smart speakers, and robots for any use case such as natural langue processing, fraud detection, and predictive maintenance. SageMaker Edge Manager is for ML edge developers who want control over their model, including engineering different model features and monitoring models for drift. Any ML edge developer can use SageMaker Edge Manager through the SageMaker console and the SageMaker APIs. SageMaker Edge Manager brings the capabilities of SageMaker to build, train, and deploy models in the cloud to edge devices." 170 | In which AWS Regions is Amazon SageMaker Edge Manager available?,"Amazon SageMaker Edge Manager is available in six AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), EU (Frankfurt), and Asia Pacific (Tokyo). For details, see the AWS Regional Services list." 171 | What is Amazon SageMaker Neo?,Amazon SageMaker Neo enables ML models to train once and run anywhere in the cloud and at the edge. SageMaker Neo automatically optimizes models built with popular deep learning frameworks that can be used to deploy on multiple hardware platforms. Optimized models run up to 25 times faster and consume less than a tenth of the resources of typical ML models. 172 | How do I get started with Amazon SageMaker Neo?,"To get started with Amazon SageMaker Neo, log into the Amazon SageMaker console, choose a trained model, follow the example to compile models, and deploy the resulting model onto your target hardware platform." 173 | What are the major components of Amazon SageMaker Neo?,"Amazon SageMaker Neo contains two major components: a compiler and a runtime. First, the Neo compiler reads models exported by different frameworks. It then converts the framework-specific functions and operations into a framework-agnostic intermediate representation. Next, it performs a series of optimizations. Then, the compiler generates binary code for the optimized operations and writes them to a shared object library. The compiler also saves the model definition and parameters into separate files. During execution, the Neo runtime loads the artifacts generated by the compiler—model definition, parameters, and the shared object library to run the model." 174 | Do I need to use Amazon SageMaker to train my model in order to use Amazon SageMaker Neo to convert the model?,No. You can train models elsewhere and use Neo to optimize them for Amazon SageMaker ML instances or AWS IoT Greengrass supported devices. 175 | Which models does Amazon SageMaker Neo support?,"Currently, Amazon SageMaker Neo supports the most popular deep learning models that power computer vision applications and the most popular decision tree models used in Amazon SageMaker today. Neo optimizes the performance of AlexNet, ResNet, VGG, Inception, MobileNet, SqueezeNet, and DenseNet models trained in MXNet and TensorFlow, and classification and random cut forest models trained in XGBoost." 176 | Which hardware platforms does Amazon SageMaker Neo support?,"You can find the lists of supported cloud instances, edge devices, and framework versions in the Amazon SageMaker Neo documentation." 177 | In which AWS Regions is Amazon SageMaker Neo available?,"To see a list of supported Regions, view the AWS Regional Services list." 
178 | What are Amazon SageMaker Savings Plans?,"Amazon SageMaker Savings Plans offer a flexible usage-based pricing model for Amazon SageMaker in exchange for a commitment to a consistent amount of usage (measured in $/hour) for a one- or three-year term. Amazon SageMaker Savings Plans provide the most flexibility and help to reduce your costs by up to 64%. These plans automatically apply to eligible SageMaker ML instance usages, including SageMaker Studio notebooks, SageMaker On-Demand notebooks, SageMaker Processing, SageMaker Data Wrangler, SageMaker Training, SageMaker Real-Time Inference, and SageMaker Batch Transform regardless of instance family, size, or Region. For example, you can change usage from a CPU instance ml.c5.xlarge running in US East (Ohio) to an ml.Inf1 instance in US West (Oregon) for inference workloads at any time and automatically continue to pay the Savings Plans price." 179 | Why should I use Amazon SageMaker Savings Plans?,"If you have a consistent amount of Amazon SageMaker instance usage (measured in $/hour) and use multiple SageMaker components or expect your technology configuration (such as instance family, or Region) to change over time, SageMaker Savings Plans make it simpler to maximize your savings while providing flexibility to change the underlying technology configuration based on application needs or new innovation. The Savings Plans rate applies automatically to all eligible ML instance usage with no manual modifications required." 180 | How can I get started with Amazon SageMaker Savings Plans?,"You can get started with Savings Plans from AWS Cost Explorer in the AWS Management Console or by using the API/CLI. You can easily make a commitment to Savings Plans by using the recommendations provided in AWS Cost Explorer to realize the biggest savings. The recommended hourly commitment is based on your historical On-Demand usage and your choice of plan type, term length, and payment option. Once you sign up for a Savings Plan, your compute usage will automatically be charged at the discounted Savings Plans prices and any usage beyond your commitment will be charged at regular On-Demand rates." 181 | How are Savings Plans for Amazon SageMaker different from Compute Savings Plans for Amazon EC2?,The difference between Savings Plans for Amazon SageMaker and Savings Plans for EC2 is in the services they include. SageMaker Savings Plans apply only to SageMaker ML Instance usage. 182 | How do Savings Plans work with AWS Organizations/Consolidated Billing?,"Savings Plans can be purchased in any account within an AWS Organization/Consolidated Billing family. By default, the benefit provided by Savings Plans is applicable to usage across all accounts within an AWS Organization/Consolidated Billing family. However, you can also choose to restrict the benefit of Savings Plans to only the account that purchased them." 
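Editor's note: the autoscale-to-zero answer above (FAQ row 140) describes the required scaling policy only in prose. The following is a minimal boto3 sketch, not part of this repository, showing how such a policy could be registered with Application Auto Scaling. The endpoint and variant names, capacities, and target value are illustrative assumptions, and the FAQ's "ApproximateBacklogPerInstance" metric is referenced here by the CloudWatch metric name as commonly published ("ApproximateBacklogSizePerInstance"); verify both against the current SageMaker developer guide before relying on this.

import boto3

# Hypothetical endpoint/variant names -- replace with your asynchronous endpoint's values.
ENDPOINT_NAME = "my-async-endpoint"
VARIANT_NAME = "AllTraffic"
resource_id = f"endpoint/{ENDPOINT_NAME}/variant/{VARIANT_NAME}"

autoscaling = boto3.client("application-autoscaling")

# Allow the async endpoint to scale all the way down to zero instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=4,
)

# Target-track the backlog-per-instance metric referenced in the FAQ answer above.
autoscaling.put_scaling_policy(
    PolicyName="async-backlog-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,  # assumed backlog target per instance; tune for your workload
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",  # confirm exact name in the developer guide
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": ENDPOINT_NAME}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 300,
    },
)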
-------------------------------------------------------------------------------- /data_ingestion_to_vectordb/data_ingestion_to_vectordb.py: -------------------------------------------------------------------------------- 1 | import os 2 | import boto3 3 | import botocore 4 | 5 | from langchain.document_loaders import CSVLoader 6 | from langchain.text_splitter import CharacterTextSplitter 7 | from langchain.indexes.vectorstore import VectorStoreIndexWrapper 8 | from langchain.embeddings import BedrockEmbeddings 9 | from langchain.llms.bedrock import Bedrock 10 | from langchain.vectorstores import FAISS 11 | 12 | LOCAL_RAG_DIR="data" 13 | FAISS_INDEX_DIR = "faiss_index" 14 | if not os.path.exists(LOCAL_RAG_DIR): 15 | os.makedirs(LOCAL_RAG_DIR) 16 | 17 | embeddings_model = os.environ.get('EMBEDDING_MODEL_ID') 18 | bedrock_service = os.environ.get('BEDROCK_SERVICE') 19 | boto3_bedrock = boto3.client(service_name=bedrock_service) 20 | br_embeddings = BedrockEmbeddings(client=boto3_bedrock, model_id=embeddings_model) 21 | 22 | TENANTS=["tenanta", "tenantb"] 23 | 24 | for t in TENANTS: 25 | if t == "tenanta": 26 | DATAFILE="Amazon_SageMaker_FAQs.csv" 27 | elif t == "tenantb": 28 | DATAFILE="Amazon_EMR_FAQs.csv" 29 | 30 | loader = CSVLoader(f"./{LOCAL_RAG_DIR}/{DATAFILE}") 31 | documents_aws = loader.load() 32 | print(f"documents:loaded:size={len(documents_aws)}") 33 | 34 | docs = CharacterTextSplitter(chunk_size=2000, chunk_overlap=400, separator=",").split_documents(documents_aws) 35 | 36 | print(f"Documents:after split and chunking size={len(docs)}") 37 | 38 | vector_db = FAISS.from_documents( 39 | documents=docs, 40 | embedding=br_embeddings, 41 | ) 42 | 43 | print(f"vector_db:created={vector_db}::") 44 | 45 | vector_db.save_local(f"{FAISS_INDEX_DIR}-{t}") 46 | 47 | S3_BUCKET=f"contextual-data-{t}-{os.environ.get('RANDOM_STRING')}" 48 | print(f"S3 Bucket: ${S3_BUCKET}") 49 | 50 | s3_path = f"s3://{S3_BUCKET}/{DATAFILE}" 51 | 52 | s3 = boto3.resource('s3') 53 | 54 | try: 55 | to_upload = os.listdir(f"./{FAISS_INDEX_DIR}-{t}") 56 | for file in to_upload: 57 | s3.Bucket(S3_BUCKET).upload_file(f"./{FAISS_INDEX_DIR}-{t}/{file}", f"{FAISS_INDEX_DIR}/{file}", ) 58 | except botocore.exceptions.ClientError as e: 59 | if e.response['Error']['Code'] == "404": 60 | print("The object does not exist.") 61 | else: 62 | raise 63 | -------------------------------------------------------------------------------- /data_ingestion_to_vectordb/requirements.txt: -------------------------------------------------------------------------------- 1 | faiss-cpu==1.7.4 2 | langchain==0.0.305 -------------------------------------------------------------------------------- /deploy-eks.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | source ~/.bash_profile 3 | 4 | test -n "$EKS_CLUSTER_NAME" && echo EKS_CLUSTER_NAME is "$EKS_CLUSTER_NAME" || echo EKS_CLUSTER_NAME is not set 5 | 6 | export EKS_CLUSTER_VERSION="1.27" 7 | echo "Deploying Cluster ${EKS_CLUSTER_NAME} with EKS ${EKS_CLUSTER_VERSION}" 8 | 9 | cat << EOF > ${YAML_PATH}/cluster-config.yaml 10 | --- 11 | apiVersion: eksctl.io/v1alpha5 12 | kind: ClusterConfig 13 | metadata: 14 | name: ${EKS_CLUSTER_NAME} 15 | region: ${AWS_REGION} 16 | version: "${EKS_CLUSTER_VERSION}" 17 | iam: 18 | withOIDC: true 19 | serviceAccounts: 20 | - metadata: 21 | name: aws-load-balancer-controller 22 | namespace: kube-system 23 | wellKnownPolicies: 24 | awsLoadBalancerController: true 25 | availabilityZones: ["${AWS_REGION}a", 
"${AWS_REGION}b", "${AWS_REGION}c"] 26 | managedNodeGroups: 27 | - name: nodegroup 28 | desiredCapacity: 3 29 | instanceTypes: ["t3a.medium", "t3.medium"] 30 | volumeEncrypted: true 31 | ssh: 32 | allow: false 33 | cloudWatch: 34 | clusterLogging: 35 | enableTypes: ["*"] 36 | secretsEncryption: 37 | keyARN: ${MASTER_ARN} 38 | EOF 39 | 40 | eksctl create cluster -f ${YAML_PATH}/cluster-config.yaml 41 | 42 | aws eks update-kubeconfig --name=${EKS_CLUSTER_NAME} 43 | 44 | # Associate an OIDC provider with the EKS Cluster 45 | echo "Associating an OIDC provider with the EKS Cluster" 46 | eksctl utils associate-iam-oidc-provider \ 47 | --region=${AWS_REGION} \ 48 | --cluster=${EKS_CLUSTER_NAME} \ 49 | --approve 50 | 51 | export OIDC_PROVIDER=$(aws eks describe-cluster \ 52 | --name ${EKS_CLUSTER_NAME} \ 53 | --query "cluster.identity.oidc.issuer" \ 54 | --output text) 55 | 56 | echo "Installing AWS Load Balancer Controller" 57 | 58 | kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller/crds?ref=master" 59 | 60 | helm repo add eks https://aws.github.io/eks-charts 61 | 62 | helm repo update 63 | 64 | # Setting AWS Load Balancer Controller Version 65 | export VPC_ID=$(aws eks describe-cluster \ 66 | --name ${EKS_CLUSTER_NAME} \ 67 | --query "cluster.resourcesVpcConfig.vpcId" \ 68 | --output text) 69 | 70 | helm install aws-load-balancer-controller eks/aws-load-balancer-controller \ 71 | -n kube-system \ 72 | --set clusterName=${EKS_CLUSTER_NAME} \ 73 | --set serviceAccount.create=false \ 74 | --set region=${AWS_REGION} \ 75 | --set vpcId=${VPC_ID} \ 76 | --set serviceAccount.name=aws-load-balancer-controller 77 | 78 | kubectl -n kube-system rollout status deployment aws-load-balancer-controller 79 | 80 | export OIDC_PROVIDER=$(aws eks describe-cluster \ 81 | --name ${EKS_CLUSTER_NAME} \ 82 | --query "cluster.identity.oidc.issuer" \ 83 | --output text) 84 | 85 | export OIDC_ID=$(echo $OIDC_PROVIDER | awk -F/ '{print $NF}') 86 | 87 | echo "Creating S3 Access Role in IAM" 88 | export S3_ACCESS_ROLE=${EKS_CLUSTER_NAME}-s3-access-role-${RANDOM_STRING} 89 | export ENVOY_IRSA=$( 90 | envsubst < iam/s3-access-role-trust-policy.json | \ 91 | xargs -0 -I {} aws iam create-role \ 92 | --role-name ${S3_ACCESS_ROLE} \ 93 | --assume-role-policy-document {} \ 94 | --query 'Role.Arn' \ 95 | --output text 96 | ) 97 | echo "Attaching S3 Bucket policy to S3 Access Role" 98 | aws iam attach-role-policy \ 99 | --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/s3-envoy-config-access-policy-${RANDOM_STRING} \ 100 | --role-name ${S3_ACCESS_ROLE} 101 | 102 | echo ${ENVOY_IRSA} 103 | echo "export ENVOY_IRSA=${ENVOY_IRSA}" \ 104 | | tee -a ~/.bash_profile 105 | 106 | TENANTS="tenanta tenantb" 107 | for t in $TENANTS 108 | do 109 | export NAMESPACE="${t}-ns" 110 | export SA_NAME="${t}-sa" 111 | 112 | echo "Creating DynamoDB / Bedrock Access Role in IAM" 113 | export CHATBOT_ACCESS_ROLE=${EKS_CLUSTER_NAME}-${t}-chatbot-access-role-${RANDOM_STRING} 114 | export CHATBOT_IRSA=$( 115 | envsubst < iam/chatbot-access-role-trust-policy.json | \ 116 | xargs -0 -I {} aws iam create-role \ 117 | --role-name ${CHATBOT_ACCESS_ROLE} \ 118 | --assume-role-policy-document {} \ 119 | --query 'Role.Arn' \ 120 | --output text 121 | ) 122 | echo "Attaching S3 Bucket and DynamoDB policy to Chatbot Access Role" 123 | aws iam attach-role-policy \ 124 | --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/s3-contextual-data-access-policy-${t}-${RANDOM_STRING} \ 125 | --role-name ${CHATBOT_ACCESS_ROLE} 126 | 127 | aws iam 
attach-role-policy \ 128 | --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/dynamodb-access-policy-${t}-${RANDOM_STRING} \ 129 | --role-name ${CHATBOT_ACCESS_ROLE} 130 | done -------------------------------------------------------------------------------- /deploy-istio.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | . ~/.bash_profile 3 | 4 | echo "Installing Istio with Ingress Gateway (NLB)" 5 | helm repo add istio https://istio-release.storage.googleapis.com/charts 6 | helm repo update 7 | 8 | kubectl create namespace istio-system 9 | kubectl create namespace istio-ingress 10 | 11 | helm install istio-base istio/base \ 12 | --namespace istio-system \ 13 | --version ${ISTIO_VERSION} \ 14 | --wait 15 | 16 | helm install istiod istio/istiod \ 17 | --namespace istio-system \ 18 | --version ${ISTIO_VERSION} \ 19 | --wait 20 | 21 | helm list --namespace istio-system --filter 'istio+' 22 | 23 | echo "Creating Istio Ingress Gateway, associating an internet-facing NLB instance" 24 | echo "with Proxy v2 protocol and cross-AZ loadbalancing enabled" 25 | LB_NAME=${EKS_CLUSTER_NAME}-nlb 26 | 27 | helm install istio-ingressgateway istio/gateway \ 28 | --namespace istio-ingress \ 29 | --set "service.annotations.service\.beta\.kubernetes\.io/aws-load-balancer-type"='external' \ 30 | --set "service.annotations.service\.beta\.kubernetes\.io/aws-load-balancer-scheme"="internet-facing" \ 31 | --set "service.annotations.service\.beta\.kubernetes\.io/aws-load-balancer-nlb-target-type"='ip' \ 32 | --set "service.annotations.service\.beta\.kubernetes\.io/aws-load-balancer-proxy-protocol"='*' \ 33 | --set "service.annotations.service\.beta\.kubernetes\.io/aws-load-balancer-name"="${LB_NAME}" \ 34 | --set "service.annotations.service\.beta\.kubernetes\.io/aws-load-balancer-attributes"='load_balancing.cross_zone.enabled=true' \ 35 | --version ${ISTIO_VERSION} \ 36 | --wait 37 | 38 | helm ls -n istio-ingress 39 | 40 | kubectl -n istio-system get svc 41 | kubectl -n istio-system get pods 42 | kubectl -n istio-ingress get svc 43 | kubectl -n istio-ingress get pods 44 | 45 | STATUS=$(aws elbv2 describe-load-balancers --name ${LB_NAME} \ 46 | --query 'LoadBalancers[0].State.Code') 47 | 48 | echo "Status of Load Balancer ${LB_NAME}: $STATUS" 49 | 50 | # Enable Proxy v2 protocol processing on Istio Ingress Gateway 51 | echo "Enabling Proxy v2 protocol processing on Istio Ingress Gateway" 52 | kubectl -n istio-ingress apply -f istio-proxy-v2-config/proxy-protocol-envoy-filter.yaml 53 | kubectl -n istio-ingress apply -f istio-proxy-v2-config/enable-X-Forwarded-For-header.yaml -------------------------------------------------------------------------------- /deploy-tenant-services.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | . 
~/.bash_profile 3 | 4 | # Add oauth2-proxy Helm Repo 5 | helm repo add oauth2-proxy https://oauth2-proxy.github.io/manifests 6 | 7 | TENANTS="tenanta tenantb" 8 | 9 | for t in $TENANTS 10 | do 11 | export TENANT="${t}" 12 | export SA_NAME="${t}-sa" 13 | export NAMESPACE="${t}-ns" 14 | export DOMAIN="${t}-chatbot-example-com-${RANDOM_STRING}" 15 | 16 | USERPOOL_ID=$( 17 | aws cognito-idp describe-user-pool-domain \ 18 | --domain ${DOMAIN} \ 19 | --query 'DomainDescription.UserPoolId' \ 20 | --output text | xargs 21 | ) 22 | export ISSUER_URI=https://cognito-idp.${AWS_REGION}.amazonaws.com/${USERPOOL_ID} 23 | export SESSIONS_TABLE=Sessions_${t}_${RANDOM_STRING} 24 | export CHATHISTORY_TABLE=ChatHistory_${t}_${RANDOM_STRING} 25 | 26 | echo "Deploying ${t} services ..." 27 | 28 | echo "-> Deploying chatbot service" 29 | 30 | envsubst < chatbot-manifest/chatbot.yaml | kubectl -n ${NAMESPACE} apply -f - 31 | 32 | echo "Applying Frontend Authentication Policy for ${t}" 33 | kubectl -n ${NAMESPACE} apply -f ${YAML_PATH}/frontend-jwt-auth-${t}.yaml 34 | rm -rf ${YAML_PATH}/frontend-jwt-auth-${t}.yaml 35 | 36 | echo "Applying Frontend Authorization Policy for ${t}" 37 | kubectl -n ${NAMESPACE} apply -f ${YAML_PATH}/frontend-authz-pol-${t}.yaml 38 | rm -rf ${YAML_PATH}/frontend-authz-pol-${t}.yaml 39 | 40 | echo "-> Deploying VirtualService to expose chatbot via Ingress Gateway" 41 | envsubst < chatbot-manifest/chatbot-vs.yaml | kubectl -n ${NAMESPACE} apply -f - 42 | echo "Deploying OIDC Proxy for ${t}" 43 | helm install --namespace ${t}-oidc-proxy-ns oauth2-proxy \ 44 | oauth2-proxy/oauth2-proxy -f ${YAML_PATH}/oauth2-proxy-${t}-values.yaml 45 | rm -rf ${YAML_PATH}/oauth2-proxy-${t}-values.yaml 46 | done -------------------------------------------------------------------------------- /deploy-userpools.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | . 
~/.bash_profile 3 | 4 | TENANTS="tenanta tenantb" 5 | USERS="user1" 6 | 7 | HOSTS="" 8 | 9 | for t in $TENANTS 10 | do 11 | echo "Creating User Pool for $t" 12 | export TENANT_ID=${t} 13 | POOLNAME=${t}_chatbot_example_com_${RANDOM_STRING} 14 | DOMAIN=${t}-chatbot-example-com-${RANDOM_STRING} 15 | CLIENTNAME=${t}_chatbot_client_${RANDOM_STRING} 16 | CALLBACKURL="https://${t}.example.com/oauth2/callback" 17 | LOGOUTURL="https://${t}.example.com" 18 | CUSTOM_ATTR=Name="tenantid",AttributeDataType="String",DeveloperOnlyAttribute=false,Mutable=true,Required=false,StringAttributeConstraints="{MinLength=1,MaxLength=20}" 19 | READ_ATTR="custom:tenantid" 20 | USER_ATTR=Name="custom:tenantid" 21 | 22 | export POOLINFO=$(aws cognito-idp create-user-pool \ 23 | --pool-name ${POOLNAME} \ 24 | --schema Name=email,AttributeDataType=String,DeveloperOnlyAttribute=false,Mutable=true,Required=true,StringAttributeConstraints="{MinLength=\"1\",MaxLength=\"64\"}" \ 25 | --mfa-configuration OFF \ 26 | --policies 'PasswordPolicy={MinimumLength=8,RequireUppercase=true,RequireLowercase=true,RequireNumbers=true,RequireSymbols=false,TemporaryPasswordValidityDays=7}' \ 27 | --username-configuration CaseSensitive=false \ 28 | --admin-create-user-config AllowAdminCreateUserOnly=true) 29 | 30 | POOLID=$(echo $POOLINFO|jq -r '.UserPool.Id') 31 | 32 | echo "Adding Custom Attributes to User Pool" 33 | aws cognito-idp add-custom-attributes --user-pool-id ${POOLID} \ 34 | --custom-attributes ${CUSTOM_ATTR} 35 | 36 | echo "Creating User Pool Client for ${t}" 37 | CLIENT=$(aws cognito-idp create-user-pool-client \ 38 | --user-pool-id ${POOLID} \ 39 | --client-name ${CLIENTNAME} \ 40 | --generate-secret \ 41 | --refresh-token-validity "1" \ 42 | --access-token-validity "1" \ 43 | --id-token-validity "1" \ 44 | --token-validity-units AccessToken="hours",IdToken="hours",RefreshToken="hours" \ 45 | --read-attributes ${READ_ATTR}) 46 | 47 | CLIENTID=$(echo $CLIENT|jq -r '.UserPoolClient.ClientId') 48 | CLIENTSECRET=$(echo $CLIENT|jq -r '.UserPoolClient.ClientSecret') 49 | 50 | echo "Setting User Pool Client Properties for ${t}" 51 | aws cognito-idp update-user-pool-client \ 52 | --user-pool-id ${POOLID} \ 53 | --client-id ${CLIENTID} \ 54 | --explicit-auth-flows ALLOW_REFRESH_TOKEN_AUTH \ 55 | --prevent-user-existence-errors "ENABLED" \ 56 | --supported-identity-providers "COGNITO" \ 57 | --callback-urls ${CALLBACKURL} \ 58 | --logout-urls ${LOGOUTURL} \ 59 | --allowed-o-auth-flows "code" \ 60 | --allowed-o-auth-scopes "openid" \ 61 | --allowed-o-auth-flows-user-pool-client \ 62 | --read-attributes "${READ_ATTR}" "email" "email_verified" 2>&1 > /dev/null 63 | 64 | echo "Creating User Pool Domain for ${t}" 65 | aws cognito-idp create-user-pool-domain \ 66 | --domain ${DOMAIN} \ 67 | --user-pool-id ${POOLID} 68 | 69 | for u in ${USERS} 70 | do 71 | USER=${u}@${t}.com 72 | echo "Creating ${USER} in ${POOLNAME}" 73 | read -s -p "Enter a Password for ${USER}: " PASSWORD 74 | printf "\n" 75 | 76 | aws cognito-idp admin-create-user \ 77 | --user-pool-id ${POOLID} \ 78 | --username ${USER} 2>&1 > /dev/null 79 | aws cognito-idp admin-set-user-password \ 80 | --user-pool-id ${POOLID} \ 81 | --username ${USER} \ 82 | --password ${PASSWORD} \ 83 | --permanent 84 | 85 | echo "Setting User Custom Attributes for ${USER}" 86 | aws cognito-idp admin-update-user-attributes \ 87 | --user-pool-id ${POOLID} \ 88 | --username ${USER} \ 89 | --user-attributes ${USER_ATTR},Value="${TENANT_ID}" 90 | 91 | aws cognito-idp admin-update-user-attributes \ 92 | 
--user-pool-id ${POOLID} \ 93 | --username ${USER} \ 94 | --user-attributes Name="email",Value="${USER}" 95 | 96 | aws cognito-idp admin-update-user-attributes \ 97 | --user-pool-id ${POOLID} \ 98 | --username ${USER} \ 99 | --user-attributes Name="email_verified",Value="true" 100 | done 101 | 102 | HOST="${t}.example.com" 103 | CALLBACK_URI="https://${t}.example.com/oauth2/callback" 104 | export ISSUER_URI=https://cognito-idp.${AWS_REGION}.amazonaws.com/$POOLID 105 | COOKIE_SECRET=$(openssl rand -base64 32 | head -c 32 | base64) 106 | 107 | echo "Creating AuthN Policy for ${t}" 108 | cat << EOF > ${YAML_PATH}/frontend-jwt-auth-${t}.yaml 109 | apiVersion: security.istio.io/v1beta1 110 | kind: RequestAuthentication 111 | metadata: 112 | name: frontend-jwt-auth 113 | spec: 114 | selector: 115 | matchLabels: 116 | workload-tier: frontend 117 | jwtRules: 118 | - issuer: "${ISSUER_URI}" 119 | forwardOriginalToken: true 120 | outputClaimToHeaders: 121 | - header: "x-auth-request-tenantid" 122 | claim: "custom:tenantid" 123 | 124 | EOF 125 | 126 | echo "Creating AuthZ Policy for ${t}" 127 | cat << EOF > ${YAML_PATH}/frontend-authz-pol-${t}.yaml 128 | apiVersion: security.istio.io/v1beta1 129 | kind: AuthorizationPolicy 130 | metadata: 131 | name: frontend-authz-pol 132 | spec: 133 | selector: 134 | matchLabels: 135 | workload-tier: frontend 136 | action: ALLOW 137 | rules: 138 | - from: 139 | - source: 140 | namespaces: ["istio-ingress"] 141 | principals: ["cluster.local/ns/istio-ingress/sa/istio-ingressgateway"] 142 | when: 143 | - key: request.auth.claims[custom:tenantid] 144 | values: ["${TENANT_ID}"] 145 | EOF 146 | 147 | echo "Creating oauth2-proxy Configuration for ${t}" 148 | cat << EOF > ${YAML_PATH}/oauth2-proxy-${t}-values.yaml 149 | config: 150 | clientID: "${CLIENTID}" 151 | clientSecret: "${CLIENTSECRET}" 152 | cookieSecret: "${COOKIE_SECRET}=" 153 | configFile: |- 154 | auth_logging = true 155 | cookie_httponly = true 156 | cookie_refresh = "1h" 157 | cookie_secure = true 158 | oidc_issuer_url = "${ISSUER_URI}" 159 | redirect_url = "${CALLBACK_URI}" 160 | scope="openid" 161 | reverse_proxy = true 162 | pass_host_header = true 163 | pass_access_token = true 164 | pass_authorization_header = true 165 | provider = "oidc" 166 | request_logging = true 167 | set_authorization_header = true 168 | set_xauthrequest = true 169 | session_store_type = "cookie" 170 | silence_ping_logging = true 171 | skip_provider_button = true 172 | skip_auth_strip_headers = false 173 | ssl_insecure_skip_verify = true 174 | skip_jwt_bearer_tokens = true 175 | standard_logging = true 176 | upstreams = [ "static://200" ] 177 | email_domains = [ "*" ] 178 | whitelist_domains = ["${t}.example.com"] 179 | EOF 180 | 181 | NL=$'\n' 182 | HOSTS+=" - $HOST$NL" 183 | 184 | echo "" 185 | done 186 | 187 | echo "Creating AuthorizationPolicy on Istio Ingress Gateway" 188 | cat << EOF > ${YAML_PATH}/chatbot-auth-policy.yaml 189 | apiVersion: security.istio.io/v1beta1 190 | kind: AuthorizationPolicy 191 | metadata: 192 | name: cluster1-auth-policy 193 | namespace: istio-system 194 | spec: 195 | selector: 196 | matchLabels: 197 | istio: ingressgateway 198 | action: CUSTOM 199 | provider: 200 | name: rev-proxy 201 | rules: 202 | - to: 203 | - operation: 204 | hosts: 205 | $HOSTS 206 | EOF -------------------------------------------------------------------------------- /envoy-config/envoy-cds.yaml: -------------------------------------------------------------------------------- 1 | resources: 2 | - '@type': 
type.googleapis.com/envoy.config.cluster.v3.Cluster 3 | connect_timeout: 30s 4 | dns_lookup_family: AUTO 5 | lb_policy: ROUND_ROBIN 6 | load_assignment: 7 | cluster_name: tenanta_oidc_proxy 8 | endpoints: 9 | - lb_endpoints: 10 | - endpoint: 11 | address: 12 | socket_address: 13 | address: oauth2-proxy.tenanta-oidc-proxy-ns.svc.cluster.local 14 | port_value: 80 15 | name: tenanta_oidc_proxy 16 | type: LOGICAL_DNS 17 | - '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster 18 | connect_timeout: 30s 19 | dns_lookup_family: AUTO 20 | lb_policy: ROUND_ROBIN 21 | load_assignment: 22 | cluster_name: tenantb_oidc_proxy 23 | endpoints: 24 | - lb_endpoints: 25 | - endpoint: 26 | address: 27 | socket_address: 28 | address: oauth2-proxy.tenantb-oidc-proxy-ns.svc.cluster.local 29 | port_value: 80 30 | name: tenantb_oidc_proxy 31 | type: LOGICAL_DNS -------------------------------------------------------------------------------- /envoy-config/envoy-lds.yaml: -------------------------------------------------------------------------------- 1 | resources: 2 | - "@type": type.googleapis.com/envoy.config.listener.v3.Listener 3 | address: 4 | socket_address: 5 | address: 0.0.0.0 6 | port_value: 8000 7 | protocol: TCP 8 | filter_chains: 9 | - filters: 10 | - name: envoy.filters.network.http_connection_manager 11 | typed_config: 12 | '@type': type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager 13 | access_log: 14 | - name: envoy.access_loggers.file 15 | typed_config: 16 | '@type': type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog 17 | path: /dev/stdout 18 | http_filters: 19 | - name: envoy.filters.http.router 20 | typed_config: 21 | '@type': type.googleapis.com/envoy.extensions.filters.http.router.v3.Router 22 | route_config: 23 | name: local_route 24 | virtual_hosts: 25 | - domains: 26 | - tenanta.example.com 27 | name: tenanta 28 | routes: 29 | - match: 30 | prefix: / 31 | route: 32 | cluster: tenanta_oidc_proxy 33 | - domains: 34 | - tenantb.example.com 35 | name: tenantb 36 | routes: 37 | - match: 38 | prefix: / 39 | route: 40 | cluster: tenantb_oidc_proxy 41 | server_header_transformation: PASS_THROUGH 42 | stat_prefix: ingress_http 43 | name: listener_0 44 | drain_type: DEFAULT -------------------------------------------------------------------------------- /envoy-config/envoy.yaml: -------------------------------------------------------------------------------- 1 | admin: 2 | address: 3 | socket_address: 4 | address: 0.0.0.0 5 | port_value: 9901 6 | protocol: TCP 7 | node: 8 | cluster: envoy-reverse-proxy-cluster 9 | id: envoy-reverse-proxy-id 10 | dynamic_resources: 11 | lds_config: 12 | path: "/config/envoy/envoy-lds.yaml" 13 | cds_config: 14 | path: "/config/envoy/envoy-cds.yaml" -------------------------------------------------------------------------------- /fastapi_request.py: -------------------------------------------------------------------------------- 1 | import os 2 | import boto3 3 | from enum import Enum 4 | from pydantic import BaseModel 5 | from typing import List 6 | 7 | class Text2TextModelName(str, Enum): 8 | modelId = os.environ.get('TEXT2TEXT_MODEL_ID') 9 | 10 | class EmbeddingsModelName(str, Enum): 11 | modelId = os.environ.get('EMBEDDING_MODEL_ID') 12 | 13 | class VectorDBType(str, Enum): 14 | FAISS = "faiss" 15 | 16 | class Request(BaseModel): 17 | q: str 18 | user_session_id: str 19 | max_length: int = 500 20 | num_return_sequences: int = 1 21 | do_sample: bool = False 22 | verbose: bool = False 23 
| max_matching_docs: int = 3 24 | # Bedrock / Titan 25 | temperature: float = 0.1 26 | maxTokenCount: int = 512 27 | stopSequences: List = ['\n\nHuman:'] 28 | topP: float = 0.9 29 | topK: int = 250 30 | 31 | text_generation_model: Text2TextModelName = Text2TextModelName.modelId 32 | embeddings_generation_model: EmbeddingsModelName = EmbeddingsModelName.modelId 33 | vectordb_s3_path: str = f"s3://{os.environ.get('CONTEXTUAL_DATA_BUCKET')}/faiss_index/" 34 | vectordb_type: VectorDBType = VectorDBType.FAISS 35 | 36 | MODEL_ENDPOINT_MAPPING = { 37 | Text2TextModelName.modelId: os.environ.get('TEXT2TEXT_MODEL_ID'), 38 | EmbeddingsModelName.modelId: os.environ.get('EMBEDDING_MODEL_ID'), 39 | } 40 | -------------------------------------------------------------------------------- /hosts-file-entry.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | . ~/.bash_profile 3 | 4 | export LB_FQDN=$(kubectl -n istio-ingress \ 5 | get svc istio-ingressgateway \ 6 | -o=jsonpath='{.status.loadBalancer.ingress[0].hostname}') 7 | 8 | export LB_NAME=${EKS_CLUSTER_NAME}-nlb 9 | 10 | STATUS=$(aws elbv2 describe-load-balancers --name ${LB_NAME} \ 11 | --query 'LoadBalancers[0].State.Code' \ 12 | | xargs) 13 | 14 | echo "Status of Load Balancer ${LB_NAME}: $STATUS" 15 | 16 | if [[ $STATUS == "active" ]] 17 | then 18 | echo "You can update your hosts file with the following entries:" 19 | echo "---------------------------------------------------------" 20 | 21 | dig +noall +short +answer ${LB_FQDN} \ 22 | | awk '{printf "%s\ttenanta.example.com\n%s\ttenantb.example.com\n",$0,$0}' 23 | else 24 | echo 'Load Balancer is not ready!!! Try again, shortly.' 25 | fi 26 | -------------------------------------------------------------------------------- /iam/chatbot-access-role-trust-policy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Effect": "Allow", 6 | "Principal": { 7 | "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}" 8 | }, 9 | "Action": "sts:AssumeRoleWithWebIdentity", 10 | "Condition": { 11 | "StringLike": { 12 | "oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}:sub": 13 | "system:serviceaccount:${NAMESPACE}:${SA_NAME}", 14 | "oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}:aud": "sts.amazonaws.com" 15 | } 16 | } 17 | } 18 | ] 19 | } -------------------------------------------------------------------------------- /iam/dynamodb-access-policy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Sid": "LLMDemoDynamoDBAccess", 6 | "Effect": "Allow", 7 | "Action": [ 8 | "dynamodb:GetItem", 9 | "dynamodb:BatchGetItem", 10 | "dynamodb:Query", 11 | "dynamodb:DescribeTable", 12 | "dynamodb:DeleteItem", 13 | "dynamodb:Scan", 14 | "dynamodb:PutItem", 15 | "dynamodb:UpdateItem", 16 | "dynamodb:BatchWriteItem", 17 | "dynamodb:ConditionCheckItem" 18 | ], 19 | "Resource": [ 20 | "arn:aws:dynamodb:${AWS_REGION}:${ACCOUNT_ID}:table/Sessions_${TENANT}_${RANDOM_STRING}", 21 | "arn:aws:dynamodb:${AWS_REGION}:${ACCOUNT_ID}:table/Sessions_${TENANT}_${RANDOM_STRING}/index/*", 22 | "arn:aws:dynamodb:${AWS_REGION}:${ACCOUNT_ID}:table/ChatHistory_${TENANT}_${RANDOM_STRING}", 23 | "arn:aws:dynamodb:${AWS_REGION}:${ACCOUNT_ID}:table/ChatHistory_${TENANT}_${RANDOM_STRING}/index/*" 24 | ] 25 | }, 26 | { 27 | "Sid": 
"LLMDemoBedrockAccess", 28 | "Effect": "Allow", 29 | "Action": [ 30 | "bedrock:InvokeModel" 31 | ], 32 | "Resource": "*" 33 | } 34 | ] 35 | } -------------------------------------------------------------------------------- /iam/s3-access-role-trust-policy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Effect": "Allow", 6 | "Principal": { 7 | "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}" 8 | }, 9 | "Action": "sts:AssumeRoleWithWebIdentity", 10 | "Condition": { 11 | "StringLike": { 12 | "oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}:sub": 13 | "system:serviceaccount:envoy-reverse-proxy-ns:envoy-reverse-proxy-sa", 14 | "oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}:aud": "sts.amazonaws.com" 15 | } 16 | } 17 | } 18 | ] 19 | } -------------------------------------------------------------------------------- /iam/s3-contextual-data-access-policy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Effect": "Allow", 6 | "Action": [ 7 | "s3:GetObject", 8 | "s3:GetObjectVersion" 9 | ], 10 | "Resource": "arn:aws:s3:::contextual-data-${TENANT}-${RANDOM_STRING}/*" 11 | } 12 | ] 13 | } -------------------------------------------------------------------------------- /iam/s3-envoy-config-access-policy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Effect": "Allow", 6 | "Action": [ 7 | "s3:GetObject", 8 | "s3:GetObjectVersion" 9 | ], 10 | "Resource": "arn:aws:s3:::${ENVOY_CONFIG_BUCKET}/*" 11 | } 12 | ] 13 | } -------------------------------------------------------------------------------- /image-build/Dockerfile-api: -------------------------------------------------------------------------------- 1 | FROM public.ecr.aws/docker/library/python:3.11.4-slim AS installer-image 2 | WORKDIR /app 3 | RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update -y 2>/dev/null >/dev/null && \ 4 | DEBIAN_FRONTEND=noninteractive apt-get -qq install -y \ 5 | build-essential \ 6 | curl 2>/dev/null >/dev/null \ 7 | && rm -rf /var/lib/apt/lists/* 8 | ADD api/requirements.txt ./ 9 | RUN pip install --upgrade -q -q pip && \ 10 | pip install --user --upgrade -q -q pip && pip install --user -q -q -r requirements.txt && \ 11 | python -m pip install --user -q -q botocore && \ 12 | python -m pip install --user -q -q boto3 13 | 14 | FROM public.ecr.aws/docker/library/python:3.11.4-slim 15 | RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update -y 2>/dev/null >/dev/null && \ 16 | DEBIAN_FRONTEND=noninteractive apt-get -qq upgrade -y 2>/dev/null >/dev/null && \ 17 | DEBIAN_FRONTEND=noninteractive apt install -qq -y curl 2>/dev/null >/dev/null && \ 18 | addgroup --gid 8000 ragapi && \ 19 | adduser --uid 8000 --gid 8000 --disabled-password --gecos "" ragapi 20 | USER ragapi 21 | WORKDIR /home/ragapi/app 22 | COPY --chown=ragapi:ragapi --from=installer-image /root/.local /home/ragapi/.local/ 23 | COPY --chown=ragapi:ragapi api/app /home/ragapi/app/ 24 | ENV PATH=/home/ragapi/.local/bin:$PATH 25 | EXPOSE 8000 26 | ENTRYPOINT ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "main:app"] 27 | -------------------------------------------------------------------------------- /image-build/Dockerfile-app: 
-------------------------------------------------------------------------------- 1 | FROM public.ecr.aws/docker/library/python:3.11.4-slim AS installer-image 2 | WORKDIR /app 3 | RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update -y 2>/dev/null >/dev/null && \ 4 | DEBIAN_FRONTEND=noninteractive apt-get -qq install -y \ 5 | build-essential \ 6 | curl \ 7 | software-properties-common 2>/dev/null >/dev/null \ 8 | && rm -rf /var/lib/apt/lists/* 9 | ADD app/* ./ 10 | RUN pip install --user --upgrade -q -q pip && pip install --user -q -q -r requirements.txt 11 | 12 | 13 | FROM public.ecr.aws/docker/library/python:3.11.4-slim 14 | RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update -y 2>/dev/null >/dev/null && \ 15 | DEBIAN_FRONTEND=noninteractive apt-get -qq upgrade -y 2>/dev/null >/dev/null && \ 16 | DEBIAN_FRONTEND=noninteractive apt install -qq -y curl 2>/dev/null >/dev/null && \ 17 | addgroup --gid 8000 streamlit && \ 18 | adduser --uid 8000 --gid 8000 --disabled-password --gecos "" streamlit 19 | USER streamlit 20 | WORKDIR /home/streamlit/app 21 | COPY --chown=streamlit:streamlit --from=installer-image /root/.local /home/streamlit/.local/ 22 | COPY --chown=streamlit:streamlit app/*.py /home/streamlit/app/ 23 | ENV PATH=/home/streamlit/.local/bin:$PATH 24 | EXPOSE 8501 25 | HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health 26 | ENTRYPOINT ["streamlit", "run", "webapp.py", "--server.port=8501", "--server.address=0.0.0.0"] -------------------------------------------------------------------------------- /image-build/build-chatbot-image.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | . ~/.bash_profile 3 | 4 | aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS \ 5 | --password-stdin ${REPO_URI_CHATBOT} 6 | 7 | ECR_IMAGE=$( 8 | aws ecr list-images \ 9 | --repository-name ${ECR_REPO_CHATBOT} \ 10 | --query 'imageIds[0].imageDigest' \ 11 | --output text 12 | ) 13 | 14 | aws ecr batch-delete-image \ 15 | --repository-name ${ECR_REPO_CHATBOT} \ 16 | --image-ids imageDigest=${ECR_IMAGE} 17 | 18 | docker build -f image-build/Dockerfile-app -t ${REPO_URI_CHATBOT}:latest . 19 | docker push ${REPO_URI_CHATBOT}:latest -------------------------------------------------------------------------------- /image-build/build-rag-api-image.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | . ~/.bash_profile 3 | 4 | aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS \ 5 | --password-stdin ${REPO_URI_RAGAPI} 6 | 7 | ECR_IMAGE=$( 8 | aws ecr list-images \ 9 | --repository-name ${ECR_REPO_RAGAPI} \ 10 | --query 'imageIds[0].imageDigest' \ 11 | --output text 12 | ) 13 | 14 | aws ecr batch-delete-image \ 15 | --repository-name ${ECR_REPO_RAGAPI} \ 16 | --image-ids imageDigest=${ECR_IMAGE} 17 | 18 | docker build -f image-build/Dockerfile-api -t ${REPO_URI_RAGAPI}:latest . 
19 | docker push ${REPO_URI_RAGAPI}:latest -------------------------------------------------------------------------------- /istio-proxy-v2-config/enable-X-Forwarded-For-header.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: networking.istio.io/v1alpha3 2 | kind: EnvoyFilter 3 | metadata: 4 | name: ingressgateway-settings 5 | spec: 6 | configPatches: 7 | - applyTo: NETWORK_FILTER 8 | match: 9 | listener: 10 | filterChain: 11 | filter: 12 | name: envoy.filters.network.http_connection_manager 13 | patch: 14 | operation: MERGE 15 | value: 16 | name: envoy.filters.network.http_connection_manager 17 | typed_config: 18 | "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager 19 | skip_xff_append: false 20 | use_remote_address: true 21 | xff_num_trusted_hops: 1 22 | -------------------------------------------------------------------------------- /istio-proxy-v2-config/proxy-protocol-envoy-filter.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: networking.istio.io/v1alpha3 2 | kind: EnvoyFilter 3 | metadata: 4 | name: proxy-protocol 5 | spec: 6 | workloadSelector: 7 | labels: 8 | istio: ingressgateway 9 | configPatches: 10 | - applyTo: LISTENER 11 | patch: 12 | operation: MERGE 13 | value: 14 | listener_filters: 15 | - name: envoy.filters.listener.proxy_protocol 16 | - name: envoy.filters.listener.tls_inspector 17 | -------------------------------------------------------------------------------- /setup.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | . ~/.bash_profile 3 | 4 | export TEXT2TEXT_MODEL_ID=anthropic.claude-instant-v1 5 | export EMBEDDING_MODEL_ID=amazon.titan-embed-text-v1 6 | export BEDROCK_SERVICE=bedrock-runtime 7 | echo "export TEXT2TEXT_MODEL_ID=${TEXT2TEXT_MODEL_ID}" \ 8 | | tee -a ~/.bash_profile 9 | echo "export EMBEDDING_MODEL_ID=${EMBEDDING_MODEL_ID}" \ 10 | | tee -a ~/.bash_profile 11 | echo "export BEDROCK_SERVICE=${BEDROCK_SERVICE}" \ 12 | | tee -a ~/.bash_profile 13 | 14 | export KUBECTL_VERSION="1.27.1/2023-04-19" 15 | 16 | if [ "x${KUBECTL_VERSION}" == "x" ] 17 | then 18 | echo "################" 19 | echo "Please specify a version for kubectl" 20 | echo "################" 21 | exit 22 | fi 23 | 24 | export EKS_CLUSTER_NAME="multitenant-chatapp" 25 | 26 | if [ "x${EKS_CLUSTER_NAME}" == "x" ] 27 | then 28 | echo "################" 29 | echo "Please specify a name for the EKS Cluster" 30 | echo "################" 31 | exit 32 | fi 33 | 34 | echo "export EKS_CLUSTER_NAME=${EKS_CLUSTER_NAME}" | tee -a ~/.bash_profile 35 | 36 | export ISTIO_VERSION="1.18.3" 37 | 38 | if [ "x${ISTIO_VERSION}" == "x" ] 39 | then 40 | echo "################" 41 | echo "Please specify a version for Istio" 42 | echo "################" 43 | exit 44 | fi 45 | 46 | echo "export ISTIO_VERSION=${ISTIO_VERSION}" \ 47 | | tee -a ~/.bash_profile 48 | 49 | echo "Installing helper tools" 50 | sudo yum -q -y install jq bash-completion 51 | sudo amazon-linux-extras install -q -y python3.8 2>/dev/null >/dev/null 52 | python3.8 -m pip install -q -q --user botocore 53 | python3.8 -m pip install -q -q --user boto3 54 | 55 | echo "Uninstalling AWS CLI 1.x" 56 | sudo pip uninstall awscli -y 57 | 58 | echo "Installing AWS CLI 2.x" 59 | curl --silent --no-progress-meter \ 60 | "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" \ 61 | -o "awscliv2.zip" 62 | unzip -qq awscliv2.zip 63 | 
sudo ./aws/install --update 64 | PATH=/usr/local/bin:$PATH 65 | /usr/local/bin/aws --version 66 | rm -rf aws awscliv2.zip 67 | 68 | CLOUD9_EC2_ROLE="Cloud9AdminRole" 69 | 70 | AWS=$(which aws) 71 | 72 | echo "---------------------------" 73 | ${AWS} sts get-caller-identity --query Arn | \ 74 | grep ${CLOUD9_EC2_ROLE} -q && echo "IAM role valid. You can continue setting up the EKS Cluster." || \ 75 | echo "IAM role NOT valid. Do not proceed with creating the EKS Cluster or you won't be able to authenticate. 76 | Ensure you assigned the role to your EC2 instance as detailed in the README.md" 77 | echo "---------------------------" 78 | 79 | export AWS_REGION=$(curl --silent --no-progress-meter \ 80 | http://169.254.169.254/latest/dynamic/instance-identity/document \ 81 | | jq -r '.region') 82 | export AWS_DEFAULT_REGION=$AWS_REGION 83 | 84 | export ACCOUNT_ID=$(aws sts get-caller-identity --output text --query Account) 85 | 86 | export RANDOM_STRING=$(cat /dev/urandom \ 87 | | tr -dc '[:alpha:]' \ 88 | | fold -w ${1:-20} | head -n 1 \ 89 | | cut -c 1-8 \ 90 | | tr '[:upper:]' '[:lower:]') 91 | 92 | echo "export RANDOM_STRING=${RANDOM_STRING}" | tee -a ~/.bash_profile 93 | 94 | echo "Installing kubectl" 95 | sudo curl --silent --no-progress-meter --location -o /usr/local/bin/kubectl \ 96 | https://s3.us-west-2.amazonaws.com/amazon-eks/${KUBECTL_VERSION}/bin/linux/amd64/kubectl 97 | 98 | sudo chmod +x /usr/local/bin/kubectl 99 | 100 | kubectl version --client=true 101 | 102 | echo "Installing bash completion for kubectl" 103 | kubectl completion bash >> ~/.bash_completion 104 | . /etc/profile.d/bash_completion.sh 105 | . ~/.bash_completion 106 | 107 | echo "Installing eksctl" 108 | curl --silent --no-progress-meter \ 109 | --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" \ 110 | | tar xz -C /tmp 111 | 112 | sudo mv -v /tmp/eksctl /usr/local/bin 113 | 114 | echo "eksctl Version: $(eksctl version)" 115 | 116 | echo "Installing bash completion for eksctl" 117 | eksctl completion bash >> ~/.bash_completion 118 | . /etc/profile.d/bash_completion.sh 119 | . ~/.bash_completion 120 | 121 | test -n "$AWS_REGION" && echo AWS_REGION is "$AWS_REGION" || echo AWS_REGION is not set 122 | echo "export ACCOUNT_ID=${ACCOUNT_ID}" | tee -a ~/.bash_profile 123 | echo "export AWS_REGION=${AWS_REGION}" | tee -a ~/.bash_profile 124 | echo "export AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION}" | tee -a ~/.bash_profile 125 | aws configure set default.region ${AWS_REGION} 126 | aws configure get default.region 127 | 128 | export YAML_PATH=yaml 129 | echo "export YAML_PATH=yaml" | tee -a ~/.bash_profile 130 | [ -d ${YAML_PATH} ] || mkdir ${YAML_PATH} 131 | 132 | export ENVOY_CONFIG_BUCKET="envoy-config-${RANDOM_STRING}" 133 | aws s3 mb s3://${ENVOY_CONFIG_BUCKET} 134 | 135 | if [[ $? 
-eq 0 ]] 136 | then 137 | echo "export ENVOY_CONFIG_BUCKET=${ENVOY_CONFIG_BUCKET}" \ 138 | | tee -a ~/.bash_profile 139 | fi 140 | 141 | # Creating S3 Bucket Policy for Envoy Dynamic Configuration Files 142 | echo "Creating S3 Bucket Policy for Envoy Dynamic Configuration Files" 143 | envsubst < iam/s3-envoy-config-access-policy.json | \ 144 | xargs -0 -I {} aws iam create-policy \ 145 | --policy-name s3-envoy-config-access-policy-${RANDOM_STRING} \ 146 | --policy-document {} 147 | 148 | # Creating DynamoDB and Bedrock access policy for chatbot app 149 | TENANTS="tenanta tenantb" 150 | for t in $TENANTS 151 | do 152 | export TENANT=${t} 153 | 154 | echo "Creating Contextual Data S3 Bucket for ${t}" 155 | aws s3 mb s3://contextual-data-${t}-${RANDOM_STRING} 156 | 157 | if [ "${t}" == "tenanta" ] 158 | then 159 | aws s3 cp data/Amazon_SageMaker_FAQs.csv s3://contextual-data-${t}-${RANDOM_STRING} 160 | elif [ "${t}" == "tenantb" ] 161 | then 162 | aws s3 cp data/Amazon_EMR_FAQs.csv s3://contextual-data-${t}-${RANDOM_STRING} 163 | fi 164 | 165 | echo "S3 access policy for ${t}" 166 | envsubst < iam/s3-contextual-data-access-policy.json | \ 167 | xargs -0 -I {} aws iam create-policy \ 168 | --policy-name s3-contextual-data-access-policy-${t}-${RANDOM_STRING} \ 169 | --policy-document {} 170 | 171 | echo "DynamoDB and Bedrock access policy for ${t} chatbot app" 172 | envsubst < iam/dynamodb-access-policy.json | \ 173 | xargs -0 -I {} aws iam create-policy \ 174 | --policy-name dynamodb-access-policy-${t}-${RANDOM_STRING} \ 175 | --policy-document {} 176 | done 177 | 178 | # Ingest Data to FAISS Index 179 | source ~/.bash_profile 180 | pip3.8 install -q -q --user -r data_ingestion_to_vectordb/requirements.txt 181 | python3.8 data_ingestion_to_vectordb/data_ingestion_to_vectordb.py 182 | 183 | echo "Creating Chatbot ECR Repository" 184 | export ECR_REPO_CHATBOT=$(aws ecr create-repository \ 185 | --repository-name ${EKS_CLUSTER_NAME}-${RANDOM_STRING}-chatbot \ 186 | --encryption-configuration encryptionType=KMS) 187 | export REPO_URI_CHATBOT=$(echo ${ECR_REPO_CHATBOT}|jq -r '.repository.repositoryUri') 188 | export REPO_CHATBOT=$(echo ${ECR_REPO_CHATBOT}|jq -r '.repository.repositoryName') 189 | 190 | echo "Creating rag-api ECR Repository" 191 | export ECR_REPO_RAGAPI=$(aws ecr create-repository \ 192 | --repository-name ${EKS_CLUSTER_NAME}-${RANDOM_STRING}-rag-api \ 193 | --encryption-configuration encryptionType=KMS) 194 | export REPO_URI_RAGAPI=$(echo ${ECR_REPO_RAGAPI}|jq -r '.repository.repositoryUri') 195 | export REPO_RAGAPI=$(echo ${ECR_REPO_RAGAPI}|jq -r '.repository.repositoryName') 196 | 197 | echo "export ECR_REPO_CHATBOT=${REPO_CHATBOT}" | tee -a ~/.bash_profile 198 | echo "export REPO_URI_CHATBOT=${REPO_URI_CHATBOT}" | tee -a ~/.bash_profile 199 | echo "export ECR_REPO_RAGAPI=${REPO_RAGAPI}" | tee -a ~/.bash_profile 200 | echo "export REPO_URI_RAGAPI=${REPO_URI_RAGAPI}" | tee -a ~/.bash_profile 201 | 202 | echo "Building Chatbot and RAG-API Images" 203 | sh image-build/build-chatbot-image.sh 204 | docker rmi -f $(docker images -a -q) &> /dev/null 205 | sh image-build/build-rag-api-image.sh 206 | docker rmi -f $(docker images -a -q) &> /dev/null 207 | 208 | echo "Installing helm" 209 | curl --no-progress-meter \ 210 | -sSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash 211 | 212 | helm version --template='Version: {{.Version}}'; echo 213 | 214 | rm -vf ${HOME}/.aws/credentials 215 | 216 | echo "Generating a new key" 217 | ssh-keygen -t rsa -N '' -f 
~/.ssh/id_rsa <<< y 218 | 219 | export EC2_KEY_NAME=${EKS_CLUSTER_NAME}-${RANDOM_STRING} 220 | aws ec2 import-key-pair --key-name ${EC2_KEY_NAME} --public-key-material fileb://~/.ssh/id_rsa.pub 221 | echo "export EC2_KEY_NAME=${EC2_KEY_NAME}" | tee -a ~/.bash_profile 222 | 223 | echo "Creating KMS Key and Alias" 224 | export KMS_KEY_ALIAS=${EKS_CLUSTER_NAME}-${RANDOM_STRING} 225 | aws kms create-alias --alias-name alias/${KMS_KEY_ALIAS} \ 226 | --target-key-id $(aws kms create-key --query KeyMetadata.Arn --output text) 227 | echo "export KMS_KEY_ALIAS=${KMS_KEY_ALIAS}" | tee -a ~/.bash_profile 228 | 229 | export MASTER_ARN=$(aws kms describe-key --key-id alias/${KMS_KEY_ALIAS} \ 230 | --query KeyMetadata.Arn --output text) 231 | echo "export MASTER_ARN=${MASTER_ARN}" | tee -a ~/.bash_profile 232 | --------------------------------------------------------------------------------
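Editor's note: as a companion to data_ingestion_to_vectordb.py and setup.sh above, here is a minimal, self-contained sketch of how the per-tenant FAISS index uploaded to the contextual-data bucket could be pulled back down and queried with the same langchain==0.0.305 / Bedrock stack. This is not the repository's api/app implementation; bucket, key prefix, and model IDs mirror the environment variables set in setup.sh and fastapi_request.py, while the local paths and the sample question are assumptions for illustration only.

import os
import boto3
from langchain.chains import RetrievalQA
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock
from langchain.vectorstores import FAISS

# Values mirroring setup.sh / fastapi_request.py; adjust to your environment.
BUCKET = os.environ["CONTEXTUAL_DATA_BUCKET"]   # e.g. contextual-data-tenanta-<random-string>
FAISS_PREFIX = "faiss_index"                    # key prefix written by the ingestion script
LOCAL_DIR = "/tmp/faiss_index"                  # assumed scratch location

os.makedirs(LOCAL_DIR, exist_ok=True)
s3 = boto3.client("s3")
for name in ("index.faiss", "index.pkl"):       # files produced by FAISS.save_local()
    s3.download_file(BUCKET, f"{FAISS_PREFIX}/{name}", f"{LOCAL_DIR}/{name}")

# Same Bedrock client/model wiring used by the ingestion script and setup.sh.
bedrock = boto3.client(service_name=os.environ.get("BEDROCK_SERVICE", "bedrock-runtime"))
embeddings = BedrockEmbeddings(client=bedrock, model_id=os.environ["EMBEDDING_MODEL_ID"])
llm = Bedrock(
    client=bedrock,
    model_id=os.environ["TEXT2TEXT_MODEL_ID"],
    model_kwargs={"max_tokens_to_sample": 512, "temperature": 0.1},
)

# Load the downloaded index and answer a question against it.
vector_db = FAISS.load_local(LOCAL_DIR, embeddings)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_db.as_retriever(search_kwargs={"k": 3}),  # k mirrors max_matching_docs
)
print(qa.run("What is Amazon SageMaker Serverless Inference?"))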