├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── GenerativeAIImmersionDayPresentationDeck.pdf
├── LICENSE
├── README.md
├── img
│   ├── 3-event-access-code.png
│   ├── 3-event-generator-aws-console-3.png
│   ├── 3-event-generator-aws-console-signout.png
│   ├── 3-one-time-passcode-2.png
│   ├── 3-one-time-passcode.png
│   ├── 3-sign-in.png
│   ├── 3-terms-and-condition.png
│   ├── 3-test-event.png
│   ├── ChatEC2.png
│   ├── KendraArchitecture.png
│   ├── Langchain.png
│   ├── May-31-2023 16-48-26.gif
│   ├── PatentChat.png
│   ├── chat-frontend.png
│   ├── cloned.png
│   ├── consoleLogin.png
│   ├── delete-flant5xxl.gif
│   ├── delete-kendra-index.gif
│   ├── deploy-flant5xxl.gif
│   ├── endpointEndpointConfiguration.png
│   ├── enterPasscode.png
│   ├── eventEngineAccess.png
│   ├── eventHash.png
│   ├── get-kendra-index.gif
│   ├── get-url.gif
│   ├── kendra-add-s3-bucket.png
│   ├── kendra-index-create.gif
│   ├── mgmtConsole.png
│   ├── new_s3_connection.gif
│   ├── notebooksComputeResources.png
│   ├── openLauncher.png
│   ├── otp.png
│   ├── otpEmail.png
│   ├── python-310.gif
│   ├── quota-limit-increase.png
│   ├── rag-architecture.png
│   ├── rag-concept.png
│   ├── rename.png
│   ├── renameFile.png
│   ├── sagemaker.png
│   ├── sagemakerDomain.png
│   ├── sagemakerLoading.png
│   ├── sagemakerStart.png
│   ├── sagemakerStudio.png
│   ├── sendPasscode.png
│   ├── teamDashboard.png
│   ├── untitled.png
│   ├── update-execution-role.gif
│   └── update-trust-relationship.gif
├── lab1
│   ├── code
│   │   ├── inference.py
│   │   └── requirements.txt
│   ├── falcon40B-instruct-notebook-full.ipynb
│   └── gpt-j-notebook-full.ipynb
├── lab2
│   ├── fine-tuning.ipynb
│   └── finetuning
│       ├── finetuning.py
│       └── requirements.txt
├── lab3
│   └── JumpStart_Stable_Diffusion_Inference_Only.ipynb
└── lab4
    ├── cf.yml
    ├── fe
    │   ├── Dockerfile
    │   ├── app.py
    │   ├── aws.png
    │   ├── requirements.txt
    │   └── setup.sh
    ├── rag-lab.ipynb
    ├── rag_app
    │   ├── kendra
    │   │   ├── __init__.py
    │   │   ├── __pycache__
    │   │   │   ├── __init__.cpython-311.pyc
    │   │   │   ├── kendra_index_retriever.cpython-311.pyc
    │   │   │   └── kendra_results.cpython-311.pyc
    │   │   ├── kendra_index_retriever.py
    │   │   └── kendra_results.py
    │   ├── rag_app.py
    │   └── requirements.txt
    └── template.yml

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | **/.aws-sam
3 | .venv

--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | ## Code of Conduct
2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
4 | opensource-codeofconduct@amazon.com with any additional questions or comments.
5 |

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing Guidelines
2 |
3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
4 | documentation, we greatly value feedback and contributions from our community.
5 |
6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
7 | information to effectively respond to your bug report or contribution.
8 |
9 |
10 | ## Reporting Bugs/Feature Requests
11 |
12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features.
13 |
14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
16 |
17 | * A reproducible test case or series of steps
18 | * The version of our code being used
19 | * Any modifications you've made relevant to the bug
20 | * Anything unusual about your environment or deployment
21 |
22 |
23 | ## Contributing via Pull Requests
24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
25 |
26 | 1. You are working against the latest source on the *main* branch.
27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
29 |
30 | To send us a pull request, please:
31 |
32 | 1. Fork the repository.
33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
34 | 3. Ensure local tests pass.
35 | 4. Commit to your fork using clear commit messages.
36 | 5. Send us a pull request, answering any default questions in the pull request interface.
37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
38 |
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 |
42 |
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute to. As our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
45 |
46 |
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 |
52 |
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
55 |
56 |
57 | ## Licensing
58 |
59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 |

--------------------------------------------------------------------------------
/GenerativeAIImmersionDayPresentationDeck.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/GenerativeAIImmersionDayPresentationDeck.pdf

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT No Attribution
2 |
3 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy of
6 | this software and associated documentation files (the "Software"), to deal in
7 | the Software without restriction, including without limitation the rights to
8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
9 | the Software, and to permit persons to whom the Software is furnished to do so.
10 |
11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
13 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
14 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
15 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
16 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
17 |
18 |

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # Implementing Generative AI on AWS workshop
3 |
4 | **For full details please refer to Workshop Studio**: https://catalog.us-east-1.prod.workshops.aws/workshops/80ae1ed2-f415-4d3d-9eb0-e9118c147bd4
5 |
6 | This workshop is set up following the popular AWS Immersion Day format. It is meant to provide guidance on how to get started with Generative AI on AWS. The Immersion Day is split up into the following four blocks, each consisting of a theory section covered by slides as well as a hands-on lab:
7 | - Introduction to Generative AI & Large Language Models; Large Language Model deployment & inference optimization
8 | - Large Language Model finetuning
9 | - Introduction to Visual Foundation Models; deployment & inference optimization of Stable Diffusion
10 | - Engineering GenAI-powered applications on AWS
11 |
12 | Note that during an immersion day / workshop, only a subset of these topics might be covered.
13 |
14 | The repository is structured as follows: the slides can be found in GenerativeAIImmersionDayPresentationDeck.pdf at the root level of the repository. Similarly, the labs can be found in the respectively named directories:
15 | - Lab 1 - Hosting Large Language Models can be found in the lab1 directory.
16 |   - Option 1: For GPT-J, start with the notebook gpt-j-notebook-full.ipynb.
17 |   - Option 2: For Falcon-40B-Instruct, start with the notebook falcon40B-instruct-notebook-full.ipynb.
18 | - Lab 2 - Finetuning Large Language Models can be found in the lab2 directory. Start with the notebook fine-tuning.ipynb.
19 | - Lab 3 - Hosting Stable Diffusion can be found in the lab3 directory. Start with the notebook JumpStart_Stable_Diffusion_Inference_Only.ipynb.
20 | - Lab 4 - Building the LLM-powered chatbot "AWSomeChat" with retrieval-augmented generation can be found in the lab4 directory. Start with the notebook rag-lab.ipynb.
21 |
22 | ## Security
23 |
24 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
25 |
26 | ## License
27 |
28 | This library is licensed under the MIT-0 License. See the LICENSE file.
29 |
30 |

--------------------------------------------------------------------------------
/img/3-event-access-code.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/3-event-access-code.png

--------------------------------------------------------------------------------
/img/3-event-generator-aws-console-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/3-event-generator-aws-console-3.png

--------------------------------------------------------------------------------
/img/3-event-generator-aws-console-signout.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/3-event-generator-aws-console-signout.png

--------------------------------------------------------------------------------
/img/3-one-time-passcode-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/3-one-time-passcode-2.png

--------------------------------------------------------------------------------
/img/3-one-time-passcode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/3-one-time-passcode.png

--------------------------------------------------------------------------------
/img/3-sign-in.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/3-sign-in.png

--------------------------------------------------------------------------------
/img/3-terms-and-condition.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/3-terms-and-condition.png

--------------------------------------------------------------------------------
/img/3-test-event.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/3-test-event.png

--------------------------------------------------------------------------------
/img/ChatEC2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/ChatEC2.png

--------------------------------------------------------------------------------
/img/KendraArchitecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/KendraArchitecture.png

--------------------------------------------------------------------------------
/img/Langchain.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/Langchain.png

--------------------------------------------------------------------------------
/img/May-31-2023 16-48-26.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/May-31-2023 16-48-26.gif

--------------------------------------------------------------------------------
/img/PatentChat.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/PatentChat.png

--------------------------------------------------------------------------------
/img/chat-frontend.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/chat-frontend.png

--------------------------------------------------------------------------------
/img/cloned.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/cloned.png

--------------------------------------------------------------------------------
/img/consoleLogin.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/consoleLogin.png

--------------------------------------------------------------------------------
/img/delete-flant5xxl.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/delete-flant5xxl.gif

--------------------------------------------------------------------------------
/img/delete-kendra-index.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/delete-kendra-index.gif

--------------------------------------------------------------------------------
/img/deploy-flant5xxl.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/deploy-flant5xxl.gif

--------------------------------------------------------------------------------
/img/endpointEndpointConfiguration.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/endpointEndpointConfiguration.png

--------------------------------------------------------------------------------
/img/enterPasscode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/enterPasscode.png

--------------------------------------------------------------------------------
/img/eventEngineAccess.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/eventEngineAccess.png

--------------------------------------------------------------------------------
/img/eventHash.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/eventHash.png

--------------------------------------------------------------------------------
/img/get-kendra-index.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/get-kendra-index.gif

--------------------------------------------------------------------------------
/img/get-url.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/get-url.gif

--------------------------------------------------------------------------------
/img/kendra-add-s3-bucket.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/kendra-add-s3-bucket.png

--------------------------------------------------------------------------------
/img/kendra-index-create.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/kendra-index-create.gif

--------------------------------------------------------------------------------
/img/mgmtConsole.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/mgmtConsole.png

--------------------------------------------------------------------------------
/img/new_s3_connection.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/new_s3_connection.gif

--------------------------------------------------------------------------------
/img/notebooksComputeResources.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/notebooksComputeResources.png

--------------------------------------------------------------------------------
/img/openLauncher.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/openLauncher.png
--------------------------------------------------------------------------------
/img/otp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/otp.png

--------------------------------------------------------------------------------
/img/otpEmail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/otpEmail.png

--------------------------------------------------------------------------------
/img/python-310.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/python-310.gif

--------------------------------------------------------------------------------
/img/quota-limit-increase.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/quota-limit-increase.png

--------------------------------------------------------------------------------
/img/rag-architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/rag-architecture.png

--------------------------------------------------------------------------------
/img/rag-concept.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/rag-concept.png

--------------------------------------------------------------------------------
/img/rename.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/rename.png

--------------------------------------------------------------------------------
/img/renameFile.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/renameFile.png

--------------------------------------------------------------------------------
/img/sagemaker.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/sagemaker.png

--------------------------------------------------------------------------------
/img/sagemakerDomain.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/sagemakerDomain.png

--------------------------------------------------------------------------------
/img/sagemakerLoading.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/sagemakerLoading.png

--------------------------------------------------------------------------------
/img/sagemakerStart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/sagemakerStart.png

--------------------------------------------------------------------------------
/img/sagemakerStudio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/sagemakerStudio.png

--------------------------------------------------------------------------------
/img/sendPasscode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/sendPasscode.png

--------------------------------------------------------------------------------
/img/teamDashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/teamDashboard.png

--------------------------------------------------------------------------------
/img/untitled.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/untitled.png

--------------------------------------------------------------------------------
/img/update-execution-role.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/update-execution-role.gif

--------------------------------------------------------------------------------
/img/update-trust-relationship.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/img/update-trust-relationship.gif

--------------------------------------------------------------------------------
/lab1/code/inference.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 | from transformers import AutoTokenizer, GPTJForCausalLM, pipeline
4 |
5 |
6 | def model_fn(model_dir):
7 |     # model_fn is invoked once by the SageMaker Hugging Face inference toolkit
8 |     # when the endpoint container starts; its return value is handed to the
9 |     # default predict_fn for every incoming request.
10 |     tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
11 |
12 |     # The "float16" revision halves the memory footprint relative to FP32
13 |     # (roughly 12 GB instead of 24 GB), so the model fits on a 16 GB GPU.
14 |     model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6b", revision="float16", torch_dtype=torch.float16)
15 |
16 |     print('Model downloaded and loaded into memory...')
17 |
18 |     # Run on GPU 0 if available, otherwise fall back to CPU.
19 |     if torch.cuda.is_available():
20 |         device = 0
21 |     else:
22 |         device = -1
23 |
24 |     generation = pipeline("text-generation", model=model, tokenizer=tokenizer, device=device)
25 |
26 |     return generation

--------------------------------------------------------------------------------
/lab1/code/requirements.txt:
--------------------------------------------------------------------------------
1 | transformers==4.38.0
2 |
3 |
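For orientation, the `model_fn` above is the hook the SageMaker Hugging Face inference toolkit calls when an endpoint backed by this code starts up; whatever it returns is handed to the default `predict_fn` for each request. A minimal deployment sketch follows — the `model_data` location and the framework versions are illustrative assumptions, not values taken from this repository; the lab1 notebook defines the actual setup:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hypothetical values for illustration only -- the lab1 notebook sets the real ones.
model = HuggingFaceModel(
    model_data="s3://<your-bucket>/gpt-j/model.tar.gz",  # placeholder archive location
    role=role,
    entry_point="inference.py",   # the model_fn defined above
    source_dir="code",            # lab1/code, bundling inference.py and requirements.txt
    transformers_version="4.26",  # assumed DLC versions; pick a supported combination
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
print(predictor.predict({"inputs": "My name is Lewis and I like to"}))
```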
--------------------------------------------------------------------------------
/lab1/falcon40B-instruct-notebook-full.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "attachments": {},
5 |    "cell_type": "markdown",
6 |    "id": "34003d7d-74b8-4dd5-bbde-bd8de5353607",
7 |    "metadata": {
8 |     "tags": []
9 |    },
10 |    "source": [
11 |     "# Lab 1 (b)\n",
12 |     "\n",
13 |     "## Objective\n",
14 |     "In this lab, we'll explore how to host a large language model on Amazon SageMaker using the [Hugging Face LLM Inference Container for Amazon SageMaker](https://huggingface.co/blog/sagemaker-huggingface-llm), which allows you to easily deploy the most popular open-source LLMs, including Falcon, StarCoder, BLOOM, GPT-NeoX, Llama, and T5.\n",
15 |     "\n",
16 |     "## Introduction\n",
17 |     "\n",
18 |     "Language models have recently exploded in both size and popularity. In 2018, BERT-large entered the scene and, with its 340M parameters and novel transformer architecture, set the standard on NLP task accuracy. Within just a few years, state-of-the-art NLP model size has grown by more than 500x, with models such as OpenAI’s 175 billion parameter GPT-3 and the similarly sized open-source Bloom 176B raising the bar on NLP accuracy. This increase in the number of parameters is driven by the simple and empirically demonstrated positive relationship between model size and accuracy: more is better. With easy access from model zoos such as HuggingFace and improved accuracy in NLP tasks such as classification and text generation, practitioners are increasingly reaching for these large models. However, deploying them can be a challenge because of their size.\n",
19 |     "\n",
20 |     "## Background and Details\n",
21 |     "We'll be working with [Falcon-40B-Instruct](https://huggingface.co/tiiuae/falcon-40b-instruct), which was developed by the Technology Innovation Institute (TII). Falcon-40B-Instruct is a 40B-parameter causal decoder-only model built by TII based on Falcon-40B and finetuned on a mixture of Baize data. It is made available under the Apache 2.0 license.\n",
22 |     "## Instructions\n",
23 |     "\n",
24 |     "### Prerequisites\n",
25 |     "\n",
26 |     "#### To run this workshop...\n",
27 |     "You need a computer with a web browser, preferably with the latest version of Chrome / Firefox.\n",
28 |     "Sequentially read and follow the instructions described in AWS Hosted Event and Work Environment Set Up.\n",
29 |     "\n",
30 |     "#### Recommended background\n",
31 |     "It will be easier for you to run this workshop if you have:\n",
32 |     "\n",
33 |     "- Experience with deep learning models\n",
34 |     "- Familiarity with Python or other similar programming languages\n",
35 |     "- Experience with Jupyter notebooks\n",
36 |     "- Beginner-level knowledge of and experience with SageMaker Hosting/Inference.\n",
37 |     "\n",
38 |     "#### Target audience\n",
39 |     "Data Scientists, ML Engineers, ML Infrastructure Engineers, MLOps Engineers, Technical Leaders.\n",
40 |     "Intended for customers working with large Generative AI models, including language, computer vision, and multi-modal use cases.\n",
41 |     "Customers using EKS/EC2/ECS/on-prem for hosting, or with experience using SageMaker.\n",
42 |     "\n",
43 |     "Level of expertise - 400\n",
44 |     "\n",
45 |     "#### Time to complete\n",
46 |     "Approximately 45 minutes."
47 |    ]
48 |   },
49 |   {
50 |    "attachments": {},
51 |    "cell_type": "markdown",
52 |    "id": "c295cdfa-c6b5-45b0-88e4-5f3f66aa6137",
53 |    "metadata": {},
54 |    "source": [
55 |     "We are going to use the SageMaker Python SDK to deploy the Falcon-40B-Instruct model to Amazon SageMaker. "
56 |    ]
57 |   },
58 |   {
59 |    "cell_type": "code",
60 |    "execution_count": null,
61 |    "id": "d7cab4ed-1380-4ae5-a0c5-af2cc4e86490",
62 |    "metadata": {
63 |     "tags": []
64 |    },
65 |    "outputs": [],
66 |    "source": [
67 |     "!pip install --upgrade boto3 sagemaker"
68 |    ]
69 |   },
70 |   {
71 |    "attachments": {},
72 |    "cell_type": "markdown",
73 |    "id": "e36960f4-07cf-4294-9fcc-8d320180f357",
74 |    "metadata": {},
75 |    "source": [
76 |     "\n",
77 |     "\n",
78 |     "Before we begin with the actual work of packaging and deploying the model to Amazon SageMaker, we need to set up the notebook environment accordingly. This includes:\n",
79 |     "\n",
80 |     "- retrieval of the execution role our SageMaker Studio domain is associated with for later usage\n",
81 |     "- retrieval of our bucket for later usage\n",
82 |     "- retrieval of the chosen region for later usage"
83 |    ]
84 |   },
85 |   {
86 |    "cell_type": "code",
87 |    "execution_count": null,
88 |    "id": "60977155-7ea3-489b-ab11-3c06c7385c08",
89 |    "metadata": {},
90 |    "outputs": [],
91 |    "source": [
92 |     "import sagemaker\n",
93 |     "import boto3\n",
94 |     "sess = sagemaker.Session()\n",
95 |     "# sagemaker session bucket -> used for uploading data, models and logs\n",
96 |     "# sagemaker will automatically create this bucket if it does not exist\n",
97 |     "sagemaker_session_bucket=None\n",
98 |     "if sagemaker_session_bucket is None and sess is not None:\n",
99 |     "    # set to default bucket if a bucket name is not given\n",
100 |     "    sagemaker_session_bucket = sess.default_bucket()\n",
101 |     "\n",
102 |     "try:\n",
103 |     "    role = sagemaker.get_execution_role()\n",
104 |     "except ValueError:\n",
105 |     "    iam = boto3.client('iam')\n",
106 |     "    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']\n",
107 |     "\n",
108 |     "sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)\n",
109 |     "\n",
110 |     "print(f\"sagemaker role arn: {role}\")\n",
111 |     "print(f\"sagemaker session region: {sess.boto_region_name}\")\n"
112 |    ]
113 |   },
114 |   {
115 |    "attachments": {},
116 |    "cell_type": "markdown",
117 |    "id": "af10a9fe-2b82-474b-aaa9-88c62f72e952",
118 |    "metadata": {},
119 |    "source": [
120 |     "Compared to deploying regular Hugging Face models, we first need to retrieve the container URI and provide it to our HuggingFaceModel model class with an **image_uri** pointing to the image. To retrieve the new Hugging Face LLM Deep Learning Container in Amazon SageMaker, we can use the **get_huggingface_llm_image_uri** method provided by the SageMaker SDK. This method allows us to retrieve the URI for the desired Hugging Face LLM DLC based on the specified backend, session, region, and version."
121 |    ]
122 |   },
123 |   {
124 |    "cell_type": "code",
125 |    "execution_count": null,
126 |    "id": "e461638a-6dc8-40e4-9601-c88edcb2885e",
127 |    "metadata": {},
128 |    "outputs": [],
129 |    "source": [
130 |     "from sagemaker.huggingface import get_huggingface_llm_image_uri\n",
131 |     "\n",
132 |     "# retrieve the llm image uri\n",
133 |     "llm_image = get_huggingface_llm_image_uri(\n",
134 |     "    \"huggingface\",\n",
135 |     "    version=\"0.8.2\"\n",
136 |     ")\n",
137 |     "\n",
138 |     "# print ecr image uri\n",
139 |     "print(f\"llm image uri: {llm_image}\")"
140 |    ]
141 |   },
142 |   {
143 |    "attachments": {},
144 |    "cell_type": "markdown",
145 |    "id": "752b5437-abda-4e6f-ad85-7163a673ad5d",
146 |    "metadata": {},
147 |    "source": [
148 |     "To deploy the Falcon-40B-Instruct model to Amazon SageMaker, we create a HuggingFaceModel model class and define our endpoint configuration, including the **hf_model_id** and **instance_type**. We will use a **g5.12xlarge** instance type with 4 NVIDIA A10G GPUs and 96GB of GPU memory.\n",
149 |     "\n"
150 |    ]
151 |   },
152 |   {
153 |    "cell_type": "code",
154 |    "execution_count": null,
155 |    "id": "e50d34ef-0215-418e-9685-57a6bd13e49a",
156 |    "metadata": {},
157 |    "outputs": [],
158 |    "source": [
159 |     "import json\n",
160 |     "from sagemaker.huggingface import HuggingFaceModel\n",
161 |     "\n",
162 |     "# sagemaker config\n",
163 |     "instance_type = \"ml.g5.12xlarge\"\n",
164 |     "number_of_gpu = 4\n",
165 |     "\n",
166 |     "# TGI config\n",
167 |     "config = {\n",
168 |     "    'HF_MODEL_ID': \"tiiuae/falcon-40b-instruct\", # model id from hf.co/models\n",
169 |     "    'SM_NUM_GPUS': json.dumps(number_of_gpu), # Number of GPUs used per replica\n",
170 |     "    'MAX_INPUT_LENGTH': json.dumps(1024), # Max length of input text\n",
171 |     "    'MAX_TOTAL_TOKENS': json.dumps(2048), # Max length of the generation (including input text)\n",
172 |     "    # 'HF_MODEL_QUANTIZE': \"bitsandbytes\", # comment in to quantize\n",
173 |     "}\n",
174 |     "\n",
175 |     "# create HuggingFaceModel\n",
176 |     "llm_model = HuggingFaceModel(\n",
177 |     "    role=role,\n",
178 |     "    image_uri=llm_image,\n",
179 |     "    env=config\n",
180 |     ")\n"
181 |    ]
182 |   },
183 |   {
184 |    "attachments": {},
185 |    "cell_type": "markdown",
186 |    "id": "6896199d-350a-40f9-8104-fa1462a21037",
187 |    "metadata": {},
188 |    "source": [
189 |     "After we have created the HuggingFaceModel, we can deploy it to Amazon SageMaker using the deploy method. We will deploy the model with the ml.g5.12xlarge instance type. The Hugging Face LLM Deep Learning Container is powered by [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference), an open-source, purpose-built solution for deploying and serving Large Language Models. TGI will automatically distribute and shard the model across all GPUs."
190 |    ]
191 |   },
192 |   {
193 |    "cell_type": "code",
194 |    "execution_count": null,
195 |    "id": "5a470809-96e6-4ced-bdb3-048b5b2c1efc",
196 |    "metadata": {},
197 |    "outputs": [],
198 |    "source": [
199 |     "# Deploy model to an endpoint\n",
200 |     "\n",
201 |     "llm = llm_model.deploy(\n",
202 |     "    initial_instance_count=1,\n",
203 |     "    instance_type=instance_type,\n",
204 |     "    # volume_size=400, # If using an instance with local SSD storage, volume_size must be None, e.g. p4 but not p3\n",
205 |     ")\n"
206 |    ]
207 |   },
208 |   {
209 |    "attachments": {},
210 |    "cell_type": "markdown",
211 |    "id": "55c9138e-3e67-4b4a-9949-0e759522f453",
212 |    "metadata": {},
213 |    "source": [
214 |     "After our endpoint is deployed, we can run inference on it using the predict method from the predictor. We can use different parameters to control the generation, defining them in the parameters attribute of the payload. As of today, TGI supports the following parameters:\n",
215 |     "\n",
216 |     "- temperature: Controls randomness in the model. Lower values will make the model more deterministic and higher values will make the model more random. Default value is 1.0.\n",
217 |     "- max_new_tokens: The maximum number of tokens to generate. Default value is 20, max value is 512.\n",
218 |     "- repetition_penalty: Controls the likelihood of repetition, defaults to null.\n",
219 |     "- seed: The seed to use for random generation, default is null.\n",
220 |     "- stop: A list of tokens to stop the generation. The generation will stop when one of the tokens is generated.\n",
221 |     "- top_k: The number of highest-probability vocabulary tokens to keep for top-k filtering. Default value is null, which disables top-k filtering.\n",
222 |     "- top_p: The cumulative probability threshold of the highest-probability vocabulary tokens to keep for nucleus sampling, defaults to null.\n",
223 |     "- do_sample: Whether or not to use sampling; uses greedy decoding otherwise. Default value is false.\n",
224 |     "- best_of: Generate best_of sequences and return the one with the highest token logprobs, defaults to null.\n",
225 |     "- details: Whether or not to return details about the generation. Default value is false.\n",
226 |     "- return_full_text: Whether or not to return the full text or only the generated part. Default value is false.\n",
227 |     "- truncate: Whether or not to truncate the input to the maximum length of the model. Default value is true.\n",
228 |     "- typical_p: The typical probability of a token. Default value is null.\n",
229 |     "- watermark: Whether or not to apply a watermark to the generation. Default value is false.\n"
230 |    ]
231 |   },
232 |   {
233 |    "cell_type": "code",
234 |    "execution_count": null,
235 |    "id": "d50bf38a-82a8-4706-a213-269992825ccd",
236 |    "metadata": {},
237 |    "outputs": [],
238 |    "source": [
239 |     "# define payload\n",
240 |     "prompt = \"\"\"You are a helpful Assistant, called Falcon, knowing everything about AWS.\n",
241 |     "\n",
242 |     "User: Can you tell me something about Amazon SageMaker?\n",
243 |     "Falcon:\"\"\"\n",
244 |     "\n",
245 |     "# hyperparameters for llm\n",
246 |     "payload = {\n",
247 |     "    \"inputs\": prompt,\n",
248 |     "    \"parameters\": {\n",
249 |     "        \"do_sample\": True,\n",
250 |     "        \"top_p\": 0.9,\n",
251 |     "        \"temperature\": 0.8,\n",
252 |     "        \"max_new_tokens\": 1024,\n",
253 |     "        \"repetition_penalty\": 1.03,\n",
254 |     "        \"stop\": [\"\\nUser:\",\"<|endoftext|>\",\"</s>\"]\n",
255 |     "    }\n",
256 |     "}\n",
257 |     "\n",
258 |     "# send request to endpoint\n",
259 |     "response = llm.predict(payload)\n",
260 |     "\n",
261 |     "for seq in response:\n",
262 |     "    print(f\"Result: {seq['generated_text']}\")\n"
263 |    ]
264 |   },
265 |   {
266 |    "attachments": {},
267 |    "cell_type": "markdown",
268 |    "id": "6e6b7092-450b-4994-a8d0-b0937f82b394",
269 |    "metadata": {},
270 |    "source": [
271 |     "## Prompt Engineering\n",
272 |     "Prompt engineering is a technique used to design effective prompts for LLMs with the goal to achieve:\n",
273 |     "\n",
274 |     "- Control over the output: With prompt engineering, developers can control the output generated by LLMs. By designing prompts that specify the desired topic, style, tone, and level of formality, they can guide the LLM to produce text that meets the desired criteria.\n",
275 |     "- Mitigating bias: LLMs have been shown to produce biased outputs when prompted with certain topics or language patterns. By engineering prompts that avoid biased language and encourage fairness, developers can help mitigate these issues.\n",
276 |     "- Improving efficiency: Prompt engineering can help LLMs work more efficiently by guiding them to generate the desired output with fewer iterations. By providing clear, concise, and specific prompts, developers can help LLMs achieve the desired outcome faster and with fewer errors.\n",
277 |     "\n",
278 |     "In general, a prompt can contain any of the following components:\n",
279 |     "\n",
280 |     "- Instruction - a specific task or instruction you want the model to perform\n",
281 |     "- Context - can involve external information or additional context that can steer the model to better responses\n",
282 |     "- Input Data - the input or question that we are interested in finding a response for\n",
283 |     "- Output Indicator - indicates the type or format of the output\n",
284 |     "\n",
285 |     "In general, the more information we provide with the prompt, the better the above-mentioned goals will be achieved.\n",
286 |     "\n",
287 |     "Let's try it out!"
288 |    ]
289 |   },
290 |   {
291 |    "cell_type": "code",
292 |    "execution_count": null,
293 |    "id": "473b3cfe-01ca-4d4a-a82d-b8b3f8c321a2",
294 |    "metadata": {},
295 |    "outputs": [],
296 |    "source": [
297 |     "# Simple unstructured prompt\n",
298 |     "prompt = \"\"\"\n",
299 |     "Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.\n",
300 |     "\n",
301 |     "User: What was OKT3 originally sourced from?\n",
302 |     "\n",
303 |     "Falcon:\"\"\"\n",
304 |     "\n",
305 |     "\n",
306 |     "# hyperparameters for llm\n",
307 |     "payload = {\n",
308 |     "    \"inputs\": prompt,\n",
309 |     "    \"parameters\": {\n",
310 |     "        \"do_sample\": True,\n",
311 |     "        \"top_p\": 0.9,\n",
312 |     "        \"temperature\": 0.8,\n",
313 |     "        \"max_new_tokens\": 1024,\n",
314 |     "        \"repetition_penalty\": 1.03,\n",
315 |     "        \"stop\": [\"\\nUser:\",\"<|endoftext|>\",\"</s>\"]\n",
316 |     "    }\n",
317 |     "}\n",
318 |     "\n",
319 |     "# send request to endpoint\n",
320 |     "response = llm.predict(payload)\n",
321 |     "\n",
322 |     "for seq in response:\n",
323 |     "    print(f\"Result: {seq['generated_text']}\")"
324 |    ]
325 |   },
326 |   {
327 |    "cell_type": "code",
328 |    "execution_count": null,
329 |    "id": "11b018ff-7ac3-4c1e-baca-66b373b5f36e",
330 |    "metadata": {},
331 |    "outputs": [],
332 |    "source": [
333 |     "# We now stick to the scheme proposed above\n",
334 |     "prompt = \"\"\"\n",
335 |     "Answer the question based on the context below. Keep the answer short and concise. Respond \"Unsure about answer\" if not sure about the answer.\n",
336 |     "\n",
337 |     "Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.\n",
338 |     "\n",
339 |     "Question: What was OKT3 originally sourced from?\n",
340 |     "\n",
341 |     "Answer:\"\"\"\n",
342 |     "\n",
343 |     "\n",
344 |     "# hyperparameters for llm\n",
345 |     "payload = {\n",
346 |     "    \"inputs\": prompt,\n",
347 |     "    \"parameters\": {\n",
348 |     "        \"do_sample\": True,\n",
349 |     "        \"top_p\": 0.9,\n",
350 |     "        \"temperature\": 0.8,\n",
351 |     "        \"max_new_tokens\": 1024,\n",
352 |     "        \"repetition_penalty\": 1.03,\n",
353 |     "        \"stop\": [\"\\nUser:\",\"<|endoftext|>\",\"</s>\"]\n",
354 |     "    }\n",
355 |     "}\n",
356 |     "\n",
357 |     "# send request to endpoint\n",
358 |     "response = llm.predict(payload)\n",
359 |     "for seq in response:\n",
360 |     "    print(f\"Result: {seq['generated_text']}\")\n"
361 |    ]
362 |   },
363 |   {
364 |    "attachments": {},
365 |    "cell_type": "markdown",
366 |    "id": "aa161db6-7a30-4186-abf2-36503d771826",
367 |    "metadata": {},
368 |    "source": [
369 |     "In addition, [few-shot learning](https://www.analyticsvidhya.com/blog/2021/05/an-introduction-to-few-shot-learning/) is an interesting approach for the context element of a prompt. Few-shot learning is a prompt engineering technique that enables models to learn new tasks or concepts from only a few examples or samples (usually a single-digit number is just fine). Despite the fact that the model has never seen this task in the training phase, we experience a significant boost in performance. "
370 |    ]
371 |   },
372 |   {
373 |    "cell_type": "code",
374 |    "execution_count": null,
375 |    "id": "ff9354fa-35f4-44fd-be85-03e93b7fec3a",
376 |    "metadata": {},
377 |    "outputs": [],
378 |    "source": [
379 |     "# Zero-shot (no examples in the prompt)\n",
380 |     "prompt = \"\"\"\n",
381 |     "Tweet: \"This new music video was incredible\"\n",
382 |     "Sentiment:\"\"\"\n",
383 |     "\n",
384 |     "\n",
385 |     "# hyperparameters for llm\n",
386 |     "payload = {\n",
387 |     "    \"inputs\": prompt,\n",
388 |     "    \"parameters\": {\n",
389 |     "        \"do_sample\": True,\n",
390 |     "        \"top_p\": 0.9,\n",
391 |     "        \"temperature\": 0.8,\n",
392 |     "        \"max_new_tokens\": 1024,\n",
393 |     "        \"repetition_penalty\": 1.03,\n",
394 |     "        \"stop\": [\"\\nUser:\",\"<|endoftext|>\",\"</s>\"]\n",
395 |     "    }\n",
396 |     "}\n",
397 |     "\n",
398 |     "# send request to endpoint\n",
399 |     "response = llm.predict(payload)\n",
400 |     "\n",
401 |     "for seq in response:\n",
402 |     "    print(f\"Result: {seq['generated_text']}\")\n"
403 |    ]
404 |   },
405 |   {
406 |    "cell_type": "code",
407 |    "execution_count": null,
408 |    "id": "40bb5842-077d-4da3-9760-a04d9424699e",
409 |    "metadata": {},
410 |    "outputs": [],
411 |    "source": [
412 |     "# Few-shot\n",
413 |     "prompt = \"\"\"\n",
414 |     "Tweet: \"I hate it when my phone battery dies.\"\n",
415 |     "Sentiment: Negative\n",
416 |     "###\n",
417 |     "Tweet: \"My day has been 👍\"\n",
418 |     "Sentiment: Positive\n",
419 |     "###\n",
420 |     "Tweet: \"This is the link to the article\"\n",
421 |     "Sentiment: Neutral\n",
422 |     "###\n",
423 |     "Tweet: \"This new music video was incredible\"\n",
424 |     "Sentiment:\"\"\"\n",
425 |     "\n",
426 |     "# hyperparameters for llm\n",
427 |     "payload = {\n",
428 |     "    \"inputs\": prompt,\n",
429 |     "    \"parameters\": {\n",
430 |     "        \"do_sample\": True,\n",
431 |     "        \"top_p\": 0.9,\n",
432 |     "        \"temperature\": 0.8,\n",
433 |     "        \"max_new_tokens\": 1024,\n",
434 |     "        \"repetition_penalty\": 1.03,\n",
435 |     "        \"stop\": [\"\\nUser:\",\"<|endoftext|>\",\"</s>\"]\n",
436 |     "    }\n",
437 |     "}\n",
438 |     "\n",
439 |     "# send request to endpoint\n",
440 |     "response = 
llm.predict(payload)\n", 441 | "for seq in response:\n", 442 | " print(f\"Result: {seq['generated_text']}\")\n" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": null, 448 | "id": "0dbbb9c4-7428-45fa-9b25-d0392b08f62e", 449 | "metadata": { 450 | "tags": [] 451 | }, 452 | "outputs": [], 453 | "source": [ 454 | "llm.delete_model()\n", 455 | "llm.delete_endpoint()\n" 456 | ] 457 | } 458 | ], 459 | "metadata": { 460 | "availableInstances": [ 461 | { 462 | "_defaultOrder": 0, 463 | "_isFastLaunch": true, 464 | "category": "General purpose", 465 | "gpuNum": 0, 466 | "hideHardwareSpecs": false, 467 | "memoryGiB": 4, 468 | "name": "ml.t3.medium", 469 | "vcpuNum": 2 470 | }, 471 | { 472 | "_defaultOrder": 1, 473 | "_isFastLaunch": false, 474 | "category": "General purpose", 475 | "gpuNum": 0, 476 | "hideHardwareSpecs": false, 477 | "memoryGiB": 8, 478 | "name": "ml.t3.large", 479 | "vcpuNum": 2 480 | }, 481 | { 482 | "_defaultOrder": 2, 483 | "_isFastLaunch": false, 484 | "category": "General purpose", 485 | "gpuNum": 0, 486 | "hideHardwareSpecs": false, 487 | "memoryGiB": 16, 488 | "name": "ml.t3.xlarge", 489 | "vcpuNum": 4 490 | }, 491 | { 492 | "_defaultOrder": 3, 493 | "_isFastLaunch": false, 494 | "category": "General purpose", 495 | "gpuNum": 0, 496 | "hideHardwareSpecs": false, 497 | "memoryGiB": 32, 498 | "name": "ml.t3.2xlarge", 499 | "vcpuNum": 8 500 | }, 501 | { 502 | "_defaultOrder": 4, 503 | "_isFastLaunch": true, 504 | "category": "General purpose", 505 | "gpuNum": 0, 506 | "hideHardwareSpecs": false, 507 | "memoryGiB": 8, 508 | "name": "ml.m5.large", 509 | "vcpuNum": 2 510 | }, 511 | { 512 | "_defaultOrder": 5, 513 | "_isFastLaunch": false, 514 | "category": "General purpose", 515 | "gpuNum": 0, 516 | "hideHardwareSpecs": false, 517 | "memoryGiB": 16, 518 | "name": "ml.m5.xlarge", 519 | "vcpuNum": 4 520 | }, 521 | { 522 | "_defaultOrder": 6, 523 | "_isFastLaunch": false, 524 | "category": "General purpose", 525 | "gpuNum": 0, 526 | "hideHardwareSpecs": false, 527 | "memoryGiB": 32, 528 | "name": "ml.m5.2xlarge", 529 | "vcpuNum": 8 530 | }, 531 | { 532 | "_defaultOrder": 7, 533 | "_isFastLaunch": false, 534 | "category": "General purpose", 535 | "gpuNum": 0, 536 | "hideHardwareSpecs": false, 537 | "memoryGiB": 64, 538 | "name": "ml.m5.4xlarge", 539 | "vcpuNum": 16 540 | }, 541 | { 542 | "_defaultOrder": 8, 543 | "_isFastLaunch": false, 544 | "category": "General purpose", 545 | "gpuNum": 0, 546 | "hideHardwareSpecs": false, 547 | "memoryGiB": 128, 548 | "name": "ml.m5.8xlarge", 549 | "vcpuNum": 32 550 | }, 551 | { 552 | "_defaultOrder": 9, 553 | "_isFastLaunch": false, 554 | "category": "General purpose", 555 | "gpuNum": 0, 556 | "hideHardwareSpecs": false, 557 | "memoryGiB": 192, 558 | "name": "ml.m5.12xlarge", 559 | "vcpuNum": 48 560 | }, 561 | { 562 | "_defaultOrder": 10, 563 | "_isFastLaunch": false, 564 | "category": "General purpose", 565 | "gpuNum": 0, 566 | "hideHardwareSpecs": false, 567 | "memoryGiB": 256, 568 | "name": "ml.m5.16xlarge", 569 | "vcpuNum": 64 570 | }, 571 | { 572 | "_defaultOrder": 11, 573 | "_isFastLaunch": false, 574 | "category": "General purpose", 575 | "gpuNum": 0, 576 | "hideHardwareSpecs": false, 577 | "memoryGiB": 384, 578 | "name": "ml.m5.24xlarge", 579 | "vcpuNum": 96 580 | }, 581 | { 582 | "_defaultOrder": 12, 583 | "_isFastLaunch": false, 584 | "category": "General purpose", 585 | "gpuNum": 0, 586 | "hideHardwareSpecs": false, 587 | "memoryGiB": 8, 588 | "name": "ml.m5d.large", 589 | "vcpuNum": 2 590 | }, 591 | { 592 | 
"_defaultOrder": 13, 593 | "_isFastLaunch": false, 594 | "category": "General purpose", 595 | "gpuNum": 0, 596 | "hideHardwareSpecs": false, 597 | "memoryGiB": 16, 598 | "name": "ml.m5d.xlarge", 599 | "vcpuNum": 4 600 | }, 601 | { 602 | "_defaultOrder": 14, 603 | "_isFastLaunch": false, 604 | "category": "General purpose", 605 | "gpuNum": 0, 606 | "hideHardwareSpecs": false, 607 | "memoryGiB": 32, 608 | "name": "ml.m5d.2xlarge", 609 | "vcpuNum": 8 610 | }, 611 | { 612 | "_defaultOrder": 15, 613 | "_isFastLaunch": false, 614 | "category": "General purpose", 615 | "gpuNum": 0, 616 | "hideHardwareSpecs": false, 617 | "memoryGiB": 64, 618 | "name": "ml.m5d.4xlarge", 619 | "vcpuNum": 16 620 | }, 621 | { 622 | "_defaultOrder": 16, 623 | "_isFastLaunch": false, 624 | "category": "General purpose", 625 | "gpuNum": 0, 626 | "hideHardwareSpecs": false, 627 | "memoryGiB": 128, 628 | "name": "ml.m5d.8xlarge", 629 | "vcpuNum": 32 630 | }, 631 | { 632 | "_defaultOrder": 17, 633 | "_isFastLaunch": false, 634 | "category": "General purpose", 635 | "gpuNum": 0, 636 | "hideHardwareSpecs": false, 637 | "memoryGiB": 192, 638 | "name": "ml.m5d.12xlarge", 639 | "vcpuNum": 48 640 | }, 641 | { 642 | "_defaultOrder": 18, 643 | "_isFastLaunch": false, 644 | "category": "General purpose", 645 | "gpuNum": 0, 646 | "hideHardwareSpecs": false, 647 | "memoryGiB": 256, 648 | "name": "ml.m5d.16xlarge", 649 | "vcpuNum": 64 650 | }, 651 | { 652 | "_defaultOrder": 19, 653 | "_isFastLaunch": false, 654 | "category": "General purpose", 655 | "gpuNum": 0, 656 | "hideHardwareSpecs": false, 657 | "memoryGiB": 384, 658 | "name": "ml.m5d.24xlarge", 659 | "vcpuNum": 96 660 | }, 661 | { 662 | "_defaultOrder": 20, 663 | "_isFastLaunch": false, 664 | "category": "General purpose", 665 | "gpuNum": 0, 666 | "hideHardwareSpecs": true, 667 | "memoryGiB": 0, 668 | "name": "ml.geospatial.interactive", 669 | "supportedImageNames": [ 670 | "sagemaker-geospatial-v1-0" 671 | ], 672 | "vcpuNum": 0 673 | }, 674 | { 675 | "_defaultOrder": 21, 676 | "_isFastLaunch": true, 677 | "category": "Compute optimized", 678 | "gpuNum": 0, 679 | "hideHardwareSpecs": false, 680 | "memoryGiB": 4, 681 | "name": "ml.c5.large", 682 | "vcpuNum": 2 683 | }, 684 | { 685 | "_defaultOrder": 22, 686 | "_isFastLaunch": false, 687 | "category": "Compute optimized", 688 | "gpuNum": 0, 689 | "hideHardwareSpecs": false, 690 | "memoryGiB": 8, 691 | "name": "ml.c5.xlarge", 692 | "vcpuNum": 4 693 | }, 694 | { 695 | "_defaultOrder": 23, 696 | "_isFastLaunch": false, 697 | "category": "Compute optimized", 698 | "gpuNum": 0, 699 | "hideHardwareSpecs": false, 700 | "memoryGiB": 16, 701 | "name": "ml.c5.2xlarge", 702 | "vcpuNum": 8 703 | }, 704 | { 705 | "_defaultOrder": 24, 706 | "_isFastLaunch": false, 707 | "category": "Compute optimized", 708 | "gpuNum": 0, 709 | "hideHardwareSpecs": false, 710 | "memoryGiB": 32, 711 | "name": "ml.c5.4xlarge", 712 | "vcpuNum": 16 713 | }, 714 | { 715 | "_defaultOrder": 25, 716 | "_isFastLaunch": false, 717 | "category": "Compute optimized", 718 | "gpuNum": 0, 719 | "hideHardwareSpecs": false, 720 | "memoryGiB": 72, 721 | "name": "ml.c5.9xlarge", 722 | "vcpuNum": 36 723 | }, 724 | { 725 | "_defaultOrder": 26, 726 | "_isFastLaunch": false, 727 | "category": "Compute optimized", 728 | "gpuNum": 0, 729 | "hideHardwareSpecs": false, 730 | "memoryGiB": 96, 731 | "name": "ml.c5.12xlarge", 732 | "vcpuNum": 48 733 | }, 734 | { 735 | "_defaultOrder": 27, 736 | "_isFastLaunch": false, 737 | "category": "Compute optimized", 738 | "gpuNum": 0, 739 | 
"hideHardwareSpecs": false, 740 | "memoryGiB": 144, 741 | "name": "ml.c5.18xlarge", 742 | "vcpuNum": 72 743 | }, 744 | { 745 | "_defaultOrder": 28, 746 | "_isFastLaunch": false, 747 | "category": "Compute optimized", 748 | "gpuNum": 0, 749 | "hideHardwareSpecs": false, 750 | "memoryGiB": 192, 751 | "name": "ml.c5.24xlarge", 752 | "vcpuNum": 96 753 | }, 754 | { 755 | "_defaultOrder": 29, 756 | "_isFastLaunch": true, 757 | "category": "Accelerated computing", 758 | "gpuNum": 1, 759 | "hideHardwareSpecs": false, 760 | "memoryGiB": 16, 761 | "name": "ml.g4dn.xlarge", 762 | "vcpuNum": 4 763 | }, 764 | { 765 | "_defaultOrder": 30, 766 | "_isFastLaunch": false, 767 | "category": "Accelerated computing", 768 | "gpuNum": 1, 769 | "hideHardwareSpecs": false, 770 | "memoryGiB": 32, 771 | "name": "ml.g4dn.2xlarge", 772 | "vcpuNum": 8 773 | }, 774 | { 775 | "_defaultOrder": 31, 776 | "_isFastLaunch": false, 777 | "category": "Accelerated computing", 778 | "gpuNum": 1, 779 | "hideHardwareSpecs": false, 780 | "memoryGiB": 64, 781 | "name": "ml.g4dn.4xlarge", 782 | "vcpuNum": 16 783 | }, 784 | { 785 | "_defaultOrder": 32, 786 | "_isFastLaunch": false, 787 | "category": "Accelerated computing", 788 | "gpuNum": 1, 789 | "hideHardwareSpecs": false, 790 | "memoryGiB": 128, 791 | "name": "ml.g4dn.8xlarge", 792 | "vcpuNum": 32 793 | }, 794 | { 795 | "_defaultOrder": 33, 796 | "_isFastLaunch": false, 797 | "category": "Accelerated computing", 798 | "gpuNum": 4, 799 | "hideHardwareSpecs": false, 800 | "memoryGiB": 192, 801 | "name": "ml.g4dn.12xlarge", 802 | "vcpuNum": 48 803 | }, 804 | { 805 | "_defaultOrder": 34, 806 | "_isFastLaunch": false, 807 | "category": "Accelerated computing", 808 | "gpuNum": 1, 809 | "hideHardwareSpecs": false, 810 | "memoryGiB": 256, 811 | "name": "ml.g4dn.16xlarge", 812 | "vcpuNum": 64 813 | }, 814 | { 815 | "_defaultOrder": 35, 816 | "_isFastLaunch": false, 817 | "category": "Accelerated computing", 818 | "gpuNum": 1, 819 | "hideHardwareSpecs": false, 820 | "memoryGiB": 61, 821 | "name": "ml.p3.2xlarge", 822 | "vcpuNum": 8 823 | }, 824 | { 825 | "_defaultOrder": 36, 826 | "_isFastLaunch": false, 827 | "category": "Accelerated computing", 828 | "gpuNum": 4, 829 | "hideHardwareSpecs": false, 830 | "memoryGiB": 244, 831 | "name": "ml.p3.8xlarge", 832 | "vcpuNum": 32 833 | }, 834 | { 835 | "_defaultOrder": 37, 836 | "_isFastLaunch": false, 837 | "category": "Accelerated computing", 838 | "gpuNum": 8, 839 | "hideHardwareSpecs": false, 840 | "memoryGiB": 488, 841 | "name": "ml.p3.16xlarge", 842 | "vcpuNum": 64 843 | }, 844 | { 845 | "_defaultOrder": 38, 846 | "_isFastLaunch": false, 847 | "category": "Accelerated computing", 848 | "gpuNum": 8, 849 | "hideHardwareSpecs": false, 850 | "memoryGiB": 768, 851 | "name": "ml.p3dn.24xlarge", 852 | "vcpuNum": 96 853 | }, 854 | { 855 | "_defaultOrder": 39, 856 | "_isFastLaunch": false, 857 | "category": "Memory Optimized", 858 | "gpuNum": 0, 859 | "hideHardwareSpecs": false, 860 | "memoryGiB": 16, 861 | "name": "ml.r5.large", 862 | "vcpuNum": 2 863 | }, 864 | { 865 | "_defaultOrder": 40, 866 | "_isFastLaunch": false, 867 | "category": "Memory Optimized", 868 | "gpuNum": 0, 869 | "hideHardwareSpecs": false, 870 | "memoryGiB": 32, 871 | "name": "ml.r5.xlarge", 872 | "vcpuNum": 4 873 | }, 874 | { 875 | "_defaultOrder": 41, 876 | "_isFastLaunch": false, 877 | "category": "Memory Optimized", 878 | "gpuNum": 0, 879 | "hideHardwareSpecs": false, 880 | "memoryGiB": 64, 881 | "name": "ml.r5.2xlarge", 882 | "vcpuNum": 8 883 | }, 884 | { 885 | 
"_defaultOrder": 42, 886 | "_isFastLaunch": false, 887 | "category": "Memory Optimized", 888 | "gpuNum": 0, 889 | "hideHardwareSpecs": false, 890 | "memoryGiB": 128, 891 | "name": "ml.r5.4xlarge", 892 | "vcpuNum": 16 893 | }, 894 | { 895 | "_defaultOrder": 43, 896 | "_isFastLaunch": false, 897 | "category": "Memory Optimized", 898 | "gpuNum": 0, 899 | "hideHardwareSpecs": false, 900 | "memoryGiB": 256, 901 | "name": "ml.r5.8xlarge", 902 | "vcpuNum": 32 903 | }, 904 | { 905 | "_defaultOrder": 44, 906 | "_isFastLaunch": false, 907 | "category": "Memory Optimized", 908 | "gpuNum": 0, 909 | "hideHardwareSpecs": false, 910 | "memoryGiB": 384, 911 | "name": "ml.r5.12xlarge", 912 | "vcpuNum": 48 913 | }, 914 | { 915 | "_defaultOrder": 45, 916 | "_isFastLaunch": false, 917 | "category": "Memory Optimized", 918 | "gpuNum": 0, 919 | "hideHardwareSpecs": false, 920 | "memoryGiB": 512, 921 | "name": "ml.r5.16xlarge", 922 | "vcpuNum": 64 923 | }, 924 | { 925 | "_defaultOrder": 46, 926 | "_isFastLaunch": false, 927 | "category": "Memory Optimized", 928 | "gpuNum": 0, 929 | "hideHardwareSpecs": false, 930 | "memoryGiB": 768, 931 | "name": "ml.r5.24xlarge", 932 | "vcpuNum": 96 933 | }, 934 | { 935 | "_defaultOrder": 47, 936 | "_isFastLaunch": false, 937 | "category": "Accelerated computing", 938 | "gpuNum": 1, 939 | "hideHardwareSpecs": false, 940 | "memoryGiB": 16, 941 | "name": "ml.g5.xlarge", 942 | "vcpuNum": 4 943 | }, 944 | { 945 | "_defaultOrder": 48, 946 | "_isFastLaunch": false, 947 | "category": "Accelerated computing", 948 | "gpuNum": 1, 949 | "hideHardwareSpecs": false, 950 | "memoryGiB": 32, 951 | "name": "ml.g5.2xlarge", 952 | "vcpuNum": 8 953 | }, 954 | { 955 | "_defaultOrder": 49, 956 | "_isFastLaunch": false, 957 | "category": "Accelerated computing", 958 | "gpuNum": 1, 959 | "hideHardwareSpecs": false, 960 | "memoryGiB": 64, 961 | "name": "ml.g5.4xlarge", 962 | "vcpuNum": 16 963 | }, 964 | { 965 | "_defaultOrder": 50, 966 | "_isFastLaunch": false, 967 | "category": "Accelerated computing", 968 | "gpuNum": 1, 969 | "hideHardwareSpecs": false, 970 | "memoryGiB": 128, 971 | "name": "ml.g5.8xlarge", 972 | "vcpuNum": 32 973 | }, 974 | { 975 | "_defaultOrder": 51, 976 | "_isFastLaunch": false, 977 | "category": "Accelerated computing", 978 | "gpuNum": 1, 979 | "hideHardwareSpecs": false, 980 | "memoryGiB": 256, 981 | "name": "ml.g5.16xlarge", 982 | "vcpuNum": 64 983 | }, 984 | { 985 | "_defaultOrder": 52, 986 | "_isFastLaunch": false, 987 | "category": "Accelerated computing", 988 | "gpuNum": 4, 989 | "hideHardwareSpecs": false, 990 | "memoryGiB": 192, 991 | "name": "ml.g5.12xlarge", 992 | "vcpuNum": 48 993 | }, 994 | { 995 | "_defaultOrder": 53, 996 | "_isFastLaunch": false, 997 | "category": "Accelerated computing", 998 | "gpuNum": 4, 999 | "hideHardwareSpecs": false, 1000 | "memoryGiB": 384, 1001 | "name": "ml.g5.24xlarge", 1002 | "vcpuNum": 96 1003 | }, 1004 | { 1005 | "_defaultOrder": 54, 1006 | "_isFastLaunch": false, 1007 | "category": "Accelerated computing", 1008 | "gpuNum": 8, 1009 | "hideHardwareSpecs": false, 1010 | "memoryGiB": 768, 1011 | "name": "ml.g5.48xlarge", 1012 | "vcpuNum": 192 1013 | }, 1014 | { 1015 | "_defaultOrder": 55, 1016 | "_isFastLaunch": false, 1017 | "category": "Accelerated computing", 1018 | "gpuNum": 8, 1019 | "hideHardwareSpecs": false, 1020 | "memoryGiB": 1152, 1021 | "name": "ml.p4d.24xlarge", 1022 | "vcpuNum": 96 1023 | }, 1024 | { 1025 | "_defaultOrder": 56, 1026 | "_isFastLaunch": false, 1027 | "category": "Accelerated computing", 1028 | "gpuNum": 8, 
1029 | "hideHardwareSpecs": false, 1030 | "memoryGiB": 1152, 1031 | "name": "ml.p4de.24xlarge", 1032 | "vcpuNum": 96 1033 | } 1034 | ], 1035 | "instance_type": "ml.t3.medium", 1036 | "kernelspec": { 1037 | "display_name": "Python 3 (Data Science)", 1038 | "language": "python", 1039 | "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" 1040 | }, 1041 | "language_info": { 1042 | "codemirror_mode": { 1043 | "name": "ipython", 1044 | "version": 3 1045 | }, 1046 | "file_extension": ".py", 1047 | "mimetype": "text/x-python", 1048 | "name": "python", 1049 | "nbconvert_exporter": "python", 1050 | "pygments_lexer": "ipython3", 1051 | "version": "3.7.10" 1052 | } 1053 | }, 1054 | "nbformat": 4, 1055 | "nbformat_minor": 5 1056 | } 1057 | -------------------------------------------------------------------------------- /lab1/gpt-j-notebook-full.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "a7b6c10d-19ab-45ff-aa96-14676491aad4", 6 | "metadata": {}, 7 | "source": [ 8 | "# Lab 1\n", 9 | "\n", 10 | "## Introduction\n", 11 | "\n", 12 | "Language models have recently exploded in both size and popularity. In 2018, BERT-large entered the scene and, with its 340M parameters and novel transformer architecture, set the standard on NLP task accuracy. Within just a few years, state-of-the-art NLP model size has grown by more than 500x with models such as OpenAI’s 175 billion parameter GPT-3 and similarly sized open source Bloom 176B raising the bar on NLP accuracy. This increase in the number of parameters is driven by the simple and empirically-demonstrated positive relationship between model size and accuracy: more is better. With easy access from models zoos such as HuggingFace and improved accuracy in NLP tasks such as classification and text generation, practitioners are increasingly reaching for these large models. However, deploying them can be a challenge because of their size.\n", 13 | "\n", 14 | "In this Lab, we'll explore how to host a large language model on Amazon SageMaker using Sagemaker Inference, one of many ready-to-use AWS Deep Learning Containers (DLCs) and the built-in HuggingFace integration of the Sagemaker SDK. \n", 15 | "\n", 16 | "## Background and Details\n", 17 | "We'll be working with GPT-J , a large language model with over 6B parameters pre-trained on the Pile dataset. The pre-trained weights of the original model come in FP32 format (4 bytes per parameter) and combine to roughly 24Gb in total. Since most of today's state-of-the-art single-GPU-powered instances are only equipped with 16, max. 24 GPU GB memory, the size of the model weights is bringing up challenges for model inference. With quantization, we'll explore one of several optimization options available that allow us to host this model on single GPU instances with fewer than 16Gb of GPU memory. Finally, we take a closer look into the art of prompt engineering and discover how [few-shot](https://www.analyticsvidhya.com/blog/2021/05/an-introduction-to-few-shot-learning/) (as opposed to zero-shot) approaches can significantly improve model performance on a huge variety of NLP tasks. 
\n", 18 | "\n", 19 | "## Instructions\n", 20 | "\n", 21 | "### Prerequisites\n", 22 | "\n", 23 | "#### To run this workshop...\n", 24 | "You need a computer with a web browser, preferably with the latest version of Chrome / FireFox.\n", 25 | "Sequentially read and follow the instructions described in AWS Hosted Event and Work Environment Set Up\n", 26 | "\n", 27 | "#### Recommended background\n", 28 | "It will be easier for you to run this workshop if you have:\n", 29 | "\n", 30 | "- Experience with Deep learning models\n", 31 | "- Familiarity with Python or other similar programming languages\n", 32 | "- Experience with Jupyter notebooks\n", 33 | "- Begineers level knowledge and experience with SageMaker Hosting/Inference.\n", 34 | "\n", 35 | "#### Target audience\n", 36 | "Data Scientists, ML Engineering, ML Infrastructure, MLOps Engineers, Technical Leaders.\n", 37 | "Intended for customers working with large Generative AI models including Language, Computer vision and Multi-modal use-cases.\n", 38 | "Customers using EKS/EC2/ECS/On-prem for hosting or experience with SageMaker.\n", 39 | "\n", 40 | "Level of expertise - 400\n", 41 | "\n", 42 | "#### Time to complete\n", 43 | "Approximately 1 hour." 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "id": "ff43d981-d2dc-440c-a8ed-ed47ea8fabe3", 49 | "metadata": {}, 50 | "source": [ 51 | "# Import of required dependencies\n", 52 | "\n", 53 | "For this lab, we will use the following libraries:\n", 54 | "\n", 55 | " - SageMaker SDK for interacting with Amazon SageMaker. We especially want to highlight the classes 'HuggingFaceModel' and 'HuggingFacePredictor', utilizing the built-in HuggingFace integration into SageMaker SDK. These classes are used to encapsulate functionality around the model and the deployed endpoint we will use. They inherit from the generic 'Model' and 'Predictor' classes of the native SageMaker SDK, however implementing some additional functionality specific to HuggingFace and the HuggingFace model hub.\n", 56 | " - boto3, the AWS SDK for python\n", 57 | " - os, a python library implementing miscellaneous operating system interfaces \n", 58 | " - tarfile, a python library to read and write tar archive files" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "id": "0437cde3-ed17-4cec-b06c-0d0441436644", 65 | "metadata": { 66 | "tags": [] 67 | }, 68 | "outputs": [], 69 | "source": [ 70 | "from sagemaker.huggingface import HuggingFaceModel, HuggingFacePredictor\n", 71 | "import sagemaker\n", 72 | "import boto3\n", 73 | "import os\n", 74 | "import tarfile" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "id": "7e3938a6-b8e1-4d4c-abec-48a4c25a7862", 80 | "metadata": {}, 81 | "source": [ 82 | "# Setup of notebook environment\n", 83 | "\n", 84 | "Before we begin with the actual work for packaging and deploying the model to Amazon SageMaker, we need to setup the notebook environment respectively. 
This includes:\n", 85 | "- retrieval of the execution role our SageMaker Studio domain is associated with, for later usage\n", 86 | "- retrieval of our account_id for later usage\n", 87 | "- retrieval of the chosen region for later usage" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "id": "9a1c9354-e9e2-4e8d-a604-a051f4d04b1b", 94 | "metadata": { 95 | "tags": [] 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "# IAM role with permissions to create endpoint\n", 100 | "role = sagemaker.get_execution_role()" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": null, 106 | "id": "186e7070-5c55-4188-8fda-d7e71deca18d", 107 | "metadata": { 108 | "tags": [] 109 | }, 110 | "outputs": [], 111 | "source": [ 112 | "# Create a new STS client\n", 113 | "sts_client = boto3.client('sts')\n", 114 | "\n", 115 | "# Call the GetCallerIdentity operation to retrieve the account ID\n", 116 | "response = sts_client.get_caller_identity()\n", 117 | "account_id = response['Account']\n", 118 | "account_id" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "id": "835c385f", 125 | "metadata": { 126 | "tags": [] 127 | }, 128 | "outputs": [], 129 | "source": [ 130 | "# Retrieve region\n", 131 | "region = boto3.Session().region_name\n", 132 | "region" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "id": "f654d9ba-1a5b-4ef1-8bda-e51873274e7a", 138 | "metadata": {}, 139 | "source": [ 140 | "# Create Model Artifact Archive\n", 141 | "\n", 142 | "For hosting a model with AWS SageMaker Inference, we need to package our model artifacts into an archive called ‘model.tar.gz’ and upload it to S3. Within this archive, your model artifacts should be stored in the following directory structure:\n", 143 | "\n", 144 | "`model.tar.gz`\n", 145 | "- `model.bin`\n", 146 | "- `code/`\n", 147 | "    - `inference.py`\n", 148 | "    - `requirements.txt`\n", 149 | "\n", 150 | "The \"code\" directory contains your inference script (inference.py) and your requirements.txt file (if you have additional dependencies; see the detailed description below). The “model.bin” file is a file in one of various binary formats containing the model weights as well as some configuration. In our case this file will not exist at archive creation time, since we will load our model files from the HuggingFace model hub when the endpoint starts. Before you continue, open the code directory to get familiar with the structure, so you can follow along with the subsequent steps below.\n", 151 | "\n", 152 | "If you have additional dependencies for your model, you can include them in a requirements.txt file in the \"code\" directory. SageMaker will install these dependencies during the deployment of your model.\n", 153 | "\n", 154 | "Sometimes you may want to override one of the [five functions](https://huggingface.co/docs/sagemaker/inference#user-defined-code-and-modules) within the hosting cycle, such as the model_fn function. To do this, you can create a new function in your inference.py file with the same name as the function you want to override. SageMaker will automatically use your new function instead of the default function. Since we want to dynamically load the model binaries at endpoint start time, we will override model_fn, the default method for loading a model. \n", 155 | "\n", 156 | "Within the model_fn() function of inference.py, this can be done by leveraging the capabilities of HuggingFace. 
HuggingFace is a company focused on the democratization of open-source AI and closely partnering with AWS. With the ‘transformers’ library, they have created a popular open-source API/framework for natural language processing on top of common frameworks like PyTorch or TensorFlow. They have also built the HuggingFace model hub, a model repository providing thousands of open-source models across different ML tasks. \n", 157 | "\n", 158 | "SageMaker provides built-in support for HuggingFace models through the SageMaker HuggingFace SDK. Here are the steps we will take to dynamically load our model from the HuggingFace model hub into SageMaker:\n", 159 | "1. Import the transformers library\n", 160 | "2. Use the from_pretrained method to download a pre-built tokenizer together with a pre-trained model from the HuggingFace Model Hub\n", 161 | "```python\n", 162 | "tokenizer = AutoTokenizer.from_pretrained(\"EleutherAI/gpt-j-6B\")\n", 163 | "model = GPTJForCausalLM.from_pretrained(\"EleutherAI/gpt-j-6B\", revision=\"float16\", torch_dtype=torch.float16)\n", 164 | "\n", 165 | "```\n", 166 | "3. Use the pipeline method to perform inference on your text data. These pipelines can be configured to be task-specific. For our use case we will use the [‘text-generation’ task](link***)\n", 167 | "```python\n", 168 | "generation = pipeline(\"text-generation\", model=model, tokenizer=tokenizer, device=device)\n", 169 | "\n", 170 | "```\n", 171 | "4. Pass the generated pipeline object as the return value of the model_fn() function to integrate with the rest of the inference lifecycle\n", 172 | "\n", 173 | "The pre-trained weights of the original model come in FP32 format (4 bytes per parameter) and add up to roughly 24 GB in total. Since most of today's state-of-the-art single-GPU-powered instances are equipped with only 16 GB, at most 24 GB, of GPU memory, the size of the model weights poses challenges for model inference. Quantization is a technique to reduce the memory footprint when hosting large models: the model weights are converted into FP16 or int8 format, resulting in a reduction of the hosting footprint by a factor of 2-4. By applying quantization, the GPT-J model can be hosted on a single-GPU instance such as those of the ml.g4dn series.\n", 174 | "The from_pretrained method in the HuggingFace SDK allows you to download different revisions of a pre-trained model from the HuggingFace Model Hub. You can specify the revision you want to download using the revision parameter. For our use case, we will be using the FP16 revision of the model weights. A sketch of how these steps fit together is shown below.
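To make the four steps concrete, here is a minimal sketch of what the model_fn override in code/inference.py could look like, assembled from the snippets above (an illustration only; the actual inference.py in the code directory may differ in details such as device selection):

```python
# Hypothetical sketch of code/inference.py (assumption: the shipped file may differ).
import torch
from transformers import AutoTokenizer, GPTJForCausalLM, pipeline


def model_fn(model_dir):
    # Use the first GPU if available, otherwise fall back to CPU.
    device = 0 if torch.cuda.is_available() else -1

    # Step 2: download the tokenizer and the quantized FP16 revision of the weights.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = GPTJForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16
    )

    # Step 3: wrap both in a task-specific pipeline.
    generation = pipeline("text-generation", model=model, tokenizer=tokenizer, device=device)

    # Step 4: return the pipeline so SageMaker hands it to the prediction functions.
    return generation
```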
\n" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": null, 180 | "id": "7f0be137-65fc-4e93-b031-a10a033d321d", 181 | "metadata": { 182 | "tags": [] 183 | }, 184 | "outputs": [], 185 | "source": [ 186 | "# function to compress the code directory into a model.tar.gz archive as outlined above\n", 187 | "def compress(tar_dir=None, output_file=\"./model.tar.gz\"):\n", 188 | "    with tarfile.open(output_file, \"w:gz\") as tar:\n", 189 | "        tar.add(tar_dir, arcname=\"code\")" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "id": "95614fe2-3a5a-4858-adec-26ee5b625842", 196 | "metadata": { 197 | "tags": [] 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "# specifying source code directory path\n", 202 | "model_code_dir = './code'\n", 203 | "# create tar.gz archive\n", 204 | "print(\"creating `model.tar.gz` archive\")\n", 205 | "compress(model_code_dir)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "id": "923347b6", 211 | "metadata": {}, 212 | "source": [ 213 | "# Uploading to S3\n", 214 | "We have now successfully created the model.tar.gz archive. However, it still resides within the EBS volume of our SageMaker Studio domain. In the next step we will upload the archive file into an S3 bucket to make it available for SageMaker Inference. To do so, we will perform the following steps:\n", 215 | "- Creation of a new S3 bucket for model artifact storage\n", 216 | "- Upload of the model artifact to S3 using the Python AWS SDK 'boto3'" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "id": "bb42c6a3", 223 | "metadata": { 224 | "tags": [] 225 | }, 226 | "outputs": [], 227 | "source": [ 228 | "# function to upload the model artifact model.tar.gz into an S3 bucket \n", 229 | "def upload_file_to_s3(bucket_name=None, file_name=\"model.tar.gz\", key_prefix=\"\"):\n", 230 | "    s3 = boto3.resource(\"s3\")\n", 231 | "    key_prefix_with_file_name = os.path.join(key_prefix, file_name)\n", 232 | "    s3.Bucket(bucket_name).upload_file(file_name, key_prefix_with_file_name)\n", 233 | "    return f's3://{bucket_name}/{key_prefix_with_file_name}'" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "id": "cc859d76", 240 | "metadata": { 241 | "tags": [] 242 | }, 243 | "outputs": [], 244 | "source": [ 245 | "# specifying bucket name for model artifact storage\n", 246 | "model_bucket_name = f'immersion-day-bucket-{account_id}-{region}'\n", 247 | "# specifying key prefix for model artifact storage\n", 248 | "model_s3_key_prefix = 'huggingface/gpt-j/'" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "id": "469cfe60-bb73-417d-bbdc-f1c4f1dc125f", 255 | "metadata": { 256 | "tags": [] 257 | }, 258 | "outputs": [], 259 | "source": [ 260 | "# Create S3 bucket\n", 261 | "s3_client = boto3.client('s3', region_name=region)\n", 262 | "location = {'LocationConstraint': region}\n", 263 | "\n", 264 | "bucket_name = model_bucket_name\n", 265 | "\n", 266 | "# Check if bucket already exists\n", 267 | "bucket_exists = True\n", 268 | "try:\n", 269 | "    s3_client.head_bucket(Bucket=bucket_name)\n", 270 | "except s3_client.exceptions.ClientError:\n", 271 | "    bucket_exists = False\n", 272 | "\n", 273 | "# Create bucket if it does not exist\n", 274 | "if not bucket_exists:\n", 275 | "    if region == 'us-east-1':\n", 276 | "        s3_client.create_bucket(Bucket=bucket_name)\n", 277 | "    else: \n", 278 | "        
s3_client.create_bucket(Bucket=bucket_name,\n", 279 | "                                CreateBucketConfiguration=location)\n", 280 | "    print(f\"Bucket '{bucket_name}' created successfully\")" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": null, 286 | "id": "b820737e-8424-426f-b966-c5696eb2ed1a", 287 | "metadata": { 288 | "tags": [] 289 | }, 290 | "outputs": [], 291 | "source": [ 292 | "# upload to s3\n", 293 | "print(\n", 294 | "    f\"uploading `model.tar.gz` archive to s3://{bucket_name}/{model_s3_key_prefix}model.tar.gz\"\n", 295 | ")\n", 296 | "model_uri = upload_file_to_s3(bucket_name=bucket_name, key_prefix=model_s3_key_prefix)\n", 297 | "print(f\"Successfully uploaded to {model_uri}\")" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "id": "6693abd6", 303 | "metadata": {}, 304 | "source": [ 305 | "Amazon SageMaker Inference is a managed service that allows you to deploy machine learning models to make predictions or perform inference on new data. It enables you to create an endpoint that can be accessed using HTTP requests to make predictions in real time. This service is designed to make it easy to deploy and manage machine learning models in production. SageMaker Inference provides a scalable, reliable, and cost-effective way to deploy machine learning models. For deploying a model with SageMaker Inference, we will use the SageMaker SDK and leverage the built-in HuggingFace integration.\n", 306 | "\n", 307 | "## Model packaging\n", 308 | "First, we package the model into the 'HuggingFaceModel' class by specifying the following parameters:\n", 309 | "- image_uri: The URI of a Docker image used for hosting the model. We will be using one of the many ready-to-use Deep Learning Containers AWS provides [here](https://aws.amazon.com/machine-learning/containers/). Deep Learning Containers are Docker images that are preinstalled and tested with the latest versions of popular deep learning frameworks. Deep Learning Containers let you deploy custom ML environments quickly without building and optimizing your environments from scratch. Since we are deploying a model from the HuggingFace model hub, we will use one of the HuggingFace DLCs, which comes with Python 3.8, PyTorch 1.10.2, and transformers 4.17.0 preinstalled and is optimized for inference in GPU-accelerated environments. \n", 310 | "- model_data: S3 path to the model artifact we just created and uploaded to S3\n", 311 | "- role: IAM role holding the required permissions to perform the operations needed to deploy a SageMaker Inference endpoint\n", 312 | "\n" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": null, 318 | "id": "076078f9-fbf5-44f8-80da-ef1aaaf9eec0", 319 | "metadata": { 320 | "tags": [] 321 | }, 322 | "outputs": [], 323 | "source": [ 324 | "# create Hugging Face Model Class\n", 325 | "huggingface_model = HuggingFaceModel(\n", 326 | "    image_uri=f'763104351884.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04',\n", 327 | "    model_data=model_uri,\n", 328 | "    role=role\n", 329 | "   )" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "id": "0318fcb3", 335 | "metadata": {}, 336 | "source": [ 337 | "## Model deployment\n", 338 | "The created model package can now be used to deploy the actual model by calling its .deploy() function. 
The following parameters have to be specified:\n", 339 | "- initial_instance_count: number of endpoint instances to be deployed \n", 340 | "- instance_type: EC2 instance type used for endpoint hosting\n", 341 | "- endpoint_name: name of the endpoint\n", 342 | "\n", 343 | "Note that the SageMaker SDK creates the following two resources for you in the background:\n", 344 | "- EndpointConfiguration\n", 345 | "- Endpoint\n", 346 | "You can check these in the Inference section of the SageMaker console once the model has been successfully deployed.\n", 347 | "\n", 348 | "![Endpoints EndpointConfigurations](../img/endpointEndpointConfiguration.png)" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": null, 354 | "id": "3dea56b6-e602-4e5f-afd8-031b580c0494", 355 | "metadata": { 356 | "tags": [] 357 | }, 358 | "outputs": [], 359 | "source": [ 360 | "# deploy model to SageMaker Inference\n", 361 | "predictor = huggingface_model.deploy(\n", 362 | "    initial_instance_count=1, # number of instances\n", 363 | "    instance_type='ml.g4dn.4xlarge', \n", 364 | "    endpoint_name='sm-endpoint-gpt-j-6b-immersion-day',\n", 365 | ")" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "id": "2555072b", 371 | "metadata": {}, 372 | "source": [ 373 | "# Inference \n", 374 | "## First try\n", 375 | "The .deploy() function returns an object of the HuggingFacePredictor class. This class implements functionality around the interfaces for the actual inference against deployed endpoints. Amongst others, it implements a .predict() function that can be used to conveniently call the endpoint for inference. When calling it, we can pass an object to the function that consists of an 'inputs' parameter holding the prompt to be passed to the model. \n", 376 | "\n", 377 | "Hint: in case an error occurs, go and check the CloudWatch logs. Try to figure out what happened!" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "id": "3e5ab2f5", 384 | "metadata": { 385 | "tags": [] 386 | }, 387 | "outputs": [], 388 | "source": [ 389 | "# Calling the predict() function for inference \n", 390 | "predictor.predict({\"inputs\": \"What is the capital of Germany?\"})" 391 | ] 392 | }, 393 | { 394 | "cell_type": "markdown", 395 | "id": "c5839cf6", 396 | "metadata": {}, 397 | "source": [ 398 | "For more advanced use cases we can also specify a second parameter: 'parameters' is a Python dictionary of model-specific parameters passed to the model for customization of the generated output. For the GPT-J model, amongst the [many options given](https://huggingface.co/docs/api-inference/detailed_parameters#text-generation-task), we specify the following parameters:\n", 399 | "- max_new_tokens: maximum number of tokens to be generated by the model \n", 400 | "- temperature: creativity of the generated text. In our experience, values between 0.2 (newspaper article, code) and 1.2 (poem) lead to high-quality results. \n", 401 | "- repetition_penalty: penalty for repeated occurrences of tokens\n", 402 | "- top_k: breadth of vocabulary used, i.e. 
the number of top token candidates taken into account when sampling the next token\n", 403 | "- return_full_text: boolean indicating whether the input prompt should be returned with the result" 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": null, 409 | "id": "75ff1cc4-022c-4989-ba24-14d6c6c4f45b", 410 | "metadata": { 411 | "tags": [] 412 | }, 413 | "outputs": [], 414 | "source": [ 415 | "predictor.predict({\"inputs\": \"What is the capital of Germany?\",\n", 416 | "\"parameters\": {\n", 417 | "  \"max_new_tokens\": 30,\n", 418 | "  \"temperature\": 0.5,\n", 419 | "  \"repetition_penalty\": 1.1,\n", 420 | "  \"top_k\": 20,\n", 421 | "  \"return_full_text\": False\n", 422 | "}\n", 423 | "})" 424 | ] 425 | }, 426 | { 427 | "cell_type": "markdown", 428 | "id": "56ae7538-6b09-4db2-8494-37b49a30933c", 429 | "metadata": {}, 430 | "source": [ 431 | "## Prompt Engineering\n", 432 | "Prompt engineering is a technique used to design effective prompts for LLMs, with the goal of achieving: \n", 433 | "\n", 434 | "- Control over the output: With prompt engineering, developers can control the output generated by LLMs. By designing prompts that specify the desired topic, style, tone, and level of formality, they can guide the LLM to produce text that meets the desired criteria.\n", 435 | "- Mitigating bias: LLMs have been shown to produce biased outputs when prompted with certain topics or language patterns. By engineering prompts that avoid biased language and encourage fairness, developers can help mitigate these issues.\n", 436 | "- Improving efficiency: Prompt engineering can help LLMs work more efficiently by guiding them to generate the desired output with fewer iterations. By providing clear, concise, and specific prompts, developers can help LLMs achieve the desired outcome faster and with fewer errors.\n", 437 | "\n", 438 | "In general, a prompt can contain any of the following components:\n", 439 | "\n", 440 | "- Instruction - a specific task or instruction you want the model to perform\n", 441 | "- Context - external information or additional context that can steer the model to better responses\n", 442 | "- Input Data - the input or question that we want a response for\n", 443 | "- Output Indicator - the type or format of the output\n", 444 | "\n", 445 | "In general, the more information we provide with the prompt, the better the above-mentioned goals will be achieved.\n", 446 | "\n", 447 | "Let's try it out!" 448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": null, 453 | "id": "60c30344-99d6-4670-a850-a41f9895428f", 454 | "metadata": { 455 | "tags": [] 456 | }, 457 | "outputs": [], 458 | "source": [ 459 | "# Simple unstructured prompt\n", 460 | "prompt = \"\"\"\n", 461 | "Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. 
In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.\n", 462 | "\n", 463 | "What was OKT3 originally sourced from?\"\"\"\n", 464 | "\n", 465 | "predictor.predict({\"inputs\": prompt,\n", 466 | "\"parameters\": {\n", 467 | "  \"max_new_tokens\": 10,\n", 468 | "  \"temperature\": 0.7,\n", 469 | "  \"repetition_penalty\": 1.1,\n", 470 | "  \"top_k\": 20,\n", 471 | "  \"return_full_text\": False\n", 472 | "}\n", 473 | "})" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": null, 479 | "id": "c149ed9b-fd75-4225-af16-1f0b111ec2aa", 480 | "metadata": { 481 | "tags": [] 482 | }, 483 | "outputs": [], 484 | "source": [ 485 | "# We now stick to the scheme proposed above\n", 486 | "prompt = \"\"\"\n", 487 | "Answer the question based on the context below. Keep the answer short and concise. Respond \"Unsure about answer\" if not sure about the answer.\n", 488 | "\n", 489 | "Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.\n", 490 | "\n", 491 | "Question: What was OKT3 originally sourced from?\n", 492 | "\n", 493 | "Answer:\"\"\"\n", 494 | "\n", 495 | "predictor.predict({\"inputs\": prompt,\n", 496 | "\"parameters\": {\n", 497 | "  \"max_new_tokens\": 10,\n", 498 | "  \"temperature\": 0.7,\n", 499 | "  \"repetition_penalty\": 1.1,\n", 500 | "  \"top_k\": 20,\n", 501 | "  \"return_full_text\": False\n", 502 | "}\n", 503 | "})" 504 | ] 505 | }, 506 | { 507 | "cell_type": "markdown", 508 | "id": "4c922d5c-0725-4472-a4da-102729ab62ef", 509 | "metadata": {}, 510 | "source": [ 511 | "In addition, [few-shot learning](https://www.analyticsvidhya.com/blog/2021/05/an-introduction-to-few-shot-learning/) is an interesting approach for the context element of a prompt. Few-shot learning is a prompt engineering technique that enables models to learn new tasks or concepts from only a few examples or samples (usually a single-digit number is enough). Even though the model has never seen this task during training, we experience a significant boost in performance. 
" 512 | ] 513 | }, 514 | { 515 | "cell_type": "code", 516 | "execution_count": null, 517 | "id": "16512452-c095-4783-8e73-82429b478e29", 518 | "metadata": { 519 | "tags": [] 520 | }, 521 | "outputs": [], 522 | "source": [ 523 | "# One-shot\n", 524 | "prompt = \"\"\"\n", 525 | "Tweet: \"This new music video was incredibile\"\n", 526 | "Sentiment:\"\"\"\n", 527 | "predictor.predict({\"inputs\": prompt,\n", 528 | "\"parameters\": {\n", 529 | " \"max_new_tokens\": 20,\n", 530 | " \"temperature\": 0.5,\n", 531 | " \"repetition_penalty\": 1.1,\n", 532 | " \"top_k\": 20,\n", 533 | " \"return_full_text\": False\n", 534 | "}\n", 535 | "})" 536 | ] 537 | }, 538 | { 539 | "cell_type": "code", 540 | "execution_count": null, 541 | "id": "46858a2d", 542 | "metadata": { 543 | "tags": [] 544 | }, 545 | "outputs": [], 546 | "source": [ 547 | "# Few-shot\n", 548 | "prompt = \"\"\"\n", 549 | "Tweet: \"I hate it when my phone battery dies.\"\n", 550 | "Sentiment: Negative\n", 551 | "###\n", 552 | "Tweet: \"My day has been 👍\"\n", 553 | "Sentiment: Positive\n", 554 | "###\n", 555 | "Tweet: \"This is the link to the article\"\n", 556 | "Sentiment: Neutral\n", 557 | "###\n", 558 | "Tweet: \"This new music video was incredibile\"\n", 559 | "Sentiment:\"\"\"\n", 560 | "predictor.predict({\"inputs\": prompt,\n", 561 | "\"parameters\": {\n", 562 | " \"max_new_tokens\": 20,\n", 563 | " \"temperature\": 0.5,\n", 564 | " \"repetition_penalty\": 1.1,\n", 565 | " \"top_k\": 20,\n", 566 | " \"return_full_text\": False\n", 567 | "}\n", 568 | "})" 569 | ] 570 | }, 571 | { 572 | "cell_type": "markdown", 573 | "id": "20e4a455", 574 | "metadata": {}, 575 | "source": [ 576 | "# Cleanup\n", 577 | "Finally, we clean up all resources not needed anymore since we pledge for the responsible use of compute resources. In this case this is the created endpoint together with the respective endpoint configuration. 
" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": null, 583 | "id": "e30cb399-97f8-43cd-b446-2d7ac5081a42", 584 | "metadata": { 585 | "tags": [] 586 | }, 587 | "outputs": [], 588 | "source": [ 589 | "predictor.delete_endpoint(delete_endpoint_config=True)" 590 | ] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "execution_count": null, 595 | "id": "c3a19208-f871-4098-80a5-931a9e8c276c", 596 | "metadata": {}, 597 | "outputs": [], 598 | "source": [] 599 | } 600 | ], 601 | "metadata": { 602 | "availableInstances": [ 603 | { 604 | "_defaultOrder": 0, 605 | "_isFastLaunch": true, 606 | "category": "General purpose", 607 | "gpuNum": 0, 608 | "hideHardwareSpecs": false, 609 | "memoryGiB": 4, 610 | "name": "ml.t3.medium", 611 | "vcpuNum": 2 612 | }, 613 | { 614 | "_defaultOrder": 1, 615 | "_isFastLaunch": false, 616 | "category": "General purpose", 617 | "gpuNum": 0, 618 | "hideHardwareSpecs": false, 619 | "memoryGiB": 8, 620 | "name": "ml.t3.large", 621 | "vcpuNum": 2 622 | }, 623 | { 624 | "_defaultOrder": 2, 625 | "_isFastLaunch": false, 626 | "category": "General purpose", 627 | "gpuNum": 0, 628 | "hideHardwareSpecs": false, 629 | "memoryGiB": 16, 630 | "name": "ml.t3.xlarge", 631 | "vcpuNum": 4 632 | }, 633 | { 634 | "_defaultOrder": 3, 635 | "_isFastLaunch": false, 636 | "category": "General purpose", 637 | "gpuNum": 0, 638 | "hideHardwareSpecs": false, 639 | "memoryGiB": 32, 640 | "name": "ml.t3.2xlarge", 641 | "vcpuNum": 8 642 | }, 643 | { 644 | "_defaultOrder": 4, 645 | "_isFastLaunch": true, 646 | "category": "General purpose", 647 | "gpuNum": 0, 648 | "hideHardwareSpecs": false, 649 | "memoryGiB": 8, 650 | "name": "ml.m5.large", 651 | "vcpuNum": 2 652 | }, 653 | { 654 | "_defaultOrder": 5, 655 | "_isFastLaunch": false, 656 | "category": "General purpose", 657 | "gpuNum": 0, 658 | "hideHardwareSpecs": false, 659 | "memoryGiB": 16, 660 | "name": "ml.m5.xlarge", 661 | "vcpuNum": 4 662 | }, 663 | { 664 | "_defaultOrder": 6, 665 | "_isFastLaunch": false, 666 | "category": "General purpose", 667 | "gpuNum": 0, 668 | "hideHardwareSpecs": false, 669 | "memoryGiB": 32, 670 | "name": "ml.m5.2xlarge", 671 | "vcpuNum": 8 672 | }, 673 | { 674 | "_defaultOrder": 7, 675 | "_isFastLaunch": false, 676 | "category": "General purpose", 677 | "gpuNum": 0, 678 | "hideHardwareSpecs": false, 679 | "memoryGiB": 64, 680 | "name": "ml.m5.4xlarge", 681 | "vcpuNum": 16 682 | }, 683 | { 684 | "_defaultOrder": 8, 685 | "_isFastLaunch": false, 686 | "category": "General purpose", 687 | "gpuNum": 0, 688 | "hideHardwareSpecs": false, 689 | "memoryGiB": 128, 690 | "name": "ml.m5.8xlarge", 691 | "vcpuNum": 32 692 | }, 693 | { 694 | "_defaultOrder": 9, 695 | "_isFastLaunch": false, 696 | "category": "General purpose", 697 | "gpuNum": 0, 698 | "hideHardwareSpecs": false, 699 | "memoryGiB": 192, 700 | "name": "ml.m5.12xlarge", 701 | "vcpuNum": 48 702 | }, 703 | { 704 | "_defaultOrder": 10, 705 | "_isFastLaunch": false, 706 | "category": "General purpose", 707 | "gpuNum": 0, 708 | "hideHardwareSpecs": false, 709 | "memoryGiB": 256, 710 | "name": "ml.m5.16xlarge", 711 | "vcpuNum": 64 712 | }, 713 | { 714 | "_defaultOrder": 11, 715 | "_isFastLaunch": false, 716 | "category": "General purpose", 717 | "gpuNum": 0, 718 | "hideHardwareSpecs": false, 719 | "memoryGiB": 384, 720 | "name": "ml.m5.24xlarge", 721 | "vcpuNum": 96 722 | }, 723 | { 724 | "_defaultOrder": 12, 725 | "_isFastLaunch": false, 726 | "category": "General purpose", 727 | "gpuNum": 0, 728 | "hideHardwareSpecs": false, 729 | 
"memoryGiB": 8, 730 | "name": "ml.m5d.large", 731 | "vcpuNum": 2 732 | }, 733 | { 734 | "_defaultOrder": 13, 735 | "_isFastLaunch": false, 736 | "category": "General purpose", 737 | "gpuNum": 0, 738 | "hideHardwareSpecs": false, 739 | "memoryGiB": 16, 740 | "name": "ml.m5d.xlarge", 741 | "vcpuNum": 4 742 | }, 743 | { 744 | "_defaultOrder": 14, 745 | "_isFastLaunch": false, 746 | "category": "General purpose", 747 | "gpuNum": 0, 748 | "hideHardwareSpecs": false, 749 | "memoryGiB": 32, 750 | "name": "ml.m5d.2xlarge", 751 | "vcpuNum": 8 752 | }, 753 | { 754 | "_defaultOrder": 15, 755 | "_isFastLaunch": false, 756 | "category": "General purpose", 757 | "gpuNum": 0, 758 | "hideHardwareSpecs": false, 759 | "memoryGiB": 64, 760 | "name": "ml.m5d.4xlarge", 761 | "vcpuNum": 16 762 | }, 763 | { 764 | "_defaultOrder": 16, 765 | "_isFastLaunch": false, 766 | "category": "General purpose", 767 | "gpuNum": 0, 768 | "hideHardwareSpecs": false, 769 | "memoryGiB": 128, 770 | "name": "ml.m5d.8xlarge", 771 | "vcpuNum": 32 772 | }, 773 | { 774 | "_defaultOrder": 17, 775 | "_isFastLaunch": false, 776 | "category": "General purpose", 777 | "gpuNum": 0, 778 | "hideHardwareSpecs": false, 779 | "memoryGiB": 192, 780 | "name": "ml.m5d.12xlarge", 781 | "vcpuNum": 48 782 | }, 783 | { 784 | "_defaultOrder": 18, 785 | "_isFastLaunch": false, 786 | "category": "General purpose", 787 | "gpuNum": 0, 788 | "hideHardwareSpecs": false, 789 | "memoryGiB": 256, 790 | "name": "ml.m5d.16xlarge", 791 | "vcpuNum": 64 792 | }, 793 | { 794 | "_defaultOrder": 19, 795 | "_isFastLaunch": false, 796 | "category": "General purpose", 797 | "gpuNum": 0, 798 | "hideHardwareSpecs": false, 799 | "memoryGiB": 384, 800 | "name": "ml.m5d.24xlarge", 801 | "vcpuNum": 96 802 | }, 803 | { 804 | "_defaultOrder": 20, 805 | "_isFastLaunch": false, 806 | "category": "General purpose", 807 | "gpuNum": 0, 808 | "hideHardwareSpecs": true, 809 | "memoryGiB": 0, 810 | "name": "ml.geospatial.interactive", 811 | "supportedImageNames": [ 812 | "sagemaker-geospatial-v1-0" 813 | ], 814 | "vcpuNum": 0 815 | }, 816 | { 817 | "_defaultOrder": 21, 818 | "_isFastLaunch": true, 819 | "category": "Compute optimized", 820 | "gpuNum": 0, 821 | "hideHardwareSpecs": false, 822 | "memoryGiB": 4, 823 | "name": "ml.c5.large", 824 | "vcpuNum": 2 825 | }, 826 | { 827 | "_defaultOrder": 22, 828 | "_isFastLaunch": false, 829 | "category": "Compute optimized", 830 | "gpuNum": 0, 831 | "hideHardwareSpecs": false, 832 | "memoryGiB": 8, 833 | "name": "ml.c5.xlarge", 834 | "vcpuNum": 4 835 | }, 836 | { 837 | "_defaultOrder": 23, 838 | "_isFastLaunch": false, 839 | "category": "Compute optimized", 840 | "gpuNum": 0, 841 | "hideHardwareSpecs": false, 842 | "memoryGiB": 16, 843 | "name": "ml.c5.2xlarge", 844 | "vcpuNum": 8 845 | }, 846 | { 847 | "_defaultOrder": 24, 848 | "_isFastLaunch": false, 849 | "category": "Compute optimized", 850 | "gpuNum": 0, 851 | "hideHardwareSpecs": false, 852 | "memoryGiB": 32, 853 | "name": "ml.c5.4xlarge", 854 | "vcpuNum": 16 855 | }, 856 | { 857 | "_defaultOrder": 25, 858 | "_isFastLaunch": false, 859 | "category": "Compute optimized", 860 | "gpuNum": 0, 861 | "hideHardwareSpecs": false, 862 | "memoryGiB": 72, 863 | "name": "ml.c5.9xlarge", 864 | "vcpuNum": 36 865 | }, 866 | { 867 | "_defaultOrder": 26, 868 | "_isFastLaunch": false, 869 | "category": "Compute optimized", 870 | "gpuNum": 0, 871 | "hideHardwareSpecs": false, 872 | "memoryGiB": 96, 873 | "name": "ml.c5.12xlarge", 874 | "vcpuNum": 48 875 | }, 876 | { 877 | "_defaultOrder": 27, 878 | 
"_isFastLaunch": false, 879 | "category": "Compute optimized", 880 | "gpuNum": 0, 881 | "hideHardwareSpecs": false, 882 | "memoryGiB": 144, 883 | "name": "ml.c5.18xlarge", 884 | "vcpuNum": 72 885 | }, 886 | { 887 | "_defaultOrder": 28, 888 | "_isFastLaunch": false, 889 | "category": "Compute optimized", 890 | "gpuNum": 0, 891 | "hideHardwareSpecs": false, 892 | "memoryGiB": 192, 893 | "name": "ml.c5.24xlarge", 894 | "vcpuNum": 96 895 | }, 896 | { 897 | "_defaultOrder": 29, 898 | "_isFastLaunch": true, 899 | "category": "Accelerated computing", 900 | "gpuNum": 1, 901 | "hideHardwareSpecs": false, 902 | "memoryGiB": 16, 903 | "name": "ml.g4dn.xlarge", 904 | "vcpuNum": 4 905 | }, 906 | { 907 | "_defaultOrder": 30, 908 | "_isFastLaunch": false, 909 | "category": "Accelerated computing", 910 | "gpuNum": 1, 911 | "hideHardwareSpecs": false, 912 | "memoryGiB": 32, 913 | "name": "ml.g4dn.2xlarge", 914 | "vcpuNum": 8 915 | }, 916 | { 917 | "_defaultOrder": 31, 918 | "_isFastLaunch": false, 919 | "category": "Accelerated computing", 920 | "gpuNum": 1, 921 | "hideHardwareSpecs": false, 922 | "memoryGiB": 64, 923 | "name": "ml.g4dn.4xlarge", 924 | "vcpuNum": 16 925 | }, 926 | { 927 | "_defaultOrder": 32, 928 | "_isFastLaunch": false, 929 | "category": "Accelerated computing", 930 | "gpuNum": 1, 931 | "hideHardwareSpecs": false, 932 | "memoryGiB": 128, 933 | "name": "ml.g4dn.8xlarge", 934 | "vcpuNum": 32 935 | }, 936 | { 937 | "_defaultOrder": 33, 938 | "_isFastLaunch": false, 939 | "category": "Accelerated computing", 940 | "gpuNum": 4, 941 | "hideHardwareSpecs": false, 942 | "memoryGiB": 192, 943 | "name": "ml.g4dn.12xlarge", 944 | "vcpuNum": 48 945 | }, 946 | { 947 | "_defaultOrder": 34, 948 | "_isFastLaunch": false, 949 | "category": "Accelerated computing", 950 | "gpuNum": 1, 951 | "hideHardwareSpecs": false, 952 | "memoryGiB": 256, 953 | "name": "ml.g4dn.16xlarge", 954 | "vcpuNum": 64 955 | }, 956 | { 957 | "_defaultOrder": 35, 958 | "_isFastLaunch": false, 959 | "category": "Accelerated computing", 960 | "gpuNum": 1, 961 | "hideHardwareSpecs": false, 962 | "memoryGiB": 61, 963 | "name": "ml.p3.2xlarge", 964 | "vcpuNum": 8 965 | }, 966 | { 967 | "_defaultOrder": 36, 968 | "_isFastLaunch": false, 969 | "category": "Accelerated computing", 970 | "gpuNum": 4, 971 | "hideHardwareSpecs": false, 972 | "memoryGiB": 244, 973 | "name": "ml.p3.8xlarge", 974 | "vcpuNum": 32 975 | }, 976 | { 977 | "_defaultOrder": 37, 978 | "_isFastLaunch": false, 979 | "category": "Accelerated computing", 980 | "gpuNum": 8, 981 | "hideHardwareSpecs": false, 982 | "memoryGiB": 488, 983 | "name": "ml.p3.16xlarge", 984 | "vcpuNum": 64 985 | }, 986 | { 987 | "_defaultOrder": 38, 988 | "_isFastLaunch": false, 989 | "category": "Accelerated computing", 990 | "gpuNum": 8, 991 | "hideHardwareSpecs": false, 992 | "memoryGiB": 768, 993 | "name": "ml.p3dn.24xlarge", 994 | "vcpuNum": 96 995 | }, 996 | { 997 | "_defaultOrder": 39, 998 | "_isFastLaunch": false, 999 | "category": "Memory Optimized", 1000 | "gpuNum": 0, 1001 | "hideHardwareSpecs": false, 1002 | "memoryGiB": 16, 1003 | "name": "ml.r5.large", 1004 | "vcpuNum": 2 1005 | }, 1006 | { 1007 | "_defaultOrder": 40, 1008 | "_isFastLaunch": false, 1009 | "category": "Memory Optimized", 1010 | "gpuNum": 0, 1011 | "hideHardwareSpecs": false, 1012 | "memoryGiB": 32, 1013 | "name": "ml.r5.xlarge", 1014 | "vcpuNum": 4 1015 | }, 1016 | { 1017 | "_defaultOrder": 41, 1018 | "_isFastLaunch": false, 1019 | "category": "Memory Optimized", 1020 | "gpuNum": 0, 1021 | "hideHardwareSpecs": false, 
1022 | "memoryGiB": 64, 1023 | "name": "ml.r5.2xlarge", 1024 | "vcpuNum": 8 1025 | }, 1026 | { 1027 | "_defaultOrder": 42, 1028 | "_isFastLaunch": false, 1029 | "category": "Memory Optimized", 1030 | "gpuNum": 0, 1031 | "hideHardwareSpecs": false, 1032 | "memoryGiB": 128, 1033 | "name": "ml.r5.4xlarge", 1034 | "vcpuNum": 16 1035 | }, 1036 | { 1037 | "_defaultOrder": 43, 1038 | "_isFastLaunch": false, 1039 | "category": "Memory Optimized", 1040 | "gpuNum": 0, 1041 | "hideHardwareSpecs": false, 1042 | "memoryGiB": 256, 1043 | "name": "ml.r5.8xlarge", 1044 | "vcpuNum": 32 1045 | }, 1046 | { 1047 | "_defaultOrder": 44, 1048 | "_isFastLaunch": false, 1049 | "category": "Memory Optimized", 1050 | "gpuNum": 0, 1051 | "hideHardwareSpecs": false, 1052 | "memoryGiB": 384, 1053 | "name": "ml.r5.12xlarge", 1054 | "vcpuNum": 48 1055 | }, 1056 | { 1057 | "_defaultOrder": 45, 1058 | "_isFastLaunch": false, 1059 | "category": "Memory Optimized", 1060 | "gpuNum": 0, 1061 | "hideHardwareSpecs": false, 1062 | "memoryGiB": 512, 1063 | "name": "ml.r5.16xlarge", 1064 | "vcpuNum": 64 1065 | }, 1066 | { 1067 | "_defaultOrder": 46, 1068 | "_isFastLaunch": false, 1069 | "category": "Memory Optimized", 1070 | "gpuNum": 0, 1071 | "hideHardwareSpecs": false, 1072 | "memoryGiB": 768, 1073 | "name": "ml.r5.24xlarge", 1074 | "vcpuNum": 96 1075 | }, 1076 | { 1077 | "_defaultOrder": 47, 1078 | "_isFastLaunch": false, 1079 | "category": "Accelerated computing", 1080 | "gpuNum": 1, 1081 | "hideHardwareSpecs": false, 1082 | "memoryGiB": 16, 1083 | "name": "ml.g5.xlarge", 1084 | "vcpuNum": 4 1085 | }, 1086 | { 1087 | "_defaultOrder": 48, 1088 | "_isFastLaunch": false, 1089 | "category": "Accelerated computing", 1090 | "gpuNum": 1, 1091 | "hideHardwareSpecs": false, 1092 | "memoryGiB": 32, 1093 | "name": "ml.g5.2xlarge", 1094 | "vcpuNum": 8 1095 | }, 1096 | { 1097 | "_defaultOrder": 49, 1098 | "_isFastLaunch": false, 1099 | "category": "Accelerated computing", 1100 | "gpuNum": 1, 1101 | "hideHardwareSpecs": false, 1102 | "memoryGiB": 64, 1103 | "name": "ml.g5.4xlarge", 1104 | "vcpuNum": 16 1105 | }, 1106 | { 1107 | "_defaultOrder": 50, 1108 | "_isFastLaunch": false, 1109 | "category": "Accelerated computing", 1110 | "gpuNum": 1, 1111 | "hideHardwareSpecs": false, 1112 | "memoryGiB": 128, 1113 | "name": "ml.g5.8xlarge", 1114 | "vcpuNum": 32 1115 | }, 1116 | { 1117 | "_defaultOrder": 51, 1118 | "_isFastLaunch": false, 1119 | "category": "Accelerated computing", 1120 | "gpuNum": 1, 1121 | "hideHardwareSpecs": false, 1122 | "memoryGiB": 256, 1123 | "name": "ml.g5.16xlarge", 1124 | "vcpuNum": 64 1125 | }, 1126 | { 1127 | "_defaultOrder": 52, 1128 | "_isFastLaunch": false, 1129 | "category": "Accelerated computing", 1130 | "gpuNum": 4, 1131 | "hideHardwareSpecs": false, 1132 | "memoryGiB": 192, 1133 | "name": "ml.g5.12xlarge", 1134 | "vcpuNum": 48 1135 | }, 1136 | { 1137 | "_defaultOrder": 53, 1138 | "_isFastLaunch": false, 1139 | "category": "Accelerated computing", 1140 | "gpuNum": 4, 1141 | "hideHardwareSpecs": false, 1142 | "memoryGiB": 384, 1143 | "name": "ml.g5.24xlarge", 1144 | "vcpuNum": 96 1145 | }, 1146 | { 1147 | "_defaultOrder": 54, 1148 | "_isFastLaunch": false, 1149 | "category": "Accelerated computing", 1150 | "gpuNum": 8, 1151 | "hideHardwareSpecs": false, 1152 | "memoryGiB": 768, 1153 | "name": "ml.g5.48xlarge", 1154 | "vcpuNum": 192 1155 | }, 1156 | { 1157 | "_defaultOrder": 55, 1158 | "_isFastLaunch": false, 1159 | "category": "Accelerated computing", 1160 | "gpuNum": 8, 1161 | "hideHardwareSpecs": false, 1162 | 
"memoryGiB": 1152, 1163 | "name": "ml.p4d.24xlarge", 1164 | "vcpuNum": 96 1165 | }, 1166 | { 1167 | "_defaultOrder": 56, 1168 | "_isFastLaunch": false, 1169 | "category": "Accelerated computing", 1170 | "gpuNum": 8, 1171 | "hideHardwareSpecs": false, 1172 | "memoryGiB": 1152, 1173 | "name": "ml.p4de.24xlarge", 1174 | "vcpuNum": 96 1175 | } 1176 | ], 1177 | "instance_type": "ml.t3.medium", 1178 | "kernelspec": { 1179 | "display_name": "Python 3 (Data Science)", 1180 | "language": "python", 1181 | "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" 1182 | }, 1183 | "language_info": { 1184 | "codemirror_mode": { 1185 | "name": "ipython", 1186 | "version": 3 1187 | }, 1188 | "file_extension": ".py", 1189 | "mimetype": "text/x-python", 1190 | "name": "python", 1191 | "nbconvert_exporter": "python", 1192 | "pygments_lexer": "ipython3", 1193 | "version": "3.7.10" 1194 | }, 1195 | "vscode": { 1196 | "interpreter": { 1197 | "hash": "5c7b89af1651d0b8571dde13640ecdccf7d5a6204171d6ab33e7c296e100e08a" 1198 | } 1199 | } 1200 | }, 1201 | "nbformat": 4, 1202 | "nbformat_minor": 5 1203 | } 1204 | -------------------------------------------------------------------------------- /lab2/finetuning/finetuning.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding=utf-8 3 | # Copyright 2020 The HuggingFace Inc. team. All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | """ 17 | Fine-tuning the library models for causal language modeling (GPT, GPT-2, CTRL, ...) on a text file or a dataset. 18 | 19 | Here is the full list of checkpoints on the hub that can be fine-tuned by this script: 20 | https://huggingface.co/models?filter=text-generation 21 | """ 22 | # You can also adapt this script on your own causal language modeling task. Pointers for this are left as comments. 
23 | 24 | import logging 25 | import math 26 | import os 27 | import sys 28 | import argparse 29 | from itertools import chain 30 | 31 | import datasets 32 | import evaluate 33 | import torch 34 | from datasets import load_dataset 35 | 36 | import transformers 37 | from transformers import ( 38 | AutoConfig, 39 | AutoModelForCausalLM, 40 | AutoTokenizer, 41 | Trainer, 42 | TrainingArguments, 43 | default_data_collator, 44 | is_torch_tpu_available, 45 | set_seed, 46 | ) 47 | from transformers.testing_utils import CaptureLogger 48 | 49 | 50 | logger = logging.getLogger(__name__) 51 | 52 | 53 | def main(): 54 | parser = argparse.ArgumentParser() 55 | 56 | # Training parameters 57 | parser.add_argument("--model_name_or_path", default="distilgpt2") 58 | parser.add_argument("--model_revision", default="main") 59 | 60 | parser.add_argument("--dataset_name", default="tiny_shakespeare") 61 | parser.add_argument("--do_train", default=1) 62 | parser.add_argument("--do_eval", default=1) 63 | parser.add_argument("--output_dir", default="/opt/ml/model") 64 | 65 | parser.add_argument("--per_device_train_batch_size", default=2) 66 | parser.add_argument("--per_device_eval_batch_size", default=2) 67 | 68 | print('finetuning.py starting...') 69 | 70 | args = parser.parse_args() 71 | 72 | # Setup logging 73 | logging.basicConfig( 74 | format="%(asctime)s - %(levelname)s - %(name)s - %(message)s", 75 | datefmt="%m/%d/%Y %H:%M:%S", 76 | handlers=[logging.StreamHandler(sys.stdout)], 77 | ) 78 | 79 | log_level = logging.INFO 80 | logger.setLevel(log_level) 81 | datasets.utils.logging.set_verbosity(log_level) 82 | transformers.utils.logging.set_verbosity(log_level) 83 | transformers.utils.logging.enable_default_handler() 84 | transformers.utils.logging.enable_explicit_format() 85 | 86 | # Get the datasets: you can either provide your own CSV/JSON/TXT training and evaluation files (see below) 87 | # or just provide the name of one of the public datasets available on the hub at https://huggingface.co/datasets/ 88 | # (the dataset will be downloaded automatically from the datasets Hub). 89 | # 90 | # For CSV/JSON files, this script will use the column called 'text' or the first column if no column called 91 | # 'text' is found. You can easily tweak this behavior (see below). 92 | # 93 | # In distributed training, the load_dataset function guarantees that only one local process can concurrently 94 | # download the dataset. 95 | if args.dataset_name is not None: 96 | # Downloading and loading a dataset from the hub. 97 | raw_datasets = load_dataset( 98 | args.dataset_name 99 | ) 100 | if "validation" not in raw_datasets.keys(): 101 | raw_datasets["validation"] = load_dataset( 102 | args.dataset_name, 103 | split="train[:10%]", 104 | use_auth_token=None, 105 | keep_in_memory=False 106 | ) 107 | raw_datasets["train"] = load_dataset( 108 | args.dataset_name, 109 | split="train[90%:]", 110 | use_auth_token=None, 111 | keep_in_memory=False 112 | ) 113 | else: 114 | raise ValueError( 115 | "Please specify a dataset to be used for training." 116 | ) 117 | 118 | # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at 119 | # https://huggingface.co/docs/datasets/loading_datasets.html. 120 | 121 | # Load pretrained model and tokenizer 122 | 123 | # Distributed training: 124 | # The .from_pretrained methods guarantee that only one local process can concurrently 125 | # download model & vocab. 
126 | 127 | #config = AutoConfig.from_pretrained(args.model_name_or_path, revision = args.model_revision) 128 | 129 | if args.model_name_or_path: 130 | tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path, use_fast = True, revision = args.model_revision) 131 | else: 132 | raise ValueError( 133 | "Please specify a model name with --model_name_or_path." 134 | ) 135 | 136 | if args.model_name_or_path: 137 | model = AutoModelForCausalLM.from_pretrained( 138 | args.model_name_or_path, 139 | # config=config, 140 | revision=args.model_revision, 141 | torch_dtype="auto", 142 | ) 143 | else: 144 | raise ValueError( 145 | "Please specify a model name with --model_name_or_path." 146 | ) 147 | 148 | # We resize the embeddings only when necessary to avoid index errors. 149 | embedding_size = model.get_input_embeddings().weight.shape[0] 150 | if len(tokenizer) > embedding_size: 151 | model.resize_token_embeddings(len(tokenizer)) 152 | 153 | # Preprocessing the datasets. 154 | # First we tokenize all the texts. 155 | column_names = list(raw_datasets["train"].features) 156 | text_column_name = "text" if "text" in column_names else column_names[0] 157 | 158 | # since this will be pickled to avoid _LazyModule error in Hasher force logger loading before tokenize_function 159 | tok_logger = transformers.utils.logging.get_logger("transformers.tokenization_utils_base") 160 | 161 | def tokenize_function(examples): 162 | with CaptureLogger(tok_logger) as cl: 163 | output = tokenizer(examples[text_column_name]) 164 | # clm input could be much much longer than block_size 165 | if "Token indices sequence length is longer than the" in cl.out: 166 | tok_logger.warning( 167 | "^^^^^^^^^^^^^^^^ Please ignore the warning above - this long input will be chunked into smaller bits" 168 | " before being passed to the model." 169 | ) 170 | return output 171 | 172 | 173 | tokenized_datasets = raw_datasets.map( 174 | tokenize_function, 175 | batched=True, 176 | remove_columns=column_names, 177 | desc="Running tokenizer on dataset", 178 | 179 | ) 180 | 181 | block_size = tokenizer.model_max_length 182 | if block_size > 1024: 183 | logger.warning( 184 | "The chosen tokenizer supports a `model_max_length` that is longer than the default `block_size` value" 185 | " of 1024. If you would like to use a longer `block_size` up to `tokenizer.model_max_length` you can" 186 | " override this default with `--block_size xxx`." 187 | ) 188 | block_size = 1024 189 | 190 | 191 | # Main data processing function that will concatenate all texts from our dataset and generate chunks of block_size. 192 | def group_texts(examples): 193 | # Concatenate all texts. 194 | concatenated_examples = {k: list(chain(*examples[k])) for k in examples.keys()} 195 | total_length = len(concatenated_examples[list(examples.keys())[0]]) 196 | # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can 197 | # customize this part to your needs. 198 | if total_length >= block_size: 199 | total_length = (total_length // block_size) * block_size 200 | # Split by chunks of max_len. 201 | result = { 202 | k: [t[i : i + block_size] for i in range(0, total_length, block_size)] 203 | for k, t in concatenated_examples.items() 204 | } 205 | result["labels"] = result["input_ids"].copy() 206 | return result 207 | 208 | # Note that with `batched=True`, this map processes 1,000 texts together, so group_texts throws away a remainder 209 | # for each of those groups of 1,000 texts. 
You can adjust that batch_size here but a higher value might be slower 210 | # to preprocess. 211 | # 212 | # To speed up this part, you could use multiprocessing. See the documentation of the map method for more information: 213 | # https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.map 214 | 215 | 216 | lm_datasets = tokenized_datasets.map( 217 | group_texts, 218 | batched=True, 219 | desc=f"Grouping texts in chunks of {block_size}", 220 | ) 221 | 222 | if args.do_train: 223 | if "train" not in tokenized_datasets: 224 | raise ValueError("--do_train requires a train dataset") 225 | train_dataset = lm_datasets["train"] 226 | 227 | if args.do_eval: 228 | if "validation" not in tokenized_datasets: 229 | raise ValueError("--do_eval requires a validation dataset") 230 | eval_dataset = lm_datasets["validation"] 231 | 232 | def preprocess_logits_for_metrics(logits, labels): 233 | if isinstance(logits, tuple): 234 | # Depending on the model and config, logits may contain extra tensors, 235 | # like past_key_values, but logits always come first 236 | logits = logits[0] 237 | return logits.argmax(dim=-1) 238 | 239 | metric = evaluate.load("accuracy") 240 | 241 | def compute_metrics(eval_preds): 242 | preds, labels = eval_preds 243 | # preds have the same shape as the labels, after the argmax(-1) has been calculated 244 | # by preprocess_logits_for_metrics but we need to shift the labels 245 | labels = labels[:, 1:].reshape(-1) 246 | preds = preds[:, :-1].reshape(-1) 247 | return metric.compute(predictions=preds, references=labels) 248 | 249 | # Specifying training_args. Going with default values for every parameter not explicitly specified. See documentation for more information: https://huggingface.co/docs/transformers/v4.27.2/en/main_classes/trainer#transformers.TrainingArguments 250 | training_args = TrainingArguments( 251 | per_device_train_batch_size = int(args.per_device_train_batch_size), 252 | per_device_eval_batch_size=int(args.per_device_eval_batch_size), 253 | output_dir=args.output_dir, 254 | seed=42, 255 | disable_tqdm=False 256 | ) 257 | 258 | # Initialize our Trainer 259 | trainer = Trainer( 260 | model=model, 261 | args=training_args, 262 | train_dataset=train_dataset if args.do_train else None, 263 | eval_dataset=eval_dataset if args.do_eval else None, 264 | tokenizer=tokenizer, 265 | # Data collator will default to DataCollatorWithPadding, so we change it. 
266 | data_collator=default_data_collator, 267 | compute_metrics=compute_metrics if args.do_eval and not is_torch_tpu_available() else None, 268 | preprocess_logits_for_metrics=preprocess_logits_for_metrics 269 | if args.do_eval and not is_torch_tpu_available() 270 | else None, 271 | ) 272 | 273 | # Training 274 | if args.do_train: 275 | 276 | train_result = trainer.train() 277 | trainer.save_model() # Saves the tokenizer too for easy upload 278 | 279 | metrics = train_result.metrics 280 | 281 | metrics["train_samples"] = len(train_dataset) 282 | 283 | trainer.log_metrics("train", metrics) 284 | trainer.save_metrics("train", metrics) 285 | trainer.save_state() 286 | 287 | # Evaluation 288 | if args.do_eval: 289 | logger.info("*** Evaluate ***") 290 | 291 | metrics = trainer.evaluate() 292 | 293 | metrics["eval_samples"] = len(eval_dataset) 294 | 295 | try: 296 | perplexity = math.exp(metrics["eval_loss"]) 297 | except OverflowError: 298 | perplexity = float("inf") 299 | metrics["perplexity"] = perplexity 300 | 301 | trainer.log_metrics("eval", metrics) 302 | trainer.save_metrics("eval", metrics) 303 | 304 | 305 | 306 | def _mp_fn(index): 307 | # For xla_spawn (TPUs) 308 | main() 309 | 310 | 311 | if __name__ == "__main__": 312 | main() 313 | -------------------------------------------------------------------------------- /lab2/finetuning/requirements.txt: -------------------------------------------------------------------------------- 1 | accelerate==0.28.0 2 | torch >= 1.3 3 | datasets==2.18.0 4 | sentencepiece != 0.1.92 5 | protobuf 6 | scikit-learn 7 | transformers==4.38.0 8 | evaluate==0.4.0 9 | -------------------------------------------------------------------------------- /lab3/JumpStart_Stable_Diffusion_Inference_Only.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "16c61f54", 6 | "metadata": {}, 7 | "source": [ 8 | "# Introduction to JumpStart - Text to Image (Inference only)" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "bdc23bae", 14 | "metadata": {}, 15 | "source": [ 16 | "***\n", 17 | "Welcome to Amazon [SageMaker JumpStart](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html)! You can use JumpStart to solve many Machine Learning tasks through one-click in SageMaker Studio, or through [SageMaker JumpStart API](https://sagemaker.readthedocs.io/en/stable/overview.html#use-prebuilt-models-with-sagemaker-jumpstart). In this demo notebook, we demonstrate how to use the JumpStart API to generate images from text using state-of-the-art Stable Diffusion models.\n", 18 | "\n", 19 | "Stable Diffusion is a text-to-image model that enables you to create photorealistic images from just a text prompt. A diffusion model trains by learning to remove noise that was added to a real image. This de-noising process generates a realistic image. These models can also generate images from text alone by conditioning the generation process on the text. For instance, Stable Diffusion is a latent diffusion where the model learns to recognize shapes in a pure noise image and gradually brings these shapes into focus if the shapes match the words in the input text.\n", 20 | "\n", 21 | "Deploying large models and running inference on models such as Stable Diffusion is often challenging and include issues such as CUDA out of memory, payload size limit exceeded and so on. JumpStart simplifies this process by providing ready-to-use scripts that have been robustly tested. 
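Stepping back before the lab 3 notebook continues: the training script above defaults `--output_dir` to `/opt/ml/model`, the directory SageMaker uploads to S3 when a training job finishes, so it is meant to be launched as a SageMaker training job (see `lab2/fine-tuning.ipynb`). A minimal, hypothetical sketch using the Hugging Face estimator; the instance type and container versions are illustrative assumptions, not values taken from the lab:

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

# Hypothetical launcher for lab2/finetuning/finetuning.py (versions/instance are examples).
estimator = HuggingFace(
    entry_point="finetuning.py",
    source_dir="finetuning",              # also ships requirements.txt into the container
    role=sagemaker.get_execution_role(),
    instance_type="ml.g5.2xlarge",        # example GPU instance
    instance_count=1,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    hyperparameters={
        "model_name_or_path": "distilgpt2",
        "dataset_name": "tiny_shakespeare",
        "per_device_train_batch_size": 2,
    },
)
estimator.fit()  # artifacts written to /opt/ml/model are uploaded to S3 when the job ends
```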
Furthermore, it provides guidance on each step of the process, including the recommended instance types, how to select parameters to guide the image generation process, prompt engineering, etc. Moreover, you can deploy and run inference on any of the 80+ diffusion models from JumpStart without having to write any code of your own.\n", 22 | "\n", 23 | "In this lab, you will learn how to use JumpStart to generate highly realistic and artistic images of any subject/object/environment/scene. This may be as simple as an image of a cute dog or as detailed as a hyper-realistic image of a beautifully decorated cozy kitchen by Pixar in the style of various artists with dramatic sunset lighting and long shadows with a cinematic atmosphere. This can be used to design products and build catalogs for e-commerce business needs or to generate realistic art pieces or stock images.\n", 24 | "\n", 25 | "Notebook license: This notebook is provided under the [MIT No Attribution license](https://github.com/aws/mit-0).\n", 26 | "\n", 27 | "Model license: By using this model, you agree to the [CreativeML Open RAIL-M++ license](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL).\n", 28 | "\n", 29 | "***" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "id": "5db28351", 35 | "metadata": {}, 36 | "source": [ 37 | "1. [Set Up](#1.-Set-Up)\n", 38 | "2. [Run inference on the pre-trained model](#2.-Run-inference-on-the-pre-trained-model)\n", 39 | " * [Select a model](#2.1.-Select-a-Model)\n", 40 | " * [Retrieve JumpStart Artifacts & Deploy an Endpoint](#2.2.-Retrieve-JumpStart-Artifacts-&-Deploy-an-Endpoint)\n", 41 | " * [Query endpoint and parse response](#2.3.-Query-endpoint-and-parse-response)\n", 42 | " * [Supported Inference parameters](#2.4.-Supported-Inference-parameters)\n", 43 | " * [Compressed Image Output](#2.5.-Compressed-Image-Output)\n", 44 | " * [Prompt Engineering](#2.6.-Prompt-Engineering)\n", 45 | " * [Clean up the endpoint](#2.7.-Clean-up-the-endpoint)\n", 46 | "3. [Conclusion](#3.-Conclusion)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "id": "ce462973", 52 | "metadata": {}, 53 | "source": [ 54 | "Note: This notebook was tested on an ml.t3.medium instance in Amazon SageMaker Studio with the Python 3 (Data Science) kernel and in an Amazon SageMaker Notebook instance with the conda_python3 kernel." 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "id": "9ea47727", 60 | "metadata": {}, 61 | "source": [ 62 | "### 1. Set Up" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "id": "35b91e81", 68 | "metadata": {}, 69 | "source": [ 70 | "***\n", 71 | "Before executing the notebook, there are some initial steps required for setup.
This notebook requires the latest versions of `sagemaker` and `ipywidgets`.\n", 72 | "\n", 73 | "***" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "id": "25293522", 80 | "metadata": { 81 | "tags": [] 82 | }, 83 | "outputs": [], 84 | "source": [ 85 | "!pip install ipywidgets==7.0.0 --quiet" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": null, 91 | "id": "c2828beb-90b2-4143-87b5-dc07f8c313f1", 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "!pip install sagemaker boto3 --upgrade --quiet" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "id": "48370155", 101 | "metadata": {}, 102 | "source": [ 103 | "#### Permissions and environment variables\n", 104 | "\n", 105 | "***\n", 106 | "To host on Amazon SageMaker, we need to set up and authenticate the use of AWS services. Here, we use the execution role associated with the current notebook as the AWS account role with SageMaker access. \n", 107 | "\n", 108 | "***" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "id": "90518e45", 115 | "metadata": { 116 | "tags": [] 117 | }, 118 | "outputs": [], 119 | "source": [ 120 | "import sagemaker, boto3, json\n", 121 | "from sagemaker import get_execution_role\n", 122 | "\n", 123 | "aws_role = get_execution_role()\n", 124 | "aws_region = boto3.Session().region_name\n", 125 | "sess = sagemaker.Session()" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "id": "310fca48", 131 | "metadata": {}, 132 | "source": [ 133 | "## 2. Run inference on the pre-trained model\n", 134 | "\n", 135 | "***\n", 136 | "\n", 137 | "Using JumpStart, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset.\n", 138 | "***" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "id": "0e072e72-8bb4-4a8d-b887-2e9658dc3672", 144 | "metadata": {}, 145 | "source": [ 146 | "### 2.1. Select a Model\n", 147 | "***\n", 148 | "You can continue with the default model, or choose a different model from the dropdown generated upon running the next cell. A complete list of SageMaker pre-trained models can also be accessed at [SageMaker pre-trained Models](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html#). For this lab, we recommend using the default `model_id`.\n", 149 | "\n", 150 | "***" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "id": "3e4f77d3-bd76-4d0c-b3a2-3ae3fe9e52f8", 157 | "metadata": { 158 | "tags": [] 159 | }, 160 | "outputs": [], 161 | "source": [ 162 | "# model_version=\"*\" fetches the latest version of the model\n", 163 | "model_id = \"model-txt2img-stabilityai-stable-diffusion-v2-fp16\"" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "id": "282e37a1-e379-4bd3-af2c-02d02fd41d78", 169 | "metadata": {}, 170 | "source": [ 171 | "### 2.2. Retrieve JumpStart Artifacts & Deploy an Endpoint\n", 172 | "\n", 173 | "***\n", 174 | "\n", 175 | "Using JumpStart, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the `deploy_image_uri`, `deploy_source_uri`, and `model_uri` for the pre-trained model. To host the pre-trained model, we create an instance of [sagemaker.model.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it. \n", 176 | "\n", 177 | "\n", 178 | "### This may take up to 10 minutes.
Please do not kill the kernel while you wait.\n", 179 | "\n", 180 | "While you wait, you can check out the [Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/generate-images-from-text-with-the-stable-diffusion-model-on-amazon-sagemaker-jumpstart/) blog to learn more about the Stable Diffusion model and JumpStart.\n", 181 | "\n", 182 | "***" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": null, 188 | "id": "3cadb253-54a5-44de-b10a-201c273c552e", 189 | "metadata": { 190 | "pycharm": { 191 | "is_executing": true 192 | }, 193 | "tags": [] 194 | }, 195 | "outputs": [], 196 | "source": [ 197 | "from sagemaker.jumpstart.model import JumpStartModel\n", 198 | "\n", 199 | "\n", 200 | "# Please use the ml.g5.24xlarge instance type if it is available in your region. ml.g5.24xlarge has a 24GB GPU compared to 16GB on ml.p3.2xlarge and supports generating larger and better-quality images.\n", 201 | "inference_instance_type = \"ml.g4dn.2xlarge\"\n", 202 | "\n", 203 | "model_id = \"model-txt2img-stabilityai-stable-diffusion-v2-fp16\"\n", 204 | "stability_model = JumpStartModel(model_id=model_id)\n", 205 | "\n", 206 | "model_predictor = stability_model.deploy(\n", 207 | " initial_instance_count=1,\n", 208 | " instance_type=inference_instance_type,\n", 209 | ")" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "id": "b2e0fd36", 215 | "metadata": {}, 216 | "source": [ 217 | "### 2.3. Query endpoint and parse response\n", 218 | "\n", 219 | "***\n", 220 | "Input to the endpoint is any string of text serialized as JSON and encoded in `utf-8` format. The output of the endpoint is a JSON object containing the generated image and the prompt.\n", 221 | "\n", 222 | "***" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "id": "84fb30d0", 229 | "metadata": { 230 | "tags": [] 231 | }, 232 | "outputs": [], 233 | "source": [ 234 | "import matplotlib.pyplot as plt\n", 235 | "import numpy as np\n", 236 | "\n", 237 | "\n", 238 | "def query(model_predictor, text):\n", 239 | " \"\"\"Query the model predictor.\"\"\"\n", 240 | "\n", 241 | " encoded_text = json.dumps(text).encode(\"utf-8\")\n", 242 | "\n", 243 | " query_response = model_predictor.predict(\n", 244 | " encoded_text,\n", 245 | " {\n", 246 | " \"ContentType\": \"application/x-text\",\n", 247 | " \"Accept\": \"application/json\",\n", 248 | " },\n", 249 | " )\n", 250 | " return query_response\n", 251 | "\n", 252 | "\n", 253 | "def display_img_and_prompt(img, prmpt):\n", 254 | " \"\"\"Display generated image.\"\"\"\n", 255 | " plt.figure(figsize=(12, 12))\n", 256 | " plt.imshow(np.array(img))\n", 257 | " plt.axis(\"off\")\n", 258 | " plt.title(prmpt)\n", 259 | " plt.show()" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "id": "aea0434b", 265 | "metadata": {}, 266 | "source": [ 267 | "***\n", 268 | "Below, we put in some example input text.
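One practical caveat before querying: depending on how the predictor deserializes responses, `predict` may hand back raw JSON bytes rather than a parsed dictionary. A small, hedged helper to normalize the response before indexing into it (the field names follow the response format described in section 2.3):

```python
import json

def parse_response(query_response):
    """Return (generated_image, prompt), parsing raw JSON bytes if necessary."""
    if isinstance(query_response, (bytes, bytearray)):
        query_response = json.loads(query_response)
    return query_response["generated_image"], query_response["prompt"]
```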
You can put in any text and the model predicts the image corresponding to that text.\n", 269 | "\n", 270 | "***" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "id": "a5a12e3e-c269-432a-8e41-7e0903c975af", 277 | "metadata": { 278 | "pycharm": { 279 | "is_executing": true 280 | }, 281 | "tags": [] 282 | }, 283 | "outputs": [], 284 | "source": [ 285 | "text = \"cottage in impressionist style\"\n", 286 | "query_response = query(model_predictor, text)\n", 287 | "display_img_and_prompt(query_response['generated_image'], query_response['prompt'])" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "id": "7d591919-1be0-4e9f-b7ff-0aa6e0959053", 293 | "metadata": { 294 | "pycharm": { 295 | "is_executing": true 296 | } 297 | }, 298 | "source": [ 299 | "### 2.4. Supported Inference parameters\n", 300 | "\n", 301 | "***\n", 302 | "This model also supports many advanced parameters while performing inference. They include:\n", 303 | "\n", 304 | "* **prompt**: prompt to guide the image generation. Must be specified and can be a string or a list of strings.\n", 305 | "* **width**: width of the hallucinated image. If specified, it must be a positive integer divisible by 8.\n", 306 | "* **height**: height of the hallucinated image. If specified, it must be a positive integer divisible by 8.\n", 307 | "* **num_inference_steps**: number of denoising steps during image generation. More steps lead to a higher-quality image. If specified, it must be a positive integer.\n", 308 | "* **guidance_scale**: a higher guidance scale results in an image more closely related to the prompt, at the expense of image quality. If specified, it must be a float. guidance_scale<=1 is ignored.\n", 309 | "* **negative_prompt**: guide image generation against this prompt. If specified, it must be a string or a list of strings and is used with guidance_scale. If guidance_scale is disabled, this is also disabled. Moreover, if prompt is a list of strings then negative_prompt must also be a list of strings. \n", 310 | "* **num_images_per_prompt**: number of images returned per prompt. If specified, it must be a positive integer. \n", 311 | "* **seed**: fix the randomized state for reproducibility. If specified, it must be an integer.\n", 312 | "\n", 313 | "***" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": null, 319 | "id": "4fee71b1-5584-4916-bd78-5b895be08d41", 320 | "metadata": { 321 | "pycharm": { 322 | "is_executing": true 323 | }, 324 | "tags": [] 325 | }, 326 | "outputs": [], 327 | "source": [ 328 | "import json\n", 329 | "\n", 330 | "# Training data for different models had different image sizes, and it is often observed that the model performs best when the generated image\n", 331 | "# has the same dimensions as the training data. For dimensions not matching the default dimensions, the model may produce a black image.\n", 332 | "# Stable Diffusion v1-4 was trained on 512x512 images and Stable Diffusion v2 was trained on 768x768 images.\n", 333 | "payload = {\n", 334 | " \"prompt\": \"astronaut on a horse\",\n", 335 | " \"width\": 512,\n", 336 | " \"height\": 512,\n", 337 | " \"num_images_per_prompt\": 1,\n", 338 | " \"num_inference_steps\": 25,\n", 339 | " \"guidance_scale\": 7.5,\n", 340 | "}\n", 341 | "\n", 342 | "\n", 343 | "def query_endpoint_with_json_payload(model_predictor, payload, content_type, accept):\n", 344 | " \"\"\"Query the model predictor with json payload.\"\"\"\n", 345 | "\n", 346 | " encoded_payload = json.dumps(payload).encode(\"utf-8\")\n", 347 | "\n", 348 | " query_response = model_predictor.predict(\n", 349 | " encoded_payload,\n", 350 | " {\n", 351 | " \"ContentType\": content_type,\n", 352 | " \"Accept\": accept,\n", 353 | " },\n", 354 | " )\n", 355 | " return query_response" 356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": null, 361 | "id": "9f0c2938", 362 | "metadata": {}, 363 | "outputs": [], 364 | "source": [ 365 | "query_response = query_endpoint_with_json_payload(\n", 366 | " model_predictor, payload, \"application/json\", \"application/json\"\n", 367 | ")\n", 368 | "\n", 369 | "for img in query_response['generated_images']:\n", 370 | " display_img_and_prompt(img, query_response['prompt'])" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "id": "62857efd-e53d-4730-a3d2-b7a9bcd03771", 376 | "metadata": { 377 | "pycharm": { 378 | "is_executing": true 379 | }, 380 | "tags": [] 381 | }, 382 | "source": [ 383 | "### 2.5. Compressed Image Output\n", 384 | "\n", 385 | "---\n", 386 | "\n", 387 | "The default response type above is a nested array of RGB values, and if the generated image is large, this may hit the endpoint response size limit. To address this, we also support an endpoint response where each image is returned as JPEG bytes.
To do this, please set `Accept = 'application/json;jpeg'`.\n", 388 | "\n", 389 | "\n", 390 | "---" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "id": "bfdf0bd9-37a6-4401-afbd-34388a4ecbe8", 397 | "metadata": { 398 | "pycharm": { 399 | "is_executing": true 400 | }, 401 | "tags": [] 402 | }, 403 | "outputs": [], 404 | "source": [ 405 | "from PIL import Image\n", 406 | "from io import BytesIO\n", 407 | "import base64\n", 408 | "import json\n", 409 | "\n", 410 | "# send prompt payload to the Stable Diffusion SageMaker endpoint\n", 411 | "query_response = query_endpoint_with_json_payload(\n", 412 | " model_predictor, payload, \"application/json\", \"application/json;jpeg\"\n", 413 | ")\n", 414 | "\n", 415 | "# Collect all generated images returned in the response\n", 416 | "generated_images, prompt = query_response['generated_images'], query_response['prompt']\n", 417 | "\n", 418 | "\n", 419 | "# generated_images are a list of jpeg images as bytes with b64 encoding.\n", 420 | "def display_encoded_images(generated_images, prompt):\n", 421 | " # we decode the images and convert to RGB format before displaying\n", 422 | " for generated_image in generated_images:\n", 423 | " generated_image_decoded = BytesIO(base64.b64decode(generated_image.encode()))\n", 424 | " generated_image_rgb = Image.open(generated_image_decoded).convert(\"RGB\")\n", 425 | " display_img_and_prompt(generated_image_rgb, prompt)\n", 426 | "\n", 427 | "\n", 428 | "display_encoded_images(generated_images, prompt)" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "id": "f021fe91", 434 | "metadata": {}, 435 | "source": [ 436 | "### 2.6. Prompt Engineering\n", 437 | "---\n", 438 | "Writing a good prompt can sometimes be an art. It is often difficult to predict whether a certain prompt will yield a satisfactory image with a given model. However, there are certain templates that have been observed to work. Broadly, a prompt can be roughly broken down into three pieces: (i) type of image (photograph/sketch/painting etc.), (ii) description (subject/object/environment/scene etc.) and (iii) the style of the image (realistic/artistic/type of art etc.). You can change each of the three parts individually to generate variations of an image. Adjectives have been known to play a significant role in the image generation process. Also, adding more details helps in the generation process.\n", 439 | "\n", 440 | "To generate a realistic image, you can use phrases such as “a photo of”, “a photograph of”, “realistic” or “hyper realistic”. To generate images by artists, you can use phrases like “by Pablo Picasso” or “oil painting by Rembrandt” or “landscape art by Frederic Edwin Church” or “pencil drawing by Albrecht Dürer”. You can also combine different artists. To generate an artistic image by category, you can add the art category in the prompt, such as “lion on a beach, abstract”. Some other categories include “oil painting”, “pencil drawing”, “pop art”, “digital art”, “anime”, “cartoon”, “futurism”, “watercolor”, “manga” etc. You can also include details such as lighting or a camera lens, such as a 35mm or 85mm wide lens, and details about the framing (portrait/landscape/close up etc.).\n", 441 | "\n", 442 | "Note that the model generates different images even if the same prompt is given multiple times.
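The three-part structure described above (type of image, description, style) lends itself to generating prompt variations systematically — a small sketch (the phrase lists are arbitrary examples, not part of the lab):

```python
from itertools import product

image_types = ["a photograph of", "an oil painting of"]
descriptions = ["a lion on a beach"]
styles = ["hyper realistic", "in the style of Frederic Edwin Church"]

# Every combination of the three prompt pieces discussed above.
prompt_variations = [f"{t} {d}, {s}" for t, d, s in product(image_types, descriptions, styles)]
for p in prompt_variations:
    print(p)
```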
So, you can generate multiple images and select the image that suits your application best.\n", 443 | "\n", 444 | "---" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": null, 450 | "id": "2886ac49", 451 | "metadata": { 452 | "collapsed": false, 453 | "jupyter": { 454 | "outputs_hidden": false 455 | } 456 | }, 457 | "outputs": [], 458 | "source": [ 459 | "prompts = [\n", 460 | " \"a beautiful illustration of a young cybertronic hyderabadi american woman, round face, cateye glasses, purple colors, intricate, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by wlop and artgerm and alphonse mucha, masterpiece\",\n", 461 | " \"a photorealistic hyperrealistic render of an interior of a beautifully decorated cozy kitchen by pixar, wlop, artgerm, dramatic moody sunset lighting, long shadows, volumetric, cinematic atmosphere, octane render, artstation, 8 k\",\n", 462 | " \"symmetry!! portrait of nicolas cage, long hair in the wind, smile, happy, white vest, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and alphonse mucha\",\n", 463 | " \"a stunningly detailed stained glass window of a beautiful poison ivy with green skin wearing a business suit, dark eyeliner, intricate, elegant, highly detailed, digital painting, artstation, concept art, sharp focus, illustration, art by alphonse mucha\",\n", 464 | " \"a fantasy style portrait painting of rachel lane / alison brie / sally kellerman hybrid in the style of francois boucher oil painting unreal 5 daz. rpg portrait, extremely detailed artgerm alphonse mucha\",\n", 465 | " \"symmetry!! portrait of vanessa hudgens in the style of horizon zero dawn, machine face, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and alphonse mucha, 8 k\",\n", 466 | " \"landscape of the beautiful city of paris rebuilt near the pacific ocean in sunny california, amazing weather, sandy beach, palm trees, splendid haussmann architecture, digital painting, highly detailed, intricate, without duplication, art by craig mullins, concept art, matte painting, trending on artstation\",\n", 467 | "]\n", 468 | "\n", 469 | "for prompt in prompts:\n", 470 | " # Payload for the text-to-image model.\n", 471 | " payload = {\"prompt\": prompt, \"width\": 768, \"height\": 768, \"num_images_per_prompt\": 1}\n", 472 | " # Send each prompt to the model\n", 473 | " query_response = query_endpoint_with_json_payload(\n", 474 | " model_predictor, payload, \"application/json\", \"application/json;jpeg\"\n", 475 | " )\n", 476 | " # Get the images from each inference request\n", 477 | " generated_images = query_response['generated_images']\n", 478 | " # Display image with prompt title\n", 479 | " display_encoded_images(generated_images, prompt)" 480 | ] 481 | }, 482 | { 483 | "cell_type": "markdown", 484 | "id": "870d1173", 485 | "metadata": {}, 486 | "source": [ 487 | "### 2.7. Clean up the endpoint\n", 488 | "---\n", 489 | "An endpoint deployed on SageMaker is persistent.
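The cleanup cell that follows calls `delete_endpoint()`, which removes the endpoint and its configuration; since the guidance is to delete the model as well, note that the predictor also exposes `delete_model()` — a minimal sketch of a fuller cleanup:

```python
# Remove the model object in addition to the endpoint (and its configuration).
model_predictor.delete_model()
model_predictor.delete_endpoint()
```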
So, once you are done using the model, please make sure to delete the model and the deployed endpoint.\n", 490 | "\n", 491 | "---" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": null, 497 | "id": "63cb143b", 498 | "metadata": { 499 | "tags": [] 500 | }, 501 | "outputs": [], 502 | "source": [ 503 | "# Delete the SageMaker endpoint\n", 504 | "model_predictor.delete_endpoint()" 505 | ] 506 | }, 507 | { 508 | "cell_type": "markdown", 509 | "id": "18a37c1c", 510 | "metadata": {}, 511 | "source": [ 512 | "### 3. Conclusion\n", 513 | "---\n", 514 | "In this tutorial, we learned how to deploy a pre-trained Stable Diffusion model on SageMaker using JumpStart. We saw that Stable Diffusion models can generate highly photo-realistic images from text. JumpStart provides both Stable Diffusion 1 and Stable Diffusion 2 and their FP16 revisions. JumpStart also provides an additional 84 diffusion models which have been trained to generate images for different themes and in different languages. You can deploy any of these models without writing any code of your own. To deploy a specific model, you can select a `model_id` in the dropdown menu in [2.1. Select a Model](#2.1.-Select-a-Model).\n", 515 | "\n", 516 | "You can tweak the image generation process by selecting the appropriate parameters during inference. Guidance on how to set these parameters is provided in [2.4. Supported Inference parameters](#2.4.-Supported-Inference-parameters). We also saw how returning a large image payload can lead to response size limit issues. JumpStart handles this by encoding the image at the endpoint and decoding it in the notebook before displaying it. Finally, we saw how prompt engineering is a crucial step in generating high-quality images. We discussed how to set your own prompts and saw some examples of good prompts.\n", 517 | "\n", 518 | "To learn more about inference on pre-trained Stable Diffusion models, please check out the blog [Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/generate-images-from-text-with-the-stable-diffusion-model-on-amazon-sagemaker-jumpstart/).\n", 519 | "\n", 520 | "Although creating impressive images can find use in industries ranging from art to NFTs and beyond, today we also expect AI to be personalizable. JumpStart provides fine-tuning capability to the pre-trained models so that you can adapt the model to your own use case with as little as five training images. This can be useful when creating art, logos, custom designs, NFTs, and so on, or fun stuff such as generating custom AI images of your pets or avatars of yourself. To learn more about Stable Diffusion fine-tuning, please check out the blog [Fine-tune text-to-image Stable Diffusion models with Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/fine-tune-text-to-image-stable-diffusion-models-with-amazon-sagemaker-jumpstart/)."
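As a follow-up to the model-selection pointer in the conclusion: the available text-to-image model IDs can also be listed programmatically — a hedged sketch using the SageMaker SDK's JumpStart utilities (available in recent `sagemaker` versions):

```python
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# List JumpStart models for the text-to-image task; each entry is a deployable model_id.
txt2img_models = list_jumpstart_models(filter="task == txt2img")
print(txt2img_models[:10])
```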
521 | ] 522 | } 523 | ], 524 | "metadata": { 525 | "availableInstances": [ 526 | { 527 | "_defaultOrder": 0, 528 | "_isFastLaunch": true, 529 | "category": "General purpose", 530 | "gpuNum": 0, 531 | "hideHardwareSpecs": false, 532 | "memoryGiB": 4, 533 | "name": "ml.t3.medium", 534 | "vcpuNum": 2 535 | }, 536 | { 537 | "_defaultOrder": 1, 538 | "_isFastLaunch": false, 539 | "category": "General purpose", 540 | "gpuNum": 0, 541 | "hideHardwareSpecs": false, 542 | "memoryGiB": 8, 543 | "name": "ml.t3.large", 544 | "vcpuNum": 2 545 | }, 546 | { 547 | "_defaultOrder": 2, 548 | "_isFastLaunch": false, 549 | "category": "General purpose", 550 | "gpuNum": 0, 551 | "hideHardwareSpecs": false, 552 | "memoryGiB": 16, 553 | "name": "ml.t3.xlarge", 554 | "vcpuNum": 4 555 | }, 556 | { 557 | "_defaultOrder": 3, 558 | "_isFastLaunch": false, 559 | "category": "General purpose", 560 | "gpuNum": 0, 561 | "hideHardwareSpecs": false, 562 | "memoryGiB": 32, 563 | "name": "ml.t3.2xlarge", 564 | "vcpuNum": 8 565 | }, 566 | { 567 | "_defaultOrder": 4, 568 | "_isFastLaunch": true, 569 | "category": "General purpose", 570 | "gpuNum": 0, 571 | "hideHardwareSpecs": false, 572 | "memoryGiB": 8, 573 | "name": "ml.m5.large", 574 | "vcpuNum": 2 575 | }, 576 | { 577 | "_defaultOrder": 5, 578 | "_isFastLaunch": false, 579 | "category": "General purpose", 580 | "gpuNum": 0, 581 | "hideHardwareSpecs": false, 582 | "memoryGiB": 16, 583 | "name": "ml.m5.xlarge", 584 | "vcpuNum": 4 585 | }, 586 | { 587 | "_defaultOrder": 6, 588 | "_isFastLaunch": false, 589 | "category": "General purpose", 590 | "gpuNum": 0, 591 | "hideHardwareSpecs": false, 592 | "memoryGiB": 32, 593 | "name": "ml.m5.2xlarge", 594 | "vcpuNum": 8 595 | }, 596 | { 597 | "_defaultOrder": 7, 598 | "_isFastLaunch": false, 599 | "category": "General purpose", 600 | "gpuNum": 0, 601 | "hideHardwareSpecs": false, 602 | "memoryGiB": 64, 603 | "name": "ml.m5.4xlarge", 604 | "vcpuNum": 16 605 | }, 606 | { 607 | "_defaultOrder": 8, 608 | "_isFastLaunch": false, 609 | "category": "General purpose", 610 | "gpuNum": 0, 611 | "hideHardwareSpecs": false, 612 | "memoryGiB": 128, 613 | "name": "ml.m5.8xlarge", 614 | "vcpuNum": 32 615 | }, 616 | { 617 | "_defaultOrder": 9, 618 | "_isFastLaunch": false, 619 | "category": "General purpose", 620 | "gpuNum": 0, 621 | "hideHardwareSpecs": false, 622 | "memoryGiB": 192, 623 | "name": "ml.m5.12xlarge", 624 | "vcpuNum": 48 625 | }, 626 | { 627 | "_defaultOrder": 10, 628 | "_isFastLaunch": false, 629 | "category": "General purpose", 630 | "gpuNum": 0, 631 | "hideHardwareSpecs": false, 632 | "memoryGiB": 256, 633 | "name": "ml.m5.16xlarge", 634 | "vcpuNum": 64 635 | }, 636 | { 637 | "_defaultOrder": 11, 638 | "_isFastLaunch": false, 639 | "category": "General purpose", 640 | "gpuNum": 0, 641 | "hideHardwareSpecs": false, 642 | "memoryGiB": 384, 643 | "name": "ml.m5.24xlarge", 644 | "vcpuNum": 96 645 | }, 646 | { 647 | "_defaultOrder": 12, 648 | "_isFastLaunch": false, 649 | "category": "General purpose", 650 | "gpuNum": 0, 651 | "hideHardwareSpecs": false, 652 | "memoryGiB": 8, 653 | "name": "ml.m5d.large", 654 | "vcpuNum": 2 655 | }, 656 | { 657 | "_defaultOrder": 13, 658 | "_isFastLaunch": false, 659 | "category": "General purpose", 660 | "gpuNum": 0, 661 | "hideHardwareSpecs": false, 662 | "memoryGiB": 16, 663 | "name": "ml.m5d.xlarge", 664 | "vcpuNum": 4 665 | }, 666 | { 667 | "_defaultOrder": 14, 668 | "_isFastLaunch": false, 669 | "category": "General purpose", 670 | "gpuNum": 0, 671 | "hideHardwareSpecs": false, 672 | "memoryGiB": 
32, 673 | "name": "ml.m5d.2xlarge", 674 | "vcpuNum": 8 675 | }, 676 | { 677 | "_defaultOrder": 15, 678 | "_isFastLaunch": false, 679 | "category": "General purpose", 680 | "gpuNum": 0, 681 | "hideHardwareSpecs": false, 682 | "memoryGiB": 64, 683 | "name": "ml.m5d.4xlarge", 684 | "vcpuNum": 16 685 | }, 686 | { 687 | "_defaultOrder": 16, 688 | "_isFastLaunch": false, 689 | "category": "General purpose", 690 | "gpuNum": 0, 691 | "hideHardwareSpecs": false, 692 | "memoryGiB": 128, 693 | "name": "ml.m5d.8xlarge", 694 | "vcpuNum": 32 695 | }, 696 | { 697 | "_defaultOrder": 17, 698 | "_isFastLaunch": false, 699 | "category": "General purpose", 700 | "gpuNum": 0, 701 | "hideHardwareSpecs": false, 702 | "memoryGiB": 192, 703 | "name": "ml.m5d.12xlarge", 704 | "vcpuNum": 48 705 | }, 706 | { 707 | "_defaultOrder": 18, 708 | "_isFastLaunch": false, 709 | "category": "General purpose", 710 | "gpuNum": 0, 711 | "hideHardwareSpecs": false, 712 | "memoryGiB": 256, 713 | "name": "ml.m5d.16xlarge", 714 | "vcpuNum": 64 715 | }, 716 | { 717 | "_defaultOrder": 19, 718 | "_isFastLaunch": false, 719 | "category": "General purpose", 720 | "gpuNum": 0, 721 | "hideHardwareSpecs": false, 722 | "memoryGiB": 384, 723 | "name": "ml.m5d.24xlarge", 724 | "vcpuNum": 96 725 | }, 726 | { 727 | "_defaultOrder": 20, 728 | "_isFastLaunch": false, 729 | "category": "General purpose", 730 | "gpuNum": 0, 731 | "hideHardwareSpecs": true, 732 | "memoryGiB": 0, 733 | "name": "ml.geospatial.interactive", 734 | "supportedImageNames": [ 735 | "sagemaker-geospatial-v1-0" 736 | ], 737 | "vcpuNum": 0 738 | }, 739 | { 740 | "_defaultOrder": 21, 741 | "_isFastLaunch": true, 742 | "category": "Compute optimized", 743 | "gpuNum": 0, 744 | "hideHardwareSpecs": false, 745 | "memoryGiB": 4, 746 | "name": "ml.c5.large", 747 | "vcpuNum": 2 748 | }, 749 | { 750 | "_defaultOrder": 22, 751 | "_isFastLaunch": false, 752 | "category": "Compute optimized", 753 | "gpuNum": 0, 754 | "hideHardwareSpecs": false, 755 | "memoryGiB": 8, 756 | "name": "ml.c5.xlarge", 757 | "vcpuNum": 4 758 | }, 759 | { 760 | "_defaultOrder": 23, 761 | "_isFastLaunch": false, 762 | "category": "Compute optimized", 763 | "gpuNum": 0, 764 | "hideHardwareSpecs": false, 765 | "memoryGiB": 16, 766 | "name": "ml.c5.2xlarge", 767 | "vcpuNum": 8 768 | }, 769 | { 770 | "_defaultOrder": 24, 771 | "_isFastLaunch": false, 772 | "category": "Compute optimized", 773 | "gpuNum": 0, 774 | "hideHardwareSpecs": false, 775 | "memoryGiB": 32, 776 | "name": "ml.c5.4xlarge", 777 | "vcpuNum": 16 778 | }, 779 | { 780 | "_defaultOrder": 25, 781 | "_isFastLaunch": false, 782 | "category": "Compute optimized", 783 | "gpuNum": 0, 784 | "hideHardwareSpecs": false, 785 | "memoryGiB": 72, 786 | "name": "ml.c5.9xlarge", 787 | "vcpuNum": 36 788 | }, 789 | { 790 | "_defaultOrder": 26, 791 | "_isFastLaunch": false, 792 | "category": "Compute optimized", 793 | "gpuNum": 0, 794 | "hideHardwareSpecs": false, 795 | "memoryGiB": 96, 796 | "name": "ml.c5.12xlarge", 797 | "vcpuNum": 48 798 | }, 799 | { 800 | "_defaultOrder": 27, 801 | "_isFastLaunch": false, 802 | "category": "Compute optimized", 803 | "gpuNum": 0, 804 | "hideHardwareSpecs": false, 805 | "memoryGiB": 144, 806 | "name": "ml.c5.18xlarge", 807 | "vcpuNum": 72 808 | }, 809 | { 810 | "_defaultOrder": 28, 811 | "_isFastLaunch": false, 812 | "category": "Compute optimized", 813 | "gpuNum": 0, 814 | "hideHardwareSpecs": false, 815 | "memoryGiB": 192, 816 | "name": "ml.c5.24xlarge", 817 | "vcpuNum": 96 818 | }, 819 | { 820 | "_defaultOrder": 29, 821 | 
"_isFastLaunch": true, 822 | "category": "Accelerated computing", 823 | "gpuNum": 1, 824 | "hideHardwareSpecs": false, 825 | "memoryGiB": 16, 826 | "name": "ml.g4dn.xlarge", 827 | "vcpuNum": 4 828 | }, 829 | { 830 | "_defaultOrder": 30, 831 | "_isFastLaunch": false, 832 | "category": "Accelerated computing", 833 | "gpuNum": 1, 834 | "hideHardwareSpecs": false, 835 | "memoryGiB": 32, 836 | "name": "ml.g4dn.2xlarge", 837 | "vcpuNum": 8 838 | }, 839 | { 840 | "_defaultOrder": 31, 841 | "_isFastLaunch": false, 842 | "category": "Accelerated computing", 843 | "gpuNum": 1, 844 | "hideHardwareSpecs": false, 845 | "memoryGiB": 64, 846 | "name": "ml.g4dn.4xlarge", 847 | "vcpuNum": 16 848 | }, 849 | { 850 | "_defaultOrder": 32, 851 | "_isFastLaunch": false, 852 | "category": "Accelerated computing", 853 | "gpuNum": 1, 854 | "hideHardwareSpecs": false, 855 | "memoryGiB": 128, 856 | "name": "ml.g4dn.8xlarge", 857 | "vcpuNum": 32 858 | }, 859 | { 860 | "_defaultOrder": 33, 861 | "_isFastLaunch": false, 862 | "category": "Accelerated computing", 863 | "gpuNum": 4, 864 | "hideHardwareSpecs": false, 865 | "memoryGiB": 192, 866 | "name": "ml.g4dn.12xlarge", 867 | "vcpuNum": 48 868 | }, 869 | { 870 | "_defaultOrder": 34, 871 | "_isFastLaunch": false, 872 | "category": "Accelerated computing", 873 | "gpuNum": 1, 874 | "hideHardwareSpecs": false, 875 | "memoryGiB": 256, 876 | "name": "ml.g4dn.16xlarge", 877 | "vcpuNum": 64 878 | }, 879 | { 880 | "_defaultOrder": 35, 881 | "_isFastLaunch": false, 882 | "category": "Accelerated computing", 883 | "gpuNum": 1, 884 | "hideHardwareSpecs": false, 885 | "memoryGiB": 61, 886 | "name": "ml.p3.2xlarge", 887 | "vcpuNum": 8 888 | }, 889 | { 890 | "_defaultOrder": 36, 891 | "_isFastLaunch": false, 892 | "category": "Accelerated computing", 893 | "gpuNum": 4, 894 | "hideHardwareSpecs": false, 895 | "memoryGiB": 244, 896 | "name": "ml.p3.8xlarge", 897 | "vcpuNum": 32 898 | }, 899 | { 900 | "_defaultOrder": 37, 901 | "_isFastLaunch": false, 902 | "category": "Accelerated computing", 903 | "gpuNum": 8, 904 | "hideHardwareSpecs": false, 905 | "memoryGiB": 488, 906 | "name": "ml.p3.16xlarge", 907 | "vcpuNum": 64 908 | }, 909 | { 910 | "_defaultOrder": 38, 911 | "_isFastLaunch": false, 912 | "category": "Accelerated computing", 913 | "gpuNum": 8, 914 | "hideHardwareSpecs": false, 915 | "memoryGiB": 768, 916 | "name": "ml.p3dn.24xlarge", 917 | "vcpuNum": 96 918 | }, 919 | { 920 | "_defaultOrder": 39, 921 | "_isFastLaunch": false, 922 | "category": "Memory Optimized", 923 | "gpuNum": 0, 924 | "hideHardwareSpecs": false, 925 | "memoryGiB": 16, 926 | "name": "ml.r5.large", 927 | "vcpuNum": 2 928 | }, 929 | { 930 | "_defaultOrder": 40, 931 | "_isFastLaunch": false, 932 | "category": "Memory Optimized", 933 | "gpuNum": 0, 934 | "hideHardwareSpecs": false, 935 | "memoryGiB": 32, 936 | "name": "ml.r5.xlarge", 937 | "vcpuNum": 4 938 | }, 939 | { 940 | "_defaultOrder": 41, 941 | "_isFastLaunch": false, 942 | "category": "Memory Optimized", 943 | "gpuNum": 0, 944 | "hideHardwareSpecs": false, 945 | "memoryGiB": 64, 946 | "name": "ml.r5.2xlarge", 947 | "vcpuNum": 8 948 | }, 949 | { 950 | "_defaultOrder": 42, 951 | "_isFastLaunch": false, 952 | "category": "Memory Optimized", 953 | "gpuNum": 0, 954 | "hideHardwareSpecs": false, 955 | "memoryGiB": 128, 956 | "name": "ml.r5.4xlarge", 957 | "vcpuNum": 16 958 | }, 959 | { 960 | "_defaultOrder": 43, 961 | "_isFastLaunch": false, 962 | "category": "Memory Optimized", 963 | "gpuNum": 0, 964 | "hideHardwareSpecs": false, 965 | "memoryGiB": 256, 966 | 
"name": "ml.r5.8xlarge", 967 | "vcpuNum": 32 968 | }, 969 | { 970 | "_defaultOrder": 44, 971 | "_isFastLaunch": false, 972 | "category": "Memory Optimized", 973 | "gpuNum": 0, 974 | "hideHardwareSpecs": false, 975 | "memoryGiB": 384, 976 | "name": "ml.r5.12xlarge", 977 | "vcpuNum": 48 978 | }, 979 | { 980 | "_defaultOrder": 45, 981 | "_isFastLaunch": false, 982 | "category": "Memory Optimized", 983 | "gpuNum": 0, 984 | "hideHardwareSpecs": false, 985 | "memoryGiB": 512, 986 | "name": "ml.r5.16xlarge", 987 | "vcpuNum": 64 988 | }, 989 | { 990 | "_defaultOrder": 46, 991 | "_isFastLaunch": false, 992 | "category": "Memory Optimized", 993 | "gpuNum": 0, 994 | "hideHardwareSpecs": false, 995 | "memoryGiB": 768, 996 | "name": "ml.r5.24xlarge", 997 | "vcpuNum": 96 998 | }, 999 | { 1000 | "_defaultOrder": 47, 1001 | "_isFastLaunch": false, 1002 | "category": "Accelerated computing", 1003 | "gpuNum": 1, 1004 | "hideHardwareSpecs": false, 1005 | "memoryGiB": 16, 1006 | "name": "ml.g5.xlarge", 1007 | "vcpuNum": 4 1008 | }, 1009 | { 1010 | "_defaultOrder": 48, 1011 | "_isFastLaunch": false, 1012 | "category": "Accelerated computing", 1013 | "gpuNum": 1, 1014 | "hideHardwareSpecs": false, 1015 | "memoryGiB": 32, 1016 | "name": "ml.g5.2xlarge", 1017 | "vcpuNum": 8 1018 | }, 1019 | { 1020 | "_defaultOrder": 49, 1021 | "_isFastLaunch": false, 1022 | "category": "Accelerated computing", 1023 | "gpuNum": 1, 1024 | "hideHardwareSpecs": false, 1025 | "memoryGiB": 64, 1026 | "name": "ml.g5.4xlarge", 1027 | "vcpuNum": 16 1028 | }, 1029 | { 1030 | "_defaultOrder": 50, 1031 | "_isFastLaunch": false, 1032 | "category": "Accelerated computing", 1033 | "gpuNum": 1, 1034 | "hideHardwareSpecs": false, 1035 | "memoryGiB": 128, 1036 | "name": "ml.g5.8xlarge", 1037 | "vcpuNum": 32 1038 | }, 1039 | { 1040 | "_defaultOrder": 51, 1041 | "_isFastLaunch": false, 1042 | "category": "Accelerated computing", 1043 | "gpuNum": 1, 1044 | "hideHardwareSpecs": false, 1045 | "memoryGiB": 256, 1046 | "name": "ml.g5.16xlarge", 1047 | "vcpuNum": 64 1048 | }, 1049 | { 1050 | "_defaultOrder": 52, 1051 | "_isFastLaunch": false, 1052 | "category": "Accelerated computing", 1053 | "gpuNum": 4, 1054 | "hideHardwareSpecs": false, 1055 | "memoryGiB": 192, 1056 | "name": "ml.g5.12xlarge", 1057 | "vcpuNum": 48 1058 | }, 1059 | { 1060 | "_defaultOrder": 53, 1061 | "_isFastLaunch": false, 1062 | "category": "Accelerated computing", 1063 | "gpuNum": 4, 1064 | "hideHardwareSpecs": false, 1065 | "memoryGiB": 384, 1066 | "name": "ml.g5.24xlarge", 1067 | "vcpuNum": 96 1068 | }, 1069 | { 1070 | "_defaultOrder": 54, 1071 | "_isFastLaunch": false, 1072 | "category": "Accelerated computing", 1073 | "gpuNum": 8, 1074 | "hideHardwareSpecs": false, 1075 | "memoryGiB": 768, 1076 | "name": "ml.g5.48xlarge", 1077 | "vcpuNum": 192 1078 | }, 1079 | { 1080 | "_defaultOrder": 55, 1081 | "_isFastLaunch": false, 1082 | "category": "Accelerated computing", 1083 | "gpuNum": 8, 1084 | "hideHardwareSpecs": false, 1085 | "memoryGiB": 1152, 1086 | "name": "ml.p4d.24xlarge", 1087 | "vcpuNum": 96 1088 | }, 1089 | { 1090 | "_defaultOrder": 56, 1091 | "_isFastLaunch": false, 1092 | "category": "Accelerated computing", 1093 | "gpuNum": 8, 1094 | "hideHardwareSpecs": false, 1095 | "memoryGiB": 1152, 1096 | "name": "ml.p4de.24xlarge", 1097 | "vcpuNum": 96 1098 | } 1099 | ], 1100 | "instance_type": "ml.t3.medium", 1101 | "kernelspec": { 1102 | "display_name": "Python 3 (Data Science)", 1103 | "language": "python", 1104 | "name": 
"python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" 1105 | }, 1106 | "language_info": { 1107 | "codemirror_mode": { 1108 | "name": "ipython", 1109 | "version": 3 1110 | }, 1111 | "file_extension": ".py", 1112 | "mimetype": "text/x-python", 1113 | "name": "python", 1114 | "nbconvert_exporter": "python", 1115 | "pygments_lexer": "ipython3", 1116 | "version": "3.7.10" 1117 | }, 1118 | "pycharm": { 1119 | "stem_cell": { 1120 | "cell_type": "raw", 1121 | "metadata": { 1122 | "collapsed": false 1123 | }, 1124 | "source": [] 1125 | } 1126 | } 1127 | }, 1128 | "nbformat": 4, 1129 | "nbformat_minor": 5 1130 | } 1131 | -------------------------------------------------------------------------------- /lab4/cf.yml: -------------------------------------------------------------------------------- 1 | --- 2 | AWSTemplateFormatVersion: '2010-09-09' 3 | Transform: AWS::Serverless-2016-10-31 4 | Resources: 5 | RagAppStack: 6 | Type: AWS::Serverless::Application 7 | Properties: 8 | Location: template.yml 9 | 10 | Outputs: 11 | LoadBalancerUrl: 12 | Description: URL of the load balancer 13 | Value: !Sub "http://${LoadBalancer.DNSName}" 14 | -------------------------------------------------------------------------------- /lab4/fe/Dockerfile: -------------------------------------------------------------------------------- 1 | #FROM --platform=linux/x86-64 python:3.11.1-slim 2 | FROM public.ecr.aws/docker/library/python:3.11-slim 3 | 4 | WORKDIR /usr/src/app 5 | 6 | COPY requirements.txt requirements.txt 7 | 8 | RUN pip install -r requirements.txt 9 | 10 | COPY app.py app.py 11 | 12 | COPY aws.png aws.png 13 | 14 | EXPOSE 80 15 | 16 | ENTRYPOINT [ "streamlit", "run", "app.py", \ 17 | "--server.enableCORS", "true", \ 18 | "--server.port", "80", \ 19 | "--browser.serverPort", "80"] -------------------------------------------------------------------------------- /lab4/fe/app.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | from streamlit_chat import message 3 | from streamlit_extras.colored_header import colored_header 4 | from streamlit_extras.add_vertical_space import add_vertical_space 5 | import uuid 6 | import sys 7 | import json 8 | import os 9 | import requests 10 | import uuid 11 | 12 | 13 | 14 | AI_ICON = "aws.png" 15 | base_url = os.getenv('BASE_URL') 16 | headers = {'Content-Type': 'application/json'} 17 | 18 | st.set_page_config(page_title="AWSomeChat - An LLM-powered chatbot on AWS documentation") 19 | 20 | # Sidebar contents 21 | with st.sidebar: 22 | st.title('🤗💬 AWSomeChat App') 23 | st.markdown(''' 24 | ## About 25 | This app is an LLM-powered chatbot built using: 26 | - [Streamlit](https://streamlit.io/) 27 | - [Amazon SageMaker](https://aws.amazon.com/sagemaker/) 28 | - [Amazon Kendra](https://aws.amazon.com/kendra/) 29 | 30 | ''') 31 | add_vertical_space(5) 32 | st.write('Made with ❤️ by your AWS WWSO AIML EMEA Team') 33 | 34 | st.markdown(""" 35 | 50 | """, unsafe_allow_html=True) 51 | 52 | 53 | def create_session_id(): 54 | return str(uuid.uuid4()) 55 | 56 | 57 | # Create or get the session state 58 | def get_session(): 59 | if 'session_id' not in st.session_state: 60 | st.session_state.session_id = create_session_id() 61 | return st.session_state.session_id 62 | 63 | 64 | session_id = get_session() 65 | 66 | 67 | # Refresh button callback 68 | def refresh(): 69 | session_id = create_session_id() 70 | st.session_state.session_id = session_id 71 | st.session_state['generated'] = ["Hi, I'm 
AWSomeChat. I have lots of information on AWS documentation. How may I help you?"] 72 | st.session_state['past'] = [] 73 | 74 | 75 | def clear(): 76 | st.session_state.session_id = session_id 77 | st.session_state['generated'] = ["Hi, I'm AWSomeChat. I have lots of information on AWS documentation. How may I help you?"] 78 | st.session_state['past'] = [] 79 | st.session_state['input'] = "" 80 | 81 | 82 | def write_logo(): 83 | col1, col2, col3 = st.columns([5, 1, 5]) 84 | with col2: 85 | st.image(AI_ICON, use_column_width='always') 86 | 87 | 88 | def write_top_bar(): 89 | col1, col2, col3, col4 = st.columns([1,10,2,2]) 90 | with col1: 91 | st.image(AI_ICON, use_column_width='always') 92 | with col2: 93 | st.write(f"

<h3>AWSomeChat</h3>

", unsafe_allow_html=True) 94 | with col3: 95 | if st.button("Clear Chat", key="clear"): 96 | clear() 97 | with col4: 98 | if st.button('Reset Session'): 99 | refresh() 100 | 101 | write_top_bar() 102 | 103 | session_header = f" Session ID: {session_id}" 104 | st.write(f"

<p>{session_header}</p>

", unsafe_allow_html=True) 105 | 106 | colored_header(label='', description='', color_name='blue-30') 107 | 108 | 109 | # Layout of input/response containers 110 | input_container = st.container() 111 | response_container = st.container() 112 | 113 | 114 | # User input 115 | ## Function for taking user provided prompt as input 116 | def get_text(): 117 | input_text = st.text_input("User Input: ", "", key="input") 118 | return input_text 119 | ## Applying the user input box 120 | with input_container: 121 | user_input = get_text() 122 | 123 | 124 | # Generate empty lists for generated and past. 125 | # generated stores AI generated responses 126 | if 'generated' not in st.session_state: 127 | st.session_state['generated'] = ["Hi, I'm AWSomeChat. I have lots of information on AWS documentation. How may I help you?"] 128 | ## past stores User's questions 129 | if 'past' not in st.session_state: 130 | st.session_state['past'] = [] 131 | 132 | 133 | # Response output 134 | ## Function for taking user prompt as input followed by producing AI generated responses 135 | def generate_response(prompt): 136 | url = f'{base_url}/ragapp' 137 | body = {"query": prompt, "uuid": session_id} 138 | response = requests.post(url, headers=headers, data=json.dumps(body), verify=False) 139 | output_text = response.text 140 | return output_text 141 | 142 | 143 | ## Conditional display of AI generated responses as a function of user provided prompts 144 | with response_container: 145 | if user_input: 146 | response = generate_response(user_input) 147 | st.session_state.past.append(user_input) 148 | st.session_state.generated.append(response) 149 | 150 | if st.session_state['generated']: 151 | for i in range(len(st.session_state['generated'])): 152 | if i > 0: 153 | message(st.session_state['past'][i-1], is_user=True, key=str(i) + '_user') 154 | message(st.session_state["generated"][i], key=str(i)) 155 | 156 | -------------------------------------------------------------------------------- /lab4/fe/aws.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/lab4/fe/aws.png -------------------------------------------------------------------------------- /lab4/fe/requirements.txt: -------------------------------------------------------------------------------- 1 | streamlit 2 | streamlit-chat 3 | streamlit-extras 4 | requests 5 | boto3 -------------------------------------------------------------------------------- /lab4/fe/setup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Get the AWS account ID 4 | aws_account_id=$(aws sts get-caller-identity --query Account --output text) 5 | aws_region=$(aws configure get region) 6 | 7 | echo "AccountId = ${aws_account_id}" 8 | echo "Region = ${aws_region}" 9 | 10 | 11 | # Create a new ECR repository 12 | echo "Creating ECR Repository..." 13 | aws ecr create-repository --repository-name rag-app 14 | 15 | # Get the login command for the new repository 16 | echo "Logging into the repository..." 17 | #$(aws ecr get-login --no-include-email) 18 | # aws ecr get-login-password --region ${aws_region} | docker login --username AWS --password-stdin ${aws_account_id}.dkr.ecr.${aws_region}.amazonaws.com 19 | 20 | # Build and push the Docker image and tag it 21 | echo "Building and pushing Docker image..." 
22 | sm-docker build -t "${aws_account_id}.dkr.ecr.${aws_region}.amazonaws.com/rag-app:latest" --repository rag-app:latest . 23 | 24 | -------------------------------------------------------------------------------- /lab4/rag_app/kendra/__init__.py: -------------------------------------------------------------------------------- 1 | """Classes to work with AWS Kendra and Bedrock LLMs""" -------------------------------------------------------------------------------- /lab4/rag_app/kendra/__pycache__/__init__.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/lab4/rag_app/kendra/__pycache__/__init__.cpython-311.pyc -------------------------------------------------------------------------------- /lab4/rag_app/kendra/__pycache__/kendra_index_retriever.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/lab4/rag_app/kendra/__pycache__/kendra_index_retriever.cpython-311.pyc -------------------------------------------------------------------------------- /lab4/rag_app/kendra/__pycache__/kendra_results.cpython-311.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/generative-ai-on-aws-immersion-day/5975f841a2e179e0cf6a8236be8ddeb6d29f6715/lab4/rag_app/kendra/__pycache__/kendra_results.cpython-311.pyc -------------------------------------------------------------------------------- /lab4/rag_app/kendra/kendra_index_retriever.py: -------------------------------------------------------------------------------- 1 | """Chain for question-answering against a vector database.""" 2 | from __future__ import annotations 3 | 4 | from typing import Any, Dict, List, Optional 5 | 6 | from langchain.schema import BaseRetriever, Document 7 | 8 | from .kendra_results import kendra_query, kendra_client 9 | import boto3 10 | 11 | class KendraIndexRetriever(BaseRetriever): 12 | """Retriever to retrieve documents from Amazon Kendra index. 13 | 14 | Example: 15 | .. code-block:: python 16 | 17 | kendraIndexRetriever = KendraIndexRetriever() 18 | 19 | """ 20 | 21 | kendraindex: str 22 | """Kendra index id""" 23 | awsregion: str 24 | """AWS region of the Kendra index""" 25 | k: int 26 | """Number of documents to query for.""" 27 | return_source_documents: bool 28 | """Whether source documents should be returned""" 29 | kclient: Any 30 | """ boto3 client for Kendra.
""" 31 | 32 | def __init__(self, kendraindex, awsregion, k=3, return_source_documents=False): 33 | self.kendraindex = kendraindex 34 | self.awsregion = awsregion 35 | self.k = k 36 | self.return_source_documents = return_source_documents 37 | self.kclient = kendra_client(self.kendraindex, self.awsregion) 38 | 39 | def get_relevant_documents(self, query: str) -> List[Document]: 40 | """Run search on Kendra index and get top k documents 41 | 42 | docs = get_relevant_documents('This is my query') 43 | """ 44 | docs = kendra_query(self.kclient, query, self.k, self.kendraindex) 45 | return docs 46 | 47 | async def aget_relevant_documents(self, query: str) -> List[Document]: 48 | return await super().aget_relevant_documents(query) 49 | -------------------------------------------------------------------------------- /lab4/rag_app/kendra/kendra_results.py: -------------------------------------------------------------------------------- 1 | from langchain.docstore.document import Document 2 | import boto3 3 | import re 4 | 5 | def clean_result(res_text): 6 | res = re.sub("\s+", " ", res_text).replace("...","") 7 | return res 8 | 9 | def get_top_n_results(resp, count): 10 | r = resp["ResultItems"][count] 11 | doc_title = r["DocumentTitle"]["Text"] 12 | doc_uri = r["DocumentURI"] 13 | r_type = r["Type"] 14 | if (r["AdditionalAttributes"] and r["AdditionalAttributes"][0]["Key"] == "AnswerText"): 15 | res_text = r["AdditionalAttributes"][0]["Value"]["TextWithHighlightsValue"]["Text"] 16 | else: 17 | res_text = r["DocumentExcerpt"]["Text"] 18 | doc_excerpt = clean_result(res_text) 19 | combined_text = "Document Title: " + doc_title + "\nDocument Excerpt: \n" + doc_excerpt + "\n" 20 | return {"page_content":combined_text, "metadata":{"source":doc_uri, "title": doc_title, "excerpt": doc_excerpt, "type": r_type}} 21 | 22 | def kendra_query(kclient, kquery, kcount, kindex_id): 23 | response = kclient.query(IndexId=kindex_id, QueryText=kquery.strip()) 24 | if len(response["ResultItems"]) > kcount: 25 | r_count = kcount 26 | else: 27 | r_count = len(response["ResultItems"]) 28 | docs = [get_top_n_results(response, i) for i in range(0, r_count)] 29 | return [Document(page_content = d["page_content"], metadata = d["metadata"]) for d in docs] 30 | 31 | def kendra_client(kindex_id, kregion): 32 | kclient = boto3.client('kendra', region_name=kregion) 33 | return kclient 34 | -------------------------------------------------------------------------------- /lab4/rag_app/rag_app.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | from langchain.chains import ConversationalRetrievalChain 4 | from langchain import SagemakerEndpoint 5 | from langchain.prompts.prompt import PromptTemplate 6 | from langchain.embeddings import SagemakerEndpointEmbeddings 7 | from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler 8 | from langchain.llms.sagemaker_endpoint import ContentHandlerBase, LLMContentHandler 9 | from langchain.memory import ConversationBufferWindowMemory 10 | from langchain import PromptTemplate, LLMChain 11 | from langchain.memory.chat_message_histories import DynamoDBChatMessageHistory 12 | # from kendra.kendra_index_retriever import KendraIndexRetriever 13 | from langchain.retrievers import AmazonKendraRetriever 14 | 15 | 16 | REGION = os.environ.get('REGION') 17 | KENDRA_INDEX_ID = os.environ.get('KENDRA_INDEX_ID') 18 | SM_ENDPOINT_NAME = os.environ.get('SM_ENDPOINT_NAME') 19 | 20 | # Generative LLM 21 | 22 | # Content 
23 | # class ContentHandler(LLMContentHandler): 24 | # content_type = "application/json" 25 | # accepts = "application/json" 26 | 27 | # def transform_input(self, prompt, model_kwargs): 28 | # input_str = json.dumps({"text_inputs": prompt, "temperature": 0, "max_length": 200}) 29 | # return input_str.encode('utf-8') 30 | 31 | # def transform_output(self, output): 32 | # response_json = json.loads(output.read().decode("utf-8")) 33 | # return response_json["generated_texts"][0] 34 | 35 | # Content Handler for Option 2 - Falcon40b-instruct - please uncomment below if you used this option 36 | # class ContentHandler(LLMContentHandler): 37 | # content_type = "application/json" 38 | # accepts = "application/json" 39 | 40 | # def transform_input(self, prompt, model_kwargs): 41 | # input_str = json.dumps({"inputs": prompt, "parameters": {"do_sample": False, "repetition_penalty": 1.1, "return_full_text": False, "max_new_tokens": 100}}) 42 | # return input_str.encode('utf-8') 43 | 44 | # def transform_output(self, output): 45 | # response_json = json.loads(output.read().decode("utf-8")) 46 | # return response_json[0]["generated_text"] 47 | 48 | content_handler = ContentHandler() # raises NameError until exactly one of the ContentHandler classes above is uncommented 49 | 50 | # SageMaker LangChain integration for invoking the SageMaker endpoint. 51 | llm = SagemakerEndpoint( 52 | endpoint_name=SM_ENDPOINT_NAME, 53 | # model_kwargs=kwargs, 54 | region_name=REGION, 55 | content_handler=content_handler, 56 | ) 57 | 58 | _template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language. 59 | 60 | Chat History: 61 | {chat_history} 62 | Follow Up Input: {question} 63 | Standalone question:""" 64 | CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template) 65 | 66 | 67 | def lambda_handler(event, context): 68 | print(event) 69 | body = json.loads(event['body']) 70 | print(body) 71 | query = body['query'] 72 | uuid = body['uuid'] 73 | print(query) 74 | print(uuid) 75 | 76 | message_history = DynamoDBChatMessageHistory(table_name="MemoryTable", session_id=uuid) 77 | memory = ConversationBufferWindowMemory(memory_key="chat_history", chat_memory=message_history, return_messages=True, k=3) 78 | 79 | # This retriever uses the Kendra Query API and is implemented locally in kendra/kendra_index_retriever.py 80 | # retriever = KendraIndexRetriever(kendraindex=KENDRA_INDEX_ID, 81 | # awsregion=REGION, 82 | # return_source_documents=True) 83 | 84 | # This retriever uses the newer Kendra Retrieve API; see https://aws.amazon.com/blogs/machine-learning/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/ 85 | retriever = AmazonKendraRetriever( 86 | index_id=KENDRA_INDEX_ID, 87 | region_name=REGION, 88 | ) 89 | 90 | # retriever.get_relevant_documents(query) 91 | 92 | qa = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory, condense_question_prompt=CONDENSE_QUESTION_PROMPT, verbose=True) 93 | 94 | response = qa.run(query) 95 | clean_response = response.replace('\n', ' ').strip() # replace newlines with spaces so words do not run together 96 | 97 | return { 98 | 'statusCode': 200, 99 | 'body': json.dumps(clean_response) 100 | } 101 | -------------------------------------------------------------------------------- /lab4/rag_app/requirements.txt: -------------------------------------------------------------------------------- 1 | boto3>=1.26.163 2 | langchain>=0.0.219
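Editor's note: the Lambda handler above expects a JSON body carrying a "query" string and a "uuid" that identifies the conversation, and the template that follows exposes it as POST /ragapp on the prod stage. The sketch below is a minimal smoke test under those assumptions only; the API ID (abc123def4), region, and questions are hypothetical placeholders to be replaced with values from your own SAM deploy output.

# smoke_test.py - hypothetical client for the /ragapp endpoint (API ID and region are placeholders)
import uuid
import requests

API_URL = "https://abc123def4.execute-api.us-east-1.amazonaws.com/prod/ragapp"

session_id = str(uuid.uuid4())  # one UUID per conversation; it becomes the DynamoDB SessionId

def ask(question: str) -> str:
    # lambda_handler reads body['query'] and body['uuid'] from the POST body
    resp = requests.post(API_URL, json={"query": question, "uuid": session_id}, timeout=60)
    resp.raise_for_status()
    return resp.json()  # the handler returns json.dumps(<answer>) as the response body

if __name__ == "__main__":
    print(ask("What is Amazon Kendra?"))
    print(ask("Which labs use it?"))  # same session_id, so the second call exercises chat memory

Sending both questions with the same uuid exercises the ConversationBufferWindowMemory path: the second call should be condensed into a standalone question using the stored chat history.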
-------------------------------------------------------------------------------- /lab4/template.yml: -------------------------------------------------------------------------------- 1 | --- 2 | Transform: AWS::Serverless-2016-10-31 3 | Resources: 4 | 5 | RagAppApi: 6 | Type: AWS::Serverless::Api 7 | Properties: 8 | StageName: prod 9 | RagAppFunction: 10 | Type: AWS::Serverless::Function 11 | Properties: 12 | CodeUri: ./rag_app 13 | Handler: rag_app.lambda_handler 14 | Runtime: python3.10 15 | PackageType: Zip 16 | Timeout: 60 17 | Environment: 18 | Variables: 19 | REGION: !Ref "AWS::Region" 20 | KENDRA_INDEX_ID: "***KENDRA_INDEX_ID***" 21 | SM_ENDPOINT_NAME: "***SAGEMAKER_ENDPOINT_NAME***" 22 | Role: !GetAtt LambdaExecutionRole.Arn 23 | Events: 24 | ApiEvent: 25 | Type: Api 26 | Properties: 27 | Path: /ragapp 28 | Method: POST 29 | RestApiId: 30 | Ref: RagAppApi 31 | LambdaExecutionRole: 32 | Type: AWS::IAM::Role 33 | Properties: 34 | AssumeRolePolicyDocument: 35 | Version: '2012-10-17' 36 | Statement: 37 | - Effect: Allow 38 | Principal: 39 | Service: lambda.amazonaws.com 40 | Action: 41 | - sts:AssumeRole 42 | Policies: 43 | - PolicyName: SageMakerAccess 44 | PolicyDocument: 45 | Version: '2012-10-17' 46 | Statement: 47 | - Effect: Allow 48 | Action: 49 | - sagemaker:InvokeEndpoint 50 | Resource: '*' 51 | - PolicyName: LambdaLogsAccess 52 | PolicyDocument: 53 | Version: '2012-10-17' 54 | Statement: 55 | - Effect: Allow 56 | Action: 57 | - logs:CreateLogGroup 58 | - logs:CreateLogStream 59 | - logs:PutLogEvents 60 | Resource: arn:aws:logs:*:*:* 61 | - PolicyName: DynamoDbAccess 62 | PolicyDocument: 63 | Version: '2012-10-17' 64 | Statement: 65 | - Effect: Allow 66 | Action: 67 | - dynamodb:Scan 68 | - dynamodb:Query 69 | - dynamodb:GetItem 70 | - dynamodb:PutItem 71 | - dynamodb:UpdateItem 72 | - dynamodb:DeleteItem 73 | Resource: "*" 74 | - PolicyName: KendraSearchPolicy 75 | PolicyDocument: 76 | Version: '2012-10-17' 77 | Statement: 78 | - Effect: Allow 79 | Action: 80 | - kendra:Query 81 | - kendra:BatchGetDocumentStatus 82 | - kendra:Retrieve 83 | Resource: "*" 84 | 85 | # VPC (RFC 1918 private address space; the original 12.0.0.0/16 is publicly routable) 86 | VPC: 87 | Type: AWS::EC2::VPC 88 | Properties: 89 | CidrBlock: 10.0.0.0/16 90 | EnableDnsSupport: true 91 | EnableDnsHostnames: true 92 | 93 | Subnet1: 94 | Type: AWS::EC2::Subnet 95 | Properties: 96 | VpcId: !Ref VPC 97 | CidrBlock: 10.0.0.0/24 98 | AvailabilityZone: !Join 99 | - '' 100 | - - !Ref "AWS::Region" 101 | - 'a' 102 | Subnet2: 103 | Type: AWS::EC2::Subnet 104 | Properties: 105 | VpcId: !Ref VPC 106 | CidrBlock: 10.0.1.0/24 107 | AvailabilityZone: !Join 108 | - '' 109 | - - !Ref "AWS::Region" 110 | - 'b' 111 | 112 | 113 | InternetGateway: 114 | Type: AWS::EC2::InternetGateway 115 | 116 | ElasticIp: 117 | Type: AWS::EC2::EIP 118 | Properties: 119 | Domain: vpc 120 | 121 | VPCGatewayAttachment: 122 | Type: AWS::EC2::VPCGatewayAttachment 123 | Properties: 124 | VpcId: !Ref VPC 125 | InternetGatewayId: !Ref InternetGateway 126 | 127 | PublicRouteTable: 128 | Type: AWS::EC2::RouteTable 129 | Properties: 130 | VpcId: !Ref VPC 131 | 132 | PublicRoute: 133 | Type: AWS::EC2::Route 134 | DependsOn: VPCGatewayAttachment 135 | Properties: 136 | RouteTableId: !Ref PublicRouteTable 137 | DestinationCidrBlock: 0.0.0.0/0 138 | GatewayId: !Ref InternetGateway 139 | 140 | PublicSubnetRouteTableAssociationSn1: 141 | Type: AWS::EC2::SubnetRouteTableAssociation 142 | Properties: 143 | SubnetId: !Ref Subnet1 144 | RouteTableId: !Ref PublicRouteTable 145 | 146 | PublicSubnetRouteTableAssociationSn2: 147 | Type: AWS::EC2::SubnetRouteTableAssociation
148 | Properties: 149 | SubnetId: !Ref Subnet2 150 | RouteTableId: !Ref PublicRouteTable 151 | 152 | # ECS 153 | ECSService: 154 | Type: AWS::ECS::Service 155 | DependsOn: 156 | - LoadBalancerListener 157 | - SecurityGroupEcs 158 | Properties: 159 | Cluster: !Ref ECSCluster 160 | DesiredCount: 1 161 | LaunchType: FARGATE 162 | TaskDefinition: !Ref ECSTaskDefinition 163 | LoadBalancers: 164 | - TargetGroupArn: !Ref TargetGroup 165 | ContainerPort: 80 166 | ContainerName: rag-app 167 | NetworkConfiguration: 168 | AwsvpcConfiguration: 169 | Subnets: 170 | - !Ref Subnet1 171 | - !Ref Subnet2 172 | AssignPublicIp: ENABLED 173 | SecurityGroups: 174 | - !Ref SecurityGroupEcs 175 | 176 | # ECS task definition 177 | ECSTaskDefinition: 178 | Type: AWS::ECS::TaskDefinition 179 | Properties: 180 | RequiresCompatibilities: 181 | - FARGATE 182 | NetworkMode: awsvpc 183 | Cpu: 1024 184 | Memory: 2048 185 | ExecutionRoleArn: !GetAtt EcsTaskExecutionRole.Arn 186 | TaskRoleArn: !GetAtt EcsTaskExecutionRole.Arn 187 | ContainerDefinitions: 188 | - Name: rag-app 189 | Image: !Join 190 | - '' 191 | - - !Ref "AWS::AccountId" 192 | - ".dkr.ecr." 193 | - !Ref "AWS::Region" 194 | - ".amazonaws.com/rag-app:latest" 195 | LogConfiguration: 196 | LogDriver: awslogs 197 | Options: 198 | awslogs-region: !Ref 'AWS::Region' 199 | awslogs-group: !Ref MyLogGroup 200 | awslogs-stream-prefix: my-container 201 | Environment: 202 | - Name: BASE_URL 203 | Value: !Sub "https://${RagAppApi}.execute-api.${AWS::Region}.amazonaws.com/prod" # must match the RagAppApi StageName 204 | PortMappings: 205 | - ContainerPort: 80 206 | HostPort: 80 207 | Protocol: 'tcp' 208 | 209 | ECSCluster: 210 | Type: AWS::ECS::Cluster 211 | 212 | SecurityGroupEcs: 213 | Type: AWS::EC2::SecurityGroup 214 | Properties: 215 | GroupDescription: Allow HTTP access from the load balancer 216 | VpcId: !Ref VPC 217 | 218 | SecurityGroupIngress: 219 | - IpProtocol: tcp 220 | FromPort: 80 221 | ToPort: 80 222 | SourceSecurityGroupId: !Ref LoadBalancerSecurityGroup 223 | 224 | 225 | MyLogGroup: 226 | Type: AWS::Logs::LogGroup 227 | Properties: 228 | LogGroupName: rag-app-log-group 229 | RetentionInDays: 7 230 | 231 | EcsTaskExecutionRole: 232 | Type: AWS::IAM::Role 233 | Properties: 234 | AssumeRolePolicyDocument: 235 | Version: 2012-10-17 236 | Statement: 237 | - Effect: Allow 238 | Principal: 239 | Service: 240 | - ecs-tasks.amazonaws.com 241 | Action: 242 | - sts:AssumeRole 243 | Policies: 244 | - PolicyName: ecs-task-policy 245 | PolicyDocument: 246 | Version: 2012-10-17 247 | Statement: 248 | - Effect: Allow 249 | Action: 250 | - ecr:BatchCheckLayerAvailability 251 | - ecr:GetDownloadUrlForLayer 252 | - ecr:BatchGetImage 253 | - ecr:GetAuthorizationToken 254 | Resource: "*" 255 | - Effect: Allow 256 | Action: 257 | - logs:CreateLogGroup 258 | - logs:CreateLogStream 259 | - logs:PutLogEvents 260 | Resource: "arn:aws:logs:*:*:*" 261 | - Effect: Allow 262 | Action: 263 | - lambda:InvokeFunction 264 | Resource: 265 | - "*" 266 | 267 | # Load Balancing 268 | LoadBalancerSecurityGroup: 269 | Type: AWS::EC2::SecurityGroup 270 | Properties: 271 | GroupName: LoadBalancerSecurityGroup 272 | GroupDescription: Security group for load balancer 273 | VpcId: !Ref VPC 274 | SecurityGroupIngress: 275 | - IpProtocol: tcp 276 | FromPort: 80 277 | ToPort: 80 278 | CidrIp: 0.0.0.0/0 279 | LoadBalancer: 280 | Type: AWS::ElasticLoadBalancingV2::LoadBalancer 281 | Properties: 282 | Name: rag-load-balancer 283 | Subnets: 284 | - !Ref Subnet1 285 | - !Ref Subnet2 286 | SecurityGroups: 287 | - !GetAtt LoadBalancerSecurityGroup.GroupId
288 | LoadBalancerListener: 289 | Type: AWS::ElasticLoadBalancingV2::Listener 290 | Properties: 291 | LoadBalancerArn: !Ref LoadBalancer 292 | Port: 80 293 | Protocol: HTTP 294 | DefaultActions: 295 | - Type: forward 296 | TargetGroupArn: !Ref TargetGroup 297 | TargetGroup: 298 | Type: AWS::ElasticLoadBalancingV2::TargetGroup 299 | Properties: 300 | TargetType: ip 301 | Name: rag-lb-target-group 302 | Port: 80 303 | Protocol: HTTP 304 | VpcId: !Ref VPC 305 | 306 | MemoryTable: 307 | Type: AWS::DynamoDB::Table 308 | Properties: 309 | TableName: MemoryTable 310 | AttributeDefinitions: 311 | - AttributeName: SessionId 312 | AttributeType: S 313 | KeySchema: 314 | - AttributeName: SessionId 315 | KeyType: HASH 316 | BillingMode: PAY_PER_REQUEST 317 | 318 | 319 | 320 | --------------------------------------------------------------------------------
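Editor's note: as a final check that the memory wiring works end to end, you can inspect what LangChain's DynamoDBChatMessageHistory persisted for a session in the MemoryTable defined above. The sketch below is a minimal, read-only peek assuming the stack is deployed; the region and session UUID are placeholders, and the exact item layout inside the record is whatever the LangChain version pinned in requirements.txt writes — only the table name and SessionId hash key are taken from template.yml.

# inspect_memory.py - read-only peek at a stored conversation (region and UUID are placeholders)
import json
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("MemoryTable")  # table name and SessionId hash key match template.yml

def dump_session(session_id: str) -> None:
    # Items are keyed on SessionId; LangChain stores the running transcript inside the item
    item = table.get_item(Key={"SessionId": session_id}).get("Item")
    if item is None:
        print(f"No history stored yet for session {session_id}")
    else:
        print(json.dumps(item, indent=2, default=str))

if __name__ == "__main__":
    dump_session("00000000-0000-0000-0000-000000000000")  # replace with the uuid your client sent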