├── docs ├── cntxt.png ├── pdr.png ├── refine.png └── Architecture.drawio.png ├── CODE_OF_CONDUCT.md ├── LICENSE ├── CONTRIBUTING.md ├── .gitignore ├── README.md └── notebooks └── sagemaker-advanced-rag-langchain.ipynb /docs/cntxt.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/advanced-rag-patterns-on-mixtral/main/docs/cntxt.png -------------------------------------------------------------------------------- /docs/pdr.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/advanced-rag-patterns-on-mixtral/main/docs/pdr.png -------------------------------------------------------------------------------- /docs/refine.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/advanced-rag-patterns-on-mixtral/main/docs/refine.png -------------------------------------------------------------------------------- /docs/Architecture.drawio.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/advanced-rag-patterns-on-mixtral/main/docs/Architecture.drawio.png -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT No Attribution 2 | 3 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of 6 | this software and associated documentation files (the "Software"), to deal in 7 | the Software without restriction, including without limitation the rights to 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 9 | the Software, and to permit persons to whom the Software is furnished to do so. 10 | 11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 12 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 13 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 14 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 15 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 16 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 
5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 
60 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | ### JupyterNotebooks ### 2 | # gitignore template for Jupyter Notebooks 3 | # website: http://jupyter.org/ 4 | 5 | .ipynb_checkpoints 6 | */.ipynb_checkpoints/* 7 | 8 | # IPython 9 | profile_default/ 10 | ipython_config.py 11 | 12 | # Remove previous ipynb_checkpoints 13 | # git rm -r .ipynb_checkpoints/ 14 | 15 | ### macOS ### 16 | # General 17 | .DS_Store 18 | .AppleDouble 19 | .LSOverride 20 | 21 | # Icon must end with two \r 22 | Icon 23 | 24 | 25 | # Thumbnails 26 | ._* 27 | 28 | # Files that might appear in the root of a volume 29 | .DocumentRevisions-V100 30 | .fseventsd 31 | .Spotlight-V100 32 | .TemporaryItems 33 | .Trashes 34 | .VolumeIcon.icns 35 | .com.apple.timemachine.donotpresent 36 | 37 | # Directories potentially created on remote AFP share 38 | .AppleDB 39 | .AppleDesktop 40 | Network Trash Folder 41 | Temporary Items 42 | .apdisk 43 | 44 | ### macOS Patch ### 45 | # iCloud generated files 46 | *.icloud 47 | 48 | ### Python ### 49 | # Byte-compiled / optimized / DLL files 50 | __pycache__/ 51 | *.py[cod] 52 | *$py.class 53 | 54 | # C extensions 55 | *.so 56 | 57 | # Distribution / packaging 58 | .Python 59 | build/ 60 | develop-eggs/ 61 | dist/ 62 | downloads/ 63 | eggs/ 64 | .eggs/ 65 | lib/ 66 | lib64/ 67 | parts/ 68 | sdist/ 69 | var/ 70 | wheels/ 71 | share/python-wheels/ 72 | *.egg-info/ 73 | .installed.cfg 74 | *.egg 75 | MANIFEST 76 | 77 | # PyInstaller 78 | # Usually these files are written by a python script from a template 79 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 80 | *.manifest 81 | *.spec 82 | 83 | # Installer logs 84 | pip-log.txt 85 | pip-delete-this-directory.txt 86 | 87 | # Unit test / coverage reports 88 | htmlcov/ 89 | .tox/ 90 | .nox/ 91 | .coverage 92 | .coverage.* 93 | .cache 94 | nosetests.xml 95 | coverage.xml 96 | *.cover 97 | *.py,cover 98 | .hypothesis/ 99 | .pytest_cache/ 100 | cover/ 101 | 102 | # Translations 103 | *.mo 104 | *.pot 105 | 106 | # Django stuff: 107 | *.log 108 | local_settings.py 109 | db.sqlite3 110 | db.sqlite3-journal 111 | 112 | # Flask stuff: 113 | instance/ 114 | .webassets-cache 115 | 116 | # Scrapy stuff: 117 | .scrapy 118 | 119 | # Sphinx documentation 120 | docs/_build/ 121 | 122 | # PyBuilder 123 | .pybuilder/ 124 | target/ 125 | 126 | # Jupyter Notebook 127 | 128 | # IPython 129 | 130 | # pyenv 131 | # For a library or package, you might want to ignore these files since the code is 132 | # intended to run in multiple environments; otherwise, check them in: 133 | # .python-version 134 | 135 | # pipenv 136 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 137 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 138 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 139 | # install all needed dependencies. 140 | #Pipfile.lock 141 | 142 | # poetry 143 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 144 | # This is especially recommended for binary packages to ensure reproducibility, and is more 145 | # commonly ignored for libraries. 
146 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 147 | #poetry.lock 148 | 149 | # pdm 150 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 151 | #pdm.lock 152 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 153 | # in version control. 154 | # https://pdm.fming.dev/#use-with-ide 155 | .pdm.toml 156 | 157 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 158 | __pypackages__/ 159 | 160 | # Celery stuff 161 | celerybeat-schedule 162 | celerybeat.pid 163 | 164 | # SageMath parsed files 165 | *.sage.py 166 | 167 | # Environments 168 | .env 169 | .venv 170 | env/ 171 | venv/ 172 | ENV/ 173 | env.bak/ 174 | venv.bak/ 175 | 176 | # Spyder project settings 177 | .spyderproject 178 | .spyproject 179 | 180 | # Rope project settings 181 | .ropeproject 182 | 183 | # mkdocs documentation 184 | /site 185 | 186 | # mypy 187 | .mypy_cache/ 188 | .dmypy.json 189 | dmypy.json 190 | 191 | # Pyre type checker 192 | .pyre/ 193 | 194 | # pytype static type analyzer 195 | .pytype/ 196 | 197 | # Cython debug symbols 198 | cython_debug/ 199 | 200 | # PyCharm 201 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 202 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 203 | # and can be added to the global gitignore or merged into this file. For a more nuclear 204 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 205 | #.idea/ 206 | 207 | ### Python Patch ### 208 | # Poetry local configuration file - https://python-poetry.org/docs/configuration/#local-configuration 209 | poetry.toml 210 | 211 | # ruff 212 | .ruff_cache/ 213 | 214 | # LSP config files 215 | pyrightconfig.json 216 | 217 | ### VisualStudioCode ### 218 | .vscode/* 219 | !.vscode/settings.json 220 | !.vscode/tasks.json 221 | !.vscode/launch.json 222 | !.vscode/extensions.json 223 | !.vscode/*.code-snippets 224 | 225 | # Local History for Visual Studio Code 226 | .history/ 227 | 228 | # Built Visual Studio Code Extensions 229 | *.vsix 230 | 231 | ### VisualStudioCode Patch ### 232 | # Ignore all local history of files 233 | .history 234 | .ionide 235 | 236 | ### Windows ### 237 | # Windows thumbnail cache files 238 | Thumbs.db 239 | Thumbs.db:encryptable 240 | ehthumbs.db 241 | ehthumbs_vista.db 242 | 243 | # Dump file 244 | *.stackdump 245 | 246 | # Folder config file 247 | [Dd]esktop.ini 248 | 249 | # Recycle Bin used on file shares 250 | $RECYCLE.BIN/ 251 | 252 | # Windows Installer files 253 | *.cab 254 | *.msi 255 | *.msix 256 | *.msm 257 | *.msp 258 | 259 | # Windows shortcuts 260 | *.lnk 261 | 262 | # End of https://www.toptal.com/developers/gitignore/api/visualstudiocode,macos,windows,python,jupyternotebooks -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Advanced RAG Patterns with Mixtral on SageMaker Jumpstart 2 | 3 | Example Jupyter Notebook of [Mixtral 8x7B Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) and [BGE Large En](https://huggingface.co/BAAI/bge-large-en) to experiment Advanced Retrieval Augmented Generation (RAG) QnA systems on a SageMaker Notebook using [LangChain](https://www.langchain.com/). 
4 | 5 | See the accompanying blog post on the [AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/advanced-rag-patterns-on-amazon-sagemaker/) for a detailed description. 6 | 7 | ## :books: Background 8 | 9 | [Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a fully managed service for data science and machine learning (ML) workflows. 10 | You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. 11 | 12 | The [SageMaker example notebooks](https://sagemaker-examples.readthedocs.io/en/latest/) are Jupyter notebooks that demonstrate the usage of Amazon SageMaker. 13 | 14 | The [SageMaker Example Community repository](https://github.com/aws/amazon-sagemaker-examples-community) contains additional notebooks, beyond those critical for showcasing key SageMaker functionality, that can be shared and explored by the community. 15 | 16 | ## Prerequisites 17 | 18 | ### SageMaker Prerequisites 19 | 20 | In this section, you sign up for an AWS account and create an AWS Identity and Access Management (IAM) admin user. 21 | 22 | If you're new to SageMaker, we recommend that you read [What is Amazon SageMaker?](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html). 23 | 24 | Follow the hyperlinks below to finish setting up the prerequisites for SageMaker: 25 | 26 | - [Create an AWS Account](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-set-up.html#gs-account) - This walks you through setting up an AWS account. 27 | - When you create an AWS account, you get a single sign-in identity that has complete access to all of the AWS services and resources in the account. This identity is called the AWS account root user. 28 | - Signing in to the AWS console using the email address and password that you used to create the account gives you complete access to all of the AWS resources in your account. We strongly recommend that you not use the root user for everyday tasks, even the administrative ones. 29 | - Instead, adhere to the Security best practices in [IAM](https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam.html), and [Create an Administrative User and Group](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-set-up.html#gs-account-user). Then securely lock away the root user credentials and use them to perform only a few account and service management tasks. 30 | 31 | 32 | Once you are done with the above steps, you can move on to the step below. 33 | 34 | - The Mixtral 8x7B model requires an ml.g5.48xlarge instance. Amazon SageMaker JumpStart provides a simplified way to access and deploy over 100 different open source and third-party foundation models. In order to [launch an endpoint to host Mixtral 8x7B from SageMaker JumpStart](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-deploy.html), you may need to request a service quota increase to access an ml.g5.48xlarge instance for endpoint usage. You can easily [request service quota increases](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html) through the AWS console, CLI, or API to allow access to those additional resources, as sketched below.
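If you prefer to script that check, the following is a minimal sketch (not part of the original notebook) of how the quota lookup and increase request could be done with `boto3` and the Service Quotas API. The quota name string used for filtering is an assumption about how SageMaker labels its endpoint-usage quotas — confirm the exact quota name and code in the Service Quotas console before relying on it.

```python
import boto3

# Minimal sketch: look up the SageMaker endpoint quota for ml.g5.48xlarge and,
# if it is zero, request an increase. Assumes AWS credentials and a default
# region are already configured (e.g. via `aws configure`).
sq = boto3.client("service-quotas")

target = None
paginator = sq.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        # Assumed quota name -- verify it in the Service Quotas console.
        if quota["QuotaName"] == "ml.g5.48xlarge for endpoint usage":
            target = quota
    if target:
        break

if target is None:
    print("Quota not found; check the quota name in the Service Quotas console.")
elif target["Value"] >= 1:
    print(f"{target['QuotaName']} is already {target['Value']}; no increase needed.")
else:
    # Quota increase requests are reviewed by AWS and are not granted instantly.
    response = sq.request_service_quota_increase(
        ServiceCode="sagemaker",
        QuotaCode=target["QuotaCode"],
        DesiredValue=1,
    )
    print("Increase requested, status:", response["RequestedQuota"]["Status"])
```

Requesting the increase through the console link above achieves the same result; the script is only a convenience if you are setting up several accounts or regions.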
35 | 36 | ## Architecture 37 | 38 | ![](docs/Architecture.drawio.png) 39 | 40 | ## Solution 41 | 42 | In this repo we deploy a notebook that explains and demonstrates how to use Mixtral 8x7B Instruct text generation combined with the [BGE Large En](https://huggingface.co/BAAI/bge-large-en) embedding model to efficiently construct a Retrieval Augmented Generation (RAG) QnA system on a [SageMaker](https://aws.amazon.com/sagemaker/) Notebook using the Parent Document Retriever tool and the Contextual Compression technique. This solution can be deployed with just a few clicks using [SageMaker JumpStart](https://aws.amazon.com/sagemaker/jumpstart/), a fully managed platform that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that can be deployed quickly and easily, accelerating the development and deployment of machine learning applications. One of the key components of SageMaker JumpStart is the Model Hub, which offers a vast catalog of pre-trained models such as Mixtral 8x7B for a variety of tasks. 43 | 44 | The Jupyter notebook for this solution is [notebooks/sagemaker-advanced-rag-langchain.ipynb](https://github.com/aws-samples/advanced-rag-patterns-on-mixtral/blob/main/notebooks/sagemaker-advanced-rag-langchain.ipynb). 45 | 46 | 47 | ## Parent Document Retriever Workflow 48 | 49 | ![](docs/pdr.png) 50 | 51 | ## Contextual Compression Workflow 52 | 53 | ![](docs/cntxt.png) 54 | 55 | ## :hammer_and_wrench: Setup 56 | 57 | The quickest setup to run the example notebook includes: 58 | - An [AWS account](http://docs.aws.amazon.com/sagemaker/latest/dg/gs-account.html) 59 | - Proper [IAM User and Role](http://docs.aws.amazon.com/sagemaker/latest/dg/authentication-and-access-control.html) setup 60 | - An [Amazon SageMaker Notebook Instance](http://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html) 61 | 62 | ## Comparison of Results 63 | 64 | The following table compares the results of different queries across the retrieval techniques. 65 | 66 | | **Technique** | Regular Retriever Chain | Parent Document Retriever | LLM Chain Extractor: Contextual Compression | LLM Chain Filter: Contextual Compression | 67 | | :-: | :- | :- | :- | :- | 68 | |

**Query 1**

How did AWS evolve?

|AWS (Amazon Web Services) evolved from an initially unprofitable investment to an $85B annual revenue run rate business with strong profitability, offering a wide range of services and features, and becoming a significant part of Amazon's portfolio. Despite facing skepticism and short-term headwinds, AWS continued to innovate, attract new customers, and migrate active customers, offering benefits such as agility, innovation, cost-efficiency, and security. AWS also expanded its long-term investments, including chip development, to provide new capabilities and change what's possible for its customers.|AWS (Amazon Web Services) started with a feature-poor initial launch of the Elastic Compute Cloud (EC2) service in 2006, providing only one instance size, in one data center, in one region of the world, with Linux operating system instances only, and without many key features like monitoring, load balancing, auto-scaling, or persistent storage. However, AWS's success allowed them to quickly iterate and add the missing capabilities, eventually expanding to offer various flavors, sizes, and optimizations of compute, storage, and networking, as well as developing their own chips (Graviton) to push price and performance further. AWS's iterative innovation process required significant investments in financial and people resources over 20 years, often well in advance of when it would pay out, to meet customer needs and improve long-term customer experiences, loyalty, and returns for shareholders.|AWS evolved by starting as a small project inside Amazon, requiring significant capital investment and facing skepticism from both inside and outside the company. However, AWS had a head start on potential competitors and believed in the value it could bring to customers and Amazon. AWS made a long-term commitment to continue investing, resulting in over 3,300 new features and services launched in 2022. AWS has transformed how customers manage their technology infrastructure and has become an $85B annual revenue run rate business with strong profitability. AWS has also continuously improved its offerings, such as enhancing EC2 with additional features and services after its initial launch.|AWS (Amazon Web Services) evolved by initially launching feature-poor but iterating quickly based on customer feedback to add necessary capabilities. This approach allowed AWS to launch EC2 in 2006 with limited features and then continuously add new functionalities, such as additional instance sizes, data centers, regions, operating system options, monitoring tools, load balancing, auto-scaling, and persistent storage. Over time, AWS transformed from a feature-poor service to a multi-billion-dollar business by focusing on customer needs, agility, innovation, cost-efficiency, and security. AWS now has an $85B annual revenue run rate and offers over 3,300 new features and services each year, catering to a wide range of customers from start-ups to multinational companies and public sector organizations.| 69 | |

**Query 2**

Why is Amazon successful?

|Amazon is successful due to its continuous innovation and expansion into new areas such as technology infrastructure services, digital reading devices, voice-driven personal assistants, and new business models like the third-party marketplace. Its ability to scale operations quickly, as seen in the rapid expansion of its fulfillment and transportation networks, also contributes to its success. Additionally, Amazon's focus on optimization and efficiency gains in its processes has resulted in productivity improvements and cost reductions. The example of Amazon Business highlights the company's capability to leverage its e-commerce and logistics strengths in different sectors.|Amazon is successful due to its ability to constantly innovate, adapt to changing market conditions, and meet customer needs in various market segments. This is evident in the success of Amazon Business, which has grown to drive roughly $35B in annualized gross sales by delivering selection, value, and convenience to business customers. Amazon's investments in ecommerce and logistics capabilities have also enabled the creation of services like Buy with Prime, which helps merchants with direct-to-consumer websites drive conversion from views to purchases.|Based on the provided context, Amazon's success can be attributed to its strategic expansion from a book-selling platform to a global marketplace with a vibrant third-party seller ecosystem, early investment in AWS, innovation in introducing the Kindle and Alexa, and substantial growth in annual revenue from 2019 to 2022. This growth led to the expansion of the fulfillment center footprint, creation of a last-mile transportation network, and building a new sortation center network, which were optimized for productivity and cost reductions.|Amazon is successful due to its innovative business models, continuous technological advancements, and strategic organizational changes. The company has consistently disrupted traditional industries by introducing new ideas, such as an ecommerce platform for various products and services, a third-party marketplace, cloud infrastructure services (AWS), the Kindle e-reader, and the Alexa voice-driven personal assistant. Additionally, Amazon has made structural changes to improve its efficiency, such as reorganizing its US fulfillment network to decrease costs and delivery times, further contributing to its success.| 70 | |

**Comparison**

|

- Based on the responses from the Regular Retriever Chain, we notice that although it provides long-winded answers, it suffers from context overflow and fails to mention any significant details from the corpus when responding to the query.

- The regular retriever chain is not able to capture nuances with depth or contextual insight, potentially missing critical aspects of the document.

|

- The Parent Document Retriever delves deeper into the specifics of AWS's growth strategy, including the iterative process of adding new features based on customer feedback and the detailed journey from a feature-poor initial launch to a dominant market position, all while providing a context-rich response.

- Responses cover a wide range of aspects, from technical innovations and market strategy to organizational efficiency and customer focus, providing a holistic view of the factors contributing to success along with examples. This can be attributed to PDR's targeted yet broad-ranging search capabilities.

|

- The LLM Chain Extractor maintains a balance between covering key points comprehensively and avoiding unnecessary depth.

- Dynamically adjusts to the query's context, ensuring the output is directly relevant and comprehensive.

|Similar to the LLM Chain Extractor, the LLM Chain Filter ensures that while the key points are covered , the output is efficient for customers looking for concise and contextual answers.| 71 | 72 | 73 | Upon comparing these different techniques, we can see that in contexts like detailing AWS’s transition from a simple service to a complex, multi-billion-dollar entity, or explaining Amazon's strategic successes, the regular retriever chain lacks the precision the more sophisticated techniques offer, leading to less targeted information. Although quite a few differences are visible between the advanced techniques discussed, they are by far more informative than regular retriever chains. 74 | 75 | For customers in industries such as healthcare, telecommunications, and financial services who are looking to implement RAG in their applications, the limitations of the regular retriever chain in providing precision, avoiding redundancy, and effectively compressing information make it less suited to fulfilling these needs compared to the more advanced parent document retriever and contextual compression techniques. These techniques are able to distill vast amounts of information into the concentrated, impactful insights that you need, while helping improve price-performance. 76 | 77 | ## Contributing 78 | 79 | For any specific questions on how to contribute, please contact [armdiazg@amazon.com](https://www.linkedin.com/in/armando-diaz-47a498113/), [vijeasns@amazon.com](https://www.linkedin.com/in/niithiyn-vijeaswaran-451245213/), [bustils@amazon.com](https://www.linkedin.com/in/sebastian-bustillo/), [puniomp@amazon.com](https://www.linkedin.com/in/marcpunio/), [farsabir@amazon.com](https://www.linkedin.com/in/farooqsabir/), and [aboujid@amazon.com](https://www.linkedin.com/in/aj-dhimine-34a8b01ba/). 80 | 81 | We welcome community contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. 82 | 83 | ## Security 84 | 85 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 86 | 87 | ## License 88 | 89 | This library is licensed under the MIT-0 License. See the [LICENSE](LICENSE) file. 90 | -------------------------------------------------------------------------------- /notebooks/sagemaker-advanced-rag-langchain.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "e3eed142-6144-4ba2-a693-2bcfdeeae823", 6 | "metadata": {}, 7 | "source": [ 8 | "# Advanced Retrieval Augmented Question Answering with Mixtral 8x7B Instruct on SageMaker JumpStart using LangChain\n", 9 | "\n", 10 | "Advanced RAG Patterns with Mixtral 8x7B Instruct on SageMaker Jumpstart" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "id": "4c7a66db-4d22-4e0f-883c-8e8daf6a4291", 16 | "metadata": {}, 17 | "source": [ 18 | "In this notebook, we demonstrate the use of [Mixtral 8x7B Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) text generation combined with [BGE Large En](https://huggingface.co/BAAI/bge-large-en) embedding model to efficiently construct a Retrieval Augmented Generation (RAG) QnA system on a SageMaker Notebook. This notebook, powered by an `ml.t3.medium instance`, enables the deployment of LLMs on [SageMaker JumpStart](https://aws.amazon.com/sagemaker/jumpstart/). 
These can be called with an API endpoint created by SageMaker, which we then use to build, experiment with, and tune for comparing Advanced RAG application techniques using [LangChain](https://www.langchain.com/). Additionally, we showcase how the [FAISS](https://github.com/facebookresearch/faiss) Embedding store can be utilized to archive and retrieve embeddings, integrating it into your RAG workflow. " 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "id": "267f4f95-dfab-4c64-956e-6c29274131d0", 24 | "metadata": {}, 25 | "source": [ 26 | "## Prerequisites" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "id": "7d597937-7c3e-42cc-b595-309bd8e5ae28", 32 | "metadata": {}, 33 | "source": [ 34 | "---\n", 35 | "This Jupyter Notebook can be run on a t3.medium instance (ml.t3.medium). However, to deploy `Mixtral 8X7B Instruct` and `BGE Large En` models, you may need to request a quota increase. \n", 36 | "\n", 37 | "To request a quota increase, follow these steps:\n", 38 | "\n", 39 | "1. Navigate to the [Service Quotas console](https://console.aws.amazon.com/servicequotas/).\n", 40 | "2. Choose Amazon SageMaker.\n", 41 | "3. Review your default quota for the following resources:\n", 42 | " - `ml.g5.48xlarge` for endpoint usage\n", 43 | " - `ml.g5.2xlarge` for endpoint usage\n", 44 | "4. If needed, request a quota increase for these resources." 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "id": "b9ad641d-5a50-4493-b308-563f280f2b2f", 50 | "metadata": {}, 51 | "source": [ 52 | "
\n", 53 | "\n", 54 | "NOTE: To make sure that you have enough quotas to support your usage requirements, it's a best practice to monitor and manage your service quotas. Requests for Amazon EC2 service quota increases are subject to review by AWS engineering teams. Also, service quota increase requests aren't immediately processed when you submit a request. After your request is processed, you receive an email notification.\n", 55 | "
" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "f2277979-97a1-41a6-ac40-0f267326b40a", 61 | "metadata": {}, 62 | "source": [ 63 | "### Changing instance type\n", 64 | "---\n", 65 | "Models are supported on the following instance types:\n", 66 | "\n", 67 | " - Mixtral 8X7B and Mixtral 8X7B Instruct: `ml.g5.48xlarge`, and `ml.p4d.24xlarge`\n", 68 | " - BGE Large En: `ml.g5.2xlarge`, `ml.c6i.xlarge`,`ml.g5.4xlarge`, `ml.g5.8xlarge`, `ml.p3.2xlarge`, and `ml.g4dn.2xlarge`\n", 69 | "\n", 70 | "By default, the JumpStartModel class selects a default instance type available in your region. If you would like to use a different instance type, you can do so by specifying instance type in the JumpStartModel class.\n", 71 | "\n", 72 | "`my_model = JumpStartModel(model_id=model_id, instance_type=\"ml.p4d.24xlarge\")`" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "id": "70287fd5-1c6f-4a05-a7cc-085cb10b4508", 78 | "metadata": {}, 79 | "source": [ 80 | "### Local setup (Optional):\n", 81 | "---" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "id": "f33af4f4-69fd-4a9f-acb3-f06bbe64fb21", 87 | "metadata": {}, 88 | "source": [ 89 | "For a local server, follow these steps to execute this jupyter notebook:\n", 90 | "\n", 91 | "1. **Configure AWS CLI**: Configure [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) with your AWS credentials. Run `aws configure` and enter your AWS Access Key ID, AWS Secret Access Key, AWS Region, and default output format.\n", 92 | "\n", 93 | "2. **Install required libraries**: Install the necessary Python libraries for working with SageMaker, such as [sagemaker](https://github.com/aws/sagemaker-python-sdk/), [boto3](https://github.com/boto/boto3), and others. You can use a Python environment manager like [conda](https://docs.conda.io/en/latest/) or [virtualenv](https://virtualenv.pypa.io/en/latest/) to manage your Python packages in your preferred IDE (e.g. [Visual Studio Code](https://code.visualstudio.com/)).\n", 94 | "\n", 95 | "3. **Create an IAM role for SageMaker**: Create an AWS Identity and Access Management (IAM) role that grants your user [SageMaker permissions](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html). \n", 96 | "\n", 97 | "By following these steps, you can set up a local Jupyter Notebook environment capable of deploying machine learning models on Amazon SageMaker using the appropriate IAM role for granting the necessary permissions." 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "id": "f8e19e16-86b3-4e27-94bf-00e2f832605c", 103 | "metadata": { 104 | "tags": [] 105 | }, 106 | "source": [ 107 | "## Contents\n", 108 | "---\n", 109 | "\n", 110 | "1. [Requirements](#Requirements)\n", 111 | "1. [Model Deployment](#00.-Model-Deployment)\n", 112 | "1. [Setup LangChain](#01.-Setup-LangChain)\n", 113 | "1. [Data Preparation](#Data-Preparation)\n", 114 | "1. [Question Answering](#Question-Answering)\n", 115 | "1. [Regular Retriever Chain](#Regular-Retriever-Chain)\n", 116 | "1. [Parent Document Retriever Chain](#Parent-Document-Retriever-Chain)\n", 117 | "1. [Contextual Compression Chain](#Contextual-Compression-Chain)\n", 118 | "1. [Conclusion](#Conclusion)\n", 119 | "1. 
[Clean Up Resources](#Clean-Up-Resources)" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "id": "6b7bb282-2718-4911-a92b-4ef084441239", 125 | "metadata": {}, 126 | "source": [ 127 | "## Requirements\n", 128 | "---" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "id": "d709d071-695d-4102-a8cd-6fba6c4678a3", 134 | "metadata": {}, 135 | "source": [ 136 | "1. Create an Amazon SageMaker Notebook Instance - [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html)\n", 137 | " - For Notebook Instance type, choose ml.t3.medium.\n", 138 | "2. For Select Kernel, choose [conda_python3](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-prepare.html).\n", 139 | "3. Install the required packages." 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "id": "2f07d152-2889-4004-8eba-a0d9028708db", 145 | "metadata": {}, 146 | "source": [ 147 | "
\n", 148 | "\n", 149 | "NOTE:\n", 150 | "\n", 151 | "- For Amazon SageMaker Studio, select Kernel \"Python 3 (ipykernel)\".\n", 152 | "\n", 153 | "- For Amazon SageMaker Studio Classic, select Image \"Base Python 3.0\" and Kernel \"Python 3\".\n", 154 | "\n", 155 | "
" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "id": "7c22cf2e-971a-4df9-9e27-1cd7a05d8307", 161 | "metadata": {}, 162 | "source": [ 163 | "To run this notebook you would need to install the following dependencies:" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": null, 169 | "id": "60fde7eb-a354-4934-9126-b793080328c1", 170 | "metadata": { 171 | "tags": [] 172 | }, 173 | "outputs": [], 174 | "source": [ 175 | "%%writefile requirements.txt\n", 176 | "langchain==0.1.14\n", 177 | "pypdf==4.1.0\n", 178 | "faiss-cpu==1.8.0\n", 179 | "boto3==1.34.58\n", 180 | "sqlalchemy==2.0.29" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": null, 186 | "id": "c54cfea5-782f-49c1-8885-c8fc8955e09e", 187 | "metadata": { 188 | "tags": [] 189 | }, 190 | "outputs": [], 191 | "source": [ 192 | "!pip install -U -r requirements.txt --quiet" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "id": "ecc2e835", 198 | "metadata": {}, 199 | "source": [ 200 | "
\n", 201 | "\n", 202 | "NOTE:\n", 203 | "\n", 204 | "Before proceeding, please verify that you have the correct version of the SQLAlchemy library installed. This notebook requires SQLAlchemy >= 2.0.0.\n", 205 | "\n", 206 | "To check your installed SQLAlchemy version, you can run the following code:\n", 207 | "\n", 208 | "```python\n", 209 | "import sqlalchemy\n", 210 | "print(sqlalchemy.__version__)\n", 211 | "```\n", 212 | "\n", 213 | "If the version displayed is less than 2.0.0, and you have already installed the correct version using `pip`, you may need to \"restart\" or \"shutdown\" the Jupyter Notebook kernel to load the updated library.\n", 214 | "\n", 215 | "To restart the kernel, go to the \"Kernel\" menu and select \"Restart Kernel\". If that doesn't work, try shutting down the notebook completely and relaunching it.\n", 216 | "\n", 217 | "Restarting or shutting down the kernel will resolve any dependency issues and ensure that the correct SQLAlchemy version is loaded.\n", 218 | "\n", 219 | "If you haven't installed SQLAlchemy >= 2.0.0 yet, you can do so by running the following command in your terminal or command prompt:\n", 220 | "\n", 221 | "```\n", 222 | "pip install sqlalchemy>=2.0.29\n", 223 | "```\n", 224 | "\n", 225 | "Once the installation is complete, restart or shutdown the Jupyter Notebook kernel as described above.\n", 226 | "\n", 227 | "
" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": null, 233 | "id": "20737908", 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "import sqlalchemy\n", 238 | "print(sqlalchemy.__version__)" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": null, 244 | "id": "1f89ca85", 245 | "metadata": {}, 246 | "outputs": [], 247 | "source": [ 248 | "import langchain\n", 249 | "print(langchain.__version__)" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "id": "1c4bf9f6-1127-4444-b684-f0c933c6158a", 256 | "metadata": { 257 | "tags": [] 258 | }, 259 | "outputs": [], 260 | "source": [ 261 | "try:\n", 262 | " import sagemaker\n", 263 | "except ImportError:\n", 264 | " !pip install sagemaker --quiet" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "id": "d8193291-1bf2-478e-afff-d6afd33a358b", 270 | "metadata": { 271 | "tags": [] 272 | }, 273 | "source": [ 274 | "## 00. Model Deployment\n", 275 | "---" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "id": "8705564e-143f-411e-8537-a337255f256f", 281 | "metadata": { 282 | "tags": [] 283 | }, 284 | "source": [ 285 | "Deploy `Mixtral 8X7B Instruct` LLM model on Amazon SageMaker JumpStart:" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": null, 291 | "id": "40c642f3-7aaf-4c37-9071-36a19188e753", 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "# Import the JumpStartModel class from the SageMaker JumpStart library\n", 296 | "from sagemaker.jumpstart.model import JumpStartModel" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "id": "5be0b3ea-8482-4288-8ed2-e03d7bafcb7f", 303 | "metadata": { 304 | "tags": [] 305 | }, 306 | "outputs": [], 307 | "source": [ 308 | "# Specify the model ID for the HuggingFace Mixtral 8x7b Instruct LLM model\n", 309 | "model_id = \"huggingface-llm-mixtral-8x7b-instruct\"\n", 310 | "model = JumpStartModel(model_id=model_id)\n", 311 | "llm_predictor = model.deploy()" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "id": "2e1fdb95-bd4b-445c-b597-755fbd9a432a", 317 | "metadata": { 318 | "tags": [] 319 | }, 320 | "source": [ 321 | "Deploy `BGE Large En` embedding model on Amazon SageMaker JumpStart:" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": null, 327 | "id": "13c22418-d962-46b2-8e5e-0bcb74e6ff64", 328 | "metadata": { 329 | "tags": [] 330 | }, 331 | "outputs": [], 332 | "source": [ 333 | "# Specify the model ID for the HuggingFace BGE Large EN Embedding model\n", 334 | "model_id = \"huggingface-sentencesimilarity-bge-large-en\"\n", 335 | "text_embedding_model = JumpStartModel(model_id=model_id)\n", 336 | "embedding_predictor = text_embedding_model.deploy()" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "id": "cfd8c041-8d3c-4265-bb9b-e4d0dd0bb151", 342 | "metadata": {}, 343 | "source": [ 344 | "## 01. 
Setup LangChain\n", 345 | "---" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": null, 351 | "id": "7f757369-4769-442a-87ba-db9bf75a4dfd", 352 | "metadata": { 353 | "tags": [] 354 | }, 355 | "outputs": [], 356 | "source": [ 357 | "import json\n", 358 | "import sagemaker\n", 359 | "\n", 360 | "from langchain_core.prompts import PromptTemplate\n", 361 | "from langchain_community.llms import SagemakerEndpoint\n", 362 | "from langchain_community.embeddings import SagemakerEndpointEmbeddings\n", 363 | "from langchain_community.llms.sagemaker_endpoint import LLMContentHandler\n", 364 | "from langchain_community.embeddings.sagemaker_endpoint import EmbeddingsContentHandler" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "id": "61f0256b-6fd8-4e1a-8206-ff4ca6efda72", 370 | "metadata": {}, 371 | "source": [ 372 | "Get endpoint names from predictors." 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "id": "d0e28065-2ccc-4174-98b1-c45fdb83cc6d", 379 | "metadata": { 380 | "tags": [] 381 | }, 382 | "outputs": [], 383 | "source": [ 384 | "sess = sagemaker.session.Session() # sagemaker session for interacting with different AWS APIs\n", 385 | "region = sess._region_name\n", 386 | "llm_endpoint_name = llm_predictor.endpoint_name\n", 387 | "embedding_endpoint_name = embedding_predictor.endpoint_name" 388 | ] 389 | }, 390 | { 391 | "cell_type": "markdown", 392 | "id": "c7e593b2-1433-4bef-b217-be724575e4fe", 393 | "metadata": { 394 | "tags": [] 395 | }, 396 | "source": [ 397 | "Transform input and output data to proccess API calls for`Mixtral 8x7B Instruct` on Amazon SageMaker" 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": null, 403 | "id": "3616478c-d454-4196-b64c-c9445e28839f", 404 | "metadata": { 405 | "tags": [] 406 | }, 407 | "outputs": [], 408 | "source": [ 409 | "from typing import Dict\n", 410 | "\n", 411 | "class MixtralContentHandler(LLMContentHandler):\n", 412 | " content_type = \"application/json\"\n", 413 | " accepts = \"application/json\"\n", 414 | "\n", 415 | " def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:\n", 416 | " content_type = \"application/json\"\n", 417 | " accepts = \"application/json\"\n", 418 | " input_str = json.dumps(\n", 419 | " {\n", 420 | " \"inputs\": prompt,\n", 421 | " \"parameters\": {\n", 422 | " \"do_sample\": True,\n", 423 | " \"max_new_tokens\": model_kwargs.get(\"max_new_tokens\", 32768),\n", 424 | " \"top_p\": model_kwargs.get(\"top_p\", 0.9),\n", 425 | " \"temperature\": model_kwargs.get(\"temperature\", 0.6),\n", 426 | " \"return_full_text\": False,\n", 427 | " \"stop\": [\"###\", \"\"],\n", 428 | " },\n", 429 | " }\n", 430 | " )\n", 431 | " return input_str.encode(\"utf-8\")\n", 432 | " \n", 433 | " def transform_output(self, output: bytes) -> str:\n", 434 | " response_json = json.loads(output.read().decode(\"utf-8\"))\n", 435 | " return response_json[0][\"generated_text\"]" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "id": "f5476939-6f13-4ba0-9bc9-b43a92592282", 441 | "metadata": {}, 442 | "source": [ 443 | "Instantiate the LLM with SageMaker and LangChain" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": null, 449 | "id": "f3639892-c20c-4f8f-9206-fc5877876367", 450 | "metadata": { 451 | "tags": [] 452 | }, 453 | "outputs": [], 454 | "source": [ 455 | "mixtral_content_handler = MixtralContentHandler()\n", 456 | "\n", 457 | "llm = SagemakerEndpoint(\n", 458 | " 
endpoint_name=llm_endpoint_name,\n", 459 | " region_name=region, \n", 460 | " model_kwargs={\"max_new_tokens\": 700, \"top_p\": 0.9, \"temperature\": 0.6},\n", 461 | " content_handler=mixtral_content_handler\n", 462 | " )" 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "id": "2f58be6d-c4ff-4948-babc-52c41228c427", 468 | "metadata": {}, 469 | "source": [ 470 | "Transform input and output data to proccess API calls for`BGE Large En` on Amazon SageMaker" 471 | ] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "execution_count": null, 476 | "id": "410c2cf2-dda6-4617-b828-255b4aa4dd57", 477 | "metadata": { 478 | "tags": [] 479 | }, 480 | "outputs": [], 481 | "source": [ 482 | "from typing import List\n", 483 | "\n", 484 | "class BGEContentHandler(EmbeddingsContentHandler):\n", 485 | " content_type = \"application/json\"\n", 486 | " accepts = \"application/json\"\n", 487 | "\n", 488 | " def transform_input(self, text_inputs: List[str], model_kwargs: dict) -> bytes:\n", 489 | " \"\"\"\n", 490 | " Transforms the input into bytes that can be consumed by SageMaker endpoint.\n", 491 | " Args:\n", 492 | " text_inputs (list[str]): A list of input text strings to be processed.\n", 493 | " model_kwargs (Dict): Additional keyword arguments to be passed to the endpoint.\n", 494 | " Possible keys and their descriptions:\n", 495 | " - mode (str): Inference method. Valid modes are 'embedding', 'nn_corpus', and 'nn_train_data'.\n", 496 | " - corpus (str): Corpus for Nearest Neighbor. Required when mode is 'nn_corpus'.\n", 497 | " - top_k (int): Top K for Nearest Neighbor. Required when mode is 'nn_corpus'.\n", 498 | " - queries (list[str]): Queries for Nearest Neighbor. Required when mode is 'nn_corpus' or 'nn_train_data'.\n", 499 | " Returns:\n", 500 | " The transformed bytes input.\n", 501 | " \"\"\"\n", 502 | " input_str = json.dumps(\n", 503 | " {\n", 504 | " \"text_inputs\": text_inputs,\n", 505 | " **model_kwargs\n", 506 | " }\n", 507 | " )\n", 508 | " return input_str.encode(\"utf-8\")\n", 509 | "\n", 510 | " def transform_output(self, output: bytes) -> List[List[float]]:\n", 511 | " \"\"\"\n", 512 | " Transforms the bytes output from the endpoint into a list of embeddings.\n", 513 | " Args:\n", 514 | " output: The bytes output from SageMaker endpoint.\n", 515 | " Returns:\n", 516 | " The transformed output - list of embeddings\n", 517 | " Note:\n", 518 | " The length of the outer list is the number of input strings.\n", 519 | " The length of the inner lists is the embedding dimension.\n", 520 | " \"\"\"\n", 521 | " response_json = json.loads(output.read().decode(\"utf-8\"))\n", 522 | " return response_json[\"embedding\"]" 523 | ] 524 | }, 525 | { 526 | "cell_type": "markdown", 527 | "id": "54c4749e-e437-4212-b69d-6c800aa0e21c", 528 | "metadata": {}, 529 | "source": [ 530 | "Instantiate the embedding model with SageMaker and LangChain" 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": null, 536 | "id": "84f6e08e-c969-4a05-8587-ad49f9d9dca5", 537 | "metadata": { 538 | "tags": [] 539 | }, 540 | "outputs": [], 541 | "source": [ 542 | "bge_content_handler = BGEContentHandler()\n", 543 | "sagemaker_embeddings = SagemakerEndpointEmbeddings(\n", 544 | " endpoint_name=embedding_endpoint_name,\n", 545 | " region_name=region,\n", 546 | " model_kwargs={\"mode\": \"embedding\"},\n", 547 | " content_handler=bge_content_handler,\n", 548 | ")" 549 | ] 550 | }, 551 | { 552 | "cell_type": "markdown", 553 | "id": "c88793e5-c562-48ce-858b-50c918ac5249", 554 | "metadata": {}, 555 
| "source": [ 556 | "## Data Preparation\n", 557 | "---" 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "id": "433e0dc2-718f-47af-aa60-30fa9a60cae3", 563 | "metadata": { 564 | "tags": [] 565 | }, 566 | "source": [ 567 | "Let's first download some of the files to build our document store.\n", 568 | "\n", 569 | "In this example, you will use several years of Amazon's Letter to Shareholders as a text corpus to perform Q&A on." 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": null, 575 | "id": "c8b0cbc6-367a-443a-9e59-c63640a1e4c4", 576 | "metadata": { 577 | "tags": [] 578 | }, 579 | "outputs": [], 580 | "source": [ 581 | "!mkdir -p ./data\n", 582 | "\n", 583 | "from urllib.request import urlretrieve\n", 584 | "urls = [\n", 585 | " 'https://s2.q4cdn.com/299287126/files/doc_financials/2023/ar/2022-Shareholder-Letter.pdf',\n", 586 | " 'https://s2.q4cdn.com/299287126/files/doc_financials/2022/ar/2021-Shareholder-Letter.pdf',\n", 587 | " 'https://s2.q4cdn.com/299287126/files/doc_financials/2021/ar/Amazon-2020-Shareholder-Letter-and-1997-Shareholder-Letter.pdf',\n", 588 | " 'https://s2.q4cdn.com/299287126/files/doc_financials/2020/ar/2019-Shareholder-Letter.pdf'\n", 589 | "]\n", 590 | "\n", 591 | "filenames = [\n", 592 | " 'AMZN-2022-Shareholder-Letter.pdf',\n", 593 | " 'AMZN-2021-Shareholder-Letter.pdf',\n", 594 | " 'AMZN-2020-Shareholder-Letter.pdf',\n", 595 | " 'AMZN-2019-Shareholder-Letter.pdf'\n", 596 | "]\n", 597 | "\n", 598 | "metadata = [\n", 599 | " dict(year=2022, source=filenames[0]),\n", 600 | " dict(year=2021, source=filenames[1]),\n", 601 | " dict(year=2020, source=filenames[2]),\n", 602 | " dict(year=2019, source=filenames[3])]\n", 603 | "\n", 604 | "data_root = \"./data/\"\n", 605 | "\n", 606 | "for idx, url in enumerate(urls):\n", 607 | " file_path = data_root + filenames[idx]\n", 608 | " urlretrieve(url, file_path)" 609 | ] 610 | }, 611 | { 612 | "cell_type": "markdown", 613 | "id": "859253bf-4bb0-43bf-999a-e1abb1f6983b", 614 | "metadata": {}, 615 | "source": [ 616 | "As part of Amazon's culture, the CEO always includes a copy of the 1997 Letter to Shareholders with every new release. This will cause repetition, take longer to generate embeddings, and may skew your results. In the next section you will take the downloaded data, trim the 1997 letter (last 3 pages) and overwrite them as processed files." 
617 | ] 618 | }, 619 | { 620 | "cell_type": "code", 621 | "execution_count": null, 622 | "id": "1ac21b76-14b4-4c64-8cfe-408d877426c4", 623 | "metadata": { 624 | "tags": [] 625 | }, 626 | "outputs": [], 627 | "source": [ 628 | "from pypdf import PdfReader, PdfWriter\n", 629 | "import glob\n", 630 | "\n", 631 | "local_pdfs = glob.glob(data_root + '*.pdf')\n", 632 | "\n", 633 | "for local_pdf in local_pdfs:\n", 634 | " pdf_reader = PdfReader(local_pdf)\n", 635 | " pdf_writer = PdfWriter()\n", 636 | " for pagenum in range(len(pdf_reader.pages)-3):\n", 637 | " page = pdf_reader.pages[pagenum]\n", 638 | " pdf_writer.add_page(page)\n", 639 | "\n", 640 | " with open(local_pdf, 'wb') as new_file:\n", 641 | " new_file.seek(0)\n", 642 | " pdf_writer.write(new_file)\n", 643 | " new_file.truncate()" 644 | ] 645 | }, 646 | { 647 | "cell_type": "markdown", 648 | "id": "687fa7ac-6605-4842-87f8-7cc844e01c12", 649 | "metadata": {}, 650 | "source": [ 651 | "After downloading we can load the documents with the help of [DirectoryLoader from PyPDF available under LangChain](https://python.langchain.com/en/latest/reference/modules/document_loaders.html) and splitting them into smaller chunks." 652 | ] 653 | }, 654 | { 655 | "cell_type": "markdown", 656 | "id": "80600818-b41c-45b2-b86b-2e4c69271ed6", 657 | "metadata": {}, 658 | "source": [ 659 | "Note: The retrieved document/text should be large enough to contain enough information to answer a question; but small enough to fit into the LLM prompt. Also the embeddings model has a limit of the length of input tokens limited to 512 tokens, which roughly translates to ~2000 characters. For the sake of this use-case we are creating chunks of roughly 1000 characters with an overlap of 100 characters using [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html)." 
660 | ] 661 | }, 662 | { 663 | "cell_type": "code", 664 | "execution_count": null, 665 | "id": "f13cbcc0-908c-4fa3-adab-f9eae209cf92", 666 | "metadata": { 667 | "tags": [] 668 | }, 669 | "outputs": [], 670 | "source": [ 671 | "import numpy as np\n", 672 | "from langchain_community.document_loaders import PyPDFLoader\n", 673 | "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", 674 | "\n", 675 | "documents = []\n", 676 | "\n", 677 | "for idx, file in enumerate(filenames):\n", 678 | " loader = PyPDFLoader(data_root + file)\n", 679 | " document = loader.load()\n", 680 | " for document_fragment in document:\n", 681 | " document_fragment.metadata = metadata[idx]\n", 682 | "\n", 683 | " documents += document\n", 684 | "\n", 685 | "# - in our testing Character split works better with this PDF data set\n", 686 | "text_splitter = RecursiveCharacterTextSplitter(\n", 687 | " # Set a really small chunk size, just to show.\n", 688 | " chunk_size=1000,\n", 689 | " chunk_overlap=100,\n", 690 | ")\n", 691 | "\n", 692 | "docs = text_splitter.split_documents(documents)\n", 693 | "print(docs[0])" 694 | ] 695 | }, 696 | { 697 | "cell_type": "markdown", 698 | "id": "0e27bc3e-9d2b-47cc-9764-9c8b16200b06", 699 | "metadata": {}, 700 | "source": [ 701 | "Before we are proceeding we are looking into some interesting statistics regarding the document preprocessing we just performed:" 702 | ] 703 | }, 704 | { 705 | "cell_type": "code", 706 | "execution_count": null, 707 | "id": "a1d6183e-9ceb-429c-8042-935d56acf4d3", 708 | "metadata": { 709 | "tags": [] 710 | }, 711 | "outputs": [], 712 | "source": [ 713 | "avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)\n", 714 | "\n", 715 | "print(f'Average length among {len(documents)} documents loaded is {avg_doc_length(documents)} characters.')\n", 716 | "print(f'After the split we have {len(docs)} documents as opposed to the original {len(documents)}.')\n", 717 | "print(f'Average length among {len(docs)} documents (after split) is {avg_doc_length(docs)} characters.')" 718 | ] 719 | }, 720 | { 721 | "cell_type": "markdown", 722 | "id": "7d55e72f-6162-4aa5-9aa9-0bbb29b026ea", 723 | "metadata": {}, 724 | "source": [ 725 | "We had 3 PDF documents and one txt file which have been split into smaller ~500 chunks." 726 | ] 727 | }, 728 | { 729 | "cell_type": "markdown", 730 | "id": "66b550cd-1f0f-445b-9c3b-dbd7abf5294f", 731 | "metadata": {}, 732 | "source": [ 733 | "Now we can see how a sample embedding would look like for one of those chunks." 734 | ] 735 | }, 736 | { 737 | "cell_type": "code", 738 | "execution_count": null, 739 | "id": "f67cd64a-ba7a-419e-953a-307704772f57", 740 | "metadata": { 741 | "tags": [] 742 | }, 743 | "outputs": [], 744 | "source": [ 745 | "sample_embedding = np.array(sagemaker_embeddings.embed_query(docs[0].page_content))\n", 746 | "print(\"Sample embedding of a document chunk: \", sample_embedding)\n", 747 | "print(\"Size of the embedding: \", sample_embedding.shape)" 748 | ] 749 | }, 750 | { 751 | "cell_type": "markdown", 752 | "id": "967efb1a-8586-4a74-bbd4-b52d1730693b", 753 | "metadata": { 754 | "tags": [] 755 | }, 756 | "source": [ 757 | "This can be easily done using [FAISS](https://github.com/facebookresearch/faiss) implementation inside [LangChain](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/faiss.html) which takes input the embeddings model and the documents to create the entire vector store. 
Using the Index Wrapper we can abstract away most of the heavy lifting such as creating the prompt, getting embeddings of the query, sampling the relevant documents and calling the LLM. [VectorStoreIndexWrapper](https://python.langchain.com/en/latest/modules/indexes/getting_started.html#one-line-index-creation) helps us with that." 758 | ] 759 | }, 760 | { 761 | "cell_type": "code", 762 | "execution_count": null, 763 | "id": "0993d824-e269-4e0c-80f0-1b3bde27302d", 764 | "metadata": {}, 765 | "outputs": [], 766 | "source": [ 767 | "from langchain_community.vectorstores import FAISS\n", 768 | "from langchain.indexes.vectorstore import VectorStoreIndexWrapper\n", 769 | "\n", 770 | "vectorstore_faiss = FAISS.from_documents(\n", 771 | " docs,\n", 772 | " sagemaker_embeddings,\n", 773 | ")\n", 774 | "\n", 775 | "wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)" 776 | ] 777 | }, 778 | { 779 | "cell_type": "markdown", 780 | "id": "c95675c9-7116-4bd3-ba63-30963750e36f", 781 | "metadata": {}, 782 | "source": [ 783 | "## Question Answering\n", 784 | "---" 785 | ] 786 | }, 787 | { 788 | "cell_type": "markdown", 789 | "id": "e2d335bb-63bf-4870-8e6e-019a1b7c005d", 790 | "metadata": { 791 | "tags": [] 792 | }, 793 | "source": [ 794 | "We use the wrapper provided by LangChain which wraps around the Vector Store and takes input the LLM. This wrapper performs the following steps behind the scences:\n", 795 | "\n", 796 | "- Takes input the question\n", 797 | "- Create question embedding\n", 798 | "- Fetch relevant documents\n", 799 | "- Stuff the documents and the question into a prompt\n", 800 | "- Invoke the model with the prompt and generate the answer in a human readable manner." 801 | ] 802 | }, 803 | { 804 | "cell_type": "markdown", 805 | "id": "4cf2ff92-84dd-4ad7-856c-643e342cf5cd", 806 | "metadata": {}, 807 | "source": [ 808 | "*Note: In this example we are using `Mixtral 8x7B Instruct` as the LLM under Amazon SageMaker, this particular model performs best if the inputs are provided under `[INST]` and the model is requested to generate an output after `[INST]`. In the cell below you see an example of how to control the prompt such that the LLM stays grounded and doesn't answer outside the context.*" 809 | ] 810 | }, 811 | { 812 | "cell_type": "markdown", 813 | "id": "65402877-6bc3-439f-bba0-7aec7ea6154d", 814 | "metadata": { 815 | "tags": [] 816 | }, 817 | "source": [ 818 | "Now that we have our vector store in place, we can start asking questions." 
819 | ] 820 | }, 821 | { 822 | "cell_type": "code", 823 | "execution_count": null, 824 | "id": "73ea8c05-0de9-4c5a-a53d-869aeb034592", 825 | "metadata": { 826 | "tags": [] 827 | }, 828 | "outputs": [], 829 | "source": [ 830 | "prompt_template = \"\"\"[INST]\n", 831 | "{query}\n", 832 | "[/INST]\"\"\"\n", 833 | "PROMPT = PromptTemplate(\n", 834 | " template=prompt_template, input_variables=[\"query\"]\n", 835 | ")" 836 | ] 837 | }, 838 | { 839 | "cell_type": "code", 840 | "execution_count": null, 841 | "id": "c398c738-0fb5-4281-b818-8edf6b610add", 842 | "metadata": {}, 843 | "outputs": [], 844 | "source": [ 845 | "query = \"How has AWS evolved?\"" 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": null, 851 | "id": "4e5bb92f-6a96-4b79-9993-c6a1a2492cf5", 852 | "metadata": {}, 853 | "outputs": [], 854 | "source": [ 855 | "answer = wrapper_store_faiss.query(question=PROMPT.format(query=query), llm=llm)\n", 856 | "print(answer)" 857 | ] 858 | }, 859 | { 860 | "cell_type": "markdown", 861 | "id": "6c609b1e-2791-4207-b4e7-329176803dff", 862 | "metadata": {}, 863 | "source": [ 864 | "Let's ask a different question:" 865 | ] 866 | }, 867 | { 868 | "cell_type": "code", 869 | "execution_count": null, 870 | "id": "96c813ff-9a1f-4c2f-be3a-3f5d2be6762c", 871 | "metadata": {}, 872 | "outputs": [], 873 | "source": [ 874 | "query_2 = \"Why is Amazon successful?\"" 875 | ] 876 | }, 877 | { 878 | "cell_type": "code", 879 | "execution_count": null, 880 | "id": "ae6e67e6-ef4d-48a4-9b67-825933e9d3d1", 881 | "metadata": {}, 882 | "outputs": [], 883 | "source": [ 884 | "answer_2 = wrapper_store_faiss.query(question=PROMPT.format(query=query_2), llm=llm)\n", 885 | "print(answer_2)" 886 | ] 887 | }, 888 | { 889 | "cell_type": "markdown", 890 | "id": "4187b411-b7ce-48e7-bda4-d8d2abcefc53", 891 | "metadata": {}, 892 | "source": [ 893 | "## Regular Retriever Chain\n", 894 | "---\n", 895 | "In the above scenario you explored the quick and easy way to get a context-aware answer to your question. Now let's have a look at a more customizable option with the help of [RetrievalQA](https://docs.smith.langchain.com/cookbook/hub-examples/retrieval-qa-chain), where you can customize how the fetched documents should be added to the prompt using the `chain_type` parameter. If you want to control how many relevant documents are retrieved, change the `k` parameter in the cell below to see different outputs. In many scenarios you might want to know which source documents the LLM used to generate the answer; you can get those documents in the output by setting `return_source_documents`, which returns the documents that are added to the context of the LLM prompt. `RetrievalQA` also allows you to provide a custom [prompt template](https://python.langchain.com/docs/modules/model_io/prompts/quick_start/) which can be specific to the model." 896 | ] 897 | }, 898 | { 899 | "cell_type": "code", 900 | "execution_count": null, 901 | "id": "786b9a12-3407-47ce-8457-437059d84788", 902 | "metadata": {}, 903 | "outputs": [], 904 | "source": [ 905 | "from langchain.chains import RetrievalQA\n", 906 | "\n", 907 | "prompt_template = \"\"\"[INST]\n", 908 | "Use the following pieces of context to provide a concise answer to the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.\n", 909 | "\n", 910 | "{context}\n", 911 | "\n", 912 | "Question: {question}\n", 913 | "\n", 914 | "[/INST]\"\"\"\n", 915 | "PROMPT = PromptTemplate(\n", 916 | " template=prompt_template, input_variables=[\"context\", \"question\"]\n", 917 | ")\n", 918 | "\n", 919 | "qa = RetrievalQA.from_chain_type(\n", 920 | " llm=llm,\n", 921 | " chain_type=\"stuff\",\n", 922 | " retriever=vectorstore_faiss.as_retriever(\n", 923 | " search_type=\"similarity\", search_kwargs={\"k\": 3}\n", 924 | " ),\n", 925 | " return_source_documents=True,\n", 926 | " chain_type_kwargs={\"prompt\": PROMPT}\n", 927 | ")" 928 | ] 929 | }, 930 | { 931 | "cell_type": "markdown", 932 | "id": "70ff4d62-82a5-4f07-9ed4-828b468b6356", 933 | "metadata": {}, 934 | "source": [ 935 | "Let's start asking questions:" 936 | ] 937 | }, 938 | { 939 | "cell_type": "code", 940 | "execution_count": null, 941 | "id": "a304e6ab-a8ed-4d85-be9b-35ed60721a0b", 942 | "metadata": {}, 943 | "outputs": [], 944 | "source": [ 945 | "query = \"How did AWS evolve?\"\n", 946 | "result = qa({\"query\": query})\n", 947 | "print(result['result'])\n", 948 | "\n", 949 | "print(f\"\\n{result['source_documents']}\")" 950 | ] 951 | }, 952 | { 953 | "cell_type": "code", 954 | "execution_count": null, 955 | "id": "ba165d2c-8ee5-40e7-91b5-94be5bdc9fde", 956 | "metadata": {}, 957 | "outputs": [], 958 | "source": [ 959 | "query = \"Why is Amazon successful?\"\n", 960 | "result = qa({\"query\": query})\n", 961 | "print(result['result'])\n", 962 | "\n", 963 | "print(f\"\\n{result['source_documents']}\")" 964 | ] 965 | }, 966 | { 967 | "cell_type": "code", 968 | "execution_count": null, 969 | "id": "c4c4a7f8-6bc5-4ede-bacb-ffcd656e9e74", 970 | "metadata": {}, 971 | "outputs": [], 972 | "source": [ 973 | "query = \"What business challenges has Amazon experienced?\"\n", 974 | "result = qa({\"query\": query})\n", 975 | "print(result['result'])\n", 976 | "\n", 977 | "print(f\"\\n{result['source_documents']}\")" 978 | ] 979 | }, 980 | { 981 | "cell_type": "code", 982 | "execution_count": null, 983 | "id": "89ff4635-7987-42a2-aea6-b3d90110c7dd", 984 | "metadata": {}, 985 | "outputs": [], 986 | "source": [ 987 | "query = \"How was Amazon impacted by COVID-19?\"\n", 988 | "\n", 989 | "result = qa({\"query\": query})\n", 990 | "\n", 991 | "print(result['result'])\n", 992 | "\n", 993 | "print(f\"\\n{result['source_documents']}\")" 994 | ] 995 | }, 996 | { 997 | "cell_type": "markdown", 998 | "id": "64c868c8-cf34-42f6-a06f-443e04a195f4", 999 | "metadata": {}, 1000 | "source": [ 1001 | "## Parent Document Retriever Chain\n", 1002 | "---" 1003 | ] 1004 | }, 1005 | { 1006 | "cell_type": "markdown", 1007 | "id": "26568540-3442-41dc-ae12-ca8e624d5fff", 1008 | "metadata": {}, 1009 | "source": [ 1010 | "In this scenario, let's have a look at a more advanced RAG option with the help of [ParentDocumentRetriever](https://python.langchain.com/docs/modules/data_connection/retrievers/parent_document_retriever). When working with document retrieval, you may encounter a trade-off between storing small chunks of a document for accurate embeddings and storing larger documents to preserve more context. The `ParentDocumentRetriever` strikes that balance by splitting and storing small chunks of data.
\n", 1011 | "\n", 1012 | "First, a `parent_splitter` is used to divide the original documents into larger chunks called `parent documents`. These parent documents preserve a reasonable amount of context, so the LLM has enough surrounding information to ground its answer.\n", 1013 | "\n", 1014 | "Next, a `child_splitter` is applied to create smaller `child documents` from the original documents. These child documents allow the embeddings to reflect their meaning more accurately.\n", 1015 | "\n", 1016 | "The child documents are then indexed in a vectorstore using embeddings. This enables efficient retrieval of relevant child documents based on similarity.\n", 1017 | "\n", 1018 | "To retrieve relevant information, the `ParentDocumentRetriever` first fetches the child documents from the vectorstore. It then looks up the parent IDs for those child documents and returns the corresponding larger parent documents.\n", 1019 | "\n", 1020 | "The `ParentDocumentRetriever` uses an [InMemoryStore](https://api.python.langchain.com/en/v0.1.4/storage/langchain.storage.in_memory.InMemoryBaseStore.html) to store and manage the parent documents. By working with both parent and child documents, this approach aims to balance accurate embeddings with contextual information, providing more meaningful and relevant retrieval results." 1021 | ] 1022 | }, 1023 | { 1024 | "cell_type": "code", 1025 | "execution_count": null, 1026 | "id": "1166d969-e079-4eaa-9479-b69a0ef05b58", 1027 | "metadata": {}, 1028 | "outputs": [], 1029 | "source": [ 1030 | "from langchain.retrievers import ParentDocumentRetriever\n", 1031 | "from langchain.storage import InMemoryStore" 1032 | ] 1033 | }, 1034 | { 1035 | "cell_type": "markdown", 1036 | "id": "5ce7358d-00b3-4778-a981-8decddb5e1ec", 1037 | "metadata": {}, 1038 | "source": [ 1039 | "Sometimes, the full documents can be too big to retrieve as they are. In that case, what we really want to do is first split the raw documents into larger chunks, and then split those into smaller chunks. We then index the smaller chunks, but on retrieval we retrieve the larger chunks (though still not the full documents)."
1040 | ] 1041 | }, 1042 | { 1043 | "cell_type": "code", 1044 | "execution_count": null, 1045 | "id": "10b5f339-b513-4ba5-b262-82d504dbd92a", 1046 | "metadata": {}, 1047 | "outputs": [], 1048 | "source": [ 1049 | "# This text splitter is used to create the parent documents\n", 1050 | "parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)\n", 1051 | "\n", 1052 | "# This text splitter is used to create the child documents\n", 1053 | "# It should create documents smaller than the parent\n", 1054 | "child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)\n", 1055 | "\n", 1056 | "# The vectorstore to use to index the child chunks\n", 1057 | "vectorstore_faiss = FAISS.from_documents(\n", 1058 | " child_splitter.split_documents(documents),\n", 1059 | " sagemaker_embeddings,\n", 1060 | ")\n", 1061 | "\n", 1062 | "# The storage layer for the parent documents\n", 1063 | "store = InMemoryStore()" 1064 | ] 1065 | }, 1066 | { 1067 | "cell_type": "code", 1068 | "execution_count": null, 1069 | "id": "c3d830d8-5101-4103-8958-f960c2728cfc", 1070 | "metadata": {}, 1071 | "outputs": [], 1072 | "source": [ 1073 | "# The storage layer for the parent documents\n", 1074 | "store = InMemoryStore()\n", 1075 | "retriever = ParentDocumentRetriever(\n", 1076 | " vectorstore=vectorstore_faiss,\n", 1077 | " docstore=store,\n", 1078 | " child_splitter=child_splitter,\n", 1079 | " parent_splitter=parent_splitter,\n", 1080 | ")" 1081 | ] 1082 | }, 1083 | { 1084 | "cell_type": "code", 1085 | "execution_count": null, 1086 | "id": "15243983-e5e5-4024-9ecb-6b965827684c", 1087 | "metadata": { 1088 | "tags": [] 1089 | }, 1090 | "outputs": [], 1091 | "source": [ 1092 | "retriever.add_documents(documents, ids=None)" 1093 | ] 1094 | }, 1095 | { 1096 | "cell_type": "markdown", 1097 | "id": "7d46bb5a-efb7-47d4-8d0d-3ec1afa95f25", 1098 | "metadata": {}, 1099 | "source": [ 1100 | "Let’s now call the vector store search functionality - we should see that it returns small chunks (since we’re storing the small chunks)." 1101 | ] 1102 | }, 1103 | { 1104 | "cell_type": "code", 1105 | "execution_count": null, 1106 | "id": "80d7d44f-0669-4d22-9efc-64ee3b0ef247", 1107 | "metadata": { 1108 | "tags": [] 1109 | }, 1110 | "outputs": [], 1111 | "source": [ 1112 | "sub_docs = vectorstore_faiss.similarity_search(\"How was Amazon impacted by COVID-19?\")" 1113 | ] 1114 | }, 1115 | { 1116 | "cell_type": "code", 1117 | "execution_count": null, 1118 | "id": "ada7c3d6-7eda-4d1a-8c0b-9889daec7f5d", 1119 | "metadata": { 1120 | "tags": [] 1121 | }, 1122 | "outputs": [], 1123 | "source": [ 1124 | "len(sub_docs[0].page_content)" 1125 | ] 1126 | }, 1127 | { 1128 | "cell_type": "code", 1129 | "execution_count": null, 1130 | "id": "e6d2a1c9-ea18-47a6-ade7-d801c712e700", 1131 | "metadata": { 1132 | "tags": [] 1133 | }, 1134 | "outputs": [], 1135 | "source": [ 1136 | "print(sub_docs[0].page_content)" 1137 | ] 1138 | }, 1139 | { 1140 | "cell_type": "markdown", 1141 | "id": "f98a54e4-0ac6-48b2-afb2-7568a0364ac0", 1142 | "metadata": {}, 1143 | "source": [ 1144 | "Let’s now retrieve from the overall retriever. This should return large documents - since it returns the documents where the smaller chunks are located." 
1145 | ] 1146 | }, 1147 | { 1148 | "cell_type": "code", 1149 | "execution_count": null, 1150 | "id": "54413ac3-0012-4b31-b926-1de798e3572c", 1151 | "metadata": { 1152 | "tags": [] 1153 | }, 1154 | "outputs": [], 1155 | "source": [ 1156 | "retrieved_docs = retriever.get_relevant_documents(\"How was Amazon impacted by COVID-19?\")" 1157 | ] 1158 | }, 1159 | { 1160 | "cell_type": "code", 1161 | "execution_count": null, 1162 | "id": "810c388a-494a-4a3f-aed1-a59e3bbccb04", 1163 | "metadata": { 1164 | "tags": [] 1165 | }, 1166 | "outputs": [], 1167 | "source": [ 1168 | "len(retrieved_docs[0].page_content)" 1169 | ] 1170 | }, 1171 | { 1172 | "cell_type": "code", 1173 | "execution_count": null, 1174 | "id": "fab565fb-0240-4048-818a-da6a6b56b9a9", 1175 | "metadata": { 1176 | "tags": [] 1177 | }, 1178 | "outputs": [], 1179 | "source": [ 1180 | "print(retrieved_docs[0].page_content)" 1181 | ] 1182 | }, 1183 | { 1184 | "cell_type": "markdown", 1185 | "id": "ab3152a7-d33d-4eaf-81fc-15e174d119d1", 1186 | "metadata": {}, 1187 | "source": [ 1188 | "Now, let's initialize the chain using the `ParentDocumentRetriever`. We will pass the prompt in via the chain_type_kwargs argument." 1189 | ] 1190 | }, 1191 | { 1192 | "cell_type": "code", 1193 | "execution_count": null, 1194 | "id": "3d573ec9-5865-4dc7-9a47-183b70afce7f", 1195 | "metadata": { 1196 | "tags": [] 1197 | }, 1198 | "outputs": [], 1199 | "source": [ 1200 | "qa = RetrievalQA.from_chain_type(\n", 1201 | " llm=llm,\n", 1202 | " chain_type=\"stuff\",\n", 1203 | " retriever=retriever,\n", 1204 | " return_source_documents=True,\n", 1205 | " chain_type_kwargs={\"prompt\": PROMPT}\n", 1206 | ")" 1207 | ] 1208 | }, 1209 | { 1210 | "cell_type": "markdown", 1211 | "id": "98c73af7-fde2-4f30-8dba-c9f3d1dfac83", 1212 | "metadata": {}, 1213 | "source": [ 1214 | "Let's start asking questions:" 1215 | ] 1216 | }, 1217 | { 1218 | "cell_type": "code", 1219 | "execution_count": null, 1220 | "id": "320966d5-e056-452d-81ed-1cb67c873f32", 1221 | "metadata": { 1222 | "tags": [] 1223 | }, 1224 | "outputs": [], 1225 | "source": [ 1226 | "query = \"How did AWS evolve?\"\n", 1227 | "result = qa({\"query\": query})\n", 1228 | "print(result['result'])\n", 1229 | "\n", 1230 | "print(f\"\\n{result['source_documents']}\")" 1231 | ] 1232 | }, 1233 | { 1234 | "cell_type": "code", 1235 | "execution_count": null, 1236 | "id": "8b17a9cb-f084-4db5-9047-1d7b1244fa47", 1237 | "metadata": { 1238 | "tags": [] 1239 | }, 1240 | "outputs": [], 1241 | "source": [ 1242 | "query = \"Why is Amazon successful?\"\n", 1243 | "result = qa({\"query\": query})\n", 1244 | "print(result['result'])\n", 1245 | "\n", 1246 | "print(f\"\\n{result['source_documents']}\")" 1247 | ] 1248 | }, 1249 | { 1250 | "cell_type": "code", 1251 | "execution_count": null, 1252 | "id": "14e42c67-99f3-4722-aab4-48e486147b08", 1253 | "metadata": {}, 1254 | "outputs": [], 1255 | "source": [ 1256 | "query = \"What business challenges has Amazon experienced?\"\n", 1257 | "result = qa({\"query\": query})\n", 1258 | "print(result['result'])\n", 1259 | "\n", 1260 | "print(f\"\\n{result['source_documents']}\")" 1261 | ] 1262 | }, 1263 | { 1264 | "cell_type": "code", 1265 | "execution_count": null, 1266 | "id": "b54db0b8-bc30-4010-aae8-f4ff371e3fe1", 1267 | "metadata": {}, 1268 | "outputs": [], 1269 | "source": [ 1270 | "query = \"How was Amazon impacted by COVID-19?\"\n", 1271 | "result = qa({\"query\": query})\n", 1272 | "print(result['result'])\n", 1273 | "\n", 1274 | "print(f\"\\n{result['source_documents']}\")" 1275 | ] 1276 | }, 
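{ "cell_type": "markdown", "id": "pdr-docstore-note", "metadata": {}, "source": [ "*Optional sanity check (a minimal sketch added for illustration, not part of the original walkthrough): we can compare how many parent documents the `ParentDocumentRetriever` placed in the docstore with how many vectors are currently held in the FAISS index. This assumes the LangChain `InMemoryStore` exposes `yield_keys()` and that the FAISS vector store exposes the underlying index via `.index.ntotal`.*" ] }, { "cell_type": "code", "execution_count": null, "id": "pdr-docstore-check", "metadata": {}, "outputs": [], "source": [ "# Hedged sketch: inspect the parent/child split created by the ParentDocumentRetriever.\n", "# Assumes `store` (InMemoryStore) and `vectorstore_faiss` (FAISS) from the cells above.\n", "num_parents = len(list(store.yield_keys()))\n", "num_vectors = vectorstore_faiss.index.ntotal\n", "print(f\"Parent documents held in the docstore: {num_parents}\")\n", "print(f\"Vectors currently held in the FAISS index: {num_vectors}\")" ] },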
1277 | { 1278 | "cell_type": "markdown", 1279 | "id": "2d2b2233-d801-4692-885f-da9f96844bb9", 1280 | "metadata": {}, 1281 | "source": [ 1282 | "## Contextual Compression Chain\n", 1283 | "---" 1284 | ] 1285 | }, 1286 | { 1287 | "cell_type": "markdown", 1288 | "id": "286f39ba-f12e-40ac-929b-5d82a208c0f1", 1289 | "metadata": {}, 1290 | "source": [ 1291 | "In this scenario, let's have a look at one more advanced RAG option called [Contextual compression](https://python.langchain.com/docs/modules/data_connection/retrievers/contextual_compression). One challenge with retrieval is that usually you don’t know the specific queries your document storage system will face when you ingest data into the system. This means that the information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.\n", 1292 | "\n", 1293 | "`Contextual compression` is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing” here refers to both compressing the contents of an individual document and filtering out documents wholesale.\n", 1294 | "\n", 1295 | "To use the `Contextual Compression Retriever`, you’ll need:\n", 1296 | "\n", 1297 | "- **A base retriever**: This is the initial retriever that fetches documents from the storage system based on the query.\n", 1298 | "- **A Document Compressor**: This component takes the initially retrieved documents and shortens them by reducing the contents of individual documents or dropping irrelevant documents altogether, using the query context to determine relevance.\n", 1299 | "\n", 1300 | "The workflow is as follows: the query is passed to the base retriever, which fetches a set of potentially relevant documents. These documents are then fed into the Document Compressor, which compresses and filters them based on the query context. The resulting documents, containing only the most relevant information, are then returned for further processing or use in downstream applications.\n", 1301 | "\n", 1302 | "By employing contextual compression, the `Contextual Compression Retriever` improves the quality of responses, reduces the cost of LLM calls, and enhances the overall efficiency of the retrieval process."
1311 | ] 1312 | }, 1313 | { 1314 | "cell_type": "markdown", 1315 | "id": "9cd35b19-4bb0-411d-89f1-a35b7d178261", 1316 | "metadata": { 1317 | "tags": [] 1318 | }, 1319 | "source": [ 1320 | "### Adding contextual compression with an LLMChainExtractor\n", 1321 | "---" 1322 | ] 1323 | }, 1324 | { 1325 | "cell_type": "markdown", 1326 | "id": "0d33bfa7-108a-4bb8-828f-dfea005b2cde", 1327 | "metadata": {}, 1328 | "source": [ 1329 | "Now let’s wrap our base retriever with a `ContextualCompressionRetriever`. We’ll add an [LLMChainExtractor](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.document_compressors.chain_extract.LLMChainExtractor.html), which will iterate over the initially returned documents and extract from each only the content that is relevant to the query." 1330 | ] 1331 | }, 1332 | { 1333 | "cell_type": "code", 1334 | "execution_count": null, 1335 | "id": "15ce0654-eeb1-486a-b9ec-15f3149ecb5e", 1336 | "metadata": {}, 1337 | "outputs": [], 1338 | "source": [ 1339 | "from langchain.retrievers import ContextualCompressionRetriever\n", 1340 | "from langchain.retrievers.document_compressors import LLMChainExtractor\n", 1341 | "\n", 1342 | "text_splitter = RecursiveCharacterTextSplitter(\n", 1343 | " # Set a really small chunk size, just to show.\n", 1344 | " chunk_size=1000,\n", 1345 | " chunk_overlap=100,\n", 1346 | ")\n", 1347 | "\n", 1348 | "docs = text_splitter.split_documents(documents)\n", 1349 | "retriever = FAISS.from_documents(\n", 1350 | " docs,\n", 1351 | " sagemaker_embeddings,\n", 1352 | ").as_retriever()\n", 1353 | "\n", 1354 | "compressor = LLMChainExtractor.from_llm(llm)\n", 1355 | "compression_retriever = ContextualCompressionRetriever(\n", 1356 | " base_compressor=compressor, base_retriever=retriever\n", 1357 | ")\n", 1358 | "\n", 1359 | "compressed_docs = compression_retriever.get_relevant_documents(\n", 1360 | " \"How was Amazon impacted by COVID-19?\"\n", 1361 | ")\n", 1362 | "print(compressed_docs)" 1363 | ] 1364 | }, 1365 | { 1366 | "cell_type": "markdown", 1367 | "id": "86c6fae4-40f7-4808-a7da-4bd5afb1bab9", 1368 | "metadata": {}, 1369 | "source": [ 1370 | "Now, let's initialize the chain using the `ContextualCompressionRetriever` with an `LLMChainExtractor`. We will pass the prompt in via the chain_type_kwargs argument." 
1371 | ] 1372 | }, 1373 | { 1374 | "cell_type": "code", 1375 | "execution_count": null, 1376 | "id": "a42b3afe-f18c-4762-9888-de060a83635b", 1377 | "metadata": { 1378 | "tags": [] 1379 | }, 1380 | "outputs": [], 1381 | "source": [ 1382 | "qa = RetrievalQA.from_chain_type(\n", 1383 | " llm=llm,\n", 1384 | " chain_type=\"stuff\",\n", 1385 | " retriever=compression_retriever,\n", 1386 | " return_source_documents=True,\n", 1387 | " chain_type_kwargs={\"prompt\": PROMPT}\n", 1388 | ")" 1389 | ] 1390 | }, 1391 | { 1392 | "cell_type": "markdown", 1393 | "id": "94bcc75d-1b1a-47d6-b14c-d83b6e320a09", 1394 | "metadata": {}, 1395 | "source": [ 1396 | "Let's start asking questions:" 1397 | ] 1398 | }, 1399 | { 1400 | "cell_type": "code", 1401 | "execution_count": null, 1402 | "id": "c70b9e88-5d5e-4e74-bbf4-ac01bf3ae129", 1403 | "metadata": { 1404 | "tags": [] 1405 | }, 1406 | "outputs": [], 1407 | "source": [ 1408 | "query = \"How did AWS evolve?\"\n", 1409 | "result = qa({\"query\": query})\n", 1410 | "print(result['result'])\n", 1411 | "\n", 1412 | "print(f\"\\n{result['source_documents']}\")" 1413 | ] 1414 | }, 1415 | { 1416 | "cell_type": "code", 1417 | "execution_count": null, 1418 | "id": "424162d8-7fbb-4da5-9f83-ebe0bacb5fea", 1419 | "metadata": {}, 1420 | "outputs": [], 1421 | "source": [ 1422 | "query = \"Why is Amazon successful?\"\n", 1423 | "result = qa({\"query\": query})\n", 1424 | "print(result['result'])\n", 1425 | "\n", 1426 | "print(f\"\\n{result['source_documents']}\")" 1427 | ] 1428 | }, 1429 | { 1430 | "cell_type": "code", 1431 | "execution_count": null, 1432 | "id": "bf6dbc6c-9524-4c7d-ab8c-d69bbe5ca116", 1433 | "metadata": {}, 1434 | "outputs": [], 1435 | "source": [ 1436 | "query = \"What business challenges has Amazon experienced?\"\n", 1437 | "result = qa({\"query\": query})\n", 1438 | "print(result['result'])\n", 1439 | "\n", 1440 | "print(f\"\\n{result['source_documents']}\")" 1441 | ] 1442 | }, 1443 | { 1444 | "cell_type": "code", 1445 | "execution_count": null, 1446 | "id": "1c3b1acf-43b0-43c5-83f1-01f466e3fb47", 1447 | "metadata": {}, 1448 | "outputs": [], 1449 | "source": [ 1450 | "query = \"How was Amazon impacted by COVID-19?\"\n", 1451 | "result = qa({\"query\": query})\n", 1452 | "print(result['result'])\n", 1453 | "\n", 1454 | "print(f\"\\n{result['source_documents']}\")" 1455 | ] 1456 | }, 1457 | { 1458 | "cell_type": "markdown", 1459 | "id": "1c909c11-34d3-440d-bf3e-87325697ebd5", 1460 | "metadata": { 1461 | "tags": [] 1462 | }, 1463 | "source": [ 1464 | "### More built-in compressors: filters\n", 1465 | "---" 1466 | ] 1467 | }, 1468 | { 1469 | "cell_type": "markdown", 1470 | "id": "e60e1328-ae7d-4127-8556-5c8a18e83222", 1471 | "metadata": {}, 1472 | "source": [ 1473 | "### LLMChainFilter\n", 1474 | "---" 1475 | ] 1476 | }, 1477 | { 1478 | "cell_type": "markdown", 1479 | "id": "1c84ad22-b15f-4099-88db-f937011aa68d", 1480 | "metadata": {}, 1481 | "source": [ 1482 | "The [LLMChainFilter](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.document_compressors.chain_filter.LLMChainFilter.html) is a slightly simpler but more robust compressor that uses an LLM chain to decide which of the initially retrieved documents to filter out and which ones to return, without manipulating the document contents."
1483 | ] 1484 | }, 1485 | { 1486 | "cell_type": "code", 1487 | "execution_count": null, 1488 | "id": "69b189c2-a0d4-4e78-930a-aaa925ed499d", 1489 | "metadata": {}, 1490 | "outputs": [], 1491 | "source": [ 1492 | "from langchain.retrievers.document_compressors import LLMChainFilter\n", 1493 | "\n", 1494 | "_filter = LLMChainFilter.from_llm(llm)\n", 1495 | "compression_retriever = ContextualCompressionRetriever(\n", 1496 | " base_compressor=_filter, base_retriever=retriever\n", 1497 | ")\n", 1498 | "\n", 1499 | "compressed_docs = compression_retriever.get_relevant_documents(\n", 1500 | " \"How was Amazon impacted by COVID-19?\"\n", 1501 | ")\n", 1502 | "print(compressed_docs)" 1503 | ] 1504 | }, 1505 | { 1506 | "cell_type": "markdown", 1507 | "id": "6241230b-5a28-4093-9e8c-5469cecbba07", 1508 | "metadata": {}, 1509 | "source": [ 1510 | "Now, let's initialize the chain using the `ContextualCompressionRetriever` with an `LLMChainFilter`. We will pass the prompt in via the chain_type_kwargs argument." 1511 | ] 1512 | }, 1513 | { 1514 | "cell_type": "code", 1515 | "execution_count": null, 1516 | "id": "d724d950-0047-42c9-b1ef-355485080d58", 1517 | "metadata": { 1518 | "tags": [] 1519 | }, 1520 | "outputs": [], 1521 | "source": [ 1522 | "qa = RetrievalQA.from_chain_type(\n", 1523 | " llm=llm,\n", 1524 | " chain_type=\"stuff\",\n", 1525 | " retriever=compression_retriever,\n", 1526 | " return_source_documents=True,\n", 1527 | " chain_type_kwargs={\"prompt\": PROMPT}\n", 1528 | ")" 1529 | ] 1530 | }, 1531 | { 1532 | "cell_type": "markdown", 1533 | "id": "69495a8c-c96f-4c96-bbe0-df847fc9ed38", 1534 | "metadata": {}, 1535 | "source": [ 1536 | "Let's start asking questions:" 1537 | ] 1538 | }, 1539 | { 1540 | "cell_type": "code", 1541 | "execution_count": null, 1542 | "id": "753ca7c3-d7e1-41b7-abfc-879e1eacf573", 1543 | "metadata": { 1544 | "tags": [] 1545 | }, 1546 | "outputs": [], 1547 | "source": [ 1548 | "query = \"How did AWS evolve?\"\n", 1549 | "result = qa({\"query\": query})\n", 1550 | "print(result['result'])\n", 1551 | "\n", 1552 | "print(f\"\\n{result['source_documents']}\")" 1553 | ] 1554 | }, 1555 | { 1556 | "cell_type": "code", 1557 | "execution_count": null, 1558 | "id": "e359ce67-46a9-4208-a6a4-0be386dc1557", 1559 | "metadata": {}, 1560 | "outputs": [], 1561 | "source": [ 1562 | "query = \"Why is Amazon successful?\"\n", 1563 | "result = qa({\"query\": query})\n", 1564 | "print(result['result'])\n", 1565 | "\n", 1566 | "print(f\"\\n{result['source_documents']}\")" 1567 | ] 1568 | }, 1569 | { 1570 | "cell_type": "code", 1571 | "execution_count": null, 1572 | "id": "8099daff-51e0-4f65-84cb-3638f92cf830", 1573 | "metadata": {}, 1574 | "outputs": [], 1575 | "source": [ 1576 | "query = \"What business challenges has Amazon experienced?\"\n", 1577 | "result = qa({\"query\": query})\n", 1578 | "print(result['result'])\n", 1579 | "\n", 1580 | "print(f\"\\n{result['source_documents']}\")" 1581 | ] 1582 | }, 1583 | { 1584 | "cell_type": "code", 1585 | "execution_count": null, 1586 | "id": "a5afc93a-e95e-48bf-afba-66a5ba5320cc", 1587 | "metadata": {}, 1588 | "outputs": [], 1589 | "source": [ 1590 | "query = \"How was Amazon impacted by COVID-19?\"\n", 1591 | "result = qa({\"query\": query})\n", 1592 | "print(result['result'])\n", 1593 | "\n", 1594 | "print(f\"\\n{result['source_documents']}\")" 1595 | ] 1596 | }, 1597 | { 1598 | "cell_type": "markdown", 1599 | "id": "d52e4c55-0f80-47eb-83f7-6ae28fedf57f", 1600 | "metadata": {}, 1601 | "source": [ 1602 | "## Conclusion\n", 1603 | "---" 1604 | ] 
1605 | }, 1606 | { 1607 | "cell_type": "markdown", 1608 | "id": "dd32616f-9b97-4960-9ccc-47f20a95dcc6", 1609 | "metadata": {}, 1610 | "source": [ 1611 | "Congratulations on completing this walkthrough of advanced retrieval augmented generation with `Mixtral 8x7B Instruct`! These are important techniques that combine the power of large language models with the precision of retrieval methods. Comparing these different techniques, we can see that in contexts like detailing AWS’s transition from a simple service to a complex, multi-billion-dollar entity, or explaining Amazon's strategic successes, the Regular Retriever Chain lacks the precision the more sophisticated techniques offer, leading to less targeted information. While only a few differences are visible between the advanced techniques discussed, they are far and away more informative than the Regular Retriever Chain. For customers in industries such as HCLS, Telecommunications, and FSI who are looking to implement RAG in their applications, the Regular Retriever Chain's limitations in providing precision, avoiding redundancy, and effectively compressing information make it less suited to these needs than the more advanced Parent Document Retriever and Contextual Compression techniques, which distill vast amounts of information into the concentrated, impactful insights customers need, while helping improve price performance." 1612 | ] 1613 | }, 1614 | { 1615 | "cell_type": "markdown", 1616 | "id": "e83d260d-8771-41ae-b173-0f81228b28dc", 1617 | "metadata": {}, 1618 | "source": [ 1619 | "In the above implementation of advanced RAG-based question answering, we have explored the following concepts and how to implement them using Amazon SageMaker JumpStart and its LangChain integration."
1620 | ] 1621 | }, 1622 | { 1623 | "cell_type": "markdown", 1624 | "id": "de5d08cb-6828-4089-a8fc-55c12637a2f1", 1625 | "metadata": {}, 1626 | "source": [ 1627 | "- Deploying models on Amazon SageMaker JumpStart\n", 1628 | "- Setting up `Mixtral 8x7B Instruct` and `BGE Large En` with LangChain\n", 1629 | "- Loading documents of different kinds and generating embeddings to create a vector store\n", 1630 | "- Retrieving documents relevant to the question using the following approaches from LangChain\n", 1631 | " - Regular Retriever Chain\n", 1632 | " - Parent Document Retriever Chain\n", 1633 | " - Contextual Compression Chain\n", 1634 | "- Preparing a prompt which goes as input to the LLM\n", 1635 | "- Presenting an answer in a human-friendly manner" 1636 | ] 1637 | }, 1638 | { 1639 | "cell_type": "markdown", 1640 | "id": "db0319f2-5309-45ba-a15c-2ff7b3720133", 1641 | "metadata": {}, 1642 | "source": [ 1643 | "### Take-aways\n", 1644 | "---\n", 1645 | "- Experiment with different retrieval techniques\n", 1646 | "- Leverage the `Mixtral 8x7B Instruct` and `BGE Large En` models available in Amazon SageMaker JumpStart\n", 1647 | "- Explore options such as persistent storage of embeddings and document chunks\n", 1648 | "- Integrate with enterprise data stores" 1649 | ] 1650 | }, 1651 | { 1652 | "cell_type": "markdown", 1653 | "id": "e69e4cb2-5ad3-439c-90ea-f538bc9872e8", 1654 | "metadata": {}, 1655 | "source": [ 1656 | "## Clean Up Resources\n", 1657 | "---" 1658 | ] 1659 | }, 1660 | { 1661 | "cell_type": "code", 1662 | "execution_count": null, 1663 | "id": "a99d7aae-56a6-4c63-8cea-9ac73b930eb3", 1664 | "metadata": {}, 1665 | "outputs": [], 1666 | "source": [ 1667 | "# Delete the SageMaker models and endpoints so they stop incurring charges\n", 1668 | "llm_predictor.delete_model()\n", 1669 | "llm_predictor.delete_endpoint()\n", 1670 | "embedding_predictor.delete_model()\n", 1671 | "embedding_predictor.delete_endpoint()" 1672 | ] 1673 | }, 1674 | { 1675 | "cell_type": "markdown", 1676 | "id": "eaa7dc0f-cf00-4bd7-a20c-8952ef75fa86", 1677 | "metadata": {}, 1678 | "source": [ 1679 | "# Thank You!" 1680 | ] 1681 | } 1682 | ], 1683 | "metadata": { 1684 | "kernelspec": { 1685 | "display_name": "conda_python3", 1686 | "language": "python", 1687 | "name": "conda_python3" 1688 | }, 1689 | "language_info": { 1690 | "codemirror_mode": { 1691 | "name": "ipython", 1692 | "version": 3 1693 | }, 1694 | "file_extension": ".py", 1695 | "mimetype": "text/x-python", 1696 | "name": "python", 1697 | "nbconvert_exporter": "python", 1698 | "pygments_lexer": "ipython3", 1699 | "version": "3.10.13" 1700 | } 1701 | }, 1702 | "nbformat": 4, 1703 | "nbformat_minor": 5 1704 | } 1705 | --------------------------------------------------------------------------------