├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── docs ├── architecture1.png └── architecture2.png ├── query-amazon-redshift-with-mistral-small-Bedrock.ipynb ├── query-amazon-redshift-with-mistral-small-SageMaker.ipynb └── requirements.txt /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. 
Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT No Attribution 2 | 3 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of 6 | this software and associated documentation files (the "Software"), to deal in 7 | the Software without restriction, including without limitation the rights to 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 9 | the Software, and to permit persons to whom the Software is furnished to do so. 10 | 11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 12 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 13 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 14 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 15 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 16 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
17 | 18 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Transforming the Way We Talk to Databases: Using Everyday Language to Search and Retrieve Data with Mistral Small 2 | 3 | This project enables interacting with relational databases through natural language. Specifically, it uses the sparse Mixture of Experts (MoE) model Mistral Small to generate SQL queries and to interpret their tabular results, which are returned to the user as answers. The application leverages SageMaker hosting tools to serve the model, which is deployed with a few clicks from SageMaker JumpStart. For more details, please refer to [this blogpost](https://aws.amazon.com/blogs/machine-learning/use-everyday-language-to-search-and-retrieve-data-with-mixtral-8x7b-on-amazon-sagemaker-jumpstart/). 4 | 5 | ## Setup Requirements 6 | 7 | You can currently deploy Mistral Small on SageMaker JumpStart with one click. Amazon SageMaker JumpStart provides a simplified way to access and deploy over 100 different open source and third-party foundation models. In order to [launch an endpoint to host Mistral Small from SageMaker JumpStart](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-deploy.html), you may need to request a service quota increase to access an **ml.g5.48xlarge instance for endpoint usage**. You can [request service quota increases](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html) through the AWS console, CLI, or API to get access. 8 | 9 | You will also need access to a relational data source. Amazon Redshift is used as the primary data source in this post with the [TICKIT database](https://docs.aws.amazon.com/redshift/latest/dg/c_sampledb.html). 
This database helps analysts track sales activity for the fictional TICKIT web site, where users buy and sell tickets online for sporting events, shows, and concerts. 10 | 11 | _Please review any license terms applicable to the dataset with your legal team and confirm that your use case complies with the terms before proceeding._ 12 | 13 | You will first need to [set up a Redshift cluster](https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-launch-sample-cluster.html) if you don't already have one. Use the Amazon Redshift console or CLI to launch a cluster with your desired node type and number of nodes. Make sure to note the cluster endpoint, database name, and credentials needed to connect. 14 | 15 | Once the cluster is available, create a new database and tables in it to hold the relational data. You can load data for the TICKIT database from S3 following [these steps](https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-create-sample-db.html). 16 | 17 | To test that you successfully added data to your Redshift cluster, follow these steps: 18 | 19 | 1. On the Redshift console, choose _Clusters_ and select the cluster to query. 20 | 2. Click on the _Query Editor_ tab to open the query editor. 21 | 3. You can run the following sample queries, or write your own: 22 | 23 | ``` 24 | /* Find total sales on a given date. */ 25 | SELECT sum(qtysold) 26 | FROM sales, date 27 | WHERE sales.dateid = date.dateid AND caldate = '2008-01-05'; 28 | ``` 29 | 30 | ``` 31 | /* Find the top 10 buyers. */ 32 | SELECT firstname, lastname, total_quantity 33 | FROM (SELECT buyerid, sum(qtysold) total_quantity 34 | FROM sales GROUP BY buyerid 35 | ORDER BY total_quantity 36 | desc limit 10) Q, users 37 | WHERE Q.buyerid = users.userid 38 | ORDER BY Q.total_quantity desc; 39 | ``` 40 | 41 | If you get successful responses, it means that you have correctly loaded the database data onto the cluster. The query editor allows saving, scheduling and sharing queries. 
You can also view query plans and execution details, and monitor query performance. 42 | 43 | We recommend running this notebook in Amazon SageMaker Studio. For that, you must first set up a [SageMaker domain](https://docs.aws.amazon.com/sagemaker/latest/dg/sm-domain.html), making sure it has the appropriate permissions to interact with Amazon Redshift. Then, [clone this GitHub repository into SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tasks-git.html) with the following command: 44 | 45 | ``` 46 | git clone https://github.com/aws-samples/query-databases-with-natural-language.git 47 | ``` 48 | Open the _query-amazon-redshift-with-mistral-small-SageMaker.ipynb_ notebook (or _query-amazon-redshift-with-mistral-small-Bedrock.ipynb_ to call the model through Amazon Bedrock) to run through it. 49 | 50 | ## Solution Architecture 51 | 52 | ![](docs/architecture1.png) 53 | 54 | At a high level, Text2SQL solutions, such as the one in this repository, consist of three core components: 55 | 56 | 1. **Structured Data Source**: This can be any relational data source such as Amazon RDS, Amazon Aurora, Amazon Athena, or Snowflake. It contains the business data to query. 57 | 58 | 2. **Foundation Model**: A large language model (LLM) that is able to understand the data schema of the source database and map natural language questions into corresponding SQL queries. 59 | 60 | 3. **Orchestrator Back-end**: The code scripts can be executed in environments such as a SageMaker Studio notebook, a Lambda function, EC2, or ECS. On top of that, you could optionally add an orchestration service, such as AWS Step Functions, if needed. 61 | 62 | In the code provided in this repository, the architecture is the following: 63 | 64 | ![](docs/architecture2.png) 65 | 66 | The end-to-end flow is as follows: 67 | 68 | 1. The user asks a natural language question, which is passed to the Mistral Small model hosted on SageMaker. 69 | 70 | 2. The LLM analyzes the question and uses the schema fetched from the connected Redshift database to generate a SQL query. 
71 | 72 | 3. The SQL query is run against the database. In case of an error, a retry workflow is executed. 73 | 74 | 4. Tabular results received are passed back to the LLM for interpretation and to convert them into a natural language response to the user's original question. 75 | 76 | For a step-by-step walk-through of the implementation, please check out the [reference blogpost](https://aws.amazon.com/blogs/machine-learning/use-everyday-language-to-search-and-retrieve-data-with-mixtral-8x7b-on-amazon-sagemaker-jumpstart/). 77 | -------------------------------------------------------------------------------- /docs/architecture1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/query-databases-with-natural-language/1ffa1f8f6d6c66251cf05058a4d897260617441a/docs/architecture1.png -------------------------------------------------------------------------------- /docs/architecture2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/query-databases-with-natural-language/1ffa1f8f6d6c66251cf05058a4d897260617441a/docs/architecture2.png -------------------------------------------------------------------------------- /query-amazon-redshift-with-mistral-small-Bedrock.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "id": "f851893e-f3cc-4ce6-9b74-c5ac10a436ad", 7 | "metadata": { 8 | "scrolled": true, 9 | "tags": [] 10 | }, 11 | "outputs": [ 12 | { 13 | "name": "stdout", 14 | "output_type": "stream", 15 | "text": [ 16 | "Requirement already satisfied: boto3 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 1)) (1.36.3)\n", 17 | "Requirement already satisfied: sentencepiece in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 2)) (0.1.99)\n", 18 | 
"Requirement already satisfied: pandas in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 3)) (2.2.3)\n", 19 | "Collecting anthropic (from -r requirements.txt (line 4))\n", 20 | " Using cached anthropic-0.49.0-py3-none-any.whl.metadata (24 kB)\n", 21 | "Collecting uuid (from -r requirements.txt (line 5))\n", 22 | " Using cached uuid-1.30-py3-none-any.whl\n", 23 | "Requirement already satisfied: transformers in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 6)) (4.48.3)\n", 24 | "Collecting tiktoken (from -r requirements.txt (line 7))\n", 25 | " Using cached tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)\n", 26 | "Requirement already satisfied: langchain in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 8)) (0.3.17)\n", 27 | "Requirement already satisfied: s3fs in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 9)) (2024.10.0)\n", 28 | "Requirement already satisfied: botocore<1.37.0,>=1.36.3 in /opt/conda/lib/python3.11/site-packages (from boto3->-r requirements.txt (line 1)) (1.36.3)\n", 29 | "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /opt/conda/lib/python3.11/site-packages (from boto3->-r requirements.txt (line 1)) (1.0.1)\n", 30 | "Requirement already satisfied: s3transfer<0.12.0,>=0.11.0 in /opt/conda/lib/python3.11/site-packages (from boto3->-r requirements.txt (line 1)) (0.11.2)\n", 31 | "Requirement already satisfied: numpy>=1.23.2 in /opt/conda/lib/python3.11/site-packages (from pandas->-r requirements.txt (line 3)) (1.26.4)\n", 32 | "Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.11/site-packages (from pandas->-r requirements.txt (line 3)) (2.9.0.post0)\n", 33 | "Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas->-r requirements.txt (line 3)) (2024.1)\n", 34 | "Requirement already satisfied: tzdata>=2022.7 in 
/opt/conda/lib/python3.11/site-packages (from pandas->-r requirements.txt (line 3)) (2025.1)\n", 35 | "Requirement already satisfied: anyio<5,>=3.5.0 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (4.8.0)\n", 36 | "Requirement already satisfied: distro<2,>=1.7.0 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (1.9.0)\n", 37 | "Requirement already satisfied: httpx<1,>=0.23.0 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (0.28.1)\n", 38 | "Collecting jiter<1,>=0.4.0 (from anthropic->-r requirements.txt (line 4))\n", 39 | " Downloading jiter-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)\n", 40 | "Requirement already satisfied: pydantic<3,>=1.9.0 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (2.10.6)\n", 41 | "Requirement already satisfied: sniffio in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (1.3.1)\n", 42 | "Requirement already satisfied: typing-extensions<5,>=4.10 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (4.12.2)\n", 43 | "Requirement already satisfied: filelock in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (3.17.0)\n", 44 | "Requirement already satisfied: huggingface-hub<1.0,>=0.24.0 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (0.28.0)\n", 45 | "Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (24.2)\n", 46 | "Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (6.0.2)\n", 47 | "Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.11/site-packages (from transformers->-r 
requirements.txt (line 6)) (2024.11.6)\n", 48 | "Requirement already satisfied: requests in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (2.32.3)\n", 49 | "Requirement already satisfied: tokenizers<0.22,>=0.21 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (0.21.0)\n", 50 | "Requirement already satisfied: safetensors>=0.4.1 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (0.5.2)\n", 51 | "Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (4.67.1)\n", 52 | "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (2.0.38)\n", 53 | "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (3.9.5)\n", 54 | "Requirement already satisfied: langchain-core<0.4.0,>=0.3.33 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (0.3.34)\n", 55 | "Requirement already satisfied: langchain-text-splitters<0.4.0,>=0.3.3 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (0.3.5)\n", 56 | "Requirement already satisfied: langsmith<0.4,>=0.1.17 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (0.2.11)\n", 57 | "Requirement already satisfied: tenacity!=8.4.0,<10,>=8.1.0 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (9.0.0)\n", 58 | "Requirement already satisfied: aiobotocore<3.0.0,>=2.5.4 in /opt/conda/lib/python3.11/site-packages (from s3fs->-r requirements.txt (line 9)) (2.19.0)\n", 59 | "Requirement already satisfied: fsspec==2024.10.0.* in /opt/conda/lib/python3.11/site-packages (from s3fs->-r requirements.txt (line 9)) (2024.10.0)\n", 60 | 
"Requirement already satisfied: aioitertools<1.0.0,>=0.5.1 in /opt/conda/lib/python3.11/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->-r requirements.txt (line 9)) (0.12.0)\n", 61 | "Requirement already satisfied: multidict<7.0.0,>=6.0.0 in /opt/conda/lib/python3.11/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->-r requirements.txt (line 9)) (6.1.0)\n", 62 | "Requirement already satisfied: urllib3!=2.2.0,<3,>=1.25.4 in /opt/conda/lib/python3.11/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->-r requirements.txt (line 9)) (1.26.19)\n", 63 | "Requirement already satisfied: wrapt<2.0.0,>=1.10.10 in /opt/conda/lib/python3.11/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->-r requirements.txt (line 9)) (1.17.2)\n", 64 | "Requirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain->-r requirements.txt (line 8)) (1.3.2)\n", 65 | "Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain->-r requirements.txt (line 8)) (23.2.0)\n", 66 | "Requirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain->-r requirements.txt (line 8)) (1.5.0)\n", 67 | "Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain->-r requirements.txt (line 8)) (1.18.3)\n", 68 | "Requirement already satisfied: idna>=2.8 in /opt/conda/lib/python3.11/site-packages (from anyio<5,>=3.5.0->anthropic->-r requirements.txt (line 4)) (3.10)\n", 69 | "Requirement already satisfied: certifi in /opt/conda/lib/python3.11/site-packages (from httpx<1,>=0.23.0->anthropic->-r requirements.txt (line 4)) (2024.12.14)\n", 70 | "Requirement already satisfied: httpcore==1.* in /opt/conda/lib/python3.11/site-packages (from httpx<1,>=0.23.0->anthropic->-r requirements.txt (line 4)) (1.0.7)\n", 71 | "Requirement already satisfied: 
h11<0.15,>=0.13 in /opt/conda/lib/python3.11/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->anthropic->-r requirements.txt (line 4)) (0.14.0)\n", 72 | "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /opt/conda/lib/python3.11/site-packages (from langchain-core<0.4.0,>=0.3.33->langchain->-r requirements.txt (line 8)) (1.33)\n", 73 | "Requirement already satisfied: orjson<4.0.0,>=3.9.14 in /opt/conda/lib/python3.11/site-packages (from langsmith<0.4,>=0.1.17->langchain->-r requirements.txt (line 8)) (3.10.15)\n", 74 | "Requirement already satisfied: requests-toolbelt<2.0.0,>=1.0.0 in /opt/conda/lib/python3.11/site-packages (from langsmith<0.4,>=0.1.17->langchain->-r requirements.txt (line 8)) (1.0.0)\n", 75 | "Requirement already satisfied: annotated-types>=0.6.0 in /opt/conda/lib/python3.11/site-packages (from pydantic<3,>=1.9.0->anthropic->-r requirements.txt (line 4)) (0.7.0)\n", 76 | "Requirement already satisfied: pydantic-core==2.27.2 in /opt/conda/lib/python3.11/site-packages (from pydantic<3,>=1.9.0->anthropic->-r requirements.txt (line 4)) (2.27.2)\n", 77 | "Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas->-r requirements.txt (line 3)) (1.17.0)\n", 78 | "Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests->transformers->-r requirements.txt (line 6)) (3.4.1)\n", 79 | "Requirement already satisfied: greenlet!=0.4.17 in /opt/conda/lib/python3.11/site-packages (from SQLAlchemy<3,>=1.4->langchain->-r requirements.txt (line 8)) (3.1.1)\n", 80 | "Requirement already satisfied: jsonpointer>=1.9 in /opt/conda/lib/python3.11/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.4.0,>=0.3.33->langchain->-r requirements.txt (line 8)) (3.0.0)\n", 81 | "Requirement already satisfied: propcache>=0.2.0 in /opt/conda/lib/python3.11/site-packages (from yarl<2.0,>=1.0->aiohttp<4.0.0,>=3.8.3->langchain->-r requirements.txt (line 
8)) (0.2.1)\n", 82 | "Using cached anthropic-0.49.0-py3-none-any.whl (243 kB)\n", 83 | "Using cached tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)\n", 84 | "Downloading jiter-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (351 kB)\n", 85 | "Installing collected packages: uuid, jiter, tiktoken, anthropic\n", 86 | "Successfully installed anthropic-0.49.0 jiter-0.9.0 tiktoken-0.9.0 uuid-1.30\n", 87 | "Requirement already satisfied: ipywidgets in /opt/conda/lib/python3.11/site-packages (8.1.5)\n", 88 | "Requirement already satisfied: comm>=0.1.3 in /opt/conda/lib/python3.11/site-packages (from ipywidgets) (0.2.2)\n", 89 | "Requirement already satisfied: ipython>=6.1.0 in /opt/conda/lib/python3.11/site-packages (from ipywidgets) (8.31.0)\n", 90 | "Requirement already satisfied: traitlets>=4.3.1 in /opt/conda/lib/python3.11/site-packages (from ipywidgets) (5.14.3)\n", 91 | "Requirement already satisfied: widgetsnbextension~=4.0.12 in /opt/conda/lib/python3.11/site-packages (from ipywidgets) (4.0.13)\n", 92 | "Requirement already satisfied: jupyterlab_widgets~=3.0.12 in /opt/conda/lib/python3.11/site-packages (from ipywidgets) (3.0.13)\n", 93 | "Requirement already satisfied: decorator in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (5.1.1)\n", 94 | "Requirement already satisfied: jedi>=0.16 in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (0.19.2)\n", 95 | "Requirement already satisfied: matplotlib-inline in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (0.1.7)\n", 96 | "Requirement already satisfied: pexpect>4.3 in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (4.9.0)\n", 97 | "Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (3.0.50)\n", 98 | "Requirement already satisfied: pygments>=2.4.0 in 
/opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (2.19.1)\n", 99 | "Requirement already satisfied: stack_data in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (0.6.3)\n", 100 | "Requirement already satisfied: typing_extensions>=4.6 in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (4.12.2)\n", 101 | "Requirement already satisfied: parso<0.9.0,>=0.8.4 in /opt/conda/lib/python3.11/site-packages (from jedi>=0.16->ipython>=6.1.0->ipywidgets) (0.8.4)\n", 102 | "Requirement already satisfied: ptyprocess>=0.5 in /opt/conda/lib/python3.11/site-packages (from pexpect>4.3->ipython>=6.1.0->ipywidgets) (0.7.0)\n", 103 | "Requirement already satisfied: wcwidth in /opt/conda/lib/python3.11/site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=6.1.0->ipywidgets) (0.2.13)\n", 104 | "Requirement already satisfied: executing>=1.2.0 in /opt/conda/lib/python3.11/site-packages (from stack_data->ipython>=6.1.0->ipywidgets) (2.1.0)\n", 105 | "Requirement already satisfied: asttokens>=2.1.0 in /opt/conda/lib/python3.11/site-packages (from stack_data->ipython>=6.1.0->ipywidgets) (3.0.0)\n", 106 | "Requirement already satisfied: pure_eval in /opt/conda/lib/python3.11/site-packages (from stack_data->ipython>=6.1.0->ipywidgets) (0.2.3)\n" 107 | ] 108 | } 109 | ], 110 | "source": [ 111 | "!pip install -r requirements.txt\n", 112 | "!pip install ipywidgets" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 2, 118 | "id": "76c3aefd-00b6-47c5-996a-99f44006b85d", 119 | "metadata": { 120 | "tags": [] 121 | }, 122 | "outputs": [], 123 | "source": [ 124 | "import re\n", 125 | "import pandas as pd\n", 126 | "from io import StringIO\n", 127 | "import json\n", 128 | "import time\n", 129 | "import boto3\n", 130 | "import pandas as pd\n", 131 | "import multiprocessing\n", 132 | "import subprocess\n", 133 | "import shutil\n", 134 | "import os\n", 135 | "import codecs\n", 136 | "import uuid\n", 
137 | "from transformers import LlamaTokenizer\n", 138 | "import tiktoken\n", 139 | "from transformers import AutoTokenizer\n", 140 | "REDSHIFT=boto3.client('redshift-data')\n", 141 | "S3=boto3.client('s3')\n", 142 | "from botocore.config import Config\n", 143 | "import ipywidgets as widgets\n", 144 | "from IPython.display import display\n", 145 | "\n", 146 | "config = Config(\n", 147 | " read_timeout=120,\n", 148 | " retries = dict(\n", 149 | " max_attempts = 4\n", 150 | " )\n", 151 | ")\n", 152 | "BEDROCK=boto3.client(service_name='bedrock-runtime')\n", 153 | "MIXTRAL_ENDPOINT=\"mixtral-demo\"" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "id": "c61e948f-67b0-4c72-ad33-930071495bed", 159 | "metadata": {}, 160 | "source": [ 161 | "## REDSHIFT" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "id": "e85ecd78-4240-4339-8a7b-3e77bd8d0824", 167 | "metadata": {}, 168 | "source": [ 169 | "#### Change parameters below to those of your redshift provisioned cluster" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 3, 175 | "id": "3a7227db-c014-4272-a1b7-57babd5232b4", 176 | "metadata": { 177 | "tags": [] 178 | }, 179 | "outputs": [], 180 | "source": [ 181 | "redshift_client = boto3.client('redshift-data')\n", 182 | "CLUSTER_IDENTIFIER = 'redshift-cluster-1'\n", 183 | "DATABASE = 'dev'\n", 184 | "DB_USER = 'awsuser' " 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 4, 190 | "id": "bdaacf97-6998-487f-8157-cb27d9a278bd", 191 | "metadata": { 192 | "tags": [] 193 | }, 194 | "outputs": [], 195 | "source": [ 196 | "redshift_client = boto3.client('redshift-data')\n", 197 | "CLUSTER_IDENTIFIER = 'redshift-cluster-1'\n", 198 | "DATABASE = 'dev'\n", 199 | "DB_USER = 'awsuser' " 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 5, 205 | "id": "a39647d0-ec63-43d6-9043-d34cc8e84141", 206 | "metadata": { 207 | "tags": [] 208 | }, 209 | "outputs": [], 210 | "source": [ 211 | 
"def token_counter(path):\n", 212 | " tokenizer = LlamaTokenizer.from_pretrained(path)\n", 213 | " return tokenizer\n", 214 | "def mixtral_counter(path):\n", 215 | " tokenizer = AutoTokenizer.from_pretrained(path)\n", 216 | " return tokenizer" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": 6, 222 | "id": "f19af74f-a573-4df3-853f-96b22fba34cc", 223 | "metadata": { 224 | "tags": [] 225 | }, 226 | "outputs": [], 227 | "source": [ 228 | "def query_llm(prompts,tokens): \n", 229 | " \"\"\"\n", 230 | " Function to prompt the model to generate SQL statements from natural language\n", 231 | " \"\"\"\n", 232 | " import boto3 #remove\n", 233 | " import json #remove\n", 234 | "\n", 235 | " payload = json.dumps({\n", 236 | " \"prompt\":prompts,\n", 237 | " \"temperature\": 0.1})\n", 238 | " modelId = \"mistral.mistral-small-2402-v1:0\"\n", 239 | " accept = 'application/json'\n", 240 | " contentType = 'application/json'\n", 241 | " outputText = \"\\n\"\n", 242 | " boto3_bedrock = boto3.client(service_name='bedrock-runtime', region_name=\"us-east-1\")\n", 243 | " response = boto3_bedrock.invoke_model(body=payload, modelId=modelId, accept=accept, contentType=contentType)\n", 244 | " model_response = json.loads(response[\"body\"].read())\n", 245 | " # Extract and print the response text.\n", 246 | " response_text = model_response[\"outputs\"][0][\"text\"]\n", 247 | " return response_text" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 7, 253 | "id": "83880841-47bc-490f-920c-56619d011035", 254 | "metadata": { 255 | "tags": [] 256 | }, 257 | "outputs": [], 258 | "source": [ 259 | "def qna_llm(prompts,params):\n", 260 | " \"\"\"\n", 261 | " Function to prompt the model to generate natural language answers from sql results\n", 262 | " \"\"\" \n", 263 | " if 'mixtral' in params['model_id'].lower(): \n", 264 | " import boto3\n", 265 | " import json\n", 266 | " payload = json.dumps({\n", 267 | " \"prompt\":prompts,\n", 268 | " 
\"temperature\": params['temp']})\n", 269 | " modelId = \"mistral.mistral-small-2402-v1:0\"\n", 270 | " accept = 'application/json'\n", 271 | " contentType = 'application/json'\n", 272 | " outputText = \"\\n\"\n", 273 | " boto3_bedrock = boto3.client(service_name='bedrock-runtime', region_name=\"us-east-1\")\n", 274 | " response = boto3_bedrock.invoke_model(body=payload, modelId=modelId, accept=accept, contentType=contentType)\n", 275 | "\n", 276 | " model_response = json.loads(response[\"body\"].read())\n", 277 | " # Extract and print the response text.\n", 278 | " response_text = model_response[\"outputs\"][0][\"text\"]\n", 279 | " \n", 280 | " return response_text" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 8, 286 | "id": "dec641b4-70a4-45d3-86cb-bd876d15e354", 287 | "metadata": { 288 | "tags": [] 289 | }, 290 | "outputs": [], 291 | "source": [ 292 | "def chunk_csv_rows(csv_rows, max_token_per_chunk):\n", 293 | " \"\"\"\n", 294 | " Chunk CSV rows based on the maximum token count per chunk.\n", 295 | " Args:\n", 296 | " csv_rows (list): List of CSV rows.\n", 297 | " max_token_per_chunk (int, optional): Maximum token count per chunk.\n", 298 | " Returns:\n", 299 | " list: List of chunks containing CSV rows.\n", 300 | " Raises:\n", 301 | " ValueError: If a single CSV row exceeds the specified max_token_per_chunk.\n", 302 | " \"\"\"\n", 303 | " header = csv_rows[0] # Assuming the first row is the header\n", 304 | " csv_rows = csv_rows[1:] # Remove the header from the list\n", 305 | " current_chunk = []\n", 306 | " current_token_count = 0\n", 307 | " chunks = []\n", 308 | " header_token=len(mixtral_counter(\"mistralai/Mistral-Small-24B-Instruct-2501\").encode(header))\n", 309 | " for row in csv_rows:\n", 310 | " token = len(mixtral_counter(\"mistralai/Mistral-Small-24B-Instruct-2501\").encode(row))\n", 311 | " if current_token_count + token+header_token <= max_token_per_chunk:\n", 312 | " current_chunk.append(row)\n", 313 | " 
current_token_count += token\n", 314 | " else:\n", 315 | " if not current_chunk:\n", 316 | " raise ValueError(\"A single CSV row exceeds the specified max_token_per_chunk.\")\n", 317 | " header_and_chunk=[header]+current_chunk\n", 318 | " chunks.append(\"\\n\".join([x for x in header_and_chunk]))\n", 319 | " current_chunk = [row]\n", 320 | " current_token_count = token\n", 321 | "\n", 322 | " if current_chunk:\n", 323 | " last_chunk_and_header=[header]+current_chunk\n", 324 | " chunks.append(\"\\n\".join([x for x in last_chunk_and_header]))\n", 325 | " return chunks" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": 9, 331 | "id": "248fc3be-eede-4b2c-a926-8d5c7a04f30e", 332 | "metadata": { 333 | "tags": [] 334 | }, 335 | "outputs": [], 336 | "source": [ 337 | "def get_tables_redshift(cluster_identifier, database, db_user, schema):\n", 338 | " \"\"\"\n", 339 | " Get a list of table names in a specified schema from an Amazon Redshift cluster.\n", 340 | " Args:\n", 341 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 342 | " database (str): The name of the database containing the tables.\n", 343 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 344 | " schema (str): The schema pattern to filter tables.\n", 345 | " Returns:\n", 346 | " list: A list of table names in the specified schema.\n", 347 | " \"\"\"\n", 348 | " tables_ls = REDSHIFT.list_tables(\n", 349 | " ClusterIdentifier=cluster_identifier,\n", 350 | " Database=database,\n", 351 | " DbUser=db_user,\n", 352 | " SchemaPattern=schema\n", 353 | " )\n", 354 | " return [x['name'] for x in tables_ls['Tables']]" 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": 10, 360 | "id": "47e81566-6e5e-4771-9fdd-65343c388f49", 361 | "metadata": { 362 | "tags": [] 363 | }, 364 | "outputs": [], 365 | "source": [ 366 | "def get_db_redshift(cluster_identifier, database, db_user):\n", 367 | " \"\"\"\n", 368 | " Get a list 
of databases from an Amazon Redshift cluster.\n", 369 | " Args:\n", 370 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 371 | " database (str): The name of the database containing the tables.\n", 372 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 373 | " Returns:\n", 374 | " list: A list of databases in the Redshift cluster.\n", 375 | " \"\"\"\n", 376 | " db_ls = REDSHIFT.list_databases(\n", 377 | " ClusterIdentifier=cluster_identifier,\n", 378 | " Database=database,\n", 379 | " DbUser=db_user\n", 380 | " )\n", 381 | " return db_ls['Databases']" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 11, 387 | "id": "34cf5577-7130-4302-a03f-4f5a60db0a9a", 388 | "metadata": { 389 | "tags": [] 390 | }, 391 | "outputs": [], 392 | "source": [ 393 | "def get_schema_redshift(cluster_identifier, database, db_user):\n", 394 | " \"\"\"\n", 395 | " Get a list of schemas from an Amazon Redshift cluster.\n", 396 | " Args:\n", 397 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 398 | " database (str): The name of the database containing the schemas.\n", 399 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 400 | " Returns:\n", 401 | " list: A list of schemas in the Redshift cluster.\n", 402 | " \"\"\"\n", 403 | " schema_ls = REDSHIFT.list_schemas(\n", 404 | " ClusterIdentifier=cluster_identifier,\n", 405 | " Database=database,\n", 406 | " DbUser=db_user\n", 407 | " )\n", 408 | " return schema_ls['Schemas']" 409 | ] 410 | }, 411 | { 412 | "cell_type": "code", 413 | "execution_count": 12, 414 | "id": "2f0fb8ea-4c24-4da5-860e-f9eeb73ec9e8", 415 | "metadata": {}, 416 | "outputs": [], 417 | "source": [ 418 | "def execute_query_with_pagination(sql_query, cluster_identifier, database, db_user, max_wait_seconds=300):\n", 419 | " \"\"\"\n", 420 | " Execute multiple SQL queries in Amazon Redshift with pagination support.\n", 421 | " Args:\n", 422 
| " sql_query (list): List of SQL queries to execute.\n", 423 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 424 | " database (str): The name of the database.\n", 425 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 426 | " max_wait_seconds (int): Maximum time to wait for query execution.\n", 427 | " Returns:\n", 428 | " list: A list of results from executing the SQL queries.\n", 429 | " \"\"\"\n", 430 | " results_list = []\n", 431 | " start_time = time.time()\n", 432 | "\n", 433 | " try:\n", 434 | " # Execute batch statements\n", 435 | " response_b = REDSHIFT.batch_execute_statement(\n", 436 | " ClusterIdentifier=cluster_identifier,\n", 437 | " Database=database,\n", 438 | " DbUser=db_user,\n", 439 | " Sqls=sql_query\n", 440 | " )\n", 441 | "\n", 442 | " # Monitor batch execution status\n", 443 | " while True:\n", 444 | " if time.time() - start_time > max_wait_seconds:\n", 445 | " raise TimeoutError(f\"Query execution timed out after {max_wait_seconds} seconds\")\n", 446 | "\n", 447 | " describe_b = REDSHIFT.describe_statement(\n", 448 | " Id=response_b['Id']\n", 449 | " )\n", 450 | " status = describe_b['Status']\n", 451 | "\n", 452 | " if status == 'FINISHED':\n", 453 | " break\n", 454 | " elif status == 'FAILED':\n", 455 | " error_message = describe_b.get('Error', 'Unknown error')\n", 456 | " raise RuntimeError(f\"Batch execution failed: {error_message}\")\n", 457 | " elif status == 'ABORTED':\n", 458 | " raise RuntimeError(\"Batch execution was aborted\")\n", 459 | "\n", 460 | " time.sleep(1)\n", 461 | "\n", 462 | " # Retrieve results with retry logic\n", 463 | " max_attempts = 5\n", 464 | " attempts = 0\n", 465 | "\n", 466 | " while attempts < max_attempts:\n", 467 | " try:\n", 468 | " if 'SubStatements' not in describe_b:\n", 469 | " raise RuntimeError(\"No SubStatements found in response\")\n", 470 | "\n", 471 | " for ids in describe_b['SubStatements']:\n", 472 | " if ids.get('Status') == 
'FAILED':\n", 473 | " error_message = ids.get('Error', 'Unknown error')\n", 474 | " raise RuntimeError(f\"Query failed: {error_message}\")\n", 475 | "\n", 476 | " result_b = REDSHIFT.get_statement_result(Id=ids['Id'])\n", 477 | " processed_result = get_redshift_table_result(result_b)\n", 478 | " results_list.append(processed_result)\n", 479 | " break\n", 480 | "\n", 481 | " except REDSHIFT.exceptions.ResourceNotFoundException:\n", 482 | " attempts += 1\n", 483 | " if attempts == max_attempts:\n", 484 | " raise RuntimeError(f\"Failed to retrieve results after {max_attempts} attempts\")\n", 485 | " time.sleep(2)\n", 486 | "\n", 487 | " if len(results_list) != len(sql_query):\n", 488 | " raise RuntimeError(f\"Expected {len(sql_query)} results but got {len(results_list)}\")\n", 489 | "\n", 490 | " return results_list\n", 491 | "\n", 492 | " except Exception as e:\n", 493 | " error_type = type(e).__name__\n", 494 | " error_message = str(e)\n", 495 | " \n", 496 | " if isinstance(e, TimeoutError):\n", 497 | " print(f\"Timeout error: {error_message}\")\n", 498 | " else:\n", 499 | " print(f\"Error executing queries: {error_type} - {error_message}\")\n", 500 | " raise" 501 | ] 502 | }, 503 | { 504 | "cell_type": "code", 505 | "execution_count": 13, 506 | "id": "206a9303-6fe9-4d71-b8cf-84b42f3bb5a0", 507 | "metadata": { 508 | "tags": [] 509 | }, 510 | "outputs": [], 511 | "source": [ 512 | "def execute_query_with_pagination( sql_query, cluster_identifier, database, db_user):\n", 513 | " \"\"\"\n", 514 | " Execute multiple SQL queries in Amazon Redshift as a single batch.\n", 515 | " Args:\n", 516 | " sql_query (list): List of SQL statements to run as one batch;\n", 517 | " results are returned in the same order.\n", 518 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 519 | " database (str): The name of the database.\n", 520 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 521 | " Returns:\n", 522 | " list: A 
list of results from executing the SQL queries.\n", 523 | " \"\"\"\n", 524 | " results_list=[]\n", 525 | " response_b = REDSHIFT.batch_execute_statement(\n", 526 | " ClusterIdentifier=cluster_identifier,\n", 527 | " Database=database,\n", 528 | " DbUser=db_user,\n", 529 | " Sqls=sql_query\n", 530 | " )\n", 531 | " describe_b=REDSHIFT.describe_statement(\n", 532 | " Id=response_b['Id'],\n", 533 | " )\n", 534 | " status=describe_b['Status']\n", 535 | " while status != \"FINISHED\":\n", 536 | " time.sleep(1)\n", 537 | " describe_b=REDSHIFT.describe_statement(\n", 538 | " Id=response_b['Id'],\n", 539 | " ) \n", 540 | " status=describe_b['Status']\n", 541 | " max_attempts = 5 \n", 542 | " attempts = 0\n", 543 | " while attempts < max_attempts:\n", 544 | " try:\n", 545 | " for ids in describe_b['SubStatements']:\n", 546 | " result_b = REDSHIFT.get_statement_result(Id=ids['Id']) \n", 547 | " results_list.append(get_redshift_table_result(result_b))\n", 548 | " break\n", 549 | " except REDSHIFT.exceptions.ResourceNotFoundException as e:\n", 550 | " attempts += 1\n", 551 | " time.sleep(2)\n", 552 | " print(\"Returning results: \" + str(results_list))\n", 553 | " return results_list" 554 | ] 555 | }, 556 | { 557 | "cell_type": "code", 558 | "execution_count": 14, 559 | "id": "bb1c043c-60d0-4d34-8fdd-886d6120c40f", 560 | "metadata": { 561 | "tags": [] 562 | }, 563 | "outputs": [], 564 | "source": [ 565 | "def get_redshift_table_result(response):\n", 566 | " \"\"\"\n", 567 | " Extracts result data from a Redshift query response and returns it as a CSV string.\n", 568 | " Args:\n", 569 | " response (dict): The response object from a Redshift query.\n", 570 | " Returns:\n", 571 | " str: A CSV string containing the result data.\n", 572 | " \"\"\"\n", 573 | " print(\"Working with query response: \" + str(response))\n", 574 | " columns = [c['name'] for c in response['ColumnMetadata']] \n", 575 | " data = []\n", 576 | " for r in response['Records']:\n", 577 | " row = []\n", 578 | " 
for col in r:\n", 579 | " row.append(list(col.values())[0]) \n", 580 | " data.append(row)\n", 581 | " df = pd.DataFrame(data, columns=columns) \n", 582 | " return df.to_csv(index=False)" 583 | ] 584 | }, 585 | { 586 | "cell_type": "code", 587 | "execution_count": 15, 588 | "id": "8c24a0b9-ed77-4222-bc87-af828502e2ed", 589 | "metadata": { 590 | "tags": [] 591 | }, 592 | "outputs": [], 593 | "source": [ 594 | "def execute_query_redshift(sql_query, cluster_identifier, database, db_user):\n", 595 | " \"\"\"\n", 596 | " Execute a SQL query on an Amazon Redshift cluster.\n", 597 | " Args:\n", 598 | " sql_query (str): The SQL query to execute.\n", 599 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 600 | " database (str): The name of the database.\n", 601 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 602 | " Returns:\n", 603 | " dict: The response object from executing the SQL query.\n", 604 | " \"\"\"\n", 605 | " response = REDSHIFT.execute_statement(\n", 606 | " ClusterIdentifier=cluster_identifier,\n", 607 | " Database=database,\n", 608 | " DbUser=db_user,\n", 609 | " Sql=sql_query\n", 610 | " )\n", 611 | " return response" 612 | ] 613 | }, 614 | { 615 | "cell_type": "code", 616 | "execution_count": 16, 617 | "id": "1a7c4ef0-9166-4faf-a820-0c03d8974e87", 618 | "metadata": { 619 | "tags": [] 620 | }, 621 | "outputs": [], 622 | "source": [ 623 | "def single_execute_query(sql_query, cluster_identifier, database, db_user,question):\n", 624 | " \"\"\"\n", 625 | " Execute a single SQL query on an Amazon Redshift cluster and process the result.\n", 626 | "\n", 627 | " Args:\n", 628 | " sql_query (str): The SQL query to execute.\n", 629 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 630 | " database (str): The name of the database.\n", 631 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 632 | " question (str): A descriptive label or question associated 
with the query.\n", 633 | "\n", 634 | " Returns:\n", 635 | " tuple: (result, sql), where result is the query result as a CSV string (or a failure message) and sql is the statement that finally ran.\n", 636 | "\n", 637 | " \"\"\"\n", 638 | " # Relies on the notebook-level `params` dict being defined before this function is called.\n", 639 | " response = execute_query_redshift(sql_query, cluster_identifier, database, db_user)\n", 640 | " df=redshift_querys(sql_query,response,question,params,cluster_identifier, database, db_user,question) \n", 641 | " return df" 642 | ] 643 | }, 644 | { 645 | "cell_type": "code", 646 | "execution_count": 17, 647 | "id": "30c4b2f5-65e8-4c9c-b0bc-f19406d7838c", 648 | "metadata": { 649 | "tags": [] 650 | }, 651 | "outputs": [], 652 | "source": [ 653 | "def llm_debugger(question, statement, error, params): \n", 654 | " \"\"\"\n", 655 | " Generate debugging guidance and expected SQL correction for a PostgreSQL error.\n", 656 | " Args:\n", 657 | " question (str): The user's question or intent.\n", 658 | " statement (str): The SQL statement that caused the error.\n", 659 | " error (str): The error message encountered.\n", 660 | " params (dict): Additional parameters including schema, sample data, and length.\n", 661 | " Returns:\n", 662 | " str: Formatted debugging guidance and expected SQL correction.\n", 663 | " \"\"\"\n", 664 | " prompts=f'''<>[INST]\n", 665 | "You are a PostgreSQL developer who is an expert at debugging errors. 
\n", 666 | "\n", 667 | "Here is the schema definition of the table(s):\n", 668 | "{params['schema']}\n", 669 | "#############################\n", 670 | "Here are example records for each table:\n", 671 | "{params['sample']}\n", 672 | "#############################\n", 673 | "Here is the sql statement that threw the error below:\n", 674 | "{statement}\n", 675 | "#############################\n", 676 | "Here is the error to debug:\n", 677 | "{error}\n", 678 | "#############################\n", 679 | "Here is the intent of the user:\n", 680 | "{params['prompt']}\n", 681 | "<>\n", 682 | "First understand the error and think about how you can fix the error.\n", 683 | "Use the provided schema and sample row to guide your thought process for a solution.\n", 684 | "Do all this thinking inside XML tags. This is a space for you to write down relevant content and will not be shown to the user.\n", 685 | "\n", 686 | "Once you are done debugging, provide the correct SQL statement without any additional text.\n", 687 | "When generating the correct SQL statement:\n", 688 | "1. Pay attention to the schema and table name and use them correctly in your generated sql. \n", 689 | "2. Never query for all columns from a table unless the question says so. You must query only the columns that are needed to answer the question.\n", 690 | "3. Wrap each column name in double quotes (\") to denote them as delimited identifiers. Do not use backslash (\\) to escape underscores (_) in column names. 
\n", 691 | "\n", 692 | "Format your response as:\n", 693 | " Correct SQL Statement [/INST]'''\n", 694 | "\n", 695 | " answer=query_llm(prompts,round(params['sql-len']))\n", 696 | " answer = answer.replace(\"\\\\\",\"\")\n", 697 | " return answer" 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": 18, 703 | "id": "aae1ae4a-1171-48c4-9799-c0d57566015b", 704 | "metadata": { 705 | "tags": [] 706 | }, 707 | "outputs": [], 708 | "source": [ 709 | "def redshift_querys(q_s,response,prompt,params,cluster_identifier, database, db_user,question): \n", 710 | " \"\"\"\n", 711 | " Execute a Redshift query, handle errors, debug SQL, and return the result.\n", 712 | "\n", 713 | " Args:\n", 714 | " q_s (str): The SQL statement to execute or debug.\n", 715 | " response (dict): The response object from executing the SQL statement.\n", 716 | " prompt (str): The user's question or intent.\n", 717 | " params (dict): Additional parameters including schema, sample data, and length.\n", 718 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 719 | " database (str): The name of the database.\n", 720 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 721 | " question (str): A descriptive label or question associated with the query.\n", 722 | "\n", 723 | " Returns:\n", 724 | " pandas.DataFrame or str: DataFrame containing the query result, or debugging failure message with no result.\n", 725 | "\n", 726 | " \"\"\"\n", 727 | " max_execution=5\n", 728 | " attempt_number=0\n", 729 | " debug_count=max_execution\n", 730 | " try:\n", 731 | " statement_result = REDSHIFT.get_statement_result(\n", 732 | " Id=response['Id'],\n", 733 | "\n", 734 | " )\n", 735 | " except REDSHIFT.exceptions.ResourceNotFoundException as err: \n", 736 | " describe_statement=REDSHIFT.describe_statement(\n", 737 | " Id=response['Id'],\n", 738 | " )\n", 739 | " query_state=describe_statement['Status'] \n", 740 | " while query_state in 
['SUBMITTED','PICKED','STARTED']:\n", 741 | " time.sleep(1)\n", 742 | " describe_statement=REDSHIFT.describe_statement(\n", 743 | " Id=response['Id'],\n", 744 | " )\n", 745 | " query_state=describe_statement['Status']\n", 746 | " while (max_execution > 0 and query_state == \"FAILED\"):\n", 747 | " max_execution = max_execution - 1\n", 748 | " attempt_number = 5 - max_execution\n", 749 | " print(\"- - - - - - - - - - - - - -\\n\")\n", 750 | " print(f\"\\nDEBUG TRIAL {attempt_number}\")\n", 751 | " bad_sql=describe_statement['QueryString']\n", 752 | " print(f\"\\nBAD SQL:\\n{bad_sql}\") \n", 753 | " error=describe_statement['Error']\n", 754 | " print(f\"ERROR:{error}\")\n", 755 | " print(\"\\nDEBUGGING...\")\n", 756 | " cql=llm_debugger(prompt, bad_sql, error, params) \n", 757 | " idx1 = cql.index('')\n", 758 | " idx2 = cql.index('')\n", 759 | " q_s=cql[idx1 + len('') + 1: idx2]\n", 760 | " print(f\"\\nDEBUGGED SQL {q_s}\")\n", 761 | " response = execute_query_redshift(q_s, cluster_identifier, database, db_user)\n", 762 | " describe_statement=REDSHIFT.describe_statement(\n", 763 | " Id=response['Id'],\n", 764 | " )\n", 765 | " query_state=describe_statement['Status']\n", 766 | " # print(f\"\\n{query_state}\")\n", 767 | " while query_state in ['SUBMITTED','PICKED','STARTED']:\n", 768 | " time.sleep(2)\n", 769 | " describe_statement=REDSHIFT.describe_statement(\n", 770 | " Id=response['Id'],\n", 771 | " )\n", 772 | " query_state=describe_statement['Status']\n", 773 | " if query_state == \"FINISHED\": \n", 774 | " break \n", 775 | " \n", 776 | " if max_execution == 0 and query_state == \"FAILED\":\n", 777 | " print(f\"DEBUGGING FAILED IN {str(debug_count)} ATTEMPTS\")\n", 778 | " else: \n", 779 | " max_attempts = 5\n", 780 | " attempts = 0\n", 781 | " while attempts < max_attempts:\n", 782 | " try:\n", 783 | " time.sleep(1)\n", 784 | " statement_result = REDSHIFT.get_statement_result(\n", 785 | " Id=response['Id']\n", 786 | " )\n", 787 | " break\n", 788 | "\n", 789 | " 
except REDSHIFT.exceptions.ResourceNotFoundException as e:\n", 790 | " attempts += 1\n", 791 | " time.sleep(5)\n", 792 | " if max_execution == 0 and query_state == \"FAILED\":\n", 793 | " df=f\"DEBUGGING FAILED IN {str(debug_count)} ATTEMPTS. NO RESULT AVAILABLE\"\n", 794 | " else:\n", 795 | " df=get_redshift_table_result(statement_result)\n", 796 | " return df, q_s" 797 | ] 798 | }, 799 | { 800 | "cell_type": "code", 801 | "execution_count": 19, 802 | "id": "3cab616f-b529-4646-961f-0bb1e420e018", 803 | "metadata": { 804 | "tags": [] 805 | }, 806 | "outputs": [], 807 | "source": [ 808 | "def redshift_qna(params):\n", 809 | " \"\"\"\n", 810 | " Execute a Q&A process for generating SQL queries based on user questions.\n", 811 | " Args:\n", 812 | " params (dict): A dictionary containing parameters including table name, database name, prompt, etc.\n", 813 | " Returns:\n", 814 | " tuple: A tuple containing the response, generated SQL statement, and query output.\n", 815 | " \"\"\"\n", 816 | " # sql1=f\"SELECT * FROM information_schema.columns WHERE table_name='{params['table']}' AND table_schema='{params['db']}'\"\n", 817 | " # sql2=f\"SELECT * from dev.{params['db']}.{params['table']} LIMIT 10\"\n", 818 | " sql1=f\"SELECT table_catalog,table_schema,table_name,column_name,ordinal_position,is_nullable,data_type FROM information_schema.columns WHERE table_schema='{params['db']}'\"\n", 819 | " sql2=[]\n", 820 | " for table in params['tables']:\n", 821 | " sql2.append(f\"SELECT * from {params['db']}.{table} LIMIT 3\")\n", 822 | " sqls=[sql1]+sql2\n", 823 | " print(sqls)\n", 824 | " question=params['prompt']\n", 825 | " results=execute_query_with_pagination(sqls, CLUSTER_IDENTIFIER, db, DB_USER) \n", 826 | " col_names=results[0].split('\\n')[0]\n", 827 | " observations=\"\\n\".join(sorted(results[0].split('\\n')[1:])).strip()\n", 828 | " params['schema']=f\"{col_names}\\n{observations}\"\n", 829 | " params['sample']=''\n", 830 | " for examples in results[1:]:\n", 831 | " 
params['sample']+=f\"{examples}\\n\\n\"\n", 832 | " \n", 833 | " prompts=f\"\"\"<>[INST]\n", 834 | "You are an expert PostgreSQL developer. Your job is to provide a syntactically correct PostgreSQL query given a user question.\n", 835 | "Here is the schema definition of the table(s):\n", 836 | "########\n", 837 | "{params['schema']}\n", 838 | "########\n", 839 | "\n", 840 | "Here are example records for each table:\n", 841 | "##########\n", 842 | "{params['sample']}\n", 843 | "###########\n", 844 | "<>\n", 845 | "Here are some instructions when generating SQL statements:\n", 846 | "1. Determine the necessary table(s) and schema needed for an accurate query.\n", 847 | "2. Limit your queries to only the required columns to prevent unnecessary data retrieval and improve query performance.\n", 848 | "3. For clarity and to prevent potential conflicts, always include the schema name when referencing table names in your SQL queries.\n", 849 | "4. When working with Amazon Redshift table and column names containing underscores, do not use the backslash escape character (\\). Instead, use double quotes (\"\") to enclose the names in your queries.\n", 850 | "5. Do not mention 'dev' or 'public' in the queries.\n", 851 | "In your response, provide a single SQL statement to answer the question, and avoid any additional text that would cause the SQL to fail during execution. 
\n", 852 | "Format your response as:\n", 853 | "\n", 854 | "generated SQL statement \n", 855 | "\n", 856 | "\n", 857 | "Question: {question}[/INST]\"\"\"\n", 858 | "\n", 859 | " print(prompts)\n", 860 | " q_s=query_llm(prompts,200)\n", 861 | " sql_pattern = re.compile(r'(.*?)(?:|$)', re.DOTALL) \n", 862 | " sql_match = re.search(sql_pattern, q_s)\n", 863 | " q_s = sql_match.group(1) \n", 864 | " q_s = q_s.replace(\"\\\\\",\"\")\n", 865 | " print(f\" FIRST ATTEMPT SQL:\\n{q_s}\")\n", 866 | " output, q_s=single_execute_query(q_s, CLUSTER_IDENTIFIER, db, DB_USER,question) \n", 867 | " prompts=f'''<>[INST]You are a helpful and truthful assistant. Your job is to examine a sql statement and its generated result, then provide a response to my question.\n", 868 | "\n", 869 | "Here is the sql query:\n", 870 | "{q_s}\n", 871 | "\n", 872 | "Here is the corresponding sql query result:\n", 873 | "{output}\n", 874 | "<>\n", 875 | "question: {question}\n", 876 | "\n", 877 | "When providing your response:\n", 878 | "- First, review the sql query and the corresponding result. 
Then provide a complete answer to my question, based on the result.\n", 879 | "- If you can't answer the question, please say so[/INST]'''\n", 880 | " response=qna_llm(prompts, params) \n", 881 | " return response, q_s,output" 882 | ] 883 | }, 884 | { 885 | "cell_type": "code", 886 | "execution_count": 21, 887 | "id": "f69fcae3-32f0-4048-a2cf-74183b928182", 888 | "metadata": { 889 | "tags": [] 890 | }, 891 | "outputs": [ 892 | { 893 | "data": { 894 | "text/plain": [ 895 | "('sample_data_dev',\n", 896 | " 'tickit',\n", 897 | " ['category', 'date', 'event', 'listing', 'sales', 'users', 'venue'])" 898 | ] 899 | }, 900 | "execution_count": 21, 901 | "metadata": {}, 902 | "output_type": "execute_result" 903 | } 904 | ], 905 | "source": [ 906 | "#db=get_db_redshift(CLUSTER_IDENTIFIER, DATABASE, DB_USER)[1]\n", 907 | "#schm=get_schema_redshift(CLUSTER_IDENTIFIER, db, DB_USER)[-1]\n", 908 | "db='sample_data_dev'\n", 909 | "schm = 'tickit'\n", 910 | "tables=get_tables_redshift(CLUSTER_IDENTIFIER, db, DB_USER,schm)\n", 911 | "db, schm, tables" 912 | ] 913 | }, 914 | { 915 | "cell_type": "markdown", 916 | "id": "2ec93e0a-21cc-42e6-8a38-df630590500d", 917 | "metadata": {}, 918 | "source": [ 919 | "#### Example prompts:" 920 | ] 921 | }, 922 | { 923 | "cell_type": "code", 924 | "execution_count": 22, 925 | "id": "53a0781d-9ff2-4cd2-b85b-34a4322e96c7", 926 | "metadata": { 927 | "tags": [] 928 | }, 929 | "outputs": [], 930 | "source": [ 931 | "prompt1 = \"What is the number of Venues where the show titled Macbeth was held?\"" 932 | ] 933 | }, 934 | { 935 | "cell_type": "code", 936 | "execution_count": 23, 937 | "id": "cf8c2dd2-d2a7-4155-ab3d-b6f206cc601a", 938 | "metadata": { 939 | "tags": [] 940 | }, 941 | "outputs": [], 942 | "source": [ 943 | "prompt2 = \"For the top 10 events, count the number of times each of them occurs.\"" 944 | ] 945 | }, 946 | { 947 | "cell_type": "code", 948 | "execution_count": 24, 949 | "id": "2803aff4-7484-4cab-b231-d264e2747fcd", 950 | 
"metadata": { 951 | "tags": [] 952 | }, 953 | "outputs": [], 954 | "source": [ 955 | "prompt3 = \"What were the total Commissions Generated for Macbeth at Royce Hall?\"" 956 | ] 957 | }, 958 | { 959 | "cell_type": "code", 960 | "execution_count": 25, 961 | "id": "7e64de7c-e8e7-416b-b60b-694d49481bff", 962 | "metadata": { 963 | "tags": [] 964 | }, 965 | "outputs": [], 966 | "source": [ 967 | "prompt4 = \"the most popular state to host events based on the number of venues per state.\"" 968 | ] 969 | }, 970 | { 971 | "cell_type": "code", 972 | "execution_count": 26, 973 | "id": "b304c971-ee2c-453c-9358-bd4f0fa371d7", 974 | "metadata": { 975 | "tags": [] 976 | }, 977 | "outputs": [ 978 | { 979 | "data": { 980 | "application/vnd.jupyter.widget-view+json": { 981 | "model_id": "8ad67ceec8a8497b9a0aed04c58c62c9", 982 | "version_major": 2, 983 | "version_minor": 0 984 | }, 985 | "text/plain": [ 986 | "Text(value='', description='Enter prompt:')" 987 | ] 988 | }, 989 | "metadata": {}, 990 | "output_type": "display_data" 991 | } 992 | ], 993 | "source": [ 994 | "entered_text = widgets.Text(\n", 995 | " value='',\n", 996 | " description='Enter prompt:',\n", 997 | ")\n", 998 | "display(entered_text)" 999 | ] 1000 | }, 1001 | { 1002 | "cell_type": "code", 1003 | "execution_count": 27, 1004 | "id": "6baba3e2-8c15-4b08-941f-e28a61094073", 1005 | "metadata": { 1006 | "tags": [] 1007 | }, 1008 | "outputs": [ 1009 | { 1010 | "name": "stdout", 1011 | "output_type": "stream", 1012 | "text": [ 1013 | "Prompt:\n", 1014 | "\n", 1015 | " What is the number of Venues where the show titled Macbeth was held?\n" 1016 | ] 1017 | } 1018 | ], 1019 | "source": [ 1020 | "prompt = entered_text.value\n", 1021 | "params={'sql-len':700,'text-token':500,'tables':tables,'db':schm,'temp':0.1,'model_id':'mixtral',\"prompt\":prompt}\n", 1022 | "print(f\"Prompt:\\n\\n {params['prompt']}\")" 1023 | ] 1024 | }, 1025 | { 1026 | "cell_type": "code", 1027 | "execution_count": 28, 1028 | "id": 
"642ae516-41e5-453e-be1a-ca7de1c1e9ec", 1029 | "metadata": { 1030 | "tags": [] 1031 | }, 1032 | "outputs": [ 1033 | { 1034 | "name": "stdout", 1035 | "output_type": "stream", 1036 | "text": [ 1037 | "[\"SELECT table_catalog,table_schema,table_name,column_name,ordinal_position,is_nullable,data_type FROM information_schema.columns WHERE table_schema='tickit'\", 'SELECT * from tickit.category LIMIT 3', 'SELECT * from tickit.date LIMIT 3', 'SELECT * from tickit.event LIMIT 3', 'SELECT * from tickit.listing LIMIT 3', 'SELECT * from tickit.sales LIMIT 3', 'SELECT * from tickit.users LIMIT 3', 'SELECT * from tickit.venue LIMIT 3']\n", 1038 | "Starting query execution.\n", 1039 | "Working with query response: {'ColumnMetadata': [{'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'table_catalog', 'length': 0, 'name': 'table_catalog', 'nullable': 1, 'precision': 65535, 'scale': 0, 'schemaName': 'information_schema', 'tableName': 'columns', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'table_schema', 'length': 0, 'name': 'table_schema', 'nullable': 1, 'precision': 65535, 'scale': 0, 'schemaName': 'information_schema', 'tableName': 'columns', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'table_name', 'length': 0, 'name': 'table_name', 'nullable': 1, 'precision': 65535, 'scale': 0, 'schemaName': 'information_schema', 'tableName': 'columns', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'column_name', 'length': 0, 'name': 'column_name', 'nullable': 1, 'precision': 65535, 'scale': 0, 'schemaName': 'information_schema', 'tableName': 'columns', 'typeName': 'varchar'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'ordinal_position', 'length': 0, 'name': 'ordinal_position', 'nullable': 1, 'precision': 10, 'scale': 0, 'schemaName': 'information_schema', 'tableName': 
'columns', 'typeName': 'int4'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'is_nullable', 'length': 0, 'name': 'is_nullable', 'nullable': 1, 'precision': 65535, 'scale': 0, 'schemaName': 'information_schema', 'tableName': 'columns', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'data_type', 'length': 0, 'name': 'data_type', 'nullable': 1, 'precision': 65535, 'scale': 0, 'schemaName': 'information_schema', 'tableName': 'columns', 'typeName': 'varchar'}], 'Records': [[{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'date'}, {'stringValue': 'holiday'}, {'longValue': 8}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'likemusicals'}, {'longValue': 18}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'likebroadway'}, {'longValue': 17}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'likevegas'}, {'longValue': 16}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'likerock'}, {'longValue': 15}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'likeopera'}, {'longValue': 14}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'likeclassical'}, {'longValue': 13}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, 
{'stringValue': 'likejazz'}, {'longValue': 12}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'likeconcerts'}, {'longValue': 11}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'liketheatre'}, {'longValue': 10}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'likesports'}, {'longValue': 9}, {'stringValue': 'YES'}, {'stringValue': 'boolean'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'sales'}, {'stringValue': 'qtysold'}, {'longValue': 7}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'listing'}, {'stringValue': 'numtickets'}, {'longValue': 5}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'event'}, {'stringValue': 'catid'}, {'longValue': 3}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'event'}, {'stringValue': 'venueid'}, {'longValue': 2}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'date'}, {'stringValue': 'year'}, {'longValue': 7}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'date'}, {'stringValue': 'week'}, {'longValue': 4}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'sales'}, {'stringValue': 'dateid'}, {'longValue': 6}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], 
[{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'listing'}, {'stringValue': 'dateid'}, {'longValue': 4}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'event'}, {'stringValue': 'dateid'}, {'longValue': 4}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'date'}, {'stringValue': 'dateid'}, {'longValue': 1}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'category'}, {'stringValue': 'catid'}, {'longValue': 1}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'venue'}, {'stringValue': 'venueid'}, {'longValue': 1}, {'stringValue': 'NO'}, {'stringValue': 'smallint'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'sales'}, {'stringValue': 'eventid'}, {'longValue': 5}, {'stringValue': 'NO'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'sales'}, {'stringValue': 'buyerid'}, {'longValue': 4}, {'stringValue': 'NO'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'sales'}, {'stringValue': 'sellerid'}, {'longValue': 3}, {'stringValue': 'NO'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'sales'}, {'stringValue': 'listid'}, {'longValue': 2}, {'stringValue': 'NO'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'sales'}, {'stringValue': 'salesid'}, {'longValue': 1}, {'stringValue': 'NO'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'listing'}, {'stringValue': 
'eventid'}, {'longValue': 3}, {'stringValue': 'NO'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'listing'}, {'stringValue': 'sellerid'}, {'longValue': 2}, {'stringValue': 'NO'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'listing'}, {'stringValue': 'listid'}, {'longValue': 1}, {'stringValue': 'NO'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'event'}, {'stringValue': 'eventid'}, {'longValue': 1}, {'stringValue': 'NO'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'venue'}, {'stringValue': 'venueseats'}, {'longValue': 5}, {'stringValue': 'YES'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'userid'}, {'longValue': 1}, {'stringValue': 'NO'}, {'stringValue': 'integer'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'date'}, {'stringValue': 'qtr'}, {'longValue': 6}, {'stringValue': 'NO'}, {'stringValue': 'character'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'date'}, {'stringValue': 'month'}, {'longValue': 5}, {'stringValue': 'NO'}, {'stringValue': 'character'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'date'}, {'stringValue': 'day'}, {'longValue': 3}, {'stringValue': 'NO'}, {'stringValue': 'character'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'venue'}, {'stringValue': 'venuestate'}, {'longValue': 4}, {'stringValue': 'YES'}, {'stringValue': 'character'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'phone'}, {'longValue': 8}, {'stringValue': 'YES'}, {'stringValue': 'character'}], [{'stringValue': 
'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'state'}, {'longValue': 6}, {'stringValue': 'YES'}, {'stringValue': 'character'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'username'}, {'longValue': 2}, {'stringValue': 'YES'}, {'stringValue': 'character'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'event'}, {'stringValue': 'eventname'}, {'longValue': 5}, {'stringValue': 'YES'}, {'stringValue': 'character varying'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'category'}, {'stringValue': 'catdesc'}, {'longValue': 4}, {'stringValue': 'YES'}, {'stringValue': 'character varying'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'category'}, {'stringValue': 'catname'}, {'longValue': 3}, {'stringValue': 'YES'}, {'stringValue': 'character varying'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'category'}, {'stringValue': 'catgroup'}, {'longValue': 2}, {'stringValue': 'YES'}, {'stringValue': 'character varying'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'venue'}, {'stringValue': 'venuecity'}, {'longValue': 3}, {'stringValue': 'YES'}, {'stringValue': 'character varying'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'venue'}, {'stringValue': 'venuename'}, {'longValue': 2}, {'stringValue': 'YES'}, {'stringValue': 'character varying'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'email'}, {'longValue': 7}, {'stringValue': 'YES'}, {'stringValue': 'character varying'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'city'}, {'longValue': 5}, {'stringValue': 'YES'}, {'stringValue': 'character varying'}], [{'stringValue': 
'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'lastname'}, {'longValue': 4}, {'stringValue': 'YES'}, {'stringValue': 'character varying'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'users'}, {'stringValue': 'firstname'}, {'longValue': 3}, {'stringValue': 'YES'}, {'stringValue': 'character varying'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'date'}, {'stringValue': 'caldate'}, {'longValue': 2}, {'stringValue': 'NO'}, {'stringValue': 'date'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'sales'}, {'stringValue': 'commission'}, {'longValue': 9}, {'stringValue': 'YES'}, {'stringValue': 'numeric'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'sales'}, {'stringValue': 'pricepaid'}, {'longValue': 8}, {'stringValue': 'YES'}, {'stringValue': 'numeric'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'listing'}, {'stringValue': 'totalprice'}, {'longValue': 7}, {'stringValue': 'YES'}, {'stringValue': 'numeric'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'listing'}, {'stringValue': 'priceperticket'}, {'longValue': 6}, {'stringValue': 'YES'}, {'stringValue': 'numeric'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'sales'}, {'stringValue': 'saletime'}, {'longValue': 10}, {'stringValue': 'YES'}, {'stringValue': 'timestamp without time zone'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'listing'}, {'stringValue': 'listtime'}, {'longValue': 8}, {'stringValue': 'YES'}, {'stringValue': 'timestamp without time zone'}], [{'stringValue': 'sample_data_dev'}, {'stringValue': 'tickit'}, {'stringValue': 'event'}, {'stringValue': 'starttime'}, {'longValue': 6}, {'stringValue': 'YES'}, {'stringValue': 'timestamp without time zone'}]], 'TotalNumRows': 59, 
'ResponseMetadata': {'RequestId': 'c995feb3-d93f-4760-879e-0aecf743f50d', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'c995feb3-d93f-4760-879e-0aecf743f50d', 'content-type': 'application/x-amz-json-1.1', 'content-length': '12236', 'date': 'Fri, 28 Mar 2025 16:08:29 GMT'}, 'RetryAttempts': 0}}\n", 1040 | "Working with query response: {'ColumnMetadata': [{'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'catid', 'length': 0, 'name': 'catid', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'category', 'typeName': 'int2'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'catgroup', 'length': 0, 'name': 'catgroup', 'nullable': 1, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'category', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'catname', 'length': 0, 'name': 'catname', 'nullable': 1, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'category', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'catdesc', 'length': 0, 'name': 'catdesc', 'nullable': 1, 'precision': 50, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'category', 'typeName': 'varchar'}], 'Records': [[{'longValue': 5}, {'stringValue': 'Sports'}, {'stringValue': 'MLS'}, {'stringValue': 'Major League Soccer'}], [{'longValue': 1}, {'stringValue': 'Sports'}, {'stringValue': 'MLB'}, {'stringValue': 'Major League Baseball'}], [{'longValue': 6}, {'stringValue': 'Shows'}, {'stringValue': 'Musicals'}, {'stringValue': 'Musical theatre'}]], 'TotalNumRows': 3, 'ResponseMetadata': {'RequestId': '702abd11-8c4d-485e-b6ab-fbd5a14e223e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '702abd11-8c4d-485e-b6ab-fbd5a14e223e', 'content-type': 'application/x-amz-json-1.1', 'content-length': '1198', 'date': 'Fri, 28 Mar 2025 16:08:29 GMT'}, 'RetryAttempts': 0}}\n", 1041 | "Working 
with query response: {'ColumnMetadata': [{'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'dateid', 'length': 0, 'name': 'dateid', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'date', 'typeName': 'int2'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'caldate', 'length': 0, 'name': 'caldate', 'nullable': 0, 'precision': 13, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'date', 'typeName': 'date'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'day', 'length': 0, 'name': 'day', 'nullable': 0, 'precision': 3, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'date', 'typeName': 'bpchar'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'week', 'length': 0, 'name': 'week', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'date', 'typeName': 'int2'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'month', 'length': 0, 'name': 'month', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'date', 'typeName': 'bpchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'qtr', 'length': 0, 'name': 'qtr', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'date', 'typeName': 'bpchar'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'year', 'length': 0, 'name': 'year', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'date', 'typeName': 'int2'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'holiday', 'length': 0, 'name': 'holiday', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'date', 'typeName': 'bool'}], 'Records': [[{'longValue': 1827}, {'stringValue': '2008-01-01'}, {'stringValue': 'WE '}, {'longValue': 1}, {'stringValue': 'JAN '}, {'stringValue': '1 '}, 
{'longValue': 2008}, {'booleanValue': True}], [{'longValue': 1843}, {'stringValue': '2008-01-17'}, {'stringValue': 'FR '}, {'longValue': 3}, {'stringValue': 'JAN '}, {'stringValue': '1 '}, {'longValue': 2008}, {'booleanValue': False}], [{'longValue': 1845}, {'stringValue': '2008-01-19'}, {'stringValue': 'SU '}, {'longValue': 4}, {'stringValue': 'JAN '}, {'stringValue': '1 '}, {'longValue': 2008}, {'booleanValue': False}]], 'TotalNumRows': 3, 'ResponseMetadata': {'RequestId': '607a258d-0bc2-4f00-b05f-e2d1564659d3', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '607a258d-0bc2-4f00-b05f-e2d1564659d3', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2181', 'date': 'Fri, 28 Mar 2025 16:08:29 GMT'}, 'RetryAttempts': 0}}\n", 1042 | "Working with query response: {'ColumnMetadata': [{'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'eventid', 'length': 0, 'name': 'eventid', 'nullable': 0, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'event', 'typeName': 'int4'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'venueid', 'length': 0, 'name': 'venueid', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'event', 'typeName': 'int2'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'catid', 'length': 0, 'name': 'catid', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'event', 'typeName': 'int2'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'dateid', 'length': 0, 'name': 'dateid', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'event', 'typeName': 'int2'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'eventname', 'length': 0, 'name': 'eventname', 'nullable': 1, 'precision': 200, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'event', 'typeName': 'varchar'}, {'isCaseSensitive': False, 'isCurrency': 
False, 'isSigned': False, 'label': 'starttime', 'length': 0, 'name': 'starttime', 'nullable': 1, 'precision': 29, 'scale': 6, 'schemaName': 'tickit', 'tableName': 'event', 'typeName': 'timestamp'}], 'Records': [[{'longValue': 1334}, {'longValue': 208}, {'longValue': 6}, {'longValue': 1827}, {'stringValue': 'The King and I'}, {'stringValue': '2008-01-01 14:30:00'}], [{'longValue': 4850}, {'longValue': 91}, {'longValue': 9}, {'longValue': 1827}, {'stringValue': 'Zappa Plays Zappa'}, {'stringValue': '2008-01-01 14:00:00'}], [{'longValue': 6440}, {'longValue': 71}, {'longValue': 9}, {'longValue': 1827}, {'stringValue': 'Beck'}, {'stringValue': '2008-01-01 19:00:00'}]], 'TotalNumRows': 3, 'ResponseMetadata': {'RequestId': '8d259589-2ca0-4e14-9ee6-f71c5ce0ea7e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '8d259589-2ca0-4e14-9ee6-f71c5ce0ea7e', 'content-type': 'application/x-amz-json-1.1', 'content-length': '1714', 'date': 'Fri, 28 Mar 2025 16:08:29 GMT'}, 'RetryAttempts': 0}}\n", 1043 | "Working with query response: {'ColumnMetadata': [{'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'listid', 'length': 0, 'name': 'listid', 'nullable': 0, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'listing', 'typeName': 'int4'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'sellerid', 'length': 0, 'name': 'sellerid', 'nullable': 0, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'listing', 'typeName': 'int4'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'eventid', 'length': 0, 'name': 'eventid', 'nullable': 0, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'listing', 'typeName': 'int4'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'dateid', 'length': 0, 'name': 'dateid', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'listing', 'typeName': 'int2'}, 
{'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'numtickets', 'length': 0, 'name': 'numtickets', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'listing', 'typeName': 'int2'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'priceperticket', 'length': 0, 'name': 'priceperticket', 'nullable': 1, 'precision': 8, 'scale': 2, 'schemaName': 'tickit', 'tableName': 'listing', 'typeName': 'numeric'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'totalprice', 'length': 0, 'name': 'totalprice', 'nullable': 1, 'precision': 8, 'scale': 2, 'schemaName': 'tickit', 'tableName': 'listing', 'typeName': 'numeric'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'listtime', 'length': 0, 'name': 'listtime', 'nullable': 1, 'precision': 29, 'scale': 6, 'schemaName': 'tickit', 'tableName': 'listing', 'typeName': 'timestamp'}], 'Records': [[{'longValue': 1315}, {'longValue': 37302}, {'longValue': 920}, {'longValue': 1827}, {'longValue': 9}, {'stringValue': '126.00'}, {'stringValue': '1134.00'}, {'stringValue': '2008-01-01 04:05:41'}], [{'longValue': 4118}, {'longValue': 40141}, {'longValue': 5624}, {'longValue': 1827}, {'longValue': 16}, {'stringValue': '43.00'}, {'stringValue': '688.00'}, {'stringValue': '2008-01-01 03:10:06'}], [{'longValue': 5273}, {'longValue': 24685}, {'longValue': 383}, {'longValue': 1827}, {'longValue': 1}, {'stringValue': '79.00'}, {'stringValue': '79.00'}, {'stringValue': '2008-01-01 06:03:47'}]], 'TotalNumRows': 3, 'ResponseMetadata': {'RequestId': '211d3939-7ddc-41bf-88c1-8f95c4c2a7ee', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '211d3939-7ddc-41bf-88c1-8f95c4c2a7ee', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2285', 'date': 'Fri, 28 Mar 2025 16:08:29 GMT'}, 'RetryAttempts': 0}}\n", 1044 | "Working with query response: {'ColumnMetadata': [{'isCaseSensitive': False, 'isCurrency': 
False, 'isSigned': True, 'label': 'salesid', 'length': 0, 'name': 'salesid', 'nullable': 0, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'sales', 'typeName': 'int4'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'listid', 'length': 0, 'name': 'listid', 'nullable': 0, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'sales', 'typeName': 'int4'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'sellerid', 'length': 0, 'name': 'sellerid', 'nullable': 0, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'sales', 'typeName': 'int4'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'buyerid', 'length': 0, 'name': 'buyerid', 'nullable': 0, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'sales', 'typeName': 'int4'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'eventid', 'length': 0, 'name': 'eventid', 'nullable': 0, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'sales', 'typeName': 'int4'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'dateid', 'length': 0, 'name': 'dateid', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'sales', 'typeName': 'int2'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'qtysold', 'length': 0, 'name': 'qtysold', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'sales', 'typeName': 'int2'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'pricepaid', 'length': 0, 'name': 'pricepaid', 'nullable': 1, 'precision': 8, 'scale': 2, 'schemaName': 'tickit', 'tableName': 'sales', 'typeName': 'numeric'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'commission', 'length': 0, 'name': 'commission', 'nullable': 1, 'precision': 8, 'scale': 2, 'schemaName': 'tickit', 'tableName': 
'sales', 'typeName': 'numeric'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'saletime', 'length': 0, 'name': 'saletime', 'nullable': 1, 'precision': 29, 'scale': 6, 'schemaName': 'tickit', 'tableName': 'sales', 'typeName': 'timestamp'}], 'Records': [[{'longValue': 7011}, {'longValue': 7613}, {'longValue': 5933}, {'longValue': 1503}, {'longValue': 4515}, {'longValue': 1828}, {'longValue': 1}, {'stringValue': '177.00'}, {'stringValue': '26.55'}, {'stringValue': '2008-01-02 01:52:35'}], [{'longValue': 84644}, {'longValue': 96603}, {'longValue': 6051}, {'longValue': 1312}, {'longValue': 6641}, {'longValue': 1828}, {'longValue': 2}, {'stringValue': '810.00'}, {'stringValue': '121.50'}, {'stringValue': '2008-01-02 09:31:15'}], [{'longValue': 144048}, {'longValue': 166749}, {'longValue': 1303}, {'longValue': 617}, {'longValue': 7002}, {'longValue': 1828}, {'longValue': 1}, {'stringValue': '196.00'}, {'stringValue': '29.40'}, {'stringValue': '2008-01-02 03:34:14'}]], 'TotalNumRows': 3, 'ResponseMetadata': {'RequestId': 'f60766c5-8ce5-447e-af74-970612357a5e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'f60766c5-8ce5-447e-af74-970612357a5e', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2780', 'date': 'Fri, 28 Mar 2025 16:08:29 GMT'}, 'RetryAttempts': 0}}\n", 1045 | "Working with query response: {'ColumnMetadata': [{'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'userid', 'length': 0, 'name': 'userid', 'nullable': 0, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'int4'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'username', 'length': 0, 'name': 'username', 'nullable': 1, 'precision': 8, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bpchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'firstname', 'length': 0, 'name': 'firstname', 'nullable': 1, 
'precision': 30, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'lastname', 'length': 0, 'name': 'lastname', 'nullable': 1, 'precision': 30, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'city', 'length': 0, 'name': 'city', 'nullable': 1, 'precision': 30, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'state', 'length': 0, 'name': 'state', 'nullable': 1, 'precision': 2, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bpchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'email', 'length': 0, 'name': 'email', 'nullable': 1, 'precision': 100, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'phone', 'length': 0, 'name': 'phone', 'nullable': 1, 'precision': 14, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bpchar'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'likesports', 'length': 0, 'name': 'likesports', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bool'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'liketheatre', 'length': 0, 'name': 'liketheatre', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bool'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'likeconcerts', 'length': 0, 'name': 'likeconcerts', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bool'}, {'isCaseSensitive': False, 'isCurrency': False, 
'isSigned': False, 'label': 'likejazz', 'length': 0, 'name': 'likejazz', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bool'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'likeclassical', 'length': 0, 'name': 'likeclassical', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bool'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'likeopera', 'length': 0, 'name': 'likeopera', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bool'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'likerock', 'length': 0, 'name': 'likerock', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bool'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'likevegas', 'length': 0, 'name': 'likevegas', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bool'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'likebroadway', 'length': 0, 'name': 'likebroadway', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bool'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': False, 'label': 'likemusicals', 'length': 0, 'name': 'likemusicals', 'nullable': 1, 'precision': 1, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'users', 'typeName': 'bool'}], 'Records': [[{'longValue': 2}, {'stringValue': 'PGL08LJI'}, {'stringValue': 'Vladimir'}, {'stringValue': 'Humphrey'}, {'stringValue': 'Murfreesboro'}, {'stringValue': 'SK'}, {'stringValue': 'Suspendisse.tristique@nonnisiAenean.edu'}, {'stringValue': '(783) 492-1886'}, {'isNull': True}, {'isNull': True}, {'isNull': True}, {'booleanValue': True}, {'booleanValue': True}, {'isNull': True}, {'isNull': 
True}, {'booleanValue': True}, {'booleanValue': False}, {'booleanValue': True}], [{'longValue': 4}, {'stringValue': 'XDZ38RDD'}, {'stringValue': 'Barry'}, {'stringValue': 'Roy'}, {'stringValue': 'Omaha'}, {'stringValue': 'AB'}, {'stringValue': 'sed@lacusUtnec.ca'}, {'stringValue': '(355) 452-8168'}, {'booleanValue': False}, {'booleanValue': True}, {'isNull': True}, {'booleanValue': False}, {'isNull': True}, {'isNull': True}, {'isNull': True}, {'isNull': True}, {'isNull': True}, {'booleanValue': False}], [{'longValue': 7}, {'stringValue': 'OWY35QYB'}, {'stringValue': 'Tamekah'}, {'stringValue': 'Juarez'}, {'stringValue': 'Moultrie'}, {'stringValue': 'WV'}, {'stringValue': 'elementum@semperpretiumneque.ca'}, {'stringValue': '(297) 875-7247'}, {'isNull': True}, {'isNull': True}, {'isNull': True}, {'booleanValue': True}, {'booleanValue': True}, {'booleanValue': False}, {'isNull': True}, {'isNull': True}, {'booleanValue': False}, {'booleanValue': False}]], 'TotalNumRows': 3, 'ResponseMetadata': {'RequestId': '5d31a13f-507b-41d8-be4d-eae137c96f3b', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '5d31a13f-507b-41d8-be4d-eae137c96f3b', 'content-type': 'application/x-amz-json-1.1', 'content-length': '5057', 'date': 'Fri, 28 Mar 2025 16:08:29 GMT'}, 'RetryAttempts': 0}}\n", 1046 | "Working with query response: {'ColumnMetadata': [{'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'venueid', 'length': 0, 'name': 'venueid', 'nullable': 0, 'precision': 5, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'venue', 'typeName': 'int2'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'venuename', 'length': 0, 'name': 'venuename', 'nullable': 1, 'precision': 100, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'venue', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'venuecity', 'length': 0, 'name': 'venuecity', 'nullable': 1, 'precision': 30, 'scale': 0, 'schemaName': 
'tickit', 'tableName': 'venue', 'typeName': 'varchar'}, {'isCaseSensitive': True, 'isCurrency': False, 'isSigned': False, 'label': 'venuestate', 'length': 0, 'name': 'venuestate', 'nullable': 1, 'precision': 2, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'venue', 'typeName': 'bpchar'}, {'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'venueseats', 'length': 0, 'name': 'venueseats', 'nullable': 1, 'precision': 10, 'scale': 0, 'schemaName': 'tickit', 'tableName': 'venue', 'typeName': 'int4'}], 'Records': [[{'longValue': 2}, {'stringValue': 'Columbus Crew Stadium'}, {'stringValue': 'Columbus'}, {'stringValue': 'OH'}, {'longValue': 0}], [{'longValue': 4}, {'stringValue': 'CommunityAmerica Ballpark'}, {'stringValue': 'Kansas City'}, {'stringValue': 'KS'}, {'longValue': 0}], [{'longValue': 7}, {'stringValue': 'BMO Field'}, {'stringValue': 'Toronto'}, {'stringValue': 'ON'}, {'longValue': 0}]], 'TotalNumRows': 3, 'ResponseMetadata': {'RequestId': 'bb9377c5-55bd-487b-8c0a-98f9531c9bee', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'bb9377c5-55bd-487b-8c0a-98f9531c9bee', 'content-type': 'application/x-amz-json-1.1', 'content-length': '1461', 'date': 'Fri, 28 Mar 2025 16:08:29 GMT'}, 'RetryAttempts': 0}}\n", 1047 | "Returning results: 
['table_catalog,table_schema,table_name,column_name,ordinal_position,is_nullable,data_type\\nsample_data_dev,tickit,date,holiday,8,YES,boolean\\nsample_data_dev,tickit,users,likemusicals,18,YES,boolean\\nsample_data_dev,tickit,users,likebroadway,17,YES,boolean\\nsample_data_dev,tickit,users,likevegas,16,YES,boolean\\nsample_data_dev,tickit,users,likerock,15,YES,boolean\\nsample_data_dev,tickit,users,likeopera,14,YES,boolean\\nsample_data_dev,tickit,users,likeclassical,13,YES,boolean\\nsample_data_dev,tickit,users,likejazz,12,YES,boolean\\nsample_data_dev,tickit,users,likeconcerts,11,YES,boolean\\nsample_data_dev,tickit,users,liketheatre,10,YES,boolean\\nsample_data_dev,tickit,users,likesports,9,YES,boolean\\nsample_data_dev,tickit,sales,qtysold,7,NO,smallint\\nsample_data_dev,tickit,listing,numtickets,5,NO,smallint\\nsample_data_dev,tickit,event,catid,3,NO,smallint\\nsample_data_dev,tickit,event,venueid,2,NO,smallint\\nsample_data_dev,tickit,date,year,7,NO,smallint\\nsample_data_dev,tickit,date,week,4,NO,smallint\\nsample_data_dev,tickit,sales,dateid,6,NO,smallint\\nsample_data_dev,tickit,listing,dateid,4,NO,smallint\\nsample_data_dev,tickit,event,dateid,4,NO,smallint\\nsample_data_dev,tickit,date,dateid,1,NO,smallint\\nsample_data_dev,tickit,category,catid,1,NO,smallint\\nsample_data_dev,tickit,venue,venueid,1,NO,smallint\\nsample_data_dev,tickit,sales,eventid,5,NO,integer\\nsample_data_dev,tickit,sales,buyerid,4,NO,integer\\nsample_data_dev,tickit,sales,sellerid,3,NO,integer\\nsample_data_dev,tickit,sales,listid,2,NO,integer\\nsample_data_dev,tickit,sales,salesid,1,NO,integer\\nsample_data_dev,tickit,listing,eventid,3,NO,integer\\nsample_data_dev,tickit,listing,sellerid,2,NO,integer\\nsample_data_dev,tickit,listing,listid,1,NO,integer\\nsample_data_dev,tickit,event,eventid,1,NO,integer\\nsample_data_dev,tickit,venue,venueseats,5,YES,integer\\nsample_data_dev,tickit,users,userid,1,NO,integer\\nsample_data_dev,tickit,date,qtr,6,NO,character\\nsample_data_dev,tickit,
date,month,5,NO,character\\nsample_data_dev,tickit,date,day,3,NO,character\\nsample_data_dev,tickit,venue,venuestate,4,YES,character\\nsample_data_dev,tickit,users,phone,8,YES,character\\nsample_data_dev,tickit,users,state,6,YES,character\\nsample_data_dev,tickit,users,username,2,YES,character\\nsample_data_dev,tickit,event,eventname,5,YES,character varying\\nsample_data_dev,tickit,category,catdesc,4,YES,character varying\\nsample_data_dev,tickit,category,catname,3,YES,character varying\\nsample_data_dev,tickit,category,catgroup,2,YES,character varying\\nsample_data_dev,tickit,venue,venuecity,3,YES,character varying\\nsample_data_dev,tickit,venue,venuename,2,YES,character varying\\nsample_data_dev,tickit,users,email,7,YES,character varying\\nsample_data_dev,tickit,users,city,5,YES,character varying\\nsample_data_dev,tickit,users,lastname,4,YES,character varying\\nsample_data_dev,tickit,users,firstname,3,YES,character varying\\nsample_data_dev,tickit,date,caldate,2,NO,date\\nsample_data_dev,tickit,sales,commission,9,YES,numeric\\nsample_data_dev,tickit,sales,pricepaid,8,YES,numeric\\nsample_data_dev,tickit,listing,totalprice,7,YES,numeric\\nsample_data_dev,tickit,listing,priceperticket,6,YES,numeric\\nsample_data_dev,tickit,sales,saletime,10,YES,timestamp without time zone\\nsample_data_dev,tickit,listing,listtime,8,YES,timestamp without time zone\\nsample_data_dev,tickit,event,starttime,6,YES,timestamp without time zone\\n', 'catid,catgroup,catname,catdesc\\n5,Sports,MLS,Major League Soccer\\n1,Sports,MLB,Major League Baseball\\n6,Shows,Musicals,Musical theatre\\n', 'dateid,caldate,day,week,month,qtr,year,holiday\\n1827,2008-01-01,WE ,1,JAN ,1 ,2008,True\\n1843,2008-01-17,FR ,3,JAN ,1 ,2008,False\\n1845,2008-01-19,SU ,4,JAN ,1 ,2008,False\\n', 'eventid,venueid,catid,dateid,eventname,starttime\\n1334,208,6,1827,The King and I,2008-01-01 14:30:00\\n4850,91,9,1827,Zappa Plays Zappa,2008-01-01 14:00:00\\n6440,71,9,1827,Beck,2008-01-01 19:00:00\\n', 
'listid,sellerid,eventid,dateid,numtickets,priceperticket,totalprice,listtime\\n1315,37302,920,1827,9,126.00,1134.00,2008-01-01 04:05:41\\n4118,40141,5624,1827,16,43.00,688.00,2008-01-01 03:10:06\\n5273,24685,383,1827,1,79.00,79.00,2008-01-01 06:03:47\\n', 'salesid,listid,sellerid,buyerid,eventid,dateid,qtysold,pricepaid,commission,saletime\\n7011,7613,5933,1503,4515,1828,1,177.00,26.55,2008-01-02 01:52:35\\n84644,96603,6051,1312,6641,1828,2,810.00,121.50,2008-01-02 09:31:15\\n144048,166749,1303,617,7002,1828,1,196.00,29.40,2008-01-02 03:34:14\\n', 'userid,username,firstname,lastname,city,state,email,phone,likesports,liketheatre,likeconcerts,likejazz,likeclassical,likeopera,likerock,likevegas,likebroadway,likemusicals\\n2,PGL08LJI,Vladimir,Humphrey,Murfreesboro,SK,Suspendisse.tristique@nonnisiAenean.edu,(783) 492-1886,True,True,True,True,True,True,True,True,False,True\\n4,XDZ38RDD,Barry,Roy,Omaha,AB,sed@lacusUtnec.ca,(355) 452-8168,False,True,True,False,True,True,True,True,True,False\\n7,OWY35QYB,Tamekah,Juarez,Moultrie,WV,elementum@semperpretiumneque.ca,(297) 875-7247,True,True,True,True,True,False,True,True,False,False\\n', 'venueid,venuename,venuecity,venuestate,venueseats\\n2,Columbus Crew Stadium,Columbus,OH,0\\n4,CommunityAmerica Ballpark,Kansas City,KS,0\\n7,BMO Field,Toronto,ON,0\\n']\n", 1048 | "Queries have been executed\n", 1049 | "<>[INST]\n", 1050 | "You are an expert PostgreSQL developer. 
Your job is to provide a syntactically correct PostgreSQL query given a user question.\n", 1051 | "Here are the schema definition of table(s):\n", 1052 | "########\n", 1053 | "table_catalog,table_schema,table_name,column_name,ordinal_position,is_nullable,data_type\n", 1054 | "sample_data_dev,tickit,category,catdesc,4,YES,character varying\n", 1055 | "sample_data_dev,tickit,category,catgroup,2,YES,character varying\n", 1056 | "sample_data_dev,tickit,category,catid,1,NO,smallint\n", 1057 | "sample_data_dev,tickit,category,catname,3,YES,character varying\n", 1058 | "sample_data_dev,tickit,date,caldate,2,NO,date\n", 1059 | "sample_data_dev,tickit,date,dateid,1,NO,smallint\n", 1060 | "sample_data_dev,tickit,date,day,3,NO,character\n", 1061 | "sample_data_dev,tickit,date,holiday,8,YES,boolean\n", 1062 | "sample_data_dev,tickit,date,month,5,NO,character\n", 1063 | "sample_data_dev,tickit,date,qtr,6,NO,character\n", 1064 | "sample_data_dev,tickit,date,week,4,NO,smallint\n", 1065 | "sample_data_dev,tickit,date,year,7,NO,smallint\n", 1066 | "sample_data_dev,tickit,event,catid,3,NO,smallint\n", 1067 | "sample_data_dev,tickit,event,dateid,4,NO,smallint\n", 1068 | "sample_data_dev,tickit,event,eventid,1,NO,integer\n", 1069 | "sample_data_dev,tickit,event,eventname,5,YES,character varying\n", 1070 | "sample_data_dev,tickit,event,starttime,6,YES,timestamp without time zone\n", 1071 | "sample_data_dev,tickit,event,venueid,2,NO,smallint\n", 1072 | "sample_data_dev,tickit,listing,dateid,4,NO,smallint\n", 1073 | "sample_data_dev,tickit,listing,eventid,3,NO,integer\n", 1074 | "sample_data_dev,tickit,listing,listid,1,NO,integer\n", 1075 | "sample_data_dev,tickit,listing,listtime,8,YES,timestamp without time zone\n", 1076 | "sample_data_dev,tickit,listing,numtickets,5,NO,smallint\n", 1077 | "sample_data_dev,tickit,listing,priceperticket,6,YES,numeric\n", 1078 | "sample_data_dev,tickit,listing,sellerid,2,NO,integer\n", 1079 | "sample_data_dev,tickit,listing,totalprice,7,YES,numeric\n", 
1080 | "sample_data_dev,tickit,sales,buyerid,4,NO,integer\n", 1081 | "sample_data_dev,tickit,sales,commission,9,YES,numeric\n", 1082 | "sample_data_dev,tickit,sales,dateid,6,NO,smallint\n", 1083 | "sample_data_dev,tickit,sales,eventid,5,NO,integer\n", 1084 | "sample_data_dev,tickit,sales,listid,2,NO,integer\n", 1085 | "sample_data_dev,tickit,sales,pricepaid,8,YES,numeric\n", 1086 | "sample_data_dev,tickit,sales,qtysold,7,NO,smallint\n", 1087 | "sample_data_dev,tickit,sales,salesid,1,NO,integer\n", 1088 | "sample_data_dev,tickit,sales,saletime,10,YES,timestamp without time zone\n", 1089 | "sample_data_dev,tickit,sales,sellerid,3,NO,integer\n", 1090 | "sample_data_dev,tickit,users,city,5,YES,character varying\n", 1091 | "sample_data_dev,tickit,users,email,7,YES,character varying\n", 1092 | "sample_data_dev,tickit,users,firstname,3,YES,character varying\n", 1093 | "sample_data_dev,tickit,users,lastname,4,YES,character varying\n", 1094 | "sample_data_dev,tickit,users,likebroadway,17,YES,boolean\n", 1095 | "sample_data_dev,tickit,users,likeclassical,13,YES,boolean\n", 1096 | "sample_data_dev,tickit,users,likeconcerts,11,YES,boolean\n", 1097 | "sample_data_dev,tickit,users,likejazz,12,YES,boolean\n", 1098 | "sample_data_dev,tickit,users,likemusicals,18,YES,boolean\n", 1099 | "sample_data_dev,tickit,users,likeopera,14,YES,boolean\n", 1100 | "sample_data_dev,tickit,users,likerock,15,YES,boolean\n", 1101 | "sample_data_dev,tickit,users,likesports,9,YES,boolean\n", 1102 | "sample_data_dev,tickit,users,liketheatre,10,YES,boolean\n", 1103 | "sample_data_dev,tickit,users,likevegas,16,YES,boolean\n", 1104 | "sample_data_dev,tickit,users,phone,8,YES,character\n", 1105 | "sample_data_dev,tickit,users,state,6,YES,character\n", 1106 | "sample_data_dev,tickit,users,userid,1,NO,integer\n", 1107 | "sample_data_dev,tickit,users,username,2,YES,character\n", 1108 | "sample_data_dev,tickit,venue,venuecity,3,YES,character varying\n", 1109 | 
"sample_data_dev,tickit,venue,venueid,1,NO,smallint\n", 1110 | "sample_data_dev,tickit,venue,venuename,2,YES,character varying\n", 1111 | "sample_data_dev,tickit,venue,venueseats,5,YES,integer\n", 1112 | "sample_data_dev,tickit,venue,venuestate,4,YES,character\n", 1113 | "########\n", 1114 | "\n", 1115 | "Here are example records for each table:\n", 1116 | "##########\n", 1117 | "catid,catgroup,catname,catdesc\n", 1118 | "5,Sports,MLS,Major League Soccer\n", 1119 | "1,Sports,MLB,Major League Baseball\n", 1120 | "6,Shows,Musicals,Musical theatre\n", 1121 | "\n", 1122 | "\n", 1123 | "dateid,caldate,day,week,month,qtr,year,holiday\n", 1124 | "1827,2008-01-01,WE ,1,JAN ,1 ,2008,True\n", 1125 | "1843,2008-01-17,FR ,3,JAN ,1 ,2008,False\n", 1126 | "1845,2008-01-19,SU ,4,JAN ,1 ,2008,False\n", 1127 | "\n", 1128 | "\n", 1129 | "eventid,venueid,catid,dateid,eventname,starttime\n", 1130 | "1334,208,6,1827,The King and I,2008-01-01 14:30:00\n", 1131 | "4850,91,9,1827,Zappa Plays Zappa,2008-01-01 14:00:00\n", 1132 | "6440,71,9,1827,Beck,2008-01-01 19:00:00\n", 1133 | "\n", 1134 | "\n", 1135 | "listid,sellerid,eventid,dateid,numtickets,priceperticket,totalprice,listtime\n", 1136 | "1315,37302,920,1827,9,126.00,1134.00,2008-01-01 04:05:41\n", 1137 | "4118,40141,5624,1827,16,43.00,688.00,2008-01-01 03:10:06\n", 1138 | "5273,24685,383,1827,1,79.00,79.00,2008-01-01 06:03:47\n", 1139 | "\n", 1140 | "\n", 1141 | "salesid,listid,sellerid,buyerid,eventid,dateid,qtysold,pricepaid,commission,saletime\n", 1142 | "7011,7613,5933,1503,4515,1828,1,177.00,26.55,2008-01-02 01:52:35\n", 1143 | "84644,96603,6051,1312,6641,1828,2,810.00,121.50,2008-01-02 09:31:15\n", 1144 | "144048,166749,1303,617,7002,1828,1,196.00,29.40,2008-01-02 03:34:14\n", 1145 | "\n", 1146 | "\n", 1147 | "userid,username,firstname,lastname,city,state,email,phone,likesports,liketheatre,likeconcerts,likejazz,likeclassical,likeopera,likerock,likevegas,likebroadway,likemusicals\n", 1148 | 
"2,PGL08LJI,Vladimir,Humphrey,Murfreesboro,SK,Suspendisse.tristique@nonnisiAenean.edu,(783) 492-1886,True,True,True,True,True,True,True,True,False,True\n", 1149 | "4,XDZ38RDD,Barry,Roy,Omaha,AB,sed@lacusUtnec.ca,(355) 452-8168,False,True,True,False,True,True,True,True,True,False\n", 1150 | "7,OWY35QYB,Tamekah,Juarez,Moultrie,WV,elementum@semperpretiumneque.ca,(297) 875-7247,True,True,True,True,True,False,True,True,False,False\n", 1151 | "\n", 1152 | "\n", 1153 | "venueid,venuename,venuecity,venuestate,venueseats\n", 1154 | "2,Columbus Crew Stadium,Columbus,OH,0\n", 1155 | "4,CommunityAmerica Ballpark,Kansas City,KS,0\n", 1156 | "7,BMO Field,Toronto,ON,0\n", 1157 | "\n", 1158 | "\n", 1159 | "\n", 1160 | "###########\n", 1161 | "<>\n", 1162 | "Here are some instructions when generating SQL statements:\n", 1163 | "1. Determine the necessary table(s) and schema needed for an accurate query.\n", 1164 | "2. Limit your queries to only the required columns to prevent unnecessary data retrieval and improve query performance.\n", 1165 | "3. For clarity and to prevent potential conflicts, always include the schema name when referencing table names in your SQL queries.\n", 1166 | "4. When working with Amazon Redshift table and column names containing underscores, do not use the backslash escape character (\\). Instead, use double quotes (\"\") to enclose the names in your queries.\n", 1167 | "5. Do not mention 'dev' or 'public' in the queries.\n", 1168 | "In your response, provide a single SQL statement to answer the question, avoid additional text that would cause failure during executing the sql. 
\n", 1169 | "Format your response as:\n", 1170 | "\n", 1171 | "generated SQL statement \n", 1172 | "\n", 1173 | "\n", 1174 | "Question: What is the number of Venues where the show titled Macbeth was held?[/INST]\n", 1175 | " FIRST ATTEMPT SQL:\n", 1176 | "\n", 1177 | "SELECT COUNT(DISTINCT v.venueid)\n", 1178 | "FROM tickit.event e\n", 1179 | "JOIN tickit.venue v ON e.venueid = v.venueid\n", 1180 | "WHERE e.eventname = 'Macbeth';\n", 1181 | "\n", 1182 | "Working with query response: {'ColumnMetadata': [{'isCaseSensitive': False, 'isCurrency': False, 'isSigned': True, 'label': 'count', 'length': 0, 'name': 'count', 'nullable': 1, 'precision': 19, 'scale': 0, 'schemaName': '', 'tableName': '', 'typeName': 'int8'}], 'Records': [[{'longValue': 41}]], 'TotalNumRows': 1, 'ResponseMetadata': {'RequestId': '6b6767aa-d765-456a-86cd-f64c172b26cd', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '6b6767aa-d765-456a-86cd-f64c172b26cd', 'content-type': 'application/x-amz-json-1.1', 'content-length': '258', 'date': 'Fri, 28 Mar 2025 16:08:41 GMT'}, 'RetryAttempts': 0}}\n", 1183 | "CPU times: user 154 ms, sys: 0 ns, total: 154 ms\n", 1184 | "Wall time: 23.6 s\n" 1185 | ] 1186 | } 1187 | ], 1188 | "source": [ 1189 | "%%time\n", 1190 | "result_text2sql = redshift_qna(params)" 1191 | ] 1192 | }, 1193 | { 1194 | "cell_type": "code", 1195 | "execution_count": 29, 1196 | "id": "73f7a388-c7dd-40df-bda1-3719044129d2", 1197 | "metadata": { 1198 | "tags": [] 1199 | }, 1200 | "outputs": [ 1201 | { 1202 | "name": "stdout", 1203 | "output_type": "stream", 1204 | "text": [ 1205 | "\n", 1206 | "Answer:\n", 1207 | "\n", 1208 | " The number of Venues where the show titled Macbeth was held is 41. 
This is based on the SQL query result that returned a count of 41 distinct venue IDs for the event named 'Macbeth'.\n", 1209 | "\n" 1210 | ] 1211 | } 1212 | ], 1213 | "source": [ 1214 | "# Query result in Natural Language\n", 1215 | "print(f\"\\nAnswer:\\n\\n{result_text2sql[0]}\\n\")" 1216 | ] 1217 | }, 1218 | { 1219 | "cell_type": "code", 1220 | "execution_count": 30, 1221 | "id": "66463052-9f25-4ff9-a19e-76832b179e73", 1222 | "metadata": { 1223 | "tags": [] 1224 | }, 1225 | "outputs": [ 1226 | { 1227 | "name": "stdout", 1228 | "output_type": "stream", 1229 | "text": [ 1230 | "\n", 1231 | "SQL Query generated from the prompt:\n", 1232 | "\n", 1233 | "\n", 1234 | "SELECT COUNT(DISTINCT v.venueid)\n", 1235 | "FROM tickit.event e\n", 1236 | "JOIN tickit.venue v ON e.venueid = v.venueid\n", 1237 | "WHERE e.eventname = 'Macbeth';\n", 1238 | "\n", 1239 | "\n" 1240 | ] 1241 | } 1242 | ], 1243 | "source": [ 1244 | "# Generated SQL query used\n", 1245 | "print(f\"\\nSQL Query generated from the prompt:\\n\")\n", 1246 | "print(result_text2sql[1])\n", 1247 | "print(\"\")" 1248 | ] 1249 | }, 1250 | { 1251 | "cell_type": "code", 1252 | "execution_count": 31, 1253 | "id": "884c31f9-917e-45a3-94fb-1c05eef677b3", 1254 | "metadata": { 1255 | "tags": [] 1256 | }, 1257 | "outputs": [ 1258 | { 1259 | "name": "stdout", 1260 | "output_type": "stream", 1261 | "text": [ 1262 | "\n", 1263 | "Tabular results from the SQL query:\n", 1264 | "\n" 1265 | ] 1266 | }, 1267 | { 1268 | "data": { 1269 | "text/html": [ 1270 | "
\n", 1271 | "\n", 1284 | "\n", 1285 | " \n", 1286 | " \n", 1287 | " \n", 1288 | " \n", 1289 | " \n", 1290 | " \n", 1291 | " \n", 1292 | " \n", 1293 | " \n", 1294 | " \n", 1295 | " \n", 1296 | " \n", 1297 | "
count
0 41
\n", 1298 | "
" 1299 | ], 1300 | "text/plain": [ 1301 | " count\n", 1302 | "0 41" 1303 | ] 1304 | }, 1305 | "execution_count": 31, 1306 | "metadata": {}, 1307 | "output_type": "execute_result" 1308 | } 1309 | ], 1310 | "source": [ 1311 | "# Tabular results from the SQL Query \n", 1312 | "print(f\"\\nTabular results from the SQL query:\\n\")\n", 1313 | "df=pd.read_csv(StringIO(result_text2sql[2]))\n", 1314 | "df" 1315 | ] 1316 | }, 1317 | { 1318 | "cell_type": "code", 1319 | "execution_count": null, 1320 | "id": "de80e756-14eb-4469-8620-15e94084d04a", 1321 | "metadata": {}, 1322 | "outputs": [], 1323 | "source": [] 1324 | } 1325 | ], 1326 | "metadata": { 1327 | "availableInstances": [ 1328 | { 1329 | "_defaultOrder": 0, 1330 | "_isFastLaunch": true, 1331 | "category": "General purpose", 1332 | "gpuNum": 0, 1333 | "hideHardwareSpecs": false, 1334 | "memoryGiB": 4, 1335 | "name": "ml.t3.medium", 1336 | "vcpuNum": 2 1337 | }, 1338 | { 1339 | "_defaultOrder": 1, 1340 | "_isFastLaunch": false, 1341 | "category": "General purpose", 1342 | "gpuNum": 0, 1343 | "hideHardwareSpecs": false, 1344 | "memoryGiB": 8, 1345 | "name": "ml.t3.large", 1346 | "vcpuNum": 2 1347 | }, 1348 | { 1349 | "_defaultOrder": 2, 1350 | "_isFastLaunch": false, 1351 | "category": "General purpose", 1352 | "gpuNum": 0, 1353 | "hideHardwareSpecs": false, 1354 | "memoryGiB": 16, 1355 | "name": "ml.t3.xlarge", 1356 | "vcpuNum": 4 1357 | }, 1358 | { 1359 | "_defaultOrder": 3, 1360 | "_isFastLaunch": false, 1361 | "category": "General purpose", 1362 | "gpuNum": 0, 1363 | "hideHardwareSpecs": false, 1364 | "memoryGiB": 32, 1365 | "name": "ml.t3.2xlarge", 1366 | "vcpuNum": 8 1367 | }, 1368 | { 1369 | "_defaultOrder": 4, 1370 | "_isFastLaunch": true, 1371 | "category": "General purpose", 1372 | "gpuNum": 0, 1373 | "hideHardwareSpecs": false, 1374 | "memoryGiB": 8, 1375 | "name": "ml.m5.large", 1376 | "vcpuNum": 2 1377 | }, 1378 | { 1379 | "_defaultOrder": 5, 1380 | "_isFastLaunch": false, 1381 | "category": "General 
purpose", 1382 | "gpuNum": 0, 1383 | "hideHardwareSpecs": false, 1384 | "memoryGiB": 16, 1385 | "name": "ml.m5.xlarge", 1386 | "vcpuNum": 4 1387 | }, 1388 | { 1389 | "_defaultOrder": 6, 1390 | "_isFastLaunch": false, 1391 | "category": "General purpose", 1392 | "gpuNum": 0, 1393 | "hideHardwareSpecs": false, 1394 | "memoryGiB": 32, 1395 | "name": "ml.m5.2xlarge", 1396 | "vcpuNum": 8 1397 | }, 1398 | { 1399 | "_defaultOrder": 7, 1400 | "_isFastLaunch": false, 1401 | "category": "General purpose", 1402 | "gpuNum": 0, 1403 | "hideHardwareSpecs": false, 1404 | "memoryGiB": 64, 1405 | "name": "ml.m5.4xlarge", 1406 | "vcpuNum": 16 1407 | }, 1408 | { 1409 | "_defaultOrder": 8, 1410 | "_isFastLaunch": false, 1411 | "category": "General purpose", 1412 | "gpuNum": 0, 1413 | "hideHardwareSpecs": false, 1414 | "memoryGiB": 128, 1415 | "name": "ml.m5.8xlarge", 1416 | "vcpuNum": 32 1417 | }, 1418 | { 1419 | "_defaultOrder": 9, 1420 | "_isFastLaunch": false, 1421 | "category": "General purpose", 1422 | "gpuNum": 0, 1423 | "hideHardwareSpecs": false, 1424 | "memoryGiB": 192, 1425 | "name": "ml.m5.12xlarge", 1426 | "vcpuNum": 48 1427 | }, 1428 | { 1429 | "_defaultOrder": 10, 1430 | "_isFastLaunch": false, 1431 | "category": "General purpose", 1432 | "gpuNum": 0, 1433 | "hideHardwareSpecs": false, 1434 | "memoryGiB": 256, 1435 | "name": "ml.m5.16xlarge", 1436 | "vcpuNum": 64 1437 | }, 1438 | { 1439 | "_defaultOrder": 11, 1440 | "_isFastLaunch": false, 1441 | "category": "General purpose", 1442 | "gpuNum": 0, 1443 | "hideHardwareSpecs": false, 1444 | "memoryGiB": 384, 1445 | "name": "ml.m5.24xlarge", 1446 | "vcpuNum": 96 1447 | }, 1448 | { 1449 | "_defaultOrder": 12, 1450 | "_isFastLaunch": false, 1451 | "category": "General purpose", 1452 | "gpuNum": 0, 1453 | "hideHardwareSpecs": false, 1454 | "memoryGiB": 8, 1455 | "name": "ml.m5d.large", 1456 | "vcpuNum": 2 1457 | }, 1458 | { 1459 | "_defaultOrder": 13, 1460 | "_isFastLaunch": false, 1461 | "category": "General purpose", 1462 | 
"gpuNum": 0, 1463 | "hideHardwareSpecs": false, 1464 | "memoryGiB": 16, 1465 | "name": "ml.m5d.xlarge", 1466 | "vcpuNum": 4 1467 | }, 1468 | { 1469 | "_defaultOrder": 14, 1470 | "_isFastLaunch": false, 1471 | "category": "General purpose", 1472 | "gpuNum": 0, 1473 | "hideHardwareSpecs": false, 1474 | "memoryGiB": 32, 1475 | "name": "ml.m5d.2xlarge", 1476 | "vcpuNum": 8 1477 | }, 1478 | { 1479 | "_defaultOrder": 15, 1480 | "_isFastLaunch": false, 1481 | "category": "General purpose", 1482 | "gpuNum": 0, 1483 | "hideHardwareSpecs": false, 1484 | "memoryGiB": 64, 1485 | "name": "ml.m5d.4xlarge", 1486 | "vcpuNum": 16 1487 | }, 1488 | { 1489 | "_defaultOrder": 16, 1490 | "_isFastLaunch": false, 1491 | "category": "General purpose", 1492 | "gpuNum": 0, 1493 | "hideHardwareSpecs": false, 1494 | "memoryGiB": 128, 1495 | "name": "ml.m5d.8xlarge", 1496 | "vcpuNum": 32 1497 | }, 1498 | { 1499 | "_defaultOrder": 17, 1500 | "_isFastLaunch": false, 1501 | "category": "General purpose", 1502 | "gpuNum": 0, 1503 | "hideHardwareSpecs": false, 1504 | "memoryGiB": 192, 1505 | "name": "ml.m5d.12xlarge", 1506 | "vcpuNum": 48 1507 | }, 1508 | { 1509 | "_defaultOrder": 18, 1510 | "_isFastLaunch": false, 1511 | "category": "General purpose", 1512 | "gpuNum": 0, 1513 | "hideHardwareSpecs": false, 1514 | "memoryGiB": 256, 1515 | "name": "ml.m5d.16xlarge", 1516 | "vcpuNum": 64 1517 | }, 1518 | { 1519 | "_defaultOrder": 19, 1520 | "_isFastLaunch": false, 1521 | "category": "General purpose", 1522 | "gpuNum": 0, 1523 | "hideHardwareSpecs": false, 1524 | "memoryGiB": 384, 1525 | "name": "ml.m5d.24xlarge", 1526 | "vcpuNum": 96 1527 | }, 1528 | { 1529 | "_defaultOrder": 20, 1530 | "_isFastLaunch": false, 1531 | "category": "General purpose", 1532 | "gpuNum": 0, 1533 | "hideHardwareSpecs": true, 1534 | "memoryGiB": 0, 1535 | "name": "ml.geospatial.interactive", 1536 | "supportedImageNames": [ 1537 | "sagemaker-geospatial-v1-0" 1538 | ], 1539 | "vcpuNum": 0 1540 | }, 1541 | { 1542 | 
"_defaultOrder": 21, 1543 | "_isFastLaunch": true, 1544 | "category": "Compute optimized", 1545 | "gpuNum": 0, 1546 | "hideHardwareSpecs": false, 1547 | "memoryGiB": 4, 1548 | "name": "ml.c5.large", 1549 | "vcpuNum": 2 1550 | }, 1551 | { 1552 | "_defaultOrder": 22, 1553 | "_isFastLaunch": false, 1554 | "category": "Compute optimized", 1555 | "gpuNum": 0, 1556 | "hideHardwareSpecs": false, 1557 | "memoryGiB": 8, 1558 | "name": "ml.c5.xlarge", 1559 | "vcpuNum": 4 1560 | }, 1561 | { 1562 | "_defaultOrder": 23, 1563 | "_isFastLaunch": false, 1564 | "category": "Compute optimized", 1565 | "gpuNum": 0, 1566 | "hideHardwareSpecs": false, 1567 | "memoryGiB": 16, 1568 | "name": "ml.c5.2xlarge", 1569 | "vcpuNum": 8 1570 | }, 1571 | { 1572 | "_defaultOrder": 24, 1573 | "_isFastLaunch": false, 1574 | "category": "Compute optimized", 1575 | "gpuNum": 0, 1576 | "hideHardwareSpecs": false, 1577 | "memoryGiB": 32, 1578 | "name": "ml.c5.4xlarge", 1579 | "vcpuNum": 16 1580 | }, 1581 | { 1582 | "_defaultOrder": 25, 1583 | "_isFastLaunch": false, 1584 | "category": "Compute optimized", 1585 | "gpuNum": 0, 1586 | "hideHardwareSpecs": false, 1587 | "memoryGiB": 72, 1588 | "name": "ml.c5.9xlarge", 1589 | "vcpuNum": 36 1590 | }, 1591 | { 1592 | "_defaultOrder": 26, 1593 | "_isFastLaunch": false, 1594 | "category": "Compute optimized", 1595 | "gpuNum": 0, 1596 | "hideHardwareSpecs": false, 1597 | "memoryGiB": 96, 1598 | "name": "ml.c5.12xlarge", 1599 | "vcpuNum": 48 1600 | }, 1601 | { 1602 | "_defaultOrder": 27, 1603 | "_isFastLaunch": false, 1604 | "category": "Compute optimized", 1605 | "gpuNum": 0, 1606 | "hideHardwareSpecs": false, 1607 | "memoryGiB": 144, 1608 | "name": "ml.c5.18xlarge", 1609 | "vcpuNum": 72 1610 | }, 1611 | { 1612 | "_defaultOrder": 28, 1613 | "_isFastLaunch": false, 1614 | "category": "Compute optimized", 1615 | "gpuNum": 0, 1616 | "hideHardwareSpecs": false, 1617 | "memoryGiB": 192, 1618 | "name": "ml.c5.24xlarge", 1619 | "vcpuNum": 96 1620 | }, 1621 | { 1622 | 
"_defaultOrder": 29, 1623 | "_isFastLaunch": true, 1624 | "category": "Accelerated computing", 1625 | "gpuNum": 1, 1626 | "hideHardwareSpecs": false, 1627 | "memoryGiB": 16, 1628 | "name": "ml.g4dn.xlarge", 1629 | "vcpuNum": 4 1630 | }, 1631 | { 1632 | "_defaultOrder": 30, 1633 | "_isFastLaunch": false, 1634 | "category": "Accelerated computing", 1635 | "gpuNum": 1, 1636 | "hideHardwareSpecs": false, 1637 | "memoryGiB": 32, 1638 | "name": "ml.g4dn.2xlarge", 1639 | "vcpuNum": 8 1640 | }, 1641 | { 1642 | "_defaultOrder": 31, 1643 | "_isFastLaunch": false, 1644 | "category": "Accelerated computing", 1645 | "gpuNum": 1, 1646 | "hideHardwareSpecs": false, 1647 | "memoryGiB": 64, 1648 | "name": "ml.g4dn.4xlarge", 1649 | "vcpuNum": 16 1650 | }, 1651 | { 1652 | "_defaultOrder": 32, 1653 | "_isFastLaunch": false, 1654 | "category": "Accelerated computing", 1655 | "gpuNum": 1, 1656 | "hideHardwareSpecs": false, 1657 | "memoryGiB": 128, 1658 | "name": "ml.g4dn.8xlarge", 1659 | "vcpuNum": 32 1660 | }, 1661 | { 1662 | "_defaultOrder": 33, 1663 | "_isFastLaunch": false, 1664 | "category": "Accelerated computing", 1665 | "gpuNum": 4, 1666 | "hideHardwareSpecs": false, 1667 | "memoryGiB": 192, 1668 | "name": "ml.g4dn.12xlarge", 1669 | "vcpuNum": 48 1670 | }, 1671 | { 1672 | "_defaultOrder": 34, 1673 | "_isFastLaunch": false, 1674 | "category": "Accelerated computing", 1675 | "gpuNum": 1, 1676 | "hideHardwareSpecs": false, 1677 | "memoryGiB": 256, 1678 | "name": "ml.g4dn.16xlarge", 1679 | "vcpuNum": 64 1680 | }, 1681 | { 1682 | "_defaultOrder": 35, 1683 | "_isFastLaunch": false, 1684 | "category": "Accelerated computing", 1685 | "gpuNum": 1, 1686 | "hideHardwareSpecs": false, 1687 | "memoryGiB": 61, 1688 | "name": "ml.p3.2xlarge", 1689 | "vcpuNum": 8 1690 | }, 1691 | { 1692 | "_defaultOrder": 36, 1693 | "_isFastLaunch": false, 1694 | "category": "Accelerated computing", 1695 | "gpuNum": 4, 1696 | "hideHardwareSpecs": false, 1697 | "memoryGiB": 244, 1698 | "name": "ml.p3.8xlarge", 
1699 | "vcpuNum": 32 1700 | }, 1701 | { 1702 | "_defaultOrder": 37, 1703 | "_isFastLaunch": false, 1704 | "category": "Accelerated computing", 1705 | "gpuNum": 8, 1706 | "hideHardwareSpecs": false, 1707 | "memoryGiB": 488, 1708 | "name": "ml.p3.16xlarge", 1709 | "vcpuNum": 64 1710 | }, 1711 | { 1712 | "_defaultOrder": 38, 1713 | "_isFastLaunch": false, 1714 | "category": "Accelerated computing", 1715 | "gpuNum": 8, 1716 | "hideHardwareSpecs": false, 1717 | "memoryGiB": 768, 1718 | "name": "ml.p3dn.24xlarge", 1719 | "vcpuNum": 96 1720 | }, 1721 | { 1722 | "_defaultOrder": 39, 1723 | "_isFastLaunch": false, 1724 | "category": "Memory Optimized", 1725 | "gpuNum": 0, 1726 | "hideHardwareSpecs": false, 1727 | "memoryGiB": 16, 1728 | "name": "ml.r5.large", 1729 | "vcpuNum": 2 1730 | }, 1731 | { 1732 | "_defaultOrder": 40, 1733 | "_isFastLaunch": false, 1734 | "category": "Memory Optimized", 1735 | "gpuNum": 0, 1736 | "hideHardwareSpecs": false, 1737 | "memoryGiB": 32, 1738 | "name": "ml.r5.xlarge", 1739 | "vcpuNum": 4 1740 | }, 1741 | { 1742 | "_defaultOrder": 41, 1743 | "_isFastLaunch": false, 1744 | "category": "Memory Optimized", 1745 | "gpuNum": 0, 1746 | "hideHardwareSpecs": false, 1747 | "memoryGiB": 64, 1748 | "name": "ml.r5.2xlarge", 1749 | "vcpuNum": 8 1750 | }, 1751 | { 1752 | "_defaultOrder": 42, 1753 | "_isFastLaunch": false, 1754 | "category": "Memory Optimized", 1755 | "gpuNum": 0, 1756 | "hideHardwareSpecs": false, 1757 | "memoryGiB": 128, 1758 | "name": "ml.r5.4xlarge", 1759 | "vcpuNum": 16 1760 | }, 1761 | { 1762 | "_defaultOrder": 43, 1763 | "_isFastLaunch": false, 1764 | "category": "Memory Optimized", 1765 | "gpuNum": 0, 1766 | "hideHardwareSpecs": false, 1767 | "memoryGiB": 256, 1768 | "name": "ml.r5.8xlarge", 1769 | "vcpuNum": 32 1770 | }, 1771 | { 1772 | "_defaultOrder": 44, 1773 | "_isFastLaunch": false, 1774 | "category": "Memory Optimized", 1775 | "gpuNum": 0, 1776 | "hideHardwareSpecs": false, 1777 | "memoryGiB": 384, 1778 | "name": 
"ml.r5.12xlarge", 1779 | "vcpuNum": 48 1780 | }, 1781 | { 1782 | "_defaultOrder": 45, 1783 | "_isFastLaunch": false, 1784 | "category": "Memory Optimized", 1785 | "gpuNum": 0, 1786 | "hideHardwareSpecs": false, 1787 | "memoryGiB": 512, 1788 | "name": "ml.r5.16xlarge", 1789 | "vcpuNum": 64 1790 | }, 1791 | { 1792 | "_defaultOrder": 46, 1793 | "_isFastLaunch": false, 1794 | "category": "Memory Optimized", 1795 | "gpuNum": 0, 1796 | "hideHardwareSpecs": false, 1797 | "memoryGiB": 768, 1798 | "name": "ml.r5.24xlarge", 1799 | "vcpuNum": 96 1800 | }, 1801 | { 1802 | "_defaultOrder": 47, 1803 | "_isFastLaunch": false, 1804 | "category": "Accelerated computing", 1805 | "gpuNum": 1, 1806 | "hideHardwareSpecs": false, 1807 | "memoryGiB": 16, 1808 | "name": "ml.g5.xlarge", 1809 | "vcpuNum": 4 1810 | }, 1811 | { 1812 | "_defaultOrder": 48, 1813 | "_isFastLaunch": false, 1814 | "category": "Accelerated computing", 1815 | "gpuNum": 1, 1816 | "hideHardwareSpecs": false, 1817 | "memoryGiB": 32, 1818 | "name": "ml.g5.2xlarge", 1819 | "vcpuNum": 8 1820 | }, 1821 | { 1822 | "_defaultOrder": 49, 1823 | "_isFastLaunch": false, 1824 | "category": "Accelerated computing", 1825 | "gpuNum": 1, 1826 | "hideHardwareSpecs": false, 1827 | "memoryGiB": 64, 1828 | "name": "ml.g5.4xlarge", 1829 | "vcpuNum": 16 1830 | }, 1831 | { 1832 | "_defaultOrder": 50, 1833 | "_isFastLaunch": false, 1834 | "category": "Accelerated computing", 1835 | "gpuNum": 1, 1836 | "hideHardwareSpecs": false, 1837 | "memoryGiB": 128, 1838 | "name": "ml.g5.8xlarge", 1839 | "vcpuNum": 32 1840 | }, 1841 | { 1842 | "_defaultOrder": 51, 1843 | "_isFastLaunch": false, 1844 | "category": "Accelerated computing", 1845 | "gpuNum": 1, 1846 | "hideHardwareSpecs": false, 1847 | "memoryGiB": 256, 1848 | "name": "ml.g5.16xlarge", 1849 | "vcpuNum": 64 1850 | }, 1851 | { 1852 | "_defaultOrder": 52, 1853 | "_isFastLaunch": false, 1854 | "category": "Accelerated computing", 1855 | "gpuNum": 4, 1856 | "hideHardwareSpecs": false, 1857 | 
"memoryGiB": 192, 1858 | "name": "ml.g5.12xlarge", 1859 | "vcpuNum": 48 1860 | }, 1861 | { 1862 | "_defaultOrder": 53, 1863 | "_isFastLaunch": false, 1864 | "category": "Accelerated computing", 1865 | "gpuNum": 4, 1866 | "hideHardwareSpecs": false, 1867 | "memoryGiB": 384, 1868 | "name": "ml.g5.24xlarge", 1869 | "vcpuNum": 96 1870 | }, 1871 | { 1872 | "_defaultOrder": 54, 1873 | "_isFastLaunch": false, 1874 | "category": "Accelerated computing", 1875 | "gpuNum": 8, 1876 | "hideHardwareSpecs": false, 1877 | "memoryGiB": 768, 1878 | "name": "ml.g5.48xlarge", 1879 | "vcpuNum": 192 1880 | }, 1881 | { 1882 | "_defaultOrder": 55, 1883 | "_isFastLaunch": false, 1884 | "category": "Accelerated computing", 1885 | "gpuNum": 8, 1886 | "hideHardwareSpecs": false, 1887 | "memoryGiB": 1152, 1888 | "name": "ml.p4d.24xlarge", 1889 | "vcpuNum": 96 1890 | }, 1891 | { 1892 | "_defaultOrder": 56, 1893 | "_isFastLaunch": false, 1894 | "category": "Accelerated computing", 1895 | "gpuNum": 8, 1896 | "hideHardwareSpecs": false, 1897 | "memoryGiB": 1152, 1898 | "name": "ml.p4de.24xlarge", 1899 | "vcpuNum": 96 1900 | }, 1901 | { 1902 | "_defaultOrder": 57, 1903 | "_isFastLaunch": false, 1904 | "category": "Accelerated computing", 1905 | "gpuNum": 0, 1906 | "hideHardwareSpecs": false, 1907 | "memoryGiB": 32, 1908 | "name": "ml.trn1.2xlarge", 1909 | "vcpuNum": 8 1910 | }, 1911 | { 1912 | "_defaultOrder": 58, 1913 | "_isFastLaunch": false, 1914 | "category": "Accelerated computing", 1915 | "gpuNum": 0, 1916 | "hideHardwareSpecs": false, 1917 | "memoryGiB": 512, 1918 | "name": "ml.trn1.32xlarge", 1919 | "vcpuNum": 128 1920 | }, 1921 | { 1922 | "_defaultOrder": 59, 1923 | "_isFastLaunch": false, 1924 | "category": "Accelerated computing", 1925 | "gpuNum": 0, 1926 | "hideHardwareSpecs": false, 1927 | "memoryGiB": 512, 1928 | "name": "ml.trn1n.32xlarge", 1929 | "vcpuNum": 128 1930 | } 1931 | ], 1932 | "instance_type": "ml.t3.medium", 1933 | "kernelspec": { 1934 | "display_name": "Python 3 
(ipykernel)", 1935 | "language": "python", 1936 | "name": "python3" 1937 | }, 1938 | "language_info": { 1939 | "codemirror_mode": { 1940 | "name": "ipython", 1941 | "version": 3 1942 | }, 1943 | "file_extension": ".py", 1944 | "mimetype": "text/x-python", 1945 | "name": "python", 1946 | "nbconvert_exporter": "python", 1947 | "pygments_lexer": "ipython3", 1948 | "version": "3.11.11" 1949 | } 1950 | }, 1951 | "nbformat": 4, 1952 | "nbformat_minor": 5 1953 | } 1954 | -------------------------------------------------------------------------------- /query-amazon-redshift-with-mistral-small-SageMaker.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "id": "f851893e-f3cc-4ce6-9b74-c5ac10a436ad", 7 | "metadata": { 8 | "scrolled": true, 9 | "tags": [] 10 | }, 11 | "outputs": [ 12 | { 13 | "name": "stdout", 14 | "output_type": "stream", 15 | "text": [ 16 | "Requirement already satisfied: boto3 in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 1)) (1.36.3)\n", 17 | "Requirement already satisfied: sentencepiece in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 2)) (0.1.99)\n", 18 | "Requirement already satisfied: pandas in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 3)) (2.2.3)\n", 19 | "Requirement already satisfied: anthropic in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 4)) (0.49.0)\n", 20 | "Requirement already satisfied: uuid in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 5)) (1.30)\n", 21 | "Requirement already satisfied: transformers in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 6)) (4.48.3)\n", 22 | "Requirement already satisfied: tiktoken in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 7)) (0.9.0)\n", 23 | "Requirement already satisfied: langchain in 
/opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 8)) (0.3.17)\n", 24 | "Requirement already satisfied: s3fs in /opt/conda/lib/python3.11/site-packages (from -r requirements.txt (line 9)) (2024.10.0)\n", 25 | "Requirement already satisfied: botocore<1.37.0,>=1.36.3 in /opt/conda/lib/python3.11/site-packages (from boto3->-r requirements.txt (line 1)) (1.36.3)\n", 26 | "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /opt/conda/lib/python3.11/site-packages (from boto3->-r requirements.txt (line 1)) (1.0.1)\n", 27 | "Requirement already satisfied: s3transfer<0.12.0,>=0.11.0 in /opt/conda/lib/python3.11/site-packages (from boto3->-r requirements.txt (line 1)) (0.11.2)\n", 28 | "Requirement already satisfied: numpy>=1.23.2 in /opt/conda/lib/python3.11/site-packages (from pandas->-r requirements.txt (line 3)) (1.26.4)\n", 29 | "Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.11/site-packages (from pandas->-r requirements.txt (line 3)) (2.9.0.post0)\n", 30 | "Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas->-r requirements.txt (line 3)) (2024.1)\n", 31 | "Requirement already satisfied: tzdata>=2022.7 in /opt/conda/lib/python3.11/site-packages (from pandas->-r requirements.txt (line 3)) (2025.1)\n", 32 | "Requirement already satisfied: anyio<5,>=3.5.0 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (4.8.0)\n", 33 | "Requirement already satisfied: distro<2,>=1.7.0 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (1.9.0)\n", 34 | "Requirement already satisfied: httpx<1,>=0.23.0 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (0.28.1)\n", 35 | "Requirement already satisfied: jiter<1,>=0.4.0 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (0.9.0)\n", 36 | "Requirement already satisfied: 
pydantic<3,>=1.9.0 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (2.10.6)\n", 37 | "Requirement already satisfied: sniffio in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (1.3.1)\n", 38 | "Requirement already satisfied: typing-extensions<5,>=4.10 in /opt/conda/lib/python3.11/site-packages (from anthropic->-r requirements.txt (line 4)) (4.12.2)\n", 39 | "Requirement already satisfied: filelock in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (3.17.0)\n", 40 | "Requirement already satisfied: huggingface-hub<1.0,>=0.24.0 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (0.28.0)\n", 41 | "Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (24.2)\n", 42 | "Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (6.0.2)\n", 43 | "Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (2024.11.6)\n", 44 | "Requirement already satisfied: requests in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (2.32.3)\n", 45 | "Requirement already satisfied: tokenizers<0.22,>=0.21 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (0.21.0)\n", 46 | "Requirement already satisfied: safetensors>=0.4.1 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (0.5.2)\n", 47 | "Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.11/site-packages (from transformers->-r requirements.txt (line 6)) (4.67.1)\n", 48 | "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /opt/conda/lib/python3.11/site-packages (from langchain->-r 
requirements.txt (line 8)) (2.0.38)\n", 49 | "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (3.9.5)\n", 50 | "Requirement already satisfied: langchain-core<0.4.0,>=0.3.33 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (0.3.34)\n", 51 | "Requirement already satisfied: langchain-text-splitters<0.4.0,>=0.3.3 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (0.3.5)\n", 52 | "Requirement already satisfied: langsmith<0.4,>=0.1.17 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (0.2.11)\n", 53 | "Requirement already satisfied: tenacity!=8.4.0,<10,>=8.1.0 in /opt/conda/lib/python3.11/site-packages (from langchain->-r requirements.txt (line 8)) (9.0.0)\n", 54 | "Requirement already satisfied: aiobotocore<3.0.0,>=2.5.4 in /opt/conda/lib/python3.11/site-packages (from s3fs->-r requirements.txt (line 9)) (2.19.0)\n", 55 | "Requirement already satisfied: fsspec==2024.10.0.* in /opt/conda/lib/python3.11/site-packages (from s3fs->-r requirements.txt (line 9)) (2024.10.0)\n", 56 | "Requirement already satisfied: aioitertools<1.0.0,>=0.5.1 in /opt/conda/lib/python3.11/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->-r requirements.txt (line 9)) (0.12.0)\n", 57 | "Requirement already satisfied: multidict<7.0.0,>=6.0.0 in /opt/conda/lib/python3.11/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->-r requirements.txt (line 9)) (6.1.0)\n", 58 | "Requirement already satisfied: urllib3!=2.2.0,<3,>=1.25.4 in /opt/conda/lib/python3.11/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->-r requirements.txt (line 9)) (1.26.19)\n", 59 | "Requirement already satisfied: wrapt<2.0.0,>=1.10.10 in /opt/conda/lib/python3.11/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs->-r requirements.txt (line 9)) (1.17.2)\n", 60 | "Requirement already satisfied: aiosignal>=1.1.2 
in /opt/conda/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain->-r requirements.txt (line 8)) (1.3.2)\n", 61 | "Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain->-r requirements.txt (line 8)) (23.2.0)\n", 62 | "Requirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain->-r requirements.txt (line 8)) (1.5.0)\n", 63 | "Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain->-r requirements.txt (line 8)) (1.18.3)\n", 64 | "Requirement already satisfied: idna>=2.8 in /opt/conda/lib/python3.11/site-packages (from anyio<5,>=3.5.0->anthropic->-r requirements.txt (line 4)) (3.10)\n", 65 | "Requirement already satisfied: certifi in /opt/conda/lib/python3.11/site-packages (from httpx<1,>=0.23.0->anthropic->-r requirements.txt (line 4)) (2024.12.14)\n", 66 | "Requirement already satisfied: httpcore==1.* in /opt/conda/lib/python3.11/site-packages (from httpx<1,>=0.23.0->anthropic->-r requirements.txt (line 4)) (1.0.7)\n", 67 | "Requirement already satisfied: h11<0.15,>=0.13 in /opt/conda/lib/python3.11/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->anthropic->-r requirements.txt (line 4)) (0.14.0)\n", 68 | "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /opt/conda/lib/python3.11/site-packages (from langchain-core<0.4.0,>=0.3.33->langchain->-r requirements.txt (line 8)) (1.33)\n", 69 | "Requirement already satisfied: orjson<4.0.0,>=3.9.14 in /opt/conda/lib/python3.11/site-packages (from langsmith<0.4,>=0.1.17->langchain->-r requirements.txt (line 8)) (3.10.15)\n", 70 | "Requirement already satisfied: requests-toolbelt<2.0.0,>=1.0.0 in /opt/conda/lib/python3.11/site-packages (from langsmith<0.4,>=0.1.17->langchain->-r requirements.txt (line 8)) (1.0.0)\n", 71 | "Requirement already satisfied: annotated-types>=0.6.0 in 
/opt/conda/lib/python3.11/site-packages (from pydantic<3,>=1.9.0->anthropic->-r requirements.txt (line 4)) (0.7.0)\n", 72 | "Requirement already satisfied: pydantic-core==2.27.2 in /opt/conda/lib/python3.11/site-packages (from pydantic<3,>=1.9.0->anthropic->-r requirements.txt (line 4)) (2.27.2)\n", 73 | "Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas->-r requirements.txt (line 3)) (1.17.0)\n", 74 | "Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests->transformers->-r requirements.txt (line 6)) (3.4.1)\n", 75 | "Requirement already satisfied: greenlet!=0.4.17 in /opt/conda/lib/python3.11/site-packages (from SQLAlchemy<3,>=1.4->langchain->-r requirements.txt (line 8)) (3.1.1)\n", 76 | "Requirement already satisfied: jsonpointer>=1.9 in /opt/conda/lib/python3.11/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.4.0,>=0.3.33->langchain->-r requirements.txt (line 8)) (3.0.0)\n", 77 | "Requirement already satisfied: propcache>=0.2.0 in /opt/conda/lib/python3.11/site-packages (from yarl<2.0,>=1.0->aiohttp<4.0.0,>=3.8.3->langchain->-r requirements.txt (line 8)) (0.2.1)\n", 78 | "Requirement already satisfied: ipywidgets in /opt/conda/lib/python3.11/site-packages (8.1.5)\n", 79 | "Requirement already satisfied: comm>=0.1.3 in /opt/conda/lib/python3.11/site-packages (from ipywidgets) (0.2.2)\n", 80 | "Requirement already satisfied: ipython>=6.1.0 in /opt/conda/lib/python3.11/site-packages (from ipywidgets) (8.31.0)\n", 81 | "Requirement already satisfied: traitlets>=4.3.1 in /opt/conda/lib/python3.11/site-packages (from ipywidgets) (5.14.3)\n", 82 | "Requirement already satisfied: widgetsnbextension~=4.0.12 in /opt/conda/lib/python3.11/site-packages (from ipywidgets) (4.0.13)\n", 83 | "Requirement already satisfied: jupyterlab_widgets~=3.0.12 in /opt/conda/lib/python3.11/site-packages (from ipywidgets) (3.0.13)\n", 84 | 
"Requirement already satisfied: decorator in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (5.1.1)\n", 85 | "Requirement already satisfied: jedi>=0.16 in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (0.19.2)\n", 86 | "Requirement already satisfied: matplotlib-inline in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (0.1.7)\n", 87 | "Requirement already satisfied: pexpect>4.3 in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (4.9.0)\n", 88 | "Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (3.0.50)\n", 89 | "Requirement already satisfied: pygments>=2.4.0 in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (2.19.1)\n", 90 | "Requirement already satisfied: stack_data in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (0.6.3)\n", 91 | "Requirement already satisfied: typing_extensions>=4.6 in /opt/conda/lib/python3.11/site-packages (from ipython>=6.1.0->ipywidgets) (4.12.2)\n", 92 | "Requirement already satisfied: parso<0.9.0,>=0.8.4 in /opt/conda/lib/python3.11/site-packages (from jedi>=0.16->ipython>=6.1.0->ipywidgets) (0.8.4)\n", 93 | "Requirement already satisfied: ptyprocess>=0.5 in /opt/conda/lib/python3.11/site-packages (from pexpect>4.3->ipython>=6.1.0->ipywidgets) (0.7.0)\n", 94 | "Requirement already satisfied: wcwidth in /opt/conda/lib/python3.11/site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=6.1.0->ipywidgets) (0.2.13)\n", 95 | "Requirement already satisfied: executing>=1.2.0 in /opt/conda/lib/python3.11/site-packages (from stack_data->ipython>=6.1.0->ipywidgets) (2.1.0)\n", 96 | "Requirement already satisfied: asttokens>=2.1.0 in /opt/conda/lib/python3.11/site-packages (from stack_data->ipython>=6.1.0->ipywidgets) (3.0.0)\n", 97 | "Requirement already satisfied: pure_eval in 
/opt/conda/lib/python3.11/site-packages (from stack_data->ipython>=6.1.0->ipywidgets) (0.2.3)\n", 98 | "Requirement already satisfied: sagemaker in /opt/conda/lib/python3.11/site-packages (2.228.0)\n", 99 | "Collecting sagemaker\n", 100 | " Downloading sagemaker-2.243.0-py3-none-any.whl.metadata (16 kB)\n", 101 | "Requirement already satisfied: attrs<24,>=23.1.0 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (23.2.0)\n", 102 | "Requirement already satisfied: boto3<2.0,>=1.35.75 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (1.36.3)\n", 103 | "Requirement already satisfied: cloudpickle>=2.2.1 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (2.2.1)\n", 104 | "Requirement already satisfied: docker in /opt/conda/lib/python3.11/site-packages (from sagemaker) (7.1.0)\n", 105 | "Requirement already satisfied: fastapi in /opt/conda/lib/python3.11/site-packages (from sagemaker) (0.115.8)\n", 106 | "Requirement already satisfied: google-pasta in /opt/conda/lib/python3.11/site-packages (from sagemaker) (0.2.0)\n", 107 | "Requirement already satisfied: importlib-metadata<7.0,>=1.4.0 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (6.10.0)\n", 108 | "Requirement already satisfied: jsonschema in /opt/conda/lib/python3.11/site-packages (from sagemaker) (4.23.0)\n", 109 | "Requirement already satisfied: numpy<2.0,>=1.9.0 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (1.26.4)\n", 110 | "Requirement already satisfied: omegaconf<=2.3,>=2.2 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (2.3.0)\n", 111 | "Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (24.2)\n", 112 | "Requirement already satisfied: pandas in /opt/conda/lib/python3.11/site-packages (from sagemaker) (2.2.3)\n", 113 | "Requirement already satisfied: pathos in /opt/conda/lib/python3.11/site-packages (from sagemaker) (0.3.3)\n", 114 | "Requirement already satisfied: platformdirs in 
/opt/conda/lib/python3.11/site-packages (from sagemaker) (4.3.6)\n", 115 | "Requirement already satisfied: protobuf<6.0,>=3.12 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (4.25.3)\n", 116 | "Requirement already satisfied: psutil in /opt/conda/lib/python3.11/site-packages (from sagemaker) (5.9.8)\n", 117 | "Requirement already satisfied: pyyaml~=6.0 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (6.0.2)\n", 118 | "Requirement already satisfied: requests in /opt/conda/lib/python3.11/site-packages (from sagemaker) (2.32.3)\n", 119 | "Collecting sagemaker-core<2.0.0,>=1.0.17 (from sagemaker)\n", 120 | " Downloading sagemaker_core-1.0.27-py3-none-any.whl.metadata (4.9 kB)\n", 121 | "Requirement already satisfied: schema in /opt/conda/lib/python3.11/site-packages (from sagemaker) (0.7.7)\n", 122 | "Requirement already satisfied: smdebug-rulesconfig==1.0.1 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (1.0.1)\n", 123 | "Requirement already satisfied: tblib<4,>=1.7.0 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (2.0.0)\n", 124 | "Requirement already satisfied: tqdm in /opt/conda/lib/python3.11/site-packages (from sagemaker) (4.67.1)\n", 125 | "Requirement already satisfied: urllib3<3.0.0,>=1.26.8 in /opt/conda/lib/python3.11/site-packages (from sagemaker) (1.26.19)\n", 126 | "Requirement already satisfied: uvicorn in /opt/conda/lib/python3.11/site-packages (from sagemaker) (0.34.0)\n", 127 | "Requirement already satisfied: botocore<1.37.0,>=1.36.3 in /opt/conda/lib/python3.11/site-packages (from boto3<2.0,>=1.35.75->sagemaker) (1.36.3)\n", 128 | "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /opt/conda/lib/python3.11/site-packages (from boto3<2.0,>=1.35.75->sagemaker) (1.0.1)\n", 129 | "Requirement already satisfied: s3transfer<0.12.0,>=0.11.0 in /opt/conda/lib/python3.11/site-packages (from boto3<2.0,>=1.35.75->sagemaker) (0.11.2)\n", 130 | "Requirement already satisfied: zipp>=0.5 in 
/opt/conda/lib/python3.11/site-packages (from importlib-metadata<7.0,>=1.4.0->sagemaker) (3.21.0)\n", 131 | "Requirement already satisfied: antlr4-python3-runtime==4.9.* in /opt/conda/lib/python3.11/site-packages (from omegaconf<=2.3,>=2.2->sagemaker) (4.9.3)\n", 132 | "Requirement already satisfied: pydantic<3.0.0,>=2.0.0 in /opt/conda/lib/python3.11/site-packages (from sagemaker-core<2.0.0,>=1.0.17->sagemaker) (2.10.6)\n", 133 | "Requirement already satisfied: rich<14.0.0,>=13.0.0 in /opt/conda/lib/python3.11/site-packages (from sagemaker-core<2.0.0,>=1.0.17->sagemaker) (13.9.4)\n", 134 | "Collecting mock<5.0,>4.0 (from sagemaker-core<2.0.0,>=1.0.17->sagemaker)\n", 135 | " Using cached mock-4.0.3-py3-none-any.whl.metadata (2.8 kB)\n", 136 | "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /opt/conda/lib/python3.11/site-packages (from jsonschema->sagemaker) (2024.10.1)\n", 137 | "Requirement already satisfied: referencing>=0.28.4 in /opt/conda/lib/python3.11/site-packages (from jsonschema->sagemaker) (0.36.2)\n", 138 | "Requirement already satisfied: rpds-py>=0.7.1 in /opt/conda/lib/python3.11/site-packages (from jsonschema->sagemaker) (0.22.3)\n", 139 | "Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests->sagemaker) (3.4.1)\n", 140 | "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests->sagemaker) (3.10)\n", 141 | "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.11/site-packages (from requests->sagemaker) (2024.12.14)\n", 142 | "Requirement already satisfied: starlette<0.46.0,>=0.40.0 in /opt/conda/lib/python3.11/site-packages (from fastapi->sagemaker) (0.45.3)\n", 143 | "Requirement already satisfied: typing-extensions>=4.8.0 in /opt/conda/lib/python3.11/site-packages (from fastapi->sagemaker) (4.12.2)\n", 144 | "Requirement already satisfied: six in /opt/conda/lib/python3.11/site-packages (from 
google-pasta->sagemaker) (1.17.0)\n", 145 | "Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.11/site-packages (from pandas->sagemaker) (2.9.0.post0)\n", 146 | "Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas->sagemaker) (2024.1)\n", 147 | "Requirement already satisfied: tzdata>=2022.7 in /opt/conda/lib/python3.11/site-packages (from pandas->sagemaker) (2025.1)\n", 148 | "Requirement already satisfied: ppft>=1.7.6.9 in /opt/conda/lib/python3.11/site-packages (from pathos->sagemaker) (1.7.6.9)\n", 149 | "Requirement already satisfied: dill>=0.3.9 in /opt/conda/lib/python3.11/site-packages (from pathos->sagemaker) (0.3.9)\n", 150 | "Requirement already satisfied: pox>=0.3.5 in /opt/conda/lib/python3.11/site-packages (from pathos->sagemaker) (0.3.5)\n", 151 | "Requirement already satisfied: multiprocess>=0.70.17 in /opt/conda/lib/python3.11/site-packages (from pathos->sagemaker) (0.70.17)\n", 152 | "Requirement already satisfied: click>=7.0 in /opt/conda/lib/python3.11/site-packages (from uvicorn->sagemaker) (8.1.8)\n", 153 | "Requirement already satisfied: h11>=0.8 in /opt/conda/lib/python3.11/site-packages (from uvicorn->sagemaker) (0.14.0)\n", 154 | "Requirement already satisfied: annotated-types>=0.6.0 in /opt/conda/lib/python3.11/site-packages (from pydantic<3.0.0,>=2.0.0->sagemaker-core<2.0.0,>=1.0.17->sagemaker) (0.7.0)\n", 155 | "Requirement already satisfied: pydantic-core==2.27.2 in /opt/conda/lib/python3.11/site-packages (from pydantic<3.0.0,>=2.0.0->sagemaker-core<2.0.0,>=1.0.17->sagemaker) (2.27.2)\n", 156 | "Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/conda/lib/python3.11/site-packages (from rich<14.0.0,>=13.0.0->sagemaker-core<2.0.0,>=1.0.17->sagemaker) (3.0.0)\n", 157 | "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/conda/lib/python3.11/site-packages (from rich<14.0.0,>=13.0.0->sagemaker-core<2.0.0,>=1.0.17->sagemaker) (2.19.1)\n", 158 | 
"Requirement already satisfied: anyio<5,>=3.6.2 in /opt/conda/lib/python3.11/site-packages (from starlette<0.46.0,>=0.40.0->fastapi->sagemaker) (4.8.0)\n", 159 | "Requirement already satisfied: sniffio>=1.1 in /opt/conda/lib/python3.11/site-packages (from anyio<5,>=3.6.2->starlette<0.46.0,>=0.40.0->fastapi->sagemaker) (1.3.1)\n", 160 | "Requirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.11/site-packages (from markdown-it-py>=2.2.0->rich<14.0.0,>=13.0.0->sagemaker-core<2.0.0,>=1.0.17->sagemaker) (0.1.2)\n", 161 | "Downloading sagemaker-2.243.0-py3-none-any.whl (1.6 MB)\n", 162 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.6/1.6 MB\u001b[0m \u001b[31m46.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 163 | "\u001b[?25hDownloading sagemaker_core-1.0.27-py3-none-any.whl (407 kB)\n", 164 | "Using cached mock-4.0.3-py3-none-any.whl (28 kB)\n", 165 | "Installing collected packages: mock, sagemaker-core, sagemaker\n", 166 | " Attempting uninstall: sagemaker\n", 167 | " Found existing installation: sagemaker 2.228.0\n", 168 | " Uninstalling sagemaker-2.228.0:\n", 169 | " Successfully uninstalled sagemaker-2.228.0\n", 170 | "Successfully installed mock-4.0.3 sagemaker-2.243.0 sagemaker-core-1.0.27\n" 171 | ] 172 | } 173 | ], 174 | "source": [ 175 | "!pip install -r requirements.txt\n", 176 | "!pip install ipywidgets\n", 177 | "!pip install sagemaker -U" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 38, 183 | "id": "76c3aefd-00b6-47c5-996a-99f44006b85d", 184 | "metadata": { 185 | "tags": [] 186 | }, 187 | "outputs": [], 188 | "source": [ 189 | "import re\n", 190 | "import pandas as pd\n", 191 | "from io import StringIO\n", 192 | "import json\n", 193 | "import time\n", 194 | "import boto3\n", 195 | "# import sentencepiece\n", 196 | "import pandas as pd\n", 197 | "from anthropic import Anthropic\n", 198 | "CLAUDE = Anthropic()\n", 199 | "import multiprocessing\n", 200 | "import 
subprocess\n", 201 | "import shutil\n", 202 | "import os\n", 203 | "import codecs\n", 204 | "import uuid\n", 205 | "from transformers import LlamaTokenizer\n", 206 | "import tiktoken\n", 207 | "from transformers import AutoTokenizer\n", 208 | "REDSHIFT=boto3.client('redshift-data')\n", 209 | "S3=boto3.client('s3')\n", 210 | "from botocore.config import Config\n", 211 | "import ipywidgets as widgets\n", 212 | "from IPython.display import display\n", 213 | "\n", 214 | "config = Config(\n", 215 | " read_timeout=360,\n", 216 | " retries = dict(\n", 217 | " max_attempts = 10\n", 218 | " )\n", 219 | ")\n", 220 | "BEDROCK=boto3.client(service_name='bedrock-runtime',region_name='us-east-1',config=config)\n", 221 | "MIXTRAL_ENDPOINT=\"mistral-endpoint\"" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "id": "3db7b0a8-f945-44eb-b85d-2364aedf9050", 227 | "metadata": {}, 228 | "source": [ 229 | "### Deploy Mistral Small to SageMaker Endpoint" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 40, 235 | "id": "c6570d1a-7948-420f-a3f6-78058efcb496", 236 | "metadata": { 237 | "tags": [] 238 | }, 239 | "outputs": [ 240 | { 241 | "name": "stderr", 242 | "output_type": "stream", 243 | "text": [ 244 | "No instance type selected for inference hosting endpoint. Defaulting to ml.g6.12xlarge.\n" 245 | ] 246 | }, 247 | { 248 | "data": { 249 | "text/html": [ 250 | "
[03/28/25 16:32:22] INFO     No instance type selected for inference hosting endpoint. Defaulting to   model.py:238\n",
 251 |        "                             ml.g6.12xlarge.                                                                       \n",
 252 |        "
\n" 253 | ], 254 | "text/plain": [ 255 | "\u001b[2;36m[03/28/25 16:32:22]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;38;2;0;105;255mINFO \u001b[0m No instance type selected for inference hosting endpoint. Defaulting to \u001b]8;id=336744;file:///opt/conda/lib/python3.11/site-packages/sagemaker/jumpstart/factory/model.py\u001b\\\u001b[2mmodel.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=511852;file:///opt/conda/lib/python3.11/site-packages/sagemaker/jumpstart/factory/model.py#238\u001b\\\u001b[2m238\u001b[0m\u001b]8;;\u001b\\\n", 256 | "\u001b[2;36m \u001b[0m ml.g6.12xlarge. \u001b[2m \u001b[0m\n" 257 | ] 258 | }, 259 | "metadata": {}, 260 | "output_type": "display_data" 261 | } 262 | ], 263 | "source": [ 264 | "# Note: no instance type is passed, so JumpStart uses its default for this model (ml.g6.12xlarge in the logged run).\n", 265 | "model_id = \"huggingface-llm-mistral-small-24B-Instruct-2501\"\n", 266 | "from sagemaker.jumpstart.model import JumpStartModel\n", 267 | "model = JumpStartModel(model_id=model_id)\n", 268 | "predictor = model.deploy(endpoint_name=MIXTRAL_ENDPOINT)\n", 269 | "import sagemaker\n", 270 | "session = sagemaker.Session()" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "id": "c61e948f-67b0-4c72-ad33-930071495bed", 276 | "metadata": {}, 277 | "source": [ 278 | "## REDSHIFT" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "id": "e85ecd78-4240-4339-8a7b-3e77bd8d0824", 284 | "metadata": {}, 285 | "source": [ 286 | "#### Change the parameters below to match your provisioned Redshift cluster" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 41, 292 | "id": "3a7227db-c014-4272-a1b7-57babd5232b4", 293 | "metadata": { 294 | "tags": [] 295 | }, 296 | "outputs": [], 297 | "source": [ 298 | "redshift_client = boto3.client('redshift-data')\n", 299 | "CLUSTER_IDENTIFIER = 'redshift-cluster-1'\n", 300 | "DATABASE = 'dev'\n", 301 | "DB_USER = 'awsuser' " 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": 42, 307 | "id": 
"bdaacf97-6998-487f-8157-cb27d9a278bd", 308 | "metadata": { 309 | "tags": [] 310 | }, 311 | "outputs": [], 312 | "source": [ 313 | "redshift_client = boto3.client('redshift-data')\n", 314 | "CLUSTER_IDENTIFIER = 'redshift-cluster-1'\n", 315 | "DATABASE = 'dev'\n", 316 | "DB_USER = 'awsuser' " 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 43, 322 | "id": "a39647d0-ec63-43d6-9043-d34cc8e84141", 323 | "metadata": { 324 | "tags": [] 325 | }, 326 | "outputs": [], 327 | "source": [ 328 | "def token_counter(path):\n", 329 | " tokenizer = LlamaTokenizer.from_pretrained(path)\n", 330 | " return tokenizer\n", 331 | "def mixtral_counter(path):\n", 332 | " tokenizer = AutoTokenizer.from_pretrained(path)\n", 333 | " return tokenizer" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 44, 339 | "id": "886fd56d-0cf4-406a-af64-5dd30e691563", 340 | "metadata": {}, 341 | "outputs": [], 342 | "source": [ 343 | "def count_tokens(text, endpoint_name):\n", 344 | " client = boto3.client('sagemaker-runtime')\n", 345 | " \n", 346 | " # Prepare the input\n", 347 | " payload = {\n", 348 | " \"inputs\": text,\n", 349 | " \"parameters\": {\"return_token_count\": True}\n", 350 | " }\n", 351 | " \n", 352 | " # Call the endpoint\n", 353 | " response = client.invoke_endpoint(\n", 354 | " EndpointName=endpoint_name,\n", 355 | " ContentType='application/json',\n", 356 | " Body=json.dumps(payload)\n", 357 | " )\n", 358 | " \n", 359 | " # Parse the response\n", 360 | " response_body = json.loads(response['Body'].read().decode())\n", 361 | " return response_body.get('token_count', 0)" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 45, 367 | "id": "fb0a14f7-8879-4802-b41e-b6f18e5599f5", 368 | "metadata": {}, 369 | "outputs": [], 370 | "source": [ 371 | "def query_llm(prompt, max_new_tokens):\n", 372 | " payload = {\n", 373 | " \"inputs\": prompt,\n", 374 | " \"parameters\": {\"max_new_tokens\": max_new_tokens}\n", 375 
| " }\n", 376 | " llama = boto3.client(\"sagemaker-runtime\")\n", 377 | " output = llama.invoke_endpoint(Body=json.dumps(payload), EndpointName=MIXTRAL_ENDPOINT)\n", 378 | " \n", 379 | " # Read and parse response\n", 380 | " response_body = output['Body'].read().decode()\n", 381 | " print(\"Raw response from LLM:\", response_body) # Debug print\n", 382 | " \n", 383 | " try:\n", 384 | " response_json = json.loads(response_body)\n", 385 | " if isinstance(response_json, list):\n", 386 | " generated_text = response_json[0]['generated_text']\n", 387 | " else:\n", 388 | " generated_text = response_json['generated_text']\n", 389 | " \n", 390 | " print(\"Processed response:\", generated_text) # Debug print\n", 391 | " return generated_text\n", 392 | " except Exception as e:\n", 393 | " print(f\"Error processing LLM response: {str(e)}\")\n", 394 | " print(f\"Raw response was: {response_body}\")\n", 395 | " return None" 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "execution_count": 46, 401 | "id": "881dc02f-58ca-41af-b7b3-5657f9c040cb", 402 | "metadata": {}, 403 | "outputs": [], 404 | "source": [ 405 | "def qna_llm(prompt, params):\n", 406 | " payload = {\n", 407 | " \"inputs\": prompt,\n", 408 | " \"parameters\": {\n", 409 | " \"max_new_tokens\": 200,\n", 410 | " \"temperature\": 0.1\n", 411 | " }\n", 412 | " }\n", 413 | " llama = boto3.client(\"sagemaker-runtime\")\n", 414 | " output = llama.invoke_endpoint(Body=json.dumps(payload), EndpointName=MIXTRAL_ENDPOINT)\n", 415 | " \n", 416 | " # Parse response and handle different possible formats\n", 417 | " response_body = json.loads(output['Body'].read().decode())\n", 418 | " \n", 419 | " # Debug print to see the actual response structure\n", 420 | " print(\"Response structure:\", response_body)\n", 421 | " \n", 422 | " # Handle different response formats\n", 423 | " if isinstance(response_body, list):\n", 424 | " answer = response_body[0]['generated_text']\n", 425 | " elif isinstance(response_body, dict):\n", 
426 | " answer = response_body.get('generated_text')\n", 427 | " else:\n", 428 | " raise ValueError(f\"Unexpected response format: {response_body}\")\n", 429 | " \n", 430 | " return answer" 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": 47, 436 | "id": "dec641b4-70a4-45d3-86cb-bd876d15e354", 437 | "metadata": { 438 | "tags": [] 439 | }, 440 | "outputs": [], 441 | "source": [ 442 | "def chunk_csv_rows(csv_rows, max_token_per_chunk):\n", 443 | " \"\"\"\n", 444 | " Chunk CSV rows based on the maximum token count per chunk.\n", 445 | " Args:\n", 446 | " csv_rows (list): List of CSV rows; the first row is the header.\n", 447 | " max_token_per_chunk (int): Maximum token count per chunk, header included.\n", 448 | " Returns:\n", 449 | " list: List of CSV strings, each starting with the header row.\n", 450 | " Raises:\n", 451 | " ValueError: If a single CSV row exceeds the specified max_token_per_chunk.\n", 452 | " \"\"\"\n", 453 | " header = csv_rows[0] # Assuming the first row is the header\n", 454 | " csv_rows = csv_rows[1:] # Remove the header from the list\n", 455 | " current_chunk, current_token_count = [], 0\n", 456 | " chunks = []\n", 457 | " tokenizer = mixtral_counter(\"mistralai/Mistral-Small-24B-Instruct-2501\") # Load the tokenizer once, not once per row\n", 458 | " header_token=len(tokenizer.encode(header))\n", 459 | " for row in csv_rows:\n", 460 | " token = len(tokenizer.encode(row))\n", 461 | " if current_token_count + token+header_token <= max_token_per_chunk:\n", 462 | " current_chunk.append(row)\n", 463 | " current_token_count += token\n", 464 | " else:\n", 465 | " if not current_chunk:\n", 466 | " raise ValueError(\"A single CSV row exceeds the specified max_token_per_chunk.\")\n", 467 | " header_and_chunk=[header]+current_chunk\n", 468 | " chunks.append(\"\\n\".join([x for x in header_and_chunk]))\n", 469 | " current_chunk = [row]\n", 470 | " current_token_count = token\n", 471 | "\n", 472 | " if current_chunk:\n", 473 | " last_chunk_and_header=[header]+current_chunk\n", 474 | "
chunks.append(\"\\n\".join([x for x in last_chunk_and_header]))\n", 475 | " return chunks" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": 48, 481 | "id": "248fc3be-eede-4b2c-a926-8d5c7a04f30e", 482 | "metadata": { 483 | "tags": [] 484 | }, 485 | "outputs": [], 486 | "source": [ 487 | "def get_tables_redshift(cluster_identifier, database, db_user, schema):\n", 488 | " \"\"\"\n", 489 | " Get a list of table names in a specified schema from an Amazon Redshift cluster.\n", 490 | " Args:\n", 491 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 492 | " database (str): The name of the database containing the tables.\n", 493 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 494 | " schema (str): The schema pattern to filter tables.\n", 495 | " Returns:\n", 496 | " list: A list of table names in the specified schema.\n", 497 | " \"\"\"\n", 498 | " tables_ls = REDSHIFT.list_tables(\n", 499 | " ClusterIdentifier=cluster_identifier,\n", 500 | " Database=database,\n", 501 | " DbUser=db_user,\n", 502 | " SchemaPattern=schema\n", 503 | " )\n", 504 | " return [x['name'] for x in tables_ls['Tables']]" 505 | ] 506 | }, 507 | { 508 | "cell_type": "code", 509 | "execution_count": 49, 510 | "id": "47e81566-6e5e-4771-9fdd-65343c388f49", 511 | "metadata": { 512 | "tags": [] 513 | }, 514 | "outputs": [], 515 | "source": [ 516 | "def get_db_redshift(cluster_identifier, database, db_user):\n", 517 | " \"\"\"\n", 518 | " Get a list of databases from an Amazon Redshift cluster.\n", 519 | " Args:\n", 520 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 521 | " database (str): The name of the database containing the tables.\n", 522 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 523 | " Returns:\n", 524 | " list: A list of databases in the Redshift cluster.\n", 525 | " \"\"\"\n", 526 | " db_ls = REDSHIFT.list_databases(\n", 527 | " 
ClusterIdentifier=cluster_identifier,\n", 528 | " Database=database,\n", 529 | " DbUser=db_user\n", 530 | " )\n", 531 | " return db_ls['Databases']" 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": 50, 537 | "id": "34cf5577-7130-4302-a03f-4f5a60db0a9a", 538 | "metadata": { 539 | "tags": [] 540 | }, 541 | "outputs": [], 542 | "source": [ 543 | "def get_schema_redshift(cluster_identifier, database, db_user):\n", 544 | " \"\"\"\n", 545 | " Get a list of schemas from an Amazon Redshift cluster.\n", 546 | " Args:\n", 547 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 548 | " database (str): The name of the database containing the schemas.\n", 549 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 550 | " Returns:\n", 551 | " list: A list of schemas in the Redshift cluster.\n", 552 | " \"\"\"\n", 553 | " schema_ls = REDSHIFT.list_schemas(\n", 554 | " ClusterIdentifier=cluster_identifier,\n", 555 | " Database=database,\n", 556 | " DbUser=db_user\n", 557 | " )\n", 558 | " return schema_ls['Schemas']" 559 | ] 560 | }, 561 | { 562 | "cell_type": "code", 563 | "execution_count": 51, 564 | "id": "67b92d3e-0803-4581-80c5-adcfbee717f7", 565 | "metadata": { 566 | "tags": [] 567 | }, 568 | "outputs": [], 569 | "source": [ 570 | "def execute_query_with_pagination(sql_query, cluster_identifier, database, db_user):\n", 571 | " \"\"\"\n", 572 | " Execute a batch of SQL queries in Amazon Redshift and collect the result of each statement.\n", 573 | " Args:\n", 574 | " sql_query (list of str): The SQL statements to execute, in order,\n", 575 | " submitted together as a single batch.\n", 576 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 577 | " database (str): The name of the database.\n", 578 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 579 | " Returns:\n", 580 | " list: A list of CSV strings, one per executed SQL statement.\n", 581 | " \"\"\"\n", 582 | "
results_list=[]\n", 583 | " response_b = REDSHIFT.batch_execute_statement(\n", 584 | " ClusterIdentifier=cluster_identifier,\n", 585 | " Database=database,\n", 586 | " DbUser=db_user,\n", 587 | " Sqls=sql_query\n", 588 | " ) \n", 589 | " describe_b=REDSHIFT.describe_statement(\n", 590 | " Id=response_b['Id'],\n", 591 | " ) \n", 592 | " status=describe_b['Status']\n", 593 | " while status not in (\"FINISHED\", \"FAILED\", \"ABORTED\"): # stop polling on any terminal state\n", 594 | " time.sleep(1)\n", 595 | " describe_b=REDSHIFT.describe_statement(\n", 596 | " Id=response_b['Id'],\n", 597 | " ) \n", 598 | " status=describe_b['Status']\n", 599 | " max_attempts = 5 \n", 600 | " attempts = 0\n", 601 | " while attempts < max_attempts:\n", 602 | " try:\n", 603 | " for ids in describe_b['SubStatements']:\n", 604 | " result_b = REDSHIFT.get_statement_result(Id=ids['Id']) \n", 605 | " results_list.append(get_redshift_table_result(result_b))\n", 606 | " break\n", 607 | " except REDSHIFT.exceptions.ResourceNotFoundException as e:\n", 608 | " attempts += 1\n", 609 | " time.sleep(2)\n", 610 | " return results_list" 611 | ] 612 | }, 613 | { 614 | "cell_type": "code", 615 | "execution_count": 52, 616 | "id": "bb1c043c-60d0-4d34-8fdd-886d6120c40f", 617 | "metadata": { 618 | "tags": [] 619 | }, 620 | "outputs": [], 621 | "source": [ 622 | "def get_redshift_table_result(response):\n", 623 | " \"\"\"\n", 624 | " Extracts result data from a Redshift query response and returns it as a CSV string.\n", 625 | " Args:\n", 626 | " response (dict): The response object from a Redshift query.\n", 627 | " Returns:\n", 628 | " str: A CSV string containing the result data.\n", 629 | " \"\"\"\n", 630 | " columns = [c['name'] for c in response['ColumnMetadata']] \n", 631 | " data = []\n", 632 | " for r in response['Records']:\n", 633 | " row = []\n", 634 | " for col in r:\n", 635 | " row.append(list(col.values())[0]) \n", 636 | " data.append(row)\n", 637 | " df = pd.DataFrame(data, columns=columns) \n", 638 | " return df.to_csv(index=False)" 639 | ] 640 | }, 641 | 
{ 642 | "cell_type": "code", 643 | "execution_count": 53, 644 | "id": "8c24a0b9-ed77-4222-bc87-af828502e2ed", 645 | "metadata": { 646 | "tags": [] 647 | }, 648 | "outputs": [], 649 | "source": [ 650 | "def execute_query_redshift(sql_query, cluster_identifier, database, db_user):\n", 651 | " \"\"\"\n", 652 | " Submit a SQL query to an Amazon Redshift cluster.\n", 653 | " Args:\n", 654 | " sql_query (str): The SQL query to execute.\n", 655 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 656 | " database (str): The name of the database.\n", 657 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 658 | " Returns:\n", 659 | " dict: The execute_statement response, including the statement Id used to poll for the result.\n", 660 | " \"\"\"\n", 661 | " response = REDSHIFT.execute_statement(\n", 662 | " ClusterIdentifier=cluster_identifier,\n", 663 | " Database=database,\n", 664 | " DbUser=db_user,\n", 665 | " Sql=sql_query\n", 666 | " )\n", 667 | " return response" 668 | ] 669 | }, 670 | { 671 | "cell_type": "code", 672 | "execution_count": 54, 673 | "id": "1a7c4ef0-9166-4faf-a820-0c03d8974e87", 674 | "metadata": { 675 | "tags": [] 676 | }, 677 | "outputs": [], 678 | "source": [ 679 | "def single_execute_query(sql_query, cluster_identifier, database, db_user,question):\n", 680 | " \"\"\"\n", 681 | " Execute a single SQL query on an Amazon Redshift cluster and process the result.\n", 682 | "\n", 683 | " Args:\n", 684 | " sql_query (str): The SQL query to execute.\n", 685 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 686 | " database (str): The name of the database.\n", 687 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 688 | " question (str): A descriptive label or question associated with the query.\n", 689 | "\n", 690 | " Returns:\n", 691 | " tuple: (result, sql), where result is the query output as a CSV string (or a failure message) and sql is the SQL statement that was last executed.\n", 692 | "\n", 693 | " \"\"\"\n", 694 | " result_sets = []\n", 695 | 
" response = execute_query_redshift(sql_query, cluster_identifier, database, db_user)\n", 696 | " df=redshift_querys(sql_query,response,question,params,cluster_identifier, database, db_user,question) \n", 697 | " return df" 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": 55, 703 | "id": "30c4b2f5-65e8-4c9c-b0bc-f19406d7838c", 704 | "metadata": { 705 | "tags": [] 706 | }, 707 | "outputs": [], 708 | "source": [ 709 | "def llm_debugger(question, statement, error, params): \n", 710 | " \"\"\"\n", 711 | " Generate debugging guidance and expected SQL correction for a PostgreSQL error.\n", 712 | " Args:\n", 713 | " question (str): The user's question or intent.\n", 714 | " statement (str): The SQL statement that caused the error.\n", 715 | " error (str): The error message encountered.\n", 716 | " params (dict): Additional parameters including schema, sample data, and length.\n", 717 | " Returns:\n", 718 | " str: Formatted debugging guidance and expected SQL correction.\n", 719 | " \"\"\"\n", 720 | " prompts=f'''<>[INST]\n", 721 | "You are a PostgreSQL developer who is an expert at debugging errors. 
\n", 722 | "\n", 723 | "Here is the schema definition of the table(s):\n", 724 | "{params['schema']}\n", 725 | "#############################\n", 726 | "Here are example records for each table:\n", 727 | "{params['sample']}\n", 728 | "#############################\n", 729 | "Here is the sql statement that threw the error below:\n", 730 | "{statement}\n", 731 | "#############################\n", 732 | "Here is the error to debug:\n", 733 | "{error}\n", 734 | "#############################\n", 735 | "Here is the intent of the user:\n", 736 | "{params['prompt']}\n", 737 | "<>\n", 738 | "First understand the error and think about how you can fix the error.\n", 739 | "Use the provided schema and sample row to guide your thought process for a solution.\n", 740 | "Do all this thinking inside XML tags. This is a space for you to write down relevant content and will not be shown to the user.\n", 741 | "\n", 742 | "Once you are done debugging, provide the correct SQL statement without any additional text.\n", 743 | "When generating the correct SQL statement:\n", 744 | "1. Pay attention to the schema and table name and use them correctly in your generated sql. \n", 745 | "2. Never query for all columns from a table unless the question says so. You must query only the columns that are needed to answer the question.\n", 746 | "3. Wrap each column name in double quotes (\") to denote them as delimited identifiers. Do not use backslash (\\) to escape underscores (_) in column names. 
\n", 747 | "\n", 748 | "Format your response as:\n", 749 | " Correct SQL Statement [/INST]'''\n", 750 | "\n", 751 | " \n", 752 | "# prompts=f''' [INST] You are a PostgreSQL developer who is an expert at debugging errors.\n", 753 | "# Here are the schema definition of table(s):\n", 754 | "# {params['schema']}\n", 755 | "# #############################\n", 756 | "# Here are example records for each table:\n", 757 | "# {params['sample']}\n", 758 | "# #############################\n", 759 | "# Here is the sql statement that threw the error below:\n", 760 | "# {statement}\n", 761 | "# #############################\n", 762 | "# Here is the error to debug:\n", 763 | "# {error}\n", 764 | "# #############################\n", 765 | "# Here is the intent of the user:\n", 766 | "# {params['prompt']} \n", 767 | "# First understand the error and think about how you can fix the error.\n", 768 | "# Use the provided schema and sample row to guide your thought process for a solution.\n", 769 | "# Do all this thinking inside XML tags.This is a space for you to write down relevant content and will not be shown to the user.\n", 770 | "# Once your are done debugging, provide the the correct SQL statement without any additional text.\n", 771 | "# When generating the correct SQL statement:\n", 772 | "# 1. Pay attention to the database schema and table name and use them correctly in your response. \n", 773 | "# 2. Never query for all columns from a table unless the question says so. You must query only the columns that are needed to answer the question.\n", 774 | "# 3. Wrap all column name(s) in double quotes (\") to denote them as delimited identifiers. \n", 775 | "# 4. DO NOT escape underscores (_) in column name(s). Just wrap them in double quotes (\").\n", 776 | "# 5. 
SQL engine is Amazon Redshift database.\n", 777 | "\n", 778 | "# Format your response as:\n", 779 | "# Correct SQL Statement [/INST] '''\n", 780 | " answer=query_llm(prompts,round(params['sql-len']))\n", 781 | " answer = answer.replace(\"\\\\\",\"\")\n", 782 | " return answer" 783 | ] 784 | }, 785 | { 786 | "cell_type": "code", 787 | "execution_count": 56, 788 | "id": "aae1ae4a-1171-48c4-9799-c0d57566015b", 789 | "metadata": { 790 | "tags": [] 791 | }, 792 | "outputs": [], 793 | "source": [ 794 | "def redshift_querys(q_s,response,prompt,params,cluster_identifier, database, db_user,question): \n", 795 | " \"\"\"\n", 796 | " Fetch the result of a submitted Redshift query; if the query failed, ask the LLM to debug the SQL and retry.\n", 797 | "\n", 798 | " Args:\n", 799 | " q_s (str): The SQL statement to execute or debug.\n", 800 | " response (dict): The response object from executing the SQL statement.\n", 801 | " prompt (str): The user's question or intent.\n", 802 | " params (dict): Additional parameters including schema, sample data, and length.\n", 803 | " cluster_identifier (str): The identifier of the Redshift cluster.\n", 804 | " database (str): The name of the database.\n", 805 | " db_user (str): The username used to authenticate with the Redshift cluster.\n", 806 | " question (str): A descriptive label or question associated with the query.\n", 807 | "\n", 808 | " Returns:\n", 809 | " tuple: (result, sql), where result is the query result as a CSV string, or a failure message if debugging did not succeed, and sql is the last SQL statement executed.\n", 810 | "\n", 811 | " \"\"\"\n", 812 | " max_execution=5\n", 813 | " attempt_number=0\n", 814 | " debug_count=max_execution\n", 815 | " try:\n", 816 | " statement_result = REDSHIFT.get_statement_result(\n", 817 | " Id=response['Id'],\n", 818 | "\n", 819 | " )\n", 820 | " except REDSHIFT.exceptions.ResourceNotFoundException as err: \n", 821 | " # print(err)\n", 822 | " describe_statement=REDSHIFT.describe_statement(\n", 823 | " Id=response['Id'],\n", 824 | " )\n", 825 | " 
query_state=describe_statement['Status'] \n", 826 | " while query_state in ['SUBMITTED','PICKED','STARTED']:\n", 827 | " # print(query_state)\n", 828 | " time.sleep(1)\n", 829 | " describe_statement=REDSHIFT.describe_statement(\n", 830 | " Id=response['Id'],\n", 831 | " )\n", 832 | " query_state=describe_statement['Status']\n", 833 | " while (max_execution > 0 and query_state == \"FAILED\"):\n", 834 | " max_execution = max_execution - 1\n", 835 | " attempt_number = 5 - max_execution\n", 836 | " print(\"- - - - - - - - - - - - - -\\n\")\n", 837 | " print(f\"\\nDEBUG TRIAL {attempt_number}\")\n", 838 | " bad_sql=describe_statement['QueryString']\n", 839 | " print(f\"\\nBAD SQL:\\n{bad_sql}\") \n", 840 | " error=describe_statement['Error']\n", 841 | " print(f\"ERROR:{error}\")\n", 842 | " print(\"\\nDEBUGGING...\")\n", 843 | " cql=llm_debugger(prompt, bad_sql, error, params) \n", 844 | " idx1 = cql.index('')\n", 845 | " idx2 = cql.index('')\n", 846 | " q_s=cql[idx1 + len('') + 1: idx2]\n", 847 | " print(f\"\\nDEBUGGED SQL {q_s}\")\n", 848 | " response = execute_query_redshift(q_s, cluster_identifier, database, db_user)\n", 849 | " describe_statement=REDSHIFT.describe_statement(\n", 850 | " Id=response['Id'],\n", 851 | " )\n", 852 | " query_state=describe_statement['Status']\n", 853 | " # print(f\"\\n{query_state}\")\n", 854 | " while query_state in ['SUBMITTED','PICKED','STARTED']:\n", 855 | " time.sleep(2)\n", 856 | " # print(f\"\\n{query_state}\")\n", 857 | " describe_statement=REDSHIFT.describe_statement(\n", 858 | " Id=response['Id'],\n", 859 | " )\n", 860 | " query_state=describe_statement['Status']\n", 861 | " if query_state == \"FINISHED\": \n", 862 | " break \n", 863 | " \n", 864 | " if max_execution == 0 and query_state == \"FAILED\":\n", 865 | " print(f\"DEBUGGING FAILED IN {str(debug_count)} ATTEMPTS\")\n", 866 | " else: \n", 867 | " max_attempts = 5\n", 868 | " attempts = 0\n", 869 | " while attempts < max_attempts:\n", 870 | " try:\n", 871 | " 
time.sleep(1)\n", 872 | " # print(response['Id'])\n", 873 | " statement_result = REDSHIFT.get_statement_result(\n", 874 | " Id=response['Id']\n", 875 | " )\n", 876 | " break\n", 877 | "\n", 878 | " except REDSHIFT.exceptions.ResourceNotFoundException as e:\n", 879 | " attempts += 1\n", 880 | " time.sleep(5)\n", 881 | " if max_execution == 0 and query_state == \"FAILED\":\n", 882 | " df=f\"DEBUGGING FAILED IN {str(debug_count)} ATTEMPTS. NO RESULT AVAILABLE\"\n", 883 | " else:\n", 884 | " df=get_redshift_table_result(statement_result)\n", 885 | " return df, q_s" 886 | ] 887 | }, 888 | { 889 | "cell_type": "code", 890 | "execution_count": 57, 891 | "id": "3cab616f-b529-4646-961f-0bb1e420e018", 892 | "metadata": { 893 | "tags": [] 894 | }, 895 | "outputs": [], 896 | "source": [ 897 | "def redshift_qna(params):\n", 898 | " \"\"\"\n", 899 | " Execute a Q&A process for generating SQL queries based on user questions.\n", 900 | " Args:\n", 901 | " params (dict): A dictionary containing parameters including table name, database name, prompt, etc.\n", 902 | " Returns:\n", 903 | " tuple: A tuple containing the response, generated SQL statement, and query output.\n", 904 | " \"\"\"\n", 905 | " # sql1=f\"SELECT * FROM information_schema.columns WHERE table_name='{params['table']}' AND table_schema='{params['db']}'\"\n", 906 | " # sql2=f\"SELECT * from dev.{params['db']}.{params['table']} LIMIT 10\"\n", 907 | " sql1=f\"SELECT table_catalog,table_schema,table_name,column_name,ordinal_position,is_nullable,data_type FROM information_schema.columns WHERE table_schema='{params['db']}'\"\n", 908 | " sql2=[]\n", 909 | " for table in params['tables']:\n", 910 | " sql2.append(f\"SELECT * from {params['db']}.{table} LIMIT 3\")\n", 911 | " sqls=[sql1]+sql2\n", 912 | " #print(sqls)\n", 913 | " question=params['prompt']\n", 914 | " results=execute_query_with_pagination(sqls, CLUSTER_IDENTIFIER, db, DB_USER) \n", 915 | " \n", 916 | " col_names=results[0].split('\\n')[0]\n", 917 | " 
observations=\"\\n\".join(sorted(results[0].split('\\n')[1:])).strip()\n", 918 | " params['schema']=f\"{col_names}\\n{observations}\"\n", 919 | " params['sample']=''\n", 920 | " for examples in results[1:]:\n", 921 | " params['sample']+=f\"{examples}\\n\\n\"\n", 922 | " # params['schema']=schema\n", 923 | " # params['sample']=schema_example\n", 924 | " \n", 925 | " prompts=f\"\"\"<>[INST]\n", 926 | "You are an expert PostgreSQL developer. Your job is to provide a syntactically correct PostgreSQL query given a user question.\n", 927 | "Here are the schema definition of table(s):\n", 928 | "########\n", 929 | "{params['schema']}\n", 930 | "########\n", 931 | "\n", 932 | "Here are example records for each table:\n", 933 | "##########\n", 934 | "{params['sample']}\n", 935 | "###########\n", 936 | "<>\n", 937 | "Here are some instructions when generating SQL statements:\n", 938 | "1. Determine the necessary table(s) and schema needed for an accurate query.\n", 939 | "2. Limit your queries to only the required columns to prevent unnecessary data retrieval and improve query performance.\n", 940 | "3. For clarity and to prevent potential conflicts, always include the schema name when referencing table names in your SQL queries.\n", 941 | "4. When working with Amazon Redshift table and column names containing underscores, do not use the backslash escape character (\\). Instead, use double quotes (\"\") to enclose the names in your queries.\n", 942 | "5. Do not mention 'dev' or 'public' in the queries.\n", 943 | "In your response, provide a single SQL statement to answer the question, avoid additional text that would cause failure during executing the sql. You MUST provide your answer according to the following format:\n", 944 | "Format your response as:\n", 945 | "\n", 946 | "generated SQL statement \n", 947 | "\n", 948 | "\n", 949 | "Question: {question}[/INST]\"\"\"\n", 950 | "\n", 951 | "# prompts=f\"\"\" [INST] You are an expert PostgreSQL developer. 
Your job is to provide a syntactically correct PostgreSQL query for Amazon Redshift Database.\n", 952 | "# Here are the schema definition of table(s):\n", 953 | "# {params['schema']}\n", 954 | "\n", 955 | "# Here are example records for each table:\n", 956 | "# {params['sample']}\n", 957 | "\n", 958 | "# Here are some instructions when generating SQL statements:\n", 959 | "# 1. Pay attention to database schema and table names and use them correctly in your response. \n", 960 | "# 2. Never query for all columns from a table. You must query only the columns that are needed to answer the question.\n", 961 | "# 3. Wrap all column name(s) in double quotes (\") to denote them as delimited identifiers. \n", 962 | "# 4. DO NOT escape underscores (_) in column name(s). Just wrap them in double quotes (\").\n", 963 | "# In your response, provide a single SQL statement to answer the question, avoid additional text that would cause failure during executing the sql. \n", 964 | "# Format your response as:\n", 965 | "# \n", 966 | "# generated SQL statement \n", 967 | "# \n", 968 | "# Question: {question} [/INST] \"\"\"\n", 969 | " q_s=query_llm(prompts,200)\n", 970 | " sql_pattern = re.compile(r'(.*?)(?:|$)', re.DOTALL) \n", 971 | " sql_match = re.search(sql_pattern, q_s)\n", 972 | " q_s = sql_match.group(1) \n", 973 | " q_s = q_s.replace(\"\\\\\",\"\")\n", 974 | " print(f\" FIRST ATTEMPT SQL:\\n{q_s}\")\n", 975 | " output, q_s=single_execute_query(q_s, CLUSTER_IDENTIFIER, db, DB_USER,question) \n", 976 | " input_token = count_tokens(output, MIXTRAL_ENDPOINT)\n", 977 | " if input_token>28000: \n", 978 | " csv_rows=output.split('\\n')\n", 979 | " chunk_rows=chunk_csv_rows(csv_rows, 20000)\n", 980 | " initial_summary=[]\n", 981 | " for chunk in chunk_rows:\n", 982 | " prompts=f'''<>[INST]You are a helpful and truthful assistant. 
Your job is to provide answers based on samples of tabular data provided.\n", 983 | "\n", 984 | "Here is the tabular data:\n", 985 | "#######\n", 986 | "{chunk}\n", 987 | "#######\n", 988 | "<>\n", 989 | "Question: {question}\n", 990 | "\n", 991 | "When providing your response:\n", 992 | "- First, review the result to understand the information within. Then provide a complete answer to my question, based on the result.\n", 993 | "- If you can't answer the question, please say so[/INST]'''\n", 994 | " initial_summary.append(qna_llm(prompts,params))\n", 995 | " prompts = f'''<>[INST]You are a helpful and truthful assistant.\n", 996 | "\n", 997 | "Here are multiple answers to a question, each based on a different subset of tabular data:\n", 998 | "#######\n", 999 | "{initial_summary}\n", 1000 | "#######\n", 1001 | "<>\n", 1002 | "Question: {question}\n", 1003 | "Based on the given question above, merge all answers provided into a single coherent answer[/INST]'''\n", 1004 | " response=qna_llm(prompts,params)\n", 1005 | " \n", 1006 | " else: \n", 1007 | " prompts=f'''<>[INST]You are a helpful and truthful assistant. Your job is to examine a sql statement and its generated result, then provide a response to my question.\n", 1008 | "\n", 1009 | "Here is the sql query:\n", 1010 | "{q_s}\n", 1011 | "\n", 1012 | "Here is the corresponding sql query result:\n", 1013 | "{output}\n", 1014 | "<>\n", 1015 | "question: {question}\n", 1016 | "\n", 1017 | "When providing your response:\n", 1018 | "- First, review the sql query and the corresponding result. 
Then provide a complete answer to my question, based on the result.\n", 1019 | "- If you can't answer the question, please say so[/INST]'''\n", 1020 | " response=qna_llm(prompts, params) \n", 1021 | " return response, q_s,output" 1022 | ] 1023 | }, 1024 | { 1025 | "cell_type": "code", 1026 | "execution_count": 58, 1027 | "id": "f69fcae3-32f0-4048-a2cf-74183b928182", 1028 | "metadata": { 1029 | "tags": [] 1030 | }, 1031 | "outputs": [ 1032 | { 1033 | "data": { 1034 | "text/plain": [ 1035 | "('sample_data_dev',\n", 1036 | " 'tickit',\n", 1037 | " ['category', 'date', 'event', 'listing', 'sales', 'users', 'venue'])" 1038 | ] 1039 | }, 1040 | "execution_count": 58, 1041 | "metadata": {}, 1042 | "output_type": "execute_result" 1043 | } 1044 | ], 1045 | "source": [ 1046 | "#db=get_db_redshift(CLUSTER_IDENTIFIER, DATABASE, DB_USER)[-1]\n", 1047 | "#schm=get_schema_redshift(CLUSTER_IDENTIFIER, db, DB_USER)[-1]\n", 1048 | "db='sample_data_dev'\n", 1049 | "schm = 'tickit'\n", 1050 | "tables=get_tables_redshift(CLUSTER_IDENTIFIER, db, DB_USER,schm)\n", 1051 | "db, schm, tables" 1052 | ] 1053 | }, 1054 | { 1055 | "cell_type": "markdown", 1056 | "id": "1b728224-5821-4ed2-8fb6-796d5c0a7fbf", 1057 | "metadata": {}, 1058 | "source": [ 1059 | "#### Example prompts:" 1060 | ] 1061 | }, 1062 | { 1063 | "cell_type": "code", 1064 | "execution_count": 59, 1065 | "id": "b9087bf8-f6b7-40d9-a849-58bf8d6a2923", 1066 | "metadata": { 1067 | "tags": [] 1068 | }, 1069 | "outputs": [], 1070 | "source": [ 1071 | "prompt1 = \"Who are the 5 people who spent the most on tickets for events?\"" 1072 | ] 1073 | }, 1074 | { 1075 | "cell_type": "code", 1076 | "execution_count": 60, 1077 | "id": "baec89b7-b910-4532-b929-4c8acb11645c", 1078 | "metadata": { 1079 | "tags": [] 1080 | }, 1081 | "outputs": [], 1082 | "source": [ 1083 | "prompt2 = \"the top five sellers names in San Diego, based on the number of tickets sold in 2008?\"" 1084 | ] 1085 | }, 1086 | { 1087 | "cell_type": "code", 1088 | 
"execution_count": 61, 1089 | "id": "b0065583-179c-466c-8f05-a787483a8d36", 1090 | "metadata": { 1091 | "tags": [] 1092 | }, 1093 | "outputs": [], 1094 | "source": [ 1095 | "prompt3 = \"What where the 10 events for which tickets took the longest to sell?\"" 1096 | ] 1097 | }, 1098 | { 1099 | "cell_type": "code", 1100 | "execution_count": 62, 1101 | "id": "7e64de7c-e8e7-416b-b60b-694d49481bff", 1102 | "metadata": { 1103 | "tags": [] 1104 | }, 1105 | "outputs": [], 1106 | "source": [ 1107 | "prompt4 = \"the most popular state to host events based on the number of venues per state.\"" 1108 | ] 1109 | }, 1110 | { 1111 | "cell_type": "code", 1112 | "execution_count": 63, 1113 | "id": "53a0781d-9ff2-4cd2-b85b-34a4322e96c7", 1114 | "metadata": { 1115 | "tags": [] 1116 | }, 1117 | "outputs": [], 1118 | "source": [ 1119 | "prompt5 = \"Number of Venues where the show Macbeth was held.\"" 1120 | ] 1121 | }, 1122 | { 1123 | "cell_type": "code", 1124 | "execution_count": 27, 1125 | "id": "8fc5ed6b-ff95-4744-91d1-c3d9b912625b", 1126 | "metadata": { 1127 | "tags": [] 1128 | }, 1129 | "outputs": [], 1130 | "source": [ 1131 | "prompt6 = \"what are the top 10 buyers by quantity.\"" 1132 | ] 1133 | }, 1134 | { 1135 | "cell_type": "code", 1136 | "execution_count": 28, 1137 | "id": "cf8c2dd2-d2a7-4155-ab3d-b6f206cc601a", 1138 | "metadata": { 1139 | "tags": [] 1140 | }, 1141 | "outputs": [], 1142 | "source": [ 1143 | "prompt7 = \"for the top 10 events, count the number of times each of them occur.\"" 1144 | ] 1145 | }, 1146 | { 1147 | "cell_type": "code", 1148 | "execution_count": 29, 1149 | "id": "2803aff4-7484-4cab-b231-d264e2747fcd", 1150 | "metadata": { 1151 | "tags": [] 1152 | }, 1153 | "outputs": [], 1154 | "source": [ 1155 | "prompt8 = \"Total Commissions Generated for Macbeth at Royce Hall.\"" 1156 | ] 1157 | }, 1158 | { 1159 | "cell_type": "code", 1160 | "execution_count": 30, 1161 | "id": "b304c971-ee2c-453c-9358-bd4f0fa371d7", 1162 | "metadata": { 1163 | "tags": [] 1164 | }, 
1165 | "outputs": [ 1166 | { 1167 | "data": { 1168 | "application/vnd.jupyter.widget-view+json": { 1169 | "model_id": "6211eb2c474341648eb17f9dab8be4b8", 1170 | "version_major": 2, 1171 | "version_minor": 0 1172 | }, 1173 | "text/plain": [ 1174 | "Text(value='', description='Enter prompt:')" 1175 | ] 1176 | }, 1177 | "metadata": {}, 1178 | "output_type": "display_data" 1179 | } 1180 | ], 1181 | "source": [ 1182 | "entered_text = widgets.Text(\n", 1183 | " value='',\n", 1184 | " description='Enter prompt:',\n", 1185 | ")\n", 1186 | "display(entered_text)" 1187 | ] 1188 | }, 1189 | { 1190 | "cell_type": "code", 1191 | "execution_count": 64, 1192 | "id": "6baba3e2-8c15-4b08-941f-e28a61094073", 1193 | "metadata": { 1194 | "tags": [] 1195 | }, 1196 | "outputs": [ 1197 | { 1198 | "name": "stdout", 1199 | "output_type": "stream", 1200 | "text": [ 1201 | "what are the top 10 buyers by quantity?\n" 1202 | ] 1203 | } 1204 | ], 1205 | "source": [ 1206 | "prompt = entered_text.value\n", 1207 | "params={'sql-len':700,'text-token':500,'tables':tables,'db':schm,'temp':0.1,'model_id':'mixtral',\n", 1208 | " \"prompt\":prompt}\n", 1209 | "print(params[\"prompt\"])" 1210 | ] 1211 | }, 1212 | { 1213 | "cell_type": "code", 1214 | "execution_count": 65, 1215 | "id": "642ae516-41e5-453e-be1a-ca7de1c1e9ec", 1216 | "metadata": { 1217 | "tags": [] 1218 | }, 1219 | "outputs": [ 1220 | { 1221 | "name": "stdout", 1222 | "output_type": "stream", 1223 | "text": [ 1224 | "Raw response from LLM: {\"generated_text\": \"\\nSELECT\\n buyerid,\\n SUM(qtysold) AS total_quantity\\nFROM\\n tickit.sales\\nGROUP BY\\n buyerid\\nORDER BY\\n total_quantity DESC\\nLIMIT 10;\\n\"}\n", 1225 | "Processed response: \n", 1226 | "SELECT\n", 1227 | " buyerid,\n", 1228 | " SUM(qtysold) AS total_quantity\n", 1229 | "FROM\n", 1230 | " tickit.sales\n", 1231 | "GROUP BY\n", 1232 | " buyerid\n", 1233 | "ORDER BY\n", 1234 | " total_quantity DESC\n", 1235 | "LIMIT 10;\n", 1236 | "\n", 1237 | " FIRST ATTEMPT SQL:\n", 1238 | 
"\n", 1239 | "SELECT\n", 1240 | " buyerid,\n", 1241 | " SUM(qtysold) AS total_quantity\n", 1242 | "FROM\n", 1243 | " tickit.sales\n", 1244 | "GROUP BY\n", 1245 | " buyerid\n", 1246 | "ORDER BY\n", 1247 | " total_quantity DESC\n", 1248 | "LIMIT 10;\n", 1249 | "\n", 1250 | "Response structure: {'generated_text': 'Based on the provided SQL query and its result, the top 10 buyers by quantity are as follows:\\n\\n1. Buyer ID 8933 with a total quantity of 67\\n2. Buyer ID 3797 with a total quantity of 64\\n3. Buyer ID 1298 with a total quantity of 64\\n4. Buyer ID 5002 with a total quantity of 63\\n5. Buyer ID 4064 with a total quantity of 60\\n6. Buyer ID 644 with a total quantity of 60\\n7. Buyer ID 3881 with a total quantity of 60\\n8. Buyer ID 522 with a total quantity of 60\\n9. Buyer ID 4842 with a total quantity of 60\\n10. Buyer ID 59'}\n", 1251 | "CPU times: user 124 ms, sys: 5.34 ms, total: 129 ms\n", 1252 | "Wall time: 20.6 s\n" 1253 | ] 1254 | } 1255 | ], 1256 | "source": [ 1257 | "%%time\n", 1258 | "result_text2sql = redshift_qna(params)" 1259 | ] 1260 | }, 1261 | { 1262 | "cell_type": "code", 1263 | "execution_count": 66, 1264 | "id": "73f7a388-c7dd-40df-bda1-3719044129d2", 1265 | "metadata": { 1266 | "tags": [] 1267 | }, 1268 | "outputs": [ 1269 | { 1270 | "name": "stdout", 1271 | "output_type": "stream", 1272 | "text": [ 1273 | "\n", 1274 | "Answer:\n", 1275 | "\n", 1276 | "Based on the provided SQL query and its result, the top 10 buyers by quantity are as follows:\n", 1277 | "\n", 1278 | "1. Buyer ID 8933 with a total quantity of 67\n", 1279 | "2. Buyer ID 3797 with a total quantity of 64\n", 1280 | "3. Buyer ID 1298 with a total quantity of 64\n", 1281 | "4. Buyer ID 5002 with a total quantity of 63\n", 1282 | "5. Buyer ID 4064 with a total quantity of 60\n", 1283 | "6. Buyer ID 644 with a total quantity of 60\n", 1284 | "7. Buyer ID 3881 with a total quantity of 60\n", 1285 | "8. Buyer ID 522 with a total quantity of 60\n", 1286 | "9. 
Buyer ID 4842 with a total quantity of 60\n", 1287 | "10. Buyer ID 59\n", 1288 | "\n" 1289 | ] 1290 | } 1291 | ], 1292 | "source": [ 1293 | "# Query result in Natural Language\n", 1294 | "print(f\"\\nAnswer:\\n\\n{result_text2sql[0]}\\n\")" 1295 | ] 1296 | }, 1297 | { 1298 | "cell_type": "code", 1299 | "execution_count": 67, 1300 | "id": "66463052-9f25-4ff9-a19e-76832b179e73", 1301 | "metadata": { 1302 | "tags": [] 1303 | }, 1304 | "outputs": [ 1305 | { 1306 | "name": "stdout", 1307 | "output_type": "stream", 1308 | "text": [ 1309 | "\n", 1310 | "SQL Query generated from the prompt:\n", 1311 | "\n" 1312 | ] 1313 | }, 1314 | { 1315 | "data": { 1316 | "text/html": [ 1317 | "
SELECT\n",
1392 |        "    buyerid,\n",
1393 |        "    SUM(qtysold) AS total_quantity\n",
1394 |        "FROM\n",
1395 |        "    tickit.sales\n",
1396 |        "GROUP BY\n",
1397 |        "    buyerid\n",
1398 |        "ORDER BY\n",
1399 |        "    total_quantity DESC\n",
1400 |        "LIMIT 10;\n",
1401 |        "
\n" 1402 | ], 1403 | "text/latex": [ 1404 | "\\begin{Verbatim}[commandchars=\\\\\\{\\}]\n", 1405 | "\\PY{k}{SELECT}\n", 1406 | "\\PY{+w}{ }\\PY{n}{buyerid}\\PY{p}{,}\n", 1407 | "\\PY{+w}{ }\\PY{k}{SUM}\\PY{p}{(}\\PY{n}{qtysold}\\PY{p}{)}\\PY{+w}{ }\\PY{k}{AS}\\PY{+w}{ }\\PY{n}{total\\PYZus{}quantity}\n", 1408 | "\\PY{k}{FROM}\n", 1409 | "\\PY{+w}{ }\\PY{n}{tickit}\\PY{p}{.}\\PY{n}{sales}\n", 1410 | "\\PY{k}{GROUP}\\PY{+w}{ }\\PY{k}{BY}\n", 1411 | "\\PY{+w}{ }\\PY{n}{buyerid}\n", 1412 | "\\PY{k}{ORDER}\\PY{+w}{ }\\PY{k}{BY}\n", 1413 | "\\PY{+w}{ }\\PY{n}{total\\PYZus{}quantity}\\PY{+w}{ }\\PY{k}{DESC}\n", 1414 | "\\PY{k}{LIMIT}\\PY{+w}{ }\\PY{l+m+mi}{10}\\PY{p}{;}\n", 1415 | "\\end{Verbatim}\n" 1416 | ], 1417 | "text/plain": [ 1418 | "\n", 1419 | "SELECT\n", 1420 | " buyerid,\n", 1421 | " SUM(qtysold) AS total_quantity\n", 1422 | "FROM\n", 1423 | " tickit.sales\n", 1424 | "GROUP BY\n", 1425 | " buyerid\n", 1426 | "ORDER BY\n", 1427 | " total_quantity DESC\n", 1428 | "LIMIT 10;" 1429 | ] 1430 | }, 1431 | "metadata": {}, 1432 | "output_type": "display_data" 1433 | }, 1434 | { 1435 | "name": "stdout", 1436 | "output_type": "stream", 1437 | "text": [ 1438 | "\n" 1439 | ] 1440 | } 1441 | ], 1442 | "source": [ 1443 | "# Generated SQL query used\n", 1444 | "from IPython.display import Code, display\n", 1445 | "print(f\"\\nSQL Query generated from the prompt:\\n\")\n", 1446 | "display(Code(result_text2sql[1], language='sql'))\n", 1447 | "print(\"\")" 1448 | ] 1449 | }, 1450 | { 1451 | "cell_type": "code", 1452 | "execution_count": 68, 1453 | "id": "884c31f9-917e-45a3-94fb-1c05eef677b3", 1454 | "metadata": { 1455 | "tags": [] 1456 | }, 1457 | "outputs": [ 1458 | { 1459 | "name": "stdout", 1460 | "output_type": "stream", 1461 | "text": [ 1462 | "\n", 1463 | "Tabular results from the SQL query:\n", 1464 | "\n" 1465 | ] 1466 | }, 1467 | { 1468 | "data": { 1469 | "text/html": [ 1470 | "
\n", 1471 | "\n", 1484 | "\n", 1485 | " \n", 1486 | " \n", 1487 | " \n", 1488 | " \n", 1489 | " \n", 1490 | " \n", 1491 | " \n", 1492 | " \n", 1493 | " \n", 1494 | " \n", 1495 | " \n", 1496 | " \n", 1497 | " \n", 1498 | " \n", 1499 | " \n", 1500 | " \n", 1501 | " \n", 1502 | " \n", 1503 | " \n", 1504 | " \n", 1505 | " \n", 1506 | " \n", 1507 | " \n", 1508 | " \n", 1509 | " \n", 1510 | " \n", 1511 | " \n", 1512 | " \n", 1513 | " \n", 1514 | " \n", 1515 | " \n", 1516 | " \n", 1517 | " \n", 1518 | " \n", 1519 | " \n", 1520 | " \n", 1521 | " \n", 1522 | " \n", 1523 | " \n", 1524 | " \n", 1525 | " \n", 1526 | " \n", 1527 | " \n", 1528 | " \n", 1529 | " \n", 1530 | " \n", 1531 | " \n", 1532 | " \n", 1533 | " \n", 1534 | " \n", 1535 | " \n", 1536 | " \n", 1537 | " \n", 1538 | " \n", 1539 | " \n", 1540 | " \n", 1541 | " \n", 1542 | " \n", 1543 | " \n", 1544 | "
   buyerid  total_quantity
0     8933              67
1     3797              64
2     1298              64
3     5002              63
4     4064              60
5      644              60
6     3881              60
7      522              60
8     4842              60
9     5953              60
\n", 1545 | "
" 1546 | ], 1547 | "text/plain": [ 1548 | " buyerid total_quantity\n", 1549 | "0 8933 67\n", 1550 | "1 3797 64\n", 1551 | "2 1298 64\n", 1552 | "3 5002 63\n", 1553 | "4 4064 60\n", 1554 | "5 644 60\n", 1555 | "6 3881 60\n", 1556 | "7 522 60\n", 1557 | "8 4842 60\n", 1558 | "9 5953 60" 1559 | ] 1560 | }, 1561 | "execution_count": 68, 1562 | "metadata": {}, 1563 | "output_type": "execute_result" 1564 | } 1565 | ], 1566 | "source": [ 1567 | "# Tabular results from the SQL Query \n", 1568 | "print(f\"\\nTabular results from the SQL query:\\n\")\n", 1569 | "df=pd.read_csv(StringIO(result_text2sql[2]))\n", 1570 | "df" 1571 | ] 1572 | } 1573 | ], 1574 | "metadata": { 1575 | "availableInstances": [ 1576 | { 1577 | "_defaultOrder": 0, 1578 | "_isFastLaunch": true, 1579 | "category": "General purpose", 1580 | "gpuNum": 0, 1581 | "hideHardwareSpecs": false, 1582 | "memoryGiB": 4, 1583 | "name": "ml.t3.medium", 1584 | "vcpuNum": 2 1585 | }, 1586 | { 1587 | "_defaultOrder": 1, 1588 | "_isFastLaunch": false, 1589 | "category": "General purpose", 1590 | "gpuNum": 0, 1591 | "hideHardwareSpecs": false, 1592 | "memoryGiB": 8, 1593 | "name": "ml.t3.large", 1594 | "vcpuNum": 2 1595 | }, 1596 | { 1597 | "_defaultOrder": 2, 1598 | "_isFastLaunch": false, 1599 | "category": "General purpose", 1600 | "gpuNum": 0, 1601 | "hideHardwareSpecs": false, 1602 | "memoryGiB": 16, 1603 | "name": "ml.t3.xlarge", 1604 | "vcpuNum": 4 1605 | }, 1606 | { 1607 | "_defaultOrder": 3, 1608 | "_isFastLaunch": false, 1609 | "category": "General purpose", 1610 | "gpuNum": 0, 1611 | "hideHardwareSpecs": false, 1612 | "memoryGiB": 32, 1613 | "name": "ml.t3.2xlarge", 1614 | "vcpuNum": 8 1615 | }, 1616 | { 1617 | "_defaultOrder": 4, 1618 | "_isFastLaunch": true, 1619 | "category": "General purpose", 1620 | "gpuNum": 0, 1621 | "hideHardwareSpecs": false, 1622 | "memoryGiB": 8, 1623 | "name": "ml.m5.large", 1624 | "vcpuNum": 2 1625 | }, 1626 | { 1627 | "_defaultOrder": 5, 1628 | "_isFastLaunch": false, 1629 | 
"category": "General purpose", 1630 | "gpuNum": 0, 1631 | "hideHardwareSpecs": false, 1632 | "memoryGiB": 16, 1633 | "name": "ml.m5.xlarge", 1634 | "vcpuNum": 4 1635 | }, 1636 | { 1637 | "_defaultOrder": 6, 1638 | "_isFastLaunch": false, 1639 | "category": "General purpose", 1640 | "gpuNum": 0, 1641 | "hideHardwareSpecs": false, 1642 | "memoryGiB": 32, 1643 | "name": "ml.m5.2xlarge", 1644 | "vcpuNum": 8 1645 | }, 1646 | { 1647 | "_defaultOrder": 7, 1648 | "_isFastLaunch": false, 1649 | "category": "General purpose", 1650 | "gpuNum": 0, 1651 | "hideHardwareSpecs": false, 1652 | "memoryGiB": 64, 1653 | "name": "ml.m5.4xlarge", 1654 | "vcpuNum": 16 1655 | }, 1656 | { 1657 | "_defaultOrder": 8, 1658 | "_isFastLaunch": false, 1659 | "category": "General purpose", 1660 | "gpuNum": 0, 1661 | "hideHardwareSpecs": false, 1662 | "memoryGiB": 128, 1663 | "name": "ml.m5.8xlarge", 1664 | "vcpuNum": 32 1665 | }, 1666 | { 1667 | "_defaultOrder": 9, 1668 | "_isFastLaunch": false, 1669 | "category": "General purpose", 1670 | "gpuNum": 0, 1671 | "hideHardwareSpecs": false, 1672 | "memoryGiB": 192, 1673 | "name": "ml.m5.12xlarge", 1674 | "vcpuNum": 48 1675 | }, 1676 | { 1677 | "_defaultOrder": 10, 1678 | "_isFastLaunch": false, 1679 | "category": "General purpose", 1680 | "gpuNum": 0, 1681 | "hideHardwareSpecs": false, 1682 | "memoryGiB": 256, 1683 | "name": "ml.m5.16xlarge", 1684 | "vcpuNum": 64 1685 | }, 1686 | { 1687 | "_defaultOrder": 11, 1688 | "_isFastLaunch": false, 1689 | "category": "General purpose", 1690 | "gpuNum": 0, 1691 | "hideHardwareSpecs": false, 1692 | "memoryGiB": 384, 1693 | "name": "ml.m5.24xlarge", 1694 | "vcpuNum": 96 1695 | }, 1696 | { 1697 | "_defaultOrder": 12, 1698 | "_isFastLaunch": false, 1699 | "category": "General purpose", 1700 | "gpuNum": 0, 1701 | "hideHardwareSpecs": false, 1702 | "memoryGiB": 8, 1703 | "name": "ml.m5d.large", 1704 | "vcpuNum": 2 1705 | }, 1706 | { 1707 | "_defaultOrder": 13, 1708 | "_isFastLaunch": false, 1709 | "category": 
"General purpose", 1710 | "gpuNum": 0, 1711 | "hideHardwareSpecs": false, 1712 | "memoryGiB": 16, 1713 | "name": "ml.m5d.xlarge", 1714 | "vcpuNum": 4 1715 | }, 1716 | { 1717 | "_defaultOrder": 14, 1718 | "_isFastLaunch": false, 1719 | "category": "General purpose", 1720 | "gpuNum": 0, 1721 | "hideHardwareSpecs": false, 1722 | "memoryGiB": 32, 1723 | "name": "ml.m5d.2xlarge", 1724 | "vcpuNum": 8 1725 | }, 1726 | { 1727 | "_defaultOrder": 15, 1728 | "_isFastLaunch": false, 1729 | "category": "General purpose", 1730 | "gpuNum": 0, 1731 | "hideHardwareSpecs": false, 1732 | "memoryGiB": 64, 1733 | "name": "ml.m5d.4xlarge", 1734 | "vcpuNum": 16 1735 | }, 1736 | { 1737 | "_defaultOrder": 16, 1738 | "_isFastLaunch": false, 1739 | "category": "General purpose", 1740 | "gpuNum": 0, 1741 | "hideHardwareSpecs": false, 1742 | "memoryGiB": 128, 1743 | "name": "ml.m5d.8xlarge", 1744 | "vcpuNum": 32 1745 | }, 1746 | { 1747 | "_defaultOrder": 17, 1748 | "_isFastLaunch": false, 1749 | "category": "General purpose", 1750 | "gpuNum": 0, 1751 | "hideHardwareSpecs": false, 1752 | "memoryGiB": 192, 1753 | "name": "ml.m5d.12xlarge", 1754 | "vcpuNum": 48 1755 | }, 1756 | { 1757 | "_defaultOrder": 18, 1758 | "_isFastLaunch": false, 1759 | "category": "General purpose", 1760 | "gpuNum": 0, 1761 | "hideHardwareSpecs": false, 1762 | "memoryGiB": 256, 1763 | "name": "ml.m5d.16xlarge", 1764 | "vcpuNum": 64 1765 | }, 1766 | { 1767 | "_defaultOrder": 19, 1768 | "_isFastLaunch": false, 1769 | "category": "General purpose", 1770 | "gpuNum": 0, 1771 | "hideHardwareSpecs": false, 1772 | "memoryGiB": 384, 1773 | "name": "ml.m5d.24xlarge", 1774 | "vcpuNum": 96 1775 | }, 1776 | { 1777 | "_defaultOrder": 20, 1778 | "_isFastLaunch": false, 1779 | "category": "General purpose", 1780 | "gpuNum": 0, 1781 | "hideHardwareSpecs": true, 1782 | "memoryGiB": 0, 1783 | "name": "ml.geospatial.interactive", 1784 | "supportedImageNames": [ 1785 | "sagemaker-geospatial-v1-0" 1786 | ], 1787 | "vcpuNum": 0 1788 | }, 1789 
| { 1790 | "_defaultOrder": 21, 1791 | "_isFastLaunch": true, 1792 | "category": "Compute optimized", 1793 | "gpuNum": 0, 1794 | "hideHardwareSpecs": false, 1795 | "memoryGiB": 4, 1796 | "name": "ml.c5.large", 1797 | "vcpuNum": 2 1798 | }, 1799 | { 1800 | "_defaultOrder": 22, 1801 | "_isFastLaunch": false, 1802 | "category": "Compute optimized", 1803 | "gpuNum": 0, 1804 | "hideHardwareSpecs": false, 1805 | "memoryGiB": 8, 1806 | "name": "ml.c5.xlarge", 1807 | "vcpuNum": 4 1808 | }, 1809 | { 1810 | "_defaultOrder": 23, 1811 | "_isFastLaunch": false, 1812 | "category": "Compute optimized", 1813 | "gpuNum": 0, 1814 | "hideHardwareSpecs": false, 1815 | "memoryGiB": 16, 1816 | "name": "ml.c5.2xlarge", 1817 | "vcpuNum": 8 1818 | }, 1819 | { 1820 | "_defaultOrder": 24, 1821 | "_isFastLaunch": false, 1822 | "category": "Compute optimized", 1823 | "gpuNum": 0, 1824 | "hideHardwareSpecs": false, 1825 | "memoryGiB": 32, 1826 | "name": "ml.c5.4xlarge", 1827 | "vcpuNum": 16 1828 | }, 1829 | { 1830 | "_defaultOrder": 25, 1831 | "_isFastLaunch": false, 1832 | "category": "Compute optimized", 1833 | "gpuNum": 0, 1834 | "hideHardwareSpecs": false, 1835 | "memoryGiB": 72, 1836 | "name": "ml.c5.9xlarge", 1837 | "vcpuNum": 36 1838 | }, 1839 | { 1840 | "_defaultOrder": 26, 1841 | "_isFastLaunch": false, 1842 | "category": "Compute optimized", 1843 | "gpuNum": 0, 1844 | "hideHardwareSpecs": false, 1845 | "memoryGiB": 96, 1846 | "name": "ml.c5.12xlarge", 1847 | "vcpuNum": 48 1848 | }, 1849 | { 1850 | "_defaultOrder": 27, 1851 | "_isFastLaunch": false, 1852 | "category": "Compute optimized", 1853 | "gpuNum": 0, 1854 | "hideHardwareSpecs": false, 1855 | "memoryGiB": 144, 1856 | "name": "ml.c5.18xlarge", 1857 | "vcpuNum": 72 1858 | }, 1859 | { 1860 | "_defaultOrder": 28, 1861 | "_isFastLaunch": false, 1862 | "category": "Compute optimized", 1863 | "gpuNum": 0, 1864 | "hideHardwareSpecs": false, 1865 | "memoryGiB": 192, 1866 | "name": "ml.c5.24xlarge", 1867 | "vcpuNum": 96 1868 | }, 1869 | { 
1870 | "_defaultOrder": 29, 1871 | "_isFastLaunch": true, 1872 | "category": "Accelerated computing", 1873 | "gpuNum": 1, 1874 | "hideHardwareSpecs": false, 1875 | "memoryGiB": 16, 1876 | "name": "ml.g4dn.xlarge", 1877 | "vcpuNum": 4 1878 | }, 1879 | { 1880 | "_defaultOrder": 30, 1881 | "_isFastLaunch": false, 1882 | "category": "Accelerated computing", 1883 | "gpuNum": 1, 1884 | "hideHardwareSpecs": false, 1885 | "memoryGiB": 32, 1886 | "name": "ml.g4dn.2xlarge", 1887 | "vcpuNum": 8 1888 | }, 1889 | { 1890 | "_defaultOrder": 31, 1891 | "_isFastLaunch": false, 1892 | "category": "Accelerated computing", 1893 | "gpuNum": 1, 1894 | "hideHardwareSpecs": false, 1895 | "memoryGiB": 64, 1896 | "name": "ml.g4dn.4xlarge", 1897 | "vcpuNum": 16 1898 | }, 1899 | { 1900 | "_defaultOrder": 32, 1901 | "_isFastLaunch": false, 1902 | "category": "Accelerated computing", 1903 | "gpuNum": 1, 1904 | "hideHardwareSpecs": false, 1905 | "memoryGiB": 128, 1906 | "name": "ml.g4dn.8xlarge", 1907 | "vcpuNum": 32 1908 | }, 1909 | { 1910 | "_defaultOrder": 33, 1911 | "_isFastLaunch": false, 1912 | "category": "Accelerated computing", 1913 | "gpuNum": 4, 1914 | "hideHardwareSpecs": false, 1915 | "memoryGiB": 192, 1916 | "name": "ml.g4dn.12xlarge", 1917 | "vcpuNum": 48 1918 | }, 1919 | { 1920 | "_defaultOrder": 34, 1921 | "_isFastLaunch": false, 1922 | "category": "Accelerated computing", 1923 | "gpuNum": 1, 1924 | "hideHardwareSpecs": false, 1925 | "memoryGiB": 256, 1926 | "name": "ml.g4dn.16xlarge", 1927 | "vcpuNum": 64 1928 | }, 1929 | { 1930 | "_defaultOrder": 35, 1931 | "_isFastLaunch": false, 1932 | "category": "Accelerated computing", 1933 | "gpuNum": 1, 1934 | "hideHardwareSpecs": false, 1935 | "memoryGiB": 61, 1936 | "name": "ml.p3.2xlarge", 1937 | "vcpuNum": 8 1938 | }, 1939 | { 1940 | "_defaultOrder": 36, 1941 | "_isFastLaunch": false, 1942 | "category": "Accelerated computing", 1943 | "gpuNum": 4, 1944 | "hideHardwareSpecs": false, 1945 | "memoryGiB": 244, 1946 | "name": 
"ml.p3.8xlarge", 1947 | "vcpuNum": 32 1948 | }, 1949 | { 1950 | "_defaultOrder": 37, 1951 | "_isFastLaunch": false, 1952 | "category": "Accelerated computing", 1953 | "gpuNum": 8, 1954 | "hideHardwareSpecs": false, 1955 | "memoryGiB": 488, 1956 | "name": "ml.p3.16xlarge", 1957 | "vcpuNum": 64 1958 | }, 1959 | { 1960 | "_defaultOrder": 38, 1961 | "_isFastLaunch": false, 1962 | "category": "Accelerated computing", 1963 | "gpuNum": 8, 1964 | "hideHardwareSpecs": false, 1965 | "memoryGiB": 768, 1966 | "name": "ml.p3dn.24xlarge", 1967 | "vcpuNum": 96 1968 | }, 1969 | { 1970 | "_defaultOrder": 39, 1971 | "_isFastLaunch": false, 1972 | "category": "Memory Optimized", 1973 | "gpuNum": 0, 1974 | "hideHardwareSpecs": false, 1975 | "memoryGiB": 16, 1976 | "name": "ml.r5.large", 1977 | "vcpuNum": 2 1978 | }, 1979 | { 1980 | "_defaultOrder": 40, 1981 | "_isFastLaunch": false, 1982 | "category": "Memory Optimized", 1983 | "gpuNum": 0, 1984 | "hideHardwareSpecs": false, 1985 | "memoryGiB": 32, 1986 | "name": "ml.r5.xlarge", 1987 | "vcpuNum": 4 1988 | }, 1989 | { 1990 | "_defaultOrder": 41, 1991 | "_isFastLaunch": false, 1992 | "category": "Memory Optimized", 1993 | "gpuNum": 0, 1994 | "hideHardwareSpecs": false, 1995 | "memoryGiB": 64, 1996 | "name": "ml.r5.2xlarge", 1997 | "vcpuNum": 8 1998 | }, 1999 | { 2000 | "_defaultOrder": 42, 2001 | "_isFastLaunch": false, 2002 | "category": "Memory Optimized", 2003 | "gpuNum": 0, 2004 | "hideHardwareSpecs": false, 2005 | "memoryGiB": 128, 2006 | "name": "ml.r5.4xlarge", 2007 | "vcpuNum": 16 2008 | }, 2009 | { 2010 | "_defaultOrder": 43, 2011 | "_isFastLaunch": false, 2012 | "category": "Memory Optimized", 2013 | "gpuNum": 0, 2014 | "hideHardwareSpecs": false, 2015 | "memoryGiB": 256, 2016 | "name": "ml.r5.8xlarge", 2017 | "vcpuNum": 32 2018 | }, 2019 | { 2020 | "_defaultOrder": 44, 2021 | "_isFastLaunch": false, 2022 | "category": "Memory Optimized", 2023 | "gpuNum": 0, 2024 | "hideHardwareSpecs": false, 2025 | "memoryGiB": 384, 2026 | 
"name": "ml.r5.12xlarge", 2027 | "vcpuNum": 48 2028 | }, 2029 | { 2030 | "_defaultOrder": 45, 2031 | "_isFastLaunch": false, 2032 | "category": "Memory Optimized", 2033 | "gpuNum": 0, 2034 | "hideHardwareSpecs": false, 2035 | "memoryGiB": 512, 2036 | "name": "ml.r5.16xlarge", 2037 | "vcpuNum": 64 2038 | }, 2039 | { 2040 | "_defaultOrder": 46, 2041 | "_isFastLaunch": false, 2042 | "category": "Memory Optimized", 2043 | "gpuNum": 0, 2044 | "hideHardwareSpecs": false, 2045 | "memoryGiB": 768, 2046 | "name": "ml.r5.24xlarge", 2047 | "vcpuNum": 96 2048 | }, 2049 | { 2050 | "_defaultOrder": 47, 2051 | "_isFastLaunch": false, 2052 | "category": "Accelerated computing", 2053 | "gpuNum": 1, 2054 | "hideHardwareSpecs": false, 2055 | "memoryGiB": 16, 2056 | "name": "ml.g5.xlarge", 2057 | "vcpuNum": 4 2058 | }, 2059 | { 2060 | "_defaultOrder": 48, 2061 | "_isFastLaunch": false, 2062 | "category": "Accelerated computing", 2063 | "gpuNum": 1, 2064 | "hideHardwareSpecs": false, 2065 | "memoryGiB": 32, 2066 | "name": "ml.g5.2xlarge", 2067 | "vcpuNum": 8 2068 | }, 2069 | { 2070 | "_defaultOrder": 49, 2071 | "_isFastLaunch": false, 2072 | "category": "Accelerated computing", 2073 | "gpuNum": 1, 2074 | "hideHardwareSpecs": false, 2075 | "memoryGiB": 64, 2076 | "name": "ml.g5.4xlarge", 2077 | "vcpuNum": 16 2078 | }, 2079 | { 2080 | "_defaultOrder": 50, 2081 | "_isFastLaunch": false, 2082 | "category": "Accelerated computing", 2083 | "gpuNum": 1, 2084 | "hideHardwareSpecs": false, 2085 | "memoryGiB": 128, 2086 | "name": "ml.g5.8xlarge", 2087 | "vcpuNum": 32 2088 | }, 2089 | { 2090 | "_defaultOrder": 51, 2091 | "_isFastLaunch": false, 2092 | "category": "Accelerated computing", 2093 | "gpuNum": 1, 2094 | "hideHardwareSpecs": false, 2095 | "memoryGiB": 256, 2096 | "name": "ml.g5.16xlarge", 2097 | "vcpuNum": 64 2098 | }, 2099 | { 2100 | "_defaultOrder": 52, 2101 | "_isFastLaunch": false, 2102 | "category": "Accelerated computing", 2103 | "gpuNum": 4, 2104 | "hideHardwareSpecs": false, 
2105 | "memoryGiB": 192, 2106 | "name": "ml.g5.12xlarge", 2107 | "vcpuNum": 48 2108 | }, 2109 | { 2110 | "_defaultOrder": 53, 2111 | "_isFastLaunch": false, 2112 | "category": "Accelerated computing", 2113 | "gpuNum": 4, 2114 | "hideHardwareSpecs": false, 2115 | "memoryGiB": 384, 2116 | "name": "ml.g5.24xlarge", 2117 | "vcpuNum": 96 2118 | }, 2119 | { 2120 | "_defaultOrder": 54, 2121 | "_isFastLaunch": false, 2122 | "category": "Accelerated computing", 2123 | "gpuNum": 8, 2124 | "hideHardwareSpecs": false, 2125 | "memoryGiB": 768, 2126 | "name": "ml.g5.48xlarge", 2127 | "vcpuNum": 192 2128 | }, 2129 | { 2130 | "_defaultOrder": 55, 2131 | "_isFastLaunch": false, 2132 | "category": "Accelerated computing", 2133 | "gpuNum": 8, 2134 | "hideHardwareSpecs": false, 2135 | "memoryGiB": 1152, 2136 | "name": "ml.p4d.24xlarge", 2137 | "vcpuNum": 96 2138 | }, 2139 | { 2140 | "_defaultOrder": 56, 2141 | "_isFastLaunch": false, 2142 | "category": "Accelerated computing", 2143 | "gpuNum": 8, 2144 | "hideHardwareSpecs": false, 2145 | "memoryGiB": 1152, 2146 | "name": "ml.p4de.24xlarge", 2147 | "vcpuNum": 96 2148 | }, 2149 | { 2150 | "_defaultOrder": 57, 2151 | "_isFastLaunch": false, 2152 | "category": "Accelerated computing", 2153 | "gpuNum": 0, 2154 | "hideHardwareSpecs": false, 2155 | "memoryGiB": 32, 2156 | "name": "ml.trn1.2xlarge", 2157 | "vcpuNum": 8 2158 | }, 2159 | { 2160 | "_defaultOrder": 58, 2161 | "_isFastLaunch": false, 2162 | "category": "Accelerated computing", 2163 | "gpuNum": 0, 2164 | "hideHardwareSpecs": false, 2165 | "memoryGiB": 512, 2166 | "name": "ml.trn1.32xlarge", 2167 | "vcpuNum": 128 2168 | }, 2169 | { 2170 | "_defaultOrder": 59, 2171 | "_isFastLaunch": false, 2172 | "category": "Accelerated computing", 2173 | "gpuNum": 0, 2174 | "hideHardwareSpecs": false, 2175 | "memoryGiB": 512, 2176 | "name": "ml.trn1n.32xlarge", 2177 | "vcpuNum": 128 2178 | } 2179 | ], 2180 | "instance_type": "ml.t3.medium", 2181 | "kernelspec": { 2182 | "display_name": "Python 3 
(ipykernel)", 2183 | "language": "python", 2184 | "name": "python3" 2185 | }, 2186 | "language_info": { 2187 | "codemirror_mode": { 2188 | "name": "ipython", 2189 | "version": 3 2190 | }, 2191 | "file_extension": ".py", 2192 | "mimetype": "text/x-python", 2193 | "name": "python", 2194 | "nbconvert_exporter": "python", 2195 | "pygments_lexer": "ipython3", 2196 | "version": "3.11.11" 2197 | } 2198 | }, 2199 | "nbformat": 4, 2200 | "nbformat_minor": 5 2201 | } 2202 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | boto3 2 | sentencepiece 3 | pandas 4 | anthropic 5 | transformers 6 | tiktoken 7 | langchain 8 | s3fs 9 | --------------------------------------------------------------------------------