├── readme.md
└── text2cql_astradb_demo.ipynb


/readme.md:
--------------------------------------------------------------------------------
 1 | # README
 2 | 
 3 | Text-to-CQL demo with Astra DB.
 4 | 
 5 | Click here to run in [Colab](https://colab.research.google.com/github/datastaxdevs/demo-astradb-text2cql/blob/main/text2cql_astradb_demo.ipynb).
 6 | 
 7 | Alternatively, you can run it locally after installing Jupyter. This requires Python 3.9+.
 8 | 
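When running locally, it can help to know which settings the notebook will prompt for. The sketch below is not part of the notebook: the variable names are the ones the notebook cells look up, while `EXPECTED_VARS` and `missing_vars` are hypothetical helpers. It treats an empty value as missing, which matches the notebook only for the keyspace check.

```python
import os

# Environment variables the notebook looks up before falling back to a prompt.
EXPECTED_VARS = [
    "OPENAI_API_KEY",
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_KEYSPACE",
]


def missing_vars(environ=os.environ):
    """Return the expected variable names that are unset or empty (hypothetical helper)."""
    return [name for name in EXPECTED_VARS if not environ.get(name)]


if __name__ == "__main__":
    for name in missing_vars():
        print(f"{name} is not set; the notebook will ask for it.")
```

An empty `ASTRA_DB_KEYSPACE` is fine: the notebook then uses the default keyspace.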
 9 | Visit the [documentation page](https://docs.datastax.com/en/astra-db-serverless/get-started/examples.html).
10 | 
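The notebook uses an LLM in two phases: first to write a CQL query, then to turn the returned rows into a final answer. The following is a minimal sketch of that flow, not the notebook's actual code. `answer_question`, `ask_llm`, `run_cql`, and the prompt wording are all assumptions made for illustration; the LLM and the database are passed in as plain callables so the flow can be tried without any service.

```python
from typing import Callable, Sequence


def answer_question(
    question: str,
    schema_description: str,
    ask_llm: Callable[[str], str],
    run_cql: Callable[[str], Sequence],
) -> str:
    """Two-phase question answering: generate CQL, run it, then summarize the rows."""
    # Phase 1: show the LLM the schema and ask for a CQL query (hypothetical prompt).
    query_prompt = (
        "Given this Cassandra schema:\n"
        f"{schema_description}\n"
        f"Write one CQL query that answers: {question}\n"
        "Return only the CQL statement."
    )
    cql = ask_llm(query_prompt).strip()

    # Run the generated query on the database.
    rows = run_cql(cql)

    # Phase 2: present the rows to the LLM and ask for the final answer.
    answer_prompt = (
        f"Question: {question}\n"
        f"CQL query used: {cql}\n"
        f"Rows returned: {list(rows)}\n"
        "Answer the question using only these rows."
    )
    return ask_llm(answer_prompt)
```

In the notebook's setting, `run_cql` could be the `execute_statement` helper and `ask_llm` a thin wrapper around the OpenAI client. A real pipeline would also want to validate the generated statement before running it.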


--------------------------------------------------------------------------------
/text2cql_astradb_demo.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "id": "b69b5b27-f4dc-439b-8ae5-b3f776fa3745",
   6 |    "metadata": {
   7 |     "id": "b69b5b27-f4dc-439b-8ae5-b3f776fa3745"
   8 |    },
   9 |    "source": [
  10 |     "# Using LLMs to Generate CQL"
  11 |    ]
  12 |   },
  13 |   {
  14 |    "cell_type": "markdown",
  15 |    "id": "107770b0-abda-43ee-af5d-eae2100b65ad",
  16 |    "metadata": {
  17 |     "id": "107770b0-abda-43ee-af5d-eae2100b65ad"
  18 |    },
  19 |    "source": [
  20 |     "This demo shows how to use LLMs to generate queries in CQL (Cassandra Query Language) as part of a RAG pipeline.\n",
  21 |     "\n",
  22 |     "Starting from a natural-language question:\n",
  23 |     "- an LLM is asked to produce a CQL query;\n",
  24 |     "- the query is then executed on the database;\n",
  25 |     "- the resulting rows are presented to the LLM within a prompt instructing it to provide a final answer...\n",
  26 |     "- which is the final output, thus completing a CQL-generation-enriched RAG process.\n",
  27 |     "\n",
  28 |     "In each question-answering process, then, there are two distinct phases where LLMs are used.\n",
  29 |     "\n",
  30 |     "The database is an [Astra DB](https://docs.datastax.com/en/astra-db-serverless/index.html) instance, populated with fictitious sample data. Astra DB can be used via its Data API or, as in this case, using the CQL protocol.\n",
  31 |     "\n",
  32 |     "This example is derived from the [SQL-PaLM](https://arxiv.org/abs/2306.00739) paper. The paper describes the general strategy for SQL: first show the LLM the DB schema in a standardized format, then ask it to produce a query for the user's question."
  33 |    ]
  34 |   },
  35 |   {
  36 |    "cell_type": "markdown",
  37 |    "id": "1ffeccb0-d70c-4f15-b924-6e8cd7a5b30e",
  38 |    "metadata": {
  39 |     "id": "1ffeccb0-d70c-4f15-b924-6e8cd7a5b30e"
  40 |    },
  41 |    "source": [
  42 |     "## Setup"
  43 |    ]
  44 |   },
  45 |   {
  46 |    "cell_type": "markdown",
  47 |    "id": "b71640c3-3495-4459-837e-08d6e80ed410",
  48 |    "metadata": {
  49 |     "id": "b71640c3-3495-4459-837e-08d6e80ed410"
  50 |    },
  51 |    "source": [
  52 |     "#### Requirements"
  53 |    ]
  54 |   },
  55 |   {
  56 |    "cell_type": "code",
  57 |    "execution_count": 1,
  58 |    "id": "b43e0d9c-c760-485f-877b-df577f2cfacf",
  59 |    "metadata": {
  60 |     "id": "b43e0d9c-c760-485f-877b-df577f2cfacf"
  61 |    },
  62 |    "outputs": [
  63 |     {
  64 |      "name": "stdout",
  65 |      "output_type": "stream",
  66 |      "text": [
  67 |       "\n",
  68 |       "[notice] A new release of pip is available: 24.3.1 -> 25.0.1\n",
  69 |       "[notice] To update, run: pip install --upgrade pip\n"
  70 |      ]
  71 |     }
  72 |    ],
  73 |    "source": [
  74 |     "# Install requirements, if not already installed\n",
  75 |     "!pip install -q \"openai>=1.73\" \"cassio>=0.1.10\""
  76 |    ]
  77 |   },
  78 |   {
  79 |    "cell_type": "code",
  80 |    "execution_count": 2,
  81 |    "id": "2288af41-77a4-4163-8753-5c7cd93aeabb",
  82 |    "metadata": {},
  83 |    "outputs": [],
  84 |    "source": [
  85 |     "import os\n",
  86 |     "from getpass import getpass\n",
  87 |     "\n",
  88 |     "import cassio\n",
  89 |     "import openai"
  90 |    ]
  91 |   },
  92 |   {
  93 |    "cell_type": "markdown",
  94 |    "id": "ae7063e7-80c4-49ae-84f1-3345aa3bbef1",
  95 |    "metadata": {
  96 |     "id": "ae7063e7-80c4-49ae-84f1-3345aa3bbef1"
  97 |    },
  98 |    "source": [
  99 |     "#### Connect to Services"
 100 |    ]
 101 |   },
 102 |   {
 103 |    "cell_type": "code",
 104 |    "execution_count": 3,
 105 |    "id": "79e94c9e-3c81-4170-bd37-6326b3875c60",
 106 |    "metadata": {},
 107 |    "outputs": [
 108 |     {
 109 |      "name": "stdin",
 110 |      "output_type": "stream",
 111 |      "text": [
 112 |       "OpenAI API Key:  ········\n"
 113 |      ]
 114 |     }
 115 |    ],
 116 |    "source": [
 117 |     "# OpenAI secrets\n",
 118 |     "if \"OPENAI_API_KEY\" not in os.environ:\n",
 119 |     "    os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API Key: \")"
 120 |    ]
 121 |   },
 122 |   {
 123 |    "cell_type": "code",
 124 |    "execution_count": 4,
 125 |    "id": "350ffbca-6b42-44dd-95cd-768f2c15e027",
 126 |    "metadata": {},
 127 |    "outputs": [
 128 |     {
 129 |      "name": "stdin",
 130 |      "output_type": "stream",
 131 |      "text": [
 132 |       "Astra DB Token:  ········\n",
 133 |       "Astra DB API endpoint:  https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com\n",
 134 |       "Astra DB Keyspace (empty for default):  \n"
 135 |      ]
 136 |     }
 137 |    ],
 138 |    "source": [
 139 |     "# Secrets and connection parameters for Astra DB\n",
 140 |     "if \"ASTRA_DB_APPLICATION_TOKEN\" not in os.environ:\n",
 141 |     "    os.environ[\"ASTRA_DB_APPLICATION_TOKEN\"] = getpass(\"Astra DB Token: \").strip()\n",
 142 |     "\n",
 143 |     "if \"ASTRA_DB_API_ENDPOINT\" not in os.environ:\n",
 144 |     "    os.environ[\"ASTRA_DB_API_ENDPOINT\"] = input(\"Astra DB API endpoint: \").strip()\n",
 145 |     "\n",
 146 |     "if not os.getenv(\"ASTRA_DB_KEYSPACE\"):\n",
 147 |     "    os.environ[\"ASTRA_DB_KEYSPACE\"] = input(\"Astra DB Keyspace (empty for default): \").strip()"
 148 |    ]
 149 |   },
 150 |   {
 151 |    "cell_type": "code",
 152 |    "execution_count": 5,
 153 |    "id": "9767fc7a-a9d8-4e89-b32c-805243700348",
 154 |    "metadata": {
 155 |     "colab": {
 156 |      "base_uri": "https://localhost:8080/"
 157 |     },
 158 |     "id": "9767fc7a-a9d8-4e89-b32c-805243700348",
 159 |     "outputId": "c6d2e944-ea84-4f1a-d551-6a3583c5f5a4"
 160 |    },
 161 |    "outputs": [],
 162 |    "source": [
 163 |     "# Initialize the OpenAI Client\n",
 164 |     "client = openai.OpenAI()"
 165 |    ]
 166 |   },
 167 |   {
 168 |    "cell_type": "code",
 169 |    "execution_count": 6,
 170 |    "id": "13722daf-bc68-4fef-93b8-a18d17482948",
 171 |    "metadata": {},
 172 |    "outputs": [
 173 |     {
 174 |      "name": "stdout",
 175 |      "output_type": "stream",
 176 |      "text": [
 177 |       "Connected to Astra DB with session=<cassandra.cluster.Session object at 0x7fa7151a4200>, keyspace=default_keyspace.\n"
 178 |      ]
 179 |     }
 180 |    ],
 181 |    "source": [
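 182 |     "# Connect to Astra DB with a CQL session. The database ID is the 36-character UUID that follows the 8-character https:// prefix of the API endpoint.\n",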
 183 |     "database_id = os.environ[\"ASTRA_DB_API_ENDPOINT\"][8 : 8 + 36]\n",
 184 |     "\n",
 185 |     "cassio.init(\n",
 186 |     "    database_id=database_id,\n",
 187 |     "    token=os.environ[\"ASTRA_DB_APPLICATION_TOKEN\"],\n",
 188 |     "    keyspace=os.getenv(\"ASTRA_DB_KEYSPACE\") or None,\n",
 189 |     ")\n",
 190 |     "\n",
 191 |     "session = cassio.config.resolve_session()\n",
 192 |     "keyspace = cassio.config.resolve_keyspace()\n",
 193 |     "session.execute(f\"USE {keyspace};\")\n",
 194 |     "\n",
 195 |     "print(f\"Connected to Astra DB with session={session}, keyspace={keyspace}.\")"
 196 |    ]
 197 |   },
 198 |   {
 199 |    "cell_type": "code",
 200 |    "execution_count": 7,
 201 |    "id": "6a47bad8-9e13-41fa-af6d-724b72f702cd",
 202 |    "metadata": {
 203 |     "id": "6a47bad8-9e13-41fa-af6d-724b72f702cd"
 204 |    },
 205 |    "outputs": [],
 206 |    "source": [
 207 |     "# Tool to run CQL statements\n",
 208 |     "def execute_statement(statement: str):\n",
 209 |     "    # This is a simple wrapper around executing CQL statements in our\n",
 210 |     "    # Cassandra cluster, and either raising an error or returning the results\n",
 211 |     "    try:\n",
 212 |     "        rows = session.execute(statement)\n",
 213 |     "        return rows.all()\n",
 214 |     "    except Exception as e:\n",
 215 |     "        print(f\"Query '{statement}' failed with error {str(e)}.\")\n",
 216 |     "        raise"
 217 |    ]
 218 |   },
 219 |   {
 220 |    "cell_type": "markdown",
 221 |    "id": "890b2537-f585-4f31-bb20-ed71910eb586",
 222 |    "metadata": {
 223 |     "id": "890b2537-f585-4f31-bb20-ed71910eb586"
 224 |    },
 225 |    "source": [
 226 |     "#### (Optional) Dummy DB Setup"
 227 |    ]
 228 |   },
 229 |   {
 230 |    "cell_type": "markdown",
 231 |    "id": "db89558d-f24b-4517-bf46-97c65c2071ac",
 232 |    "metadata": {
 233 |     "id": "db89558d-f24b-4517-bf46-97c65c2071ac"
 234 |    },
 235 |    "source": [
 236 |     "Feel free to skip this section if you are adapting the notebook to an existing Cassandra database. Here, we use the CQL session created above (a Python `cassandra-driver` session) to create some sample tables. The schema comes from [this DataStax example](https://www.datastax.com/learn/data-modeling-by-example/digital-library-data-model) on creating a data model for a digital music library."
 237 |    ]
 238 |   },
 239 |   {
 240 |    "cell_type": "code",
 241 |    "execution_count": 8,
 242 |    "id": "df4e69c2-0cda-42e3-8718-8f7ac6da230f",
 243 |    "metadata": {
 244 |     "id": "df4e69c2-0cda-42e3-8718-8f7ac6da230f"
 245 |    },
 246 |    "outputs": [],
 247 |    "source": [
 248 |     "# Create all necessary tables\n",
 249 |     "create_tables_cql = \"\"\"CREATE TABLE performers (\n",
 250 |     "    name TEXT PRIMARY KEY,\n",
 251 |     "    type TEXT,\n",
 252 |     "    country TEXT,\n",
 253 |     "    born INT,\n",
 254 |     "    died INT,\n",
 255 |     "    founded INT\n",
 256 |     ");\n",
 257 |     "\n",
 258 |     "CREATE TABLE albums_by_performer (\n",
 259 |     "    performer TEXT,\n",
 260 |     "    year INT,\n",
 261 |     "    title TEXT,\n",
 262 |     "    genre TEXT,\n",
 263 |     "    PRIMARY KEY (performer, year, title)\n",
 264 |     ") WITH CLUSTERING ORDER BY (year DESC, title ASC);\n",
 265 |     "\n",
 266 |     "CREATE TABLE albums_by_title (\n",
 267 |     "    title TEXT,\n",
 268 |     "    year INT,\n",
 269 |     "    performer TEXT,\n",
 270 |     "    genre TEXT,\n",
 271 |     "    PRIMARY KEY (title, year)\n",
 272 |     ") WITH CLUSTERING ORDER BY (year DESC);\n",
 273 |     "\n",
 274 |     "CREATE TABLE albums_by_genre (\n",
 275 |     "    genre TEXT,\n",
 276 |     "    year INT,\n",
 277 |     "    performer TEXT,\n",
 278 |     "    title TEXT,\n",
 279 |     "    PRIMARY KEY (genre, year, performer, title)\n",
 280 |     ") WITH CLUSTERING ORDER BY (year DESC);\n",
 281 |     "\n",
 282 |     "CREATE TABLE tracks_by_title (\n",
 283 |     "    title TEXT,\n",
 284 |     "    album_title TEXT,\n",
 285 |     "    album_year INT,\n",
 286 |     "    number INT,\n",
 287 |     "    length INT,\n",
 288 |     "    genre TEXT,\n",
 289 |     "    PRIMARY KEY (title, album_title, album_year, number)\n",
 290 |     ") WITH CLUSTERING ORDER BY (album_title ASC, album_year DESC, number ASC);\n",
 291 |     "\n",
 292 |     "CREATE TABLE tracks_by_album (\n",
 293 |     "    album_title TEXT,\n",
 294 |     "    album_year INT,\n",
 295 |     "    number INT,\n",
 296 |     "    title TEXT,\n",
 297 |     "    length INT,\n",
 298 |     "    genre TEXT STATIC,\n",
 299 |     "    PRIMARY KEY (album_title, album_year, number)\n",
 300 |     ") WITH CLUSTERING ORDER BY (album_year DESC, number ASC);\n",
 301 |     "\n",
 302 |     "CREATE TABLE users (\n",
 303 |     "    id UUID PRIMARY KEY,\n",
 304 |     "    name TEXT\n",
 305 |     ");\n",
 306 |     "\n",
 307 |     "CREATE TABLE tracks_by_user (\n",
 308 |     "    id UUID,\n",
 309 |     "    month DATE,\n",
 310 |     "    timestamp TIMESTAMP,\n",
 311 |     "    album_title TEXT,\n",
 312 |     "    album_year INT,\n",
 313 |     "    number INT,\n",
 314 |     "    title TEXT,\n",
 315 |     "    length INT,\n",
 316 |     "    PRIMARY KEY (id, timestamp)\n",
 317 |     ") WITH CLUSTERING ORDER BY (timestamp DESC);\"\"\""
 318 |    ]
 319 |   },
 320 |   {
 321 |    "cell_type": "code",
 322 |    "execution_count": 9,
 323 |    "id": "03471e5c-47f0-4abd-b742-673a2524f0e2",
 324 |    "metadata": {
 325 |     "id": "03471e5c-47f0-4abd-b742-673a2524f0e2"
 326 |    },
 327 |    "outputs": [
 328 |     {
 329 |      "name": "stdout",
 330 |      "output_type": "stream",
 331 |      "text": [
 332 |       "Running statement 'CREATE TABLE performers (     name TEXT  ...' ... done.\n",
 333 |       "Running statement 'CREATE TABLE albums_by_performer (     p ...' ... done.\n",
 334 |       "Running statement 'CREATE TABLE albums_by_title (     title ...' ... done.\n",
 335 |       "Running statement 'CREATE TABLE albums_by_genre (     genre ...' ... done.\n",
 336 |       "Running statement 'CREATE TABLE tracks_by_title (     title ...' ... done.\n",
 337 |       "Running statement 'CREATE TABLE tracks_by_album (     album ...' ... done.\n",
 338 |       "Running statement 'CREATE TABLE users (     id UUID PRIMARY ...' ... done.\n",
 339 |       "Running statement 'CREATE TABLE tracks_by_user (     id UUI ...' ... done.\n"
 340 |      ]
 341 |     }
 342 |    ],
 343 |    "source": [
 344 |     "# Split the text above into individual statements that the driver can execute\n",
 345 |     "for statement in create_tables_cql.split(\";\"):\n",
 346 |     "    _statement = statement.strip().replace(\"\\n\", \" \")\n",
 347 |     "    if _statement:\n",
 348 |     "        print(f\"Running statement '{_statement[:40]} ...' ...\", end=\"\")\n",
 349 |     "        execute_statement(_statement)\n",
 350 |     "        print(\" done.\")"
 351 |    ]
 352 |   },
 353 |   {
 354 |    "cell_type": "code",
 355 |    "execution_count": 10,
 356 |    "id": "4698c114-1b77-4613-8555-1122b5a60295",
 357 |    "metadata": {
 358 |     "id": "4698c114-1b77-4613-8555-1122b5a60295"
 359 |    },
 360 |    "outputs": [],
 361 |    "source": [
 362 |     "# Now populate with some fake data\n",
 363 |     "insert_fake_data_cql = \"\"\"\n",
 364 |     "-- Insert data into performers\n",
 365 |     "INSERT INTO performers (name, type, country, born, died, founded) VALUES ('The Beatles', 'Band', 'UK', 1960, NULL, 1960);\n",
 366 |     "INSERT INTO performers (name, type, country, born, died, founded) VALUES ('Adele', 'Solo', 'UK', 1988, NULL, NULL);\n",
 367 |     "INSERT INTO performers (name, type, country, born, died, founded) VALUES ('Elton John', 'Solo', 'UK', 1947, NULL, NULL);\n",
 368 |     "INSERT INTO performers (name, type, country, born, died, founded) VALUES ('Queen', 'Band', 'UK', 1970, NULL, 1970);\n",
 369 |     "INSERT INTO performers (name, type, country, born, died, founded) VALUES ('Taylor Swift', 'Solo', 'USA', 1989, NULL, NULL);\n",
 370 |     "\n",
 371 |     "-- Insert data into albums by performer, title, and genre\n",
 372 |     "-- Assuming 'Pop' as a genre for all for simplicity\n",
 373 |     "INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('The Beatles', 1967, 'Sgt. Pepper''s Lonely Hearts Club Band', 'Pop');\n",
 374 |     "INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('Adele', 2015, '25', 'Pop');\n",
 375 |     "INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('Elton John', 1973, 'Goodbye Yellow Brick Road', 'Pop');\n",
 376 |     "INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('Queen', 1975, 'A Night at the Opera', 'Pop');\n",
 377 |     "INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('Taylor Swift', 2014, '1989', 'Pop');\n",
 378 |     "\n",
 379 |     "-- Repeat for albums_by_title\n",
 380 |     "INSERT INTO albums_by_title (title, year, performer, genre) VALUES ('Sgt. Pepper''s Lonely Hearts Club Band', 1967, 'The Beatles', 'Pop');\n",
 381 |     "INSERT INTO albums_by_title (title, year, performer, genre) VALUES ('25', 2015, 'Adele', 'Pop');\n",
 382 |     "INSERT INTO albums_by_title (title, year, performer, genre) VALUES ('Goodbye Yellow Brick Road', 1973, 'Elton John', 'Pop');\n",
 383 |     "INSERT INTO albums_by_title (title, year, performer, genre) VALUES ('A Night at the Opera', 1975, 'Queen', 'Pop');\n",
 384 |     "INSERT INTO albums_by_title (title, year, performer, genre) VALUES ('1989', 2014, 'Taylor Swift', 'Pop');\n",
 385 |     "\n",
 386 |     "-- Repeat for albums_by_genre\n",
 387 |     "INSERT INTO albums_by_genre (genre, year, performer, title) VALUES ('Pop', 1967, 'The Beatles', 'Sgt. Pepper''s Lonely Hearts Club Band');\n",
 388 |     "INSERT INTO albums_by_genre (genre, year, performer, title) VALUES ('Pop', 2015, 'Adele', '25');\n",
 389 |     "INSERT INTO albums_by_genre (genre, year, performer, title) VALUES ('Pop', 1973, 'Elton John', 'Goodbye Yellow Brick Road');\n",
 390 |     "INSERT INTO albums_by_genre (genre, year, performer, title) VALUES ('Pop', 1975, 'Queen', 'A Night at the Opera');\n",
 391 |     "INSERT INTO albums_by_genre (genre, year, performer, title) VALUES ('Pop', 2014, 'Taylor Swift', '1989');\n",
 392 |     "\n",
 393 |     "-- Insert data into tracks_by_title and tracks_by_album\n",
 394 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Lucy in the Sky with Diamonds', 'Sgt. Pepper''s Lonely Hearts Club Band', 1967, 1, 208, 'Pop');\n",
 395 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('With a Little Help from My Friends', 'Sgt. Pepper''s Lonely Hearts Club Band', 1967, 2, 163, 'Pop');\n",
 396 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Sgt. Pepper''s Lonely Hearts Club Band', 'Sgt. Pepper''s Lonely Hearts Club Band', 1967, 3, 122, 'Pop');\n",
 397 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Getting Better', 'Sgt. Pepper''s Lonely Hearts Club Band', 1967, 4, 174, 'Pop');\n",
 398 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Fixing a Hole', 'Sgt. Pepper''s Lonely Hearts Club Band', 1967, 5, 139, 'Pop');\n",
 399 |     "\n",
 400 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Hello', '25', 2015, 1, 295, 'Pop');\n",
 401 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Send My Love', '25', 2015, 2, 223, 'Pop');\n",
 402 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('I Miss You', '25', 2015, 3, 350, 'Pop');\n",
 403 |     "\n",
 404 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Candle in the Wind', 'Goodbye Yellow Brick Road', 1973, 1, 219, 'Pop');\n",
 405 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Bennie and the Jets', 'Goodbye Yellow Brick Road', 1973, 2, 323, 'Pop');\n",
 406 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Goodbye Yellow Brick Road', 'Goodbye Yellow Brick Road', 1973, 3, 193, 'Pop');\n",
 407 |     "\n",
 408 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Bohemian Rhapsody', 'A Night at the Opera', 1975, 1, 354, 'Pop');\n",
 409 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Love of My Life', 'A Night at the Opera', 1975, 2, 220, 'Pop');\n",
 410 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Youre My Best Friend', 'A Night at the Opera', 1975, 3, 178, 'Pop');\n",
 411 |     "\n",
 412 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Welcome to New York', '1989', 2014, 1, 212, 'Pop');\n",
 413 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Blank Space', '1989', 2014, 2, 231, 'Pop');\n",
 414 |     "INSERT INTO tracks_by_title (title, album_title, album_year, number, length, genre) VALUES ('Style', '1989', 2014, 3, 230, 'Pop');\n",
 415 |     "\n",
 416 |     "-- Repeat for tracks_by_album with corresponding track numbers\n",
 417 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('Sgt. Pepper''s Lonely Hearts Club Band', 1967, 1, 'Lucy in the Sky with Diamonds', 208, 'Pop');\n",
 418 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('Sgt. Pepper''s Lonely Hearts Club Band', 1967, 2, 'With a Little Help from My Friends', 163, 'Pop');\n",
 419 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('Sgt. Pepper''s Lonely Hearts Club Band', 1967, 3, 'Sgt. Pepper''s Lonely Hearts Club Band', 122, 'Pop');\n",
 420 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('Sgt. Pepper''s Lonely Hearts Club Band', 1967, 4, 'Getting Better', 174, 'Pop');\n",
 421 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('Sgt. Pepper''s Lonely Hearts Club Band', 1967, 5, 'Fixing a Hole', 139, 'Pop');\n",
 422 |     "\n",
 423 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('25', 2015, 1, 'Hello', 295, 'Pop');\n",
 424 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('25', 2015, 2, 'Send My Love', 223, 'Pop');\n",
 425 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('25', 2015, 3, 'I Miss You', 350, 'Pop');\n",
 426 |     "\n",
 427 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('Goodbye Yellow Brick Road', 1973, 1, 'Candle in the Wind', 219, 'Pop');\n",
 428 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('Goodbye Yellow Brick Road', 1973, 2, 'Bennie and the Jets', 323, 'Pop');\n",
 429 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('Goodbye Yellow Brick Road', 1973, 3, 'Goodbye Yellow Brick Road', 193, 'Pop');\n",
 430 |     "\n",
 431 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('A Night at the Opera', 1975, 1, 'Bohemian Rhapsody', 354, 'Pop');\n",
 432 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('A Night at the Opera', 1975, 2, 'Love of My Life', 220, 'Pop');\n",
 433 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('A Night at the Opera', 1975, 3, 'Youre My Best Friend', 178, 'Pop');\n",
 434 |     "\n",
 435 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('1989', 2014, 1, 'Welcome to New York', 212, 'Pop');\n",
 436 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('1989', 2014, 2, 'Blank Space', 231, 'Pop');\n",
 437 |     "INSERT INTO tracks_by_album (album_title, album_year, number, title, length, genre) VALUES ('1989', 2014, 3, 'Style', 230, 'Pop');\n",
 438 |     "\n",
 439 |     "-- Insert data into users\n",
 440 |     "INSERT INTO users (id, name) VALUES (uuid(), 'John Doe');\n",
 441 |     "INSERT INTO users (id, name) VALUES (uuid(), 'Jane Smith');\n",
 442 |     "INSERT INTO users (id, name) VALUES (uuid(), 'Emily Johnson');\n",
 443 |     "INSERT INTO users (id, name) VALUES (uuid(), 'Michael Brown');\n",
 444 |     "INSERT INTO users (id, name) VALUES (uuid(), 'Jessica Davis');\n",
 445 |     "\n",
 446 |     "-- Insert data into tracks_by_user\n",
 447 |     "-- User ids should be copied from the users insert statements once generated\n",
 448 |     "-- The following are placeholders and should be replaced with actual UUIDs\n",
 449 |     "INSERT INTO tracks_by_user (id, month, timestamp, album_title, album_year, number, title, length) VALUES (uuid(), '2024-01-01', toTimestamp(now()), 'Sgt. Pepper''s Lonely Hearts Club Band', 1967, 1, 'Lucy in the Sky with Diamonds', 208);\n",
 450 |     "INSERT INTO tracks_by_user (id, month, timestamp, album_title, album_year, number, title, length) VALUES (uuid(), '2024-01-01', toTimestamp(now()), 'Sgt. Pepper''s Lonely Hearts Club Band', 1967, 2, 'With a Little Help from My Friends', 163);\n",
 451 |     "INSERT INTO tracks_by_user (id, month, timestamp, album_title, album_year, number, title, length) VALUES (uuid(), '2024-01-01', toTimestamp(now()), 'Sgt. Pepper''s Lonely Hearts Club Band', 1967, 3, 'Sgt. Pepper''s Lonely Hearts Club Band', 122);\n",
 452 |     "INSERT INTO tracks_by_user (id, month, timestamp, album_title, album_year, number, title, length) VALUES (uuid(), '2024-01-01', toTimestamp(now()), 'Sgt. Pepper''s Lonely Hearts Club Band', 1967, 4, 'Getting Better', 174);\n",
 453 |     "INSERT INTO tracks_by_user (id, month, timestamp, album_title, album_year, number, title, length) VALUES (uuid(), '2024-01-01', toTimestamp(now()), 'Sgt. Pepper''s Lonely Hearts Club Band', 1967, 5, 'Fixing a Hole', 139);\n",
 454 |     "\"\"\""
 455 |    ]
 456 |   },
 457 |   {
 458 |    "cell_type": "code",
 459 |    "execution_count": 11,
 460 |    "id": "71fe8d93-322d-407f-a1ed-406366a9ddbf",
 461 |    "metadata": {
 462 |     "id": "71fe8d93-322d-407f-a1ed-406366a9ddbf"
 463 |    },
 464 |    "outputs": [
 465 |     {
 466 |      "name": "stdout",
 467 |      "output_type": "stream",
 468 |      "text": [
 469 |       "Running statement 'INSERT INTO performers (name, type, coun ...' ... done.\n",
 470 |       "Running statement 'INSERT INTO performers (name, type, coun ...' ... done.\n",
 471 |       "Running statement 'INSERT INTO performers (name, type, coun ...' ... done.\n",
 472 |       "Running statement 'INSERT INTO performers (name, type, coun ...' ... done.\n",
 473 |       "Running statement 'INSERT INTO performers (name, type, coun ...' ... done.\n",
 474 |       "Running statement 'INSERT INTO albums_by_performer (perform ...' ... done.\n",
 475 |       "Running statement 'INSERT INTO albums_by_performer (perform ...' ... done.\n",
 476 |       "Running statement 'INSERT INTO albums_by_performer (perform ...' ... done.\n",
 477 |       "Running statement 'INSERT INTO albums_by_performer (perform ...' ... done.\n",
 478 |       "Running statement 'INSERT INTO albums_by_performer (perform ...' ... done.\n",
 479 |       "Running statement 'INSERT INTO albums_by_title (title, year ...' ... done.\n",
 480 |       "Running statement 'INSERT INTO albums_by_title (title, year ...' ... done.\n",
 481 |       "Running statement 'INSERT INTO albums_by_title (title, year ...' ... done.\n",
 482 |       "Running statement 'INSERT INTO albums_by_title (title, year ...' ... done.\n",
 483 |       "Running statement 'INSERT INTO albums_by_title (title, year ...' ... done.\n",
 484 |       "Running statement 'INSERT INTO albums_by_genre (genre, year ...' ... done.\n",
 485 |       "Running statement 'INSERT INTO albums_by_genre (genre, year ...' ... done.\n",
 486 |       "Running statement 'INSERT INTO albums_by_genre (genre, year ...' ... done.\n",
 487 |       "Running statement 'INSERT INTO albums_by_genre (genre, year ...' ... done.\n",
 488 |       "Running statement 'INSERT INTO albums_by_genre (genre, year ...' ... done.\n",
 489 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 490 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 491 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 492 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 493 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 494 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 495 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 496 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 497 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 498 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 499 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 500 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 501 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 502 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 503 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 504 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 505 |       "Running statement 'INSERT INTO tracks_by_title (title, albu ...' ... done.\n",
 506 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 507 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 508 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 509 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 510 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 511 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 512 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 513 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 514 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 515 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 516 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 517 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 518 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 519 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 520 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 521 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 522 |       "Running statement 'INSERT INTO tracks_by_album (album_title ...' ... done.\n",
 523 |       "Running statement 'INSERT INTO users (id, name) VALUES (uui ...' ... done.\n",
 524 |       "Running statement 'INSERT INTO users (id, name) VALUES (uui ...' ... done.\n",
 525 |       "Running statement 'INSERT INTO users (id, name) VALUES (uui ...' ... done.\n",
 526 |       "Running statement 'INSERT INTO users (id, name) VALUES (uui ...' ... done.\n",
 527 |       "Running statement 'INSERT INTO users (id, name) VALUES (uui ...' ... done.\n",
 528 |       "Running statement 'INSERT INTO tracks_by_user (id, month, t ...' ... done.\n",
 529 |       "Running statement 'INSERT INTO tracks_by_user (id, month, t ...' ... done.\n",
 530 |       "Running statement 'INSERT INTO tracks_by_user (id, month, t ...' ... done.\n",
 531 |       "Running statement 'INSERT INTO tracks_by_user (id, month, t ...' ... done.\n",
 532 |       "Running statement 'INSERT INTO tracks_by_user (id, month, t ...' ... done.\n"
 533 |      ]
 534 |     }
 535 |    ],
 536 |    "source": [
 537 |     "# Split the text above into individual statements and run each one through the driver\n",
 538 |     "for line in insert_fake_data_cql.split(\"\\n\"):\n",
 539 |     "    if \";\" in line:\n",
 540 |     "        _statement = line.replace(\"\\n\", \" \")\n",
 541 |     "        print(f\"Running statement '{_statement[:40]} ...' ...\", end=\"\")\n",
 542 |     "        execute_statement(_statement)\n",
 543 |     "        print(\" done.\")"
 544 |    ]
 545 |   },
 546 |   {
 547 |    "cell_type": "markdown",
 548 |    "id": "2495b975-8342-4386-b92d-5ae05b783853",
 549 |    "metadata": {
 550 |     "id": "2495b975-8342-4386-b92d-5ae05b783853"
 551 |    },
 552 |    "source": [
 553 |     "## (Optional) Give the LLM Additional Context with the Built-in 'Comments' Column"
 554 |    ]
 555 |   },
 556 |   {
 557 |    "cell_type": "markdown",
 558 |    "id": "23820cb9-d683-48bb-9440-b398446df4c9",
 559 |    "metadata": {
 560 |     "id": "23820cb9-d683-48bb-9440-b398446df4c9"
 561 |    },
 562 |    "source": [
 563 |     "LLM response quality depends greatly on the context the model is given: the more concise, descriptive information it has access to, the better. We can augment the DB schema we pass to the model by using the built-in `comment` property of CQL tables.\n",
 564 |     "\n",
 565 |     "NOTE: You can also set these comments at table creation with the `WITH <table property 1> AND <table property 2> ... AND comment = '<comment>'` syntax."
 566 |    ]
 567 |   },
 568 |   {
 569 |    "cell_type": "code",
 570 |    "execution_count": 12,
 571 |    "id": "04d67d6c-6938-46ac-a5b5-8baaea54272a",
 572 |    "metadata": {
 573 |     "id": "04d67d6c-6938-46ac-a5b5-8baaea54272a"
 574 |    },
 575 |    "outputs": [],
 576 |    "source": [
 577 |     "add_comments_cql = \"\"\"\n",
 578 |     "ALTER TABLE albums_by_genre WITH comment = 'Albums partitioned by musical genre';\n",
 579 |     "ALTER TABLE albums_by_performer WITH comment = 'Albums partitioned by name of performer/artist';\n",
 580 |     "ALTER TABLE albums_by_title WITH comment = 'Albums partitioned by album title';\n",
 581 |     "ALTER TABLE performers WITH comment = 'Performers/artists partitioned by performer name';\n",
 582 |     "ALTER TABLE tracks_by_album WITH comment = 'Tracks/songs partitioned by album title';\n",
 583 |     "ALTER TABLE tracks_by_title WITH comment = 'Tracks/songs partitioned by song title';\n",
 584 |     "ALTER TABLE tracks_by_user WITH comment = 'Tracks/songs users listened to partitioned by user ID and time of listen';\n",
 585 |     "ALTER TABLE users WITH comment = 'Users partitioned by user ID';\n",
 586 |     "\"\"\""
 587 |    ]
 588 |   },
 589 |   {
 590 |    "cell_type": "code",
 591 |    "execution_count": 13,
 592 |    "id": "db29bea2-7600-495b-b57b-e2937fb0753f",
 593 |    "metadata": {
 594 |     "id": "db29bea2-7600-495b-b57b-e2937fb0753f"
 595 |    },
 596 |    "outputs": [
 597 |     {
 598 |      "name": "stdout",
 599 |      "output_type": "stream",
 600 |      "text": [
 601 |       "Running statement 'ALTER TABLE albums_by_genre WITH comment ...' ... done.\n",
 602 |       "Running statement 'ALTER TABLE albums_by_performer WITH com ...' ... done.\n",
 603 |       "Running statement 'ALTER TABLE albums_by_title WITH comment ...' ... done.\n",
 604 |       "Running statement 'ALTER TABLE performers WITH comment = 'P ...' ... done.\n",
 605 |       "Running statement 'ALTER TABLE tracks_by_album WITH comment ...' ... done.\n",
 606 |       "Running statement 'ALTER TABLE tracks_by_title WITH comment ...' ... done.\n",
 607 |       "Running statement 'ALTER TABLE tracks_by_user WITH comment  ...' ... done.\n",
 608 |       "Running statement 'ALTER TABLE users WITH comment = 'Users  ...' ... done.\n"
 609 |      ]
 610 |     }
 611 |    ],
 612 |    "source": [
 613 |     "# Split the text above into individual statements and run each one through the driver\n",
 614 |     "for line in add_comments_cql.split(\"\\n\"):\n",
 615 |     "    if \";\" in line:\n",
 616 |     "        _statement = line.replace(\"\\n\", \" \")\n",
 617 |     "        print(f\"Running statement '{_statement[:40]} ...' ...\", end=\"\")\n",
 618 |     "        execute_statement(_statement)\n",
 619 |     "        print(\" done.\")"
 620 |    ]
 621 |   },
 622 |   {
 623 |    "cell_type": "markdown",
 624 |    "id": "1a07cc49-7ba3-45ee-b2da-7fd297d8a86d",
 625 |    "metadata": {
 626 |     "id": "1a07cc49-7ba3-45ee-b2da-7fd297d8a86d"
 627 |    },
 628 |    "source": [
 629 |     "## Run Queries from User Questions"
 630 |    ]
 631 |   },
 632 |   {
 633 |    "cell_type": "markdown",
 634 |    "id": "942613bc-25aa-41cc-80e9-78ca15bcee06",
 635 |    "metadata": {
 636 |     "id": "942613bc-25aa-41cc-80e9-78ca15bcee06"
 637 |    },
 638 |    "source": [
 639 |     "#### Generating & Executing CQL"
 640 |    ]
 641 |   },
 642 |   {
 643 |    "cell_type": "markdown",
 644 |    "id": "8542b137-2fd1-4105-a402-17358647f815",
 645 |    "metadata": {
 646 |     "id": "8542b137-2fd1-4105-a402-17358647f815"
 647 |    },
 648 |    "source": [
 649 |     "Now we can ask the LLM (here `gpt-3.5-turbo`, called through the OpenAI chat completions API) to generate CQL queries that answer our questions. The prompt template is taken from [SQL-PaLM](https://arxiv.org/abs/2306.00739) and adapted to the CQL use case. To fill it in, we first need to retrieve the schema, partition keys, and clustering keys from our DB."
 650 |    ]
 651 |   },
 652 |   {
 653 |    "cell_type": "code",
 654 |    "execution_count": 14,
 655 |    "id": "7f442866-55e8-4284-83e1-705bb320250b",
 656 |    "metadata": {
 657 |     "id": "7f442866-55e8-4284-83e1-705bb320250b"
 658 |    },
 659 |    "outputs": [],
 660 |    "source": [
 661 |     "TEXT2CQL_PROMPT = \"\"\"Convert the question to CQL (Cassandra Query Language)\n",
 662 |     "that can retrieve an appropriate answer, or answer saying that the data model\n",
 663 |     "does not support answering such a question in a performant way:\n",
 664 |     "\n",
 665 |     "[Schema : values (type)]\n",
 666 |     "{schema}\n",
 667 |     "\n",
 668 |     "[Partition Keys]\n",
 669 |     "{partition_keys}\n",
 670 |     "\n",
 671 |     "[Clustering Keys]\n",
 672 |     "{clustering_keys}\n",
 673 |     "\n",
 674 |     "[Q]\n",
 675 |     "{question}\n",
 676 |     "\n",
 677 |     "[CQL]\n",
 678 |     "\"\"\"\n",
 679 |     "\n",
 680 |     "\n",
 681 |     "def generate_schema_partition_clustering_keys(keyspace: str = keyspace) -> tuple[str, str, str]:\n",
 682 |     "    \"\"\"Generates the schema, partition key, and clustering key strings used to fill TEXT2CQL_PROMPT for a keyspace\"\"\"\n",
 683 |     "    # Get all table names (and their comments) in our keyspace\n",
 684 |     "    table_names = execute_statement(\n",
 685 |     "        f\"SELECT table_name, comment FROM system_schema.tables WHERE keyspace_name = '{keyspace}'\"\n",
 686 |     "    )\n",
 687 |     "    tn_str = \", \".join([\"'\" + tn.table_name + \"'\" for tn in table_names])\n",
 688 |     "\n",
 689 |     "    # Now get all the column names corresponding to those tables\n",
 690 |     "    columns = execute_statement(\n",
 691 |     "        f\"SELECT * FROM system_schema.columns WHERE table_name IN ({tn_str}) AND keyspace_name = '{keyspace}' ALLOW FILTERING\"\n",
 692 |     "    )\n",
 693 |     "\n",
 694 |     "    # Now, we construct our prompt template formatted schema, partition_keys, and clustering keys\n",
 695 |     "    # from the table and column objects returned from the DB\n",
 696 |     "    schema = \" | \".join([\n",
 697 |     "        f\"{table.table_name} '{table.comment}' : \" + \" , \".join([\n",
 698 |     "            f\"{col.column_name} ({col.type})\"\n",
 699 |     "            for col in columns\n",
 700 |     "            if col.table_name == table.table_name\n",
 701 |     "        ])\n",
 702 |     "        for table in table_names\n",
 703 |     "    ])\n",
 704 |     "    partition_keys = \" | \".join([\n",
 705 |     "        f\"{table.table_name} : \" + \" , \".join([\n",
 706 |     "            col.column_name for col in columns\n",
 707 |     "            if col.table_name == table.table_name\n",
 708 |     "            and col.kind == \"partition_key\"\n",
 709 |     "        ])\n",
 710 |     "        for table in table_names\n",
 711 |     "    ])\n",
 712 |     "    clustering_keys = \" | \".join([\n",
 713 |     "        f\"{table.table_name} : \" + \" , \".join([\n",
 714 |     "            f\"{col.column_name} ({col.clustering_order})\" for col in columns\n",
 715 |     "            if col.table_name == table.table_name\n",
 716 |     "            and col.kind == \"clustering\"\n",
 717 |     "        ])\n",
 718 |     "        for table in table_names\n",
 719 |     "    ])\n",
 720 |     "    return schema, partition_keys, clustering_keys\n",
 721 |     "\n",
 722 |     "\n",
 723 |     "def execute_query_from_question(question: str, debug_cql: bool = True, debug_prompt: bool = False, return_cql: bool = False):\n",
 724 |     "    \"\"\"Generates and executes CQL from a user question based on LLM output\"\"\"\n",
 725 |     "    # Get all of the variables necessary to fill out the prompt\n",
 726 |     "    schema, partition_keys, clustering_keys = generate_schema_partition_clustering_keys()\n",
 727 |     "    prompt = TEXT2CQL_PROMPT.format(\n",
 728 |     "        schema=schema,\n",
 729 |     "        partition_keys=partition_keys,\n",
 730 |     "        clustering_keys=clustering_keys,\n",
 731 |     "        question=question,\n",
 732 |     "    )\n",
 733 |     "\n",
 734 |     "    if debug_prompt:\n",
 735 |     "        print(f\"Prompting model with:\\n{prompt}\")\n",
 736 |     "\n",
 737 |     "    # Get generated CQL from the LLM (in this case gpt-3.5-turbo)\n",
 738 |     "    completion = client.chat.completions.create(\n",
 739 |     "        messages=[{\n",
 740 |     "            \"role\": \"user\",\n",
 741 |     "            \"content\": prompt,\n",
 742 |     "        }],\n",
 743 |     "        model=\"gpt-3.5-turbo\",\n",
 744 |     "    ).choices[0].message.content\n",
 745 |     "\n",
 746 |     "    if debug_cql:\n",
 747 |     "        print(f\"Question: {question}\\nGenerated Query: {completion}\\n\")\n",
 748 |     "\n",
 749 |     "    # Keep only the text before the first ';' (trailing ';' trimmed to work with cassandra-driver)\n",
 750 |     "    if \";\" in completion:\n",
 751 |     "        completion = completion[:completion.find(\";\")]\n",
 752 |     "\n",
 753 |     "    results = execute_statement(completion)\n",
 754 |     "\n",
 755 |     "    if return_cql:\n",
 756 |     "        return (results, completion)\n",
 757 |     "    else:\n",
 758 |     "        return results"
 759 |    ]
 760 |   },
 761 |   {
 762 |    "cell_type": "code",
 763 |    "execution_count": 15,
 764 |    "id": "dd55a635-d1f5-4463-afff-b98b7bf23b56",
 765 |    "metadata": {
 766 |     "colab": {
 767 |      "base_uri": "https://localhost:8080/"
 768 |     },
 769 |     "id": "dd55a635-d1f5-4463-afff-b98b7bf23b56",
 770 |     "outputId": "66f548a0-c268-4d3a-cba6-5fb3fa3dc755"
 771 |    },
 772 |    "outputs": [
 773 |     {
 774 |      "name": "stdout",
 775 |      "output_type": "stream",
 776 |      "text": [
 777 |       "Prompting model with:\n",
 778 |       "Convert the question to CQL (Cassandra Query Language)\n",
 779 |       "that can retrieve an appropriate answer, or answer saying that the data model\n",
 780 |       "does not support answering such a question in a performant way:\n",
 781 |       "\n",
 782 |       "[Schema : values (type)]\n",
 783 |       "albums_by_genre 'Albums partitioned by musical genre' : genre (text) , performer (text) , title (text) , year (int) | albums_by_performer 'Albums partitioned by name of performer/artist' : genre (text) , performer (text) , title (text) , year (int) | albums_by_title 'Albums partitioned by album title' : genre (text) , performer (text) , title (text) , year (int) | performers 'Performers/artists partitioned by performer name' : born (int) , country (text) , died (int) , founded (int) , name (text) , type (text) | tracks_by_album 'Tracks/songs partitioned by album title' : album_title (text) , album_year (int) , genre (text) , length (int) , number (int) , title (text) | tracks_by_title 'Tracks/songs partitioned by song title' : album_title (text) , album_year (int) , genre (text) , length (int) , number (int) , title (text) | tracks_by_user 'Tracks/songs users listened to partitioned by user ID and time of listen' : album_title (text) , album_year (int) , id (uuid) , length (int) , month (date) , number (int) , timestamp (timestamp) , title (text) | users 'Users partitioned by user ID' : id (uuid) , name (text)\n",
 784 |       "\n",
 785 |       "[Partition Keys]\n",
 786 |       "albums_by_genre : genre | albums_by_performer : performer | albums_by_title : title | performers : name | tracks_by_album : album_title | tracks_by_title : title | tracks_by_user : id | users : id\n",
 787 |       "\n",
 788 |       "[Clustering Keys]\n",
 789 |       "albums_by_genre : performer (asc) , title (asc) , year (desc) | albums_by_performer : title (asc) , year (desc) | albums_by_title : year (desc) | performers :  | tracks_by_album : album_year (desc) , number (asc) | tracks_by_title : album_title (asc) , album_year (desc) , number (asc) | tracks_by_user : timestamp (desc) | users : \n",
 790 |       "\n",
 791 |       "[Q]\n",
 792 |       "What songs are on A Night at the Opera?\n",
 793 |       "\n",
 794 |       "[CQL]\n",
 795 |       "\n",
 796 |       "Question: What songs are on A Night at the Opera?\n",
 797 |       "Generated Query: SELECT title FROM tracks_by_album WHERE album_title = 'A Night at the Opera';\n",
 798 |       "\n"
 799 |      ]
 800 |     },
 801 |     {
 802 |      "data": {
 803 |       "text/plain": [
 804 |        "[Row(title='Bohemian Rhapsody'),\n",
 805 |        " Row(title='Love of My Life'),\n",
 806 |        " Row(title='Youre My Best Friend')]"
 807 |       ]
 808 |      },
 809 |      "execution_count": 15,
 810 |      "metadata": {},
 811 |      "output_type": "execute_result"
 812 |     }
 813 |    ],
 814 |    "source": [
 815 |     "# Show full prompting trace\n",
 816 |     "execute_query_from_question(\"What songs are on A Night at the Opera?\", debug_prompt=True)"
 817 |    ]
 818 |   },
 819 |   {
 820 |    "cell_type": "code",
 821 |    "execution_count": 16,
 822 |    "id": "b3ae4ade-dc9c-4cae-824f-98edc5bc5208",
 823 |    "metadata": {
 824 |     "colab": {
 825 |      "base_uri": "https://localhost:8080/"
 826 |     },
 827 |     "id": "b3ae4ade-dc9c-4cae-824f-98edc5bc5208",
 828 |     "outputId": "bb3c1b4b-dd61-48c9-99bf-8d2db543bbe0"
 829 |    },
 830 |    "outputs": [
 831 |     {
 832 |      "name": "stdout",
 833 |      "output_type": "stream",
 834 |      "text": [
 835 |       "Question: What are some of the most recent Pop albums in the last decade?\n",
 836 |       "Generated Query: SELECT * FROM albums_by_genre WHERE genre = 'Pop' AND year > 2010 ALLOW FILTERING;\n",
 837 |       "\n"
 838 |      ]
 839 |     },
 840 |     {
 841 |      "data": {
 842 |       "text/plain": [
 843 |        "[Row(genre='Pop', year=2015, performer='Adele', title='25'),\n",
 844 |        " Row(genre='Pop', year=2014, performer='Taylor Swift', title='1989')]"
 845 |       ]
 846 |      },
 847 |      "execution_count": 16,
 848 |      "metadata": {},
 849 |      "output_type": "execute_result"
 850 |     }
 851 |    ],
 852 |    "source": [
 853 |     "execute_query_from_question(\"What are some of the most recent Pop albums in the last decade?\")"
 854 |    ]
 855 |   },
 856 |   {
 857 |    "cell_type": "code",
 858 |    "execution_count": 17,
 859 |    "id": "51e97f95-5f44-4de5-9342-1c00b30918c4",
 860 |    "metadata": {
 861 |     "colab": {
 862 |      "base_uri": "https://localhost:8080/"
 863 |     },
 864 |     "id": "51e97f95-5f44-4de5-9342-1c00b30918c4",
 865 |     "outputId": "4231b990-e0f7-4824-e3f4-46450bbcc3ec"
 866 |    },
 867 |    "outputs": [
 868 |     {
 869 |      "name": "stdout",
 870 |      "output_type": "stream",
 871 |      "text": [
 872 |       "Question: How many albums has Taylor Swift made?\n",
 873 |       "Generated Query: SELECT COUNT(*) FROM albums_by_performer WHERE performer = 'Taylor Swift';\n",
 874 |       "\n"
 875 |      ]
 876 |     },
 877 |     {
 878 |      "data": {
 879 |       "text/plain": [
 880 |        "[Row(count=1)]"
 881 |       ]
 882 |      },
 883 |      "execution_count": 17,
 884 |      "metadata": {},
 885 |      "output_type": "execute_result"
 886 |     }
 887 |    ],
 888 |    "source": [
 889 |     "execute_query_from_question(\"How many albums has Taylor Swift made?\")"
 890 |    ]
 891 |   },
 892 |   {
 893 |    "cell_type": "markdown",
 894 |    "id": "7adb085f-48af-422b-9e9e-15c2d02a4f58",
 895 |    "metadata": {
 896 |     "id": "7adb085f-48af-422b-9e9e-15c2d02a4f58"
 897 |    },
 898 |    "source": [
 899 |     "Pretty cool that it can find the data to answer our questions! Let's see if we can take this one step further, and actually generate coherent responses using this data:"
 900 |    ]
 901 |   },
 902 |   {
 903 |    "cell_type": "markdown",
 904 |    "id": "d8afb916-f30b-4bed-9e00-5e86b261d788",
 905 |    "metadata": {
 906 |     "id": "d8afb916-f30b-4bed-9e00-5e86b261d788"
 907 |    },
 908 |    "source": [
 909 |     "#### End to End Question Answering"
 910 |    ]
 911 |   },
 912 |   {
 913 |    "cell_type": "markdown",
 914 |    "id": "09aa4a19-a284-4800-abb0-8d0e24185c38",
 915 |    "metadata": {
 916 |     "id": "09aa4a19-a284-4800-abb0-8d0e24185c38"
 917 |    },
 918 |    "source": [
 919 |     "Now, let's wrap up by making a second LLM call that answers the user's question in natural language, based on the rows returned by the generated query. This completes a full RAG-style pipeline!"
 920 |    ]
 921 |   },
 922 |   {
 923 |    "cell_type": "code",
 924 |    "execution_count": 18,
 925 |    "id": "dbb6002e-d5f1-408c-9bd3-e7ca4a329483",
 926 |    "metadata": {
 927 |     "id": "dbb6002e-d5f1-408c-9bd3-e7ca4a329483"
 928 |    },
 929 |    "outputs": [],
 930 |    "source": [
 931 |     "ANSWER_PROMPT = \"\"\"Query:\n",
 932 |     "```\n",
 933 |     "{cql}\n",
 934 |     "```\n",
 935 |     "\n",
 936 |     "Output:\n",
 937 |     "```\n",
 938 |     "{results_repr}\n",
 939 |     "```\n",
 940 |     "===\n",
 941 |     "\n",
 942 |     "Given the above results from querying the DB, answer the following user question:\n",
 943 |     "\n",
 944 |     "{question}\n",
 945 |     "\"\"\"\n",
 946 |     "\n",
 947 |     "\n",
 948 |     "def answer_question(question: str, debug_cql: bool = False, debug_prompt: bool = False) -> str:\n",
 949 |     "    \"\"\"Conducts a full RAG pipeline where the LLM retrieves relevant information\n",
 950 |     "    and references it to answer the question in natural language.\n",
 951 |     "    \"\"\"\n",
 952 |     "    # Get necessary fields to fill out prompt\n",
 953 |     "    query_results, cql = execute_query_from_question(\n",
 954 |     "        question=question,\n",
 955 |     "        debug_cql=debug_cql,\n",
 956 |     "        debug_prompt=debug_prompt,\n",
 957 |     "        return_cql=True,\n",
 958 |     "    )\n",
 959 |     "    prompt = ANSWER_PROMPT.format(\n",
 960 |     "        question=question,\n",
 961 |     "        results_repr=str(query_results),\n",
 962 |     "        cql=cql,\n",
 963 |     "    )\n",
 964 |     "\n",
 965 |     "    if debug_prompt:\n",
 966 |     "        print(f\"Prompting model with:\\n{prompt}\")\n",
 967 |     "\n",
 968 |     "    # Return the generated answer from the LLM\n",
 969 |     "    return client.chat.completions.create(\n",
 970 |     "        messages=[{\n",
 971 |     "            \"role\": \"user\",\n",
 972 |     "            \"content\": prompt,\n",
 973 |     "        }],\n",
 974 |     "        model=\"gpt-3.5-turbo\",\n",
 975 |     "    ).choices[0].message.content\n"
 976 |    ]
 977 |   },
 978 |   {
 979 |    "cell_type": "code",
 980 |    "execution_count": 19,
 981 |    "id": "8533ecb1-9c8e-4db1-beb4-e736d2a1b500",
 982 |    "metadata": {
 983 |     "colab": {
 984 |      "base_uri": "https://localhost:8080/"
 985 |     },
 986 |     "id": "8533ecb1-9c8e-4db1-beb4-e736d2a1b500",
 987 |     "outputId": "68d0beca-86cc-47b2-b96e-b6c6d1c5f1e8"
 988 |    },
 989 |    "outputs": [
 990 |     {
 991 |      "name": "stdout",
 992 |      "output_type": "stream",
 993 |      "text": [
 994 |       "Prompting model with:\n",
 995 |       "Convert the question to CQL (Cassandra Query Language)\n",
 996 |       "that can retrieve an appropriate answer, or answer saying that the data model\n",
 997 |       "does not support answering such a question in a performant way:\n",
 998 |       "\n",
 999 |       "[Schema : values (type)]\n",
1000 |       "albums_by_genre 'Albums partitioned by musical genre' : genre (text) , performer (text) , title (text) , year (int) | albums_by_performer 'Albums partitioned by name of performer/artist' : genre (text) , performer (text) , title (text) , year (int) | albums_by_title 'Albums partitioned by album title' : genre (text) , performer (text) , title (text) , year (int) | performers 'Performers/artists partitioned by performer name' : born (int) , country (text) , died (int) , founded (int) , name (text) , type (text) | tracks_by_album 'Tracks/songs partitioned by album title' : album_title (text) , album_year (int) , genre (text) , length (int) , number (int) , title (text) | tracks_by_title 'Tracks/songs partitioned by song title' : album_title (text) , album_year (int) , genre (text) , length (int) , number (int) , title (text) | tracks_by_user 'Tracks/songs users listened to partitioned by user ID and time of listen' : album_title (text) , album_year (int) , id (uuid) , length (int) , month (date) , number (int) , timestamp (timestamp) , title (text) | users 'Users partitioned by user ID' : id (uuid) , name (text)\n",
1001 |       "\n",
1002 |       "[Partition Keys]\n",
1003 |       "albums_by_genre : genre | albums_by_performer : performer | albums_by_title : title | performers : name | tracks_by_album : album_title | tracks_by_title : title | tracks_by_user : id | users : id\n",
1004 |       "\n",
1005 |       "[Clustering Keys]\n",
1006 |       "albums_by_genre : performer (asc) , title (asc) , year (desc) | albums_by_performer : title (asc) , year (desc) | albums_by_title : year (desc) | performers :  | tracks_by_album : album_year (desc) , number (asc) | tracks_by_title : album_title (asc) , album_year (desc) , number (asc) | tracks_by_user : timestamp (desc) | users : \n",
1007 |       "\n",
1008 |       "[Q]\n",
1009 |       "What songs are on A Night at the Opera?\n",
1010 |       "\n",
1011 |       "[CQL]\n",
1012 |       "\n",
1013 |       "Prompting model with:\n",
1014 |       "Query:\n",
1015 |       "```\n",
1016 |       "SELECT title FROM tracks_by_album WHERE album_title = 'A Night at the Opera'\n",
1017 |       "```\n",
1018 |       "\n",
1019 |       "Output:\n",
1020 |       "```\n",
1021 |       "[Row(title='Bohemian Rhapsody'), Row(title='Love of My Life'), Row(title='Youre My Best Friend')]\n",
1022 |       "```\n",
1023 |       "===\n",
1024 |       "\n",
1025 |       "Given the above results from querying the DB, answer the following user question:\n",
1026 |       "\n",
1027 |       "What songs are on A Night at the Opera?\n",
1028 |       "\n",
1029 |       "The songs on \"A Night at the Opera\" are \"Bohemian Rhapsody\", \"Love of My Life\", and \"You're My Best Friend\".\n"
1030 |      ]
1031 |     }
1032 |    ],
1033 |    "source": [
1034 |     "# Show full prompting trace\n",
1035 |     "print(\n",
1036 |     "    answer_question(\"What songs are on A Night at the Opera?\", debug_prompt=True)\n",
1037 |     ")"
1038 |    ]
1039 |   },
1040 |   {
1041 |    "cell_type": "code",
1042 |    "execution_count": 20,
1043 |    "id": "32edcf7a-ea91-4ae9-9543-3b987a3cb695",
1044 |    "metadata": {
1045 |     "colab": {
1046 |      "base_uri": "https://localhost:8080/"
1047 |     },
1048 |     "id": "32edcf7a-ea91-4ae9-9543-3b987a3cb695",
1049 |     "outputId": "713721a2-da55-45b5-8a4a-b285c588d6a1"
1050 |    },
1051 |    "outputs": [
1052 |     {
1053 |      "name": "stdout",
1054 |      "output_type": "stream",
1055 |      "text": [
1056 |       "Some of the most recent Pop albums in the last decade are Adele's \"25\" released in 2015 and Taylor Swift's \"1989\" released in 2014.\n"
1057 |      ]
1058 |     }
1059 |    ],
1060 |    "source": [
1061 |     "print(\n",
1062 |     "    answer_question(\"What are some of the most recent Pop albums in the last decade?\")\n",
1063 |     ")"
1064 |    ]
1065 |   },
1066 |   {
1067 |    "cell_type": "code",
1068 |    "execution_count": 21,
1069 |    "id": "3083b6cb-7e4c-4738-b934-a4e49ac372cb",
1070 |    "metadata": {
1071 |     "colab": {
1072 |      "base_uri": "https://localhost:8080/"
1073 |     },
1074 |     "id": "3083b6cb-7e4c-4738-b934-a4e49ac372cb",
1075 |     "outputId": "ae8b1eef-1871-4772-fec7-8ee279df7a8a"
1076 |    },
1077 |    "outputs": [
1078 |     {
1079 |      "name": "stdout",
1080 |      "output_type": "stream",
1081 |      "text": [
1082 |       "Taylor Swift has made 1 album.\n"
1083 |      ]
1084 |     }
1085 |    ],
1086 |    "source": [
1087 |     "print(\n",
1088 |     "    answer_question(\"How many albums has Taylor Swift made?\")\n",
1089 |     ")"
1090 |    ]
1091 |   },
1092 |   {
1093 |    "cell_type": "markdown",
1094 |    "id": "29edd522-6cb9-454c-a5b3-b8ee0f8a9ea8",
1095 |    "metadata": {
1096 |     "id": "29edd522-6cb9-454c-a5b3-b8ee0f8a9ea8"
1097 |    },
1098 |    "source": [
1099 |     "Awesome! Our model is answering questions based on just the data in our dummy DB, and is able to construct queries for retrieving that data in a fully automated way."
1100 |    ]
1101 |   }
1102 |  ],
1103 |  "metadata": {
1104 |   "colab": {
1105 |    "collapsed_sections": [
1106 |     "890b2537-f585-4f31-bb20-ed71910eb586",
1107 |     "2495b975-8342-4386-b92d-5ae05b783853"
1108 |    ],
1109 |    "provenance": []
1110 |   },
1111 |   "kernelspec": {
1112 |    "display_name": "Python 3 (ipykernel)",
1113 |    "language": "python",
1114 |    "name": "python3"
1115 |   },
1116 |   "language_info": {
1117 |    "codemirror_mode": {
1118 |     "name": "ipython",
1119 |     "version": 3
1120 |    },
1121 |    "file_extension": ".py",
1122 |    "mimetype": "text/x-python",
1123 |    "name": "python",
1124 |    "nbconvert_exporter": "python",
1125 |    "pygments_lexer": "ipython3",
1126 |    "version": "3.12.8"
1127 |   }
1128 |  },
1129 |  "nbformat": 4,
1130 |  "nbformat_minor": 5
1131 | }
1132 | 


--------------------------------------------------------------------------------