├── .coveragerc
├── .gitignore
├── LICENSE
├── Makefile
├── README.md
├── docs
    └── example_notebooks
    │   └── example.ipynb
├── poetry.lock
├── pyproject.toml
├── tests
    ├── __init__.py
    ├── data
    │   └── product_df.json
    ├── integration_tests
    │   ├── __init__.py
    │   └── test_llm_accessor.py
    └── unit_tests
    │   ├── __init__.py
    │   └── test_llm_accessor.py
└── yolopandas
    ├── __init__.py
    ├── chains.py
    ├── llm_accessor.py
    └── utils
        ├── __init__.py
        └── query_helpers.py

/.coveragerc:
--------------------------------------------------------------------------------
1 | [run]
2 | omit = tests/*
3 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .vscode/
2 |
3 | # Byte-compiled / optimized / DLL files
4 | __pycache__/
5 | *.py[cod]
6 | *$py.class
7 |
8 | # Jupyter Notebook
9 | .ipynb_checkpoints
10 |
11 | # Unit test / coverage reports
12 | .coverage
13 | coverage.xml
14 | .pytest_cache/
15 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License
2 |
3 | Copyright 2023 Chester Curme
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in
13 | all copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21 | THE SOFTWARE.
22 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | coverage:
2 | 	poetry run pytest --cov \
3 | 		--cov-config=.coveragerc \
4 | 		--cov-report xml \
5 | 		--cov-report term-missing:skip-covered
6 |
7 | format:
8 | 	poetry run black .
9 | 	poetry run isort .
10 |
11 | lint:
12 | 	poetry run mypy .
13 | 	poetry run black . --check
14 | 	poetry run isort . --check
15 | 	poetry run flake8 .
16 |
17 | unit_tests:
18 | 	poetry run pytest tests/unit_tests
19 |
20 | integration_tests:
21 | 	poetry run pytest tests/integration_tests
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # YOLOPandas
2 |
3 | Interact with Pandas objects via LLMs and [LangChain](https://github.com/hwchase17/langchain).
4 |
5 | With YOLOPandas, you can give commands in natural language and execute them directly on Pandas objects.
6 | You can preview the generated code before it runs, or set `yolo=True` to execute the LLM's output immediately.
7 |
8 | **Warning**: YOLOPandas executes arbitrary Python code on the machine it runs on. This is inherently dangerous.
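The preview-versus-`yolo` behaviour described above can be sketched as a confirm-then-execute step. This is a minimal illustration under stated assumptions, not YOLOPandas' actual implementation. `run_generated_code` and its `ask` parameter are hypothetical names, the code string stands in for LLM output, and only single expressions are handled.

```python
from typing import Any, Callable

import pandas as pd


def run_generated_code(
    code: str,
    df: pd.DataFrame,
    yolo: bool = False,
    ask: Callable[[str], str] = input,
) -> Any:
    """Hypothetical helper: optionally confirm, then evaluate `code` with `df` in scope."""
    if not yolo:
        # Show the proposed code and require an explicit "y" before running it.
        print(code)
        answer = ask("Run this code? [y/N] ").strip().lower()
        if answer != "y":
            return None
    # eval() runs arbitrary Python: this is exactly the risk the warning describes.
    return eval(code, {"pd": pd}, {"df": df})
```

Injecting `ask` instead of calling `input()` directly keeps the confirmation step testable, for example `run_generated_code("df.nsmallest(1, 'price')", df, ask=lambda _: "y")`.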
9 |
10 | https://user-images.githubusercontent.com/26529506/214591990-c295a283-b9e6-4775-81e4-28917183ebb1.mp4
11 |
12 | ## Quick Install
13 |
14 | `pip install yolopandas`
15 |
16 | ## Basic usage
17 |
18 | YOLOPandas adds an `llm` accessor to Pandas dataframes.
19 |
20 | ```python
21 | from yolopandas import pd
22 |
23 | df = pd.DataFrame(
24 |     [
25 |         {"name": "The Da Vinci Code", "type": "book", "price": 15, "quantity": 300, "rating": 4},
26 |         {"name": "Jurassic Park", "type": "book", "price": 12, "quantity": 400, "rating": 4.5},
27 |         {"name": "Jurassic Park", "type": "film", "price": 8, "quantity": 6, "rating": 5},
28 |         {"name": "Matilda", "type": "book", "price": 5, "quantity": 80, "rating": 4},
29 |         {"name": "Clockwork Orange", "type": None, "price": None, "quantity": 20, "rating": 4},
30 |         {"name": "Walden", "type": None, "price": None, "quantity": 100, "rating": 4.5},
31 |     ],
32 | )
33 |
34 | df.llm.query("What item is the least expensive?")
35 | ```
36 | This generates Pandas code to answer the question and prompts you to accept or reject the proposed code.
37 | Accepting it here returns a Pandas dataframe containing the result.
38 |
39 | Alternatively, you can execute the LLM's output without previewing it first:
40 | ```python
41 | df.llm.query("What item is the least expensive?", yolo=True)
42 | ```
43 |
44 | `.query` returns whatever the executed code evaluates to; the return type is not constrained. For instance, `"Show me products under $10"` returns a dataframe, while `"Split the dataframe into two, 1/3 in one, 2/3 in the other. Return (df1, df2)"` can return a tuple of two dataframes.
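One way to "return whatever the code computes" is to run every statement of the generated code and then evaluate the final statement if it is an expression. This is similar to how a notebook cell displays its last value. The sketch below registers a demo accessor through pandas' `register_dataframe_accessor`.

It rests on several assumptions. The accessor name `llm_demo`, the `run` method and `exec_and_return_last` are hypothetical. The claim that YOLOPandas works this way is also an assumption. The generated code is passed in directly rather than produced by an LLM.

```python
import ast
from typing import Any

import pandas as pd


def exec_and_return_last(code: str, namespace: dict) -> Any:
    """Run `code`; if its final statement is an expression, return its value."""
    tree = ast.parse(code)
    if not tree.body:
        return None
    last = tree.body[-1]
    if isinstance(last, ast.Expr):
        # Execute everything before the final expression, then evaluate it.
        head = ast.Module(body=tree.body[:-1], type_ignores=[])
        exec(compile(head, "<generated>", "exec"), namespace)
        tail = ast.Expression(body=last.value)
        return eval(compile(tail, "<generated>", "eval"), namespace)
    # No trailing expression (e.g. an assignment): run it all, return nothing.
    exec(compile(tree, "<generated>", "exec"), namespace)
    return None


@pd.api.extensions.register_dataframe_accessor("llm_demo")
class DemoAccessor:
    """Hypothetical accessor that runs already-generated code against the frame."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._df = pandas_obj

    def run(self, code: str) -> Any:
        # The generated code refers to the frame as `df`, as in the example notebook.
        return exec_and_return_last(code, {"pd": pd, "df": self._df})
```

With the split code from the example notebook (`split_index = round(len(df) / 3)` and so on, ending in `(df1, df2)`), `df.llm_demo.run(code)` hands back the tuple. A final `df['price'].min()` hands back a scalar instead. This is why the return type cannot be fixed in advance.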
You can also chain queries, as long as each intermediate result is a dataframe (and therefore has its own `llm` accessor). For instance:
45 | ```python
46 | df.llm.query("Group by type and take the mean of all numeric columns.", yolo=True).llm.query("Make a bar plot of the result and use a log scale.", yolo=True)
47 | ```
48 |
49 | To see how much each query costs, use `run_query_with_cost` from the `yolopandas.utils.query_helpers` module. It reports token usage, split into prompt and completion tokens, along with the total cost in USD:
50 |
51 | ```python
52 |
53 | from yolopandas.utils.query_helpers import run_query_with_cost
54 |
55 | run_query_with_cost(df, "What item is the least expensive?", yolo=True)
56 | ```
57 | The output looks like the following:
58 |
59 | ```
60 | Total Tokens: 267
61 | Prompt Tokens: 252
62 | Completion Tokens: 15
63 | Total Cost (USD): $0.00534
64 | ```
65 |
66 |
67 | See the [example notebook](docs/example_notebooks/example.ipynb) for more ideas.
68 |
69 |
70 | ## LangChain Components
71 |
72 | This package is built on several LangChain components, so it should feel familiar if you already use LangChain. In particular, it uses the LLM, Chain, and Memory abstractions.
73 |
74 | ### LLM Abstraction
75 |
76 | Because YOLOPandas uses LangChain's LLM abstraction, you can plug in different LLM providers. There are a few ways to do this:
77 |
78 | 1. Change the default LLM by pointing the `LLPANDAS_LLM_CONFIGURATION` environment variable at a config file. The file at this path should be in [one of the accepted formats](https://langchain.readthedocs.io/en/latest/modules/llms/examples/llm_serialization.html).
79 |
80 | 2. If you already have a LangChain LLM wrapper in memory, make it the default LLM with:
81 |
82 | ```python
83 | import yolopandas
84 | yolopandas.set_llm(llm)
85 | ```
86 |
87 | 3.
You can set the LLM wrapper for a specific dataframe with `df.llm.reset_chain(llm=llm)`.
88 |
89 |
90 | ### Chain Abstraction
91 |
92 | Because YOLOPandas uses LangChain's Chain abstraction, you can also swap in a different chain, for example to customize the prompt or the chain logic itself.
93 |
94 | To use a custom chain for a particular dataframe:
95 |
96 | ```python
97 | df.llm.set_chain(chain)
98 | ```
99 |
100 | To reset a dataframe back to the default chain:
101 |
102 | ```python
103 | df.llm.reset_chain()
104 | ```
105 |
106 | ### Memory Abstraction
107 |
108 | The default chain in YOLOPandas uses LangChain's concept of [memory](https://langchain.readthedocs.io/en/latest/modules/memory.html). Memory lets the chain "remember" previous commands, so you can ask follow-up questions or issue commands that build on earlier interactions.
109 |
110 | For example, after seeing the result of `"Make a seaborn plot of price grouped by type"`, you can follow up with `"Can you use a dark theme, and pastel colors?"`.
111 |
112 | Memory is on by default. To turn it off by default, set the environment variable `LLPANDAS_USE_MEMORY=False`.
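Conceptually, conversational memory means earlier exchanges are replayed into each new prompt. A follow-up like "Can you use a dark theme, and pastel colors?" can then be resolved against the previous request.

The class below is a deliberately simplified, hypothetical stand-in for this idea. It is not LangChain's memory API. `PromptMemory`, `build_prompt` and the `Human:`/`AI:` prompt layout are all assumptions.

```python
from collections import deque
from typing import Deque, Tuple


class PromptMemory:
    """Hypothetical buffer memory holding recent (query, generated code) pairs."""

    def __init__(self, max_turns: int = 5) -> None:
        # A bounded deque silently drops the oldest turn once it is full.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def save(self, query: str, code: str) -> None:
        self.turns.append((query, code))

    def build_prompt(self, query: str) -> str:
        # Replay earlier turns ahead of the new query so the model sees context.
        lines = [f"Human: {q}\nAI: {c}" for q, c in self.turns]
        lines.append(f"Human: {query}\nAI:")
        return "\n".join(lines)
```

After `memory.save("Make a seaborn plot of price grouped by type", "sns.catplot(...)")`, calling `memory.build_prompt("Can you use a dark theme, and pastel colors?")` includes the earlier plot request. Turning memory off corresponds, in this sketch, to building the prompt from the new query alone.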
113 |
114 | When resetting the chain, you can also specify whether to use memory:
115 |
116 | ```python
117 | df.llm.reset_chain(use_memory=False)
118 | ```
119 |
120 |
121 |
--------------------------------------------------------------------------------
/docs/example_notebooks/example.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "code",
5 |    "execution_count": 1,
6 |    "id": "7aa7789f-56e5-4372-b21b-75b9aee45bfc",
7 |    "metadata": {},
8 |    "outputs": [],
9 |    "source": [
10 |     "from yolopandas import pd"
11 |    ]
12 |   },
13 |   {
14 |    "cell_type": "code",
15 |    "execution_count": 2,
16 |    "id": "0f0e60c8-3a59-4834-93bc-f18931362387",
17 |    "metadata": {},
18 |    "outputs": [
19 |     {
20 |      "data": {
21 |       "text/html": [
22 |        "
" 99 | ], 100 | "text/plain": [ 101 | " name type price quantity rating\n", 102 | "0 The Da Vinci Code book 15.0 300 4.0\n", 103 | "1 Jurassic Park book 12.0 400 4.5\n", 104 | "2 Jurassic Park film 8.0 6 5.0\n", 105 | "3 Matilda book 5.0 80 4.0\n", 106 | "4 Clockwork Orange None NaN 20 4.0\n", 107 | "5 Walden None NaN 100 4.5" 108 | ] 109 | }, 110 | "execution_count": 2, 111 | "metadata": {}, 112 | "output_type": "execute_result" 113 | } 114 | ], 115 | "source": [ 116 | "product_df = pd.DataFrame(\n", 117 | " [\n", 118 | " {\"name\": \"The Da Vinci Code\", \"type\": \"book\", \"price\": 15, \"quantity\": 300, \"rating\": 4},\n", 119 | " {\"name\": \"Jurassic Park\", \"type\": \"book\", \"price\": 12, \"quantity\": 400, \"rating\": 4.5},\n", 120 | " {\"name\": \"Jurassic Park\", \"type\": \"film\", \"price\": 8, \"quantity\": 6, \"rating\": 5},\n", 121 | " {\"name\": \"Matilda\", \"type\": \"book\", \"price\": 5, \"quantity\": 80, \"rating\": 4},\n", 122 | " {\"name\": \"Clockwork Orange\", \"type\": None, \"price\": None, \"quantity\": 20, \"rating\": 4},\n", 123 | " {\"name\": \"Walden\", \"type\": None, \"price\": None, \"quantity\": 100, \"rating\": 4.5},\n", 124 | " ],\n", 125 | ")\n", 126 | "\n", 127 | "product_df" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 3, 133 | "id": "b0790116-253e-46e0-a12e-8a31cb2f7db5", 134 | "metadata": {}, 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "\u001b[32;1m\u001b[1;3mdf.isnull().sum()\n", 141 | "\u001b[0m" 142 | ] 143 | }, 144 | { 145 | "data": { 146 | "text/plain": [ 147 | "name 0\n", 148 | "type 2\n", 149 | "price 2\n", 150 | "quantity 0\n", 151 | "rating 0\n", 152 | "dtype: int64" 153 | ] 154 | }, 155 | "execution_count": 3, 156 | "metadata": {}, 157 | "output_type": "execute_result" 158 | } 159 | ], 160 | "source": [ 161 | "product_df.llm.query(\"What columns are missing values?\")" 162 | ] 163 | }, 164 | { 165 | "cell_type": 
"code", 166 | "execution_count": 4, 167 | "id": "bc4c7b71-d4b9-4b45-8a73-25bae097e93d", 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "data": { 172 | "text/plain": [ 173 | "name 0\n", 174 | "type 2\n", 175 | "dtype: int64" 176 | ] 177 | }, 178 | "execution_count": 4, 179 | "metadata": {}, 180 | "output_type": "execute_result" 181 | } 182 | ], 183 | "source": [ 184 | "product_df.llm.query(\"Of these, are any strings?\", yolo=True)" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 5, 190 | "id": "403c03f2-ea70-4aad-ba4c-3bb98943e238", 191 | "metadata": {}, 192 | "outputs": [ 193 | { 194 | "name": "stdout", 195 | "output_type": "stream", 196 | "text": [ 197 | "\u001b[32;1m\u001b[1;3mimport random\n", 198 | "\n", 199 | "fruits = ['apple', 'banana', 'orange', 'strawberry', 'grape']\n", 200 | "\n", 201 | "df['type'] = df['type'].fillna(random.choice(fruits))\n", 202 | "\u001b[0m" 203 | ] 204 | } 205 | ], 206 | "source": [ 207 | "product_df.llm.query(\"Impute the type column with random fruits.\")" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 6, 213 | "id": "4ee691c0-a959-4d16-8e99-3eb111363a68", 214 | "metadata": {}, 215 | "outputs": [ 216 | { 217 | "data": { 218 | "text/html": [ 219 | "
" 296 | ], 297 | "text/plain": [ 298 | " name type price quantity rating\n", 299 | "0 The Da Vinci Code book 15.0 300 4.0\n", 300 | "1 Jurassic Park book 12.0 400 4.5\n", 301 | "2 Jurassic Park film 8.0 6 5.0\n", 302 | "3 Matilda book 5.0 80 4.0\n", 303 | "4 Clockwork Orange apple NaN 20 4.0\n", 304 | "5 Walden apple NaN 100 4.5" 305 | ] 306 | }, 307 | "execution_count": 6, 308 | "metadata": {}, 309 | "output_type": "execute_result" 310 | } 311 | ], 312 | "source": [ 313 | "product_df" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 7, 319 | "id": "d55ae853-8253-46b2-923c-a4dd36b7c8f8", 320 | "metadata": {}, 321 | "outputs": [ 322 | { 323 | "name": "stdout", 324 | "output_type": "stream", 325 | "text": [ 326 | "\u001b[32;1m\u001b[1;3msplit_index = round(len(df) / 3)\n", 327 | "\n", 328 | "df1 = df[:split_index]\n", 329 | "df2 = df[split_index:]\n", 330 | "\n", 331 | "(df1, df2)\n", 332 | "\u001b[0m" 333 | ] 334 | }, 335 | { 336 | "data": { 337 | "text/html": [ 338 | "
" 383 | ], 384 | "text/plain": [ 385 | " name type price quantity rating\n", 386 | "0 The Da Vinci Code book 15.0 300 4.0\n", 387 | "1 Jurassic Park book 12.0 400 4.5" 388 | ] 389 | }, 390 | "metadata": {}, 391 | "output_type": "display_data" 392 | }, 393 | { 394 | "data": { 395 | "text/html": [ 396 | "
" 457 | ], 458 | "text/plain": [ 459 | " name type price quantity rating\n", 460 | "2 Jurassic Park film 8.0 6 5.0\n", 461 | "3 Matilda book 5.0 80 4.0\n", 462 | "4 Clockwork Orange apple NaN 20 4.0\n", 463 | "5 Walden apple NaN 100 4.5" 464 | ] 465 | }, 466 | "metadata": {}, 467 | "output_type": "display_data" 468 | } 469 | ], 470 | "source": [ 471 | "from IPython.display import display\n", 472 | "\n", 473 | "\n", 474 | "df1, df2 = product_df.llm.query(\"Split the dataframe into two, 1/3 in one, 2/3 in the other. Return (df1, df2).\")\n", 475 | "\n", 476 | "display(df1)\n", 477 | "display(df2)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 8, 483 | "id": "67aea456-2fc9-4e57-b5a7-7cd987479c14", 484 | "metadata": {}, 485 | "outputs": [ 486 | { 487 | "name": "stdout", 488 | "output_type": "stream", 489 | "text": [ 490 | "\u001b[32;1m\u001b[1;3mdf[df['type'] == 'book']\n", 491 | "\u001b[0m" 492 | ] 493 | }, 494 | { 495 | "data": { 496 | "text/html": [ 497 | "
" 550 | ], 551 | "text/plain": [ 552 | " name type price quantity rating\n", 553 | "0 The Da Vinci Code book 15.0 300 4.0\n", 554 | "1 Jurassic Park book 12.0 400 4.5\n", 555 | "3 Matilda book 5.0 80 4.0" 556 | ] 557 | }, 558 | "execution_count": 8, 559 | "metadata": {}, 560 | "output_type": "execute_result" 561 | } 562 | ], 563 | "source": [ 564 | "product_df.llm.query(\"Now show me all products that are books.\")" 565 | ] 566 | }, 567 | { 568 | "cell_type": "code", 569 | "execution_count": 9, 570 | "id": "c35d7246-6cf2-4148-8726-aa2bd9cc46f9", 571 | "metadata": {}, 572 | "outputs": [ 573 | { 574 | "name": "stdout", 575 | "output_type": "stream", 576 | "text": [ 577 | "\u001b[32;1m\u001b[1;3mdf[df['type'] == 'book'].sort_values(by='quantity').head(1)\n", 578 | "\u001b[0m" 579 | ] 580 | }, 581 | { 582 | "data": { 583 | "text/html": [ 584 | "
" 621 | ], 622 | "text/plain": [ 623 | " name type price quantity rating\n", 624 | "3 Matilda book 5.0 80 4.0" 625 | ] 626 | }, 627 | "execution_count": 9, 628 | "metadata": {}, 629 | "output_type": "execute_result" 630 | } 631 | ], 632 | "source": [ 633 | "product_df.llm.query(\"Of these, which has the lowest items stocked?\")" 634 | ] 635 | }, 636 | { 637 | "cell_type": "code", 638 | "execution_count": 10, 639 | "id": "023d3262-5228-4a85-a24e-f127fed6fb9b", 640 | "metadata": {}, 641 | "outputs": [ 642 | { 643 | "name": "stdout", 644 | "output_type": "stream", 645 | "text": [ 646 | "\u001b[32;1m\u001b[1;3mimport seaborn as sns\n", 647 | "\n", 648 | "sns.catplot(x='type', y='price', data=df[df['type'] != 'apple'], kind='bar')\n", 649 | "\u001b[0m" 650 | ] 651 | }, 652 | { 653 | "data": { 654 | "text/plain": [ 655 | "" 656 | ] 657 | }, 658 | "execution_count": 10, 659 | "metadata": {}, 660 | "output_type": "execute_result" 661 | }, 662 | { 663 | "data": { 664 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWAAAAFgCAYAAACFYaNMAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/NK7nSAAAACXBIWXMAAAsTAAALEwEAmpwYAAAP2klEQVR4nO3de4yldX3H8ffH3VoXJaJ1lMZlBS2lEjRVpt4wmgqYbWuLaW0BxYo1blu1Xqpu0Bppmv5h0NRe1OKqCASCKUhF20JFvLXWqiNgF1gR6wUX2TLrFS8VKd/+MWeSZbK7Hoc9z3dmzvuVbOY8z3n2/L6TTN757dlzzqSqkCQN717dA0jStDLAktTEAEtSEwMsSU0MsCQ1Wd89wDg2b95cV1xxRfcYkrRc2dvJVbED3r17d/cIknTArYoAS9JaZIAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpqsik9D0/i2bt3Krl27OPTQQznrrLO6x5G0HwZ4jdm1axe33HJL9xiSxuBTEJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNJhbgJOckuS3JdXu575VJKsmDJrW+JK10k9wBnwtsXnoyyWHA04GbJ7i2JK14EwtwVX0c+OZe7nozsBWoSa0tSavBoM8BJzkJuKWqPjfkupK0Eg32O+GSHAS8loWnH8a5fguwBWDTpk0TnEySegy5A34EcATwuSRfATYCVyc5dG8XV9W2qpqtqtmZmZkBx5SkYQy2A66q7cCDF49HEZ6tqt1DzSBJK8kkX4Z2EfBJ4KgkO5O8YFJrSdJqNLEdcFWd+hPuP3xSa0vSauA74SSpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpo
YYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJhMLcJJzktyW5Lo9zr0xyeeT/FeSf0xyyKTWl6SVbpI74HOBzUvOXQkcU1WPBr4AvGaC60vSijaxAFfVx4FvLjn3waq6c3T4n8DGSa0vSStd53PAfwBc3ri+JLVqCXCSPwPuBC7czzVbkswlmZufnx9uOEkayOABTnI68AzgOVVV+7quqrZV1WxVzc7MzAw2nyQNZf2QiyXZDGwFnlpVPxhybUlaaSb5MrSLgE8CRyXZmeQFwFuAg4Erk1yb5OxJrS9JK93EdsBVdepeTr9rUutJ0mrjO+EkqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJanJxH4r8kpx7KvP7x5hUAfvvp11wM27b5+q7/2zb/z97hGkn5o7YElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaTCzASc5JcluS6/Y498AkVya5afT1AZNaX5JWuknugM8FNi85dwZwVVUdCVw1OpakqTSxAFfVx4FvLjl9EnDe6PZ5wDMntb4krXRDPwf8kKq6dXR7F/CQgdeXpBWj7T/hqqqA2tf9SbYkmUsyNz8/P+BkkjSMoQP8P0l+HmD09bZ9XVhV26pqtqpmZ2ZmBhtQkoYydIDfDzxvdPt5wGUDry9JK8YkX4Z2EfBJ4KgkO5O8AHgDcGKSm4ATRseSNJUm9oHsVXXqPu46flJrStJq4jvhJKmJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJmMHOMnDkpwwur0hycGTG0uS1r6xApzkhcAlwNtHpzYC75vQTJI0FcbdAb8YOA74LkBV3QQ8eFJDSdI0GDfAP6qqOxYPkqwHajIjSdJ0GDfAH0vyWmBDkhOBi4EPLHfRJK9Icn2S65JclOQ+y30sSVqtxg3wGcA8sB34Q+BfgNctZ8EkDwVeCsxW1THAOuCU5TyWJK1m68e8bgNwTlW9AyDJutG5H9yDdTck+TFwEPD1ZT6OJK1a4+6Ar2IhuIs2AB9azoJVdQvwJuBm4FbgO1X1weU8liStZuPugO9TVd9bPKiq7yU5aDkLJnkAcBJwBPBt4OIkp1XVBUuu2wJsAdi0adNylpIGcfNfPKp7BE3Yptdvn8jjjrsD/n6Sxy4eJDkW+OEy1zwB+HJVzVfVj4FLgSctvaiqtlXVbFXNzszMLHMpSVq5xt0Bv5yFnerXgQCHAicvc82bgSeMdtA/BI4H5pb5WJK0ao0V4Kr6TJJfAo4anbpxtHv9qVXVp5JcAlwN3AlcA2xbzmNJ0mq23wAneVpVfTjJby+56xeTUFWXLmfRqjoTOHM5f1eS1oqftAN+KvBh4Df3cl+x8PytJGkZ9hvgqjozyb2Ay6vqHwaaSZKmwk98FURV3QVsHWAWSZoq474M7UNJXpXksCQPXPwz0ckkaY0b92VoJ7PwnO+Llpx/+IEdR5Kmx7gBPpqF+D6ZhRD/G3D2pIaSpGkwboDPY+HD2P92dPzs0bnfm8RQkjQNxg3wMVV19B7HH0lywyQGkqRpMe5/wl2d5AmLB0kej28flqR7ZNwd8LHAfyS5eXS8CbgxyXagqurRE5lOktawcQO8eaJTSNIUGvfDeL466UEkadqM+xywJOkAM8CS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUpCXASQ5JckmSzyfZkeSJHXNIUqdxfynngfY3wBVV9awk9wYOapp
DktoMHuAk9weeApwOUFV3AHcMPYckdet4CuIIYB54d5JrkrwzyX0b5pCkVh0BXg88Fvj7qnoM8H3gjKUXJdmSZC7J3Pz8/NAzStLEdQR4J7Czqj41Or6EhSDfTVVtq6rZqpqdmZkZdEBJGsLgAa6qXcDXkhw1OnU8cMPQc0hSt65XQfwJcOHoFRBfAp7fNIcktWkJcFVdC8x2rC1JK4XvhJOkJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElq0vWB7JqQu+5937t9lbRyGeA15vtHPr17BElj8ikISWpigCWpiQGWpCYGWJKaGGBJamKAJamJAZakJgZYkpoYYElqYoAlqYkBlqQmBliSmhhgSWpigCWpSVuAk6xLck2Sf+qaQZI6de6AXwbsaFxfklq1BDjJRuA3gHd2rC9JK0HXDvivga3AXfu6IMmWJHNJ5ubn5wcbTJKGMniAkzwDuK2qPru/66pqW1XNVtXszMzMQNNJ0nA6dsDHAb+V5CvAe4CnJbmgYQ5JajV4gKvqNVW1saoOB04BPlxVpw09hyR183XAktSk9dfSV9VHgY92ziBJXdwBS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNBg9wksOSfCTJDUmuT/KyoWeQpJVgfcOadwKvrKqrkxwMfDbJlVV1Q8MsktRm8B1wVd1aVVePbt8O7AAeOvQcktSt9TngJIcDjwE+tZf7tiSZSzI3Pz8/+GySNGltAU5yP+C9wMur6rtL76+qbVU1W1WzMzMzww8oSRPWEuAkP8NCfC+sqks7ZpCkbh2vggjwLmBHVf3V0OtL0krRsQM+Dngu8LQk147+/HrDHJLUavCXoVXVvwMZel1JWml8J5wkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1MQAS1ITAyxJTQywJDUxwJLUxABLUhMDLElNDLAkNTHAktTEAEtSEwMsSU0MsCQ1McCS1KQlwEk2J7kxyReTnNExgyR1GzzASdYBbwV+DTgaODXJ0UPPIUndOnbAjwO+WFVfqqo7gPcAJzXMIUmt1jes+VDga3sc7wQev/SiJFuALaPD7yW5cYDZ1ooHAbu7hxhS3vS87hGmydT9fHFm7ukjXFFVm5ee7AjwWKpqG7Cte47VKMlcVc12z6G1yZ+vA6fjKYhbgMP2ON44OidJU6UjwJ8BjkxyRJJ7A6cA72+YQ5JaDf4URFXdmeQlwL8C64Bzqur6oedY43zqRpPkz9cBkqrqnkGSppLvhJOkJgZYkpoY4FUiyeFJrjsAj/OVJA86EDNpbUry0iQ7knxr8aMCkvx5kld1z7bWrNjXAUtq8yLghKra2T3IWucOeHVZn+TC0e7kkiQHJTk+yTVJtic5J8nPAuzr/KIkG5JcnuSFPd+KVqIkZwMPBy5P8ookb9nLNR9N8uYkc6OfxV9JcmmSm5L85fBTr14GeHU5CnhbVT0S+C7wp8C5wMlV9SgW/kXzx0nus7fzezzO/YAPABdV1TuGG18rXVX9EfB14FeBb+3n0jtG74Y7G7gMeDFwDHB6kp+b+KBrhAFeXb5WVZ8Y3b4AOB74clV9YXTuPOApLIR6b+cXXQa8u6rOH2BmrU2Lb57aDlxfVbdW1Y+AL3H3d7pqPwzw6rL0RdvfXubjfALYnOQef8KIptaPRl/v2uP24rH/tzQmA7y6bEryxNHtZwNzwOFJfmF07rnAx4Ab93F+0etZ+OflWyc/sqR9McCry43Ai5PsAB4AvBl4PnB
xku0s7D7Orqr/3dv5JY/1MmBDkrMGm17S3fhWZElq4g5YkpoYYElqYoAlqYkBlqQmBliSmhhgrUlJDknyou45pP0xwFqrDmHhU72kFcsAa616A/CIJNcmuTjJMxfvGH2i3ElJTk9y2ejTvW5KcuYe15yW5NOjv//2JOs6vgmtbQZYa9UZwH9X1S8DbwFOB0hyf+BJwD+Prnsc8DvAo4HfTTKb5JHAycBxo7//f8Bzhhxe08EPzdCaV1UfS/K2JDMsxPa9o9/ODXBlVX0DIMmlwJOBO4Fjgc+MrtkA3NYyvNY0A6xpcT5wGnAKC5+TsWjpe/ELCHBeVb1moNk0pXwKQmvV7cDBexyfC7wcoKpu2OP8iUkemGQD8EwWPqrzKuBZSR4MMLr/YQPMrCnjDlhrUlV9I8knRr/I9PKqevXoU+Tet+TSTwPvBTYCF1TVHECS1wEfTHIv4Mcs/MaHrw72DWgq+GlomgpJDmLhtzc8tqq+Mzp3OjBbVS/pnE3Ty6cgtOYlOQHYAfzdYnyllcAdsCQ1cQcsSU0MsCQ1McCS1MQAS1ITAyxJTf4f290zOgI0FxEAAAAASUVORK5CYII=\n", 665 | "text/plain": [ 666 | "
" 667 | ] 668 | }, 669 | "metadata": { 670 | "needs_background": "light" 671 | }, 672 | "output_type": "display_data" 673 | } 674 | ], 675 | "source": [ 676 | "product_df.llm.query(\"Now make a seaborn plot of price grouped by type, but exclude those random fruits you made.\")" 677 | ] 678 | }, 679 | { 680 | "cell_type": "code", 681 | "execution_count": 11, 682 | "id": "2ad7daa9-f830-492d-b52c-40194be8f045", 683 | "metadata": {}, 684 | "outputs": [ 685 | { 686 | "name": "stdout", 687 | "output_type": "stream", 688 | "text": [ 689 | "\u001b[32;1m\u001b[1;3mimport seaborn as sns\n", 690 | "\n", 691 | "sns.set_style(\"dark\")\n", 692 | "sns.set_palette(\"pastel\")\n", 693 | "\n", 694 | "sns.catplot(x='type', y='price', data=df[df['type'] != 'apple'], kind='bar')\n", 695 | "\u001b[0m" 696 | ] 697 | }, 698 | { 699 | "data": { 700 | "text/plain": [ 701 | "" 702 | ] 703 | }, 704 | "execution_count": 11, 705 | "metadata": {}, 706 | "output_type": "execute_result" 707 | }, 708 | { 709 | "data": { 710 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAWAAAAFgCAYAAACFYaNMAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/NK7nSAAAACXBIWXMAAAsTAAALEwEAmpwYAAAS9ElEQVR4nO3dfUyV9f/H8dfhThneJA40yww1rbRoJBZ5U1GiGAiiGTZb0spa5bHEMnTRdCtWa7rZHyVDU0tcmSSJ2Y2o2ZTu1DSztTS6wYRKRMI4Isj3jyb7+cvsRJ3rfc7h+dganQs4n/fZ2LNP1znXOa7W1tZWAQAcF2I9AAB0VAQYAIwQYAAwQoABwAgBBgAjYdYDeKOpqVnHjzdajwEA7RIT0/WcxwNiB+xyuaxHAID/XEAEGACCEQEGACMEGACMEGAAMEKAAcAIAQYAIwQYAIwQYAAwQoABwAgBBgAjBBgAjBBgADBCgIPM7t2facGC+dq9+zPrUQD8jYB4O0p4b+3aYlVWfiuPp1EJCcOsxwFwHuyAg0xjo+esrwD8FwEGACMEGACMEGAAMEKAAcAIAQYAIwQYAIwQYAAwQoABwAgBBgAjBBgAjBBgADBCgAHACAEGACMEGACM+CzAeXl5SkpKUlpa2p++t3z5cg0ePFi1tbW+Wh4A/J7PApyVlaWioqI/HT9y5Ih27NihPn36+GppAAgIPgtwYmKiunfv/qfjBQUFeuyxx+RyuXy1NAAEBEfPAW/evFmxsbG6/PLLnVwWAPySY58J19jYqKVLl2r58uVOLQkAfs2xHfAPP/ygqqoqZWRkKDk5WdXV1crKytIvv/zi1AgA4Fcc2wEPHjxYFRUVbbeTk5P1xhtvKDo62qkRAMCv+GwHPHv2bGVnZ6uyslKjR4/W2rVrfbUUAAQkn+2AFy1adN7vb9myxVdLA0BA4Eo4ADBCgAHACAEGACMEGACMEGAAMEKAAcAIAQYAIwQYAIwQYAAwQoABwAgBBgAjBBgAjBBgADBCgAHACAEGACMEGACMEGAAMEKAAcAIAQYAIwQYAIwQYAAwQoABwAgBBgAjBBgAjBBgADBCgAHACAEGACMEGACMEGAAMEKAAcBImK/uOC8vT9u2bVPPnj1VVlYmSXr22We1detWhYeH65JLLlFBQYG6devmqxEAwK/5bAeclZWloqKis46NGDFCZWVl2rBhgy699FItXbrUV8sDgN/zWYATExPVvXv3s46NHDlSYWF/bLqvueYaVVdX+2p5APB7ZueA161bp9GjR1stDwDmTAL84osvKjQ0VBMmTLBYHgD8gs+ehPsrJSUl2rZtm1asWCGXy+X08gDgNxwN8Pbt21VUVKRXX31VkZGRTi4NAH7HZwGePXu2PvnkEx07dkyjR4/WzJkzVVhYqKamJuXk5EiS4uPjtXDhQl+NAAB+zWcBXrRo0Z+O3X777b5aDgACDlfCAYARAgwARggwABghwABghAADgBECDABGCDAAGCHAAGCEAAOAEQIMAEYIMAAYIcAAYIQAA4ARAgwARggwABghwABghAADgBECDABGCDAAGCHAAGCEAAOAEZ99KrK/uKBHlMLDOs5/Z0JDXW1fY2K6Gk/jnFPNp1V37IT1GMA/EvQBDg8L0Ruf/GI9hmMaPC1tXzvS4548PMZ6BOAf6zhbQwDwMwQYAIwQYAAwQoABwAgBBgAjBBgAjBBgADDiswDn5eUpKSlJaWlpbcfq6uqUk5OjlJQU5eTk6Pjx475aHgD8ns8CnJWVpaKiorOOFRYWKikpSe+9956SkpJUWFjoq+UBwO/5LMCJiYnq3r37WcfKy8uVmZkpScrMzNTmzZt9tTwA+D1HzwEfPXpUsbGxkqSYmBgdPXrUyeUBwK+YPQnncrnkcrmslgcAc44GuGfPnvr5558lST///LOio6OdXB4A/IqjAU5OTtb69eslSevXr9ctt9zi5PIA4Fd8FuDZs2crOztblZWVGj16tNa
uXasZM2Zox44dSklJ0c6dOzVjxgxfLQ8Afs9n7we8aNGicx5fuXKlr5YEgIDClXAAYIQAA4ARAgwARggwABghwABghAADgBECDABGCDAAGCHAAGCEAAOAEQIMAEYIMAAYIcAAYIQAA4ARAgwARggwABghwABghAADgBECDABGCDAAGCHAAGCEAAOAEQIMAEYIMAAYIcAAYIQAA4ARrwN8+PBh7dy5U5Lk8XjU0NDgs6EAoCPwKsCvv/663G638vPzJUnV1dV66KGHfDoYAAQ7rwK8evVqrVmzRl26dJEkXXrppaqtrfXpYAAQ7LwKcEREhCIiItpuNzc3+2wgAOgowrz5ocTERL300kvyeDzasWOHiouLlZyc3O5FV6xYobVr18rlcmnQoEEqKChQp06d2n1/ABCIvNoBz5kzR9HR0Ro0aJBee+013XjjjXrkkUfatWBNTY1WrVqldevWqaysTC0tLdq4cWO77gsAAplXO2CPx6NJkyZpypQpkqSWlhZ5PB5FRka2a9Ezvx8WFiaPx6PY2Nh23Q8ABDKvdsDTp0+Xx+Npu+3xeJSTk9OuBXv16qV77rlHN998s0aOHKkuXbpo5MiR7bovAAhkXu2AT548qaioqLbbUVFRamxsbNeCx48fV3l5ucrLy9W1a1fNmjVLpaWlysjIaNf9AdaiL+is0PBw6zHgQy2nTqm2zvP3P/gPeRXgyMhIffnllxoyZIgkaf/+/ercuXO7Fty5c6cuvvhiRUdHS5JSUlK0Z88eAoyAFRoerrp3XrAeAz50wbiZkowCPG/ePM2aNUuxsbFqbW3Vr7/+qsWLF7drwT59+mjv3r1qbGxU586dVVFRoaFDh7brvgAgkHkV4KuvvlqbNm1SZWWlJCkuLk7h7fxfrvj4eI0dO1YTJ05UWFiYrrjiCt1xxx3tui8ACGTnDXBFRYWSkpL03nvvnXX8u+++k/TH6YP2cLvdcrvd7fpdAAgW5w3wp59+qqSkJG3duvWc329vgAEAfxNgt9ut06dPa9SoURo/frxTMwFAh/C3rwMOCQlRUVGRE7MAQIfi1YUYN9xwg5YtW6YjR46orq6u7R8AQPt59SqIt99+Wy6XS8XFxWcdLy8v98lQANAReB3g4uJi7dq1Sy6XS8OGDVN2dravZwOAoObVKYi5c+fq0KFDuuuuuzRt2jQdPHhQc+fO9fVsABDUvNoBf/PNN3r77bfbbl9//fW8KgIA/iWvdsBXXnmlPv/887bbe/fu5fJhAPiXvNoBf/nll8rOzlafPn0kST/99JPi4uKUnp4uSdqwYYPvJgSAIOVVgHkdMAD897wK8EUXXeTrOQCgw/HqHDAA4L9HgAHACAEGACMEGACMEGAAMEKAAcAIAQYAIwQYAIwQYAAwQoABwAgBBgAjBBgAjBBgADBCgAHACAEGACMEGACMEGAAMEKAAcCISYDr6+vldrs1btw4paamas+ePRZjAIAprz4T7r/29NNPa9SoUVqyZImamprk8XgsxgAAU47vgH/77Td9+umnmjx5siQpIiJC3bp1c3oMADDneICrqqoUHR2tvLw8ZWZmav78+fr999+dHgMAzDke4ObmZh04cEBTp07V+vXrFRkZqcLCQqfHAABzjge4d+/e6t27t+Lj4yVJ48aN04EDB5weAwDMOR7gmJgY9e7dW99++60kqaKiQgMGDHB6DAAwZ/IqiCeffFJz5szRqVOn1LdvXxUUFFiMAQCmTAJ8xRVXqKSkxGJpAPAbXAkHAEYIMAAYIcAAYIQAA4ARAgwARggwABghwABghAADgBECDABGCDAAGCHAAGCEAAOAEQIMAEYIMAAYIcBBJrxT5FlfAfgvAhxkho+5XX36X6nhY263HgXA3zB5Q3b4Tr/LE9Tv8gTrMQB4gR0wABghwABghAADgBECDABGCDAAGCHAAGCEAAOAEQIMAEYIMAAYIcAAYIQAA4ARAgwARggwABghwABgxCzALS0tyszM1P3
33281AgCYMgvwqlWrNGDAAKvlAcCcSYCrq6u1bds2TZ482WJ5APALJgF+5pln9NhjjykkhFPQADouxwu4detWRUdHa+jQoU4vDQB+xfHPhNu9e7e2bNmi7du36+TJk2poaNCcOXP0/PPPOz0KAJhyPMC5ubnKzc2VJH388cdavnw58QXQIXESFgCMmH4s/XXXXafrrrvOcgQAMMMOGACMEGAAMEKAAcAIAQYAIwQYAIwQYAAwQoABwAgBBgAjBBgAjBBgADBCgAHACAEGACMEGACMEGAAMEKAAcAIAQYAIwQYAIwQYAAwQoABwAgBBgAjBBgAjBBgADBCgAHACAEGACMEGACMEGAAMEKAAcAIAQYAIwQYAIwQYAAwEub0gkeOHNHjjz+uo0ePyuVyacqUKbr77rudHgMAzDke4NDQUD3xxBMaMmSIGhoaNGnSJI0YMUIDBw50ehQAMOX4KYjY2FgNGTJEktSlSxf1799fNTU1To8BAOZMzwFXVVXpq6++Unx8vOUYAGDCLMAnTpyQ2+3WvHnz1KVLF6sxAMCMSYBPnTolt9ut9PR0paSkWIwAAOYcD3Bra6vmz5+v/v37Kycnx+nlAcBvOB7gXbt2qbS0VB999JEyMjKUkZGhDz74wOkxAMCc4y9DGzZsmL7++munlwUAv8OVcABghAADgBECDABGCDAAGCHAAGCEAAOAEQIMAEYIMAAYIcAAYIQAA4ARAgwARggwABghwABghAADgBECDABGCDAAGCHAAGCEAAOAEQIMAEYIMAAYIcAAYIQAA4ARAgwARggwABghwABghAADgBECDABGCDAAGCHAAGCEAAOAEZMAb9++XWPHjtWYMWNUWFhoMQIAmHM8wC0tLVq4cKGKioq0ceNGlZWV6eDBg06PAQDmHA/wvn371K9fP/Xt21cRERG67bbbVF5e7vQYAGAuzOkFa2pq1Lt377bbvXr10r59+877O+HhoYqJ6druNScPj2n37yJw/Ju/kX/rgnEzzdaGM3zx98WTcABgxPEA9+rVS9XV1W23a2pq1KtXL6fHAABzjgf4qquu0nfffacff/xRTU1N2rhxo5KTk50eAwDMOX4OOCwsTPn5+br33nvV0tKiSZMm6bLLLnN6DAAw52ptbW21HgIAOiKehAMAIwQYAIwQ4ABRVVWltLS0f30/ycnJqq2t/Q8mQrBatWqVUlNTlZiY2PZWAS+88IKWLVtmPFnwcfxJOAD+rbi4WCtWrDjrgin4BjvgANLc3Kzc3FylpqbK7XarsbFRFRUVyszMVHp6uvLy8tTU1CRJf3n8DI/Ho3vvvVevv/66xUOBn8rPz1dVVZXuu+8+rVixQgsXLvzTz9x111165plnlJWVpdTUVO3bt08PP/ywUlJStHjxYoOpAxcBDiCVlZW68847tWnTJkVFRenll1/WE088ocWLF2vDhg1qaWlRcXGxTp48ec7jZ/z+++964IEHlJaWpilTphg+IvibhQsXKjY2VitXrlS3bt3+8ufCw8NVUlKi7OxsPfjgg8rPz1dZWZnefPNNHTt2zMGJAxsBDiAXXnihrr32WknShAkTVFFRoYsvvlhxcXGSpIkTJ+qzzz5TZWXlOY+f8eCDDyorK0uZmZmOPwYEhzMXTw0aNEiXXXaZYmNjFRERob59+551pSvOjwAHEJfLddbt8+1QzichIUEffviheAk42isiIkKSFBIS0vbvZ243NzdbjRVwCHAA+emnn7Rnzx5JUllZmYYOHarDhw/r+++/lySVlpYqMTFRcXFx5zx+htvtVvfu3bVgwQLnHwSANgQ4gMTFxWn16tVKTU1VfX29pk+froKCAs2aNUvp6elyuVyaOnWqOnXqdM7j/9f8+fN18uRJPffcc0aPBgCXIgOAEXbAAGCEAAOAEQIMAEYIMAAYIcAAYIQAIyjV19dr9erV1mMA50WAEZTq6+u1Zs0a6zGA8+J1wAhKjz76qMrLyxUXF6d+/fppwoQ
JuvXWWyWp7R3l6uvr9f7776uhoUE1NTWaMGGCHn74YUl/XD34yiuv6NSpU4qPj9dTTz2l0NBQy4eEIMQOGEEpNzdXl1xyiUpLSzVt2jSVlJRIkn777Tft2bNHN910kyTpiy++0JIlS/TWW2/pnXfe0RdffKFDhw5p06ZNWrNmjUpLSxUSEqINGzYYPhoEK96QHUFv+PDhWrBggWpra/Xuu+9q7NixCgv740//hhtuUI8ePSRJY8aM0a5duxQWFqb9+/dr8uTJkv547+SePXuazY/gRYDRIWRkZOitt97Sxo0bVVBQ0Hb8/7/DnMvlUmtrqyZOnKjc3Fynx0QHwykIBKWoqCidOHGi7XZWVpZWrlwpSRo4cGDb8R07dqiurk4ej0ebN29WQkKCkpKS9O677+ro0aOSpLq6Oh0+fNjZB4AOgR0wglKPHj2UkJCgtLQ0jRo1SnPnzlX//v3bnog74+qrr9bMmTPbnoS76qqrJEmPPPKI7rnnHp0+fVrh4eHKz8/XRRddZPFQEMR4FQQ6hMbGRqWnp+vNN99U165dJUklJSXav3+/8vPzjadDR8UpCAS9nTt3avz48Zo2bVpbfAF/wA4YAIywAwYAIwQYAIwQYAAwQoABwAgBBgAj/wOSJV42B8vQEAAAAABJRU5ErkJggg==\n", 711 | "text/plain": [ 712 | "
" 713 | ] 714 | }, 715 | "metadata": {}, 716 | "output_type": "display_data" 717 | } 718 | ], 719 | "source": [ 720 | "product_df.llm.query(\"Can you use a dark theme, and pastel colors?\")" 721 | ] 722 | }, 723 | { 724 | "cell_type": "code", 725 | "execution_count": 12, 726 | "id": "42811777-24b0-4b52-bc21-3d10247537d7", 727 | "metadata": {}, 728 | "outputs": [ 729 | { 730 | "data": { 731 | "text/plain": [ 732 | "" 733 | ] 734 | }, 735 | "execution_count": 12, 736 | "metadata": {}, 737 | "output_type": "execute_result" 738 | }, 739 | { 740 | "data": { 741 | "image/png": "\n", 742 | "text/plain": [ 743 | "
" 744 | ] 745 | }, 746 | "metadata": {}, 747 | "output_type": "display_data" 748 | } 749 | ], 750 | "source": [ 751 | "product_df.llm.query(\"Group by type and take the mean of all numeric columns.\", yolo=True).llm.query(\"Make a bar plot of the result and use a log scale.\", yolo=True)" 752 | ] 753 | } 754 | ], 755 | "metadata": { 756 | "kernelspec": { 757 | "display_name": "Python 3 (ipykernel)", 758 | "language": "python", 759 | "name": "python3" 760 | }, 761 | "language_info": { 762 | "codemirror_mode": { 763 | "name": "ipython", 764 | "version": 3 765 | }, 766 | "file_extension": ".py", 767 | "mimetype": "text/x-python", 768 | "name": "python", 769 | "nbconvert_exporter": "python", 770 | "pygments_lexer": "ipython3", 771 | "version": "3.9.12" 772 | } 773 | }, 774 | "nbformat": 4, 775 | "nbformat_minor": 5 776 | } 777 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.poetry] 2 | name = "yolopandas" 3 | version = "0.0.6" 4 | description = "Interact with Pandas objects via LLMs and langchain." 
5 | authors = [] 6 | license = "MIT" 7 | readme = "README.md" 8 | repository = "https://www.github.com/ccurme/yolopandas" 9 | 10 | [tool.poetry.dependencies] 11 | python = ">=3.9,<4.0" 12 | ipython = "^8.8.0" 13 | langchain = ">= 0.0.60, < 1" 14 | openai = "^0" 15 | pandas = "^1.4" 16 | 17 | [tool.poetry.group.test.dependencies] 18 | pytest = "^7.2.0" 19 | pytest-cov = "^4.0.0" 20 | 21 | [tool.poetry.group.lint.dependencies] 22 | black = "^22.10.0" 23 | isort = "^5.10.1" 24 | flake8 = "^6.0.0" 25 | 26 | [tool.poetry.group.typing.dependencies] 27 | mypy = "^0.991" 28 | 29 | [tool.poetry.group.dev.dependencies] 30 | jupyter = "^1.0.0" 31 | 32 | [tool.isort] 33 | profile = "black" 34 | 35 | [tool.mypy] 36 | ignore_missing_imports = "True" 37 | disallow_untyped_defs = "True" 38 | exclude = ["tests"] 39 | 40 | [build-system] 41 | requires = ["poetry-core"] 42 | build-backend = "poetry.core.masonry.api" 43 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | TEST_DIRECTORY = os.path.dirname(os.path.abspath(__file__)) 4 | -------------------------------------------------------------------------------- /tests/data/product_df.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "name":"The Da Vinci Code", 4 | "type":"book", 5 | "price":15, 6 | "quantity":300, 7 | "rating":4.0 8 | }, 9 | { 10 | "name":"Jurassic Park", 11 | "type":"book", 12 | "price":12, 13 | "quantity":400, 14 | "rating":4.5 15 | }, 16 | { 17 | "name":"Jurassic Park", 18 | "type":"film", 19 | "price":8, 20 | "quantity":6, 21 | "rating":5.0 22 | }, 23 | { 24 | "name":"Matilda", 25 | "type":"book", 26 | "price":6, 27 | "quantity":80, 28 | "rating":4.0 29 | } 30 | ] -------------------------------------------------------------------------------- /tests/integration_tests/__init__.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/ccurme/yolopandas/d008d0b73252045927289016c47667fd2247cc9f/tests/integration_tests/__init__.py -------------------------------------------------------------------------------- /tests/integration_tests/test_llm_accessor.py: -------------------------------------------------------------------------------- 1 | import os 2 | import unittest 3 | 4 | from tests import TEST_DIRECTORY 5 | from yolopandas import pd 6 | from yolopandas.utils.query_helpers import run_query_with_cost 7 | 8 | 9 | class TestLLMAccessor(unittest.TestCase): 10 | @classmethod 11 | def setUpClass(cls): 12 | test_data_path = os.path.join(TEST_DIRECTORY, "data", "product_df.json") 13 | cls.product_df = pd.read_json(test_data_path) 14 | 15 | def test_basic_use(self): 16 | self.product_df.llm.reset_chain(use_memory=False) 17 | result = self.product_df.llm.query( 18 | "What is the price of the highest-priced book?", 19 | yolo=True, 20 | ) 21 | expected_result = 15 22 | self.assertEqual(expected_result, result) 23 | 24 | result = run_query_with_cost( 25 | self.product_df, "What is the price of the highest-priced book?", yolo=True 26 | ) 27 | self.assertEqual(expected_result, result) 28 | 29 | result = self.product_df.llm.query( 30 | "What is the average price of products grouped by type?", 31 | yolo=True, 32 | ) 33 | expected = self.product_df.groupby("type")["price"].mean() 34 | pd.testing.assert_series_equal(expected, result) 35 | 36 | result = self.product_df.llm.query( 37 | "Give me products that are not books.", 38 | yolo=True, 39 | ) 40 | expected = self.product_df[self.product_df["type"] != "book"] 41 | pd.testing.assert_frame_equal(expected, result) 42 | 43 | def test_memory(self): 44 | self.product_df.llm.reset_chain(use_memory=True) 45 | _ = self.product_df.llm.query( 46 | "Show me all products that are books.", 47 | yolo=True, 48 | ) 49 | result = self.product_df.llm.query( 50 | "Of 
these, which has the fewest items stocked?", 51 | yolo=True, 52 | ) 53 | expected = ( 54 | self.product_df[self.product_df["type"] == "book"] 55 | .sort_values(by="quantity") 56 | .head(1) 57 | ) 58 | pd.testing.assert_frame_equal(expected, result) 59 | -------------------------------------------------------------------------------- /tests/unit_tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ccurme/yolopandas/d008d0b73252045927289016c47667fd2247cc9f/tests/unit_tests/__init__.py -------------------------------------------------------------------------------- /tests/unit_tests/test_llm_accessor.py: -------------------------------------------------------------------------------- 1 | import os 2 | import unittest 3 | from unittest.mock import Mock, patch 4 | 5 | from yolopandas import pd 6 | from langchain.chains.base import Chain 7 | from tests import TEST_DIRECTORY 8 | 9 | 10 | def _get_mock_chain(response: str) -> Chain: 11 | """Make mock Chain for unit tests.""" 12 | mock_chain = Mock(spec=Chain) 13 | mock_chain.run.return_value = response 14 | 15 | return mock_chain 16 | 17 | 18 | class TestLLMAccessor(unittest.TestCase): 19 | @classmethod 20 | def setUpClass(cls): 21 | test_data_path = os.path.join(TEST_DIRECTORY, "data", "product_df.json") 22 | cls.product_df = pd.read_json(test_data_path) 23 | 24 | @patch("yolopandas.llm_accessor.get_chain") 25 | def test_basic_use(self, mock): 26 | mock.return_value = _get_mock_chain("df[df['type'] == 'book']['price'].max()") 27 | result = self.product_df.llm.query( 28 | "What is the price of the highest-priced book?", 29 | yolo=True, 30 | ) 31 | expected_result = 15 32 | self.assertEqual(expected_result, result) 33 | 34 | mock.return_value = _get_mock_chain("df.groupby('type')['price'].mean()") 35 | self.product_df.llm.reset_chain() 36 | result = self.product_df.llm.query( 37 | "What is the average price of products grouped by type?", 38 | 
yolo=True, 39 | ) 40 | expected_result = self.product_df.groupby("type")["price"].mean() 41 | pd.testing.assert_series_equal(expected_result, result) 42 | 43 | mock.return_value = _get_mock_chain("df[df['type'] != 'book']") 44 | self.product_df.llm.reset_chain() 45 | result = self.product_df.llm.query( 46 | "Give me products that are not books.", 47 | yolo=True, 48 | ) 49 | expected = self.product_df[self.product_df["type"] != "book"] 50 | pd.testing.assert_frame_equal(expected, result) 51 | 52 | @patch("yolopandas.llm_accessor.get_chain") 53 | def test_sliced(self, mock): 54 | mock.return_value = _get_mock_chain("df[df['type'] == 'book']['price'].max()") 55 | self.product_df.llm.reset_chain() 56 | result = self.product_df[["name", "type", "price", "rating"]].llm.query( 57 | "What is the price of the highest-priced book?", yolo=True 58 | ) 59 | expected_result = 15 60 | self.assertEqual(expected_result, result) 61 | 62 | @patch("yolopandas.llm_accessor.get_chain") 63 | def test_multi_line(self, mock): 64 | """Test that we can accommodate multiple lines in the LLM response.""" 65 | query = """ 66 | Add a column `new_column` to the dataframe which is range 1 - number of rows, 67 | then return the mean of this column by each type. 
68 | """ 69 | mock_response = ( 70 | "df['new_column'] = range(1, len(df) + 1)\n" 71 | "df.groupby('type')['new_column'].mean()" 72 | ) 73 | mock.return_value = _get_mock_chain(mock_response) 74 | self.product_df.llm.reset_chain() 75 | result = self.product_df.llm.query(query, yolo=True) 76 | expected = ( 77 | self.product_df.assign(new_column=range(1, len(self.product_df) + 1)) 78 | .groupby("type")["new_column"] 79 | .mean() 80 | ) 81 | pd.testing.assert_series_equal(expected, result) 82 | 83 | @patch("yolopandas.llm_accessor.get_chain") 84 | def test_multiline_exec(self, mock): 85 | """Test a multiline command when the final line should be exec'd not eval'd.""" 86 | query = """ 87 | Add a column `new_column` to the dataframe which is range 1 - number of rows, 88 | then add a column `foo` which is always the value 1 89 | """ 90 | mock_response = "df['new_column'] = range(1, len(df) + 1)\n" "df['foo'] = 1" 91 | mock.return_value = _get_mock_chain(mock_response) 92 | self.product_df.llm.reset_chain() 93 | self.product_df.llm.query(query, yolo=True) 94 | expected_df = self.product_df.assign( 95 | new_column=range(1, len(self.product_df) + 1) 96 | ).assign(foo=1) 97 | pd.testing.assert_frame_equal(expected_df, self.product_df) 98 | -------------------------------------------------------------------------------- /yolopandas/__init__.py: -------------------------------------------------------------------------------- 1 | from yolopandas import llm_accessor 2 | from yolopandas.chains import set_llm 3 | from yolopandas.llm_accessor import pd 4 | -------------------------------------------------------------------------------- /yolopandas/chains.py: -------------------------------------------------------------------------------- 1 | import os 2 | from typing import Optional 3 | 4 | from langchain import LLMChain, OpenAI, PromptTemplate 5 | from langchain.chains.base import Chain 6 | from langchain.chains.conversation.memory import ConversationBufferMemory 7 | from 
langchain.llms.base import BaseLLM 8 | from langchain.llms.loading import load_llm 9 | 10 | 11 | DEFAULT_LLM = None 12 | # Default template, no memory 13 | TEMPLATE = """ 14 | You are working with a pandas dataframe in Python. The name of the dataframe is `df`. 15 | The dataframe has the following columns: {df_columns}. 16 | 17 | You should execute code as commanded to either provide information to answer the question or to 18 | do the transformations required. 19 | 20 | You should not assign any variables; you should return a one-liner in Pandas. 21 | 22 | This is your objective: {query} 23 | 24 | Go! 25 | 26 | ```python 27 | print(df.head()) 28 | ``` 29 | ```output 30 | {df_head} 31 | ``` 32 | ```python""" 33 | 34 | PROMPT = PromptTemplate(template=TEMPLATE, input_variables=["query", "df_head", "df_columns"]) 35 | 36 | 37 | # Template with memory 38 | # TODO: add result of expected code to memory; currently we only remember what code was run. 39 | TEMPLATE_WITH_MEMORY = """ 40 | You are working with a pandas dataframe in Python. The name of the dataframe is `df`. 41 | The dataframe has the following columns: {df_columns}. 42 | 43 | You are interacting with a programmer. The programmer issues commands and you should translate 44 | them into Python code and execute them. 45 | 46 | This is the history of your interaction so far: 47 | {chat_history} 48 | Human: {query} 49 | 50 | Go! 
51 | 52 | ```python 53 | df.head() 54 | ``` 55 | ```output 56 | {df_head} 57 | ``` 58 | ```python 59 | """ 60 | PROMPT_WITH_MEMORY = PromptTemplate( 61 | template=TEMPLATE_WITH_MEMORY, input_variables=["chat_history", "query", "df_head", "df_columns"] 62 | ) 63 | 64 | 65 | def set_llm(llm: BaseLLM) -> None: 66 | global DEFAULT_LLM 67 | DEFAULT_LLM = llm 68 | 69 | 70 | def get_chain(llm: Optional[BaseLLM] = None, use_memory: bool = True) -> Chain: 71 | """Get chain to use.""" 72 | if llm is None: 73 | if DEFAULT_LLM is None: 74 | llm_config_path = os.environ.get("LLPANDAS_LLM_CONFIGURATION") 75 | if llm_config_path is None: 76 | llm = OpenAI(temperature=0) 77 | else: 78 | llm = load_llm(llm_config_path) 79 | else: 80 | llm = DEFAULT_LLM 81 | 82 | if use_memory: 83 | memory = ConversationBufferMemory(memory_key="chat_history", input_key="query") 84 | chain = LLMChain(llm=llm, prompt=PROMPT_WITH_MEMORY, memory=memory) 85 | else: 86 | chain = LLMChain(llm=llm, prompt=PROMPT) 87 | 88 | return chain 89 | -------------------------------------------------------------------------------- /yolopandas/llm_accessor.py: -------------------------------------------------------------------------------- 1 | import ast 2 | import os 3 | from typing import Any, Optional 4 | 5 | import pandas as pd 6 | from IPython.display import clear_output 7 | from langchain.chains.base import Chain 8 | from langchain.input import print_text 9 | from langchain.llms.base import BaseLLM 10 | 11 | from yolopandas.chains import get_chain 12 | 13 | 14 | @pd.api.extensions.register_dataframe_accessor("llm") 15 | class LLMAccessor: 16 | def __init__(self, pandas_df: pd.DataFrame): 17 | self.df = pandas_df 18 | use_memory = bool(os.environ.get("LLPANDAS_USE_MEMORY", True)) 19 | self.chain = get_chain(use_memory=use_memory) 20 | 21 | def set_chain(self, chain: Chain) -> None: 22 | """Set chain to use.""" 23 | self.chain = chain 24 | 25 | def reset_chain( 26 | self, llm: Optional[BaseLLM] = None, use_memory: 
bool = True 27 | ) -> None: 28 | """Reset chain with LLM or memory kwarg.""" 29 | self.chain = get_chain(llm=llm, use_memory=use_memory) 30 | 31 | def query(self, query: str, yolo: bool = False) -> Any: 32 | """Query the dataframe.""" 33 | df = self.df 34 | df_columns = df.columns.tolist() 35 | inputs = {"query": query, "df_head": df.head(), "df_columns": df_columns, "stop": "```"} 36 | llm_response = self.chain.run(**inputs) 37 | eval_expression = False 38 | if not yolo: 39 | print("suggested code:") 40 | print(llm_response) 41 | print("run this code? y/n") 42 | user_input = input() 43 | if user_input == "y": 44 | clear_output(wait=True) 45 | print_text(llm_response, color="green") 46 | eval_expression = True 47 | else: 48 | eval_expression = True 49 | 50 | if eval_expression: 51 | # WARNING: This is a bad idea. Here we evaluate the (potentially multi-line) 52 | # llm response. Do not use unless you trust that llm_response is not malicious. 53 | tree = ast.parse(llm_response) 54 | module = ast.Module(tree.body[:-1], type_ignores=[]) 55 | exec(ast.unparse(module)) 56 | module_end = ast.Module(tree.body[-1:], type_ignores=[]) 57 | module_end_str = ast.unparse(module_end) 58 | try: 59 | return eval(module_end_str) 60 | except Exception: 61 | exec(module_end_str) 62 | -------------------------------------------------------------------------------- /yolopandas/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ccurme/yolopandas/d008d0b73252045927289016c47667fd2247cc9f/yolopandas/utils/__init__.py -------------------------------------------------------------------------------- /yolopandas/utils/query_helpers.py: -------------------------------------------------------------------------------- 1 | from typing import Any 2 | 3 | from langchain.callbacks import get_openai_callback 4 | 5 | from yolopandas import pd 6 | 7 | 8 | def run_query_with_cost(df: pd.DataFrame, query: str, yolo: bool = 
False) -> Any: 9 | """ 10 | Run a YOLOPandas query and print its token usage and estimated cost. 11 | The printout covers total tokens, prompt tokens, completion tokens, and the total cost in USD. 12 | 13 | Parameters 14 | ---------- 15 | df : pd.DataFrame 16 | The Pandas DataFrame with your data 17 | query : str 18 | The natural-language query you want to run against your data 19 | yolo : bool 20 | If False (the default), print the suggested code and ask for confirmation before 21 | running it; if True, run the generated code immediately. 22 | 23 | Returns 24 | ------- 25 | result : Any 26 | The result of running the generated code against your data, or None if the code 27 | is not run or its last statement cannot be evaluated as an expression. 28 | """ 29 | with get_openai_callback() as cb: 30 | result = df.llm.query(query, yolo=yolo) 31 | print(f"Total Tokens: {cb.total_tokens}") 32 | print(f"Prompt Tokens: {cb.prompt_tokens}") 33 | print(f"Completion Tokens: {cb.completion_tokens}") 34 | print(f"Total Cost (USD): ${cb.total_cost}") 35 | return result 36 | --------------------------------------------------------------------------------
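`LLMAccessor.query` in `yolopandas/llm_accessor.py` parses the LLM response with `ast`. It `exec`s every statement except the last, then tries to `eval` the last one and falls back to `exec` if that raises. The sketch below implements the same split with a few assumptions of our own:

- The helper name `run_generated_code` is hypothetical and is not part of yolopandas.
- All statements share one explicit namespace dict, so names bound by earlier lines stay visible to the final expression. Relying on a function's implicit `locals()` for this is fragile, and PEP 667 changed those semantics in Python 3.13.
- The final statement is dispatched on its AST node type, so a runtime error in the last expression is raised directly rather than retried with `exec`.

A list of dicts stands in for `df`, so the example runs without pandas.

```python
import ast
from typing import Any, Dict


def run_generated_code(code: str, namespace: Dict[str, Any]) -> Any:
    """Run possibly multi-line generated code and return the last expression's value.

    Hypothetical helper (not part of yolopandas). WARNING: this executes
    arbitrary code, exactly like the original; only use it on trusted input.
    """
    tree = ast.parse(code)
    if not tree.body:
        return None  # empty response: nothing to run
    *body, last = tree.body
    # Execute every statement before the last one in the shared namespace.
    exec(compile(ast.Module(body=body, type_ignores=[]), "<llm>", "exec"), namespace)
    if isinstance(last, ast.Expr):
        # The final statement is an expression: evaluate it and return its value.
        return eval(compile(ast.Expression(body=last.value), "<llm>", "eval"), namespace)
    # The final statement is an assignment, loop, etc.: execute it and return None.
    exec(compile(ast.Module(body=[last], type_ignores=[]), "<llm>", "exec"), namespace)
    return None


# Usage with stand-in data mirroring tests/data/product_df.json (prices only).
rows = [
    {"name": "The Da Vinci Code", "type": "book", "price": 15},
    {"name": "Jurassic Park", "type": "book", "price": 12},
    {"name": "Jurassic Park", "type": "film", "price": 8},
    {"name": "Matilda", "type": "book", "price": 6},
]
ns = {"df": rows}
code = "books = [r for r in df if r['type'] == 'book']\nmax(r['price'] for r in books)"
print(run_generated_code(code, ns))  # 15
```

With a real DataFrame you would pass something like `{"df": df, "pd": pd}` as the namespace. This variant is no safer than the original, because it still runs whatever the model returns.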
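Environment variables are always strings. Converting one with `bool()` therefore treats every non-empty value as true, including `"False"` and `"0"`, and only an empty string becomes `False`. That matters for a flag like `LLPANDAS_USE_MEMORY`, which `LLMAccessor.__init__` reads from the environment.

The sketch below is a minimal explicit parser. The helper name `env_flag` and the set of accepted "false" spellings are our own assumptions, not yolopandas behavior.

```python
import os
from typing import Mapping, Optional


def env_flag(name: str, default: bool = True, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Parse a boolean flag from an environment variable (hypothetical helper).

    Unset -> `default`. "", "0", "false", "no", "off" (case-insensitive) -> False.
    Any other value -> True.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in {"", "0", "false", "no", "off"}


print(bool("False"))  # True: any non-empty string is truthy
print(env_flag("LLPANDAS_USE_MEMORY", environ={"LLPANDAS_USE_MEMORY": "False"}))  # False
print(env_flag("LLPANDAS_USE_MEMORY", environ={}))  # True (falls back to the default)
```

Passing `environ` explicitly keeps the helper easy to test. By default it reads the real `os.environ`.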