├── images
│   ├── 2020.png
│   ├── 2021.png
│   ├── nils.jpeg
│   ├── end2end.png
│   └── med-head.jpg
├── .dask
│   └── config.yaml
├── binder
│   ├── postBuild
│   ├── jupyterlab-workspace.json
│   ├── start
│   └── environment.yml
├── README.md
└── dask-sql-pycon.ipynb

/images/2020.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pycon2021-dask-sql/main/images/2020.png
--------------------------------------------------------------------------------
/images/2021.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pycon2021-dask-sql/main/images/2021.png
--------------------------------------------------------------------------------
/images/nils.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pycon2021-dask-sql/main/images/nils.jpeg
--------------------------------------------------------------------------------
/images/end2end.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pycon2021-dask-sql/main/images/end2end.png
--------------------------------------------------------------------------------
/images/med-head.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pycon2021-dask-sql/main/images/med-head.jpg
--------------------------------------------------------------------------------
/.dask/config.yaml:
--------------------------------------------------------------------------------
1 | distributed:
2 |   dashboard:
3 |     link: "{JUPYTERHUB_BASE_URL}user/{JUPYTERHUB_USER}/proxy/{port}/status"
4 | 
--------------------------------------------------------------------------------
/binder/postBuild:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # Install dask and ipywidgets JupyterLab extensions
4 | jupyter labextension install --minimize=False --clean \
5 |     dask-labextension \
6 |     @jupyter-widgets/jupyterlab-manager
7 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # pycon2021-dask-sql
2 | 
3 | 
4 | __Click here to launch:__
5 | 
6 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/adbreind/pycon2021-dask-sql.git/HEAD?urlpath=%2Fnotebooks%2Fdask-sql-pycon.ipynb)
--------------------------------------------------------------------------------
/binder/jupyterlab-workspace.json:
--------------------------------------------------------------------------------
1 | {
2 |     "data": {
3 |         "file-browser-filebrowser:cwd": {
4 |             "path": ""
5 |         },
6 |         "dask-dashboard-launcher": {
7 |             "url": "DASK_DASHBOARD_URL"
8 |         }
9 |     },
10 |     "metadata": {
11 |         "id": "/lab"
12 |     }
13 | }
--------------------------------------------------------------------------------
/binder/start:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # Replace DASK_DASHBOARD_URL with the proxy location
4 | sed -i -e "s|DASK_DASHBOARD_URL|${JUPYTERHUB_BASE_URL}user/${JUPYTERHUB_USER}/proxy/8787|g" binder/jupyterlab-workspace.json
5 | 
6 | # Import the workspace
7 | jupyter lab workspaces import binder/jupyterlab-workspace.json
8 | 
9 | exec "$@"
--------------------------------------------------------------------------------
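A quick aside (not part of the original repo): the `.dask/config.yaml` template above is ordinary Dask configuration, so the dashboard link that `binder/start` wires up can be sanity-checked from Python. A minimal sketch, assuming the JupyterHub environment variables are set:

```python
import dask

# Reads the template set in .dask/config.yaml; {JUPYTERHUB_BASE_URL}, {JUPYTERHUB_USER}
# and {port} are filled in from the Hub environment and the running scheduler.
print(dask.config.get("distributed.dashboard.link"))
```
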
/binder/environment.yml: -------------------------------------------------------------------------------- 1 | name: dask-micro-2021 2 | channels: 3 | - conda-forge 4 | dependencies: 5 | - python=3.8 6 | - bokeh 7 | - dask=2021.2.0 8 | - distributed=2021.2.0 9 | - dask-sql=0.3.2 10 | - jupyterlab 11 | - nodejs 12 | - tornado 13 | - pip 14 | - matplotlib 15 | - dask_labextension 16 | 17 | -------------------------------------------------------------------------------- /dask-sql-pycon.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "9bd73053-73d9-4b18-b009-bb597a23f3ab", 6 | "metadata": {}, 7 | "source": [ 8 | "# Dask-SQL: Empowering Pythonistas for
Scalable End-to-End Data Engineering and Data Science\n", 9 | "\n", 10 | "\n", 11 | "\n", 12 | "## Who Am I?\n", 13 | "\n", 14 | "### Adam Breindel\n", 15 | "\n", 16 | "__LinkedIn__ - https://www.linkedin.com/in/adbreind
\n", 17 | "__Email__ - adbreind@gmail.com
\n", 18 | "__Twitter__ - @adbreind\n", 19 | "\n", 20 | "__What Do I Do?__\n", 21 | "* Training Lead at Coiled Computing: https://coiled.io\n", 22 | " * Dask scales Python for data science and machine learning\n", 23 | " * Coiled makes it easy to scale on the cloud\n", 24 | "* Consulting on data engineering and machine learning\n", 25 | " * Development\n", 26 | " * Various advisory roles\n", 27 | "* 20+ years building systems for startups and large enterprises\n", 28 | "* 10+ years teaching front- and back-end technology\n", 29 | "\n", 30 | "__Fun large-scale data projects__\n", 31 | "* Streaming neural net + decision tree fraud scoring\n", 32 | "* Realtime & offline analytics for banking\n", 33 | "* Music synchronization and licensing for networked jukeboxes\n", 34 | "\n", 35 | "__Industries__\n", 36 | "* Finance / Insurance\n", 37 | "* Travel, Media / Entertainment\n", 38 | "* Energy, Government\n", 39 | "* Advertising/Social Media, & more" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "id": "5395ad04-4446-493a-a51f-3cceef4d40f5", 45 | "metadata": {}, 46 | "source": [ 47 | "
\n", 48 | "
\n", 49 | "\n", 50 | "---\n", 51 | "\n", 52 | "
\n", 53 | "
\n", 54 | "\n", 55 | "# Basic large-scale enterprise data processing pattern\n", 56 | "\n", 57 | "\n", 58 | "
\n", 59 | "
\n", 60 | "Yes, we're missing a lot of important upstream work (data aquisition, ingestion) and downstream (deploy, monitor), but today we're focusing on *SQL*\n", 61 | "\n", 62 | "
\n", 63 | "
\n", 64 | "\n", 65 | "---\n", 66 | "\n", 67 | "
\n", 68 | "
\n", 69 | "\n", 70 | "# Let's zoom in on extracting from a data lake/warehouse and transforming\n", 71 | "\n", 72 | "\n", 73 | "\n", 74 | "* There are __other__ tools (Presto/Trino, Spark, etc.) that can help\n", 75 | "* But we're *Pythonistas* and maybe not experts (or interested) in integrating complex JVM-based tools\n", 76 | "* And we'd like to ...\n", 77 | " * Use Python together with SQL at scale\n", 78 | " * Create services and tools for our company/team that use SQL\n", 79 | " * Because many more folks know SQL than Python! (I know it's hard to believe, but it's true :)\n", 80 | "\n", 81 | "
\n", 82 | "
\n", 83 | "\n", 84 | "---\n", 85 | "\n", 86 | "
\n", 87 | "
\n", 88 | "\n", 89 | "# We're all happy it's 2021\n", 90 | "\n", 91 | "\n", 92 | "\n", 93 | "\n", 94 | "
\n", 95 | "
\n", 96 | "\n", 97 | "---\n", 98 | "\n", 99 | "
\n", 100 | "
\n" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "id": "46cfd79c-1234-4d2a-83c5-d0bf633b3949", 106 | "metadata": {}, 107 | "source": [ 108 | "# Introducing Dask-SQL\n", 109 | "## Adding SQL execution and Hive access to Python!\n", 110 | "\n", 111 | "\n", 112 | "\n", 113 | "### Nils Braun\n", 114 | "* Data Engineer for Enabling: Bosch Center for Artificial Intelligence (BCAI)\n", 115 | "* https://www.linkedin.com/in/nlb/\n", 116 | "* https://github.com/nils-braun\n", 117 | "\n", 118 | "### Dask-SQL\n", 119 | "\n", 120 | "Core features\n", 121 | "\n", 122 | "* SQL parsing, optimization, planning, translation for Dask\n", 123 | "* Start with data from...\n", 124 | " * files in the cloud (e.g., S3)\n", 125 | " * any data in Python (e.g., Pandas or Dask Dataframe)\n", 126 | " * modern data catalog/aggregation like Intake (https://github.com/intake/intake)\n", 127 | " * __direct from enterprise data lakes/warehouses: Hive Metastore, Databricks, etc.__\n", 128 | " * Bring the SQL integration power of Spark right into the Python/Dask world\n", 129 | "* Query cached datasets to leverage the speed of a large distributed memory pool\n", 130 | "\n", 131 | "Bonus features\n", 132 | "* user-defined functions\n", 133 | "* a SQL server\n", 134 | "* ML in SQL\n", 135 | "* a command-line client\n", 136 | "* more in the works!\n", 137 | "\n", 138 | "Learn more...\n", 139 | "* Homepage: https://nils-braun.github.io/dask-sql/\n", 140 | "* Docs: https://dask-sql.readthedocs.io/en/latest/\n", 141 | "* Source: https://github.com/nils-braun/dask-sql\n", 142 | "\n", 143 | "
\n", 144 | "
\n", 145 | "\n", 146 | "---\n", 147 | "\n", 148 | "
\n", 149 | "
" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "id": "962139c9-72a8-4c1f-bba1-f44f6056776f", 155 | "metadata": {}, 156 | "source": [ 157 | "## Before we dive into code ... a little clarification: data lakes\n", 158 | "\n", 159 | "If you haven't worked a lot in the large-scale data space, it can be a bit confusing why we need a Dask-SQL project. Common questions include...\n", 160 | "\n", 161 | "How is this different from...\n", 162 | "* Dask `read_sql_table`? \n", 163 | "* Pandas `read_sql`, `read_sql_table`, or `read_sql_query`?\n", 164 | "* SQLAlchemy\n", 165 | "* etc.\n", 166 | "\n", 167 | "The fundamental difference is: __those other approaches pass your query to a database system which already understands SQL, can execute a query, and has control over your data__\n", 168 | "\n", 169 | "__In enterprise data lakes, that \"database\" likely does not exist.__ Instead, you may have huge collections of files, in a variety of formats, with no query engine, and no process which has \"control\" over your data.\n", 170 | "\n", 171 | "You may not even have a data catalog. In other cases, you may have a catalog, but it is tied to a Hadoop/JVM-based system like Hive or Spark.\n", 172 | "\n", 173 | "In these data lake systems, all of the `read_sql` techniques above may not work at all, or may require you to pass your logic through to Hive/Spark/etc., requiring you to understand, use, and tune those systems before you can even start your work in Python.\n", 174 | "\n", 175 | "The goal of Dask-SQL is to allow you to formulate a SQL query against arbitrary files & formats, and execute that query at large scale with Dask." 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "id": "3f4de9c3-c9a8-4d3e-a073-8ba439fcb807", 181 | "metadata": {}, 182 | "source": [ 183 | "
\n", 184 | "
\n", 185 | "\n", 186 | "---\n", 187 | "\n", 188 | "
\n", 189 | "
\n", 190 | "\n", 191 | "## It's coding time!\n", 192 | "\n", 193 | "We'll demo three key approaches here:\n", 194 | "\n", 195 | "1. Creating a Dask Dataframe -- a lazy, distributed datastructure -- over a set of files, and then using Dask-SQL to query the data\n", 196 | "\n", 197 | "2. Creating a Dask-SQL table completely within SQL, and querying that -- an approach that will be very helpful working your SQL analyst friends\n", 198 | "\n", 199 | "3. Using Dask-SQL to access tables *already defined in the Hive catalog (\"metastore\")* but querying the underlying files with Dask -- an incredibly valuable missing link for Python data folks working within orgs that rely on Hive to catalog their data." 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "id": "c5cd2fd9-4793-4f1a-a7e8-b3740442e32e", 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "from dask.distributed import Client\n", 210 | "\n", 211 | "client = Client()\n", 212 | "\n", 213 | "client" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "id": "1c9e855f-3cb5-4764-ab85-e5eb47c0373d", 220 | "metadata": {}, 221 | "outputs": [], 222 | "source": [ 223 | "from dask_sql import Context\n", 224 | "\n", 225 | "c = Context()" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "id": "667aee4b-6c9c-42af-a07e-ea2b123c64aa", 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "import dask.dataframe as dd\n", 236 | "\n", 237 | "df = dd.read_csv('data/powerplant.csv')\n", 238 | "\n", 239 | "df" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": null, 245 | "id": "6da3e6fb-ecef-48cd-8d97-cbc6c2fcaa99", 246 | "metadata": {}, 247 | "outputs": [], 248 | "source": [ 249 | "c.create_table(\"powerplant\", df)\n", 250 | "\n", 251 | "result = c.sql('SELECT * FROM powerplant')\n", 252 | "\n", 253 | "result" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "id": "9c168bfd-2ae3-44e4-bc43-bc960ecce869", 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "type(result)" 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": null, 269 | "id": "5ec06cec-b204-44f6-9026-2f82a3aa8d3e", 270 | "metadata": {}, 271 | "outputs": [], 272 | "source": [ 273 | "result.compute()" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "id": "dd9abde4-e6e0-4ed9-8251-69c8652639c2", 280 | "metadata": {}, 281 | "outputs": [], 282 | "source": [ 283 | "c.sql('SELECT * FROM powerplant', return_futures=False) # run immediately -- beware of large result sets!" 
284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": null, 289 | "id": "d59a600c", 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "type(c.sql('SELECT * FROM powerplant', return_futures=False))" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "id": "ce616b95-11a9-4631-a374-f9901632089d", 300 | "metadata": {}, 301 | "outputs": [], 302 | "source": [ 303 | "query = '''\n", 304 | "SELECT\n", 305 | " FLOOR(\"AT\") AS temp, AVG(\"PE\") AS output\n", 306 | "FROM\n", 307 | " powerplant\n", 308 | "GROUP BY \n", 309 | " FLOOR(\"AT\")\n", 310 | "'''\n", 311 | "\n", 312 | "result = c.sql(query)\n", 313 | "\n", 314 | "result" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "id": "146351e1-7b08-4156-af87-2452e0c89827", 321 | "metadata": {}, 322 | "outputs": [], 323 | "source": [ 324 | "result.compute().plot.scatter('temp','output')\n", 325 | "\n", 326 | "# hint: if you're not totally convinced the computation is happening in Dask, look at the Dask Task Stream dashboard!" 327 | ] 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "id": "0b750332-7c6b-4352-a138-66e9777ca021", 332 | "metadata": {}, 333 | "source": [ 334 | "Maybe we could build a successful model with this data ... in fact, we could do it with any combination of\n", 335 | "* Data prep in SQL, training/prediction in Python\n", 336 | "* Training in Python, prediction in SQL\n", 337 | "* Everything (!) in SQL\n", 338 | "* Sound interesting? Check it out: https://dask-sql.readthedocs.io/en/latest/pages/machine_learning.html\n", 339 | "\n", 340 | "### What about \"creating the table completely in SQL\"?\n", 341 | "\n", 342 | "First, let's go \"full SQL\" so we don't even need to wrap our queries in Python..." 
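(An illustrative aside, not in the original notebook.) Before going "full SQL", here is a hedged sketch of the first combination listed above -- data prep in SQL, training in Python. It assumes scikit-learn is installed (the binder environment above does not pin it) and reuses the `temp`/`output` aggregation `query` defined earlier:

```python
from sklearn.linear_model import LinearRegression

# Materialize the SQL-prepped aggregate as a small pandas DataFrame...
prepped = c.sql(query, return_futures=False)

# ...then hand the prepped result to any Python ML library for training.
model = LinearRegression().fit(prepped[["temp"]], prepped["output"])
```
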
343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": null, 348 | "id": "a50f01d0-5753-4f5e-91c4-9768211f77f8", 349 | "metadata": {}, 350 | "outputs": [], 351 | "source": [ 352 | "c.ipython_magic()" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": null, 358 | "id": "e5e2f680-18af-49a5-b767-f7ba6adf1e34", 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [ 362 | "%%sql\n", 363 | "\n", 364 | "CREATE TABLE allsql WITH (\n", 365 | " format = 'csv',\n", 366 | " location = 'data/powerplant.csv' -- any Dask-accessible source or format (cloud/S3/..., parquet/ORC/...)\n", 367 | ")" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": null, 373 | "id": "a189d158", 374 | "metadata": {}, 375 | "outputs": [], 376 | "source": [ 377 | "%%sql\n", 378 | "\n", 379 | "SELECT\n", 380 | " FLOOR(\"AT\") AS temp, AVG(\"PE\") AS output\n", 381 | "FROM\n", 382 | " allsql\n", 383 | "GROUP BY \n", 384 | " FLOOR(\"AT\")\n", 385 | "LIMIT 10" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "id": "33c0b6a7", 391 | "metadata": {}, 392 | "source": [ 393 | "### Let's see that Hive catalog integration!\n", 394 | "\n", 395 | "*note: this demo will not run in the standalone binder notebook available after PyCon, as it relies on a Hive server which is not configured in that container*" 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "execution_count": null, 401 | "id": "e0c27141", 402 | "metadata": {}, 403 | "outputs": [], 404 | "source": [ 405 | "from pyhive.hive import connect\n", 406 | "\n", 407 | "cursor = connect(\"localhost\", 10000).cursor()\n", 408 | "\n", 409 | "c.create_table(\"my_diamonds\", cursor, hive_table_name=\"diamonds\")" 410 | ] 411 | }, 412 | { 413 | "cell_type": "markdown", 414 | "id": "a5f9c10a", 415 | "metadata": {}, 416 | "source": [ 417 | "Here's the magic...\n", 418 | "* If you look at the Hive Server web UI, you'll see a query just ran to get schema info on the `Diamonds` table\n", 419 | "* But in the following queries\n", 420 | " * Data is accessed directly from the underlying files\n", 421 | " * No Hive queries are run\n", 422 | " * All compute is done in Dask/Python" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": null, 428 | "id": "404722db", 429 | "metadata": {}, 430 | "outputs": [], 431 | "source": [ 432 | "%%sql\n", 433 | "\n", 434 | "SELECT * FROM my_diamonds LIMIT 10" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": null, 440 | "id": "f7de52bb", 441 | "metadata": {}, 442 | "outputs": [], 443 | "source": [ 444 | "query = '''\n", 445 | "SELECT FLOOR(10*carat)/10 AS carat, AVG(price) AS price, COUNT(1) AS num \n", 446 | "FROM my_diamonds\n", 447 | "GROUP BY FLOOR(10*carat)\n", 448 | "'''\n", 449 | "\n", 450 | "data = c.sql(query).compute()\n", 451 | "\n", 452 | "data.plot.scatter('carat', 'price')\n", 453 | "data.plot.bar('carat', 'num')" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "id": "e99da79e-953e-4eb4-b213-82875016cc31", 459 | "metadata": {}, 460 | "source": [ 461 | "## A Quick Look at How Dask-SQL Works\n", 462 | "\n", 463 | "* Locate the source data\n", 464 | " * Hive, Intake, Databricks catalog integration\n", 465 | " * Files or Python data provided by user\n", 466 | "\n", 467 | "\n", 468 | "* Prepare the query using Apache Calcite\n", 469 | " * Parse SQL\n", 470 | " * Analyze (check vs. 
schema, etc.)\n", 471 | " * Optimize\n", 472 | "\n", 473 | "\n", 474 | "* Create execution plan\n", 475 | " * Take logical relational operators (`SELECT`/project, `WHERE`/filter, `JOIN`, etc.) \n", 476 | " * Convert into Dask Dataframe API calls (`query`, `merge`, etc.)\n", 477 | "\n", 478 | "\n", 479 | "* Then either...\n", 480 | " * Return a handle to the Dask Dataframe of results (recall this is a virtual Dataframe, so no execution yet)\n", 481 | " * or\n", 482 | " * Compute (materialize) the resulting dataframe and return the result as a Pandas Dataframe\n", 483 | " \n", 484 | "More detail at https://dask-sql.readthedocs.io/en/latest/pages/how_does_it_work.html" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "id": "f560c5e3-1688-4178-bb77-5cb6864d4b25", 490 | "metadata": {}, 491 | "source": [ 492 | "## Some Practical Details\n", 493 | "\n", 494 | "### Installing Dask-SQL\n", 495 | "\n", 496 | "Recommended approach is via conda and conda-forge -- this will include all dependencies like the JVM, and avoid conflicts by keeping everything within a conda environment.\n", 497 | "\n", 498 | "There are also a few other options: more details at https://dask-sql.readthedocs.io/en/latest/pages/installation.html\n", 499 | "\n", 500 | "### Supported SQL Operators\n", 501 | "\n", 502 | "Dask-SQL is a young project, so it does not yet support all of SQL\n", 503 | "\n", 504 | "More detail on\n", 505 | "* Query support https://dask-sql.readthedocs.io/en/latest/pages/sql/select.html\n", 506 | "* Table creation https://dask-sql.readthedocs.io/en/latest/pages/sql/creation.html\n", 507 | "* ML via SQL https://dask-sql.readthedocs.io/en/latest/pages/sql/ml.html\n", 508 | "\n", 509 | "### How to Contribute\n", 510 | "\n", 511 | "Source code and info on installing for development is at https://github.com/nils-braun/dask-sql\n", 512 | "\n", 513 | "Check issues -- or file a new bug -- at https://github.com/nils-braun/dask-sql/issues\n", 514 | "\n", 515 | "And there's even a \"good first issue\" list at https://github.com/nils-braun/dask-sql/contribute" 516 | ] 517 | }, 518 | { 519 | "cell_type": "markdown", 520 | "id": "f7488066", 521 | "metadata": {}, 522 | "source": [ 523 | "# Thank You!" 524 | ] 525 | }, 526 | { 527 | "cell_type": "code", 528 | "execution_count": null, 529 | "id": "4999eff9", 530 | "metadata": {}, 531 | "outputs": [], 532 | "source": [] 533 | } 534 | ], 535 | "metadata": { 536 | "kernelspec": { 537 | "display_name": "Python 3", 538 | "language": "python", 539 | "name": "python3" 540 | }, 541 | "language_info": { 542 | "codemirror_mode": { 543 | "name": "ipython", 544 | "version": 3 545 | }, 546 | "file_extension": ".py", 547 | "mimetype": "text/x-python", 548 | "name": "python", 549 | "nbconvert_exporter": "python", 550 | "pygments_lexer": "ipython3", 551 | "version": "3.8.0" 552 | } 553 | }, 554 | "nbformat": 4, 555 | "nbformat_minor": 5 556 | } 557 | --------------------------------------------------------------------------------
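Appendix (added here for convenience, not part of the original repository): the conda-forge install recommended in the notebook, plus a minimal sketch of the `Context` workflow it demonstrates. The table name and file path are placeholders.

```python
# Recommended install (per the docs linked above): conda install -c conda-forge dask-sql
from dask_sql import Context

c = Context()

# Register any Dask-readable source as a table (placeholder path; CSV, Parquet, S3, ... all work)
c.create_table("mytable", "data/some_file.csv")

# Calcite parses and optimizes the SQL, and Dask builds a lazy execution plan...
result = c.sql("SELECT COUNT(*) AS n FROM mytable")

# ...so the query itself only runs when the result is materialized.
print(result.compute())
```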