├── Advanced Db2 JSON Techniques.ipynb ├── Db2 Calling Stored Procedures.ipynb ├── Db2 Compatibility Features.ipynb ├── Db2 JSON Features.ipynb ├── Db2 Jupyter Extensions Tutorial.ipynb ├── Db2 Jupyter Macros.ipynb ├── Db2 Regular Expressions.ipynb ├── Db2 Row and Column Access Control.ipynb ├── Db2 Statistical Functions.ipynb ├── Db2 Time and Date Functions.ipynb ├── Db2 Using Prepared Statements.ipynb ├── Db2V11-JSON-ebook.pdf ├── LICENSE.md ├── Notice.txt ├── README.md ├── db2.ipynb ├── db2json.ipynb ├── db2jupyter.docker ├── dockerinstall.md ├── generate_json.ipynb ├── installation.md └── v1 ├── Db2 11 Compatibility Features.ipynb ├── Db2 11 JSON Features.ipynb ├── Db2 11 Regular Expressions.ipynb ├── Db2 11 Statistical Functions.ipynb ├── Db2 11 Time and Date Functions.ipynb ├── Db2 Jupyter Extensions Tutorial.ipynb ├── Db2 OData Gateway Tutorial.ipynb ├── db2.ipynb └── db2odata.ipynb /Advanced Db2 JSON Techniques.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Advanced JSON Techniques\n", 8 | "The basics of using JSON within DB2 were covered in a previous notebook. This notebook will cover some of the advanced techniques that are required to access arrays and structures that may be nested within a JSON document. The first step (as with any notebook!) is to load the Db2 Jupyter magic commands." 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": null, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "%run db2.ipynb" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "This section will work with a larger table that will give us some idea of the performance of using JSON within a Db2 database. This next statement will generate 25,000 customer records that have a structure similar to this:\n", 25 | "```json\n", 26 | "{\n", 27 | " \"customerid\": 100000,\n", 28 | " \"identity\": {\n", 29 | " \"firstname\": \"Jacob\",\n", 30 | " \"lastname\": \"Hines\",\n", 31 | " \"birthdate\": \"1982-09-18\"\n", 32 | " },\n", 33 | " \"contact\": {\n", 34 | " \"street\": \"Main Street North\",\n", 35 | " \"city\": \"Amherst\",\n", 36 | " \"state\": \"OH\",\n", 37 | " \"zipcode\": \"44001\",\n", 38 | " \"email\": \"Ja.Hines@yahii.com\",\n", 39 | " \"phone\": \"813-689-8309\"\n", 40 | " },\n", 41 | " \"payment\": {\n", 42 | " \"card_type\": \"MCCD\",\n", 43 | " \"card_no\": \"4742-3005-2829-9227\"\n", 44 | " },\n", 45 | " \"purchases\": [\n", 46 | " {\n", 47 | " \"tx_date\": \"2018-02-14\",\n", 48 | " \"tx_no\": 157972,\n", 49 | " \"product_id\": 1860,\n", 50 | " \"product\": \"Ugliest Snow Blower\",\n", 51 | " \"quantity\": 1,\n", 52 | " \"item_cost\": 51.86\n", 53 | " },...\n", 54 | " ]\n", 55 | "}\n", 56 | "```" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "The JSON record contains four distinct pieces of information:\n", 64 | "\n", 65 | "- Customerid - Primary key\n", 66 | "- Identity - Information on the customer including name and birthdate\n", 67 | "- Contact - Address, email, and phone number information\n", 68 | "- Payment - Current payment card that is used\n", 69 | "- Purchase - The purchase that the customer has made\n", 70 | "\n", 71 | "The purchase structure contains information on the customer purchases. For each purchased items there is the following information:\n", 72 | "\n", 73 | "- tx_date - Date of the transaction\n", 74 | "- tx_no - Transaction number\n", 75 | "- product_id - Id for the product\n", 76 | "- product - Name of the product\n", 77 | "- quantity - Quantity of products purchased\n", 78 | "- item_cost - Cost of one product\n", 79 | "\n", 80 | "If this was a relational database you would probably split these fields up into different tables and use join techniques to bring the information back together. In a JSON document we are able to keep all of this information in one record, which makes retrieval of an individual customer purchases easier." 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "## Connect to the Db2 Database\n", 88 | "You may need to add connection information here if you have not yet connected to the database. " 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": {}, 95 | "outputs": [], 96 | "source": [ 97 | "%sql connect" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "We need to create JSON records for loading into the table. This next command will run the `db2json.ipynb` file which contains code to generate JSON records. If you want to change the number of records created or the data used, edit the file to modify the defaults. The program will display its progress as it runs." 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": null, 110 | "metadata": {}, 111 | "outputs": [], 112 | "source": [ 113 | "%run generate_json.ipynb" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "### Load into Customer Table (BSON)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "The table that we create for JSON data will include a column with the actual JSON record and another field as an identifier for the row. The load statement in the next section of code will insert the `customerid` from the JSON record into the `CUSTNO` column. " 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": null, 133 | "metadata": { 134 | "collapsed": true 135 | }, 136 | "outputs": [], 137 | "source": [ 138 | "%%sql -q\n", 139 | " DROP TABLE CUSTOMERS_BSON;\n", 140 | " CREATE TABLE CUSTOMERS_BSON \n", 141 | " (\n", 142 | " CUSTNO INT NOT NULL,\n", 143 | " DETAILS BLOB(2048) INLINE LENGTH 2048\n", 144 | " );" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "The next piece of code will import the data into Db2. The Python code will read the `customer.js` file, parse it and then insert in into the `CUSTOMERS` table. Note that the entire JSON record is stored into the `DETAILS` column, while the `CUSTNO` column is set to the `customerid` field that is found in the JSON record." 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "import io\n", 161 | "import json\n", 162 | "print(\"Starting Load\")\n", 163 | "start_time = time.time()\n", 164 | "%sql autocommit off\n", 165 | "x = %sql prepare INSERT INTO CUSTOMERS_BSON VALUES (?,SYSTOOLS.JSON2BSON(?))\n", 166 | "if (x != False):\n", 167 | " i = 0\n", 168 | " with open(\"customers.js\",\"r\") as records:\n", 169 | " for record in records:\n", 170 | " rec = json.loads(record)\n", 171 | " custno = rec['customerid']\n", 172 | " i += 1\n", 173 | " rc = %sql execute :x using :custno, :record\n", 174 | " if (rc == False): break\n", 175 | " if ((i % 5000) == 0): \n", 176 | " print(str(i)+\" rows read.\")\n", 177 | " %sql commit hold\n", 178 | " \n", 179 | " %sql commit work \n", 180 | "%sql autocommit on\n", 181 | "end_time = time.time()\n", 182 | "print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))\n", 183 | "customer_bson = end_time - start_time" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "To double-check the JSON format we retrieve one record from the table." 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "%sql -j select systools.bson2json(details) from customers fetch first 1 row only" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "The table does not have an index currently defined on it. We could have created the table with a primary key but chose not to in order to speed up the load step. The next statement will create the index for our table." 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": {}, 213 | "outputs": [], 214 | "source": [ 215 | "%sql CREATE UNIQUE INDEX IX_CUST_BSON ON CUSTOMERS_BSON(CUSTNO)" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "### Db2 JSON Macro Extensions" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "The Db2 notebook on JSON features described how to retrieve individual fields from a record by using the `JSON_VAL` function. The following code retrieves the first and last name of customer number 100000." 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": null, 235 | "metadata": {}, 236 | "outputs": [], 237 | "source": [ 238 | "%%sql \n", 239 | "SELECT JSON_VAL(DETAILS,'identity.firstname','s:32') AS FIRSTNAME, \n", 240 | " JSON_VAL(DETAILS,'identity.lastname','s:32') AS LASTNAME\n", 241 | "FROM CUSTOMERS\n", 242 | "WHERE CUSTNO = 100000" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "In order to make JSON retrieval easier in Jupyter notebooks, the Db2 Jupyter `%sql` command allows for the creation of macros. Macros are used to expand text in a SQL statement. Macros do not execute any code, but are used specifically to insert text into a SQL statement. An example of a macro is found below." 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": { 256 | "collapsed": true 257 | }, 258 | "outputs": [], 259 | "source": [ 260 | "%%sql macro helloworld\n", 261 | "echo Everything you said is here: {1}\n", 262 | "exit" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": null, 268 | "metadata": {}, 269 | "outputs": [], 270 | "source": [ 271 | "%sql %helloworld(Hello World) " 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "A number of macros can be found in the `db2jon.ipynb` file. We can insert the file in the next step to load these macros." 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": null, 284 | "metadata": { 285 | "collapsed": true 286 | }, 287 | "outputs": [], 288 | "source": [ 289 | "%run db2json.ipynb" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "To use a macro, we precede it with the % character and enclose any parameters in brackets:\n", 297 | "```\n", 298 | "SELECT %js(details,identity.firstname,32) FROM CUSTOMERS WHERE CUSTONO = 100000\n", 299 | "```\n", 300 | "The `%js` macro will expand the values in the string to include the `JSON_VAL(details, ...)` in the SQL. To echo the contents of what the macro produces, we need to include the `-e` flag in the SQL statement." 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": null, 306 | "metadata": {}, 307 | "outputs": [], 308 | "source": [ 309 | "%%sql -e\n", 310 | "SELECT %js(details,identity.firstname,32)\n", 311 | "FROM CUSTOMERS\n", 312 | "WHERE CUSTNO = 100000" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "You will notice that a green box is placed around the generated code and the SQL has been modified by the macro to include the full JSON_VAL function. This makes it easier to create the SQL (and hopefully less error prone) when dealing with JSON objects. There are 5 macros that have been predefined for JSON queries.\n", 320 | "\n", 321 | "- `-e` - This flag turns on the display of the generated SQL. This is useful if you want to cut and paste the final SQL into an application. If you do not use the `-e` flag, the code will not be displayed. \n", 322 | "- `#js` - Return a string from a JSON field\n", 323 | "- `#ji` - Return an integer from a JSON field\n", 324 | "- `#jd` - Return a number (decimal) from a JSON field\n", 325 | "- `#jdate` - Return a date from a JSON field" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "The format for the `ji` (Integer) and `jdate` (Date) macro is:\n", 333 | "```\n", 334 | "#ji(json_column,json_field)\n", 335 | "#jdate(json_column,json_field)\n", 336 | "```\n", 337 | "Note that you do not need to enclose quotes around the column or field names unless they have special characters in the name. The quotes are stripped out of the parameters but the macros will add them back into the SQL function if necessary. Also, field names are case sensitive in JSON so make sure you use the proper spelling for the field you are trying to retrieve. The column name that contains JSON data is not case sensitive." 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "The `#jd` macro works similar to the `#ji` macro except that it provides for an additional decimal formatting operator:\n", 345 | "```\n", 346 | "#jd(json_column, json_field, 'digits,decimals')\n", 347 | "```\n", 348 | "A decimal number if formatted as x,y where x is the number of digits in total, and y represents the number of digits after the decimal point. So a value of 5,2 would mean there are 5 digits in total with 2 of them after the decimal point." 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": {}, 354 | "source": [ 355 | "The `#js` (JSON string) macro the following format:\n", 356 | "```\n", 357 | "#js(column,field,length)\n", 358 | "```\n", 359 | "The macro needs which JSON column to use, the field that is being searched for, and the length that needs to be returned. Note that you do not use the JSON_VAL specification in this field ('s:255'). Instead you are supplying the length of the return string as an integer." 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "The next SQL command illustrates the use of these macros to return the customers first name, lastname, address, zipcode, date of birth and phone number from the JSON record." 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 371 | "execution_count": null, 372 | "metadata": {}, 373 | "outputs": [], 374 | "source": [ 375 | "%%sql -e\n", 376 | "SELECT\n", 377 | " %js(details,identity.firstname,12) AS FIRSTNAME,\n", 378 | " %js(details,identity.lastname,15) AS LASTNAME,\n", 379 | " %js(details,contact.street,30) AS STREET,\n", 380 | " %js(details,contact.city,20) AS CITY,\n", 381 | " %js(details,contact.state,2) AS STATE,\n", 382 | " %ji(details,contact.zipcode) AS ZIPCODE,\n", 383 | " %jdate(details,identity.birthdate) AS BIRTHDATE,\n", 384 | " %js(details,contact.phone,12) AS PHONE\n", 385 | "FROM CUSTOMERS\n", 386 | "WHERE CUSTNO = 100000" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "Using the `-e` flag is useful when debugging any SQL that uses JSON functions. In addition it gives you the raw SQL that is generated so that you can cut and paste it into your own application. The examples with strings use the following format:\n", 394 | "```\n", 395 | "TRIM(JSON_VAL(...'s:12'))\n", 396 | "```\n", 397 | "If you use a specification that is too small to retrieve the field (s:12), then the JSON_VAL will return a NULL value rather than a truncated string. For this reason, you may want to use a larger size and the value will be trimmed down to a smaller size. " 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "The final macro that is included in the list is the `#jsonarray()` macro. The macro call has the following format:\n", 405 | "```\n", 406 | "jsonarray(table_name, pk, json_column, array, [optional where clause])\n", 407 | "```\n", 408 | "The fields are:\n", 409 | "- `table_name` - table that we are accessing\n", 410 | "- `pk` - primary key of the table\n", 411 | "- `json_column` - column that contains the json data\n", 412 | "- `array` - the field that contains an array of items (which must be a structure and not individual values)\n", 413 | "- `where clause` - An optional expression to limit the result set\n", 414 | "\n", 415 | "The `#jsonarray` macro is used in situations where you want to retrieve the contains of an array within a JSON record. The macro will create a WITH clause specification where the name of the resulting table is called `JSONARRAY` with two columns: PK for the primary key to the array record and ITEM for the items returned. The following example shows how you would retrieve all of the purchases that a customer made using the `JSON_TABLE` function." 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": null, 421 | "metadata": {}, 422 | "outputs": [], 423 | "source": [ 424 | "%%sql \n", 425 | "with purchases(items) as \n", 426 | " (select systools.json2bson(items.value)\n", 427 | " from customers, \n", 428 | " table( systools.json_table(customers.details,'purchases','s:2048')) as items\n", 429 | " where custno = 100000\n", 430 | " )\n", 431 | "select %ji(items,product_id),%js(items,product,32),%ji(items,quantity),%jd(items,item_cost,'7,2') from purchases" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": {}, 437 | "source": [ 438 | "This code can be simplified by using the `%jsonarray` macro instead. " 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": null, 444 | "metadata": {}, 445 | "outputs": [], 446 | "source": [ 447 | "%%sql -e\n", 448 | "WITH\n", 449 | " %jsonarray(customers,custno,details,purchases,where=\"custno = 100000\")\n", 450 | "SELECT \n", 451 | " %ji(item,product_id),%js(item,product,32),%ji(item,quantity),%jd(item,item_cost,'7,2') from jsonarray" 452 | ] 453 | }, 454 | { 455 | "cell_type": "markdown", 456 | "metadata": {}, 457 | "source": [ 458 | "If you display the SQL that was created, you will see the temporary table specification `jsonarrray(pk,item)` that was created by the macro. You must start the SQL statement using the `WITH` clause otherwise the syntax will be incorrect." 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": {}, 465 | "outputs": [], 466 | "source": [ 467 | "%%sql -e\n", 468 | "WITH\n", 469 | " %jsonarray(customers,custno,details,purchases,where=\"custno = 100000\")\n", 470 | "SELECT \n", 471 | " %ji(item,product_id),%js(item,product,32),%ji(item,quantity),%jd(item,item_cost,'7,2') from jsonarray" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "### Sales by State\n", 479 | "Now that we can access the data using the JSON macros, we can start doing some analysis on the records. For each of these queries, remember that there are no indexes being used to determine the answer. This first query will show the total sales per state. In order to determine this, we must take the array of purchases by a customer, calculate the total cost of their orders and sum it up for the state." 480 | ] 481 | }, 482 | { 483 | "cell_type": "code", 484 | "execution_count": null, 485 | "metadata": { 486 | "pixiedust": { 487 | "displayParams": { 488 | "aggregation": "SUM", 489 | "handlerId": "barChart", 490 | "keyFields": "STATE", 491 | "legend": "false", 492 | "mpld3": "false", 493 | "sortby": "Values ASC", 494 | "valueFields": "2" 495 | } 496 | } 497 | }, 498 | "outputs": [], 499 | "source": [ 500 | "%%sql -pb\n", 501 | "WITH \n", 502 | " %jsonarray(customers,custno,details,purchases), \n", 503 | "SALES(state, amount) AS\n", 504 | " ( \n", 505 | " SELECT %js(c.details,contact.state,2), sum(%ji(j.item,quantity) * %jd(j.item,item_cost,'7,2')) \n", 506 | " from jsonarray j, customers c \n", 507 | " where j.pk = c.custno \n", 508 | " GROUP BY %js(c.details,contact.state,2) \n", 509 | " ) \n", 510 | "SELECT state, sum(amount) from SALES \n", 511 | "GROUP BY state \n", 512 | "ORDER BY 2 DESC" 513 | ] 514 | }, 515 | { 516 | "cell_type": "markdown", 517 | "metadata": {}, 518 | "source": [ 519 | "### Orders of a Product\n", 520 | "This calculation gives us the number of times a product was ordered. This doesn't tell us the quantity since we need to look at the individual purchases to see how many of the product the customer actually ordered. The SQL command will use the -a flag so that the output will be displayed in a TABLE so that you can find it easier to scroll the results." 521 | ] 522 | }, 523 | { 524 | "cell_type": "code", 525 | "execution_count": null, 526 | "metadata": { 527 | "pixiedust": { 528 | "displayParams": { 529 | "aggregation": "SUM", 530 | "handlerId": "tableView", 531 | "keyFields": "PRODUCT", 532 | "sortby": "Values ASC", 533 | "valueFields": "QUANTITY" 534 | } 535 | } 536 | }, 537 | "outputs": [], 538 | "source": [ 539 | "%%sql -a\n", 540 | "WITH \n", 541 | " %jsonarray(customers,custno,details,purchases), \n", 542 | " SALES(product, quantity) AS ( \n", 543 | " SELECT %js(j.item,product,32),count(*) \n", 544 | " from jsonarray j \n", 545 | " GROUP BY %js(j.item,product,32) \n", 546 | " ) \n", 547 | "SELECT product, quantity from SALES \n", 548 | "ORDER BY 2 DESC" 549 | ] 550 | }, 551 | { 552 | "cell_type": "markdown", 553 | "metadata": {}, 554 | "source": [ 555 | "### Quantity of Products Ordered\n", 556 | "So this SQL will actually tell us the quantity of a product ordered." 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": null, 562 | "metadata": { 563 | "pixiedust": { 564 | "displayParams": { 565 | "aggregation": "SUM", 566 | "handlerId": "barChart", 567 | "keyFields": "PRODUCT", 568 | "sortby": "Values ASC", 569 | "valueFields": "QUANTITY" 570 | } 571 | } 572 | }, 573 | "outputs": [], 574 | "source": [ 575 | "%%sql -a\n", 576 | "WITH \n", 577 | " %jsonarray(customers,custno,details,purchases), \n", 578 | " SALES(product, quantity) AS ( \n", 579 | " SELECT %js(j.item,product,32), %ji(j.item,quantity) \n", 580 | " from jsonarray j \n", 581 | " ) \n", 582 | "SELECT product, sum(quantity) from SALES \n", 583 | "GROUP BY product\n", 584 | "ORDER BY 2 DESC" 585 | ] 586 | }, 587 | { 588 | "cell_type": "markdown", 589 | "metadata": {}, 590 | "source": [ 591 | "### Most Popular Product by State\n", 592 | "Here we need to get the product counts by state, and then pick the maximum quantity found in the answer set. The steps we need to perform in the SQL are:\n", 593 | "\n", 594 | "- Get all products purchased\n", 595 | "- Sum the sales of products by state\n", 596 | "- Compute the max for each state\n", 597 | "\n", 598 | "The next set of SQL statements will get us the results. Note that it is possible to have more than one product being the top seller in a state. Note: This SQL may take a while to execute." 599 | ] 600 | }, 601 | { 602 | "cell_type": "code", 603 | "execution_count": null, 604 | "metadata": { 605 | "pixiedust": { 606 | "displayParams": { 607 | "aggregation": "SUM", 608 | "handlerId": "barChart", 609 | "keyFields": "PRODUCT", 610 | "sortby": "Values ASC", 611 | "valueFields": "QUANTITY" 612 | } 613 | } 614 | }, 615 | "outputs": [], 616 | "source": [ 617 | "%%sql -a\n", 618 | "WITH \n", 619 | " %jsonarray(customers,custno,details,purchases), \n", 620 | " SALES(product, state, quantity) AS ( \n", 621 | " SELECT %js(j.item,product,32), %js(c.details,contact.state,2), SUM(%ji(j.item,quantity))\n", 622 | " from jsonarray j, customers c\n", 623 | " where j.pk = c.custno\n", 624 | " GROUP BY %js(j.item,product,32), %js(c.details,contact.state,2)\n", 625 | " ),\n", 626 | " MAXSALES(state, total) AS (\n", 627 | " SELECT STATE, MAX(quantity)\n", 628 | " FROM SALES\n", 629 | " GROUP BY STATE\n", 630 | " )\n", 631 | "SELECT s.state, s.product, m.total \n", 632 | " FROM SALES s, MAXSALES m\n", 633 | "WHERE\n", 634 | " s.state = m.state and\n", 635 | " s.quantity = m.total\n", 636 | "ORDER BY s.product desc" 637 | ] 638 | }, 639 | { 640 | "cell_type": "markdown", 641 | "metadata": {}, 642 | "source": [ 643 | "## Converting JSON into Relational Tables\n", 644 | "As you can see from the previous examples, it is possible to analytics against the JSON data. However, it requires some sophisticated techniques for manipulating (and converting) the JSON data. One approach is to use the JSON functions and move the data into a traditional relational table for analysis. The following SQL will create a \"flat\" relational table that will allow for further analysis without having to do joins. In addition, we are going to create the table using column organization which will give us the ability to do queries without having to worry about what indexes are needed. " 645 | ] 646 | }, 647 | { 648 | "cell_type": "code", 649 | "execution_count": null, 650 | "metadata": {}, 651 | "outputs": [], 652 | "source": [ 653 | "%%sql\n", 654 | "DROP TABLE TRANSACTIONS;\n", 655 | "CREATE TABLE TRANSACTIONS\n", 656 | " (\n", 657 | " TX_DATE DATE,\n", 658 | " CITY VARCHAR(32),\n", 659 | " STATE CHAR(2),\n", 660 | " ZIPCODE INT,\n", 661 | " PRODUCT_NAME VARCHAR(32),\n", 662 | " PRODUCT_QTY INT,\n", 663 | " PRODUCT_COST DEC(7,2),\n", 664 | " CCARD CHAR(4)\n", 665 | " ) \n", 666 | "ORGANIZE BY COLUMN;" 667 | ] 668 | }, 669 | { 670 | "cell_type": "markdown", 671 | "metadata": {}, 672 | "source": [ 673 | "Next we use the `#jsonarray` macro to generate the rows we need to populate the transactions table." 674 | ] 675 | }, 676 | { 677 | "cell_type": "code", 678 | "execution_count": null, 679 | "metadata": {}, 680 | "outputs": [], 681 | "source": [ 682 | "%%sql\n", 683 | "INSERT INTO TRANSACTIONS \n", 684 | "WITH\n", 685 | " %jsonarray(customers,custno,details,purchases), \n", 686 | " TXS(tx_date, city, state, zipcode, product_name, product_qty, product_cost, ccard) AS \n", 687 | " ( \n", 688 | " SELECT %jdate(j.item,tx_date), \n", 689 | " %js(c.details,contact.city,32),\n", 690 | " %js(c.details,contact.state,2), \n", 691 | " %ji(c.details,contact.zipcode),\n", 692 | " %js(j.item,product,32),\n", 693 | " %ji(j.item,quantity),\n", 694 | " %jd(j.item,item_cost,'7,2'),\n", 695 | " %js(c.details,payment.card_type,4)\n", 696 | " from jsonarray j, customers c\n", 697 | " where j.pk = c.custno\n", 698 | " )\n", 699 | " SELECT TX_DATE, CITY, STATE, ZIPCODE, PRODUCT_NAME, PRODUCT_QTY, PRODUCT_COST, CCARD\n", 700 | " FROM TXS" 701 | ] 702 | }, 703 | { 704 | "cell_type": "markdown", 705 | "metadata": {}, 706 | "source": [ 707 | "Next we can check our record count to see how many records we have. " 708 | ] 709 | }, 710 | { 711 | "cell_type": "code", 712 | "execution_count": null, 713 | "metadata": {}, 714 | "outputs": [], 715 | "source": [ 716 | "%sql SELECT COUNT(*) FROM TRANSACTIONS" 717 | ] 718 | }, 719 | { 720 | "cell_type": "markdown", 721 | "metadata": {}, 722 | "source": [ 723 | "### Plot Routine\n", 724 | "This code will produce a graph of the results from an SQL statement. This routine allows more flexibility on what is displayed on the chart, including changes the colors, the titles, and the axis names.\n", 725 | "\n", 726 | "The format of the Plot is:\n", 727 | "```\n", 728 | "plotSQL(sql, \"Title of the plot\", \"X Axis Title\", \"Y Axis Title\", rotation, color palette)\n", 729 | "```\n", 730 | "If Rotation is set to True, the x axis labels will be rotated by 30 degress and place the text at a slight angle to make longer text easier to read. The color palette is either m1, m2, m3, or m4." 731 | ] 732 | }, 733 | { 734 | "cell_type": "code", 735 | "execution_count": null, 736 | "metadata": { 737 | "collapsed": true 738 | }, 739 | "outputs": [], 740 | "source": [ 741 | "import matplotlib.pyplot as plt\n", 742 | "\n", 743 | "m1 = ('#009926','#00994d','#009973','#009999','#007399','#004d99','#002699','#000099')\n", 744 | "m2 = (\"#00ff40\",\"#00ff80\",\"#00ffbf\",\"#00ffff\",\"#00bfff\",\"#0080ff\",\"#0040ff\")\n", 745 | "m3 = (\"#9933ff\",\"#cc33ff\",\"#ff33ff\",\"#ff33cc\",\"#ff3399\",\"#ff3366\",\"#ff3333\")\n", 746 | "m4 = (\"#ff00bf\",\"#ff0080\",\"#ff0040\",\"#ff0000\",\"#ff0000\",\"#ff4000\",\"#ff8000\",\"#ffbf00\",\"#ffff00\")\n", 747 | "\n", 748 | "def plotSQL(sql, title, xaxis, yaxis, rotation, colormap):\n", 749 | " if (rotation == True):\n", 750 | " rot = 30\n", 751 | " else:\n", 752 | " rot = 0\n", 753 | " df = %sql {sql}\n", 754 | " if (df is None):\n", 755 | " print(\"SQL call failed\")\n", 756 | " return\n", 757 | " xvalues = df.columns.values[0]\n", 758 | " yvalues = df.columns.values[1]\n", 759 | " df.plot(kind='bar',x=xvalues,y=yvalues,color=colormap);\n", 760 | " _ = plt.xlabel(\"\", fontsize=12);\n", 761 | " _ = plt.ylabel(yaxis, fontsize=12);\n", 762 | " _ = plt.suptitle(title, fontsize=20);\n", 763 | " _ = plt.xticks(rotation=rot);\n", 764 | " _ = plt.plot();" 765 | ] 766 | }, 767 | { 768 | "cell_type": "markdown", 769 | "metadata": {}, 770 | "source": [ 771 | "### Columnar: Top 10 States by Sales" 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 776 | "execution_count": null, 777 | "metadata": {}, 778 | "outputs": [], 779 | "source": [ 780 | "sqlin = \\\n", 781 | "'''\n", 782 | "WITH \n", 783 | "SALES(state, amount) AS\n", 784 | " ( \n", 785 | " SELECT STATE, SUM(PRODUCT_QTY * PRODUCT_COST) \n", 786 | " from TRANSACTIONS \n", 787 | " GROUP BY STATE\n", 788 | " ) \n", 789 | "SELECT state, sum(amount) from SALES \n", 790 | "GROUP BY state \n", 791 | "ORDER BY 2 DESC FETCH FIRST 10 ROWS ONLY\n", 792 | "WITH UR\n", 793 | "'''\n", 794 | "plotSQL(sqlin, \n", 795 | " \"Sales per State\", \n", 796 | " \"State\",\n", 797 | " \"Revenue\",\n", 798 | " False,\n", 799 | " m1)" 800 | ] 801 | }, 802 | { 803 | "cell_type": "markdown", 804 | "metadata": {}, 805 | "source": [ 806 | "### Columnar: Top 10 Products Sold" 807 | ] 808 | }, 809 | { 810 | "cell_type": "code", 811 | "execution_count": null, 812 | "metadata": {}, 813 | "outputs": [], 814 | "source": [ 815 | "sqlin = \\\n", 816 | "'''\n", 817 | "WITH \n", 818 | " SALES(product, quantity) AS ( \n", 819 | " SELECT PRODUCT_NAME, PRODUCT_QTY \n", 820 | " from TRANSACTIONS\n", 821 | " ) \n", 822 | "SELECT product, sum(quantity) from SALES \n", 823 | "GROUP BY product\n", 824 | "ORDER BY 2 DESC\n", 825 | "FETCH FIRST 10 ROWS ONLY\n", 826 | "WITH UR\n", 827 | "'''\n", 828 | "plotSQL(sqlin, \n", 829 | " \"Products Sold\", \n", 830 | " \"Product\",\n", 831 | " \"Quantity\",\n", 832 | " True,\n", 833 | " m2)" 834 | ] 835 | }, 836 | { 837 | "cell_type": "markdown", 838 | "metadata": {}, 839 | "source": [ 840 | "### Weekly Sales with Moving Average" 841 | ] 842 | }, 843 | { 844 | "cell_type": "code", 845 | "execution_count": null, 846 | "metadata": {}, 847 | "outputs": [], 848 | "source": [ 849 | "sqlin = \\\n", 850 | "\"\"\"\n", 851 | "WITH TOTALPROD(TX_DATE, TX_TOTAL) AS \n", 852 | " (\n", 853 | " SELECT WEEK(TX_DATE), SUM(PRODUCT_QTY*PRODUCT_COST) FROM TRANSACTIONS\n", 854 | " WHERE YEAR(TX_DATE) = 2017 AND WEEK(TX_DATE) <= 52\n", 855 | " GROUP BY WEEK(TX_DATE)\n", 856 | " )\n", 857 | "SELECT TX_DATE, TX_TOTAL AS SALES, \n", 858 | " AVG(TX_TOTAL) OVER (\n", 859 | " ORDER BY TX_DATE\n", 860 | " ROWS BETWEEN 8 PRECEDING AND CURRENT ROW) AS MOVING_AVG\n", 861 | " FROM TOTALPROD\n", 862 | " ORDER BY TX_DATE\n", 863 | "\"\"\"\n", 864 | "df = %sql {sqlin}\n", 865 | "txdate= df['TX_DATE']\n", 866 | "sales = df['SALES']\n", 867 | "avg = df['MOVING_AVG']\n", 868 | "\n", 869 | "plt.xlabel(\"Week of Year\", fontsize=12);\n", 870 | "plt.ylabel(\"Sales\", fontsize=12);\n", 871 | "plt.suptitle(\"Weekly Sales and Moving Average\", fontsize=20);\n", 872 | "plt.plot(txdate, sales, 'r');\n", 873 | "plt.plot(txdate, avg, 'b');\n", 874 | "plt.show();" 875 | ] 876 | }, 877 | { 878 | "cell_type": "markdown", 879 | "metadata": {}, 880 | "source": [ 881 | "### Top Products Sales by State" 882 | ] 883 | }, 884 | { 885 | "cell_type": "code", 886 | "execution_count": null, 887 | "metadata": {}, 888 | "outputs": [], 889 | "source": [ 890 | "import seaborn as sns\n", 891 | "cm = sns.light_palette(\"green\", as_cmap=True)\n", 892 | "sqlin = \\\n", 893 | "\"\"\"\n", 894 | "WITH \n", 895 | " SALES(product, state, quantity) AS ( \n", 896 | " SELECT PRODUCT_NAME, STATE, SUM(PRODUCT_QTY)\n", 897 | " from TRANSACTIONS\n", 898 | " GROUP BY PRODUCT_NAME, STATE\n", 899 | " ),\n", 900 | " MAXSALES(state, total) AS (\n", 901 | " SELECT STATE, MAX(quantity)\n", 902 | " FROM SALES\n", 903 | " GROUP BY STATE\n", 904 | " )\n", 905 | "SELECT s.state, s.product, m.total \n", 906 | " FROM SALES s, MAXSALES m\n", 907 | "WHERE\n", 908 | " s.state = m.state and\n", 909 | " s.quantity = m.total\n", 910 | "ORDER BY m.total desc\n", 911 | "\"\"\"\n", 912 | "result = %sql {sqlin}\n", 913 | "result.style.background_gradient(cmap=cm)" 914 | ] 915 | }, 916 | { 917 | "cell_type": "markdown", 918 | "metadata": {}, 919 | "source": [ 920 | "#### Credits: IBM 2018, George Baklarz [baklarz@ca.ibm.com]" 921 | ] 922 | } 923 | ], 924 | "metadata": { 925 | "kernelspec": { 926 | "display_name": "Python 3", 927 | "language": "python", 928 | "name": "python3" 929 | }, 930 | "language_info": { 931 | "codemirror_mode": { 932 | "name": "ipython", 933 | "version": 3 934 | }, 935 | "file_extension": ".py", 936 | "mimetype": "text/x-python", 937 | "name": "python", 938 | "nbconvert_exporter": "python", 939 | "pygments_lexer": "ipython3", 940 | "version": "3.6.1" 941 | } 942 | }, 943 | "nbformat": 4, 944 | "nbformat_minor": 2 945 | } 946 | -------------------------------------------------------------------------------- /Db2 Calling Stored Procedures.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Db2 Jupyter: Calling Stored Procedures" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Using the `%sql` magic command to call stored procedures is possible but requires some understanding of how the data and any answer sets are returned to the notebook. The following notebook will take you through the steps required to communicate with stored procedures." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "%run db2.ipynb" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "### Supported Stored Procedures\n", 31 | "The `%sql` command supports calls to stored procedures, but there are limitations to how it is used. The following restrictions apply:\n", 32 | "* Only one answer set (cursor) is available from the stored procedure\n", 33 | "* Accessing Output values from the procedure requires the use of the -r (raw) flag\n", 34 | "* Output is available either as a pandas dataframe, or as an array" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "We will start with a simple example. The following stored procedure returns the first and last name of all employees. There are no parameters passed to the procedure. \n", 42 | "\n", 43 | "Some important points about the stored procedure itself:\n", 44 | "* The procedure must declare that there are answer set with the `DYNAMIC RESULTS SETS` clause\n", 45 | "* A cursor must be declared and then left open\n", 46 | "\n", 47 | "If you create more than one cursor, only the first one will get processed by the routine." 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": null, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "%%sql -d\n", 57 | "create or replace procedure showemp\n", 58 | "DYNAMIC RESULT SETS 1\n", 59 | "begin\n", 60 | " DECLARE c1 CURSOR WITH RETURN TO CALLER FOR\n", 61 | " SELECT FIRSTNME, LASTNAME FROM EMPLOYEE;\n", 62 | " OPEN c1;\n", 63 | "end@" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "Calling the stored procedure is done using the standard SQL CALL procname(arguments) format. There are specific rules for supplying arguments to a stored procedure:\n", 71 | "* The `CALL` command is supported in a `%sql` statement only. It cannot be used as part of a `%%sql` block.\n", 72 | "* Null arguments must use the `null` keyword, rather than the Python `None` equivalent.\n", 73 | "* Brackets `()` are not required for stored procedures that have no arguments.\n", 74 | "\n", 75 | "The next statement will execute the stored procedure and display the results by default." 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": null, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "%sql CALL SHOWEMP" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "Adding the `-a` (all output) flag to a stored procedure call will display all results. This is the same behavior when running regular SQL statements. If you want to assigned the result to a variable, you only need to add an assignment statement to the SQL." 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [ 100 | "allemps = %sql CALL SHOWEMP\n", 101 | "allemps.head(5)" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "The use of the `-r` (raw output) flag will force the output to be converted into a two-dimensional array. The first line of th array will contain the column names, while the remainder will have the results in it. This format is useful if you want to write an application in Python to manipulate the data." 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "%sql -r CALL SHOWEMP" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "Next we will change the stored procedure to accept a single argument which is the employee number. The employee number is supplied to the stored procedure and one record is returned with all of the details of the employee." 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [ 133 | "%%sql -d\n", 134 | "create or replace procedure showemp(in inempno char(6))\n", 135 | "DYNAMIC RESULT SETS 1\n", 136 | "begin\n", 137 | " DECLARE c1 CURSOR WITH RETURN TO CALLER FOR\n", 138 | " SELECT * FROM EMPLOYEE WHERE EMPNO=inempno;\n", 139 | " OPEN c1;\n", 140 | "end@" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "When calling this procedure we must supply the employee number as a 6 character field. The behaviour will be similar to the first example with no parameters." 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": {}, 154 | "outputs": [], 155 | "source": [ 156 | "%sql CALL SHOWEMP('000010')" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "If you use the standard `%sql` command without the `-r` parameter, only the answer set will be returned or displayed. An assignment statement with `-r` is slightly more complex:\n", 164 | "* The first value will be the array of results\n", 165 | "* Subsequent values will be the input/output parameters of the stored procedure\n", 166 | "The next SQL statement will assign the results to a single variable." 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": { 173 | "collapsed": true 174 | }, 175 | "outputs": [], 176 | "source": [ 177 | "oneemp = %sql -r CALL SHOWEMP('000010')" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "The first value in the oneemp array (`oneemp[0]`) is the answer set array, while the subsequent values are the parameters that are passed (or returned) by the stored procedure. `oneemp[1]` should be equal to the employee number we were searching for." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "print(oneemp[1])" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "The use of the `-r` flag becomes mandatory when you are retrieving an answer set from a stored procedure, and also need to access the return results of the arguments to the stored procedure. SQL stored procedures can have input, output, and input/output values. These values are returned back via the `%sql` command but can only be accessed when you use the `-r` flag.\n", 201 | "\n", 202 | "The following stored procedure will return the employees in a department and also a count of the records found." 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": null, 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [ 211 | "%%sql -d\n", 212 | "create or replace procedure showdept(in indeptno char(3), out rowcount int)\n", 213 | "DYNAMIC RESULT SETS 1\n", 214 | "begin\n", 215 | " DECLARE c1 CURSOR WITH RETURN TO CALLER FOR\n", 216 | " SELECT * FROM EMPLOYEE WHERE WORKDEPT=indeptno;\n", 217 | " set rowcount = (SELECT COUNT(*) FROM EMPLOYEE WHERE WORKDEPT=indeptno);\n", 218 | " OPEN c1;\n", 219 | "end@" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "A normal call to this stored procedure (no flags) will return the result set. Note we must supply the second parameter here but do not supply a value since it is an output value. The `null` keyword must be used." 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "%sql CALL SHOWDEPT('E11',null)" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": {}, 241 | "source": [ 242 | "To access the value of the rowcount, the `-r` flag must be used and the results assigned to a variable." 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": null, 248 | "metadata": { 249 | "collapsed": true 250 | }, 251 | "outputs": [], 252 | "source": [ 253 | "alldept = %sql -r CALL SHOWDEPT('E11',null)" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "The first value of the `alldept` will the result set as an array. The first value will be the department that we requested, and the second will be the rowcount." 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [ 269 | "print(alldept[2])" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "An alternative way of accessing the results is to create an assignment statement with the answer set and all of the parameters supplied. The following SQL will assign the answer set and parameters directly to variables rather than an array." 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "answer, dept, count = %sql -r CALL SHOWDEPT('E11',null)\n", 286 | "print(count)" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "Stored procedures without answer sets will always return the parameters or None if nothing was supplied. This simple stored procedure increments the number that was sent to it." 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": {}, 300 | "outputs": [], 301 | "source": [ 302 | "%%sql -d\n", 303 | "create or replace procedure oneparm (inout r INT)\n", 304 | "begin\n", 305 | " SET r = r + 1;\n", 306 | "end@" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "Remember that there is no result set with the stored procedure so the parameters are returned." 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": null, 319 | "metadata": {}, 320 | "outputs": [], 321 | "source": [ 322 | "%sql call oneparm(1)" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "### Stored Procedure Parameters\n", 330 | "Stored procedures can have variables passed as parameters. To pass a parameter to a stored procedure, place a colon (`:`) in front of the variable name:\n", 331 | "```\n", 332 | "%sql CALL SHOWDEPT(:deptno,null)\n", 333 | "```\n", 334 | "The python variable `deptno` will be substituted into the CALL statement when it is executed. " 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": null, 340 | "metadata": {}, 341 | "outputs": [], 342 | "source": [ 343 | "%%sql -d\n", 344 | "create or replace procedure deptcount(in indeptno char(3), out rowcount int)\n", 345 | "begin\n", 346 | " set rowcount = (SELECT COUNT(*) FROM EMPLOYEE WHERE WORKDEPT=indeptno);\n", 347 | "end@" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "The following code will go through each deparment number and get a count of employees by calling the stored procedure. Note that an answer set return using raw format always returns an array of rows, and each row itself is made up of an array of columns. The code needs to iterate across the rows and then across the columns. The first row of the answer set is the column names, so we skip that by using `depts[1:]` as the starting point. " 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": null, 360 | "metadata": {}, 361 | "outputs": [], 362 | "source": [ 363 | "depts = %sql -r SELECT DISTINCT WORKDEPT FROM EMPLOYEE\n", 364 | "for dept in depts[1:]:\n", 365 | " deptno = dept[0]\n", 366 | " deptparm, count = %sql -r CALL DEPTCOUNT(:deptno,null)\n", 367 | " print(\"Department: %s Employees: %d\" % (deptno,count))" 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": {}, 373 | "source": [ 374 | "## System Stored Procedures\n", 375 | "The Db2 System Stored procedures work using this syntax except for procedures that return binary XML output. At this point in time there is a limitation in retrieving this data using the Python Db2 API calls that are available. An example of a working procedure call is the REORGCHK procedure." 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": null, 381 | "metadata": {}, 382 | "outputs": [], 383 | "source": [ 384 | "%sql CALL SYSPROC.REORGCHK_TB_STATS('T','DB2INST1.EMPLOYEE')" 385 | ] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": {}, 390 | "source": [ 391 | "#### Credits: IBM 2018, George Baklarz [baklarz@ca.ibm.com]" 392 | ] 393 | } 394 | ], 395 | "metadata": { 396 | "kernelspec": { 397 | "display_name": "Python 3", 398 | "language": "python", 399 | "name": "python3" 400 | }, 401 | "language_info": { 402 | "codemirror_mode": { 403 | "name": "ipython", 404 | "version": 3 405 | }, 406 | "file_extension": ".py", 407 | "mimetype": "text/x-python", 408 | "name": "python", 409 | "nbconvert_exporter": "python", 410 | "pygments_lexer": "ipython3", 411 | "version": "3.6.1" 412 | } 413 | }, 414 | "nbformat": 4, 415 | "nbformat_minor": 2 416 | } 417 | -------------------------------------------------------------------------------- /Db2 Jupyter Macros.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Db2 Macros\n", 8 | "The `%sql` command also allows the use of macros. Macros are used to substitute text into SQL commands that you execute. Macros substitution is done before any SQL is executed. This allows you to create macros that include commonly used SQL commands or parameters rather than having to type them in. Before using any macros, we must make sure we have loaded the Db2 extensions." 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": null, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "%run db2.ipynb" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "### Macro Basics\n", 25 | "A Macro command begins with a percent sign (`%` similar to the `%sql` magic command) and can be found anywhere within a `%sql` line or `%%sql` block. Macros must be separated from other text in the SQL with a space. \n", 26 | "\n", 27 | "To define a macro, the `%%sql macro ` command is used. The body of the macro is found in the cell below the definition of the macro. This simple macro called EMPTABLE will substitute a SELECT statement into a SQL block." 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": null, 33 | "metadata": { 34 | "collapsed": true 35 | }, 36 | "outputs": [], 37 | "source": [ 38 | "%%sql macro emptable\n", 39 | "select * from employee" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "The name of the macro follows the `%%sql macro` command and is case sensitive. To use the macro, we can place it anywhere in the `%sql` block. This first example uses it by itself." 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "%sql %emptable" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "The actual SQL that is generated is not shown by default. If you do want to see the SQL that gets generated, you can use the `-e` (echo) option to display the final SQL statement. The following example will display the generated SQL. Note that the echo setting is only used to display results for the current cell that is executing." 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "%%sql -e\n", 72 | "%emptable" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "Since we can use the `%emptable` anywhere in our SQL, we can add additional commands around it. In this example we add some logic to the select statement." 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": null, 85 | "metadata": {}, 86 | "outputs": [], 87 | "source": [ 88 | "%%sql\n", 89 | "%emptable\n", 90 | "where empno = '000010'" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "Macros can also have parameters supplied to them. The parameters are included after the name of the macro. Here is a simple macro which will use the first parameter as the name of the column we want returned from the EMPLOYEE table." 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "metadata": { 104 | "collapsed": true 105 | }, 106 | "outputs": [], 107 | "source": [ 108 | "%%sql macro emptable\n", 109 | "SELECT {1} FROM EMPLOYEE" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "This example illustrates two concepts. The `MACRO` command will replace any existing macro with the same name. Since we already have an emptable macro, the macro body will be replaced with this code. In addition, macros only exist for the duration of your notebook. If you create another Jupyter notebook, it will not contain any macros that you may have created. If there are macros that you want to share across notebooks, you should create a separate notebook and place all of the macro definitions in there. Then you can include these macros by executing the `%run` command using the name of the notebook that contains the macros.\n", 117 | "\n", 118 | "The following SQL shows the use of the macro with parameters." 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "%%sql\n", 128 | "%emptable(lastname)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "The remainder of this notebook will explore the advanced features of macros." 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "## Macro Parameters\n", 143 | "Macros can have up to 9 parameters supplied to them. The parameters are numbered from 1 to 9, left to right in the argument list for the macro. For instance, the following macro has 5 paramters:\n", 144 | "```\n", 145 | "%emptable(lastname,firstnme,salary,bonus,'000010')\n", 146 | "```\n", 147 | "Parameters are separated by commas, and can contain strings as shown using single or double quotes. When the parameters are used within a macro, the quotes are not included as part of the string. If you do want to pass the quotes as part of the parameter, use square brackets **[]** around the string. For instance, the following parameter will not have quotes passed to the macro:\n", 148 | "\n", 149 | "```python\n", 150 | "%sql %abc('no quotes')\n", 151 | "```\n", 152 | "\n", 153 | "To send the string with quotes, you could surround the parameter with other quotes **`\"'hello'\"`** or use the following technique if you use multiple quotes in your string:\n", 154 | "\n", 155 | "```python\n", 156 | "%sql %abc (['quotes'])\n", 157 | "```\n", 158 | "\n", 159 | "To use a parameter within your macro, you enclose the parameter number with braces `{}`. The next command will illustrate the use of the five parameters." 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": { 166 | "collapsed": true 167 | }, 168 | "outputs": [], 169 | "source": [ 170 | "%%sql macro emptable\n", 171 | "display on\n", 172 | "SELECT {1},{2},{3},{4} \n", 173 | "FROM EMPLOYEE\n", 174 | "WHERE EMPNO = '{5}'" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "Note that the `EMPNO` field is a character field in the `EMPLOYEE` table. Even though the employee number was supplied as a string, the quotes are not included in the parameter. The macro places quotes around the parameter `{5}` so that it is properly used in the SQL statement. The other feature of this macro is that the display (on) command is part of the macro body so the generated SQL will always be displayed." 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "%sql %emptable(lastname,firstnme,salary,bonus,'000010')" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "We can modify the macro to assume that the parameters will include the quotes in the string." 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": null, 203 | "metadata": { 204 | "collapsed": true 205 | }, 206 | "outputs": [], 207 | "source": [ 208 | "%%sql macro emptable\n", 209 | "SELECT {1},{2},{3},{4} \n", 210 | "FROM EMPLOYEE\n", 211 | "WHERE EMPNO = {5}" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "We just have to make sure that the quotes are part of the parameter now." 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "%sql -e %emptable(lastname,firstnme,salary,bonus,\"'000010'\")" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "We could use the square brackets as an alternative way of passing the parameter." 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": null, 240 | "metadata": {}, 241 | "outputs": [], 242 | "source": [ 243 | "%sql -e %emptable(lastname,firstnme,salary,bonus,['000010'])" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "Parameters can also be named in a macro. To name an input value, the macro needs to use the format:\n", 251 | "```\n", 252 | "field=value\n", 253 | "```\n", 254 | "For instance, the following macro call will have 2 numbered parameters and one named parameter:\n", 255 | "```\n", 256 | "%showemp(firstnme,lastname,logic=\"WHERE EMPNO='000010'\")\n", 257 | "```\n", 258 | "From within the macro the parameter count would be 2 and the value for parameter 1 is `firstnme`, and the value for parameter 2 is `lastname`. Since we have a named parameter, it is not included in the list of numbered parameters. In fact, the following statement is equivalent since unnamed parameters are numbered in the order that they are found in the macro, ignoring any named parameters that are found:\n", 259 | "```\n", 260 | "%showemp(firstnme,logic=\"WHERE EMPNO='000010'\",lastname)\n", 261 | "```\n", 262 | "The following macro illustrates this feature." 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": null, 268 | "metadata": { 269 | "collapsed": true 270 | }, 271 | "outputs": [], 272 | "source": [ 273 | "%%sql macro showemp\n", 274 | "SELECT {1},{2} FROM EMPLOYEE\n", 275 | " {logic}" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": null, 281 | "metadata": {}, 282 | "outputs": [], 283 | "source": [ 284 | "%sql %showemp(firstnme,lastname,logic=\"WHERE EMPNO='000010'\")" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "%sql %showemp(firstnme,logic=\"WHERE EMPNO='000010'\",lastname)" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "Named parameters are useful when there are many options within the macro and you don't want to keep track of which position it is in. In addition, if you have a variable number of parameters, you should use named parameters for the fixed (required) parameters and numbered parameters for the optional ones." 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": {}, 306 | "source": [ 307 | "## Macro Coding Overview\n", 308 | "Macros can contain any type of text, including SQL commands. In addition to the text, macros can also contain the following keywords:\n", 309 | "\n", 310 | "* echo - Display a message\n", 311 | "* exit - Exit the macro immediately\n", 312 | "* if/else/endif - Conditional logic\n", 313 | "* var - Set a variable\n", 314 | "* display - Turn the display of the final text on\n", 315 | "\n", 316 | "The only restriction with macros is that macros cannot be nested. This means I can't call a macro from within a macro. The sections below explain the use of each of these statement types." 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "### Echo Option\n", 324 | "The `-e` option will result in the final SQL being display after the macro substitution is done. \n", 325 | "```\n", 326 | "%%sql -e\n", 327 | "%showemp(...)\n", 328 | "```" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": null, 334 | "metadata": { 335 | "collapsed": true 336 | }, 337 | "outputs": [], 338 | "source": [ 339 | "%%sql macro showdisplay\n", 340 | "SELECT * FROM EMPLOYEE FETCH FIRST ROW ONLY" 341 | ] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "Using the `-e` flag will display the final SQL that is run." 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": null, 353 | "metadata": {}, 354 | "outputs": [], 355 | "source": [ 356 | "%sql -e %showdisplay" 357 | ] 358 | }, 359 | { 360 | "cell_type": "markdown", 361 | "metadata": {}, 362 | "source": [ 363 | "If we remove the `-e` option, the final SQL will not be shown." 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": null, 369 | "metadata": {}, 370 | "outputs": [], 371 | "source": [ 372 | "%sql %showdisplay" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "metadata": {}, 378 | "source": [ 379 | "### Exit Command\n", 380 | "The `exit` command will terminate the processing within a macro and not run the generated SQL. You would use this when a condition is not met within the macro (like a missing parameter). " 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": null, 386 | "metadata": { 387 | "collapsed": true 388 | }, 389 | "outputs": [], 390 | "source": [ 391 | "%%sql macro showexit\n", 392 | "echo This message gets shown\n", 393 | "SELECT * FROM EMPLOYEE FETCH FIRST ROW ONLY\n", 394 | "exit\n", 395 | "echo This message does not get shown" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "The macro that was defined will not show the second statement, nor will it execute the SQL that was defined in the macro body." 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": null, 408 | "metadata": {}, 409 | "outputs": [], 410 | "source": [ 411 | "%sql %showexit" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "metadata": {}, 417 | "source": [ 418 | "### Echo Command\n", 419 | "As you already noticed in the previous example, the `echo` command will display information on the screen. Any text following the command will have variables substituted and then displayed with a green box surrounding it. The following code illustates the use of the command." 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": null, 425 | "metadata": { 426 | "collapsed": true 427 | }, 428 | "outputs": [], 429 | "source": [ 430 | "%%sql macro showecho\n", 431 | "echo Here is a message\n", 432 | "echo Two lines are shown" 433 | ] 434 | }, 435 | { 436 | "cell_type": "markdown", 437 | "metadata": {}, 438 | "source": [ 439 | "The echo command will show each line as a separate box." 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "execution_count": null, 445 | "metadata": {}, 446 | "outputs": [], 447 | "source": [ 448 | "%sql %showecho" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "If you want to have a message go across multiple lines use the `
` to start a new line." 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": null, 461 | "metadata": { 462 | "collapsed": true 463 | }, 464 | "outputs": [], 465 | "source": [ 466 | "%%sql macro showecho\n", 467 | "echo Here is a paragraph.
And a final paragraph." 468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": null, 473 | "metadata": {}, 474 | "outputs": [], 475 | "source": [ 476 | "%sql %showecho" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "### Var Command\n", 484 | "The var (variable) command sets a macro variable to a value. A variable is referred to in the macro script using curly braces `{name}`. By default the arguments that are used in the macro call are assigned the variable names `{1}` to `{9}`. If you use a named argument (option=\"value\") in the macro call, a variable called `{option}` will contain the value within the macro.\n", 485 | "\n", 486 | "To set a variable within a macro you would use the `var` command:\n", 487 | "```\n", 488 | "var name value\n", 489 | "```\n", 490 | "The variable name can be any name as long as it only includes letters, numbers, underscore `_` and `$`. Variable names are case sensitive so `{a}` and `{A}` are different. When the macro finishes executing, the contents of the variables will be lost. If you do want to keep a variable between macros, you should start the name of the variable with a `$` sign:\n", 491 | "```\n", 492 | "var $name value\n", 493 | "```\n", 494 | "This variable will persist between macro calls." 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": null, 500 | "metadata": { 501 | "collapsed": true 502 | }, 503 | "outputs": [], 504 | "source": [ 505 | "%%sql macro initialize\n", 506 | "var $hello Hello There \n", 507 | "var hello You won't see this" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": null, 513 | "metadata": { 514 | "collapsed": true 515 | }, 516 | "outputs": [], 517 | "source": [ 518 | "%%sql macro runit\n", 519 | "echo The value of hello is *{hello}*\n", 520 | "echo {$hello}" 521 | ] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": {}, 526 | "source": [ 527 | "Calling `runit` will display the variable that was set in the first macro." 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": null, 533 | "metadata": {}, 534 | "outputs": [], 535 | "source": [ 536 | "%sql %initialize\n", 537 | "%sql %runit" 538 | ] 539 | }, 540 | { 541 | "cell_type": "markdown", 542 | "metadata": {}, 543 | "source": [ 544 | "A variable can be converted to uppercase by placing the `^` beside the variable name or number. " 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": null, 550 | "metadata": { 551 | "collapsed": true 552 | }, 553 | "outputs": [], 554 | "source": [ 555 | "%%sql macro runit\n", 556 | "echo The first parameter is {^1}" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": null, 562 | "metadata": {}, 563 | "outputs": [], 564 | "source": [ 565 | "%sql %runit(Hello There)" 566 | ] 567 | }, 568 | { 569 | "cell_type": "markdown", 570 | "metadata": {}, 571 | "source": [ 572 | "The string following the variable name can include quotes and these will not be removed. Only quotes that are supplied in a parameter to a macro will have the quotes removed." 573 | ] 574 | }, 575 | { 576 | "cell_type": "code", 577 | "execution_count": null, 578 | "metadata": { 579 | "collapsed": true 580 | }, 581 | "outputs": [], 582 | "source": [ 583 | "%%sql macro runit\n", 584 | "var hello This is a long string without quotes\n", 585 | "var hello2 'This is a long string with quotes'\n", 586 | "echo {hello}
{hello2}" 587 | ] 588 | }, 589 | { 590 | "cell_type": "code", 591 | "execution_count": null, 592 | "metadata": {}, 593 | "outputs": [], 594 | "source": [ 595 | "%sql %runit" 596 | ] 597 | }, 598 | { 599 | "cell_type": "markdown", 600 | "metadata": {}, 601 | "source": [ 602 | "When passing parameters to a macro, the program will automatically create variables based on whether they are positional parameters (1, 2, ..., n) or named parameters. The following macro will be used to show how parameters are passed to the routine." 603 | ] 604 | }, 605 | { 606 | "cell_type": "code", 607 | "execution_count": null, 608 | "metadata": { 609 | "collapsed": true 610 | }, 611 | "outputs": [], 612 | "source": [ 613 | "%%sql macro showvar\n", 614 | "echo parm1={1}
parm2={2}
message={message}" 615 | ] 616 | }, 617 | { 618 | "cell_type": "markdown", 619 | "metadata": {}, 620 | "source": [ 621 | "Calling the macro will show how the variable names get assigned and used." 622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "execution_count": null, 627 | "metadata": {}, 628 | "outputs": [], 629 | "source": [ 630 | "%sql %showvar(parameter 1, another parameter,message=\"Hello World\")" 631 | ] 632 | }, 633 | { 634 | "cell_type": "markdown", 635 | "metadata": {}, 636 | "source": [ 637 | "If you pass an empty value (or if a variable does not exist), a \"null\" value will be shown." 638 | ] 639 | }, 640 | { 641 | "cell_type": "code", 642 | "execution_count": null, 643 | "metadata": {}, 644 | "outputs": [], 645 | "source": [ 646 | "%sql %showvar(1,,message=\"Hello World\")" 647 | ] 648 | }, 649 | { 650 | "cell_type": "markdown", 651 | "metadata": {}, 652 | "source": [ 653 | "An empty string also returns a null value." 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "execution_count": null, 659 | "metadata": {}, 660 | "outputs": [], 661 | "source": [ 662 | "%sql %showvar(1,2,message=\"\")" 663 | ] 664 | }, 665 | { 666 | "cell_type": "markdown", 667 | "metadata": {}, 668 | "source": [ 669 | "Finally, any string that is supplied to the macro will not include the quotes in the variable. The Hello World string will not have quotes when it is displayed:" 670 | ] 671 | }, 672 | { 673 | "cell_type": "code", 674 | "execution_count": null, 675 | "metadata": {}, 676 | "outputs": [], 677 | "source": [ 678 | "%sql %showvar(1,2,message=\"Hello World\")" 679 | ] 680 | }, 681 | { 682 | "cell_type": "markdown", 683 | "metadata": {}, 684 | "source": [ 685 | "You need to supply the quotes in the script or macro when using variables since quotes are stripped from any strings that are supplied." 686 | ] 687 | }, 688 | { 689 | "cell_type": "code", 690 | "execution_count": null, 691 | "metadata": { 692 | "collapsed": true 693 | }, 694 | "outputs": [], 695 | "source": [ 696 | "%%sql macro showvar\n", 697 | "echo parm1={1}
parm2={2}
message='{message}'" 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": null, 703 | "metadata": {}, 704 | "outputs": [], 705 | "source": [ 706 | "%sql %showvar(1,2,message=\"Hello World\")" 707 | ] 708 | }, 709 | { 710 | "cell_type": "markdown", 711 | "metadata": {}, 712 | "source": [ 713 | "The count of the total number of parameters passed is found in the `{argc}` variable. You can use this variable to decide whether or not the user has supplied the proper number of arguments or change which code should be executed." 714 | ] 715 | }, 716 | { 717 | "cell_type": "code", 718 | "execution_count": null, 719 | "metadata": { 720 | "collapsed": true 721 | }, 722 | "outputs": [], 723 | "source": [ 724 | "%%sql macro showvar\n", 725 | "echo The number of unnamed parameters is {argc}. The where clause is *{where}*." 726 | ] 727 | }, 728 | { 729 | "cell_type": "markdown", 730 | "metadata": {}, 731 | "source": [ 732 | "Unnamed parameters are included in the count of arguments while named parameters are ignored." 733 | ] 734 | }, 735 | { 736 | "cell_type": "code", 737 | "execution_count": null, 738 | "metadata": {}, 739 | "outputs": [], 740 | "source": [ 741 | "%sql %showvar(1,2,option=nothing,3,4,where=)" 742 | ] 743 | }, 744 | { 745 | "cell_type": "markdown", 746 | "metadata": {}, 747 | "source": [ 748 | "### If/Else/Endif Command\n", 749 | "If you need to add conditional logic to your macro then you should use the `if/else/endif` commands. The format of the `if` statement is:\n", 750 | "```\n", 751 | "if variable condition value\n", 752 | " statements\n", 753 | "else\n", 754 | " statements\n", 755 | "endif\n", 756 | "```\n", 757 | "The else portion is optional, but the block must be closed with the `endif` command. If statements can be nested up to 9 levels deep:\n", 758 | "```\n", 759 | "if condition 1\n", 760 | " if condition 2\n", 761 | " statements\n", 762 | " else\n", 763 | " if condition 3\n", 764 | " statements\n", 765 | " end if \n", 766 | " endif\n", 767 | "endif\n", 768 | "```\n", 769 | "If the condition in the if clause is true, then anything following the if statement will be executed and included in the final SQL statement. For instance, the following code will create a SQL statement based on the value of parameter 1:\n", 770 | "```\n", 771 | "if {1} = null\n", 772 | " SELECT * FROM EMPLOYEE\n", 773 | "else\n", 774 | " SELECT {1} FROM EMPLOYEE\n", 775 | "endif\n", 776 | "```" 777 | ] 778 | }, 779 | { 780 | "cell_type": "markdown", 781 | "metadata": {}, 782 | "source": [ 783 | "#### Conditions\n", 784 | "The `if` statement requires a condition to determine whether or not the block should be executed. The condition uses the following format:\n", 785 | "```\n", 786 | "if {variable} condition {variable} | constant | null\n", 787 | "```\n", 788 | "`Variable` can be a number from 1 to 9 which represents the argument in the macro list. So `{1}` refers to the first argument. The variable can also be the name of a named parameter or global variable.\n", 789 | "\n", 790 | "The condition is one of the following comparison operators:\n", 791 | "- `=`, `==`: Equal to\n", 792 | "- `<`: Less than\n", 793 | "- `>`: Greater than\n", 794 | "- `<=`,`=<`: Less than or equal to\n", 795 | "- `>=`, `=>`: Greater than or equal to\n", 796 | "- `!=`, `<>` : Not equal to\n", 797 | "\n", 798 | "The variable or constant will have quotes stripped away before doing the comparison. If you are testing for the existence of a variable, or to check if a variable is empty, use the keyword `null`." 799 | ] 800 | }, 801 | { 802 | "cell_type": "code", 803 | "execution_count": null, 804 | "metadata": { 805 | "collapsed": true 806 | }, 807 | "outputs": [], 808 | "source": [ 809 | "%%sql macro showif\n", 810 | "if {argc} = 0\n", 811 | " echo No parameters supplied\n", 812 | " if {option} <> null\n", 813 | " echo The optional parameter option was set: {option}\n", 814 | " endif\n", 815 | "else\n", 816 | " if {argc} = \"1\"\n", 817 | " echo One parameter was supplied\n", 818 | " else\n", 819 | " echo More than one parameter was supplied: {argc}\n", 820 | " endif\n", 821 | "endif" 822 | ] 823 | }, 824 | { 825 | "cell_type": "markdown", 826 | "metadata": {}, 827 | "source": [ 828 | "Running the previous macro with no parameters will check to see if the option keyword was used." 829 | ] 830 | }, 831 | { 832 | "cell_type": "code", 833 | "execution_count": null, 834 | "metadata": {}, 835 | "outputs": [], 836 | "source": [ 837 | "%sql %showif" 838 | ] 839 | }, 840 | { 841 | "cell_type": "markdown", 842 | "metadata": {}, 843 | "source": [ 844 | "Now include the optional parameter." 845 | ] 846 | }, 847 | { 848 | "cell_type": "code", 849 | "execution_count": null, 850 | "metadata": {}, 851 | "outputs": [], 852 | "source": [ 853 | "%sql %showif(option=\"Yes there is an option\")" 854 | ] 855 | }, 856 | { 857 | "cell_type": "markdown", 858 | "metadata": {}, 859 | "source": [ 860 | "Finally, issue the macro with multiple parameters." 861 | ] 862 | }, 863 | { 864 | "cell_type": "code", 865 | "execution_count": null, 866 | "metadata": {}, 867 | "outputs": [], 868 | "source": [ 869 | "%sql %showif(Here,are,a,number,of,parameters)" 870 | ] 871 | }, 872 | { 873 | "cell_type": "markdown", 874 | "metadata": {}, 875 | "source": [ 876 | "One additional option is available for variable substitution. If the first character of the variable name or parameter number is the `^` symbol, it will uppercase the entire string." 877 | ] 878 | }, 879 | { 880 | "cell_type": "code", 881 | "execution_count": null, 882 | "metadata": { 883 | "collapsed": true 884 | }, 885 | "outputs": [], 886 | "source": [ 887 | "%%sql macro showif\n", 888 | "if {option} <> null\n", 889 | " echo The optional parameter option was set: {^option}\n", 890 | "endif" 891 | ] 892 | }, 893 | { 894 | "cell_type": "code", 895 | "execution_count": null, 896 | "metadata": {}, 897 | "outputs": [], 898 | "source": [ 899 | "%sql %showif(option=\"Yes there is an option\")" 900 | ] 901 | }, 902 | { 903 | "cell_type": "markdown", 904 | "metadata": {}, 905 | "source": [ 906 | "#### Credits: IBM 2018, George Baklarz [baklarz@ca.ibm.com]" 907 | ] 908 | } 909 | ], 910 | "metadata": { 911 | "kernelspec": { 912 | "display_name": "Python 3", 913 | "language": "python", 914 | "name": "python3" 915 | }, 916 | "language_info": { 917 | "codemirror_mode": { 918 | "name": "ipython", 919 | "version": 3 920 | }, 921 | "file_extension": ".py", 922 | "mimetype": "text/x-python", 923 | "name": "python", 924 | "nbconvert_exporter": "python", 925 | "pygments_lexer": "ipython3", 926 | "version": "3.6.1" 927 | } 928 | }, 929 | "nbformat": 4, 930 | "nbformat_minor": 2 931 | } 932 | -------------------------------------------------------------------------------- /Db2 Row and Column Access Control.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "hideCode": false, 7 | "hidePrompt": false 8 | }, 9 | "source": [ 10 | "# Db2 Row and Column Access Control" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": { 16 | "hideCode": false, 17 | "hidePrompt": false 18 | }, 19 | "source": [ 20 | "Row permissions and column masks are two new database concepts introduced in DB2 10 that address the shortcomings of traditional security methods. These new features act as a second layer of security that complements the table privileges security model available in prior releases of DB2.\n", 21 | "\n", 22 | "One advantage of row permissions and column masks is that no database user is automatically exempted from them — not even users with the DATAACCESS authority. The ability to manage row permissions and column masks within a database is vested solely in a user with SECADM, thereby ensuring that users with DATAACCESS authority can no longer freely access all data in the database.\n", 23 | " \n", 24 | "Another key advantage is that row permissions and column masks ensure that table data is protected regardless of how the SQL accesses the table, be it through an application, through ad hoc query tools, or through report generation tools.\n", 25 | " \n", 26 | "Finally, and perhaps most significantly, row permissions and column masks require no application changes; RCAC is transparent to existing applications." 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": null, 32 | "metadata": { 33 | "hideCode": false, 34 | "hideOutput": false, 35 | "hidePrompt": false 36 | }, 37 | "outputs": [], 38 | "source": [ 39 | "%run db2.ipynb" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "We populate the database with the EMPLOYEE and DEPARTMENT tables so that we can run the various examples." 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "%sql -sampledata" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": { 61 | "hideCode": false, 62 | "hidePrompt": false 63 | }, 64 | "source": [ 65 | "## Health Care Scenario\n", 66 | "\n", 67 | "We will use a sample scenario based on the Health-care industry.\n", 68 | " \n", 69 | "Data in a hospital needs to be provided to a number of individuals,including other organizations like insurance companies, clearing house etc. Each of these companies have their own employees, and there are government regulations in place dictating who can see what data or parts of patient data.\n", 70 | "\n", 71 | "As a result security policies must be in place to ensure the rules are being met.\n", 72 | "\n", 73 | "Across the scenario there will be different groups of individuals that access the database and there must be security in place to ensure each group or individual only sees the data they are allowed to see.\n", 74 | "\n", 75 | "This SQL will create the base PATIENT table and populate it with information." 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": null, 81 | "metadata": { 82 | "hideCode": false, 83 | "hidePrompt": false 84 | }, 85 | "outputs": [], 86 | "source": [ 87 | "%%sql -q\n", 88 | "DROP TABLE HC.PATIENTS;\n", 89 | "\n", 90 | "CREATE TABLE HC.PATIENTS\n", 91 | " (\n", 92 | " SIN VARCHAR(11),\n", 93 | " USERID VARCHAR(8),\n", 94 | " NAME VARCHAR(8),\n", 95 | " ADDRESS VARCHAR(12),\n", 96 | " PHARMACY VARCHAR(12),\n", 97 | " ACCT_BALANCE DEC(9,2),\n", 98 | " PCP_ID VARCHAR(8)\n", 99 | " );\n", 100 | "\n", 101 | "INSERT INTO HC.PATIENTS VALUES\n", 102 | " ('123 551 234','MAX','Max','First St.','hypertension',89.7,'LEE'),\n", 103 | " ('123 589 812','MIKE','Mike','Long St.','diabetics',8.3,'JAMES'),\n", 104 | " ('123 119 856','SAM','Sam','Big St.','aspirin',12.5,'LEE'),\n", 105 | " ('123 191 454','DOUG','Doug','Good St.','influenza',7.68,'JAMES'),\n", 106 | " ('123 456 789','BOB','Bob','123 Some St.','hypertension',9,'LEE');\n", 107 | "\n", 108 | "SELECT * FROM HC.PATIENTS;" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": { 114 | "hideCode": false, 115 | "hidePrompt": false 116 | }, 117 | "source": [ 118 | "## Setting Permissions for Access\n", 119 | "The following permissions need to be defined for accessing the PATIENT data:\n", 120 | "\n", 121 | "- Patients can only access their own data\n", 122 | "- Physicians Can only access their own patients' data\n", 123 | "- Membership officers, Accounting, and Drug researchers can access all data\n", 124 | "- All other access is denied\n", 125 | "\n", 126 | "Normally SQL similar to the following is used to define who can access the data.\n", 127 | "```\n", 128 | "verify_role_for_user(SESSION_USER,'PATIENT') = 1\n", 129 | " AND patient.userid = SESSION_USER\n", 130 | "```\n", 131 | "Changing from one user to another requires a number of `SECADM` features and separate connections, so in order to simplify the demo we will use a `ROLE` table and a `UDF` to simulate this capability." 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "## Create ROLE table\n", 139 | "The ROLE table will simulate what ROLES does natively in DB2. The table contains two \n", 140 | "columns:\n", 141 | "\n", 142 | "- USERID - the name or ID of the user\n", 143 | "- ROLE - what their ROLE is \n", 144 | "\n", 145 | "Users can have more than one ROLE in this table." 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "%%sql\n", 155 | "DROP TABLE HC.ROLES;\n", 156 | "CREATE TABLE HC.ROLES\n", 157 | " (\n", 158 | " USERID VARCHAR(8),\n", 159 | " ROLE VARCHAR(10)\n", 160 | " );" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "We also need to create a `SESSION_USER` global variable that can be used to identify the \"current\" user. Normally you would just use the `SESSION_USER` variable in the rules, but since we don't want to use SECADM we need to fake the userid." 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": null, 173 | "metadata": {}, 174 | "outputs": [], 175 | "source": [ 176 | "%sql CREATE OR REPLACE VARIABLE HC.SESSION_USER VARCHAR(8);" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "The `HC.VERIFY_ROLE_FOR_USER` function will mimic what the `VERIFY_ROLE_FOR_USER`\n", 184 | "function does by returning a value of 1 if the user has that particular ROLE." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "%%sql -d\n", 194 | "CREATE OR REPLACE FUNCTION \n", 195 | " HC.VERIFY_ROLE_FOR_USER(UID VARCHAR(8), IN_ROLE VARCHAR(10))\n", 196 | " SECURED NO EXTERNAL ACTION DETERMINISTIC\n", 197 | " RETURNS INT\n", 198 | " BEGIN ATOMIC\n", 199 | " RETURN\n", 200 | " SELECT COUNT(*) FROM HC.ROLES H\n", 201 | " WHERE H.USERID = UID AND H.ROLE = IN_ROLE;\n", 202 | " END@" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "Create some ROLES for people in the Healthcare scenario." 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": null, 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "%%sql\n", 219 | "INSERT INTO HC.ROLES\n", 220 | " VALUES\n", 221 | " ('LEE','PCP'),('JAMES','PCP'),\n", 222 | " ('MAX','PATIENT'),('MIKE','PATIENT'),('SAM','PATIENT'),\n", 223 | " ('DOUG','PATIENT'),('BOB','PATIENT'),\n", 224 | " ('JOHN','ACCOUNTING'),\n", 225 | " ('TOM','MEMBERSHIP'),\n", 226 | " ('JANE','RESEARCH'),\n", 227 | " ('FRED','DBA');" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "Now we can test to see if a user has a particular role. This first test checks to see the Dr. Lee is a PCP (Primary Care Provider)." 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": null, 240 | "metadata": {}, 241 | "outputs": [], 242 | "source": [ 243 | "%%sql\n", 244 | "SET HC.SESSION_USER = 'LEE';\n", 245 | "\n", 246 | "VALUES\n", 247 | " HC.VERIFY_ROLE_FOR_USER(HC.SESSION_USER,'PCP');" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "At this point in time we can set up some rules on what the various groups can see." 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "%%sql\n", 264 | "CREATE OR REPLACE PERMISSION HC.ROW_ACCESS ON HC.PATIENTS\n", 265 | " FOR ROWS WHERE\n", 266 | " (\n", 267 | " HC.VERIFY_ROLE_FOR_USER(HC.SESSION_USER,'PATIENT') = 1 AND\n", 268 | " HC.SESSION_USER = USERID\n", 269 | " )\n", 270 | " OR\n", 271 | " ( \n", 272 | " HC.VERIFY_ROLE_FOR_USER(HC.SESSION_USER,'PCP') = 1 AND\n", 273 | " HC.SESSION_USER = PCP_ID\n", 274 | " )\n", 275 | " OR\n", 276 | " (\n", 277 | " HC.VERIFY_ROLE_FOR_USER(HC.SESSION_USER,'MEMBERSHIP') = 1 OR\n", 278 | " HC.VERIFY_ROLE_FOR_USER(HC.SESSION_USER,'ACCOUNTING') = 1 OR\n", 279 | " HC.VERIFY_ROLE_FOR_USER(HC.SESSION_USER,'RESEARCH') = 1\n", 280 | " )\n", 281 | " ENFORCED FOR ALL ACCESS\n", 282 | " ENABLE;" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "The rules now need to be activated in order for them to be enforced." 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "metadata": {}, 296 | "outputs": [], 297 | "source": [ 298 | "%sql ALTER TABLE HC.PATIENTS ACTIVATE ROW ACCESS CONTROL;" 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "## Updating a Patient Record\n", 306 | "Dr. Lee needs to update the medication that his patient Sam is taking.\n", 307 | "\n", 308 | "First we examine the original record in the table." 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "%%sql \n", 318 | "SET HC.SESSION_USER = 'LEE';\n", 319 | "\n", 320 | "SELECT * FROM HC.PATIENTS WHERE NAME = 'Sam';" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "Dr. Lee decides to give Sam some codeine for his pain. The update is successful and we can see the results." 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": {}, 334 | "outputs": [], 335 | "source": [ 336 | "%%sql\n", 337 | "UPDATE HC.PATIENTS SET PHARMACY = 'Codeine' WHERE NAME = 'Sam';\n", 338 | "\n", 339 | "SELECT * FROM HC.PATIENTS WHERE NAME = 'Sam'" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "## Update Failure\n", 347 | "\n", 348 | "If a doctor is not assigned to a patient, they will not be able to update their\n", 349 | "record. The rule is that if you can't read (select) the record then you can't\n", 350 | "update it.\n" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": null, 356 | "metadata": {}, 357 | "outputs": [], 358 | "source": [ 359 | "%%sql\n", 360 | "SET HC.SESSION_USER = 'LEE';\n", 361 | "\n", 362 | "UPDATE HC.PATIENTS SET PHARMACY = 'Codeine' WHERE NAME = 'Doug';" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "The UPDATE completes, but no records are modified. To see all of the records, we need to change our userid to someone who can see all records (John in accounting). Note there is no way around this restriction - you must have the proper clearance to see the records." 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": null, 375 | "metadata": {}, 376 | "outputs": [], 377 | "source": [ 378 | "%%sql\n", 379 | "SET HC.SESSION_USER = 'JOHN';\n", 380 | "\n", 381 | "SELECT * FROM HC.PATIENTS WHERE NAME = 'Doug'" 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": {}, 387 | "source": [ 388 | "## Selecting Rows from a Table\n", 389 | "\n", 390 | "The rule with Row and Column access control is that if you can read (select) the data then you can potentially update it. Here is a select statement issued by Dr. Lee and you can see that only patients assigned to him are displayed." 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [ 399 | "%%sql\n", 400 | "SET HC.SESSION_USER = 'LEE';\n", 401 | "\n", 402 | "SELECT * FROM HC.PATIENTS;" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "Changing the current user to Dr. James will change the results that are displayed." 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "metadata": {}, 416 | "outputs": [], 417 | "source": [ 418 | "%%sql\n", 419 | "SET HC.SESSION_USER = 'JAMES';\n", 420 | "\n", 421 | "SELECT * FROM HC.PATIENTS;" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "metadata": {}, 427 | "source": [ 428 | "Changing the current user to one of the accounting, research, or fund raising users will result in all records being displayed." 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": null, 434 | "metadata": {}, 435 | "outputs": [], 436 | "source": [ 437 | "%%sql\n", 438 | "SET HC.SESSION_USER = 'JOHN';\n", 439 | "\n", 440 | "SELECT * FROM HC.PATIENTS;" 441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "metadata": {}, 446 | "source": [ 447 | "Patients are able to see only their row." 448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": null, 453 | "metadata": {}, 454 | "outputs": [], 455 | "source": [ 456 | "%%sql\n", 457 | "SET HC.SESSION_USER = 'BOB';\n", 458 | "\n", 459 | "SELECT * FROM HC.PATIENTS;" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "A DBA (Fred) who is not part of any of these groups will not be able to see any of the records, even though they may performance maintenance on the table itself." 467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": null, 472 | "metadata": {}, 473 | "outputs": [], 474 | "source": [ 475 | "%%sql\n", 476 | "SET HC.SESSION_USER = 'FRED';\n", 477 | "\n", 478 | "SELECT * FROM HC.PATIENTS;" 479 | ] 480 | }, 481 | { 482 | "cell_type": "markdown", 483 | "metadata": {}, 484 | "source": [ 485 | "## Column Masks\n", 486 | "Column access control is implemented in the form of a mask, or lack thereof, on the data.\n", 487 | "\n", 488 | "Using our Health-care scenario as the base, we will implement column access control/rules in two forms:\n", 489 | "\n", 490 | "- We MASK the account balance column. Only the ACCOUNTING team can see the account balance in the table and all others see a balance of zero.\n", 491 | "\n", 492 | "- We MASK the SIN column (Social Insurance Number column). Only the PATIENT themselves can see the full Social Insurance number and all others see only the last three digits of the number.\n", 493 | "\n", 494 | "This first MASK is on account balance and sets it to zero for everyone except for those people in accounting." 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": null, 500 | "metadata": {}, 501 | "outputs": [], 502 | "source": [ 503 | "%%sql\n", 504 | "CREATE OR REPLACE MASK HC.ACCT_BALANCE_MASK ON HC.PATIENTS FOR\n", 505 | " COLUMN ACCT_BALANCE RETURN\n", 506 | " CASE\n", 507 | " WHEN HC.VERIFY_ROLE_FOR_USER(HC.SESSION_USER,'ACCOUNTING') = 1\n", 508 | " THEN ACCT_BALANCE\n", 509 | " ELSE 0.00\n", 510 | " END\n", 511 | " ENABLE;" 512 | ] 513 | }, 514 | { 515 | "cell_type": "markdown", 516 | "metadata": {}, 517 | "source": [ 518 | "The second mask will return the entire SIN number for the PATIENT, but only the last three digits of the SIN for all others." 519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": null, 524 | "metadata": {}, 525 | "outputs": [], 526 | "source": [ 527 | "%%sql\n", 528 | "CREATE OR REPLACE MASK HC.SIN_MASK ON HC.PATIENTS FOR\n", 529 | " COLUMN SIN RETURN\n", 530 | " CASE\n", 531 | " WHEN HC.VERIFY_ROLE_FOR_USER(HC.SESSION_USER,'PATIENT') = 1\n", 532 | " THEN SIN\n", 533 | " ELSE\n", 534 | " 'XXX XXX ' || SUBSTR(SIN,9,3)\n", 535 | " END\n", 536 | " ENABLE;" 537 | ] 538 | }, 539 | { 540 | "cell_type": "markdown", 541 | "metadata": {}, 542 | "source": [ 543 | "In order for the MASKS to be effective, they need to be enabled for the table." 544 | ] 545 | }, 546 | { 547 | "cell_type": "code", 548 | "execution_count": null, 549 | "metadata": {}, 550 | "outputs": [], 551 | "source": [ 552 | "%sql ALTER TABLE HC.PATIENTS ACTIVATE COLUMN ACCESS CONTROL;" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": {}, 558 | "source": [ 559 | "When someone from accounting now views the records, they will only see the last three digits of the SIN field but they will see all of the accounting data." 560 | ] 561 | }, 562 | { 563 | "cell_type": "code", 564 | "execution_count": null, 565 | "metadata": {}, 566 | "outputs": [], 567 | "source": [ 568 | "%%sql\n", 569 | "SET HC.SESSION_USER = 'JOHN';\n", 570 | "\n", 571 | "SELECT * FROM HC.PATIENTS;" 572 | ] 573 | }, 574 | { 575 | "cell_type": "markdown", 576 | "metadata": {}, 577 | "source": [ 578 | "When a researcher looks at the data, they will also only see the last three digits of the SIN field, but they will get a zero balance in the accounting field." 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": null, 584 | "metadata": {}, 585 | "outputs": [], 586 | "source": [ 587 | "%%sql\n", 588 | "SET HC.SESSION_USER = 'JANE';\n", 589 | "\n", 590 | "SELECT * FROM HC.PATIENTS;" 591 | ] 592 | }, 593 | { 594 | "cell_type": "markdown", 595 | "metadata": {}, 596 | "source": [ 597 | "Dr. Lee will only see his patients (ROW CONTROL) and will see the last three digits of the SIN field and zero for the account balance (COLUMN MASK)." 598 | ] 599 | }, 600 | { 601 | "cell_type": "code", 602 | "execution_count": null, 603 | "metadata": {}, 604 | "outputs": [], 605 | "source": [ 606 | "%%sql\n", 607 | "SET HC.SESSION_USER = 'LEE';\n", 608 | "\n", 609 | "SELECT * FROM HC.PATIENTS;" 610 | ] 611 | }, 612 | { 613 | "cell_type": "markdown", 614 | "metadata": {}, 615 | "source": [ 616 | "Finally, the patients will be able to see their own SIN field, but the account balance will show as zero (presumably so they don't get sick over the amount!)." 617 | ] 618 | }, 619 | { 620 | "cell_type": "code", 621 | "execution_count": null, 622 | "metadata": {}, 623 | "outputs": [], 624 | "source": [ 625 | "%%sql\n", 626 | "SET HC.SESSION_USER = 'BOB';\n", 627 | "\n", 628 | "SELECT * FROM HC.PATIENTS;" 629 | ] 630 | }, 631 | { 632 | "cell_type": "markdown", 633 | "metadata": {}, 634 | "source": [ 635 | "## Catalog Views\n", 636 | "The information about the ROW and COLUMN permissions can be found in the `SYSCAT.CONTROLS` table. One line is found in this table for each column permission and one for the row permission on a table." 637 | ] 638 | }, 639 | { 640 | "cell_type": "code", 641 | "execution_count": null, 642 | "metadata": {}, 643 | "outputs": [], 644 | "source": [ 645 | "%sql -a SELECT * FROM SYSCAT.CONTROLS;" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": { 651 | "collapsed": true 652 | }, 653 | "source": [ 654 | "#### Credits: IBM 2018, George Baklarz [baklarz@ca.ibm.com]" 655 | ] 656 | } 657 | ], 658 | "metadata": { 659 | "hide_code_all_hidden": false, 660 | "kernelspec": { 661 | "display_name": "Python 3", 662 | "language": "python", 663 | "name": "python3" 664 | }, 665 | "language_info": { 666 | "codemirror_mode": { 667 | "name": "ipython", 668 | "version": 3 669 | }, 670 | "file_extension": ".py", 671 | "mimetype": "text/x-python", 672 | "name": "python", 673 | "nbconvert_exporter": "python", 674 | "pygments_lexer": "ipython3", 675 | "version": "3.6.1" 676 | } 677 | }, 678 | "nbformat": 4, 679 | "nbformat_minor": 2 680 | } 681 | -------------------------------------------------------------------------------- /Db2 Statistical Functions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "hideCode": false, 14 | "hidePrompt": false 15 | }, 16 | "source": [ 17 | "# Db2 Statistical Functions" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "hideCode": false, 24 | "hidePrompt": false 25 | }, 26 | "source": [ 27 | "Db2 already has a variety of Statistical functions built in. In Db2 11.1, a number of new\n", 28 | "functions have been added including:\n", 29 | " \n", 30 | " - [*COVARIANCE_SAMP*](#covariance) - The COVARIANCE_SAMP function returns the sample covariance of a set of number pairs\n", 31 | " - [*STDDEV_SAMP*](#stddev) - The STDDEV_SAMP column function returns the sample standard deviation (division by [n-1]) of a set of numbers.\n", 32 | " - [*VARIANCE_SAMP*](#variance) or VAR_SAMP - The VARIANCE_SAMP column function returns the sample variance (division by [n-1]) of a set of numbers.\n", 33 | " - [*CUME_DIST*](#cume) - The CUME_DIST column function returns the cumulative distribution of a row that is hypothetically inserted into a group of rows\n", 34 | " - [*PERCENT_RANK*](#rank) - The PERCENT_RANK column function returns the relative percentile rank of a row that is hypothetically inserted into a group of rows.\n", 35 | " - [*PERCENTILE_DISC*](#disc), [*PERCENTILE_CONT*](#cont) - Returns the value that corresponds to the specified percentile given a sort specification by using discrete (DISC) or continuous (CONT) distribution\n", 36 | " - [*MEDIAN*](#median) - The MEDIAN column function returns the median value in a set of values\n", 37 | " - [*WIDTH_BUCKET*](#width) - The WIDTH_BUCKET function is used to create equal-width histograms" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "## Sampling Functions\n", 45 | "The traditional VARIANCE, COVARIANCE, and STDDEV functions have been available in Db2 for a long time. When computing these values, the formulae assume that the entire population has been counted (N). The traditional formula for standard deviation is:" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "$$\\sigma=\\sqrt{\\frac{1}{N}\\sum_{i=1}^N(x_{i}-\\mu)^{2}}$$" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "N refers to the size of the population and in many cases, we only have a sample, not the entire population of values. \n", 60 | "In this case, the formula needs to be adjusted to account for the sampling." 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "$$s=\\sqrt{\\frac{1}{N-1}\\sum_{i=1}^N(x_{i}-\\bar{x})^{2}}$$" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": { 73 | "hideCode": false, 74 | "hidePrompt": false 75 | }, 76 | "source": [ 77 | "Set up the connection to the database." 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": null, 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [ 86 | "%run db2.ipynb" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "We populate the database with the EMPLOYEE and DEPARTMENT tables so that we can run the various examples." 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": null, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "%sql -sampledata" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": { 115 | "hideCode": false, 116 | "hidePrompt": false 117 | }, 118 | "source": [ 119 | "## COVARIANCE_SAMP\n", 120 | "\n", 121 | "The COVARIANCE_SAMP function returns the sample covariance of a set of number pairs. " 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": { 128 | "hideCode": false, 129 | "hidePrompt": false 130 | }, 131 | "outputs": [], 132 | "source": [ 133 | "%%sql\n", 134 | "SELECT COVARIANCE_SAMP(SALARY, BONUS) \n", 135 | " FROM EMPLOYEE \n", 136 | "WHERE WORKDEPT = 'A00'" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": { 149 | "hideCode": false, 150 | "hidePrompt": false 151 | }, 152 | "source": [ 153 | "## STDDEV_SAMP\n", 154 | "\n", 155 | "The STDDEV_SAMP column function returns the sample standard deviation (division by [n-1]) of a set of numbers." 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": { 162 | "hideCode": false, 163 | "hidePrompt": false 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "%%sql\n", 168 | "SELECT STDDEV_SAMP(SALARY) \n", 169 | " FROM EMPLOYEE \n", 170 | "WHERE WORKDEPT = 'A00'" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": { 183 | "hideCode": false, 184 | "hidePrompt": false 185 | }, 186 | "source": [ 187 | "## VARIANCE_SAMP\n", 188 | "The VARIANCE_SAMP column function returns the sample variance (division by [n-1]) of a set of numbers." 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": { 195 | "hideCode": false, 196 | "hidePrompt": false 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "%%sql\n", 201 | "SELECT VARIANCE_SAMP(SALARY) \n", 202 | " FROM EMPLOYEE \n", 203 | "WHERE WORKDEPT = 'A00'" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": { 216 | "hideCode": false, 217 | "hidePrompt": false 218 | }, 219 | "source": [ 220 | "## MEDIAN\n", 221 | "\n", 222 | "The MEDIAN column function returns the median value in a set of values." 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": { 229 | "hideCode": false, 230 | "hidePrompt": false 231 | }, 232 | "outputs": [], 233 | "source": [ 234 | "%%sql\n", 235 | "SELECT MEDIAN(SALARY) AS MEDIAN, AVG(SALARY) AS AVERAGE \n", 236 | " FROM EMPLOYEE \n", 237 | "WHERE WORKDEPT = 'E21'" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": { 250 | "hideCode": false, 251 | "hidePrompt": false 252 | }, 253 | "source": [ 254 | "## CUME_DIST\n", 255 | "\n", 256 | "The CUME_DIST column function returns the cumulative distribution of a row that is hypothetically inserted into \n", 257 | "a group of rows." 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": { 264 | "hideCode": false, 265 | "hidePrompt": false 266 | }, 267 | "outputs": [], 268 | "source": [ 269 | "%%sql\n", 270 | "SELECT CUME_DIST(47000) WITHIN GROUP (ORDER BY SALARY) \n", 271 | " FROM EMPLOYEE \n", 272 | "WHERE WORKDEPT = 'A00'" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": { 285 | "collapsed": true, 286 | "hideCode": false, 287 | "hidePrompt": false 288 | }, 289 | "source": [ 290 | "## PERCENT_RANK\n", 291 | "\n", 292 | "The PERCENT_RANK column function returns the relative percentile rank of a\n", 293 | "row that is hypothetically inserted into a group of rows." 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": { 300 | "hideCode": false, 301 | "hidePrompt": false 302 | }, 303 | "outputs": [], 304 | "source": [ 305 | "%%sql\n", 306 | "SELECT PERCENT_RANK(47000) WITHIN GROUP (ORDER BY SALARY) \n", 307 | " FROM EMPLOYEE \n", 308 | "WHERE WORKDEPT = 'A00'" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": { 321 | "collapsed": true, 322 | "hideCode": false, 323 | "hidePrompt": false 324 | }, 325 | "source": [ 326 | "## PERCENTILE_DISC\n", 327 | "\n", 328 | "The PERCENTILE_DISC/CONT returns the value that corresponds to the specified percentile \n", 329 | "given a sort specification by using discrete (DISC) or continuous (CONT) distribution." 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": null, 335 | "metadata": { 336 | "hideCode": false, 337 | "hidePrompt": false 338 | }, 339 | "outputs": [], 340 | "source": [ 341 | "%%sql\n", 342 | "SELECT PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY SALARY) \n", 343 | " FROM EMPLOYEE \n", 344 | "WHERE WORKDEPT = 'E21'" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": { 357 | "collapsed": true, 358 | "hideCode": false, 359 | "hidePrompt": false 360 | }, 361 | "source": [ 362 | "## PERCENTILE_CONT\n", 363 | "\n", 364 | "This is a function that gives you a continuous percentile calculation." 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": null, 370 | "metadata": { 371 | "hideCode": false, 372 | "hidePrompt": false 373 | }, 374 | "outputs": [], 375 | "source": [ 376 | "%%sql\n", 377 | "SELECT PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY SALARY) \n", 378 | " FROM EMPLOYEE \n", 379 | "WHERE WORKDEPT = 'E21'" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": { 392 | "collapsed": true, 393 | "hideCode": false, 394 | "hidePrompt": false 395 | }, 396 | "source": [ 397 | "## WIDTH BUCKET and Histogram Example\n", 398 | "\n", 399 | "The WIDTH_BUCKET function is used to create equal-width histograms. Using the EMPLOYEE table, \n", 400 | "This SQL will assign a bucket to each employee's salary using a range of 35000 to 100000 divided into 13 buckets." 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": null, 406 | "metadata": { 407 | "hideCode": false, 408 | "hidePrompt": false 409 | }, 410 | "outputs": [], 411 | "source": [ 412 | "%%sql\n", 413 | "SELECT EMPNO, SALARY, WIDTH_BUCKET(SALARY, 35000, 100000, 13) \n", 414 | " FROM EMPLOYEE \n", 415 | "ORDER BY EMPNO" 416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": { 421 | "hideCode": false, 422 | "hidePrompt": false 423 | }, 424 | "source": [ 425 | "We can plot this information by adding some more details to the bucket output." 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": null, 431 | "metadata": { 432 | "hideCode": false, 433 | "hidePrompt": false 434 | }, 435 | "outputs": [], 436 | "source": [ 437 | "%%sql -a\n", 438 | "WITH BUCKETS(EMPNO, SALARY, BNO) AS \n", 439 | " ( \n", 440 | " SELECT EMPNO, SALARY, \n", 441 | " WIDTH_BUCKET(SALARY, 35000, 100000, 9) AS BUCKET \n", 442 | " FROM EMPLOYEE ORDER BY EMPNO \n", 443 | " ) \n", 444 | "SELECT BNO, COUNT(*) AS COUNT FROM BUCKETS \n", 445 | "GROUP BY BNO \n", 446 | "ORDER BY BNO ASC " 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": { 452 | "hideCode": false, 453 | "hidePrompt": false 454 | }, 455 | "source": [ 456 | "And here is a plot of the data to make sense of the histogram." 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": null, 462 | "metadata": { 463 | "hideCode": false, 464 | "hidePrompt": false 465 | }, 466 | "outputs": [], 467 | "source": [ 468 | "%%sql -pb\n", 469 | "WITH BUCKETS(EMPNO, SALARY, BNO) AS \n", 470 | " ( \n", 471 | " SELECT EMPNO, SALARY, \n", 472 | " WIDTH_BUCKET(SALARY, 35000, 100000, 9) AS BUCKET \n", 473 | " FROM EMPLOYEE ORDER BY EMPNO \n", 474 | " ) \n", 475 | "SELECT BNO, COUNT(*) AS COUNT FROM BUCKETS \n", 476 | "GROUP BY BNO \n", 477 | "ORDER BY BNO ASC " 478 | ] 479 | }, 480 | { 481 | "cell_type": "markdown", 482 | "metadata": {}, 483 | "source": [ 484 | "[Back to Top](#top)" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": {}, 490 | "source": [ 491 | "#### Credits: IBM 2018, George Baklarz [baklarz@ca.ibm.com]" 492 | ] 493 | } 494 | ], 495 | "metadata": { 496 | "hide_code_all_hidden": false, 497 | "kernelspec": { 498 | "display_name": "Python 3", 499 | "language": "python", 500 | "name": "python3" 501 | }, 502 | "language_info": { 503 | "codemirror_mode": { 504 | "name": "ipython", 505 | "version": 3 506 | }, 507 | "file_extension": ".py", 508 | "mimetype": "text/x-python", 509 | "name": "python", 510 | "nbconvert_exporter": "python", 511 | "pygments_lexer": "ipython3", 512 | "version": "3.6.1" 513 | } 514 | }, 515 | "nbformat": 4, 516 | "nbformat_minor": 2 517 | } 518 | -------------------------------------------------------------------------------- /Db2 Using Prepared Statements.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Db2 Jupyter: Using Prepared Statements" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Normal the `%sql` magic command is used to execute SQL commands immediately to get a result. If this statement needs to be executed multiple times with different variables, the process is inefficient since the SQL statement must be recompiled every time.\n", 15 | "\n", 16 | "The use of the `PREPARE` and `EXECUTE` command allow the user to optimize the statement once, and then re-execute the statement using different parameters.\n", 17 | "\n", 18 | "In addition, the commit scope can be modified so that not every statement gets committed immediately. By managing the commit scope, overhead in the database engine can be avoided. " 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "%run db2.ipynb" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Autocommit and Commit Scope\n", 35 | "By default, any SQL statements executed with the `%sql` magic command are immediately commited. This means that the log file has the transaction details and the results are committed to disk. In other words, you can't change your mind after the statement finishes execution.\n", 36 | "\n", 37 | "This behavior is often referred to as `AUTOCOMMIT` and adds a level of overhead to statement execution because at the end of every statement the results must be \"hardened\". On the other hand, autocommit means you don't have to worry about explicitly committing work or causing potential locking issues because you are holding up resources. When a record is updated, no other user will be able to view it (unless using \"dirty read\") until you commit. Holding the resource in a lock means that other workloads may come to a halt while they wait for you to commit your work. \n", 38 | "\n", 39 | "Here is a classic example of wanting a commit scope that is based on a series of statements:\n", 40 | "```\n", 41 | "withdrawal = 100\n", 42 | "%sql update checking set balance = balance - withdrawal\n", 43 | "%sql update savings set balance = balance + withdrawal\n", 44 | "```\n", 45 | "If autocommit is `ON`, you could have a problem with the transaction if the system failed after the first update statement. You would have taken money out of the checking account, but have not updated the savings account. To make sure that this transaction is run successfully:\n", 46 | "```\n", 47 | "%sql autocommit off\n", 48 | "withdrawal = 100\n", 49 | "%sql update checking set balance = balance - withdrawal\n", 50 | "%sql update savings set balance = balance + withdrawal\n", 51 | "%sql commit work\n", 52 | "```\n", 53 | "If the transaction fails before the `COMMIT WORK`, all changes to the database will be rolled back to its original state, thus protecting the integrity of the two tables." 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "### AUTOCOMMIT\n", 61 | "Autocommit can be turned on or off using the following syntax:\n", 62 | "```\n", 63 | "%sql AUTOCOMMIT ON | OFF\n", 64 | "```\n", 65 | "If you turn `AUTOCOMMIT OFF` then you need to make sure that you COMMIT work at the end of your code. If you don't there is possible you lose your work if the connection is lost to Db2. " 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "### COMMIT, ROLLBACK\n", 73 | "To `COMMIT` all changes to the database you must use the following syntax:\n", 74 | "```\n", 75 | "%sql COMMIT [WORK | HOLD]\n", 76 | "```\n", 77 | "The command `COMMIT` or `COMMIT WORK` are identical and will commit all work to the database. Issuing a `COMMIT` command also closes all open cursors or statements that are open. If you had created a prepared statement (see section below) then the compiled statement will be no longer valid. By issuing a `COMMIT` you are releasing all of the resources and locks that your application may be holding.\n", 78 | "\n", 79 | "`COMMIT HOLD` will allow you to commit your work to disk, but keeps all of the resources open for further execution. This is useful for situations where you are inserting or updating 1000's of records and do not want to tie up log space waiting for a commit to occur. The following pseudocode gives you an example how this would be used:\n", 80 | "```\n", 81 | "%sql autocommit off\n", 82 | "for i = 1 to 1000\n", 83 | " %sql insert into x values i\n", 84 | " if (i / 100 == 0) \n", 85 | " print i \"Records inserted\"\n", 86 | " %sql commit work\n", 87 | " end if \n", 88 | "end for\n", 89 | "%sql commit work\n", 90 | "%sql autocommit on\n", 91 | "```\n", 92 | "You should always remember to turn `AUTOCOMMIT ON` at the end of any code block or you will have to issue `COMMIT` at the end of any SQL command to commit it to the database." 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "## PREPARE and EXECUTE\n", 100 | "The `PREPARE` and `EXECUTE` commands are useful in situations where you want to repeat an SQL statement multiple times while just changing the parameter values. There isn't any benefit from using these statements for simple tasks that may only run occassionally. The benefit of `PREPARE/EXECUTE` is more evident when dealing with a large number of transactions that are the same.\n", 101 | "\n", 102 | "The `PREPARE` statement can be used against many types of SQL, but in this implementation, only the following SQL statements are supported:\n", 103 | "* SELECT\n", 104 | "* INSERT\n", 105 | "* UPDATE\n", 106 | "* DELETE\n", 107 | "* MERGE\n", 108 | "\n", 109 | "To prepare a statement, you must use the following syntax:\n", 110 | "```Python\n", 111 | "stmt = %sql PREPARE sql ....\n", 112 | "```\n", 113 | "\n", 114 | "The `PREPARE` statement always returns a statement handle. You must assign the results of the `PREPARE` statement to a variable since it will be required when you `EXECUTE` the statement.\n", 115 | "\n", 116 | "The SQL statement must have any variables replaced with a question mark `?`. For instance, if you wanted to insert a single value into a table you would use the following syntax:\n", 117 | "```Python\n", 118 | "stmt = %sql PREPARE insert into x values (?)\n", 119 | "```\n", 120 | "\n", 121 | "One important note with parameter markers. If you require the parameter to have a specific data type (say INTEGER) then you may want to place a `CAST` statement around it to force the proper conversion. Usually strings, integers, decimals, etc... convert fine when using this syntax, but occasionally you may run across a data type issue. For the previous example we could modify it to:\n", 122 | "```Python\n", 123 | "stmt = %sql PREPARE insert into x values (CAST(? AS INTEGER))\n", 124 | "```" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "Once you have prepared a statement, you can execute it using the following syntax:\n", 132 | "```Python\n", 133 | "%sql EXECUTE :stmt USING :v1,:v2,:v3,....\n", 134 | "```\n", 135 | "\n", 136 | "You must provide the variable names with a colon `:` in front of them and separate each one with a comma. This allows the SQL parser to differentiate between a host variable and a column or SQL keyword. You can also use constants as part of the `EXECUTE` statement:\n", 137 | "```Python\n", 138 | "%sql EXECUTE :stmt USING 3,'asdsa',24.5\n", 139 | "```\n", 140 | "\n", 141 | "Using variables are more convenient when dealing with strings that may contain single and double quotes. " 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "### Using Arrays and Multiple Parameters\n", 149 | "When using the `PREPARE` statement, it can become cumbersome when dealing with many parameter markers. For instance, in order to insert 10 columns into a table the code would look similar to this:\n", 150 | "```\n", 151 | "stmt = %sql PREPARE INSERT INTO X VALUES (?,?,?,?,?,?,?,?,?,?)\n", 152 | "```\n", 153 | "The `%sql` command allows you to use the shortform `?*#` where `#` is an integer representing the number of columns you want in the list. The above statement could be written as:\n", 154 | "```\n", 155 | "stmt = %sql PREPARE INSERT INTO X VALUES (?*10)\n", 156 | "```\n", 157 | "The syntax can also be used to create groups of parameter markers:\n", 158 | "```\n", 159 | "stmt = %sql PREPARE INSERT INTO X VALUES (?*3,?*7)\n", 160 | "```\n", 161 | "While this may seem a strange way of providing parameters, this becomes more useful when we use the `EXECUTE` command.\n" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "The `EXECUTE` command can use Python arrays (lists) as input arguments. For the previous example with 10 parameters you could issue the following command:\n", 169 | "```\n", 170 | "%sql EXECUTE :stmt USING :v1,:v2,:v3,:v4,:v5,:v6,:v7,:v8,:v9,:v10\n", 171 | "```\n", 172 | "If you placed all of these values into an array, you could also do the following:\n", 173 | "```\n", 174 | "%sql EXECUTE :stmt USING :v[0],:v[1],:v[2],:v[3],:v[4],:v[5],:v[6],:v[7],:v[8],:v[9]\n", 175 | "```\n", 176 | "That isn't much simpler but shows that you could use items within an array (one dimensional only). The easiest syntax is:\n", 177 | "```\n", 178 | "%sql EXECUTE :stmt USING :v\n", 179 | "```\n", 180 | "The only requirement is that the array `v` has exactly the number of values required to satisfy the parameter list required by the prepared statement. \n", 181 | "\n", 182 | "When you split the argument list into groups, you can use multiple arrays to contain the data. Given the following prepare statement:\n", 183 | "```\n", 184 | "stmt = %sql PREPARE INSERT INTO X VALUES (?*3,?*7)\n", 185 | "```\n", 186 | "You could execute the statement using two arrays:\n", 187 | "```\n", 188 | "%sql EXECUTE :stmt USING :name, :details\n", 189 | "```\n", 190 | "This would work as long as the total number of parameters supplied by the `name` array and `details` array is equal to 10." 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "## Performance Comparisons\n", 198 | "The following examples will show the use of `AUTOCOMMIT` and `PREPARE/EXECUTE` when running SQL statements.\n", 199 | "\n", 200 | "This first SQL statement will load the EMPLOYEE and DEPARTMENT tables (if they don't already exist) and then return an array of all of the employees in the company using a SELECT statement." 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "%sql -sampledata\n", 210 | "employees = %sql -r select * from employee" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "The `employees` variable contains all of the employee data as a Python array. The next statement will retrieve the contents of the first row only (Remember that row 0 contains the name of the columns)." 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": null, 223 | "metadata": {}, 224 | "outputs": [], 225 | "source": [ 226 | "print(employees[1])" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "We now will create another `EMPLOYEE` table that is an exact duplicate of what we already have." 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": { 240 | "collapsed": true 241 | }, 242 | "outputs": [], 243 | "source": [ 244 | "%%sql -q\n", 245 | "DROP TABLE EMPLOYEE2;\n", 246 | "CREATE TABLE EMPLOYEE2 AS (SELECT * FROM EMPLOYEE) DEFINITION ONLY;" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "### Loop with INSERT Statements\n", 254 | "One could always use SQL to insert into this table, but we will use a loop to execute insert statements. The loop will be timed so we can get a sense of the cost of running this code. In order to make the loop run a longer the insert block is run 50 times." 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "%sql -q DELETE FROM EMPLOYEE2\n", 264 | "print(\"Starting Insert\")\n", 265 | "start_time = time.time()\n", 266 | "i = 0\n", 267 | "for k in range(0,50):\n", 268 | " for record in employees[1:]:\n", 269 | " i += 1\n", 270 | " empno,firstnme,midinit,lastname,workdept,phoneno,hiredate,job,edlevel,sex,birthdate,salary,bonus,comm = record\n", 271 | " %sql -q INSERT INTO EMPLOYEE2 VALUES ( \\\n", 272 | " :empno,:firstnme,:midinit, \\\n", 273 | " :lastname,:workdept,:phoneno, \\\n", 274 | " :hiredate,:job,:edlevel, \\\n", 275 | " :sex,:birthdate,:salary, \\\n", 276 | " :bonus,:comm) \n", 277 | "end_time = time.time()\n", 278 | "print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))\n", 279 | "time_insert = end_time-start_time" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "### Loop with PREPARE Statement\n", 287 | "An alternative method would be to use a prepared statement that allows us to compile the statement once in Db2 and then reuse the statement in Db2's memory. This method uses the individual column values as input into the `EXECUTE` statement." 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": null, 293 | "metadata": {}, 294 | "outputs": [], 295 | "source": [ 296 | "%sql -q DELETE FROM EMPLOYEE2\n", 297 | "print(\"Starting Insert\")\n", 298 | "start_time = time.time()\n", 299 | "i = 0\n", 300 | "prep = %sql prepare INSERT INTO EMPLOYEE2 VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)\n", 301 | "for k in range(0,50):\n", 302 | " for record in employees[1:]:\n", 303 | " i += 1\n", 304 | " empno,firstnme,midinit,lastname,workdept,phoneno,hiredate,job,edlevel,sex,birthdate,salary,bonus,comm = record\n", 305 | " %sql execute :prep using :empno,:firstnme,:midinit,:lastname,:workdept,:phoneno,:hiredate,:job,:edlevel,:sex,:birthdate,:salary,:bonus,:comm\n", 306 | "end_time = time.time()\n", 307 | "print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))\n", 308 | "time_prepare = end_time-start_time" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "### Loop with PREPARE Statement and Arrays\n", 316 | "You will notice that it is a bit tedious to write out all of the columns that are required as part of an `INSERT` statement. A simpler option is to use a Python list or array to and assign it directly in the `EXECUTE` statement. So rather than:\n", 317 | "```\n", 318 | "%sql execute :prep using :empno, :firstnme, ...\n", 319 | "```\n", 320 | "We can just use the array variable generated as part of the for loop:\n", 321 | "```\n", 322 | "%sql execute :prep using :record\n", 323 | "```\n", 324 | "The following SQL demonstrates this approach." 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": null, 330 | "metadata": {}, 331 | "outputs": [], 332 | "source": [ 333 | "%sql -q DELETE FROM EMPLOYEE2\n", 334 | "print(\"Starting Insert\")\n", 335 | "start_time = time.time()\n", 336 | "i = 0\n", 337 | "prep = %sql prepare INSERT INTO EMPLOYEE2 VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)\n", 338 | "for k in range(0,50):\n", 339 | " for record in employees[1:]:\n", 340 | " i += 1\n", 341 | " %sql execute :prep using :record\n", 342 | "\n", 343 | "end_time = time.time()\n", 344 | "print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))\n", 345 | "time_array = end_time-start_time" 346 | ] 347 | }, 348 | { 349 | "cell_type": "markdown", 350 | "metadata": {}, 351 | "source": [ 352 | "### Loop with PREPARE Statement, Arrays and AUTOCOMMIT OFF\n", 353 | "Finally, we can turn `AUTOCOMMIT` off and then commit the work at the end of the block to improve the total time required to insert the data. Note the use of the parameter shortform `?*14` in the code." 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [ 362 | "%sql -q DELETE FROM EMPLOYEE2\n", 363 | "%sql autocommit off\n", 364 | "print(\"Starting Insert\")\n", 365 | "start_time = time.time()\n", 366 | "i = 0\n", 367 | "prep = %sql prepare INSERT INTO EMPLOYEE2 VALUES (?*14)\n", 368 | "for k in range(0,50):\n", 369 | " for record in employees[1:]:\n", 370 | " i += 1\n", 371 | " %sql execute :prep using :record\n", 372 | "%sql commit work \n", 373 | "end_time = time.time()\n", 374 | "print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))\n", 375 | "%sql autocommit on \n", 376 | "time_commit = end_time-start_time" 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": { 382 | "collapsed": true 383 | }, 384 | "source": [ 385 | "### Performance Comparison\n", 386 | "You may have noticed that the performance of the last method is substantially faster than the other examples. The primary reason for this is the `COMMIT` only occuring at the end of the code." 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": null, 392 | "metadata": {}, 393 | "outputs": [], 394 | "source": [ 395 | "%%sql -pb\n", 396 | "WITH RESULT(RUN,ELAPSED) AS (\n", 397 | " VALUES\n", 398 | " ('INSERT',CAST(:time_insert AS DEC(5,2))),\n", 399 | " ('PREPARE',CAST(:time_prepare AS DEC(5,2))),\n", 400 | " ('ARRAY ',CAST(:time_array AS DEC(5,2))),\n", 401 | " ('COMMIT ',CAST(:time_commit AS DEC(5,2))) \n", 402 | ")\n", 403 | "SELECT RUN, ELAPSED FROM RESULT\n", 404 | " ORDER BY ELAPSED DESC" 405 | ] 406 | }, 407 | { 408 | "cell_type": "markdown", 409 | "metadata": {}, 410 | "source": [ 411 | "#### Credits: IBM 2018, George Baklarz [baklarz@ca.ibm.com]" 412 | ] 413 | } 414 | ], 415 | "metadata": { 416 | "kernelspec": { 417 | "display_name": "Python 3", 418 | "language": "python", 419 | "name": "python3" 420 | }, 421 | "language_info": { 422 | "codemirror_mode": { 423 | "name": "ipython", 424 | "version": 3 425 | }, 426 | "file_extension": ".py", 427 | "mimetype": "text/x-python", 428 | "name": "python", 429 | "nbconvert_exporter": "python", 430 | "pygments_lexer": "ipython3", 431 | "version": "3.6.1" 432 | } 433 | }, 434 | "nbformat": 4, 435 | "nbformat_minor": 2 436 | } 437 | -------------------------------------------------------------------------------- /Db2V11-JSON-ebook.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DB2-Samples/db2jupyter/25929617ebf87483f0b452991ecb1e30f615c718/Db2V11-JSON-ebook.pdf -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /Notice.txt: -------------------------------------------------------------------------------- 1 | This product includes software originally developed by George Baklarz, IBM Corporation 2 | Copyright 2017, 2018 IBM Corp. 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Db2 Jupyter Notebook Extensions 2 | A Jupyter notebook and magic functions to demonstrate Db2 LUW 11 features. 3 | 4 | This code is imported as a Jupyter notebook extension in any notebooks you create with Db2 code in it. Place the following line of code in any notebook that you want to use these commands with: 5 | ```Python 6 | %run db2.ipynb 7 | ``` 8 | 9 | This code defines a Jupyter/Python magic command called `%sql` which allows you to execute Db2 specific calls to the database. There are other packages available for manipulating databases, but this one has been specifically designed for demonstrating a number of the SQL features available in Db2. 10 | 11 | There are two ways of executing the `%sql` command. A single line SQL statement would use the line format of the magic command: 12 | ```Python 13 | %sql SELECT * FROM EMPLOYEE 14 | ``` 15 | 16 | If you have a large block of sql then you would place the %%sql command at the beginning of the block and then place the SQL statements into the remainder of the block. Using this form of the `%%sql` statement means that the notebook cell can only contain SQL and no other statements. 17 | ```Python 18 | %%sql 19 | SELECT * FROM EMPLOYEE 20 | ORDER BY LASTNAME 21 | ``` 22 | 23 | You can have multiple lines in the SQL block (`%%sql`). The default SQL delimiter is the semi-column (`;`). If you have scripts (triggers, procedures, functions) that use the semi-colon as part of the script, you will need to use the -d option to change the delimiter to an at "@" sign. 24 | ```Python 25 | %%sql -d 26 | SELECT * FROM EMPLOYEE 27 | @ 28 | CREATE PROCEDURE ... 29 | @ 30 | ``` 31 | 32 | The `%sql` command allows most DB2 commands to execute and has a special version of the CONNECT statement. A CONNECT by itself will attempt to reconnect to the database using previously used settings. If it cannot connect, it will prompt the user for additional information. 33 | 34 | The CONNECT command has the following format: 35 | ```Python 36 | %sql CONNECT TO USER USING HOST PORT 37 | ``` 38 | 39 | If you use a "?" for the password field, the system will prompt you for a password. This avoids typing the password as clear text on the screen. If a connection is not successful, the system will print the error message associated with the connect request. 40 | 41 | If the connection is successful, the parameters are saved on your system and will be used the next time you run a SQL statement, or when you issue the %sql CONNECT command with no parameters. 42 | 43 | In addition to the -d option, there are a number different options that you can specify at the beginning of the SQL: 44 | 45 | - -d - Delimiter: Change SQL delimiter to "@" from ";" 46 | - -q - Quiet: Quiet results - no answer set or messages returned from the function 47 | - -r - Return the result set as a data frame for Python usage 48 | - -t - Time: Time the following SQL statement and return the number of times it executes in 1 second 49 | - -j - JSON: Create a pretty JSON representation. Only the first column is formatted 50 | - -a - All: Return all rows in answer set and do not limit display 51 | - -pb - Plot Bar: Plot the results as a bar chart 52 | - -pl - Plot Line: Plot the results as a line chart 53 | - -pp - Plot Pie: Plot the results as a pie chart 54 | - -i - Interactive plotting and viewing of the data 55 | - -sampledata - Create and load the EMPLOYEE and DEPARTMENT tables 56 | 57 | One final note. You can pass python variables to the %sql command by using the \{\} braces with the name of the variable inbetween. Note that you will need to place proper punctuation around the variable in the event the SQL command requires it. For instance, the following example will find employee '000010' in the EMPLOYEE table. 58 | ```Python 59 | empno = '000010' 60 | %sql SELECT LASTNAME FROM EMPLOYEE WHERE EMPNO='{empno}' 61 | ``` 62 | 63 | The other option is to use a colon in front of a variable name and then no quotes are required. 64 | ```Python 65 | %sql SELECT LASTNAME FROM EMPLOYEE WHERE EMPNO=:empno 66 | ``` 67 | 68 | For more documentation and examples, see the Db2 Jupyter Tutorial or sign-up for a free [Db2 Proof of Technology](https://www.ibm.com/cloud/garage/tutorials/ibm-db2-local/modern-application-development-with-db-2) that contains the code from this GitHub repository. Additional details on Db2 features and functions can be found on the [Db2 Advanced Enterprise Edition](https://www.ibm.com/products/db2-database) site. 69 | -------------------------------------------------------------------------------- /db2json.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Db2 Jupyter Notebook Macros" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## JSON to BSON" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": null, 20 | "metadata": { 21 | "collapsed": true 22 | }, 23 | "outputs": [], 24 | "source": [ 25 | "%%sql macro j2b\n", 26 | " if {argc} != 1\n", 27 | " echo Syntax: %j2b(value)\n", 28 | " exit\n", 29 | " endif\n", 30 | "SYSTOOLS.JSON2BSON({1})" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "## BSON to JSON" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": null, 43 | "metadata": { 44 | "collapsed": true 45 | }, 46 | "outputs": [], 47 | "source": [ 48 | "%%sql macro b2j\n", 49 | " if {argc} != 1\n", 50 | " echo Syntax: %b2j(value)\n", 51 | " exit\n", 52 | " endif\n", 53 | "SYSTOOLS.BSON2JSON({1})" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "## JSON String Column" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": { 67 | "collapsed": true 68 | }, 69 | "outputs": [], 70 | "source": [ 71 | "%%sql macro js\n", 72 | " if {argc} = 3\n", 73 | " TRIM(JSON_VAL({1},'{2}','s:{3}'))\n", 74 | " else\n", 75 | " echo Syntax: #js( json_column, field ,column_size] )\n", 76 | " exit\n", 77 | " endif" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "## JSON Integer Column" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": null, 90 | "metadata": { 91 | "collapsed": true 92 | }, 93 | "outputs": [], 94 | "source": [ 95 | "%%sql macro ji\n", 96 | " if {argc} = 2\n", 97 | " JSON_VAL({1},'{2}','i')\n", 98 | " else \n", 99 | " echo Syntax: #ji( [json_column,] field )\n", 100 | " exit\n", 101 | " endif" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "## JSON Decimal Column" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": { 115 | "collapsed": true 116 | }, 117 | "outputs": [], 118 | "source": [ 119 | "%%sql macro jd\n", 120 | " if {argc} = 3\n", 121 | " CAST (JSON_VAL({1},'{2}','n') AS DECIMAL({3}))\n", 122 | " else\n", 123 | " echo Syntax: #jd( json_column, field ,'precision,scale')\n", 124 | " exit\n", 125 | " endif" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "## JSON Date Column" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": { 139 | "collapsed": true 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "%%sql macro jdate\n", 144 | " if {argc} = 2 \n", 145 | " CAST (JSON_VAL({1},'{2}','s:10') AS DATE)\n", 146 | " else\n", 147 | " echo Syntax: #jdate( [json_column,] field )\n", 148 | " exit\n", 149 | " endif" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "## JSON Array Macro" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": null, 162 | "metadata": { 163 | "collapsed": true 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "%%sql macro jsonarray \n", 168 | "if {argc} < 4\n", 169 | " echo Syntax: jsonarray(table_name, pk, json_column, array [,where=\"optional where clause\"])\n", 170 | " exit\n", 171 | "endif\n", 172 | "var table {1}\n", 173 | "var pk {2}\n", 174 | "var js_column {3}\n", 175 | "var array {4}\n", 176 | "\n", 177 | "jsonarray(pk,item) as \n", 178 | " (\n", 179 | " select {pk},systools.json2bson(items.value) \n", 180 | " from {table}, \n", 181 | " table( systools.json_table({js_column},'{array}','s:2048') ) as items\n", 182 | " if {where} <> null\n", 183 | " where {where} \n", 184 | " endif \n", 185 | " )" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "#### Credits: IBM 2018, George Baklarz [baklarz@ca.ibm.com]" 193 | ] 194 | } 195 | ], 196 | "metadata": { 197 | "kernelspec": { 198 | "display_name": "Python 3", 199 | "language": "python", 200 | "name": "python3" 201 | }, 202 | "language_info": { 203 | "codemirror_mode": { 204 | "name": "ipython", 205 | "version": 3 206 | }, 207 | "file_extension": ".py", 208 | "mimetype": "text/x-python", 209 | "name": "python", 210 | "nbconvert_exporter": "python", 211 | "pygments_lexer": "ipython3", 212 | "version": "3.6.1" 213 | } 214 | }, 215 | "nbformat": 4, 216 | "nbformat_minor": 2 217 | } 218 | -------------------------------------------------------------------------------- /db2jupyter.docker: -------------------------------------------------------------------------------- 1 | # 2 | # Jupyter notebooks for Db2 using Python 3.x format 3 | # Version 1.5 4 | # 2019-04-28 5 | # Author: George Baklarz 6 | # 7 | 8 | # Build image: 9 | # docker build -t db2jupyter -f db2jupyter.docker . 10 | 11 | # Run image: 12 | # docker run --name db2jupyter -d -p 8888:8888 db2jupyter 13 | 14 | FROM continuumio/anaconda3 15 | 16 | # 1. Update the distribution 17 | # 2. Install Pixiedust library 18 | # 3. Update apt-get and install the gcc compiler 19 | # 4. easy_install the Db2 drivers (ibm-db) 20 | # 5. Remove temp files 21 | 22 | RUN conda update -y conda && \ 23 | conda install -y -c conda-forge ipywidgets qgrid && \ 24 | pip install pixiedust && \ 25 | apt-get update && \ 26 | apt-get install -y gcc && \ 27 | easy_install ibm-db && \ 28 | rm -rf /var/lib/apt/lists/* 29 | 30 | # Create the directory for Jupyter defaults and insert the settings into the file 31 | # 1. Allow connection without a password 32 | # 2. True deletion of files (bypasses a bug in Jupyter) 33 | # 3. Allow root to run Jupyter 34 | # 4. Clone the Db2-Samples Jupyter notebooks files into the /opt/notebooks directory 35 | 36 | RUN mkdir /root/.jupyter && \ 37 | echo "c.NotebookApp.token = u''" >> ~/.jupyter/jupyter_notebook_config.py && \ 38 | echo "c.FileContentsManager.delete_to_trash=False" >> ~/.jupyter/jupyter_notebook_config.py && \ 39 | echo "c.NotebookApp.allow_root=True" >> ~/.jupyter/jupyter_notebook_config.py && \ 40 | git clone git://github.com/DB2-Samples/db2jupyter.git /opt/notebooks 41 | 42 | # 43 | # Command to start Jupyter in the container 44 | # Updated --ip='*' to --ip='0.0.0.0' 45 | # 46 | 47 | CMD jupyter notebook --NotebookApp.token= --notebook-dir=/opt/notebooks --ip='0.0.0.0' --port=8888 --no-browser --allow-root 48 | 49 | ENV DESCRIPTION="Db2 Jupyter Notebook" \ 50 | VERSION="1.4" \ 51 | BUILD_DATE="2018-07-17" 52 | 53 | EXPOSE 8888 -------------------------------------------------------------------------------- /dockerinstall.md: -------------------------------------------------------------------------------- 1 | ## Creating a Docker Container for Db2 and Jupyter Notebooks 2 | You can create a Docker image that will contain Jupyter Notebooks, the Db2 drivers, and all of the examples found in this Github repository. 3 | 4 | Assuming that you have Docker running on Windows, Mac, or Linux, the following commands will create a Docker image for you. 5 | 6 | 1) Download the `db2jupyter.docker` file and place it into a directory of your choice 7 | 2) Open up a command window that is able to issue Docker commands (either use Kitematic CLI command line, or a terminal window on Mac or Linux). 8 | 3) Navigate to the directory that the `db2jupyter.docker` file is located (i.e. cd or chdir) 9 | 4) Issue the following command to create a Docker image: 10 | ```Python 11 | docker build -t db2jupyter -f db2jupyter.docker . <- Note the period at the end 12 | ``` 13 | 14 | 5) Once the build is complete (there will be some warning messages with the ibm_db creation) you can now run the docker container with the following command. 15 | ```Python 16 | docker run --name db2jupyter -d -p 8888:8888 db2jupyter 17 | ``` 18 | 19 | 6) If port 8888 is already in use on your system you will have to give it a different port number. For instance, the following command will map the host port 9999 to port 8888 inside the Docker container. 20 | ```Python 21 | docker run --name db2jupyter -d -p 9999:8888 db2jupyter 22 | ``` 23 | 24 | 7) Use your favorite browser to navigate to `localhost:8888` and all of the Db2 notebooks will be listed and available for use. 25 | -------------------------------------------------------------------------------- /installation.md: -------------------------------------------------------------------------------- 1 | ## Installation Requirements 2 | These files in the Jupyter directory need to be loaded onto your workstation or server where Jupyter notebooks normally run. If you do not have Jupyter installed, there are a number of articles available on the web that tell you how to get it up and running. I've given up a quick write-up below on what you need to do at a minimum to use these notebooks. 3 | 4 | ### Jupyter 5 | You need to have Jupyter notebook installed on your system, which also needs Python to be installed on your system. If you already have Python available, odds are that you have Jupyter installed as well. The next set of commands will install Python on your platform and then install the Jupyter components. 6 | 7 | ### Anaconda or Miniconda 8 | Anaconda is an Open Data Science Platform that is powered by Python http://www.continuum.io. The platform keeps track of packages and their dependencies for development with Jupyter notebooks and Python. This makes it very easy to install extensions for Python without having to manually install everything. 9 | 10 | Download the Anaconda or Miniconda package applicable to your platform. Miniconda creates the minimal system required for using Python and Jupyter, while Anaconda installs all major packages. There are two versions of Python - V2 or V3. While it doesn't matter which one you use for most notebooks, there are some situations where you may want to use the Python 2 library. For Windows I would recommend using Python 2 in order to use the free Microsoft Compiler for Python. This becomes important when you want to add the ibm_db package to connect to a DB2 server. 11 | 12 | After installing Anaconda, you should issue the following commands from a command line shell that will update and install components required by Db2 notebooks. 13 | ``` 14 | conda update conda - This will update the Anaconda distribution so you have the latest code 15 | conda install -y -c conda-forge ipywidgets qgrid - Add components needed for displaying result sets 16 | apt-get update - Update apt-get catalog 17 | apt-get install -y gcc - Make sure a C compiler is available for the Db2 driver 18 | easy_install ibm-db - Install the Db2 Python drivers 19 | ``` 20 | 21 | At this point your installation should have Jupyter available on your system. To start the notebook server you need to issue the following command: 22 | ``` 23 | jupyter notebook (opens browser) 24 | jupyter notebook --no-browser (runs as a service) 25 | ``` 26 | The first command will open up a browser window that displays your notebooks. You can click on one of these notebooks to see the contents. If no notebooks are available, you will need to move the files in the Github jupyter directory to a local folder on your system that the program can access. 27 | 28 | ### Db2 Extensions 29 | 30 | To create a connection to Db2 with the Python Db2 extensions you must install the ibm_db package. This package adds appropriate database commands to Python so that it can access the data directly. The ibm_db package is not available as part of the Anaconda/Miniconda package so you need to use a different command to install it. 31 | 32 | For the Linux environments, a compiler is already installed that will build the ibm_db extensions. You only need to issue the following command to install the DB2 drivers: 33 | ``` 34 | easy_install ibm_db 35 | ``` 36 | On Windows, there is no default compiler. For Python V2, Microsoft makes available a C compiler just for Python usage. To find this compiler, search for "Python 2.7 C Compiler Windows" and then download and install this compiler. Once that is done you may also have to install the Db2 Client drivers. These drivers are part of a Db2 database installation, so you may already have them installed. If not, search for the DB2 Client Drivers and download one appropriate for your platform. These drivers are needed for compiling the code. 37 | 38 | Once you've installed the Db2 driver, note its location on disk. The following commands need to be issued to get the driver properly installed. 39 | ``` 40 | set IBM_DB_HOME=c:\Program Files\IBM\SQLLIB\ -- Location of DB2 installation 41 | cd Program files\ibm\sqllib\dsdriver\python32 -- Move to the directory in your command line 42 | easy_install ibm_db 43 | ``` 44 | When the command completes you will have access to Db2 from within the Jupyter notebooks. 45 | -------------------------------------------------------------------------------- /v1/Db2 11 Statistical Functions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "hideCode": false, 14 | "hidePrompt": false 15 | }, 16 | "source": [ 17 | "# Db2 11 Statistical Functions" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "hideCode": false, 24 | "hidePrompt": false 25 | }, 26 | "source": [ 27 | "Db2 already has a variety of Statistical functions built in. In Db2 11.1, a number of new\n", 28 | "functions have been added including:\n", 29 | " \n", 30 | " - [*COVARIANCE_SAMP*](#covariance) - The COVARIANCE_SAMP function returns the sample covariance of a set of number pairs\n", 31 | " - [*STDDEV_SAMP*](#stddev) - The STDDEV_SAMP column function returns the sample standard deviation (division by [n-1]) of a set of numbers.\n", 32 | " - [*VARIANCE_SAMP*](#variance) or VAR_SAMP - The VARIANCE_SAMP column function returns the sample variance (division by [n-1]) of a set of numbers.\n", 33 | " - [*CUME_DIST*](#cume) - The CUME_DIST column function returns the cumulative distribution of a row that is hypothetically inserted into a group of rows\n", 34 | " - [*PERCENT_RANK*](#rank) - The PERCENT_RANK column function returns the relative percentile rank of a row that is hypothetically inserted into a group of rows.\n", 35 | " - [*PERCENTILE_DISC*](#disc), [*PERCENTILE_CONT*](#cont) - Returns the value that corresponds to the specified percentile given a sort specification by using discrete (DISC) or continuous (CONT) distribution\n", 36 | " - [*MEDIAN*](#median) - The MEDIAN column function returns the median value in a set of values\n", 37 | " - [*WIDTH_BUCKET*](#width) - The WIDTH_BUCKET function is used to create equal-width histograms" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "## Sampling Functions\n", 45 | "The traditional VARIANCE, COVARIANCE, and STDDEV functions have been available in Db2 for a long time. When computing these values, the formulae assume that the entire population has been counted (N). The traditional formula for standard deviation is:" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "$$\\sigma=\\sqrt{\\frac{1}{N}\\sum_{i=1}^N(x_{i}-\\mu)^{2}}$$" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "N refers to the size of the population and in many cases, we only have a sample, not the entire population of values. \n", 60 | "In this case, the formula needs to be adjusted to account for the sampling." 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "$$s=\\sqrt{\\frac{1}{N-1}\\sum_{i=1}^N(x_{i}-\\bar{x})^{2}}$$" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": { 73 | "hideCode": false, 74 | "hidePrompt": false 75 | }, 76 | "source": [ 77 | "Set up the connection to the database." 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": null, 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [ 86 | "%run db2.ipynb" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "We populate the database with the EMPLOYEE and DEPARTMENT tables so that we can run the various examples." 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": null, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "%sql -sampledata" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": { 115 | "hideCode": false, 116 | "hidePrompt": false 117 | }, 118 | "source": [ 119 | "## COVARIANCE_SAMP\n", 120 | "\n", 121 | "The COVARIANCE_SAMP function returns the sample covariance of a set of number pairs. " 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": { 128 | "hideCode": false, 129 | "hidePrompt": false 130 | }, 131 | "outputs": [], 132 | "source": [ 133 | "%%sql\n", 134 | "SELECT COVARIANCE_SAMP(SALARY, BONUS) \n", 135 | " FROM EMPLOYEE \n", 136 | "WHERE WORKDEPT = 'A00'" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": { 149 | "hideCode": false, 150 | "hidePrompt": false 151 | }, 152 | "source": [ 153 | "## STDDEV_SAMP\n", 154 | "\n", 155 | "The STDDEV_SAMP column function returns the sample standard deviation (division by [n-1]) of a set of numbers." 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": { 162 | "hideCode": false, 163 | "hidePrompt": false 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "%%sql\n", 168 | "SELECT STDDEV_SAMP(SALARY) \n", 169 | " FROM EMPLOYEE \n", 170 | "WHERE WORKDEPT = 'A00'" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": { 183 | "hideCode": false, 184 | "hidePrompt": false 185 | }, 186 | "source": [ 187 | "## VARIANCE_SAMP\n", 188 | "The VARIANCE_SAMP column function returns the sample variance (division by [n-1]) of a set of numbers." 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": { 195 | "hideCode": false, 196 | "hidePrompt": false 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "%%sql\n", 201 | "SELECT VARIANCE_SAMP(SALARY) \n", 202 | " FROM EMPLOYEE \n", 203 | "WHERE WORKDEPT = 'A00'" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": { 216 | "hideCode": false, 217 | "hidePrompt": false 218 | }, 219 | "source": [ 220 | "## MEDIAN\n", 221 | "\n", 222 | "The MEDIAN column function returns the median value in a set of values." 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": { 229 | "hideCode": false, 230 | "hidePrompt": false 231 | }, 232 | "outputs": [], 233 | "source": [ 234 | "%%sql\n", 235 | "SELECT MEDIAN(SALARY) AS MEDIAN, AVG(SALARY) AS AVERAGE \n", 236 | " FROM EMPLOYEE \n", 237 | "WHERE WORKDEPT = 'E21'" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": { 250 | "hideCode": false, 251 | "hidePrompt": false 252 | }, 253 | "source": [ 254 | "## CUME_DIST\n", 255 | "\n", 256 | "The CUME_DIST column function returns the cumulative distribution of a row that is hypothetically inserted into \n", 257 | "a group of rows." 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": { 264 | "hideCode": false, 265 | "hidePrompt": false 266 | }, 267 | "outputs": [], 268 | "source": [ 269 | "%%sql\n", 270 | "SELECT CUME_DIST(47000) WITHIN GROUP (ORDER BY SALARY) \n", 271 | " FROM EMPLOYEE \n", 272 | "WHERE WORKDEPT = 'A00'" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": { 285 | "collapsed": true, 286 | "hideCode": false, 287 | "hidePrompt": false 288 | }, 289 | "source": [ 290 | "## PERCENT_RANK\n", 291 | "\n", 292 | "The PERCENT_RANK column function returns the relative percentile rank of a\n", 293 | "row that is hypothetically inserted into a group of rows." 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": { 300 | "hideCode": false, 301 | "hidePrompt": false 302 | }, 303 | "outputs": [], 304 | "source": [ 305 | "%%sql\n", 306 | "SELECT PERCENT_RANK(47000) WITHIN GROUP (ORDER BY SALARY) \n", 307 | " FROM EMPLOYEE \n", 308 | "WHERE WORKDEPT = 'A00'" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": { 321 | "collapsed": true, 322 | "hideCode": false, 323 | "hidePrompt": false 324 | }, 325 | "source": [ 326 | "## PERCENTILE_DISC\n", 327 | "\n", 328 | "The PERCENTILE_DISC/CONT returns the value that corresponds to the specified percentile \n", 329 | "given a sort specification by using discrete (DISC) or continuous (CONT) distribution." 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": null, 335 | "metadata": { 336 | "hideCode": false, 337 | "hidePrompt": false 338 | }, 339 | "outputs": [], 340 | "source": [ 341 | "%%sql\n", 342 | "SELECT PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY SALARY) \n", 343 | " FROM EMPLOYEE \n", 344 | "WHERE WORKDEPT = 'E21'" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": { 357 | "collapsed": true, 358 | "hideCode": false, 359 | "hidePrompt": false 360 | }, 361 | "source": [ 362 | "## PERCENTILE_CONT\n", 363 | "\n", 364 | "This is a function that gives you a continuous percentile calculation." 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": null, 370 | "metadata": { 371 | "hideCode": false, 372 | "hidePrompt": false 373 | }, 374 | "outputs": [], 375 | "source": [ 376 | "%%sql\n", 377 | "SELECT PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY SALARY) \n", 378 | " FROM EMPLOYEE \n", 379 | "WHERE WORKDEPT = 'E21'" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": { 392 | "collapsed": true, 393 | "hideCode": false, 394 | "hidePrompt": false 395 | }, 396 | "source": [ 397 | "## WIDTH BUCKET and Histogram Example\n", 398 | "\n", 399 | "The WIDTH_BUCKET function is used to create equal-width histograms. Using the EMPLOYEE table, \n", 400 | "This SQL will assign a bucket to each employee's salary using a range of 35000 to 100000 divided into 13 buckets." 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": null, 406 | "metadata": { 407 | "hideCode": false, 408 | "hidePrompt": false 409 | }, 410 | "outputs": [], 411 | "source": [ 412 | "%%sql\n", 413 | "SELECT EMPNO, SALARY, WIDTH_BUCKET(SALARY, 35000, 100000, 13) \n", 414 | " FROM EMPLOYEE \n", 415 | "ORDER BY EMPNO" 416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": { 421 | "hideCode": false, 422 | "hidePrompt": false 423 | }, 424 | "source": [ 425 | "We can plot this information by adding some more details to the bucket output." 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": null, 431 | "metadata": { 432 | "hideCode": false, 433 | "hidePrompt": false 434 | }, 435 | "outputs": [], 436 | "source": [ 437 | "%%sql -a\n", 438 | "WITH BUCKETS(EMPNO, SALARY, BNO) AS \n", 439 | " ( \n", 440 | " SELECT EMPNO, SALARY, \n", 441 | " WIDTH_BUCKET(SALARY, 35000, 100000, 9) AS BUCKET \n", 442 | " FROM EMPLOYEE ORDER BY EMPNO \n", 443 | " ) \n", 444 | "SELECT BNO, COUNT(*) AS COUNT FROM BUCKETS \n", 445 | "GROUP BY BNO \n", 446 | "ORDER BY BNO ASC " 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": { 452 | "hideCode": false, 453 | "hidePrompt": false 454 | }, 455 | "source": [ 456 | "And here is a plot of the data to make sense of the histogram." 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": null, 462 | "metadata": { 463 | "hideCode": false, 464 | "hidePrompt": false 465 | }, 466 | "outputs": [], 467 | "source": [ 468 | "%%sql -pb\n", 469 | "WITH BUCKETS(EMPNO, SALARY, BNO) AS \n", 470 | " ( \n", 471 | " SELECT EMPNO, SALARY, \n", 472 | " WIDTH_BUCKET(SALARY, 35000, 100000, 9) AS BUCKET \n", 473 | " FROM EMPLOYEE ORDER BY EMPNO \n", 474 | " ) \n", 475 | "SELECT BNO, COUNT(*) AS COUNT FROM BUCKETS \n", 476 | "GROUP BY BNO \n", 477 | "ORDER BY BNO ASC " 478 | ] 479 | }, 480 | { 481 | "cell_type": "markdown", 482 | "metadata": {}, 483 | "source": [ 484 | "[Back to Top](#top)" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": {}, 490 | "source": [ 491 | "#### Credits: IBM 2018, George Baklarz [baklarz@ca.ibm.com]" 492 | ] 493 | } 494 | ], 495 | "metadata": { 496 | "hide_code_all_hidden": false, 497 | "kernelspec": { 498 | "display_name": "Python 3", 499 | "language": "python", 500 | "name": "python3" 501 | }, 502 | "language_info": { 503 | "codemirror_mode": { 504 | "name": "ipython", 505 | "version": 3 506 | }, 507 | "file_extension": ".py", 508 | "mimetype": "text/x-python", 509 | "name": "python", 510 | "nbconvert_exporter": "python", 511 | "pygments_lexer": "ipython3", 512 | "version": "3.6.1" 513 | } 514 | }, 515 | "nbformat": 4, 516 | "nbformat_minor": 2 517 | } 518 | -------------------------------------------------------------------------------- /v1/Db2 Jupyter Extensions Tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Db2 Jupyter Notebook Extensions Tutorial\n", 8 | "\n", 9 | "The SQL code tutorials for Db2 rely on a Jupyter notebook extension, commonly refer to as a \"magic\" command. The beginning of all of the notebooks begin with the following command which will load the extension and allow the remainder of the notebook to use the %sql magic command.\n", 10 | "
\n",
  11 |     "%run db2.ipynb\n",
  12 |     "
\n", 13 | "The cell below will load the Db2 extension. Note that it will take a few seconds for the extension to load, so you should generally wait until the \"Db2 Extensions Loaded\" message is displayed in your notebook. In the event you get an error on the load of the ibm_db library, modify the command to include the -update option:\n", 14 | "```\n", 15 | "run db2.ipynb -update\n", 16 | "```" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": null, 22 | "metadata": { 23 | "collapsed": true 24 | }, 25 | "outputs": [], 26 | "source": [ 27 | "%run db2.ipynb" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Connections to Db2\n", 35 | "\n", 36 | "Before any SQL commands can be issued, a connection needs to be made to the Db2 database that you will be using. The connection can be done manually (through the use of the CONNECT command), or automatically when the first `%sql` command is issued.\n", 37 | "\n", 38 | "The Db2 magic command tracks whether or not a connection has occured in the past and saves this information between notebooks and sessions. When you start up a notebook and issue a command, the program will reconnect to the database using your credentials from the last session. In the event that you have not connected before, the system will prompt you for all the information it needs to connect. This information includes:\n", 39 | "\n", 40 | "- Database name (SAMPLE) \n", 41 | "- Hostname - localhost (enter an IP address if you need to connect to a remote server) \n", 42 | "- PORT - 50000 (this is the default but it could be different) \n", 43 | "- Userid - DB2INST1 \n", 44 | "- Password - No password is provided so you have to enter a value \n", 45 | "- Maximum Rows - 10 lines of output are displayed when a result set is returned \n", 46 | "\n", 47 | "There will be default values presented in the panels that you can accept, or enter your own values. All of the information will be stored in the directory that the notebooks are stored on. Once you have entered the information, the system will attempt to connect to the database for you and then you can run all of the SQL scripts. More details on the CONNECT syntax will be found in a section below.\n", 48 | "\n", 49 | "The next statement will force a CONNECT to occur with the default values. If you have not connected before, it will prompt you for the information." 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": null, 55 | "metadata": { 56 | "collapsed": true 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "%sql CONNECT" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "## Line versus Cell Command\n", 68 | "The Db2 extension is made up of one magic command that works either at the LINE level (`%sql`) or at the CELL level (`%%sql`). If you only want to execute a SQL command on one line in your script, use the %sql form of the command. If you want to run a larger block of SQL, then use the `%%sql` form. Note that when you use the `%%sql` form of the command, the entire contents of the cell is considered part of the command, so you cannot mix other commands in the cell.\n", 69 | "\n", 70 | "The following is an example of a line command:" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": null, 76 | "metadata": { 77 | "collapsed": true 78 | }, 79 | "outputs": [], 80 | "source": [ 81 | "%sql VALUES 'HELLO THERE'" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "If you have SQL that requires multiple lines, of if you need to execute many lines of SQL, then you should \n", 89 | "be using the CELL version of the `%sql` command. To start a block of SQL, start the cell with `%%sql` and do not place any SQL following the command. Subsequent lines can contain SQL code, with each SQL statement delimited with the semicolon (`;`). You can change the delimiter if required for procedures, etc... More details on this later." 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": { 96 | "collapsed": true 97 | }, 98 | "outputs": [], 99 | "source": [ 100 | "%%sql\n", 101 | "VALUES\n", 102 | " 1,\n", 103 | " 2,\n", 104 | " 3" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "If you are using a single statement then there is no need to use a delimiter. However, if you are combining a number of commands then you must use the semicolon." 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": null, 117 | "metadata": { 118 | "collapsed": true 119 | }, 120 | "outputs": [], 121 | "source": [ 122 | "%%sql\n", 123 | "DROP TABLE STUFF;\n", 124 | "CREATE TABLE STUFF (A INT);\n", 125 | "INSERT INTO STUFF VALUES\n", 126 | " 1,2,3;\n", 127 | "SELECT * FROM STUFF;" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "The script will generate messages and output as it executes. Each SQL statement that generates results will have a table displayed with the result set. If a command is executed, the results of the execution get listed as well. The script you just ran probably generated an error on the DROP table command." 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "## Options\n", 142 | "Both forms of the `%sql` command have options that can be used to change the behavior of the code. For both forms of the command (`%sql`, `%%sql`), the options must be on the same line as the command:\n", 143 | "
\n",
 144 |     "%sql -t ...\n",
 145 |     "%%sql -t\n",
 146 |     "
\n", 147 | "\n", 148 | "The only difference is that the `%sql` command can have SQL following the parameters, while the `%%sql` requires the SQL to be placed on subsequent lines.\n", 149 | "\n", 150 | "There are a number of parameters that you can specify as part of the `%sql` statement. \n", 151 | "\n", 152 | "* -d - Use alternative delimiter\n", 153 | "* -t - Time the statement execution\n", 154 | "* -q - Suppress messages \n", 155 | "* -j - JSON formatting of a column\n", 156 | "* -a - Show all output\n", 157 | "* -pb - Bar chart of results\n", 158 | "* -pp - Pie chart of results \n", 159 | "* -pl - Line chart of results\n", 160 | "* -i - Interactive mode with Pixiedust\n", 161 | "* -sampledata Load the database with the sample EMPLOYEE and DEPARTMENT tables\n", 162 | "* -r - Return the results into a variable (list of rows)\n", 163 | "\n", 164 | "Multiple parameters are allowed on a command line. Each option should be separated by a space:\n", 165 | "
\n",
 166 |     "%sql -a -j ...\n",
 167 |     "
\n", 168 | "\n", 169 | "A SELECT statement will return the results as a dataframe and display the results as a table in the notebook. If you use the assignment statement, the dataframe will be placed into the variable and the results will not be displayed:\n", 170 | "
\n",
 171 |     "r = %sql SELECT * FROM EMPLOYEE\n",
 172 |     "
\n", 173 | "\n", 174 | "The sections below will explain the options in more detail." 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "## Delimiters\n", 182 | "The default delimiter for all SQL statements is the semicolon. However, this becomes a problem when you try to create a trigger, function, or procedure that uses SQLPL (or PL/SQL). Use the -d option to turn the SQL delimiter into the at (`@`) sign and -q to suppress error messages. The semi-colon is then ignored as a delimiter.\n", 183 | "\n", 184 | "For example, the following SQL will use the `@` sign as the delimiter." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": { 191 | "collapsed": true 192 | }, 193 | "outputs": [], 194 | "source": [ 195 | "%%sql -d -q\n", 196 | "DROP TABLE STUFF\n", 197 | "@\n", 198 | "CREATE TABLE STUFF (A INT)\n", 199 | "@\n", 200 | "INSERT INTO STUFF VALUES\n", 201 | " 1,2,3\n", 202 | "@\n", 203 | "SELECT * FROM STUFF\n", 204 | "@" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "The delimiter change will only take place for the statements following the `%%sql` command. Subsequent cells\n", 212 | "in the notebook will still use the semicolon. You must use the -d option for every cell that needs to use the\n", 213 | "semicolon in the script." 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "## Limiting Result Sets\n", 221 | "The default number of rows displayed for any result set is 10. You have the option of changing this option when initially connecting to the database. If you want to override the number of rows display you can either update\n", 222 | "the control variable, or use the -a option. The -a option will display all of the rows in the answer set. For instance, the following SQL will only show 10 rows even though we inserted 15 values:" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": { 229 | "collapsed": true 230 | }, 231 | "outputs": [], 232 | "source": [ 233 | "%sql values 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "You will notice that the displayed result will split the visible rows to the first 5 rows and the last 5 rows.\n", 241 | "Using the -a option will display all values:" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": null, 247 | "metadata": { 248 | "collapsed": true 249 | }, 250 | "outputs": [], 251 | "source": [ 252 | "%sql -a values 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15" 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "To change the default value of rows displayed, you can either do a CONNECT RESET (discussed later) or set the\n", 260 | "Db2 control variable maxrows to a different value. A value of -1 will display all rows." 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": { 267 | "collapsed": true 268 | }, 269 | "outputs": [], 270 | "source": [ 271 | "# Save previous version of maximum rows\n", 272 | "last_max = _settings['maxrows']\n", 273 | "_settings['maxrows'] = 5\n", 274 | "%sql values 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "A special note regarding the output from a SELECT statement. If the SQL statement is the last line of a block, the results will be displayed by default (unless you assigned the results to a variable). If the SQL is in the middle of a block of statements, the results will not be displayed. To explicitly display the results you must use the display function (or pDisplay if you have imported another library like pixiedust which overrides the pandas display function). " 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": null, 287 | "metadata": { 288 | "collapsed": true 289 | }, 290 | "outputs": [], 291 | "source": [ 292 | "# Set the maximum back\n", 293 | "_settings['maxrows'] = last_max\n", 294 | "%sql values 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "## Quiet Mode\n", 302 | "Every SQL statement will result in some output. You will either get an answer set (SELECT), or an indication if\n", 303 | "the command worked. For instance, the following set of SQL will generate some error messages since the tables \n", 304 | "will probably not exist:" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": null, 310 | "metadata": { 311 | "collapsed": true 312 | }, 313 | "outputs": [], 314 | "source": [ 315 | "%%sql\n", 316 | "DROP TABLE TABLE_NOT_FOUND;\n", 317 | "DROP TABLE TABLE_SPELLED_WRONG;" 318 | ] 319 | }, 320 | { 321 | "cell_type": "markdown", 322 | "metadata": {}, 323 | "source": [ 324 | "If you know that these errors may occur you can silence them with the -q option." 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": null, 330 | "metadata": { 331 | "collapsed": true 332 | }, 333 | "outputs": [], 334 | "source": [ 335 | "%%sql -q\n", 336 | "DROP TABLE TABLE_NOT_FOUND;\n", 337 | "DROP TABLE TABLE_SPELLED_WRONG;" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "SQL output will not be suppressed, so the following command will still show the results." 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": { 351 | "collapsed": true 352 | }, 353 | "outputs": [], 354 | "source": [ 355 | "%%sql -q\n", 356 | "DROP TABLE TABLE_NOT_FOUND;\n", 357 | "DROP TABLE TABLE_SPELLED_WRONG;\n", 358 | "VALUES 1,2,3;" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": {}, 364 | "source": [ 365 | "## Variables in %sql Blocks" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": {}, 371 | "source": [ 372 | "The `%sql` syntax allows you to pass local variables to a script. There are 5 predefined variables defined in the program:\n", 373 | "\n", 374 | "- database - The name of the database you are connected to\n", 375 | "- uid - The userid that you connected with\n", 376 | "- host = The IP address of the host system\n", 377 | "- port - The port number of the host system\n", 378 | "- max - The maximum number of rows to return in an answer set\n", 379 | "\n", 380 | "These variables are all part of a structure called _settings. To pass a value to a LINE script, use the braces {} to surround the name of the variable:\n", 381 | "\n", 382 | "
\n",
 383 |     "   {_settings[\"database\"]}\n",
 384 |     "
\n", 385 | "\n", 386 | "The next line will display the currently connected database." 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": null, 392 | "metadata": { 393 | "collapsed": true 394 | }, 395 | "outputs": [], 396 | "source": [ 397 | "%sql VALUES '{_settings[\"database\"]}'" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "You cannot use variable substitution with the CELL version of the `%%sql` command. If your SQL statement extends beyond one line, and you want to use variable substitution, you can use a couple of techniques to make it look like one line. The simplest way is to add the backslash character (```\\```) at the end of every line. The following example illustrates the technique." 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": null, 410 | "metadata": { 411 | "collapsed": true 412 | }, 413 | "outputs": [], 414 | "source": [ 415 | "empno = '000010'\n", 416 | "%sql SELECT LASTNAME FROM \\\n", 417 | " EMPLOYEE \\\n", 418 | " WHERE \\\n", 419 | " EMPNO = '{empno}'" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "The other option for passing variables to a `%sql` or `%%sql` statement is to use the embedded variable format. This requires that the variable be prefixed with a colon (`:`) in front of it. When using this format, you do not need to use quote characters around the variables since its value is extracted at run time. The first example uses the value of the variable." 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": { 433 | "collapsed": true 434 | }, 435 | "outputs": [], 436 | "source": [ 437 | "empno = '000010'\n", 438 | "%sql select lastname from employee where empno='{empno}'" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "This example uses the embedded variable name (`:empno`)." 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": null, 451 | "metadata": { 452 | "collapsed": true 453 | }, 454 | "outputs": [], 455 | "source": [ 456 | "%sql select lastname from employee where empno=:empno" 457 | ] 458 | }, 459 | { 460 | "cell_type": "markdown", 461 | "metadata": {}, 462 | "source": [ 463 | "## Timing SQL Statements\n", 464 | "Sometimes you want to see how the execution of a statement changes with the addition of indexes or other\n", 465 | "optimization changes. The -t option will run the statement on the LINE or one SQL statement in the CELL for \n", 466 | "exactly one second. The results will be displayed and optionally placed into a variable. The syntax of the\n", 467 | "command is:\n", 468 | "
\n",
 469 |     "sql_time = %sql -t SELECT * FROM EMPLOYEE\n",
 470 |     "
\n", 471 | "For instance, the following SQL will time the VALUES clause." 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": null, 477 | "metadata": { 478 | "collapsed": true 479 | }, 480 | "outputs": [], 481 | "source": [ 482 | "%sql -t VALUES 1,2,3,4,5,6,7,8,9" 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": {}, 488 | "source": [ 489 | "When timing a statement, no output will be displayed. If your SQL statement takes longer than one second you\n", 490 | "will need to modify the db2 _runtime variable. This variable must be set to the number of seconds that you\n", 491 | "want to run the statement." 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": null, 497 | "metadata": { 498 | "collapsed": true 499 | }, 500 | "outputs": [], 501 | "source": [ 502 | "_runtime = 5\n", 503 | "%sql -t VALUES 1,2,3,4,5,6,7,8,9" 504 | ] 505 | }, 506 | { 507 | "cell_type": "markdown", 508 | "metadata": {}, 509 | "source": [ 510 | "## JSON Formatting\n", 511 | "Db2 supports querying JSON that is stored in a column within a table. Standard output would just display the \n", 512 | "JSON as a string. For instance, the following statement would just return a large string of output." 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": null, 518 | "metadata": { 519 | "collapsed": true 520 | }, 521 | "outputs": [], 522 | "source": [ 523 | "%%sql\n", 524 | "VALUES \n", 525 | " '{\n", 526 | " \"empno\":\"000010\",\n", 527 | " \"firstnme\":\"CHRISTINE\",\n", 528 | " \"midinit\":\"I\",\n", 529 | " \"lastname\":\"HAAS\",\n", 530 | " \"workdept\":\"A00\",\n", 531 | " \"phoneno\":[3978],\n", 532 | " \"hiredate\":\"01/01/1995\",\n", 533 | " \"job\":\"PRES\",\n", 534 | " \"edlevel\":18,\n", 535 | " \"sex\":\"F\",\n", 536 | " \"birthdate\":\"08/24/1963\",\n", 537 | " \"pay\" : {\n", 538 | " \"salary\":152750.00,\n", 539 | " \"bonus\":1000.00,\n", 540 | " \"comm\":4220.00}\n", 541 | " }'" 542 | ] 543 | }, 544 | { 545 | "cell_type": "markdown", 546 | "metadata": {}, 547 | "source": [ 548 | "Adding the -j option to the %sql (or %%sql) command will format the first column of a return set to better\n", 549 | "display the structure of the document. Note that if your answer set has additional columns associated with it, they will not be displayed in this format." 550 | ] 551 | }, 552 | { 553 | "cell_type": "code", 554 | "execution_count": null, 555 | "metadata": { 556 | "collapsed": true 557 | }, 558 | "outputs": [], 559 | "source": [ 560 | "%%sql -j\n", 561 | "VALUES \n", 562 | " '{\n", 563 | " \"empno\":\"000010\",\n", 564 | " \"firstnme\":\"CHRISTINE\",\n", 565 | " \"midinit\":\"I\",\n", 566 | " \"lastname\":\"HAAS\",\n", 567 | " \"workdept\":\"A00\",\n", 568 | " \"phoneno\":[3978],\n", 569 | " \"hiredate\":\"01/01/1995\",\n", 570 | " \"job\":\"PRES\",\n", 571 | " \"edlevel\":18,\n", 572 | " \"sex\":\"F\",\n", 573 | " \"birthdate\":\"08/24/1963\",\n", 574 | " \"pay\" : {\n", 575 | " \"salary\":152750.00,\n", 576 | " \"bonus\":1000.00,\n", 577 | " \"comm\":4220.00}\n", 578 | " }'" 579 | ] 580 | }, 581 | { 582 | "cell_type": "markdown", 583 | "metadata": {}, 584 | "source": [ 585 | "## Plotting\n", 586 | "Sometimes it would be useful to display a result set as either a bar, pie, or line chart. The first one or two\n", 587 | "columns of a result set need to contain the values need to plot the information.\n", 588 | "\n", 589 | "The three possible plot options are:\n", 590 | " \n", 591 | "* -pb - bar chart (x,y)\n", 592 | "* -pp - pie chart (y)\n", 593 | "* -pl - line chart (x,y)\n", 594 | "\n", 595 | "The following data will be used to demonstrate the different charting options." 596 | ] 597 | }, 598 | { 599 | "cell_type": "code", 600 | "execution_count": null, 601 | "metadata": { 602 | "collapsed": true 603 | }, 604 | "outputs": [], 605 | "source": [ 606 | "%sql values 1,2,3,4,5" 607 | ] 608 | }, 609 | { 610 | "cell_type": "markdown", 611 | "metadata": {}, 612 | "source": [ 613 | "Since the results only have one column, the pie, line, and bar charts will not have any labels associated with\n", 614 | "them. The first example is a bar chart." 615 | ] 616 | }, 617 | { 618 | "cell_type": "code", 619 | "execution_count": null, 620 | "metadata": { 621 | "collapsed": true, 622 | "pixiedust": { 623 | "displayParams": { 624 | "handlerId": "dataframe" 625 | } 626 | } 627 | }, 628 | "outputs": [], 629 | "source": [ 630 | "%sql -pb values 1,2,3,4,5" 631 | ] 632 | }, 633 | { 634 | "cell_type": "markdown", 635 | "metadata": {}, 636 | "source": [ 637 | "The same data as a pie chart." 638 | ] 639 | }, 640 | { 641 | "cell_type": "code", 642 | "execution_count": null, 643 | "metadata": { 644 | "collapsed": true 645 | }, 646 | "outputs": [], 647 | "source": [ 648 | "%sql -pp values 1,2,3,4,5" 649 | ] 650 | }, 651 | { 652 | "cell_type": "markdown", 653 | "metadata": {}, 654 | "source": [ 655 | "And finally a line chart." 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": null, 661 | "metadata": { 662 | "collapsed": true, 663 | "pixiedust": { 664 | "displayParams": { 665 | "handlerId": "dataframe" 666 | } 667 | }, 668 | "scrolled": true 669 | }, 670 | "outputs": [], 671 | "source": [ 672 | "%sql -pl values 1,2,3,4,5" 673 | ] 674 | }, 675 | { 676 | "cell_type": "markdown", 677 | "metadata": {}, 678 | "source": [ 679 | "If you retrieve two columns of information, the first column is used for the labels (X axis or pie slices) and \n", 680 | "the second column contains the data. " 681 | ] 682 | }, 683 | { 684 | "cell_type": "code", 685 | "execution_count": null, 686 | "metadata": { 687 | "collapsed": true 688 | }, 689 | "outputs": [], 690 | "source": [ 691 | "%sql -pb values ('A',1),('B',2),('C',3),('D',4),('E',5)" 692 | ] 693 | }, 694 | { 695 | "cell_type": "markdown", 696 | "metadata": {}, 697 | "source": [ 698 | "For a pie chart, the first column is used to label the slices, while the data comes from the second column." 699 | ] 700 | }, 701 | { 702 | "cell_type": "code", 703 | "execution_count": null, 704 | "metadata": { 705 | "collapsed": true 706 | }, 707 | "outputs": [], 708 | "source": [ 709 | "%sql -pp values ('A',1),('B',2),('C',3),('D',4),('E',5)" 710 | ] 711 | }, 712 | { 713 | "cell_type": "markdown", 714 | "metadata": {}, 715 | "source": [ 716 | "Finally, for a line chart, the x contains the labels and the y values are used." 717 | ] 718 | }, 719 | { 720 | "cell_type": "code", 721 | "execution_count": null, 722 | "metadata": { 723 | "collapsed": true 724 | }, 725 | "outputs": [], 726 | "source": [ 727 | "%sql -pl values ('A',1),('B',2),('C',3),('D',4),('E',5)" 728 | ] 729 | }, 730 | { 731 | "cell_type": "markdown", 732 | "metadata": {}, 733 | "source": [ 734 | "The following SQL will plot the number of employees per department." 735 | ] 736 | }, 737 | { 738 | "cell_type": "code", 739 | "execution_count": null, 740 | "metadata": { 741 | "collapsed": true 742 | }, 743 | "outputs": [], 744 | "source": [ 745 | "%%sql -pb\n", 746 | "SELECT WORKDEPT, COUNT(*) \n", 747 | " FROM EMPLOYEE\n", 748 | "GROUP BY WORKDEPT" 749 | ] 750 | }, 751 | { 752 | "cell_type": "markdown", 753 | "metadata": {}, 754 | "source": [ 755 | "The final option for plotting data is to use interactive mode `-i`. This will display the data using an open-source project called Pixiedust. You can view the results in a table and then interactively create a plot by dragging and dropping column names into the appropriate slot. The next command will place you into interactive mode." 756 | ] 757 | }, 758 | { 759 | "cell_type": "code", 760 | "execution_count": null, 761 | "metadata": { 762 | "collapsed": true, 763 | "pixiedust": { 764 | "displayParams": { 765 | "handlerId": "dataframe" 766 | } 767 | } 768 | }, 769 | "outputs": [], 770 | "source": [ 771 | "%sql -i select * from employee" 772 | ] 773 | }, 774 | { 775 | "cell_type": "markdown", 776 | "metadata": {}, 777 | "source": [ 778 | "## Sample Data\n", 779 | "Many of the Db2 notebooks depend on two of the tables that are found in the SAMPLE database. Rather than\n", 780 | "having to create the entire SAMPLE database, this option will create and populate the EMPLOYEE and \n", 781 | "DEPARTMENT tables in your database. Note that if you already have these tables defined, they will not be dropped." 782 | ] 783 | }, 784 | { 785 | "cell_type": "code", 786 | "execution_count": null, 787 | "metadata": { 788 | "collapsed": true 789 | }, 790 | "outputs": [], 791 | "source": [ 792 | "%sql -sampledata" 793 | ] 794 | }, 795 | { 796 | "cell_type": "markdown", 797 | "metadata": {}, 798 | "source": [ 799 | "## Result Sets \n", 800 | "By default, any `%sql` block will return the contents of a result set as a table that is displayed in the notebook. The results are displayed using a feature of pandas dataframes. The following select statement demonstrates a simple result set." 801 | ] 802 | }, 803 | { 804 | "cell_type": "code", 805 | "execution_count": null, 806 | "metadata": { 807 | "collapsed": true 808 | }, 809 | "outputs": [], 810 | "source": [ 811 | "%sql select * from employee fetch first 3 rows only" 812 | ] 813 | }, 814 | { 815 | "cell_type": "markdown", 816 | "metadata": {}, 817 | "source": [ 818 | "You can assign the result set directly to a variable." 819 | ] 820 | }, 821 | { 822 | "cell_type": "code", 823 | "execution_count": null, 824 | "metadata": { 825 | "collapsed": true 826 | }, 827 | "outputs": [], 828 | "source": [ 829 | "x = %sql select * from employee fetch first 3 rows only" 830 | ] 831 | }, 832 | { 833 | "cell_type": "markdown", 834 | "metadata": {}, 835 | "source": [ 836 | "The variable x contains the dataframe that was produced by the `%sql` statement so you access the result set by using this variable or display the contents by just referring to it in a command line." 837 | ] 838 | }, 839 | { 840 | "cell_type": "code", 841 | "execution_count": null, 842 | "metadata": { 843 | "collapsed": true 844 | }, 845 | "outputs": [], 846 | "source": [ 847 | "x" 848 | ] 849 | }, 850 | { 851 | "cell_type": "markdown", 852 | "metadata": {}, 853 | "source": [ 854 | "There is an additional way of capturing the data through the use of the `-r` flag.\n", 855 | "
\n",
 856 |     "var = %sql -r select * from employee\n",
 857 |     "
\n", 858 | "Rather than returning a dataframe result set, this option will produce a list of rows. Each row is a list itself. The rows and columns all start at zero (0), so to access the first column of the first row, you would use var[0][0] to access it." 859 | ] 860 | }, 861 | { 862 | "cell_type": "code", 863 | "execution_count": null, 864 | "metadata": { 865 | "collapsed": true 866 | }, 867 | "outputs": [], 868 | "source": [ 869 | "rows = %sql -r select * from employee fetch first 3 rows only\n", 870 | "print(rows[0][0])" 871 | ] 872 | }, 873 | { 874 | "cell_type": "markdown", 875 | "metadata": {}, 876 | "source": [ 877 | "The number of rows in the result set can be determined by using the length function." 878 | ] 879 | }, 880 | { 881 | "cell_type": "code", 882 | "execution_count": null, 883 | "metadata": { 884 | "collapsed": true 885 | }, 886 | "outputs": [], 887 | "source": [ 888 | "print(len(rows))" 889 | ] 890 | }, 891 | { 892 | "cell_type": "markdown", 893 | "metadata": {}, 894 | "source": [ 895 | "If you want to iterate over all of the rows and columns, you could use the following Python syntax instead of\n", 896 | "creating a for loop that goes from 0 to 41." 897 | ] 898 | }, 899 | { 900 | "cell_type": "code", 901 | "execution_count": null, 902 | "metadata": { 903 | "collapsed": true 904 | }, 905 | "outputs": [], 906 | "source": [ 907 | "for row in rows:\n", 908 | " line = \"\"\n", 909 | " for col in row:\n", 910 | " line = line + str(col) + \",\"\n", 911 | " print(line)" 912 | ] 913 | }, 914 | { 915 | "cell_type": "markdown", 916 | "metadata": {}, 917 | "source": [ 918 | "Since the data may be returned in different formats (like integers), you should use the str() function\n", 919 | "to convert the values to strings. Otherwise, the concatenation function used in the above example will fail. For\n", 920 | "instance, the 6th field is a birthdate field. If you retrieve it as an individual value and try and concatenate a string to it, you get the following error." 921 | ] 922 | }, 923 | { 924 | "cell_type": "code", 925 | "execution_count": null, 926 | "metadata": { 927 | "collapsed": true 928 | }, 929 | "outputs": [], 930 | "source": [ 931 | "print(\"Birth Date=\"+rows[0][6])" 932 | ] 933 | }, 934 | { 935 | "cell_type": "markdown", 936 | "metadata": {}, 937 | "source": [ 938 | "You can fix this problem by adding the str function to convert the date." 939 | ] 940 | }, 941 | { 942 | "cell_type": "code", 943 | "execution_count": null, 944 | "metadata": { 945 | "collapsed": true 946 | }, 947 | "outputs": [], 948 | "source": [ 949 | "print(\"Birth Date=\"+str(rows[0][6]))" 950 | ] 951 | }, 952 | { 953 | "cell_type": "markdown", 954 | "metadata": {}, 955 | "source": [ 956 | "## Db2 CONNECT Statement\n", 957 | "As mentioned at the beginning of this notebook, connecting to Db2 is automatically done when you issue your first\n", 958 | "`%sql` statement. Usually the program will prompt you with what options you want when connecting to a database. The other option is to use the CONNECT statement directly. The CONNECT statement is similar to the native Db2\n", 959 | "CONNECT command, but includes some options that allow you to connect to databases that has not been\n", 960 | "catalogued locally.\n", 961 | "\n", 962 | "The CONNECT command has the following format:\n", 963 | "
\n",
 964 |     "%sql CONNECT TO <database> USER <userid> USING <password | ?> HOST <ip address> PORT <port number>\n",
 965 |     "
\n", 966 | "If you use a \"?\" for the password field, the system will prompt you for a password. This avoids typing the \n", 967 | "password as clear text on the screen. If a connection is not successful, the system will print the error\n", 968 | "message associated with the connect request.\n", 969 | "\n", 970 | "If the connection is successful, the parameters are saved on your system and will be used the next time you\n", 971 | "run a SQL statement, or when you issue the %sql CONNECT command with no parameters.\n", 972 | "\n", 973 | "If you want to force the program to connect to a different database (with prompting), use the CONNECT RESET command. The next time you run a SQL statement, the program will prompt you for the the connection\n", 974 | "and will force the program to reconnect the next time a SQL statement is executed." 975 | ] 976 | }, 977 | { 978 | "cell_type": "code", 979 | "execution_count": null, 980 | "metadata": { 981 | "collapsed": true 982 | }, 983 | "outputs": [], 984 | "source": [ 985 | "%sql CONNECT RESET" 986 | ] 987 | }, 988 | { 989 | "cell_type": "code", 990 | "execution_count": null, 991 | "metadata": { 992 | "collapsed": true 993 | }, 994 | "outputs": [], 995 | "source": [ 996 | "%sql CONNECT" 997 | ] 998 | }, 999 | { 1000 | "cell_type": "markdown", 1001 | "metadata": {}, 1002 | "source": [ 1003 | "#### Credits: IBM 2018, George Baklarz [baklarz@ca.ibm.com]" 1004 | ] 1005 | } 1006 | ], 1007 | "metadata": { 1008 | "kernelspec": { 1009 | "display_name": "Python 3", 1010 | "language": "python", 1011 | "name": "python3" 1012 | }, 1013 | "language_info": { 1014 | "codemirror_mode": { 1015 | "name": "ipython", 1016 | "version": 3 1017 | }, 1018 | "file_extension": ".py", 1019 | "mimetype": "text/x-python", 1020 | "name": "python", 1021 | "nbconvert_exporter": "python", 1022 | "pygments_lexer": "ipython3", 1023 | "version": "3.6.1" 1024 | } 1025 | }, 1026 | "nbformat": 4, 1027 | "nbformat_minor": 2 1028 | } 1029 | --------------------------------------------------------------------------------