├── .gitignore ├── ACKNOWLEDGEMENTS.md ├── CONTRIBUTING.md ├── LICENSE ├── MAINTAINERS.md ├── README.md ├── data └── examples │ └── nodebook_1.ipynb ├── doc └── source │ └── images │ ├── architecture.png │ ├── new_custom_environment.png │ ├── new_notebook_custom_environment.png │ └── notebook_preview.png └── notebooks ├── images ├── display_sin_cos.png ├── mapbox_americas.png ├── mapbox_uk.png ├── pd_chart_types.png └── pixiedust_node_schematic.png └── nodebook_1.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | notebooks/.ipynb_checkpoints/ 2 | notebooks/derby.log 3 | notebooks/metastore_db/ 4 | 5 | 6 | -------------------------------------------------------------------------------- /ACKNOWLEDGEMENTS.md: -------------------------------------------------------------------------------- 1 | ## Acknowledgements 2 | 3 | * [Glynn Bird](https://github.com/glynnbird) created the [pixiedust_node](https://github.com/ibm-watson-data-lab/pixiedust_node) Python library, on which this code pattern is based. 4 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | This is an open source project, and we appreciate your help! 4 | 5 | We use the GitHub issue tracker to discuss new features and non-trivial bugs. 6 | 7 | In addition to the issue tracker, [#journeys on 8 | Slack](https://dwopen.slack.com) is the best way to get into contact with the 9 | project's maintainers. 10 | 11 | To contribute code, documentation, or tests, please submit a pull request to 12 | the GitHub repository. Generally, we expect two maintainers to review your pull 13 | request before it is approved for merging. For more details, see the 14 | [MAINTAINERS](MAINTAINERS.md) page. 
15 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 
39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /MAINTAINERS.md: -------------------------------------------------------------------------------- 1 | # Maintainers Guide 2 | 3 | This guide is intended for maintainers - anybody with commit access to one or 4 | more Code Pattern repositories. 5 | 6 | ## Methodology 7 | 8 | This repository does not have a traditional release management cycle, but 9 | should instead be maintained as a useful, working, and polished reference at 10 | all times. While all work can therefore be focused on the master branch, the 11 | quality of this branch should never be compromised. 12 | 13 | The remainder of this document details how to merge pull requests to the 14 | repositories. 15 | 16 | ## Merge approval 17 | 18 | The project maintainers use LGTM (Looks Good To Me) in comments on the pull 19 | request to indicate acceptance prior to merging. A change requires LGTMs from 20 | two project maintainers. If the code is written by a maintainer, the change 21 | only requires one additional LGTM. 
22 | 23 | ## Reviewing Pull Requests 24 | 25 | We recommend reviewing pull requests directly within GitHub. This allows a 26 | public commentary on changes, providing transparency for all users. When 27 | providing feedback be civil, courteous, and kind. Disagreement is fine, so long 28 | as the discourse is carried out politely. If we see a record of uncivil or 29 | abusive comments, we will revoke your commit privileges and invite you to leave 30 | the project. 31 | 32 | During your review, consider the following points: 33 | 34 | ### Does the change have positive impact? 35 | 36 | Some proposed changes may not represent a positive impact to the project. Ask 37 | whether or not the change will make understanding the code easier, or if it 38 | could simply be a personal preference on the part of the author (see 39 | [bikeshedding](https://en.wiktionary.org/wiki/bikeshedding)). 40 | 41 | Pull requests that do not have a clear positive impact should be closed without 42 | merging. 43 | 44 | ### Do the changes make sense? 45 | 46 | If you do not understand what the changes are or what they accomplish, ask the 47 | author for clarification. Ask the author to add comments and/or clarify test 48 | case names to make the intentions clear. 49 | 50 | At times, such clarification will reveal that the author may not be using the 51 | code correctly, or is unaware of features that accommodate their needs. If you 52 | feel this is the case, work up a code sample that would address the pull 53 | request for them, and feel free to close the pull request once they confirm. 54 | 55 | ### Does the change introduce a new feature? 56 | 57 | For any given pull request, ask yourself "is this a new feature?" If so, does 58 | the pull request (or associated issue) contain narrative indicating the need 59 | for the feature? If not, ask them to provide that information. 60 | 61 | Are new unit tests in place that test all new behaviors introduced? 
If not, do 62 | not merge the feature until they are! Is documentation in place for the new 63 | feature? (See the documentation guidelines). If not, do not merge the feature 64 | until it is! Is the feature necessary for general use cases? Try to keep the 65 | scope of any given component narrow. If a proposed feature does not fit that 66 | scope, recommend to the user that they maintain the feature on their own, and 67 | close the request. You may also recommend that they see if the feature gains 68 | traction among other users, and suggest they re-submit when they can show such 69 | support. 70 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Run Node.js code in Jupyter notebooks 2 | 3 | Notebooks are where data scientists process, analyse, and visualise data in an iterative, collaborative environment. They typically run environments for languages like Python, R, and Scala. For years, data science notebooks have served academics and research scientists as a scratchpad for writing code, refining algorithms, and sharing and proving their work. Today, it's a workflow that lends itself well to web developers experimenting with data sets in Node.js. 4 | 5 | To that end, [pixiedust_node](https://github.com/pixiedust/pixiedust_node) is an add-on for Jupyter notebooks that allows Node.js/JavaScript to run inside notebook cells. To learn more, follow the setup steps and explore the getting started notebook, or click on the sample image below to preview the output. 
6 | 7 | [![preview](doc/source/images/notebook_preview.png)](https://nbviewer.jupyter.org/github/IBM/nodejs-in-notebooks/blob/master/data/examples/nodebook_1.ipynb) 8 | 9 | When the reader has completed this Code Pattern, they will understand how to: 10 | 11 | * Run Node.js/JavaScript inside a Jupyter Notebook 12 | * Use JavaScript variables, functions, and promises 13 | * Work with remote data sources 14 | * Share data between Python and Node.js 15 | 16 | ![architecture](doc/source/images/architecture.png) 17 | 18 | ## Flow 19 | 20 | 1. Install Node.js in the target environment (Watson Studio or a local machine) 21 | 2. Open the Node.js notebook in the target environment 22 | 3. Run the Node.js notebook 23 | 24 | ## Included Components 25 | * [Watson Studio](https://www.ibm.com/cloud/watson-studio): Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark. 26 | * [Jupyter Notebook](https://jupyter.org/): An open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. 27 | * [PixieDust](https://github.com/pixiedust/pixiedust): Provides a Python helper library for IPython Notebook. 28 | * [Cloudant NoSQL DB](https://cloud.ibm.com/catalog/services/cloudant): A fully managed data layer designed for modern web and mobile applications that leverages a flexible JSON schema. 29 | 30 | ## Featured Technologies 31 | * [pixiedust_node](https://github.com/pixiedust/pixiedust_node): An open source Python package that adds support for JavaScript/Node.js code. 32 | * [Node.js](https://nodejs.org/): An open-source JavaScript run-time environment for executing server-side JavaScript code. 
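
Once the setup below is complete, running JavaScript in a notebook takes two kinds of cells: a Python cell that loads `pixiedust_node`, and JavaScript cells prefixed with the `%%node` cell magic. The sketch below illustrates the idea; these are notebook cells, not standalone scripts, so they only run inside a Jupyter kernel that has pixiedust and pixiedust_node installed.

```
# Python cell: load the add-on (assumes pixiedust and pixiedust_node are installed)
import pixiedust_node
```

```
%%node
// JavaScript cell: the %%node magic routes this code to a Node.js process
var greeting = 'Hello from Node.js';
console.log(greeting);
```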
33 | 34 | # Steps 35 | 36 | You can run Node.js code in Watson Studio or your local environment: 37 | * [Run Node.js notebooks in Watson Studio](#run-nodejs-notebooks-in-watson-studio) 38 | * [Run Node.js notebooks in a local environment](#run-nodejs-notebooks-in-a-local-environment) 39 | 40 | To preview an example notebook without going through a setup [follow this link](https://nbviewer.jupyter.org/github/IBM/nodejs-in-notebooks/blob/master/data/examples/nodebook_1.ipynb). 41 | 42 | ## Run Node.js notebooks in Watson Studio 43 | 44 | ### Creating a custom runtime environment 45 | 46 | A runtime environment in Watson Studio (IBM's Data Science platform) is defined by its hardware and software configuration. By default, Node.js is not installed in runtime environments and you therefore need to create a custom runtime environment definition. [[Learn more about environments...]](https://dataplatform.ibm.com/docs/content/analyze-data/notebook-environments.html) 47 | 48 | * Open [Watson Studio](https://www.ibm.com/cloud/watson-studio) in your web browser. Sign up for a free account if necessary. 49 | * [Create a "Complete" project.](https://dataplatform.ibm.com/projects?context=analytics) [[Learn more about projects...]](https://dataplatform.ibm.com/docs/content/manage-data/manage-projects.html) 50 | * In this project, open the **Environments** tab. A list of existing environment definitions for Python and R is displayed. 51 | * Create a new environment definition. 52 | * Assign a name to the new environment definition, such as `Python 2 with Node.js`. 53 | * Enter a brief environment description. 54 | * Choose the desired hardware configuration, such as a minimalist free setup (which is sufficient for demonstration purposes). 55 | * Select Python 2 as _software version_. (Python 3 is currently not supported by pixiedust_node.) 56 | * `Create` the environment definition. 57 | * Customize the software definition. 
58 | * Add the [nodejs conda package](https://anaconda.org/anaconda/nodejs) dependency, as shown below: 59 | ``` 60 | # Please add conda channels here 61 | channels: 62 | - defaults 63 | 64 | # Please add conda packages here 65 | dependencies: 66 | - nodejs 67 | 68 | # Please add pip packages here 69 | # To add pip packages, please comment out the next line 70 | #- pip: 71 | ``` 72 | * `Apply` the customization. It should look as follows: 73 | 74 | ![create_custom_runtime_environment](doc/source/images/new_custom_environment.png) 75 | 76 | You can now associate notebooks with this environment definition and run Node.js in the code cells, as illustrated in the getting started notebook. 77 | > Note: An environment definition is only available within the project that it was defined in. 78 | 79 | ### Loading the getting started notebook 80 | 81 | The [getting started notebook](notebooks/nodebook_1.ipynb) outlines how to 82 | * use variables, functions, and promises, 83 | * work with remote data sources, such as Apache CouchDB (or its managed sibling Cloudant), 84 | * visualize data, 85 | * share data between Python and Node.js. 86 | 87 | In the project you've created, add a new notebook _from URL_: 88 | * Enter any notebook name. 89 | * Specify the remote URL `https://raw.githubusercontent.com/IBM/nodebook-code-pattern/master/notebooks/nodebook_1.ipynb` as the source. 90 | * Select the custom runtime environment `Python 2 with Node.js` that you created earlier. 91 | 92 | ![create_nodebook](doc/source/images/new_notebook_custom_environment.png) 93 | 94 | Follow the notebook instructions. 95 | > You should be able to run all cells one at a time without making any changes. Do not use *Run All*. 
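
One feature the getting started notebook highlights is variable sharing: pixiedust_node mirrors simple, JSON-serializable values between the Python and Node.js runtimes. The pair of notebook cells below is an illustrative sketch of that behavior (not standalone code; see the pixiedust_node documentation for the details and limitations of sharing).

```
# Python cell: define a variable in Python
a = 'Hello from Python'
```

```
%%node
// JavaScript cell: pixiedust_node makes the Python variable visible here
console.log(a);
```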
96 | 97 | *** 98 | 99 | ## Run Node.js notebooks in a local environment 100 | 101 | ### Prerequisites 102 | To get started with nodebooks, you'll need a local installation of: 103 | 104 | * [PixieDust and its prerequisites](https://pixiedust.github.io/pixiedust/install.html) 105 | * A Python 2.7 kernel with Spark 2.x (see the section *Install a Jupyter Kernel* in [the PixieDust installation instructions](https://pixiedust.github.io/pixiedust/install.html)) 106 | * [Node.js/npm](https://nodejs.org/en/download/) 107 | 108 | 109 | ### Installing the samples 110 | 111 | To access the samples, clone this repository and launch a Jupyter server on your local machine. 112 | 113 | ``` 114 | $ git clone https://github.com/IBM/nodejs-in-notebooks.git 115 | $ cd nodejs-in-notebooks 116 | $ jupyter notebook notebooks/ 117 | ``` 118 | 119 | ### Running the samples 120 | 121 | Open [nodebook_1](notebooks/nodebook_1.ipynb) to learn more about 122 | 123 | * using variables, functions, and promises, 124 | * working with remote data sources, such as Apache CouchDB (or its managed sibling Cloudant), 125 | * visualizing data, 126 | * sharing data between Python and Node.js. 127 | 128 | > You should be able to run all cells one at a time without making any changes. Do not use *Run All*. 129 | 130 | *** 131 | 132 | ## Optional data source customization 133 | 134 | Some of the nodebook code pattern examples access a read-only Cloudant database for illustrative purposes. If you prefer, you can create your own copy of this database by replicating from the remote database URL `https://56953ed8-3fba-4f7e-824e-5498c8e1d18e-bluemix.cloudant.com/cities`. [[Learn more about database replication...]](https://developer.ibm.com/clouddataservices/docs/cloudant/replication/) 135 | 136 | # Sample Output 137 | 138 | 139 | Open [this link](https://nbviewer.jupyter.org/github/IBM/nodejs-in-notebooks/blob/master/data/examples/nodebook_1.ipynb) to preview the completed notebook. 
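
The replication described in the *Optional data source customization* section uses CouchDB's standard `_replicate` endpoint under the covers. The sketch below builds and prints the JSON request body; the target URL and credentials in the comment are placeholders, not real values, so substitute your own Cloudant/CouchDB instance before sending the request.

```python
# Sketch: build the JSON body for a CouchDB/Cloudant _replicate request.
import json

SOURCE = "https://56953ed8-3fba-4f7e-824e-5498c8e1d18e-bluemix.cloudant.com/cities"
body = json.dumps({"source": SOURCE, "target": "cities", "create_target": True})
print(body)

# To run the replication, POST `body` to your own instance (placeholder URL):
#   https://USER:PASSWORD@YOUR-ACCOUNT.cloudant.com/_replicate
# with the header "Content-Type: application/json" (e.g. via curl or requests).
```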
140 | 141 | # Links 142 | * [pixiedust_node](https://github.com/pixiedust/pixiedust_node) 143 | * [pixiedust](https://github.com/pixiedust/pixiedust) 144 | * [Nodebooks: Introducing Node.js Data Science Notebooks](https://medium.com/ibm-watson-data-lab/nodebooks-node-js-data-science-notebooks-aa140bea21ba) 145 | * [Nodebooks: Sharing Data Between Node.js & Python](https://medium.com/ibm-watson-data-lab/nodebooks-sharing-data-between-node-js-python-3a4acae27a02) 146 | * [Sharing Variables Between Python & Node.js in Jupyter Notebooks](https://medium.com/ibm-watson-data-lab/sharing-variables-between-python-node-js-in-jupyter-notebooks-682a79d4bdd9) 147 | 148 | # Learn more 149 | * **Watson Studio**: Master the art of data science with IBM's [Watson Studio](https://www.ibm.com/cloud/watson-studio/) 150 | * **Data Analytics Code Patterns**: Enjoyed this Code Pattern? Check out our other [Data Analytics Code Patterns](https://developer.ibm.com/technologies/data-science/) 151 | * **With Watson**: Want to take your Watson app to the next level? Looking to utilize Watson Brand assets? [Join the With Watson program](https://www.ibm.com/watson/with-watson/) to leverage exclusive brand, marketing, and tech resources to amplify and accelerate your Watson embedded commercial solution. 152 | 153 | # License 154 | 155 | This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the [Developer Certificate of Origin, Version 1.1 (DCO)](https://developercertificate.org/) and the [Apache Software License, Version 2](https://www.apache.org/licenses/LICENSE-2.0.txt). 
156 | 157 | [Apache Software License (ASL) FAQ](https://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN) 158 | -------------------------------------------------------------------------------- /data/examples/nodebook_1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Nodebooks: Introducing Node.js Data Science Notebooks\n", 8 | "\n", 9 | "Notebooks are where data scientists process, analyse, and visualise data in an iterative, collaborative environment. They typically run environments for languages like Python, R, and Scala. For years, data science notebooks have served academics and research scientists as a scratchpad for writing code, refining algorithms, and sharing and proving their work. Today, it's a workflow that lends itself well to web developers experimenting with data sets in Node.js.\n", 10 | "\n", 11 | "To that end, pixiedust_node is an add-on for Jupyter notebooks that allows Node.js/JavaScript to run inside notebook cells. Not only can web developers use the same workflow for collaborating in Node.js, but they can also use the same tools to work with existing data scientists coding in Python.\n", 12 | "\n", 13 | "pixiedust_node is built on the popular PixieDust helper library. Let’s get started.\n", 14 | "\n", 15 | "> Note: Run one cell at a time or unexpected results might be observed.\n", 16 | "\n", 17 | "\n", 18 | "## Part 1: Variables, functions, and promises\n", 19 | "\n", 20 | "\n", 21 | "### Installing\n", 22 | "Install the [`pixiedust`](https://pypi.python.org/pypi/pixiedust) and [`pixiedust_node`](https://pypi.python.org/pypi/pixiedust-node) packages using `pip`, the Python package manager. 
" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 1, 28 | "metadata": {}, 29 | "outputs": [ 30 | { 31 | "name": "stdout", 32 | "output_type": "stream", 33 | "text": [ 34 | "Requirement already up-to-date: pixiedust in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages\n", 35 | "Requirement not upgraded as not directly required: lxml in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", 36 | "Requirement not upgraded as not directly required: geojson in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", 37 | "Requirement not upgraded as not directly required: colour in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", 38 | "Requirement not upgraded as not directly required: mpld3 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", 39 | "Requirement not upgraded as not directly required: astunparse in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", 40 | "Requirement not upgraded as not directly required: markdown in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust)\n", 41 | "Requirement not upgraded as not directly required: six<2.0,>=1.6.1 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust)\n", 42 | "Requirement not upgraded as not directly required: wheel<1.0,>=0.23.0 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust)\n", 43 | "Requirement already up-to-date: pixiedust_node in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages\n", 44 | "Requirement not upgraded as not directly required: pixiedust in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust_node)\n", 45 | "Requirement not upgraded as not directly required: pandas in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust_node)\n", 46 | "Requirement not upgraded as not directly required: ipython in 
/opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust_node)\n", 47 | "Requirement not upgraded as not directly required: lxml in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", 48 | "Requirement not upgraded as not directly required: geojson in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", 49 | "Requirement not upgraded as not directly required: colour in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", 50 | "Requirement not upgraded as not directly required: mpld3 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", 51 | "Requirement not upgraded as not directly required: astunparse in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", 52 | "Requirement not upgraded as not directly required: markdown in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pixiedust->pixiedust_node)\n", 53 | "Requirement not upgraded as not directly required: python-dateutil in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pandas->pixiedust_node)\n", 54 | "Requirement not upgraded as not directly required: pytz>=2011k in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pandas->pixiedust_node)\n", 55 | "Requirement not upgraded as not directly required: numpy>=1.9.0 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pandas->pixiedust_node)\n", 56 | "Requirement not upgraded as not directly required: setuptools>=18.5 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", 57 | "Requirement not upgraded as not directly required: decorator in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", 58 | "Requirement not upgraded as not directly required: pickleshare in 
/opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", 59 | "Requirement not upgraded as not directly required: simplegeneric>0.8 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", 60 | "Requirement not upgraded as not directly required: traitlets>=4.2 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", 61 | "Requirement not upgraded as not directly required: prompt_toolkit<2.0.0,>=1.0.4 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", 62 | "Requirement not upgraded as not directly required: pygments in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", 63 | "Requirement not upgraded as not directly required: pexpect in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", 64 | "Requirement not upgraded as not directly required: backports.shutil_get_terminal_size in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", 65 | "Requirement not upgraded as not directly required: pathlib2 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from ipython->pixiedust_node)\n", 66 | "Requirement not upgraded as not directly required: six<2.0,>=1.6.1 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust->pixiedust_node)\n", 67 | "Requirement not upgraded as not directly required: wheel<1.0,>=0.23.0 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from astunparse->pixiedust->pixiedust_node)\n", 68 | "Requirement not upgraded as not directly required: ipython_genutils in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from traitlets>=4.2->ipython->pixiedust_node)\n", 69 | "Requirement not upgraded as not directly required: enum34 in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from 
traitlets>=4.2->ipython->pixiedust_node)\n", 70 | "Requirement not upgraded as not directly required: wcwidth in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from prompt_toolkit<2.0.0,>=1.0.4->ipython->pixiedust_node)\n", 71 | "Requirement not upgraded as not directly required: scandir in /opt/conda/envs/DSX-Python27/lib/python2.7/site-packages (from pathlib2->ipython->pixiedust_node)\n" 72 | ] 73 | } 74 | ], 75 | "source": [ 76 | "# install or upgrade the packages\n", 77 | "# restart the kernel to pick up the latest version\n", 78 | "!pip install pixiedust --upgrade\n", 79 | "!pip install pixiedust_node --upgrade" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "### Using pixiedust_node\n", 87 | "Now we can import `pixiedust_node` into our notebook:" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 2, 93 | "metadata": {}, 94 | "outputs": [ 95 | { 96 | "name": "stdout", 97 | "output_type": "stream", 98 | "text": [ 99 | "Pixiedust database opened successfully\n" 100 | ] 101 | }, 102 | { 103 | "data": { 104 | "text/html": [ 105 | "\n", 106 | "
\n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " Pixiedust version 1.1.11\n", 111 | "
\n", 112 | " " 113 | ], 114 | "text/plain": [ 115 | "" 116 | ] 117 | }, 118 | "metadata": {}, 119 | "output_type": "display_data" 120 | }, 121 | { 122 | "data": { 123 | "text/html": [ 124 | "\n", 125 | "
\n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " Pixiedust Node.js \n", 130 | "
\n" 131 | ], 132 | "text/plain": [ 133 | "" 134 | ] 135 | }, 136 | "metadata": {}, 137 | "output_type": "display_data" 138 | }, 139 | { 140 | "name": "stdout", 141 | "output_type": "stream", 142 | "text": [ 143 | "pixiedust_node 0.2.5 started. Cells starting '%%node' may contain Node.js code.\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "import pixiedust_node" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "And then we can write JavaScript code in cells whose first line is `%%node`:" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 3, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [ 164 | "%%node\n", 165 | "// get the current date\n", 166 | "var date = new Date();" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "It’s that easy! We can have Python and Node.js in the same notebook. Cells are Python by default, but simply starting a cell with `%%node` indicates that the next lines will be JavaScript." 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "### Displaying HTML and images in notebook cells\n", 181 | "We can use the `html` function to render HTML code in a cell:" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 4, 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "data": { 191 | "text/html": [ 192 | "

Quote

\"Imagination is more important than knowledge\"\n", 193 | "Albert Einstein
" 194 | ], 195 | "text/plain": [ 196 | "" 197 | ] 198 | }, 199 | "metadata": {}, 200 | "output_type": "display_data" 201 | } 202 | ], 203 | "source": [ 204 | "%%node\n", 205 | "var str = '

Quote

\"Imagination is more important than knowledge\"\\nAlbert Einstein
';\n", 206 | "html(str)" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "If we have an image we want to render, we can do that with the `image` function:" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 5, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/html": [ 224 | "" 225 | ], 226 | "text/plain": [ 227 | "" 228 | ] 229 | }, 230 | "metadata": {}, 231 | "output_type": "display_data" 232 | } 233 | ], 234 | "source": [ 235 | "%%node\n", 236 | "var url = 'https://github.com/IBM/nodejs-in-notebooks/blob/master/notebooks/images/pixiedust_node_schematic.png?raw=true';\n", 237 | "image(url);" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "### Printing JavaScript variables\n", 245 | "\n", 246 | "Print variables using `console.log`." 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 6, 252 | "metadata": {}, 253 | "outputs": [ 254 | { 255 | "name": "stdout", 256 | "output_type": "stream", 257 | "text": [ 258 | "{ a: 1, b: 'two', c: true }\n" 259 | ] 260 | } 261 | ], 262 | "source": [ 263 | "%%node\n", 264 | "var x = { a:1, b:'two', c: true };\n", 265 | "console.log(x);" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "Calling the `print` function within your JavaScript code is the same as calling `print` in your Python code." 
273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 7, 278 | "metadata": {}, 279 | "outputs": [ 280 | { 281 | "name": "stdout", 282 | "output_type": "stream", 283 | "text": [ 284 | "{\"a\": 3, \"c\": false, \"b\": \"four\"}\n" 285 | ] 286 | } 287 | ], 288 | "source": [ 289 | "%%node\n", 290 | "var y = { a:3, b:'four', c: false };\n", 291 | "print(y);" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "### Visualizing data using PixieDust\n", 299 | "You can also use PixieDust’s `display` function to render data graphically. Configuring the output as line chart, the visualization looks as follows: " 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 8, 305 | "metadata": { 306 | "pixiedust": { 307 | "displayParams": { 308 | "aggregation": "SUM", 309 | "chartsize": "99", 310 | "handlerId": "lineChart", 311 | "keyFields": "x", 312 | "rowCount": "500", 313 | "valueFields": "cos,sin" 314 | } 315 | } 316 | }, 317 | "outputs": [], 318 | "source": [ 319 | "%%node\n", 320 | "var data = [];\n", 321 | "for (var i = 0; i < 1000; i++) {\n", 322 | " var x = 2*Math.PI * i/ 360;\n", 323 | " var obj = {\n", 324 | " x: x,\n", 325 | " i: i,\n", 326 | " sin: Math.sin(x),\n", 327 | " cos: Math.cos(x),\n", 328 | " tan: Math.tan(x)\n", 329 | " };\n", 330 | " data.push(obj);\n", 331 | "}\n", 332 | "// render data \n", 333 | "display(data);" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "\n", 341 | "\n", 342 | "PixieDust presents visualisations of DataFrames using Matplotlib, Bokeh, Brunel, d3, Google Maps and, MapBox. 
No code is required on your part because PixieDust presents simple pull-down menus and a friendly point-and-click interface, allowing you to configure how the data is presented:\n", 343 | "\n", 344 | "" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "### Adding npm modules\n", 352 | "There are thousands of libraries and tools in the npm repository, Node.js’s package manager. It’s essential that we can install npm libraries and use them in our notebook code.\n", 353 | "Let’s say we want to make some HTTP calls to an external API service. We could deal with Node.js’s low-level HTTP library, or an easier option would be to use the ubiquitous `request` npm module.\n", 354 | "Once we have pixiedust_node set up, installing an npm module is as simple as running `npm.install` in a Python cell:" 355 | ] 356 | }, 357 | { 358 | "cell_type": "code", 359 | "execution_count": 9, 360 | "metadata": {}, 361 | "outputs": [ 362 | { 363 | "name": "stdout", 364 | "output_type": "stream", 365 | "text": [ 366 | "/opt/conda/envs/DSX-Python27/bin/npm install -s request\n", 367 | "+ request@2.87.0\n", 368 | "updated 1 package in 0.856s\n" 369 | ] 370 | } 371 | ], 372 | "source": [ 373 | "npm.install('request');" 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "Once installed, you may `require` the module in your JavaScript code:" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 10, 386 | "metadata": {}, 387 | "outputs": [ 388 | { 389 | "name": "stdout", 390 | "output_type": "stream", 391 | "text": [ 392 | "... ... ... ...\n", 393 | "... 
...\n", 394 | "{ iss_position: { longitude: '42.2119', latitude: '51.1124' },\n", 395 | "timestamp: 1531779053,\n", 396 | "message: 'success' }\n" 397 | ] 398 | } 399 | ], 400 | "source": [ 401 | "%%node\n", 402 | "var request = require('request');\n", 403 | "var r = {\n", 404 | " method:'GET',\n", 405 | " url: 'http://api.open-notify.org/iss-now.json',\n", 406 | " json: true\n", 407 | "};\n", 408 | "request(r, function(err, req, body) {\n", 409 | " console.log(body);\n", 410 | "});\n" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "As an HTTP request is an asynchronous action, the `request` library calls our callback function when the operation has completed. Inside that function, we can call print to render the data.\n", 418 | "We can organise our code into functions to encapsulate complexity and make it easier to reuse code. We can create a function to get the current position of the International Space Station in one notebook cell:" 419 | ] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "execution_count": 11, 424 | "metadata": {}, 425 | "outputs": [ 426 | { 427 | "name": "stdout", 428 | "output_type": "stream", 429 | "text": [ 430 | "... ..... ..... ..... ..... ... ..... ..... ....... ....... ....... ....... ....... ..... ..... 
...\n" 431 | ] 432 | } 433 | ], 434 | "source": [ 435 | "%%node\n", 436 | "var request = require('request');\n", 437 | "var getPosition = function(callback) {\n", 438 | " var r = {\n", 439 | " method:'GET',\n", 440 | " url: 'http://api.open-notify.org/iss-now.json',\n", 441 | " json: true\n", 442 | " };\n", 443 | " request(r, function(err, req, body) {\n", 444 | " var obj = null;\n", 445 | " if (!err) {\n", 446 | " obj = body.iss_position\n", 447 | " obj.latitude = parseFloat(obj.latitude);\n", 448 | " obj.longitude = parseFloat(obj.longitude);\n", 449 | " obj.time = new Date().getTime(); \n", 450 | " }\n", 451 | " callback(err, obj);\n", 452 | " });\n", 453 | "};" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "And use it in another cell:" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": 12, 466 | "metadata": {}, 467 | "outputs": [ 468 | { 469 | "name": "stdout", 470 | "output_type": "stream", 471 | "text": [ 472 | "... ...\n", 473 | "{ longitude: 42.9493, latitude: 51.1819, time: 1531779061073 }\n" 474 | ] 475 | } 476 | ], 477 | "source": [ 478 | "%%node\n", 479 | "getPosition(function(err, data) {\n", 480 | " console.log(data);\n", 481 | "});" 482 | ] 483 | }, 484 | { 485 | "cell_type": "markdown", 486 | "metadata": {}, 487 | "source": [ 488 | "### Promises\n", 489 | "If you prefer to work with JavaScript Promises when writing asynchronous code, then that’s okay too. Let’s rewrite our `getPosition` function to return a Promise. 
First we're going to install the `request-promise` module from npm:" 490 | ] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "execution_count": 13, 495 | "metadata": {}, 496 | "outputs": [ 497 | { 498 | "name": "stdout", 499 | "output_type": "stream", 500 | "text": [ 501 | "/opt/conda/envs/DSX-Python27/bin/npm install -s request request-promise\n", 502 | "+ request@2.87.0\n", 503 | "+ request-promise@4.2.2\n", 504 | "updated 2 packages in 0.912s\n" 505 | ] 506 | } 507 | ], 508 | "source": [ 509 | "npm.install( ('request', 'request-promise') )" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "Notice how you can install multiple modules in a single call. Just pass in a Python `list` or `tuple`.\n", 517 | "Then we can refactor our function a little:" 518 | ] 519 | }, 520 | { 521 | "cell_type": "code", 522 | "execution_count": 14, 523 | "metadata": {}, 524 | "outputs": [ 525 | { 526 | "name": "stdout", 527 | "output_type": "stream", 528 | "text": [ 529 | "... ..... ..... ..... ..... ... ..... ..... ..... ..... ..... ..... ..... 
...\n" 530 | ] 531 | } 532 | ], 533 | "source": [ 534 | "%%node\n", 535 | "var request = require('request-promise');\n", 536 | "var getPosition = function(callback) {\n", 537 | " var r = {\n", 538 | " method:'GET',\n", 539 | " url: 'http://api.open-notify.org/iss-now.json',\n", 540 | " json: true\n", 541 | " };\n", 542 | " return request(r).then(function(body) {\n", 543 | " var obj = null;\n", 544 | " obj = body.iss_position;\n", 545 | " obj.latitude = parseFloat(obj.latitude);\n", 546 | " obj.longitude = parseFloat(obj.longitude);\n", 547 | " obj.time = new Date().getTime(); \n", 548 | " return obj;\n", 549 | " });\n", 550 | "};" 551 | ] 552 | }, 553 | { 554 | "cell_type": "markdown", 555 | "metadata": {}, 556 | "source": [ 557 | "And call it in the Promises style:" 558 | ] 559 | }, 560 | { 561 | "cell_type": "code", 562 | "execution_count": 15, 563 | "metadata": {}, 564 | "outputs": [ 565 | { 566 | "name": "stdout", 567 | "output_type": "stream", 568 | "text": [ 569 | "... ... ... ...\n", 570 | "{ longitude: 44.0843, latitude: 51.2787, time: 1531779072259 }\n" 571 | ] 572 | } 573 | ], 574 | "source": [ 575 | "%%node\n", 576 | "getPosition().then(function(data) {\n", 577 | " console.log(data);\n", 578 | "}).catch(function(err) {\n", 579 | " console.error(err); \n", 580 | "});" 581 | ] 582 | }, 583 | { 584 | "cell_type": "markdown", 585 | "metadata": {}, 586 | "source": [ 587 | "Or call it in a more compact form:" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": 16, 593 | "metadata": {}, 594 | "outputs": [ 595 | { 596 | "name": "stdout", 597 | "output_type": "stream", 598 | "text": [ 599 | "{ longitude: 44.6288, latitude: 51.3208, time: 1531779077984 }\n" 600 | ] 601 | } 602 | ], 603 | "source": [ 604 | "%%node\n", 605 | "getPosition().then(console.log).catch(console.error);" 606 | ] 607 | }, 608 | { 609 | "cell_type": "markdown", 610 | "metadata": {}, 611 | "source": [ 612 | "In the next part of this notebook we'll illustrate how you 
can access local and remote data sources from within the notebook." 613 | ] 614 | }, 615 | { 616 | "cell_type": "markdown", 617 | "metadata": {}, 618 | "source": [ 619 | "***\n", 620 | "# Part 2: Working with data sources\n", 621 | "\n", 622 | "You can access any data source using your favorite public or home-grown packages. In the second part of this notebook you'll learn how to retrieve data from an Apache CouchDB (or Cloudant) database and visualize it using PixieDust or third-party libraries.\n", 623 | "\n", 624 | "## Accessing Cloudant data sources\n", 625 | "\n", 626 | "\n", 627 | "To access data stored in an Apache CouchDB or Cloudant database, we can use the [`cloudant-quickstart`](https://www.npmjs.com/package/cloudant-quickstart) npm module:" 628 | ] 629 | }, 630 | { 631 | "cell_type": "code", 632 | "execution_count": 17, 633 | "metadata": {}, 634 | "outputs": [ 635 | { 636 | "name": "stdout", 637 | "output_type": "stream", 638 | "text": [ 639 | "/opt/conda/envs/DSX-Python27/bin/npm install -s cloudant-quickstart\n", 640 | "+ cloudant-quickstart@1.25.5\n", 641 | "updated 1 package in 0.983s\n" 642 | ] 643 | } 644 | ], 645 | "source": [ 646 | "npm.install('cloudant-quickstart')" 647 | ] 648 | }, 649 | { 650 | "cell_type": "markdown", 651 | "metadata": {}, 652 | "source": [ 653 | "With our Cloudant URL, we can start exploring the data in Node.js. 
First we make a connection to the remote Cloudant database:" 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "execution_count": 18, 659 | "metadata": {}, 660 | "outputs": [], 661 | "source": [ 662 | "%%node\n", 663 | "// connect to Cloudant using cloudant-quickstart\n", 664 | "const cqs = require('cloudant-quickstart');\n", 665 | "const cities = cqs('https://56953ed8-3fba-4f7e-824e-5498c8e1d18e-bluemix.cloudant.com/cities');" 666 | ] 667 | }, 668 | { 669 | "cell_type": "markdown", 670 | "metadata": {}, 671 | "source": [ 672 | "> For this code pattern example a remote database has been pre-configured to accept anonymous connection requests. If you wish to explore the `cloudant-quickstart` library beyond what is covered in this nodebook, we recommend you create your own replica and replace above URL with your own, e.g. `https://myid:mypassword@mycloudanthost/mydatabase`.\n", 673 | "\n", 674 | "Now we have an object named `cities` that we can use to access the database. \n", 675 | "\n", 676 | "### Exploring the data using Node.js in a notebook \n", 677 | "\n", 678 | "We can retrieve all documents using `all`." 
679 | ] 680 | }, 681 | { 682 | "cell_type": "code", 683 | "execution_count": 19, 684 | "metadata": {}, 685 | "outputs": [ 686 | { 687 | "name": "stdout", 688 | "output_type": "stream", 689 | "text": [ 690 | "[ { _id: '1000501',\n", 691 | "name: 'Grahamstown',\n", 692 | "latitude: -33.30422,\n", 693 | "longitude: 26.53276,\n", 694 | "country: 'ZA',\n", 695 | "population: 91548,\n", 696 | "timezone: 'Africa/Johannesburg' },\n", 697 | "{ _id: '1000543',\n", 698 | "name: 'Graaff-Reinet',\n", 699 | "latitude: -32.25215,\n", 700 | "longitude: 24.53075,\n", 701 | "country: 'ZA',\n", 702 | "population: 62896,\n", 703 | "timezone: 'Africa/Johannesburg' },\n", 704 | "{ _id: '100077',\n", 705 | "name: 'Abū Ghurayb',\n", 706 | "latitude: 33.30563,\n", 707 | "longitude: 44.18477,\n", 708 | "country: 'IQ',\n", 709 | "population: 900000,\n", 710 | "timezone: 'Asia/Baghdad' } ]\n" 711 | ] 712 | } 713 | ], 714 | "source": [ 715 | "%%node\n", 716 | "// If no limit is specified, 100 documents will be returned\n", 717 | "cities.all({limit:3}).then(console.log).catch(console.error)" 718 | ] 719 | }, 720 | { 721 | "cell_type": "markdown", 722 | "metadata": {}, 723 | "source": [ 724 | "Specifying the optional `limit` and `skip` parameters we can paginate through the document list:\n", 725 | "\n", 726 | "```\n", 727 | "cities.all({limit:10}).then(console.log).catch(console.error)\n", 728 | "cities.all({skip:10, limit:10}).then(console.log).catch(console.error)\n", 729 | "```" 730 | ] 731 | }, 732 | { 733 | "cell_type": "markdown", 734 | "metadata": {}, 735 | "source": [ 736 | "If we know the IDs of documents, we can retrieve them singly:" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": 20, 742 | "metadata": {}, 743 | "outputs": [ 744 | { 745 | "name": "stdout", 746 | "output_type": "stream", 747 | "text": [ 748 | "{ _id: '2636749',\n", 749 | "name: 'Stowmarket',\n", 750 | "latitude: 52.18893,\n", 751 | "longitude: 0.99774,\n", 752 | "country: 'GB',\n", 753 | 
"population: 15394,\n", 754 | "timezone: 'Europe/London' }\n" 755 | ] 756 | } 757 | ], 758 | "source": [ 759 | "%%node\n", 760 | "cities.get('2636749').then(console.log).catch(console.error);" 761 | ] 762 | }, 763 | { 764 | "cell_type": "markdown", 765 | "metadata": {}, 766 | "source": [ 767 | "Or in bulk:" 768 | ] 769 | }, 770 | { 771 | "cell_type": "code", 772 | "execution_count": 21, 773 | "metadata": { 774 | "scrolled": true 775 | }, 776 | "outputs": [ 777 | { 778 | "name": "stdout", 779 | "output_type": "stream", 780 | "text": [ 781 | "[ { _id: '5913490',\n", 782 | "name: 'Calgary',\n", 783 | "latitude: 51.05011,\n", 784 | "longitude: -114.08529,\n", 785 | "country: 'CA',\n", 786 | "population: 1019942,\n", 787 | "timezone: 'America/Edmonton' },\n", 788 | "{ _id: '4140963',\n", 789 | "name: 'Washington, D.C.',\n", 790 | "latitude: 38.89511,\n", 791 | "longitude: -77.03637,\n", 792 | "country: 'US',\n", 793 | "population: 601723,\n", 794 | "timezone: 'America/New_York' },\n", 795 | "{ _id: '3520274',\n", 796 | "name: 'Río Blanco',\n", 797 | "latitude: 18.83036,\n", 798 | "longitude: -97.156,\n", 799 | "country: 'MX',\n", 800 | "population: 39543,\n", 801 | "timezone: 'America/Mexico_City' } ]\n" 802 | ] 803 | } 804 | ], 805 | "source": [ 806 | "%%node\n", 807 | "cities.get(['5913490', '4140963','3520274']).then(console.log).catch(console.error);" 808 | ] 809 | }, 810 | { 811 | "cell_type": "markdown", 812 | "metadata": {}, 813 | "source": [ 814 | "Instead of just calling `print` to output the JSON, we can bring PixieDust's `display` function to bear by passing it an array of data to visualize. 
Using mapbox as renderer and satelite as basemap, we can display the location and population of the selected cities: " 815 | ] 816 | }, 817 | { 818 | "cell_type": "code", 819 | "execution_count": 22, 820 | "metadata": { 821 | "pixiedust": { 822 | "displayParams": { 823 | "basemap": "satellite-v9", 824 | "chartsize": "76", 825 | "coloropacity": "53", 826 | "colorrampname": "Orange to Purple", 827 | "handlerId": "mapView", 828 | "keyFields": "latitude,longitude", 829 | "kind": "simple-cluster", 830 | "legend": "false", 831 | "mapboxtoken": "pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4M29iazA2Z2gycXA4N2pmbDZmangifQ.-g_vE53SD2WrJ6tFX7QHmA", 832 | "rendererId": "mapbox", 833 | "rowCount": "500", 834 | "valueFields": "population,name" 835 | } 836 | }, 837 | "scrolled": false 838 | }, 839 | "outputs": [], 840 | "source": [ 841 | "%%node\n", 842 | "cities.get(['5913490', '4140963','3520274']).then(display).catch(console.error);" 843 | ] 844 | }, 845 | { 846 | "cell_type": "markdown", 847 | "metadata": {}, 848 | "source": [ 849 | "\n", 850 | "We can also query a subset of the data using the `query` function, passing it a [Cloudant Query](https://cloud.ibm.com/docs/services/Cloudant/api/cloudant_query.html#query) statement. 
Using mapbox as renderer, the customizable output looks as follows:" 851 | ] 852 | }, 853 | { 854 | "cell_type": "code", 855 | "execution_count": 23, 856 | "metadata": { 857 | "pixiedust": { 858 | "displayParams": { 859 | "basemap": "outdoors-v9", 860 | "colorrampname": "Yellow to Blue", 861 | "handlerId": "mapView", 862 | "keyFields": "latitude,longitude", 863 | "mapboxtoken": "pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4M29iazA2Z2gycXA4N2pmbDZmangifQ.-g_vE53SD2WrJ6tFX7QHmA", 864 | "rowCount": "500", 865 | "valueFields": "name,population" 866 | } 867 | }, 868 | "scrolled": false 869 | }, 870 | "outputs": [], 871 | "source": [ 872 | "%%node\n", 873 | "// fetch cities in UK above latitude 54 degrees north\n", 874 | "cities.query({country:'GB', latitude: { \"$gt\": 54}}).then(display).catch(console.error);" 875 | ] 876 | }, 877 | { 878 | "cell_type": "markdown", 879 | "metadata": {}, 880 | "source": [ 881 | "\n", 882 | "\n", 883 | "### Aggregating data\n", 884 | "The `cloudant-quickstart` library also allows aggregations (sum, count, stats) to be performed in the Cloudant database.\n", 885 | "Let’s calculate the sum of the population field:" 886 | ] 887 | }, 888 | { 889 | "cell_type": "code", 890 | "execution_count": 24, 891 | "metadata": {}, 892 | "outputs": [ 893 | { 894 | "name": "stdout", 895 | "output_type": "stream", 896 | "text": [ 897 | "2694222973\n", 898 | "\n" 899 | ] 900 | } 901 | ], 902 | "source": [ 903 | "%%node\n", 904 | "cities.sum('population').then(console.log).catch(console.error);" 905 | ] 906 | }, 907 | { 908 | "cell_type": "markdown", 909 | "metadata": {}, 910 | "source": [ 911 | "Or compute the sum of the `population`, grouped by the `country` field, displaying 10 countries with the largest population:" 912 | ] 913 | }, 914 | { 915 | "cell_type": "code", 916 | "execution_count": 25, 917 | "metadata": { 918 | "pixiedust": { 919 | "displayParams": { 920 | "aggregation": "SUM", 921 | "handlerId": "barChart", 922 | "keyFields": "name", 923 | "mapboxtoken": 
"pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4M29iazA2Z2gycXA4N2pmbDZmangifQ.-g_vE53SD2WrJ6tFX7QHmA", 924 | "orientation": "vertical", 925 | "rendererId": "google", 926 | "rowCount": "100", 927 | "sortby": "Values DESC", 928 | "valueFields": "population" 929 | } 930 | }, 931 | "scrolled": false 932 | }, 933 | "outputs": [ 934 | { 935 | "name": "stdout", 936 | "output_type": "stream", 937 | "text": [ 938 | "... ... ... ..... ..... ... ... ..... ..... ... ... ..... ..... ...\n", 939 | "CN 389,487,480\n", 940 | "IN 269,553,896\n", 941 | "US 190,515,768\n", 942 | "BR 125,426,547\n", 943 | "RU 108,885,695\n", 944 | "JP 99,000,238\n", 945 | "MX 80,474,387\n", 946 | "ID 63,161,801\n", 947 | "DE 58,884,999\n", 948 | "TR 55,733,719\n" 949 | ] 950 | } 951 | ], 952 | "source": [ 953 | "%%node\n", 954 | "\n", 955 | "// helper function\n", 956 | "function top10(data) {\n", 957 | " // convert input data structure to array\n", 958 | " var pop_array = [];\n", 959 | " Object.keys(data).forEach(function(n,k) {\n", 960 | " pop_array.push({name: n, population: data[n]});\n", 961 | " });\n", 962 | " // sort array by population in descending order\n", 963 | " pop_array.sort(function(a,b) {\n", 964 | " return b.population - a.population; \n", 965 | " });\n", 966 | " // display top 10 entries\n", 967 | " pop_array.slice(0,10).forEach(function(e) {\n", 968 | " console.log(e.name + ' ' + e.population.toLocaleString()); \n", 969 | " });\n", 970 | "}\n", 971 | "\n", 972 | "// fetch aggregated data and invoke helper routine\n", 973 | "cities.sum('population','country').then(top10).catch(console.error);" 974 | ] 975 | }, 976 | { 977 | "cell_type": "markdown", 978 | "metadata": {}, 979 | "source": [ 980 | "The `cloudant-quickstart` package is just one of several Node.js libraries that you can use to access Apache CouchDB or Cloudant. Follow [this link](https://medium.com/ibm-watson-data-lab/choosing-a-cloudant-library-d14c06f3d714) to learn more about your options. 
" 981 | ] 982 | }, 983 | { 984 | "cell_type": "markdown", 985 | "metadata": {}, 986 | "source": [ 987 | "### Visualizing data using custom charts\n", 988 | "\n", 989 | "If you prefer, you can also use third-party Node.js charting packages to visualize your data, such as [`quiche`](https://www.npmjs.com/package/quiche)." 990 | ] 991 | }, 992 | { 993 | "cell_type": "code", 994 | "execution_count": 26, 995 | "metadata": {}, 996 | "outputs": [ 997 | { 998 | "name": "stdout", 999 | "output_type": "stream", 1000 | "text": [ 1001 | "/opt/conda/envs/DSX-Python27/bin/npm install -s quiche\n", 1002 | "+ quiche@0.3.0\n", 1003 | "updated 1 package in 0.957s\n" 1004 | ] 1005 | } 1006 | ], 1007 | "source": [ 1008 | "npm.install('quiche');" 1009 | ] 1010 | }, 1011 | { 1012 | "cell_type": "code", 1013 | "execution_count": 27, 1014 | "metadata": {}, 1015 | "outputs": [ 1016 | { 1017 | "name": "stdout", 1018 | "output_type": "stream", 1019 | "text": [ 1020 | "... ... ... ... ... ... ... ... ...\n" 1021 | ] 1022 | }, 1023 | { 1024 | "data": { 1025 | "text/html": [ 1026 | "" 1027 | ], 1028 | "text/plain": [ 1029 | "" 1030 | ] 1031 | }, 1032 | "metadata": {}, 1033 | "output_type": "display_data" 1034 | } 1035 | ], 1036 | "source": [ 1037 | "%%node\n", 1038 | "var Quiche = require('quiche');\n", 1039 | "var pie = new Quiche('pie');\n", 1040 | "\n", 1041 | "// fetch cities in UK\n", 1042 | "cities.query({name: 'Cambridge'}).then(function(data) {\n", 1043 | "\n", 1044 | " var colors = ['ff00ff','0055ff', 'ff0000', 'ffff00', '00ff00','0000ff'];\n", 1045 | " for(i in data) {\n", 1046 | " var city = data[i];\n", 1047 | " pie.addData(city.population, city.name + '(' + city.country +')', colors[i]);\n", 1048 | " }\n", 1049 | " var imageUrl = pie.getUrl(true);\n", 1050 | " image(imageUrl); \n", 1051 | "});" 1052 | ] 1053 | }, 1054 | { 1055 | "cell_type": "markdown", 1056 | "metadata": {}, 1057 | "source": [ 1058 | "***\n", 1059 | "# Part 3: Sharing data between Python and Node.js cells\n", 1060 
| "\n", 1061 | "You can share variables between Python and Node.js cells. Why woud you want to do that? Read on.\n", 1062 | "\n", 1063 | "The Node.js library ecosystem is extensive. Perhaps you need to fetch data from a database and prefer the syntax of a particular Node.js npm module. You can use Node.js to fetch the data, move it to the Python environment, and convert it into a Pandas or Spark DataFrame for aggregation, analysis and visualisation.\n", 1064 | "\n", 1065 | "PixieDust and pixiedust_node give you the flexibility to mix and match Python and Node.js code to suit the workflow you are building and the skill sets you have in your team.\n", 1066 | "\n", 1067 | "Mixing Node.js and Python code in the same notebook is a great way to integrate the work of your software development and data science teams to produce a collaborative report or dashboard.\n", 1068 | "\n", 1069 | "\n", 1070 | "### Sharing data\n", 1071 | "\n", 1072 | "Define variables in a Python cell." 1073 | ] 1074 | }, 1075 | { 1076 | "cell_type": "code", 1077 | "execution_count": 28, 1078 | "metadata": {}, 1079 | "outputs": [], 1080 | "source": [ 1081 | "# define a couple variables in Python\n", 1082 | "a = 'Hello from Python!'\n", 1083 | "b = 2\n", 1084 | "c = False\n", 1085 | "d = {'x':1, 'y':2}\n", 1086 | "e = 3.142\n", 1087 | "f = [{'a':1}, {'a':2}, {'a':3}]" 1088 | ] 1089 | }, 1090 | { 1091 | "cell_type": "markdown", 1092 | "metadata": {}, 1093 | "source": [ 1094 | "Access or modify their values in Node.js cells." 1095 | ] 1096 | }, 1097 | { 1098 | "cell_type": "code", 1099 | "execution_count": 29, 1100 | "metadata": {}, 1101 | "outputs": [ 1102 | { 1103 | "name": "stdout", 1104 | "output_type": "stream", 1105 | "text": [ 1106 | "Hello from Python! 
2 false { y: 2, x: 1 } 3.142 [ { a: 1 }, { a: 2 }, { a: 3 } ]\n" 1107 | ] 1108 | } 1109 | ], 1110 | "source": [ 1111 | "%%node\n", 1112 | "// print variable values\n", 1113 | "console.log(a, b, c, d, e, f);\n", 1114 | "\n", 1115 | "// change variable value \n", 1116 | "a = 'Hello from Node.js!';\n", 1117 | "\n", 1118 | "// define a new variable\n", 1119 | "var g = 'Yes, it works both ways.';" 1120 | ] 1121 | }, 1122 | { 1123 | "cell_type": "markdown", 1124 | "metadata": {}, 1125 | "source": [ 1126 | "Inspect the manipulated data." 1127 | ] 1128 | }, 1129 | { 1130 | "cell_type": "code", 1131 | "execution_count": 30, 1132 | "metadata": {}, 1133 | "outputs": [ 1134 | { 1135 | "name": "stdout", 1136 | "output_type": "stream", 1137 | "text": [ 1138 | "Hello from Node.js! Yes, it works both ways.\n" 1139 | ] 1140 | } 1141 | ], 1142 | "source": [ 1143 | "# display modified variable and the new variable\n", 1144 | "print('{} {}'.format(a,g))" 1145 | ] 1146 | }, 1147 | { 1148 | "cell_type": "markdown", 1149 | "metadata": {}, 1150 | "source": [ 1151 | "**Note:** PixieDust natively supports [data sharing between Python and Scala](https://ibm-watson-data-lab.github.io/pixiedust/scalabridge.html), extending the loop for some data types:\n", 1152 | " ```\n", 1153 | " %%scala\n", 1154 | " println(a,b,c,d,e,f,g)\n", 1155 | " \n", 1156 | " (Hello from Node.js!,2,null,null,null,null,Yes, it works both ways.)\n", 1157 | " ```" 1158 | ] 1159 | }, 1160 | { 1161 | "cell_type": "markdown", 1162 | "metadata": {}, 1163 | "source": [ 1164 | "### Sharing data from an asynchronous callback\n", 1165 | "\n", 1166 | "If you wish to transfer data from Node.js to Python from an asynchronous callback, make sure you write the data to a global variable. \n", 1167 | "\n", 1168 | "Load a CSV file from a GitHub repository." 
1169 | ] 1170 | }, 1171 | { 1172 | "cell_type": "code", 1173 | "execution_count": 31, 1174 | "metadata": {}, 1175 | "outputs": [ 1176 | { 1177 | "name": "stdout", 1178 | "output_type": "stream", 1179 | "text": [ 1180 | "... ... ...\n", 1181 | "Fetched sample data from GitHub.\n" 1182 | ] 1183 | } 1184 | ], 1185 | "source": [ 1186 | "%%node\n", 1187 | "\n", 1188 | "// global variable\n", 1189 | "var sample_csv_data = '';\n", 1190 | "\n", 1191 | "// load csv file from GitHub and store data in the global variable\n", 1192 | "request.get('https://github.com/ibm-watson-data-lab/open-data/raw/master/cars/cars.csv').then(function(data) {\n", 1193 | " sample_csv_data = data;\n", 1194 | " console.log('Fetched sample data from GitHub.');\n", 1195 | "});" 1196 | ] 1197 | }, 1198 | { 1199 | "cell_type": "markdown", 1200 | "metadata": {}, 1201 | "source": [ 1202 | "Create a Pandas DataFrame from the downloaded data." 1203 | ] 1204 | }, 1205 | { 1206 | "cell_type": "code", 1207 | "execution_count": 32, 1208 | "metadata": { 1209 | "pixiedust": { 1210 | "displayParams": {} 1211 | } 1212 | }, 1213 | "outputs": [ 1214 | { 1215 | "data": { 1216 | "text/html": [ 1217 | "
\n", 1218 | "\n", 1231 | "\n", 1232 | " \n", 1233 | " \n", 1234 | " \n", 1235 | " \n", 1236 | " \n", 1237 | " \n", 1238 | " \n", 1239 | " \n", 1240 | " \n", 1241 | " \n", 1242 | " \n", 1243 | " \n", 1244 | " \n", 1245 | " \n", 1246 | " \n", 1247 | " \n", 1248 | " \n", 1249 | " \n", 1250 | " \n", 1251 | " \n", 1252 | " \n", 1253 | " \n", 1254 | " \n", 1255 | " \n", 1256 | " \n", 1257 | " \n", 1258 | " \n", 1259 | " \n", 1260 | " \n", 1261 | " \n", 1262 | " \n", 1263 | " \n", 1264 | " \n", 1265 | " \n", 1266 | " \n", 1267 | " \n", 1268 | " \n", 1269 | " \n", 1270 | " \n", 1271 | " \n", 1272 | " \n", 1273 | " \n", 1274 | " \n", 1275 | " \n", 1276 | " \n", 1277 | " \n", 1278 | " \n", 1279 | " \n", 1280 | " \n", 1281 | " \n", 1282 | " \n", 1283 | " \n", 1284 | " \n", 1285 | " \n", 1286 | " \n", 1287 | " \n", 1288 | " \n", 1289 | " \n", 1290 | " \n", 1291 | " \n", 1292 | " \n", 1293 | " \n", 1294 | " \n", 1295 | " \n", 1296 | " \n", 1297 | " \n", 1298 | " \n", 1299 | " \n", 1300 | " \n", 1301 | " \n", 1302 | " \n", 1303 | " \n", 1304 | " \n", 1305 | " \n", 1306 | " \n", 1307 | " \n", 1308 | "
mpgcylindersenginehorsepowerweightaccelerationyearoriginname
018.08307.0130350412.070Americanchevrolet chevelle malibu
115.08350.0165369311.570Americanbuick skylark 320
218.08318.0150343611.070Americanplymouth satellite
316.08304.0150343312.070Americanamc rebel sst
417.08302.0140344910.570Americanford torino
\n", 1309 | "
" 1310 | ], 1311 | "text/plain": [ 1312 | " mpg cylinders engine horsepower weight acceleration year origin \\\n", 1313 | "0 18.0 8 307.0 130 3504 12.0 70 American \n", 1314 | "1 15.0 8 350.0 165 3693 11.5 70 American \n", 1315 | "2 18.0 8 318.0 150 3436 11.0 70 American \n", 1316 | "3 16.0 8 304.0 150 3433 12.0 70 American \n", 1317 | "4 17.0 8 302.0 140 3449 10.5 70 American \n", 1318 | "\n", 1319 | " name \n", 1320 | "0 chevrolet chevelle malibu \n", 1321 | "1 buick skylark 320 \n", 1322 | "2 plymouth satellite \n", 1323 | "3 amc rebel sst \n", 1324 | "4 ford torino " 1325 | ] 1326 | }, 1327 | "execution_count": 33, 1328 | "metadata": {}, 1329 | "output_type": "execute_result" 1330 | } 1331 | ], 1332 | "source": [ 1333 | "import pandas as pd\n", 1334 | "import io\n", 1335 | "# create DataFrame from shared csv data\n", 1336 | "pandas_df = pd.read_csv(io.StringIO(sample_csv_data))\n", 1337 | "# display first five rows\n", 1338 | "pandas_df.head(5)" 1339 | ] 1340 | }, 1341 | { 1342 | "cell_type": "markdown", 1343 | "metadata": {}, 1344 | "source": [ 1345 | "**Note**: Above example is for illustrative purposes only. A much easier solution is to use [PixieDust's sampleData method](https://ibm-watson-data-lab.github.io/pixiedust/loaddata.html#load-a-csv-using-its-url) if you want to create a DataFrame from a URL. 
" 1346 | ] 1347 | }, 1348 | { 1349 | "cell_type": "markdown", 1350 | "metadata": {}, 1351 | "source": [ 1352 | "#### References:\n", 1353 | " * [Nodebooks: Introducing Node.js Data Science Notebooks](https://medium.com/ibm-watson-data-lab/nodebooks-node-js-data-science-notebooks-aa140bea21ba)\n", 1354 | " * [Nodebooks: Sharing Data Between Node.js & Python](https://medium.com/ibm-watson-data-lab/nodebooks-sharing-data-between-node-js-python-3a4acae27a02)\n", 1355 | " * [Sharing Variables Between Python & Node.js in Jupyter Notebooks](https://medium.com/ibm-watson-data-lab/sharing-variables-between-python-node-js-in-jupyter-notebooks-682a79d4bdd9)" 1356 | ] 1357 | } 1358 | ], 1359 | "metadata": { 1360 | "kernelspec": { 1361 | "display_name": "Python 2.7", 1362 | "language": "python", 1363 | "name": "python2" 1364 | }, 1365 | "language_info": { 1366 | "codemirror_mode": { 1367 | "name": "ipython", 1368 | "version": 2 1369 | }, 1370 | "file_extension": ".py", 1371 | "mimetype": "text/x-python", 1372 | "name": "python", 1373 | "nbconvert_exporter": "python", 1374 | "pygments_lexer": "ipython2", 1375 | "version": "2.7.15" 1376 | } 1377 | }, 1378 | "nbformat": 4, 1379 | "nbformat_minor": 2 1380 | } 1381 | -------------------------------------------------------------------------------- /doc/source/images/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/nodejs-in-notebooks/aaae516889eebb7cbb4db57bdc4963b6a2ad9325/doc/source/images/architecture.png -------------------------------------------------------------------------------- /doc/source/images/new_custom_environment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/nodejs-in-notebooks/aaae516889eebb7cbb4db57bdc4963b6a2ad9325/doc/source/images/new_custom_environment.png -------------------------------------------------------------------------------- 
/doc/source/images/new_notebook_custom_environment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/nodejs-in-notebooks/aaae516889eebb7cbb4db57bdc4963b6a2ad9325/doc/source/images/new_notebook_custom_environment.png -------------------------------------------------------------------------------- /doc/source/images/notebook_preview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/nodejs-in-notebooks/aaae516889eebb7cbb4db57bdc4963b6a2ad9325/doc/source/images/notebook_preview.png -------------------------------------------------------------------------------- /notebooks/images/display_sin_cos.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/nodejs-in-notebooks/aaae516889eebb7cbb4db57bdc4963b6a2ad9325/notebooks/images/display_sin_cos.png -------------------------------------------------------------------------------- /notebooks/images/mapbox_americas.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/nodejs-in-notebooks/aaae516889eebb7cbb4db57bdc4963b6a2ad9325/notebooks/images/mapbox_americas.png -------------------------------------------------------------------------------- /notebooks/images/mapbox_uk.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/nodejs-in-notebooks/aaae516889eebb7cbb4db57bdc4963b6a2ad9325/notebooks/images/mapbox_uk.png -------------------------------------------------------------------------------- /notebooks/images/pd_chart_types.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/nodejs-in-notebooks/aaae516889eebb7cbb4db57bdc4963b6a2ad9325/notebooks/images/pd_chart_types.png 
-------------------------------------------------------------------------------- /notebooks/images/pixiedust_node_schematic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/nodejs-in-notebooks/aaae516889eebb7cbb4db57bdc4963b6a2ad9325/notebooks/images/pixiedust_node_schematic.png -------------------------------------------------------------------------------- /notebooks/nodebook_1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Nodebooks: Introducing Node.js Data Science Notebooks\n", 8 | "\n", 9 | "Notebooks are where data scientists process, analyse, and visualise data in an iterative, collaborative environment. They typically run environments for languages like Python, R, and Scala. For years, data science notebooks have served academics and research scientists as a scratchpad for writing code, refining algorithms, and sharing and proving their work. Today, it's a workflow that lends itself well to web developers experimenting with data sets in Node.js.\n", 10 | "\n", 11 | "To that end, pixiedust_node is an add-on for Jupyter notebooks that allows Node.js/JavaScript to run inside notebook cells. Not only can web developers use the same workflow for collaborating in Node.js, but they can also use the same tools to work with existing data scientists coding in Python.\n", 12 | "\n", 13 | "pixiedust_node is built on the popular PixieDust helper library. 
Let’s get started.\n", 14 | "\n", 15 | "> Note: Run one cell at a time or unexpected results might be observed.\n", 16 | "\n", 17 | "\n", 18 | "## Part 1: Variables, functions, and promises\n", 19 | "\n", 20 | "\n", 21 | "### Installing\n", 22 | "Install the [`pixiedust`](https://pypi.python.org/pypi/pixiedust) and [`pixiedust_node`](https://pypi.python.org/pypi/pixiedust-node) packages using `pip`, the Python package manager. " 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": null, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "# install or upgrade the packages\n", 32 | "# restart the kernel to pick up the latest version\n", 33 | "!pip install pixiedust --upgrade\n", 34 | "!pip install pixiedust_node --upgrade" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "### Using pixiedust_node\n", 42 | "Now we can import `pixiedust_node` into our notebook:" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": null, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "import pixiedust_node" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "And then we can write JavaScript code in cells whose first line is `%%node`:" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "%%node\n", 68 | "// get the current date\n", 69 | "var date = new Date();" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "It’s that easy! We can have Python and Node.js in the same notebook. Cells are Python by default, but simply starting a cell with `%%node` indicates that the next lines will be JavaScript." 
77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "### Displaying HTML and images in notebook cells\n", 84 | "We can use the `html` function to render HTML code in a cell:" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": null, 90 | "metadata": {}, 91 | "outputs": [], 92 | "source": [ 93 | "%%node\n", 94 | "var str = '

Quote

\"Imagination is more important than knowledge\"\\nAlbert Einstein
';\n", 95 | "html(str)" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "If we have an image we want to render, we can do that with the `image` function:" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "%%node\n", 112 | "var url = 'https://github.com/IBM/nodejs-in-notebooks/blob/master/notebooks/images/pixiedust_node_schematic.png?raw=true';\n", 113 | "image(url);" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "### Printing JavaScript variables\n", 121 | "\n", 122 | "Print variables using `console.log`." 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "%%node\n", 132 | "var x = { a:1, b:'two', c: true };\n", 133 | "console.log(x);" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "Calling the `print` function within your JavaScript code is the same as calling `print` in your Python code." 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [ 149 | "%%node\n", 150 | "var y = { a:3, b:'four', c: false };\n", 151 | "print(y);" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "### Visualizing data using PixieDust\n", 159 | "You can also use PixieDust’s `display` function to render data graphically. 
Configuring the output as line chart, the visualization looks as follows: " 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": { 166 | "pixiedust": { 167 | "displayParams": { 168 | "aggregation": "SUM", 169 | "chartsize": "99", 170 | "handlerId": "lineChart", 171 | "keyFields": "x", 172 | "rowCount": "500", 173 | "valueFields": "cos,sin" 174 | } 175 | } 176 | }, 177 | "outputs": [], 178 | "source": [ 179 | "%%node\n", 180 | "var data = [];\n", 181 | "for (var i = 0; i < 1000; i++) {\n", 182 | " var x = 2*Math.PI * i/ 360;\n", 183 | " var obj = {\n", 184 | " x: x,\n", 185 | " i: i,\n", 186 | " sin: Math.sin(x),\n", 187 | " cos: Math.cos(x),\n", 188 | " tan: Math.tan(x)\n", 189 | " };\n", 190 | " data.push(obj);\n", 191 | "}\n", 192 | "// render data \n", 193 | "display(data);" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "PixieDust presents visualisations of DataFrames using Matplotlib, Bokeh, Brunel, d3, Google Maps and, MapBox. No code is required on your part because PixieDust presents simple pull-down menus and a friendly point-and-click interface, allowing you to configure how the data is presented:\n", 201 | "\n", 202 | "" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "### Adding npm modules\n", 210 | "There are thousands of libraries and tools in the npm repository, Node.js’s package manager. It’s essential that we can install npm libraries and use them in our notebook code.\n", 211 | "Let’s say we want to make some HTTP calls to an external API service. 
We could deal with Node.js’s low-level HTTP library, or an easier option would be to use the ubiquitous `request` npm module.\n", 212 | "Once we have pixiedust_node set up, installing an npm module is as simple as running `npm.install` in a Python cell:" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "npm.install('request');" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": {}, 227 | "source": [ 228 | "Once installed, you may `require` the module in your JavaScript code:" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "%%node\n", 238 | "var request = require('request');\n", 239 | "var r = {\n", 240 | " method:'GET',\n", 241 | " url: 'http://api.open-notify.org/iss-now.json',\n", 242 | " json: true\n", 243 | "};\n", 244 | "request(r, function(err, req, body) {\n", 245 | " console.log(body);\n", 246 | "});\n" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "As an HTTP request is an asynchronous action, the `request` library calls our callback function when the operation has completed. Inside that function, we can call print to render the data.\n", 254 | "We can organise our code into functions to encapsulate complexity and make it easier to reuse code. 
We can create a function to get the current position of the International Space Station in one notebook cell:" 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "%%node\n", 264 | "var request = require('request');\n", 265 | "var getPosition = function(callback) {\n", 266 | " var r = {\n", 267 | " method:'GET',\n", 268 | " url: 'http://api.open-notify.org/iss-now.json',\n", 269 | " json: true\n", 270 | " };\n", 271 | " request(r, function(err, req, body) {\n", 272 | " var obj = null;\n", 273 | " if (!err) {\n", 274 | " obj = body.iss_position\n", 275 | " obj.latitude = parseFloat(obj.latitude);\n", 276 | " obj.longitude = parseFloat(obj.longitude);\n", 277 | " obj.time = new Date().getTime(); \n", 278 | " }\n", 279 | " callback(err, obj);\n", 280 | " });\n", 281 | "};" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "And use it in another cell:" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": {}, 295 | "outputs": [], 296 | "source": [ 297 | "%%node\n", 298 | "getPosition(function(err, data) {\n", 299 | " console.log(data);\n", 300 | "});" 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": {}, 306 | "source": [ 307 | "### Promises\n", 308 | "If you prefer to work with JavaScript Promises when writing asynchronous code, then that’s okay too. Let’s rewrite our `getPosition` function to return a Promise. First we're going to install the `request-promise` module from npm:" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "npm.install( ('request', 'request-promise') )" 318 | ] 319 | }, 320 | { 321 | "cell_type": "markdown", 322 | "metadata": {}, 323 | "source": [ 324 | "Notice how you can install multiple modules in a single call. 
Just pass in a Python `list` or `tuple`.\n", 325 | "Then we can refactor our function a little:" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": null, 331 | "metadata": {}, 332 | "outputs": [], 333 | "source": [ 334 | "%%node\n", 335 | "var request = require('request-promise');\n", 336 | "var getPosition = function(callback) {\n", 337 | " var r = {\n", 338 | " method:'GET',\n", 339 | " url: 'http://api.open-notify.org/iss-now.json',\n", 340 | " json: true\n", 341 | " };\n", 342 | " return request(r).then(function(body) {\n", 343 | " var obj = null;\n", 344 | " obj = body.iss_position;\n", 345 | " obj.latitude = parseFloat(obj.latitude);\n", 346 | " obj.longitude = parseFloat(obj.longitude);\n", 347 | " obj.time = new Date().getTime(); \n", 348 | " return obj;\n", 349 | " });\n", 350 | "};" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "And call it in the Promises style:" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": null, 363 | "metadata": {}, 364 | "outputs": [], 365 | "source": [ 366 | "%%node\n", 367 | "getPosition().then(function(data) {\n", 368 | " console.log(data);\n", 369 | "}).catch(function(err) {\n", 370 | " console.error(err); \n", 371 | "});" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "Or call it in a more compact form:" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": null, 384 | "metadata": {}, 385 | "outputs": [], 386 | "source": [ 387 | "%%node\n", 388 | "getPosition().then(console.log).catch(console.error);" 389 | ] 390 | }, 391 | { 392 | "cell_type": "markdown", 393 | "metadata": {}, 394 | "source": [ 395 | "In the next part of this notebook we'll illustrate how you can access local and remote data sources from within the notebook." 
396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "***\n", 403 | "# Part 2: Working with data sources\n", 404 | "\n", 405 | "You can access any data source using your favorite public or home-grown packages. In the second part of this notebook you'll learn how to retrieve data from an Apache CouchDB (or Cloudant) database and visualize it using PixieDust or third-party libraries.\n", 406 | "\n", 407 | "## Accessing Cloudant data sources\n", 408 | "\n", 409 | "\n", 410 | "To access data stored in an Apache CouchDB or Cloudant database, we can use the [`cloudant-quickstart`](https://www.npmjs.com/package/cloudant-quickstart) npm module:" 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": null, 416 | "metadata": {}, 417 | "outputs": [], 418 | "source": [ 419 | "npm.install('cloudant-quickstart')" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "With our Cloudant URL, we can start exploring the data in Node.js. First we make a connection to the remote Cloudant database:" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": {}, 433 | "outputs": [], 434 | "source": [ 435 | "%%node\n", 436 | "// connect to Cloudant using cloudant-quickstart\n", 437 | "const cqs = require('cloudant-quickstart');\n", 438 | "const cities = cqs('https://56953ed8-3fba-4f7e-824e-5498c8e1d18e-bluemix.cloudant.com/cities');" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "> For this code pattern example a remote database has been pre-configured to accept anonymous connection requests. If you wish to explore the `cloudant-quickstart` library beyond what is covered in this nodebook, we recommend you create your own replica and replace above URL with your own, e.g. 
`https://myid:mypassword@mycloudanthost/mydatabase`.\n", 446 | "\n", 447 | "Now we have an object named `cities` that we can use to access the database. \n", 448 | "\n", 449 | "### Exploring the data using Node.js in a notebook \n", 450 | "\n", 451 | "We can retrieve all documents using `all`." 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": null, 457 | "metadata": {}, 458 | "outputs": [], 459 | "source": [ 460 | "%%node\n", 461 | "// If no limit is specified, 100 documents will be returned\n", 462 | "cities.all({limit:3}).then(console.log).catch(console.error)" 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "metadata": {}, 468 | "source": [ 469 | "Specifying the optional `limit` and `skip` parameters we can paginate through the document list:\n", 470 | "\n", 471 | "```\n", 472 | "cities.all({limit:10}).then(console.log).catch(console.error)\n", 473 | "cities.all({skip:10, limit:10}).then(console.log).catch(console.error)\n", 474 | "```" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "metadata": {}, 480 | "source": [ 481 | "If we know the IDs of documents, we can retrieve them singly:" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": null, 487 | "metadata": {}, 488 | "outputs": [], 489 | "source": [ 490 | "%%node\n", 491 | "cities.get('2636749').then(console.log).catch(console.error);" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "Or in bulk:" 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": null, 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | "%%node\n", 508 | "cities.get(['5913490', '4140963','3520274']).then(console.log).catch(console.error);" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "Instead of just calling `print` to output the JSON, we can bring PixieDust's `display` function to bear by passing it an array of 
data to visualize. Using mapbox as renderer and satelite as basemap, we can display the location and population of the selected cities: " 516 | ] 517 | }, 518 | { 519 | "cell_type": "code", 520 | "execution_count": null, 521 | "metadata": { 522 | "pixiedust": { 523 | "displayParams": { 524 | "basemap": "satellite-v9", 525 | "chartsize": "76", 526 | "coloropacity": "53", 527 | "colorrampname": "Orange to Purple", 528 | "handlerId": "mapView", 529 | "keyFields": "latitude,longitude", 530 | "kind": "simple-cluster", 531 | "legend": "false", 532 | "mapboxtoken": "pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4M29iazA2Z2gycXA4N2pmbDZmangifQ.-g_vE53SD2WrJ6tFX7QHmA", 533 | "rendererId": "mapbox", 534 | "rowCount": "500", 535 | "valueFields": "population,name" 536 | } 537 | }, 538 | "scrolled": false 539 | }, 540 | "outputs": [], 541 | "source": [ 542 | "%%node\n", 543 | "cities.get(['5913490', '4140963','3520274']).then(display).catch(console.error);" 544 | ] 545 | }, 546 | { 547 | "cell_type": "markdown", 548 | "metadata": {}, 549 | "source": [ 550 | "We can also query a subset of the data using the `query` function, passing it a [Cloudant Query](https://cloud.ibm.com/docs/services/Cloudant/api/cloudant_query.html#query) statement. 
Using mapbox as renderer, the customizable output looks as follows:" 551 | ] 552 | }, 553 | { 554 | "cell_type": "code", 555 | "execution_count": null, 556 | "metadata": { 557 | "pixiedust": { 558 | "displayParams": { 559 | "basemap": "outdoors-v9", 560 | "colorrampname": "Yellow to Blue", 561 | "handlerId": "mapView", 562 | "keyFields": "latitude,longitude", 563 | "mapboxtoken": "pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4M29iazA2Z2gycXA4N2pmbDZmangifQ.-g_vE53SD2WrJ6tFX7QHmA", 564 | "rowCount": "500", 565 | "valueFields": "name,population" 566 | } 567 | }, 568 | "scrolled": false 569 | }, 570 | "outputs": [], 571 | "source": [ 572 | "%%node\n", 573 | "// fetch cities in UK above latitude 54 degrees north\n", 574 | "cities.query({country:'GB', latitude: { \"$gt\": 54}}).then(display).catch(console.error);" 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": {}, 580 | "source": [ 581 | "### Aggregating data\n", 582 | "The `cloudant-quickstart` library also allows aggregations (sum, count, stats) to be performed in the Cloudant database.\n", 583 | "Let’s calculate the sum of the population field:" 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": null, 589 | "metadata": {}, 590 | "outputs": [], 591 | "source": [ 592 | "%%node\n", 593 | "cities.sum('population').then(console.log).catch(console.error);" 594 | ] 595 | }, 596 | { 597 | "cell_type": "markdown", 598 | "metadata": {}, 599 | "source": [ 600 | "Or compute the sum of the `population`, grouped by the `country` field, displaying 10 countries with the largest population:" 601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": null, 606 | "metadata": { 607 | "pixiedust": { 608 | "displayParams": { 609 | "aggregation": "SUM", 610 | "handlerId": "barChart", 611 | "keyFields": "name", 612 | "mapboxtoken": "pk.eyJ1IjoibWFwYm94IiwiYSI6ImNpejY4M29iazA2Z2gycXA4N2pmbDZmangifQ.-g_vE53SD2WrJ6tFX7QHmA", 613 | "orientation": "vertical", 614 | "rendererId": 
"google", 615 | "rowCount": "100", 616 | "sortby": "Values DESC", 617 | "valueFields": "population" 618 | } 619 | }, 620 | "scrolled": false 621 | }, 622 | "outputs": [], 623 | "source": [ 624 | "%%node\n", 625 | "\n", 626 | "// helper function\n", 627 | "function top10(data) {\n", 628 | " // convert input data structure to array\n", 629 | " var pop_array = [];\n", 630 | " Object.keys(data).forEach(function(n,k) {\n", 631 | " pop_array.push({name: n, population: data[n]});\n", 632 | " });\n", 633 | " // sort array by population in descending order\n", 634 | " pop_array.sort(function(a,b) {\n", 635 | " return b.population - a.population; \n", 636 | " });\n", 637 | " // display top 10 entries\n", 638 | " pop_array.slice(0,10).forEach(function(e) {\n", 639 | " console.log(e.name + ' ' + e.population.toLocaleString()); \n", 640 | " });\n", 641 | "}\n", 642 | "\n", 643 | "// fetch aggregated data and invoke helper routine\n", 644 | "cities.sum('population','country').then(top10).catch(console.error);" 645 | ] 646 | }, 647 | { 648 | "cell_type": "markdown", 649 | "metadata": {}, 650 | "source": [ 651 | "The `cloudant-quickstart` package is just one of several Node.js libraries that you can use to access Apache CouchDB or Cloudant. Follow [this link](https://medium.com/ibm-watson-data-lab/choosing-a-cloudant-library-d14c06f3d714) to learn more about your options. " 652 | ] 653 | }, 654 | { 655 | "cell_type": "markdown", 656 | "metadata": {}, 657 | "source": [ 658 | "### Visualizing data using custom charts\n", 659 | "\n", 660 | "If you prefer, you can also use third-party Node.js charting packages to visualize your data, such as [`quiche`](https://www.npmjs.com/package/quiche)." 
661 | ] 662 | }, 663 | { 664 | "cell_type": "code", 665 | "execution_count": null, 666 | "metadata": {}, 667 | "outputs": [], 668 | "source": [ 669 | "npm.install('quiche');" 670 | ] 671 | }, 672 | { 673 | "cell_type": "code", 674 | "execution_count": null, 675 | "metadata": {}, 676 | "outputs": [], 677 | "source": [ 678 | "%%node\n", 679 | "var Quiche = require('quiche');\n", 680 | "var pie = new Quiche('pie');\n", 681 | "\n", 682 | "// fetch cities in UK\n", 683 | "cities.query({name: 'Cambridge'}).then(function(data) {\n", 684 | "\n", 685 | " var colors = ['ff00ff','0055ff', 'ff0000', 'ffff00', '00ff00','0000ff'];\n", 686 | " for(i in data) {\n", 687 | " var city = data[i];\n", 688 | " pie.addData(city.population, city.name + '(' + city.country +')', colors[i]);\n", 689 | " }\n", 690 | " var imageUrl = pie.getUrl(true);\n", 691 | " image(imageUrl); \n", 692 | "});" 693 | ] 694 | }, 695 | { 696 | "cell_type": "markdown", 697 | "metadata": {}, 698 | "source": [ 699 | "***\n", 700 | "# Part 3: Sharing data between Python and Node.js cells\n", 701 | "\n", 702 | "You can share variables between Python and Node.js cells. Why woud you want to do that? Read on.\n", 703 | "\n", 704 | "The Node.js library ecosystem is extensive. Perhaps you need to fetch data from a database and prefer the syntax of a particular Node.js npm module. 
You can use Node.js to fetch the data, move it to the Python environment, and convert it into a Pandas or Spark DataFrame for aggregation, analysis and visualisation.\n", 705 | "\n", 706 | "PixieDust and pixiedust_node give you the flexibility to mix and match Python and Node.js code to suit the workflow you are building and the skill sets you have in your team.\n", 707 | "\n", 708 | "Mixing Node.js and Python code in the same notebook is a great way to integrate the work of your software development and data science teams to produce a collaborative report or dashboard.\n", 709 | "\n", 710 | "\n", 711 | "### Sharing data\n", 712 | "\n", 713 | "Define variables in a Python cell." 714 | ] 715 | }, 716 | { 717 | "cell_type": "code", 718 | "execution_count": null, 719 | "metadata": {}, 720 | "outputs": [], 721 | "source": [ 722 | "# define a couple variables in Python\n", 723 | "a = 'Hello from Python!'\n", 724 | "b = 2\n", 725 | "c = False\n", 726 | "d = {'x':1, 'y':2}\n", 727 | "e = 3.142\n", 728 | "f = [{'a':1}, {'a':2}, {'a':3}]" 729 | ] 730 | }, 731 | { 732 | "cell_type": "markdown", 733 | "metadata": {}, 734 | "source": [ 735 | "Access or modify their values in Node.js cells." 736 | ] 737 | }, 738 | { 739 | "cell_type": "code", 740 | "execution_count": null, 741 | "metadata": {}, 742 | "outputs": [], 743 | "source": [ 744 | "%%node\n", 745 | "// print variable values\n", 746 | "console.log(a, b, c, d, e, f);\n", 747 | "\n", 748 | "// change variable value \n", 749 | "a = 'Hello from Node.js!';\n", 750 | "\n", 751 | "// define a new variable\n", 752 | "var g = 'Yes, it works both ways.';" 753 | ] 754 | }, 755 | { 756 | "cell_type": "markdown", 757 | "metadata": {}, 758 | "source": [ 759 | "Inspect the manipulated data." 
760 | ] 761 | }, 762 | { 763 | "cell_type": "code", 764 | "execution_count": null, 765 | "metadata": {}, 766 | "outputs": [], 767 | "source": [ 768 | "# display the modified variable and the new variable\n", 769 | "print('{} {}'.format(a, g))" 770 | ] 771 | }, 772 | { 773 | "cell_type": "markdown", 774 | "metadata": {}, 775 | "source": [ 776 | "**Note:** PixieDust natively supports [data sharing between Python and Scala](https://ibm-watson-data-lab.github.io/pixiedust/scalabridge.html), extending the loop for some data types:\n", 777 | " ```\n", 778 | " %%scala\n", 779 | " println(a,b,c,d,e,f,g)\n", 780 | " \n", 781 | " (Hello from Node.js!,2,null,null,null,null,Yes, it works both ways.)\n", 782 | " ```" 783 | ] 784 | }, 785 | { 786 | "cell_type": "markdown", 787 | "metadata": {}, 788 | "source": [ 789 | "### Sharing data from an asynchronous callback\n", 790 | "\n", 791 | "If you wish to transfer data from Node.js to Python from an asynchronous callback, make sure you write the data to a global variable.\n", 792 | "\n", 793 | "Load a CSV file from a GitHub repository." 794 | ] 795 | }, 796 | { 797 | "cell_type": "code", 798 | "execution_count": null, 799 | "metadata": {}, 800 | "outputs": [], 801 | "source": [ 802 | "%%node\n", 803 | "\n", 804 | "// global variable\n", 805 | "var sample_csv_data = '';\n", 806 | "\n", 807 | "// load CSV file from GitHub and store the data in the global variable\n", 808 | "request.get('https://github.com/ibm-watson-data-lab/open-data/raw/master/cars/cars.csv').then(function(data) {\n", 809 | " sample_csv_data = data;\n", 810 | " console.log('Fetched sample data from GitHub.');\n", 811 | "});" 812 | ] 813 | }, 814 | { 815 | "cell_type": "markdown", 816 | "metadata": {}, 817 | "source": [ 818 | "Create a Pandas DataFrame from the downloaded data." 
819 | ] 820 | }, 821 | { 822 | "cell_type": "code", 823 | "execution_count": null, 824 | "metadata": { 825 | "pixiedust": { 826 | "displayParams": {} 827 | } 828 | }, 829 | "outputs": [], 830 | "source": [ 831 | "import pandas as pd\n", 832 | "import io\n", 833 | "# create a DataFrame from the shared CSV data\n", 834 | "pandas_df = pd.read_csv(io.StringIO(sample_csv_data))\n", 835 | "# display the first five rows\n", 836 | "pandas_df.head(5)" 837 | ] 838 | }, 839 | { 840 | "cell_type": "markdown", 841 | "metadata": {}, 842 | "source": [ 843 | "**Note**: The above example is for illustrative purposes only. If you want to create a DataFrame from a URL, a much easier solution is to use [PixieDust's sampleData method](https://ibm-watson-data-lab.github.io/pixiedust/loaddata.html#load-a-csv-using-its-url)." 844 | ] 845 | }, 846 | { 847 | "cell_type": "markdown", 848 | "metadata": {}, 849 | "source": [ 850 | "#### References:\n", 851 | " * [Nodebooks: Introducing Node.js Data Science Notebooks](https://medium.com/ibm-watson-data-lab/nodebooks-node-js-data-science-notebooks-aa140bea21ba)\n", 852 | " * [Nodebooks: Sharing Data Between Node.js & Python](https://medium.com/ibm-watson-data-lab/nodebooks-sharing-data-between-node-js-python-3a4acae27a02)\n", 853 | " * [Sharing Variables Between Python & Node.js in Jupyter Notebooks](https://medium.com/ibm-watson-data-lab/sharing-variables-between-python-node-js-in-jupyter-notebooks-682a79d4bdd9)" 854 | ] 855 | } 856 | ], 857 | "metadata": { 858 | "kernelspec": { 859 | "display_name": "Python 2.7", 860 | "language": "python", 861 | "name": "python2" 862 | }, 863 | "language_info": { 864 | "codemirror_mode": { 865 | "name": "ipython", 866 | "version": 2 867 | }, 868 | "file_extension": ".py", 869 | "mimetype": "text/x-python", 870 | "name": "python", 871 | "nbconvert_exporter": "python", 872 | "pygments_lexer": "ipython2", 873 | "version": "2.7.15" 874 | } 875 | }, 876 | "nbformat": 4, 877 | "nbformat_minor": 2 878 | } 879 | 
--------------------------------------------------------------------------------