├── data
│   ├── images.h5
│   └── fmnist.onnx
├── images
│   ├── dock.jpg
│   ├── gold.jpg
│   ├── onnx.png
│   ├── pfa.png
│   ├── sas.png
│   ├── docker.png
│   ├── PMML_Logo.png
│   ├── pfa-doc-1.png
│   ├── pfa-doc-2.png
│   ├── pfa-line.png
│   ├── pmml_example.png
│   └── squeezenet.png
├── binder
│   ├── start
│   ├── environment.yml
│   └── jupyterlab-workspace.json
├── README.md
├── 01-Deployment.ipynb
├── 02-Model-Formats.ipynb
├── 03-ONNX.ipynb
├── 04-Hack-Lab.ipynb
└── 04a-Hack-Lab-Solution.ipynb
--------------------------------------------------------------------------------
/data/images.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/data/images.h5
--------------------------------------------------------------------------------
/images/dock.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/dock.jpg
--------------------------------------------------------------------------------
/images/gold.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/gold.jpg
--------------------------------------------------------------------------------
/images/onnx.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/onnx.png
--------------------------------------------------------------------------------
/images/pfa.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/pfa.png
--------------------------------------------------------------------------------
/images/sas.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/sas.png
--------------------------------------------------------------------------------
/data/fmnist.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/data/fmnist.onnx
--------------------------------------------------------------------------------
/images/docker.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/docker.png
--------------------------------------------------------------------------------
/images/PMML_Logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/PMML_Logo.png
--------------------------------------------------------------------------------
/images/pfa-doc-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/pfa-doc-1.png
--------------------------------------------------------------------------------
/images/pfa-doc-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/pfa-doc-2.png
--------------------------------------------------------------------------------
/images/pfa-line.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/pfa-line.png
--------------------------------------------------------------------------------
/images/pmml_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/pmml_example.png
--------------------------------------------------------------------------------
/images/squeezenet.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/adbreind/pydata-pdx-onnx/master/images/squeezenet.png
--------------------------------------------------------------------------------
/binder/start:
--------------------------------------------------------------------------------
#!/bin/bash

# Import the workspace
jupyter lab workspaces import binder/jupyterlab-workspace.json

exec "$@"
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# pydata-pdx-onnx

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/adbreind/pydata-pdx-onnx.git/master?urlpath=lab)
--------------------------------------------------------------------------------
/binder/environment.yml:
--------------------------------------------------------------------------------
name: pydata-pdx-onnx
channels:
  - conda-forge
dependencies:
  - python=3.8
  - jupyterlab
  - numpy
  - pip
  - pandas
  - pyarrow
  - python-graphviz
  - scikit-learn
  - matplotlib
  - h5py
  - pip:
    - onnxmltools
    - onnxruntime
--------------------------------------------------------------------------------
/binder/jupyterlab-workspace.json:
--------------------------------------------------------------------------------
{"data":{"layout-restorer:data":{"main":{"dock":{"type":"tab-area","currentIndex":0,"widgets":["notebook:01-Deployment.ipynb"]},"mode":"multiple-document","current":"notebook:01-Deployment.ipynb"},"left":{"collapsed":false,"current":"filebrowser","widgets":["filebrowser","running-sessions","command-palette","tab-manager"]},"right":{"collapsed":true,"widgets":[]}},"notebook:01-Deployment.ipynb":{"data":{"path":"01-Deployment.ipynb","factory":"Notebook"}},"file-browser-filebrowser:cwd":{"path":""}},"metadata":{"id":"/lab"}}
--------------------------------------------------------------------------------
/04-Hack-Lab.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading and scoring some FashionMNIST images with an ONNX model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load up the test data from `data/images.h5` (an HDF5 archive)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import h5py\n",
    "\n",
    "file = h5py.File('data/images.h5', 'r')\n",
    "\n",
    "# find data inside the archive"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get the records into NumPy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create an `onnxruntime` inference session object from the model at `data/fmnist.onnx`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Prepare the data for scoring:\n",
    "* original data is\n",
    "  * pixel intensities (0-255 as uint8)\n",
    "  * and shape is (10000, 28, 28) as (batch, h, w)\n",
    "* we need to convert them to\n",
    "  * scaled from 0.0 to 1.0\n",
    "  * NumPy float32 type\n",
    "* for scoring, the model expects batches of shape\n",
    "  * (batch_size, 1, 28, 28) which is (batch_size, channels, h, w)\n",
    "  * batch_size can be variable"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the output is always a `list` ... in our case, we just need one item"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For each image input, we get 10 logit scores. We can argmax these to find the most likely clothing-item class index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
--------------------------------------------------------------------------------
/04a-Hack-Lab-Solution.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading and scoring some FashionMNIST images with an ONNX model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load up the test data from `data/images.h5` (an HDF5 archive)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import h5py\n",
    "\n",
    "file = h5py.File('data/images.h5', 'r')\n",
    "\n",
    "file.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get the records into NumPy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "arr = np.array(file['batch'])\n",
    "arr.shape, arr.dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create an `onnxruntime` inference session object from the model at `data/fmnist.onnx`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import onnxruntime as rt\n",
    "\n",
    "session = rt.InferenceSession(\"data/fmnist.onnx\")\n",
    "\n",
    "for i in session.get_inputs():\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Prepare the data for scoring:\n",
    "* original data is\n",
    "  * pixel intensities (0-255 as uint8)\n",
    "  * and shape is (10000, 28, 28) as (batch, h, w)\n",
    "* we need to convert them to\n",
    "  * scaled from 0.0 to 1.0\n",
    "  * NumPy float32 type\n",
    "* for scoring, the model expects batches of shape\n",
    "  * (batch_size, 1, 28, 28) which is (batch_size, channels, h, w)\n",
    "  * batch_size can be variable"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "arr = arr / 255.0\n",
    "arr.dtype"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "arr = arr.reshape(-1, 1, 28, 28).astype('float32')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "outputs = session.run(None, {'input': arr})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the output is always a `list` ... in our case, we just need one item"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "outputs[0].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For each image input, we get 10 logit scores. We can argmax these to find the most likely clothing-item class index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "outputs[0][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictions = outputs[0].argmax(axis=1)\n",
    "\n",
    "predictions[:5]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
--------------------------------------------------------------------------------
/01-Deployment.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploy Machine Learning Projects in Production with Open Standard Models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Open Source Pre-History: Proprietary Inference Servers\n",
    "\n",
    "Why is this a challenge today?\n",
    "\n",
    "For a long time, businesses using machine learning employed proprietary tools like SAS, SPSS, and FICO to perform modeling.\n",
    "\n",
    "Many of these products and vendors licensed proprietary \"model servers\" or \"inference servers\", which were created specifically to take models and expose them elsewhere in the IT infrastructure as a service.\n",
    "\n",
    "If your company was a customer of these products, the enterprise \"solution\" included both the data mining tools (modeling) and the serving tools (inference)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Rise of Open Source: Stone Age\n",
    "\n",
    "As open-source data science tools rose in prominence over the last decade, more data scientists, statisticians, researchers, and analysts began relying on\n",
    "* Python\n",
    "  * SciPy stack\n",
    "  * scikit-learn\n",
    "  * TensorFlow\n",
    "  * etc.\n",
    "* R\n",
    "  * dplyr\n",
    "  * ggplot2\n",
    "  * etc.\n",
    "* Spark, H2O, others...\n",
    "\n",
    "As we've all seen, the cycle of research, development, publication, and open-source tooling has led to a huge explosion of data-driven uses throughout the world.\n",
    "\n",
    "__But__ none of those tools had a clear, complete story for how to deploy a model once it was trained.\n",
    "\n",
    "So engineers carved out the *Stone-Age Solution* ... namely, attempting to wrap the data science stack in a lightweight web service framework and putting it into production.\n",
    "\n",
    "The classic example is a Python [Flask](https://en.wikipedia.org/wiki/Flask_(web_framework)) web endpoint that wraps a call to scikit-learn's `model.predict(...)` (sketched concretely below).\n",
    "\n",
    "Before discussing the many drawbacks of this approach, let's quickly review..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Open Source: Bronze Age\n",
    "\n",
    "Since model inference is typically lightweight, stateless, and idempotent, it is an ideal candidate for a scale-out containerized service using a container scaling framework like Kubernetes.\n",
    "\n",
    "The \"Bronze Age\" of open-source model deployment containerized the Stone Age approach, making it easy to scale, robust, etc.\n",
    "\n",
    "Containerization was definitely an improvement ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Open Source: Platform Gold Rush\n",
    "\n",
    "\n",
    "\n",
    "Businesses realized that they wanted enterprise manageability over these ML inference services ...\n",
    "\n",
    "And a lot of entrepreneurs realized that making money in novel ML training was hard (after all, thousands of Ph.D. researchers were working on the same problems, and giving the results away for free) ... but making a \"platform\" that\n",
    "* Dockerized open-source ML stacks\n",
    "* Deployed them on-prem or in the cloud via Kubernetes\n",
    "* and provided some manageability (\"ML Ops\")\n",
    "\n",
    "was both easy and lucrative.\n",
    "\n",
    "### 2018-2019 will go down as the ML Ops Gold Rush\n",
    "\n",
    "And, as in the California Gold Rush, it has been easier selling tools and services than finding actual gold.\n",
    "\n",
    "__ML deployment platforms *do* have value to offer__ and we'll come back to that part. But first we need to focus on the Achilles heel, namely Dockerized data science stacks."
   ]
  },
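  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make that Achilles heel concrete, here is a minimal sketch of the Stone-Age Flask wrapper mentioned above -- assuming, hypothetically, a scikit-learn model pickled to `model.pkl` (the path, route, and payload shape are all made up for illustration). Note how unpickling the model drags the whole scikit-learn/NumPy stack into whatever container serves it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch only -- illustrates the Stone-Age pattern, not a recommended deployment\n",
    "import pickle\n",
    "\n",
    "import numpy as np\n",
    "from flask import Flask, jsonify, request\n",
    "\n",
    "app = Flask(__name__)\n",
    "\n",
    "with open('model.pkl', 'rb') as f:  # hypothetical pickled scikit-learn model\n",
    "    model = pickle.load(f)        # ... which requires scikit-learn at serving time\n",
    "\n",
    "@app.route('/predict', methods=['POST'])\n",
    "def predict():\n",
    "    # expects JSON like {\"features\": [[1.0, 2.0], ...]}\n",
    "    batch = np.array(request.get_json()['features'])\n",
    "    return jsonify(predictions=model.predict(batch).tolist())\n",
    "\n",
    "# app.run(host='0.0.0.0', port=8080)  # uncomment to actually serve"
   ]
  },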
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# You Wouldn't Deploy an Enterprise Service by Putting Your Dev Machine in the Datacenter\n",
    "\n",
    "\n",
    "\n",
    "## So Why Would You Deploy an ML Service by Putting the ML Stack in a Container?\n",
    "\n",
    "\"It works (for now)\" is about the best thing you can say about such a deployment.\n",
    "\n",
    "Meanwhile, how do we address...\n",
    "\n",
    "* model inspection\n",
    "* versioning\n",
    "* diffing model versions\n",
    "* porting to other environments (e.g., ARM vs. Intel or mobile vs. client vs. server)\n",
    "* using the model in an alternate runtime (e.g., a scikit-learn model in a Spark job)\n",
    "* using models *from* an alternate runtime (e.g., Spark cannot natively export an ML pipeline to a containerizable service)\n",
    "* updating dependencies (e.g., patching a security vulnerability in underlying components https://www.cvedetails.com/cve/CVE-2019-6446/)\n",
    "* not to mention lots of design issues like...\n",
    "  * Why should an ML model (which is typically a limited set of math operations) be deployed as a full computing stack and environment?\n",
    "  * Why should we use an enormous container and billions of compute cycles to perform arithmetic and a bit of trigonometry?\n",
    "\n",
    "### Fundamentally, Containerizing an ML Stack (with Model) Violates Separation of Concerns\n",
    "\n",
    "Consider: we send each other plain-text emails, which can be written and read according to standard text encodings, rather than, say, the absurd idea of sending executable VM images with an OS and a word processor, along with a document in the word processor's proprietary format.\n",
    "\n",
    "The ML model, once trained, can be viewed as data.\n",
    "\n",
    "It should be possible to\n",
    "* manage this data using standard, well-known data-management tools and practices\n",
    "* create this data using any compliant tool\n",
    "* consume this data using any compliant tool\n",
    "* validate that this data has a single universal interpretation\n",
    "  * why is this important? consider the impact of a tiny difference in the implementation of, say, *ln(x)*, on inference at scale"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How Can We Address this Separation-of-Concerns Problem?\n",
    "\n",
    "__Get the model *out* of the model-creation environment (both logically and physically)__\n",
    "\n",
    "Physically: create a separate entity like a file\n",
    "\n",
    "Logically: ensure that entity is independent -- so saving a scikit-learn model as a pickle file (which will later need scikit-learn after being unpickled) does not count as a solution\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What Kind of Separate File Do We Want?\n",
    "\n",
    "Ideally, we'd like a format that is ...\n",
    "* an open, cross-industry standard\n",
    "* not owned or controlled by any one organization\n",
    "* not encumbered by intellectual property restrictions (licensing rules)\n",
    "* compatible with many ML tools, on many platforms\n",
    "* time/space efficient\n",
    "* robust (can support many kinds of ML models, including future types)\n",
    "* consistent (produces the same output for the same model, no matter the deployment OS, architecture, etc.)\n",
    "* simple (does not support unnecessary operations)\n",
    "* secure (minimizes attack surface by design, offers verifiability, etc.)\n",
    "* human readable (or can be made human readable)\n",
    "* manageable in any database, content-management system, source control, etc.\n",
    "\n",
    "*As in most engineering scenarios, there is no single, magical solution that hits every bullet-point*\n",
    "\n",
    "But there are a number of approaches which offer many of these attributes and which are worthy of consideration.\n",
    "\n",
    "This session looks at several of these tools."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
--------------------------------------------------------------------------------
/03-ONNX.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![ONNX logo](images/onnx.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ONNX (Open Neural Network eXchange)\n",
    "\n",
    "Originally created by Facebook and Microsoft as an industry collaboration for import/export of neural networks, ONNX\n",
    "* has grown to include support for \"traditional\" ML models\n",
    "* offers interop with many software libraries\n",
    "* has both software (CPU, optional GPU-accelerated) and hardware (Intel, Qualcomm, etc.) runtimes.\n",
    "\n",
    "https://onnx.ai/\n",
    "\n",
    "* DAG-based model\n",
    "* Built-in operators, data types\n",
    "* Extensible -- e.g., ONNX-ML\n",
    "* Goal is to allow tools to share a single model format\n",
    "\n",
    "*Of the \"standard/open\" formats, ONNX has clearly shown the most momentum over the past year or two.*\n",
    "\n",
    "## Viewing a Model\n",
    "\n",
    "ONNX models are not directly (as raw data) human-readable, but, as they represent a graph, can easily be converted into textual or graphical representations.\n",
    "\n",
    "Here is a snippet of the [SqueezeNet](https://arxiv.org/abs/1602.07360) image-recognition model, as rendered in the ONNX visualization tutorial at https://github.com/onnx/tutorials/blob/master/tutorials/VisualizingAModel.md.\n",
    "\n",
    "> The ONNX codebase comes with the visualization converter used in this example -- it's a simple script currently located at https://github.com/onnx/onnx/blob/master/onnx/tools/net_drawer.py\n",
    "\n",
    "![SqueezeNet snippet rendered by net_drawer](images/squeezenet.png)\n",
    "\n",
    "### Let's Build a Model and Convert it to ONNX"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.linear_model import LinearRegression\n",
    "\n",
    "data = pd.read_csv('data/diamonds.csv')\n",
    "X = data.carat\n",
    "y = data.price\n",
    "model = LinearRegression().fit(X.values.reshape(-1, 1), y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ONNX can be generated from many modeling tools. A partial list is here: https://github.com/onnx/tutorials#converting-to-onnx-format\n",
    "\n",
    "Microsoft has contributed a lot of resources toward open-source ONNX capabilities, including, in early 2019, support for Apache Spark ML Pipelines: https://github.com/onnx/onnxmltools/blob/master/onnxmltools/convert/sparkml/README.md\n",
    "\n",
    "__Convert to ONNX__\n",
    "\n",
    "Note that we can print a string representation of the converted graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from skl2onnx import convert_sklearn\n",
    "from skl2onnx.common.data_types import FloatTensorType\n",
    "\n",
    "initial_type = [('carat', FloatTensorType([1, 1]))]\n",
    "onx = convert_sklearn(model, initial_types=initial_type)\n",
    "print(onx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's save it as a file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"diamonds.onnx\", \"wb\") as f:\n",
    "    f.write(onx.SerializeToString())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The file itself is binary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! head diamonds.onnx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How Do We Consume ONNX and Make Predictions?\n",
    "\n",
    "One of the things that makes ONNX a compelling solution in 2019 is the wide industry support not just for model creation, but also for performant model inference.\n",
    "\n",
    "Here is a partial list of tools that consume ONNX: https://github.com/onnx/tutorials#scoring-onnx-models\n",
    "\n",
    "Of particular interest for productionizing models are\n",
    "* Apple CoreML\n",
    "* Microsoft `onnxruntime` and `onnxruntime-gpu` for CPU & GPU-accelerated inference\n",
    "* TensorRT for NVIDIA GPUs\n",
    "* Conversion for Qualcomm Snapdragon hardware: https://developer.qualcomm.com/docs/snpe/model_conv_onnx.html\n",
    "\n",
    "Today, we'll look at \"regular\" server-based inference with a sample REST server, using `onnxruntime`.\n",
    "\n",
    "#### We'll start by loading the `onnxruntime` library, and seeing how we make predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import onnxruntime as rt\n",
    "\n",
    "sess = rt.InferenceSession(\"diamonds.onnx\")\n",
    "\n",
    "print(\"In\", [(i.name, i.type, i.shape) for i in sess.get_inputs()])\n",
    "\n",
    "print(\"Out\", [(i.name, i.type, i.shape) for i in sess.get_outputs()])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We've skipped some metadata annotation in the model creation for this quick example -- that's why the output gets the generic name \"variable\" (the input is named \"carat\" because we supplied that name in `initial_types`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "sample_to_score = np.array([[1.0]], dtype=np.float32)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output = sess.run(['variable'], {'carat': sample_to_score})\n",
    "\n",
    "output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output[0][0][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### At this point, we can build our service...\n",
    "\n",
    "Now we are free to choose whatever service infrastructure we like.\n",
    "\n",
    "Moreover, we can containerize that service, so we get back all of the benefits of Docker, Kubernetes, etc.\n",
    "\n",
    "But this time, we have a minimal serving infrastructure that knows only about the model itself, and loads models in a single, open, industry-standard format."
   ]
  },
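  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For illustration, here is a hedged sketch of such a minimal scoring service, assuming Flask is available and reusing the `diamonds.onnx` file we just saved (the `/score` route name and JSON payload shape are made up for this example):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A sketch, not a production server -- the only ML dependency is onnxruntime\n",
    "import numpy as np\n",
    "import onnxruntime as rt\n",
    "from flask import Flask, jsonify, request\n",
    "\n",
    "app = Flask(__name__)\n",
    "serving_sess = rt.InferenceSession(\"diamonds.onnx\")\n",
    "input_name = serving_sess.get_inputs()[0].name  # 'carat' for this model\n",
    "\n",
    "@app.route('/score', methods=['POST'])\n",
    "def score():\n",
    "    # expects JSON like {\"carat\": [[1.0]]}\n",
    "    batch = np.array(request.get_json()[input_name], dtype=np.float32)\n",
    "    result = serving_sess.run(None, {input_name: batch})\n",
    "    return jsonify(price=result[0].tolist())\n",
    "\n",
    "# app.run(host='0.0.0.0', port=8080)  # uncomment to actually serve"
   ]
  },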
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Pros and Cons: ONNX\n",
    "\n",
    "Pros:\n",
    "* Most major tools have ONNX support\n",
    "* MIT license makes it both OSS and business friendly\n",
    "* Probably the closest thing we have to an open, versatile, next-gen format *with wide support*\n",
    "* Protobuf format is compact and typesafe\n",
    "* Biggest weakness was \"classical\" ML and feature engineering support -- this has now been fixed\n",
    "* Microsoft open-sourced (Dec 2018) a high-perf runtime (GPU, CPU, language bindings, etc.) https://azure.microsoft.com/en-us/blog/onnx-runtime-is-now-open-source/\n",
    "  * Being used as part of Windows ML / Azure ML\n",
    "  * https://github.com/Microsoft/onnxruntime\n",
    "* In Q1-Q2 of 2019, Microsoft added a Spark ML Pipeline exporter to the `onnxmltools` project\n",
    "  * https://github.com/onnx/onnxmltools\n",
    "\n",
    "Cons:\n",
    "* Wasn't originally intended as a deployment format *per se*\n",
    "  * Doesn't have a standard or reference runtime\n",
    "  * Doesn't provide certification or a standard around correctness\n",
    "  * No opinion on security, etc.\n",
    "* Protobuf format is not human readable or manageable via text-oriented tooling\n",
    "  * Though the graph itself can be (e.g., PyTorch export output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  },
  "name": "02-ONNX-Models",
  "notebookId": 2375086480049026
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
--------------------------------------------------------------------------------
/02-Model-Formats.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deployment (High-Level) Solution\n",
    "\n",
    "## What? Break the Dependency\n",
    "\n",
    "__How? Export the Model and Move It__\n",
    "\n",
    "This brings us, for better or worse, to a choice of what format to use.\n",
    "\n",
    "As always, it's a compromise to accommodate ease of use, performance, portability, \"openness,\" etc."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. Amalgamation\n",
    "\n",
    "The simplest form of export is \"amalgamation\" ... in which the model and all the code needed to run it are emitted as one big chunk.\n",
    "\n",
    "In some cases, it's a single source code file that can be compiled on nearly any platform as a standalone program.\n",
    "  * Classic amalgamation: MXNet + model code https://mxnet.apache.org/api/faq/smart_device\n",
    "\n",
    "In other cases, it's a chunk of IR code that can be consumed in a common runtime:\n",
    "  * H2O POJO export https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/productionizing.rst#pojo-quick-start\n",
    "  * TVM IR https://docs.tvm.ai/tutorials/cross_compilation_and_rpc.html\n",
    "    * If we haven't already looked at it, make a mental note to explore TVM at some point: https://tvm.ai/about\n",
    "\n",
    "And sometimes ... it's a coder implementing a model by hand and compiling it! (For simple, popular models, like linear/logistic regression, it's pretty easy once you have the model params.)\n",
    "\n",
    "__Pros and Cons__\n",
    "\n",
    "Pros:\n",
    "* Easy-to-understand concept\n",
    "* Fairly portable\n",
    "* Can be compact and performant\n",
    "  * May be a good choice for extremely constrained embedded environments\n",
    "\n",
    "Cons:\n",
    "* Not interoperable with other high-level environments\n",
    "* Not easily human readable, diffable, or manageable in a CMS or version control\n",
    "* Violates separation of code from data\n",
    "* May not fit in well with enterprise manageability and operations needs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Single-Product Format\n",
    "\n",
    "I.e., a format which serves a specific product ecosystem, but is not intended to interoperate with other systems nor serve as a \"standard\"\n",
    "\n",
    "*Examples:*\n",
    "\n",
    "__SparkML + MLeap__\n",
    "* MLeap supports Spark, some scikit-learn models, and some TensorFlow models\n",
    "* Represents models in an \"MLeap Bundle\"\n",
    "* The MLeap runtime is a JAR that can run in any Java application (or with a lightweight scoring wrapper provided by MLeap)\n",
    "\n",
    "__TensorFlow + TensorFlow Serving__\n",
    "* TensorFlow models (created directly with TensorFlow or with Keras) serialize to a TF-specific protocol buffer representation\n",
    "* TensorFlow Serving loads the latest version of a model\n",
    "  * TF Serving exposes a gRPC service and, in the latest version, a REST endpoint\n",
    "\n",
    "__TensorFlow + FlatBuffers + TFLite__\n",
    "* FlatBuffers is an \"open\" format with multiple collaborators\n",
    "* Targets iOS and Android"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Existing Standard Format: PMML\n",
    "\n",
    "![PMML logo](images/PMML_Logo.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PMML has existed for over 20 years, and is used widely throughout the world.\n",
\n", 90 | "\n", 91 | "It has many advantages, but is not perfect.\n", 92 | "\n", 93 | "Pros:\n", 94 | "* In wide use / well-accepted / large community\n", 95 | "* Core XML dialect can be human readable\n", 96 | "* Models can be processed/managed by text-based tools (VCS/CMS/etc.)\n", 97 | "* Covers the majority of modeling cases companies use today\n", 98 | "\n", 99 | "Cons:\n", 100 | "* Support for production of models in the open-source world is spotty\n", 101 | "* Support for consuming models in the OSS is sparse/minimal\n", 102 | "* Importance of modern open-source tooling has been dragging PMML down\n", 103 | "* Some modern model types and pipelines are not supported, or not supported efficiently/compactly\n", 104 | "\n", 105 | "In practice, PMML -- even with commercial/enterprise, supported products -- is more like USB C than USB 3. \n", 106 | "\n", 107 | "I.e., like USB C, it's very versatile in theory, and the plug always fits, but that tells you little or nothing about whether the two devices connected can have any conversation, let alone the specific conversation you need them to have.\n", 108 | "\n", 109 | "Despite its imperfections, it has many advantages over single-product formats, so we often use it even if it cannot fulfil a promise of being the \"universal\" tool." 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "__Example__\n", 117 | "\n", 118 | "Here is an example of a logistic regression classifier trained using R on the Iris dataset:\n", 119 | "\n", 120 | "(http://dmg.org/pmml/pmml_examples/rattle_pmml_examples/IrisMultinomReg.xml)\n", 121 | "\n", 122 | "" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "### Where do we get a PMML model?\n", 130 | "\n", 131 | "A partial list of products supporting PMML is at http://dmg.org/pmml/products.html\n", 132 | "\n", 133 | "Focusing on the *producing PMML* side, we can see there are a lot of products that can create PMML, even if most of them are commercial or have effectively commercial licensing schemes (e.g. 
    "\n",
    "In the open-source world (again, excluding AGPL code like JPMML), we have\n",
    "* R -- strongest open-source export support\n",
    "* Spark -- very limited support: the listed models are only supported under the *old/deprecated* RDD MLlib API\n",
    "  * There is work in progress to add PMML export to the new API, but it has just begun and may not make progress\n",
    "* Python -- aside from the wrapper around the above-mentioned JPMML, the best option today is\n",
    "  * https://nyoka-pmml.github.io/nyoka/index.html\n",
    "\n",
    "It is important to note that\n",
    "* although there are plenty of commercial products with at least some PMML support\n",
    "* and although large enterprises can (and for support/legal reasons prefer to) pay for a product\n",
    "* the lack of openness and community is leaving commercial-only ML tooling far behind\n",
    "  * e.g., all of the top deep learning tools are FOSS\n",
    "  * this means most of the performance-focused work is tied to the FOSS tools\n",
    "  * scaling is owned by FOSS (Kubeflow, Horovod, etc.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How do we run a PMML model?\n",
    "\n",
    "Permissive OSS support for running PMML models is effectively nonexistent, so we need to architect in tandem with business decisions around a vendor's analytics server product. These business decisions will go beyond licensing and support, because they will affect all of our enterprise architecture: hardware, network, software, management/monitoring/operations, reliability/continuity, compliance, etc.\n",
    "\n",
    "However, we can make use of the AGPL code in JPMML for demonstration purposes.\n",
    "\n",
    "#### JPMML\n",
    "\n",
    "JPMML (https://github.com/jpmml) is a set of AGPL OSS projects that\n",
    "* form the de facto Java implementation of PMML\n",
    "* offer interop with key FOSS tools like Apache Spark, R, scikit-learn, XGBoost, TensorFlow, etc.\n",
    "* provide easy scoring in your own apps, via a \"scoring wrapper\", or hosted in the cloud\n",
    "* are maintained and licensed in connection with https://openscoring.io/\n",
    "* *note: there is an older, abandoned version of JPMML under a friendlier Apache 2.0 license*\n",
    "  * this older version has many features and might be suitable for some organizations with a higher risk/ownership appetite\n",
    "  * https://github.com/jpmml/jpmml"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Next-Gen Standard Format: PFA\n",
    "\n",
    "![PFA logo](images/pfa.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### PFA (Portable Format for Analytics) is a Modern Replacement for PMML\n",
    "\n",
    "##### \"As data analyses mature, they must be hardened — they must have fewer dependencies, a more maintainable structure, and they must be robust against errors.\" - DMG\n",
    "\n",
    "PFA, created in 2015, is intended to improve upon PMML.\n",
    "\n",
    "From http://dmg.org/pfa/docs/motivation/:\n",
    "\n",
    "*Tools such as Hadoop and Storm provide automated data pipelines, separating the data flow from the functions that are performed on data (mappers and reducers in Hadoop, spouts and bolts in Storm). Ordinarily, these functions are written in code that has access to the pipeline internals, the host operating system, the remote filesystem, the network, etc. However, all they should do is math.*\n",
    "\n",
    "*PFA completes the abstraction by encapsulating these functions as PFA documents. From the point of view of the pipeline system, the documents are configuration files that may be loaded or replaced independently of the pipeline code.*\n",
    "\n",
    "*This separation of concerns allows the data analysis to evolve independently of the pipeline. Since scoring engines written in PFA are not capable of accessing or manipulating their environment, they cannot jeopardize the production system. Data analysts can focus on the mathematical correctness of their algorithms and security reviews are only needed when the pipeline itself changes.*\n",
    "\n",
    "*This decoupling is important because statistical models usually change more quickly than pipeline frameworks. Model details are often tweaked in response to discoveries about the data and models frequently need to be refreshed with new training samples.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![PFA](images/pfa-line.png)\n",
    "\n",
    "(summarized from DMG)\n",
    "\n",
    "#### Overview of PFA capabilities\n",
    "\n",
    "PFA flexibility:\n",
    "* Control structures, such as conditionals, loops, and user-defined functions\n",
    "* Entirely expressed within JSON, and can therefore be easily generated and manipulated by other programs\n",
    "* Fine-grained function library supporting extensibility callbacks\n",
    "\n",
    "The following contribute to PFA’s safety:\n",
    "\n",
    "* Strict numerical compatibility: the same PFA document and the same input result in the same output, regardless of platform.\n",
    "* The spec only defines functions that transform data. I/O is all controlled by the host system.\n",
    "* A type system that can be statically checked. ... This system has a type-safe null, and PFA only performs type-safe casting, which ensures that missing data never cause run-time errors.\n",
    "* The callbacks that generalize PFA’s statistical models are not first-class functions\n",
    "  * The set of functions that a PFA document might call can be predicted before it runs\n",
    "  * A PFA host may choose to only allow certain functions.\n",
    "* The semantics of shared data guarantee that data are never corrupted by concurrent access and that scoring engines do not enter deadlock."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Example__\n",
    "\n",
    "Here are some data records:\n",
    "\n",
    "![Sample data records](images/pfa-doc-1.png)\n",
    "\n",
    "And a PFA document which returns the square-root of the sum of the squares of a record's x, y, and z values:\n",
    "\n",
    "![PFA document](images/pfa-doc-2.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above example -- along with numerous other tutorials -- can be viewed, *modified*, and run live online at http://dmg.org/pfa/docs/tutorial2/ and other dmg.org pages.\n",
    "\n",
    "Although it may not be obvious from this small example, PFA is effectively a programming language, albeit a restricted one, and as such can express complex transformations and aggregations of data. The PFA document is a serialized representation or description of a scoring engine, of which one or more instances can be created by a runtime.\n",
    "\n",
    "That said, it is still intended to be a machine-generated and machine-consumed document."
   ]
  },
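  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a concrete taste of the format, here is a hedged, hand-written transcription of a PFA document like the one pictured above (the record and field names are assumptions based on the prose; see the dmg.org tutorial pages for the authoritative version):\n",
    "\n",
    "```json\n",
    "{\"input\": {\"type\": \"record\",\n",
    "           \"name\": \"Point\",\n",
    "           \"fields\": [{\"name\": \"x\", \"type\": \"double\"},\n",
    "                      {\"name\": \"y\", \"type\": \"double\"},\n",
    "                      {\"name\": \"z\", \"type\": \"double\"}]},\n",
    " \"output\": \"double\",\n",
    " \"action\":\n",
    "   {\"m.sqrt\": {\"+\": [{\"+\": [{\"**\": [\"input.x\", 2]},\n",
    "                            {\"**\": [\"input.y\", 2]}]},\n",
    "                     {\"**\": [\"input.z\", 2]}]}}}\n",
    "```"
   ]
  }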
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  },
  "name": "01-Model-Formats",
  "notebookId": 2375086480049053
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
--------------------------------------------------------------------------------