├── README.md
├── Day 1 The Basics of Rest APIs – What They Are and How to Design One.ipynb
├── Day 3 How to deploy your API on your choice of services – Heroku or Google Cloud.ipynb
└── Day 2 How to Make an API for an Existing Python Machine Learning Project.ipynb

/README.md:
--------------------------------------------------------------------------------
1 | Notebooks presented by [Rachael Tatman](https://www.kaggle.com/rtatman) during [Kaggle CareerCon 2019](https://www.kaggle.com/careercon2019). The notebooks cover the basics of REST APIs, how to wrap an ML model in a REST API and how to deploy that API on Heroku and GCP's AppEngine.
2 | 
3 | The notebooks have been collected from Rachael's Kaggle profile and all of them have been prepared by Rachael herself.
4 | 
5 | 
--------------------------------------------------------------------------------
/Day 1 The Basics of Rest APIs – What They Are and How to Design One.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "This is day one of a three day event, held during [Kaggle CareerCon 2019](https://www.kaggle.com/careercon2019). Each day we’ll learn about a new part of developing an API and put it into practice. By day 3, you’ll have written and deployed an API of your very own!\n",
8 |     "\n",
9 |     " * **[Day 1: The Basics of Rest APIs – What They Are and How to Design One](https://www.kaggle.com/rtatman/careercon-intro-to-apis).** By the end of this day you’ll have written the OpenAPI specification for your API. \n",
10 |     " * **[Day 2: How to Make an API for an Existing Python Machine Learning Project](https://www.kaggle.com/rtatman/careercon-making-an-app-from-your-modeling-code).** By the end of this day, you’ll have a Flask app that you can use to serve your model.\n",
11 |     " * **[Day 3: How to deploy your API on your choice of services – Heroku or Google Cloud](https://www.kaggle.com/rtatman/careercon-deploying-apis-on-heroku-appengine/).** By the end of this day, you’ll have deployed your model and will be able to actually use your API! (Note that, in order to use Google Cloud, you’ll need to create a billing account, which requires a credit card. If you don’t have access to a credit card, you can still use Heroku.)\n",
12 |     "\n",
13 |     "___"
14 |    ]
15 |   },
16 |   {
17 |    "cell_type": "markdown",
18 |    "metadata": {},
19 |    "source": [
20 |     "# Why would a data scientist want to build an API?\n",
21 |     "\n",
22 |     "If you’re like me, you probably didn’t build any APIs while you were learning data science. Instead, you might have focused on data cleaning, machine learning, statistics and visualization. But how do you get from “I’ve got a notebook with my modelling code” to “someone can use an app built around my project to draw little hearts around pictures of dogs”?\n",
23 |     "\n",
24 |     "Generally, **if you’ve built a model that does some task, you want people to be able to use it**. If you’re working on a team with software engineers, then they’ll probably take care of putting trained models into production. But even if that’s the case, they’ll probably appreciate it if you don’t just email them a notebook (unless your company is like Netflix and [does everything in notebooks](https://medium.com/netflix-techblog/notebook-innovation-591ee3221233)).\n",
25 |     "\n",
26 |     "> \"I think the truth is any [software engineer] worth their salt will refuse to put a notebook into production. 
They'll rewrite it from scratch if need be.\" - Jeremy Kun, Google Software Engineer\n",
27 |     "\n",
28 |     "Rewriting notebooks from scratch, probably in another programming language, isn’t a great use of anyone’s time. If you can give your software engineering colleagues an API instead, it will make their lives a lot easier, even if they do end up rewriting some of it or adding additional functionality. (For example, we’re not going to talk about authentication in these workshops, but you probably don’t want to launch a public-facing app without any authentication. 😬) \n",
29 |     "\n",
30 |     "Even if you never need to train models for production, understanding how APIs work will make it easier for you to use them. (I remember how confused I was the first time I used the Twitter API! I think it was the first time I’d ever had to deal with a JSON file.) Since a *lot* of data and services are only available via API these days, I personally think every professional data scientist should have a solid understanding of how APIs work. \n",
31 |     "\n",
32 |     "# What is an API?\n",
33 |     "\n",
34 |     "API is short for “application programming interface”. \n",
35 |     "\n",
36 |     "* **Application** refers to code that’s been written to perform a specific task. An API serves as a way to move information between applications. For example, you may be writing a data science application to get sentiment scores for tweets about your company. In order to do this, you’ll need to get information (tweets about your company) from a different application (Twitter). \n",
37 |     "* **Programming** means that you interact with the interface through code, rather than through a graphical interface. For example, if you use Twitter you can use the Twitter graphical interface to download your own tweets by clicking a button. If you want to download tweets by a number of different people, however, there’s no button to do that. Instead, you’ll need to write code that interacts with the Twitter API and specifies the type and number of tweets you want to download. \n",
38 |     "* **Interface** just means that an API serves as a go-between for two applications. **You can think of an API as a postal service.** It defines a set of rules for how to move things around, and it creates the specific addresses you can use to send data to or receive data from. (At least in the US, post offices have little boxes that have their own addresses that you can rent and send things to/from.)\n",
39 |     "\n",
40 |     "There are several different ways to design and build APIs, but the most commonly used is REST. REST is *also* an acronym; it stands for Representational State Transfer. There are a couple of important concepts that differentiate RESTful APIs. I’ll talk about a few of them here, but if you’re *really* excited about RESTful API design, you should check out [Roy Fielding’s dissertation, which outlines the REST design philosophy](https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm).\n",
41 |     "\n",
42 |     "## Client-server\n",
43 |     "\n",
44 |     "REST uses a client-server architecture. This means that information is stored in one place (the server) and that interacting with that data is done via the client. Generally, each server is a single centralized computing resource that serves many clients.\n",
45 |     "\n",
46 |     "You’re probably somewhat familiar with this system from using the internet. 
Your browser (like Chrome, Firefox or Edge) is a client that interacts with the servers of the websites you visit (like Kaggle, Twitter or YouTube). So when you interact with a website, you’re not actually creating a copy of the entire website and all of its data on your local computer. Instead, you’re sending a series of requests to the server that’s hosting the website, and you get back only the data you ask for each time.\n",
47 |     "\n",
48 |     "## Resources \n",
49 |     "\n",
50 |     "How does a client know which data to fetch, though? Or if a client sends data to the server, how is it stored and organized so that you can interact with it later?\n",
51 |     "\n",
52 |     "In REST, **each chunk of data or entity is called a resource**. A resource could be anything: a text file, geo-coordinates or a specific customer. (If you’re familiar with object oriented programming, resources are objects.) Resources can also have relationships with one another. For example, each customer might be associated with the geo-coordinates of the stores they've visited. \n",
53 |     "\n",
54 |     "> *But Rachael, what about the “representational” part of REST?* Each resource is stored with additional information about it, generally in a JSON file that also contains the resource itself. That file, with the resource and the additional information about it, is called a representation. To continue with our post office example, if a resource is a piece of mail, then the representation is the envelope around it with the address and other information. (There’s a [nice discussion of the distinction between resource & representation here](https://lists.w3.org/Archives/Public/public-html/2009Oct/0184.html) if you’re curious.) Assume from here on that when I say “resource” I mean “a resource and its representation”.\n",
55 |     "\n",
56 |     "## Methods\n",
57 |     "\n",
58 |     "So how do we interact with our resources? REST APIs use methods that allow you to create, interact with and delete resources (you can think of these as functions). For APIs that are served over the web, the most common methods are:\n",
59 |     "\n",
60 |     "* GET, which will return one or more resources without changing them\n",
61 |     "* POST, which creates new resources\n",
62 |     "* PUT, which updates existing resources\n",
63 |     "* DELETE, which removes existing resources\n",
64 |     "\n",
65 |     "In our example, we’re only going to be using POST. Because we’re not setting up a database for our application that will store data, we’ll need to create a new resource every time we send data to our API.\n",
66 |     "\n",
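    "To make these methods concrete, here’s a rough sketch of what calling each one could look like from Python with the `requests` library. The base URL and the customer resource here are made up for illustration:\n",
    "\n",
    "```\n",
    "import requests\n",
    "\n",
    "base = \"https://example.com/api\"  # hypothetical API\n",
    "\n",
    "requests.get(base + \"/customers/42\")                          # fetch a resource\n",
    "requests.post(base + \"/customers\", json={\"name\": \"Mo\"})       # create a new one\n",
    "requests.put(base + \"/customers/42\", json={\"name\": \"Moira\"})  # update it\n",
    "requests.delete(base + \"/customers/42\")                       # remove it\n",
    "```\n",
    "\n",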
\n", 74 | "\n", 75 | "* What do we want our API to do?\n", 76 | "* What are we passing in to do that thing?\n", 77 | "* What are we getting back when we do that thing?\n", 78 | "\n", 79 | "For this example, our answers might look something like this.\n", 80 | "\n", 81 | "* We want our API to tell us where specific keywords appear in our text. \n", 82 | "* We are passing in a JSON with our text file.\n", 83 | "* We’re getting a JSON with our keyword matches and where they occur in the input text.\n", 84 | "\n", 85 | "## Your turn!\n", 86 | "\n", 87 | "If you’re working on an API that does something different (like draws hearts around any dogs in a picture or classifies rows of a .csv file into “pass” and “fail”) take some time to answer these questions for yourself. \n", 88 | "\n", 89 | "It’s possible that you may want your API to do multiple things, in which case you’d want to answer all three questions for each thing you want it to do." 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "# Intro to OpenAPI\n", 97 | "\n", 98 | "Fortunately, we don’t have to rely on just asking ourselves questions to build a specification for our API. There are also some standards we can use to guide our API design. I’ve chosen to use [OpenAPI (formerly known as Swagger)](https://swagger.io/docs/specification/about/) because it’s open source and there’s a nice tooling ecosystem around it. (There are even tools that will build a Flask app for you from your specification!) If your company uses a different system, however, you’ll want to use theirs. \n", 99 | "\n", 100 | "Here’s the full specification for our little sample API: \n", 101 | "\n", 102 | "```\n", 103 | "openapi: \"3.0.0\"\n", 104 | "\n", 105 | "info:\n", 106 | " title: \"Package name extractor\"\n", 107 | " description: \"API that accepts a text string and returns packages names in it.\"\n", 108 | " version: \"1.0\"\n", 109 | "\n", 110 | "paths:\n", 111 | " /extractpackages:\n", 112 | " post:\n", 113 | " description: \"Extract package names\"\n", 114 | " \n", 115 | " requestBody:\n", 116 | " description: \"Json with single field containing text to extract entities from\"\n", 117 | " required: true\n", 118 | " content:\n", 119 | " application/json: {}\n", 120 | " \n", 121 | " responses:\n", 122 | " '200':\n", 123 | " description: \"Returns names & indexs of packages in the provided text\"\n", 124 | " content: \n", 125 | " application/json: {}\n", 126 | "\n", 127 | "```\n", 128 | "\n", 129 | "There’s kind of a lot going on there, so let’s break it down piece by piece.\n", 130 | "\n", 131 | "First, we’re specifying what version of OpenAPI we’re using. I’ve picked 3.0.0 because it’s the most recent major version. You’d probably only want to pick a different version if you were working on a project with an existing specification that was written in an earlier version. \n", 132 | "\n", 133 | "```\n", 134 | "openapi: \"3.0.0\"\n", 135 | "```\n", 136 | "\n", 137 | "Next we’ve got information on our specific API .I’ve given it a name, a short description and a version number. Since this is the first version of my app, I’m calling it 1.0. If I make major changes to my API in the future, I’ll want to update the specification and create a new version. 
\n",
138 |     "\n",
139 |     "```\n",
140 |     "info:\n",
141 |     "  title: \"Package name extractor\"\n",
142 |     "  description: \"API that accepts a text string and returns package names in it.\"\n",
143 |     "  version: \"1.0\"\n",
144 |     "```\n",
145 |     "\n",
146 |     "Finally, we have the meat of the specification: our methods. Here we have only one method. It’s at the URL [whatever my app’s URL is]/extractpackages. \n",
147 |     "\n",
148 |     "It’s a POST method that extracts package names. We pass in a JSON file and, if everything goes well, get back a JSON file. This is the same information that we provided in the answers to the questions in the previous section, just in a machine-readable format.\n",
149 |     "\n",
150 |     "> The information that the client passes to the server is called a “request” and the information that the server passes back to the client is called a “response”. \n",
151 |     "\n",
152 |     "The “200” is an HTTP response status code. \"200\" in particular means that the request was accepted and everything’s ok. If we get any other response status code, then our API will return nothing. (The server will probably still send the usual error code it generates for a specific error, though, like \"404\" if the server can't find what the client requests.)\n",
153 |     "\n",
154 |     "```\n",
155 |     "paths:\n",
156 |     "  /extractpackages:\n",
157 |     "    post:\n",
158 |     "      description: \"Extract package names\"\n",
159 |     "      \n",
160 |     "      requestBody:\n",
161 |     "        description: \"JSON with a single field containing the text to extract entities from\"\n",
162 |     "        required: true\n",
163 |     "        content:\n",
164 |     "          application/json: {}\n",
165 |     "      \n",
166 |     "      responses:\n",
167 |     "        '200':\n",
168 |     "          description: \"Returns names & indexes of packages in the provided text\"\n",
169 |     "          content: \n",
170 |     "            application/json: {}\n",
171 |     "```\n",
172 |     "\n",
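    "Because the specification is just YAML, you can also sanity-check it programmatically. Here’s a quick sketch using the PyYAML package (an assumption on my part; you may need to install it first), assuming you’ve saved the spec as openapi.yaml:\n",
    "\n",
    "```\n",
    "import yaml  # PyYAML\n",
    "\n",
    "with open(\"openapi.yaml\") as f:\n",
    "    spec = yaml.safe_load(f)\n",
    "\n",
    "print(spec[\"openapi\"])      # \"3.0.0\"\n",
    "print(list(spec[\"paths\"]))  # ['/extractpackages']\n",
    "```\n",
    "\n",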
173 |     "## Your turn!\n",
174 |     "\n",
175 |     "Now it’s time for you to write your own specification. You might find it easier to use the [Swagger editor](http://editor.swagger.io/), which has lots of nice features for designing OpenAPI specifications.\n",
176 |     "\n",
177 |     "I'd recommend copying the example I gave you above and editing it so that it describes the API you want to build. If you'd like feedback from other people, feel free to share a link in the comments! (You can download your specification as a YAML file from the Swagger editor and then upload it as a dataset or to a GitHub repo.) \n",
178 |     "\n",
179 |     "Tomorrow we'll use our specifications to create our API by writing a Flask app. "
180 |    ]
181 |   }
182 |  ],
183 |  "metadata": {
184 |   "kernelspec": {
185 |    "display_name": "Python 3",
186 |    "language": "python",
187 |    "name": "python3"
188 |   },
189 |   "language_info": {
190 |    "codemirror_mode": {
191 |     "name": "ipython",
192 |     "version": 3
193 |    },
194 |    "file_extension": ".py",
195 |    "mimetype": "text/x-python",
196 |    "name": "python",
197 |    "nbconvert_exporter": "python",
198 |    "pygments_lexer": "ipython3",
199 |    "version": "3.6.4"
200 |   }
201 |  },
202 |  "nbformat": 4,
203 |  "nbformat_minor": 1
204 | }
205 | 
--------------------------------------------------------------------------------
/Day 3 How to deploy your API on your choice of services – Heroku or Google Cloud.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "This is day three of a three day event, held during [Kaggle CareerCon 2019](https://www.kaggle.com/careercon2019). Each day we’ll learn about a new part of developing an API and put it into practice. By day 3, you’ll have written and deployed an API of your very own!\n",
8 |     "\n",
9 |     " * **[Day 1: The Basics of Rest APIs – What They Are and How to Design One](https://www.kaggle.com/rtatman/careercon-intro-to-apis).** By the end of this day you’ll have written the OpenAPI specification for your API. \n",
10 |     " * **[Day 2: How to Make an API for an Existing Python Machine Learning Project](https://www.kaggle.com/rtatman/careercon-making-an-app-from-your-modeling-code).** By the end of this day, you’ll have a Flask app that you can use to serve your model.\n",
11 |     " * **[Day 3: How to deploy your API on your choice of services – Heroku or Google Cloud](https://www.kaggle.com/rtatman/careercon-deploying-apis-on-heroku-appengine/).** By the end of this day, you’ll have deployed your model and will be able to actually use your API! (Note that, in order to use Google Cloud, you’ll need to create a billing account, which requires a credit card. If you don’t have access to a credit card, you can still use Heroku.)\n",
12 |     "\n",
13 |     "___\n"
14 |    ]
15 |   },
16 |   {
17 |    "cell_type": "markdown",
18 |    "metadata": {},
19 |    "source": [
20 |     "Today I'm going to walk you through how to deploy your API using either **Heroku** or **AppEngine**! They're fairly similar services, but I wanted to give you a chance to use both and see which you prefer. By using two services I can also show you how different platforms expect a different file structure for your apps.\n",
21 |     "\n",
22 |     "* **Heroku** is a platform as a service that's designed specifically for serving applications. You don't need to have a credit card to create an API on Heroku, but you'll be limited by what's offered through the free tier.\n",
23 |     "* **AppEngine** is part of Google Cloud and is also a way to serve apps. You do need a credit card to create an app on AppEngine, but once it's set up you can [go to your AppEngine settings](https://console.cloud.google.com/appengine/settings) and set the daily spending limit to 0. This will keep you from being charged.\n",
24 |     "\n",
25 |     "I'll start with Heroku, then talk about AppEngine and finally show you how to query your API using Python. :)\n",
26 |     "\n",
27 |     "# Heroku\n",
28 |     "\n",
29 |     "## Edit files on GitHub\n",
30 |     "\n",
31 |     "For your convenience, I've created an extremely simple sample app on GitHub: https://github.com/rctatman/minimal_flask_example_heroku. You'll want to head over to GitHub, fork this repo and then edit the relevant files for your specific app. \n",
32 |     "\n",
33 |     "Here's a quick guide to each of the files, along with information that tells you which ones to edit:\n",
34 |     "\n",
35 |     "### Files you'll put the code you wrote yesterday into \n",
36 |     "\n",
37 |     "[In yesterday's notebook](https://www.kaggle.com/rtatman/careercon-making-an-app-from-your-modeling-code), as the very last exercise, we wrote two cells of code, each of which will become a single file. \n",
38 |     "\n",
39 |     "* **serve.py**: This is the file you should put the code from the first cell in; this is where you'll define the functions that will read in your trained models.\n",
40 |     "* **script.py**: This is where you'll put the code from the second cell. This is what will define the behavior of our different endpoints. \n",
41 |     "\n",
42 |     "### Files you'll need to add\n",
43 |     "\n",
44 |     "If you are going to use a pre-trained model, be sure to add it to your repo so that you can read it in. 
If you like, you can store your models in a new folder, but if you do, be sure to update the file path in the code that reads them in. So if you have a model called \"my_model.pkl\" in a folder called \"models\", you'll need to update the code that reads it in from this:\n",
45 |     "\n",
46 |     "\n",
47 |     "```\n",
48 |     "pickle.load(open(\"my_model.pkl\", \"rb\"))\n",
49 |     "```\n",
50 |     "\n",
51 |     "to this:\n",
52 |     "\n",
53 |     "```\n",
54 |     "pickle.load(open(\"models/my_model.pkl\", \"rb\"))\n",
55 |     "```\n",
56 |     "\n",
57 |     "### Files you'll need to edit\n",
58 |     "\n",
59 |     "* **README**: This is the repo's readme. You'll probably want to update it to have information about your specific API and how to use it.\n",
60 |     "* **openapi.yaml**: You can replace this file with the specification file that we wrote on day one. ([The notebook's here if you need a refresher](https://www.kaggle.com/rtatman/careercon-intro-to-apis).)\n",
61 |     "* **requirements.txt**: This file has information on what packages you use in your app. *You need to make sure that you list every package you import and also gunicorn*. If you remove the line with gunicorn or forget to include a package you import somewhere else, you'll get an error when you try to run your app. \n",
62 |     "* **runtime.txt**: This file tells Heroku which version of Python to use to run your app. You'll only need to update this file if you pickled your model file using a different version of Python & that's causing your code to break. \n",
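    "\n",
    "For example, a requirements.txt for the sample keyword-extractor app might look something like this (the exact list depends on what you import, but gunicorn always needs to be in there):\n",
    "\n",
    "```\n",
    "flask\n",
    "gunicorn\n",
    "flashtext\n",
    "```\n",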
63 |     "\n",
64 |     "\n",
65 |     "### Files you don't need to edit\n",
66 |     "\n",
67 |     "* **LICENSE**: This file is the license your code is released under. If you don't include a license, other folks won't be able to reuse your code. If you fork this repository for your own work, you'll need to keep the license. I've used Apache 2.0 here because that's the same license as public Kaggle Kernels. \n",
68 |     "* **Procfile**: This file is required by Heroku. It tells Heroku how to run your application. You probably don't need to change this file. \n",
69 |     "\n",
70 |     "\n",
71 |     "## Deploy to Heroku\n",
72 |     "\n",
73 |     "Once you've edited your files, you're ready to deploy to [Heroku](https://www.heroku.com). \n",
74 |     "\n",
75 |     "* Create a new account or sign into your existing account\n",
76 |     "* Go to https://dashboard.heroku.com/apps\n",
77 |     "* Create your app\n",
78 |     "    * Click on “create new app”\n",
79 |     "    * Give it a name & choose your region\n",
80 |     "    * Hit “create app”\n",
81 |     "* Connect to your GitHub repo \n",
82 |     "    * Click on the Deploy tab\n",
83 |     "    * Click on Connect GitHub & search for the repo you want to add (make sure you've forked the repo; you'll only be able to connect to a GitHub repo you own)\n",
84 |     "* Deploy your app\n",
85 |     "    * Next to “Manual deploy” hit “Deploy Branch”\n",
86 |     "    * If you hit “open app”, you should open a new browser page that points to the URL your app is served from. (Unless you put something at the endpoint \"/\", it will probably just be a 404 page.)\n",
87 |     " \n",
88 |     "And that's it! Your app is live. :)\n",
89 |     "\n",
90 |     "> **What if you run into trouble?** If your app isn't working, click on the [MORE] button in the upper right hand corner, then on \"View logs\". This will show a detailed log of whatever went wrong."
91 |    ]
92 |   },
93 |   {
94 |    "cell_type": "markdown",
95 |    "metadata": {},
96 |    "source": [
97 |     "# AppEngine\n",
98 |     "\n",
99 |     "\n",
100 |     "For your convenience, I've created an extremely simple sample app on GitHub: https://github.com/rctatman/minimal_flask_example_appengine. You'll want to head over to GitHub, fork this repo and then edit the relevant files for your specific app. \n",
101 |     "\n",
102 |     "Here's a quick guide to each of the files, along with information that tells you which ones to edit. Note that this is *different* from the files for Heroku; the two services expect different file configurations.\n",
103 |     "\n",
104 |     "## Edit files on GitHub\n",
105 |     "\n",
106 |     "### Files you'll put the code you wrote yesterday into \n",
107 |     "\n",
108 |     "[In yesterday's notebook](https://www.kaggle.com/rtatman/careercon-making-an-app-from-your-modeling-code), as the very last exercise, we wrote two cells of code, each of which will become a single file. \n",
109 |     "\n",
110 |     "* **serve.py**: This is the file you should put the code from the first cell in; this is where you'll define the functions that will read in your trained models.\n",
111 |     "* **main.py**: This is where you'll put the code from the second cell. This is what will define the behavior of our different endpoints. \n",
112 |     "\n",
113 |     "### Files you'll need to add\n",
114 |     "\n",
115 |     "If you are going to use a pre-trained model, be sure to add it to your repo so that you can read it in. If you like, you can store your models in a new folder, but if you do, be sure to update the file path in the code that reads them in. So if you have a model called \"my_model.pkl\" in a folder called \"models\", you'll need to update the code that reads it in from this:\n",
116 |     "\n",
117 |     "\n",
118 |     "```\n",
119 |     "pickle.load(open(\"my_model.pkl\", \"rb\"))\n",
120 |     "```\n",
121 |     "\n",
122 |     "to this:\n",
123 |     "\n",
124 |     "```\n",
125 |     "pickle.load(open(\"models/my_model.pkl\", \"rb\"))\n",
126 |     "```\n",
127 |     "\n",
128 |     "### Files you'll need to edit\n",
129 |     "\n",
130 |     "* **README**: This is the repo's readme. You'll probably want to update it to have information about your specific API and how to use it.\n",
131 |     "* **openapi.yaml**: You can replace this file with the specification file that we wrote on day one. ([The notebook's here if you need a refresher](https://www.kaggle.com/rtatman/careercon-intro-to-apis).)\n",
132 |     "* **requirements.txt**: This file has information on what packages you use in your app. It's currently empty because I didn't import any packages, but you'll need to include all the packages you use, one per line, as shown below. If you forget to include a package you import somewhere else, you'll get an error when you try to run your app.\n",
133 |     "\n",
134 |     "```\n",
135 |     "numpy\n",
136 |     "pandas\n",
137 |     "future\n",
138 |     "```\n",
139 |     "\n",
140 |     "### Files you don't need to edit\n",
141 |     "\n",
142 |     "* **LICENSE**: This file is the license your code is released under. If you don't include a license, other folks won't be able to reuse your code. If you fork this repository for your own work, you'll need to keep the license. I've used Apache 2.0 here because that's the same license as public Kaggle Kernels. \n",
143 |     "* **app.yaml**: This file tells AppEngine which version of Python to use to run your app. You don't need to edit this.\n",
144 |     "* **index.yaml**: This file is required by AppEngine and tells it how to index the data you send to Datastore. Since we're not using Datastore, we can just ignore this file.\n",
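    "\n",
    "For reference, the app.yaml in the sample repo is tiny; it's essentially just a runtime declaration, something like this (check the repo for the exact contents):\n",
    "\n",
    "```\n",
    "runtime: python37\n",
    "```\n",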
145 |     "\n",
146 |     "## Deploy to AppEngine\n",
147 |     "\n",
148 |     "Now you're ready to deploy your app! We're going to be interacting with AppEngine via Cloud Shell. You can use the GUI as well, but I personally like Cloud Shell. :)\n",
149 |     "\n",
150 |     "> *Don't forget to sign into your GCP account or create one if you don't have one! You'll also want to set up a billing account you can connect your project to in order to build your app.*\n",
151 |     "\n",
152 |     "I know this looks like a lot of steps, but I've tried to be very clear so you know what to do at each step. \n",
153 |     "\n",
154 |     "* Copy your repo into your Cloud Shell VM\n",
155 |     "    * Either edit and use the button in the GitHub README OR \n",
156 |     "    * Go to the Cloud Shell (https://console.cloud.google.com/cloudshell/editor) and clone it yourself: `git clone [GITHUB-URL]`\n",
157 |     "    * Move into the repo by running `cd [NAME-OF-REPO]` in the black console at the bottom of the screen, which is where you'll run all the commands from here on out. (You'll need to replace [NAME-OF-REPO] with your actual repo.)\n",
158 |     "* Launch your app locally (helpful for testing)\n",
159 |     "    * Use this command to test deploy your app: \n",
160 |     "    `dev_appserver.py ./app.yaml`\n",
161 |     "    * Once you see output like \"Booting worker with pid\" in the command line, you can see your app by hitting the button that looks like <> in a browser window at the top right hand side of your screen. This will open a new tab running your app. If you haven't put anything at the \"/\" endpoint, this will just be a 404. \n",
162 |     "    * Use `CTRL + C` to close your app\n",
163 |     "* Create a project & enable billing\n",
164 |     "    * Run these commands, replacing [YOUR-PROJECT-ID] with your actual project ID. \n",
165 |     "        * `gcloud projects create [YOUR-PROJECT-ID]`\n",
166 |     "        * `gcloud config set project [YOUR-PROJECT-ID]`\n",
167 |     "    * You'll see your project ID in yellow\n",
168 |     "    * Enable Cloud Build by going to this URL & clicking \"enable\", then following the prompts: https://console.developers.google.com/apis/library/cloudbuild.googleapis.com. \n",
169 |     "* Launch the app!\n",
170 |     "    * Deploy your app by running this command:\n",
171 |     "        * `gcloud app deploy ./index.yaml ./app.yaml`\n",
172 |     "    * Pick a region (I'd recommend one close to you to reduce latency)\n",
173 |     "    * Enter \"y\" when asked whether you want to continue\n",
174 |     "    * After it's finished deploying, your app will be at the URL: https://[YOUR-PROJECT-ID].appspot.com/\n",
175 |     "* You can query your app directly from Cloud Shell! 
:)\n",
176 |     "    * Run these commands to query your app, replacing the [text in brackets] as applicable for your project.\n",
177 |     "        * `python`\n",
178 |     "        * `import requests`\n",
179 |     "        * `requests.[METHOD]('https://[YOUR-PROJECT-ID].appspot.com/[YOUR-ENDPOINT-NAME]', json=[JSON-TO-SEND]).json()`"
180 |    ]
181 |   },
182 |   {
183 |    "cell_type": "markdown",
184 |    "metadata": {},
185 |    "source": [
186 |     "# Querying APIs using requests\n",
187 |     "\n",
188 |     "Ok, now that you've got your app up and running, how do you actually query it? Probably the simplest way to do this from Python is using the requests library. The anatomy of a request looks like this:\n",
189 |     "\n",
190 |     "> requests.[method]([url/endpoint], expected_input_type=some_python_object)\n",
191 |     "\n",
192 |     "This will send a request to your API and, hopefully, return a response. If it works, you'll probably just see the response code output: \n",
193 |     "\n",
194 |     "`<Response [200]>`\n",
195 |     "\n",
196 |     "In order to get the data that was returned, you can append `.json()` to parse any JSON that was returned or `.text` just to see the raw strings.\n",
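    "\n",
    "If something went wrong, you'll get a non-200 status code back instead, so it's worth checking the code before you try to parse the body. Here's a minimal sketch (the URL is a hypothetical stand-in for wherever your app is served from):\n",
    "\n",
    "```\n",
    "import requests\n",
    "\n",
    "# hypothetical app URL; swap in your own\n",
    "r = requests.post(\"https://your-app.herokuapp.com/extractpackages\",\n",
    "                  json=\"I love pandas\")\n",
    "\n",
    "if r.status_code == 200:   # everything went fine\n",
    "    print(r.json())\n",
    "else:                      # e.g. 404 or 500\n",
    "    print(\"Request failed with status\", r.status_code)\n",
    "```\n",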
197 |     "\n",
198 |     "Here are some example queries for the sample app I've talked about in my notebooks so far. (Note that I'm only using the free tier of each service to serve these, so if people make a lot of requests and I hit my quota, they'll stop working.)"
199 |    ]
200 |   },
201 |   {
202 |    "cell_type": "code",
203 |    "execution_count": 2,
204 |    "metadata": {},
205 |    "outputs": [
206 |     {
207 |      "data": {
208 |       "text/plain": [
209 |        "[['pandas', 0, 6], ['numpy', 44, 49], ['future', 61, 67]]"
210 |       ]
211 |      },
212 |      "execution_count": 2,
213 |      "metadata": {},
214 |      "output_type": "execute_result"
215 |     }
216 |    ],
217 |    "source": [
218 |     "import requests\n",
219 |     "\n",
220 |     "input_text = \"Pandas is my favorite library. I don't like numpy as much as future.\"\n",
221 |     "\n",
222 |     "# this queries an app I already have running on Heroku\n",
223 |     "# code: https://github.com/rctatman/flask_example_heroku\n",
224 |     "requests.post('http://kaggle-test.herokuapp.com/extractpackages', json=input_text).json()"
225 |    ]
226 |   },
227 |   {
228 |    "cell_type": "markdown",
229 |    "metadata": {},
230 |    "source": [
231 |     "For the AppEngine version, I called my endpoint \"api\" instead of \"extractpackages\" and I'm honestly just too lazy to change it at this point. Otherwise it looks pretty much the same, but with a different URL."
232 |    ]
233 |   },
234 |   {
235 |    "cell_type": "code",
236 |    "execution_count": 3,
237 |    "metadata": {},
238 |    "outputs": [
239 |     {
240 |      "data": {
241 |       "text/plain": [
242 |        "[['pandas', 0, 6], ['numpy', 44, 49], ['future', 61, 67]]"
243 |       ]
244 |      },
245 |      "execution_count": 3,
246 |      "metadata": {},
247 |      "output_type": "execute_result"
248 |     }
249 |    ],
250 |    "source": [
251 |     "# and this queries a pretty much identical app running on AppEngine\n",
252 |     "# code: https://github.com/rctatman/flask_example_appengine\n",
253 |     "requests.post('https://api-test-project-236423.appspot.com/api', json=input_text).json()"
254 |    ]
255 |   },
256 |   {
257 |    "cell_type": "markdown",
258 |    "metadata": {},
259 |    "source": [
260 |     "And that's it! Over the last three days we've:\n",
261 |     "\n",
262 |     "* Designed an API and written a specification\n",
263 |     "* Prepared our code & models to be put in a Flask app\n",
264 |     "* Written the app itself\n",
265 |     "* Deployed and used our very own APIs\n",
266 |     "\n",
267 |     "I hope you found these notebooks helpful and learned something new about APIs. I'd love to hear about what you all built in the comments!"
268 |    ]
269 |   }
270 |  ],
271 |  "metadata": {
272 |   "kernelspec": {
273 |    "display_name": "Python 3",
274 |    "language": "python",
275 |    "name": "python3"
276 |   },
277 |   "language_info": {
278 |    "codemirror_mode": {
279 |     "name": "ipython",
280 |     "version": 3
281 |    },
282 |    "file_extension": ".py",
283 |    "mimetype": "text/x-python",
284 |    "name": "python",
285 |    "nbconvert_exporter": "python",
286 |    "pygments_lexer": "ipython3",
287 |    "version": "3.6.4"
288 |   }
289 |  },
290 |  "nbformat": 4,
291 |  "nbformat_minor": 1
292 | }
293 | 
--------------------------------------------------------------------------------
/Day 2 How to Make an API for an Existing Python Machine Learning Project.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "This is day two of a three day event, held during [Kaggle CareerCon 2019](https://www.kaggle.com/careercon2019). Each day we’ll learn about a new part of developing an API and put it into practice. By day 3, you’ll have written and deployed an API of your very own!\n",
8 |     "\n",
9 |     " * **[Day 1: The Basics of Rest APIs – What They Are and How to Design One](https://www.kaggle.com/rtatman/careercon-intro-to-apis).** By the end of this day you’ll have written the OpenAPI specification for your API. \n",
10 |     " * **[Day 2: How to Make an API for an Existing Python Machine Learning Project](https://www.kaggle.com/rtatman/careercon-making-an-app-from-your-modeling-code).** By the end of this day, you’ll have a Flask app that you can use to serve your model.\n",
11 |     " * **[Day 3: How to deploy your API on your choice of services – Heroku or Google Cloud](https://www.kaggle.com/rtatman/careercon-deploying-apis-on-heroku-appengine/).** By the end of this day, you’ll have deployed your model and will be able to actually use your API! (Note that, in order to use Google Cloud, you’ll need to create a billing account, which requires a credit card. If you don’t have access to a credit card, you can still use Heroku.)\n",
12 |     "\n",
13 |     "___\n"
14 |    ]
15 |   },
16 |   {
17 |    "cell_type": "markdown",
18 |    "metadata": {},
19 |    "source": [
20 |     "Alright, now that we know how other people will interact with our code using an API, we’re ready to get coding! 
Today we’re going to do two things:\n",
21 |     "\n",
22 |     "1. Get our code ready to be served through an app\n",
23 |     "2. Write a Flask app (well, most of one)\n",
24 |     "\n",
25 |     "Let’s get started!\n",
26 |     "\n",
27 |     "# Get your modelling code ready to be put in an app\n",
28 |     "\n",
29 |     "Since you spent some time thinking about your app yesterday, you hopefully have a pretty good idea what you want your Python code to do. Today we’re going to follow a three-step model of development. \n",
30 |     "\n",
31 |     "* **Make it work**. The first thing you want to do is get your code doing whatever it is that you intend for it to do. It doesn’t have to be beautiful or perfectly optimized (you can always go back and refactor it later), it just needs to work.\n",
32 |     "* **Make it pretty**. At this stage, you can spend some time tidying up your code. Adding comments, creating functions for particular tasks, making sure your variable names are informative; things that will make it easier for other people to read and use your code.\n",
33 |     "* **Make it portable**. Finally, you want to make your code portable. This includes saving out your model so that you can load it into a new session and use it to make predictions.\n",
34 |     "\n",
35 |     "Let’s see an example of what this process looks like. \n",
36 |     "\n",
37 |     "## Make it work\n",
38 |     "\n",
39 |     "First, I wrote a quick script that gets all instances of Python package names & their indexes from some sample text. The general workflow is: \n",
40 |     "\n",
41 |     "1. Get a list of Python packages (from [this list](https://hugovk.github.io/top-pypi-packages/), helpfully maintained by [hugovk on GitHub](https://github.com/hugovk)). \n",
42 |     "2. Use that list to create a flashtext KeywordProcessor object. This object will let us use flashtext to find our terms. **Flashtext can be more than 20 times faster than using regular expressions to extract a list of keywords** and you also don't have to write any Perl. You can find more information about the package [here](https://flashtext.readthedocs.io/en/latest/).\n",
43 |     "3. Remove common English words and words that are used often on the Kaggle forums, since these are unlikely to be referring to a specific Python package. ([Even though there is a Python package called \"the\"](https://pypi.org/project/the/) and people write \"the\" very often in the forums, they're never actually talking about the package called \"the\".)\n",
44 |     "4. Put our KeywordProcessor into action and see if it works!"
45 |    ]
46 |   },
47 |   {
48 |    "cell_type": "code",
49 |    "execution_count": 1,
50 |    "metadata": {
51 |     "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19",
52 |     "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5"
53 |    },
54 |    "outputs": [
55 |     {
56 |      "name": "stdout",
57 |      "output_type": "stream",
58 |      "text": [
59 |       "[('dataset', 34, 41), ('dataset', 123, 130)]\n",
60 |       "[('dataset', 4, 11), ('public', 119, 125), ('dataset', 149, 156), ('dataset', 201, 208), ('dataset', 374, 381), ('dataset', 433, 440), ('dataset', 656, 663), ('common', 844, 850), ('events', 1034, 1040)]\n",
61 |       "[('html', 318, 322), ('html', 370, 374), ('html', 418, 422), ('html', 460, 464)]\n"
62 |      ]
63 |     }
64 |    ],
65 |    "source": [
66 |     "import numpy as np\n",
67 |     "import pandas as pd\n",
68 |     "import requests\n",
69 |     "from flashtext.keyword import KeywordProcessor\n",
70 |     "from nltk.corpus import stopwords\n",
71 |     "\n",
72 |     "# let's read in a couple of forum posts\n",
73 |     "forum_posts = pd.read_csv(\"../input/ForumMessages.csv\")\n",
74 |     "\n",
75 |     "# get a smaller sub-set for playing around with\n",
76 |     "sample_posts = forum_posts.Message[0:3]\n",
77 |     "\n",
78 |     "# get data from list of top 5000 pypi packages (last 30 days)\n",
79 |     "url = 'https://hugovk.github.io/top-pypi-packages/top-pypi-packages-30-days.json'\n",
80 |     "data = requests.get(url).json()\n",
81 |     "\n",
82 |     "# get just the list of package names\n",
83 |     "list_of_packages = [data_item['project'] for data_item in data['rows']]\n",
84 |     "\n",
85 |     "# create a KeywordProcessor and add our package names to it\n",
86 |     "keyword_processor = KeywordProcessor()\n",
87 |     "keyword_processor.add_keywords_from_list(list_of_packages)\n",
88 |     "\n",
89 |     "# remove English stopwords\n",
90 |     "keyword_processor.remove_keywords_from_list(stopwords.words('english'))\n",
91 |     "\n",
92 |     "# remove custom stopwords\n",
93 |     "keyword_processor.remove_keywords_from_list(['http','kaggle'])\n",
94 |     "\n",
95 |     "# test our keyword processor\n",
96 |     "for post in sample_posts:\n",
97 |     "    keywords_found = keyword_processor.extract_keywords(post, span_info=True)\n",
98 |     "    print(keywords_found)"
99 |    ]
100 |   },
101 |   {
102 |    "cell_type": "markdown",
103 |    "metadata": {},
104 |    "source": [
105 |     "Yay, it works! The code isn't the prettiest, though. Even though I've included comments, I've just done everything in a single long script. Breaking my code up into functions will make it more modular and easier to read.\n",
106 |     "\n",
107 |     "## Make it pretty\n",
108 |     "\n",
109 |     "My main goal in gussying up my code is to make it easier to add to my app later. So, I want to create functions that will let me do this. I broke my code into two functions: \n",
110 |     "\n",
111 |     "* One function does what I want my app to do: it takes in a pre-trained model (here our keyword processor) and then applies it\n",
112 |     "* The other creates a keyword model, automating the preprocessing we did. 
That way, if we want to create a new keyword processor with different words in the future, we can easily do that.\n",
113 |     "\n",
114 |     "With a little refactoring, my code now looks like this:"
115 |    ]
116 |   },
117 |   {
118 |    "cell_type": "code",
119 |    "execution_count": 2,
120 |    "metadata": {
121 |     "_cell_guid": "79c7e3d0-c299-4dcb-8224-4455121ee9b0",
122 |     "_uuid": "d629ff2d2480ee46fbb7e2d37f6b5fab8052498a"
123 |    },
124 |    "outputs": [
125 |     {
126 |      "name": "stdout",
127 |      "output_type": "stream",
128 |      "text": [
129 |       "['dataset', 'dataset']\n",
130 |       "['dataset', 'public', 'dataset', 'dataset', 'dataset', 'dataset', 'dataset', 'common', 'events']\n",
131 |       "['html', 'html', 'html', 'html']\n"
132 |      ]
133 |     }
134 |    ],
135 |    "source": [
136 |     "# I'm not going to read in the packages & data again since it's \n",
137 |     "# already in our current environment.\n",
138 |     "\n",
139 |     "def create_keywordProcessor(list_of_terms, remove_stopwords=True, \n",
140 |     "                            custom_stopword_list=[\"\"]):\n",
141 |     "    \"\"\" Creates a new flashtext KeywordProcessor and optionally \n",
142 |     "    does some lightweight text cleaning to remove stopwords, including\n",
143 |     "    any provided by the user.\n",
144 |     "    \"\"\"\n",
145 |     "    # create a KeywordProcessor and add our terms to it\n",
146 |     "    keyword_processor = KeywordProcessor()\n",
147 |     "    keyword_processor.add_keywords_from_list(list_of_terms)\n",
148 |     "\n",
149 |     "    # remove English stopwords if requested\n",
150 |     "    if remove_stopwords == True:\n",
151 |     "        keyword_processor.remove_keywords_from_list(stopwords.words('english'))\n",
152 |     "\n",
153 |     "    # remove custom stopwords\n",
154 |     "    keyword_processor.remove_keywords_from_list(custom_stopword_list)\n",
155 |     "    \n",
156 |     "    return(keyword_processor)\n",
157 |     "\n",
158 |     "def apply_keywordProcessor(keywordProcessor, text, span_info=True):\n",
159 |     "    \"\"\" Applies an existing keywordProcessor to a given piece of text. \n",
160 |     "    Will return spans by default. \n",
161 |     "    \"\"\"\n",
162 |     "    keywords_found = keywordProcessor.extract_keywords(text, span_info=span_info)\n",
163 |     "    return(keywords_found)\n",
164 |     "    \n",
165 |     "\n",
166 |     "# create a keywordProcessor of python packages \n",
167 |     "py_package_keywordProcessor = create_keywordProcessor(list_of_packages, \n",
168 |     "                                                      custom_stopword_list=[\"kaggle\", \"http\"])\n",
169 |     "\n",
170 |     "# apply it to some sample posts (with the apply_keywordProcessor function, omitting\n",
171 |     "# span information)\n",
172 |     "for post in sample_posts:\n",
173 |     "    text = apply_keywordProcessor(py_package_keywordProcessor, post, span_info=False)\n",
174 |     "    print(text)"
175 |    ]
176 |   },
177 |   {
178 |    "cell_type": "markdown",
179 |    "metadata": {},
180 |    "source": [
181 |     "## Make it portable\n",
182 |     "\n",
183 |     "So far, we've gotten our code working and have refactored it so that it's more modular and easier for other people to follow. But how can we take a model we've trained in one place and apply it in another? By saving the model and then loading it into a new environment. \n",
184 |     "\n",
185 |     "> In general, ML APIs are used for inference. In other words, you have a pre-trained model that is applied to the data that is sent to it. You could create an API that trains models, but you'll probably end up having to pay for the compute used to train those models. This can become expensive. 
Particularly since I'm not covering how to do authentication, which means anyone would be able to train as many models as they wanted with your API, **I'd recommend avoiding including model training in your API.** I'm assuming going forward that your API will use a pretrained model.\n",
186 |     "\n",
187 |     "So how do you save a model and then read it back into Python? It depends on the specific type of model you're training. \n",
188 |     "\n",
189 |     "### Library-specific methods for saving models\n",
190 |     "\n",
191 |     "If the library you used to train your model has a specific set of methods to save and load model files, then I'd recommend using them. Here are the methods for some popular machine learning libraries.\n",
192 |     "\n",
193 |     "* For *TensorFlow* and *Keras*, you can save models using `model.save_weights()` and read them in using `model.load_weights()`. By default, your model will be saved as an HDF5 file. You can find [more information on TensorFlow here](https://www.tensorflow.org/tutorials/keras/save_and_restore_models#manually_save_weights) and [more on Keras here](https://keras.io/).\n",
194 |     "* For *PyTorch* you can use `torch.save()` and `torch.load()`. PyTorch saves models as pickles. [More info here](https://pytorch.org/tutorials/beginner/saving_loading_models.html). \n",
195 |     "* For *XGBoost*, you can save models with `model.save_model()` and load them with `model.load_model()`. [More info here](https://xgboost.readthedocs.io/en/latest/python/python_intro.html#training).\n",
196 |     "* For *LightGBM*, you can use `model_to_string()` and `model_from_string()`. [More info here](https://lightgbm.readthedocs.io/en/latest/Python-API.html#lightgbm.Booster.model_from_string). \n",
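    "\n",
    "For instance, the save/load cycle from the first bullet might look something like this sketch, assuming `model` is an already-built Keras model and that you rebuild the same architecture before loading:\n",
    "\n",
    "```\n",
    "# save the trained weights to disk\n",
    "model.save_weights(\"my_model_weights.h5\")\n",
    "\n",
    "# ...later, in a fresh session: rebuild the same architecture,\n",
    "# then load the saved weights back into it\n",
    "model.load_weights(\"my_model_weights.h5\")\n",
    "```\n",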
197 |     "\n",
198 |     "### If there's no library-specific technique\n",
199 |     "\n",
200 |     "If the library you're using doesn't have a specific method for saving out models, probably the easiest choice is to save your model as a pickle. [Pickles are a serialized data format for Python](https://docs.python.org/3/library/pickle.html), which means you can read them directly into your current environment as variables. If your model is built as one or more numpy arrays, [you may be able to use HDF5 instead](https://www.christopherlovell.co.uk/blog/2016/04/27/h5py-intro.html), but pickles can handle a wider range of data structures.\n",
201 |     "\n",
202 |     "**🥒 A quick warning about pickles 🥒** You should be aware that if there are differences between the version or subversion of Python you used to pickle your model and the one you try to unpickle it with, your model can fail to load. (This is also true of packages with different versions!) To avoid this, you can pin the specific versions of Python and of each package you used in your app's requirements file. \n",
203 |     "\n",
204 |     "**🥒🥒 A second quick warning about pickles 🥒🥒** While pickles can be useful for moving Python objects around, they're not a very secure file format. [As the Python wiki puts it:](https://wiki.python.org/moin/UsingPickle) “Pickle files can be hacked. If you receive a raw pickle file over the network, don't trust it! It could have malicious code in it, that would run arbitrary python when you try to de-pickle it.”\n",
205 |     "\n",
206 |     "In this case, the Flashtext module doesn’t have a native way to save and load objects (at least that I can find), so I’ll save my keyword processor as a pickle using the pickle library. \n"
207 |    ]
208 |   },
209 |   {
210 |    "cell_type": "code",
211 |    "execution_count": 3,
212 |    "metadata": {},
213 |    "outputs": [
214 |     {
215 |      "name": "stdout",
216 |      "output_type": "stream",
217 |      "text": [
218 |       "__notebook__.ipynb  __output__.json  processor.pkl\r\n"
219 |      ]
220 |     }
221 |    ],
222 |    "source": [
223 |     "import pickle\n",
224 |     "\n",
225 |     "# save our file (make sure the file mode is \"wb\", \n",
226 |     "# which will let us _w_rite a _b_inary file)\n",
227 |     "pickle.dump(py_package_keywordProcessor, open(\"processor.pkl\", \"wb\"))\n",
228 |     "\n",
229 |     "# check our current directory to make sure it saved\n",
230 |     "!ls"
231 |    ]
232 |   },
233 |   {
234 |    "cell_type": "markdown",
235 |    "metadata": {},
236 |    "source": [
237 |     "From here, we can load our model back in, assign it to a variable and apply it right away. \n",
238 |     "\n",
239 |     "> **How can I download my trained model from this notebook?** Right now, once you've saved a file, the easiest way to download it is to commit your notebook. Then you'll be able to download it from the \"output\" section of the notebook viewer. This will also give you a record of what code was used to produce your file so you can reproduce your work later on."
240 |    ]
241 |   },
242 |   {
243 |    "cell_type": "code",
244 |    "execution_count": 4,
245 |    "metadata": {},
246 |    "outputs": [
247 |     {
248 |      "data": {
249 |       "text/plain": [
250 |        "[('pandas', 7, 13), ('numpy', 14, 19), ('seaborn', 24, 31)]"
251 |       ]
252 |      },
253 |      "execution_count": 4,
254 |      "metadata": {},
255 |      "output_type": "execute_result"
256 |     }
257 |    ],
258 |    "source": [
259 |     "# read in a processor from our pickled file. Don't forget to \n",
260 |     "# include \"rb\", which lets us _r_ead a _b_inary file.\n",
261 |     "pickle_keywordProcessor = pickle.load(open(\"processor.pkl\", \"rb\"))\n",
262 |     "\n",
263 |     "# apply it to some sample text to make sure it works\n",
264 |     "apply_keywordProcessor(pickle_keywordProcessor, \"I like pandas numpy and seaborn\") \n"
265 |    ]
266 |   },
267 |   {
268 |    "cell_type": "markdown",
269 |    "metadata": {},
270 |    "source": [
271 |     "At this point, we’re ready to actually start creating our Flask app. We've got: \n",
272 |     "\n",
273 |     "* **A trained model**. In this example, it’s a KeywordProcessor object that will let us find Python package names. It could be any model you like, though. \n",
274 |     "* **A function to apply that model**. In this example, applying our model is pretty straightforward. Depending on what you’re working on, it could be more involved. For example, you may need to crop or resize images so that they fit the pixel dimensions that your model expects. \n",
275 |     "\n",
276 |     "Since we’re using a trained model, we don’t actually need to put any of the code or data we used to train it into our app. We can just use our model and the function that applies it. \n",
277 |     "\n",
278 |     "## Your turn\n",
279 |     "\n",
280 |     "**Make it work!** Don’t worry about getting too fancy, just get your code working. "
281 |    ]
282 |   },
283 |   {
284 |    "cell_type": "code",
285 |    "execution_count": 5,
286 |    "metadata": {},
287 |    "outputs": [],
288 |    "source": [
289 |     "# your code here :)"
290 |    ]
291 |   },
292 |   {
293 |    "cell_type": "markdown",
294 |    "metadata": {},
295 |    "source": [
296 |     "**Make it pretty!** Copy and paste your code from the cell above into this cell. Now you can spend some time commenting and refactoring your code so that it’s ready to share. I’d recommend at the very least writing a function to apply your model. 
" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 6, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "# your code here :)" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "**Make it portable!** Finally, you’ll need to save out your trained model. You might want to read it back into your notebook and apply your prediction function to make sure it all works. :)" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 7, 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [ 321 | "# your code here :)" 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": {}, 327 | "source": [ 328 | "\n", 329 | "# Writing a Flask App\n", 330 | "\n", 331 | "Alright, full disclosure; we're not writing an entire app today. (Sorry!) This is because what specific files you need to include in your app will depend on the platform you're using to deploy your API. We'll cover that in detail tomorrow. \n", 332 | "\n", 333 | "We will, however, be writing the core code for our app. This will be broken into two files. Just to make it easier to follow along, I'll be making each \"file\" a single notebook cell, but to actually serve our app we'll need to save each as separate cells. Again, we'll cover all these steps tomorrow.\n", 334 | "\n", 335 | "So, what will be in our two files?\n", 336 | "\n", 337 | "* **A file serve.py that will:**\n", 338 | " * import all the libraries we need\n", 339 | " * define a function that both loads our model and also defines and returns a second function that applies that model\n", 340 | "* **A second file that we won't name yet that will:**\n", 341 | " * import all the libraries we need, including serve.py\n", 342 | " * run the function we defined in serve.py (this loads in our data & saves the function we defined in serve.py to whatever name we give this variable)\n", 343 | " * create an instance of Flask\n", 344 | " * define a path and method for our API that matches [the specifications we wrote yesterday](https://www.kaggle.com/rtatman/careercon-intro-to-apis)\n", 345 | " * define a function to be executed at that path \n", 346 | "\n", 347 | "Credit where it's due: this specific architecture is based on this API by Guillaume Genthial [in this blog post](https://guillaumegenthial.github.io/serving.html). I'd recommend checking the blog post out, especially if you're working with a TensorFlow model.\n", 348 | "\n", 349 | "So here's what's going into the two files that will do the bulk of the work in our app:" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "## This is what will go in our serve.py file\n", 357 | "\n", 358 | "Note that, for this function to work, we need to save our model file as \"processor.pkl\" in the same directory as our serve.py file. (It should already be in your current working directory because we dumped our model to a pickle in the section above.)" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": 8, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "from flashtext.keyword import KeywordProcessor\n", 368 | "import pickle\n", 369 | "\n", 370 | "# Function that takes loads in our pickled word processor\n", 371 | "# and defines a function for using it. 
This makes it easy\n",
372 |     "# to do these steps together when serving our model.\n",
373 |     "def get_keywords_api():\n",
374 |     "    \n",
375 |     "    # read in the pickled keyword processor. You could also load in\n",
376 |     "    # other models at this step.\n",
377 |     "    keyword_processor = pickle.load(open(\"processor.pkl\", \"rb\"))\n",
378 |     "    \n",
379 |     "    # Function to apply our model & extract keywords from a provided\n",
380 |     "    # bit of text. It uses the keyword_processor we just loaded above.\n",
381 |     "    def keywords_api(text, span_info=True): \n",
382 |     "        keywords_found = keyword_processor.extract_keywords(text, span_info=span_info) \n",
383 |     "        return keywords_found\n",
384 |     "    \n",
385 |     "    # return the function we just defined\n",
386 |     "    return keywords_api"
387 |    ]
388 |   },
389 |   {
390 |    "cell_type": "markdown",
391 |    "metadata": {},
392 |    "source": [
393 |     "## This is what will go in our as-yet-unnamed second file"
394 |    ]
395 |   },
396 |   {
397 |    "cell_type": "code",
398 |    "execution_count": 9,
399 |    "metadata": {},
400 |    "outputs": [],
401 |    "source": [
402 |     "import json\n",
403 |     "from flask import Flask, request\n",
404 |     "#from serve import get_keywords_api\n",
405 |     "# I've commented out the last import because it won't work in kernels, \n",
406 |     "# but you should uncomment it when we build our app tomorrow\n",
407 |     "\n",
408 |     "# create an instance of Flask\n",
409 |     "app = Flask(__name__)\n",
410 |     "\n",
411 |     "# load our pre-trained model & get the function that applies it\n",
412 |     "keywords_api = get_keywords_api()\n",
413 |     "\n",
414 |     "# Define a post method for our API.\n",
415 |     "@app.route('/extractpackages', methods=['POST'])\n",
416 |     "def extractpackages():\n",
417 |     "    \"\"\" \n",
418 |     "    Takes in JSON, extracts the keywords &\n",
419 |     "    their indices, and then returns them as JSON.\n",
420 |     "    \"\"\"\n",
421 |     "    # the data the user sent, parsed from JSON\n",
422 |     "    input_data = request.json\n",
423 |     "\n",
424 |     "    # use our API function to get the keywords\n",
425 |     "    output_data = keywords_api(input_data)\n",
426 |     "\n",
427 |     "    # convert our output (a list of keyword matches & spans)\n",
428 |     "    # into JSON (returning a raw Python object wouldn't be very\n",
429 |     "    # helpful for someone querying our API from\n",
430 |     "    # Java; JSON is more flexible/portable)\n",
431 |     "    response = json.dumps(output_data)\n",
432 |     "\n",
433 |     "    # return our JSON\n",
434 |     "    return response"
435 |    ]
436 |   },
437 |   {
438 |    "cell_type": "markdown",
439 |    "metadata": {},
440 |    "source": [
441 |     "## Your turn!\n",
442 |     "\n",
443 |     "Now that you've seen what a very small Flask app looks like, it's time for you to write your own. We'll work more with this code tomorrow when we finish our apps and serve them.\n",
444 |     "\n",
445 |     "In addition to writing the code, if you don't already have them, I'd recommend creating these accounts ahead of time: \n",
446 |     "\n",
447 |     "* [A GitHub account](https://github.com/)\n",
448 |     "* Either a [Heroku](https://dashboard.heroku.com/login) or [Google Cloud](https://cloud.google.com/) account. If you're using Google Cloud, you'll also want to enable billing. I go over how to do that [in this notebook](https://www.kaggle.com/rtatman/dashboarding-with-notebooks-day-4#Enable-billing). 
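\n",
    "\n",
    "Before tomorrow, you can also smoke-test your two files locally without deploying anything, using Flask's built-in test client. Here's a rough sketch, assuming you've saved the second file as app.py (a name I've made up; you could pick anything) next to serve.py and processor.pkl:\n",
    "\n",
    "```\n",
    "# assumes the second file above was saved as app.py\n",
    "from app import app\n",
    "\n",
    "client = app.test_client()\n",
    "resp = client.post(\"/extractpackages\", json=\"I like pandas and numpy\")\n",
    "\n",
    "print(resp.status_code)  # hopefully 200\n",
    "print(resp.get_json())   # the keyword matches & their spans\n",
    "```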
\n" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "## Write your serve.py file here" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 10, 461 | "metadata": {}, 462 | "outputs": [], 463 | "source": [ 464 | "# your code here :) (feel free to copy & paste my code and then modify it for your project -- R)" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": {}, 470 | "source": [ 471 | "## Write your second file here" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": 11, 477 | "metadata": {}, 478 | "outputs": [], 479 | "source": [ 480 | "# your code here :)" 481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "metadata": {}, 486 | "source": [ 487 | "See you all tomorrow! :)" 488 | ] 489 | } 490 | ], 491 | "metadata": { 492 | "kernelspec": { 493 | "display_name": "Python 3", 494 | "language": "python", 495 | "name": "python3" 496 | }, 497 | "language_info": { 498 | "codemirror_mode": { 499 | "name": "ipython", 500 | "version": 3 501 | }, 502 | "file_extension": ".py", 503 | "mimetype": "text/x-python", 504 | "name": "python", 505 | "nbconvert_exporter": "python", 506 | "pygments_lexer": "ipython3", 507 | "version": "3.6.4" 508 | } 509 | }, 510 | "nbformat": 4, 511 | "nbformat_minor": 1 512 | } 513 | --------------------------------------------------------------------------------