├── README.md ├── hyperparameters ├── README.md ├── hyperparameters.ipynb ├── pickle_utils.py └── requirements.txt ├── modelserver ├── README.md ├── app.py └── requirements.txt ├── pywren ├── README.md ├── adding.ipynb ├── counting.ipynb ├── pywren_utils.py └── requirements.txt └── zappa ├── README.md ├── app.py └── app2.py /README.md: -------------------------------------------------------------------------------- 1 | # Serverless for data scientists 2 | 3 | Code and notebooks for a talk given at PyBay, 2018-08-19. 4 | 5 | - [Slides and words](https://mike.place/talks/serverless/) 6 | - [Video](https://www.youtube.com/watch?v=9PR2-ogB5qM) 7 | 8 | ## Contents 9 | 10 | 1. [Deployment of a basic flask application with zappa](zappa/) 11 | 2. [Pywren demo](pywren/) 12 | 3. [Hyperparameter optimization with pywren](hyperparameters/) 13 | 4. [Model deployment on AWS Lambda with zappa](modelserver/) 14 | 15 | ## Projects and resources mentioned in the talk 16 | 17 | - [zappa](https://github.com/Miserlou/zappa) -- tool for AWS Lambda deployment 18 | of Python applications 19 | - [dwx](http://github.com/williamsmj/dwx) -- Twitter bot that runs on AWS 20 | Lambda.
21 | - [Serving 39 Million Requests for 22 | $370/Month](https://trackchanges.postlight.com/serving-39-million-requests-for-370-month-or-how-we-reduced-our-hosting-costs-by-two-orders-of-edc30a9a88cd) 23 | -- AWS Lambda migration example 24 | - [Occupy the Cloud: Distributed Computing for the 25 | 99%](https://arxiv.org/abs/1702.04024) -- the pywren paper 26 | - [pywren](https://github.com/pywren/pywren) -- tool for embarrassingly 27 | parallel computation using AWS Lambda as the backend 28 | - [risecamp 2017 pywren 29 | tutorial](https://github.com/ucbrise/risecamp/tree/risecamp2017/pywren) -- 30 | more detailed look at pywren than my talk 31 | - [pywren web 32 | scraping](https://blog.seanssmith.com/posts/pywren-web-scraping.html) -- 33 | blog post that inspired the web scraping example in the talk 34 | - [305 Million Solutions to The Black-Scholes Equation in 16 Minutes with AWS 35 | Lambda](http://www.bradfordlynch.com/blog/2017/05/28/ComputeOnLambda.html) 36 | -- large scale embarrassingly parallel numerical computation example 37 | - [nips.json](https://github.com/williamsmj/nips.json) -- code to scrape the 38 | NIPS website using pywren (and the data scraped) 39 | - [Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of 40 | Tiny 41 | Threads](https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/fouladi) 42 | -- an example of AWS Lambda as part of a long-lived data processing pipeline 43 | (doesn't use pywren) 44 | -------------------------------------------------------------------------------- /hyperparameters/README.md: -------------------------------------------------------------------------------- 1 | # Hyperparameter optimization with pywren 2 | 3 | 1. Install matplotlib, notebook, numpy, pywren, scikit-learn and scipy, e.g. 4 | `pip install -r requirements.txt` 5 | 6 | 2. Run the notebook!
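The notebook trains one classifier per value of `n_neighbors` and then uploads `classifiers[6]`, an index read off the printed scores by hand. A minimal sketch of picking that index programmatically follows. The `scores` list is copied from the notebook's saved output; the name `best_index` is an assumption for illustration and does not appear in the notebook.

```python
# Same hyperparameter grid as the notebook: n_neighbors from 1 to 8.
all_hyperparams = [{'n_neighbors': k} for k in range(1, 9)]

# Test-set accuracies copied from the notebook's saved output, in grid order.
scores = [0.758, 0.782, 0.814, 0.82, 0.838, 0.838, 0.842, 0.828]

# Index of the highest score (hypothetical name, not in the notebook).
best_index = max(range(len(scores)), key=scores.__getitem__)

print(best_index, all_hyperparams[best_index], scores[best_index])
# prints: 6 {'n_neighbors': 7} 0.842
```

In the notebook itself the scores would come from `classifier.score(Xtest, ytest)` for each returned classifier, and `classifiers[best_index]` would be the object passed to `pickle_utils.pickle_to_s3`.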
7 | -------------------------------------------------------------------------------- /hyperparameters/hyperparameters.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Training a model locally" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "%matplotlib inline\n", 17 | "import matplotlib.pyplot as plt\n", 18 | "\n", 19 | "import sklearn.datasets\n", 20 | "from sklearn.model_selection import train_test_split\n", 21 | "from sklearn.neighbors import KNeighborsClassifier" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 2, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "X, y = sklearn.datasets.make_blobs(n_samples=1000,\n", 31 | " centers=2,\n", 32 | " center_box=(4,8),\n", 33 | " random_state=42)" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 3, 39 | "metadata": {}, 40 | "outputs": [ 41 | { 42 | "data": { 43 | "image/png": "\n", 44 | "text/plain": [ 45 | "
" 46 | ] 47 | }, 48 | "metadata": {}, 49 | "output_type": "display_data" 50 | } 51 | ], 52 | "source": [ 53 | "fig, ax = plt.subplots()\n", 54 | "\n", 55 | "plt.scatter(X[:,0], X[:,1], c=y, alpha=0.5);\n", 56 | "\n", 57 | "fig.savefig(\"/Users/mike/Desktop/fig.pdf\")" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 4, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.5, random_state=0)" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 5, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/plain": [ 77 | "0.838" 78 | ] 79 | }, 80 | "execution_count": 5, 81 | "metadata": {}, 82 | "output_type": "execute_result" 83 | } 84 | ], 85 | "source": [ 86 | "from sklearn.neighbors import KNeighborsClassifier\n", 87 | "\n", 88 | "classifier = KNeighborsClassifier(n_neighbors=5)\n", 89 | "classifier.fit(Xtrain, ytrain)\n", 90 | "classifier.score(Xtest, ytest)" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "# Hyperparameter optimization with pywren" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 6, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "all_hyperparams = [{'n_neighbors': k} for k in range(1, 9)]" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 7, 112 | "metadata": {}, 113 | "outputs": [ 114 | { 115 | "data": { 116 | "text/plain": [ 117 | "[{'n_neighbors': 1},\n", 118 | " {'n_neighbors': 2},\n", 119 | " {'n_neighbors': 3},\n", 120 | " {'n_neighbors': 4},\n", 121 | " {'n_neighbors': 5},\n", 122 | " {'n_neighbors': 6},\n", 123 | " {'n_neighbors': 7},\n", 124 | " {'n_neighbors': 8}]" 125 | ] 126 | }, 127 | "execution_count": 7, 128 | "metadata": {}, 129 | "output_type": "execute_result" 130 | } 131 | ], 132 | "source": [ 133 | "all_hyperparams" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | 
"execution_count": 8, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "def train_model(hyperparams):\n", 143 | " classifier = KNeighborsClassifier(**hyperparams)\n", 144 | " classifier.fit(Xtrain, ytrain)\n", 145 | " return classifier" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 9, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "import pywren\n", 155 | "pwex = pywren.default_executor()" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 10, 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "name": "stderr", 165 | "output_type": "stream", 166 | "text": [ 167 | "/Users/mike/p/talks/pybay-serverless-datascientists/hyperparameters/.direnv/python-3.6.3/lib/python3.6/site-packages/sklearn/base.py:311: UserWarning: Trying to unpickle estimator KNeighborsClassifier from version 0.19.0 when using version 0.19.2. This might lead to breaking code or invalid results. Use at your own risk.\n", 168 | " UserWarning)\n" 169 | ] 170 | } 171 | ], 172 | "source": [ 173 | "classifiers = pywren.get_all_results(pwex.map(train_model, all_hyperparams))" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 11, 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "name": "stdout", 183 | "output_type": "stream", 184 | "text": [ 185 | "{'n_neighbors': 1} 0.758\n", 186 | "{'n_neighbors': 2} 0.782\n", 187 | "{'n_neighbors': 3} 0.814\n", 188 | "{'n_neighbors': 4} 0.82\n", 189 | "{'n_neighbors': 5} 0.838\n", 190 | "{'n_neighbors': 6} 0.838\n", 191 | "{'n_neighbors': 7} 0.842\n", 192 | "{'n_neighbors': 8} 0.828\n" 193 | ] 194 | } 195 | ], 196 | "source": [ 197 | "for hyperparams, classifier in zip(all_hyperparams, classifiers):\n", 198 | " print(hyperparams, classifier.score(Xtest, ytest))" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "# Deployment and serving" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 
210 | "execution_count": 12, 211 | "metadata": {}, 212 | "outputs": [], 213 | "source": [ 214 | "import pickle_utils\n", 215 | "\n", 216 | "pickle_utils.pickle_to_s3(bucket_name=\"modelservingdemo\",\n", 217 | " key=\"classifier.pkl\",\n", 218 | " obj=classifiers[6])" 219 | ] 220 | } 221 | ], 222 | "metadata": { 223 | "kernelspec": { 224 | "display_name": "Python 3", 225 | "language": "python", 226 | "name": "python3" 227 | }, 228 | "language_info": { 229 | "codemirror_mode": { 230 | "name": "ipython", 231 | "version": 3 232 | }, 233 | "file_extension": ".py", 234 | "mimetype": "text/x-python", 235 | "name": "python", 236 | "nbconvert_exporter": "python", 237 | "pygments_lexer": "ipython3", 238 | "version": "3.6.3" 239 | } 240 | }, 241 | "nbformat": 4, 242 | "nbformat_minor": 2 243 | } 244 | -------------------------------------------------------------------------------- /hyperparameters/pickle_utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import pickle 3 | from io import BytesIO 4 | 5 | import boto3 6 | from botocore.config import Config 7 | 8 | s3client = boto3.client('s3', config=Config(signature_version='s3v4')) 9 | 10 | 11 | def congfigure_bucket_as_website(bucket_name): 12 | s3client.create_bucket(Bucket=bucket_name) 13 | website_configuration = { 14 | "ErrorDocument": {"Key": "error.html"}, 15 | "IndexDocument": {"Suffix": "index.html"}, 16 | } 17 | s3client.put_bucket_website(Bucket=bucket_name, 18 | WebsiteConfiguration=website_configuration) 19 | bucket_policy = json.dumps( 20 | { 21 | "Version": "2012-10-17", 22 | "Statement": [ 23 | { 24 | "Sid": "PublicReadGetObject", 25 | "Effect": "Allow", 26 | "Principal": "*", 27 | "Action": ["s3:GetObject"], 28 | "Resource": ["arn:aws:s3:::{}/*".format(bucket_name)], 29 | } 30 | ], 31 | } 32 | ) 33 | s3client.put_bucket_policy(Bucket=bucket_name, Policy=bucket_policy) 34 | 35 | 36 | def pickle_to_s3(obj, bucket_name, key): 37 | 
congfigure_bucket_as_website(bucket_name) 38 | with BytesIO() as data: 39 | pickle.dump(obj, data) 40 | data.seek(0) 41 | s3client.create_bucket(Bucket=bucket_name) 42 | s3client.put_object(Body=data, Bucket=bucket_name, Key=key) 43 | 44 | 45 | def pickle_from_s3(bucket_name, key): 46 | with BytesIO() as data: 47 | s3client.download_fileobj(Bucket=bucket_name, Key=key, Fileobj=data) 48 | data.seek(0) 49 | return pickle.load(data) 50 | -------------------------------------------------------------------------------- /hyperparameters/requirements.txt: -------------------------------------------------------------------------------- 1 | matplotlib==2.2.3 2 | notebook==5.7.8 3 | numpy==1.15.0 4 | pywren==0.3.0 5 | scikit-learn==0.19.2 6 | scipy==1.1.0 7 | -------------------------------------------------------------------------------- /modelserver/README.md: -------------------------------------------------------------------------------- 1 | # Model deployment on AWS Lambda with zappa 2 | 3 | 1. Follow the instructions in [../hyperparameters/](../hyperparameters/) to 4 | train and upload a model to S3. 5 | 6 | 2. Create and activate a virtual environment (note you must create a fresh 7 | virtual environment to use zappa), e.g. 8 | 9 | python -m virtualenv venv && source venv/bin/activate 10 | 11 | 3. Install the requirements: 12 | 13 | pip install -r requirements.txt 14 | 15 | 4. Run `zappa init` and accept all the defaults 16 | 17 | 5. Run `zappa deploy`. The output of this command (or `zappa status`) includes 18 | the public `amazonaws.com` URL of your deployed flask application. It will 19 | look something like 20 | `https://abcdef123.execute-api.us-east-1.amazonaws.com/dev/` 21 | 22 | 6. Visit `your_amazonaws_url/predict?feature_1=3&feature_2=6` to get a 23 | prediction.
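Step 6 can also be done from Python rather than a browser. The sketch below uses only the standard library; the helper names `build_predict_url` and `get_prediction` are assumptions for illustration, and the base URL is the placeholder from step 5, not a live deployment.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_predict_url(base_url, feature_1, feature_2):
    """Return the /predict URL with the two query parameters app.py reads."""
    query = urlencode({"feature_1": feature_1, "feature_2": feature_2})
    return "{}/predict?{}".format(base_url.rstrip("/"), query)


def get_prediction(base_url, feature_1, feature_2):
    """Fetch the prediction text. This makes a real network request."""
    with urlopen(build_predict_url(base_url, feature_1, feature_2)) as response:
        return response.read().decode("utf-8")


# Placeholder URL from step 5; replace it with your own deployment's URL.
example_url = build_predict_url(
    "https://abcdef123.execute-api.us-east-1.amazonaws.com/dev/", 3, 6)
print(example_url)
```

Calling `get_prediction` with your own URL should return one of the two strings that `app.py` serves, `purple blob` or `yellow blob`.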
24 | -------------------------------------------------------------------------------- /modelserver/app.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | 3 | import requests 4 | from flask import Flask, request 5 | 6 | app = Flask(__name__) 7 | 8 | url = "https://s3.amazonaws.com/modelservingdemo/classifier.pkl" 9 | r = requests.get(url) 10 | classifier = pickle.loads(r.content) 11 | 12 | 13 | @app.route("/predict") 14 | def predict(): 15 | X = [[float(request.args['feature_1']), 16 | float(request.args['feature_2'])]] 17 | label = classifier.predict(X) 18 | if label == 0: 19 | return 'purple blob' 20 | else: 21 | return 'yellow blob' 22 | 23 | 24 | if __name__ == "__main__": 25 | app.run() 26 | -------------------------------------------------------------------------------- /modelserver/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.15.0 2 | scikit-learn==0.19.2 3 | scipy==1.1.0 4 | requests==2.20.0 5 | -------------------------------------------------------------------------------- /pywren/README.md: -------------------------------------------------------------------------------- 1 | # Pywren demo 2 | 3 | 1. Install pywren (e.g. `pip install pywren`). If you want to make the plot, 4 | install the rest of the requirements in requirements.txt (e.g. `pip install 5 | -r requirements.txt`) 6 | 7 | 2. Configure AWS credentials, e.g. set the appropriate environment variables or 8 | put them in `~/.aws/credentials` 9 | 10 | 3. Run `pywren-setup` and accept the defaults. 11 | 12 | 4. Launch the notebooks. 13 | 14 | 5. For a more complete demo, see the [pywren tutorial at the 2017 15 | risecamp](https://github.com/ucbrise/risecamp/tree/risecamp2017/pywren).
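The notebooks map a function over a list with `pwex.map` and then collect results from the returned futures. To try the same shape locally, without AWS credentials, the sketch below uses the standard library's `concurrent.futures` as a stand-in. It illustrates the pattern only and is not pywren's API; `square` and `parameters` are the names used in `adding.ipynb`.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


parameters = [0, 1, 2, 3, 4, 5]

# Local threads stand in for Lambda invocations here.
with ThreadPoolExecutor(max_workers=3) as executor:
    # One future per input, in the same order as `parameters`.
    futures = [executor.submit(square, p) for p in parameters]
    # Block on each future in turn and collect its result.
    results = [f.result() for f in futures]

print(results)
# prints: [0, 1, 4, 9, 16, 25]
```

In the pywren version the executor comes from `pywren.default_executor()`, and each call runs in its own AWS Lambda invocation instead of a local thread.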
16 | -------------------------------------------------------------------------------- /pywren/adding.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Local application of functions to lists" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "def square(x):\n", 17 | " return x * x" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 2, 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "parameters = [0, 1, 2, 3, 4, 5]" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 3, 32 | "metadata": {}, 33 | "outputs": [ 34 | { 35 | "data": { 36 | "text/plain": [ 37 | "[0, 1, 4, 9, 16, 25]" 38 | ] 39 | }, 40 | "execution_count": 3, 41 | "metadata": {}, 42 | "output_type": "execute_result" 43 | } 44 | ], 45 | "source": [ 46 | "[square(x) for x in parameters]" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 4, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "mapped = map(square, parameters)" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 5, 61 | "metadata": {}, 62 | "outputs": [ 63 | { 64 | "data": { 65 | "text/plain": [ 66 | "" 67 | ] 68 | }, 69 | "execution_count": 5, 70 | "metadata": {}, 71 | "output_type": "execute_result" 72 | } 73 | ], 74 | "source": [ 75 | "mapped" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 6, 81 | "metadata": {}, 82 | "outputs": [ 83 | { 84 | "data": { 85 | "text/plain": [ 86 | "[0, 1, 4, 9, 16, 25]" 87 | ] 88 | }, 89 | "execution_count": 6, 90 | "metadata": {}, 91 | "output_type": "execute_result" 92 | } 93 | ], 94 | "source": [ 95 | "list(mapped)" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "# Remote application of functions to lists 
with pywren" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 7, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "import pywren\n", 112 | "\n", 113 | "pwex = pywren.default_executor()" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 8, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [ 122 | "futures = pwex.map(square, parameters)" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 9, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "data": { 132 | "text/plain": [ 133 | "[,\n", 134 | " ,\n", 135 | " ,\n", 136 | " ,\n", 137 | " ,\n", 138 | " ]" 139 | ] 140 | }, 141 | "execution_count": 9, 142 | "metadata": {}, 143 | "output_type": "execute_result" 144 | } 145 | ], 146 | "source": [ 147 | "futures" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 10, 153 | "metadata": {}, 154 | "outputs": [ 155 | { 156 | "data": { 157 | "text/plain": [ 158 | "[0, 1, 4, 9, 16, 25]" 159 | ] 160 | }, 161 | "execution_count": 10, 162 | "metadata": {}, 163 | "output_type": "execute_result" 164 | } 165 | ], 166 | "source": [ 167 | "[f.result() for f in futures]" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "# Visualizing remote application\n", 175 | "\n", 176 | "Plotting code taken from the [pywren tutorial at the 2017 risecamp](https://github.com/ucbrise/risecamp/tree/risecamp2017/pywren)" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 11, 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [ 185 | "%matplotlib inline\n", 186 | "\n", 187 | "from pywren_utils import plot_pywren_execution" 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": 12, 193 | "metadata": {}, 194 | "outputs": [ 195 | { 196 | "name": "stdout", 197 | "output_type": "stream", 198 | "text": [ 199 | "Populating the interactive namespace from numpy and 
matplotlib\n" 200 | ] 201 | }, 202 | { 203 | "data": { 204 | "image/png": "\n", 205 | "text/plain": [ 206 | "
" 207 | ] 208 | }, 209 | "metadata": {}, 210 | "output_type": "display_data" 211 | } 212 | ], 213 | "source": [ 214 | "plot_pywren_execution(futures)" 215 | ] 216 | } 217 | ], 218 | "metadata": { 219 | "kernelspec": { 220 | "display_name": "Python 3", 221 | "language": "python", 222 | "name": "python3" 223 | }, 224 | "language_info": { 225 | "codemirror_mode": { 226 | "name": "ipython", 227 | "version": 3 228 | }, 229 | "file_extension": ".py", 230 | "mimetype": "text/x-python", 231 | "name": "python", 232 | "nbconvert_exporter": "python", 233 | "pygments_lexer": "ipython3", 234 | "version": "3.6.3" 235 | } 236 | }, 237 | "nbformat": 4, 238 | "nbformat_minor": 2 239 | } 240 | -------------------------------------------------------------------------------- /pywren/counting.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Counting words with pywren" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import pywren\n", 17 | "\n", 18 | "pwex = pywren.default_executor()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "docs = [\"hello world\",\n", 28 | " \"hello pybay\",\n", 29 | " \"hello yourself\"]" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "from collections import Counter\n", 39 | "\n", 40 | "def wordcount(doc):\n", 41 | " return Counter(doc.split())" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 4, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "futures = pwex.map(wordcount, docs)" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 5, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | 
"per_doc_counts = pywren.get_all_results(futures)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 6, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "data": { 69 | "text/plain": [ 70 | "Counter({'hello': 3, 'world': 1, 'pybay': 1, 'yourself': 1})" 71 | ] 72 | }, 73 | "execution_count": 6, 74 | "metadata": {}, 75 | "output_type": "execute_result" 76 | } 77 | ], 78 | "source": [ 79 | "sum(per_doc_counts, Counter())" 80 | ] 81 | } 82 | ], 83 | "metadata": { 84 | "kernelspec": { 85 | "display_name": "Python 3", 86 | "language": "python", 87 | "name": "python3" 88 | }, 89 | "language_info": { 90 | "codemirror_mode": { 91 | "name": "ipython", 92 | "version": 3 93 | }, 94 | "file_extension": ".py", 95 | "mimetype": "text/x-python", 96 | "name": "python", 97 | "nbconvert_exporter": "python", 98 | "pygments_lexer": "ipython3", 99 | "version": "3.6.3" 100 | } 101 | }, 102 | "nbformat": 4, 103 | "nbformat_minor": 2 104 | } 105 | -------------------------------------------------------------------------------- /pywren/pywren_utils.py: -------------------------------------------------------------------------------- 1 | # From https://github.com/ucbrise/risecamp/tree/risecamp2017/pywren 2 | 3 | def plot_pywren_execution(futures): 4 | import warnings 5 | warnings.filterwarnings("ignore") 6 | from IPython import get_ipython 7 | get_ipython().run_line_magic('pylab', 'inline') 8 | import pylab 9 | import seaborn as sns 10 | import pandas as pd 11 | sns.set_style('whitegrid') 12 | sns.set_context("poster") 13 | import os 14 | import matplotlib.patches as mpatches 15 | import numpy as np 16 | 17 | def collect_execution_info(futures): 18 | results = [f.result() for f in futures] 19 | run_statuses = [f.run_status for f in futures] 20 | invoke_statuses = [f.invoke_status for f in futures] 21 | return {'results' : results,'run_statuses' : run_statuses, 'invoke_statuses' : invoke_statuses} 22 | 23 | info = collect_execution_info(futures) 24 | 25 | # visualization 
26 | def visualize_execution(info): 27 | # preparing data 28 | run_df = pd.DataFrame(info['run_statuses']) 29 | invoke_df = pd.DataFrame(info['invoke_statuses']) 30 | info_df = pd.concat([run_df, invoke_df], axis=1) 31 | 32 | def remove_duplicate_columns(df): 33 | Cols = list(df.columns) 34 | for i,item in enumerate(df.columns): 35 | if item in df.columns[:i]: Cols[i] = "toDROP" 36 | df.columns = Cols 37 | return df.drop("toDROP",1) 38 | 39 | info_df = remove_duplicate_columns(info_df) 40 | 41 | total_tasks = len(info_df) 42 | y = np.arange(total_tasks) 43 | 44 | time_offset = np.min(info_df.host_submit_time) 45 | fields = [('host submit', info_df.host_submit_time - time_offset), 46 | ('task start', info_df.start_time - time_offset ), 47 | ('setup done', info_df.start_time + info_df.setup_time - time_offset), 48 | ('task done', info_df.end_time - time_offset), 49 | ('results returned', info_df.download_output_timestamp - time_offset) 50 | ] 51 | 52 | # create plotting env 53 | fig = pylab.figure(figsize=(16, 8)) 54 | ax = fig.add_subplot(1, 1, 1) 55 | 56 | # plotting 57 | patches = [] 58 | palette = sns.color_palette("deep", 6) 59 | point_size = 100 60 | for f_i, (field_name, val) in enumerate(fields): 61 | ax.scatter(val, y, c=palette[f_i], edgecolor='none', s=point_size, alpha=1) 62 | patches.append(mpatches.Patch(color=palette[f_i], label=field_name)) 63 | ax.set_xlabel('wallclock time (sec)') 64 | ax.set_ylabel('task') 65 | 66 | # legend 67 | legend = pylab.legend(handles=patches, 68 | loc='upper left', frameon=True) 69 | legend.get_frame().set_facecolor('#FFFFFF') 70 | 71 | # y ticks 72 | plot_step = 5 73 | y_ticks = np.arange(total_tasks//plot_step + 2) * plot_step 74 | ax.set_yticks(y_ticks) 75 | for y in y_ticks: 76 | ax.axhline(y, c='k', alpha=0.1, linewidth=1) 77 | 78 | # formatting 79 | ax.grid(False) 80 | fig.tight_layout() 81 | 82 | visualize_execution(info) 83 | -------------------------------------------------------------------------------- 
/pywren/requirements.txt: -------------------------------------------------------------------------------- 1 | matplotlib==2.2.3 2 | notebook==5.7.8 3 | pywren==0.3.0 4 | seaborn==0.9.0 5 | -------------------------------------------------------------------------------- /zappa/README.md: -------------------------------------------------------------------------------- 1 | # Deployment of a basic flask application with zappa 2 | 3 | ## Setup and local deployment 4 | 5 | 1. Create and activate a virtual environment (note you must create a fresh 6 | virtual environment to use zappa), e.g. 7 | 8 | python -m virtualenv venv && source venv/bin/activate 9 | 10 | 2. Install flask and zappa 11 | 12 | pip install flask zappa 13 | 14 | 3. `app.py` contains a toy flask application that serves the time. Deploy it 15 | locally by running 16 | 17 | python app.py 18 | 19 | 4. Check it works by visiting `127.0.0.1:5000/time`. You should get the local 20 | time on your computer. 21 | 22 | ## Deployment to AWS Lambda with zappa 23 | 24 | 1. Configure AWS credentials, e.g. set the appropriate environment variables or 25 | put them in `~/.aws/credentials` 26 | 27 | 2. Run `zappa init` and accept all the defaults 28 | 29 | 3. Run `zappa deploy`. The output of this command (or `zappa status`) 30 | includes the public `amazonaws.com` URL of your deployed flask application. 31 | It will look something like 32 | `https://abcdef123.execute-api.us-east-1.amazonaws.com/dev/` 33 | 34 | 4. Visit `your_amazonaws_url/time` to check your app works. You should get 35 | the current time in the UT timezone. 36 | 37 | ## Update a running Lambda deployment and tail the logs 38 | 39 | 1. Now let's switch over to serving `app2.py`, which is a version of our time app 40 | that supports timezones. Edit `zappa_settings.json` so that the line 41 | 42 | "app_function": "app.app", 43 | 44 | reads 45 | 46 | "app_function": "app2.app", 47 | 48 | 2. Run `zappa update` 49 | 50 | 3.
Visit `your_amazonaws_url/servertime` to get the server time (i.e. UT) 51 | 52 | 4. Visit `your_amazonaws_url/localtime?tz=US/Pacific` to get the time in 53 | the US/Pacific timezone. 54 | 55 | 5. Visit a URL with a deliberate mistake, e.g. 56 | `your_amazonaws_url/localtime?tz=US/Atlantic` (this timezone does not 57 | exist). Run `zappa tail` to tail the logs of your application and observe 58 | the Python exception generated by your URL. 59 | -------------------------------------------------------------------------------- /zappa/app.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | 3 | from flask import Flask 4 | 5 | app = Flask(__name__) 6 | 7 | 8 | @app.route("/time") 9 | def time(): 10 | return str(datetime.datetime.now()) 11 | 12 | 13 | if __name__ == "__main__": 14 | app.run() 15 | -------------------------------------------------------------------------------- /zappa/app2.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import pytz 3 | 4 | from flask import Flask, request 5 | 6 | app = Flask(__name__) 7 | 8 | 9 | @app.route("/servertime") 10 | def servertime(): 11 | return str(datetime.datetime.now()) 12 | 13 | 14 | @app.route("/localtime") 15 | def localtime(): 16 | tz = pytz.timezone(request.args['tz']) 17 | servertime = datetime.datetime.now().astimezone() 18 | localtime = servertime.astimezone(tz) 19 | return str(localtime) 20 | 21 | 22 | if __name__ == "__main__": 23 | app.run() 24 | --------------------------------------------------------------------------------