├── floyd_requirements.txt ├── .gitignore ├── d3-cloropleth-map.png ├── README.md └── quick_start.ipynb /floyd_requirements.txt: -------------------------------------------------------------------------------- 1 | plotly 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .floydignore 2 | .floydexpt 3 | command.sh 4 | .ipynb_checkpoints -------------------------------------------------------------------------------- /d3-cloropleth-map.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/floydhub/my-notebook/master/d3-cloropleth-map.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Your first Jupyter Notebook on FloydHub 2 | This tutorial introduces FloydHub and how to use Jupyter Notebooks for your experiments. 3 | 4 | ### Here’s what we’ll learn in this guide: 5 | 6 | - How to use Jupyter Notebooks on FloydHub 7 | - How to Create, Explore, and Mount datasets on FloydHub to use in your code 8 | - FloydHub best practices: 9 | 1. How and why to keep datasets separate from code as standalone Datasets 10 | 2. How to sync your remote FloydHub experiments locally to your machine 11 | 3. How to use .floydignore for low-bandwidth situations 12 | -------------------------------------------------------------------------------- /quick_start.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Running Jupyter Notebooks on FloydHub\n", 8 | "\n", 9 | "[Jupyter Notebooks](https://jupyter.org/) are great for interactively writing, running and sharing your code right from your browser. This tutorial teaches you the basics of running GPU-powered Notebooks on FloydHub.\n", 10 | "\n", 11 | "Running a Jupyter Notebook on FloydHub is easy. Simply type this on your terminal:\n", 12 | "```\n", 13 | "floyd run --mode jupyter --gpu\n", 14 | "```\n", 15 | "This will open a Jupyter Notebook with GPU support, running on FloydHub's servers. If you're viewing this Notebook on FloydHub, you probably already did that! For more info, here's a [quick start tutorial](https://docs.floydhub.com/getstarted/quick_start_jupyter/)." 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "### CPU and GPU Support\n", 23 | "\n", 24 | "Notice the `--gpu` flag in the above command? That's all you need to do to get access to a powerful GPU in the cloud. You can view the stats and usage of your GPU by executing the command in the cell below (press **`shift + enter`**)." 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "!nvidia-smi" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "Note that you will see the GPU stats only if you use the `--gpu` flag in your `floyd run` command. You can run your Notebook on a CPU machine by omitting the flag or using `--cpu`" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "### Deep Learning Environments\n", 48 | "\n", 49 | "FloydHub comes with fully-configured and optimized environments for all deep learning frameworks! 
So, you don't have to fiddle with installing CUDA drivers, the framework(s) of your choice and all their dependencies.\n", 50 | "\n", 51 | "The default environment is the latest version of TensorFlow and Keras. Go ahead and run the next cell." 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "import tensorflow as tf\n", 61 | "print(tf.__name__)\n", 62 | "print(tf.__version__)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "import keras\n", 72 | "print(keras.__name__)\n", 73 | "print(keras.__version__)" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "If you want to use a different framework, you can specify this using the `--env` flag when you start your Notebook. Want a PyTorch Notebook? Start your Notebook with the following command from your local terminal:\n", 81 | "```\n", 82 | "floyd run --mode jupyter --gpu --env pytorch-0.2\n", 83 | "```\n", 84 | "You can see the complete list of deep learning environments [here](https://docs.floydhub.com/guides/environments/)." 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "### Installing Dependencies\n", 92 | "\n", 93 | "All the environments also include lots of common machine learning and deep learning libraries like [NumPy](http://www.numpy.org/), [Pandas](http://pandas.pydata.org/) and [Matplotlib](https://matplotlib.org/)." 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": null, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "import numpy as np\n", 103 | "a = np.arange(15).reshape(3, 5)\n", 104 | "print(a)" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "Of course, we might not have all the packages you want. You can install your own packages from inside your Notebook! Let's install the `plotly` Python package." 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": null, 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [ 120 | "! pip install plotly" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "You might have more involved requirements; we've got you covered!\n", 128 | "\n", 129 | "Say you want to install multiple Python packages. See how to use [floyd_requirements.txt](https://docs.floydhub.com/guides/jobs/installing_dependencies/#installing-python-dependencies).\n", 130 | "\n", 131 | "Or, your dependency isn't a Python package at all and you want to install it via `apt-get` or even compile it from source. Take a look at our in-depth guide on [installing extra dependencies](https://docs.floydhub.com/guides/jobs/installing_dependencies/#installing-non-python-dependencies)." 132 | ] 133 | }, 134 | { 135 | "attachments": {}, 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "### Training a model for handwritten digit recognition\n", 140 | "\n", 141 | "MNIST is a simple computer vision dataset of handwritten digits.\n", 142 | "\n", 143 | "Owing to its popularity, it is commonly called the \"Hello World\" of machine learning!
You can read more about it in [TensorFlow's tutorial](https://www.tensorflow.org/get_started/mnist/beginners).\n", 144 | "\n", 145 | "We will now train a simple multilayer perceptron model in Keras to recognize handwritten digits." 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": { 152 | "collapsed": true 153 | }, 154 | "outputs": [], 155 | "source": [ 156 | "from keras.models import Sequential, save_model, load_model\n", 157 | "from keras.layers import Dense, Dropout\n", 158 | "from keras.optimizers import RMSprop\n", 159 | "from keras.datasets import mnist\n", 160 | "from keras.utils import np_utils\n", 161 | "from keras.callbacks import ModelCheckpoint" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 167 | "metadata": { 168 | "collapsed": true 169 | }, 170 | "outputs": [], 171 | "source": [ 172 | "# Hyperparameters\n", 173 | "batch_size = 128\n", 174 | "nb_epoch = 10\n", 175 | "\n", 176 | "# Parameters for MNIST dataset\n", 177 | "nb_classes = 10\n", 178 | "\n", 179 | "# Parameters for MLP\n", 180 | "prob_drop_input = 0.2 # drop probability for dropout @ input layer\n", 181 | "prob_drop_hidden = 0.5 # drop probability for dropout @ fc layer" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "# Load MNIST dataset from the internet (https://s3.amazonaws.com/img-datasets/mnist.npz)\n", 191 | "(X_train, y_train), (X_test, y_test) = mnist.load_data()\n", 192 | "\n", 193 | "# Flatten the 28x28 images into 784-dim vectors, scale pixels to [0, 1] and one-hot encode the labels\n", 194 | "X_train = X_train.reshape(60000, 784)\n", 195 | "X_test = X_test.reshape(10000, 784)\n", 196 | "X_train = X_train.astype('float32')\n", 197 | "X_test = X_test.astype('float32')\n", 198 | "X_train /= 255\n", 199 | "X_test /= 255\n", 200 | "Y_Train = np_utils.to_categorical(y_train, nb_classes)\n", 201 | "Y_Test = np_utils.to_categorical(y_test, nb_classes)" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": null, 207 | "metadata": { 208 | "scrolled": true 209 | }, 210 | "outputs": [], 211 | "source": [ 212 | "# Multilayer Perceptron model\n", 213 | "model = Sequential()\n", 214 | "model.add(Dense(activation=\"sigmoid\", units=625, input_dim=784, kernel_initializer=\"normal\", name=\"dense1\"))\n", 215 | "model.add(Dropout(prob_drop_input, name='dropout1'))\n", 216 | "model.add(Dense(activation=\"sigmoid\", units=625, input_dim=625, kernel_initializer=\"normal\", name=\"dense2\"))\n", 217 | "model.add(Dropout(prob_drop_hidden, name='dropout2'))\n", 218 | "model.add(Dense(activation=\"softmax\", units=10, input_dim=625, kernel_initializer=\"normal\", name=\"dense3\"))\n", 219 | "model.compile(optimizer=RMSprop(lr=0.001, rho=0.9), loss='categorical_crossentropy', metrics=['accuracy'])\n", 220 | "\n", 221 | "# Print summary of the model\n", 222 | "model.summary()" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": { 229 | "scrolled": false 230 | }, 231 | "outputs": [], 232 | "source": [ 233 | "# Save the model and set up a checkpoint callback under /output\n", 234 | "save_model(model, '/output/model_mlp')\n", 235 | "!mkdir -p /output/logs\n", 236 | "checkpoint = ModelCheckpoint(filepath='/output/logs/weights.epoch.{epoch:02d}-val_loss.{val_loss:.2f}.hdf5', verbose=0)\n", 237 | "\n", 238 | "# Start training model\n", 239 | "history = model.fit(X_train, Y_Train, epochs=nb_epoch, batch_size=batch_size,
verbose=1,\n", 240 | " callbacks=[checkpoint], validation_data=(X_test, Y_Test))" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "We trained our model for 10 epochs. At the end of the 10th epoch, our accuracy on the holdout validation set is around 97%. Not bad!\n", 248 | "\n", 249 | "Now, let's evaluate our model on the test set." 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": { 256 | "scrolled": true 257 | }, 258 | "outputs": [], 259 | "source": [ 260 | "# Evaluate\n", 261 | "evaluation = model.evaluate(X_test, Y_Test, verbose=1)\n", 262 | "print('\\nSummary: Loss over the test dataset: %.2f, Accuracy: %.2f' % (evaluation[0], evaluation[1]))" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": {}, 268 | "source": [ 269 | "That's it, folks! You've learned some of the basics of using Jupyter Notebooks on FloydHub and trained a pretty sleek model to recognize handwritten digits. Feel free to play around! (And don't forget to shut down your job.)\n", 270 | "\n", 271 | "Below, we'll talk about slightly more advanced FloydHub constructs:\n", 272 | "\n", 273 | "* How do you save your data so you can come back later and use it?\n", 274 | "* How do you find and use others' public datasets in your job?\n", 275 | "* How do you restart an old Notebook?" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "### Saving Output Data on FloydHub\n", 283 | "\n", 284 | "We just trained a model that recognizes handwritten digits with around 97% accuracy! We, of course, want to save the model that we trained so we can use it later.\n", 285 | "\n", 286 | "If you look at the code above, you will notice that the model and its training checkpoints are stored under `/output/model_mlp` and `/output/logs`, respectively.\n", 287 | "```\n", 288 | "save_model(model, '/output/model_mlp')\n", 289 | "!mkdir -p /output/logs\n", 290 | "checkpoint = ModelCheckpoint(filepath='/output/logs/weights.epoch.{epoch:02d}-val_loss.{val_loss:.2f}.hdf5', verbose=0)\n", 291 | "```\n", 292 | "\n", 293 | "**The `/output` directory is a special directory on FloydHub.** Any directories, subdirectories or files that you create under the `/output` directory will be saved for you to use later, even after you close your Jupyter Notebook.\n", 294 | "\n", 295 | "**tl;dr: Make sure that any data you want to persist is saved under `/output`. Data stored in any other location will be deleted when you end your Jupyter Notebook job.** Please see our extensive guide on [saving persistent outputs on FloydHub](https://docs.floydhub.com/guides/data/storing_output/)." 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "### Using FloydHub's Public Datasets in your Jobs\n", 303 | "\n", 304 | "FloydHub has a ton of popular datasets. These are community-contributed datasets for many machine learning and deep learning tasks.
You can find them in the [Explore Page](https://www.floydhub.com/explore/trending) or using the [Search box](https://www.floydhub.com/search/datasets?query=)." 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": {}, 310 | "source": [ 311 | "In the above example, we downloaded the MNIST dataset from the internet using this line of code:\n", 312 | "```\n", 313 | "(X_train, y_train), (X_test, y_test) = mnist.load_data()\n", 314 | "```\n", 315 | "This works well because the MNIST dataset is only about 11MB. If you had a larger dataset, it'd be a pain to download it every time. We highly recommend [creating a separate dataset](https://docs.floydhub.com/guides/create_and_upload_dataset/) or using a publicly available dataset. Here's a public MNIST dataset on FloydHub: [https://www.floydhub.com/redeipirati/datasets/mnist](https://www.floydhub.com/redeipirati/datasets/mnist)" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": {}, 321 | "source": [ 322 | "To use a public dataset in your job, you need to _mount_ it when you execute your `floyd run` command:\n", 323 | "```\n", 324 | "floyd run --mode jupyter --gpu --data redeipirati/datasets/mnist/1:mnist\n", 325 | "```\n", 326 | "This will make the dataset available at `/mnist` for your code to access. You can read more about mounting datasets in our [docs here](https://docs.floydhub.com/guides/data/mounting_data/)." 327 | ] 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "metadata": {}, 332 | "source": [ 333 | "### Saving and Stopping your Notebook\n", 334 | "\n", 335 | "You can save the progress you've made in your Notebook by clicking `File -> Save and Checkpoint`. You can view your saved Notebook from the `Code` tab of your job.\n", 336 | "\n", 337 | "Here's [an example](https://www.floydhub.com/emilwallner/projects/deep-learning-from-scratch/3/code/MNIST_deep_learning.ipynb).\n", 338 | "\n", 339 | "**Once you're done working on your Notebook, don't forget to shut down your job!** You can shut down your job by clicking `Cancel` in your job's dashboard. Here's [our guide](https://docs.floydhub.com/guides/stop_job/).\n", 340 | "\n", 341 | "\n", 342 | "\n", 343 | "**Note that simply closing the Notebook tab does not shut down the job.** Since Jupyter Notebooks are interactive development environments, we don't know if you're done for the day or if you're going to come back and continue working on your Notebook. So, we'll keep your Notebook running (and keep charging you) until you explicitly shut it down." 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": {}, 349 | "source": [ 350 | "### Restarting your Notebook\n", 351 | "\n", 352 | "You can also restart your Notebook to continue working from where you left off by clicking the `Restart` button. Here's the [guide for it](https://docs.floydhub.com/guides/restart_job/).\n", 353 | "\n", 354 | "" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "## FloydHub Best Practices\n", 362 | "\n", 363 | "### a. Keeping code separate from data\n", 364 | "\n", 365 | "A key part of running experiments, and a data science best practice, is to keep a clean separation between your code and the data it uses. This lets you structure your experiments/Jobs more elegantly, keeps the code you need to upload to FloydHub small, and speeds up your experiment iterations.\n", 366 | "\n", 367 | "### b. 
Sync your remote experiments locally\n", 368 | "\n", 369 | "If you have followed this tutorial, you have probably noticed that we have been working only in a remote FloydHub Job, so the code on our local machine is not synced with the current state of our Jupyter Notebook.\n", 370 | "\n", 371 | "If you'd like to update everything locally, you can download everything from the Output tab of the Job's overview page in the web dashboard, or by using the CLI with `floyd data clone`.\n", 372 | "\n", 373 | "You can read more about [downloading output](https://docs.floydhub.com/guides/download_output/) in our docs.\n", 374 | "\n", 375 | "### c. Using .floydignore\n", 376 | "\n", 377 | "Using `.floydignore` can speed up your uploads and experiment iterations if your project directory contains items that your experiment code doesn't need (such as docs, images and videos). See our FAQ about [long sync](https://docs.floydhub.com/faqs/job/#my-job-is-taking-a-while-to-sync-changes-how-do-i-make-it-go-faster).\n", 378 | "\n", 379 | "**Note**: If your internet connection has low upload bandwidth, this file can really improve your experience on our service." 380 | ] 381 | } 382 | ], 383 | "metadata": { 384 | "kernelspec": { 385 | "display_name": "Python 3", 386 | "language": "python", 387 | "name": "python3" 388 | }, 389 | "language_info": { 390 | "codemirror_mode": { 391 | "name": "ipython", 392 | "version": 3 393 | }, 394 | "file_extension": ".py", 395 | "mimetype": "text/x-python", 396 | "name": "python", 397 | "nbconvert_exporter": "python", 398 | "pygments_lexer": "ipython3", 399 | "version": "3.5.3" 400 | } 401 | }, 402 | "nbformat": 4, 403 | "nbformat_minor": 2 404 | } 405 | --------------------------------------------------------------------------------